This guide explains how to export data from Insider One to Databricks using cloud object storage as the intermediary data layer.
You can currently use one of these options:
Export via Amazon S3
Export via Google Cloud Storage (GCS)
Both methods leverage Insider One’s native export capabilities and Databricks’ ability to ingest data directly from cloud storage buckets.
How does the integration work?
The integration between Insider One and Databricks follows this high-level workflow:
Insider One exports event or user data to a customer-owned cloud storage bucket (Amazon S3 or Google Cloud Storage).
Databricks reads data from that bucket using customer-configured ingestion pipelines.
Data is processed, transformed, or queried in Databricks according to customer needs.
Prerequisites
Before setting up the export to Databricks, make sure you meet the following requirements across Insider One, your cloud provider, and Databricks.
Insider One
Access to an Insider One account with permissions to configure data exports
Ability to configure Amazon S3 or Google Cloud Storage as an export destination
Amazon Web Services (AWS) (if using Amazon S3)
An AWS account
Permissions to create and manage S3 buckets
IAM permissions allowing Insider One to write data to the selected bucket
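As an illustration, a bucket policy granting write access could look like the sketch below. The account ID, role name, and bucket name are placeholders, not values defined by Insider One; use the exact principal and actions specified in the Export Data to Amazon S3 documentation.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInsiderOneWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/insider-one-export" },
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::your-export-bucket/*"
    }
  ]
}
```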
Google Cloud Platform (GCP) (if using Google Cloud Storage)
A GCP project
Permissions to create and manage Google Cloud Storage buckets
Service account and IAM permissions allowing Insider One to write data to the selected bucket
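For example, granting the export service account object-create access on the bucket might look like the following command. The bucket name and service account email are placeholders; use the values from your own GCP project and from Insider One's setup screen, and check the exact role required in the Export Data to Google Cloud Storage documentation.

```shell
# Grant the Insider One export service account write access on the bucket.
# Bucket and service account names below are illustrative placeholders.
gcloud storage buckets add-iam-policy-binding gs://your-export-bucket \
  --member="serviceAccount:insider-one-export@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
```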
Databricks
Access to a Databricks workspace
Permissions to configure data ingestion from your chosen cloud storage (S3 or GCS)
After meeting the prerequisites, select the cloud storage provider that fits your infrastructure and follow the corresponding setup steps below.
Option 1: Export via Amazon S3
Follow the steps below to export data from Insider One to Amazon S3 and configure Databricks to ingest the exported files.
Step 1: Set Up Amazon S3 Export in Insider One
Insider One supports exporting data directly to Amazon S3.
Using the Insider One dashboard, you can:
Select or create an Amazon S3 bucket for export
Configure IAM credentials and bucket policies that allow Insider One to write data
Choose export frequency and data types (for example, events or user attributes)
Validate the integration and monitor exported files in S3
Refer to Export Data to Amazon S3 for detailed information.
Step 2: Ingest Data from Amazon S3 into Databricks
Once data is available in Amazon S3, it can be consumed by Databricks using ingestion pipelines. Databricks supports multiple approaches for ingesting data from Amazon S3, including:
Continuous ingestion for near real-time or incremental processing (for example, streaming-based ingestion using Auto Loader)
Batch ingestion for scheduled, ad-hoc, or backfill workloads (for example, using COPY INTO)
The choice of ingestion method depends on:
Desired data freshness
Pipeline architecture
Operational preferences of your team
Insider One does not require or enforce a specific Databricks ingestion method.
Refer to the Databricks documentation on Auto Loader and COPY INTO for implementation details.
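As an illustration, a minimal Auto Loader pipeline reading exported files from S3 into a Delta table might look like the sketch below. The bucket paths, file format, and table name are placeholders, and the snippet assumes a Databricks runtime where the `spark` session is already defined.

```python
# Runs on a Databricks cluster, where `spark` is predefined.
# Paths, schema/checkpoint locations, and table name are placeholders.
(spark.readStream
    .format("cloudFiles")                      # Auto Loader source
    .option("cloudFiles.format", "json")       # format of the exported files
    .option("cloudFiles.schemaLocation", "s3://your-bucket/_schemas/insider_one")
    .load("s3://your-bucket/insider-one-exports/")
 .writeStream
    .option("checkpointLocation", "s3://your-bucket/_checkpoints/insider_one")
    .trigger(availableNow=True)                # incremental batch; remove for continuous
    .toTable("insider_one_events"))
```

With `trigger(availableNow=True)`, the stream processes all files that have arrived since the last run and then stops, which suits scheduled incremental loads; removing the trigger keeps the stream running for near real-time ingestion.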
Option 2: Export via Google Cloud Storage (GCS)
Follow the steps below to export data from Insider One to Google Cloud Storage and configure Databricks to ingest the exported files.
Step 1: Set Up Google Cloud Storage Export in Insider One
Insider One also supports exporting data directly to Google Cloud Storage.
Using the Insider One dashboard, you can:
Select or create a Google Cloud Storage bucket for export
Configure a GCP service account, key, and bucket IAM permissions that allow Insider One to write data
Choose export frequency and data types (for example, events or user attributes)
Validate the integration and monitor exported files in GCS
Refer to Export Data to Google Cloud Storage for further details.
Step 2: Ingest Data from Google Cloud Storage into Databricks
Once data is available in Google Cloud Storage, it can be consumed by Databricks using ingestion pipelines. Databricks supports multiple approaches for ingesting data from Google Cloud Storage, including:
Continuous ingestion for near real-time or incremental processing (for example, streaming-based ingestion using Auto Loader)
Batch ingestion for scheduled, ad-hoc, or backfill workloads (for example, using COPY INTO)
The choice of ingestion method depends on:
Desired data freshness
Pipeline architecture
Operational preferences of your team
Insider One does not require or enforce a specific Databricks ingestion method.
Refer to the Databricks documentation on Auto Loader and COPY INTO for implementation details.
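As an illustration, a batch load from GCS using COPY INTO might look like the sketch below. The bucket path, table name, and file format are placeholders, not values defined by Insider One.

```sql
-- Batch-load exported files from GCS into a Delta table.
-- Bucket path, table name, and file format are illustrative placeholders.
COPY INTO insider_one_events
  FROM 'gs://your-export-bucket/insider-one-exports/'
  FILEFORMAT = JSON
  COPY_OPTIONS ('mergeSchema' = 'true');
```

COPY INTO is idempotent over files it has already loaded, which makes it a reasonable fit for scheduled or backfill jobs that re-scan the export prefix.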
Automation and Monitoring
To operate a reliable pipeline:
Configure Insider One export schedules to match your desired data refresh cadence
Use Databricks monitoring and alerting tools to track ingestion health and troubleshoot issues
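One simple signal worth alerting on is export staleness: whether the newest exported file is older than the configured export cadence. The sketch below is illustrative only; the function name and grace period are assumptions, not part of any Insider One or Databricks API, and you would feed it the timestamp of the newest file observed in your bucket.

```python
from datetime import datetime, timedelta, timezone

def export_is_stale(latest_export_time, cadence, grace=timedelta(minutes=30)):
    """Return True if the newest exported file is older than the expected
    export cadence plus a grace period. Illustrative helper -- not an
    Insider One or Databricks API."""
    age = datetime.now(timezone.utc) - latest_export_time
    return age > cadence + grace

# Example: hourly exports, newest file written 3 hours ago -> stale
last_file = datetime.now(timezone.utc) - timedelta(hours=3)
print(export_is_stale(last_file, cadence=timedelta(hours=1)))  # True
```

A check like this can run as a lightweight scheduled job and page your team when exports stop arriving, independently of whether the downstream Databricks ingestion is healthy.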