This guide explains how to export data from Insider One to Databricks using Amazon S3 as the intermediary data lake. This method leverages Insider One’s native Amazon S3 export capability and Databricks’ ability to ingest data directly from S3 buckets.
Overview
Insider One exports event or user data to a customer-owned Amazon S3 bucket.
Databricks reads data from Amazon S3 using customer-configured ingestion pipelines.
Data is processed, transformed, or queried in Databricks according to customer needs.
Prerequisites
Insider One
Access to an Insider One account with permissions to configure data exports
Ability to configure Amazon S3 as an export destination
Amazon Web Services (AWS)
An AWS account
Permissions to create and manage S3 buckets
IAM permissions allowing Insider One to write data to the selected bucket
Databricks
Access to a Databricks workspace
Permissions to configure data ingestion from Amazon S3
Step 1: Set Up Amazon S3 Export in Insider One
Insider One supports exporting data directly to Amazon S3.
Using the Insider One dashboard, you can:
Select or create an Amazon S3 bucket for export
Configure IAM credentials and bucket policies that allow Insider One to write data
Choose export frequency and data types (for example, events or user attributes)
Validate the integration and monitor exported files in S3
For detailed, up-to-date instructions, refer to Export data to Amazon S3.
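Granting Insider One write access typically involves attaching a bucket policy that allows the principal Insider One provides to write objects into your bucket. Below is a minimal, hypothetical boto3 sketch of such a policy; the bucket name and role ARN are placeholders, and the exact principal and actions must come from the Insider One dashboard and documentation.

```python
# Hypothetical sketch: attach a bucket policy allowing an Insider One export role
# to write objects. Bucket name and role ARN are placeholders, not real values.
import json

import boto3

BUCKET = "your-insider-one-export-bucket"  # placeholder bucket name
INSIDER_ROLE_ARN = "arn:aws:iam::111122223333:role/example-insider-export"  # placeholder; obtain from Insider One

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInsiderOneExportWrites",
            "Effect": "Allow",
            "Principal": {"AWS": INSIDER_ROLE_ARN},
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```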
Step 2: Ingest Data from Amazon S3 into Databricks
Once data is available in Amazon S3, it can be consumed by Databricks using ingestion pipelines. Databricks supports multiple approaches for ingesting data from Amazon S3, including:
Continuous ingestion for near real-time or incremental processing (for example, streaming-based ingestion using Auto Loader; a minimal sketch follows this list)
Batch ingestion for scheduled, ad-hoc, or backfill workloads (for example, using COPY INTO)
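The following is a minimal sketch of streaming ingestion with Databricks Auto Loader, assuming the exported files are JSON and land under a single S3 prefix. The bucket path, checkpoint location, and target table name are placeholders, not values supplied by Insider One.

```python
# Minimal Auto Loader sketch (runs on a Databricks cluster). All paths and
# table names below are placeholders for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_path = "s3://your-insider-one-export-bucket/events/"        # placeholder export prefix
checkpoint_path = "s3://your-bucket/_checkpoints/insider_events"   # placeholder checkpoint location
target_table = "analytics.insider_one_events"                      # placeholder Delta table

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                 # adjust to the actual export file format
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
    .writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                          # incremental batch; omit for continuous streaming
    .toTable(target_table)
)
```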
The choice of ingestion method depends on:
Desired data freshness
Pipeline architecture
Operational preferences of your team
Insider One does not require or enforce a specific Databricks ingestion method.
For implementation details, refer to the Databricks documentation on Auto Loader and COPY INTO.
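For the batch approach, a COPY INTO statement can be run from a Databricks notebook or scheduled job. The sketch below uses the same placeholder bucket and table names as above and assumes JSON export files; adjust both to your environment.

```python
# Minimal COPY INTO sketch executed via Spark SQL on Databricks.
# Table and path names are placeholders for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    COPY INTO analytics.insider_one_events
    FROM 's3://your-insider-one-export-bucket/events/'
    FILEFORMAT = JSON
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```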
Step 3: Automation and Monitoring
To operate a reliable pipeline:
Configure Insider One export schedules to match your desired data refresh cadence
Use Databricks monitoring and alerting tools to track ingestion health and troubleshoot issues (a simple freshness-check sketch follows)
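One lightweight option is a scheduled Databricks job that fails, and therefore alerts, when the ingestion target has not received recent data. The sketch below assumes a hypothetical target table, timestamp column, and freshness SLA; none of these are defined by Insider One or Databricks.

```python
# Hypothetical freshness check, schedulable as a Databricks job.
# Table name, timestamp column, and SLA threshold are assumptions.
from datetime import datetime, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

TARGET_TABLE = "analytics.insider_one_events"   # placeholder ingestion target
TIMESTAMP_COLUMN = "event_timestamp"            # assumed event-time column (stored in UTC)
MAX_STALENESS = timedelta(hours=6)              # example freshness SLA

# Find the most recent event time that has landed in the target table.
latest = (
    spark.table(TARGET_TABLE)
    .agg(F.max(TIMESTAMP_COLUMN).alias("latest_event"))
    .collect()[0]["latest_event"]
)

# Fail the job (triggering a Databricks job alert) if data looks stale.
if latest is None or datetime.utcnow() - latest > MAX_STALENESS:
    raise RuntimeError(
        f"Ingestion appears stale: latest event in {TARGET_TABLE} is {latest}"
    )
```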