Export Data from Insider One to Databricks


This guide explains how to export data from Insider One to Databricks using Amazon S3 as the intermediary data lake. This method leverages Insider One’s native Amazon S3 export capability and Databricks’ ability to ingest data directly from S3 buckets.

Overview

  1. Insider One exports event or user data to a customer-owned Amazon S3 bucket.

  2. Databricks reads data from Amazon S3 using customer-configured ingestion pipelines.

  3. Data is processed, transformed, or queried in Databricks according to customer needs.

Prerequisites

Insider One

  • Access to an Insider One account with permissions to configure data exports

  • Ability to configure Amazon S3 as an export destination

Amazon Web Services (AWS)

  • An AWS account

  • Permissions to create and manage S3 buckets

  • IAM permissions allowing Insider One to write data to the selected bucket

Databricks

  • Access to a Databricks workspace

  • Permissions to configure data ingestion from Amazon S3

Step 1: Set Up Amazon S3 Export in Insider One

Insider One supports exporting data directly to Amazon S3.

Using the Insider One dashboard, you can:

  • Select or create an Amazon S3 bucket for export

  • Configure IAM credentials and bucket policies that allow Insider One to write data (a bucket-side sketch appears at the end of this step)

  • Choose export frequency and data types (for example, events or user attributes)

  • Validate the integration and monitor exported files in S3

For detailed, up-to-date instructions, refer to Export data to Amazon S3.
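If you manage the bucket programmatically, the following is a minimal, illustrative sketch of the AWS side using boto3. The region, bucket name, and the principal granted write access are placeholders; the actual principal and any additional policy conditions required by Insider One come from the export configuration in the Insider One dashboard.

    import json
    import boto3

    # Placeholder values: pick your own region and bucket name.
    region = "eu-west-1"
    bucket = "your-insider-export-bucket"

    s3 = boto3.client("s3", region_name=region)

    # Create the export bucket (CreateBucketConfiguration is required
    # outside us-east-1).
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

    # Grant write access to the principal Insider One uses for exports.
    # The ARN below is a placeholder; use the principal shown in the
    # Insider One dashboard when configuring the export destination.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowInsiderOneExportWrites",
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::<INSIDER_ONE_ACCOUNT_ID>:root"},
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))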

Step 2: Ingest Data from Amazon S3 into Databricks

Once data is available in Amazon S3, Databricks can consume it through customer-managed ingestion pipelines. Databricks supports multiple approaches for reading data from S3, including:

  • Continuous ingestion for near real-time or incremental processing (for example, streaming-based ingestion using Auto Loader, as sketched after this list)

  • Batch ingestion for scheduled, ad-hoc, or backfill workloads (for example, using COPY INTO)
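As one example of the continuous approach, the sketch below uses Auto Loader to incrementally load newly exported files into a Delta table. All paths and table names are placeholders, the export file format is assumed to be JSON, and spark refers to the SparkSession already available in a Databricks notebook.

    from pyspark.sql import functions as F

    # Placeholder paths and names; adjust to your bucket layout and catalog.
    source_path = "s3://your-insider-export-bucket/insider-one/events/"
    checkpoint_path = "s3://your-insider-export-bucket/_checkpoints/insider_events/"
    target_table = "insider_bronze.events"

    spark.sql("CREATE SCHEMA IF NOT EXISTS insider_bronze")

    # Incrementally discover and read newly exported files. The export
    # format is assumed to be JSON here; change cloudFiles.format if the
    # export is written as CSV or Parquet.
    events = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
        .withColumn("source_file", F.col("_metadata.file_path"))
        .withColumn("ingested_at", F.current_timestamp())
    )

    # availableNow processes all pending files and then stops, which suits
    # scheduled incremental runs; drop the trigger for a continuous stream.
    (
        events.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .toTable(target_table)
    )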

The choice of ingestion method depends on:

  • Desired data freshness

  • Pipeline architecture

  • Operational preferences of your team

Insider One does not require or enforce a specific Databricks ingestion method.
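For teams that prefer scheduled batch or backfill loads, here is a corresponding sketch using COPY INTO, run through spark.sql with the same placeholder names; the statement can equally be executed directly as Databricks SQL.

    # Placeholder table and path; FILEFORMAT should match the export format.
    spark.sql("CREATE SCHEMA IF NOT EXISTS insider_bronze")
    spark.sql("CREATE TABLE IF NOT EXISTS insider_bronze.events")

    spark.sql("""
        COPY INTO insider_bronze.events
        FROM 's3://your-insider-export-bucket/insider-one/events/'
        FILEFORMAT = JSON
        FORMAT_OPTIONS ('inferSchema' = 'true')
        COPY_OPTIONS ('mergeSchema' = 'true')
    """)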

For implementation details, refer to the Databricks documentation on Auto Loader and COPY INTO.

Step 3: Automation and Monitoring

To operate a reliable pipeline:

  • Configure Insider One export schedules to match your desired data refresh cadence

  • Use Databricks monitoring and alerting tools to track ingestion health and troubleshoot issues (a minimal freshness check is sketched below)
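As a simple example of a health check, assuming the Auto Loader sketch above populated insider_bronze.events with an ingested_at column, the following query reports when data last arrived; it could be scheduled as a Databricks job or used as the basis for a Databricks SQL alert.

    # Report when data last landed in the bronze table; alert if the value
    # is older than the expected export cadence.
    latest = spark.sql(
        "SELECT max(ingested_at) AS last_ingested FROM insider_bronze.events"
    ).collect()[0]["last_ingested"]
    print(f"Most recent ingestion: {latest}")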