Export Data from Insider One to Databricks


This guide explains how to export Insider One data to Databricks using cloud object storage as the intermediary data layer.

You can use one of the following storage destinations:

  • Amazon S3

  • Google Cloud Storage (GCS)

  • Microsoft Azure Blob Storage

Insider One exports your event or user data to your selected cloud storage destination. Databricks then reads the exported data from that location through your own ingestion pipelines.

How does the export flow work?

  1. Insider One exports event or user data to your customer-owned cloud storage destination.

  2. Databricks reads the exported files from that storage destination through your configured ingestion pipeline.

  3. You process, transform, or query the data in Databricks based on your business needs.

Requirements

Before you start, make sure you have the required access and permissions for Insider One, your selected cloud storage provider, and Databricks.

Insider One

  • Access to an Insider One account with permissions to configure data exports.

  • Ability to configure Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage as an export destination.

Amazon Web Services

  • An AWS account with permissions to create and manage S3 buckets.

  • IAM permissions that allow Insider One to write data to the selected S3 bucket.

Google Cloud Platform

  • A GCP project with permissions to create and manage Google Cloud Storage buckets.

  • A service account, a service account key, and IAM permissions that allow Insider One to write data to the selected bucket.

Microsoft Azure

  • An Azure subscription, a Microsoft Azure Storage Account, and a Blob Service Container.

  • The Connection String for the Storage Account, with permissions to create objects in the container that Insider One writes to.

Databricks

  • Access to a Databricks workspace with permissions to configure ingestion from your selected cloud storage destination.

Option 1: Export data through Amazon S3

Step 1: Configure Amazon S3 as an export destination in Insider One

Use Amazon S3 if you want Insider One to export data to an S3 bucket that your team owns and manages.

Setting up the export involves the following tasks:

  • Select or create an Amazon S3 bucket for export.

  • Configure IAM credentials and bucket policies that allow Insider One to write data.

  • Choose the export frequency and data types, such as events or user attributes.

  • Validate the integration and monitor exported files in Amazon S3.

For detailed setup instructions, refer to Export Data to Amazon S3.
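As an illustration of the bucket-policy task above, the following Python sketch builds a minimal S3 bucket policy that lets one principal write objects under an export prefix. It is a hypothetical starting point, not the policy Insider One requires: the bucket name, prefix, principal ARN, and the `s3:ListBucket` grant are assumptions, and the exact principal and actions to allow come from Export Data to Amazon S3.

```python
import json


def build_export_bucket_policy(bucket: str, prefix: str, writer_principal_arn: str) -> dict:
    """Return an S3 bucket policy that lets one principal write under a prefix.

    All arguments are hypothetical placeholders; use the principal and
    actions that the Insider One S3 export guide specifies.
    """
    prefix = prefix.strip("/")
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the exporter create objects under the export prefix.
                "Sid": "AllowExportWrites",
                "Effect": "Allow",
                "Principal": {"AWS": writer_principal_arn},
                "Action": "s3:PutObject",
                "Resource": object_arn,
            },
            {
                # Assumption: lets the exporter list the bucket, e.g. for validation.
                "Sid": "AllowExportListing",
                "Effect": "Allow",
                "Principal": {"AWS": writer_principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_export_bucket_policy(
        bucket="my-insider-exports",  # hypothetical bucket name
        prefix="insider/",  # hypothetical export prefix
        writer_principal_arn="arn:aws:iam::123456789012:role/insider-export-writer",  # hypothetical
    )
    print(json.dumps(policy, indent=2))
```

You could save the printed JSON to a file and attach it in the S3 console or with `aws s3api put-bucket-policy --bucket my-insider-exports --policy file://policy.json`.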

Step 2: Ingest Amazon S3 data into Databricks

After the data is available in Amazon S3, configure your Databricks ingestion pipeline to read data from the bucket.

Databricks supports different ingestion methods, including:

  • Continuous ingestion: Use this method for near real-time or incremental processing, such as ingestion with Auto Loader.

  • Batch ingestion: Use this method for scheduled, ad-hoc, or backfill workloads, such as ingestion with COPY INTO.

Your ingestion method depends on your data freshness requirements, pipeline architecture, and operational preferences. Insider One does not require or enforce a specific Databricks ingestion method. For implementation details, refer to the Databricks documentation for Auto Loader and COPY INTO.
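For the continuous path, a minimal Auto Loader sketch might look like the following. It assumes the workspace can already read the bucket (for example through a Unity Catalog external location), that `spark` is the SparkSession that Databricks provides, and that you know the format of the exported files. The function name, paths, file format, and table name are hypothetical.

```python
from typing import Any


def start_auto_loader(spark: Any, source_uri: str, file_format: str,
                      checkpoint_uri: str, target_table: str) -> Any:
    """Incrementally load newly exported files into a Delta table with Auto Loader.

    `spark` is the Databricks-provided SparkSession; all other arguments
    are hypothetical placeholders.
    """
    stream = (
        spark.readStream.format("cloudFiles")  # Auto Loader source
        .option("cloudFiles.format", file_format)  # format of the export files (assumed known)
        .option("cloudFiles.schemaLocation", checkpoint_uri)  # where the inferred schema is tracked
        .load(source_uri)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_uri)  # records which files were already processed
        .trigger(availableNow=True)  # process all pending files, then stop
        .toTable(target_table)  # returns a StreamingQuery
    )


# Example call from a Databricks notebook (all values hypothetical):
# query = start_auto_loader(
#     spark,
#     source_uri="s3://my-insider-exports/insider/events/",
#     file_format="parquet",
#     checkpoint_uri="s3://my-databricks-bucket/_checkpoints/insider_events/",
#     target_table="main.insider.events",
# )
# query.awaitTermination()
```

`trigger(availableNow=True)` processes every file that arrived since the last run and then stops, which suits a scheduled Databricks job. Dropping the trigger, or using `trigger(processingTime="5 minutes")`, keeps the stream running instead.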

Option 2: Export data through Google Cloud Storage

Step 1: Configure Google Cloud Storage as an export destination in Insider One

Use Google Cloud Storage if you want Insider One to export data to a GCS bucket that your team owns and manages.

Setting up the export involves the following tasks:

  • Select or create a Google Cloud Storage bucket for export.

  • Configure a GCP service account, key, and bucket IAM permissions that allow Insider One to write data.

  • Choose the export frequency and data types, such as events or user attributes.

  • Validate the integration and monitor exported files in GCS.

For detailed setup instructions, refer to Export Data to Google Cloud Storage.
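The bucket permission in GCS is an IAM binding. The sketch below adds a service account to a bucket IAM policy in the JSON shape that `gcloud storage buckets get-iam-policy gs://BUCKET --format=json` returns. The default role `roles/storage.objectCreator` (create objects only) is an assumption, so use the role that Export Data to Google Cloud Storage specifies; the function name and service account email are hypothetical.

```python
import json


def add_bucket_writer(policy: dict, service_account_email: str,
                      role: str = "roles/storage.objectCreator") -> dict:
    """Return a copy of a bucket IAM policy with the service account granted `role`.

    The default role is an assumption; check the Insider One GCS export guide.
    """
    member = f"serviceAccount:{service_account_email}"
    updated = json.loads(json.dumps(policy))  # deep copy; leave the input untouched
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an unconditional binding for the same role if one exists.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated
```

You would write the result back with `gcloud storage buckets set-iam-policy gs://BUCKET policy.json`; for a single binding, `gcloud storage buckets add-iam-policy-binding gs://BUCKET --member=serviceAccount:EMAIL --role=ROLE` does the same in one step. Keeping the fetched `etag` in the policy makes the write fail rather than silently overwrite a concurrent change.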

Step 2: Ingest Google Cloud Storage data into Databricks

After the data is available in Google Cloud Storage, configure your Databricks ingestion pipeline to read data from the bucket.

Databricks supports different ingestion methods, including:

  • Continuous ingestion: Use this method for near real-time or incremental processing, such as ingestion with Auto Loader.

  • Batch ingestion: Use this method for scheduled, ad-hoc, or backfill workloads, such as ingestion with COPY INTO.

Your ingestion method depends on your data freshness requirements, pipeline architecture, and operational preferences. Insider One does not require or enforce a specific Databricks ingestion method. For implementation details, refer to the Databricks documentation for Auto Loader and COPY INTO.
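For scheduled or backfill loads, COPY INTO can run from a Databricks job. The sketch below generates the SQL for a `gs://` source, assuming the workspace already has read access to the bucket. The function name, table name, bucket path, and file format are hypothetical. A `CREATE TABLE IF NOT EXISTS` with no columns, combined with the two `mergeSchema` options, lets COPY INTO infer the schema of a new Delta table.

```python
from typing import List


def copy_into_statements(target_table: str, source_uri: str, file_format: str) -> List[str]:
    """Return SQL that creates a schemaless Delta table (if needed) and loads files into it."""
    return [
        # With no column list, the schema is inferred on the first COPY INTO.
        f"CREATE TABLE IF NOT EXISTS {target_table}",
        (
            f"COPY INTO {target_table}\n"
            f"FROM '{source_uri}'\n"
            f"FILEFORMAT = {file_format.upper()}\n"
            # Infer the schema across files and evolve the table when columns are added.
            "FORMAT_OPTIONS ('mergeSchema' = 'true')\n"
            "COPY_OPTIONS ('mergeSchema' = 'true')"
        ),
    ]


# Example in a scheduled Databricks job (values hypothetical):
# for stmt in copy_into_statements(
#     "main.insider.events", "gs://my-insider-exports/insider/events/", "parquet"
# ):
#     spark.sql(stmt)
```

COPY INTO skips files it has already loaded, so re-running the same statements on each schedule picks up only newly exported files.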

Option 3: Export data through Microsoft Azure Blob Storage

Step 1: Configure Microsoft Azure Blob Storage as an export destination in Insider One

Use Microsoft Azure Blob Storage if you want Insider One to export data to an Azure Blob container that your team owns and manages.

Setting up the export involves the following tasks:

  • Select or create a Microsoft Azure Storage Account and Blob container for export.

  • Configure the Connection String and container permissions that allow Insider One to write data.

  • Optionally set a Blob Path Prefix to organize exported files within the container.

  • Choose the export frequency and data types, such as events or user attributes.

  • Validate the integration and monitor exported files in your Blob container.

For detailed setup instructions, refer to Export Data from Insider One to Azure Blob Storage.
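Before entering the Connection String in Insider One, a quick structural check can catch copy-and-paste errors. The sketch below splits a connection string into its `key=value` settings and confirms that it identifies an account and carries a credential (an account key or a shared access signature). The function names are hypothetical, and the check does not prove the credential can actually create blobs in the container. Treat the connection string as a secret and avoid logging it.

```python
from typing import Dict
from urllib.parse import urlparse


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into its key=value settings."""
    settings: Dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys and SAS tokens contain "=".
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        settings[key.strip()] = value.strip()
    return settings


def export_account_name(conn_str: str) -> str:
    """Return the storage account name after a structural check of the string."""
    settings = parse_connection_string(conn_str)
    if "AccountKey" not in settings and "SharedAccessSignature" not in settings:
        raise ValueError("no AccountKey or SharedAccessSignature found")
    if "AccountName" in settings:
        return settings["AccountName"]
    if "BlobEndpoint" in settings:
        # e.g. https://myaccount.blob.core.windows.net/ -> myaccount
        host = urlparse(settings["BlobEndpoint"]).hostname or ""
        if host:
            return host.split(".")[0]
    raise ValueError("no AccountName or BlobEndpoint found")
```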

Step 2: Ingest Azure Blob Storage data into Databricks

After the data is available in Azure Blob Storage, configure your Databricks ingestion pipeline to read data from the container.

Databricks supports different ingestion methods, including:

  • Continuous ingestion: Use this method for near real-time or incremental processing, such as ingestion with Auto Loader.

  • Batch ingestion: Use this method for scheduled, ad-hoc, or backfill workloads, such as ingestion with COPY INTO.

Your ingestion method depends on your data freshness requirements, pipeline architecture, and operational preferences. Insider One does not require or enforce a specific Databricks ingestion method. For implementation details, refer to the Databricks documentation for Auto Loader and COPY INTO.
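Databricks reads Azure storage through the ABFS driver, using URIs of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>` (the legacy `wasbs://` driver is deprecated in Databricks). The sketch below builds that URI from the container and optional Blob Path Prefix, then generates a COPY INTO statement whose `PATTERN` limits a backfill to matching files. The account, container, table, file format, and folder layout in the pattern are hypothetical, because Insider One defines how exported files are laid out. It assumes the target table already exists, for example as an empty placeholder table.

```python
def abfss_uri(account: str, container: str, prefix: str = "",
              endpoint_suffix: str = "core.windows.net") -> str:
    """Build the ABFS URI Databricks uses for a container path (ends with '/')."""
    path = prefix.strip("/")
    base = f"abfss://{container}@{account}.dfs.{endpoint_suffix}/"
    return f"{base}{path}/" if path else base


def backfill_sql(target_table: str, source_uri: str, file_format: str, pattern: str) -> str:
    """COPY INTO restricted to files under source_uri that match a glob pattern."""
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{source_uri}'\n"
        f"FILEFORMAT = {file_format.upper()}\n"
        f"PATTERN = '{pattern}'\n"  # glob relative to source_uri (hypothetical layout)
        "FORMAT_OPTIONS ('mergeSchema' = 'true')\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )


# Example (values hypothetical): reload one month of exports from the Blob Path Prefix.
# spark.sql(backfill_sql("main.insider.events",
#                        abfss_uri("acmestorage", "insider-exports", "insider/events"),
#                        "parquet", "2024-01-*/*"))
```

The same `abfss://` URI can be passed to an Auto Loader stream when you need continuous ingestion instead.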

Automation and monitoring

To maintain a reliable data pipeline, align your Insider One export schedule with your preferred data refresh cadence.

  • Configure Insider One export schedules based on how often you need fresh data in Databricks.

  • Use Databricks monitoring and alerting tools to track ingestion health and troubleshoot pipeline issues.
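One simple health signal is export freshness: if no new file has arrived within one export interval plus some slack, the export or the ingestion path needs attention. The sketch below assumes you already collect file modification times (in epoch milliseconds) from a storage listing; the function names and the 30-minute default grace period are hypothetical. A scheduled Databricks job could run a check like this and raise an error, so that the job's failure notifications alert your team.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def latest_export_time(modification_times_ms: Sequence[int]) -> datetime:
    """Return the newest file modification time (epoch milliseconds) as a UTC datetime."""
    if not modification_times_ms:
        raise ValueError("no exported files found")
    return datetime.fromtimestamp(max(modification_times_ms) / 1000, tz=timezone.utc)


def is_export_stale(latest: datetime, now: datetime, export_interval: timedelta,
                    grace: timedelta = timedelta(minutes=30)) -> bool:
    """True if nothing new arrived within one export interval plus a grace period.

    Set export_interval to the frequency configured in Insider One; the default
    grace period is a hypothetical allowance for export and upload delay.
    """
    return now - latest > export_interval + grace


# Example (hypothetical): fail a scheduled check so the job's failure alerts fire.
# if is_export_stale(latest_export_time(times), datetime.now(timezone.utc), timedelta(hours=1)):
#     raise RuntimeError("Insider One export looks stale")
```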