Azure Databricks
Overview
Azure Databricks is a data lakehouse platform that many revenue operations and data teams use to store, process, and analyze valuable customer data. HockeyStack can integrate directly with Azure Databricks to both ingest data from your Databricks environment and export processed data back into it.
Prerequisites & Credentials
Before setting up the integration, gather the following credentials and share them securely with our support team. Review the Databricks documentation if you need help generating either of these values:
DATABRICKS_TOKEN: Your personal access token (PAT)
DATABRICKS_HOST: The Databricks host URL associated with your workspace
Please provide these details via a secure channel so we can establish a trusted connection with your environment.
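Once you have these two values, you can verify them locally before sharing. The sketch below builds an authenticated request against the Databricks REST API; the endpoint shown (`/api/2.0/clusters/list`) is one standard way to confirm a token works, but the exact check you use is up to you, and the placeholder host and token are illustrative only:

```python
import os

# Read the two credentials described above; the fallback values are
# placeholders, not real endpoints or tokens.
host = os.environ.get("DATABRICKS_HOST", "https://adb-0000000000000000.0.azuredatabricks.net")
token = os.environ.get("DATABRICKS_TOKEN", "dapi-placeholder")

# Personal access tokens authenticate via a Bearer header.
headers = {"Authorization": f"Bearer {token}"}
url = f"{host.rstrip('/')}/api/2.0/clusters/list"

# Uncomment to actually hit the API and confirm the credentials work:
# import requests
# resp = requests.get(url, headers=headers, timeout=10)
# resp.raise_for_status()  # 200 means the host and token are valid
```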
Ingesting Data from Azure Databricks into HockeyStack
HockeyStack transforms your Databricks data into “actions” that can be viewed and analyzed within our platform’s activity timeline. Each action typically includes:
Action Name: A descriptor for what happened (e.g. “Monthly Revenue”).
Action Date(s): The date(s) associated with the action (e.g., 2024-01-01).
Entity Identifier: An identifier (e.g., email, company domain, CRM record ID) that links the action to a known person or company in HockeyStack.
Additional Attributes: Details like revenue amount, currency, or any other metrics that provide deeper insights.
Example Action Record:
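For illustration, an ingested action record might look like the following. The field names here are hypothetical; the actual schema depends on your source table:

```python
# A hypothetical action record after ingestion; keys are illustrative only.
example_action = {
    "action_name": "Monthly Revenue",     # what happened
    "action_date": "2024-01-01",          # when it happened
    "email": "jane.doe@example.com",      # entity identifier known to HockeyStack
    "properties": {                       # additional attributes
        "revenue": 1200,
        "currency": "USD",
    },
}
```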
Data Requirements for Ingestion
To ensure a smooth ingestion process, the table(s) in Azure Databricks you plan to sync must meet the following criteria:
Incremental Sync Timestamp: At least one datetime column representing the “last modified date” of each record. HockeyStack uses this column to determine which records are new or updated since the last sync.
Action Name Column: At least one column should serve as the “action name” to categorize or identify the type of event or record.
Action Date Column(s): One or more columns should represent the date(s) on which the action occurred.
Entity Identifier Column: At least one column must contain an identifier (e.g., email, CRM ID) that matches identifiers known to HockeyStack. This allows actions to be tied back to the correct person or company in your system.
Once you’ve identified or prepared a table that meets these requirements, share the schema details with our support team to begin the ingestion setup.
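The four requirements above map directly onto the columns of an incremental query. The sketch below shows what such a query might look like; the table name, column names, and watermark mechanism are assumptions for illustration, not HockeyStack's actual implementation:

```python
# Watermark saved from the previous sync run (illustrative value).
last_synced_at = "2024-01-01T00:00:00Z"

# Hypothetical incremental-sync query; table and column names are examples.
query = f"""
SELECT action_name,        -- action name column
       action_date,        -- action date column
       email,              -- entity identifier column
       revenue, currency,  -- additional attributes
       last_modified_at    -- incremental sync timestamp
FROM analytics.revenue_events
WHERE last_modified_at > '{last_synced_at}'
ORDER BY last_modified_at
"""
```

Only rows modified since the last run are fetched, which keeps each sync proportional to the volume of new activity rather than the size of the whole table.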
Exporting Data from HockeyStack to Azure Databricks
HockeyStack can also push data into your Azure Databricks instance. This allows you to incorporate HockeyStack’s enriched analytics data back into your broader data ecosystem for advanced analytics, modeling, or reporting.
How Export Works
Scheduled Syncs: By default, HockeyStack runs export jobs daily; the frequency can be customized.
Data Formatting: Our workers query HockeyStack’s database, format the data as needed, and push it into the specified destination table(s) in Azure Databricks.
Schema Planning: Before initiating exports, our support team will propose a table structure and work with you to confirm the data columns, formats, and fields needed.
Sync Options
Incremental Sync: Push only new or updated records since the last export.
Recurring Full Sync: Clear the designated Databricks table and repopulate it entirely each run.
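The two sync modes can be sketched as the SQL statements an export worker might issue against a Delta table. The table name, key column, and statement shapes below are illustrative assumptions, not HockeyStack's actual export code:

```python
# Hypothetical destination table; the real name is agreed during schema planning.
TABLE = "analytics.hockeystack_export"

def full_sync_statements(rows_sql: str) -> list:
    # Recurring full sync: clear the table, then repopulate it entirely.
    return [
        f"TRUNCATE TABLE {TABLE}",
        f"INSERT INTO {TABLE} {rows_sql}",
    ]

def incremental_sync_statement(rows_sql: str) -> str:
    # Incremental sync: upsert only records changed since the last export.
    # MERGE with UPDATE SET * / INSERT * works on Databricks Delta tables;
    # the join key "id" is an assumed identifier column.
    return (
        f"MERGE INTO {TABLE} t USING ({rows_sql}) s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

A full sync is simpler and self-healing (any drift is wiped each run), while an incremental sync moves less data per run; the default recommendation below uses the full-sync pattern.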
Default Recommendations
Daily Cronjob: Clears and resyncs the table once per day.
Data Format: We typically mirror HockeyStack’s raw data format — a single denormalized table containing a timeline of activity for each person and company. This format can vary per customer due to the range of data fields we ingest from different sources.
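To make the denormalized timeline format concrete, here is a sketch of a few rows for a single company. The field names are hypothetical and, as noted above, the actual columns vary per customer and data source:

```python
# Illustrative rows from a denormalized activity-timeline table.
timeline = [
    {"company_domain": "acme.com", "action_name": "Website Visit",
     "action_date": "2024-01-03", "properties": {"page": "/pricing"}},
    {"company_domain": "acme.com", "action_name": "Demo Booked",
     "action_date": "2024-01-10", "properties": {"source": "outbound"}},
    {"company_domain": "acme.com", "action_name": "Monthly Revenue",
     "action_date": "2024-02-01", "properties": {"revenue": 1200, "currency": "USD"}},
]

# All activity for a person or company lives in one chronological stream,
# so downstream queries can read a full journey without joins.
timeline.sort(key=lambda row: row["action_date"])
```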
If you have any questions or need guidance on setup, data mapping, or optimizing your integration, our support team is here to help. By connecting Azure Databricks with HockeyStack, you’ll unlock powerful insights and streamlined workflows that help drive better decision-making and more efficient data operations.