Amazon S3

Last updated 4 months ago

Amazon S3 is a scalable, reliable object storage service, well suited to storing customer-related data such as marketing and sales logs, event records, or customer attributes. HockeyStack ingests your data from S3 so you can unify and analyze your customer insights in one place.

Data Ingestion Approach

We recommend using CSV or Parquet files to store the data you want HockeyStack to process. This ensures efficient ingestion and parsing of large datasets.

HockeyStack maintains an internal Last Sync Date to manage incremental updates. On each sync, we pull only new or updated objects whose Timestamp is later than the Last Sync Date. If you expect to add historical data after the initial sync, include an Added At or Updated At column so that older records are still picked up by incremental loads.

Initial Backfill: We start by pulling your historical data (often the past few years, depending on your requirements) to create a comprehensive baseline of your datasets.

Incremental Syncs: After the initial backfill, HockeyStack retrieves only the daily deltas (differences) so your analytics stay up-to-date without unnecessary overhead.
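
The incremental rule described above can be sketched as follows. This is only an illustration, not HockeyStack's actual sync logic: the record layout, the `select_for_sync` helper, and the choice to compare against `updated_at` are assumptions made for the example. It shows why an Updated At column matters: a historical event added late still has a recent `updated_at`, so it is picked up.

```python
from datetime import datetime, timezone

def select_for_sync(records, last_sync_date):
    """Return records modified after the last sync (hypothetical helper)."""
    return [r for r in records if r["updated_at"] > last_sync_date]

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

last_sync = utc(2025, 1, 10)

records = [
    # Already synced: modified before the last sync date.
    {"email": "a@example.com", "timestamp": utc(2025, 1, 9), "updated_at": utc(2025, 1, 9)},
    # New event since the last sync.
    {"email": "b@example.com", "timestamp": utc(2025, 1, 11), "updated_at": utc(2025, 1, 11)},
    # Historical event (2023) that was added to the bucket after the last sync.
    {"email": "c@example.com", "timestamp": utc(2023, 6, 1), "updated_at": utc(2025, 1, 12)},
]

to_sync = select_for_sync(records, last_sync)
print([r["email"] for r in to_sync])  # ['b@example.com', 'c@example.com']
```

Filtering on the event `timestamp` alone would have skipped the third record, which is the case the Updated At column is meant to cover.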

Methods for Pulling Data from S3

Depending on your technical environment and preferences, we offer multiple ways to integrate with S3.

1. Direct S3 Bucket Access via ClickHouse

Method Overview: Expose your S3 buckets directly for ingestion through the ClickHouse S3 connector. ClickHouse is our primary analytical database; it stores every data point about your customers for high-performance querying.

Table Requirements:

  • Timestamp: A column indicating when the event or record occurred.

  • Identity (Email): A unique identifier (e.g., email) to link records to individual customers or entities.

  • Action Data: Additional columns representing activities, attributes, or metrics you want to analyze.
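
Before exposing a bucket, it can help to check that each file carries the required columns. The sketch below does this for a CSV header using only the Python standard library. The column names `timestamp` and `email` are assumptions that follow the example schema later on this page, and `missing_columns` is a hypothetical helper; adjust both to your own data.

```python
import csv
import io

# Assumed names for the required Timestamp and Identity columns.
REQUIRED_COLUMNS = {"timestamp", "email"}

def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)

good = "timestamp,email,page_view_count\n2025-01-11T08:30:00Z,a@example.com,4\n"
bad = "email,page_view_count\na@example.com,4\n"

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['timestamp']
```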

2. Using Amazon Athena

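Method Overview: Amazon Athena provides a serverless, SQL-based interface to query your S3 data. HockeyStack uses the athena-express NPM package to handle the connection and querying process.

Considerations: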

  • Cost Control: Athena charges per query and data scanned, so we’ll work with you to set query limits or schedules that keep costs predictable.

  • Schema & Structure: Ensure the queried data contains the required Timestamp, Identity, and action-related columns.
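
Because Athena bills by data scanned, queries that select only the needed columns and filter on a modification time tend to be cheaper, especially over columnar formats such as Parquet. The sketch below builds such a query string. It is a hypothetical illustration, not the query HockeyStack runs: the table name, the column names, and the `build_incremental_query` helper are all assumptions.

```python
def build_incremental_query(table, columns, last_sync_iso):
    """Build an Athena SQL string that reads only rows changed since last_sync_iso."""
    for name in [table, *columns]:
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    # Double quotes are the identifier quoting style in Athena queries.
    column_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f'SELECT {column_list} FROM "{table}" '
        f"WHERE \"updated_at\" > from_iso8601_timestamp('{last_sync_iso}')"
    )

query = build_incremental_query(
    "events",                                  # hypothetical table name
    ["timestamp", "email", "page_view_count"],
    "2025-01-10T00:00:00Z",
)
print(query)
```

In this sketch `last_sync_iso` is trusted input; a production version would validate it as well.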

IAM & Permissions Setup

For both direct S3 access and Athena-based ingestion, you’ll need to:

  • Create an IAM User: Provide HockeyStack with AccessKeyID and SecretAccessKey for secure programmatic access.

  • Permissions: Grant the IAM user AmazonS3FullAccess or a more restricted, bucket-level read policy that still allows HockeyStack to retrieve the necessary files. If using Athena, also include AmazonAthenaFullAccess or equivalent permissions so HockeyStack can run queries against your S3 data.
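
As an example of the "more restricted, bucket-level read policy" mentioned above, the sketch below generates an IAM policy document that allows listing one bucket and reading its objects. The bucket name and the `read_only_bucket_policy` helper are placeholders, and whether these three actions are sufficient for your setup is an assumption to confirm with the HockeyStack team.

```python
import json

def read_only_bucket_policy(bucket_name):
    """Return an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object-level reads apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }

policy = read_only_bucket_policy("example-hockeystack-exports")  # placeholder name
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the user whose keys you share.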

Data Schema Documentation

Once you’ve chosen a method, tested the connection, and confirmed access, provide HockeyStack with a short description of each object type, file structure, and any notable fields within your S3 data. For example:

  • Column Descriptions:

    • timestamp: Event occurrence time

    • email: User’s email address

    • page_view_count: Number of pages viewed in a session

    • added_at/updated_at: When the record was inserted or modified

This information ensures HockeyStack can accurately interpret, map, and utilize your S3 datasets.
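
One way to capture such a description is a small machine-readable file shared alongside the data. The sketch below is one possible layout, not a format HockeyStack requires: the object type name, the key prefix, and the structure of the dictionary are hypothetical, while the column descriptions repeat the example above.

```python
import json

schema_description = {
    "object_type": "website_sessions",   # hypothetical object name
    "file_format": "parquet",            # CSV or Parquet are recommended
    "s3_prefix": "exports/sessions/",    # hypothetical key prefix
    "columns": {
        "timestamp": "Event occurrence time",
        "email": "User's email address",
        "page_view_count": "Number of pages viewed in a session",
        "added_at": "When the record was inserted",
        "updated_at": "When the record was last modified",
    },
}

print(json.dumps(schema_description, indent=2))
```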

Advanced Option: Custom Data Pipeline to ClickHouse

For customers seeking even more control, there’s a third approach:

  • Custom Data Pipeline: Build a custom pipeline that pushes data directly into a dedicated ClickHouse cluster managed by HockeyStack.

    • Who’s It For?: Technical teams comfortable with custom development.

    • Benefits: Fine-grained control over data ingestion frequency, batch sizes, and data handling logic.

    • Considerations: Requires additional engineering resources on your end and a call with our team to discuss schema and requirements.


Whether you choose direct S3 access, Athena, or a custom pipeline, we’re here to guide you through the integration. Once set up, HockeyStack will pull historical and incremental data from S3, enabling advanced analytics and deeper insights into your customer journeys. If you have any questions or need assistance, reach out to our support team for tailored guidance.
