Snowplow CDP with GTM Server, Metabase and Omni Analytics on AWS

Exploring the 2nd generation of our Snowplow service, now closer to CDP nirvana

After trying several SaaS CDPs or a patchwork of tools to run your analytics, you might still be frustrated by limited control over your data and pipelines, vendor lock-in, and high data bills. As your user and event volumes keep growing, you might feel you're leaving ROI on the table with no clear path forward. We've seen this happen many times, and we're committed to helping you move to a better customer data infrastructure (CDI).

Introducing Snowplow-based customer data infrastructure built on AWS

In this service, we deploy a real-time Snowplow pipeline on AWS with Terraform modules, including our own open-source modules with over 1,000 downloads. We then configure the pipeline and integrate it with downstream layers for activation and reporting. For more complex third-party data sources, we also integrate a private Omni Analytics instance from the Omni CDI suite, which ensures that third-party streams reach the Snowplow collectors as high-quality data. Everything runs entirely within your private cloud.

Collect, validate and warehouse at scale

Capture 1st-party and 3rd-party events, validate against Iglu schemas, and warehouse data in real-time.

Activate and monitor with GTM Server

Tag server-side to lighten the browser load, and activate downstream tools with our own custom tags built for the Snowplow Data Client.

Report and get alerts with Metabase

Get real-time metrics and insights through dashboards built on top of your private Metabase instance, and receive alerts when things go off track.

Full control over data. No third-party SaaS involved

Snowplow-based customer data infrastructure built on AWS eliminates reliance on external SaaS analytics tools and SaaS CDPs, enabling real-time data activation and reporting with complete control over your data. We believe this service delivers a sensible ROI for high-traffic businesses that have multiple data sources, handle over 10 million events per month, and have reached a significant level of data maturity. If you're looking for something that "just works" without significant infrastructure overhead, contact us to discuss more flexible options: Segment, or Omni CDI if you still want full data control and customization.


Client-side and server-side 1st-party events

The core of AWS Snowplow CDI is the real-time Snowplow pipeline. This robust infrastructure processes massive event volumes but requires stable event and customer schemas. Once your data model is fixed, we'll populate Iglu schema repositories with self-describing, versioned schemas. To maximize event value, we'll configure enrichments like IP lookups, campaign attribution, and custom enrichments using the Custom Enrichment API.
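To make "self-describing, versioned schemas" concrete, here is a minimal Python sketch. It assumes a hypothetical com.example/checkout schema and a toy in-memory registry standing in for an Iglu repository, and it checks only a small subset of JSON Schema (required fields, flat types, additionalProperties). It is not how Snowplow Enrich validates events, which use the full schema resolved from Iglu; it only shows the shape of a schema, the iglu:vendor/name/format/version URI, and why an event that drifts from its schema gets rejected.

```python
import re

# Iglu URIs have the form iglu:vendor/name/format/MODEL-REVISION-ADDITION (SchemaVer).
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<format>[A-Za-z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)

# Hypothetical self-describing schema; "com.example" and "checkout" are placeholders.
CHECKOUT_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {"vendor": "com.example", "name": "checkout", "format": "jsonschema", "version": "1-0-0"},
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "value": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["order_id", "value"],
    "additionalProperties": False,
}

# Toy registry keyed by (vendor, name, version), standing in for an Iglu repository.
REGISTRY = {
    (s["self"]["vendor"], s["self"]["name"], s["self"]["version"]): s
    for s in [CHECKOUT_SCHEMA]
}

JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def parse_iglu_uri(uri):
    """Split an Iglu URI into vendor, name, format and SchemaVer version."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not an Iglu URI: {uri}")
    return match.groupdict()


def validate(event, registry=REGISTRY):
    """Check a self-describing event against its schema and return a list of errors.

    Only 'required', flat property types and 'additionalProperties' are checked,
    a small subset of JSON Schema used here for illustration.
    """
    ref = parse_iglu_uri(event["schema"])
    schema = registry.get((ref["vendor"], ref["name"], ref["version"]))
    if schema is None:
        return [f"unknown schema: {event['schema']}"]
    data = event["data"]
    errors = [f"missing required field: {f}" for f in schema.get("required", []) if f not in data]
    properties = schema.get("properties", {})
    for field, value in data.items():
        if field not in properties:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected field: {field}")
            continue
        expected = properties[field]["type"]
        # bool is a subclass of int in Python, so only accept it for "boolean".
        wrong_bool = isinstance(value, bool) and expected != "boolean"
        if wrong_bool or not isinstance(value, JSON_TYPES[expected]):
            errors.append(f"wrong type for {field}: expected {expected}")
    return errors


good = {"schema": "iglu:com.example/checkout/jsonschema/1-0-0",
        "data": {"order_id": "A-1", "value": 19.99, "currency": "EUR"}}
bad = {"schema": "iglu:com.example/checkout/jsonschema/1-0-0",
       "data": {"value": "19.99", "coupon": "SPRING"}}
print(validate(good))  # []
print(validate(bad))
```

The bad event fails for three reasons at once: a missing order_id, a string where a number is expected, and an undeclared coupon field. Settling the data model before populating Iglu is what keeps such events out of the failed-event stream.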

Server-side activation with monitoring

As a high-traffic organization, you can't afford browsers bloated by tracking scripts and event capture. We understand. That's why in Snowplow CDI, events are sent to a server-side tagging container for each environment and then routed downstream to the various tools. There's no reliance on an external SaaS for data processing, so you aren't bound by its limits on tags, bandwidth usage, or execution time. The tags are built with monitoring in mind and comply with the Snowplow Data Client protocol, making full use of custom contexts in event collection and processing.
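As a sketch of what custom contexts look like on the wire, the Python below builds a tracker-protocol payload for a self-describing event with one attached context, then indexes the contexts by entity, as a server-side tag might before mapping fields to a destination. The envelope schema URIs and the e, p, aid, ue_pr and co parameters follow the Snowplow tracker protocol as we understand it; the com.example schemas, the app ID and both helper functions are hypothetical. In a real server-side container, the client that claims the Snowplow request is responsible for this parsing.

```python
import json

# Envelope schemas from the Snowplow tracker protocol, as we understand it.
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"
UNSTRUCT_SCHEMA = "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"


def build_payload(app_id, event, contexts):
    """Collection side (hypothetical helper): a self-describing event ("e": "ue")
    sent from a server-side platform ("p": "srv") with custom contexts in "co"."""
    return {
        "e": "ue",
        "p": "srv",
        "aid": app_id,
        "ue_pr": json.dumps({"schema": UNSTRUCT_SCHEMA, "data": event}),
        "co": json.dumps({"schema": CONTEXTS_SCHEMA, "data": contexts}),
    }


def contexts_by_entity(payload):
    """Processing side (hypothetical helper): index attached contexts by
    vendor/name, ignoring the schema version."""
    envelope = json.loads(payload.get("co", '{"data": []}'))
    index = {}
    for ctx in envelope["data"]:
        vendor, name = ctx["schema"][len("iglu:"):].split("/")[:2]
        index.setdefault(f"{vendor}/{name}", []).append(ctx["data"])
    return index


payload = build_payload(
    "shop-prod",
    {"schema": "iglu:com.example/checkout/jsonschema/1-0-0",
     "data": {"order_id": "A-1", "value": 19.99}},
    [{"schema": "iglu:com.example/customer/jsonschema/1-0-0", "data": {"tier": "gold"}}],
)
contexts = contexts_by_entity(payload)
print(contexts["com.example/customer"])  # [{'tier': 'gold'}]
```

Keying contexts by vendor/name rather than by full URI lets a tag keep working across additive schema revisions, at the cost of having to tolerate fields it doesn't know about.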

Real-time metrics and private dashboards

Snowplow CDI displays real-time metrics on private Metabase dashboards, refreshed every 1, 5, 10, 30, or 60 minutes. There are no hefty SaaS BI fees, and you keep full control over the data, sourced from RDS or Redshift loaded with your enriched events. Alerts notify you when metrics deviate, so you can track goals and team progress. With no reliance on external clouds or additional subscriptions, you have complete control over your data operations.
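One way to think about the deviation alerts is as two steps: count events per refresh interval, then compare the latest bucket against recent history. The Python sketch below does that with a simple z-score rule; the function names and the 3.0 threshold are assumptions for illustration. Metabase itself configures alerts on saved questions (for example, when a time series crosses a goal line), so logic like this could live in the question's SQL or in a scheduled job rather than in Metabase.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

REFRESH_INTERVALS = (1, 5, 10, 30, 60)  # minutes, matching the refresh options above


def bucket_counts(timestamps, interval_minutes):
    """Count events per interval, oldest first, including empty buckets."""
    if interval_minutes not in REFRESH_INTERVALS:
        raise ValueError(f"unsupported interval: {interval_minutes}")
    counts = {}
    for ts in timestamps:
        start = ts.replace(second=0, microsecond=0)
        # Every allowed interval divides 60, so flooring within the hour is enough.
        start -= timedelta(minutes=start.minute % interval_minutes)
        counts[start] = counts.get(start, 0) + 1
    if not counts:
        return []
    step = timedelta(minutes=interval_minutes)
    series, t, end = [], min(counts), max(counts)
    while t <= end:
        series.append(counts.get(t, 0))  # a missing bucket means zero events
        t += step
    return series


def is_anomalous(series, z_threshold=3.0):
    """Flag the latest bucket if it lies more than z_threshold std devs from the history."""
    if len(series) < 3:
        return False  # not enough history to judge
    *history, latest = series
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


base = datetime(2024, 5, 1, 12, 0)
events = [base + timedelta(minutes=m) for m in (1, 3, 7, 16)]
print(bucket_counts(events, 5))  # [2, 1, 0, 1]
print(is_anomalous([100, 102, 98, 101, 99, 0]))  # True
```

Counting empty buckets as zero is the important design choice: a stalled pipeline shows up as a drop to 0, which the rule flags, whereas skipping missing buckets would hide it.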

Third-party event structuring and enrichment

Our Snowplow CDI integrates with your private Omni Analytics instance to structure and enrich events from dynamic third-party sources, such as webhook streams carrying payment statuses or offline event pipelines. This setup, part of Omni CDI, runs as a Dockerized app in your AWS environment alongside Snowplow. Feeding such data directly into Snowplow is impractical, because these sources change their payloads while Snowplow enforces strict schemas. In our view, such sources often need complex enrichment and validation before they're ready to be ingested by Snowplow collectors. Omni Analytics fills this crucial role.
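The Python sketch below illustrates the kind of structuring step described above. Two hypothetical variants of a payment-status webhook, whose payload shape changed between versions, are mapped onto one fixed self-describing event, and anything that cannot be mapped is set aside instead of being sent to the collector. The payload fields, the status mapping and the com.example/payment_status schema are all assumptions; this is not the Omni Analytics implementation.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical target schema; vendor, name and fields are placeholders.
PAYMENT_SCHEMA = "iglu:com.example/payment_status/jsonschema/1-0-0"
STATUS_MAP = {
    "succeeded": "paid", "paid": "paid",
    "failed": "failed", "declined": "failed",
    "pending": "pending",
}


def normalize_payment_webhook(raw):
    """Map two hypothetical webhook variants onto one fixed shape.

    Returns a self-describing event, or None so the caller can quarantine the
    payload instead of sending something schema validation would reject.
    """
    try:
        if "amount_cents" in raw:  # older variant: flat fields, integer cents
            payment_id, status, currency = raw.get("id"), raw.get("status"), raw.get("currency")
            amount = Decimal(raw["amount_cents"]) / 100
        elif isinstance(raw.get("amount"), dict):  # newer variant: nested decimal string
            payment_id, status = raw.get("payment_id"), raw.get("state")
            currency = raw["amount"].get("currency")
            amount = Decimal(str(raw["amount"]["value"]))
        else:
            return None
    except (KeyError, TypeError, InvalidOperation):
        return None
    status = STATUS_MAP.get(str(status).lower())
    if not payment_id or status is None or not currency:
        return None
    return {
        "schema": PAYMENT_SCHEMA,
        "data": {
            "payment_id": str(payment_id),
            "status": status,
            "amount": float(amount),
            "currency": str(currency).upper(),
        },
    }


old_style = {"id": "p1", "status": "succeeded", "amount_cents": 1999, "currency": "eur"}
new_style = {"payment_id": "p1", "state": "PAID", "amount": {"value": "19.99", "currency": "EUR"}}
print(normalize_payment_webhook(old_style) == normalize_payment_webhook(new_style))  # True
```

Returning None rather than a partially filled event keeps unmappable payloads out of Snowplow entirely, so they can be inspected and the mapping extended without polluting the warehouse.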

Your Snowplow CDI has already been partially built

Your infrastructure will use multiple pre-built components that we've open-sourced or made available to our clients. Here are the main ones we use in Snowplow CDI.

Omni Warehousing Terraform Elasticsearch Cluster

Module creates a simple, single-node Elasticsearch cluster on AWS. Over 1,000 community downloads.

Omni Warehousing Terraform Snowplow Databricks Pipeline

Module builds the Collector application, the Enrich application, and the Databricks Loader.

Omni Warehousing Terraform Snowplow Elasticsearch Pipeline

Module builds the Collector application, the Enrich application, and the Elasticsearch Loader.

Omni Activation Meta Conversion API Tag for Snowplow Data Client

This tag sends rich Snowplow events to the Meta (Facebook) Conversions API. It is currently closed-source and provided exclusively to clients, with an open-source version in development.
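Since the tag itself is closed-source, the Python below is only a hypothetical sketch of the mapping such a tag performs, not its code. It turns an enriched Snowplow event (using atomic fields such as event_id, derived_tstamp, user_ipaddress, useragent and page_url) plus a hypothetical checkout entity into one Conversions API event. The Meta-side field names (event_name, event_time, event_id, action_source, user_data, custom_data) and the trim, lowercase and SHA-256 rule for email follow Meta's public documentation as we understand it; the HTTP call to the Graph API is omitted.

```python
import hashlib
from datetime import datetime


def sha256_normalized(value):
    """Trim and lowercase an identifier such as an email, then SHA-256 hex-hash it."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def to_capi_event(sp_event, checkout, email=None):
    """Map an enriched Snowplow event plus a hypothetical checkout entity to one CAPI event."""
    # Assumes derived_tstamp is an ISO string with a UTC offset, so .timestamp()
    # does not fall back to the machine's local time zone.
    event_time = datetime.fromisoformat(sp_event["derived_tstamp"])
    user_data = {
        "client_ip_address": sp_event["user_ipaddress"],
        "client_user_agent": sp_event["useragent"],
    }
    if email:
        user_data["em"] = [sha256_normalized(email)]
    return {
        "event_name": "Purchase",
        "event_time": int(event_time.timestamp()),
        # If the browser Pixel reports the same ID, Meta can deduplicate the pair.
        "event_id": sp_event["event_id"],
        "action_source": "website",
        "event_source_url": sp_event["page_url"],
        "user_data": user_data,
        "custom_data": {"value": checkout["value"], "currency": checkout["currency"]},
    }


sp_event = {
    "event_id": "c6ef3124-b53a-4b13-a233-0088f79dcbcb",
    "derived_tstamp": "2024-05-01T12:00:00+00:00",
    "user_ipaddress": "203.0.113.7",
    "useragent": "Mozilla/5.0",
    "page_url": "https://shop.example.com/checkout",
}
capi_body = {"data": [to_capi_event(sp_event, {"value": 19.99, "currency": "EUR"}, " Jane@Example.com ")]}
```

Hashing happens before the event leaves your AWS environment, so the raw email never reaches Meta; only the normalized SHA-256 digest does.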

PoC that works, and we proceed from there

Snowplow CDI is one of the services we build using a Proof of Concept (PoC) model. We build a quick PoC in 3 weeks and then refine it according to your priorities in 2-week sprints. You can book as many sprints as you need to perfect your infrastructure, or none if you have a strong team that can handle refinement independently. Let's schedule a call to get started.
