Building custom stacks on top of existing CDPs, such as Snowplow CDP with GTM Server, Metabase, and Omni Analytics on AWS, is not the only way we help clients get maximum value from both SaaS-based and open-source customer data platforms.
In this article, I’m introducing our custom-built CDP collector called Omni Analytics, which is part of the Omni CDI. It focuses on three key aspects that are top-of-mind for our clients: complete control over data, cost-effectiveness of data processing, and high-quality data provided downstream.
In the second part of the article, we’ll also demonstrate Omni Analytics in action in one of our packaged services. Future articles will cover the installation of the platform in more detail.
Omni Analytics: Omni CDI’s data collector
Omni Analytics is a Dockerized platform designed to address the key issues outlined in the article about challenges accompanying CDP implementations, such as lack of portability, limited data control, difficulty managing data infrastructure with a small team, slow analytics implementation, and often high data bills.
Omni Analytics is a flexible container built to accommodate multiple processing pipelines tailored to diverse events or event groups. It’s designed for growth-driven teams in mid-sized companies, reflecting the perspective of our typical clients.
Omni Analytics revolves around key concepts: Collectors (Sinks), Enrichers, Mappers, Consumers, Pipelines, and Loggers. Let’s dive into each of these concepts to explain the key design choices behind Omni Analytics.
Collectors capture events server-side
With the collector module in Omni Analytics, you set up individual server-side event collectors to capture data. Each collector should have a clear business goal assigned, and as a result, these collectors handle specific event sets with shared requirements for processing, validating, and enriching data.
Collectors prioritize how data flows through the pipeline over where it originates. They capture both first-party and third-party events, including webhooks, from web, mobile, offline, and other sources.
In Omni Analytics, collectors aren’t tied to specific languages or frameworks. There’s no separate sink for Node.js, Python, or other languages. Collectors are designed for specific business processes, handling events that require similar transformations as they flow through the system.
With Omni Analytics collectors, there is no need to design a schema for each event separately, whether through filters, protocols, or extensive schema validations. We’ve found that per-event schemas involve a lot of mundane work that business stakeholders rarely understand and that takes significant time to maintain, especially with extensive event models. Here, you define the entry schema at the collector level, and it applies to all events processed through that collector. Our position is that if a group of events shares a business goal and process, those events should adhere to a predefined set of data points. Non-conforming events may be rejected based on their deviation from the collector-level schema.
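To make the idea of a collector-level entry schema concrete, here is a minimal sketch in Python. It is not Omni Analytics code: the `CollectorSchema` class, the `LEADS_SCHEMA` fields, and the `collect` function are hypothetical names chosen for illustration, and the real schema format may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectorSchema:
    """Hypothetical entry schema shared by every event of one collector."""

    required: dict[str, type]  # field name -> expected Python type

    def validate(self, event: dict) -> list[str]:
        """Return a list of problems; an empty list means the event conforms."""
        problems = []
        for name, expected in self.required.items():
            if name not in event:
                problems.append(f"missing field: {name}")
            elif not isinstance(event[name], expected):
                problems.append(f"wrong type for {name}")
        return problems


# One schema for the whole collector, not one schema per event type (assumed fields).
LEADS_SCHEMA = CollectorSchema(required={"event": str, "email": str, "timestamp": int})


def collect(event: dict, schema: CollectorSchema) -> bool:
    """Accept the event only if it conforms to the collector-level schema."""
    return not schema.validate(event)
```

Every event routed to this collector is checked against the same `LEADS_SCHEMA`, so adding a new event type to the collector requires no extra schema work as long as it carries the shared data points.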
Collectors exclusively perform server-side collection.
Enrichers refine events
The collector-level entry schema reduces noise to some extent, but the real refinement begins at the raw pipeline enrichment step.
All events in a collector should undergo a similar enrichment process, because they share a business goal. This can include straightforward enrichments like adding geo-headers or more complex multi-step enrichments involving multiple third-party APIs and additional filters. For GDPR compliance, you can perform server-side consent checks to halt the processing of events that aren’t GDPR-ready.
Within a single collector, multiple enrichments can run sequentially. Each enrichment step reads and modifies the payload state inherited from the previous step, which makes it possible to build rich payloads incrementally.
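The sequential enrichment described above can be sketched as a chain of functions, each receiving the payload produced by the previous one. This is an illustrative sketch only: the enricher names, the `consent` flag, and the `x-country` header are assumptions, not part of the Omni Analytics API.

```python
from typing import Callable, Optional

# An enricher takes the current payload and returns the next one,
# or None to halt processing of the event.
Enricher = Callable[[dict], Optional[dict]]


def check_consent(payload: dict) -> Optional[dict]:
    # Hypothetical server-side consent check: stop events without consent.
    return payload if payload.get("consent") is True else None


def add_geo(payload: dict) -> dict:
    # Hypothetical geo enrichment that reads an assumed "x-country" header.
    country = payload.get("headers", {}).get("x-country", "unknown")
    return {**payload, "country": country}


def normalize_email(payload: dict) -> dict:
    # Later steps see everything earlier steps added to the payload.
    return {**payload, "email": payload["email"].strip().lower()}


def run_enrichers(payload: dict, steps: list[Enricher]) -> Optional[dict]:
    """Run the steps in order; each step works on the previous step's output."""
    current: Optional[dict] = payload
    for step in steps:
        current = step(current)
        if current is None:  # a step halted the event, e.g. missing consent
            return None
    return current
```

Placing the consent check first means that no later step, including any third-party API call, ever sees an event that is not cleared for processing.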
Mappers structure events
If the event has passed through the collector (which has already pre-validated it) and through the enrichment process, it is likely a juicy and valid event carrying a significant signal, and is almost ready to be dispatched to the data activation layer (Omni Activation, which we will cover separately).
We define the event dispatch structure in the final step, the mapper module, which maintains a 1-to-1 association with the collector (and hence the enricher). That means one structure for all events captured by the collector, not a separate structure for every pair of destination and event. The mapper module takes the event in its final form after the enrichment process and organizes everything into a clear payload depending on your activation client.
If you need to change the payload structure or add a data point downstream, you edit it once in the mapper, avoiding a two-week journey around data collection, formatting, and destination configuration with your development and data teams. Likewise, if a developer changes the tracking schema, whether by accident or on purpose, the change does not propagate downstream unless the business decision to update the payload schema is made and acted upon. This is a further safeguard for data quality. The mapping module is wrapped in a simple interface, so you can quickly adapt data to the changing requirements of the activation layer without developer assistance.
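One way to picture a single mapper per collector is a declarative mapping from output fields to paths in the enriched event. The sketch below is an assumption about how such a mapper could work, not the actual Omni Analytics interface; `DISPATCH_MAP`, `get_path`, and `map_event` are hypothetical names.

```python
# Hypothetical mapping: dispatch field name -> dotted path into the enriched event.
DISPATCH_MAP = {
    "event_name": "event",
    "user_email": "user.email",
    "country": "geo.country",
}


def get_path(data: dict, path: str, default=None):
    """Follow a dotted path through nested dicts, returning default if absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def map_event(enriched: dict, mapping: dict[str, str]) -> dict:
    """Build the dispatch payload; fields not listed in the mapping are dropped."""
    return {out: get_path(enriched, path) for out, path in mapping.items()}
```

Because only mapped fields reach the dispatch payload, an unplanned change in the tracking schema upstream does not leak downstream; adding a data point is a one-line change to the mapping.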
Pipelines authorize and control all sub-processes
Each collector is linked to enrichment and validation processes. This raises the issue of authorizing incoming data and ensuring no unauthorized processes or transformations occur.
We address this with pipelines. Pipelines manage authorization between various parts of the data processing system, such as collection, mappers, and enrichment. They ensure that data related to each collector is only transformed as permitted.
You can create as many pipelines as needed—no arbitrary limits or unique user counts. One pipeline can manage the authorization of data processing for multiple collectors.
Pipelines also enable Omni Analytics to function as a multi-environment solution. We recommend creating dedicated pipelines for each environment, even if you have only one collector for your entire business. This approach allows for dedicated processing of the same data based on its originating environment.
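As a rough illustration of pipeline-level authorization with one pipeline per environment, consider the sketch below. The `Pipeline` class, the token values, and the collector names are all hypothetical; how Omni Analytics actually stores and checks credentials is not described in this article.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical pipeline: one token guarding a set of collectors."""

    name: str
    token: str
    collectors: set[str] = field(default_factory=set)

    def authorizes(self, collector: str, presented_token: str) -> bool:
        # Constant-time comparison avoids leaking token contents via timing.
        token_ok = hmac.compare_digest(self.token.encode(), presented_token.encode())
        return token_ok and collector in self.collectors


# One pipeline per environment; a pipeline may cover several collectors.
PIPELINES = {
    "production": Pipeline("production", "prod-secret", {"leads", "deals"}),
    "staging": Pipeline("staging", "staging-secret", {"leads"}),
}
```

With this layout, a staging token cannot trigger production processing, and a collector that is not attached to a pipeline is never processed under that pipeline's rules.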
Consumers capture rich and valuable events
Your consumption phase is for rich, well-formatted, and ready-to-use events. That’s why we run it at the very end. Consumption is not the time to do major reformatting of the event. You should not encounter erroneous data in your consumption layer either. This is where your events are converted into business value, so no experiments here. Experiments, enrichment, and formatting are done at earlier stages of the processing inside your Omni Analytics instance, always in the context of a specific collector.
Consumers receive events authorized by the pipeline and mark the entrance to the activation layer. Supported consumers include Segment, Snowplow, GTM Server, and a few others. Each pipeline can write to multiple consumers, which further reduces the work required to run the setup. Unlike some other CDPs, there are no restrictions on which consumer can work with which data source.
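Writing one mapped payload to several consumers is essentially a fan-out. The following sketch shows the pattern with stand-in consumers; the function names are hypothetical, and real consumers such as Segment or GTM Server would be reached over their own integrations rather than through in-memory lists.

```python
from typing import Callable

Consumer = Callable[[dict], None]


def make_recording_consumer(name: str, sink: list) -> Consumer:
    """Stand-in consumer that records what it receives (for illustration only)."""
    def consume(payload: dict) -> None:
        sink.append((name, payload))
    return consume


def dispatch(payload: dict, consumers: dict[str, Consumer]) -> list[str]:
    """Deliver the payload to every consumer; return the names that failed."""
    failed = []
    for name, consumer in consumers.items():
        try:
            consumer(payload)
        except Exception:
            # One failing consumer should not block delivery to the others.
            failed.append(name)
    return failed
```

The payload is identical for every consumer, which matches the idea that reformatting belongs to earlier stages and not to the consumption phase.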
Configuration differs from consumer to consumer. The most mature integration is the one with GTM Server. It relies on the Omni Data Client, which captures rich events shipped to your consumption phase and makes them ready for the tagging container. You will get the Omni Data Client from us as we collaborate.
Once the consumer has processed your data, it’s time to make full use of it through downstream activation or warehousing.
Loggers log key transactions
The final piece of the puzzle is the logger, which records all transactions throughout the entire lifecycle of transformations and operations.
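A transaction log of this kind could be as simple as one structured record per stage. The sketch below uses Python's standard `logging` module; the logger name `omni.transactions` and the record fields are assumptions made for illustration, not the format Omni Analytics uses.

```python
import io
import json
import logging


def make_transaction_logger(stream) -> logging.Logger:
    """Create a logger that writes one line per record to the given stream."""
    logger = logging.getLogger("omni.transactions")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_transaction(logger: logging.Logger, stage: str, event_id: str, status: str) -> None:
    """Record one lifecycle step of an event as a JSON line."""
    record = {"stage": stage, "event_id": event_id, "status": status}
    logger.info(json.dumps(record, sort_keys=True))


# Example: trace a single event through the four stages described in this article.
buffer = io.StringIO()
log = make_transaction_logger(buffer)
for stage in ("collector", "enricher", "mapper", "consumer"):
    log_transaction(log, stage, "evt-1", "ok")
```

Keeping the event identifier in every record makes it possible to reconstruct the full path of a single event across all stages.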
Eating our own dog food — running Pipedrive Booster 2.0 on the Omni Analytics backend
Apart from the full-blown customer data infrastructures we deploy, we also use Omni CDI, specifically Omni Analytics, in our analytics and MarTech consulting packages sold at Datomni. These packages typically consist of one- or two-pipeline setups to achieve concrete business goals and utilize our backend. Let’s explore how Omni Analytics provides an analytics pipeline for our Pipedrive Booster 2.0 implementation service.
This is not a theoretical example but a real data pipeline processed by an Omni Analytics instance and delivering value to real clients, as recently described in the Vyde Pipedrive implementation case study.
The Pipedrive Booster 2.0 analytics backend works as follows.
On the backend, Pipedrive Booster 2.0 is managed by an Omni Analytics instance configured to run a single pipeline with one collector, one enricher, and one mapper, all supporting an extensive enrichment sequence. Data is fed to both the activation layer’s destinations and the warehouse.
The collector captures HTTP callbacks from Pipedrive accounts related to deal management, lead management, adding new contacts, organizations, campaigns, and other components. Approximately 25 different callbacks are captured to ensure accurate calculations of deals and leads, which are the main selling points of the package. All callbacks are authorized using a token assigned at the pipeline level, which is also used in the mapper and enricher steps.
Events are routed to a dedicated collector endpoint, optionally available in a client’s private cloud, and are pre-validated against the built-in collector schema. This schema, which comprises around 90 data points, validates all incoming events, including standard fields and custom fields. All captures are logged in the Omni Analytics instance. The capability to subscribe to custom fields created as part of the package enables us to design highly detailed reports later on.
Once the events are pre-validated and captured, they undergo enrichment, the most advanced component of this pipeline. The enrichment sequence is multi-step. By design, not every Pipedrive callback topic includes all desired information, so events typically require a follow-up call to the Pipedrive API to retrieve additional data points. Once a golden record is created by combining callback data with Pipedrive API data, the event undergoes further enrichment, starting with email and phone validation, followed by data enrichment from third-party databases related to organizations and individuals. At this point, the event payload is quite rich and ready for dispatch to the actual business processes, such as deal valuation and scoring. In Pipedrive Booster 2.0, we offer automated deal valuation based on predefined criteria and formulas. This step ends with calculating a synthetic deal score.
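The golden-record step can be pictured as a merge of the callback payload with the result of a follow-up lookup. The sketch below deliberately takes the lookup as a parameter instead of calling the Pipedrive API, so no endpoint or response shape is implied; `build_golden_record` and the precedence rule (callback values win, the lookup fills gaps) are assumptions for illustration.

```python
from typing import Callable


def build_golden_record(callback: dict, fetch_details: Callable[[int], dict]) -> dict:
    """Combine callback data with data from a follow-up lookup.

    Assumed rule: non-empty callback values take precedence, and the
    lookup result only fills fields the callback left missing or empty.
    """
    details = fetch_details(callback["id"])
    record = dict(details)
    record.update({k: v for k, v in callback.items() if v not in (None, "")})
    return record


def fake_fetch(deal_id: int) -> dict:
    # Stand-in for the follow-up API call; the fields are made up.
    return {"id": deal_id, "title": "Big deal", "value": 1000, "owner": "Ann"}


golden = build_golden_record({"id": 7, "title": "", "value": 1200}, fake_fetch)
```

Here the empty `title` from the callback is filled from the lookup, while the callback's `value` is kept because it is the more recent signal under the assumed precedence rule.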
Once the event passes through full enrichment, it is composed into a dispatch-ready payload by the mapper, now containing over 100 data points. This event is then structured and pushed further through the pipeline to the activation layer. When the event is captured by Omni Data Clients, either in AWS Lambda or GTM Server, it undergoes further processing for dispatch to end destinations and warehousing. The warehousing pipeline is invoked, routing events to our reporting layer and subsequently to the metrics layer. This allows us to provide intricate, custom dashboards that operate independently from Pipedrive, giving clients full control over their data.
This single backend manages all processes required to deliver the entire Pipedrive Booster 2.0 package. Setting up the package also involves considerable customization within the Pipedrive account, such as configuring custom fields and automations, but all of this functionality is made possible by the Omni CDI backend.
Get a test drive of Omni Analytics today
Schedule a call to test Omni Analytics. After a brief chat and confirmation that we’re a good fit and that Omni Analytics meets your needs (our client profile is quite specific), we’ll be happy to deploy our Dockerized apps and pilot Omni Analytics in a private cloud environment. You will find more technical details about Omni CDI on the dedicated documentation page, which we’re constantly updating.
The trial offer: within 7 days, activate your first event in the Meta Conversions API from your development environment (e.g., a dev version of your website), using exclusively Omni CDI’s components deployed to a private cloud. We look forward to hearing from you!
Photo attribution
As usual, the featured image of the article is a piece of abstract photography that corresponds with the article’s topic. This time, the shoutout goes to Ahmad Dirini via Unsplash.