GA4 and User ID: Why bother?

At Datomni, we believe that in today’s world of fragmented user journeys, robust user identification is key to achieving significant ROI from marketing and analytics. GA4’s User ID is one of the ways we make this happen for our clients.

User ID is an advanced, often underused feature of Google Analytics 4. It allows you to supply GA4 with external, anonymous identifiers linked to users. You control and provide this identifier, which can be anything you choose, within certain limitations. By using User ID, you can transform GA4 from a web tracking tool into a comprehensive reporting system that captures the entire customer journey. In this article, we’ll explain the fundamentals of the GA4 User ID system and share some practical tips. 

High-quality data in your reports with identity spaces

With GA4, Google has introduced “identity spaces” to improve reporting by deduplicating data ingested into GA4. These identity spaces help unify the user journey across all reports using four key mechanisms.

  1. User ID: The most accurate identifier, especially when implemented consistently. In GA4, User ID deduplication is applied across all reports, unlike Universal Analytics. User ID is the focus of this article.
  2. User-Provided Data: This space includes consented personal data such as hashed emails, phone numbers, and addresses. It can be used with or without User ID, but the most effective strategy is to implement both.
  3. Device ID: On websites, the device ID is equivalent to the client ID, while on mobile devices, it is the app instance ID.
  4. Modeling: When users decline cookies, GA4 uses modeling to estimate their behavior based on similar users who do accept cookies.

Introducing GA4’s User ID and its benefits

User ID is a unique identifier that businesses assign to their users in GA4 accounts, extending tracking to interactions that happen outside the website or app, such as in offline systems or on external devices.

When businesses first encounter User ID with us, there’s often a moment of confusion. The question that frequently arises is, ‘Why bother implementing User ID? It doesn’t seem like something that can actually impact our marketing.’ The topic may seem technical and not immediately relevant to daily marketing and analytics operations.

The fact is, there are at least three immediate benefits of adopting User ID in your GA4 system. Let’s zoom in on them.

Higher quality data in reports due to deduplication

The key benefit of User ID is that it prevents counting the same user multiple times across devices and sessions. Without User ID, GA4 treats the same user as new when they switch devices or clear cookies, inflating user counts and skewing metrics like ‘New Users’ (and all derivative metrics), which would then reflect new (consented) devices, not actual new users. A properly implemented User ID helps deduplicate users across all GA4 reports. The word “all” is important—GA4 applies User ID across its entire reporting structure, unlike Universal Analytics, which limited it to certain reports. This deduplication is crucial for creating a more accurate picture of user behavior and interactions, especially if you track multiple channels.
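
To make the deduplication effect concrete, here is a minimal sketch. It is not GA4’s actual algorithm: it simply counts users in a toy event list twice, once by device identifier only and once preferring the User ID whenever a device has been seen with one. The event shape and the function names are assumptions made for illustration.

```javascript
// Toy events: the same person ("u-1") on two devices, plus one anonymous device.
// The field names are illustrative, not a GA4 payload format.
const events = [
  { deviceId: 'web-111', userId: 'u-1' },
  { deviceId: 'app-222', userId: 'u-1' },
  { deviceId: 'web-111', userId: null },
  { deviceId: 'web-333', userId: null },
];

// Count users by device identifier only.
function countByDevice(events) {
  return new Set(events.map((e) => e.deviceId)).size;
}

// Count users, preferring the User ID whenever a device has ever sent one.
function countWithUserId(events) {
  const deviceToUser = new Map();
  for (const e of events) {
    if (e.userId) deviceToUser.set(e.deviceId, e.userId);
  }
  const identities = new Set();
  for (const e of events) {
    identities.add(deviceToUser.get(e.deviceId) ?? `device:${e.deviceId}`);
  }
  return identities.size;
}

console.log(countByDevice(events));   // 3
console.log(countWithUserId(events)); // 2
```

In this toy data, the device-only count reports three users where there are only two people, which is the kind of inflation described above.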

Improved audience quality and lower cost per result (ceteris paribus)

User ID enhances the accuracy of audiences in GA4 and those used for remarketing in Google Ads, reducing wasted ad spend. Accurate segmentation and audience seasoning are key for Google Ads targeting, and User ID prevents users from being counted multiple times across audiences. Without this deduplication, users may be overexposed to ads, leading to wasted spend. You’ll also have less control over the timing and messaging users see, potentially undermining your entire communication strategy.

Full customer journey waiting to be explored in BigQuery

When you link GA4 with BigQuery, User ID information is exported regardless of user consent level. User IDs are populated in their own dedicated column, allowing you to build richer datasets for advanced analysis, particularly for end-to-end user journeys. Since User ID is about ingesting data from multiple sources and platforms, it helps you gain a full understanding of user behavior.
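
As a rough sketch of what this enables, the snippet below groups rows shaped like the GA4 BigQuery export by user_id and orders each user’s events by event_timestamp. The column names follow the export schema; the sample rows and the buildJourneys helper are hypothetical, and in practice you would typically do this grouping in SQL inside BigQuery rather than in application code.

```javascript
// Sample rows using GA4 BigQuery export column names; the values are made up.
const rows = [
  { user_id: 'u-1', user_pseudo_id: '111.222', event_name: 'purchase', event_timestamp: 300 },
  { user_id: 'u-1', user_pseudo_id: '999.888', event_name: 'login', event_timestamp: 200 },
  { user_id: null, user_pseudo_id: '555.666', event_name: 'page_view', event_timestamp: 100 },
];

// Hypothetical helper: one time-ordered list of event names per identified user.
function buildJourneys(rows) {
  const journeys = new Map();
  for (const row of rows) {
    if (!row.user_id) continue; // skip rows without a User ID
    if (!journeys.has(row.user_id)) journeys.set(row.user_id, []);
    journeys.get(row.user_id).push(row);
  }
  const result = {};
  for (const [userId, userRows] of journeys) {
    result[userId] = userRows
      .sort((a, b) => a.event_timestamp - b.event_timestamp)
      .map((r) => r.event_name);
  }
  return result;
}

console.log(buildJourneys(rows)); // { 'u-1': [ 'login', 'purchase' ] }
```

Note that the two rows for u-1 carry different user_pseudo_id values, so only the User ID column ties them into a single journey.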

User ID in GA4: Implementation best practices

Implement as early as possible

User ID cannot be applied retroactively. Any data collected before the implementation of User ID will not be associated with the user’s unique identifier. That’s why it’s best to roll out User ID as early as possible, ideally when you first start collecting data in your GA4 account. Starting early ensures that GA4 captures this data moving forward, enabling more accurate historical analysis.

Use permanent identifiers for User ID

Ensure that the User ID is as permanent as possible. This means assigning the same User ID across all interactions and devices to create a cohesive user profile. If you change the User ID during the user lifecycle—whether due to a new assignment logic or new systems—you will effectively create a completely new identity in GA4. GA4 doesn’t have features like Mixpanel’s merge, which can unify multiple user identifiers under one profile.
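
One way to keep the identifier stable is to derive it deterministically from a key that never changes for the account. The sketch below assumes a Node.js backend and uses an HMAC over a hypothetical internal account key; the deriveUserId name, the key format, and the secret handling are illustrative assumptions, not a GA4 requirement.

```javascript
const crypto = require('node:crypto');

// Hypothetical secret; in practice this would come from your secret manager.
const ID_SECRET = 'replace-with-a-long-random-secret';

// Derive the analytics User ID from a stable internal account key.
// The same account key always yields the same 64-character hex string.
function deriveUserId(accountKey, secret = ID_SECRET) {
  return crypto.createHmac('sha256', secret).update(String(accountKey)).digest('hex');
}

const first = deriveUserId('account-42');
const again = deriveUserId('account-42');
console.log(first === again); // true
console.log(first.length);    // 64
```

The trade-off is that the secret and the derivation logic become part of the identity: rotating the secret or changing the input key would produce new User IDs, and therefore new identities in GA4.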

Be consistent about your User ID assignments

For maximum impact, User ID should remain persistent throughout the entire user journey and be assigned consistently across all identified user events. Gaps in User ID capture can result in a fragmented user journey.

Consider an example that illustrates why consistent User ID tracking matters throughout the user journey.

Suppose a user performs actions A, B, and C while anonymous (with only a Client ID assigned), then logs in and performs actions D and E, logs out (action F), and finally performs action G. GA4 will associate events A through F with the User ID, along with the browser-side Client ID. Event G, which happens after logout, will not be linked unless it also carries the User ID.
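
To avoid gaps on your own side of the pipeline, one option is a thin tracking wrapper that stamps the current User ID on every event while the user is signed in. The sketch below is illustrative only: createTracker, the event shape, and the send callback are assumptions, not part of any GA4 library.

```javascript
// Hypothetical tracker that keeps the signed-in User ID in memory.
function createTracker(send) {
  let currentUserId = null;
  return {
    signIn(userId) { currentUserId = userId; },
    signOut() { currentUserId = null; },
    // Attach the User ID to every event while the user is signed in.
    track(name, params = {}) {
      const event = { name, params };
      if (currentUserId) event.user_id = currentUserId;
      send(event);
      return event;
    },
  };
}

const sent = [];
const tracker = createTracker((event) => sent.push(event));
tracker.track('view_item');   // anonymous
tracker.signIn('u-1');
tracker.track('add_to_cart'); // carries user_id
tracker.signOut();
tracker.track('page_view');   // anonymous again
console.log(sent.map((e) => e.user_id ?? null)); // [ null, 'u-1', null ]
```

Centralizing the assignment in one place makes it harder for an individual event to be sent without the identifier while the user is identified.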

Complete coverage of user ID for identified users

The User ID should be applied uniformly to all users who reach a specific stage in your user journey, such as logging in or signing up. Inconsistent or incomplete User ID assignment can lead to inaccurate reports and reduce overall data quality.

Avoid null or non-unique values for User ID

Assigning null or placeholder values (e.g., ‘NA’ or ‘not set’) to User ID fields can lead to erroneous data aggregation, mixing user journeys, and ultimately inaccurate reporting — even more so than if you don’t use the User ID feature at all.

Avoid using Personally Identifiable Information (PII)

Each User ID must be unique, 256 characters or fewer, and composed of UTF-8 characters. It is also crucial that the User ID does not contain personally identifiable information (PII), in accordance with Google’s policy.
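
The rules above can be turned into a simple pre-flight check before an identifier is sent. The following is a minimal sketch: the validateUserId helper, the placeholder list, and the email heuristic are assumptions for illustration, and the check cannot verify uniqueness, which has to be guaranteed by whatever system issues the IDs.

```javascript
// Placeholder values that should never be sent as a User ID (illustrative list).
const PLACEHOLDERS = new Set(['', 'na', 'n/a', 'null', 'undefined', 'not set', '(not set)', 'none']);

// Returns a list of problems; an empty list means the candidate passed these checks.
function validateUserId(candidate) {
  if (typeof candidate !== 'string') {
    return ['must be a string'];
  }
  const problems = [];
  const value = candidate.trim();
  if (PLACEHOLDERS.has(value.toLowerCase())) problems.push('placeholder or empty value');
  if ([...value].length > 256) problems.push('longer than 256 characters');
  // Crude heuristic for an email address, which would count as PII.
  if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)) problems.push('looks like an email address');
  return problems;
}

console.log(validateUserId('u-8f3a2c'));         // []
console.log(validateUserId('not set'));          // [ 'placeholder or empty value' ]
console.log(validateUserId('jane@example.com')); // [ 'looks like an email address' ]
```

When a candidate fails, the safer choice is to send the event without a User ID rather than with a bad one.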

Avoid saving user ID in a custom dimension

Implementing User ID in a custom dimension can lead to exceeding GA4’s cardinality limits, which may cause user data to be condensed under the “other” category, reducing the quality of your reports.

Update privacy policy

Ensure that your privacy policy explicitly mentions your use of User ID tracking and provides users with proper notice.

Sourcing user ID: 3 main approaches

If you decide to implement the User ID feature, you will naturally start wondering where to source it from.

Your internal app database

One of the most common approaches to sourcing a User ID is pulling it from the database. Many companies use this method to pass the database user ID at signup, login or other identified events, such as purchase. We don’t think this is an optimal solution.

First, this approach may work if you only want to enrich a couple of events generated by your application, website or other third-party system with your internal user ID. But what if your CDP scales to include other data sources that need to be enriched with the unified user ID prior to ingestion into downstream destinations, including GA4? This will multiply the number of enrichment transactions you’ll need to perform using your database. Third-party tools often use webhooks to generate events. Enriching these raw, unverified data streams with your database ID puts sensitive backend data at risk and can jeopardize security.

Second, relying on a central database for real-time user ID resolution can introduce significant performance bottlenecks. Querying large databases for identity matching during peak traffic periods—such as flash sales or campaign launches—can slow down the entire system and affect the performance of other applications relying on the same database.

Lastly, using your database as a dependency for user ID sourcing is quite limiting. While User ID is useful on its own, it also opens up many other opportunities to enrich data flows in real-time with important user-related data points, which can help you activate data in more powerful ways in your downstream destinations. For example, imagine being able to calculate the lifetime value or lifetime value score for a user in real-time as your event is being processed, allowing you to fine-tune bidding in your downstream advertising destinations. If you always rely on your database for that, you may quickly hit some limits.

Third-party identity tools

For companies facing challenges in scaling their internal identity resolution processes, third-party identity resolution tools can help. These tools specialize in merging, cleaning, and enriching user identities across multiple platforms, allowing businesses to avoid the complexities of managing this internally.

The problem with these tools is that they’re typically offered in a SaaS model, handling your customer/user data on their own servers. This means that all your customer data resides and is processed in someone else’s cloud.

Additionally, these tools often sync identities across various platforms using their own identity graph, without exposing raw identifiers or identity envelopes. This leaves you unable to fully control the identity enrichment process or the final payload.

Vendor lock-in is a risk: Once you’re tied to a standalone platform that captures your identities, switching vendors becomes costly and time-consuming. This creates long-term dependence on a single provider, which may not meet your evolving business or tech needs.

Custom-built identity tools

From what we’ve found, companies see the most success with custom-built identity solutions that operate as dedicated microservices in their own cloud. These tools are designed with backends tailored to retrieve the necessary payloads from the identity app, ensuring full data control. Since they function as external apps, they don’t burden the main app database, preventing performance issues.

In this solution, identity resolution and the enrichment of payloads with unified User IDs are moved outside the main database into a dedicated microservice that enriches data flows to GA4 in real time. This setup optimizes performance and keeps the core database unburdened. The microservice approach also enables real-time processing, making it highly scalable and adaptable as the business grows. This is our recommended strategy to avoid database dependency and future-proof identity management.

An identity enrichment microservice provides full control over how user IDs are resolved and enriched, enabling custom algorithms, logic, and optimizations tailored to your specific needs. This flexibility makes it easier to manage evolving data sources and business requirements.
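
To give a feel for the core of such a service, here is a deliberately simplified sketch. The IdentityResolver class, the identifier types (client_id, crm_id), and the in-memory store are all hypothetical; a production service would persist its mappings in its own datastore and would need a policy for conflicting identifiers, which this sketch does not handle.

```javascript
// Hypothetical in-memory identity store; a real service would use its own datastore.
class IdentityResolver {
  constructor() {
    this.byIdentifier = new Map(); // e.g. "client_id:111.222" -> unified user ID
    this.nextId = 1;
  }

  // Return the unified ID linked to any known identifier, creating one if none match.
  resolve(identifiers) {
    const keys = Object.entries(identifiers)
      .filter(([, value]) => value)
      .map(([type, value]) => `${type}:${value}`);
    const knownKey = keys.find((key) => this.byIdentifier.has(key));
    const unifiedId = knownKey ? this.byIdentifier.get(knownKey) : `uid-${this.nextId++}`;
    for (const key of keys) this.byIdentifier.set(key, unifiedId);
    return unifiedId;
  }

  // Enrich an event payload with the unified User ID before it is forwarded.
  enrich(payload, identifiers) {
    return { ...payload, user_id: this.resolve(identifiers) };
  }
}

const resolver = new IdentityResolver();
const webEvent = resolver.enrich({ name: 'login' }, { client_id: '111.222', crm_id: 'c-9' });
const posEvent = resolver.enrich({ name: 'purchase' }, { crm_id: 'c-9' });
console.log(webEvent.user_id === posEvent.user_id); // true
```

Here a web login and a later point-of-sale purchase end up with the same unified User ID because they share one identifier, without the main application database being queried at all.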

Implementing user ID

Let’s look at some ways that the User ID can be implemented before we wrap up this article.

Client-side implementation with gtag.js

To implement the User ID feature with the standard gtag setup, update the config command in the measurement code across your website pages. Ensure you include your Tag ID (formatted as G-XXXXXXXX) and incorporate the generated User ID following the rules in this guide. That’s all there is to it.

gtag('config', 'TAG_ID', {
  'user_id': 'USER_ID'
});
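
In practice, the User ID is only known for signed-in visitors, so the config call usually needs a small guard. The sketch below shows one possible shape: buildConfigParams and configureAnalytics are hypothetical helper names, and gtag is passed in as an argument so the logic can be exercised outside a browser.

```javascript
// Build the config parameters, adding user_id only when a real ID is available.
function buildConfigParams(userId) {
  const params = {};
  if (typeof userId === 'string' && userId.trim() !== '') {
    params.user_id = userId;
  }
  return params;
}

// `gtag` is passed in so the helper can be exercised without a browser.
function configureAnalytics(gtag, tagId, userId) {
  gtag('config', tagId, buildConfigParams(userId));
}

// Usage with a stand-in for the real gtag function:
const calls = [];
const fakeGtag = (...args) => calls.push(args);
configureAnalytics(fakeGtag, 'TAG_ID', 'USER_ID'); // signed-in visitor
configureAnalytics(fakeGtag, 'TAG_ID', null);      // anonymous visitor
console.log(calls[0][2]); // { user_id: 'USER_ID' }
console.log(calls[1][2]); // {}
```

On a real page you would pass the global gtag function instead of the stand-in, which keeps empty or placeholder values out of the user_id field.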

Server-side implementation with Measurement Protocol

Measurement Protocol is Google’s server-side event collector that can capture events enriched with User ID. It bypasses standard automatic GA4 tracking by exposing an HTTP endpoint that can ingest events directly, regardless of the source, making it possible to include offline event streams, PoS data, and more—ideal for collecting data from third-party sources.

We’ll dive deeper into the Measurement Protocol in a separate blog post, but for now, note that the JSON POST body must include the client_id string, while user_id is optional. Whether to include a User ID is up to you. Keep in mind that before ingesting your first Measurement Protocol event, you need the browser-side client_id of the associated user in order to stitch together the full customer journey.

Here’s an example of how to format a call to the Measurement Protocol API after you’ve enriched the payload with your User ID.

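// Placeholders: replace with your own measurement ID and API secret.
const measurement_id = 'G-XXXXXXXXXX';
const api_secret = '<secret_value>';

fetch(`https://www.google-analytics.com/mp/collect?measurement_id=${measurement_id}&api_secret=${api_secret}`, {
  method: 'POST',
  body: JSON.stringify({
    // Required: the browser-side client ID of the associated user.
    client_id: 'XXXXXXXXXX.YYYYYYYYYY',
    // Optional: the unified User ID resolved for this user.
    user_id: 'XXX',
    events: [{
      name: 'purchase'
    }]
  })
});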
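
If you send many such calls, it can help to build the request in one place and enforce the client_id requirement there. The following is a sketch under stated assumptions: buildMpRequest and its option names are hypothetical, while the /mp/collect path and the /debug/mp/collect validation path are the Measurement Protocol endpoints.

```javascript
const MP_HOST = 'https://www.google-analytics.com';

// Build the request URL and JSON body for a Measurement Protocol call.
// Set `debug` to true to target the validation endpoint instead of the collection endpoint.
function buildMpRequest({ measurementId, apiSecret, clientId, userId, events, debug = false }) {
  if (!clientId) throw new Error('client_id is required');
  if (!Array.isArray(events) || events.length === 0) {
    throw new Error('at least one event is required');
  }

  const query = new URLSearchParams({ measurement_id: measurementId, api_secret: apiSecret });
  const path = debug ? '/debug/mp/collect' : '/mp/collect';
  const payload = { client_id: clientId, events };
  if (userId) payload.user_id = userId; // optional, only included when known

  return { url: `${MP_HOST}${path}?${query}`, body: JSON.stringify(payload) };
}

const request = buildMpRequest({
  measurementId: 'G-XXXXXXXXXX',
  apiSecret: '<secret_value>',
  clientId: 'XXXXXXXXXX.YYYYYYYYYY',
  userId: 'XXX',
  events: [{ name: 'purchase', params: {} }],
});
console.log(request.url);
// To send it: fetch(request.url, { method: 'POST', body: request.body });
```

Pointing the helper at the validation endpoint first is a cheap way to check the payload shape before any events reach your reports.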

User ID makes your GA4 a cross-platform analytics tool

Implementing User ID is crucial for unlocking accurate, scalable analytics. It allows you to track unified customer journeys across platforms, improving data quality and marketing precision. For optimal performance, consider using a custom-built identity solution outside your main database, ensuring real-time processing and future scalability. Always use the same unified User ID identifier for the same user, whether you’re ingesting enriched payloads via gtag or Measurement Protocol.

Photo attribution

As usual, the featured image of the article is a photograph that corresponds with the article’s topic. This time, the shoutout goes to Pawel Czerwinski via Unsplash.