What if you have a marketing question that you can’t easily answer using Google Analytics reports alone?
For example, you may be interested in finding the combination of cities, age groups, marketing sources, and devices that generates the highest number of transactions and e-commerce revenues.
It would be hard to do in a regular, free version of Google Analytics (Universal Analytics), because you can only add two dimensions to a Google Analytics report at a time (one primary dimension and one secondary dimension.)
Even if you create a custom report or build an advanced segment out of one of the dimensions, it will still feel as a repetitive thing to do.
The deeper the insights you want to get, the harder it will be for you to get them using Google Analytics reports only.
The way out of such situations is to export your Google Analytics data to Google BigQuery and derive insights from the raw dataset by running your own queries.
In the next few sections, we’ll help you do just that. We’ll set up a data pipeline that sends your data to Google BigQuery and then runs queries that generate insights for you.
Please note that this tutorial works with the Universal Analytics version of Google Analytics. GA4, the newest version of Google Analytics, makes sending data to BigQuery much easier. Nevertheless, we’re sure that following this tutorial will teach you many things about BigQuery, Google Analytics, and building your own data pipelines. Follow along!
From Analytics to BigQuery in 3 steps
Getting your data into BigQuery boils down to 3 things.
- Choosing a proper combination of metrics and dimensions from Google Analytics that will help you answer business question at hand.
- Launching an ETL (a data pipeline) to move data from Google Analytics to Google BigQuery.
- Exploring your data in Data Studio and then wrangling and querying it in BigQuery.
Combine proper dimensions and metrics
As you know, Google Analytics is organised into metrics and dimensions.
Dimensions describe, metrics measure. In general, all metrics just measure dimensions in a certain way, e.g. by counting, averaging, summing etc. Examples of metrics include session time, unique pageviews or number of transactions. Examples of dimensions include marketing source or medium, device category, city etc.
Analytics reports are built by combining dimensions and metrics. For example, the Acquisition report will have sources as dimensions and time on site, goal completions and conversions as metrics.
For the Analytics-BigQuery link to give sensible results, you have to choose a correct combination of metrics and dimensions to be sent through the data pipeline. The key in combining dimensions and metrics properly is to look at their scope.
Each dimension and metric has only one scope: user, session or hit (there’s also product scope – irrelevant for now). Scope tells you when and how each dimension or metric is filled with actual data.
Hit scope describes individual actions – events, transactions or pageviews. Some examples of hit-scope dimensions and metrics are e.g. search term, event category, time on site. They’re set at the moment of the hit. This makes the scope of hit equal to itself.
Metrics and dimensions with session scope are set once for the series of hits called session. Typical dimensions and metrics with session scope include e.g. source, medium, bounce rate.
User scope aggregates data from multiple sessions of a given user. Examples of dimensions with the user scope are age, gender, city. All dimensions and metrics with user scope are set at the level of unique user identifier called client ID. Identifier is stamped into the browser, not the actual person. That’s why it’s inaccurate to treat it as equivalent to the behaviour of individual, unique people.
Building reports, be it inside Google Analytics or outside – e.g. in BigQuery, only makes sense if you combine dimensions and metrics from the same scope. If you combine metrics and dimensions from different scopes, you’ll either get no data or wrong data.
Google Analytics typically will not stop you from cross-scope combinations of metrics and dimensions and it will print a regular report for you. However, the data in such report will be meaningless.
That’s why it’s key to understand the differences between scopes and choose the right combination of metrics and dimensions for your Analytics-BigQuery data pipeline and reporting.
In order to see which data can be combined for your data pipeline, use Google Analytics Dimensions & Metrics explorer. The neat thing about the explorer is that the metrics or dimensions that can’t be combined will be greyed out. Take a look below.
In our report, we’ll be trying to discover how effective are various combinations of device categories, cities, and acquisition sources in generating engagement, transactions, and sales.
That’s why our data pipeline will send data from the following Analytics fields to BigQuery (this combination passed the Dimensions & Metrics Explorer test):
- Metrics: Goal Completions, Transactions, Revenue, Total Value.
- Dimensions: Acquisition Source / Medium, Device Category, City, Network Domain.
Note that Acquisition Source / Medium has user scope. The regular Source / Medium scope has session scope. The former is the source through which users were acquired, derived from users’ first session, according to Google.
Setting up data pipeline
Let’s get that data pipeline up and running.
In our case, we’ll set up a very specific case of data pipeline called ETL. ETL is an acronym for Extract, Transform, Load.
It means an automated process that takes fresh data from the source (like Google Analytics), transforms it in a certain way, loads into the destination (e.g. database or data warehouse like BigQuery). This process is repeated on a regular basis, e.g. every 12 hours.
BigQuery is a data warehouse. Think of it as a place to store and process a lot of data relatively cheap.
Building your own ETL tool is a lot of hard work. We won’t do it. Instead, we’ll use one of the off-the-shelf ETL tools to do all the heavy lifting for us. The tool we’ll use is Stitch.
Stitch is one of the top ETL tools out there. It can connect and grab data from a number of well-known marketing sources (e.g. Salesforce, AdRoll, Amplitude etc.) and send it to a number of databases and destinations (MySQL, PostgreSQL etc.), formatting it correctly along the way.
I’ll walk you through setting up your Stitch Account.
Visit Stitch website and sign up for a free trial. Free account gets you unlimited data volume in the trial period and 5 million rows of data points for free.
Successful account set-up gets you to the second step of the onboarding process – Adding an integration.
This is where you choose your datasource. In our case it will be Google Analytics. Look how many tools you can grab data from!
Choose search bar on the left to quickly look up Google Analytics.
Click on the Google Analytics icon. You’ll be taken to the config panel for this integration. Put some name down.
Lookup period for your data with integration default of 30 days. You can change it per your liking. Replication frequency one tells Stitch how often it should sync fresh data.
Once you log in, you’ll see a list of Google Analytics views to connect. Choose the one you want to sync your data from.
Next you’re asked to choose the combination of metrics and dimensions you want to sync data from. You can choose a maximum of 10 metrics and 7 dimensions.
Here we just choose what we’ve decided earlier. It should look like the following. Stitch will double-check your combination of dimensions and metrics for you.
Click Save and you’ll see another screen asking you to configure your warehouse. As you know, we’ll be sending data to Google BigQuery. That’s your choice for this screen.
As per instruction, for BigQuery to serve as your destination, first you’ll need an existing BigQuery account with admin permissions as well as a project with billing enabled and credit card attached.
Let’s do just that now.
Create new BigQuery project.
Name your project. It can be “Ecommerce Report 2020”, like below.
Go to Billing, create a new billing account and link your newly created BigQuery project to this billing account. Put down your credit card number into Google to activate the service. And claim your $300 credit for new users!
Now you need to create a new service account for Stitch to connect and send data to your BigQuery project.
In order to do it, in the GCP (Google Cloud Platform) console, choose the IAM & Admin tab on the left and click into Service Accounts.
In the Service account list, select Create service account. Enter a name in the Service account name field. From the Role list, select BigQuery > Admin.
Click Create and you’ll see a JSON file that contains your key downloading to your computer.
Open Stitch again.
Upload your JSON file to Stitch.
Choose Google Cloud Storage Location in Frankfurt.
Finally, choose loading behavior you want. You can choose upsert or append. I recommend choosing append.
If everything is correct, you should see a message that your data pipeline has been created.
It will take a while for your data pipeline to start working. In the meantime, you will see a pending status in the dashboard.
After a while, you should start seeing some data in your report.
You can see how many rows of data were synced in this billing period and when’s the next report due.
Once you have your data in Google BigQuery, it’s now time to get some valuable insights on ecommerce transactions and sales.
Let’s go get some sweet insights
We’ll first explore our BigQuery dataset in Google Data Studio.
In order to do that, open Data Studio and launch a Blank Report. Remember to log in to data studio through the same name you used for BigQuery signup. Otherwise Data Studio won’t see your BigQuery tables.
Name your Data Studio, BigQuery Deep Dive for example.
Click Create new data source in your BigQuery report.
Choose BigQuery as your data source.
Connect to your data.
Review the schema and add data to your report.
You can now freely use the data in your Data Studio report!
With so many categorical variables, the fastest way to arrive at valuable insights would be to use pivot table with some additional coloring. That’s the chart type we’ll play with.
For that, click Add a chart, then choose Pivot table with heatmap. This adds a basic pivot table to your dashboard.
Add rows, dimensions and metrics to your pivot table. One example of doing so is below. Play with it.
Click “View” and see your report live.
In the final report, you can quickly detect top performers across our categories: acquisition source/medium, device categories and cities. Color coding makes it much easier to consume the information.
Visual report and exploratory data analysis gives you much more orientation in your data, but it has some limits!
Let’s do some additional manipulation on our data. For that, we’ll need BigQuery and a bit of SQL.
Open Google Cloud Console and BigQuery. This is the view you should see.
Let’s run our first query. We want to see what is the cumulative revenue broken down by dimension values. The following, super simple SQL script (as in BigQuery version of SQL) should do the job.
SELECT acquisitionsourcemedium, city, devicecategory, SUM(transactionrevenue) as sumreve , FROM `analyticsbigquerytransactions.report` GROUP BY acquisitionsourcemedium, city, devicecategory ORDER BY sumreve desc
This query executed in no time and printed the following results (only top three rows shown).
What if we’re interested in checking the average size of transaction and the number of transactions across our dimensions? This should work.
SELECT acquisitionsourcemedium, city, devicecategory, SUM(transactionrevenue) as sumreve , SUM(transactions) as sumtrans, AVG(transactionrevenue) as avgreve FROM `analyticsbigquerytransactions.report` GROUP BY acquisitionsourcemedium, city, devicecategory ORDER BY sumreve desc
These are just very simple examples to illustrate that having your data in BigQuery makes it easy to manipulate it and get deeper business insights.
All you have to do to create a new variable is to add a tiny bit of script, rather than clicking endlessly around your Analytics reports. That’s efficiency to me.
We can also save query for the future to speed up insights generation on a regular basis.
What’s more, you can also schedule this query using BigQuery interface. Scheduled queries use Google BigQuery Data Transfer Service. By doing so, all you need to do it just to log in once in a while, and check up the result of your query.