Raw Google Analytics 4 (GA4) Data in BigQuery (BQ)

Investigate the first visits from users. By Sanu Maharjan

Bildschirmfoto 2022-06-01 um 22.09.17.png

After Google announced shutting down Universal Analytics (UA) a lot of users are bound to shift from UA to GA4. With this new change, it is essential to look at the data structure and how to extract important metrics from raw GA4. In this blog, I would like to introduce how to count event sessions, in a given calendar week, date, and hour in BigQuery.


We have already published an intro blog on how to query the data with static, dynamic, and hybrid dates. Please have a look at the blog from Vaishnavee to refresh your memory. With that in mind, let’s first look at the “Preview” of the data, before starting the query itself. The data is from a website that has been integrated with GA4 and the raw data is already imported to BQ.

Bildschirmfoto 2022-06-01 um 22.09.25.png

For this blog, let's focus on the following columns: event_date and event_name. Each row is created when a user interacts with your website or app. For example, first_visit is triggered when a new user visits the website or launches the app for the first time. Let's then look at the first visits to our website on different days and times. We are aiming for a table that looks like this:

Bildschirmfoto 2022-06-01 um 22.09.32.png

Assuming that, it is required to fetch data for this current running month-to-date (May 2022). The following query is valid:

 

SELECT *

FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`

WHERE TRUE

AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB( CURRENT_DATE(), INTERVAL 1 DAY))

 

Since we want to look at first_visits only, we will filter out data that has event_name = ‘first_visit’ and get the date from the event_date column. With the date column, it is possible to get the calendar week number. Here I have provided you with a special calendar week format that we use in datadice. If you have other approaches to format date to the responding week, let us know. Then we group it by calendar week and date and count the event_name. The following updated query will provide the result we want:

 

SELECT

CONCAT(FORMAT_DATE('%Y CW', PARSE_DATE('%Y%m%d', event_date)),

LPAD(CAST(CAST(FORMAT_DATE('%U', PARSE_DATE('%Y%m%d', event_date)) AS INT64)+1 AS STRING),2,'0')) AS calender_week,

PARSE_DATE('%Y%m%d', event_date) AS date,

COUNT(event_name) AS first_visits,

FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`

 

WHERE TRUE

AND event_name = 'first_visit'

AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB( CURRENT_DATE(), INTERVAL 1 DAY))

 

GROUP BY calendar_week, date

 

ORDER BY calendar_week, date

 

Now the remaining one is only the hour column, the time is stored in event_timestamp but in micros. In BQ, it is possible to change from micros to an hour-minute-second format by using TIMESTAMP_MICROS(event_timestamp).

 

Now using a CASE-WHEN statement, we will divide the hours into 4 groups: 00-06, 07-12, 13-18, and 19-23.

 

SELECT

CONCAT(FORMAT_DATE('%Y CW', PARSE_DATE('%Y%m%d', event_date)),

LPAD(CAST(CAST(FORMAT_DATE('%U', PARSE_DATE('%Y%m%d', event_date)) AS INT64)+1 AS STRING),2,'0')) AS calender_week,

PARSE_DATE('%Y%m%d', event_date) AS date,

CASE

   WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 0 AND 6                            

         THEN '00-06'

   WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 7 AND 12 

  THEN '07-12'

  WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 13 AND 18 

THEN '13-18'

  ELSE '19-23'

END AS hour,

COUNT(event_name) AS first_visits,

 

FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`

 

WHERE TRUE

AND event_name = 'first_visit'

AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB( 

CURRENT_DATE(), INTERVAL 1 DAY))

 

GROUP BY calendar_week, date

ORDER BY calendar_week, date

 

With this query, you are able to count the number of first visits sub-grouped by calendar week, date, and hours.

Conclusion

This post is a part of the “Raw GA4 data in BigQuery” series to get some quick insights into the user behavior on your website or app. I will end this session here for now, but in the upcoming posts, I will try to provide more queries to analyze data from GA4 and build up more complex queries in the upcoming future.

Further Links

This post is part of the series to get insights from your GA4 raw data in BQ.

 

Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio, and marketing analytics.

 

We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.

 

If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check our Udemy course here.

 

If you are looking for help to set up a modern and cost-efficient data warehouse or analytical dashboards, send us an email at hello@datadice.io and we will schedule a call.