Prado, Matthew - Lecture 2 - Intermediate/Advanced BigQuery for GA4

Lecture 2 — Intermediate/Advanced BigQuery for GA4

Lecture 2 — Intermediate/Advanced BigQuery for GA4

1) CTEs (WITH) to structure logic

What it does: creates named “mini tables” inside a query.

Total users + new users 

WITH UserInfo AS (

SELECT

user_pseudo_id,

MAX(IF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE _TABLE_SUFFIX BETWEEN '20201101' AND '20201130'

GROUP BY user_pseudo_id

)

SELECT

COUNT(*) AS total_users,

SUM(is_new_user) AS new_users

FROM UserInfo;

2) Arrays + UNNEST (the GA4 superpower)

GA4 stores lots of fields inside arrays:

  • event_params (key/value pairs)

  • items (products in commerce events)

Two common patterns:

Pattern A — Scalar subquery extraction (simple, safe)

Pull one parameter from event_params without exploding rows.

SELECT

TIMESTAMP_MICROS(event_timestamp) AS event_time,

(

SELECT value.string_value

FROM UNNEST(event_params)

WHERE key = 'page_location' LIMIT 1

) AS page_location

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE event_name = 'page_view'

AND _TABLE_SUFFIX BETWEEN '20201201' AND '20201202'

LIMIT 50;

Pattern B — Flattening (UNNEST in FROM) for item-level analysis

This multiplies rows (one event can have many items).

SELECT

event_date,

item.item_name,

COUNT(*) AS item_rows

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` e,

UNNEST(e.items) AS item

WHERE e.event_name = 'purchase'

AND _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'

GROUP BY event_date, item.item_name

ORDER BY item_rows DESC

LIMIT 20;

Note: UNNEST changes the “grain.” After unnesting items, you’re no longer at “event-level”; you’re at “item-row per event.”

3) STRING_AGG, ARRAY_AGG (useful aggregations)

What they do: combine many values into one row per group.

Example: show a few event_names seen per day

SELECT

event_date,

STRING_AGG(DISTINCT event_name, ', ' ORDER BY event_name) AS events_seen

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201203'

GROUP BY event_date

RDER BY event_date;

Example: Build a “session cart summary” (top items a user added to cart)

Use Case 1:

– Task: Create a list of items per session

SELECT     user_pseudo_id,     ARRAY_AGG(item_name) AS items_added_to_cart FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`, UNNEST(items) AS item WHERE event_name = 'add_to_cart' AND _TABLE_SUFFIX BETWEEN '20201201' AND '20201231' GROUP BY user_pseudo_id ORDER BY user_pseudo_id LIMIT 10;

Use Case 2:

– Session-level “cart summary” using ARRAY_AGG

WITH add_to_cart AS (   SELECT     user_pseudo_id,     (       SELECT value.int_value        FROM UNNEST(event_params)        WHERE key = 'ga_session_id'     ) AS session_id,     TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,     i.item_id,     i.item_name,     i.quantity,     i.price   FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`,   UNNEST(items) AS i   WHERE event_name = 'add_to_cart'     AND _TABLE_SUFFIX BETWEEN '20210101' AND '20211231'  -- cost control ) SELECT   user_pseudo_id,   session_id,   COUNT(*) AS total_add_to_cart_events,              -- added insight   ARRAY_AGG(     STRUCT(item_id, item_name, quantity, price, event_timestamp)     ORDER BY quantity DESC, event_timestamp ASC     LIMIT 10   ) AS cart_items FROM add_to_cart WHERE session_id IS NOT NULL                          --  filter nulls GROUP BY user_pseudo_id, session_id;

4) Joins (start with INNER and LEFT)

What it does: combines results based on matching keys.

Example: daily users joined with daily purchases (using CTEs)

WITH daily_users AS (

  SELECT

    event_date,

    COUNT(DISTINCT user_pseudo_id) AS users

  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

  WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'

  GROUP BY event_date

),

daily_purchases AS (

  SELECT

    event_date,

    COUNT(DISTINCT

      (SELECT value.string_value

       FROM UNNEST(event_params)

       WHERE key = 'transaction_id')

    ) AS purchases    -- distinct transactions

  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

  WHERE event_name = 'purchase'

    AND _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'

  GROUP BY event_date

)

SELECT

  u.event_date,

  u.users,

  -- turns “no purchase row (NULL)” into 0 purchases, which is what you want for a daily time series

  IFNULL(p.purchases, 0) AS purchases,              

  -- Conversion rate: what % of daily users made a purchase

  ROUND(

    IFNULL(p.purchases, 0) / NULLIF(u.users, 0) * 100, 2

  ) AS conversion_rate_pct        -- NULLIF prevents division by zero

FROM daily_users u

LEFT JOIN daily_purchases p

  ON u.event_date = p.event_date

ORDER BY u.event_date;

5) Window functions + QUALIFY

What they do: calculate “across rows” without collapsing to one row per group.

Top 3 event types per day (RANK) + QUALIFY

WITH daily_event_counts AS (

SELECT

event_date,

event_name,

COUNT(*) AS events

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201207'

GROUP BY event_date, event_name

)

SELECT

event_date,

event_name,

events,

RANK() OVER (PARTITION BY event_date ORDER BY events DESC) AS rnk

FROM daily_event_counts

QUALIFY rnk <= 3 

ORDER BY event_date, rnk;

Rolling 7-day average of daily purchases

WITH daily_purchases AS (

SELECT

PARSE_DATE('%Y%m%d', event_date) AS dt,

COUNT(*) AS purchases

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE event_name = 'purchase'

AND _TABLE_SUFFIX BETWEEN '20201201' AND '20201231' GROUP BY dt

)

SELECT

dt,

purchases,

AVG(purchases) OVER (

ORDER BY dt

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

) AS purchases_7d_avg

FROM daily_purchases

ORDER BY dt;

6) Approximate functions (performance-minded)

What they do: return “close enough” answers faster on huge datasets.

Approx distinct users per event type

SELECT

event_name,

APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users

FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'

GROUP BY event_name

ORDER BY approx_users DESC

LIMIT 15;