How to query the GA realtime dataset using standard SQL

If you’re using GA360 and have setup the automated export to BiqQuery – you’ve probably noticed the realtime streaming option. Although it would outwardly seem like you can query these (.ga_realtime_sessions_yyyymmdd) tables like the regular (.ga_sessions_yyyymmdd) tables using standard or legacy sql – its not that simple. If you do – you will note data in which the counts are highly inflated. NOW, If you’re using legacy sql, google provides a view you can query – and you can skip the rest of this article. However, if you’re using standard sql, this view is not accessible (it will give you a compatibility error). In order to get accurate numbers, we need to filter to only the top visitkey value. One option is to create a new series of views (which I won’t cover). As an alternative – you can use the following query, which will deduplicate the data to the proper counts. Voila!

Note: this query is actually part of a larger view in which I “UNION ALL” about 25 tables. The consumer of which is PowerBI – but you can use it (or modify) it to your liking. Also note that I have some custom dimensions in here – those can be modified or removed to suit your implementation.

select
‘site name’ as SiteCode,
DATE_TRUNC(PARSE_DATE(“%E4Y%m%d”, date), DAY) AS Date,
REGEXP_REPLACE(hits.page.pagePath,r'(\?.*)’,”) AS PAGE, — Scrubs off the querystring values (if you haven’t already put a filter in GA)
(SELECT MAX(IF(index=1, value, NULL)) FROM UNNEST(hits.customDimensions)) AS ArticleID, — Custom Dimension
(SELECT MAX(IF(index=14, value, NULL)) FROM UNNEST(hits.customDimensions)) AS LogIn,  — Custom Dimension
(SELECT MAX(IF(index=5, value, NULL)) FROM UNNEST(hits.customDimensions)) AS PageType,  — Custom Dimension
device.deviceCategory as Mobile_Device_Type,
trafficSource.campaign AS TrackingCode,
trafficSource.source AS ReferringDomain,
trafficSource.medium AS Medium,
COUNT(hits.page.pagePath) AS PageViews — Use INSTEAD of SUM(totals.pageViews)
FROM
(SELECT
ARRAY_AGG(t ORDER BY exportTimeUsec DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.ga_realtime_sessions_20*` AS t
WHERE _TABLE_SUFFIX = FORMAT_DATE(“%y%m%d”, CURRENT_DATE())
GROUP BY t.visitKey), — SubQuery to de-dupe the realtime data
UNNEST(hits) as hits
WHERE
hits.type = “PAGE” — Removes events and such from results
GROUP by
Date,
PAGE,
ArticleID,
Login,
PageType,
Mobile_Device_Type,
TrackingCode,
ReferringDomain,
Medium

Article by David

It seems like this author has no description. Add your discription/bio at user profile or disable this widget in theme customizer if you dont want to use it.

Leave a Reply

Your email address will not be published. Required fields are marked *