Microsoft’s PowerBI platform continues to present itself as a strong competitor to entrenched players like Tableau, Looker, and Qlik. Power BI has strengths with its embedded ETL toolset, very fast Columnar (SSAS) database, powerful DAX programming language, R integration, and its compelling pricepoint. That aside, there are still key areas that are missing when compared to the competition. One of those missing areas is in Forecasting in a time series, which I will walk you through how to setup a simple, yet effective system using PowerBI and DAX to bring Forecasting into your reporting.
A little background
Time Series Modelling and Forecasting is a relatively new (past 2 decades) and complex science. The overarching goal of Time Series forecasting is to observe historical, linear, time-series trends and to use those patterns to predict future values. To achieve this there are many methodologies (AR, MA, ARIMA, SARIMA, Artificial Neural Networks, and Support Vector Machines) to name just a few. These various methodologies are individually designed to best fit certain business patterns (univariate, multivariate, seasonality, etc). However, in the interest of simplicity and accessibility – I will skip these complex methodologies and show you how to implement this in a manner that still achieves a good fit (~1% observed, monthly) and allows you to tweak the model to fit your web traffic.
Before beginning the model, we nee to have some key assumptions regarding the data in order for the model to be effective:
- The time series must be linear in scale
- Traffic should follow a know statistical (normal) distribution. (e.g. Wild organic swings will introduce error into the modelling)
- Parsimony (KISS) – Try to keep the input points as simple and consistent as possible.
So, why not just do this in R?
Yes, R-Project has all the various time time series functions available and PowerBI has native R integration – but there are limitations on R interoperbility that this overcomes:
- R visualizations don’t work with the [Web based] Power BI service. This prevents you from sharing dashboards with your organization.
- R visualizations are only reactive to other visualizations (you cant click into them and filter on other visuals in the work space)
- Keeps the development level-of-effort to a minimum
First things first
- We will be using Pageviews at the key metric in this forecast. This model is flexible enough to work directly with GA data you may already have been pulled into PowerBI. For this model, I have created a sample data set which is included in the downloadable files. This data has three fields, date, traffic source, and pageviews and is in a .csv file. Again, you can easily modify the project to use data from your own data source.
- Since we are creating a forecast for future values, we need to create a table that has future dates – as the data from the file (or your data source) will only be up to current. The .pbix file I created has this data set loaded and I will also provide the corresponding .csv as well.
- We need to create a 1:many relationship with the date-lookup table to our data. Going forward, we will use the datelookup table’s date instead of the date in the data..
Creating the DAX Measures
Our time-series forecast will be using historical data points from up to 5 data points (contingent on availability) to forecast future data points: t-365, t-1 month, t-28d, t-7d, and t-1d. It will take the average of those data points to create the forecast data point. To get DAX to grab data points outside of the current filtered dataset, we use the CALCULATETABLE() function. You will create the following six (6) MEASURES under your data table (for the example, I have them on the site_data table)
- t-365 to address Annual Seasonality
- t-1month (essentially, same date in previous month) to address end-of-month behavior (such as ramping up traffic initiatives to make traffic goals)
- t-28d (4 week trailing – but focused around weekpart trends)
- t-7d (1 week trailing – again, focused to influence growthrate in the near term, but maintaining weekpart biases)
- t-1d (1 day trailing – to pull in any recent influencers)
Remember – that t
he further out you forecast from historical data, the less accurate it will become. However, these models will constantly evolve as you get nearer to the forecast date (filling in more
ecent data points)
Pageviews (Prior Day) =
Pageviews (Prior Week) =
Pageviews (Prior 28d) =
Pageviews (Prior Month) =
Pageviews (Prior Year) =
Pageviews (Trend) =
([Pageviews (Prior Day)]
+[Pageviews (Prior Week)]
+[Pageviews (Prior 28d)]
+[Pageviews (Prior Month)]
(IF(ISBLANK([Pageviews (Prior Day)]),0,1)
+IF(ISBLANK([Pageviews (Prior Week)]),0,1)
+IF(ISBLANK([Pageviews (Prior 28d)]),0,1)
+IF(ISBLANK([Pageviews (Prior Month)]),0,1)
Bringing it all together
At this point, you’re ready to implement the visualizations. For the key visual, I am using the native “line and stacked bar” visualization. I also have a pie chart (to select the different traffic sources), the date range selector (YYYYMM), and lastly – the SQLBI Bullet Chart (available via https://app.powerbi.com/visuals/).
From here, you can select dimensions on any of the visualizations and the forecast will refresh to show just that data. This is extremely useful when you have variances in your model and you want to find out the attribution point(s) that has a variance to the model. Perhaps you had a hit on a social network or your email platform didn’t send out scheduled emails – this is a great way to proactively spot that.
At the end of the day, its all about finding your ideal fit (the forecast should best match the actuals). You may need to introduce weightings into your model (currently they are evenly weighted). For example, if your traffic has a very strong seasonal component (e.g. Holiday traffic) – you may want to add more weight behind your t-365 data point. In my experience I have removed the annual seasonality all together as well as the t-28d data point. The net result was a ~1% variance to actuals. It all comes down to understanding your traffic patterns first, then revising the model to fit.
If you have any questions – feel free to email me or post a comment. Best of luck and I hope this was useful.
I have zipped up the pbix and source files. Click timeseriesforecast to download them.