Creating a Forecast for Google Shopping Ads using Prophet and R

Introduction

If you are a digital marketer, then you have probably had to do forecasting for the following quarter or year. Typically, the way it is done at agencies is that you grab historical data, see how much it has grown year over year, turn that into a factor, multiply it against the past year, and voila! You have next year’s forecast.

These forecasts are usually presented at the beginning of the year or at a QBR and then rarely looked at again. They are subjective and prove unreliable when compared against actual results.

Seasonality, every digital marketer's favorite scapegoat

If you have worked in digital marketing long enough, then whenever performance goes sideways, I'm sure you've heard, “It's probably just seasonality.” Seasonality is often used as an easy way out of explaining bad performance.

As a digital marketer, how do you know whether your marketing efforts are performing better than expected if you cannot set expectations you are confident in to compare them against?

Agencies Have a Severe Shortage of Highly Trained Analysts

Historically, making a high-quality forecast has been difficult. It took highly skilled statisticians and analysts to deliver forecasts accurate enough to drive serious business decisions. Simply multiplying last year's results by a random feel-good factor does not produce a forecast you can depend on.

Analytics can be divided into the following three categories.

  • Descriptive: Analytics that look into the past and tell you what happened. For example, CPA was high, or sales were higher than the previous year.
  • Predictive: Analytics that use statistical models and forecasting methods to answer “What's going to happen?” For example, “Next year, sales will jump in Q3 because of reasons A, B, and C.”
  • Prescriptive: Analytics that use statistical or machine learning models to run thousands, if not millions, of scenarios to answer “What should we do?”

For this exercise, we will focus on the second type, predictive analytics, using Facebook's Prophet library in R.

What is Time Series Analysis?

Time series analysis is a statistical technique used to detect trends in the values a variable takes over a period of time. These trends are then used to project what is expected to happen in the future.

How is time series analysis different from other types of modeling? It only takes patterns in the series itself into consideration, whether seasonality or year-over-year growth, and does not take other variables into account. So questions like “What if I raise my bids?” or “What if the economy crashes?” aren't accounted for. The basic Prophet model used in this tutorial needs only two variables: a time variable, ds, and y, the value you are trying to predict.

What is Prophet?

Facebook describes Prophet as follows: “Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.”

If the above does not make complete sense to you, no worries. The same way you do not need to know quantum mechanics to use a cellphone, you do not have to know the details of what Prophet is doing under the hood to make reliable predictions using Time Series Analysis.

It’s Free!

Prophet is an open-source (meaning free!) library for R and Python that creates fast, automated forecasts that can beat many a skilled analyst using traditional methods. Prophet is great because it is:

  • Fast and accurate: Facebook reports that it performs better than other approaches in the majority of its own use cases.
  • Fully automatic: It delivers reasonable forecasts right out of the box.
  • Tunable: It lets the user tweak and adjust the forecast to improve it.

Prophet takes yearly, weekly, daily, and holiday effects into consideration when creating forecasts. The forecasts are robust and reliable, and the library contains functions that automatically add holidays for different countries. Prophet also gives you the flexibility to include special dates, like Black Friday or promotion dates, that are meaningful to a particular business.

Created by Facebook for Facebook

The Facebook data science team created Prophet to help their team produce reliable forecasts. In 2017, Facebook released the library to the public. In Facebook's own words:

The intent behind Prophet is to “make it easier for experts and non-experts to make high-quality forecasts that keep up with demand.” Prophet is able to produce reliable and robust forecasts (often performing better than other common forecasting techniques) with very little manual effort while allowing for the application of domain knowledge via easily-interpretable parameters.

What can we use Prophet for in Digital Marketing?

Prophet can be used to predict all kinds of variables, such as CPA, revenue, clicks, and visits. Anything with a seasonal pattern can be fed to Prophet to predict future outcomes.

Once you have these predictions, you can then compare to actual outcomes and see if your efforts helped drive performance higher than expectations or lower. This can help answer the question, is it seasonality or is it something else?

Getting Started

In this tutorial, we will use Prophet and R to solve a real-world problem digital marketers often face: forecasting a year of sales for a marketing channel. We will then compare the forecast to actual performance.

We will train our model on data from 2016 to 2019 and then compare the forecast to 2020 to see how actual performance stacks up against our expected results. It is advised to use at least one year's worth of data to make a forecast; note that Prophet only fits yearly seasonality automatically when there are at least two years of history. We will also add special time periods to the model to show how Prophet can be tuned to produce more accurate forecasts.

Hopefully, this sparks some inspiration for you to dive even deeper into the library’s vast functionality.

We will be using R and going over the following major steps:

  • Setting up our R environment
  • Getting and preparing the data
  • Creating a Prophet model
  • Forecasting and visualizing
  • Comparing predicted to actual results

Setting Up

We will first install and load the necessary libraries.
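A minimal sketch of this setup step, assuming the two packages named in this tutorial (prophet for forecasting, dlookr for the quick data check) and the public CRAN mirror as the download source:

```r
# Install the packages used in this tutorial only if they are missing.
pkgs <- c("prophet", "dlookr")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach prophet; dlookr is called later as dlookr::diagnose(), so attaching it is optional.
library(prophet)
```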

Getting the Data Into R

To make this accessible to anyone familiar with R, I downloaded daily performance as a CSV file directly from the Google Ads interface, starting from 2016 until December 20. If you have a database, you can connect R directly to it instead.

Prophet requires the following qualities:

  • No large gaps: There should be no long periods of missing dates. If there are a few gaps, it's okay to fill them with the mean of the surrounding days.
  • One entry per day: There should be one entry per period at whatever time granularity you are using. If you have multiple entries per day, make sure you aggregate your data so that you have only one entry per day.
  • Column names: The variable you're measuring must be labeled y, and the time variable must be labeled ds.
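The first two requirements can be checked and fixed in base R. This is a sketch on a small made-up ds/y data frame: duplicate rows for a day are summed, the full calendar is rebuilt, and each missing day is filled with the mean of the observed values around it. The ±3-day window is an arbitrary choice for illustration, not something Prophet prescribes.

```r
# Hypothetical raw data: 2020-01-01 appears twice and 2020-01-03 is missing.
raw <- data.frame(
  ds = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05")),
  y  = c(2, 3, 4, 8, 10)
)

# One entry per day: sum duplicate rows that share a date.
daily <- aggregate(y ~ ds, data = raw, FUN = sum)

# No large gaps: build the full calendar and left-join the data onto it.
all_days <- data.frame(ds = seq(min(daily$ds), max(daily$ds), by = "day"))
daily <- merge(all_days, daily, by = "ds", all.x = TRUE)

# Fill each missing day with the mean of the observed (non-filled) values
# within +/- 3 days of it.
observed <- daily$y
for (i in which(is.na(observed))) {
  window <- observed[max(1, i - 3):min(length(observed), i + 3)]
  daily$y[i] <- mean(window, na.rm = TRUE)
}
print(daily)
```

Using the untouched `observed` vector keeps consecutive gaps from being filled with values that were themselves imputed.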

Make sure that you note the file name and that the file is in your working directory. In the following code chunk, we will read our CSV file into R and then take a quick peek using dlookr's diagnose function to see the following:

  • Variables
  • Data Type
  • Missing Data
  • Repeated Data
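A self-contained sketch of the import-and-inspect step. Because the real Google Ads export is not available here, the code writes a tiny stand-in CSV to a temporary file; with your own data you would pass your file name to `read.csv()` instead. The base-R `overview` table approximates a few of the columns `dlookr::diagnose()` reports, and `diagnose()` itself runs only if dlookr is installed.

```r
# Hypothetical stand-in for the Google Ads CSV export.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Day,Clicks,Cost,Conversions",
  "2016-01-27,120,85.40,3",
  "2016-01-28,134,92.10,5",
  "2016-01-29,98,70.25,2"
), csv_path)

shopping <- read.csv(csv_path, stringsAsFactors = FALSE)

# Base-R overview: column type, missing values, and number of unique values.
overview <- data.frame(
  variables     = names(shopping),
  types         = vapply(shopping, function(col) class(col)[1], character(1)),
  missing_count = vapply(shopping, function(col) sum(is.na(col)), integer(1)),
  unique_count  = vapply(shopping, function(col) length(unique(col)), integer(1)),
  row.names     = NULL
)
print(overview)

# The tutorial uses dlookr for this check; run it too when the package is available.
if (requireNamespace("dlookr", quietly = TRUE)) print(dlookr::diagnose(shopping))
```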
variables            types    missing_count  missing_percent  unique_count  unique_rate
Day                  factor   0              0                1790          1.0000000
Currency             factor   0              0                1             0.0005587
Clicks               factor   0              0                1073          0.5994413
Impressions          factor   0              0                1770          0.9888268
CTR                  factor   0              0                127           0.0709497
Avg..CPC             numeric  0              0                192           0.1072626
Cost                 factor   0              0                1761          0.9837989
Impr...Abs..Top...   factor   0              0                1             0.0005587
Impr...Top...        factor   0              0                1             0.0005587
Conversions          numeric  0              0                910           0.5083799
View.through.conv.   integer  0              0                4             0.0022346
Cost...conv.         numeric  0              0                1270          0.7094972
Conv..rate           factor   0              0                651           0.3636872

Taking a quick glance at the data, we realize the following:

  • Unnecessary data: There are 13 columns, and we only need two.
  • Day format: Day is a factor, not a Date.
  • Days are unique: There are no repeated days, which is exactly how we need it.

Prepare the Data

The Prophet library requires two columns: ds for time and y for the variable you're trying to predict. It is also highly recommended that the dates be evenly spaced and unique, without any large gaps. The ds column should be in the YYYY-MM-DD date format.

In the next step, we will do the following:

  • Remove unnecessary columns
  • Convert Day to Date format
  • Rename the columns to ds for time and y for conversions
  • Visualize the data to inspect for abnormalities like missing dates
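The four preparation steps can be sketched in base R as follows. The `shopping` data frame here is a hypothetical three-row stand-in for the imported export, with Day stored as a factor to match the diagnose() output above.

```r
# Hypothetical stand-in for the imported Google Ads data (only a few columns shown).
shopping <- data.frame(
  Day         = c("2019-12-30", "2019-12-31", "2020-01-01"),
  Cost        = c(310.5, 295.0, 120.75),
  Conversions = c(14, 11.5, 4),
  stringsAsFactors = TRUE  # mimics the factor Day column reported by diagnose()
)

# 1. Keep only the two columns Prophet needs.
df <- shopping[, c("Day", "Conversions")]

# 2. Convert Day from factor to Date (via character, so the labels are parsed, not the codes).
df$Day <- as.Date(as.character(df$Day), format = "%Y-%m-%d")

# 3. Rename to Prophet's required column names.
names(df) <- c("ds", "y")

# 4. Plot the series to eyeball gaps or anomalies.
plot(df$ds, df$y, type = "l", xlab = "Date", ylab = "Conversions")
```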

Looking at the chart above, the following sticks out:

  • No obvious time gaps
  • Sales spike during the end-of-year holiday season
  • The pandemic seems to have had little impact on sales in 2020

Sales peak at the end of every year. I suspect these peak days fall around the Black Friday weekend. Let's confirm this by looking at the top three sales dates for every year.
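One way to build a "top three days per year" table in base R is sketched below: split the data by year, sort each year's days by sales, and keep the first three dates. The sales values are made up for illustration; with the real data, `df` would be the full ds/y data frame.

```r
# Hypothetical daily data with made-up sales values.
df <- data.frame(
  ds = as.Date(c("2019-06-01", "2019-11-29", "2019-12-01", "2019-12-02", "2019-12-08",
                 "2020-03-01", "2020-11-27", "2020-11-28", "2020-11-30")),
  y  = c(5, 30, 45, 50, 40, 4, 60, 52, 55)
)

# For each year, order days by sales (highest first) and keep the top three dates.
top3 <- sapply(
  split(df, format(df$ds, "%Y")),
  function(d) format(head(d$ds[order(d$y, decreasing = TRUE)], 3))
)
rownames(top3) <- 1:3  # rank
print(top3)
```

`sapply()` returns a matrix with one column per year only when every year has at least three days of data, which holds for a full daily series.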

Top 3 Sales Dates Per Year
rank 2016 2017 2018 2019 2020
1 2016-12-18 2017-11-27 2018-12-09 2019-12-02 2020-11-27
2 2016-11-28 2017-12-10 2018-12-11 2019-12-01 2020-11-30
3 2016-12-13 2017-12-11 2018-11-26 2019-12-08 2020-11-28

Looking at the top three sales dates for the past five years, we see that they all fall into one of three groups:

  • Black Friday
  • Cyber Monday
  • December 18 and the ten days before it, December 18 being roughly the last day one can order and receive a package before Christmas

Holidays and Special Dates

Prophet has a feature that automatically adds a country's national holidays to your model with one simple function.
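In the R package this function is `add_country_holidays()`. A minimal sketch, assuming the prophet package is installed and using synthetic 2016-2019 data in place of the real export: the built-in holidays have to be added to an unfitted model, so the model is created without data and fitted afterwards with `fit.prophet()`.

```r
library(prophet)

# Synthetic stand-in for the 2016-2019 training data.
set.seed(1)
ds <- seq(as.Date("2016-01-01"), as.Date("2019-12-31"), by = "day")
train <- data.frame(ds = ds, y = 50 + 0.01 * seq_along(ds) + rnorm(length(ds), sd = 3))

# Country holidays must be added before fitting, so create the model without data first.
m <- prophet()
m <- add_country_holidays(m, country_name = "US")
m <- fit.prophet(m, train)

# Names of the built-in US holidays included in the fit.
print(m$train.holiday.names)
```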

Adding Custom Dates

As previously noted, the dates with the highest sales fall into the long Black Friday weekend and the ten days leading up to the last day you can order and receive by mail before Christmas Day, which is around December 18.

In the following step, we will include the Black Friday weekend by adding an upper window of four days after the Thanksgiving holiday. In doing so, we are telling Prophet to take special note of the days from Black Friday to Cyber Monday. We will also let Prophet look at the ten days before the last day you can order online and receive your order before Christmas.
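Prophet takes custom holidays as a data frame with `holiday` and `ds` columns plus optional `lower_window` and `upper_window` offsets in days. The sketch below builds a `holiday_season` table for the two periods just described. The `thanksgiving()` helper and the event names are hypothetical choices for this example; the table includes 2020 so the effects also apply to the forecast year.

```r
# Hypothetical helper: Thanksgiving is the fourth Thursday of November.
thanksgiving <- function(years) {
  nov1 <- as.Date(sprintf("%d-11-01", years))
  # Shift to the first Thursday (POSIXlt wday: 0 = Sunday, 4 = Thursday), then add 3 weeks.
  first_thu <- nov1 + (4 - as.POSIXlt(nov1)$wday) %% 7
  first_thu + 21
}

years <- 2016:2020  # training years plus the forecast year

black_friday_weekend <- data.frame(
  holiday      = "black_friday_weekend",
  ds           = thanksgiving(years),
  lower_window = 0,
  upper_window = 4   # Thanksgiving plus the four days through Cyber Monday
)

shipping_cutoff <- data.frame(
  holiday      = "shipping_cutoff",
  ds           = as.Date(sprintf("%d-12-18", years)),
  lower_window = -10,  # the ten days leading up to December 18
  upper_window = 0
)

holiday_season <- rbind(black_friday_weekend, shipping_cutoff)
print(holiday_season)
```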

Special Dates

The custom holidays feature can be used for different types of special dates or periods. If there are special promotions throughout the year, this is a great place to enter those dates to help Prophet get more accurate results.
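As a sketch, a recurring promotion can be described with the same columns as the holiday table. The "spring_sale" name and its dates below are entirely made up; you would substitute your own promotion calendar.

```r
# Hypothetical promotion calendar: a three-day spring sale each year (dates are made up).
promotions <- data.frame(
  holiday      = "spring_sale",
  ds           = as.Date(c("2016-03-16", "2017-03-15", "2018-03-14", "2019-03-13", "2020-03-11")),
  lower_window = 0,
  upper_window = 2  # the sale day plus the two days after it
)
print(promotions)

# Because the columns match, the promotions can be stacked onto a holiday table, e.g.:
# holiday_season <- rbind(holiday_season, promotions)
```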


Splitting our Data

Now that we have our custom holiday_season table, we will split the shopping data into a training set and a test set. We will use our shopping data prior to 2020 as training data and the 2020 data as our test set. We will use the test set to see how we performed in 2020 relative to the overall trend from past years.
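The split is a simple date filter. A sketch on a few hypothetical rows around the 2019/2020 boundary:

```r
# Hypothetical ds/y data spanning the 2019/2020 boundary.
df <- data.frame(
  ds = seq(as.Date("2019-12-29"), as.Date("2020-01-03"), by = "day"),
  y  = c(12, 15, 9, 4, 6, 7)
)

cutoff <- as.Date("2020-01-01")
train <- df[df$ds < cutoff, ]   # everything before 2020 is used to fit the model
test  <- df[df$ds >= cutoff, ]  # 2020 is held out to compare against the forecast
```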

Creating Our First Model

Now that we have added our custom holidays table and split our data, we will use the prophet function to create our first model, m, including our custom holidays table.
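A self-contained sketch of fitting the model, assuming the prophet package is installed. Synthetic 2016-2019 data stands in for the real conversions, and a compact version of the holiday table is rebuilt inline so the example runs on its own.

```r
library(prophet)

# Synthetic training data standing in for the 2016-2019 shopping conversions.
set.seed(2)
ds <- seq(as.Date("2016-01-01"), as.Date("2019-12-31"), by = "day")
train <- data.frame(ds = ds, y = 50 + 0.01 * seq_along(ds) + rnorm(length(ds), sd = 3))

# Compact custom holiday table (Thanksgiving dates 2016-2020 written out).
holiday_season <- rbind(
  data.frame(holiday = "black_friday_weekend",
             ds = as.Date(c("2016-11-24", "2017-11-23", "2018-11-22", "2019-11-28", "2020-11-26")),
             lower_window = 0, upper_window = 4),
  data.frame(holiday = "shipping_cutoff",
             ds = as.Date(sprintf("%d-12-18", 2016:2020)),
             lower_window = -10, upper_window = 0)
)

# Fit the model on the training set with the custom holidays.
m <- prophet(train, holidays = holiday_season)
```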

Forecasting 2020 Sales

Now that we have our model, let's create a data frame covering the next 365 days (or however many days you want to forecast), which we can then feed into the predict function.

With our future data frame of dates to forecast, we will feed our model, m, and future into the predict function.

We then plot our model and forecast together to visualize what the forecast looks like layered over past data.
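The three forecasting steps, future dates, prediction, and plot, might look like this sketch. It assumes the prophet package is installed and, to stay short and self-contained, fits synthetic data without the custom holidays.

```r
library(prophet)

# Synthetic 2016-2019 training data standing in for the real export.
set.seed(3)
ds <- seq(as.Date("2016-01-01"), as.Date("2019-12-31"), by = "day")
train <- data.frame(ds = ds, y = 50 + 0.01 * seq_along(ds) + rnorm(length(ds), sd = 3))
m <- prophet(train)

# History plus 365 future daily rows: the dates to predict.
future <- make_future_dataframe(m, periods = 365)

# yhat is the point forecast; yhat_lower and yhat_upper bound the uncertainty interval.
forecast <- predict(m, future)
print(tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")]))

# Forecast layered over the observed history.
p <- plot(m, forecast)
print(p)
```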

Detecting Changepoints

The Prophet library includes the ability to detect changepoints easily. Changepoints are dates where Prophet allows the trend's growth rate to change. To highlight the significant ones, we can add add_changepoints_to_plot to our plot.
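A sketch of the changepoint overlay, assuming the prophet package is installed. The synthetic series deliberately speeds up its growth at the start of 2018 so there is a trend change to find; predicting over the history alone is enough to inspect it.

```r
library(prophet)

set.seed(4)
ds <- seq(as.Date("2016-01-01"), as.Date("2019-12-31"), by = "day")
# Synthetic series whose growth rate increases at the start of 2018.
slope <- ifelse(ds < as.Date("2018-01-01"), 0.01, 0.05)
train <- data.frame(ds = ds, y = 50 + cumsum(slope) + rnorm(length(ds), sd = 3))

m <- prophet(train)
forecast <- predict(m)  # with no new data frame, predictions cover the history

# Candidate changepoint dates (25 by default, placed in the first 80% of the history).
print(m$changepoints)

# Overlay the trend and the changepoints whose rate change passes the default threshold.
p <- plot(m, forecast) + add_changepoints_to_plot(m)
print(p)
```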

The changepoints show us several anomalies in 2017. Perhaps this company had special promotions or discounts that caused surges in sales, as they seem to be equidistant from each other. There also seems to be a special date in the first quarter of each of the previous three years. We will not do it in this exercise, but normally I would check whether these dates align with the business's promotion calendar or with any other holidays.

Overall Trend and Seasonality

Overall, Prophet predicts higher sales in 2020 than in 2019. Let's take a look at the different trends by calling the prophet_plot_components function.
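A sketch of the components plot, assuming the prophet package is installed. The synthetic series has an upward trend, a year-end bump, and a small Sunday lift, and a one-event holiday table is included so the plot gets a holidays panel; all of these values are made up for illustration.

```r
library(prophet)

set.seed(5)
ds <- seq(as.Date("2016-01-01"), as.Date("2019-12-31"), by = "day")
day_of_year <- as.numeric(format(ds, "%j"))
# Synthetic sales: upward trend + year-end bump + Sunday lift ("%u" == "7") + noise.
y <- 50 + 0.01 * seq_along(ds) +
  15 * (day_of_year > 320) +
  3 * (format(ds, "%u") == "7") +
  rnorm(length(ds), sd = 3)
train <- data.frame(ds = ds, y = y)

holiday_season <- data.frame(
  holiday = "black_friday_weekend",
  ds = as.Date(c("2016-11-24", "2017-11-23", "2018-11-22", "2019-11-28", "2020-11-26")),
  lower_window = 0, upper_window = 4
)

m <- prophet(train, holidays = holiday_season)
forecast <- predict(m, make_future_dataframe(m, periods = 365))

# One panel per component: trend, holidays, weekly and yearly seasonality.
prophet_plot_components(m, forecast)
```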

Looking at the various components, I have the following observations:

  • Sales are trending up: Sales are trending up year over year.
  • Black Friday weekend spike: There is a big spike on Black Friday, followed by smaller spikes around two weeks before Christmas.
  • Sundays are the best: The best sales day is Sunday, and sales stay strong until Wednesday, then taper off to the worst day, Saturday.
  • End-of-year bump: The best time of year for sales begins in the second half of November, then tapers off as January approaches.

Comparing Forecast to Reality

When we split our data, we excluded 2020 from our model so that we could compare our shopping channel's sales performance to what was forecasted. In the following code chunk, we will join our forecast table with our test set containing the 2020 data. We will then create a delta column showing the difference between forecasted and actual sales, and visualize how the delta changes over time.
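A base-R sketch of the join and delta. Both tables are hypothetical: `test` holds a few made-up 2020 actuals, and `forecast` mimics the shape of `predict()` output, whose ds column is a date-time, so it is converted to Date before merging. Here delta is defined as actual minus forecast, so negative values mean sales fell short.

```r
# Hypothetical 2020 actuals (the test set) and a forecast table shaped like predict() output.
test <- data.frame(
  ds = as.Date(c("2020-11-26", "2020-11-27", "2020-11-28")),
  y  = c(40, 95, 60)
)
forecast <- data.frame(
  ds   = as.POSIXct(c("2020-11-26", "2020-11-27", "2020-11-28"), tz = "UTC"),
  yhat = c(45, 80, 70)
)

# Convert the date-time ds to Date so the join keys match.
forecast$ds <- as.Date(forecast$ds)

# Keep only the dates present in both tables.
comparison <- merge(test, forecast[, c("ds", "yhat")], by = "ds")

# Positive delta: actual sales beat the forecast; negative: they fell short.
comparison$delta <- comparison$y - comparison$yhat
print(comparison)

plot(comparison$ds, comparison$delta, type = "h", xlab = "Date", ylab = "Actual - forecast")
abline(h = 0, lty = "dashed")
```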

From what we see above, we observe the following:

  • For most of 2020, sales underperformed compared to the previous trend.
  • A strong effort prior to Black Friday exceeded expectations.
  • The shopping campaign did not have strong enough follow-through during the period after the Black Friday weekend.

When we compare expected sales (yhat) against actual sales (y), we see that predictions were closer to being accurate on high-sales days.

Now that we see that our sales were below expectations, we can start digging into why. What happened? Was it Covid? Did we this account as much? Did CPMs go up? Did the landing page experience change dramatically?

Conclusion

As you can see, thanks to the generosity of the data science team at Facebook, we can now create great forecasts without being experts in statistics or machine learning.

If you have any questions or comments, or would like to share your experience using Prophet, please feel free to reach out!