Time Series Forecasting

Time series forecasting is a prediction technique that analyses past observations to find trends and seasonal patterns that may repeat in the future. In this analysis, we look at the popularity of the Essendon Football Club over time, measured by Google Trends search interest, and forecast the club’s popularity over the next 3 years.

Downloading The Data

The data used for this analysis comes from Google Trends at https://trends.google.com.au/trends/explore?date=all&q=%2Fm%2F02sc5&hl=en-AU and covers 2004 to the present. Download the data and save it as essendon.csv.

Preparing The Data

In the CSV file, remove the top two rows of the data. Rename the two columns to something more descriptive; in this case I have named column 1 Year.Month and column 2 Essendon.Football.Club. Add the CSV to your R project and read in the data.

essendon <- read.csv('essendon.csv')
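
As an alternative to editing the file by hand, the same clean-up can be sketched in R. This assumes the raw export has two lines above the header row; the sample lines and the name essendon_demo are hypothetical stand-ins, so check them against your own download.

```r
# Hypothetical raw export: two lines to discard, then a header row and data.
raw_lines <- c(
  "Category: All categories",
  "",
  "Month,Essendon Football Club: (Worldwide)",
  "2004-01,12",
  "2004-02,15",
  "2004-03,30"
)

# skip = 2 drops the top two lines; col.names replaces the original header.
essendon_demo <- read.csv(
  text = paste(raw_lines, collapse = "\n"),
  skip = 2,
  header = TRUE,
  col.names = c("Year.Month", "Essendon.Football.Club")
)

str(essendon_demo)
```

With a real file, the text argument would be replaced by the file name, for example read.csv('essendon.csv', skip = 2, ...).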

Load Libraries

If any of these libraries are not installed, install them first with install.packages().

library(tidyverse)
library(xgboost)
library(tidymodels)
library(modeltime)
library(lubridate)
library(timetk)
library(fpp3)
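
If you are unsure which of these libraries are missing, a small base R check like the one below can list them before you call library(). The variable names are illustrative, and the install line is left commented out so nothing is installed unless you choose to.

```r
# Packages used in this analysis.
required_pkgs <- c("tidyverse", "xgboost", "tidymodels", "modeltime",
                   "lubridate", "timetk", "fpp3")

# Compare against what is already installed on this machine.
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing:
# install.packages(missing_pkgs)
```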

Time Series Plot

First, we will look at the data on a time series plot. We will need to change the Year.Month variable to an appropriate data type first.

# convert to date

essendon$Year.Month <- ym(essendon$Year.Month)

# time series plot

essendon %>% 
  timetk::plot_time_series(.date_var = Year.Month,
                           .value = Essendon.Football.Club)
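
To make the conversion step concrete, here is a base R sketch of what it amounts to: each year-month string becomes a Date on the first day of that month. The sample strings are hypothetical, and the sketch assumes the column holds values in the form "2004-01".

```r
# Hypothetical year-month strings in the same form as the Year.Month column.
year_month <- c("2004-01", "2004-02", "2004-03")

# Append a day so base R can parse the strings as dates.
dates <- as.Date(paste0(year_month, "-01"))

class(dates)
format(dates, "%d %B %Y")
```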

Anomalies Plot

Now we will investigate the series further by plotting its anomalies. A quick Google of the Essendon Football Club together with each flagged time period will help to give context as to why these anomalies occurred. The alpha value is generally set to 0.05, which is the value used here.

# We can adjust the threshold for anomalies with .alpha
essendon %>% 
  timetk::plot_anomaly_diagnostics(.date_var = Year.Month,
                                   .value = Essendon.Football.Club,
                                   .alpha = 0.05)
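
To give a feel for what .alpha controls, the sketch below flags outliers in base R by widening the interquartile range. It assumes the convention of an IQR factor of 0.15 / alpha (so 3 for alpha = 0.05); the function flag_iqr_outliers and the demo values are hypothetical, and this is a simplified illustration rather than timetk's actual code, which works on the series after removing trend and seasonality.

```r
# Flag values that fall outside the quartiles widened by an IQR factor.
# Assumption: factor = 0.15 / alpha, so a smaller alpha flags fewer points.
flag_iqr_outliers <- function(x, alpha = 0.05) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  spread <- (q[2] - q[1]) * (0.15 / alpha)
  x < (q[1] - spread) | x > (q[2] + spread)
}

# Hypothetical remainder values with one obvious spike.
remainder_demo <- c(-1, 0.5, 0, 1, -0.5, 0.2, 25, -0.3)
flags <- flag_iqr_outliers(remainder_demo, alpha = 0.05)
which(flags)
```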

Splitting The Data

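Next we will split the data into training and testing sets. This will allow us to train our models and see how they perform on data they have not seen. initial_time_split() keeps the rows in time order and, by default, uses the first 75% for training and the remaining 25% for testing. The splits can be inspected below.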

## Split the data 
splits <- initial_time_split(essendon)

## Create training and testing sets
train <- training(splits)
test <- testing(splits)

# Use tk_time_series_cv_plan to inspect the splits
splits %>% 
  tk_time_series_cv_plan() %>% 
  plot_time_series_cv_plan(.date_var = Year.Month,
                           .value = Essendon.Football.Club)
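
The idea behind a time-ordered split can be sketched in base R as follows. The demo data and variable names are hypothetical; the point is only that the earliest rows go to training and the most recent rows to testing, with no shuffling.

```r
# Hypothetical monthly series, already sorted by date.
demo <- data.frame(
  Year.Month = seq(as.Date("2004-01-01"), by = "month", length.out = 12),
  Essendon.Football.Club = c(12, 15, 30, 42, 40, 38, 35, 33, 28, 14, 10, 9)
)

prop <- 0.75                           # share of rows used for training
n_train <- floor(nrow(demo) * prop)    # number of training rows

train_demo <- demo[seq_len(n_train), ]
test_demo  <- demo[(n_train + 1):nrow(demo), ]

range(train_demo$Year.Month)
range(test_demo$Year.Month)
```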

Creating The Forecast Models

Now we will create our forecast models. For this paper we will use an auto-ARIMA model, a Prophet model and a linear regression model, although there are plenty more that can be used.

# create the models

# Auto ARIMA 

arima_fit <- arima_reg() %>% 
  set_engine("auto_arima") %>% 
  fit(Essendon.Football.Club ~ Year.Month, data = train)

# Prophet 

prophet_fit <- prophet_reg() %>% 
  set_engine("prophet") %>% 
  fit(Essendon.Football.Club ~ Year.Month, data = train)

# Linear regression

lm_fit <- linear_reg() %>% 
  set_engine("lm") %>% 
  fit(Essendon.Football.Club ~ Year.Month, data = train)
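
One thing worth noting about the linear regression: with the date as its only predictor, it can fit nothing more than a straight trend line. The base R sketch below illustrates this on a hypothetical, purely seasonal series, with the date converted to a number explicitly; it is an illustration of the idea, not the model fitted above.

```r
# Hypothetical series: a repeating yearly cycle with no underlying trend.
dates <- seq(as.Date("2004-01-01"), by = "month", length.out = 24)
value <- 20 + 10 * sin(2 * pi * (0:23) / 12)

# The date enters the model as a single number (days since 1970-01-01).
demo <- data.frame(t = as.numeric(dates), value = value)
trend_fit <- lm(value ~ t, data = demo)

# A straight line explains little of a seasonal pattern.
r_squared <- summary(trend_fit)$r.squared
r_squared
```

Seasonal models such as ARIMA and Prophet are designed to pick up this kind of repeating pattern, which a single trend term cannot.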

Calibrate the Models Using Test Data

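Next we will add our models to a table and calibrate them to the test data set. Calibration generates predictions for the test period and stores the residuals, which are used for the accuracy metrics and the confidence intervals. The model predictions will then be plotted against the actual data.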

# put models into table

models_tbl <- modeltime_table(
  arima_fit,
  prophet_fit,
  lm_fit
)

# calibrate models

calibrate_tbl <- models_tbl %>% 
  modeltime_calibrate(new_data = test)

# plot the forecasts

calibrate_tbl %>% 
  modeltime_forecast(
    actual_data = essendon,
    new_data = test
  ) %>% 
  plot_modeltime_forecast()

Check Accuracy

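We will now check the accuracy of these models to assess which performed the best. In the results below, the ARIMA model has the lowest error on every metric and the highest R-squared (0.724), followed by Prophet, with the linear regression well behind.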

# Check results 
calibrate_tbl %>% 
  modeltime_accuracy()
## # A tibble: 3 × 9
##   .model_id .model_desc             .type   mae  mape  mase smape  rmse    rsq
##       <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1         1 ARIMA(1,0,1)(2,1,0)[12] Test   3.28  29.1 0.652  21.3  5.01 0.724 
## 2         2 PROPHET                 Test   6.60  48.7 1.31   60.1  7.93 0.566 
## 3         3 LM                      Test   9.13 110.  1.81   55.4 11.0  0.0229
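
For readers unfamiliar with the column names, the sketch below computes three of these metrics by hand in base R, using their standard definitions. The actual and predicted values are hypothetical and unrelated to the results above.

```r
# Hypothetical actual and predicted values.
actual    <- c(10, 20, 40)
predicted <- c(12, 18, 44)
err <- actual - predicted

mae  <- mean(abs(err))                 # mean absolute error
rmse <- sqrt(mean(err^2))              # root mean squared error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

round(c(mae = mae, rmse = rmse, mape = mape), 2)
```

Lower values are better for all three, while a higher rsq indicates that the predictions track the actual values more closely.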

Refit the Models and Forecast the Next 3 Years

Finally, we will refit the models to the full data set and forecast the interest in the Essendon Football Club over the next 3 years.

## Refit and forecast forward 

refit_tbl <- calibrate_tbl %>% 
  modeltime_refit(data = essendon)

refit_tbl %>% 
  modeltime_forecast(h = "3 years", actual_data = essendon) %>% 
  plot_modeltime_forecast()
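
For monthly data, a horizon of 3 years should correspond to 36 future time steps. The base R sketch below builds such a set of dates from a hypothetical last observation; it only illustrates the horizon and is not how modeltime generates its future dates internally.

```r
# Hypothetical date of the last observation in the series.
last_obs <- as.Date("2023-06-01")

# 36 monthly steps after the last observation (drop the starting date).
future_dates <- seq(last_obs, by = "month", length.out = 37)[-1]

length(future_dates)
range(future_dates)
```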