Time series analysis can provide insight into trends and seasonal changes in a data set over time. Forecasting builds on this to project future cyclical patterns, detect anomalies in past data and support data-driven decisions.

In this example, we look at Google Trends interest levels in the AFL in Australia from 2004 onwards and use them to build two different forecasting models.



Prepare the Data

Step 1: Download Data

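Go to https://trends.google.com/trends/explore?date=today%205-y&geo=AU&q=%2Fm%2F0ckh09&hl=en to download the data. The later steps assume monthly data going back to 2004, so if the page opens with a shorter time range, change it to 2004 - present before downloading. Store the downloaded data in a new folder called Data within the same folder that contains your R Project.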

Open the CSV file and remove the first two rows so that the column headers are on the first row. Save the new file as AFL_Interest.csv in the Data folder.



Step 2: Load Packages & Data into R

Load the packages, then use the read.csv() function to read the downloaded data from the Data folder.

# load packages
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

# load data
afl <- read.csv("Data/AFL_Interest.csv")
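
As an alternative to editing the file by hand, read.csv() can skip the leading rows itself. The sketch below assumes the raw export has one title row and one blank row above the column headers, and it uses a small inline sample in place of the real file. The sample header and values are made up for illustration.

```r
# Hypothetical sample mimicking the raw export layout (values are made up)
raw_text <- paste(
  "Category: All categories",
  "",
  "Month,Australian Football League: (Australia)",
  "2004-01,10",
  "2004-02,25",
  sep = "\n"
)

# skip = 2 drops the title row and the blank row, so the third row becomes the header
afl_raw <- read.csv(text = raw_text, skip = 2)

str(afl_raw)
```

With the real download, pass the file path as the first argument instead of using text =.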



Step 3: Clean the Data

The first step in cleaning the data is to rename the second column to something more meaningful, such as Interest. We then mutate the Month column into a Date so that it works with the timetk package. This means adding a day to the month-year format.

afl <- afl %>% 
  rename(Interest = 2) %>% 
  mutate(Month = as.Date(paste0(Month, '-01')))
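
To see why the day is added, the sketch below runs the same conversion on a few hypothetical month strings in base R. as.Date() cannot parse a year-month string on its own, so pasting '-01' onto it gives each month a valid first-of-month date.

```r
# Hypothetical month strings in the year-month format used by the Month column
months <- c("2004-01", "2004-02", "2004-03")

# Without a day, as.Date() fails; tryCatch() captures the error instead of stopping
no_day <- tryCatch(as.Date(months), error = function(e) "error")

# Appending "-01" makes each value a complete date on the first of the month
month_dates <- as.Date(paste0(months, "-01"))

print(no_day)
print(month_dates)
```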



Step 4: Visualise Data

To plot the data with a trend line, use the plot_time_series() function from the timetk package. The plot shows a clear upwards trend along with seasonal changes.

# initial view of data 
afl %>% 
  timetk::plot_time_series(
    .date_var = Month, 
    .value = Interest
  )



We can also plot the anomalies to further investigate potential outliers. The .alpha argument defaults to 0.05. Lowering it widens the band of values treated as normal, so fewer points are flagged, while raising it narrows the band and flags more.

afl %>% 
  timetk::plot_anomaly_diagnostics(
    .date_var = Month, 
    .value = Interest, 
    .alpha = 0.05
  )



Step 5: Split Data

To train and test models, we need separate training and testing data sets. Use the initial_time_split() function, which keeps the rows in time order and by default puts the first 75% in the training set. Then assign the training and testing data to new variables using the training() and testing() functions.

# split data 
splits <- initial_time_split(afl) 

# define training and testing 
train <- training(splits)
test <- testing(splits)
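
The idea behind a time-ordered split can be sketched in base R: the earliest rows become the training set and the most recent rows become the testing set, with no shuffling. The data frame and its values below are illustrative assumptions, and the code shows the concept rather than how initial_time_split() is implemented.

```r
# Hypothetical monthly series standing in for the cleaned afl data
demo <- data.frame(
  Month = seq(as.Date("2004-01-01"), by = "month", length.out = 24),
  Interest = rep(c(5, 10, 40, 60, 70, 75, 80, 85, 90, 30, 8, 5), 2)
)

# Keep rows in date order and cut once: first 75% for training, the rest for testing
demo <- demo[order(demo$Month), ]
n_train <- floor(nrow(demo) * 0.75)
demo_train <- demo[seq_len(n_train), ]
demo_test <- demo[-seq_len(n_train), ]

range(demo_train$Month)
range(demo_test$Month)
```

Because the cut is made once along the time axis, every testing date falls after every training date, which is what a forecast evaluation needs.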



Forecast

Step 1: Create Models

Use the training data to build a linear model and an auto ARIMA model. These two were selected for this example, but there are many more model types to choose from. Add the models to a modeltime table so they can be quickly tested and refitted later on.

# create linear model
lm_fit <- linear_reg() %>% 
  set_engine('lm') %>% 
  fit(Interest ~ Month, data = train)

# create auto ARIMA model
arima_fit <- arima_reg() %>% 
  set_engine('auto_arima') %>% 
  fit(Interest ~ Month, data = train)

# add models to a modeltime table
model_tbl <- modeltime_table(
    lm_fit, 
    arima_fit
)



Step 2: Calibrate & Test Models

Calibrate the models on the test data and create a visual of the predicted values and the actual values of the test data set for both models.

# calibrate models on testing data
calibrate_tbl <- model_tbl %>% 
  modeltime_calibrate(new_data = test)

# forecast the testing period and plot against the actual values
calibrate_tbl %>% 
  modeltime_forecast(
    actual_data = afl, 
    new_data = test
  ) %>% 
  plot_modeltime_forecast()
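
A plot shows how the forecasts track the test data, and error metrics put a number on it. modeltime provides modeltime_accuracy() for calibrated tables. The base R sketch below shows what two common metrics, MAE and RMSE, measure, using made-up actual and predicted values rather than output from the models above.

```r
# Hypothetical actual and predicted values (not results from the AFL models)
actual <- c(50, 60, 70, 80)
predicted <- c(48, 63, 66, 85)

# Forecast errors for each period
errors <- actual - predicted

# Mean absolute error: the average size of the misses
mae <- mean(abs(errors))

# Root mean squared error: penalises large misses more heavily
rmse <- sqrt(mean(errors^2))

c(MAE = mae, RMSE = rmse)
```

Lower values mean the predictions sit closer to the actual data, so the metrics give a simple way to compare the two models on the same test set.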



Step 3: Refit & Forecast Forward 3 Years

Now to forecast. First refit the models on the entire data set using the modeltime_refit() function, then project 3 years forward using the modeltime_forecast() function.

# refit
refit_tbl <- calibrate_tbl %>% 
  modeltime_refit(data = afl)

# forecast
refit_tbl %>% 
  modeltime_forecast(h = '3 years', actual_data = afl) %>% 
  plot_modeltime_forecast()
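
For monthly data, a horizon of '3 years' corresponds to 36 future months after the last observed date. The base R sketch below builds such a sequence from a hypothetical last observation, only to illustrate the time span the forecast covers. It is not part of the modeltime workflow, which works out the future dates from the horizon itself.

```r
# Hypothetical last observed month in the series
last_month <- as.Date("2023-12-01")

# seq() includes the start date, so ask for 37 values and drop the first
future_months <- seq(last_month, by = "month", length.out = 37)[-1]

length(future_months)
range(future_months)
```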