Time series analyses can provide insight into trends and seasonal changes within data sets over time. Forecasting can also be used to understand future cyclical patterns, detect anomalies in past data and help make data-driven decisions.
In this example, we use Google Trends interest levels in the AFL from 2004 onwards to build two different forecasting models.
Go to https://trends.google.com/trends/explore?date=today%205-y&geo=AU&q=%2Fm%2F0ckh09&hl=en to download the data. Store the downloaded data in a new folder called Data within the same folder that contains your R Project.
Open the csv and remove the first two rows. Save the new file as AFL_Interest.csv.
Use the read.csv() function to load the saved file from the Data folder.
# load packages
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)
# load data
afl <- read.csv("Data/AFL_Interest.csv")
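As an alternative to editing the file by hand, read.csv() can skip leading lines through its skip argument. The sketch below is illustrative only: it writes a small stand-in file whose layout (a category line, a blank line, then the header) is an assumption about what the raw export looks like, and the names raw_path and afl_raw are hypothetical.

```r
# Hypothetical stand-in for a raw export: a category line, a blank line,
# then the header and data rows (this layout is an assumption)
raw_path <- tempfile(fileext = ".csv")
writeLines(
  c("Category: All categories",
    "",
    "Month,AFL: (Australia)",
    "2004-01,12",
    "2004-02,25"),
  raw_path
)

# skip = 2 drops the first two lines, so the third line becomes the header
afl_raw <- read.csv(raw_path, skip = 2)
str(afl_raw)
```

If the real export has a different number of leading lines, the skip value would need to change to match.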
The first step in cleaning the data is to rename the second column to something that makes more sense, such as Interest. We then mutate the Month column so that it is in the correct format for use with the timetk package. This includes adding a day to the month-year format.
afl <- afl %>%
  rename(Interest = 2) %>%
  mutate(Month = as.Date(paste0(Month, '-01')))
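To see what the date conversion does on its own, here is a minimal base R sketch. The values in month_raw are made up for illustration and assume the Month column arrives as year-month strings such as "2004-01".

```r
# Hypothetical year-month strings, as assumed for the Month column
month_raw <- c("2004-01", "2004-02")

# Appending "-01" gives a full year-month-day string that as.Date() can parse
month_date <- as.Date(paste0(month_raw, "-01"))

class(month_date)
format(month_date, "%Y-%m-%d")
```

Each month is represented by its first day, which gives timetk a proper Date column to work with.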
To plot the data and a trend line, use the plot_time_series() function from the timetk package. In doing this, we can see a clear upwards trend along with seasonal changes.
# initial view of data
afl %>%
  timetk::plot_time_series(
    .date_var = Month,
    .value = Interest
  )
We can also plot the anomalies to further investigate potential outliers. The .alpha argument is generally set to 0.05; however, it can be changed to make the anomaly detection more or less sensitive.
afl %>%
  timetk::plot_anomaly_diagnostics(
    .date_var = Month,
    .value = Interest,
    .alpha = 0.05
  )
To be able to train and test models, we need to create training and testing data sets. To do this, use the initial_time_split() function. Assign the training data and testing data to new variables using the training() and testing() functions.
# split data
splits <- initial_time_split(afl)
# define training and testing
train <- training(splits)
test <- testing(splits)
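The split is chronological: the earliest rows go to training and the most recent rows to testing. The proportion kept for training can be set with the prop argument of initial_time_split(). The sketch below uses a made-up 20-month data frame (toy) and a prop of 0.8 purely for illustration; neither comes from the AFL data.

```r
library(rsample)

# Hypothetical monthly data standing in for the afl data frame
toy <- data.frame(
  Month = seq(as.Date("2020-01-01"), by = "month", length.out = 20),
  Interest = seq_len(20)
)

# Keep roughly the first 80% of rows for training, the rest for testing
splits_80 <- initial_time_split(toy, prop = 0.8)
train_80 <- training(splits_80)
test_80 <- testing(splits_80)

# Every training month comes before every testing month
max(train_80$Month) < min(test_80$Month)
```

A larger prop leaves fewer recent observations for testing, so the choice depends on how long a hold-out period you want to evaluate against.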
Use the training data to build a linear model and an auto ARIMA model. These two models have been selected for this example; however, there are many more model types to choose from. Add these models to a table so that they can be quickly tested and refitted later on.
# create linear model
lm_fit <- linear_reg() %>%
  set_engine('lm') %>%
  fit(Interest ~ Month, data = train)
# create auto ARIMA model
arima_fit <- arima_reg() %>%
  set_engine('auto_arima') %>%
  fit(Interest ~ Month, data = train)
# add models to a table
model_tbl <- modeltime_table(
  lm_fit,
  arima_fit
)
Calibrate the models on the test data and create a visual of the predicted values and the actual values of the test data set for both models.
# calibrate models on testing data
calibrate_tbl <- model_tbl %>%
  modeltime_calibrate(new_data = test)
# forecast testing
calibrate_tbl %>%
  modeltime_forecast(
    actual_data = afl,
    new_data = test
  ) %>%
  plot_modeltime_forecast()
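Alongside the plot, a calibrated table can be summarised numerically with modeltime_accuracy(), which reports error metrics on the test data. The sketch below is a minimal illustration on a synthetic monthly series with only a linear model; the toy_ objects and the simulated values are assumptions and are not the AFL data.

```r
library(tidymodels)
library(modeltime)

# Hypothetical monthly series with a trend and a yearly cycle
set.seed(1)
n <- 72
toy_afl <- tibble::tibble(
  Month = seq(as.Date("2015-01-01"), by = "month", length.out = n),
  Interest = 50 + 0.2 * seq_len(n) +
    10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n)
)

toy_splits <- initial_time_split(toy_afl)

# Fit a linear model on the training portion only
toy_lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Interest ~ Month, data = training(toy_splits))

# Calibrate on the test portion, then compute accuracy metrics
toy_accuracy <- modeltime_table(toy_lm_fit) %>%
  modeltime_calibrate(new_data = testing(toy_splits)) %>%
  modeltime_accuracy()

toy_accuracy
```

With the objects from this tutorial, the equivalent call would be calibrate_tbl %>% modeltime_accuracy(), giving one row per model so the linear and ARIMA fits can be compared side by side.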
Now to forecast. To do this, refit the models on the entire data set using the modeltime_refit() function, then project 3 years forward using the modeltime_forecast() function.
# refit
refit_tbl <- calibrate_tbl %>%
  modeltime_refit(data = afl)
# forecast
refit_tbl %>%
  modeltime_forecast(h = '3 years', actual_data = afl) %>%
  plot_modeltime_forecast()