Time series analysis is used to find trends, patterns and seasonal changes within a data set. Forecasting is used to predict how those patterns and trends will continue into the future. This analysis examines the time series for Carlton Football Club and forecasts it 3 years into the future.

Step 1: Source Data

The data for this analysis comes from Google Trends (https://trends.google.com.au/trends/), where you can search for the topic you want a time series and forecast for, which was Carlton Football Club in this analysis. Google Trends lets you download the resulting data set to your computer.

Step 2: Load Packages and Load Data

This step uses library() to load the packages and read.csv() to read in the data.

# Load Packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
## ✔ broom        1.0.5      ✔ rsample      1.2.1 
## ✔ dials        1.3.0      ✔ tune         1.2.1 
## ✔ infer        1.0.7      ✔ workflows    1.1.4 
## ✔ modeldata    1.4.0      ✔ workflowsets 1.1.0 
## ✔ parsnip      1.2.1      ✔ yardstick    1.3.1 
## ✔ recipes      1.0.10     
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ recipes::fixed()  masks stringr::fixed()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step()   masks stats::step()
## • Use tidymodels_prefer() to resolve common conflicts.
library(modeltime)
library(timetk)

#Load data
carlton_afl <- read.csv("Carlton AFL (Aus).csv")

Step 3: Clean the Data

To clean the data, we first need to convert the Date column, which holds the month of each observation, from a character string to a date. Optionally, the data set can then be filtered to include only the last 10 years of data.

#Transform the Date column from a character string to a date

carlton_afl$Date <- ym(carlton_afl$Date)

#Filter the data to the last 10 years (optional)
carlton_afl <- carlton_afl %>% 
  filter(Date >= as.Date("2014-01-01"))

#Check to see if the string is a date string
str(carlton_afl)
## 'data.frame':    130 obs. of  2 variables:
##  $ Date          : Date, format: "2014-01-01" "2014-02-01" ...
##  $ CFC.popularity: int  5 10 17 24 18 18 14 14 11 9 ...
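
As a quick illustration of the conversion, the sketch below shows how ym() turns year-month strings into dates on the first day of each month. The input strings are made-up examples in "YYYY-MM" form, which is an assumption about how the months appear in the CSV.

```r
library(lubridate)

# Hypothetical year-month strings (assumed "YYYY-MM" format)
months_chr <- c("2014-01", "2014-02", "2014-03")

# ym() parses year-month text and returns Date values set to the first of each month
months_date <- ym(months_chr)

class(months_date)
months_date
```

If ym() cannot parse a value it returns NA with a warning, so checking for NA values after the conversion is a simple way to catch unexpected formats.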

Step 4: Create Visualisations for Time Series

There are multiple visualisations that you can create. For this analysis we will be looking at the time series plot, seasonal diagnostics and anomaly diagnostics. The time series plot shows the Carlton data over the years.

#Plot Time Series
carlton_afl %>% 
  plot_time_series(.date_var = Date,
                   .value = CFC.popularity)

Seasonal diagnostics are used to further understand patterns within the data set by month, quarter and year.

#Plot Seasonal Diagnostics
carlton_afl %>% 
  plot_seasonal_diagnostics(.date_var = Date,
                            .value = CFC.popularity)

Anomaly diagnostics are used to identify any anomalies within the data set so that possible reasons for them can be investigated. The .alpha value is set to 0.05.

#Plot Anomalies Diagnostics
carlton_afl %>% 
  plot_anomaly_diagnostics(.date_var = Date,
                           .value = CFC.popularity,
                           .alpha = 0.05) 
## frequency = 12 observations per 1 year
## trend = 60 observations per 5 years

Step 5: Splitting the Data

The data is split into a training set and a testing set. The models are fitted on the training data so that they can learn the patterns within it. Once the models are created, they are applied to the testing data to see how effective they are.

#Split the data
data_splitting <- initial_time_split(carlton_afl)

#tk_time_series_cv_plan to inspect the splits
data_splitting %>% 
  tk_time_series_cv_plan() %>% 
  plot_time_series_cv_plan(.date_var = Date,
                           .value = CFC.popularity)
#Creating Training and Testing Sets
train_afl <- training(data_splitting)
test_afl <- testing(data_splitting)
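
initial_time_split() is called above without a prop argument, so it uses its default of keeping the first 3/4 of the rows for training. The sketch below makes that proportion explicit on placeholder data (130 monthly rows with arbitrary values, not the Carlton series) and shows that the split is chronological.

```r
library(rsample)

# Placeholder data: 130 monthly rows with arbitrary values (not the real series)
demo_tbl <- data.frame(
  Date  = seq(as.Date("2014-01-01"), by = "month", length.out = 130),
  value = seq_len(130)
)

# prop = 0.75 is the default, written out here for clarity
demo_split <- initial_time_split(demo_tbl, prop = 0.75)
demo_train <- training(demo_split)
demo_test  <- testing(demo_split)

# The earliest rows go to training and the most recent rows go to testing
nrow(demo_train)
nrow(demo_test)
range(demo_train$Date)
range(demo_test$Date)
```

Raising prop keeps more history for training at the cost of a shorter testing window.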

Step 6: Creating Models

There are many models that we could make. For this analysis we will be creating an auto ARIMA model, an exponential smoothing model and a linear regression model.

#Creating Models
## a. Auto ARIMA 
ari_m <- arima_reg() %>% 
  set_engine("auto_arima") %>% 
  fit(CFC.popularity ~ Date, data = train_afl)
## frequency = 12 observations per 1 year
## b. Exponential smoothing 
et_m <- exp_smoothing() %>% 
  set_engine("ets") %>% 
  fit(CFC.popularity ~ Date, data = train_afl)
## frequency = 12 observations per 1 year
## c. Linear reg 
lm_m <- linear_reg() %>% 
  set_engine("lm") %>% 
  fit(CFC.popularity ~ Date, data = train_afl)

# Create a Table to put the Models made into
table_models <- modeltime_table(
  ari_m,
  et_m,
  lm_m
)

# Calibrate Model Table with Test Data
calibrate_tbl <- table_models %>% 
  modeltime_calibrate(new_data = test_afl)

Step 7: Make Forecasts

The final step is to create the forecasts. First, we forecast over the testing data created above to assess the accuracy of each model. We then refit the models on the full data set and create a forecast that predicts 3 years into the future.

#Create Forecasts
calibrate_tbl %>% 
  modeltime_forecast(
    actual_data = carlton_afl,
    new_data = test_afl
  ) %>% 
  plot_modeltime_forecast()
#Check Accuracy Results
results <- calibrate_tbl %>% 
  modeltime_accuracy() 
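
modeltime_accuracy() summarises how far each model's predictions on the testing data are from the actual values. Its default metrics include MAE, MAPE and RMSE. The sketch below computes those three by hand in base R on made-up numbers (not results from this analysis) to show what they measure.

```r
# Made-up actual and predicted values, for illustration only
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 37)

errors <- actual - predicted

# Mean absolute error: average size of the errors, in the units of the data
mae <- mean(abs(errors))

# Root mean squared error: like MAE, but penalises large errors more heavily
rmse <- sqrt(mean(errors^2))

# Mean absolute percentage error: average error as a percentage of the actual value
mape <- mean(abs(errors / actual)) * 100

c(mae = mae, rmse = rmse, mape = mape)
```

For all three metrics, lower values indicate predictions that sit closer to the actual data. In the analysis itself, printing the results object shows the corresponding table for the three models.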


#Refit and Forecast
refit_tbl <- calibrate_tbl %>% 
  modeltime_refit(data = carlton_afl)
## frequency = 12 observations per 1 year
## frequency = 12 observations per 1 year
#Forecast for 3 Years
refit_tbl %>% 
  modeltime_forecast(h = "3 years", actual_data = carlton_afl) %>% 
  plot_modeltime_forecast()
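
To make the forecast horizon concrete: for monthly data, 3 years corresponds to 36 future months. The base R sketch below lists those months, assuming the series ends in October 2024, which is what 130 monthly observations starting in January 2014 would imply. It only illustrates the horizon and is not how modeltime builds its future dates internally.

```r
# Assumed last observed month (130 monthly rows from 2014-01 would end at 2024-10)
last_obs <- as.Date("2024-10-01")

# Generate 37 monthly dates starting at last_obs, then drop the first (already observed)
future_dates <- seq(last_obs, by = "month", length.out = 37)[-1]

length(future_dates)
range(future_dates)
```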