Overview

We will perform a time series analysis of NBA search interest. The data was collected from Google Trends and covers the past 5 years up to the present day.

Load Required Libraries and Dataset

library(xgboost)
library(tidymodels)
library(modeltime)
library(tidyverse)
library(lubridate)
library(timetk)
# as_tibble() replaces the deprecated as.tibble()
nba <- read.csv("Data/NBA.csv") %>% as_tibble()

Change Date from a character vector to date format

nba <- nba %>%
  mutate(date = dmy(DMY)) %>%   # parse day-month-year strings into Date values
  select(date, NBA)             # keep the parsed date and the search interest only
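As a quick check of the conversion above, here is a minimal sketch on a few made-up strings (the values are illustrative and are not taken from NBA.csv). It assumes the DMY column holds day-month-year text; dmy() accepts common separators, and it is worth confirming that no value fails to parse and becomes NA.

```r
library(lubridate)

# Hypothetical day-month-year strings standing in for the DMY column
raw_dates <- c("05/01/2020", "12-01-2020", "19.01.2020")

# dmy() reads each string as day, then month, then year
parsed <- dmy(raw_dates)

class(parsed)                # the result is a Date vector
format(parsed, "%Y-%m-%d")   # show the dates in ISO order
anyNA(parsed)                # TRUE would mean some strings failed to parse
```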

Data Exploration

After cleaning and ensuring all data is in the correct format, we can now perform some data exploration.

Time Series Plot

nba %>% 
  plot_time_series(
    .date_var = date,
    .value = NBA,
    .title = 'NBA Time Series Plot'
  )
  • plot_time_series shows how search interest varies over the years we have collected. As seen in the plot above, there is a clear seasonal pattern, with the highs most likely corresponding to the finals series and the lows to the off-season, when no games are played.

Seasonal Pattern

nba %>% 
  plot_seasonal_diagnostics(
    .date_var = date,
    .value = NBA,
    .feature_set = c('month.lbl','year'),
    .title = 'NBA Seasonal Diagnostics'
  )
  • plot_seasonal_diagnostics gives us more detailed insight into seasonality. Here we have restricted it to monthly and yearly patterns.

  • Looking at the monthly diagnostics, the low months of July to September are the off-season, October to April is the regular season, and April to June is when the playoffs and the finals series take place.

  • Looking at the yearly diagnostics from 2020 to 2025, each year is quite consistent. The two exceptions are 2020 and 2021: the 2020 season was disrupted by COVID-19 and teams were moved into a bubble, and in the 2021 season the number of games was reduced, which changed the season schedule.

Anomalies

nba %>% 
  plot_anomaly_diagnostics(
    .date_var = date,
    .value = NBA,
    .title = 'NBA Anomalies'
  )
  • plot_anomaly_diagnostics helps us see whether any values are abnormally high or low. We can then research why an anomaly occurred and whether it affected our data positively or negatively.

  • As seen in the plot, with .alpha left at its default of 0.05 there are no anomalies to report.
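To see how .alpha influences what gets flagged, the sketch below counts anomalies at two thresholds using tk_anomaly_diagnostics, the data-returning counterpart of the plotting function. The series is synthetic (a yearly cycle with one injected spike) and the weekly spacing is an assumption about the Google Trends export, so the counts say nothing about the NBA data itself. As we understand it, raising .alpha narrows the "normal" band, so more points tend to be flagged.

```r
library(dplyr)
library(timetk)

# Synthetic weekly series (a stand-in for the NBA data) with one injected spike
set.seed(123)
demo <- tibble(
  date  = seq(as.Date("2020-01-05"), by = "week", length.out = 260),
  value = 50 + 30 * sin(2 * pi * seq_len(260) / 52) + rnorm(260, sd = 3)
)
demo$value[130] <- demo$value[130] + 80

# Helper (hypothetical name): number of points flagged at a given .alpha
count_anomalies <- function(alpha) {
  demo %>%
    tk_anomaly_diagnostics(date, value, .alpha = alpha) %>%
    summarise(n_anomalies = sum(anomaly == "Yes")) %>%
    pull(n_anomalies)
}

n_default <- count_anomalies(0.05)  # the default threshold
n_loose   <- count_anomalies(0.20)  # a narrower band, so usually more flags

c(default = n_default, loose = n_loose)
```

If the real data shows no anomalies at 0.05, rerunning with a larger .alpha is one way to check how close any points are to the boundary.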

Decomposed Series

nba %>% 
  plot_stl_diagnostics(
    .date_var = date,
    .value = NBA,
    .facet_scales = 'free',
    .feature_set = c('observed','trend')
  )
  • plot_stl_diagnostics, limited here to the observed and trend components, further confirms that the data set relies heavily on seasonality throughout each year.

Split Data to train and test forecasting models

The next step is to create train and test sets so that we can train our forecasting models and see how they perform on unseen data.

splits <- initial_time_split(nba)
train <- training(splits)
test <- testing(splits)
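For reference, initial_time_split does not shuffle or sort: it takes the first share of rows as the train set and the remainder as the test set, with prop = 3/4 as the default. The sketch below shows this on a made-up 100-row series (the data frame and its column names are illustrative), and it assumes the rows are already in date order.

```r
library(rsample)

# Hypothetical 100-row weekly series, already sorted by date
demo <- data.frame(
  date  = seq(as.Date("2021-01-03"), by = "week", length.out = 100),
  value = seq_len(100)
)

# prop = 3/4 is the default; it is written out here for clarity
demo_split <- initial_time_split(demo, prop = 3/4)
demo_train <- training(demo_split)
demo_test  <- testing(demo_split)

nrow(demo_train)                             # first 75 rows
nrow(demo_test)                              # last 25 rows
max(demo_train$date) < min(demo_test$date)   # train ends before test begins
```

Changing prop is the simplest way to lengthen or shorten the test window.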

Visualise splits

splits %>% 
  tk_time_series_cv_plan() %>% 
  plot_time_series_cv_plan(
    .date_var = date,
    .title = 'NBA Data Training & Testing Split',
    .value = NBA)
  • Visualising the splits shows that our train data runs from 2021 to the end of the 2024 season. The test data runs from the beginning of the 24/25 off-season to the present day.

Create forecasting models

Now that we have created our train and test data, we can build our forecasting models and visualise how accurate each one is on the test data.

Auto ARIMA

arima_fit <- arima_reg() %>% 
  set_engine("auto_arima") %>% 
  fit(NBA ~ date, data = train)
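The auto_arima engine chooses the ARIMA orders automatically. If we wanted to set the orders by hand instead, arima_reg exposes them as arguments and the "arima" engine fits exactly what is specified. The sketch below is only an illustration: the monthly series is synthetic, and the orders (1,0,1)(0,1,1)[12] are arbitrary assumptions rather than values tuned for the NBA data.

```r
library(tidymodels)
library(modeltime)

# Synthetic monthly series (illustrative only): yearly cycle plus AR(1) noise
set.seed(42)
n <- 96
noise <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 2))
demo <- data.frame(
  date  = seq(as.Date("2016-01-01"), by = "month", length.out = n),
  value = 50 + 20 * sin(2 * pi * seq_len(n) / 12) + noise
)

# Hand-specified orders; every number below is an assumption for illustration
manual_arima_fit <- arima_reg(
  seasonal_period          = 12,  # observations per seasonal cycle
  non_seasonal_ar          = 1,   # p
  non_seasonal_differences = 0,   # d
  non_seasonal_ma          = 1,   # q
  seasonal_ar              = 0,   # P
  seasonal_differences     = 1,   # D
  seasonal_ma              = 1    # Q
) %>%
  set_engine("arima") %>%
  fit(value ~ date, data = demo)

manual_arima_fit
```

A model fitted this way can be added to modeltime_table alongside the automatic ones and compared on the same test set.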

Boosted ARIMA

arima_boost_fit <- arima_boost() %>% 
  set_engine("auto_arima_xgboost") %>% 
  fit(NBA ~ date, data = train)

Exponential Smoothing

ets_fit <- exp_smoothing() %>% 
  set_engine("ets") %>% 
  fit(NBA ~ date, data = train)

Prophet Model

prophet_fit <- prophet_reg() %>% 
  set_engine("prophet") %>% 
  fit(NBA ~ date, data = train)

Add model to table

model_tbl <- modeltime_table(
  arima_fit,
  arima_boost_fit,
  ets_fit,
  prophet_fit)

Calibrate table

calibrate_tbl <- model_tbl %>% 
  modeltime_calibrate(new_data = test)

Forecast on test dataset

calibrate_tbl %>% 
  modeltime_forecast(
    actual_data = nba,
    new_data = test
  ) %>% 
  plot_modeltime_forecast()
  • The models fitted on the train data have now been applied to the test data. plot_modeltime_forecast produces an interactive plotly graph, and from it we can see that only the Prophet model captures the seasonality well.

Check Results

calibrate_tbl %>% 
  modeltime_accuracy()
## # A tibble: 4 × 9
##   .model_id .model_desc              .type   mae  mape  mase smape  rmse     rsq
##       <int> <chr>                    <chr> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1         1 ARIMA(2,0,3) WITH NON-Z… Test  14.2  117.   3.27  57.4 18.0  0.0821 
## 2         2 ARIMA(2,0,0) WITH NON-Z… Test  12.9   96.8  2.95  54.3 16.5  0.176  
## 3         3 ETS(M,A,N)               Test  28.5  113.   6.54 166.  34.3  0.00825
## 4         4 PROPHET                  Test   4.38  21.1  1.01  22.2  5.68 0.900
  • modeltime_accuracy generates a table of accuracy statistics (for example RMSE and R2) for each model. Prophet performed best, with an RMSE of 5.68 and an R2 of 0.90. The table also shows that ETS, auto ARIMA and the XGBoost-boosted ARIMA do not forecast this series well. Specifying an ARIMA model by hand could potentially produce better results, since it gives control over parameters such as the autoregressive and moving-average orders.
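To make the columns in the accuracy table more concrete, here is a small base R sketch of how MAE, RMSE and R2 can be computed from actual and predicted values. The numbers are made up, and we are assuming that the rsq column is the squared correlation between actual and predicted values, which is why it sits on a 0 to 1 scale.

```r
# Made-up actual and predicted values, purely for illustration
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 37)

errors <- actual - predicted

mae  <- mean(abs(errors))         # mean absolute error
rmse <- sqrt(mean(errors^2))      # root mean squared error
rsq  <- cor(actual, predicted)^2  # squared correlation, between 0 and 1

round(c(mae = mae, rmse = rmse, rsq = rsq), 3)
```

Lower MAE and RMSE are better, while an rsq closer to 1 means the predictions track the ups and downs of the actual series more closely.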

Forecast 5 years into the future

So far we have analysed the NBA data set, fitted models on the train data, and analysed how each model performed on the test data. The next step is to use these models to forecast into the future. This gives us a general idea of how the series may behave in the future, based on trends in the historical data.

Refit Models

refit_tbl <- calibrate_tbl %>% 
  modeltime_refit(data = nba)

Forecast +5 years

refit_tbl %>% 
  modeltime_forecast(h = '5 years', actual_data = nba) %>% 
  plot_modeltime_forecast()
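When h is supplied instead of new_data, the forecast dates are generated by extending the date column past its last observation. As far as we understand, this is the same idea as timetk's future_frame, so the sketch below uses it on a made-up weekly series to show which dates a '5 years' horizon would cover. The data and the weekly spacing are assumptions.

```r
library(dplyr)
library(timetk)

# Hypothetical weekly series standing in for the NBA data
demo <- tibble(
  date  = seq(as.Date("2020-01-05"), by = "week", length.out = 260),
  value = seq_len(260)
)

# Build only the future dates, continuing at the same weekly spacing
future_tbl <- demo %>%
  future_frame(.date_var = date, .length_out = "5 years")

range(future_tbl$date)  # first and last forecast dates
nrow(future_tbl)        # number of future periods generated
```

Inspecting the future frame first is a cheap way to confirm that the horizon and spacing are what we expect before plotting the forecast.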