BANA 7050: Time Series Model Selection and Validation

Author

Saurabh Verma

Load libraries

Code
suppressWarnings({
  suppressMessages({
    # Data manipulation and plotting
    library(dplyr)
    library(ggplot2)
    library(lubridate)
    library(gridExtra)
    # Time series modelling and forecasting
    library(forecast)
    library(modeltime)
    library(fabletools)
    library(feasts)
    library(tsibble)
    library(tibble)
    library(fable)
    library(data.table)
    library(knitr)

    # Suppress warnings for the rest of the session
    options(warn = -1)
  })
})

Read data and convert to time series format

Code
# Read the raw data
milk_df <- read.csv('/Users/saurabhverma/Desktop/UC BANA/Spring 2024 Flex I/Forecasting/milk_price.csv')

# Keep the date and price columns, copy the price into a `value` column
# used by the models below, and convert to a monthly tsibble indexed by date
milk_price_tbl_ts <- milk_df %>%
  select(date, milk_price) %>%
  mutate(value = milk_price) %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)
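
As a quick check before splitting, the short sketch below (assuming the CSV contains one observation per month) confirms that the monthly index of the tsibble has no gaps:

Code
# An empty result from count_gaps() means every month is present
milk_price_tbl_ts %>%
  count_gaps()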

Section 1 - Train and Test Splits

First, split your time series into a training and a test set, such that you are training on approximately 80% of the data and testing on approximately 20% of the data. Visualize the training and test sets in a single plot.

Code
# Set the seed for reproducibility (not strictly required here, since the
# split below is chronological rather than random)
set.seed(123)

# Determine the number of rows for training (80%) and testing (20%)
n_rows <- nrow(milk_price_tbl_ts)
train_rows <- round(0.8 * n_rows)

# Split the data into training and testing sets
train_data <- milk_price_tbl_ts %>% slice(1:train_rows)
test_data <- milk_price_tbl_ts %>% slice((train_rows + 1):n_rows)

# Visualize the training and test sets
ggplot() +
  geom_line(data = train_data, aes(x = date, y = value, color = "Training"), linewidth = 1) +
  geom_line(data = test_data, aes(x = date, y = value, color = "Test"), linewidth = 1) +
  scale_color_manual(values = c("Training" = "blue", "Test" = "red")) +
  labs(title = "Training and Test Sets", x = "Date", y = "Milk Price") +
  theme_minimal()

Does it appear that the test set is representative of the training set?
The test set does not appear to be fully representative of the training set: the training set shows considerable variation around a gradual upward trend, whereas the test set shows little variation around a much steeper upward climb.
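
To back up this impression, the sketch below (using the train_data and test_data objects created above) compares the level and spread of the two sets:

Code
# Summary statistics for the training and test sets
bind_rows(
  train_data %>% as_tibble() %>%
    summarise(set = "train", mean = mean(value), sd = sd(value),
              min = min(value), max = max(value)),
  test_data %>% as_tibble() %>%
    summarise(set = "test", mean = mean(value), sd = sd(value),
              min = min(value), max = max(value))
)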

Section 2 - Cross-Validation Scheme

Next, set up a rolling window cross-validation scheme using stretch_tsibble. Be sure to make appropriate choices on the initial training period and the interval at which you will step through the data considering the length of your time series.

Code
# Rolling-origin CV: start with an initial window of 7 years (84 months)
# and grow the training window by 12 months at each step
milk_train_cv <- train_data %>%
  stretch_tsibble(.init = 7*12, .step = 12)

# Show which observations fall into each CV iteration
milk_train_cv %>%
  ggplot() +
  geom_point(aes(date, factor(.id), color = factor(.id))) +
  ylab('Iteration') +
  ggtitle('Samples included in each CV Iteration')
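
The resulting folds can also be inspected directly; this is a small sketch based on milk_train_cv, where the initial window is 84 months and each subsequent fold adds 12 months:

Code
# One row per CV fold: number of observations and last month in the window
milk_train_cv %>%
  as_tibble() %>%
  group_by(.id) %>%
  summarise(n_obs = n(), last_month = max(date))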

Section 3 - Model Selection and Comparison

Fit the selected best ARIMA from Assignment 3 and a naive model to each fold in the cross-validation scheme you created. Then, produce a forecast of each model for each fold. Visualize the actual versus predicted for each cross-validation iteration. Does it seem like one model is likely to outperform the other?

Code
# Fit the ARIMA(2,1,2) selected in Assignment 3 and a naive model with drift
# to each CV fold, then forecast 6 months ahead of each fold
milk_train_cv_forecast = milk_train_cv %>%
  model(
    arima = ARIMA(value ~ pdq(2,1,2)),
    naive = NAIVE(value ~ drift())
  ) %>%
  forecast(h = 6)

# Actual versus predicted for each CV iteration
milk_train_cv_forecast %>%
  autoplot(milk_train_cv) +
  facet_wrap(~.id, nrow = 6) +
  theme_bw() +
  ylab('Milk Price')

Code
# Overlay the forecasts from every CV iteration on the training series
# (uses the milk_train_cv_forecast object computed above)
milk_train_cv_forecast %>%
  as_tibble() %>%
  select(-value) %>%
  left_join(train_data, by = "date") %>%
  ggplot() +
  geom_line(aes(date, value)) +
  geom_line(aes(date, .mean, color = factor(.id), linetype = .model)) +
  scale_color_discrete(name = 'Iteration') +
  ylab('Milk Prices') +
  theme_bw()

Code
# Average RMSE by forecast horizon (h = months ahead) across CV folds
milk_train_cv_forecast %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "value", distribution = value) %>%
  accuracy(train_data, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE, color = .model)) +
  geom_point() +
  geom_line() +
  ylab('Average RMSE at Forecasting Intervals') +
  xlab('Months in the Future')

Code
# Average MAPE by forecast horizon across CV folds
milk_train_cv_forecast %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "value", distribution = value) %>%
  accuracy(train_data, by = c("h", ".model")) %>%
  mutate(MAPE = MAPE/100) %>% # Rescale to a proportion for percent labels
  ggplot(aes(x = h, y = MAPE, color = .model)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_y_continuous(
    name = 'Average MAPE at Forecasting Intervals', labels = scales::percent)

The training set contains roughly 23 years of monthly milk prices, comprising 273 data points. In some cross-validation folds the naive model outperforms the ARIMA model, while in a few others the ARIMA model comes out ahead. Overall, however, the naive model performs better across most forecast horizons.
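
The figures quoted above can be verified from the training set itself (a quick sketch using train_data):

Code
# Number of monthly observations and the span of the training set
train_data %>%
  as_tibble() %>%
  summarise(n_obs = n(), first_month = min(date), last_month = max(date))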

Compare the performance of the two models using the following metrics: RMSE, MAE, MAPE, and MASE. Which model performs better? Does the ARIMA model outperform the naive model? Does this match your intuition from the visualizations? Be sure to include visualizations and tables as appropriate, including an assessment of how the models perform at different forecasting horizons.

Code
# Per-fold test accuracy for both models
milk_train_cv_forecast %>%
  filter(.id < 20) %>%
  group_by(.id) %>%
  accuracy(train_data) %>%
  ungroup() %>%
  arrange(.id) %>%
  data.table()
    .model .id .type           ME       RMSE        MAE          MPE       MAPE
 1:  arima   1  Test -0.078458798 0.08376195 0.07845880  -2.89156532  2.8915653
 2:  naive   1  Test -0.059771084 0.06466528 0.05977108  -2.20365970  2.2036597
 3:  arima   2  Test  0.158258170 0.19215019 0.16383649   5.41582677  5.6250661
 4:  naive   2  Test  0.161168421 0.19513992 0.16589825   5.51679657  5.6942093
 5:  arima   3  Test -0.287296241 0.30447088 0.28729624  -8.93197215  8.9319722
 6:  naive   3  Test -0.353549844 0.37427532 0.35354984 -10.99261349 10.9926135
 7:  arima   4  Test  0.021482203 0.03946314 0.02949580   0.66384503  0.9216068
 8:  naive   4  Test  0.022696078 0.04742431 0.03692297   0.69656327  1.1562016
 9:  arima   5  Test -0.025847872 0.05805588 0.05036810  -0.86931481  1.6646514
10:  naive   5  Test  0.019000000 0.04363867 0.03800000   0.61179971  1.2468324
11:  arima   6  Test  0.495238093 0.50654256 0.49523809  12.88639398 12.8863940
12:  naive   6  Test  0.382414918 0.38480263 0.38241492   9.96433390  9.9643339
13:  arima   7  Test -0.001758505 0.10839390 0.09438317  -0.12808230  2.4825558
14:  naive   7  Test -0.020431183 0.12352038 0.11240323  -0.63223192  2.9698090
15:  arima   8  Test  0.049534897 0.06511985 0.04953490   1.61881991  1.6188199
16:  naive   8  Test  0.002683633 0.04025557 0.03405489   0.06935580  1.1211726
17:  arima   9  Test -0.014087847 0.02260801 0.01881707  -0.42785562  0.5701622
18:  naive   9  Test -0.003033520 0.01492828 0.01109683  -0.09354085  0.3364870
19:  arima  10  Test  0.027564965 0.06979353 0.06056414   0.72739646  1.6541790
20:  naive  10  Test -0.005315009 0.07162921 0.06532548  -0.18016808  1.7998011
21:  arima  11  Test  0.114807483 0.12450974 0.11480748   3.25960816  3.2596082
22:  naive  11  Test  0.089988506 0.09957196 0.08998851   2.55289048  2.5528905
23:  arima  12  Test  0.003386139 0.02560759 0.02179512   0.09241548  0.6280234
24:  naive  12  Test -0.010803101 0.02239549 0.01940620  -0.31599919  0.5620097
25:  arima  13  Test  0.159681398 0.17535117 0.15968140   4.22236822  4.2223682
26:  naive  13  Test  0.105284141 0.12492569 0.10528414   2.77344062  2.7734406
27:  arima  14  Test -0.005987456 0.04984171 0.04441061  -0.19885316  1.3235353
28:  naive  14  Test -0.019352162 0.05685577 0.04957950  -0.59853547  1.4832441
29:  arima  15  Test  0.122947079 0.14919078 0.13334299   3.75404044  4.0935542
30:  naive  15  Test  0.091770252 0.12341212 0.11028420   2.78546360  3.3900993
31:  arima  16  Test -0.034069725 0.04385521 0.03776563  -1.08003069  1.1948461
32:  naive  16  Test -0.046294677 0.05571817 0.04736185  -1.46579413  1.4989465
    .model .id .type           ME       RMSE        MAE          MPE       MAPE
          MASE      RMSSE        ACF1
 1: 0.67823670 0.59801899  0.20791034
 2: 0.51669085 0.46167817  0.14864643
 3: 1.34831637 1.33822212  0.41042915
 4: 1.36528389 1.35904396  0.41313140
 5: 2.05762751 1.63689586  0.37227207
 6: 2.53213854 2.01217840  0.29093540
 7: 0.17658238 0.17583285  0.25814846
 8: 0.22104660 0.21130482  0.41072894
 9: 0.31453851 0.26683001  0.15925297
10: 0.23730225 0.20056722  0.04881210
11: 3.11515027 2.35947708  0.43082908
12: 2.40546911 1.79241203  0.22679615
13: 0.45936313 0.36745296  0.40687819
14: 0.54706677 0.41873141  0.43364418
15: 0.22606764 0.20933732  0.21038801
16: 0.15541990 0.12940743  0.22814851
17: 0.07959484 0.06738560  0.34258301
18: 0.04693880 0.04449533 -0.15878610
19: 0.25216961 0.20895947  0.55208433
20: 0.27199432 0.21445544  0.56991851
21: 0.47690523 0.37662607  0.37135866
22: 0.37380829 0.30119249  0.34177108
23: 0.09389488 0.07947593  0.47589113
24: 0.08360326 0.06950684  0.38644376
25: 0.70833964 0.55658619  0.50265170
26: 0.46703580 0.39652951  0.53126162
27: 0.19553943 0.15933010  0.45490955
28: 0.21829800 0.18175209  0.50350556
29: 0.57322032 0.47347267  0.45352884
30: 0.47409424 0.39166136  0.46890580
31: 0.16642952 0.14179931 -0.02705124
32: 0.20871914 0.18015645  0.07841145
          MASE      RMSSE        ACF1
Code
# Overall accuracy averaged across all CV folds
milk_train_cv_forecast %>%
  accuracy(train_data) %>%
  data.table()
   .model .type         ME      RMSE       MAE       MPE     MAPE      MASE
1:  arima  Test 0.04408712 0.1755522 0.1149873 1.1320650 3.373057 0.5067378
2:  naive  Test 0.02227846 0.1589794 0.1050839 0.5305063 3.109109 0.4630945
       RMSSE      ACF1
1: 0.5676221 0.8262086
2: 0.5140364 0.8251922

The plots and the accuracy tables point the same way: averaged over all folds, the naive model has lower RMSE (0.159 vs 0.176), MAE (0.105 vs 0.115), MAPE (3.11% vs 3.37%), and MASE (0.463 vs 0.507) than the ARIMA model. The ARIMA model wins in some individual folds and at some horizons, but overall the naive model performs better, which matches the impression from the visualizations.
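
Another way to summarise the fold-by-fold comparison is to tally which model achieves the lower RMSE in each fold; this is a sketch built on the same per-fold accuracy calculation used above:

Code
# For each CV fold, keep the model with the lower RMSE and count the wins
milk_train_cv_forecast %>%
  group_by(.id) %>%
  accuracy(train_data) %>%
  ungroup() %>%
  group_by(.id) %>%
  slice_min(RMSE, n = 1) %>%
  ungroup() %>%
  count(.model, name = "folds_with_lower_RMSE")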

Section 4

After identifying the model that performed the best in Section 3, refit that model to the entire training set and produce a forecast for the test set. Visualize the actual versus predicted for the test set, and recalculate your performance metrics on the test set for this selected model.

Code
# Refit the best-performing model (naive with drift) to the full training set
Milk_model = train_data %>%
  model(
    naive = NAIVE(value ~ drift()))

# Forecast over the 68-month test period and plot against the full series
Milk_model %>%
  forecast(h = 68) %>%
  autoplot(train_data %>%
    bind_rows(test_data)) +
  ylab('Milk Price Value') +
  theme_bw()

Code
# Test-set accuracy of the refitted naive model
Milk_model %>%
  forecast(h = 68) %>%
  accuracy(train_data %>% bind_rows(test_data)) %>%
  dplyr::select(.model, RMSE, MAE, MAPE, MASE) %>%
  kable()
.model RMSE MAE MAPE MASE
naive 0.6954863 0.5539131 14.50824 2.453189

How does the model perform on the test set?

The model performs poorly on the test set: MAPE is about 14.5% and MASE is about 2.45, far worse than the cross-validation errors. The drift forecast captures the general upward direction of the test set but fails to replicate its month-to-month variability.
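
To quantify this, the sketch below (assuming the same 68-month horizon as above) compares the spread of the drift forecast with the spread of the test-set actuals:

Code
# Join the forecast means to the test-set actuals and compare their variability
Milk_model %>%
  forecast(h = 68) %>%
  as_tibble() %>%
  select(date, .mean) %>%
  left_join(test_data %>% as_tibble(), by = "date") %>%
  summarise(
    mean_error = mean(value - .mean, na.rm = TRUE),
    sd_actual = sd(value, na.rm = TRUE),
    sd_forecast = sd(.mean, na.rm = TRUE)
  )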

Does it seem like the model is overfitting or underfitting? Be sure to include visualizations and tables as appropriate, as well as a text description of the performance of the model using appropriate metrics.

Code
# Produce forecasts over the training-set dates to inspect in-sample bias
train_forecast <- Milk_model %>%
  forecast(new_data = train_data)

# Visualize actual versus forecasted values for the training set
autoplot(train_forecast) +
  autolayer(train_data) +
  labs(title = "Actual vs Forecasted Values for Training Set", y = "Milk Prices", x = "Date") +
  theme_minimal()
Plot variable not specified, automatically selected `.vars = milk_price`

To assess whether the model is underfitting or overfitting, I compared how the model fits the training data with how it performs on the test set. Plotting actual versus forecasted values over the training period shows high bias: the simple drift line cannot reproduce the swings in the training series. The test-set errors are also large (MAPE of about 14.5% and MASE of about 2.45), while the forecasts themselves vary little. This combination of high bias and low variance indicates that the model is underfitting rather than overfitting.
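
A compact way to check this is to put the in-sample (training) accuracy of the fitted model next to its test-set accuracy; this sketch uses fabletools' accuracy() on the fitted mable for the training metrics:

Code
# In-sample (training) accuracy of the fitted naive model
train_acc <- Milk_model %>%
  accuracy() %>%
  dplyr::select(.model, .type, RMSE, MAE, MAPE, MASE)

# Out-of-sample (test) accuracy over the 68-month horizon
test_acc <- Milk_model %>%
  forecast(h = 68) %>%
  accuracy(train_data %>% bind_rows(test_data)) %>%
  dplyr::select(.model, .type, RMSE, MAE, MAPE, MASE)

bind_rows(train_acc, test_acc)

If both sets of errors are high relative to the scale of the series, that is consistent with underfitting; only a large gap between a low training error and a high test error would point to overfitting.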