Discussion 9

Author

Ryan Bean

Load Data:

library(fpp3)
library(fredr)
library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)
library(writexl)
library(urca)
library(ggtime)

remove(list=ls())

# Dataset 1
# Read the FRED API key from an environment variable rather than hard-coding it
fredr_set_key(Sys.getenv("FRED_API_KEY"))

op <- fredr(series_id = "OILPRICE",
              observation_start = as.Date("1980-01-01"),
              observation_end   = as.Date("2024-12-01")
              ) |>
  transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)

I. ETS/ARIMA:

# Train / test split

n_test <- 24   # last 24 months as test set
n_total <- nrow(op)

train <- op |> slice(1:(n_total - n_test))
test  <- op |> slice((n_total - n_test + 1):n_total)

# Fit models
fits <- train |>
  model(
    ETS   = ETS(value),
    ARIMA = ARIMA(value)
  )

report(fits)

# A tibble: 2 × 11
  .model  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE     MAE ar_roots ma_roots
  <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <list>   <list>  
1 ETS    6.77e-3  -1462. 2929. 2929. 2941.  13.3  39.6  0.0594 <NULL>   <NULL>  
2 ARIMA  1.05e+1   -979. 1969. 1969. 1988.  NA    NA   NA      <cpl>    <cpl>

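The ARIMA model has much lower AIC, AICc, and BIC than the ETS model, but these values cannot be compared across the two models. ETS and ARIMA are different model classes whose likelihoods are computed in different ways, and if the selected ARIMA includes differencing, its likelihood is also based on fewer observations. Information criteria are only useful for choosing among models within the same class, so I rely on test-set forecast accuracy to decide which model is actually better.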
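The differencing part of this point can be seen in a minimal base-R sketch. It uses a simulated AR(1) series `x` (hypothetical data, not the oil prices) and `stats::arima()` rather than fable. A model with one difference is fit to one fewer observation, so its log-likelihood, and therefore its AIC, is not on the same footing as a model fit to the levels. fpp3 makes the same point about ARIMA models with different orders of differencing. Comparing ETS with ARIMA adds a second problem: the two classes define their likelihoods differently.

```r
# Hypothetical stationary series; stands in for any monthly series
set.seed(1)
x <- arima.sim(model = list(ar = 0.8), n = 200) + 50

fit_levels <- arima(x, order = c(1, 0, 0))  # AR(1) with a mean, fit to the levels
fit_diff   <- arima(x, order = c(0, 1, 0))  # random walk, fit to first differences

# Number of observations each likelihood is actually based on
c(levels = fit_levels$nobs, differenced = fit_diff$nobs)

# Both AICs can be computed, but they describe different data
c(levels = AIC(fit_levels), differenced = AIC(fit_diff))
```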

# Residual diagnostics
# ETS
fits |> 
  select(ETS) |> 
  ggtime::gg_tsresiduals()

# ARIMA
fits |> 
  select(ARIMA) |> 
  ggtime::gg_tsresiduals()

For the ETS residual plot, the residuals are centered around zero and the variance looks fairly stable over time, which is a good sign. The histogram is reasonably symmetric, although there are a few outliers, and the ACF shows some remaining spikes at a few lags, suggesting the model may not have captured all of the serial dependence perfectly.

For the ARIMA residual plot, the residuals are also centered near zero, but their spread clearly increases over time, especially in the later years, which suggests heteroskedasticity (changing volatility). The ACF is mostly small, with a few notable spikes, so the model captures much of the serial dependence. However, ARIMA prediction intervals assume a constant residual variance, so the growing spread means those intervals are probably too narrow for the more volatile later part of the sample.
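Growing residual spread is what one would expect if shocks are roughly proportional to the price level. The base-R sketch below illustrates this on a simulated geometric random walk (hypothetical data): it compares second-half to first-half residual variance for a random-walk model on the raw scale and on the log scale. If the oil residuals behave similarly, modelling `log(value)` (for example `ARIMA(log(value))` in fable) is one common way to stabilize the variance, an option the analysis above does not explore.

```r
# Simulated geometric random walk: the log price gets a constant-variance shock
# each month, so shocks to the price itself scale with the price level.
set.seed(7)
n <- 300
log_price <- log(20) + cumsum(rnorm(n, mean = 0.01, sd = 0.05))
price <- exp(log_price)

# Residuals of a random-walk (ARIMA(0,1,0)) model are essentially first differences
res_level <- diff(price)
res_log   <- diff(log_price)

# Ratio of residual variance in the second half to the first half
half_ratio <- function(r) {
  k <- length(r) %/% 2
  var(r[(k + 1):length(r)]) / var(r[1:k])
}

# A ratio far above 1 signals growing spread; near 1 signals stable variance
c(levels = half_ratio(res_level), logs = half_ratio(res_log))
```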

# Forecast test
fc <- fits |>
  forecast(h = nrow(test))

# Accuracy on training + test

accuracy_table <- bind_rows(
  fits |> accuracy(),
  fc   |> accuracy(op)
) |>
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)

accuracy_table |>
  knitr::kable(digits = 3, caption = "ETS vs ARIMA accuracy") |>
  kableExtra::kable_styling(full_width = FALSE)

ETS vs ARIMA accuracy
.model	.type	RMSE	MAE	MAPE	MASE	RMSSE
ETS	Training	3.640	2.133	5.960	0.242	0.239
ARIMA	Training	3.221	2.095	5.979	0.238	0.211
ARIMA	Test	8.039	6.768	7.040	0.768	0.528
ETS	Test	7.112	5.909	6.492	0.671	0.467

On the training set, ARIMA performs slightly better, with lower RMSE, MAE, MASE, and RMSSE, while MAPE is nearly identical between the two models (5.960 for ETS vs 5.979 for ARIMA). On the test set, however, ETS is more accurate on every reported measure: RMSE, MAE, MAPE, MASE, and RMSSE. So although ARIMA fit the training data a bit better, ETS generalized better to unseen data and was the stronger forecasting model on this holdout. Because the comparison rests on a single 24-month test window, the ranking could change with a different forecast origin.
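Because the comparison rests on one 24-month window, a rolling-origin (time-series cross-validation) check is a natural robustness test. The sketch below assumes the fpp3 packages are installed and uses a simulated monthly series `sim` as a hypothetical stand-in for `op`. The fold sizes (`.init = 120`, `.step = 24`) and the 12-month horizon are illustrative choices, not values from the analysis.

```r
library(fpp3)

# Hypothetical stand-in: 204 months of a positive, random-walk-like "price"
set.seed(42)
sim <- tsibble(
  Month = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 204)),
  value = 50 * exp(cumsum(rnorm(204, mean = 0, sd = 0.05))),
  index = Month
)

# Expanding training windows of 120, 144, 168 and 192 months; the last fold's
# 12-month forecast ends exactly at month 204, so every forecast can be scored.
folds <- sim |>
  stretch_tsibble(.init = 120, .step = 24)

cv_fc <- folds |>
  model(
    ETS   = ETS(value),
    ARIMA = ARIMA(value)
  ) |>
  forecast(h = 12)

# Errors are pooled across folds, giving one row per model
cv_acc <- cv_fc |>
  accuracy(sim) |>
  select(.model, RMSE, MAE, MAPE)

cv_acc
```

Swapping `sim` for `op` would apply the same check to the oil data; ARIMA is re-selected for every fold, so this can be slow on long series.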

# Plot forecasts against actuals

fc |>
  autoplot(op) +
  autolayer(train, value, colour = "black") +
  labs(
    title = "ETS and ARIMA forecasts for monthly oil prices",
    x = "Month",
    y = "Oil Price"
  )

The forecast plot shows that both models project oil prices staying around the recent level in the short run, but the ETS prediction intervals widen much more quickly, indicating greater uncertainty. The ARIMA intervals are narrower, while ETS allows for much larger upside and downside movement, which is more consistent with how volatile and hard to predict oil prices are. Combined with the test-set accuracy results, this suggests that even though ETS produces wider, less certain future paths, it still handled the holdout sample better than ARIMA, making it the stronger forecasting model in this application.
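Since the two models differ mainly in interval width, interval and distribution scores can make that comparison explicit. The Winkler score rewards narrow intervals but penalizes observations that fall outside them, and CRPS scores the whole forecast distribution; lower is better for both. A sketch using fabletools' `winkler_score` and `CRPS` measures, again on a hypothetical simulated series `sim` with the last 24 months held out and an 80% interval level chosen for illustration:

```r
library(fpp3)

# Hypothetical stand-in series (same construction as a simulated monthly price)
set.seed(42)
sim <- tsibble(
  Month = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 204)),
  value = 50 * exp(cumsum(rnorm(204, mean = 0, sd = 0.05))),
  index = Month
)
sim_train <- sim |> slice(1:180)   # hold out the last 24 months

fc_sim <- sim_train |>
  model(ETS = ETS(value), ARIMA = ARIMA(value)) |>
  forecast(h = 24)

# Winkler score for 80% intervals: width plus a penalty for each miss
winkler <- fc_sim |>
  accuracy(sim, list(winkler = winkler_score), level = 80) |>
  select(.model, winkler)

# CRPS evaluates the full forecast distribution
crps <- fc_sim |>
  accuracy(sim, list(crps = CRPS)) |>
  select(.model, crps)

interval_scores <- left_join(winkler, crps, by = ".model")
interval_scores
```

Applied to `fc` and `op`, the lower score would indicate which model better balanced interval width against coverage on the holdout.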

II. Dynamic Models:

# Dynamic regression on train data
fit_dyn <- train |>
  model(
    DYNREG = ARIMA(value ~ trend() + season())
  )

report(fit_dyn)

Series: value 
Model: LM w/ ARIMA(1,0,5) errors 

Coefficients:
         ar1     ma1     ma2     ma3     ma4     ma5  trend()  season()year2
      0.9256  0.4438  0.3740  0.2661  0.2353  0.1721   0.1725        -0.4144
s.e.  0.0248  0.0560  0.0664  0.0742  0.0746  0.0684   0.0239         0.5720
      season()year3  season()year4  season()year5  season()year6  season()year7
             0.8714         1.9895         2.2159         2.5806         2.7724
s.e.         0.9219         1.2021         1.4098         1.5484         1.6074
      season()year8  season()year9  season()year10  season()year11
             2.9996         2.5708          2.0513          1.0496
s.e.         1.5525         1.4162          1.2096          0.9292
      season()year12
             -0.3569
s.e.          0.5779

sigma^2 estimated as 10.54:  log likelihood=-976.83
AIC=1991.65   AICc=1993.77   BIC=2066.47

The dynamic regression selected a linear model with ARIMA(1,0,5) errors: oil prices are modeled as a deterministic trend plus monthly seasonal dummies, with the remaining autocorrelation captured by a fairly rich ARIMA error structure. The positive trend() coefficient (0.1725, about 7 standard errors from zero) suggests oil prices were generally rising over the training period. The seasonal point estimates climb to a peak around July to September (season()year7 to season()year9) relative to January, the baseline month, but none of them is more than about 1.9 standard errors from zero, so the evidence for a systematic seasonal pattern is weak. The large AR coefficient (ar1 = 0.9256) shows strong persistence in the regression errors, and the five MA terms suggest short-run shocks continued to affect the series for several months. Overall, trend and seasonality alone were not enough, so the ARIMA error process was important for capturing the remaining time-series dependence.
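The significance of each regression term can be checked directly from the printed report. This base-R sketch copies the estimates and standard errors shown above and computes their ratios; with the fitted object, fabletools' `tidy(fit_dyn)` should also report a `statistic` column for the same terms.

```r
# Estimates and standard errors copied from the DYNREG report
# (yearK = month K relative to January)
est <- c(trend = 0.1725,
         year2 = -0.4144, year3 = 0.8714, year4 = 1.9895, year5 = 2.2159,
         year6 = 2.5806, year7 = 2.7724, year8 = 2.9996, year9 = 2.5708,
         year10 = 2.0513, year11 = 1.0496, year12 = -0.3569)
se  <- c(trend = 0.0239,
         year2 = 0.5720, year3 = 0.9219, year4 = 1.2021, year5 = 1.4098,
         year6 = 1.5484, year7 = 1.6074, year8 = 1.5525, year9 = 1.4162,
         year10 = 1.2096, year11 = 0.9292, year12 = 0.5779)

t_ratio <- est / se
round(t_ratio, 2)

# Terms more than about two standard errors from zero
names(t_ratio)[abs(t_ratio) > 1.96]
```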

# Innovation residual diagnostics
fit_dyn |>
  ggtime::gg_tsresiduals()

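# Ljung-Box test on the innovation residuals; this run used dof = 0, although the
# usual convention for ARIMA(1,0,5) errors is dof = p + q = 6
augment(fit_dyn) |>
  features(.innov, ljung_box, lag = 24, dof = 0)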

# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 DYNREG    28.5     0.239

The residual diagnostics for the dynamic regression model are fairly solid overall: the residuals are centered near zero, and the Ljung–Box test has a p-value of 0.239, so we do not reject the null of no remaining autocorrelation. One caveat is that the test was run with dof = 0, which ignores the six estimated ARMA parameters. With dof = 6, the same statistic (28.5) is compared with a chi-squared distribution on 18 degrees of freedom, giving a p-value just above 0.05, so the case for white-noise residuals is weaker than it first appears, though still not a rejection at the 5% level. Overall, the ARIMA error structure did a reasonable job accounting for the serial dependence left after fitting the trend and seasonal terms. That said, the residual plot still shows increasing volatility over time, especially in the later years, and the histogram has heavy tails and outliers. So while the residuals look acceptable from an autocorrelation standpoint, the model still struggles with the changing variance and occasional large shocks in oil prices.
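The effect of the `dof` setting can be checked by hand: the Ljung–Box statistic is compared with a chi-squared distribution on `lag - dof` degrees of freedom. This base-R sketch reuses the reported statistic (28.5 at lag 24); in the original pipeline the equivalent change would be `features(.innov, ljung_box, lag = 24, dof = 6)`.

```r
lb_stat <- 28.5    # Ljung-Box statistic from the DYNREG output (rounded)
lb_lag  <- 24      # number of autocorrelations tested
n_arma  <- 1 + 5   # p + q for ARIMA(1,0,5) errors

# Upper-tail chi-squared probabilities under the two degree-of-freedom choices
p_dof0 <- pchisq(lb_stat, df = lb_lag, lower.tail = FALSE)           # as run
p_dof6 <- pchisq(lb_stat, df = lb_lag - n_arma, lower.tail = FALSE)  # dof = p + q

round(c(dof0 = p_dof0, dof6 = p_dof6), 3)
```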

# Forecast over the test set

fc_dyn <- fit_dyn |>
  forecast(new_data = test)

# Accuracy on train and test

dyn_accuracy <- bind_rows(
  fit_dyn |> accuracy(),
  fc_dyn   |> accuracy(op)
) |>
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)

dyn_accuracy |>
  knitr::kable(digits = 3, caption = "Dynamic regression accuracy") |>
  kableExtra::kable_styling(full_width = FALSE)

Dynamic regression accuracy
.model	.type	RMSE	MAE	MAPE	MASE	RMSSE
DYNREG	Training	3.168	2.096	6.112	0.238	0.208
DYNREG	Test	15.314	13.952	14.535	1.584	1.005

Across all three models, ETS is preferred because it had the best test-set accuracy, which matters most for forecasting. On the holdout sample, ETS beat ARIMA on every metric (RMSE 7.112 vs 8.039, MAE 5.909 vs 6.768, MAPE 6.492 vs 7.040), and it was far better than the dynamic regression model, whose test errors were roughly twice as large (RMSE 15.314, MAPE 14.535). So even though ARIMA and dynamic regression had slightly lower training RMSE and MAE than ETS, ETS generalized best to unseen data, making it the strongest forecasting model of the three on this holdout.
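The relative size of the test-set gaps is easier to see as percentages. This base-R sketch copies the test rows from the two accuracy tables and computes how much lower the ETS test RMSE is than each model's (about 11.5% below ARIMA and about 53.6% below the dynamic regression).

```r
# Test-set rows copied from the two accuracy tables
test_acc <- data.frame(
  model = c("ETS", "ARIMA", "DYNREG"),
  RMSE  = c(7.112, 8.039, 15.314),
  MAE   = c(5.909, 6.768, 13.952),
  MAPE  = c(6.492, 7.040, 14.535)
)

# Rank models by test RMSE (best first)
test_acc <- test_acc[order(test_acc$RMSE), ]

# Percentage by which the ETS test RMSE is lower than each model's
ets_rmse <- test_acc$RMSE[test_acc$model == "ETS"]
test_acc$ETS_RMSE_reduction_pct <- round(100 * (test_acc$RMSE - ets_rmse) / test_acc$RMSE, 1)

test_acc
```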

# Plot forecasts against actual data

fc_dyn |>
  autoplot(op) +
  autolayer(train, value, colour = "black") +
  labs(
    title = "Dynamic regression forecast for monthly oil prices",
    x = "Month",
    y = "Oil Price"
  )

These results can be reconciled by noting that the “external regressors” in the dynamic regression were just a deterministic trend and month dummies, not genuinely informative outside predictors such as supply, demand, inflation, or macroeconomic shocks. In other words, the model gained no new information about future oil prices; it only imposed a linear trend and a fixed seasonal pattern on a series that is volatile, subject to structural breaks, and probably not strongly seasonal, so the added regression terms may have hurt out-of-sample performance rather than helped. Because the regression errors are stationary (ARIMA(1,0,5), with no differencing), the forecasts also revert toward the fitted trend-plus-season line instead of staying near the most recent price, which can produce large errors when prices have drifted away from that line. External regressors do not automatically improve a basic ARIMA model; they only help if they contain real predictive signal beyond what ARIMA already captures. Here, the plain ARIMA model was already capturing persistence well, and the dynamic regression likely overfit the training data by combining eleven seasonal coefficients with a complex ARIMA(1,0,5) error structure, which helps explain why its training fit looked decent but its test-set forecast accuracy was much worse.
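How quickly the dynamic-regression forecasts fall back to the trend-plus-season line can be worked out from the reported `ar1`. For ARIMA(1,0,5) errors, once the horizon exceeds the five MA lags, each step's error forecast is just `ar1` times the previous one. The base-R sketch below computes the share of the horizon-5 error forecast that remains at later horizons: roughly 58% at 12 months and about 23% at 24 months, so by the end of the test window the forecasts are dominated by the deterministic trend and seasonal terms.

```r
phi <- 0.9256   # ar1 estimate from the DYNREG report
h   <- 6:24     # horizons after the five MA terms have dropped out

# For h > 5 the ARMA(1,5) error forecast satisfies e[h] = phi * e[h - 1],
# so the share of the horizon-5 error forecast still remaining is phi^(h - 5).
remaining <- phi^(h - 5)

round(remaining[h %in% c(12, 24)], 2)
```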