Discussion 9
Load Data:
I. ETS/ARIMA:
# Fit models
fits <- train |>
  model(
    ETS = ETS(value),
    ARIMA = ARIMA(value)
  )
report(fits)

# A tibble: 2 × 11
.model sigma2 log_lik AIC AICc BIC MSE AMSE MAE ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 ETS 6.77e-3 -1462. 2929. 2929. 2941. 13.3 39.6 0.0594 <NULL> <NULL>
2 ARIMA 1.05e+1 -979. 1969. 1969. 1988. NA NA NA <cpl> <cpl>
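The ARIMA row shows much lower AIC, AICc, and BIC than the ETS row, but these criteria are not comparable across the two model classes: the likelihoods are computed differently, and any differencing in the ARIMA changes the data the likelihood is based on. The sigma2 values are on different scales as well (6.77e-3 for ETS versus 10.5 for ARIMA), which suggests the selected ETS uses multiplicative errors. This table therefore cannot rank the two models, so I rely on residual diagnostics and test-set forecast accuracy to decide which model is actually better.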
# Residual diagnostics
# ETS
fits |>
  select(ETS) |>
  ggtime::gg_tsresiduals()

# ARIMA
fits |>
  select(ARIMA) |>
  ggtime::gg_tsresiduals()

For the ETS residual plot, the residuals are centered around zero and the variance looks fairly stable over time, which is a good sign. The histogram is reasonably symmetric, although there are a few outliers, and the ACF shows some remaining spikes at a few lags, suggesting the model may not have captured all of the serial dependence perfectly.
For the ARIMA residual plot, the residuals are also centered near zero, but the spread clearly increases over time, especially in the later years, which suggests heteroskedasticity or changing volatility. The ACF is mostly small, but there are still a few notable spikes, so while the model captures much of the dependence, the residual variance pattern indicates the fit is less stable over the full sample.
# Forecast test
fc <- fits |>
  forecast(h = nrow(test))

# Accuracy on training + test
accuracy_table <- bind_rows(
  fits |> accuracy(),
  fc |> accuracy(op)
) |>
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)

accuracy_table |>
  knitr::kable(digits = 3, caption = "ETS vs ARIMA accuracy") |>
  kableExtra::kable_styling(full_width = FALSE)

| .model | .type | RMSE | MAE | MAPE | MASE | RMSSE |
|---|---|---|---|---|---|---|
| ETS | Training | 3.640 | 2.133 | 5.960 | 0.242 | 0.239 |
| ARIMA | Training | 3.221 | 2.095 | 5.979 | 0.238 | 0.211 |
| ARIMA | Test | 8.039 | 6.768 | 7.040 | 0.768 | 0.528 |
| ETS | Test | 7.112 | 5.909 | 6.492 | 0.671 | 0.467 |
On the training set, ARIMA performs slightly better, with lower RMSE, MAE, MASE, and RMSSE, while MAPE is nearly identical between the two models. However, on the test set, ETS is clearly more accurate across every reported measure, with lower RMSE, MAE, MAPE, MASE, and RMSSE. So although ARIMA fit the training data a bit better, ETS generalized better to unseen data and was the stronger forecasting model here.
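For reference, the five measures in the table can be reproduced by hand. The sketch below uses invented toy numbers (not the oil-price data) and the usual textbook definitions, with the scaled measures dividing by in-sample naive errors at lag `m`. The vectors `train_y`, `test_y`, and `forecast` are hypothetical, and fabletools may differ in details such as the default seasonal lag it uses for scaling.

```r
# Toy numbers, invented purely for illustration (not the oil-price data).
train_y  <- c(10, 12, 11, 13, 14, 13, 15, 16)  # hypothetical training series
test_y   <- c(17, 18, 16)                      # hypothetical holdout values
forecast <- c(16, 16, 16)                      # hypothetical point forecasts

e <- test_y - forecast                         # forecast errors on the holdout

rmse <- sqrt(mean(e^2))                        # root mean squared error
mae  <- mean(abs(e))                           # mean absolute error
mape <- mean(abs(e / test_y)) * 100            # mean absolute percentage error

# Scaling terms: in-sample naive errors at lag m.
# m = 1 here; for monthly data a seasonal lag of 12 would be the analogue.
m <- 1
naive_err <- diff(train_y, lag = m)

mase  <- mae / mean(abs(naive_err))            # MAE relative to naive MAE
rmsse <- sqrt(mean(e^2) / mean(naive_err^2))   # RMSE relative to naive RMSE

round(c(RMSE = rmse, MAE = mae, MAPE = mape, MASE = mase, RMSSE = rmsse), 3)
```

Because MASE and RMSSE are scaled by training-set naive errors, a value below 1 means the errors are smaller on average than those of the in-sample naive benchmark.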
The forecast plot shows that both models project oil prices staying around the recent level in the short run, but the ETS prediction intervals widen much more quickly, indicating greater forecast uncertainty. The ARIMA intervals are narrower, while ETS allows for much larger upside and downside movement, which is consistent with oil prices being volatile and hard to predict. Combined with the test-set accuracy results, this suggests that even though ETS produces wider and less certain future paths, it still handled the holdout sample better than ARIMA, making it the stronger forecasting model in this application.
II. Dynamic Models:
# Dynamic regression on train data
fit_dyn <- train |>
  model(
    DYNREG = ARIMA(value ~ trend() + season())
  )
report(fit_dyn)

Series: value
Model: LM w/ ARIMA(1,0,5) errors
Coefficients:
ar1 ma1 ma2 ma3 ma4 ma5 trend() season()year2
0.9256 0.4438 0.3740 0.2661 0.2353 0.1721 0.1725 -0.4144
s.e. 0.0248 0.0560 0.0664 0.0742 0.0746 0.0684 0.0239 0.5720
season()year3 season()year4 season()year5 season()year6 season()year7
0.8714 1.9895 2.2159 2.5806 2.7724
s.e. 0.9219 1.2021 1.4098 1.5484 1.6074
season()year8 season()year9 season()year10 season()year11
2.9996 2.5708 2.0513 1.0496
s.e. 1.5525 1.4162 1.2096 0.9292
season()year12
-0.3569
s.e. 0.5779
sigma^2 estimated as 10.54: log likelihood=-976.83
AIC=1991.65 AICc=1993.77 BIC=2066.47
The dynamic regression selected a linear model with ARIMA(1,0,5) errors, meaning oil prices are explained partly by a deterministic trend and monthly seasonal effects, with the remaining autocorrelation captured by a fairly rich ARIMA error structure. The positive trend() coefficient (0.1725, s.e. 0.0239) suggests oil prices were generally increasing over the training period. The seasonal coefficients measure each month's average deviation from the baseline month, but every one of them is less than two standard errors from zero, so the evidence for a systematic monthly pattern is weak. The large AR coefficient (ar1 = 0.9256) shows strong persistence in the regression errors, and the five MA terms suggest short-run shocks continued to affect the series for several months. Overall, this implies that trend and seasonality alone were not enough, so the ARIMA error process was important for capturing the remaining time-series dependence.
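One rough way to gauge which of these coefficients are distinguishable from zero is to divide each estimate by its standard error. The sketch below copies the numbers from the `report(fit_dyn)` output into plain vectors; the short names (`trend`, `year2`, ...) are my own labels, and the "more than 2 standard errors" rule is only an informal approximation to a 5% test.

```r
# Estimates and standard errors copied from the report(fit_dyn) output.
est <- c(
  ar1 = 0.9256, ma1 = 0.4438, ma2 = 0.3740, ma3 = 0.2661,
  ma4 = 0.2353, ma5 = 0.1721, trend = 0.1725,
  year2 = -0.4144, year3 = 0.8714, year4 = 1.9895, year5 = 2.2159,
  year6 = 2.5806, year7 = 2.7724, year8 = 2.9996, year9 = 2.5708,
  year10 = 2.0513, year11 = 1.0496, year12 = -0.3569
)
se <- c(
  0.0248, 0.0560, 0.0664, 0.0742,
  0.0746, 0.0684, 0.0239,
  0.5720, 0.9219, 1.2021, 1.4098,
  1.5484, 1.6074, 1.5525, 1.4162,
  1.2096, 0.9292, 0.5779
)

# Ratio of estimate to standard error (an informal t-ratio).
ratio <- est / se

# Flag coefficients that sit more than 2 standard errors from zero.
data.frame(
  estimate   = est,
  se         = se,
  ratio      = round(ratio, 2),
  beyond_2se = abs(ratio) > 2
)
```

With these numbers, the AR, MA, and trend terms all exceed 2 in absolute value, while none of the eleven seasonal dummies does.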
# Innovation residual diagnostics
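fit_dyn |>
  ggtime::gg_tsresiduals()

# Ljung-Box test on the innovation residuals.
# Note: dof = 0 does not adjust for the 6 estimated ARMA parameters (p + q = 1 + 5).
augment(fit_dyn) |>
  features(.innov, ljung_box, lag = 24, dof = 0)

# A tibble: 1 × 3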
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 DYNREG 28.5 0.239
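The residual diagnostics for the dynamic regression model are fairly solid overall: the residuals are centered near zero and the Ljung–Box test has a p-value of 0.239, so we do not reject the null of no remaining autocorrelation at the 5% level. One caveat is that the test was run with dof = 0. Because the error model has six estimated ARMA parameters, the usual adjustment is dof = 6, which compares the same statistic (28.5) to a chi-squared distribution with 18 degrees of freedom instead of 24. The 5% critical value there is about 28.87, so the adjusted p-value is only just above 0.05 and the evidence of white-noise residuals is weaker than 0.239 suggests. With that caveat, the ARIMA error structure appears to have done a reasonable job accounting for the serial dependence left after fitting the trend and seasonal regression terms. The residual plot still shows increasing volatility over time, especially in the later years, and the histogram has some heavy tails and outliers. So while the residuals look acceptable from an autocorrelation standpoint, the model still struggles with the changing variance and occasional large shocks in oil prices.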
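The effect of the `dof` setting on the Ljung–Box p-value can be checked directly from the reported statistic with base R's `pchisq()`. This is a sketch under two assumptions: the statistic is taken as the rounded 28.5 from the output, so the p-values are approximate, and the adjustment counts only the AR and MA orders of the ARIMA(1,0,5) errors.

```r
lb_stat <- 28.5   # reported Ljung-Box statistic (rounded in the output)
lag     <- 24     # number of lags used in the test
k_arma  <- 1 + 5  # AR and MA orders of the ARIMA(1,0,5) errors

# Upper-tail chi-squared probability with and without the adjustment.
p_dof0 <- pchisq(lb_stat, df = lag, lower.tail = FALSE)           # dof = 0
p_dof6 <- pchisq(lb_stat, df = lag - k_arma, lower.tail = FALSE)  # dof = 6

round(c(dof_0 = p_dof0, dof_6 = p_dof6), 3)
```

The unadjusted value reproduces the reported 0.239 up to rounding, and the adjusted value is noticeably smaller because the same statistic is compared to a distribution with fewer degrees of freedom.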
# Forecast over the test set
fc_dyn <- fit_dyn |>
  forecast(new_data = test)

# Accuracy on train and test
dyn_accuracy <- bind_rows(
  fit_dyn |> accuracy(),
  fc_dyn |> accuracy(op)
) |>
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)

dyn_accuracy |>
  knitr::kable(digits = 3, caption = "Dynamic regression accuracy") |>
  kableExtra::kable_styling(full_width = FALSE)

| .model | .type | RMSE | MAE | MAPE | MASE | RMSSE |
|---|---|---|---|---|---|---|
| DYNREG | Training | 3.168 | 2.096 | 6.112 | 0.238 | 0.208 |
| DYNREG | Test | 15.314 | 13.952 | 14.535 | 1.584 | 1.005 |
Across all three models, ETS is preferred because it had the best test-set accuracy, which matters most for forecasting. On the holdout sample, ETS beat ARIMA on every metric shown earlier (RMSE 7.112 vs 8.039, MAE 5.909 vs 6.768, MAPE 6.492 vs 7.040), and it was far better than the dynamic regression model, whose test errors were much larger (RMSE 15.314, MAPE 14.535). So even though ARIMA and dynamic regression fit the training data slightly better, ETS generalized best to unseen data, making it the strongest forecasting model of the three.
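To make the three-way comparison easier to scan, the test-set rows from the two accuracy tables can be collected into one data frame and ranked. The sketch below simply retypes the reported numbers; the object name `test_acc` and the `RMSE_vs_best` column are my own additions for illustration.

```r
# Test-set accuracy copied from the two accuracy tables in this report.
test_acc <- data.frame(
  model = c("ETS", "ARIMA", "DYNREG"),
  RMSE  = c(7.112, 8.039, 15.314),
  MAE   = c(5.909, 6.768, 13.952),
  MAPE  = c(6.492, 7.040, 14.535),
  MASE  = c(0.671, 0.768, 1.584),
  RMSSE = c(0.467, 0.528, 1.005)
)

# Express each model's RMSE as a multiple of the best (smallest) RMSE.
test_acc$RMSE_vs_best <- round(test_acc$RMSE / min(test_acc$RMSE), 2)

# Sort from most to least accurate on the holdout sample.
test_acc[order(test_acc$RMSE), ]
```

Sorting on any of the other four measures gives the same ordering, since ETS is lowest and the dynamic regression highest on every column.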
These results can be reconciled by noting that the “external regressors” in the dynamic regression were just a trend and month effects, not genuinely informative outside predictors like supply, demand, inflation, or macro shocks. The model did not gain new information about future oil prices; it only imposed a linear trend and a fixed seasonal pattern on a series that is volatile, subject to structural breaks, and probably not strongly seasonal. Extrapolating that fixed structure over the whole test period likely hurt out-of-sample performance rather than helped.
More generally, external regressors do not automatically improve a basic ARIMA model; they only help if they contain real predictive signal beyond what ARIMA already captures. Here, the plain ARIMA model was already modeling persistence well, and the dynamic regression likely overfit the training data by combining eleven seasonal coefficients with a complex ARIMA(1,0,5) error structure. That helps explain why its training fit was slightly the best of the three (RMSE 3.168) while its test-set forecast accuracy was much worse.
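The size of the dynamic regression can be checked against its reported fit statistics. The sketch below backs out the parameter count from the identity AIC = -2 logLik + 2k, using the log likelihood and AIC printed by `report(fit_dyn)`, and compares it to a hand count of the printed coefficients. It assumes the usual convention that the innovation variance is counted as one parameter.

```r
log_lik <- -976.83   # log likelihood from report(fit_dyn)
aic     <- 1991.65   # AIC from report(fit_dyn)

# AIC = -2 * logLik + 2 * k, so k = (AIC + 2 * logLik) / 2.
k_implied <- (aic + 2 * log_lik) / 2

# Hand count: 1 AR + 5 MA + 1 trend + 11 seasonal dummies + sigma^2.
k_counted <- 1 + 5 + 1 + 11 + 1

c(implied = round(k_implied, 2), counted = k_counted)
```

The two counts agree after rounding, which confirms how many parameters the dynamic regression estimated from the training sample.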