Module 9 Discussion_ARIMAX

Author

You-Jin,Tsai

library(fpp3)
library(fredr)
library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)
library(gridExtra)
library(writexl)
library(tseries)
library(forecast)   
library(quantmod)
library(feasts)
library(broom)
library(lmtest)

Sys.setlocale("LC_TIME", "en_US.UTF-8")

[1] "en_US.UTF-8"

remove(list=ls())
# Read the FRED API key from an environment variable instead of hard-coding it
fredr_set_key(Sys.getenv("FRED_API_KEY"))

1. Comparative Forecasting of U.S. Vehicle Sales — ETS vs. ARIMA Models

Load data and split data

# U.S. Total Vehicle Sales (Not Seasonally Adjusted)
# Period: 2021-01-01 to 2025-12-01
TVC <- fredr(
  series_id = "TOTALNSA",
  observation_start = as.Date("2021-01-01"),
  observation_end = as.Date("2025-12-01")
) |>
  # Convert units from Thousands to Millions for better readability
  transmute(month = yearmonth(date), sales = value / 1000) |>
  as_tsibble(index = month)

# Split into train (80%) and test (20%) sets 
# (60 observations -> Train: 48, Test: 12)
n <- nrow(TVC)
split_index <- floor(0.8 * n)
train_ts <- TVC[1:split_index, ]
test_ts  <- TVC[(split_index + 1):n, ]

# Add a label column for plotting purposes
TVC <- TVC |>
  mutate(set = if_else(row_number() <= split_index, "Train", "Test"))

Time series plot

Training Set: January 2021 to December 2024 (48 months), used for model fitting.
Test Set: January 2025 to December 2025 (12 months), reserved for out-of-sample evaluation.

split_plot <- ggplot(TVC, aes(x = month, y = sales, color = set)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("Train" = "steelblue", "Test" = "firebrick")) +
  labs(title = "U.S. Total Vehicle Sales (NSA)",
       subtitle = "Train vs Test Split (80/20)",
       x = "Month", y = "Sales (Millions of Units)", color = "Dataset") +
  theme_minimal() +
  theme(legend.position = "bottom")

split_plot

Fit model with train data

The two models are not equivalent. The ARIMA model relies on seasonal differencing and lagged residuals to stabilize the data, while the ETS(M,N,M) model uses a state-space framework with multiplicative components. The low \(\gamma\) (0.0001) in ETS suggests that the seasonal pattern is highly stable over time, whereas the ARIMA model’s SMA(1) coefficient (-0.5161) indicates an active error-correction mechanism for annual shocks.
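As a reference for the two structures described above, the standard textbook forms of ETS(M,N,M) and of a seasonal ARIMA(1,0,2)(0,1,1)[12] model can be written as follows. The notation is the usual one and is an assumption here, not taken from the model output: \(m = 12\) is the seasonal period, \(B\) is the backshift operator, \(\ell_t\) and \(s_t\) are the level and seasonal states, and \(\varepsilon_t\) is the error term.

\[
\begin{aligned}
\text{ETS(M,N,M):}\quad & y_t = \ell_{t-1}\, s_{t-m}\,(1 + \varepsilon_t), \\
& \ell_t = \ell_{t-1}\,(1 + \alpha\, \varepsilon_t), \\
& s_t = s_{t-m}\,(1 + \gamma\, \varepsilon_t), \\
\text{ARIMA:}\quad & (1 - \phi_1 B)(1 - B^{12})\, y_t = (1 + \theta_1 B + \theta_2 B^2)(1 + \Theta_1 B^{12})\, \varepsilon_t .
\end{aligned}
\]

With \(\gamma\) close to zero, the third equation gives \(s_t \approx s_{t-m}\), which is why the seasonal pattern stays almost fixed in the ETS model.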

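Comparing the information criteria, the ARIMA model has the lower AICc on the training data:

ARIMA AICc: -59.51
ETS AICc: -23.67

These two values should be read with caution. AICc is only directly comparable between models of the same class fitted to the same data, and here the ARIMA likelihood is computed on the seasonally differenced series while the ETS likelihood uses all 48 observations. The gap therefore cannot by itself show that ARIMA fits better; the test-set accuracy is the fairer basis for comparison. What the model structures do suggest is that ARIMA is better placed to capture the serial dependence and short-term momentum (the AR(1) term) of vehicle sales, while ETS(M,N,M) captures the multiplicative seasonality.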

models_fit <- model(
  train_ts,
  ETS    = ETS(sales),
  ARIMA  = ARIMA(sales)
)

models_fit |> select(ARIMA) |> report()

Series: sales 
Model: ARIMA(1,0,2)(0,1,1)[12] 

Coefficients:
         ar1      ma1     ma2     sma1
      0.6823  -0.3290  0.9168  -0.5161
s.e.  0.1480   0.1238  0.1666   0.2688

sigma^2 estimated as 0.00731:  log likelihood=35.75
AIC=-61.51   AICc=-59.51   BIC=-53.59

models_fit |> select(ETS) |> report()

Series: sales 
Model: ETS(M,N,M) 
  Smoothing parameters:
    alpha = 0.5447689 
    gamma = 0.0001287709 

  Initial states:
     l[0]     s[0]    s[-1]     s[-2]     s[-3]    s[-4]    s[-5]    s[-6]
 1.351164 1.061524 0.939749 0.9499343 0.9480795 1.006966 1.001766 1.030766
    s[-7]    s[-8]    s[-9]    s[-10]    s[-11]
 1.092161 1.084858 1.121581 0.9092473 0.8533675

  sigma^2:  0.0043

      AIC      AICc       BIC 
-38.67211 -23.67211 -10.60410
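
The gap between AIC and AICc in the two reports above can be reproduced with the usual small-sample correction, AICc = AIC + 2k(k+1)/(T - k - 1). The sketch below is a hand check, not part of the fitting code. The parameter counts k and the effective sample sizes T are assumptions inferred from the printed output: 5 parameters and 36 observations (48 minus the 12 lost to seasonal differencing) for ARIMA, and 15 parameters and 48 observations for ETS. The helper name aicc_from_aic is hypothetical.

```r
# Small-sample correction: AICc = AIC + 2k(k + 1) / (T - k - 1)
aicc_from_aic <- function(aic, k, n_obs) {
  aic + (2 * k * (k + 1)) / (n_obs - k - 1)
}

# ARIMA(1,0,2)(0,1,1)[12]: assumed k = 5 (ar1, ma1, ma2, sma1, sigma^2);
# assumed T = 48 - 12 = 36 after seasonal differencing
arima_aicc <- aicc_from_aic(-61.51, k = 5, n_obs = 36)

# ETS(M,N,M): assumed k = 15 and T = 48 (all training observations)
ets_aicc <- aicc_from_aic(-38.67211, k = 15, n_obs = 48)

round(c(ARIMA = arima_aicc, ETS = ets_aicc), 2)
```

Under these assumptions the results match the reported AICc values of -59.51 and -23.67. The two corrections use different sample sizes, which is one reason information criteria from ETS and ARIMA are not comparable with each other.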

Accuracy metrics

After evaluating the models on the 20% test set (the year 2025), the ETS(M,N,M) model outperformed the ARIMA model across all accuracy metrics. Specifically, ETS achieved a lower RMSE (0.079) and MAPE (5.01%) compared to ARIMA’s RMSE (0.087) and MAPE (5.33%).

# Forecast on the test set horizon
h <- nrow(test_ts)
fc_base <- forecast(models_fit, h = h)

# Calculate accuracy metrics 
accuracy_base <- fabletools::accuracy(fc_base, test_ts) |>
  select(.model, RMSE, MAE, MAPE) |>
  rename(Model = .model)

accuracy_base
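
For reference, the three metrics reported above can be computed by hand from the forecast errors using their standard definitions. The sketch below uses small made-up vectors (actual and predicted are hypothetical values, not the vehicle sales data) only to show the formulas.

```r
# Hypothetical actual values and point forecasts (not the real sales data)
actual    <- c(1.0, 2.0, 4.0)
predicted <- c(1.1, 1.8, 4.4)

errors <- actual - predicted

rmse <- sqrt(mean(errors^2))               # root mean squared error
mae  <- mean(abs(errors))                  # mean absolute error
mape <- mean(abs(errors / actual)) * 100   # mean absolute percentage error

c(RMSE = rmse, MAE = mae, MAPE = mape)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the size of the actual value, which is why the three metrics can rank models differently.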

Intervals plot (95% prediction intervals)

The difference between the two forecasts comes from how ETS and ARIMA handle the sales level and the seasons.

Seasonality and level in ETS

In the ETS model the seasonal pattern is essentially fixed and does not change over time. Because the model is multiplicative, it scales the seasonal peaks by the current sales level. When recent sales are high, the model makes the future peaks larger even though there is no trend component. It treats the recent high sales as the new normal level for the future.

The role of differencing in ARIMA

The ARIMA model uses seasonal differencing. This means the model works with the change from the same month of the previous year instead of the sales level itself. Because the fitted model has no constant term, the forecast year-over-year change shrinks toward zero, so the forecast moves back toward the previous year's pattern instead of staying at the recent high values. Because of this, ARIMA is less sensitive to short-term increases in sales.

Error correction and results

ARIMA has moving-average terms that correct for past errors and pull the forecast back to the long-run pattern. ETS, on the other hand, relies on the current high level and the fixed seasonal shape. This is why the ETS forecast stays high while the ARIMA forecast moves back toward the earlier pattern.
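
To make the effect of seasonal differencing concrete, here is a minimal sketch on a synthetic monthly series (the series x is invented for illustration and is not the vehicle sales data). It has a fixed seasonal pattern plus a slow upward drift; a lag-12 difference removes the seasonal pattern and leaves only the year-over-year change.

```r
# Synthetic 48-month series: level + fixed seasonal pattern + slow drift
season <- sin(2 * pi * (1:12) / 12)
x <- 1.3 + rep(season, times = 4) + 0.01 * (1:48)

# Seasonal difference: each month minus the same month one year earlier
d12 <- diff(x, lag = 12)

length(d12)   # 36: the first 12 months have no year-earlier value
range(d12)    # both ends approximately 0.12, the drift over 12 months
```

The differenced series has 36 values instead of 48, which is the price of seasonal differencing on a short training set.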

forecast_intervals_plot <- fc_base |> 
  autoplot(TVC, level = c(95)) + 
  labs(title = "Forecast with 95% Prediction Intervals",
       subtitle = "ETS vs ARIMA comparison against actual vehicle sales",
       x = "Month", y = "Sales (Millions of Units)") +
  theme_minimal() +
  theme(legend.position = "bottom")

forecast_intervals_plot

2. Dynamic Regression Analysis with Consumer Sentiment

I chose Consumer Sentiment (UMCSENT) as the single exogenous variable for this model. Buying a vehicle is a major financial decision that depends on consumer confidence in the economy. By adding this variable I want to see if the model can explain the systematic changes in sales that the basic ARIMA model might miss. If the new model shows a lower AICc it means that the external economic mood provides useful information for the forecast.

Load data

# Fetch Consumer Sentiment data from FRED
CSENT_raw <- fredr(
  series_id = "UMCSENT",
  observation_start = as.Date("2021-01-01"),
  observation_end = as.Date("2025-12-01")
)

CSENT <- CSENT_raw |>
  transmute(month = yearmonth(date), sentiment = value) |>
  as_tsibble(index = month)

# Merge the sentiment data with vehicle sales table
TVC_dynamic <- TVC |>
  left_join(CSENT, by = "month")

train_dynamic <- TVC_dynamic |> slice(1:split_index)
test_dynamic  <- TVC_dynamic |> slice((split_index + 1):n())

Time series plot

In the second part of this project we add Consumer Sentiment as an external variable. Buying a car is a large expense and people usually buy more when they feel good about the economy. The graph shows that consumer sentiment changed a lot from 2021 to 2025. In the test period starting 2025 the sentiment index dropped quickly. By adding this data to our ARIMA model we want to see if the model can predict car sales better.

split_plot <- ggplot(TVC_dynamic, aes(x = month, y = sentiment, color = set)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("Train" = "steelblue", "Test" = "firebrick")) +
  labs(title = "Consumer Sentiment",
       subtitle = "Train vs Test Split (80/20)",
       x = "Month", y = "Index (1966:Q1 = 100)", color = "Dataset") +
  theme_minimal() +
  theme(legend.position = "bottom")

split_plot

Fit model with train data

ARIMAX vs Base ARIMA

In the training stage the base ARIMA model has an AICc of -59.51. The new ARIMAX model with Consumer Sentiment has a higher AICc of -56.32. A lower AICc is better, and because both models apply the same seasonal differencing to the same series, the two values can be compared directly. The log likelihood rose only from 35.75 to 37.16, which was not enough to justify the extra parameters, so adding Consumer Sentiment made the model fit less efficiently.

Significance of Sentiment

The sentiment coefficient is 0.0035 with a standard error of 0.0025, so the estimate is less than two standard errors away from zero. This means the effect of sentiment is not statistically significant at the 5% level. The relationship between how people feel and total car sales is not strong enough in this specific period to help the model.
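
The significance check described above can be sketched as a simple Wald-type calculation from the reported coefficient and standard error. Using the normal distribution for the p-value is an assumption made here for illustration; with only 48 training observations it is a rough approximation.

```r
# Values copied from the ARIMAX report
estimate  <- 0.0035
std_error <- 0.0025

z_ratio <- estimate / std_error                 # 1.4
p_value <- 2 * (1 - pnorm(abs(z_ratio)))        # two-sided, normal approximation
ci_95   <- estimate + c(-1, 1) * qnorm(0.975) * std_error

z_ratio
p_value
ci_95
```

The ratio is 1.4 and the approximate 95% interval includes zero, which agrees with the conclusion that the sentiment effect is not statistically significant.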

models_fit <- model(
  train_dynamic,
  ETS    = ETS(sales),
  ARIMA  = ARIMA(sales),
  ARIMAX = ARIMA(sales ~ sentiment)
)

models_fit |> select(ARIMAX) |> report()

Series: sales 
Model: LM w/ ARIMA(0,0,3)(1,1,0)[12] errors 

Coefficients:
         ma1     ma2     ma3     sar1  sentiment  intercept
      0.3635  0.6883  0.5777  -0.5858     0.0035     0.0486
s.e.  0.1622  0.2721  0.1939   0.1721     0.0025     0.0242

sigma^2 estimated as 0.00724:  log likelihood=37.16
AIC=-60.32   AICc=-56.32   BIC=-49.23

models_fit |> select(ARIMA) |> report()

Series: sales 
Model: ARIMA(1,0,2)(0,1,1)[12] 

Coefficients:
         ar1      ma1     ma2     sma1
      0.6823  -0.3290  0.9168  -0.5161
s.e.  0.1480   0.1238  0.1666   0.2688

sigma^2 estimated as 0.00731:  log likelihood=35.75
AIC=-61.51   AICc=-59.51   BIC=-53.59

Accuracy metrics

In the 2025 test period the ARIMAX model performed better than the base ARIMA model. The ARIMAX model reached a lower RMSE of 0.0803, while the base ARIMA had a higher RMSE of 0.0872. This suggests that the Consumer Sentiment data helped the model make more accurate predictions on new data, although the ARIMAX forecast is given the observed 2025 sentiment values through new_data, which would not be available when forecasting in real time. Even though the ARIMAX model was more complex and had a higher AICc during training, it was more useful for the actual forecast in 2025. However, the ETS model still remains the best overall with the lowest RMSE of 0.0790.

# Forecast on the test set horizon
h <- nrow(test_dynamic)
fc_base <- forecast(models_fit, new_data = test_dynamic)

# Calculate accuracy metrics 
accuracy_base <- fabletools::accuracy(fc_base, test_dynamic) |>
  select(.model, RMSE, MAE, MAPE) |>
  rename(Model = .model)

accuracy_base

Intervals plot (95% prediction intervals)

This graph compares the forecasts for 2025. The black line shows the real sales data. The cyan color represents ARIMAX and the red color represents ARIMA.

The most important detail is the size of the shaded areas. The cyan area for ARIMAX is much narrower than the red area for ARIMA. This means the ARIMAX model reports less uncertainty about its forecast. By using Consumer Sentiment the model gave a more precise range, although these intervals treat the 2025 sentiment values as known and so do not include the uncertainty of forecasting sentiment itself.

Also the cyan line follows the real black line better in the first half of the year. The red area is very wide which shows that the base ARIMA model is unsure about the future when it only looks at past sales. In conclusion ARIMAX provides a much more useful and reliable forecast for the 2025 test period.

fc_base |>
  filter(.model %in% c("ARIMA", "ARIMAX")) |>
  autoplot(TVC_dynamic |> filter(year(month) == 2025), level = 95) +
  labs(title = "2025 Forecast: ARIMA vs ARIMAX", 
       x = "Month", y = "Sales (Millions)") +
  theme_minimal()

Testing for Granger Causality

I added Consumer Sentiment to our model to see if it helps. During the training stage the results were not very strong. The AICc did not go down, and the Granger test p-value was 0.2465, so the test does not reject the hypothesis that lagged sentiment adds no predictive information for sales. Also, the coefficient for sentiment was not significant. From these numbers the extra variable seems unnecessary.

However, the model with this variable performed better during the 2025 test period (RMSE of 0.0803 against 0.0872). I am confused about this point. If anyone reading this has any thoughts, please share your ideas to help me understand this better. Is this improvement in 2025 a real signal or just a lucky coincidence?

grangertest(sales ~ sentiment, order = 1, data = train_dynamic)
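
For intuition about what the test does: with order = 1 it compares a regression of sales on its own first lag against the same regression with the first lag of sentiment added. The sketch below imitates that comparison with base R on simulated data (x, y, and the coefficients 0.5 and 0.8 are invented so that lagged x really does help predict y). It is an illustration of the idea, not a reproduction of the grangertest() result above.

```r
set.seed(1)
n <- 200

# Simulated data in which lagged x helps predict y (hypothetical coefficients)
x <- rnorm(n)
y <- numeric(n)
for (i in 2:n) {
  y[i] <- 0.5 * y[i - 1] + 0.8 * x[i - 1] + rnorm(1, sd = 0.5)
}

# Align each observation with its own first lag and the first lag of x
d <- data.frame(y = y[-1], y_lag = y[-n], x_lag = x[-n])

restricted   <- lm(y ~ y_lag, data = d)          # own lag only
unrestricted <- lm(y ~ y_lag + x_lag, data = d)  # adds lagged x

# F test of the added lag; a small p-value means lagged x improves prediction
comparison <- anova(restricted, unrestricted)
p_value <- comparison[["Pr(>F)"]][2]
p_value
```

In the training data the same kind of comparison gave a p-value of 0.2465, so lagged sentiment did not significantly improve the one-step prediction of sales.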