The dataset is the Total Vehicle Sales series for the USA (“TOTALNSA”), published by the Federal Reserve Bank of St. Louis through FRED. The series is monthly and not seasonally adjusted, and it serves as an economic indicator of both consumer spending and the state of the automotive industry. As used here, it runs from January 1976 through November 2023, long enough to show both the seasonal pattern and longer-run movements in vehicle sales.
library(forecast)  # Arima(), auto.arima(), forecast()
library(tseries)   # kpss.test()
data <- read.csv("/Users/keerthichereddy/Documents/Sem2/Forecasting Methods/Project/total_vehicle_sales.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
data_ts <- ts(data$vehicle_sales, start = c(1976, 1), frequency = 12)
log_data_ts <- log(data_ts)
seasonally_diff_log_data_ts <- diff(log_data_ts, lag = 12)
# original time series
plot(data_ts, xlab = "Time", ylab = "Vehicle Sales", main = "Original Monthly Vehicle Sales Time Series", type = "o")
# log-transformed time series
plot(log_data_ts, xlab = "Time", ylab = "Log Vehicle Sales", main = "Log Transformed Monthly Vehicle Sales Time Series", type = "o")
# seasonally differenced log-transformed series
plot(seasonally_diff_log_data_ts, xlab = "Time", ylab = "Differenced Log Vehicle Sales", main = "Seasonally Differenced Log Transformed Monthly Vehicle Sales", type = "o")
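# KPSS test on the seasonally differenced, log-transformed series
# (null hypothesis: the series is level-stationary, so a large p-value does not reject stationarity)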
kpss_test <- kpss.test(as.numeric(seasonally_diff_log_data_ts))
## Warning in kpss.test(as.numeric(seasonally_diff_log_data_ts)): p-value greater
## than printed p-value
print(kpss_test)
##
## KPSS Test for Level Stationarity
##
## data: as.numeric(seasonally_diff_log_data_ts)
## KPSS Level = 0.064599, Truncation lag parameter = 6, p-value = 0.1
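The transformation used above, `diff(log(x), lag = 12)`, is the log of the year-over-year ratio, so each value is approximately the proportional change in sales relative to the same month a year earlier. A minimal sketch of that identity follows; the series `y` is simulated for illustration and is not the TOTALNSA data.

```r
# Illustrative monthly series (simulated, NOT the TOTALNSA data)
set.seed(1)
n <- 48
month <- rep(1:12, length.out = n)
y <- ts(1000 * exp(0.002 * seq_len(n) +
                   0.1 * sin(2 * pi * month / 12) +
                   rnorm(n, sd = 0.02)),
        start = c(2000, 1), frequency = 12)

# Seasonal difference of the logged series ...
d <- diff(log(y), lag = 12)

# ... equals the log of the year-over-year ratio y[t] / y[t - 12]
yoy_ratio <- as.numeric(y)[13:n] / as.numeric(y)[1:(n - 12)]
max_gap <- max(abs(as.numeric(d) - log(yoy_ratio)))

# Exact percentage change implied by each differenced value
pct_change <- 100 * (exp(as.numeric(d)) - 1)
```

The first 12 observations are lost to the seasonal difference, which is why the transformed series is 12 points shorter than the original.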
### Section 2
acf(seasonally_diff_log_data_ts, main="ACF of Seasonally Differenced Log Transformed Series")
pacf(seasonally_diff_log_data_ts, main="PACF of Seasonally Differenced Log Transformed Series")
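The ACF and PACF plots suggest that the seasonally differenced
log-transformed series could be modeled as an ARIMA(1,0,0) process,
that is, an AR(1). This is the usual reading when the ACF decays
gradually while the PACF cuts off after lag 1.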
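To make the AR(1) reading of ACF and PACF plots concrete, the sketch below simulates an AR(1) process and compares its sample ACF with the theoretical decay. The coefficient `phi = 0.7` and the sample size are assumptions chosen for illustration, not estimates from the vehicle sales data.

```r
set.seed(42)
phi <- 0.7  # hypothetical AR(1) coefficient, for illustration only
x <- arima.sim(model = list(ar = phi), n = 2000)

# Sample ACF (dropping lag 0) and sample PACF for lags 1 to 5
acf_vals  <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)[-1]
pacf_vals <- as.numeric(pacf(x, lag.max = 5, plot = FALSE)$acf)

# For an AR(1), the ACF should decay roughly like phi^k,
# while the PACF should be near zero beyond lag 1
round(rbind(acf = acf_vals, theory = phi^(1:5), pacf = pacf_vals), 3)
```

A real series rarely matches this pattern exactly, so the plots are best treated as a starting point for candidate models rather than a final choice.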
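# Candidate models fitted to the seasonally differenced log series
arima_100 <- Arima(seasonally_diff_log_data_ts, order=c(1,0,0))
# Note: d = 1 differences the series again, so this fit uses one fewer observation
arima_010 <- Arima(seasonally_diff_log_data_ts, order=c(0,1,0))
arima_001 <- Arima(seasonally_diff_log_data_ts, order=c(0,0,1))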
AIC(arima_100, arima_010, arima_001)
## Warning in AIC.default(arima_100, arima_010, arima_001): models are not all
## fitted to the same number of observations
## df AIC
## arima_100 3 -993.2551
## arima_010 1 -895.9443
## arima_001 3 -862.6340
BIC(arima_100, arima_010, arima_001)
## Warning in BIC.default(arima_100, arima_010, arima_001): models are not all
## fitted to the same number of observations
## df BIC
## arima_100 3 -980.2553
## arima_010 1 -891.6128
## arima_001 3 -849.6341
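The warnings above arise because a model with d = 1 is fitted to a differenced series with one fewer observation, so its likelihood is not on the same footing as the d = 0 models. A minimal sketch with base R's `arima()` on a simulated series (a stand-in, not the vehicle sales data) shows the mismatch and a comparison restricted to models with the same d.

```r
set.seed(7)
x <- arima.sim(model = list(ar = 0.6), n = 300)  # simulated stand-in series

fit_ar1 <- arima(x, order = c(1, 0, 0))
fit_ma1 <- arima(x, order = c(0, 0, 1))
fit_rw  <- arima(x, order = c(0, 1, 0))  # differenced once: one fewer observation

# Number of observations actually used in each likelihood
n_used <- c(ar1 = fit_ar1$nobs, ma1 = fit_ma1$nobs, rw = fit_rw$nobs)
n_used

# Comparable: same d, same effective sample, so no warning is raised
aic_tab <- AIC(fit_ar1, fit_ma1)
aic_tab
```

Information criteria are generally only comparable across models with the same order of differencing, so the d = 1 candidate is better judged by other means, such as a stationarity test or out-of-sample accuracy.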
best_arima <- auto.arima(seasonally_diff_log_data_ts)
summary(best_arima)
## Series: seasonally_diff_log_data_ts
## ARIMA(1,0,2)(0,0,2)[12] with zero mean
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2
## 0.9598 -0.4329 -0.1069 -0.6846 -0.1577
## s.e. 0.0148 0.0456 0.0453 0.0425 0.0412
##
## sigma^2 = 0.006475: log likelihood = 615.52
## AIC=-1219.04 AICc=-1218.89 BIC=-1193.04
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002102387 0.08010934 0.05792226 10.14138 261.8839 0.4310673
## ACF1
## Training set -0.001249682
fitted_values <- fitted(best_arima)
plot(seasonally_diff_log_data_ts, main="Observed vs Fitted Values")
lines(fitted_values, col='red')
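While manual selection suggested an ARIMA(1,0,0) model, auto.arima
selected the more complex ARIMA(1,0,2)(0,0,2)[12] with zero mean, which
has a much lower AIC (-1219.04 versus -993.26). The fitted values track
the in-sample movements of the series closely. The large MPE and MAPE
should not be over-read: the differenced log series takes values near
zero, which inflates percentage errors.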
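In-sample fit alone can favor the more complex model, so a hold-out comparison is a useful complement. The sketch below is one way to set it up with base R's `arima()` and `predict()`; the simulated series, the 12-observation test window, and the two candidate orders are all assumptions for illustration.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.8), n = 240)  # simulated stand-in series

# Hold out the last 12 observations
train <- window(x, end = 228)
test  <- window(x, start = 229)

fit_simple  <- arima(train, order = c(1, 0, 0), method = "ML")
fit_complex <- arima(train, order = c(1, 0, 2), method = "ML")

# Root mean squared error of multi-step forecasts over the hold-out window
rmse <- function(fit, actual) {
  pred <- predict(fit, n.ahead = length(actual))$pred
  sqrt(mean((as.numeric(actual) - as.numeric(pred))^2))
}

holdout_rmse <- c(simple = rmse(fit_simple, test),
                  complex = rmse(fit_complex, test))
holdout_rmse
```

If the simpler model forecasts about as well out of sample, the extra terms may be fitting noise rather than structure.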
residuals <- residuals(best_arima)
# Note: Box.test() defaults to type = "Box-Pierce", and log(n) is not an integer lag
box_ljung_test <- Box.test(residuals, lag=log(length(residuals)))
acf(residuals, main="ACF of Residuals")
pacf(residuals, main="PACF of Residuals")
print(box_ljung_test)
##
## Box-Pierce test
##
## data: residuals
## X-squared = 0.78926, df = 6.3333, p-value = 0.9947
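The output above is a Box-Pierce test with a fractional lag, because `Box.test()` defaults to `type = "Box-Pierce"` and `log(n)` is not an integer. A minimal sketch of a Ljung-Box variant follows, using an integer lag and the `fitdf` argument to account for estimated ARMA coefficients. The simulated series and the choice `lag = 10` are assumptions for illustration.

```r
set.seed(99)
x   <- arima.sim(model = list(ar = 0.5), n = 400)  # simulated stand-in series
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

n_coef <- length(coef(fit)) - 1  # ARMA coefficients only (drop the intercept)
lag    <- 10                     # integer lag, chosen larger than n_coef

# Ljung-Box test; degrees of freedom are lag - fitdf
lb <- Box.test(res, lag = lag, type = "Ljung-Box", fitdf = n_coef)
lb
```

For the model in this report, which has five ARMA coefficients, the lag would need to exceed five for the adjusted degrees of freedom to be positive.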
forecast <- forecast(best_arima, h=6)
print(forecast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Dec 2023 0.10606286 0.002939657 0.2091861 -0.05165047 0.2637762
## Jan 2024 0.03042964 -0.086131070 0.1469903 -0.14783458 0.2086939
## Feb 2024 0.05471515 -0.068888449 0.1783188 -0.13432024 0.2437506
## Mar 2024 0.03681277 -0.092940829 0.1665664 -0.16162824 0.2352538
## Apr 2024 -0.03463281 -0.169804346 0.1005387 -0.24135983 0.1720942
## May 2024 0.03551635 -0.104460674 0.1754934 -0.17856003 0.2495927
plot(forecast)
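The selected ARIMA(1,0,2)(0,0,2)[12] model fits the data well: the
residuals show no significant autocorrelation (Box-Pierce p-value =
0.9947), which suggests the model captures the data’s underlying
process. The forecasts are on the seasonally differenced log scale, so
each value approximates year-over-year log growth in sales. They are in
line with the historical data, and the prediction intervals widen with
the horizon, reflecting growing uncertainty.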
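Because the model is fitted to `diff(log(x), lag = 12)`, its forecasts can be mapped back to the original scale with log(y[T+h]) = d[T+h] + log(y[T+h-12]). A minimal sketch follows, assuming a horizon of at most 12 months so that the lagged values are all observed. The simulated series and the AR(1) fit are placeholders, not the model selected in this report.

```r
set.seed(2024)
n <- 120
month <- rep(1:12, length.out = n)
# Simulated monthly series standing in for vehicle sales
y <- ts(exp(7 + 0.15 * sin(2 * pi * month / 12) + cumsum(rnorm(n, sd = 0.03))),
        start = c(2014, 1), frequency = 12)

d   <- diff(log(y), lag = 12)
fit <- arima(d, order = c(1, 0, 0), include.mean = FALSE)  # placeholder model

h     <- 6
d_hat <- as.numeric(predict(fit, n.ahead = h)$pred)

# For h <= 12, y[T+h-12] is an observed value from the last year of data
last_year <- as.numeric(y)[(n - 12 + 1):(n - 12 + h)]
y_hat     <- last_year * exp(d_hat)  # forecasts on the original scale
y_hat
```

Exponentiating a point forecast made on the log scale gives a median-type forecast on the original scale rather than a mean, so a bias adjustment may be worth considering if mean forecasts are needed.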