ForecastingMethods

Section 1

The dataset is the total vehicle sales series for the USA (“TOTALNSA”), published by the Federal Reserve Bank of St. Louis (FRED). The series is monthly and not seasonally adjusted, and it serves as an economic indicator of consumer spending and of conditions in the automotive industry. Judging from the start date set in the code and the first forecast month reported in Section 4, the sample used here runs from January 1976 through November 2023.

library(tseries)   # provides kpss.test
library(forecast)  # provides Arima, auto.arima and forecast

data <- read.csv("//Users/keerthichereddy/Documents/Sem2/Forecasting Methods/Project/total_vehicle_sales.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
data_ts <- ts(data$vehicle_sales, start = c(1976, 1), frequency = 12)
log_data_ts <- log(data_ts)
seasonally_diff_log_data_ts <- diff(log_data_ts, lag = 12)

plot(data_ts, xlab = "Time", ylab = "Vehicle Sales", main = "Original Monthly Vehicle Sales Time Series", type = "o")

# log-transformed time series
plot(log_data_ts, xlab = "Time", ylab = "Log Vehicle Sales", main = "Log Transformed Monthly Vehicle Sales Time Series", type = "o")

# seasonally differenced log-transformed series
plot(seasonally_diff_log_data_ts, xlab = "Time", ylab = "Differenced Log Vehicle Sales", main = "Seasonally Differenced Log Transformed Monthly Vehicle Sales", type = "o")

# KPSS test on the seasonally differenced, log-transformed series
kpss_test <- kpss.test(as.numeric(seasonally_diff_log_data_ts))

## Warning in kpss.test(as.numeric(seasonally_diff_log_data_ts)): p-value greater
## than printed p-value

print(kpss_test)

## 
##  KPSS Test for Level Stationarity
## 
## data:  as.numeric(seasonally_diff_log_data_ts)
## KPSS Level = 0.064599, Truncation lag parameter = 6, p-value = 0.1

The trend visible in the original plot suggests that the series is not mean stationary. The variance does not appear to be stationary either, as the fluctuations in sales volume seem to increase over time.
In the log-transformed plot, trend and seasonality are still present, but the variance looks more constant over time than in the original series.
The third plot shows the seasonally differenced log-transformed series. Seasonal patterns are less apparent, suggesting that differencing at lag 12 removed most of the seasonality.
The KPSS test on the seasonally differenced log-transformed series gives a statistic of 0.064599 and a reported p-value of 0.1. The warning indicates that the true p-value is larger than the printed value, which is capped at 0.1. Since this exceeds the usual significance levels, the null hypothesis of level stationarity cannot be rejected, so the series can be treated as mean stationary after log transformation and seasonal differencing.
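
To make the transformation concrete, the sketch below applies the same two steps (log, then a lag-12 difference) to R's built-in AirPassengers series, which stands in for the vehicle sales data here purely for illustration. It checks that the seasonally differenced log equals the log of the year-over-year ratio, which is why the transformed series can be read as approximate annual growth.

```r
# Illustration on a built-in monthly series, NOT the vehicle sales data.
x <- AirPassengers                       # monthly ts, frequency = 12
log_x <- log(x)
sdiff_log_x <- diff(log_x, lag = 12)     # log(x_t) - log(x_{t-12})

# The seasonal difference of logs is the log of the year-over-year ratio.
yoy_ratio <- x[13:length(x)] / x[1:(length(x) - 12)]
max_abs_gap <- max(abs(as.numeric(sdiff_log_x) - log(yoy_ratio)))

# One full year of observations is lost to the seasonal difference.
n_lost <- length(x) - length(sdiff_log_x)
print(c(max_abs_gap = max_abs_gap, n_lost = n_lost))
```

The same loss of twelve observations applies to the vehicle sales series, so the differenced series starts one year after the original.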

Section 2

acf(seasonally_diff_log_data_ts, main="ACF of Seasonally Differenced Log Transformed Series")

pacf(seasonally_diff_log_data_ts, main="PACF of Seasonally Differenced Log Transformed Series")

The ACF and PACF plots suggest that the seasonally differenced log-transformed series could be modeled as an ARIMA(1,0,0) process.

The gradual decline in the ACF plot suggests that the data have an autoregressive component. A significant spike at lag 1 in the PACF plot, with no further significant spikes, points to an AR(1) process.
The plots do not show clear seasonal patterns, indicating that seasonal differencing has addressed most of the seasonality.
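
The AR(1) signature described above can be checked against theory. The sketch below uses `ARMAacf` from base R with a hypothetical coefficient of 0.8 (an assumed value, not an estimate from the vehicle sales data) to show that the theoretical ACF decays geometrically while the theoretical PACF is zero after lag 1.

```r
phi <- 0.8   # hypothetical AR(1) coefficient, chosen only for illustration

# Theoretical ACF at lags 0..5: equals phi^k, a geometric decay.
theo_acf <- ARMAacf(ar = phi, lag.max = 5)

# Theoretical PACF at lags 1..5: phi at lag 1, zero afterwards.
theo_pacf <- ARMAacf(ar = phi, lag.max = 5, pacf = TRUE)

print(round(theo_acf, 4))
print(round(theo_pacf, 4))
```

Sample ACF and PACF plots will only approximate this pattern, so a model identified this way is a starting point rather than a final choice.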

Section 3

arima_100 <- Arima(seasonally_diff_log_data_ts, order=c(1,0,0))
arima_010 <- Arima(seasonally_diff_log_data_ts, order=c(0,1,0))
arima_001 <- Arima(seasonally_diff_log_data_ts, order=c(0,0,1))

AIC(arima_100, arima_010, arima_001)

## Warning in AIC.default(arima_100, arima_010, arima_001): models are not all
## fitted to the same number of observations

##           df       AIC
## arima_100  3 -993.2551
## arima_010  1 -895.9443
## arima_001  3 -862.6340

BIC(arima_100, arima_010, arima_001)

## Warning in BIC.default(arima_100, arima_010, arima_001): models are not all
## fitted to the same number of observations

##           df       BIC
## arima_100  3 -980.2553
## arima_010  1 -891.6128
## arima_001  3 -849.6341
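
The warnings above arise because a model with d = 1 is fitted to one fewer observation than a model with d = 0. A minimal sketch on a simulated series (a stand-in, not the vehicle sales data) shows the difference, using base R's `arima` and its `nobs` component:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)   # simulated stand-in series

fit_d0 <- arima(y, order = c(1, 0, 0))   # no differencing
fit_d1 <- arima(y, order = c(0, 1, 0))   # one regular difference

# Number of observations actually used in each fit.
print(c(d0 = fit_d0$nobs, d1 = fit_d1$nobs))
```

Because the likelihoods are computed on different data, AIC and BIC are best compared only among models that share the same order of differencing.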

best_arima <- auto.arima(seasonally_diff_log_data_ts)

summary(best_arima)

## Series: seasonally_diff_log_data_ts 
## ARIMA(1,0,2)(0,0,2)[12] with zero mean 
## 
## Coefficients:
##          ar1      ma1      ma2     sma1     sma2
##       0.9598  -0.4329  -0.1069  -0.6846  -0.1577
## s.e.  0.0148   0.0456   0.0453   0.0425   0.0412
## 
## sigma^2 = 0.006475:  log likelihood = 615.52
## AIC=-1219.04   AICc=-1218.89   BIC=-1193.04
## 
## Training set error measures:
##                       ME       RMSE        MAE      MPE     MAPE      MASE
## Training set 0.002102387 0.08010934 0.05792226 10.14138 261.8839 0.4310673
##                      ACF1
## Training set -0.001249682

fitted_values <- fitted(best_arima)

plot(seasonally_diff_log_data_ts, main="Observed vs Fitted Values")
lines(fitted_values, col='red')

While the manual comparison favored ARIMA(1,0,0), auto.arima found that a more complex model fits better. The model it chose captures the in-sample patterns of the series well.

Among the three manually fitted models, ARIMA(1,0,0) has the lowest AIC (-993.26) and BIC (-980.26). The warning shows that the models were not all fitted to the same number of observations: ARIMA(0,1,0) applies an extra difference and so uses one fewer observation, which means its AIC and BIC are not strictly comparable with those of the other two.
The auto.arima function selected ARIMA(1,0,2)(0,0,2)[12] with zero mean, a more complex model than the manual choice. Its AIC (-1219.04) and BIC (-1193.04) are lower still than those of ARIMA(1,0,0). This is plausible because auto.arima searches a broader range of models; the seasonal MA terms it includes suggest that some seasonal autocorrelation remains after seasonal differencing, which the simpler ARIMA(1,0,0) cannot capture.
The plot of observed versus fitted values shows that the fitted values from the selected ARIMA model closely follow the observed values, indicating a good in-sample fit.
The scale-dependent error measures are small (RMSE 0.080 and MAE 0.058 on the differenced log scale), and the MASE of 0.43 is below 1, so the in-sample errors are smaller than those of the naive benchmark used for scaling. The MPE (10.14) and MAPE (261.88) are not small, but percentage errors are unreliable here because the differenced log series takes values close to zero. The near-zero ME (0.0021) indicates little bias in the fitted values, and the near-zero ACF1 (-0.0012) indicates almost no lag-1 autocorrelation in the residuals.
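
To see why percentage errors misbehave on a series that hovers around zero, consider the small sketch below. The numbers are made up for illustration: every absolute error is 0.05, yet the MAPE is far above 100 because two of the actual values are close to zero.

```r
# Made-up values for illustration; two of the actuals are close to zero.
actual   <- c(0.50, 0.02, -0.40, 0.01)
fit_vals <- c(0.45, 0.07, -0.35, 0.06)

err  <- actual - fit_vals
mae  <- mean(abs(err))                  # scale-dependent, stays small
mape <- mean(abs(err / actual)) * 100   # dominated by near-zero actuals

print(c(MAE = mae, MAPE = mape))
```

For a differenced log series, RMSE, MAE and MASE are therefore the more informative measures.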

Section 4

residuals <- residuals(best_arima)

box_ljung_test <- Box.test(residuals, lag=log(length(residuals)))

acf(residuals, main="ACF of Residuals")

pacf(residuals, main="PACF of Residuals")

print(box_ljung_test)

## 
##  Box-Pierce test
## 
## data:  residuals
## X-squared = 0.78926, df = 6.3333, p-value = 0.9947
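
The output above is labelled "Box-Pierce test" because `Box.test` defaults to `type = "Box-Pierce"`, and the fractional degrees of freedom come from passing `log(n)` directly as the lag. A hedged sketch of a Ljung-Box variant follows, run on simulated noise as a stand-in for the model residuals. The lag of 24 (two seasonal cycles) is an assumed rule-of-thumb choice, and `fitdf = 5` reflects the five coefficients of the selected model.

```r
set.seed(42)
e <- rnorm(120)      # stand-in for model residuals, NOT the actual residuals

n_params <- 5        # ar1, ma1, ma2, sma1, sma2
lag_used <- 24       # assumed choice: two seasonal cycles of monthly data

# Ljung-Box test with degrees of freedom reduced by the fitted parameters.
lb <- Box.test(e, lag = lag_used, type = "Ljung-Box", fitdf = n_params)

# Default call for comparison: Box-Pierce, no degrees-of-freedom adjustment.
bp <- Box.test(e, lag = lag_used)

print(lb)
print(bp)
```

With the original lag of about 6.33, subtracting five fitted parameters would leave barely one degree of freedom, which is one reason to prefer a longer integer lag when `fitdf` is used.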

forecast <- forecast(best_arima, h=6)
print(forecast)

##          Point Forecast        Lo 80     Hi 80       Lo 95     Hi 95
## Dec 2023     0.10606286  0.002939657 0.2091861 -0.05165047 0.2637762
## Jan 2024     0.03042964 -0.086131070 0.1469903 -0.14783458 0.2086939
## Feb 2024     0.05471515 -0.068888449 0.1783188 -0.13432024 0.2437506
## Mar 2024     0.03681277 -0.092940829 0.1665664 -0.16162824 0.2352538
## Apr 2024    -0.03463281 -0.169804346 0.1005387 -0.24135983 0.1720942
## May 2024     0.03551635 -0.104460674 0.1754934 -0.17856003 0.2495927

plot(forecast)

The selected ARIMA(1,0,2)(0,0,2)[12] model appears to fit the data well, and the residual diagnostics give no sign of structure left unmodeled. The forecasts are reasonable given the historical data, and the widening prediction intervals reflect the expected growth in uncertainty.

The ACF and PACF plots of the residuals do not show any significant spikes outside the confidence bounds, suggesting that the residuals are not autocorrelated and are consistent with white noise.
Box-Pierce test: Box.test was called without a type argument, so the output is the Box-Pierce test rather than the Ljung-Box test. It was run with lag = log(n), about 6.33, and without adjusting the degrees of freedom for the five estimated coefficients. With a p-value of 0.9947, there is no evidence against the null hypothesis of no autocorrelation in the residuals, which agrees with the ACF and PACF plots.
Forecast evaluation: The forecasts are on the seasonally differenced log scale, so each value is the predicted log change relative to the same month a year earlier, not a sales level. The six forecasts (December 2023 to May 2024) continue the pattern seen at the end of the historical data. The 80% and 95% prediction intervals widen with the horizon, indicating increasing uncertainty further into the future.
The forecasted values seem reasonable, as they follow the last observed patterns of the series and the prediction intervals are consistent with the variability in the historical data.
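
Since the forecasts are seasonally differenced logs, they can be read as approximate year-over-year growth rates and converted back to sales levels by multiplying the same month's value from the previous year by `exp(forecast)`. The sketch below uses the point forecasts printed above; the prior-year sales in `y_prev` are placeholder values (assumed, not taken from the data), so the resulting levels are illustrative only.

```r
# Point forecasts from the output above (Dec 2023 to May 2024).
z_hat <- c(0.10606286, 0.03042964, 0.05471515,
           0.03681277, -0.03463281, 0.03551635)

# Implied year-over-year change in percent.
yoy_pct <- (exp(z_hat) - 1) * 100

# Placeholder prior-year sales for the same six months (hypothetical values).
y_prev <- rep(1000, 6)

# Back-transformed forecast levels: y_t = y_{t-12} * exp(z_t).
y_hat <- y_prev * exp(z_hat)

print(round(yoy_pct, 2))
print(round(y_hat, 1))
```

For example, the December 2023 forecast of 0.106 corresponds to sales roughly 11% above December 2022, while the negative April 2024 forecast implies a year-over-year decline.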

ForecastingMethods_Assignment3

Keerthi Chereddy

2024-02-08
