Getting a clean working environment:

rm(list = ls())

Setting the working directory to my class folder:

setwd("/Users/noam/Desktop/Brandeis University/Graduate/Spring 2022/Forecasting/R Files")

Loading the data:

library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.5     ✓ fma       2.4  
## ✓ forecast  8.16      ✓ expsmooth 2.3
## 
data <- read.csv("UsedCarSales.csv", header = TRUE)
data
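
For a quicker look at the data without printing every row, a structural check works too (a minimal sketch using base R):

# Inspect dimensions, column types, and the first year of observations
str(data)
head(data, 12)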

1.

cars.ts <- ts(data$CarSales, start = c(1992,1), frequency = 12)
nValid <- 12
nTrain <- length(cars.ts) - nValid
train.ts <- window(cars.ts, start = c(1992,1), end = c(1992, nTrain))
valid.ts <- window(cars.ts, start = c(1992, nTrain+1), end = c(1992, nTrain + nValid))

# AUTOMATED MODEL SELECTION: set model "ZZZ" 
auto <- ets(train.ts, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE)
auto
## ETS(M,A,M) 
## 
## Call:
##  ets(y = train.ts, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.3438 
##     beta  = 1e-04 
##     gamma = 0.2742 
## 
##   Initial states:
##     l = 1936.6385 
##     b = 29.9296 
##     s = 0.7943 0.8872 1.0217 0.9736 1.0521 1.0907
##            1.053 1.0383 1.1469 1.0473 0.9804 0.9144
## 
##   sigma:  0.057
## 
##      AIC     AICc      BIC 
## 6054.171 6056.026 6119.659
a. The output is ETS(M,A,M).
The first letter stands for the type of error term: here it is M, meaning *multiplicative* error.
The second letter stands for the trend: here it is A, meaning *additive* trend.
The third letter stands for the seasonality: here it is M, meaning *multiplicative* seasonality.
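
As a sanity check (a minimal sketch), the same specification can be requested explicitly by passing the three letters to ets; the result should match the automated selection:

# Fit the M,A,M specification explicitly; should reproduce the automated choice
manual <- ets(train.ts, model = "MAM")
manual$method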

Generating forecasts from the automated model and plotting:

auto.pred <- forecast(auto, h = nValid, level = 0)
plot(auto.pred, ylim = c(1300, 18000), ylab = "Used Car Sales", xlab = "Time",
     bty = "l", xaxt = "n", xlim = c(1992, 2021), main = "", flty = 2)
axis(1, at = seq(1992, 2021, 1), labels = format(seq(1992, 2021, 1)))
lines(auto.pred$fitted, lwd = 2, col ="blue")
lines(valid.ts)

b. According to the plot, the model does rather poorly at predicting the sharp spike in used car sales over the validation period.
This is somewhat expected given the influence of the COVID-19 outbreak on used car sales. It is hard to believe any time series model could have
successfully predicted this spike, so we should not judge this model too severely. It does a pretty good job over the training period.
For more information on why this happened, see:
https://fortune.com/2021/06/24/pandemic-demand-for-used-cars-have-contributed-to-the-recent-spike-in-us-inflation/
https://fortune.com/2022/02/10/used-car-prices-soar-45-percent-in-a-year/

Automated Model Training Residuals:

checkresiduals(auto.pred)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,M)
## Q* = 55.634, df = 8, p-value = 3.324e-09
## 
## Model df: 16.   Total lags used: 24
Overall, the training-period residuals look roughly Gaussian, and most of the ACF spikes fall within the blue significance bounds. Note, though, that the Ljung-Box test above rejects the null of uncorrelated residuals (p ≈ 3.3e-09), so some autocorrelation remains. Still, as I said, the model does a pretty good job over the training period.
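
The Ljung-Box statistic reported by checkresiduals() can be reproduced directly with Box.test() from base R (a sketch; the lag and fitdf values are taken from the output above):

# Reproduce the Ljung-Box test: 24 lags minus 16 model df leaves 8 test df
Box.test(residuals(auto), lag = 24, type = "Ljung-Box", fitdf = 16)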

Automated Model Validation Residuals Only:

Considering the residuals over the validation period, however, we see that they are much larger and do not follow a normal distribution:
valid.err <- valid.ts - auto.pred$mean
checkresiduals(valid.err)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
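
To see the shape of the validation errors directly (a minimal sketch using base graphics):

# Visual normality check for the validation errors
hist(valid.err, main = "Validation errors", xlab = "Error")
qqnorm(valid.err)
qqline(valid.err)  # strong departures from the line indicate non-normality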

Accuracy measures and discussion:

accuracy(auto.pred, valid.ts)
##                       ME      RMSE      MAE        MPE      MAPE      MASE
## Training set   -8.829618  425.7005  238.132 -0.4895708  3.949568 0.5008128
## Test set     3274.369558 3767.3693 3415.223 20.8391933 22.091891 7.1825174
##                   ACF1 Theil's U
## Training set 0.3455564        NA
## Test set     0.2958598  1.454582
From the class forum:

"Key determinates of a good model are (1) consistency of the values of the performance metrics between training and validation periods - i.e., model robustness, and (2) low values for the metrics - i.e., residuals are low. In general, choosing between 2 models, pick the one that achieves better results for (1) and (2). Going further, we also consider whether any "information" is contained in the residual (and hence not captured within the model)."- Professor Dumas

In the context of this problem, the table above shows that the values of the performance metrics for the training set and the test set are clearly inconsistent, indicating poor model robustness. In addition, the MAPE is pretty poor for the test set, while pretty good for the training set.

A MAPE value of more than 20% is considered an "OK" value according to Stephen Allwright (see https://stephenallwright.com/good-mape-score/), whereas a good MAPE value would be below 10%. This source https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119199885.app1 also quotes 10% as a good benchmark (Michael Gilliland, 2010: Forecasting FAQs, page 196).

The MAPE metric is especially relevant for this problem because the residuals over the validation period are skewed left. At the same time, the other measures remain relevant, given that the residuals are roughly Gaussian over the training period.
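
The test-set MAPE in the table can also be recomputed by hand (a sketch), which makes clear how the COVID-era spike inflates it:

# MAPE = mean absolute percentage error over the validation period
mean(abs((valid.ts - auto.pred$mean) / valid.ts)) * 100  # ~22.1, matching accuracy()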

2.

a. Assuming that by "automated model selection" this question refers to the ets function in R, the answer is (3):
the likelihood over the training period with a penalty on the number of parameters.

From Shmueli:
"How does ets define "best" for choosing between models with different numbers of parameters? It uses a criterion called Akaike’s Information Criterion (AIC), which combines fit to the training data with a penalty for the number of smoothing parameters and initial values (L0, T0, etc.) included in a model." (Shmueli, page 100.)

AIC = −log(likelihood) + 2p

"where likelihood is a measure of fit to the training period and p is the number of smoothing parameters and initial states in the model. The concept is similar
to adjusted-R squared used in regression models, which combines the model’s
fit with a penalty on the number of parameters".

This penalty makes sense: fewer parameters mean a simpler model, and the more complicated a model is, the more prone it is to overfitting the data.
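
To see the penalty at work (a minimal sketch), compare the AIC of the selected model against a deliberately simpler specification; ets stores the criterion in the fitted object:

# Compare AIC across a simple and a richer specification; lower is better
simple <- ets(train.ts, model = "ANN")  # level only: no trend, no seasonality
c(simple = simple$aic, selected = auto$aic)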

b. There is no trend. This is equivalent to saying that the trend component is N in the ets function in R.
I can show this by taking the lag-1 difference, which removes a trend, and letting ets identify the components of the differenced series:
#  Create difference LAG = 1 to REMOVE TREND
diff1.ts <- diff(cars.ts, lag = 1)
nValid <- 12
nTrain <- length(diff1.ts) - nValid
train1.ts <- window(diff1.ts, start = c(1992,1), end = c(1992, nTrain))
## Warning in window.default(x, ...): 'start' value not changed
valid1.ts <- window(diff1.ts, start = c(1992, nTrain+1), end = c(1992, nTrain + nValid))

(The warning above appears because differencing drops the first observation, so diff1.ts starts in February 1992 and the requested January start is kept unchanged.)

# AUTOMATED MODEL SELECTION: set model "ZZZ" 
auto1 <- ets(train1.ts, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE)
auto1
## ETS(A,N,A) 
## 
## Call:
##  ets(y = train1.ts, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 2e-04 
##     gamma = 0.3655 
## 
##   Initial states:
##     l = 7.7927 
##     s = 430.4313 -76.7463 -501.3143 136.9706 -471.4353 5.9876
##            -53.9154 -80.3998 155.6504 -642.8276 320.6401 776.9587
## 
##   sigma:  429.0121
## 
##      AIC     AICc      BIC 
## 6233.124 6234.579 6290.821
As we see here, the ets output for the lag-1 differenced series reports no beta value at all: with no trend component left to smooth, there is no beta parameter to estimate.
Accordingly, ets returns ETS(A,N,A), where N stands for "no trend".
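
A complementary check (a sketch using tslm from the forecast package, already loaded via fpp2): regress the differenced series on a linear trend and confirm the slope is negligible:

# Fit a linear trend to the differenced series; the trend coefficient should be near zero
trend.fit <- tslm(diff1.ts ~ trend)
summary(trend.fit)$coefficients["trend", ]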

c. Overfitting. 

"The α value that optimizes one-step-ahead predictive accuracy over the training period can be used to determine the degree
of local versus global nature of the level. However, beware of choosing the "best α" for forecasting, as this can lead to model overfitting and low predictive accuracy over the validation pe- riod and/or future." (Shmueli, P. 89)
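
A sketch of the danger: sweep α over a grid with ses() (simple exponential smoothing from the forecast package; the grid values are arbitrary) and compare training versus validation RMSE. The α that minimizes training error need not minimize validation error:

# Training-optimal alpha is not necessarily validation-optimal
alphas <- seq(0.1, 0.9, by = 0.1)
rmse <- t(sapply(alphas, function(a) {
  fit <- ses(train.ts, alpha = a, h = nValid)
  accuracy(fit, valid.ts)[, "RMSE"]  # row 1 = training, row 2 = test
}))
cbind(alpha = alphas, rmse)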

3.

In general, our goal here is to minimize the model's root mean squared error (RMSE); a model with a lower RMSE performs better.
In this case we have two models: one performs better over the training period, while the other performs better over the validation period.

a. Model a, since its RMSE is lower over the training period, meaning it fits the training data better than model b. It will be more useful for descriptive analysis, as it is more accurate in describing the behavior of past data (though possibly overfitting).

b. Model b, since its RMSE is lower over the validation period, meaning it fits the validation data better than model a.
This model will be more useful for predictive analysis, as it is more accurate in forecasting future data.
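
In R terms (a hypothetical sketch: model.a and model.b below are stand-ins, since the question's two models are not specified), the comparison reads directly off accuracy():

# Hypothetical stand-ins for models a and b; compare RMSE on both periods
model.a <- ets(train.ts, model = "MAM")
model.b <- ets(train.ts, model = "AAA")
accuracy(forecast(model.a, h = nValid), valid.ts)[, "RMSE"]
accuracy(forecast(model.b, h = nValid), valid.ts)[, "RMSE"]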

Bonus:

Assuming the automated ets function uses Holt-Winters exponential smoothing, with the trend recognized as additive and the seasonality as multiplicative (M,A,M), the general updating equations for the level, trend, and seasonality, smoothed by alpha, beta, and gamma respectively, are the following (M denotes the number of seasons, here 12):

1. General Level:

L_t = \alpha (y_t / S_{t-M}) + (1 - \alpha)(L_{t-1} + T_{t-1})

2. General Trend: 

T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}

3. General Seasonality:

S_t = \gamma (y_t / L_t) + (1 - \gamma) S_{t-M}

Now, considering the model specifications (smoothing parameters) which were determined earlier:

Smoothing parameters:

alpha = 0.3438 

beta  = 1e-04 

gamma = 0.2742 

We can substitute the constants we received into each equation, and get:

1. Model Level:

L_t = 0.3438 (y_t / S_{t-M}) + (1 - 0.3438)(L_{t-1} + T_{t-1})

2. Model Trend: 

T_t = 0.0001 (L_t - L_{t-1}) + (1 - 0.0001) T_{t-1}

3. Model Seasonality:

S_t = 0.2742 (y_t / L_t) + (1 - 0.2742) S_{t-M}
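
As a numerical check (a sketch that assumes the state layout used by forecast::ets, where column s1 of auto$states holds the most recent seasonal index and s12 the one from eleven months back), the one-step-ahead forecast implied by these equations can be computed by hand from the final states and compared with forecast():

# Reproduce the one-step-ahead forecast from the final level, trend, and seasonal states
st <- auto$states[nrow(auto$states), ]
manual.f1 <- (st["l"] + st["b"]) * st["s12"]  # s12 is the seasonal index that applies next month
c(manual = unname(manual.f1), forecast = forecast(auto, h = 1)$mean[1])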