load("cregist.RData")
train.cregist = ts(cregist[1:223], frequency = 4, start = c(1960,1))
test.cregist = ts(cregist[224:235], frequency = 4)

Question 1

The “cregist.RData” contains quarterly observed passenger car registrations (units) in the United States from January 1960 to July 2018. The data are from FRED (Federal Reserve Economic Data) and you can find more information about the data here. Among the total of 235 observations, we will use the last 12 (224-235) for model validation and the rest (1-223) for model training. In other words, we aim to predict the last 12 observations based on a model fitted to the first 223.

(a)

By visually checking time series, ACF, and PACF plots of the train set, decide what ARIMA (or SARIMA) model seems most appropriate. In other words, specify p, d, q in ARIMA(p, d, q) if you decide on an ARIMA model, or specify p, d, q and P, D, Q, S in ARIMA(p, d, q) × ARIMA(P, D, Q)_S if you decide on a SARIMA model. Explain your answer.

In the ACF plot of the raw series, the autocorrelations don't appear to decay to zero, implying nonstationarity, while the PACF cuts off after lag 2; taken at face value this would suggest p=2, d=0, q=0. Given the nonstationarity, though, we need to difference the series and re-examine the ACF and PACF. For the differenced series the ACF is difficult to interpret: it spikes at lag 1, has a negative spike at lag 2, and shows some more significant values at later lags. The PACF is much easier to interpret, spiking at lag 1 with all later lags insignificant. Since the PACF of the differenced series cuts off after lag 1, I would choose p=1, d=0, q=0 for the differenced data, which corresponds to ARIMA(1, 1, 0) on the original series.
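
As a quick numerical cross-check on the visual call for one difference, forecast::ndiffs estimates the required differencing order from a unit-root test (a sketch; the package's default KPSS test is assumed):

# Unit-root-test-based estimate of d (KPSS test by default)
ndiffs(train.cregist)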

# Checking time series plot
plot.ts(cregist)

# Looking at the ACF and PACF plots. 
par(mfrow=c(1,2))
acf(train.cregist, main = "acf"); 
pacf(train.cregist, main = "pacf")

# difference due to nonstationarity. 
diff.creg = diff(train.cregist, lag=1)
plot.ts(diff.creg)

# check acf and pacf plots of difference
par(mfrow=c(1,2))
acf(diff.creg); pacf(diff.creg)

# Comparing our model to the automatically selected model. 
cregist.model = auto.arima(train.cregist)
summary(cregist.model)
## Series: train.cregist 
## ARIMA(1,1,0) 
## 
## Coefficients:
##           ar1
##       -0.3613
## s.e.   0.0624
## 
## sigma^2 estimated as 2.931e+09:  log likelihood=-2734.21
## AIC=5472.41   AICc=5472.47   BIC=5479.22
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 465.2472 53892.16 36810.86 -0.2965018 5.225752 0.5640233
##                      ACF1
## Training set -0.003495651

(b)

Fit the model which you chose in part (a) on the train set using the sarima function in R. Make comments on the diagnostics plots.

Both fits actually look pretty good. In each case the ACF of the residuals shows no significant values, which suggests there is no remaining autocorrelation in the model. The Ljung–Box p-values for both are also above the significance level, which indicates the residuals behave like white noise.

# Fitting the chosen AR(1) on the differenced series (equivalent to ARIMA(1,1,0) on the original)
sarima(diff.creg, p=1, d=0, q=0)
## initial  value 10.969810 
## iter   2 value 10.899161
## iter   2 value 10.899161
## iter   2 value 10.899161
## final  value 10.899161 
## converged
## initial  value 10.897265 
## iter   2 value 10.897264
## iter   2 value 10.897264
## iter   2 value 10.897264
## final  value 10.897264 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      xmean
##       -0.3614   349.6933
## s.e.   0.0624  2665.9070
## 
## sigma^2 estimated as 2.917e+09:  log likelihood = -2734.2,  aic = 5474.39
## 
## $degrees_of_freedom
## [1] 220
## 
## $ttable
##       Estimate        SE t.value p.value
## ar1    -0.3614    0.0624 -5.7933  0.0000
## xmean 349.6933 2665.9070  0.1312  0.8958
## 
## $AIC
## [1] 24.65943
## 
## $AICc
## [1] 24.65968
## 
## $BIC
## [1] 24.70541
# Fitting the model on the non-differenced series.
# We choose p=2 since the ACF doesn't decay to zero but the PACF cuts off after lag 2 in part (a)
creg.arm = sarima(train.cregist, p=2, d=0, q=0)
## initial  value 11.741102 
## iter   2 value 11.665248
## iter   3 value 10.903557
## iter   4 value 10.891776
## iter   5 value 10.885094
## iter   6 value 10.884351
## iter   7 value 10.884344
## iter   8 value 10.884342
## iter   9 value 10.884342
## iter  10 value 10.884340
## iter  11 value 10.884308
## iter  12 value 10.884303
## iter  13 value 10.884301
## iter  13 value 10.884301
## iter  13 value 10.884301
## final  value 10.884301 
## converged
## initial  value 10.888393 
## iter   2 value 10.888353
## iter   3 value 10.888103
## iter   4 value 10.888076
## iter   5 value 10.888023
## iter   6 value 10.887939
## iter   7 value 10.887816
## iter   8 value 10.887738
## iter   9 value 10.887732
## iter  10 value 10.887710
## iter  11 value 10.887709
## iter  12 value 10.887709
## iter  13 value 10.887709
## iter  14 value 10.887708
## iter  15 value 10.887707
## iter  16 value 10.887706
## iter  16 value 10.887706
## iter  16 value 10.887706
## final  value 10.887706 
## converged
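
Beyond the diagnostics panel that sarima prints, the Ljung–Box statistic can be computed directly on the residuals. A minimal sketch for the AR(2) fit; the lag of 20 and fitdf = 2 (the number of estimated AR coefficients) are my choices:

# Manual Ljung-Box check on the AR(2) residuals; fitdf adjusts the
# degrees of freedom for the two estimated AR coefficients
Box.test(residuals(creg.arm$fit), lag = 20, type = "Ljung-Box", fitdf = 2)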

(c)

By visually checking the time series plot of the train set, decide which ESM model seems most appropriate. Explain your choice.

The first thing I notice is that the series certainly doesn't follow an additive linear trend, so we can assume there is no trend component. There also doesn't appear to be any obvious seasonality. The only real judgment call is the error type, so I think it is safe to test both model="MNN" and model="ANN".
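
A classical decomposition can corroborate the visual call of no strong trend or seasonality before committing to the N,N components (a sketch using stats::decompose on the quarterly train set):

# Decompose into trend, seasonal, and random components for a visual check
plot(decompose(train.cregist))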

# Again, looking at the auto selection for comparison.
creg.ets.auto = ets(train.cregist)
summary(creg.ets.auto)
## ETS(M,N,N) 
## 
## Call:
##  ets(y = train.cregist) 
## 
##   Smoothing parameters:
##     alpha = 0.6207 
## 
##   Initial states:
##     l = 544687.6876 
## 
##   sigma:  0.0761
## 
##      AIC     AICc      BIC 
## 6068.329 6068.438 6078.550 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE     ACF1
## Training set 652.8057 54295.63 37275.05 -0.3037918 5.292607 0.5711357 0.011535

(d)

Fit the model which you chose in part (c) on the train set. Use the ets function, specifying the model input with a three-letter code. Report autoplot() and interpret how your chosen model decomposes the data.

# Fitting model on training set
creg.ets1 = ets(train.cregist, model="MNN") 
creg.ets2 = ets(train.cregist, model="ANN")
# Using autoplot to look at creg.ets1
summary(creg.ets1)
## ETS(M,N,N) 
## 
## Call:
##  ets(y = train.cregist, model = "MNN") 
## 
##   Smoothing parameters:
##     alpha = 0.6207 
## 
##   Initial states:
##     l = 544687.6876 
## 
##   sigma:  0.0761
## 
##      AIC     AICc      BIC 
## 6068.329 6068.438 6078.550 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE     ACF1
## Training set 652.8057 54295.63 37275.05 -0.3037918 5.292607 0.5711357 0.011535
autoplot(creg.ets1)

# Using autoplot to look at creg.ets2
summary(creg.ets2)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = train.cregist, model = "ANN") 
## 
##   Smoothing parameters:
##     alpha = 0.662 
## 
##   Initial states:
##     l = 548759.119 
## 
##   sigma:  54471.77
## 
##      AIC     AICc      BIC 
## 6073.616 6073.725 6083.837 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 584.4343 54226.95 37188.17 -0.3014103 5.277357 0.5698045
##                     ACF1
## Training set -0.03490159
autoplot(creg.ets2)
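
Since neither candidate has a trend or seasonal component, autoplot shows only the observed series and the smoothed level state: the decomposition is simply level plus error, with the level tracking a smoothed version of the data. For a direct numerical comparison, the information criteria are stored on the fitted objects (component names aic, aicc, and bic, as used by forecast::ets):

# Side-by-side information criteria for the two candidate ESMs
rbind(MNN = c(AIC = creg.ets1$aic, AICc = creg.ets1$aicc, BIC = creg.ets1$bic),
      ANN = c(AIC = creg.ets2$aic, AICc = creg.ets2$aicc, BIC = creg.ets2$bic))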

(e)

Make a prediction for the next 12 values based on (i) the ARIMA (or SARIMA) model you chose in part (a) and (ii) the ESM you chose in part (c). Calculate their prediction errors using the test set.

# Making prediction for next 12 for ESM model. 
esm.pred = forecast(creg.ets1, h=12)
plot(esm.pred)

# Scaled MSE; dividing by the constant 20^8 only rescales for readability
mean((esm.pred$mean - as.vector(test.cregist))^2)/20^8
## [1] 0.6422232
# Making prediction for next 12 for ARIMA model.
creg.pred = sarima.for(train.cregist, n.ahead=12,p=2, d=0, q=0)

mean((creg.pred$pred - as.vector(test.cregist))^2)/20^8
## [1] 0.8772961
# Making prediction for next 12 for ARIMA model 2.
creg.pred2 = sarima.for(train.cregist, n.ahead=12,p=1, d=1, q=0)

mean((creg.pred2$pred - as.vector(test.cregist))^2)/20^8
## [1] 0.6757667
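
The same scaled MSE is computed three times above, so a small helper tidies the comparison (a sketch; the helper name mse.scaled is mine, and the 20^8 divisor is just the constant rescaling used above, which does not affect the ranking):

# Hypothetical helper: mean squared prediction error, rescaled by a constant
mse.scaled = function(pred, actual, scale = 20^8) {
  mean((as.vector(pred) - as.vector(actual))^2) / scale
}
c(ESM      = mse.scaled(esm.pred$mean,   test.cregist),
  ARIMA200 = mse.scaled(creg.pred$pred,  test.cregist),
  ARIMA110 = mse.scaled(creg.pred2$pred, test.cregist))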

(f)

Compare prediction performance between ESM and ARIMA.

ESM (MNN) prediction error: 0.64
ARIMA(2,0,0) prediction error: 0.88
ARIMA(1,1,0) prediction error: 0.68

ESM has the lowest prediction error, so it slightly edges out both ARIMA models.

Question 2

The “UnempRate” series in the ‘astsa’ package contains the monthly U.S. unemployment rate, in percent unemployed, from January 1948 to November 2016. Among the 827 observations, we will use the last 36 (792-827) for model validation and the rest (1-791) for model training. In other words, we aim to predict the last 36 observations based on a model fitted to the first 791. We repeat the same process as in Question 1.

data("UnempRate")

train.UnempRate = ts(UnempRate[1:791], frequency = 12, start = c(1948,1))
test.UnempRate = ts(UnempRate[792:827], frequency = 12)

(a)

By visually checking time series, ACF, and PACF plots of the train set, decide what ARIMA (or SARIMA) model seems most appropriate. In other words, specify p, d, q in ARIMA(p, d, q) if you decide on an ARIMA model, or specify p, d, q and P, D, Q, S in ARIMA(p, d, q) × ARIMA(P, D, Q)_S if you decide on a SARIMA model. Explain your answer.

Examining the ACF and PACF plots, they look fairly similar to those in Question 1. The ACF has very large autocorrelations that do not appear to decay to zero, again suggesting nonstationarity. The PACF is definitely more interpretable: it spikes at lag 1, has a smaller significant negative lag, and then falls to zero, although it spikes again at lag 13, which points to a seasonal component. Again, I would suggest differencing here. Even after differencing, the plots do not make the decision clear-cut, but for this series it appears we need a seasonal model, ARIMA(p, d, q) × ARIMA(P, D, Q)_S. Specifically, I choose ARIMA(3, 1, 0) × ARIMA(3, 1, 0)_12: we have to difference the series, the ACF shows a pattern of roughly three significant lags, and the PACF likewise shows what looks like a pattern of three lags. The auto.arima function comes up with a different suggestion, so for comparison's sake we may also use that automatically selected model.
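
As in Question 1, test-based estimates of the ordinary and seasonal differencing orders offer a cross-check on the visual choice (a sketch; forecast's default tests for ndiffs and nsdiffs are assumed):

# Estimated ordinary (d) and seasonal (D) differencing orders
ndiffs(train.UnempRate)
nsdiffs(train.UnempRate)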

#Looking at the time series plot
plot.ts(train.UnempRate)

# Looking at ACF and PACF plot. 
par(mfrow=c(1,2))
acf(train.UnempRate, main = "acf"); 
pacf(train.UnempRate, main = "pacf")

# difference due to nonstationarity. 
diff.un = diff(train.UnempRate, lag=1)
plot.ts(diff.un)

# check acf and pacf plots of difference
par(mfrow=c(1,2))
acf(diff.un); pacf(diff.un)

# Comparing our model to the automatically selected model. 
un.auto = auto.arima(train.UnempRate)
summary(un.auto)
## Series: train.UnempRate 
## ARIMA(3,0,1)(2,1,2)[12] 
## 
## Coefficients:
##          ar1      ar2      ar3      ma1     sar1    sar2    sma1     sma2
##       1.6821  -0.5725  -0.1222  -0.6040  -0.7886  0.0484  0.0521  -0.6233
## s.e.  0.0694   0.1017   0.0396   0.0708   0.0421  0.0505  0.0400   0.0103
## 
## sigma^2 estimated as 0.05549:  log likelihood=18.46
## AIC=-18.93   AICc=-18.69   BIC=22.99
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE    MAPE      MASE
## Training set 0.006590547 0.2325567 0.1764448 0.04619351 3.31106 0.2005991
##                     ACF1
## Training set 0.002053673
# auto.arima selects orders by an AICc search, but its AR(3) component is consistent with the PACF cutting off around lag 3.

(b)

Fit the model which you chose in part (a) on the train set using the sarima function in R. Make comments on the diagnostics plots.

The diagnostics for the model we chose look pretty bad: the Ljung–Box p-values are all very small and the ACF of the residuals has significant spikes at several lags. On the other hand, the automatically selected model appears to perform much better. It still has a few small Ljung–Box p-values, but nothing as extreme as our model, and its residual ACF shows no significant values, which suggests white noise. The only issue is that its fit produces NaN standard errors.

# Fitting our chosen model
un.arm = sarima(train.UnempRate, p=3,d=1,q=0, P=3,D=1,Q=0, S = 12)
## initial  value -1.202846 
## iter   2 value -1.385281
## iter   3 value -1.439128
## iter   4 value -1.457958
## iter   5 value -1.462031
## iter   6 value -1.462196
## iter   7 value -1.462252
## iter   8 value -1.462252
## iter   9 value -1.462252
## iter   9 value -1.462252
## iter   9 value -1.462252
## final  value -1.462252 
## converged
## initial  value -1.410652 
## iter   2 value -1.411286
## iter   3 value -1.411341
## iter   4 value -1.411353
## iter   5 value -1.411353
## iter   6 value -1.411353
## iter   6 value -1.411353
## iter   6 value -1.411353
## final  value -1.411353 
## converged

# Fitting model for auto selected 
un.arm2 = sarima(train.UnempRate, p=3,d=0,q=1, P=2,D=1,Q=2, S = 12)
## initial  value 0.147653 
## iter   2 value -0.651200
## iter   3 value -0.755451
## iter   4 value -0.840112
## iter   5 value -1.210950
## iter   6 value -1.313644
## iter   7 value -1.378907
## iter   8 value -1.421107
## iter   9 value -1.436675
## iter  10 value -1.440643
## iter  11 value -1.440776
## iter  12 value -1.441882
## iter  13 value -1.442247
## iter  14 value -1.442697
## iter  15 value -1.443365
## iter  16 value -1.443703
## iter  17 value -1.444144
## iter  18 value -1.444456
## iter  19 value -1.444954
## iter  20 value -1.446058
## iter  21 value -1.448650
## iter  22 value -1.449432
## iter  23 value -1.450912
## iter  24 value -1.453660
## iter  25 value -1.455280
## iter  26 value -1.456553
## iter  27 value -1.459297
## iter  28 value -1.464943
## iter  29 value -1.470264
## iter  30 value -1.476788
## iter  31 value -1.477630
## iter  32 value -1.479718
## iter  33 value -1.481136
## iter  34 value -1.483538
## iter  35 value -1.484450
## iter  36 value -1.484791
## iter  37 value -1.485177
## iter  38 value -1.485342
## iter  39 value -1.485383
## iter  40 value -1.485387
## iter  40 value -1.485387
## final  value -1.485387 
## converged
## initial  value -1.440434 
## iter   2 value -1.441003
## iter   3 value -1.441875
## iter   4 value -1.442185
## iter   5 value -1.442273
## iter   6 value -1.442279
## iter   7 value -1.442282
## iter   8 value -1.442291
## iter   9 value -1.442292
## iter  10 value -1.442299
## iter  11 value -1.442300
## iter  12 value -1.442302
## iter  13 value -1.442302
## iter  14 value -1.442306
## iter  15 value -1.442308
## iter  16 value -1.442310
## iter  17 value -1.442311
## iter  18 value -1.442312
## iter  19 value -1.442313
## iter  20 value -1.442314
## iter  21 value -1.442315
## iter  22 value -1.442316
## iter  23 value -1.442317
## iter  24 value -1.442319
## iter  25 value -1.442321
## iter  26 value -1.442323
## iter  27 value -1.442328
## iter  28 value -1.442335
## iter  29 value -1.442338
## iter  30 value -1.442350
## iter  31 value -1.442396
## iter  32 value -1.442401
## iter  33 value -1.442407
## iter  34 value -1.442428
## iter  35 value -1.442453
## iter  36 value -1.442475
## iter  37 value -1.442486
## iter  38 value -1.442493
## iter  39 value -1.442498
## iter  40 value -1.442501
## iter  41 value -1.442505
## iter  42 value -1.442510
## iter  43 value -1.442521
## iter  44 value -1.442529
## iter  45 value -1.442554
## iter  46 value -1.442578
## iter  47 value -1.442626
## iter  48 value -1.442645
## iter  49 value -1.442710
## iter  50 value -1.442766
## iter  51 value -1.442774
## iter  52 value -1.442791
## iter  53 value -1.442809
## iter  54 value -1.442820
## iter  55 value -1.442828
## iter  56 value -1.442835
## iter  57 value -1.442839
## iter  58 value -1.442842
## iter  59 value -1.442845
## iter  60 value -1.442856
## iter  61 value -1.442872
## iter  62 value -1.442902
## iter  63 value -1.442925
## iter  64 value -1.442955
## iter  65 value -1.442978
## iter  66 value -1.443003
## iter  67 value -1.443035
## iter  68 value -1.443041
## iter  69 value -1.443044
## iter  70 value -1.443045
## iter  71 value -1.443046
## iter  72 value -1.443050
## iter  73 value -1.443053
## iter  74 value -1.443054
## iter  75 value -1.443057
## iter  76 value -1.443059
## iter  77 value -1.443060
## iter  78 value -1.443060
## iter  79 value -1.443061
## iter  80 value -1.443062
## iter  81 value -1.443062
## iter  82 value -1.443062
## iter  83 value -1.443062
## iter  84 value -1.443063
## iter  85 value -1.443063
## iter  86 value -1.443066
## iter  87 value -1.443067
## iter  88 value -1.443068
## iter  89 value -1.443068
## iter  90 value -1.443069
## iter  91 value -1.443071
## iter  92 value -1.443071
## iter  93 value -1.443071
## iter  94 value -1.443072
## iter  95 value -1.443072
## iter  96 value -1.443072
## iter  97 value -1.443072
## iter  98 value -1.443072
## iter  99 value -1.443072
## iter 100 value -1.443072
## final  value -1.443072 
## stopped after 100 iterations
## Warning in stats::arima(xdata, order = c(p, d, q), seasonal = list(order =
## c(P, : possible convergence problem: optim gave code = 1
## Warning in sqrt(diag(fitit$var.coef)): NaNs produced

## Warning in sqrt(diag(fitit$var.coef)): NaNs produced
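
Side note: the long optimizer traces above come from sarima's verbose default. astsa's sarima takes a details argument that suppresses them, e.g. when refitting the same model:

# Same auto-selected model, with the optimizer trace suppressed
un.arm2.quiet = sarima(train.UnempRate, p=3, d=0, q=1, P=2, D=1, Q=2, S=12,
                       details = FALSE)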

(c)

By visually checking the time series plot of the train set, decide which ESM model seems the most appropriate. Explain your choice.

Looking at the time series plot, I think it's safe to say the error type could be multiplicative or additive; I would also guess that the trend is multiplicative, and seasonality clearly exists, which leads me to believe the best ESM model would be AMA or MMA. Each of these models gives us an error in ets(), though, so it's back to the drawing board. Next we'll try MMM: I believed both the error type and the trend to be multiplicative, and it isn't too far a stretch to assume the seasonality could be as well. We'll compare this model against the automatically selected AAA model and see which one fits best.
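
The "gives us an error" behaviour can be confirmed programmatically: per the errors observed above, ets() rejects these error/trend/season combinations, and wrapping the call in try() records the failure without halting the script (a sketch):

# try() captures the error ets() throws for a disallowed combination
for (m in c("AMA", "MMA")) {
  res = try(ets(train.UnempRate, model = m), silent = TRUE)
  cat(m, if (inherits(res, "try-error")) "rejected" else "fitted", "\n")
}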

plot.ts(train.UnempRate)

# Again, looking at the auto selection for comparison.
un.ets.auto = ets(train.UnempRate)
summary(un.ets.auto)
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = train.UnempRate) 
## 
##   Smoothing parameters:
##     alpha = 0.9007 
##     beta  = 0.139 
##     gamma = 0.0985 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 2.9488 
##     b = -0.4162 
##     s = -0.3114 -0.576 -0.7885 -0.5041 -0.1704 0.2942
##            0.3564 -0.0905 0.1349 0.4263 0.7376 0.4916
## 
##   sigma:  0.2541
## 
##      AIC     AICc      BIC 
## 3130.060 3130.946 3214.179 
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.004677949 0.2513626 0.1936921 0.1064673 3.601387 0.2202074
##                   ACF1
## Training set 0.1239095

(d)

Fit the model which you chose in part (c) on the train set. Use the ets function, specifying the model input with a three-letter code. Report autoplot() and interpret how your chosen model decomposes the data.

The plots of the two models look very similar; the only real differences are in the slope component and, to some extent, the seasonality. Both look much more sharply defined in the AAA model, so that, together with its lower AIC, makes it our choice of best model.

# Fitting model on training set
un.ets1 = ets(train.UnempRate, model="MMM") 
un.ets2 = ets(train.UnempRate, model="AAA")
# These combinations error out in ets(), as noted in part (c):
#un.ets1 = ets(train.UnempRate, model="MMA") 
#un.ets2 = ets(train.UnempRate, model="AMA")
# plotting our MMM model
summary(un.ets1)
## ETS(M,Md,M) 
## 
## Call:
##  ets(y = train.UnempRate, model = "MMM") 
## 
##   Smoothing parameters:
##     alpha = 0.7181 
##     beta  = 0.0463 
##     gamma = 0.2818 
##     phi   = 0.8001 
## 
##   Initial states:
##     l = 2.7868 
##     b = 1.1117 
##     s = 0.9061 0.9019 0.7696 0.8167 0.8686 0.9846
##            1.013 0.9611 1.0824 1.2114 1.2874 1.1972
## 
##   sigma:  0.0539
## 
##      AIC     AICc      BIC 
## 3396.365 3397.251 3480.485 
## 
## Training set error measures:
##                        ME      RMSE      MAE        MPE     MAPE      MASE
## Training set -0.005405733 0.2887437 0.216973 -0.1271209 3.923748 0.2466753
##                   ACF1
## Training set 0.3494624
autoplot(un.ets1)

# plotting auto selected AAA model
summary(un.ets2)
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = train.UnempRate, model = "AAA") 
## 
##   Smoothing parameters:
##     alpha = 0.9007 
##     beta  = 0.139 
##     gamma = 0.0985 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 2.9488 
##     b = -0.4162 
##     s = -0.3114 -0.576 -0.7885 -0.5041 -0.1704 0.2942
##            0.3564 -0.0905 0.1349 0.4263 0.7376 0.4916
## 
##   sigma:  0.2541
## 
##      AIC     AICc      BIC 
## 3130.060 3130.946 3214.179 
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.004677949 0.2513626 0.1936921 0.1064673 3.601387 0.2202074
##                   ACF1
## Training set 0.1239095
autoplot(un.ets2)

(e)

Make a prediction for the next 36 values based on (i) the ARIMA (or SARIMA) model you chose in part (a) and (ii) the ESM you chose in part (c). Calculate prediction errors using the test set.

# Making prediction for next 36 for ESM model. 
un.esm.pred = forecast(un.ets2, h=36)
plot(un.esm.pred)

mean((un.esm.pred$mean - as.vector(test.UnempRate))^2)
## [1] 1.857387
# Making prediction for next 36 for auto selected ARIMA.
un.pred = sarima.for(train.UnempRate, n.ahead=36,p=3,d=0,q=1, P=2,D=1,Q=2, S = 12)
## Warning in stats::arima(xdata, order = c(p, d, q), seasonal = list(order =
## c(P, : possible convergence problem: optim gave code = 1

mean((un.pred$pred - as.vector(test.UnempRate))^2)
## [1] 3.40613
# Quick note: this prediction looks... volatile, so we'll compare it with the other models. 
# Making prediction for next 36 for personally selected ARIMA.
un.pred2 = sarima.for(train.UnempRate, n.ahead=36,p=3,d=1,q=0, P=3,D=1,Q=0, S = 12)

mean((un.pred2$pred - as.vector(test.UnempRate))^2)
## [1] 0.08266696
# Making a prediction for the next 36 from one more SARIMA, included for comparison's sake; this was my original answer to part (a) before changing it.
un.pred3 = sarima.for(train.UnempRate, n.ahead=36,p=1,d=1,q=0, P=1,D=1,Q=0, S = 12)

mean((un.pred3$pred - as.vector(test.UnempRate))^2)
## [1] 0.1239371
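
For the ESM forecast, forecast::accuracy reports a fuller set of test-set error measures than the raw MSE above (a sketch; the 36 test values are passed as a plain vector so they line up with the forecasts):

# RMSE, MAE, MAPE, etc. on both the training set and the 36-step test set
accuracy(un.esm.pred, as.vector(test.UnempRate))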

(f)

Compare prediction performance between ESM and ARIMA.

ESM (AAA) prediction error: 1.86
Hand-picked SARIMA(3,1,0) × (3,1,0)_12 prediction error: 0.083
Auto-selected SARIMA(3,0,1) × (2,1,2)_12 prediction error: 3.41

The hand-picked SARIMA wins decisively here, beating both the ESM and the auto-selected model by more than an order of magnitude.

Question 3

The “nyse.RData” contains returns of the New York Stock Exchange (NYSE) from February 2, 1984 to December 31, 1991.

load("nyse.RData")
nyse = ts(nyse, frequency = 365)
#view(nyse)

(a)

Generate the time series plot of the NYSE returns. Generate ACF and PACF plots from both the raw data and the squared data, respectively. Make comments based on your observations.

Examining the ACF and PACF plots of the raw returns alongside those of the squared returns, the two sets look strikingly similar. Given the number of significant lags in each plot and the near mirror image between them, it is safe to conclude the returns are serially dependent, and the significant autocorrelation in the squared returns in particular points to volatility clustering (ARCH effects).
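
A Ljung–Box test on the squared returns makes the visual evidence for ARCH effects concrete (the lag of 20 is my choice):

# Significant autocorrelation in squared returns indicates ARCH effects
Box.test(nyse^2, lag = 20, type = "Ljung-Box")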

# plotting time series of NYSE returns.
plot(nyse)

# plotting acf and pacf plots
par(mfrow=c(1,2))
acf(nyse, main='acf')
pacf(nyse, main = 'pacf')

# plotting acf and pacf squared plots
par(mfrow=c(1,2))
acf(nyse^2,main = "acf Squared")
pacf(nyse^2,main = "pacf Squared")

(b)

Fit any four time series models of heteroscedasticity, i.e., choose any four models from ARCH(q) or GARCH(q,p) with your own specification of q and p. Then choose the best one based on diagnostic tests on the residuals and the AIC criterion. There is no single correct model; choose the one that you think is best.

It was a tough choice overall, but I'll choose GARCH(2,2) as my best model.

ARCH(1): we can eliminate this one, as its Ljung–Box p-values for both the standardized residuals (R) and their squares (R^2) are essentially zero. It is not able to explain the volatility within the data.

# ARCH(1)
ny.arch1 = garchFit(~garch(1,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
##   Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(1, 0), data = nyse, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(1, 0)
## <environment: 0x000000001fc14f60>
##  [data = nyse]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##         mu       omega      alpha1  
## 6.3533e-04  6.5228e-05  2.6983e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     6.353e-04   1.968e-04    3.228  0.00125 ** 
## omega  6.523e-05   2.484e-06   26.254  < 2e-16 ***
## alpha1 2.698e-01   3.301e-02    8.175 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  6594.697    normalized:  3.297349 
## 
## Description:
##  Fri May 07 16:27:39 2021 by user: FugBoi 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  9019.18   0           
##  Shapiro-Wilk Test  R    W      0.9212042 0           
##  Ljung-Box Test     R    Q(10)  38.38662  3.249771e-05
##  Ljung-Box Test     R    Q(15)  42.17843  0.0002109731
##  Ljung-Box Test     R    Q(20)  45.27366  0.001012957 
##  Ljung-Box Test     R^2  Q(10)  400.3309  0           
##  Ljung-Box Test     R^2  Q(15)  416.9455  0           
##  Ljung-Box Test     R^2  Q(20)  422.2616  0           
##  LM Arch Test       R    TR^2   308.6684  0           
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -6.591697 -6.583296 -6.591702 -6.588612

ARCH(2) has noticeably better-looking R and R^2 test results, so it is a good option to consider.

# ARCH(2)
ny.arch2 = garchFit(~garch(2,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
##   Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch2)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(2, 0), data = nyse, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(2, 0)
## <environment: 0x0000000022d7ab18>
##  [data = nyse]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##         mu       omega      alpha1      alpha2  
## 6.7759e-04  5.5752e-05  1.8776e-01  1.4201e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     6.776e-04   1.889e-04    3.587 0.000334 ***
## omega  5.575e-05   2.352e-06   23.707  < 2e-16 ***
## alpha1 1.878e-01   2.840e-02    6.612 3.79e-11 ***
## alpha2 1.420e-01   2.584e-02    5.497 3.87e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  6660.251    normalized:  3.330126 
## 
## Description:
##  Fri May 07 16:27:40 2021 by user: FugBoi 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  3242.832  0           
##  Shapiro-Wilk Test  R    W      0.9441303 0           
##  Ljung-Box Test     R    Q(10)  25.50612  0.004464403 
##  Ljung-Box Test     R    Q(15)  27.22277  0.02697281  
##  Ljung-Box Test     R    Q(20)  29.02819  0.08720581  
##  Ljung-Box Test     R^2  Q(10)  84.49595  6.561418e-14
##  Ljung-Box Test     R^2  Q(15)  94.69486  1.308953e-13
##  Ljung-Box Test     R^2  Q(20)  99.9624   1.27931e-12 
##  LM Arch Test       R    TR^2   76.41465  1.983314e-11
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -6.656251 -6.645049 -6.656259 -6.652138

We can eliminate ARCH(3) as well, since nearly all of its residual-test p-values are below 0.05.

# ARCH(3)
ny.arch3 = garchFit(~garch(3,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
##   Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch3)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(3, 0), data = nyse, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(3, 0)
## <environment: 0x0000000018cf7528>
##  [data = nyse]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##         mu       omega      alpha1      alpha2      alpha3  
## 7.3578e-04  5.1059e-05  1.7391e-01  5.6969e-02  1.3601e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     7.358e-04   1.842e-04    3.995 6.48e-05 ***
## omega  5.106e-05   2.472e-06   20.655  < 2e-16 ***
## alpha1 1.739e-01   2.901e-02    5.996 2.02e-09 ***
## alpha2 5.697e-02   2.222e-02    2.563   0.0104 *  
## alpha3 1.360e-01   3.288e-02    4.137 3.53e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  6689.963    normalized:  3.344982 
## 
## Description:
##  Fri May 07 16:27:40 2021 by user: FugBoi 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value    
##  Jarque-Bera Test   R    Chi^2  2677.76   0          
##  Shapiro-Wilk Test  R    W      0.9512896 0          
##  Ljung-Box Test     R    Q(10)  26.3873   0.003252767
##  Ljung-Box Test     R    Q(15)  27.59275  0.02426187 
##  Ljung-Box Test     R    Q(20)  29.60461  0.07651877 
##  Ljung-Box Test     R^2  Q(10)  22.56195  0.01248425 
##  Ljung-Box Test     R^2  Q(15)  29.04168  0.01588696 
##  Ljung-Box Test     R^2  Q(20)  32.7092   0.03630334 
##  LM Arch Test       R    TR^2   21.32701  0.04579209 
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -6.684963 -6.670961 -6.684976 -6.679822

GARCH(2,2) is also a strong candidate: the Ljung–Box tests on the squared standardized residuals and the LM ARCH test all have large p-values, so the model captures the volatility dynamics well, and it has the lowest AIC of the four fits.

# GARCH(2,2)
ny.garch1 = garchFit(~garch(2,2), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
##   Consider formula(paste(x, collapse = " ")) instead.
summary(ny.garch1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(2, 2), data = nyse, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(2, 2)
## <environment: 0x0000000021cb98d0>
##  [data = nyse]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##         mu       omega      alpha1      alpha2       beta1       beta2  
## 7.3368e-04  7.9497e-06  1.4136e-01  1.0000e-08  3.7725e-01  3.8309e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     7.337e-04   1.791e-04    4.098 4.18e-05 ***
## omega  7.950e-06   3.367e-06    2.361   0.0182 *  
## alpha1 1.414e-01   2.408e-02    5.872 4.32e-09 ***
## alpha2 1.000e-08   6.478e-02    0.000   1.0000    
## beta1  3.772e-01   2.871e-01    1.314   0.1889    
## beta2  3.831e-01   2.067e-01    1.853   0.0639 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  6726.978    normalized:  3.363489 
## 
## Description:
##  Fri May 07 16:27:40 2021 by user: FugBoi 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value    
##  Jarque-Bera Test   R    Chi^2  3283.62   0          
##  Shapiro-Wilk Test  R    W      0.9532771 0          
##  Ljung-Box Test     R    Q(10)  28.66327  0.001412368
##  Ljung-Box Test     R    Q(15)  29.53388  0.01371783 
##  Ljung-Box Test     R    Q(20)  31.75336  0.04599833 
##  Ljung-Box Test     R^2  Q(10)  2.732222  0.9870426  
##  Ljung-Box Test     R^2  Q(15)  3.796005  0.9983323  
##  Ljung-Box Test     R^2  Q(20)  6.058174  0.9988166  
##  LM Arch Test       R    TR^2   3.330788  0.9927238  
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -6.720978 -6.704175 -6.720996 -6.714808
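
To line up the AICs of the four fits in one place, the information criteria printed in the summaries live in the @fit$ics slot of fGarch's fitted objects (slot layout assumed from the summaries above):

# Collect the AICs of the four candidate models into one named vector
sapply(list(ARCH1 = ny.arch1, ARCH2 = ny.arch2,
            ARCH3 = ny.arch3, GARCH22 = ny.garch1),
       function(m) m@fit$ics["AIC"])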

(c)

Make a prediction for the next 10 volatility values based on your final model.

ny.pred = predict(ny.garch1, n.ahead=10)
ny.pred
##    meanForecast   meanError standardDeviation
## 1  0.0007336844 0.009553450       0.009553450
## 2  0.0007336844 0.010250753       0.010250753
## 3  0.0007336844 0.009869532       0.009869532
## 4  0.0007336844 0.009935794       0.009935794
## 5  0.0007336844 0.009821518       0.009821518
## 6  0.0007336844 0.009787443       0.009787443
## 7  0.0007336844 0.009725359       0.009725359
## 8  0.0007336844 0.009679792       0.009679792
## 9  0.0007336844 0.009632016       0.009632016
## 10 0.0007336844 0.009589499       0.009589499
predict(ny.garch1, n.ahead=10,plot=TRUE, conf=.9)

##    meanForecast   meanError standardDeviation lowerInterval upperInterval
## 1  0.0007336844 0.009553450       0.009553450   -0.01498034    0.01644771
## 2  0.0007336844 0.010250753       0.010250753   -0.01612730    0.01759467
## 3  0.0007336844 0.009869532       0.009869532   -0.01550025    0.01696762
## 4  0.0007336844 0.009935794       0.009935794   -0.01560924    0.01707661
## 5  0.0007336844 0.009821518       0.009821518   -0.01542128    0.01688864
## 6  0.0007336844 0.009787443       0.009787443   -0.01536523    0.01683260
## 7  0.0007336844 0.009725359       0.009725359   -0.01526311    0.01673048
## 8  0.0007336844 0.009679792       0.009679792   -0.01518816    0.01665553
## 9  0.0007336844 0.009632016       0.009632016   -0.01510957    0.01657694
## 10 0.0007336844 0.009589499       0.009589499   -0.01503964    0.01650701
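
The volatility forecast itself is the standardDeviation column; meanError coincides with it here because the mean equation is just a constant. Extracting it directly:

# 10-step-ahead conditional standard deviation (volatility) forecast
ny.pred$standardDeviation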