load("cregist.RData")
train.cregist = ts(cregist[1:223], frequency = 4, start = c(1960,1))
test.cregist = ts(cregist[224:235], frequency = 4)
The file “cregist.RData” contains quarterly observed passenger car registrations (units) in the United States from January 1960 to July 2018. The data come from FRED (Federal Reserve Economic Data), and you can find more information about the data here. Of the 235 total observations, we use the last 12 (224-235) for model validation and the rest (1-223) for model training. In other words, we aim to predict the last 12 observations based on a model fitted to the first 223.
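The code throughout this write-up relies on a few packages, presumably loaded earlier in the full document; for completeness, a minimal setup would be:
library(forecast) # auto.arima(), ets(), forecast(), autoplot()
library(astsa)    # sarima(), sarima.for(), and the UnempRate data
library(fGarch)   # garchFit() for the ARCH/GARCH models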
By visually checking the time series, ACF, and PACF plots of the train set, decide what ARIMA (or SARIMA) model seems most appropriate. In other words, specify p, d, q in ARIMA(p, d, q) if you decide on an ARIMA model, or specify p, d, q and P, D, Q, S in ARIMA(p, d, q) × ARIMA(P, D, Q)_S if you decide on a SARIMA model. Explain your answer.
Looking at the ACF plot, the autocorrelations decay very slowly and never drop to zero, implying nonstationarity, while the PACF plot cuts off after lag 2; if the series were stationary this would suggest p=2, d=0, q=0. Given the nonstationarity, we need to take the first difference of the series and re-examine the ACF and PACF plots. The differenced ACF is difficult to interpret: it spikes at lag 1, has a negative spike at lag 2, and shows a few more significant values at later lags. The differenced PACF is much easier to interpret: it spikes at lag 1 and all later lags are insignificant. Since the PACF cuts off after lag 1 on the differenced series, I choose an AR(1) on the differences, which is equivalent to ARIMA(1,1,0) on the original series.
# Checking the time series plot of the full series
plot.ts(cregist)
# Looking at ACF and PACF plot.
par(mfrow=c(1,2))
acf(train.cregist, main = "acf");
pacf(train.cregist, main = "pacf")
# difference due to nonstationarity.
diff.creg = diff(train.cregist, lag=1)
plot.ts(diff.creg)
# check acf and pacf plots of difference
par(mfrow=c(1,2))
acf(diff.creg); pacf(diff.creg)
# Comparing our model to the automatically selected model.
cregist.model = auto.arima(train.cregist)
summary(cregist.model)
## Series: train.cregist
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.3613
## s.e. 0.0624
##
## sigma^2 estimated as 2.931e+09: log likelihood=-2734.21
## AIC=5472.41 AICc=5472.47 BIC=5479.22
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 465.2472 53892.16 36810.86 -0.2965018 5.225752 0.5640233
## ACF1
## Training set -0.003495651
Fit the model which you chose in part (a) on the train set using the sarima function in R. Comment on the diagnostic plots.
Both fits actually look pretty good. In each case the ACF of the residuals shows no significant values, suggesting there is no remaining autocorrelation in the model. The Ljung-Box test p-values for both models are also above the significance level, which is consistent with the residuals being white noise.
# Fitting the chosen AR(1) model to the differenced series
sarima(diff.creg, p=1, d=0, q=0)
## initial value 10.969810
## ... (optimizer iterations omitted) ...
## final value 10.899161
## converged
## initial value 10.897265
## ... (optimizer iterations omitted) ...
## final value 10.897264
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 xmean
## -0.3614 349.6933
## s.e. 0.0624 2665.9070
##
## sigma^2 estimated as 2.917e+09: log likelihood = -2734.2, aic = 5474.39
##
## $degrees_of_freedom
## [1] 220
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3614 0.0624 -5.7933 0.0000
## xmean 349.6933 2665.9070 0.1312 0.8958
##
## $AIC
## [1] 24.65943
##
## $AICc
## [1] 24.65968
##
## $BIC
## [1] 24.70541
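As a sanity check, the AR(1) fit on the differenced series above is essentially the same model as ARIMA(1,1,0) on the original series; note the ar1 estimate here (-0.3614) matches auto.arima's (-0.3613). A minimal sketch of the equivalent direct fit:
# Equivalent specification: ARIMA(1,1,0) fitted to the undifferenced
# train set should reproduce ar1 of roughly -0.361.
sarima(train.cregist, p = 1, d = 1, q = 0)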
# Fitting the model to the nondifferenced series;
# we chose p=2 in part (a) since the ACF doesn't decay to zero but the PACF cuts off after lag 2
creg.arm = sarima(train.cregist, p=2, d=0, q=0)
## initial value 11.741102
## ... (optimizer iterations omitted) ...
## final value 10.884301
## converged
## initial value 10.888393
## ... (optimizer iterations omitted) ...
## final value 10.887706
## converged
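As a cross-check on the Ljung-Box panel that sarima draws, the same test can be run directly on the residuals of the AR(2) fit; a minimal sketch (creg.arm$fit is the stats::arima object inside the sarima output):
# Ljung-Box test on the AR(2) residuals; fitdf = p + q = 2.
Box.test(residuals(creg.arm$fit), lag = 12, type = "Ljung-Box", fitdf = 2)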
By visually checking the time series plot of the train set, decide which ESM model seems most appropriate. Explain your choice.
The first thing I notice is that the series shows no clear linear growth, so we can assume there is no trend component. There also doesn't appear to be any obvious seasonality. The only real question is the error type, so I think it is safe to test both model="MNN" and model="ANN".
# Again, looking at the auto selection for comparison.
creg.ets.auto = ets(train.cregist)
summary(creg.ets.auto)
## ETS(M,N,N)
##
## Call:
## ets(y = train.cregist)
##
## Smoothing parameters:
## alpha = 0.6207
##
## Initial states:
## l = 544687.6876
##
## sigma: 0.0761
##
## AIC AICc BIC
## 6068.329 6068.438 6078.550
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 652.8057 54295.63 37275.05 -0.3037918 5.292607 0.5711357 0.011535
Fit the model which you chose in part (c) on the train set. Use the ets function, specifying the model input with three letters. Report autoplot() and interpret how your chosen model decomposes the data.
# Fitting model on training set
creg.ets1 = ets(train.cregist, model="MNN")
creg.ets2 = ets(train.cregist, model="ANN")
# Summary and autoplot of creg.ets1
summary(creg.ets1)
## ETS(M,N,N)
##
## Call:
## ets(y = train.cregist, model = "MNN")
##
## Smoothing parameters:
## alpha = 0.6207
##
## Initial states:
## l = 544687.6876
##
## sigma: 0.0761
##
## AIC AICc BIC
## 6068.329 6068.438 6078.550
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 652.8057 54295.63 37275.05 -0.3037918 5.292607 0.5711357 0.011535
autoplot(creg.ets1)
# Summary and autoplot of creg.ets2
summary(creg.ets2)
## ETS(A,N,N)
##
## Call:
## ets(y = train.cregist, model = "ANN")
##
## Smoothing parameters:
## alpha = 0.662
##
## Initial states:
## l = 548759.119
##
## sigma: 54471.77
##
## AIC AICc BIC
## 6073.616 6073.725 6083.837
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 584.4343 54226.95 37188.17 -0.3014103 5.277357 0.5698045
## ACF1
## Training set -0.03490159
autoplot(creg.ets2)
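Interpreting the decompositions: since both fitted models have no trend and no seasonal component (the middle letters are N, N), autoplot() shows only the observed series and the smoothed level, which tracks the registrations closely. To choose between the two error types, the information criteria stored on the ets objects can be compared directly (matching the summaries above, MNN is slightly lower):
# Side-by-side AIC for the two candidate ESMs.
c(MNN = creg.ets1$aic, ANN = creg.ets2$aic)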
Make a prediction for the next 12 values based on (i) the ARIMA (or SARIMA) model you chose in part (a) and (ii) the ESM you chose in part (c). Calculate their prediction errors using the test set.
# Making prediction for next 12 for ESM model.
esm.pred = forecast(creg.ets1, h=12)
plot(esm.pred)
# Test-set MSE, scaled by 20^8 purely to keep the printed number small
mean((esm.pred$mean - as.vector(test.cregist))^2)/20^8
## [1] 0.6422232
# Making prediction for next 12 for ARIMA model.
creg.pred = sarima.for(train.cregist, n.ahead=12,p=2, d=0, q=0)
mean((creg.pred$pred - as.vector(test.cregist))^2)/20^8
## [1] 0.8772961
# Making prediction for next 12 for ARIMA model 2.
creg.pred2 = sarima.for(train.cregist, n.ahead=12,p=1, d=1, q=0)
mean((creg.pred2$pred - as.vector(test.cregist))^2)/20^8
## [1] 0.6757667
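To avoid repeating the scaled-MSE expression three times, a small convenience wrapper could be used (scaled_mse is a hypothetical helper of my own, not part of the assignment; the 20^8 divisor only rescales the large squared units so the errors print as small numbers):
# Hypothetical wrapper around the test-set MSE computations above.
scaled_mse = function(pred, actual, scale = 20^8) {
  mean((as.vector(pred) - as.vector(actual))^2) / scale
}
scaled_mse(esm.pred$mean, test.cregist)   # ~0.64
scaled_mse(creg.pred$pred, test.cregist)  # ~0.88
scaled_mse(creg.pred2$pred, test.cregist) # ~0.68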
Compare prediction performance between ESM and ARIMA.
ESM prediction error: 0.64
ARIMA(2,0,0) prediction error: 0.88
ARIMA(1,1,0) prediction error: 0.68
ESM has the lowest prediction error, so it slightly edges out both ARIMA models.
The “UnempRate” series in the ‘astsa’ package contains the monthly U.S. unemployment rate (percent unemployed) from January 1948 to November 2016. Among the 827 observations, we use the last 36 (792-827) for model validation and the rest (1-791) for model training. In other words, we aim to predict the last 36 observations based on a model fitted to the first 791. We repeat the same process as in Question 1.
data("UnempRate")
train.UnempRate = ts(UnempRate[1:791], frequency = 12, start = c(1948,1))
test.UnempRate = ts(UnempRate[792:827], frequency = 12)
By visually checking the time series, ACF, and PACF plots of the train set, decide what ARIMA (or SARIMA) model seems most appropriate. In other words, specify p, d, q in ARIMA(p, d, q) if you decide on an ARIMA model, or specify p, d, q and P, D, Q, S in ARIMA(p, d, q) × ARIMA(P, D, Q)_S if you decide on a SARIMA model. Explain your answer.
Examining the ACF and PACF plots, they look fairly similar to those in Question 1. The ACF has very large autocorrelations that don't appear to decay to zero, again suggesting nonstationarity. The PACF is more interpretable: it spikes at lag 1, has a smaller negative significant spike, and then falls to zero, except that it spikes again at lag 13, hinting at annual seasonality. So, again, I take a first difference. The differenced plots still don't support a concrete decision, but the seasonal behavior suggests a SARIMA model, specifically ARIMA(3,1,0) × ARIMA(3,1,0)_12, i.e. p=3, d=1, q=0 and P=3, D=1, Q=0 with S=12. I choose this because, after differencing, the ACF shows a pattern of roughly three significant lags and the PACF shows what also looks like a pattern of three lags. The auto.arima function comes up with a different suggestion, so for comparison's sake we also fit that automatically selected model.
#Looking at the time series plot
plot.ts(train.UnempRate)
# Looking at ACF and PACF plot.
par(mfrow=c(1,2))
acf(train.UnempRate, main = "acf");
pacf(train.UnempRate, main = "pacf")
# difference due to nonstationarity.
diff.un = diff(train.UnempRate, lag=1)
plot.ts(diff.un)
# check acf and pacf plots of difference
par(mfrow=c(1,2))
acf(diff.un); pacf(diff.un)
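Since the chosen model includes a seasonal difference (D = 1 with S = 12), it also helps to inspect the series after applying the seasonal difference on top of the first difference; a minimal sketch:
# Lag-12 seasonal difference of the first-differenced series,
# matching the d = 1, D = 1, S = 12 specification chosen above.
diff2.un = diff(diff.un, lag = 12)
par(mfrow = c(1, 2))
acf(diff2.un, main = "acf"); pacf(diff2.un, main = "pacf")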
# Comparing our model to the automatically selected model.
un.auto = auto.arima(train.UnempRate)
summary(un.auto)
## Series: train.UnempRate
## ARIMA(3,0,1)(2,1,2)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sar2 sma1 sma2
## 1.6821 -0.5725 -0.1222 -0.6040 -0.7886 0.0484 0.0521 -0.6233
## s.e. 0.0694 0.1017 0.0396 0.0708 0.0421 0.0505 0.0400 0.0103
##
## sigma^2 estimated as 0.05549: log likelihood=18.46
## AIC=-18.93 AICc=-18.69 BIC=22.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.006590547 0.2325567 0.1764448 0.04619351 3.31106 0.2005991
## ACF1
## Training set 0.002053673
# I think it's safe to assume auto.arima gets the 3 from the ACF cutting off at lag 3.
Fit the model which you chose in part (a) on the train set using the sarima function in R. Comment on the diagnostic plots.
The diagnostics for the model we suggested look pretty bad: the Ljung-Box p-values are all very small and the ACF of the residuals has significant spikes at several lags. The automatically selected model appears to perform much better: it still has a few small Ljung-Box p-values, but nothing as severe as our model, and its residual ACF shows no significant values, which is consistent with white noise. The only issue is that its fit produces NaN values (in the coefficient standard errors, per the warnings below).
# Fitting our chosen model
un.arm = sarima(train.UnempRate, p=3,d=1,q=0, P=3,D=1,Q=0, S = 12)
## initial value -1.202846
## ... (optimizer iterations omitted) ...
## final value -1.462252
## converged
## initial value -1.410652
## ... (optimizer iterations omitted) ...
## final value -1.411353
## converged
# Fitting model for auto selected
un.arm2 = sarima(train.UnempRate, p=3,d=0,q=1, P=2,D=1,Q=2, S = 12)
## initial value 0.147653
## ... (optimizer iterations omitted) ...
## final value -1.485387
## converged
## initial value -1.440434
## ... (100 optimizer iterations omitted) ...
## final value -1.443072
## stopped after 100 iterations
## Warning in stats::arima(xdata, order = c(p, d, q), seasonal = list(order =
## c(P, : possible convergence problem: optim gave code = 1
## Warning in sqrt(diag(fitit$var.coef)): NaNs produced
## Warning in sqrt(diag(fitit$var.coef)): NaNs produced
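Beyond the diagnostic plots, the two fits can be compared numerically: sarima() returns $AIC, $AICc, and $BIC alongside $fit (as seen in the Question 1 output), so a quick comparison is:
# Information criteria for the chosen vs. auto-selected SARIMA fits.
rbind(chosen = c(AIC = un.arm$AIC, AICc = un.arm$AICc, BIC = un.arm$BIC),
      auto = c(AIC = un.arm2$AIC, AICc = un.arm2$AICc, BIC = un.arm2$BIC))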
By visually checking the time series plot of the train set, decide which ESM model seems the most appropriate. Explain your choice.
Looking at the time series plot, the error type could be either multiplicative or additive, the trend looks multiplicative, and seasonality clearly exists, which initially suggests AMA or MMA as the best ESM models. However, ets() returns an error for both of those specifications, so it's back to the drawing board. Next we try MMM: since I believed both the error and trend to be multiplicative, it isn't too far a stretch to let the seasonality be multiplicative as well. We'll compare this model against the automatically selected AAA model and see which one fits best.
plot.ts(train.UnempRate)
# Again, looking at the auto selection for comparison.
un.ets.auto = ets(train.UnempRate)
summary(un.ets.auto)
## ETS(A,Ad,A)
##
## Call:
## ets(y = train.UnempRate)
##
## Smoothing parameters:
## alpha = 0.9007
## beta = 0.139
## gamma = 0.0985
## phi = 0.8
##
## Initial states:
## l = 2.9488
## b = -0.4162
## s = -0.3114 -0.576 -0.7885 -0.5041 -0.1704 0.2942
## 0.3564 -0.0905 0.1349 0.4263 0.7376 0.4916
##
## sigma: 0.2541
##
## AIC AICc BIC
## 3130.060 3130.946 3214.179
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.004677949 0.2513626 0.1936921 0.1064673 3.601387 0.2202074
## ACF1
## Training set 0.1239095
Fit the model which you chose in part (c) on the train set. Use the ets function, specifying the model input with three letters. Report autoplot() and interpret how your chosen model decomposes the data.
Looking at the autoplot() output for the two models, the decompositions are very similar. The main differences are in the slope component and, to some extent, the seasonal component, which looks much more clearly defined in the AAA model. That, together with its lower AIC (3130 vs. 3396), makes it our choice for the best model.
# Fitting model on training set
un.ets1 = ets(train.UnempRate, model="MMM")
un.ets2 = ets(train.UnempRate, model="AAA")
#un.ets1 = ets(train.UnempRate, model="MMA")
#un.ets2 = ets(train.UnempRate, model="AMA")
# Summary and autoplot of our MMM model
summary(un.ets1)
## ETS(M,Md,M)
##
## Call:
## ets(y = train.UnempRate, model = "MMM")
##
## Smoothing parameters:
## alpha = 0.7181
## beta = 0.0463
## gamma = 0.2818
## phi = 0.8001
##
## Initial states:
## l = 2.7868
## b = 1.1117
## s = 0.9061 0.9019 0.7696 0.8167 0.8686 0.9846
## 1.013 0.9611 1.0824 1.2114 1.2874 1.1972
##
## sigma: 0.0539
##
## AIC AICc BIC
## 3396.365 3397.251 3480.485
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.005405733 0.2887437 0.216973 -0.1271209 3.923748 0.2466753
## ACF1
## Training set 0.3494624
autoplot(un.ets1)
# Summary and autoplot of the auto-selected AAA model
summary(un.ets2)
## ETS(A,Ad,A)
##
## Call:
## ets(y = train.UnempRate, model = "AAA")
##
## Smoothing parameters:
## alpha = 0.9007
## beta = 0.139
## gamma = 0.0985
## phi = 0.8
##
## Initial states:
## l = 2.9488
## b = -0.4162
## s = -0.3114 -0.576 -0.7885 -0.5041 -0.1704 0.2942
## 0.3564 -0.0905 0.1349 0.4263 0.7376 0.4916
##
## sigma: 0.2541
##
## AIC AICc BIC
## 3130.060 3130.946 3214.179
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.004677949 0.2513626 0.1936921 0.1064673 3.601387 0.2202074
## ACF1
## Training set 0.1239095
autoplot(un.ets2)
Make a prediction for the next 36 values based on (i) the ARIMA (or SARIMA) model you chose in part (a) and (ii) the ESM you chose in part (c). Calculate the prediction errors using the test set.
# Making prediction for next 36 for ESM model.
un.esm.pred = forecast(un.ets2, h=36)
plot(un.esm.pred)
mean((un.esm.pred$mean - as.vector(test.UnempRate))^2)
## [1] 1.857387
# Making prediction for next 36 for auto selected ARIMA.
un.pred = sarima.for(train.UnempRate, n.ahead=36,p=3,d=0,q=1, P=2,D=1,Q=2, S = 12)
## Warning in stats::arima(xdata, order = c(p, d, q), seasonal = list(order =
## c(P, : possible convergence problem: optim gave code = 1
mean((un.pred$pred - as.vector(test.UnempRate))^2)
## [1] 3.40613
# quick note: this prediction looks... volatile, so we'll compare with the other models.
# Making prediction for next 36 for personally selected ARIMA.
un.pred2 = sarima.for(train.UnempRate, n.ahead=36,p=3,d=1,q=0, P=3,D=1,Q=0, S = 12)
mean((un.pred2$pred - as.vector(test.UnempRate))^2)
## [1] 0.08266696
# Prediction for the next 36 from one more ARIMA, included for my own comparison's sake;
# this was my original answer for part (a) before changing it.
un.pred3 = sarima.for(train.UnempRate, n.ahead=36,p=1,d=1,q=0, P=1,D=1,Q=0, S = 12)
mean((un.pred3$pred - as.vector(test.UnempRate))^2)
## [1] 0.1239371
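As a sanity check on the hand-computed MSE, forecast::accuracy() reports test-set errors for the ESM forecast directly (its RMSE squared should roughly match the 1.86 above); the sarima.for output doesn't support this interface, so it is shown for the ESM only:
# Test-set accuracy for the ESM forecast; RMSE^2 roughly equals the MSE above.
accuracy(un.esm.pred, as.vector(test.UnempRate))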
Compare prediction performance between ESM and ARIMA.
ESM model prediction error: 1.86
Personally selected ARIMA model: 0.083
Auto-selected ARIMA model: 3.41
The personally selected SARIMA model wins by a wide margin: its prediction error is more than an order of magnitude lower than either the ESM's or the auto-selected ARIMA's.
The “nyse.RData” contains returns of the New York Stock Exchange (NYSE) from February 2, 1984 to December 31, 1991.
load("nyse.RData")
nyse= ts(nyse, frequency = 365)
#view(nyse)
Generate the time series plot of NYSE returns. Generate ACF and PACF plots from both the raw data and the squared data, respectively. Make comments based on your observation.
Examining the ACF and PACF plots of the raw data next to those of the squared data, the two sets look almost like mirror images of each other. Given the number of significant lags in each plot, it's safe to assume the series is correlated; in particular, the significant lags in the squared returns indicate serially correlated volatility, which is exactly the pattern an ARCH/GARCH model is designed to capture.
# plotting time series of NYSE returns.
plot(nyse)
# plotting acf and pacf plots
par(mfrow=c(1,2))
acf(nyse, main='acf')
pacf(nyse, main = 'pacf')
# plotting acf and pacf squared plots
par(mfrow=c(1,2))
acf(nyse^2,main = "acf Squared")
pacf(nyse^2,main = "pacf Squared")
Fit any four time series models of heteroscedasticity, i.e., choose any four models from ARCH(q) or GARCH(q,p) with your specification of q and p. Then choose the best one based on diagnostic tests on the residuals and the AIC criterion. There is no single correct model; choose the one that you think is best.
It was a tough choice overall, but I'll choose GARCH(2,2) as my best model.
ARCH(1): We can eliminate this model, as the Ljung-Box p-values for both the standardized residuals (R) and the squared residuals (R^2) are essentially zero, meaning it is not able to explain the volatility within the data.
# ARCH(1)
ny.arch1 = garchFit(~garch(1,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch1)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 0), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 0)
## <environment: 0x000000001fc14f60>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1
## 6.3533e-04 6.5228e-05 2.6983e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 6.353e-04 1.968e-04 3.228 0.00125 **
## omega 6.523e-05 2.484e-06 26.254 < 2e-16 ***
## alpha1 2.698e-01 3.301e-02 8.175 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6594.697 normalized: 3.297349
##
## Description:
## Fri May 07 16:27:39 2021 by user: FugBoi
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 9019.18 0
## Shapiro-Wilk Test R W 0.9212042 0
## Ljung-Box Test R Q(10) 38.38662 3.249771e-05
## Ljung-Box Test R Q(15) 42.17843 0.0002109731
## Ljung-Box Test R Q(20) 45.27366 0.001012957
## Ljung-Box Test R^2 Q(10) 400.3309 0
## Ljung-Box Test R^2 Q(15) 416.9455 0
## Ljung-Box Test R^2 Q(20) 422.2616 0
## LM Arch Test R TR^2 308.6684 0
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.591697 -6.583296 -6.591702 -6.588612
ARCH(2) has noticeably better-looking Ljung-Box results for R and R^2 than ARCH(1), so this one is a good option to consider.
# ARCH(2)
ny.arch2 = garchFit(~garch(2,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch2)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(2, 0), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(2, 0)
## <environment: 0x0000000022d7ab18>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2
## 6.7759e-04 5.5752e-05 1.8776e-01 1.4201e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 6.776e-04 1.889e-04 3.587 0.000334 ***
## omega 5.575e-05 2.352e-06 23.707 < 2e-16 ***
## alpha1 1.878e-01 2.840e-02 6.612 3.79e-11 ***
## alpha2 1.420e-01 2.584e-02 5.497 3.87e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6660.251 normalized: 3.330126
##
## Description:
## Fri May 07 16:27:40 2021 by user: FugBoi
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3242.832 0
## Shapiro-Wilk Test R W 0.9441303 0
## Ljung-Box Test R Q(10) 25.50612 0.004464403
## Ljung-Box Test R Q(15) 27.22277 0.02697281
## Ljung-Box Test R Q(20) 29.02819 0.08720581
## Ljung-Box Test R^2 Q(10) 84.49595 6.561418e-14
## Ljung-Box Test R^2 Q(15) 94.69486 1.308953e-13
## Ljung-Box Test R^2 Q(20) 99.9624 1.27931e-12
## LM Arch Test R TR^2 76.41465 1.983314e-11
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.656251 -6.645049 -6.656259 -6.652138
We can eliminate ARCH(3) as well, since all of its Ljung-Box p-values are still below .05.
# ARCH(3)
ny.arch3 = garchFit(~garch(3,0), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
summary(ny.arch3)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(3, 0), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(3, 0)
## <environment: 0x0000000018cf7528>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2 alpha3
## 7.3578e-04 5.1059e-05 1.7391e-01 5.6969e-02 1.3601e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 7.358e-04 1.842e-04 3.995 6.48e-05 ***
## omega 5.106e-05 2.472e-06 20.655 < 2e-16 ***
## alpha1 1.739e-01 2.901e-02 5.996 2.02e-09 ***
## alpha2 5.697e-02 2.222e-02 2.563 0.0104 *
## alpha3 1.360e-01 3.288e-02 4.137 3.53e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6689.963 normalized: 3.344982
##
## Description:
## Fri May 07 16:27:40 2021 by user: FugBoi
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 2677.76 0
## Shapiro-Wilk Test R W 0.9512896 0
## Ljung-Box Test R Q(10) 26.3873 0.003252767
## Ljung-Box Test R Q(15) 27.59275 0.02426187
## Ljung-Box Test R Q(20) 29.60461 0.07651877
## Ljung-Box Test R^2 Q(10) 22.56195 0.01248425
## Ljung-Box Test R^2 Q(15) 29.04168 0.01588696
## Ljung-Box Test R^2 Q(20) 32.7092 0.03630334
## LM Arch Test R TR^2 21.32701 0.04579209
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.684963 -6.670961 -6.684976 -6.679822
GARCH(2,2) is also a strong candidate: the Ljung-Box tests on its squared residuals all have large p-values, indicating the model accounts for the volatility, and it posts the lowest AIC of the four.
# GARCH(2,2)
ny.garch1 = garchFit(~garch(2,2), data=nyse, trace=F)
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
summary(ny.garch1)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(2, 2), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(2, 2)
## <environment: 0x0000000021cb98d0>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2 beta1 beta2
## 7.3368e-04 7.9497e-06 1.4136e-01 1.0000e-08 3.7725e-01 3.8309e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 7.337e-04 1.791e-04 4.098 4.18e-05 ***
## omega 7.950e-06 3.367e-06 2.361 0.0182 *
## alpha1 1.414e-01 2.408e-02 5.872 4.32e-09 ***
## alpha2 1.000e-08 6.478e-02 0.000 1.0000
## beta1 3.772e-01 2.871e-01 1.314 0.1889
## beta2 3.831e-01 2.067e-01 1.853 0.0639 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6726.978 normalized: 3.363489
##
## Description:
## Fri May 07 16:27:40 2021 by user: FugBoi
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3283.62 0
## Shapiro-Wilk Test R W 0.9532771 0
## Ljung-Box Test R Q(10) 28.66327 0.001412368
## Ljung-Box Test R Q(15) 29.53388 0.01371783
## Ljung-Box Test R Q(20) 31.75336 0.04599833
## Ljung-Box Test R^2 Q(10) 2.732222 0.9870426
## Ljung-Box Test R^2 Q(15) 3.796005 0.9983323
## Ljung-Box Test R^2 Q(20) 6.058174 0.9988166
## LM Arch Test R TR^2 3.330788 0.9927238
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.720978 -6.704175 -6.720996 -6.714808
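To line the four candidates up on AIC, the criteria can be pulled off the fitted objects; fGarch stores them in the @fit$ics slot (an assumption worth verifying against your fGarch version):
# Collect AICs across the four heteroscedasticity models fitted above;
# @fit$ics is assumed to hold the named criteria printed in each summary.
sapply(list(ARCH1 = ny.arch1, ARCH2 = ny.arch2,
            ARCH3 = ny.arch3, GARCH22 = ny.garch1),
       function(m) m@fit$ics["AIC"])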
Make a prediction for the next 10 volatility values based on your final model.
ny.pred = predict(ny.garch1, n.ahead=10)
ny.pred
## meanForecast meanError standardDeviation
## 1 0.0007336844 0.009553450 0.009553450
## 2 0.0007336844 0.010250753 0.010250753
## 3 0.0007336844 0.009869532 0.009869532
## 4 0.0007336844 0.009935794 0.009935794
## 5 0.0007336844 0.009821518 0.009821518
## 6 0.0007336844 0.009787443 0.009787443
## 7 0.0007336844 0.009725359 0.009725359
## 8 0.0007336844 0.009679792 0.009679792
## 9 0.0007336844 0.009632016 0.009632016
## 10 0.0007336844 0.009589499 0.009589499
predict(ny.garch1, n.ahead=10,plot=TRUE, conf=.9)
## meanForecast meanError standardDeviation lowerInterval upperInterval
## 1 0.0007336844 0.009553450 0.009553450 -0.01498034 0.01644771
## 2 0.0007336844 0.010250753 0.010250753 -0.01612730 0.01759467
## 3 0.0007336844 0.009869532 0.009869532 -0.01550025 0.01696762
## 4 0.0007336844 0.009935794 0.009935794 -0.01560924 0.01707661
## 5 0.0007336844 0.009821518 0.009821518 -0.01542128 0.01688864
## 6 0.0007336844 0.009787443 0.009787443 -0.01536523 0.01683260
## 7 0.0007336844 0.009725359 0.009725359 -0.01526311 0.01673048
## 8 0.0007336844 0.009679792 0.009679792 -0.01518816 0.01665553
## 9 0.0007336844 0.009632016 0.009632016 -0.01510957 0.01657694
## 10 0.0007336844 0.009589499 0.009589499 -0.01503964 0.01650701
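The standardDeviation column above is the volatility forecast itself; to report just the next 10 volatility values as a plain vector:
# Predicted conditional volatility (standard deviations), next 10 steps.
ny.pred$standardDeviation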