The cregist.RData file contains quarterly observed passenger car registrations (units) in the United States from January 1960 to July 2018. The data come from FRED (Federal Reserve Economic Data). Among the 235 total observations, we will use the last 12 for model validation and the rest for training.
a) By visually checking the time series, ACF, and PACF plots of the training set, decide what ARIMA (or SARIMA) model seems most appropriate.
The time series plot shows that the data have a nonconstant mean. To detrend the series, I took the first difference. The ACF and PACF plots of the differenced series do not show a clear seasonal pattern. The ACF cuts off after lag 1 and the PACF tails off, indicating \(p = 0\) and \(q = 1\), i.e., an ARIMA(0,1,1) model.
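As a quick cross-check of this visual choice, the KPSS-based helpers in the forecast package can estimate the amount of differencing needed (a minimal sketch; it assumes the training series train.cregist defined in the setup code below):
# Estimate the number of regular and seasonal differences needed (KPSS-based)
ndiffs(train.cregist)   # a value of 1 would agree with taking one first difference
nsdiffs(train.cregist)  # a value of 0 would agree with treating the series as nonseasonal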
library(tidyverse)
library(forecast)
library(fGarch)
library(astsa)
library(caret)
library(ggplot2)
library(scales)
library(formattable)
library(gridExtra)
load("cregist.RData")
train.cregist <- ts(cregist[1:223], frequency = 4, start = c(1960,1))
test.cregist <- ts(cregist[224:235], frequency = 4, start = c(2015,4))
diff.cregist <- diff(train.cregist, lag = 1)
ts1 <- autoplot(train.cregist) +
scale_y_continuous(labels = comma_format()) +
labs(title = "Quarterly Car Registrations in the US", y = "Number of Registrations \n")
ts2 <- autoplot(diff.cregist) +
scale_y_continuous(labels = comma_format()) +
labs(title = "First Order Differenced Series", y = "")
grid.arrange(ts1, ts2, nrow = 1)
acf <- ggAcf(diff.cregist, lwd = 1.25, main = "")
pacf <- ggPacf(diff.cregist, lwd = 1.25, main = "")
grid.arrange(acf, pacf, nrow = 1, top = "ACF and PACF of First-Differenced Series")
b) Fit the model which you chose in part (a) on the training set using sarima(). Comment on the diagnostics plots.
In the ARIMA(0,1,1) model, the MA(1) coefficient is significant. The residual ACF shows insignificant autocorrelations at almost all lags, and the Ljung-Box p-values are above 0.05, implying the residuals behave like white noise.
cregist.ma1 <- sarima(train.cregist, p=0, d=1, q=1)
## initial value 10.967596
## iter 2 value 10.903519
## iter 3 value 10.903111
## iter 4 value 10.903081
## iter 4 value 10.903081
## iter 4 value 10.903081
## final value 10.903081
## converged
## initial value 10.903344
## iter 2 value 10.903343
## iter 2 value 10.903343
## iter 2 value 10.903343
## final value 10.903343
## converged
cregist.ma1
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 constant
## -0.3366 352.0726
## s.e. 0.0577 2425.2920
##
## sigma^2 estimated as 2.953e+09: log likelihood = -2735.55, aic = 5477.09
##
## $degrees_of_freedom
## [1] 220
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3366 0.0577 -5.8283 0.0000
## constant 352.0726 2425.2920 0.1452 0.8847
##
## $AIC
## [1] 24.67159
##
## $AICc
## [1] 24.67184
##
## $BIC
## [1] 24.71757
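The Ljung-Box p-values shown in the sarima() diagnostics panel can also be computed directly (a sketch; fitdf = 1 accounts for the one estimated MA coefficient):
Box.test(resid(cregist.ma1$fit), lag = 12, type = "Ljung-Box", fitdf = 1)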
c) By visually checking the time series plot of the training set, decide which ESM model seems most appropriate. Explain your choice.
There is a lot of variation in the error and seasonal components, but their magnitude does not grow with the level of the series, so overall they appear additive. The series also lacks a clear systematic trend, indicating an ANA (additive error, no trend, additive seasonality) ESM model.
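As a cross-check on this visual choice, ets() can also be left to search over model forms by information criterion (a sketch; the automatic search is not guaranteed to pick the same specification):
ets(train.cregist, model = "ZZZ")  # "ZZZ" lets ets() select the error, trend, and season forms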
autoplot(train.cregist, size = 0.7) +
scale_y_continuous(labels = comma_format()) +
labs(title = "Quarterly Car Registrations in the US", y = "Number of Registrations \n")d) Fit the model which you chose in part (c) on the training set. Use ets() with the specification of model input with three letters. Report autoplot() and interpret how your chosen model decomposes the data.
The ANA ESM model decomposes the series into two time-varying components, a level and a seasonal pattern. The level tracks the overall movement of the data with a smoothing parameter of \(\alpha = 0.6802\), while \(\gamma = 0.0001\) is much smaller, indicating that the seasonal component changes very little over time.
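For reference, the ETS(A,N,A) recursions behind this decomposition are (with seasonal period \(m = 4\))
\[
\hat{y}_{t} = \ell_{t-1} + s_{t-m}, \qquad \ell_{t} = \ell_{t-1} + \alpha\,\varepsilon_{t}, \qquad s_{t} = s_{t-m} + \gamma\,\varepsilon_{t},
\]
so the large \(\alpha\) lets the level react quickly to new observations, while the tiny \(\gamma\) keeps the seasonal indices nearly fixed.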
cregist.ana <- ets(train.cregist, model = "ANA")
summary(cregist.ana)
## ETS(A,N,A)
##
## Call:
## ets(y = train.cregist, model = "ANA")
##
## Smoothing parameters:
## alpha = 0.6802
## gamma = 1e-04
##
## Initial states:
## l = 586792.8267
## s = -10530.35 9016.402 -1106.082 2620.032
##
## sigma: 54033.07
##
## AIC AICc BIC
## 6073.936 6074.457 6097.786
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 279.1865 53301.21 37689.7 -0.337298 5.351536 0.5774891 -0.02896001
autoplot(cregist.ana)
e) Make a prediction for the next 12 values based on your (i) ARIMA (or SARIMA) model you chose in part (a) and (ii) ESM you chose in part (c). Calculate their prediction error on the test set.
arima.ma1 <- sarima.for(train.cregist, n.ahead=12, p=0, d=1, q=1, plot=F)
cregist.esm <- forecast(cregist.ana, h = 12)
autoplot(train.cregist, size = 0.7) +
autolayer(arima.ma1$pred, series = "ARIMA", size = 0.8) +
autolayer(cregist.esm, series = "ESM", PI = FALSE, size = 0.8) +
scale_color_manual(values = c("#54B6B5", "#1017C0")) +
scale_y_continuous("Number of Registrations \n", labels = comma_format()) +
coord_cartesian(xlim = c(1960, 2018)) +
labs(title = "Forecasted Car Registrations using Exponential Smoothing and ARIMA", color = "Forecast") +
theme(plot.title = element_text(hjust = 0.5), legend.position = c(.917, .872),
legend.title = element_text(size = 12, hjust = 0.5),
legend.key.size = unit(8.25, 'mm'), legend.text = element_text(size = 11),
legend.background = element_rect(fill = "#F5F5F5", color = "grey50"))arima.rmse <- RMSE(arima.ma1$pred, test.cregist)
esm.rmse <- RMSE(cregist.esm$mean, test.cregist)
data.frame(Model = c("ARIMA", "Exponential Smoothing"),
RMSE = comma(round(c(arima.rmse, esm.rmse)), 0)) %>%
arrange(RMSE) %>%
formattable(align = c("l", "c"))| Model | RMSE |
|---|---|
| Exponential Smoothing | 124,325 |
| ARIMA | 130,962 |
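For reference, the prediction error reported above is the root mean squared error over the \(n = 12\) held-out quarters,
\[
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_{t} - \hat{y}_{t}\right)^{2}},
\]
where \(y_{t}\) are the test-set values and \(\hat{y}_{t}\) the forecasts.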
f) Compare the prediction performance between ESM and ARIMA.
Overall, the exponential smoothing model predicts quarterly car registrations better, with a lower test RMSE than the ARIMA model.
The UnempRate data in the astsa library contain the monthly US unemployment rate (in percent unemployed) from January 1948 to November 2016. Among the 827 total observations, we will use the last 36 for model validation and the rest for training.
a) By visually checking the time series, ACF, and PACF plots of the training set, decide what ARIMA (or SARIMA) model seems most appropriate.
The time series plot shows the data have a nonconstant, generally increasing mean, so I took the first difference to stabilize the series. The ACF of the differenced series peaks at every 12th lag, suggesting yearly seasonality (\(s = 12\)), and these seasonal spikes decay slowly, indicating seasonal nonstationarity, so I also take a seasonal difference.
data("UnempRate")
train.UnempRate <- ts(UnempRate[1:791], frequency = 12, start = c(1948,1))
test.UnempRate <- ts(UnempRate[792:827], frequency = 12, start = c(2013,12))
diff.unemprate <- diff(train.UnempRate, lag = 1)
ts1 <- autoplot(train.UnempRate) +
scale_y_continuous(labels = percent_format(scale = 1, accuracy = 1)) +
labs(title = "Monthly US Unemployment Rate", y = "Unemployment Rate")
ts2 <- autoplot(diff.unemprate) +
scale_y_continuous(labels = percent_format(scale = 1, accuracy = 1)) +
labs(title = "First Order Differenced Series", y = "")
grid.arrange(ts1, ts2, nrow = 1)
acf <- ggAcf(diff.unemprate, lwd = 1.25, main = "")
pacf <- ggPacf(diff.unemprate, lwd = 1.25, main = "")
grid.arrange(acf, pacf, nrow = 1, top = "ACF and PACF of First-Differenced Series")sdiff.unemprate <- diff(diff.unemprate, lag = 12)
acf <- ggAcf(sdiff.unemprate, lwd = 1.25, main = "")
pacf <- ggPacf(sdiff.unemprate, lwd = 1.25, main = "")
grid.arrange(acf, pacf, nrow = 1, top = "ACF and PACF of Seasonal Differenced Series")The seasonal ACF plot goes to zero after lag 1 and the PACF tails off, indicating P = 0 and Q = 1. The nonseasonal PACF cuts off after lag 3, which indicates an ARIMA(3,1,0) x ARIMA(0,1,1)\(_{12}\) model.
b) Fit the model which you chose in part (a) on the training set using sarima(). Comment on the diagnostics plots.
In the ARIMA(3,1,0) \(\times\) (0,1,1)\(_{12}\) model, the AR coefficients and the seasonal MA(1) coefficient are all significant. Nearly all residual autocorrelations in the ACF plot are insignificant and the Ljung-Box tests fail to reject, implying the residuals are approximately white noise.
arima.unemprate <- sarima(train.UnempRate, p=3, d=1, q=0, P=0, D=1, Q=1, S=12)
## initial value -1.147308
## iter 2 value -1.365674
## iter 3 value -1.370590
## iter 4 value -1.399679
## iter 5 value -1.410461
## iter 6 value -1.413610
## iter 7 value -1.418316
## iter 8 value -1.419035
## iter 9 value -1.419098
## iter 10 value -1.419099
## iter 10 value -1.419099
## final value -1.419099
## converged
## initial value -1.426581
## iter 2 value -1.426794
## iter 3 value -1.426996
## iter 4 value -1.427001
## iter 5 value -1.427002
## iter 5 value -1.427002
## iter 5 value -1.427002
## final value -1.427002
## converged
arima.unemprate
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ar3 sma1
## 0.1198 0.2053 0.0939 -0.7594
## s.e. 0.0359 0.0353 0.0358 0.0268
##
## sigma^2 estimated as 0.05684: log likelihood = 6.27, aic = -2.55
##
## $degrees_of_freedom
## [1] 774
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1198 0.0359 3.3335 9e-04
## ar2 0.2053 0.0353 5.8100 0e+00
## ar3 0.0939 0.0358 2.6203 9e-03
## sma1 -0.7594 0.0268 -28.2890 0e+00
##
## $AIC
## [1] -0.003227704
##
## $AICc
## [1] -0.003163039
##
## $BIC
## [1] 0.0262826
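As before, the panel's Ljung-Box statistics can be reproduced directly (a sketch; fitdf = 4 accounts for the four estimated coefficients):
Box.test(resid(arima.unemprate$fit), lag = 24, type = "Ljung-Box", fitdf = 4)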
c) By visually checking the time series plot of the training set, decide which ESM model seems the most appropriate. Explain your choice.
In the time series plot, the trend appears additive but its slope fluctuates, suggesting a damped trend component. The variability of the errors grows with the level of the series, so the error appears multiplicative, while the seasonal swings stay roughly constant in magnitude, so the seasonality appears additive. This suggests an MAdA ESM model.
autoplot(train.UnempRate, size = 0.5) +
scale_y_continuous("Unemployment Rate", labels = percent_format(scale = 1, accuracy = 1)) +
labs(title = "Monthly US Unemployment Rates", y = "Unemployment Rates")d) Fit the model which you chose in part (c) on the training set. Use ets() with the specification of model input with three letters. Report autoplot() and interpret how your chosen model decomposes the data.
The MAdA model contains four smoothing parameters: \(\alpha\) (level), \(\beta\) (slope), \(\gamma\) (season), and the damping parameter \(\phi\). The model decomposes the series into three components: level, slope, and season. The level and slope together capture the overall trend, while the seasonal component captures the regular within-year pattern. The small value of \(\gamma\) indicates the seasonality changes very little over time.
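For reference, the ETS(M,Ad,A) recursions, in the state-space notation of Hyndman et al. with \(\mu_{t}\) the one-step forecast and \(m = 12\), are
\[
\mu_{t} = \ell_{t-1} + \phi b_{t-1} + s_{t-m}, \qquad y_{t} = \mu_{t}(1 + \varepsilon_{t}),
\]
\[
\ell_{t} = \ell_{t-1} + \phi b_{t-1} + \alpha\,\mu_{t}\varepsilon_{t}, \qquad b_{t} = \phi b_{t-1} + \beta\,\mu_{t}\varepsilon_{t}, \qquad s_{t} = s_{t-m} + \gamma\,\mu_{t}\varepsilon_{t}.
\]
With \(\phi = 0.8\), the slope is damped toward zero, so long-horizon forecasts flatten out rather than extrapolating the recent trend indefinitely.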
unemprate.maa <- ets(train.UnempRate, model = "MAA", damped = T)
summary(unemprate.maa)
## ETS(M,Ad,A)
##
## Call:
## ets(y = train.UnempRate, model = "MAA", damped = T)
##
## Smoothing parameters:
## alpha = 0.7538
## beta = 0.1687
## gamma = 0.0441
## phi = 0.8
##
## Initial states:
## l = 2.5038
## b = 0.4006
## s = -0.2971 -0.3537 -0.7243 -0.2386 -0.2255 0.1159
## 0.398 -0.3863 0.0294 0.4224 0.6008 0.6592
##
## sigma: 0.052
##
## AIC AICc BIC
## 3336.907 3337.793 3421.027
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002501454 0.2616073 0.1992624 0.02769154 3.716049 0.2265402
## ACF1
## Training set 0.2016424
autoplot(unemprate.maa)
e) Make a prediction for the next 36 values based on your (i) ARIMA (or SARIMA) model you chose in part (a) and (ii) ESM you chose in part (c). Calculate their prediction error on the test set.
sarima.unemprate <- sarima.for(train.UnempRate, n.ahead=36, p=3, d=1, q=0, P=0, D=1, Q=1, S=12, plot=F)
unemprate.esm <- forecast(unemprate.maa, h = 36)
autoplot(train.UnempRate) +
autolayer(sarima.unemprate$pred, series = "ARIMA", size = 0.7) +
autolayer(unemprate.esm, series = "ESM", PI = FALSE, size = 0.7) +
scale_color_manual(values = c("#54B6B5", "#1017C0")) +
scale_y_continuous("Unemployment Rate", labels = percent_format(scale = 1, accuracy = 1)) +
coord_cartesian(xlim = c(1948, 2020)) +
labs(title = "Forecasted Unemployment Rates using Exponential Smoothing and ARIMA", color = "Forecast") +
theme(plot.title = element_text(hjust = 0.5), legend.position = c(.935, .875),
legend.title = element_text(size = 12, hjust = 0.5),
legend.key.size = unit(8.25, 'mm'), legend.text = element_text(size = 11),
legend.background = element_rect(fill = "#F5F5F5", color = "grey50"))
arima.rmse <- round(mean((sarima.unemprate$pred - test.UnempRate)^2) %>% sqrt(), 2)
esm.rmse <- round(mean((test.UnempRate - unemprate.esm$mean)^2) %>% sqrt(), 2)
data.frame(Model = c("Seasonal ARIMA", "Exponential Smoothing"), RMSE = c(arima.rmse, esm.rmse)) %>%
arrange(RMSE) %>%
formattable(align = c("l", "c"))| Model | RMSE |
|---|---|
| Seasonal ARIMA | 1.05 |
| Exponential Smoothing | 1.58 |
f) Compare the prediction performance between ESM and ARIMA.
The seasonal ARIMA model performs better in predicting unemployment rates, with a lower test root mean squared error than the ESM.
The nyse.RData file contains returns of the New York Stock Exchange (NYSE) from February 2, 1984 to December 31, 1991.
a) Generate the time series plot of the NYSE stock returns. Generate ACF and PACF plots from both the raw data and squared data, respectively. Make comments based on your observation.
The time series plot shows the returns have a constant mean near zero, but their volatility varies considerably over time. In the ACF and PACF plots of the raw series, the autocorrelation is very weak, with the majority of lags insignificant. The ACF and PACF plots of the squared series, however, show a much stronger association between current and past values, with several significant lags — the signature of conditional heteroscedasticity.
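A formal counterpart to the ARCH effect visible in the squared-series plots is a Ljung-Box test on the squared returns (a sketch; it assumes nyse as loaded in the chunk below):
Box.test(nyse^2, lag = 10, type = "Ljung-Box")  # a small p-value indicates conditional heteroscedasticity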
load("nyse.RData")
nyse <- ts(nyse, frequency = 365)
autoplot(nyse) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(title = "New York Stock Exchange Returns", y = "Stock Return")acf <- ggAcf(nyse, lwd = 1.25, lag.max = 30, main = "")
pacf <- ggPacf(nyse, lwd = 1.25, lag.max = 30, main = "")
grid.arrange(acf, pacf, nrow = 1, top = "ACF and PACF of NYSE Returns")acf <- ggAcf(nyse^2, lwd = 1.25, lag.max = 30, main = "ACF (Squared Data)")
pacf <- ggPacf(nyse^2, lwd = 1.25, lag.max = 30, main = "PACF (Squared Data)")
grid.arrange(acf, pacf, nrow = 1)
b) Fit any four time series models of heteroscedasticity, i.e., choose any four models from ARCH(\(q\)) or GARCH(\(q, p\)) with your specification of \(q\) and \(p\). Then choose the best one based on diagnostic tests on residuals and AIC/BIC criteria. There is no one correct model; choose the one that you think is the best.
nyse.m1 <- garchFit(~garch(1,0), data = nyse, trace = F)
summary(nyse.m1)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 0), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 0)
## <environment: 0x000000002850cb90>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1
## 6.3533e-04 6.5228e-05 2.6983e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 6.353e-04 1.968e-04 3.228 0.00125 **
## omega 6.523e-05 2.484e-06 26.254 < 2e-16 ***
## alpha1 2.698e-01 3.301e-02 8.175 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6594.697 normalized: 3.297349
##
## Description:
## Mon May 10 14:49:27 2021 by user: Kelli
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 9019.18 0
## Shapiro-Wilk Test R W 0.9212042 0
## Ljung-Box Test R Q(10) 38.38662 3.249771e-05
## Ljung-Box Test R Q(15) 42.17843 0.0002109731
## Ljung-Box Test R Q(20) 45.27366 0.001012957
## Ljung-Box Test R^2 Q(10) 400.3309 0
## Ljung-Box Test R^2 Q(15) 416.9455 0
## Ljung-Box Test R^2 Q(20) 422.2616 0
## LM Arch Test R TR^2 308.6684 0
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.591697 -6.583296 -6.591702 -6.588612
nyse.m2 <- garchFit(~garch(2,0), data = nyse, trace = F)
summary(nyse.m2)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(2, 0), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(2, 0)
## <environment: 0x0000000027740620>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2
## 6.7759e-04 5.5752e-05 1.8776e-01 1.4201e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 6.776e-04 1.889e-04 3.587 0.000334 ***
## omega 5.575e-05 2.352e-06 23.707 < 2e-16 ***
## alpha1 1.878e-01 2.840e-02 6.612 3.79e-11 ***
## alpha2 1.420e-01 2.584e-02 5.497 3.87e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6660.251 normalized: 3.330126
##
## Description:
## Mon May 10 14:49:27 2021 by user: Kelli
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3242.832 0
## Shapiro-Wilk Test R W 0.9441303 0
## Ljung-Box Test R Q(10) 25.50612 0.004464403
## Ljung-Box Test R Q(15) 27.22277 0.02697281
## Ljung-Box Test R Q(20) 29.02819 0.08720581
## Ljung-Box Test R^2 Q(10) 84.49595 6.561418e-14
## Ljung-Box Test R^2 Q(15) 94.69486 1.308953e-13
## Ljung-Box Test R^2 Q(20) 99.9624 1.27931e-12
## LM Arch Test R TR^2 76.41465 1.983314e-11
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.656251 -6.645049 -6.656259 -6.652138
nyse.m3 <- garchFit(~garch(1,1), data = nyse, trace = F)
summary(nyse.m3)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## <environment: 0x000000002a4c8b50>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 beta1
## 7.3695e-04 6.5416e-06 1.1408e-01 8.0608e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 7.369e-04 1.786e-04 4.126 3.69e-05 ***
## omega 6.542e-06 1.455e-06 4.495 6.94e-06 ***
## alpha1 1.141e-01 1.604e-02 7.114 1.13e-12 ***
## beta1 8.061e-01 2.973e-02 27.112 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6723.005 normalized: 3.361502
##
## Description:
## Mon May 10 14:49:27 2021 by user: Kelli
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3628.415 0
## Shapiro-Wilk Test R W 0.9515562 0
## Ljung-Box Test R Q(10) 29.69242 0.0009616813
## Ljung-Box Test R Q(15) 30.50938 0.01021164
## Ljung-Box Test R Q(20) 32.81143 0.03538324
## Ljung-Box Test R^2 Q(10) 3.510505 0.9667405
## Ljung-Box Test R^2 Q(15) 4.408852 0.9960585
## Ljung-Box Test R^2 Q(20) 6.68935 0.9975864
## LM Arch Test R TR^2 3.967784 0.9840107
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.719005 -6.707803 -6.719013 -6.714891
nyse.m4 <- garchFit(~garch(2,2), data = nyse, trace = F)
summary(nyse.m4)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(2, 2), data = nyse, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(2, 2)
## <environment: 0x0000000022e84fc8>
## [data = nyse]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2 beta1 beta2
## 7.3368e-04 7.9497e-06 1.4136e-01 1.0000e-08 3.7725e-01 3.8309e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 7.337e-04 1.791e-04 4.098 4.18e-05 ***
## omega 7.950e-06 3.367e-06 2.361 0.0182 *
## alpha1 1.414e-01 2.408e-02 5.872 4.32e-09 ***
## alpha2 1.000e-08 6.478e-02 0.000 1.0000
## beta1 3.772e-01 2.871e-01 1.314 0.1889
## beta2 3.831e-01 2.067e-01 1.853 0.0639 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 6726.978 normalized: 3.363489
##
## Description:
## Mon May 10 14:49:27 2021 by user: Kelli
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3283.62 0
## Shapiro-Wilk Test R W 0.9532771 0
## Ljung-Box Test R Q(10) 28.66327 0.001412368
## Ljung-Box Test R Q(15) 29.53388 0.01371783
## Ljung-Box Test R Q(20) 31.75336 0.04599833
## Ljung-Box Test R^2 Q(10) 2.732222 0.9870426
## Ljung-Box Test R^2 Q(15) 3.796005 0.9983323
## Ljung-Box Test R^2 Q(20) 6.058174 0.9988166
## LM Arch Test R TR^2 3.330788 0.9927238
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -6.720978 -6.704175 -6.720996 -6.714808
Overall, I think the GARCH(1,1) model is the best choice since it has the lowest BIC of the four models. Its diagnostics show no remaining correlation in the volatility: the Ljung-Box tests on the squared standardized residuals and the LM ARCH test are all insignificant. The standardized residuals themselves, however, still show some autocorrelation, with Ljung-Box p-values just below the 0.05 significance level. Adding an ARMA component to the mean equation could remove this remaining correlation.
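Written out, the fitted GARCH(1,1) from the output above is
\[
r_{t} = \mu + a_{t}, \qquad a_{t} = \sigma_{t}\epsilon_{t}, \qquad \sigma_{t}^{2} = \omega + \alpha_{1} a_{t-1}^{2} + \beta_{1}\sigma_{t-1}^{2},
\]
with \(\hat{\mu} = 7.37 \times 10^{-4}\), \(\hat{\omega} = 6.54 \times 10^{-6}\), \(\hat{\alpha}_{1} = 0.114\), and \(\hat{\beta}_{1} = 0.806\). Since \(\hat{\alpha}_{1} + \hat{\beta}_{1} \approx 0.92 < 1\), the variance process is stationary but highly persistent.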
c) Make a prediction for the next 10 volatility values based on your final model.
nyse.pred <- predict(nyse.m3, n.ahead = 10, plot = T, conf = 0.9)
nyse.pred$standardDeviation
## [1] 0.010351669 0.010253926 0.010163156 0.010078912 0.010000767 0.009928317
## [7] 0.009861183 0.009799002 0.009741435 0.009688162
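As expected for a stationary GARCH(1,1), the predicted volatilities decay monotonically toward the unconditional level \(\sqrt{\hat{\omega}/(1 - \hat{\alpha}_{1} - \hat{\beta}_{1})} = \sqrt{6.54 \times 10^{-6} / 0.0798} \approx 0.0091\).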