Exercise 1

Use the Electricity.RData file examined in HW1 Exercise 2 (d)-(f). Fit the model using the first-order differenced, log-transformed series.

a: By visually checking, decide which SARIMA models seem appropriate, i.e., specify p, d, q and P, D, Q in the SARIMA model ARIMA(p, d, q) * ARIMA(P, D, Q)s. Choose the two most appropriate models and explain your answer.

load("Electricity.RData")

# first order differenced log transformed series
diff.log.electricity <- diff(log(electricity), lag=1)
acf(diff.log.electricity, lag.max=60, main = "ACF of diff(log(electricity))")

Comment: In the ACF plot we see many repeating peaks and troughs throughout the series: the ACF is very large at the seasonal lags s, 2s, 3s, etc., and these do not tail off to zero. This indicates a non-stationary seasonal series, so we need to apply a further seasonal difference.

sdiff.diff.log.electricity <- diff(diff.log.electricity, lag=12)
par(mfrow=c(1,2))
acf(sdiff.diff.log.electricity, lag.max=60)
pacf(sdiff.diff.log.electricity, lag.max=60)
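
As an aside, astsa's acf2() draws the same two plots in one call; a sketch equivalent to the pair of calls above:

# ACF and PACF of the doubly differenced series in one figure
acf2(sdiff.diff.log.electricity, max.lag = 60)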

Comment: After further (seasonal) differencing of diff.log.electricity, the ACF tails off to zero rather quickly, so the series finally looks like a stationary seasonal series. The PACF also tails off to zero, with a few spikes above/below the blue dashed significance bounds. Hence, we will try ARIMA(4,1,3) for the non-seasonal part combined with ARIMA(0,1,1) for the seasonal part. Another model to try is ARIMA(1,1,1) for both the non-seasonal and seasonal parts.


b: Fit the two SARIMA models which you chose in part (a) using the sarima function in R and interpret the result.

sarima(electricity, p=4, d=1, q=3, P=0, D=1, Q=1, S=12)
## initial  value 9.219916 
## ... (optimizer iterations omitted) ...
## final  value 8.850097 
## converged
## initial  value 8.849589 
## ... (optimizer iterations omitted) ...
## final  value 8.849407 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ma1     ma2      ma3     sma1
##       0.0278  -0.7623  0.4217  -0.1139  -0.4654  0.5371  -0.7767  -0.7047
## s.e.  0.1175   0.0954  0.0752   0.0814   0.1072  0.0612   0.1293   0.0339
## 
## sigma^2 estimated as 47334220:  log likelihood = -3932.78,  aic = 7883.55
## 
## $degrees_of_freedom
## [1] 375
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.0278 0.1175   0.2366  0.8131
## ar2   -0.7623 0.0954  -7.9932  0.0000
## ar3    0.4217 0.0752   5.6043  0.0000
## ar4   -0.1139 0.0814  -1.3991  0.1626
## ma1   -0.4654 0.1072  -4.3430  0.0000
## ma2    0.5371 0.0612   8.7704  0.0000
## ma3   -0.7767 0.1293  -6.0046  0.0000
## sma1  -0.7047 0.0339 -20.7951  0.0000
## 
## $AIC
## [1] 20.00902
## 
## $AICc
## [1] 20.00997
## 
## $BIC
## [1] 20.0992
sarima(electricity, p=1, d=1, q=1, P=1, D=1, Q=1, S=12)
## initial  value 9.227746 
## ... (optimizer iterations omitted) ...
## final  value 8.875500 
## converged
## initial  value 8.871965 
## ... (optimizer iterations omitted) ...
## final  value 8.871779 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     sar1     sma1
##       0.4204  -0.8864  -0.0447  -0.6962
## s.e.  0.0638   0.0334   0.0658   0.0422
## 
## sigma^2 estimated as 49511784:  log likelihood = -3941.34,  aic = 7892.69
## 
## $degrees_of_freedom
## [1] 379
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.4204 0.0638   6.5947  0.0000
## ma1   -0.8864 0.0334 -26.5052  0.0000
## sar1  -0.0447 0.0658  -0.6791  0.4975
## sma1  -0.6962 0.0422 -16.4995  0.0000
## 
## $AIC
## [1] 20.03221
## 
## $AICc
## [1] 20.03247
## 
## $BIC
## [1] 20.08231

c: Check the model diagnostics of the two candidate models and choose the final one based on the diagnostics plots and AIC/BIC comparison.

Comment:

  • ARIMA(4,1,3) * ARIMA(0,1,1): Most of the Ljung-Box p-values are fairly high, plotted above the cutoff (the blue dashed line at 0.05), although some sit right on the line. Since the p-values exceed 0.05, we cannot reject the null hypothesis that the model shows no lack of fit; the model looks adequate.

  • ARIMA(1,1,1) * ARIMA(1,1,1): The second, simpler model produces Ljung-Box p-values that mostly fall below the 0.05 cutoff line, though one (around lag 9) sits just above the blue dashed line. The null hypothesis is therefore rejected, and we conclude that this model shows lack of fit. (A direct AIC/BIC comparison is sketched below.)
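
To back the visual diagnostics with numbers, the information criteria that sarima reports can be compared directly; a minimal sketch (details=FALSE merely suppresses the optimizer trace):

# refit both candidates quietly and compare information criteria
fit1 <- sarima(electricity, p=4, d=1, q=3, P=0, D=1, Q=1, S=12, details=FALSE)
fit2 <- sarima(electricity, p=1, d=1, q=1, P=1, D=1, Q=1, S=12, details=FALSE)
c(fit1$AIC, fit2$AIC)   # 20.009 vs 20.032: AIC favors the first model
c(fit1$BIC, fit2$BIC)   # 20.099 vs 20.082: BIC is marginally lower for the second

Since the residual diagnostics clearly favor the first model and its AIC is lower, we choose ARIMA(4,1,3) * ARIMA(0,1,1) as the final model.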


d: Make a prediction for the next 4 values based on your final model. Make sure that you predict the values in the original scale, not transformed values.

# forecasts come back in the original scale: the raw series is passed in and
# the differencing (d=1, D=1) is handled internally by sarima.for
sarima.for(electricity, n.ahead=4, p=4, d=1, q=3, P=0, D=1, Q=1, S=12)

## $pred
##           Jan      Feb      Mar      Apr
## 2006 357481.2 316015.9 325577.4 304470.9
## 
## $se
##           Jan      Feb      Mar      Apr
## 2006 6879.987 7893.337 8203.880 8454.532
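
The se component returned by sarima.for can be combined with the point forecasts to form approximate 95% prediction intervals (a sketch; fc is just a placeholder name):

# approximate 95% prediction intervals: forecast +/- 1.96 * standard error
fc <- sarima.for(electricity, n.ahead=4, p=4, d=1, q=3, P=0, D=1, Q=1, S=12)
cbind(lower = fc$pred - 1.96 * fc$se, upper = fc$pred + 1.96 * fc$se)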

Exercise 2

Use the data gtemp in the astsa package analyzed in HW1 Exercise 4. Among the 130 time series observations, we will use the last 10 (121-130) for model validation and the rest (1-120) for model training. In other words, we aim to predict the last 10 observations based on a model fitted to the first 120 observations.

data(gtemp)
train.gtemp = ts(gtemp[1:120]) # train set: for the model selection and fit
test.gtemp = ts(gtemp[121:130]) # test set: only for the evaluation (for calculating prediction error)

a: By visually checking the time series plot of the first 120 observations, decide which ESM model seems the most appropriate. For example, the answer should be like “Additive error + no trend + additive seasonality”. Explain your choice.

plot.ts(train.gtemp)

Comment: Looking at the plot, this dataset appears to have an increasing trend but no sign of seasonality: we don’t see any repeating pattern at a fixed period. Hence, an appropriate model for this dataset is additive error + additive trend + no seasonality, or AAN.


b: Fit the model for the first 120 data points which you chose in part (a). Use the ets function with the model specified by its three-letter code. Report autoplot() and interpret how your chosen model decomposes the data.

library(forecast)   # provides ets(), forecast(), and autoplot()
# with no model= argument, ets() selects the ESM specification automatically
train.gtemp.ets = ets(train.gtemp)
summary(train.gtemp.ets)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = train.gtemp) 
## 
##   Smoothing parameters:
##     alpha = 0.3775 
## 
##   Initial states:
##     l = -0.2644 
## 
##   sigma:  0.0999
## 
##      AIC     AICc      BIC 
## 25.59878 25.80568 33.96126 
## 
## Training set error measures:
##                      ME       RMSE       MAE MPE MAPE      MASE      ACF1
## Training set 0.01414113 0.09905458 0.0810756 NaN  Inf 0.9067666 0.1060173
autoplot(train.gtemp.ets)

Comment: Left to choose automatically, ets() selects ETS(A,N,N), which has no trend or seasonal component, so only the level is tracked. The smoothing parameter alpha = 0.38 is fairly small, meaning the level changes only gradually over time; this is backed up by the autoplot, where the level varies over a narrow range. Since part (a) chose AAN, we refit with an explicit model = "AAN" in part (c).
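
The components autoplot() draws are stored in the fitted object's states matrix; a quick peek (assuming the forecast package's ets object layout):

# smoothed state estimates over time; ETS(A,N,N) has a single level column "l"
head(train.gtemp.ets$states)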


c: Make a prediction for the next 10 values based on your final model. Calculate the prediction error using the test set. As we know the true Y from the test set, the prediction error here is the mean squared error.
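
The error metric below is simply the mean of squared gaps between forecast and truth; a tiny helper (a hypothetical convenience, not part of the assignment) makes the formula explicit:

# mean squared prediction error: MSE = mean((prediction - actual)^2)
mse <- function(pred, actual) mean((pred - actual)^2)
# e.g. mse(pred.ets.gtemp$mean, as.vector(test.gtemp)) reproduces the value below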

train.gtemp.ets <- ets(train.gtemp, model="AAN")

pred.ets.gtemp <- forecast(train.gtemp.ets, h=10)
plot(pred.ets.gtemp)

mean((pred.ets.gtemp$mean - as.vector(test.gtemp))^2)
## [1] 0.01915633
pred.ets.gtemp
##     Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 121      0.3740666 0.2462632 0.5018700 0.1786081 0.5695250
## 122      0.3791551 0.2467692 0.5115410 0.1766883 0.5816219
## 123      0.3842436 0.2474254 0.5210618 0.1749982 0.5934891
## 124      0.3893321 0.2482176 0.5304467 0.1735160 0.6051482
## 125      0.3944206 0.2491337 0.5397075 0.1722235 0.6166178
## 126      0.3995092 0.2501635 0.5488548 0.1711046 0.6279137
## 127      0.4045977 0.2512977 0.5578976 0.1701456 0.6390497
## 128      0.4096862 0.2525286 0.5668437 0.1693345 0.6500379
## 129      0.4147747 0.2538493 0.5757001 0.1686605 0.6608889
## 130      0.4198632 0.2552534 0.5844730 0.1681142 0.6716122

Comment: The prediction MSE is about 0.0192 (RMSE ≈ 0.14 on the temperature-anomaly scale), which is acceptable.

d: Now fit an ARIMA model to the first 120 data points using the model chosen in Homework 1 Exercise 4. Predict the next 10 values based on this ARIMA model. Fit the model and calculate the prediction error.

#ARIMA(5,1,0)
sarima(train.gtemp, p=5, d=1, q=0)
## initial  value -2.188408 
## ... (optimizer iterations omitted) ...
## final  value -2.346531 
## converged
## initial  value -2.359186 
## ... (optimizer iterations omitted) ...
## final  value -2.359316 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5  constant
##       -0.4524  -0.4617  -0.3929  -0.1145  -0.2262    0.0054
## s.e.   0.0903   0.0996   0.1015   0.0992   0.0918    0.0033
## 
## sigma^2 estimated as 0.008868:  log likelihood = 111.9,  aic = -209.81
## 
## $degrees_of_freedom
## [1] 113
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1       -0.4524 0.0903 -5.0106  0.0000
## ar2       -0.4617 0.0996 -4.6350  0.0000
## ar3       -0.3929 0.1015 -3.8720  0.0002
## ar4       -0.1145 0.0992 -1.1539  0.2510
## ar5       -0.2262 0.0918 -2.4641  0.0152
## constant   0.0054 0.0033  1.6484  0.1020
## 
## $AIC
## [1] -1.763107
## 
## $AICc
## [1] -1.756805
## 
## $BIC
## [1] -1.599629
# prediction: forecast from the training series only (not the full gtemp),
# so the 10 test observations play no role in fitting
arima.pred <- sarima.for(train.gtemp, n.ahead=10, p=5, d=1, q=0)

mean((arima.pred$pred - as.vector(test.gtemp))^2)
## [1] 0.01022859

Comment: The prediction MSE is about 0.0102 (RMSE ≈ 0.10), which is acceptable.

e: Compare prediction performance between ESM and ARIMA

Comment: The ARIMA(5,1,0) model attains a lower prediction MSE (≈ 0.0102) than the ESM fit (≈ 0.0192), so on this split ARIMA predicts the 10 held-out observations roughly twice as accurately.

Exercise 3

We repeat the same process as Exercise 2 using the electricity data analyzed in Exercise 1. Among the 396 time series observations, we will use the last 26 (371-396) for model validation and the rest (1-370) for model training. In other words, we aim to predict the last 26 observations based on a model fitted to the first 370 observations.

train.electricity = ts(electricity[1:370], freq=12) # train set: for the model selection and fit
test.electricity = ts(electricity[371:396], freq=12) # test set: only for the evaluation (for calculating prediction error)

a: By visually checking the time series plot of the first 370 observations, decide which ESM model seems the most appropriate. Explain your choice.

plot(train.electricity)

Comment: Looking at the plot, this dataset appears to have an increasing trend with multiplicative seasonality: the seasonal peaks and troughs grow in amplitude as the level of the series rises. Hence, an appropriate model for this dataset is multiplicative error + additive trend + multiplicative seasonality, or MAM. (A quick visual check is sketched below.)
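
One informal check of the multiplicative-seasonality call: on the log scale, multiplicative seasonal swings should look roughly constant in amplitude (a sketch):

# if the seasonal amplitude is stable after logging, the seasonality
# in the original scale is multiplicative
plot(log(train.electricity), main = "log(train.electricity)")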


b: Fit the model for the first 370 data points which you chose in part (a). Use the ets function with the model specified by its three-letter code. Report autoplot() and interpret how your chosen model decomposes the data.

electricity.ets <- ets(train.electricity, model = "MAM")
summary(electricity.ets)
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = train.electricity, model = "MAM") 
## 
##   Smoothing parameters:
##     alpha = 0.6427 
##     beta  = 5e-04 
##     gamma = 0.0067 
##     phi   = 0.9786 
## 
##   Initial states:
##     l = 153445.5704 
##     b = 661.7965 
##     s = 1.0085 0.9265 0.9389 0.9852 1.1337 1.1436
##            1.0381 0.9614 0.9009 0.9693 0.935 1.0588
## 
##   sigma:  0.0276
## 
##      AIC     AICc      BIC 
## 8678.796 8680.745 8749.239 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 624.7038 6691.059 4901.356 0.1873471 2.054026 0.5697099 0.07684258
autoplot(electricity.ets)

Comment: The small values of β and γ mean that the slope and seasonal components change very little over time; this is backed up by the narrow ranges of the slope and season panels in the autoplot. The seasonal component is very clear, repeating regularly in the season panel. Note that ets actually selected a damped additive trend, ETS(M,Ad,M) with φ ≈ 0.98: the slope drifts downward but only ranges from about 200 to 600, a narrow band, whereas the level is on a much larger scale, rising from roughly 150,000 to 300,000.


c: Make a prediction for the next 26 values based on your final model. Calculate the prediction error using the validation set.

# same MAM specification as fitted in part (b)
train.electricity.ets <- ets(train.electricity, model="MAM")

pred.ets.electricity <- forecast(train.electricity.ets, h=26)
plot(pred.ets.electricity)

mean((pred.ets.electricity$mean - as.vector(test.electricity))^2)
## [1] 142228849
pred.ets.electricity
##        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 31       303293.6 292555.1 314032.0 286870.5 319716.6
## Dec 31       330488.9 316574.8 344403.1 309209.0 351768.8
## Jan 32       346744.6 330142.6 363346.6 321354.0 372135.1
## Feb 32       306258.0 290014.0 322501.9 281414.9 331101.0
## Mar 32       317535.3 299197.8 335872.7 289490.6 345579.9
## Apr 32       295170.2 276837.4 313503.0 267132.6 323207.8
## May 32       314895.6 294052.9 335738.3 283019.4 346771.8
## Jun 32       340250.6 316420.0 364081.1 303804.9 376696.2
## Jul 32       374844.9 347221.8 402468.0 332599.0 417090.8
## Aug 32       372008.6 343297.5 400719.6 328098.8 415918.4
## Sep 32       323057.0 297044.8 349069.3 283274.7 362839.4
## Oct 32       307804.4 282031.4 333577.3 268388.1 347220.7
## Nov 32       303462.6 277093.8 329831.4 263135.0 343790.2
## Dec 32       330669.2 300946.4 360392.0 285212.1 376126.3
## Jan 33       346929.7 314738.7 379120.6 297697.9 396161.4
## Feb 33       306418.0 277122.8 335713.1 261614.8 351221.1
## Mar 33       317697.6 286453.4 348941.8 269913.7 365481.5
## Apr 33       295317.9 265486.2 325149.6 249694.3 340941.5
## May 33       315049.8 282404.4 347695.3 265122.9 364976.7
## Jun 33       340413.7 304274.0 376553.3 285142.8 395684.5
## Jul 33       375020.7 334274.1 415767.3 312704.2 437337.3
## Aug 33       372179.4 330835.2 413523.6 308948.8 435409.9
## Sep 33       323202.2 286527.6 359876.8 267113.2 379291.2
## Oct 33       307939.7 272276.5 343603.0 253397.5 362482.0
## Nov 33       303593.2 267721.9 339464.5 248732.8 358453.6
## Dec 33       330808.5 290975.8 370641.2 269889.7 391727.3
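
The squared-error figure above is hard to read on its own; as an informal aid, the RMSE (the square root of the MSE reported above) is on the same scale as the data:

# RMSE puts the prediction error back on the original scale of the series
sqrt(mean((pred.ets.electricity$mean - as.vector(test.electricity))^2))
# sqrt(142228849) is roughly 11926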

d: Now fit the SARIMA model chosen in Exercise 1 to the first 370 data points. Predict the next 26 values based on this SARIMA model. Just borrow your answer from Exercise 1, fit the model, and calculate the prediction error.

# ARIMA(4,1,3) * ARIMA(0,1,1)12, as chosen in Exercise 1
sarima(train.electricity, p=4, d=1, q=3, P=0, D=1, Q=1, S=12)
## initial  value 9.193982 
## ... (optimizer iterations omitted) ...
## final  value 8.832356 
## converged
## initial  value 8.830843 
## ... (optimizer iterations omitted) ...
## final  value 8.830524 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ma1     ma2      ma3     sma1
##       0.1452  -0.8058  0.3933  -0.1263  -0.5854  0.6606  -0.7558  -0.6922
## s.e.  0.0987   0.1240  0.0741   0.0628   0.0898  0.1570   0.0914   0.0363
## 
## sigma^2 estimated as 45540557:  log likelihood = -3659.06,  aic = 7336.12
## 
## $degrees_of_freedom
## [1] 349
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.1452 0.0987   1.4705  0.1423
## ar2   -0.8058 0.1240  -6.5000  0.0000
## ar3    0.3933 0.0741   5.3059  0.0000
## ar4   -0.1263 0.0628  -2.0117  0.0450
## ma1   -0.5854 0.0898  -6.5205  0.0000
## ma2    0.6606 0.1570   4.2083  0.0000
## ma3   -0.7558 0.0914  -8.2715  0.0000
## sma1  -0.6922 0.0363 -19.0840  0.0000
## 
## $AIC
## [1] 19.9351
## 
## $AICc
## [1] 19.93619
## 
## $BIC
## [1] 20.02993
# prediction
sarima.electricity.pred <- sarima.for(train.electricity, n.ahead=26, p=4, d=1, q=3, P=0, D=1, Q=1, S=12)

names(sarima.electricity.pred)
## [1] "pred" "se"
mean((sarima.electricity.pred$pred - as.vector(test.electricity))^2)
## [1] 85495410

e: Compare prediction performance between ESM and ARIMA.

Comment: The two models' forecasts look very similar when plotted, but the SARIMA model attains a noticeably lower prediction MSE (≈ 8.55 × 10^7, RMSE ≈ 9,246) than the ESM model (≈ 1.42 × 10^8, RMSE ≈ 11,926), so SARIMA performs better on the 26 held-out observations. (A side-by-side plot is sketched below.)
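
A quick way to see this, using the objects fitted above, is to overlay both sets of point forecasts on the held-out values (a sketch):

# overlay the 26 held-out values with both sets of point forecasts
plot(as.vector(test.electricity), type = "b", xlab = "holdout index", ylab = "electricity")
lines(as.vector(pred.ets.electricity$mean), col = "blue", lty = 2)     # ETS (MAM)
lines(as.vector(sarima.electricity.pred$pred), col = "red", lty = 3)   # SARIMA
legend("topleft", c("actual", "ETS", "SARIMA"), col = c("black", "blue", "red"), lty = 1:3)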


Exercise 4: ARCH and GARCH Model

The usdhkd.RData file contains the daily USD/HKD (US dollar to Hong Kong dollar) exchange rate from January 1, 2005 to March 7, 2006.

load("usdhkd.RData")

a: Generate the time series plot on exchange rate. Generate ACF and PACF from both raw data and squared data, respectively. Make comments based on your observation.

plot.ts(usdhkd)

# ACF of the raw and squared series
par(mfrow=c(1,2))
acf(usdhkd)
acf(usdhkd^2)

# PACF of the raw and squared series
par(mfrow=c(1,2))
pacf(usdhkd)
pacf(usdhkd^2)

Comment: In the ACF of the original series, only one lag crosses the blue dashed significance bounds, so the raw series shows essentially no significant autocorrelation. When we square the data, however, several lags cross the bounds and are significant: the squared series is autocorrelated, which is the volatility clustering that ARCH/GARCH models are designed to capture. The PACF plots tell a similar story, except that here the original series has more lags crossing the bounds than the squared series.


b: Fit four models in total – ARCH(1), ARCH(2), GARCH(1,1) and GARCH(2,2). Then choose the best one based on diagnostic tests on the residuals and the AIC criterion. There is no single correct model; choose the one that you think is optimal.
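
For reference, ARCH(m) is GARCH(m, 0) in garchFit's notation, which is why the calls below use ~garch(m, 0); a minimal map (the sigma_t notation is an assumption, not from the assignment):

# ARCH(1):     sigma_t^2 = omega + alpha1*x_{t-1}^2                        -> ~garch(1,0)
# ARCH(2):     ... + alpha2*x_{t-2}^2                                      -> ~garch(2,0)
# GARCH(1,1):  sigma_t^2 = omega + alpha1*x_{t-1}^2 + beta1*sigma_{t-1}^2  -> ~garch(1,1)
# GARCH(2,2):  two ARCH lags and two GARCH lags                            -> ~garch(2,2)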

# ARCH(1)
usdhkd.garch1 <- garchFit(~garch(1,0), data=usdhkd, trace=F)
summary(usdhkd.garch1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(1, 0), data = usdhkd, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(1, 0)
## <environment: 0x7fec26e93f88>
##  [data = usdhkd]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##          mu        omega       alpha1  
## -0.00117994   0.00045482   0.49888267  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu     -1.180e-03   1.053e-03   -1.120 0.262606    
## omega   4.548e-04   4.296e-05   10.588  < 2e-16 ***
## alpha1  4.989e-01   1.336e-01    3.733 0.000189 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  977.8896    normalized:  2.268885 
## 
## Description:
##  Mon May 10 21:12:04 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value   
##  Jarque-Bera Test   R    Chi^2  4337.001  0         
##  Shapiro-Wilk Test  R    W      0.8287354 0         
##  Ljung-Box Test     R    Q(10)  20.34018  0.02619446
##  Ljung-Box Test     R    Q(15)  26.32464  0.03474572
##  Ljung-Box Test     R    Q(20)  30.61091  0.06054118
##  Ljung-Box Test     R^2  Q(10)  9.38141   0.4963288 
##  Ljung-Box Test     R^2  Q(15)  10.35731  0.7966824 
##  Ljung-Box Test     R^2  Q(20)  11.16525  0.9418239 
##  LM Arch Test       R    TR^2   9.46025   0.663197  
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -4.523850 -4.495547 -4.523946 -4.512675
# ARCH(2)
usdhkd.garch2 <- garchFit(~garch(2,0), data=usdhkd, trace=F)
summary(usdhkd.garch2)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(2, 0), data = usdhkd, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(2, 0)
## <environment: 0x7fec29e640e0>
##  [data = usdhkd]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##          mu        omega       alpha1       alpha2  
## -0.00046702   0.00026598   0.54327969   0.48701087  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu     -4.670e-04   8.699e-04   -0.537    0.591    
## omega   2.660e-04   3.473e-05    7.658 1.89e-14 ***
## alpha1  5.433e-01   1.323e-01    4.108 3.99e-05 ***
## alpha2  4.870e-01   1.108e-01    4.395 1.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  997.0851    normalized:  2.313422 
## 
## Description:
##  Mon May 10 21:12:05 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value   
##  Jarque-Bera Test   R    Chi^2  1450.006  0         
##  Shapiro-Wilk Test  R    W      0.8744017 0         
##  Ljung-Box Test     R    Q(10)  18.6254   0.04528692
##  Ljung-Box Test     R    Q(15)  26.25545  0.03542182
##  Ljung-Box Test     R    Q(20)  28.93381  0.08907092
##  Ljung-Box Test     R^2  Q(10)  8.642008  0.5663814 
##  Ljung-Box Test     R^2  Q(15)  11.90956  0.6858611 
##  Ljung-Box Test     R^2  Q(20)  16.36083  0.6940018 
##  LM Arch Test       R    TR^2   9.24412   0.68195   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -4.608283 -4.570547 -4.608454 -4.593384
# GARCH(1,1)
usdhkd.garch11 = garchFit(~garch(1,1), data=usdhkd, trace=F)
summary(usdhkd.garch11)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(1, 1), data = usdhkd, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(1, 1)
## <environment: 0x7fec289a3ed0>
##  [data = usdhkd]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##          mu        omega       alpha1        beta1  
## -1.6576e-04   1.5468e-05   2.4065e-01   8.0449e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu     -1.658e-04   8.902e-04   -0.186   0.8523    
## omega   1.547e-05   1.904e-05    0.812   0.4167    
## alpha1  2.406e-01   1.362e-01    1.767   0.0772 .  
## beta1   8.045e-01   1.052e-01    7.648 2.04e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  999.3168    normalized:  2.3186 
## 
## Description:
##  Mon May 10 21:12:05 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value   
##  Jarque-Bera Test   R    Chi^2  2690.144  0         
##  Shapiro-Wilk Test  R    W      0.858535  0         
##  Ljung-Box Test     R    Q(10)  20.78275  0.02266047
##  Ljung-Box Test     R    Q(15)  24.94071  0.05074532
##  Ljung-Box Test     R    Q(20)  26.92712  0.1373269 
##  Ljung-Box Test     R^2  Q(10)  3.768855  0.9571724 
##  Ljung-Box Test     R^2  Q(15)  6.011632  0.9795517 
##  Ljung-Box Test     R^2  Q(20)  8.392027  0.9889398 
##  LM Arch Test       R    TR^2   4.156909  0.9804356 
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -4.618639 -4.580903 -4.618810 -4.603740
# GARCH (2,2)
usdhkd.garch22 = garchFit(~garch(2,2), data=usdhkd, trace=F)
## Warning in sqrt(diag(fit$cvar)): NaNs produced
summary(usdhkd.garch22)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(2, 2), data = usdhkd, trace = F) 
## 
## Mean and Variance Equation:
##  data ~ garch(2, 2)
## <environment: 0x7fec2b3659a8>
##  [data = usdhkd]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##          mu        omega       alpha1       alpha2        beta1        beta2  
## -1.0068e-04   1.3247e-05   2.4447e-01   1.0000e-08   5.5623e-01   2.4114e-01  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)   
## mu     -1.007e-04   8.768e-04   -0.115  0.90858   
## omega   1.325e-05   9.320e-06    1.421  0.15519   
## alpha1  2.445e-01   8.728e-02    2.801  0.00509 **
## alpha2  1.000e-08          NA       NA       NA   
## beta1   5.562e-01          NA       NA       NA   
## beta2   2.411e-01          NA       NA       NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1000.581    normalized:  2.321534 
## 
## Description:
##  Mon May 10 21:12:05 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value   
##  Jarque-Bera Test   R    Chi^2  2936.158  0         
##  Shapiro-Wilk Test  R    W      0.859941  0         
##  Ljung-Box Test     R    Q(10)  21.39029  0.01853056
##  Ljung-Box Test     R    Q(15)  25.39566  0.04487562
##  Ljung-Box Test     R    Q(20)  27.34351  0.1258733 
##  Ljung-Box Test     R^2  Q(10)  3.013569  0.9811029 
##  Ljung-Box Test     R^2  Q(15)  5.023137  0.9919287 
##  Ljung-Box Test     R^2  Q(20)  7.216473  0.9959124 
##  LM Arch Test       R    TR^2   3.459938  0.9913337 
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -4.615226 -4.558622 -4.615607 -4.592877

Comment: Although all four models have standardized residuals that are clearly non-normal (Jarque-Bera and Shapiro-Wilk p-values < 0.05), I believe GARCH(1,1) is the best of the four: only one of its Ljung-Box tests on the residuals rejects at the 5% level (Q(10), p ≈ 0.023), the squared-residual Ljung-Box tests and the LM ARCH test all show no remaining ARCH effects, and it has the lowest AIC and BIC of all four models (AIC = −4.6186, BIC = −4.5809).
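
As an optional visual check of the chosen model (a sketch, assuming fGarch's volatility() accessor), the fitted conditional standard deviations can be plotted:

# conditional standard deviation (volatility) implied by the GARCH(1,1) fit
vol <- volatility(usdhkd.garch11, type = "sigma")
plot.ts(vol, main = "Fitted conditional volatility, GARCH(1,1)")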


c: Make a prediction for the next 5 volatilities based on your final model.

predict(usdhkd.garch11, n.ahead=5, plot=TRUE, conf=.9)

##    meanForecast  meanError standardDeviation lowerInterval upperInterval
## 1 -0.0001657554 0.01384428        0.01384428   -0.02293757    0.02260606
## 2 -0.0001657554 0.01468953        0.01468953   -0.02432788    0.02399637
## 3 -0.0001657554 0.01552382        0.01552382   -0.02570016    0.02536865
## 4 -0.0001657554 0.01635032        0.01635032   -0.02705965    0.02672813
## 5 -0.0001657554 0.01717167        0.01717167   -0.02841064    0.02807913

Comment: The volatility forecast is the standardDeviation column: predicted volatility rises gradually from about 0.0138 one step ahead to about 0.0172 five steps ahead, while the mean forecast stays flat at the fitted mu.