Forecasting - Volatility

Forecasting volatility

Read in the data we obtained in Phase 1. We calculate simple returns.
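
As a minimal sketch of the return calculation (written in Python rather than the R used for the analysis; the function name and the sample prices are illustrative assumptions, not the project's code), the simple return is r_t = P_t / P_{t-1} - 1:

```python
from typing import List, Sequence


def simple_returns(prices: Sequence[float]) -> List[float]:
    """Return r_t = P_t / P_{t-1} - 1 for consecutive prices."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]


# Hypothetical closing prices, for illustration only.
example_prices = [100.0, 102.0, 99.96]
example_returns = simple_returns(example_prices)
```

A series of n prices yields n - 1 returns, so the first observation is lost.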

Plotting returns:

Splitting the data

We split the data into an in-sample (training) set and an out-of-sample (testing) set.

We split the data the same way as in Phases 2 and 3.

## [1] "Number of observations in training set: 505 (82.11%)"

## [1] "Number of observations in testing set: 110 (17.89%)"
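
The split above can be sketched as follows. This is an illustrative Python version, assuming a purely chronological split with no shuffling; the function name and the placeholder series are hypothetical, and only the counts 505 and 110 come from the output above.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """Keep time order: the first n_train observations train, the rest test."""
    return list(series[:n_train]), list(series[n_train:])


# Placeholder series with the same length as the report's data (505 + 110).
returns = [0.0] * 615
train_set, test_set = chronological_split(returns, 505)
train_share = 100.0 * len(train_set) / len(returns)
test_share = 100.0 * len(test_set) / len(returns)
print(f"training: {len(train_set)} ({train_share:.2f}%), testing: {len(test_set)} ({test_share:.2f}%)")
```

Keeping the time order matters for time series: the testing set must lie strictly after the training set, otherwise the model would be evaluated on data that precede what it was fitted on.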

Looking at ACF and PACF plots we see that there’s serial correlation.

We take a look at the regular, squared and absolute returns for up to 10 and 30 lags.

The regular ACF plot for 30 lags indicates that there’s serial correlation up to the 25th lag, but looking at the squared returns we see that we actually need to go up to the 30th lag.

PACF plots show the same.
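
For readers who want to see what the ACF plots compute, here is a minimal Python sketch of the sample autocorrelation and the usual approximate 95% band. The function names are assumptions made for this example; the plots in this report were produced in R.

```python
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations rho_1..rho_max_lag (standard estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def acf_band(n: int) -> float:
    """Approximate 95% band +/- 1.96 / sqrt(n) under the white-noise null."""
    return 1.96 / n ** 0.5
```

Applying `sample_acf` to the returns, to their squares and to their absolute values gives the three variants discussed above. With 505 training observations the band is roughly ±0.087, so bars outside that range point to serial correlation.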

We use the Box-Ljung test to check for serial correlation in the returns.

Since the p-value is less than 5% (p-value < 2.2e-16), we have strong evidence against the null hypothesis of no serial correlation and reject it, i.e. there is serial correlation.

This means that we’ll need an ARMA + GARCH model: an ARMA component for the mean equation in addition to GARCH for the variance.

## 
##  Box-Ljung test
## 
## data:  training_set
## X-squared = 271.45, df = 30, p-value < 2.2e-16
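
For reference, the Box-Ljung statistic in its standard textbook form, with n observations, m lags and sample autocorrelations \(\hat{\rho}_k\), is

\[ Q(m) = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^{2}}{n-k} \]

Under the null hypothesis of no serial correlation up to lag m, Q(m) is approximately chi-square distributed with m degrees of freedom. With m = 30 the 5% critical value is about 43.77, far below the reported X-squared = 271.45.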

We need to check if there’s an ARCH effect in the data.

Since the expected return of MSFT is not zero (calculated in Phase 1), we need to adjust for that: we subtract the mean from the returns and test the squared demeaned returns.

Since the p-value is practically zero (< 2.2e-16), we reject the null hypothesis that there’s no ARCH effect (i.e. conditional homoscedasticity).

This means that we have strong evidence against this hypothesis; hence, there’s an ARCH effect.

## 
##  Box-Ljung test
## 
## data:  at^2
## X-squared = 593.01, df = 30, p-value < 2.2e-16

## 
## Call:
## lm(formula = atsq ~ x)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0068785 -0.0002686 -0.0000936  0.0000994  0.0132433 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.050e-04  6.543e-05   1.604 0.109386    
## x1           6.564e-01  4.744e-02  13.836  < 2e-16 ***
## x2           1.040e-02  5.675e-02   0.183 0.854609    
## x3          -1.654e-01  5.672e-02  -2.917 0.003715 ** 
## x4           1.695e-01  5.710e-02   2.969 0.003149 ** 
## x5          -4.795e-02  5.748e-02  -0.834 0.404587    
## x6           2.789e-02  5.742e-02   0.486 0.627372    
## x7           7.517e-02  5.742e-02   1.309 0.191151    
## x8          -4.266e-02  5.750e-02  -0.742 0.458544    
## x9           4.301e-02  5.753e-02   0.748 0.455027    
## x10          1.390e-01  5.755e-02   2.416 0.016094 *  
## x11         -6.316e-02  5.787e-02  -1.091 0.275647    
## x12          3.464e-02  5.777e-02   0.600 0.549101    
## x13         -8.333e-02  5.739e-02  -1.452 0.147172    
## x14         -1.523e-02  5.680e-02  -0.268 0.788771    
## x15          1.373e-01  5.681e-02   2.417 0.016036 *  
## x16          1.004e-04  5.681e-02   0.002 0.998590    
## x17         -1.947e-01  5.681e-02  -3.427 0.000666 ***
## x18          1.445e-01  5.742e-02   2.517 0.012193 *  
## x19         -9.493e-02  5.780e-02  -1.642 0.101233    
## x20          4.377e-02  5.790e-02   0.756 0.450059    
## x21         -2.435e-02  5.754e-02  -0.423 0.672358    
## x22         -1.432e-02  5.751e-02  -0.249 0.803490    
## x23          2.610e-02  5.748e-02   0.454 0.649935    
## x24         -3.067e-02  5.737e-02  -0.534 0.593264    
## x25          7.369e-02  5.737e-02   1.285 0.199633    
## x26         -9.916e-02  5.743e-02  -1.727 0.084919 .  
## x27          9.089e-02  5.698e-02   1.595 0.111379    
## x28         -3.698e-02  5.651e-02  -0.654 0.513189    
## x29         -7.300e-03  5.653e-02  -0.129 0.897316    
## x30          2.373e-02  4.726e-02   0.502 0.615828    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001214 on 444 degrees of freedom
## Multiple R-squared:  0.4863, Adjusted R-squared:  0.4516 
## F-statistic: 14.01 on 30 and 444 DF,  p-value: < 2.2e-16
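
The regression summary above can be cross-checked with a few lines of arithmetic. The Python sketch below assumes the standard formulas for the overall F-statistic and for Engle's LM statistic (number of observations times R-squared). The observation count of 475 is inferred from the 444 residual degrees of freedom plus 31 estimated coefficients, and the function names are illustrative.

```python
def f_statistic(r_squared: float, n_regressors: int, resid_df: int) -> float:
    """Overall F-statistic of a linear regression, computed from its R-squared."""
    return (r_squared / n_regressors) / ((1.0 - r_squared) / resid_df)


def arch_lm_statistic(r_squared: float, n_obs: int) -> float:
    """Engle's LM statistic: number of regression observations times R-squared."""
    return n_obs * r_squared


r_squared = 0.4863        # Multiple R-squared from the summary above
n_lags = 30               # regressors x1..x30
resid_df = 444            # residual degrees of freedom from the summary above
n_obs = resid_df + n_lags + 1  # 475 rows: 505 observations minus 30 lags

f_stat = f_statistic(r_squared, n_lags, resid_df)
lm_stat = arch_lm_statistic(r_squared, n_obs)
CHI2_30DF_5PCT = 43.77    # 5% critical value of chi-square with 30 df
```

The recomputed F-statistic is about 14.01, matching the summary, and the LM statistic of about 231 is far above the 5% critical value, in line with the conclusion that there is an ARCH effect.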

Fit ARMA(1, 1) - GARCH(1, 1) model with Student t-distribution

Since we found that there’s serial correlation, we’ll use an ARMA(1, 1)-GARCH(1, 1) model with a Student t-distribution.
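
In the usual textbook notation this model can be written as follows. The symbols are chosen for this sketch; how the intercept \(\mu\) enters the mean equation is package-specific and assumed here.

\[ r_t = \mu + \phi_1 r_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \]

Here \(z_t\) is an i.i.d. standardized Student t innovation with \(\nu\) degrees of freedom. In the fitted output, \(\phi_1\) corresponds to ar1, \(\theta_1\) to ma1 and \(\nu\) to shape.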

If we look at the Standardised Residuals Tests we see the following:

Since the p-values of the Ljung-Box tests on the standardized residuals are all greater than 5%, there is no evidence of correlation in our residuals.
The same holds for the squared residuals, so there’s no remaining dependence in conditional variance.
The p-value for the LM Arch Test is 0.69 (not rejecting the null), which means that there’s no additional ARCH effect that our model didn’t capture.

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~arma(1, 1) + garch(1, 1), data = training_set, 
##     cond.dist = "std", trace = F) 
## 
## Mean and Variance Equation:
##  data ~ arma(1, 1) + garch(1, 1)
## <environment: 0x7ff7a8e1ed68>
##  [data = training_set]
## 
## Conditional Distribution:
##  std 
## 
## Coefficient(s):
##          mu          ar1          ma1        omega       alpha1        beta1  
##  7.0173e-04   6.7945e-01  -8.2628e-01   9.0590e-06   1.8654e-01   7.9894e-01  
##       shape  
##  6.2238e+00  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu      7.017e-04   2.459e-04    2.854 0.004319 ** 
## ar1     6.795e-01   9.993e-02    6.799 1.05e-11 ***
## ma1    -8.263e-01   7.698e-02  -10.734  < 2e-16 ***
## omega   9.059e-06   4.345e-06    2.085 0.037059 *  
## alpha1  1.865e-01   4.618e-02    4.039 5.36e-05 ***
## beta1   7.989e-01   4.026e-02   19.845  < 2e-16 ***
## shape   6.224e+00   1.764e+00    3.529 0.000418 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1383.943    normalized:  2.740481 
## 
## Description:
##  Mon Jun 21 09:54:05 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  33.82925  4.508907e-08
##  Shapiro-Wilk Test  R    W      0.9830147 1.259921e-05
##  Ljung-Box Test     R    Q(10)  14.75566  0.1412272   
##  Ljung-Box Test     R    Q(15)  20.5683   0.1511985   
##  Ljung-Box Test     R    Q(20)  22.85278  0.2960811   
##  Ljung-Box Test     R^2  Q(10)  8.821484  0.5491252   
##  Ljung-Box Test     R^2  Q(15)  10.10979  0.8127778   
##  Ljung-Box Test     R^2  Q(20)  11.66964  0.9269777   
##  LM Arch Test       R    TR^2   9.144168  0.6905698   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -5.453240 -5.394681 -5.453617 -5.430271

Fit ARMA(1, 1) - GARCH(2, 1) model with Student t-distribution

The last model gave us pretty good results, so let’s try increasing the order of the model.

All of our parameters except alpha1 and alpha2 are significant at the 5% level: alpha2 is clearly insignificant (p-value of 0.478) and alpha1 is significant only at the 10% level (p-value of 0.055).
Just like in the previous model, we can see that there’s no correlation and no dependence in conditional variance.
The p-value for the LM Arch Test is 0.73 (not rejecting the null), which means that there’s no additional ARCH effect that our model didn’t capture. This value is even greater than in the previous model.
This might indicate that this model will be better, but we’ll keep track of the AIC values and compare models that way.
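
Because GARCH(1, 1) is nested in GARCH(2, 1), a likelihood-ratio comparison is a quick complementary check. The Python sketch below uses the log-likelihoods reported in the two fit summaries (1383.943 and 1384.426). It is an illustration and not part of the original analysis, and the chi-square reference is only approximate here because alpha2 is constrained to be non-negative.

```python
def likelihood_ratio(loglik_restricted: float, loglik_full: float) -> float:
    """LR statistic 2 * (logL_full - logL_restricted) for nested models."""
    return 2.0 * (loglik_full - loglik_restricted)


# Log-likelihoods as printed in the GARCH(1, 1) and GARCH(2, 1) summaries.
lr_stat = likelihood_ratio(1383.943, 1384.426)
CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 df
extra_term_helps = lr_stat > CHI2_1DF_5PCT
```

The statistic is about 0.97, well below 3.841, which agrees with the insignificant alpha2 estimate: the extra ARCH term does not improve the fit noticeably.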

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~arma(1, 1) + garch(2, 1), data = training_set, 
##     cond.dist = "std", trace = F) 
## 
## Mean and Variance Equation:
##  data ~ arma(1, 1) + garch(2, 1)
## <environment: 0x7ff78d47ba80>
##  [data = training_set]
## 
## Conditional Distribution:
##  std 
## 
## Coefficient(s):
##          mu          ar1          ma1        omega       alpha1       alpha2  
##  0.00068305   0.68536958  -0.83340048   0.00001009   0.14379790   0.06496868  
##       beta1        shape  
##  0.77600250   6.26818424  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu      6.831e-04   2.330e-04    2.932 0.003370 ** 
## ar1     6.854e-01   9.570e-02    7.161 7.99e-13 ***
## ma1    -8.334e-01   7.345e-02  -11.347  < 2e-16 ***
## omega   1.009e-05   5.000e-06    2.018 0.043581 *  
## alpha1  1.438e-01   7.501e-02    1.917 0.055229 .  
## alpha2  6.497e-02   9.147e-02    0.710 0.477529    
## beta1   7.760e-01   5.327e-02   14.566  < 2e-16 ***
## shape   6.268e+00   1.792e+00    3.498 0.000468 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1384.426    normalized:  2.741437 
## 
## Description:
##  Mon Jun 21 09:54:06 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  33.43287  5.497227e-08
##  Shapiro-Wilk Test  R    W      0.9831497 1.371396e-05
##  Ljung-Box Test     R    Q(10)  14.67442  0.1443905   
##  Ljung-Box Test     R    Q(15)  20.45534  0.1551523   
##  Ljung-Box Test     R    Q(20)  22.61895  0.3078896   
##  Ljung-Box Test     R^2  Q(10)  8.612195  0.5692598   
##  Ljung-Box Test     R^2  Q(15)  9.580863  0.8452443   
##  Ljung-Box Test     R^2  Q(20)  11.3483   0.936685    
##  LM Arch Test       R    TR^2   8.73106   0.7257136   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -5.451191 -5.384267 -5.451683 -5.424941

Fit ARMA(1, 1) - GARCH(2, 1) model with skew-Student distribution

The previous model gave us pretty good results, so let’s try the same order with a different distribution.

Looking at the Standardised Residuals Tests, we conclude the following:

The Jarque-Bera Test p-value is practically zero (5.03e-08), so the standardized residuals are not normally distributed.
There is no evidence of correlation in our residuals and there’s no dependence in conditional variance.
The p-value for the LM Arch Test is 0.68, so there’s no additional ARCH effect that our model didn’t capture.
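
One detail about the skew parameter is worth a short sketch. The t value printed for skew in the output tests the null hypothesis skew = 0, whereas, assuming the usual parameterization of the skew-Student distribution in which skew = 1 means symmetry, the relevant question is whether the estimate differs from 1. The Python snippet below is an illustration with hypothetical function names. It runs that Wald test on the reported estimate (0.8659) and standard error (0.05618) using a normal approximation.

```python
import math


def wald_z(estimate: float, std_error: float, null_value: float) -> float:
    """Wald z-statistic for H0: parameter == null_value."""
    return (estimate - null_value) / std_error


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimate and standard error of skew as printed in the fit summary.
z_skew = wald_z(0.8659, 0.05618, 1.0)
p_skew = two_sided_normal_p(z_skew)
```

This gives z of about -2.39 and a p-value of about 0.017, so under these assumptions the estimated asymmetry (skew below 1) is significant at the 5% level.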

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~arma(1, 1) + garch(2, 1), data = training_set, 
##     cond.dist = "sstd", trace = F) 
## 
## Mean and Variance Equation:
##  data ~ arma(1, 1) + garch(2, 1)
## <environment: 0x7ff78db83eb0>
##  [data = training_set]
## 
## Conditional Distribution:
##  sstd 
## 
## Coefficient(s):
##          mu          ar1          ma1        omega       alpha1       alpha2  
##  0.00061444   0.68385472  -0.83579617   0.00001079   0.12527362   0.08521354  
##       beta1         skew        shape  
##  0.76959753   0.86590389   6.71037266  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu      6.144e-04   2.009e-04    3.059 0.002220 ** 
## ar1     6.839e-01   9.195e-02    7.437 1.03e-13 ***
## ma1    -8.358e-01   6.902e-02  -12.109  < 2e-16 ***
## omega   1.079e-05   4.944e-06    2.182 0.029084 *  
## alpha1  1.253e-01   6.930e-02    1.808 0.070672 .  
## alpha2  8.521e-02   8.665e-02    0.983 0.325386    
## beta1   7.696e-01   5.202e-02   14.794  < 2e-16 ***
## skew    8.659e-01   5.618e-02   15.413  < 2e-16 ***
## shape   6.710e+00   2.027e+00    3.311 0.000931 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1386.982    normalized:  2.746499 
## 
## Description:
##  Mon Jun 21 09:54:07 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  33.60938  5.032875e-08
##  Shapiro-Wilk Test  R    W      0.9829162 1.184532e-05
##  Ljung-Box Test     R    Q(10)  14.93142  0.1345823   
##  Ljung-Box Test     R    Q(15)  20.53264  0.1524379   
##  Ljung-Box Test     R    Q(20)  22.63912  0.3068601   
##  Ljung-Box Test     R^2  Q(10)  9.089247  0.5236552   
##  Ljung-Box Test     R^2  Q(15)  9.879089  0.8272758   
##  Ljung-Box Test     R^2  Q(20)  11.83192  0.9217395   
##  LM Arch Test       R    TR^2   9.264897  0.6801537   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -5.457355 -5.382066 -5.457975 -5.427824

Fit ARMA(1, 1) - GARCH(2, 1) model with generalized error distribution

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~arma(1, 1) + garch(2, 1), data = training_set, 
##     cond.dist = "ged", trace = F) 
## 
## Mean and Variance Equation:
##  data ~ arma(1, 1) + garch(2, 1)
## <environment: 0x7ff7aaf12230>
##  [data = training_set]
## 
## Conditional Distribution:
##  ged 
## 
## Coefficient(s):
##          mu          ar1          ma1        omega       alpha1       alpha2  
##  6.7615e-04   6.8279e-01  -8.3055e-01   1.0797e-05   1.4818e-01   6.0657e-02  
##       beta1        shape  
##  7.6827e-01   1.4025e+00  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu      6.762e-04   1.814e-04    3.728 0.000193 ***
## ar1     6.828e-01   7.533e-02    9.064  < 2e-16 ***
## ma1    -8.305e-01   6.124e-02  -13.562  < 2e-16 ***
## omega   1.080e-05   5.081e-06    2.125 0.033596 *  
## alpha1  1.482e-01   7.098e-02    2.088 0.036825 *  
## alpha2  6.066e-02   8.636e-02    0.702 0.482424    
## beta1   7.683e-01   5.323e-02   14.432  < 2e-16 ***
## shape   1.402e+00   1.206e-01   11.629  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1384.262    normalized:  2.741112 
## 
## Description:
##  Mon Jun 21 09:54:07 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  32.57534  8.440238e-08
##  Shapiro-Wilk Test  R    W      0.98335   1.556233e-05
##  Ljung-Box Test     R    Q(10)  14.7511   0.1414033   
##  Ljung-Box Test     R    Q(15)  20.61777  0.1494925   
##  Ljung-Box Test     R    Q(20)  22.85798  0.2958215   
##  Ljung-Box Test     R^2  Q(10)  8.699573  0.5608328   
##  Ljung-Box Test     R^2  Q(15)  9.613367  0.84333     
##  Ljung-Box Test     R^2  Q(20)  11.47761  0.9328845   
##  LM Arch Test       R    TR^2   8.796358  0.7202188   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -5.450541 -5.383617 -5.451033 -5.424292

Fit ARMA(1, 1) - EGARCH(2, 1) model with Student t-distribution

The exponential GARCH (EGARCH) model is another form of GARCH model that overcomes some deficiencies of the standard GARCH model: it can capture asymmetric responses of volatility to positive and negative shocks, and because it models the logarithm of the variance it imposes fewer constraints on the parameters of the model.
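
As a sketch, and assuming the parameterization documented for the rugarch package (whose output format appears below), the variance equation with standardized innovations \(z_t = \varepsilon_t / \sigma_t\) is

\[ \ln \sigma_t^2 = \omega + \sum_{j=1}^{q} \left( \alpha_j z_{t-j} + \gamma_j \left( |z_{t-j}| - \mathrm{E}|z_{t-j}| \right) \right) + \sum_{j=1}^{p} \beta_j \ln \sigma_{t-j}^2 \]

Under this convention \(\alpha_j\) captures the sign (asymmetry) effect and \(\gamma_j\) the size effect of a shock. Because the equation is written for \(\ln \sigma_t^2\), the variance stays positive without sign restrictions on the coefficients.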

## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : eGARCH(2,1)
## Mean Model   : ARFIMA(1,0,1)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error   t value Pr(>|t|)
## mu      0.002133    0.000262   8.13284 0.000000
## ar1     0.696325    0.036141  19.26708 0.000000
## ma1    -0.835149    0.030195 -27.65864 0.000000
## omega  -0.342423    0.149743  -2.28674 0.022211
## alpha1 -0.018614    0.072655  -0.25620 0.797794
## alpha2 -0.011812    0.073890  -0.15986 0.872989
## beta1   0.959055    0.018030  53.19312 0.000000
## gamma1  0.257589    0.124694   2.06576 0.038851
## gamma2  0.108704    0.135069   0.80480 0.420933
## shape   6.352813    1.821125   3.48840 0.000486
## 
## Robust Standard Errors:
##         Estimate  Std. Error   t value Pr(>|t|)
## mu      0.002133    0.000240   8.89910 0.000000
## ar1     0.696325    0.015976  43.58696 0.000000
## ma1    -0.835149    0.015498 -53.88634 0.000000
## omega  -0.342423    0.122392  -2.79775 0.005146
## alpha1 -0.018614    0.081645  -0.22799 0.819653
## alpha2 -0.011812    0.082048  -0.14397 0.885527
## beta1   0.959055    0.014909  64.32551 0.000000
## gamma1  0.257589    0.151085   1.70493 0.088207
## gamma2  0.108704    0.161564   0.67282 0.501060
## shape   6.352813    1.589815   3.99594 0.000064
## 
## LogLikelihood : 1383.244 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.4386
## Bayes        -5.3549
## Shibata      -5.4394
## Hannan-Quinn -5.4058
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                     0.3239 0.569276
## Lag[2*(p+q)+(p+q)-1][5]    4.9894 0.003433
## Lag[4*(p+q)+(p+q)-1][9]    8.1743 0.049741
## d.o.f=2
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                          statistic p-value
## Lag[1]                      0.1371  0.7112
## Lag[2*(p+q)+(p+q)-1][8]     1.7968  0.8889
## Lag[4*(p+q)+(p+q)-1][14]    4.5276  0.8254
## d.o.f=3
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[4]    0.1148 0.500 2.000  0.7348
## ARCH Lag[6]    1.8721 1.461 1.711  0.5196
## ARCH Lag[8]    2.4053 2.368 1.583  0.6595
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.1104
## Individual Statistics:              
## mu     0.10216
## ar1    0.02347
## ma1    0.02211
## omega  0.20590
## alpha1 0.11822
## alpha2 0.07415
## beta1  0.18502
## gamma1 0.09281
## gamma2 0.12143
## shape  0.15281
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          2.29 2.54 3.05
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.4548 0.6494    
## Negative Sign Bias  0.2151 0.8298    
## Positive Sign Bias  0.4098 0.6821    
## Joint Effect        0.4375 0.9324    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     28.35      0.07698
## 2    30     33.51      0.25746
## 3    40     50.49      0.10295
## 4    50     61.04      0.11606
## 
## 
## Elapsed time : 0.2306859

Fit ARMA(1, 1) - FGARCH(2, 1) model with Student t-distribution

## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : fGARCH(2,1)
## fGARCH Sub-Model : GARCH
## Mean Model   : ARFIMA(1,0,1)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.002197    0.000311  7.06914 0.000000
## ar1     0.686639    0.110424  6.21821 0.000000
## ma1    -0.829487    0.084608 -9.80386 0.000000
## omega   0.000009    0.000006  1.53364 0.125119
## alpha1  0.185852    0.051548  3.60542 0.000312
## alpha2  0.003371    0.013528  0.24918 0.803224
## beta1   0.781031    0.071300 10.95411 0.000000
## shape   6.483214    2.066436  3.13739 0.001705
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.002197    0.000311  7.05301 0.000000
## ar1     0.686639    0.123220  5.57245 0.000000
## ma1    -0.829487    0.094692 -8.75981 0.000000
## omega   0.000009    0.000009  1.02605 0.304867
## alpha1  0.185852    0.073040  2.54452 0.010943
## alpha2  0.003371    0.013891  0.24266 0.808269
## beta1   0.781031    0.060986 12.80665 0.000000
## shape   6.483214    2.302972  2.81515 0.004875
## 
## LogLikelihood : 1383.952 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.4493
## Bayes        -5.3824
## Shibata      -5.4498
## Hannan-Quinn -5.4231
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                      0.137 0.711309
## Lag[2*(p+q)+(p+q)-1][5]     4.901 0.004622
## Lag[4*(p+q)+(p+q)-1][9]     8.130 0.051666
## d.o.f=2
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                          statistic p-value
## Lag[1]                     0.05118  0.8210
## Lag[2*(p+q)+(p+q)-1][8]    2.18498  0.8295
## Lag[4*(p+q)+(p+q)-1][14]   5.06911  0.7609
## d.o.f=3
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[4]   0.05846 0.500 2.000  0.8089
## ARCH Lag[6]   1.54203 1.461 1.711  0.5998
## ARCH Lag[8]   2.03951 2.368 1.583  0.7336
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  2.5116
## Individual Statistics:              
## mu     0.11303
## ar1    0.02647
## ma1    0.02857
## omega  0.30597
## alpha1 0.21200
## alpha2 0.17454
## beta1  0.19165
## shape  0.14061
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.89 2.11 2.59
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias          0.35189 0.7251    
## Negative Sign Bias 0.09837 0.9217    
## Positive Sign Bias 0.11232 0.9106    
## Joint Effect       0.25044 0.9691    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     32.07      0.03070
## 2    30     27.46      0.54715
## 3    40     53.18      0.06458
## 4    50     58.07      0.17581
## 
## 
## Elapsed time : 0.4275339

Fit ARMA(1, 1) - IGARCH(2, 1) model with Student t-distribution

The Integrated GARCH (IGARCH) model is a restricted version of the GARCH model in which the persistence parameters sum to one: \[ \sum_{i=1}^{p} \beta_{i} + \sum_{i=1}^{q} \alpha_{i} = 1 \]
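
The constraint can be checked against the estimates reported in the output below. In this small Python sketch (variable names are illustrative) beta1 is not a free parameter but is implied by the alphas, which would explain why its standard error is shown as NA.

```python
# alpha1 and alpha2 as printed in the IGARCH(2, 1) output below.
alphas = [0.152971, 0.070082]

# Under the IGARCH restriction the betas and alphas sum to one,
# so with a single beta it is fully determined by the alphas.
beta1_implied = 1.0 - sum(alphas)
persistence = sum(alphas) + beta1_implied
```

The implied value of about 0.776947 agrees with the reported beta1 of 0.776948 up to rounding.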

## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : iGARCH(2,1)
## Mean Model   : ARFIMA(1,0,1)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error   t value Pr(>|t|)
## mu      0.002202    0.000292   7.54328 0.000000
## ar1     0.692584    0.104186   6.64756 0.000000
## ma1    -0.837489    0.079399 -10.54790 0.000000
## omega   0.000009    0.000004   2.43006 0.015096
## alpha1  0.152971    0.079759   1.91792 0.055121
## alpha2  0.070082    0.092792   0.75526 0.450095
## beta1   0.776948          NA        NA       NA
## shape   6.079698    1.564976   3.88485 0.000102
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.002202    0.000291  7.56560 0.000000
## ar1     0.692584    0.124074  5.58205 0.000000
## ma1    -0.837489    0.098121 -8.53526 0.000000
## omega   0.000009    0.000003  2.80590 0.005018
## alpha1  0.152971    0.092254  1.65814 0.097289
## alpha2  0.070082    0.103313  0.67834 0.497553
## beta1   0.776948          NA       NA       NA
## shape   6.079698    1.441285  4.21825 0.000025
## 
## LogLikelihood : 1384.019 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.4535
## Bayes        -5.3950
## Shibata      -5.4539
## Hannan-Quinn -5.4306
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                    0.09987 0.751983
## Lag[2*(p+q)+(p+q)-1][5]   4.87384 0.005065
## Lag[4*(p+q)+(p+q)-1][9]   8.03274 0.056110
## d.o.f=2
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                          statistic p-value
## Lag[1]                     0.02197  0.8822
## Lag[2*(p+q)+(p+q)-1][8]    1.85547  0.8805
## Lag[4*(p+q)+(p+q)-1][14]   4.79125  0.7949
## d.o.f=3
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[4]   0.01382 0.500 2.000  0.9064
## ARCH Lag[6]   1.38581 1.461 1.711  0.6406
## ARCH Lag[8]   1.97374 2.368 1.583  0.7469
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.3259
## Individual Statistics:              
## mu     0.11112
## ar1    0.02824
## ma1    0.03078
## omega  0.18152
## alpha1 0.12678
## alpha2 0.09033
## shape  0.13387
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.69 1.9 2.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias          0.36609 0.7145    
## Negative Sign Bias 0.01196 0.9905    
## Positive Sign Bias 0.01530 0.9878    
## Joint Effect       0.25201 0.9688    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     27.48      0.09406
## 2    30     32.68      0.29070
## 3    40     59.36      0.01937
## 4    50     70.35      0.02443
## 
## 
## Elapsed time : 0.08620811

Selecting the best model based on Information Criteria

If we compare the Akaike Information Criterion (AIC) of all of our models, we see that the ARMA(1, 1) - GARCH(2, 1) model with skew-Student distribution has the lowest value (-5.457355) and is therefore the best one.

##                            Model       AIC
## 1          GARCH(1, 1) Student t -5.452700
## 2         GARCH (2, 1) Student t -5.451191
## 3       GARCH(2, 1) skew Student -5.457355
## 4         EGARCH(2, 1) Student t -5.438600
## 5                   FGARCH(2, 1) -5.449300
## 6 GARCH(2, 1) Generalized Error  -5.450541
## 7         IGARCH(2, 1) Student t -5.453500

Let’s take a closer look at the chosen model once again.

Even though this model had the lowest AIC, not all of its parameters are significant. Here we can see that alpha2 is insignificant with a p-value of 0.325, and alpha1 is significant only at the 10% level (p-value of 0.071).

Ljung-Box Test R Q(10): statistic 14.93144, p-value 0.1345816 - no correlation
Ljung-Box Test R Q(15): statistic 20.53265, p-value 0.1524374 - no correlation
Ljung-Box Test R Q(20): statistic 22.63913, p-value 0.3068595 - no correlation
Ljung-Box Test R^2 Q(10): statistic 9.089257, p-value 0.5236542 - no dependence in conditional variance
Ljung-Box Test R^2 Q(15): statistic 9.8791, p-value 0.8272751 - no dependence in conditional variance
Ljung-Box Test R^2 Q(20): statistic 11.83193, p-value 0.9217392 - no dependence in conditional variance
The LM Arch Test (p-value of 0.68) tells us that there’s no additional ARCH effect that our model didn’t capture.

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~arma(1, 1) + garch(2, 1), data = training_set, 
##     cond.dist = "sstd", trace = F) 
## 
## Mean and Variance Equation:
##  data ~ arma(1, 1) + garch(2, 1)
## <environment: 0x7ff7a9aa98d0>
##  [data = training_set]
## 
## Conditional Distribution:
##  sstd 
## 
## Coefficient(s):
##          mu          ar1          ma1        omega       alpha1       alpha2  
##  0.00061444   0.68385472  -0.83579617   0.00001079   0.12527362   0.08521354  
##       beta1         skew        shape  
##  0.76959753   0.86590389   6.71037266  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu      6.144e-04   2.009e-04    3.059 0.002220 ** 
## ar1     6.839e-01   9.195e-02    7.437 1.03e-13 ***
## ma1    -8.358e-01   6.902e-02  -12.109  < 2e-16 ***
## omega   1.079e-05   4.944e-06    2.182 0.029084 *  
## alpha1  1.253e-01   6.930e-02    1.808 0.070672 .  
## alpha2  8.521e-02   8.665e-02    0.983 0.325386    
## beta1   7.696e-01   5.202e-02   14.794  < 2e-16 ***
## skew    8.659e-01   5.618e-02   15.413  < 2e-16 ***
## shape   6.710e+00   2.027e+00    3.311 0.000931 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  1386.982    normalized:  2.746499 
## 
## Description:
##  Mon Jun 21 09:54:09 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  33.60938  5.032875e-08
##  Shapiro-Wilk Test  R    W      0.9829162 1.184532e-05
##  Ljung-Box Test     R    Q(10)  14.93142  0.1345823   
##  Ljung-Box Test     R    Q(15)  20.53264  0.1524379   
##  Ljung-Box Test     R    Q(20)  22.63912  0.3068601   
##  Ljung-Box Test     R^2  Q(10)  9.089247  0.5236552   
##  Ljung-Box Test     R^2  Q(15)  9.879089  0.8272758   
##  Ljung-Box Test     R^2  Q(20)  11.83192  0.9217395   
##  LM Arch Test       R    TR^2   9.264897  0.6801537   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -5.457355 -5.382066 -5.457975 -5.427824

By examining the statistics and the QQ plot of our chosen model, we see that even though we used the model with skew-Student distribution, the standardized residuals are still not normally distributed: the skewness is still not zero (-0.358).

The experiment with Generalized Error Distribution didn’t fix this issue.

##                attilda
## nobs        505.000000
## NAs           0.000000
## Minimum      -3.506913
## Maximum       3.389879
## 1. Quartile  -0.557014
## 3. Quartile   0.581023
## Mean         -0.012745
## Median        0.040364
## Sum          -6.436045
## SE Mean       0.044357
## LCL Mean     -0.099892
## UCL Mean      0.074403
## Variance      0.993609
## Stdev         0.996799
## Skewness     -0.357820
## Kurtosis      1.024246

Evaluate selected model

We first produce one-period-ahead forecasts with the selected model, using the rolling forecast method.
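
A rolling one-period-ahead forecast can be sketched as the loop below. This Python version is a sketch under stated assumptions: it uses an expanding window (all data up to time t), and `forecast_next` is a hypothetical callback standing in for refitting the model and predicting one step ahead. The window scheme and refitting frequency used in the project itself are not shown in this report.

```python
from typing import Callable, List, Sequence


def rolling_one_step(
    series: Sequence[float],
    n_train: int,
    forecast_next: Callable[[Sequence[float]], float],
) -> List[float]:
    """For each test point t, forecast it using only observations before t."""
    predictions = []
    for t in range(n_train, len(series)):
        history = series[:t]  # expanding window: everything observed so far
        predictions.append(forecast_next(history))
    return predictions


# Illustration with a naive "last value" forecaster on made-up numbers.
demo_predictions = rolling_one_step([1, 2, 3, 4, 5, 6], 4, lambda h: h[-1])
```

The key property is that each forecast uses only information available before the period being predicted, so the out-of-sample metrics are not contaminated by look-ahead.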

Now let’s take a look at the metrics of our model on out-of-sample data.

## [1] "RMSE of selected model: ARMA(1, 1)-GARCH(2, 1) with skew student distribution  0.013672"

Now let’s do a forecast with two additional models, EGARCH and IGARCH, and compare the results.

EGARCH:

## [1] "RMSE of EGARCH model 0.014105"

IGARCH:

## [1] "RMSE of IGARCH model 0.013615"

Root-mean-square error of all three models is pretty similar.
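
For reference, the metric can be sketched in a few lines of Python, together with the relative gap between the best and worst of the three reported values. The function name is an assumption for this example.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Root-mean-square error between forecasts and realized values."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Relative gap between the worst (EGARCH) and best (IGARCH) reported RMSE.
relative_gap = (0.014105 - 0.013615) / 0.013615
```

The gap is about 3.6%, which quantifies how close the three models are on this metric.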

To further evaluate models, we use the Diebold-Mariano test to determine whether forecasts are significantly different.

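Since the p-values for both model comparisons are practically zero (< 2.2e-16 and 1.55e-13), we reject the null hypothesis of equal forecast accuracy.

This tells us that the differences in the models’ forecast performance are statistically significant, even though their RMSE values are close.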
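
The idea behind the Diebold-Mariano test can be sketched as follows. This Python version implements the textbook statistic for forecast horizon 1 with a power loss function. It is a simplified illustration and does not reproduce the small-sample correction or the reference distribution used by the R implementation that produced the output below.

```python
import math
from typing import Sequence


def diebold_mariano(e1: Sequence[float], e2: Sequence[float], power: int = 2) -> float:
    """DM statistic for horizon 1: mean loss differential over its standard error."""
    if len(e1) != len(e2) or len(e1) < 2:
        raise ValueError("need two error series of equal length >= 2")
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / n  # lag-0 autocovariance of d
    if var_d == 0:
        raise ValueError("loss differential is constant")
    return d_bar / math.sqrt(var_d / n)
```

Under the null hypothesis of equal forecast accuracy the statistic is approximately standard normal, so absolute values above about 1.96 reject the null at the 5% level. The reported statistics (-12.492 and -8.4225) are far beyond that threshold.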

## 
##  Diebold-Mariano Test
## 
## data:  structure(abs(predictions_selected - target_values), class = "forecast")structure(abs(predictions_igarch - target_values), class = "forecast")
## DM = -12.492, Forecast horizon = 1, Loss function power = 2, p-value <
## 2.2e-16
## alternative hypothesis: two.sided

## 
##  Diebold-Mariano Test
## 
## data:  structure(abs(predictions_selected - target_values), class = "forecast")structure(abs(predictions_egarch - target_values), class = "forecast")
## DM = -8.4225, Forecast horizon = 1, Loss function power = 2, p-value =
## 1.55e-13
## alternative hypothesis: two.sided

Statistics and Financial Data Analysis

A work by: Nikola Krivacevic, Aleksandar Milinkovic and Milos Milunovic

Entire forecasting project on GitHub

(https://github.com/mcf-long-short/statistics-stocks-forecasting)