Forecasting Volatility for SPDR ETF

Collins Hackman

2022-04-20

percentage log returns of the spdr etf

rwx <- getSymbols('RWX', frequency = 'daily', auto.assign = FALSE)
#percentage log returns of the adjusted closing price
rwx$log.ret <- 100 * diff(log(Ad(rwx)))
head(rwx)

           RWX.Open RWX.High RWX.Low RWX.Close RWX.Volume RWX.Adjusted
2007-01-03    64.00    64.00   63.14     63.20     264900     29.41992
2007-01-04    63.00    63.00   62.61     62.73     235100     29.20113
2007-01-05    62.25    62.25   61.70     61.95     165300     28.83804
2007-01-08    61.45    61.75   61.39     61.74     149500     28.74028
2007-01-09    62.06    62.14   61.46     61.84     148900     28.78683
2007-01-10    61.30    61.70   61.10     61.49     167900     28.62390
              log.ret
2007-01-03         NA
2007-01-04 -0.7464588
2007-01-05 -1.2512023
2007-01-08 -0.3395830
2007-01-09  0.1618541
2007-01-10 -0.5676096

Statistics summary

#stats for the etf
basicStats(rwx$log.ret)

                log.ret
nobs        3858.000000
NAs            1.000000
Minimum      -10.919917
Maximum       10.178258
1. Quartile   -0.543331
3. Quartile    0.645946
Mean           0.002421
Median         0.030251
Sum            9.339514
SE Mean        0.022087
LCL Mean      -0.040882
UCL Mean       0.045725
Variance       1.881575
Stdev          1.371705
Skewness      -0.904794
Kurtosis      10.140237

ANALYSIS:

The distribution is not normal as it is skewed to the left and leptokurtic as the excess kurtosis is greater than 3 thus showing fat tails (high probability of getting extreme returns). Looking at the mean and quartiles, average return is approximately 0, 25% of the returns fall under -0.5 with 75% of the returns being below 0.65 given a median return of 0.03.

#Extracting extreme negative returns

ret.losses <- -rwx$log.ret
ret.extreme <- ret.losses[ret.losses > 0.025]
head(ret.extreme)

             log.ret
2007-01-04 0.7464588
2007-01-05 1.2512023
2007-01-08 0.3395830
2007-01-10 0.5676096
2007-01-18 0.1116696
2007-01-25 0.5915135

length(ret.extreme)

[1] 1796

plot(ret.extreme, type = 'h')

ANALYSIS: - Since 2007-01-04 to 2022-04-26, there has been 1796 days with extreme negative returns (losses) whereby the returns fell beyond 2.5%

Plotting absolute returns

#absolute returns plot
s1<-ggplot() + geom_line(aes(time(rwx), abs(rwx$log.ret)), color="lightblue") + labs(x=NULL, y=NULL) + theme_dark() 
ggplotly(s1)

ANALYSIS:

There is clustering in the volatility of the absolute returns example; periods of low volatility were followed by low periods from 2019 to 2020 and periods of high volatility are followed by periods of high volatility from 2021 to date indicating high persistence. This gives evidence to serial dependency meaning the returns are not entirely identically independently distributed.

ACF plots for both percentage logarithmic and absolute returns

# ACF for both log and absolute returns 
p3<-corrgram(rwx$log.ret, lag.max = 50, type = "correlation", mode = "simple", ci = 0.95, style = "plotly", knit = T)
p4<-corrgram(abs(rwx$log.ret), lag.max = 50, type = "correlation", mode = "simple", ci = 0.95, style = "plotly", knit = T)
p3

p4

#OR, using forecast library:
#p1<-ggAcf(rwx$log.ret, lag=50)+theme_classic()
#p2<-ggAcf(abs(rwx$log.ret), lag=50)+theme_classic()
#grid.arrange(p1,p2, ncol=2)

ANALYSIS:

The acf for some lag periods are significant as they cross the 95% confidence interval threshold (blue dash lines) example lag 6 and lag 34 acf’s are significant. This indicates there’s some serial dependency. Moreover, the acf for the absolute returns show high persistence in the lag periods indicating slower mean reversion as there’s clustering. This is also shows serial dependency. This allows me to discard the random walk assumption.

ADF testing of Log price and Log returns

#my analysis: Ljung Box-Test
Box.test(abs(rwx$log.ret), lag=10, type='Ljung')


    Box-Ljung test

data:  abs(rwx$log.ret)
X-squared = 4290.1, df = 10, p-value < 2.2e-16

Box.test(rwx$log.ret, lag=10, type='Ljung')


    Box-Ljung test

data:  rwx$log.ret
X-squared = 43.378, df = 10, p-value = 4.256e-06

ANALYSIS: - The absolute return have a high serial dependency given the high and significant x-squared coefficient

The log returns also supports serial dependency assumption.

#log of the adjusted closing price
log.price<-log(Ad(rwx))

#adf testing of the log price
adftest.logprice<- ur.df(log.price, type='none', selectlags='AIC')
adftest.logprice


############################################################### 
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
############################################################### 

The value of the test statistic is: -0.0079

# adf testing of the log returns 
adftest.logret<- ur.df(rwx$log.ret[-1,], type='none', selectlags = 'AIC')
adftest.logret


############################################################### 
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
############################################################### 

The value of the test statistic is: -40.7661

ANALYSIS:

The log price is not stationary because it’s t-statistic of 0.03 is greater than -3.43 thus, I fail to reject the null hypothesis whereas the percentage log etf returns is stationary as the t-statistic of -40.7 is less than -3.43, thus, I reject the null hypothesis.

Exponential Moving Average of the ETF’s returns

# EMA of the etf returns with lambda as 0.06
ema06 <- EMA(rwx$log.ret^2, ratio=0.06) 
# EMA and absoulte returns plot
a1<-ggplot(rwx) + geom_point(aes(time(rwx$log.ret), abs(rwx$log.ret)), color="gray45", size=0.4) +
  geom_line(aes(time(rwx$log.ret), ema06^0.5), color="seagreen4", size = 0.8) +
  theme_bw(base_size = 22) + labs(x=NULL, y=NULL)
ggplotly(a1)

ANALYSIS:

The EMA being smoother as its ad-hoc lambda parameter of 0.06 gives slowly decaying weights to the lag returns thus in year 2008 where absolute returns was 9%, the EMA variance estimate is 4.92% (0.06*0.82) meaning a volatility of 2.22 (sqaured root of the variance estimate)

GARCH(1,1) Modelling

require(rugarch)
#specifying the model
model.spec=ugarchspec(variance.model=list(model='sGARCH',garchOrder=c(1,1)),
                      mean.model=list(armaOrder=c(1,0)))
#estimating
model.fit=ugarchfit(spec=model.spec, data=na.omit(rwx$log.ret))
round(coef(model.fit), 2)

    mu    ar1  omega alpha1  beta1 
  0.03  -0.02   0.01   0.13   0.87

ANALYSIS:

The GARCH’S alpha1 parameter is similar to the EMA’S lambda whereas GARCh beta1 is also similar to the (1-lambda) parameter in EMA(exponential moving average).
The alpha+beta determine how fast the conditional variance reverts to the unconditional variance(long-run average variance): When (aplha+beta)^h = 1, this indicates a random walk behavior meaning the conditional variance is very persistent and thus will slowly revert to the long-run average however, if (alpha+beta)^h<1, this indicates stationarity meaning reverting quicker to the mean variance in the long run. For this fit, (alpha+beta)^h is equal to 1

Plotting the conditional {short run} standard deviation

#plotting the conditional standard deviation (sigma)
a2<-autoplot(merge(GARCH=sigma(model.fit), ema06^0.5, abs(rwx$log.ret)))+theme_bw()
ggplotly(a2)

ANALYSIS:

The GARCH estimates are slightly larger than the EMA variance estimates example, in 2008 GARCH estimate is 6.3 and the EMA estimate is 5.2 same as in 2020 where the GARCH is 6 and the EMA variance estimate is 5.

GJR-GARCH(1,1) Modelling

#estimating GJR-GARCH(1,1)
spec=ugarchspec(variance.model=list(model='gjrGARCH', garchOrder=c(1,1)),
                mean.model=list(armaOrder=c(1,0)))
fitgjr=ugarchfit(spec=spec, data=na.omit(rwx$log.ret))
knitr::kable(data.frame(Estimate=coef(fitgjr), Tstat=fitgjr@fit$tval))

	Estimate	Tstat
mu	0.0111698	0.8341305
ar1	-0.0154900	-0.8945724
omega	0.0141225	4.7964436
alpha1	0.0549054	4.7914216
beta1	0.8885069	83.2886408
gamma1	0.1013141	6.7653112

ANALYSIS:

The GJR-GARCH assumes the position that the volatility of the negative returns (gamma) have a different effect on the standard deviation estimate (future volatility) relative to the returns estimate in general (alpha). Looking at gamma, its t-stat is greater than 1.96 at 5% significance error same as the alpha estimate which is also significant. Therefore, there is evidence for asymmetry given that the negative returns have an alpha + gamma effect on the drops in the returns.

#Plotting Volatility forecasts for both GARCH and GJR-GARCH models

#plotting volatility forecast for the two GARCH models
nforecast=250 #250 days
garchforecast<-ugarchforecast(model.fit, n.ahead=nforecast)
gjrforecast <-ugarchforecast(fitgjr, n.ahead=nforecast)
daysfct<-data.frame(Date = end(rwx$log.ret)+1:nforecast,
                    Gjr = as.numeric(sigma(gjrforecast)),
                    Garch = as.numeric(sigma(garchforecast)))

a3<-ggplot(daysfct)+geom_line(aes(Date, Garch), color='cyan1')+
  geom_line(aes(Date, Gjr), color='chartreuse2')+
  geom_hline(yintercept=sd(rwx$log.ret), color='red', linetype='dashed')+
  labs(x=NULL, y=NULL, title=paste('Forecaste date: ', end(rwx$log.ret)))+
  theme_bw(base_size=20)
ggplotly(a3)

ANALYSIS:

Tracing back to question 6 discussion, my etf’s conditional variance exhibits a random walk behavior(alpha+beta=1), thus it is not surprising that the forecast seems to trend upwards and doesn’t revert quickly to the long run average variance since alpha+beta was equal to 1.
The GJR-GARCH reverts quicker relative to the GARCH.
Also, the GJR and GARCH seem to be parallel as the t+k grows.