Overview

In one of my earlier notes on GARCH models (see link here), I demonstrated how to simulate, estimate, and forecast volatility models. In this vignette, I would like to demonstrate how to deploy a machine learning (henceforth ML) approach for volatility forecasting.

Simulating Data

Applying a similar code from these notes I simulate a time series that exhibits a GARCH-like data over 1000 periods.

library(fGarch)
seed <- 17
sim_garch <- function(theta,n) {
  
  # set the model parameters
  con <- theta[1]
  w <- theta[2]
  a <- theta[3]
  b <- theta[4]
  
  sigma_t <- sqrt(w/(1-a-b)) # start with initial volatility 
  ht_seq <- numeric()
  R_seq <- numeric()
  
  set.seed(seed)
  
  for(i in 1:n) {
    R_t1 <- rnorm(1,con,sigma_t)
    R_seq <- c(R_seq,R_t1)
    ht1 <- w + a*(R_t1^2) + b*sigma_t^2
    ht_seq <- c(ht_seq,ht1)
    sigma_t <- sqrt(ht1)
  }
  
  list(R_t = R_seq,h_t = ht_seq)
}
# set the number simulated returns
n <- 10^3
con <- 0
w <- 0.1
a <- 0.2
b <- 0.5
par_0 <- c(con,w,a,b)
sim1 <- sim_garch(par_0,n)
h_t <- sim1$h_t
R_t <- sim1$R_t
plot(sim1$h_t, type = "l", ylab = expression(h[t]))
grid(10)

Main Objective

Suppose we would like to forecast the volatility over the last 10% periods on a dynamic basis. To do so, let’s first split the data:

N_test <- 0.1*n
index <- 1:n
index_test <- (n - N_test + 1):n
index_train <- index[!index %in% index_test]
h_train <- h_t[index_train]
h_test <- h_t[index_test]
R_train <- R_t[index_train]
R_test <- R_t[index_test]

The question remains how can we forecast the h_test process given the h_train? To answer this, I will consider two approaches. One is a model driven (GARCH) and the other is data driven (ML).

Model Based Forecast

A simple approach to forecast future volatility is to build the volatility term structure using GARCH. To do so, we need to calibrate a GARCH model, which requires an estimation procedure using actual data. The volatility term structure is constructed to evaluate the convergence of the current volatility to the long-term one. In other words, it helps us to assess whether the current volatility level is greater or less than the long-term one. Also, it answers the question how long it would take the current volatility to converge back to the long-term level.

For back-testing, it is more realistic to construct the forecast on a dynamic basis. This can be achieved by considering the realized volatility as an update in the next period volatility forecasting procedure. This can be achieved as follows

g_train <- garchFit(formula = ~ garch(1,1), data = R_train ,trace = F)
theta_hat <-  g_train@fit$coef
gamma <- theta_hat[3] + theta_hat[4]
w <- theta_hat[2]
h_for <-  h_t[length(index_train)]
h_fit <- numeric()
for(i in 1:length(index_test)){ 
  h_next <- w + gamma*h_t[length(index_train)+i-1]
  h_fit <- c(h_fit,h_next)
}
{
  plot(h_test~index_test, type = "l", ylab = expression(h[t]), col = 1)
  abline(h = w/(1-gamma),lty = 2)
  lines(h_fit~index_test,col = 3,lty = 1)
  legend("topleft",c("True","GARCH"), col = c(1,3))
  grid(10)
}

To see how the two compare, we can plot the realized versus the forecast.

{
  plot(h_test~h_fit,pch = 20,cex = 0.5)
  abline(a=0,b=1,lty  = 2)
  grid(10)
}

In terms of regression, we have

lm_garch <- lm(h_test~h_fit)
stargazer::stargazer(lm_garch,type = "html",covariate.labels = "GARCH Forecast",dep.var.labels = "Actual Volatility")
Dependent variable:
Actual Volatility
GARCH Forecast 0.970***
(0.086)
Constant -0.017
(0.028)
Observations 100
R2 0.562
Adjusted R2 0.558
Residual Std. Error 0.066 (df = 98)
F Statistic 125.816*** (df = 1; 98)
Note: p<0.1; p<0.05; p<0.01

We observe that the GARCH forecasts closely mimic the actual test data, i.e. intercept of zero, slope roughly 1, and \(R^2\) of 56%. Nonetheless, the main issue here is that the data set, by construction, follows a GARCH model. Hence, this should not come as a surprise. The more challenging and realistic task is whether the data in real-life is stationary and obeys to such data generating function.

Machine Learning Approach

A data-driven approach does not possess information about the underlying process of the data. Hence, an ML approach would learn the volatility behavior from the data without imposing a structural model. The task here is to investigate the capability of the ML to map the data into a volatility forecast and compare it with respect to its benchmark, the GARCH model.

I use a similar code from my previous notes for implementation of ML algorithms. In this case, I focus on the support vector machines (SVM) with a radial kernel. To do so, my feature space is constructed using the current process itself (squared) and the current log-realized volatility. The outcome variable, on the other hand, is given by the next period log-realized volatility. Note that we use a log transformation, followed by an exponential one, to make sure the fitted volatilities are non-negative.

Similar to the previous analysis, we split the data into in-sample and out-of-sample sets. Additionally, for a more efficient implementation, I refer to the doParallel package for parallel computing on an 8-core laptop as demonstrated below:

library(caret)
library(doParallel)
ds <- na.omit(data.frame(h = log(h_t), r = R_t^2, y = log(c(h_t[-1],NA)) ))
index <- 1:nrow(ds)
index_train2 <- index_train
index_train2 <- index_train2[-length(index_train2)] # avoid the look ahead observation
index_test <- index[!index %in% index_train2]
ds_train <- ds[index_train,]
ds_test <- ds[index_test,]
model_i <- "svmRadial"
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
cl <- makePSOCKcluster(8)
registerDoParallel(cl)
train_model <- train(y~ h + r, data = ds_train, method = model_i,
                     trControl=trctrl,
                     preProcess = c("center", "scale"),
                     tuneLength = 10)
stopCluster(cl)

Given the trained model, we forecast the next period using the test dataset and use the exponential transformation to retrieve the volatility level. As a summary, we plot altogether:

h_ml <- exp(predict(train_model,ds_test))
{
  plot(h_test~index_test,type = "l")
  lines(h_ml~index_test,col = 2)
  lines(h_fit~index_test,col = 3)
  legend("topleft",c("True","ML","GARCH"))
  grid(10)
}

From the above graph, we observe that the ML algorithm closely mimics the GARCH model in terms of forecasts. Since the data in our case is stationary, learning from historical data should be relevant for future applications. At the same time, we discern that the ML algorithm is able to uncover the auto-regressive component in the volatility process.

In terms of regression, we note the ML forecast closely corresponds to the GARCH statistics, with a slightly smaller \(R^2\). This can be inferred from the following regression:

lm_ml <- lm(h_test~h_ml)
stargazer::stargazer(list(lm_ml,lm_garch),type = "html",covariate.labels = c("ML Forecast","GARCH Forecast"),dep.var.labels = "Actual Volatility")
Dependent variable:
Actual Volatility
(1) (2)
ML Forecast 1.097***
(0.103)
GARCH Forecast 0.970***
(0.086)
Constant -0.014 -0.017
(0.029) (0.028)
Observations 100 100
R2 0.537 0.562
Adjusted R2 0.532 0.558
Residual Std. Error (df = 98) 0.068 0.066
F Statistic (df = 1; 98) 113.545*** 125.816***
Note: p<0.1; p<0.05; p<0.01

Concluding Remarks

The above experiment is conducted using simulated data from a GARCH model. For this, it is not surprising that the GARCH model provides a better approximation of future realized volatility. However, two major issues should be considered as a primary investigation. First, how does the result change if there is a structural break in the data? For instance, if the data is non-stationary, how does each model compare? Second, what is the economic value of deploying ML versus GARCH models?

For the first issue, we may add shocks to the training sample and test the sensitivity in the test set. In other words, which approach is more robust? We should get a better picture using real data. For the second issue, a portfolio selection problem is of high relevance. We can test the economic performance of global minimum portfolio constructed using either model. I leave both for future research.

---
title: "Volatility Models via Machine Learning"
#output: rmarkdown::github_document
output:
  html_notebook: default
  pdf_document: default
author: Majeed Simaan
date: Dec 28, 2019
fig_width: 50
---

# Overview
In one of my earlier notes on GARCH models (see link [here](https://rpubs.com/simaan84/garch)), I demonstrated how to simulate, estimate, and forecast volatility models. In this vignette, I would like to demonstrate how to deploy a machine learning (henceforth ML) approach for volatility forecasting. 


# Simulating Data
Applying a similar code from these [notes](https://rpubs.com/simaan84/garch) I simulate a time series that exhibits a GARCH-like data over 1000 periods. 
```{r,fig.align='center',message=FALSE}
library(fGarch)

seed <- 17

sim_garch <- function(theta,n) {
  
  # set the model parameters
  con <- theta[1]
  w <- theta[2]
  a <- theta[3]
  b <- theta[4]
  
  sigma_t <- sqrt(w/(1-a-b)) # start with initial volatility 
  ht_seq <- numeric()
  R_seq <- numeric()
  
  set.seed(seed)
  
  for(i in 1:n) {
    R_t1 <- rnorm(1,con,sigma_t)
    R_seq <- c(R_seq,R_t1)
    ht1 <- w + a*(R_t1^2) + b*sigma_t^2
    ht_seq <- c(ht_seq,ht1)
    sigma_t <- sqrt(ht1)
  }
  
  list(R_t = R_seq,h_t = ht_seq)
}

# set the number simulated returns
n <- 10^3

con <- 0
w <- 0.1
a <- 0.2
b <- 0.5

par_0 <- c(con,w,a,b)

sim1 <- sim_garch(par_0,n)
h_t <- sim1$h_t
R_t <- sim1$R_t

plot(sim1$h_t, type = "l", ylab = expression(h[t]))
grid(10)
```


# Main Objective
Suppose we would like to forecast the volatility over the last 10% periods on a dynamic basis. To do so, let's first split the data:
```{r}
N_test <- 0.1*n
index <- 1:n
index_test <- (n - N_test + 1):n
index_train <- index[!index %in% index_test]
h_train <- h_t[index_train]
h_test <- h_t[index_test]
R_train <- R_t[index_train]
R_test <- R_t[index_test]
```
The question remains how can we forecast the `h_test` process given the `h_train`? To answer this, I will consider two approaches. One is a model driven (GARCH) and the other is data driven (ML).

## Model Based Forecast
A simple approach to forecast future volatility is to build the volatility term structure using GARCH. To do so, we need to calibrate a GARCH model, which requires an estimation procedure using actual data. The volatility term structure is constructed to evaluate the convergence of the current volatility to the long-term one. In other words, it helps us to assess whether the current volatility level is greater or less than the long-term one. Also, it answers the question how long it would take the current volatility to converge back to the long-term level. 

For back-testing, it is more realistic to construct the forecast on a dynamic basis. This can be achieved by considering the realized volatility as an update in the next period volatility  forecasting procedure. This can be achieved as follows

```{r,fig.align="center"}
g_train <- garchFit(formula = ~ garch(1,1), data = R_train ,trace = F)
theta_hat <-  g_train@fit$coef
gamma <- theta_hat[3] + theta_hat[4]
w <- theta_hat[2]
h_for <-  h_t[length(index_train)]
h_fit <- numeric()
for(i in 1:length(index_test)){ 
  h_next <- w + gamma*h_t[length(index_train)+i-1]
  h_fit <- c(h_fit,h_next)
}

{
  plot(h_test~index_test, type = "l", ylab = expression(h[t]), col = 1)
  abline(h = w/(1-gamma),lty = 2)
  lines(h_fit~index_test,col = 3,lty = 1)
  legend("topleft",c("True","GARCH"), col = c(1,3))
  grid(10)
}
```

To see how the two compare, we can plot the realized versus the forecast.
```{r,fig.align="center"}
{
  plot(h_test~h_fit,pch = 20,cex = 0.5)
  abline(a=0,b=1,lty  = 2)
  grid(10)
}
```

In terms of regression, we have
```{r results = "asis",message=FALSE,warning=FALSE} 
lm_garch <- lm(h_test~h_fit)
stargazer::stargazer(lm_garch,type = "html",covariate.labels = "GARCH Forecast",dep.var.labels = "Actual Volatility")
```

We observe that the GARCH forecasts closely mimic the actual test data, i.e. intercept of zero, slope roughly 1, and $R^2$ of 56\%. Nonetheless, the main issue here is that the data set, by construction, follows a GARCH model. Hence, this should not come as a surprise. The more challenging and realistic task is whether the data in real-life is stationary and obeys to such data generating function. 


## Machine Learning Approach
A data-driven approach does not possess information about the underlying process of the data. Hence, an ML approach would learn the volatility behavior from the data without imposing a structural model. The task here is to investigate the capability of the ML to map the data into a volatility forecast and compare it with respect to its benchmark, the  GARCH model.  

I use a similar code from my previous [notes](https://rpubs.com/simaan84/ML_horse) for implementation of ML algorithms. In this case, I focus on the support vector machines (SVM) with a radial kernel. To do so, my feature space is constructed using the current process itself (squared) and the current log-realized volatility. The outcome variable, on the other hand, is given by the next period log-realized volatility. Note that we use a log transformation, followed by an exponential one, to make sure the fitted volatilities are non-negative.


Similar to the previous analysis, we split the data into in-sample and out-of-sample sets. Additionally, for a more efficient implementation, I refer to the `doParallel` package for parallel computing on an 8-core laptop as demonstrated below:
```{r,message=FALSE,warning=FALSE,fig.align="center"}
library(caret)
library(doParallel)

ds <- na.omit(data.frame(h = log(h_t), r = R_t^2, y = log(c(h_t[-1],NA)) ))
index <- 1:nrow(ds)
index_train2 <- index_train
index_train2 <- index_train2[-length(index_train2)] # avoid the look ahead observation
index_test <- index[!index %in% index_train2]

ds_train <- ds[index_train,]
ds_test <- ds[index_test,]

model_i <- "svmRadial"
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

cl <- makePSOCKcluster(8)
registerDoParallel(cl)

train_model <- train(y~ h + r, data = ds_train, method = model_i,
                     trControl=trctrl,
                     preProcess = c("center", "scale"),
                     tuneLength = 10)

stopCluster(cl)
```

Given the trained model, we forecast the next period using the test dataset and use the exponential transformation to retrieve the volatility level. As a summary, we plot altogether:
```{r,fig.align="center"}
h_ml <- exp(predict(train_model,ds_test))
{
  plot(h_test~index_test,type = "l")
  lines(h_ml~index_test,col = 2)
  lines(h_fit~index_test,col = 3)
  legend("topleft",c("True","ML","GARCH"))
  grid(10)
}
```
From the above graph, we observe that the ML algorithm closely mimics the GARCH model in terms of forecasts. Since the data in our case is stationary, learning from historical data should be relevant for future applications. At the same time, we discern that the ML algorithm is able to uncover the auto-regressive component in the volatility process. 

In terms of regression,  we note the ML forecast closely corresponds to the GARCH statistics, with a slightly smaller $R^2$. This can be inferred from the following regression:
```{r results = "asis",message=FALSE,warning=FALSE} 
lm_ml <- lm(h_test~h_ml)
stargazer::stargazer(list(lm_ml,lm_garch),type = "html",covariate.labels = c("ML Forecast","GARCH Forecast"),dep.var.labels = "Actual Volatility")
```


# Concluding Remarks
The above experiment is conducted using simulated data from a GARCH model. For this, it is not surprising that the GARCH model provides a better approximation of future realized volatility. However, two major issues should be considered as a primary investigation. First, how does the result change if there is a structural break in the data? For instance, if the data is non-stationary, how does each model compare?  Second, what is the economic value of deploying ML versus GARCH models? 

For the first issue, we may add shocks to the training sample and test the sensitivity in the test set. In other words, which approach is more robust? We should get a better picture using real data. For the second issue, a portfolio selection problem is of high relevance. We can test the economic performance of global minimum portfolio constructed using either model. I leave both for future research.



