December 8, 2017
Key Points
- Financial risk management is a key tool used by government regulators and companies to manage the risk of their operations.
- Value-at-Risk (VaR) is a popular measure of downside risk, i.e., the share of its value that a particular asset may lose with a given (small) probability. It captures "rainy day losses" as opposed to "expected losses".
- Value-at-Risk is a measure based on the distribution of losses, so its estimation has always been coupled with statistical inference. We instead treat it as a feature of the data and extract it with a Random Forest.
- We find a 5% improvement in accuracy compared with the traditional measure.
Background
- VaR was first introduced by Jorion (2006). A range of literature uses EVT and multivariate statistics to study it.
- To estimate VaR, we use the bagging and Random Forest models proposed by Breiman (1996, 2001). The quantile modification is due to Meinshausen (2006).
- To my knowledge, there is no research on VaR using Random Forests. Uryasev et al. (2014) use SVMs to estimate VaR.
Introduction to VaR

- Statistically, VaR is a left quantile of the distribution of losses (formalized below).
- The challenge of VaR estimation is that this distribution is typically unknown.
- The most common approach is to assume a distributional family, estimate its parameters, and compute the quantile.
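For concreteness, here is a minimal formalization, assuming VaR is stated on returns so that it sits in the left tail (consistent with the "1% lowest loss" convention used later). With \(F\) the distribution function of returns and level \(\alpha = 1\%\),
\[
\mathrm{VaR}_{\alpha} = F^{-1}(\alpha), \qquad \alpha = 0.01.
\]
Under the common parametric route with a normal family, the quantile has a closed form,
\[
\mathrm{VaR}_{\alpha} = \mu + \sigma\,\Phi^{-1}(\alpha),
\]
where \(\mu\) and \(\sigma\) are the estimated mean and standard deviation and \(\Phi^{-1}\) is the standard normal quantile function.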
Why the Common Approach Does Not Work
- The initial assumption of a distributional family is often wrong; beware of heavy tails (see the illustration after this list)!
- Estimating the whole distribution when only a quantile is needed is a waste of resources.
- Most financial data are time series, with their own baggage of problems.
- Financial data are also highly correlated, making univariate estimation a difficult task.
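A quick illustration of the heavy-tail point, as a sketch rather than part of the analysis: the Student-t degrees of freedom and sample size below are arbitrary choices, and the simulated series stands in for real returns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# heavy-tailed stand-in for returns: Student-t with 3 degrees of freedom
returns = rng.standard_t(df=3, size=100_000)

# parametric route: fit a normal family and take its 1% quantile
mu, sigma = returns.mean(), returns.std()
var_normal = stats.norm.ppf(0.01, loc=mu, scale=sigma)

# nonparametric 1% quantile of the same sample
var_empirical = np.quantile(returns, 0.01)

# the normal fit understates the tail loss relative to the empirical quantile
print(f"normal-fit 1% VaR: {var_normal:.2f}")
print(f"empirical 1% VaR:  {var_empirical:.2f}")
```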
Random Forest
- The Random Forest approach hinges on two ingredients: bagging and random feature selection.
- Bagging builds a set of predictors by iteratively fitting a model on samples of the data drawn with replacement.
- Bagging improves stability and accuracy by averaging across the predictors.
- Random feature selection de-correlates the individual trees, which improves the quality of the prediction.
- Normally, we average the predictors in a Random Forest (or take their mode). In our case, however, we need a quantile, so we take the 1% lowest prediction as our estimate of VaR (sketched below).
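A minimal sketch of this idea in Python, assuming scikit-learn; `X_train`, `y_train`, and `X_new` are placeholder arrays, and taking the 1% quantile across per-tree predictions is a simplified stand-in for Meinshausen's quantile regression forests.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# ensemble of fully grown CART trees, each fit on a bootstrap sample
# with a random subset of features considered at every split
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

# instead of averaging the trees' outputs, keep every tree's prediction...
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])

# ...and take the 1% lowest prediction as the VaR estimate
var_hat = np.quantile(per_tree, 0.01, axis=0)
```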
Out-of-sample Testing
- The challenge of testing VaR out-of-sample is that we only observe the realized loss, not the potential 1% worst loss. Thus, we cannot test predictions directly against the data.
- We propose a bootstrapping mechanism in which we repeatedly sample subsets of observations from the test set.
- The realized VaR can be calculated from each sample and compared with the average VaR predicted across observations (sketched below).
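A sketch of this check, assuming `returns_test` holds realized test-set returns and `var_pred` holds the model's VaR predictions (both placeholders); the number of bootstrap draws is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_boot = 1_000

realized = np.empty(n_boot)
for b in range(n_boot):
    # sample test observations with replacement
    sample = rng.choice(returns_test, size=len(returns_test), replace=True)
    # realized VaR: the empirical 1% quantile of the resampled returns
    realized[b] = np.quantile(sample, 0.01)

# compare the bootstrapped realized VaR with the average predicted VaR
print(f"mean realized VaR:  {realized.mean():.4f}")
print(f"mean predicted VaR: {var_pred.mean():.4f}")
```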
Procedure
- Compute the log returns of the equities to dampen volatility and remove the trend component.
- Remove seasonality by decomposing the time series and keeping its random component (these first two steps are sketched after this list).
- Partition the dataset into training and test sets.
- Train the model by repeatedly fitting a fully grown CART tree on bootstrapped samples with a randomly selected subset of features.
- To estimate the out-of-sample error, use the test set to draw samples of size \(N\) with replacement and estimate the mean of the prediction.
- Compare the mean of the quantile predictions with the empirical 1% order statistic.
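A sketch of the first two steps, assuming a pandas price series `prices` (a placeholder) and statsmodels for the decomposition; the annual period of 252 trading days is an assumption.

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# log returns: differencing log prices removes the trend and dampens volatility
log_returns = np.log(prices).diff().dropna()

# decompose the series and keep only the random (residual) component,
# discarding the estimated seasonal part
decomposition = seasonal_decompose(log_returns, model="additive", period=252)
deseasonalized = decomposition.resid.dropna()
```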
Data Description
- We use data on the common stocks of the top five companies by market capitalization to estimate VaR: Microsoft, Johnson & Johnson, Facebook, Amazon, and Berkshire Hathaway.
- The feature set includes:
- Traditional market indicators and exchange trading volumes.
- Firm-specific characteristics of the companies.
- Commodity market prices.
- Interest rates.
- All data is publicly available from financial databases.
Time Series Plot
[figure]
Pairs Graph
[figure]
Box Plot
[figure]
SE Comparison
[figure]
Thank you