December 8, 2017
Key Points
- Financial risk management is a key tool used by government regulators and companies to manage the risk of their operations.
- Value-at-Risk (VaR) is a popular measure of downside risk, i.e., the share of its value that a particular asset may lose with a given (small) probability. It captures "rainy day losses" as opposed to "expected losses".
- Value-at-Risk is a measure based on the distribution of losses, so its estimation has always been coupled with statistical inference. We instead treat it as a feature of the data and extract it with a Random Forest.
- We find a 5% improvement in accuracy compared with the traditional measure.
Background
- VaR was first introduced by Jorion (2006). A range of literature uses EVT and multivariate statistics to study it.
- To estimate VaR, we use the bagging and Random Forest models proposed by Breiman (1996, 2001). The quantile modification is due to Meinshausen (2006).
- To my knowledge, there is no research on VaR using Random Forests. Uryasev et al. (2014) use SVMs to estimate VaR.
Introduction to VaR

- Statistically, VaR is a left quantile of the distribution of losses (formalized below).
- The challenge of VaR estimation is that this distribution is typically unknown.
- The most common approach is to assume a distributional family, estimate its parameters, and compute the quantile.
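For concreteness, here is a minimal formalization, assuming VaR is stated on returns so that it sits in the left tail (consistent with the "1% lowest loss" convention used later). With \(F\) the distribution function of returns and level \(\alpha = 1\%\),
\[
\mathrm{VaR}_{\alpha} = F^{-1}(\alpha), \qquad \alpha = 0.01.
\]
Under the common parametric route with a normal family, the quantile has a closed form,
\[
\mathrm{VaR}_{\alpha} = \mu + \sigma\,\Phi^{-1}(\alpha),
\]
where \(\mu\) and \(\sigma\) are the estimated mean and standard deviation and \(\Phi^{-1}\) is the standard normal quantile function.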
Why the Common Approach Does Not Work
- The initial assumption of a distributional family is often wrong; beware of heavy tails (see the illustration after this list)!
- Estimating the whole distribution when only a quantile is needed is a waste of resources.
- Most financial data are time series, with their own baggage of problems.
- Financial data are also highly correlated, making univariate estimation a difficult task.
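A quick illustration of the heavy-tail point, as a sketch rather than part of the analysis: the Student-t degrees of freedom and sample size below are arbitrary choices, and the simulated series stands in for real returns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# heavy-tailed stand-in for returns: Student-t with 3 degrees of freedom
returns = rng.standard_t(df=3, size=100_000)

# parametric route: fit a normal family and take its 1% quantile
mu, sigma = returns.mean(), returns.std()
var_normal = stats.norm.ppf(0.01, loc=mu, scale=sigma)

# nonparametric 1% quantile of the same sample
var_empirical = np.quantile(returns, 0.01)

# the normal fit understates the tail loss relative to the empirical quantile
print(f"normal-fit 1% VaR: {var_normal:.2f}")
print(f"empirical 1% VaR:  {var_empirical:.2f}")
```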
Random Forest
- The Random Forest approach hinges on two ingredients: bagging and random feature selection.
- Bagging builds a set of predictors by iteratively fitting a model on samples of the data drawn with replacement.
- Bagging improves stability and accuracy by averaging across the predictors.
- Random feature selection de-correlates the individual trees, which improves the quality of the prediction.
- Normally, we average the predictors in a Random Forest (or take their mode). In our case, however, we need a quantile, so we take the 1% lowest prediction as our estimate of VaR (sketched below).
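A minimal sketch of this idea in Python, assuming scikit-learn; `X_train`, `y_train`, and `X_new` are placeholder arrays, and taking the 1% quantile across per-tree predictions is a simplified stand-in for Meinshausen's quantile regression forests.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# ensemble of fully grown CART trees, each fit on a bootstrap sample
# with a random subset of features considered at every split
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

# instead of averaging the trees' outputs, keep every tree's prediction...
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])

# ...and take the 1% lowest prediction as the VaR estimate
var_hat = np.quantile(per_tree, 0.01, axis=0)
```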
Out-of-sample Testing
- The challenge of testing VaR out-of-sample is that we only observe the realized loss, not the potential 1% worst loss. Thus, we cannot test predictions directly against the data.
- We propose a bootstrapping mechanism in which we repeatedly sample subsets of observations from the test set.
- The realized VaR can be calculated from each sample and compared with the average VaR predicted across observations (sketched below).
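A sketch of this check, assuming `returns_test` holds realized test-set returns and `var_pred` holds the model's VaR predictions (both placeholders); the number of bootstrap draws is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_boot = 1_000

realized = np.empty(n_boot)
for b in range(n_boot):
    # sample test observations with replacement
    sample = rng.choice(returns_test, size=len(returns_test), replace=True)
    # realized VaR: the empirical 1% quantile of the resampled returns
    realized[b] = np.quantile(sample, 0.01)

# compare the bootstrapped realized VaR with the average predicted VaR
print(f"mean realized VaR:  {realized.mean():.4f}")
print(f"mean predicted VaR: {var_pred.mean():.4f}")
```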
Procedure
- Compute the log returns of the equities to dampen volatility and remove the trend component.
- Remove seasonality by decomposing the time series and keeping its random component (these first two steps are sketched after this list).
- Partition the dataset into training and test sets.
- Train the model by repeatedly fitting a fully grown CART tree on bootstrapped samples with a randomly selected subset of features.
- To estimate the out-of-sample error, use the test set to draw samples of size \(N\) with replacement and estimate the mean of the prediction.
- Compare the mean of the quantile predictions with the empirical 1% order statistic.
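A sketch of the first two steps, assuming a pandas price series `prices` (a placeholder) and statsmodels for the decomposition; the annual period of 252 trading days is an assumption.

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# log returns: differencing log prices removes the trend and dampens volatility
log_returns = np.log(prices).diff().dropna()

# decompose the series and keep only the random (residual) component,
# discarding the estimated seasonal part
decomposition = seasonal_decompose(log_returns, model="additive", period=252)
deseasonalized = decomposition.resid.dropna()
```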
Data Description
- We use data on the common stocks of the top five companies by market capitalization to estimate VaR: Microsoft, Johnson & Johnson, Facebook, Amazon, and Berkshire Hathaway.
- The feature set includes:
- Traditional market indicators and exchange trading volumes.
- Firm-specific characteristics of the companies.
- Commodity market prices.
- Interest rates.
- All data is publicly available from financial databases.
Time Series Plot
[figure]
Pairs Graph
[figure]
Box Plot
[figure]
SE Comparison
[figure]
Thank you