Libraries Used

library(Quandl) #to get VIX and Oil Price data
library(randomForest) # to predict net sales 
library(plotly) #better visualization
library(rpart) # Recursive Partitioning and Regression Trees (RPART)
library(rattle) #fancy R plot
library(caret) #Classification and Regression Training (CARET)
library(caretEnsemble) #ensembles of caret models

Business Objective

Show how the net sales of the Hong Kong Equity Fund and the Global Bond Fund change when the Oil Price and VIX increase or decrease by 5%.

Data Exploration

Target Variables:

Historical data for the target variables were obtained from the Hong Kong Investment Funds Association (HKIFA).

Hong Kong Equity Fund (HKEF)
Global Bond Fund (GBF)

Features:

Feature data were conveniently obtained in R with the Quandl package (the Quandl() function).

Oil Price
Volatility Index (VIX)

Time Frame & Frequency

Past 5 years
- Older data are excluded because they do not reflect present conditions
Monthly
- Daily data are not available for the target variables, and annual data have too few data points

Change Instead of Absolute

The dataset has been converted to month-over-month change, (current - previous) / previous, because the business objective concerns the change rather than the absolute value. The changes are stored as fractions, so 0.05 means +5%.

head(RawData)

##         Date   HKEF     GBF   VIX Oil_Price
## 1 2010-12-31  60.56  -39.39 17.15     93.23
## 2 2011-01-31 128.34 -112.56 19.53     98.97
## 3 2011-02-28 100.69  -47.83 18.35    112.27
## 4 2011-03-31  66.58   12.43 17.74    116.94
## 5 2011-04-29 -33.78   17.25 14.75    126.59
## 6 2011-05-31  61.66   72.28 15.45    117.18

head(Data)

##         Date        HKEF        GBF         VIX   Oil_Price
## 1 2011-01-31  1.11922061  1.8575781  0.13877551  0.06156817
## 2 2011-02-28 -0.21544335 -0.5750711 -0.06041987  0.13438416
## 3 2011-03-31 -0.33876254 -1.2598787 -0.03324251  0.04159615
## 4 2011-04-29 -1.50735957  0.3877715 -0.16854566  0.08252095
## 5 2011-05-31 -2.82534044  3.1901449  0.04745763 -0.07433447
## 6 2011-06-30 -0.02854363  1.9673492  0.06925566 -0.04668032
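The step from RawData to Data can be reproduced with a small helper. This is a minimal sketch, assuming the change is computed as (current - previous) / previous, which agrees with the printed rows; the helper name pct_change is made up for illustration, and the input is just the first three rows of RawData shown above.

```r
# Hypothetical helper: fractional change from one row to the next,
# (current - previous) / previous. The first row has no predecessor,
# so the result is one element shorter than the input.
pct_change <- function(x) diff(x) / head(x, -1)

# First three rows of RawData, copied from the printout above
raw <- data.frame(
  Date      = as.Date(c("2010-12-31", "2011-01-31", "2011-02-28")),
  HKEF      = c(60.56, 128.34, 100.69),
  GBF       = c(-39.39, -112.56, -47.83),
  VIX       = c(17.15, 19.53, 18.35),
  Oil_Price = c(93.23, 98.97, 112.27)
)

# Apply the helper to every numeric column and keep the later date of each pair
changes <- data.frame(
  Date = raw$Date[-1],
  lapply(raw[, c("HKEF", "GBF", "VIX", "Oil_Price")], pct_change)
)
print(changes)
```

Because net sales can be negative, the sign of the change depends on the sign of the previous value: GBF went from -39.39 to -112.56, yet the computed change is about +1.86.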

Visualization

Preliminary visualization helps us understand the data better before coming up with an estimation model.

HKEF

Net sales of the HKEF are plotted against the Oil Price and VIX to get an idea of their movements.

GBF

Net sales of the GBF are plotted against the Oil Price and VIX to get an idea of their movements.

Insights

From visualizing our data, we have an idea that:

The variables are not linearly related
- So we should not use linear models to estimate our data
- This rules out the Generalized Linear Model (GLM), etc., which assume a form such as

\(y = \beta_0 + \beta_1 X_1 + \beta_2 X_2\)

Movement of target variables
- HKEF and GBF move in the opposite direction to the features
- This gives us a sanity check and an intuition for our estimated answer

Predicting the Change

Our preliminary visualization shows that the data are not linear. Hence, it is best to avoid linear regression models, whose linearity assumption does not hold here.

Tree Based Regression

Tree-based models do not assume linearity in the data. A tree-based model maps observations about an item (branches) to conclusions about the item’s target value (leaves). Think of it as the mother lode of nested if-else statements.
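To make the if-else picture concrete, here is a hand-written tree of depth 2. It is only an illustration: the function name toy_tree, the thresholds and the leaf values are all invented, not fitted to the fund data.

```r
# Hypothetical depth-2 regression tree written out as nested if-else.
# Each `if` is a branch (a test on one feature); each returned number is a leaf.
# Thresholds and leaf values are made up for illustration only.
toy_tree <- function(vix_change, oil_change) {
  if (vix_change < 0) {
    if (oil_change < 0) 0.5 else 0.2
  } else {
    if (oil_change < 0) -0.1 else -0.4
  }
}

# Each observation follows exactly one path from the root to a leaf
both_up   <- toy_tree(0.05, 0.05)
both_down <- toy_tree(-0.05, -0.05)
print(c(both_up = both_up, both_down = both_down))
```

A fitted tree has the same shape; the difference is that the split points and leaf values are chosen from the training data rather than by hand.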

Example of estimating employment with tree depth of 3:

How Does it Compare ?

The diagrams compare a linear regression fit with a tree-based regression fit (Decision Tree vs Linear Regression).
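The same comparison can be tried on a small synthetic example. The sketch below assumes a noise-free step relationship (not the fund data) and uses rpart, which is already among the libraries loaded; the object names are illustrative.

```r
library(rpart)

# Synthetic step relationship: y jumps from -1 to 1 when x crosses 0
toy <- data.frame(x = seq(-1, 1, length.out = 200))
toy$y <- ifelse(toy$x < 0, -1, 1)

# One straight line versus a shallow regression tree
linear_fit <- lm(y ~ x, data = toy)
tree_fit   <- rpart(y ~ x, data = toy, control = rpart.control(maxdepth = 3))

# Mean absolute error of each fit on the same data
mae <- function(actual, predicted) mean(abs(actual - predicted))
mae_linear <- mae(toy$y, predict(linear_fit, toy))
mae_tree   <- mae(toy$y, predict(tree_fit, toy))
print(c(linear = mae_linear, tree = mae_tree))
```

A single split near x = 0 is enough for the tree to follow the jump, while a straight line has to cut through both levels.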

Random Forest

Grows multiple trees, each as deep as possible
Adds randomness so that the trees are not all the same
Helps prevent overfitting by ensembling multiple trees
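The ensembling idea in the list above can be sketched by hand. The code below is bagging only: each tree is grown deep on a bootstrap sample and the predictions are averaged. A real random forest additionally samples a random subset of features at each split, which is omitted here. The data are synthetic and all names are illustrative.

```r
library(rpart)

set.seed(1)
n <- 200
# Synthetic data (not the fund data): a step in x1, a curve in x2, plus noise
toy <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
toy$y <- ifelse(toy$x1 < 0, 1, -1) + 0.5 * toy$x2^2 + rnorm(n, sd = 0.2)

# Grow each tree deep (cp = 0) on its own bootstrap sample of the rows
n_trees <- 50
trees <- lapply(seq_len(n_trees), function(i) {
  boot_rows <- sample(n, replace = TRUE)
  rpart(y ~ x1 + x2, data = toy[boot_rows, ],
        control = rpart.control(cp = 0, minsplit = 5))
})

# Predict with every tree, then average across trees
new_points <- data.frame(x1 = c(-0.5, 0.5), x2 = c(0, 0))
per_tree <- sapply(trees, predict, newdata = new_points)  # one column per tree
bagged <- unname(rowMeans(per_tree))
print(bagged)
```

Individual deep trees chase the noise in their own sample; averaging many of them smooths that out, which is the overfitting protection the list refers to.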

HKEF & GBF

Code preview and predicted change:

# Scenarios to predict: VIX and Oil Price both up 5%, then both down 5%.
# The columns are named after the model's features so that predict() finds them
# in newdata instead of relying on same-named objects in the global environment.
scenarios <- data.frame(VIX = c(0.05, -0.05), Oil_Price = c(0.05, -0.05))
change <- data.frame("Change in VIX" = scenarios$VIX, "Change in Oil_Price" = scenarios$Oil_Price)

fit <- randomForest(HKEF ~ VIX + Oil_Price, data = Data, ntree = 2000, importance = TRUE)
change$Change.in.HKEF <- predict(fit, scenarios)

fit2 <- randomForest(GBF ~ VIX + Oil_Price, data = Data, ntree = 2000)
change$Change.in.GBF <- predict(fit2, scenarios)

Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.HKEF Change.in.GBF
## 1          0.05                0.05           0.22         -0.10
## 2         -0.05               -0.05           0.73          0.47
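The table above covers the two cases in which both features move in the same direction. If the mixed cases (VIX up while the Oil Price falls, and the reverse) are also of interest, the full grid can be built with expand.grid. This is a sketch; the predict calls are left as comments because they need the fitted forests from the code preview.

```r
# All four combinations of a 5% move in each feature (changes are fractions).
# Column names match the model's feature names so predict() can use them directly.
scenarios <- expand.grid(VIX = c(0.05, -0.05), Oil_Price = c(0.05, -0.05))
print(scenarios)

# With the fitted forests `fit` and `fit2` available, predictions could be added as:
# scenarios$Change.in.HKEF <- predict(fit,  scenarios)
# scenarios$Change.in.GBF  <- predict(fit2, scenarios)
```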

Evaluation

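Calculating the in-sample Mean Absolute Error (MAE) for predicting HKEF & GBF, i.e. on the same rows the models were fitted to

#HKEF
mae_hkef <- mean(abs(Data$HKEF - predict(fit, Data[, c("VIX", "Oil_Price")])))
mae_hkef

## [1] 2.672729

#GBF
mae_gbf <- mean(abs(Data$GBF - predict(fit2, Data[, c("VIX", "Oil_Price")])))
mae_gbf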

## [1] 0.886063
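The MAE values above score the forests on rows they were trained on, which tends to flatter the model. A random forest also offers out-of-bag (OOB) predictions, where each row is predicted only by trees that did not see it; calling predict() on a randomForest object without newdata returns these. The sketch below uses synthetic data with invented coefficients, so its numbers say nothing about the funds.

```r
library(randomForest)

set.seed(42)
n <- 60
# Synthetic stand-in for the monthly changes (not the fund data)
toy <- data.frame(VIX = rnorm(n, sd = 0.2), Oil_Price = rnorm(n, sd = 0.1))
toy$target <- -2 * toy$VIX + rnorm(n, sd = 1)

toy_fit <- randomForest(target ~ VIX + Oil_Price, data = toy, ntree = 500)

mae <- function(actual, predicted) mean(abs(actual - predicted))

# In-sample: every tree may have seen the row it is predicting
in_sample_mae <- mae(toy$target, predict(toy_fit, toy))

# Out-of-bag: omitting newdata returns predictions from trees that did not see the row
oob_mae <- mae(toy$target, predict(toy_fit))

print(c(in_sample = in_sample_mae, out_of_bag = oob_mae))
```

The out-of-bag figure is usually the larger of the two and is the fairer estimate of how the model would do on new months.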

What Now ?

The errors are large because the target variables are volatile. Two ways to reduce them:

Feature engineering using VIX and Oil Price, e.g. lags such as VIX_2monthslag and OilPrice2monthslag
Predicting with other non-linear models

Updated Dataset

##       Date        HKEF         GBF         VIX   VIX_last2   VIX_last3
## 1  5/31/11 -2.82534044  3.19014493  0.04745763 -0.16854566 -0.03324251
## 2  6/30/11 -0.02854363  1.96734920  0.06925566  0.04745763 -0.16854566
## 3  7/29/11 -0.09899833  0.08373741  0.52845036  0.06925566  0.04745763
## 4  8/31/11  0.20511395 -0.48958871  0.25227723  0.52845036  0.06925566
## 5  9/30/11 -0.92220172 -0.14430209  0.35863378  0.25227723  0.52845036
## 6 10/31/11 -5.39328063  0.56648936 -0.30260708  0.35863378  0.25227723
##     VIX_last4   VIX_last5 VIX_3months_average VIX_5months_average
## 1 -0.06041987  0.13877551         -0.05144351         -0.01519498
## 2 -0.03324251 -0.06041987         -0.01727746         -0.02909895
## 3 -0.16854566 -0.03324251          0.21505455          0.08867510
## 4  0.04745763 -0.16854566          0.28332775          0.14577904
## 5  0.06925566  0.04745763          0.37978712          0.25121493
## 6  0.52845036  0.06925566          0.10276798          0.18120199
##      Oil_Price Oil_Price_last2 Oil_Price_last3 Oil_Price_last4
## 1 -0.074334466     0.082520951     0.041596152      0.13438416
## 2 -0.046680321    -0.074334466     0.082520951      0.04159615
## 3  0.037776385    -0.046680321    -0.074334466      0.08252095
## 4  0.004744242     0.037776385    -0.046680321     -0.07433447
## 5 -0.094951923     0.004744242     0.037776385     -0.04668032
## 6  0.028552457    -0.094951923     0.004744242      0.03777639
##   Oil_Price_last5 Oil_Price_3months_average Oil_Price_5months_average
## 1      0.06156817               0.016594212               0.049146992
## 2      0.13438416              -0.012831279               0.027497295
## 3      0.04159615              -0.027746134               0.008175740
## 4      0.08252095              -0.001386565               0.000805358
## 5     -0.07433447              -0.017477099              -0.034689217
## 6     -0.04668032              -0.020551741              -0.014111832
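The lag and average columns can be rebuilt in base R. Judging from the printed rows, VIX_last2 is the previous month's value, VIX_last3 the one before that, and so on, while the 3- and 5-month averages include the current month. The sketch below makes that assumption, uses the six VIX changes from head(Data), and introduces a helper name (lag_by) that is not from the original code.

```r
# VIX changes for 2011-01-31 through 2011-06-30, copied from head(Data)
vix <- c(0.13877551, -0.06041987, -0.03324251, -0.16854566, 0.04745763, 0.06925566)

# Hypothetical helper: shift a series down by k rows, padding the start with NA
lag_by <- function(x, k) c(rep(NA, k), head(x, -k))

features <- data.frame(
  VIX       = vix,
  VIX_last2 = lag_by(vix, 1),  # previous month
  VIX_last3 = lag_by(vix, 2),
  VIX_last4 = lag_by(vix, 3),
  VIX_last5 = lag_by(vix, 4)
)

# Averages over the current month and the preceding 2 (or 4) months
features$VIX_3months_average <- rowMeans(features[, c("VIX", "VIX_last2", "VIX_last3")])
features$VIX_5months_average <- rowMeans(
  features[, c("VIX", "VIX_last2", "VIX_last3", "VIX_last4", "VIX_last5")]
)

# Rows without a full 5-month history contain NA and are dropped,
# which is why the updated dataset starts four months later
features <- features[complete.cases(features), ]
print(features)
```

The two surviving rows correspond to 5/31/11 and 6/30/11 in the updated dataset. The same steps apply to Oil_Price.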

Using CARET

CARET stands for Classification and Regression Training.

It wraps many well-known machine learning packages behind one interface
It can call different machine learning models such as xgbTree, knn, random forest and extraTrees
We choose the ensemble weights that give the lowest Mean Absolute Error (MAE)
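The weighting idea can be illustrated without the CARET machinery. The sketch below is a hand-rolled stand-in, not the caretEnsemble API: it uses two simple base models (lm and rpart) instead of the ones listed, synthetic data instead of the fund data, and a plain least-squares blend where the report's output shows a glmnet meta-model. All object names are illustrative.

```r
library(rpart)

set.seed(7)
n <- 120
# Synthetic data: a step in x1 plus a linear effect of x2
toy <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
toy$y <- ifelse(toy$x1 < 0, 1, -1) + toy$x2 + rnorm(n, sd = 0.3)

# 10-fold cross-validation: each row is predicted by base models
# that never saw it during training (out-of-fold predictions)
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))
oof <- data.frame(y = toy$y, linear = NA_real_, tree = NA_real_)
for (i in seq_len(k)) {
  held_out <- fold == i
  train <- toy[!held_out, ]
  oof$linear[held_out] <- predict(lm(y ~ x1 + x2, data = train), toy[held_out, ])
  oof$tree[held_out]   <- predict(rpart(y ~ x1 + x2, data = train), toy[held_out, ])
}

# Correlation between base-model predictions, analogous to the matrices printed below
base_cor <- cor(oof[, c("linear", "tree")])

# Meta-model: learn one weight per base model from the out-of-fold predictions
blend <- lm(y ~ linear + tree, data = oof)
blend_weights <- coef(blend)

mae <- function(actual, predicted) mean(abs(actual - predicted))
mae_table <- c(
  linear = mae(oof$y, oof$linear),
  tree   = mae(oof$y, oof$tree),
  blend  = mae(oof$y, fitted(blend))  # in-sample for the blend weights
)
print(round(base_cor, 2))
print(blend_weights)
print(mae_table)
```

Base models whose predictions are weakly correlated make different mistakes, so a weighted blend has more to gain from combining them. Note that the blend's MAE here is measured on the rows used to fit its weights, whereas the glmnet ensemble in the report cross-validates that step as well.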

HKEF

##               rf rf.1 xgbLinear xgbTree  kknn extraTrees
## rf          1.00 0.43      0.58    0.57 -0.08       0.35
## rf.1        0.43 1.00      0.25    0.44  0.29       0.72
## xgbLinear   0.58 0.25      1.00    0.34 -0.23       0.35
## xgbTree     0.57 0.44      0.34    1.00 -0.27       0.71
## kknn       -0.08 0.29     -0.23   -0.27  1.00      -0.02
## extraTrees  0.35 0.72      0.35    0.71 -0.02       1.00

## A glmnet ensemble of 2 base models: rf, rf, xgbLinear, xgbTree, kknn, extraTrees
## 
## Ensemble results:
## glmnet 
## 
## 46 samples
##  6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 42, 41, 41, 41, 42, 42, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        MAE      
##   0.10   0.0009849867  0.4325490
##   0.10   0.0098498665  0.4325490
##   0.10   0.0984986654  0.4437686
##   0.55   0.0009849867  0.5798329
##   0.55   0.0098498665  0.5798329
##   0.55   0.0984986654  0.5754898
##   1.00   0.0009849867  0.5794388
##   1.00   0.0098498665  0.5807361
##   1.00   0.0984986654  0.5276956
## 
## MAE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 0.1 and lambda
##  = 0.009849867.

HKEF Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.HKEF
## 1          0.05                0.05    -0.41790984
## 2         -0.05               -0.05     0.01335041

GBF

##               rf rf.1 xgbLinear xgbTree kknn extraTrees
## rf          1.00 0.89     -0.16    0.74 0.37       0.60
## rf.1        0.89 1.00      0.07    0.56 0.58       0.79
## xgbLinear  -0.16 0.07      1.00    0.11 0.09       0.12
## xgbTree     0.74 0.56      0.11    1.00 0.23       0.16
## kknn        0.37 0.58      0.09    0.23 1.00       0.73
## extraTrees  0.60 0.79      0.12    0.16 0.73       1.00

## A glmnet ensemble of 2 base models: rf, rf, xgbLinear, xgbTree, kknn, extraTrees
## 
## Ensemble results:
## glmnet 
## 
## 46 samples
##  6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 40, 42, 41, 41, 41, 42, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        MAE      
##   0.10   0.0007280638  0.3244459
##   0.10   0.0072806380  0.3244459
##   0.10   0.0728063801  0.3244459
##   0.55   0.0007280638  0.3177243
##   0.55   0.0072806380  0.3177243
##   0.55   0.0728063801  0.3092113
##   1.00   0.0007280638  0.3304099
##   1.00   0.0072806380  0.3292084
##   1.00   0.0728063801  0.2999794
## 
## MAE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.07280638.

GBF Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.GBF
## 1          0.05                0.05   -0.04026385
## 2         -0.05               -0.05    0.12993277

Conclusion

We managed to answer the business objective, with some caveats:

HKEF and GBF data are very volatile
VIX and Oil Price are not very good predictors
Feature engineering and CARET reduced the errors of our prediction model
Outside the scope of the question, other features such as the HSI and interest rates might be better predictors

Fidelity Mini Exercise Using R

Jason Chan

October 31, 2016
