Libraries Used

library(Quandl)        # to get VIX and oil price data
library(randomForest)  # to predict net sales
library(plotly)        # interactive visualization
library(rpart)         # Recursive Partitioning and Regression Trees (rpart)
library(rattle)        # fancyRpartPlot() for nicer plots of rpart trees
library(caret)         # Classification And REgression Training (caret)
library(caretEnsemble) # ensembles (stacks) of caret models


Business Objective

Show how the net sales of the Hong Kong Equity Fund (HKEF) and the Global Bond Fund (GBF) would change if the oil price and the VIX both increase or both decrease by 5%.

Data Exploration

Target Variables:

Historical data for the target variables were obtained from the Hong Kong Investment Funds Association (HKIFA).

Features:

Data for the features were conveniently obtained in R using the Quandl package.

Time Frame & Frequency

Monthly observations, taken on the last trading day of each month, starting from December 2010.

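Change Instead of Absolute

Each series is converted from its absolute value to its relative change from the previous month, \((x_t - x_{t-1}) / x_{t-1}\), so the first month (December 2010) drops out.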

head(RawData)
##         Date   HKEF     GBF   VIX Oil_Price
## 1 2010-12-31  60.56  -39.39 17.15     93.23
## 2 2011-01-31 128.34 -112.56 19.53     98.97
## 3 2011-02-28 100.69  -47.83 18.35    112.27
## 4 2011-03-31  66.58   12.43 17.74    116.94
## 5 2011-04-29 -33.78   17.25 14.75    126.59
## 6 2011-05-31  61.66   72.28 15.45    117.18
head(Data)
##         Date        HKEF        GBF         VIX   Oil_Price
## 1 2011-01-31  1.11922061  1.8575781  0.13877551  0.06156817
## 2 2011-02-28 -0.21544335 -0.5750711 -0.06041987  0.13438416
## 3 2011-03-31 -0.33876254 -1.2598787 -0.03324251  0.04159615
## 4 2011-04-29 -1.50735957  0.3877715 -0.16854566  0.08252095
## 5 2011-05-31 -2.82534044  3.1901449  0.04745763 -0.07433447
## 6 2011-06-30 -0.02854363  1.9673492  0.06925566 -0.04668032
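The conversion from absolute values to relative changes can be sketched in base R as below. pct_change() and to_changes() are hypothetical helpers, not necessarily the author's code; the check data are the first six VIX and GBF values printed in RawData above. One caveat follows from the arithmetic: when the previous value is negative, as with net outflows, the ratio can flip sign. GBF going from -39.39 to -112.56 gives a change of about +1.86 even though outflows grew.

```r
# Hypothetical helper: relative change from the previous observation,
# (x_t - x_{t-1}) / x_{t-1}; the result is one element shorter than x
pct_change <- function(x) {
  x <- as.numeric(x)
  x[-1] / x[-length(x)] - 1
}

# Hypothetical helper: apply pct_change() to every column except Date,
# dropping the first date (it has no previous month to compare with)
to_changes <- function(raw) {
  value_cols <- setdiff(names(raw), "Date")
  changes <- as.data.frame(lapply(raw[value_cols], pct_change))
  data.frame(Date = raw$Date[-1], changes)
}

# First six rows of RawData (GBF and VIX only), copied from the output above
raw <- data.frame(
  Date = as.Date(c("2010-12-31", "2011-01-31", "2011-02-28",
                   "2011-03-31", "2011-04-29", "2011-05-31")),
  GBF  = c(-39.39, -112.56, -47.83, 12.43, 17.25, 72.28),
  VIX  = c(17.15, 19.53, 18.35, 17.74, 14.75, 15.45)
)
changes <- to_changes(raw)
changes
```

The resulting GBF and VIX columns match the first five rows of Data shown above.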

Visualization

Preliminary visualization helps us understand the data before we build an estimation model.

HKEF

HKEF net sales are plotted against the oil price and the VIX to get a sense of how they move.



GBF

GBF net sales are plotted against the oil price and the VIX to get a sense of how they move.



Insights

From the plots, we get an idea that:

  • The relationships between the variables are not linear
    • So we avoid linear models, such as linear regression or a Generalized Linear Model (GLM), to estimate our data

\(y = \beta_0 + \beta_1 X_1 + \beta_2 X_2\)

  • Movement of the target variables
    • HKEF and GBF move in the opposite direction to the features
    • This gives an intuition against which to validate our estimated answer


Predicting the Change

Based on the preliminary visualization, the relationships in our data do not appear linear. Hence, we avoid linear regression models, whose linearity assumption would be violated.

Tree Based Regression

Tree-based models do not assume linearity in the data. A tree-based model maps observations about an item (branches) to conclusions about the item’s target value (leaves). Think of it as the mother lode of nested if-else statements.
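To make the nested if-else idea concrete, here is a minimal sketch with rpart on made-up, step-shaped data. The data and the depth limit are assumptions for illustration, not the employment example in the figure. print() lists each split as an if-else rule, and the predictions are constant within each leaf.

```r
library(rpart)

set.seed(1)
# Made-up data with a step-shaped (non-linear) relationship
x <- runif(200, -1, 1)
y <- ifelse(x < 0, -1, 1) + rnorm(200, sd = 0.1)
toy <- data.frame(x = x, y = y)

# Regression tree (method = "anova") limited to depth 2
tree <- rpart(y ~ x, data = toy, method = "anova",
              control = rpart.control(maxdepth = 2))
print(tree)  # each node reads as "if x < split then ... else ..."

# Predictions are piecewise constant: one value per leaf
pred <- predict(tree, data.frame(x = c(-0.5, 0.5)))
pred
```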

Example of estimating employment with tree depth of 3:

How Does It Compare?

The diagrams (Decision Tree vs Linear Regression) show an attempt to fit the same data using linear regression and tree-based regression.
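The same comparison can be sketched numerically: fit lm() and rpart() to made-up, sine-shaped data and compare their mean absolute errors. The data are an assumption for illustration, and the errors are in-sample, so they only show how well each model can follow a curved shape.

```r
library(rpart)

set.seed(42)
# Made-up data with a curved relationship
x <- runif(300, 0, 2 * pi)
toy <- data.frame(x = x, y = sin(x) + rnorm(300, sd = 0.1))

lin_fit  <- lm(y ~ x, data = toy)                        # straight line
tree_fit <- rpart(y ~ x, data = toy, method = "anova")   # step function

mae_of <- function(actual, predicted) mean(abs(actual - predicted))
mae_lin  <- mae_of(toy$y, predict(lin_fit, toy))
mae_tree <- mae_of(toy$y, predict(tree_fit, toy))
c(linear = mae_lin, tree = mae_tree)
```

The tree's steps track the curve, while the single straight line cannot.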

Random Forest

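  • Grows many trees, each as deep as possible (no pruning)
  • Injects randomness (a bootstrap sample of rows for each tree, a random subset of features at each split) so the trees differ
  • Averages the trees' predictions (ensembling), which reduces variance and helps prevent overfitting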
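The bootstrap-and-average part of a random forest can be sketched with plain rpart trees. bagged_trees() and predict_bagged() are hypothetical helpers; this is bagging, not the randomForest package, and it leaves out the random feature subset drawn at each split (mtry), which a real random forest adds.

```r
library(rpart)

# Hypothetical helper: grow n_trees deep regression trees on bootstrap samples
bagged_trees <- function(formula, data, n_trees = 100) {
  lapply(seq_len(n_trees), function(i) {
    rows <- sample(nrow(data), replace = TRUE)   # bootstrap sample of rows
    rpart(formula, data = data[rows, ], method = "anova",
          control = rpart.control(cp = 0, minsplit = 5, xval = 0))  # deep, unpruned
  })
}

# Hypothetical helper: average the trees' predictions (the ensembling step)
predict_bagged <- function(trees, newdata) {
  preds <- sapply(trees, predict, newdata = newdata)
  if (is.null(dim(preds))) mean(preds) else rowMeans(preds)
}

set.seed(7)
x <- runif(200, 0, 2 * pi)
toy <- data.frame(x = x, y = sin(x) + rnorm(200, sd = 0.2))
trees <- bagged_trees(y ~ x, toy, n_trees = 50)
pred <- predict_bagged(trees, data.frame(x = c(pi / 2, 3 * pi / 2)))
pred
```

Each deep tree overfits its own bootstrap sample; averaging 50 of them smooths out much of that noise.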

HKEF & GBF

Code preview and predicted change:

# Random forest for each target, using the monthly changes in VIX and oil price
set.seed(1)  # any fixed seed; added so the forests are reproducible
fit  <- randomForest(HKEF ~ VIX + Oil_Price, data = Data, ntree = 2000, importance = TRUE)
fit2 <- randomForest(GBF ~ VIX + Oil_Price, data = Data, ntree = 2000)

# Scenarios: both features up 5%, then both features down 5%.
# newdata columns must be named like the predictors in the formulas.
scenarios <- data.frame(VIX = c(0.05, -0.05), Oil_Price = c(0.05, -0.05))

change <- data.frame(Change.in.VIX       = scenarios$VIX,
                     Change.in.Oil_Price = scenarios$Oil_Price,
                     Change.in.HKEF      = predict(fit, scenarios),
                     Change.in.GBF       = predict(fit2, scenarios))

Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.HKEF Change.in.GBF
## 1          0.05                0.05           0.22         -0.10
## 2         -0.05               -0.05           0.73          0.47

Evaluation

Calculating the Mean Absolute Error (MAE) for predicting HKEF & GBF
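For reference, the MAE is the average absolute prediction error. With \(y_i\) the actual change in month \(i\), \(\hat{y}_i\) the predicted change and \(n\) the number of months:

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \]

This is what mean(abs(actual - predicted)) computes in the code below.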

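# HKEF: in-sample MAE (predicting on the training data itself);
# predict(fit) without newdata would return out-of-bag predictions instead
mae <- mean(abs(Data$HKEF - predict(fit, Data[, c("VIX", "Oil_Price")])))
mae
## [1] 2.672729
# GBF: in-sample MAE
mae <- mean(abs(Data$GBF - predict(fit2, Data[, c("VIX", "Oil_Price")])))
mae
## [1] 0.886063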
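Because these errors are measured on the same months the forests were trained on, they tend to understate the error on new months. A minimal sketch of the effect, using a deliberately deep rpart tree on made-up data (both assumptions for illustration), compares the in-sample MAE with the MAE on held-out rows.

```r
library(rpart)

set.seed(11)
# Made-up data, split into training and held-out rows
x <- runif(200, 0, 2 * pi)
toy <- data.frame(x = x, y = sin(x) + rnorm(200, sd = 0.3))
train_rows <- sample(nrow(toy), 150)
train <- toy[train_rows, ]
test  <- toy[-train_rows, ]

# A deliberately deep tree that can memorise the training rows
deep_tree <- rpart(y ~ x, data = train, method = "anova",
                   control = rpart.control(cp = 0, minsplit = 2, minbucket = 1, xval = 0))

mae_of <- function(actual, predicted) mean(abs(actual - predicted))
mae_train <- mae_of(train$y, predict(deep_tree, train))
mae_test  <- mae_of(test$y,  predict(deep_tree, test))
c(in_sample = mae_train, held_out = mae_test)
```

The in-sample error is close to zero, while the held-out error reflects the noise the tree memorised.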


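What Now?

The errors are large, mainly because the target variables are volatile. To try to improve the predictions, lagged values and 3- and 5-month moving averages of each feature are added to the dataset: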

Updated Dataset

##       Date        HKEF         GBF         VIX   VIX_last2   VIX_last3
## 1  5/31/11 -2.82534044  3.19014493  0.04745763 -0.16854566 -0.03324251
## 2  6/30/11 -0.02854363  1.96734920  0.06925566  0.04745763 -0.16854566
## 3  7/29/11 -0.09899833  0.08373741  0.52845036  0.06925566  0.04745763
## 4  8/31/11  0.20511395 -0.48958871  0.25227723  0.52845036  0.06925566
## 5  9/30/11 -0.92220172 -0.14430209  0.35863378  0.25227723  0.52845036
## 6 10/31/11 -5.39328063  0.56648936 -0.30260708  0.35863378  0.25227723
##     VIX_last4   VIX_last5 VIX_3months_average VIX_5months_average
## 1 -0.06041987  0.13877551         -0.05144351         -0.01519498
## 2 -0.03324251 -0.06041987         -0.01727746         -0.02909895
## 3 -0.16854566 -0.03324251          0.21505455          0.08867510
## 4  0.04745763 -0.16854566          0.28332775          0.14577904
## 5  0.06925566  0.04745763          0.37978712          0.25121493
## 6  0.52845036  0.06925566          0.10276798          0.18120199
##      Oil_Price Oil_Price_last2 Oil_Price_last3 Oil_Price_last4
## 1 -0.074334466     0.082520951     0.041596152      0.13438416
## 2 -0.046680321    -0.074334466     0.082520951      0.04159615
## 3  0.037776385    -0.046680321    -0.074334466      0.08252095
## 4  0.004744242     0.037776385    -0.046680321     -0.07433447
## 5 -0.094951923     0.004744242     0.037776385     -0.04668032
## 6  0.028552457    -0.094951923     0.004744242      0.03777639
##   Oil_Price_last5 Oil_Price_3months_average Oil_Price_5months_average
## 1      0.06156817               0.016594212               0.049146992
## 2      0.13438416              -0.012831279               0.027497295
## 3      0.04159615              -0.027746134               0.008175740
## 4      0.08252095              -0.001386565               0.000805358
## 5     -0.07433447              -0.017477099              -0.034689217
## 6     -0.04668032              -0.020551741              -0.014111832
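The new columns follow a pattern that can be checked against the first row: VIX_lastK is the VIX change K - 1 months earlier (VIX_last2 is the previous month), and the averages are trailing means that include the current month. A sketch with a hypothetical add_lag_features() helper, checked on the first five VIX changes from Data:

```r
# Hypothetical helper: lagged copies (col_last2 .. col_lastN) and trailing
# moving averages (col_3months_average, ...) of one column
add_lag_features <- function(df, col, max_lag = 5, windows = c(3, 5)) {
  x <- df[[col]]
  n <- length(x)
  for (k in 2:max_lag) {
    # shift the series down by k - 1 rows, padding the top with NA
    df[[paste0(col, "_last", k)]] <- c(rep(NA, k - 1), x[seq_len(n - (k - 1))])
  }
  for (w in windows) {
    # sides = 1: mean of the current value and the w - 1 previous values
    df[[paste0(col, "_", w, "months_average")]] <-
      as.numeric(stats::filter(x, rep(1 / w, w), sides = 1))
  }
  df
}

vix <- data.frame(VIX = c(0.13877551, -0.06041987, -0.03324251, -0.16854566, 0.04745763))
feat <- add_lag_features(vix, "VIX")
feat <- feat[complete.cases(feat), ]   # keep rows with a full 5-month history
feat
```

The single complete row reproduces row 1 (5/31/11) of the updated dataset, which is why the new dataset starts four months after Data does.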


Using CARET

CARET stands for Classification And REgression Training.

  • Provides one interface to many popular machine learning packages
  • Lets us call different models by name, such as rf (random forest), xgbTree, xgbLinear, kknn and extraTrees
  • caretEnsemble then combines the models, choosing the weighting that gives the lowest Mean Absolute Error (MAE)
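The idea of choosing a weighting by MAE can be shown with a deliberately simplified sketch: a grid search over the weight for blending two models' predictions. best_blend_weight() is hypothetical and the numbers are made up. The glmnet ensemble in the output below instead fits an elastic-net regression on the base models' predictions and picks its alpha and lambda by cross-validated MAE.

```r
# Hypothetical helper: find w in [0, 1] so that w * p1 + (1 - w) * p2
# has the lowest MAE against the actual values
best_blend_weight <- function(actual, p1, p2, grid = seq(0, 1, by = 0.01)) {
  maes <- sapply(grid, function(w) mean(abs(actual - (w * p1 + (1 - w) * p2))))
  list(weight = grid[which.min(maes)], mae = min(maes))
}

actual <- c(1, 2, 3, 4)
p1 <- actual + 1   # made-up model 1: always over-predicts by 1
p2 <- actual - 1   # made-up model 2: always under-predicts by 1
res <- best_blend_weight(actual, p1, p2)
res
```

Here the errors cancel at an equal weighting, so the blend beats either model on its own.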

HKEF

##               rf rf.1 xgbLinear xgbTree  kknn extraTrees
## rf          1.00 0.43      0.58    0.57 -0.08       0.35
## rf.1        0.43 1.00      0.25    0.44  0.29       0.72
## xgbLinear   0.58 0.25      1.00    0.34 -0.23       0.35
## xgbTree     0.57 0.44      0.34    1.00 -0.27       0.71
## kknn       -0.08 0.29     -0.23   -0.27  1.00      -0.02
## extraTrees  0.35 0.72      0.35    0.71 -0.02       1.00
## A glmnet ensemble of 2 base models: rf, rf, xgbLinear, xgbTree, kknn, extraTrees
## 
## Ensemble results:
## glmnet 
## 
## 46 samples
##  6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 42, 41, 41, 41, 42, 42, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        MAE      
##   0.10   0.0009849867  0.4325490
##   0.10   0.0098498665  0.4325490
##   0.10   0.0984986654  0.4437686
##   0.55   0.0009849867  0.5798329
##   0.55   0.0098498665  0.5798329
##   0.55   0.0984986654  0.5754898
##   1.00   0.0009849867  0.5794388
##   1.00   0.0098498665  0.5807361
##   1.00   0.0984986654  0.5276956
## 
## MAE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 0.1 and lambda
##  = 0.009849867.
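The MAE column above is a 10-fold cross-validated estimate (see "Resampling: Cross-Validated (10 fold)"). The idea can be sketched in base R with a hypothetical cv_mae() helper. It is a simplified illustration on made-up data comparing a linear model with a regression tree, not caret's internal implementation.

```r
library(rpart)

# Hypothetical helper: k-fold cross-validated MAE for any model-fitting function
cv_mae <- function(data, target, fit_fun, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  fold_mae <- sapply(seq_len(k), function(f) {
    train <- data[folds != f, ]            # fit on k - 1 folds
    test  <- data[folds == f, ]            # score on the left-out fold
    model <- fit_fun(train)
    mean(abs(test[[target]] - predict(model, test)))
  })
  mean(fold_mae)
}

set.seed(3)
x <- runif(200, 0, 2 * pi)
toy <- data.frame(x = x, y = sin(x) + rnorm(200, sd = 0.2))

cv_lin  <- cv_mae(toy, "y", function(d) lm(y ~ x, data = d))
cv_tree <- cv_mae(toy, "y", function(d) rpart(y ~ x, data = d, method = "anova"))
c(linear = cv_lin, tree = cv_tree)
```

Every row is scored exactly once by a model that never saw it during fitting.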

HKEF Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.HKEF
## 1          0.05                0.05    -0.41790984
## 2         -0.05               -0.05     0.01335041


GBF

##               rf rf.1 xgbLinear xgbTree kknn extraTrees
## rf          1.00 0.89     -0.16    0.74 0.37       0.60
## rf.1        0.89 1.00      0.07    0.56 0.58       0.79
## xgbLinear  -0.16 0.07      1.00    0.11 0.09       0.12
## xgbTree     0.74 0.56      0.11    1.00 0.23       0.16
## kknn        0.37 0.58      0.09    0.23 1.00       0.73
## extraTrees  0.60 0.79      0.12    0.16 0.73       1.00
## A glmnet ensemble of 2 base models: rf, rf, xgbLinear, xgbTree, kknn, extraTrees
## 
## Ensemble results:
## glmnet 
## 
## 46 samples
##  6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 40, 42, 41, 41, 41, 42, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        MAE      
##   0.10   0.0007280638  0.3244459
##   0.10   0.0072806380  0.3244459
##   0.10   0.0728063801  0.3244459
##   0.55   0.0007280638  0.3177243
##   0.55   0.0072806380  0.3177243
##   0.55   0.0728063801  0.3092113
##   1.00   0.0007280638  0.3304099
##   1.00   0.0072806380  0.3292084
##   1.00   0.0728063801  0.2999794
## 
## MAE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.07280638.

GBF Prediction

##   Change.in.VIX Change.in.Oil_Price Change.in.GBF
## 1          0.05                0.05   -0.04026385
## 2         -0.05               -0.05    0.12993277


Conclusion

Though we managed to answer the business objective: