How to pick the unit trusts fund

Data Source - https://secure.fundsupermart.com/fsm/funds/fund-selector , export the CSV.

Sample Table.

You should have a similar excel graph like below.

Sample Excel Table.

Many people buy funds from short term horizon to long term horizon. For this purpose i will focus from 1-year returns model to 3-year returns model.

# load libraries
suppressWarnings(suppressMessages(library(rpart)))
suppressWarnings(suppressMessages(library(partykit)))

# download 
CART_data <- read.csv("2018.csv")

myvars <- names(CART_data) %in% c("Date", "X5.YR", "X10.YR", "X") 
newdata <- CART_data[!myvars]

#remove NA
newdata <- newdata[complete.cases(newdata),]

# keep this data for comparison later and derived fund name
write.csv(newdata, file = "MyData_year1.csv")
write.csv(newdata, file = "MyData_year2.csv")
write.csv(newdata, file = "MyData_year3.csv")


# remove FUND NAME as we going to perform algorithm
myvars <- names(newdata) %in% c("Fund.Name") 
newdata <- newdata[!myvars]
str(newdata)

## 'data.frame':    832 obs. of  12 variables:
##  $ Currency               : Factor w/ 8 levels "AUD","CNH","EUR",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Risk.Rating            : int  4 9 8 8 10 1 8 7 1 10 ...
##  $ X3.YR.Risk.Return.Ratio: num  0.12 0.06 0.34 0.34 0.88 0.42 0.1 0.62 0.96 0.24 ...
##  $ X3.YR.Sharpe.Ratio     : num  -0.29 -0.11 0.25 0.19 0.73 -0.32 -0.04 0.39 0.11 0.02 ...
##  $ YTD                    : num  -4.52 3.22 1.87 1.69 3.09 -1.61 -3.6 -2.56 -1.75 -0.87 ...
##  $ X1.WK                  : num  -0.18 4.39 2.38 4.66 3.89 -0.2 3.9 0.96 -0.15 1.7 ...
##  $ X1.MTH                 : num  -0.34 0.73 -1.93 -0.34 -5.9 0.27 0.28 -0.95 0.15 0.07 ...
##  $ X3.MTH                 : num  -3.56 0.33 -0.89 -1.12 -1.75 -1.47 -5.68 -3 -1.6 -2.61 ...
##  $ X6.MTH                 : num  -5.81 8.63 3.19 7.74 4.91 -1.21 -3.38 0.32 -1.37 1.93 ...
##  $ X2.YR                  : num  -6.6 11.4 24.4 15.7 19.9 ...
##  $ X1.YR                  : num  0.74 8.5 24.95 16.11 23.58 ...
##  $ X3.YR                  : num  0.69 0.7 5.34 4.09 12.88 ...

Develop Decision Tree for 1-year returns model.

year1_model <- rpart(X1.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))

plot(as.party(year1_model))

Develop Decision Tree for 2-year returns model.

year2_model <- rpart(X2.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))
plot(as.party(year2_model))

Develop Decision Tree for 3-year returns model.

year3_model <- rpart(X3.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))
plot(as.party(year3_model))

Question 1 - which node gave the highest returns ?

Look at the boxplot for each model.

print(year1_model)

## n= 832 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 832 46407.6000 11.321470  
##    2) X2.YR< 10 536 12316.9900  7.403396  
##      4) Risk.Rating< 6.5 289  2140.9080  4.660346  
##        8) X3.YR.Sharpe.Ratio< -0.195 71   129.9039  1.806197 *
##        9) X3.YR.Sharpe.Ratio>=-0.195 218  1244.2550  5.589908 *
##      5) Risk.Rating>=6.5 247  5457.2670 10.612870  
##       10) X2.YR< 0.115 52  1141.8690  6.234231 *
##       11) X2.YR>=0.115 195  3052.5680 11.780510  
##         22) X3.YR.Sharpe.Ratio< 0.045 48   364.7237  9.079375 *
##         23) X3.YR.Sharpe.Ratio>=0.045 147  2223.2740 12.662520 *
##    3) X2.YR>=10 296 10962.4500 18.416350  
##      6) X2.YR< 18.67 189  4342.3830 15.764870  
##       12) Risk.Rating< 7.5 33   402.4952 11.145760 *
##       13) Risk.Rating>=7.5 156  3086.8510 16.741990  
##         26) X3.YR< 4.18 52   850.5266 13.267500 *
##         27) X3.YR>=4.18 104  1294.7040 18.479230 *
##      7) X2.YR>=18.67 107  2944.3050 23.099810  
##       14) X2.YR< 26.12 69  1150.1220 21.158840 *
##       15) X2.YR>=26.12 38  1062.2220 26.624210 *

print(year2_model)

## n= 832 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 832 78894.3200  7.528630  
##    2) X1.YR< 14.72 578 21849.0800  2.978512  
##      4) X6.MTH< -1.895 239  4746.3570 -1.742929  
##        8) X1.YR< 6.975 161  1784.9600 -3.508758 *
##        9) X1.YR>=6.975 78  1423.1490  1.901923 *
##      5) X6.MTH>=-1.895 339  8018.7600  6.307198  
##       10) X6.MTH< 1.115 252  3862.1510  4.942063  
##         20) X3.YR.Sharpe.Ratio< -0.005 77   387.1665  1.752597 *
##         21) X3.YR.Sharpe.Ratio>=-0.005 175  2347.0360  6.345429 *
##       11) X6.MTH>=1.115 87  2326.6920 10.261380 *
##    3) X1.YR>=14.72 254 17847.3500 17.882830  
##      6) X1.YR< 24.255 202  7233.5280 15.356340  
##       12) X6.MTH< 2.845 105  2311.1330 12.030950 *
##       13) X6.MTH>=2.845 97  2504.4160 18.955980 *
##      7) X1.YR>=24.255 52  4315.5760 27.697310 *

print(year3_model)

## n= 832 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 832 7515.55900  4.068930  
##    2) X3.YR.Sharpe.Ratio< 0.285 607 1252.12300  2.714778  
##      4) X3.YR.Sharpe.Ratio< 0.095 337  209.60420  1.662967  
##        8) X3.YR.Sharpe.Ratio< -0.005 208   79.97529  1.224279 *
##        9) X3.YR.Sharpe.Ratio>=-0.005 129   25.05679  2.370310 *
##      5) X3.YR.Sharpe.Ratio>=0.095 270  204.35370  4.027593  
##       10) X3.YR.Sharpe.Ratio< 0.195 145   27.41739  3.413724 *
##       11) X3.YR.Sharpe.Ratio>=0.195 125   58.91179  4.739680 *
##    3) X3.YR.Sharpe.Ratio>=0.285 225 2147.53600  7.722133  
##      6) X3.YR.Sharpe.Ratio< 0.665 195  477.72980  6.916821  
##       12) Risk.Rating< 7.5 44   42.34265  5.351818 *
##       13) Risk.Rating>=7.5 151  296.21890  7.372848  
##         26) X3.YR.Sharpe.Ratio< 0.405 93   58.84719  6.578495 *
##         27) X3.YR.Sharpe.Ratio>=0.405 58   84.59431  8.646552 *
##      7) X3.YR.Sharpe.Ratio>=0.665 30  721.33290 12.956670 *

Question 2 - what is the lowest value and highest return for each model node ?

Refer to earlier spreadsheet (mydata_year1.csv, mydata_year2.csv, mydata_year3.csv respectively for each model) and follow the decisions rules.

Sort the returns accordingly.

Question 3 - what are the fund names for each model with highest returns ?

Refer to earlier spreadsheets created and follow the decisions rules. Use the Filtering Function in Excel, following the Decision Tree numbers

And you will derive the fund name. (mydata_year1.csv, mydata_year2.csv, mydata_year3.csv respectively for each model)

Question 4 - is the split most efficient ? Are these model RELIABLE ?

The higher the R-SQUARE, the most reliable.

To calculate R-SQUARE, Look for the column “rel error”" - this is the undefined VARIANCES.

R-SQUARE is calculated as 1 minus “rel error”.

rsq.rpart(year1_model)

## 
## Regression tree:
## rpart(formula = X1.YR ~ ., data = newdata, method = "anova", 
##     control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
## 
## Variables actually used in tree construction:
## [1] Risk.Rating        X2.YR              X3.YR             
## [4] X3.YR.Sharpe.Ratio
## 
## Root node error: 46408/832 = 55.778
## 
## n= 832 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.498370      0   1.00000 1.00177 0.046205
## 2 0.101682      1   0.50163 0.50892 0.027627
## 3 0.079206      2   0.39995 0.40733 0.026296
## 4 0.027212      3   0.32074 0.34300 0.021796
## 5 0.019336      4   0.29353 0.32517 0.020296
## 6 0.016522      6   0.25486 0.31173 0.018859
## 7 0.015772      7   0.23834 0.28871 0.018358
## 8 0.010011      8   0.22256 0.27474 0.017419
## 9 0.010000      9   0.21255 0.26994 0.016730

rsq.rpart(year2_model)

## 
## Regression tree:
## rpart(formula = X2.YR ~ ., data = newdata, method = "anova", 
##     control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
## 
## Variables actually used in tree construction:
## [1] X1.YR              X3.YR.Sharpe.Ratio X6.MTH            
## 
## Root node error: 78894/832 = 94.825
## 
## n= 832 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.496840      0   1.00000 1.00221 0.062551
## 2 0.115141      1   0.50316 0.51191 0.033980
## 3 0.079831      2   0.38802 0.39799 0.031721
## 4 0.030648      3   0.30819 0.35112 0.023628
## 5 0.023195      4   0.27754 0.32261 0.022204
## 6 0.019498      5   0.25434 0.29633 0.019932
## 7 0.014297      6   0.23485 0.26509 0.019358
## 8 0.010000      7   0.22055 0.25363 0.019250

rsq.rpart(year3_model)

## 
## Regression tree:
## rpart(formula = X3.YR ~ ., data = newdata, method = "anova", 
##     control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
## 
## Variables actually used in tree construction:
## [1] Risk.Rating        X3.YR.Sharpe.Ratio
## 
## Root node error: 7515.6/832 = 9.0331
## 
## n= 832 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.547651      0   1.00000 1.00115 0.098268
## 2 0.126201      1   0.45235 0.48283 0.060992
## 3 0.111524      2   0.32615 0.36514 0.031767
## 4 0.019423      3   0.21462 0.25433 0.028460
## 5 0.015704      5   0.17578 0.22950 0.027918
## 6 0.013914      6   0.16007 0.21606 0.027886
## 7 0.010000      7   0.14616 0.20725 0.027919

Do you always go for the HIGHEST returns ?

No ! This is because you will realize those funds with highest returns are of highest RISK class.

How do you choose a fund with less volatile assuming SAME RISK CLASS ?

Let’s use the decision rule of Model 3 on fund selection.

I first open mydata_year3.csv.
Apply decision rule based on Model 3, which it will show the various fund.
Model 3 rule 1 says “Pick X3.YR.Sharpe.Ratio > 0.285”

Pick X3.YR.Sharpe.Ratio > 0.285

Model 3 rule 2 says “Pick X3.YR.Sharpe.Ratio > 0.665”

Pick X3.YR.Sharpe.Ratio > 0.665

Sort the Year 3 returns from ascending to descending in EXCEL. From there, you can get the minimum, maximum and average returns of each fund.

Sort within Excel

Since Risk 10 gave the highest returns, let’s pick a Risk-10 FUND and choose a fund with LESS VOLATILE in terms of returns from year 1 to year 3.
Filter Risk Rating = 10
Calcute the mean, standard deivation and then sort by Coefficient by Variance.

Funds sorted with less volatile funds

Conclusion For Model-3 Decision Making Rules.

From excel above, you can select a fund with least volatile by using basic statistics. Example Coefficent of Variance.

Please refer to this youtube on why standard deviation is different from Coefficent of Variance.- https://www.youtube.com/watch?v=DsYbik-Djbk

RPART_REGRESSION_TREE

James Lim

September 25, 2018