Data Source - https://secure.fundsupermart.com/fsm/funds/fund-selector , export the CSV.
Sample Table.
You should have a similar excel graph like below.
Sample Excel Table.
# load libraries
suppressWarnings(suppressMessages(library(rpart)))
suppressWarnings(suppressMessages(library(partykit)))
# download
CART_data <- read.csv("2018.csv")
myvars <- names(CART_data) %in% c("Date", "X5.YR", "X10.YR", "X")
newdata <- CART_data[!myvars]
#remove NA
newdata <- newdata[complete.cases(newdata),]
# keep this data for comparison later and derived fund name
write.csv(newdata, file = "MyData_year1.csv")
write.csv(newdata, file = "MyData_year2.csv")
write.csv(newdata, file = "MyData_year3.csv")
# remove FUND NAME as we going to perform algorithm
myvars <- names(newdata) %in% c("Fund.Name")
newdata <- newdata[!myvars]
str(newdata)
## 'data.frame': 832 obs. of 12 variables:
## $ Currency : Factor w/ 8 levels "AUD","CNH","EUR",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ Risk.Rating : int 4 9 8 8 10 1 8 7 1 10 ...
## $ X3.YR.Risk.Return.Ratio: num 0.12 0.06 0.34 0.34 0.88 0.42 0.1 0.62 0.96 0.24 ...
## $ X3.YR.Sharpe.Ratio : num -0.29 -0.11 0.25 0.19 0.73 -0.32 -0.04 0.39 0.11 0.02 ...
## $ YTD : num -4.52 3.22 1.87 1.69 3.09 -1.61 -3.6 -2.56 -1.75 -0.87 ...
## $ X1.WK : num -0.18 4.39 2.38 4.66 3.89 -0.2 3.9 0.96 -0.15 1.7 ...
## $ X1.MTH : num -0.34 0.73 -1.93 -0.34 -5.9 0.27 0.28 -0.95 0.15 0.07 ...
## $ X3.MTH : num -3.56 0.33 -0.89 -1.12 -1.75 -1.47 -5.68 -3 -1.6 -2.61 ...
## $ X6.MTH : num -5.81 8.63 3.19 7.74 4.91 -1.21 -3.38 0.32 -1.37 1.93 ...
## $ X2.YR : num -6.6 11.4 24.4 15.7 19.9 ...
## $ X1.YR : num 0.74 8.5 24.95 16.11 23.58 ...
## $ X3.YR : num 0.69 0.7 5.34 4.09 12.88 ...
year1_model <- rpart(X1.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))
plot(as.party(year1_model))
year2_model <- rpart(X2.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))
plot(as.party(year2_model))
year3_model <- rpart(X3.YR ~., data = newdata, method = "anova", control=rpart.control(minsplit=60, minbucket=30, maxdepth=6))
plot(as.party(year3_model))
Look at the boxplot for each model.
print(year1_model)
## n= 832
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 832 46407.6000 11.321470
## 2) X2.YR< 10 536 12316.9900 7.403396
## 4) Risk.Rating< 6.5 289 2140.9080 4.660346
## 8) X3.YR.Sharpe.Ratio< -0.195 71 129.9039 1.806197 *
## 9) X3.YR.Sharpe.Ratio>=-0.195 218 1244.2550 5.589908 *
## 5) Risk.Rating>=6.5 247 5457.2670 10.612870
## 10) X2.YR< 0.115 52 1141.8690 6.234231 *
## 11) X2.YR>=0.115 195 3052.5680 11.780510
## 22) X3.YR.Sharpe.Ratio< 0.045 48 364.7237 9.079375 *
## 23) X3.YR.Sharpe.Ratio>=0.045 147 2223.2740 12.662520 *
## 3) X2.YR>=10 296 10962.4500 18.416350
## 6) X2.YR< 18.67 189 4342.3830 15.764870
## 12) Risk.Rating< 7.5 33 402.4952 11.145760 *
## 13) Risk.Rating>=7.5 156 3086.8510 16.741990
## 26) X3.YR< 4.18 52 850.5266 13.267500 *
## 27) X3.YR>=4.18 104 1294.7040 18.479230 *
## 7) X2.YR>=18.67 107 2944.3050 23.099810
## 14) X2.YR< 26.12 69 1150.1220 21.158840 *
## 15) X2.YR>=26.12 38 1062.2220 26.624210 *
print(year2_model)
## n= 832
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 832 78894.3200 7.528630
## 2) X1.YR< 14.72 578 21849.0800 2.978512
## 4) X6.MTH< -1.895 239 4746.3570 -1.742929
## 8) X1.YR< 6.975 161 1784.9600 -3.508758 *
## 9) X1.YR>=6.975 78 1423.1490 1.901923 *
## 5) X6.MTH>=-1.895 339 8018.7600 6.307198
## 10) X6.MTH< 1.115 252 3862.1510 4.942063
## 20) X3.YR.Sharpe.Ratio< -0.005 77 387.1665 1.752597 *
## 21) X3.YR.Sharpe.Ratio>=-0.005 175 2347.0360 6.345429 *
## 11) X6.MTH>=1.115 87 2326.6920 10.261380 *
## 3) X1.YR>=14.72 254 17847.3500 17.882830
## 6) X1.YR< 24.255 202 7233.5280 15.356340
## 12) X6.MTH< 2.845 105 2311.1330 12.030950 *
## 13) X6.MTH>=2.845 97 2504.4160 18.955980 *
## 7) X1.YR>=24.255 52 4315.5760 27.697310 *
print(year3_model)
## n= 832
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 832 7515.55900 4.068930
## 2) X3.YR.Sharpe.Ratio< 0.285 607 1252.12300 2.714778
## 4) X3.YR.Sharpe.Ratio< 0.095 337 209.60420 1.662967
## 8) X3.YR.Sharpe.Ratio< -0.005 208 79.97529 1.224279 *
## 9) X3.YR.Sharpe.Ratio>=-0.005 129 25.05679 2.370310 *
## 5) X3.YR.Sharpe.Ratio>=0.095 270 204.35370 4.027593
## 10) X3.YR.Sharpe.Ratio< 0.195 145 27.41739 3.413724 *
## 11) X3.YR.Sharpe.Ratio>=0.195 125 58.91179 4.739680 *
## 3) X3.YR.Sharpe.Ratio>=0.285 225 2147.53600 7.722133
## 6) X3.YR.Sharpe.Ratio< 0.665 195 477.72980 6.916821
## 12) Risk.Rating< 7.5 44 42.34265 5.351818 *
## 13) Risk.Rating>=7.5 151 296.21890 7.372848
## 26) X3.YR.Sharpe.Ratio< 0.405 93 58.84719 6.578495 *
## 27) X3.YR.Sharpe.Ratio>=0.405 58 84.59431 8.646552 *
## 7) X3.YR.Sharpe.Ratio>=0.665 30 721.33290 12.956670 *
Refer to earlier spreadsheet (mydata_year1.csv, mydata_year2.csv, mydata_year3.csv respectively for each model) and follow the decisions rules.
Sort the returns accordingly.
Refer to earlier spreadsheets created and follow the decisions rules. Use the Filtering Function in Excel, following the Decision Tree numbers
And you will derive the fund name. (mydata_year1.csv, mydata_year2.csv, mydata_year3.csv respectively for each model)
The higher the R-SQUARE, the most reliable.
To calculate R-SQUARE, Look for the column “rel error”" - this is the undefined VARIANCES.
R-SQUARE is calculated as 1 minus “rel error”.
rsq.rpart(year1_model)
##
## Regression tree:
## rpart(formula = X1.YR ~ ., data = newdata, method = "anova",
## control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
##
## Variables actually used in tree construction:
## [1] Risk.Rating X2.YR X3.YR
## [4] X3.YR.Sharpe.Ratio
##
## Root node error: 46408/832 = 55.778
##
## n= 832
##
## CP nsplit rel error xerror xstd
## 1 0.498370 0 1.00000 1.00177 0.046205
## 2 0.101682 1 0.50163 0.50892 0.027627
## 3 0.079206 2 0.39995 0.40733 0.026296
## 4 0.027212 3 0.32074 0.34300 0.021796
## 5 0.019336 4 0.29353 0.32517 0.020296
## 6 0.016522 6 0.25486 0.31173 0.018859
## 7 0.015772 7 0.23834 0.28871 0.018358
## 8 0.010011 8 0.22256 0.27474 0.017419
## 9 0.010000 9 0.21255 0.26994 0.016730
rsq.rpart(year2_model)
##
## Regression tree:
## rpart(formula = X2.YR ~ ., data = newdata, method = "anova",
## control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
##
## Variables actually used in tree construction:
## [1] X1.YR X3.YR.Sharpe.Ratio X6.MTH
##
## Root node error: 78894/832 = 94.825
##
## n= 832
##
## CP nsplit rel error xerror xstd
## 1 0.496840 0 1.00000 1.00221 0.062551
## 2 0.115141 1 0.50316 0.51191 0.033980
## 3 0.079831 2 0.38802 0.39799 0.031721
## 4 0.030648 3 0.30819 0.35112 0.023628
## 5 0.023195 4 0.27754 0.32261 0.022204
## 6 0.019498 5 0.25434 0.29633 0.019932
## 7 0.014297 6 0.23485 0.26509 0.019358
## 8 0.010000 7 0.22055 0.25363 0.019250
rsq.rpart(year3_model)
##
## Regression tree:
## rpart(formula = X3.YR ~ ., data = newdata, method = "anova",
## control = rpart.control(minsplit = 60, minbucket = 30, maxdepth = 6))
##
## Variables actually used in tree construction:
## [1] Risk.Rating X3.YR.Sharpe.Ratio
##
## Root node error: 7515.6/832 = 9.0331
##
## n= 832
##
## CP nsplit rel error xerror xstd
## 1 0.547651 0 1.00000 1.00115 0.098268
## 2 0.126201 1 0.45235 0.48283 0.060992
## 3 0.111524 2 0.32615 0.36514 0.031767
## 4 0.019423 3 0.21462 0.25433 0.028460
## 5 0.015704 5 0.17578 0.22950 0.027918
## 6 0.013914 6 0.16007 0.21606 0.027886
## 7 0.010000 7 0.14616 0.20725 0.027919
No ! This is because you will realize those funds with highest returns are of highest RISK class.
Let’s use the decision rule of Model 3 on fund selection.
I first open mydata_year3.csv.
Apply decision rule based on Model 3, which it will show the various fund.
Model 3 rule 1 says “Pick X3.YR.Sharpe.Ratio > 0.285”
Pick X3.YR.Sharpe.Ratio > 0.285
Pick X3.YR.Sharpe.Ratio > 0.665
Sort within Excel
Since Risk 10 gave the highest returns, let’s pick a Risk-10 FUND and choose a fund with LESS VOLATILE in terms of returns from year 1 to year 3.
Filter Risk Rating = 10
Calcute the mean, standard deivation and then sort by Coefficient by Variance.
Funds sorted with less volatile funds