8.1
Recreate the simulated data from exercise 7.2:
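For reference, here is a minimal sketch of that recreation step, following the textbook's mlbench.friedman1 recipe (the seed below is an assumption, not taken from this write-up):
library(mlbench)
set.seed(200) # assumed seed
simulated <- mlbench.friedman1(200, sd = 1) # 200 samples; V1-V5 informative, V6-V10 noise
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"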
8.1.a
Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
library(caret)
model1 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
## Overall
## V1 8.732235404
## V2 6.415369387
## V3 0.763591825
## V4 7.615118809
## V5 2.023524577
## V6 0.165111172
## V7 -0.005961659
## V8 -0.166362581
## V9 -0.095292651
## V10 -0.074944788
Did the random forest model significantly use the uninformative predictors (V6-V10)?
From the varImp output, the random forest made some use of predictors V6-V10, but their importance scores are near zero (several are slightly negative), so they were not significant. The most important variables are V1, V2, V4, and V5.
8.1.b
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
set.seed(123)
simulated$duplicate1 <- simulated$V1 + rnorm(200) *.1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9504983
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is highly correlated with V1?
model2 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2
## Overall
## V1 5.183260546
## V2 6.318610678
## V3 0.697206467
## V4 6.952539867
## V5 1.986661056
## V6 0.233829847
## V7 -0.006712779
## V8 -0.083743959
## V9 -0.020932796
## V10 -0.069609685
## duplicate1 4.305479706
Looking at the variable importance output, V1's importance dropped from about 8.73 to 5.18 once the highly correlated duplicate was added, and there was a slight decrease across the other informative predictors as well, which makes sense. The duplicate column itself has an importance of about 4.31, so the importance that V1 carried on its own is now split between V1 and duplicate1.
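A quick arithmetic check of that split, using the rfImp1 and rfImp2 data frames computed above (values match the printed outputs):
rfImp1["V1", "Overall"] # ~8.73 before the duplicate was added
rfImp2["V1", "Overall"] + rfImp2["duplicate1", "Overall"] # ~9.49, now shared across the pair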
set.seed(234)
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .12
cor(simulated$duplicate2, simulated$V1)
## [1] 0.9243674
model3 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp3 <- varImp(model3, scale = FALSE)
rfImp3
## Overall
## V1 4.15929157
## V2 6.16758547
## V3 0.43858329
## V4 7.28389142
## V5 1.90255763
## V6 0.21213093
## V7 -0.01960485
## V8 -0.09552319
## V9 -0.06038266
## V10 -0.01800977
## duplicate1 3.14294997
## duplicate2 2.87803982
From the output, it looks as though V1's importance was reduced even further (to about 4.16) when we added a second highly correlated variable.
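As a sanity check, the three columns are all strongly correlated with one another, which is why the forest keeps splitting its importance among them:
cor(simulated[, c("V1", "duplicate1", "duplicate2")])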
8.1.c
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
library(party)
set.seed(345)
cforest1 <- cforest(y ~ ., data = simulated[, 1:11], controls = cforest_control(ntree = 1000)) # only the original columns, excluding the highly correlated duplicates
cforest2 <- cforest(y ~ ., data = simulated[, 1:12], controls = cforest_control(ntree = 1000)) # includes duplicate1, the first column highly correlated with V1
cforest3 <- cforest(y ~ ., data = simulated, controls = cforest_control(ntree = 1000)) # includes both duplicate columns
I fit a conditional inference forest to each of the three versions of the simulated data (original, with one highly correlated column, and with two highly correlated columns), using the controls argument to set the number of trees to 1000. Next I'll compute variable importance with the varimp function, first with the conditional argument set to FALSE and then set to TRUE, and compare the importance values across the models.
cforest.imp1 <- varimp(cforest1)
cforest.imp2 <- varimp(cforest2)
cforest.imp3 <- varimp(cforest3)
cforest.imp4 <- varimp(cforest1, conditional = TRUE)
cforest.imp5 <- varimp(cforest2, conditional = TRUE)
cforest.imp6 <- varimp(cforest3, conditional = TRUE)
library(knitr)
#cforest.imp1
#cforest.imp2
#cforest.imp3
#cforest.imp4
#cforest.imp5
#cforest.imp6
var.names <- c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "Corr1", "Corr2")
cforest.imp1 <- data.frame(Original_Simulated = cforest.imp1, Variable = factor(names(cforest.imp1), levels = var.names))
cforest.imp2 <- data.frame(Original_Simulated_Correlated1 = cforest.imp2, Variable = factor(names(cforest.imp2), levels = var.names))
cforest.imp3 <- data.frame(Original_Simulated_Correlated2 = cforest.imp3, Variable = factor(names(cforest.imp3), levels = var.names))
cforest.imp4 <- data.frame(Original_Simulated2 = cforest.imp4, Variable = factor(names(cforest.imp4), levels = var.names))
cforest.imp5 <- data.frame(Original_Simulated2_Correlated1 = cforest.imp5, Variable = factor(names(cforest.imp5), levels = var.names))
cforest.imp6 <- data.frame(Original_Simulated2_Correlated2 = cforest.imp6, Variable = factor(names(cforest.imp6), levels = var.names))
cforest.imp <- merge(cforest.imp1, cforest.imp2, all = TRUE)
cforest.imp <- merge(cforest.imp, cforest.imp3, all = TRUE)
cforest.imp <- merge(cforest.imp, cforest.imp4, all = TRUE)
cforest.imp <- merge(cforest.imp, cforest.imp5, all = TRUE)
cforest.imp <- merge(cforest.imp, cforest.imp6, all = TRUE)
cforest.imp$Variable <- factor(cforest.imp$Variable, levels = var.names)
cforest.imp <- cforest.imp[order(cforest.imp$Variable), ]
kable(cforest.imp)
|   | Variable | Original_Simulated | Original_Simulated_Correlated1 | Original_Simulated_Correlated2 | Original_Simulated2 | Original_Simulated2_Correlated1 | Original_Simulated2_Correlated2 |
|---|---|---|---|---|---|---|---|
| 1 | V1 | 9.2912099 | 5.6900862 | 4.2663934 | 3.1842192 | 1.1898197 | 0.5302215 |
| 3 | V2 | 7.2699157 | 6.3294000 | 6.1227909 | 3.9512745 | 3.7367197 | 3.5672106 |
| 4 | V3 | 0.0465104 | 0.0468226 | 0.0571431 | 0.0139257 | 0.0192500 | 0.0167082 |
| 5 | V4 | 8.7462832 | 8.0197883 | 7.8767015 | 4.8435073 | 4.6129464 | 4.3252320 |
| 6 | V5 | 2.1798340 | 2.0186085 | 1.7982137 | 0.7201620 | 0.8460670 | 0.7586149 |
| 7 | V6 | 0.0153615 | -0.0050624 | 0.0521642 | 0.0074876 | 0.0018668 | 0.0265691 |
| 8 | V7 | 0.0998168 | 0.0497234 | 0.0420129 | 0.0219337 | 0.0224695 | 0.0014323 |
| 9 | V8 | -0.0447170 | -0.0436383 | -0.0092861 | -0.0104191 | -0.0152817 | 0.0016234 |
| 10 | V9 | -0.0281041 | -0.0261491 | -0.0027388 | -0.0016222 | 0.0020749 | 0.0051651 |
| 2 | V10 | -0.0378497 | -0.0237428 | -0.0056340 | 0.0019101 | -0.0060677 | -0.0018240 |
| 11 | NA | NA | 4.0672712 | 2.9405583 | NA | 0.9936000 | 0.5174836 |
| 12 | NA | NA | 4.0672712 | 2.9405583 | NA | 0.9936000 | 0.2430641 |
| 13 | NA | NA | 4.0672712 | 2.6147749 | NA | 0.9936000 | 0.5174836 |
| 14 | NA | NA | 4.0672712 | 2.6147749 | NA | 0.9936000 | 0.2430641 |
It appears that the trends are the same, with V1, V2, V4, and V5 holding steady as the most important predictors in both the traditional and conditional importance rankings.
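To see that pattern side by side, here is a quick base-R sketch comparing the traditional and conditional importances for the model that includes both duplicate columns (column names as in the merged table above):
op <- par(mfrow = c(1, 2), mar = c(4, 7, 2, 1))
barplot(cforest.imp$Original_Simulated_Correlated2, names.arg = as.character(cforest.imp$Variable), horiz = TRUE, las = 1, main = "conditional = FALSE")
barplot(cforest.imp$Original_Simulated2_Correlated2, names.arg = as.character(cforest.imp$Variable), horiz = TRUE, las = 1, main = "conditional = TRUE")
par(op)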
8.1.d
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
From the ipred package, I plan to use the bagging function, which uses the formula interface.
library(ipred)
set.seed(789)
bg1 <- bagging(y ~ ., data = simulated[, 1:11], nbagg = 50) # only the original columns, excluding the highly correlated duplicates
bg2 <- bagging(y ~ ., data = simulated[, 1:12], nbagg = 50) # includes duplicate1, the first column highly correlated with V1
bg3 <- bagging(y ~ ., data = simulated, nbagg = 50) # all the variables, including both duplicates
bg.varimp1 <- varImp(bg1)
bg.varimp2 <- varImp(bg2)
bg.varimp3 <- varImp(bg3)
names(bg.varimp1) <- "Original_Simulated"
names(bg.varimp2) <- "Simulated_Correlated1"
names(bg.varimp3) <- "Simulated_Correlated2"
bg.varimp1$Variable <- factor(rownames(bg.varimp1), levels = var.names)
bg.varimp2$Variable <- factor(rownames(bg.varimp2), levels = var.names)
bg.varimp3$Variable <- factor(rownames(bg.varimp3), levels = var.names)
bg.imp <- merge(bg.varimp1, bg.varimp2, all = TRUE)
bg.imp <- merge(bg.imp, bg.varimp3, all = TRUE)
bg.imp <- bg.imp[order(bg.imp$Variable), ]
kable(bg.imp)
|   | Variable | Original_Simulated | Simulated_Correlated1 | Simulated_Correlated2 |
|---|---|---|---|---|
| 1 | V1 | 2.1196834 | 1.7069067 | 1.7721698 |
| 3 | V2 | 2.3209243 | 2.0857234 | 2.4074153 |
| 4 | V3 | 1.2419984 | 0.8790192 | 1.1049421 |
| 5 | V4 | 2.6603400 | 2.5593713 | 2.4957950 |
| 6 | V5 | 2.4569089 | 2.3914402 | 2.2823831 |
| 7 | V6 | 0.9662812 | 0.8235606 | 0.8717684 |
| 8 | V7 | 1.0994569 | 1.0043189 | 0.9560367 |
| 9 | V8 | 0.6802120 | 0.5146650 | 0.4823199 |
| 10 | V9 | 0.7327331 | 0.5774102 | 0.5030545 |
| 2 | V10 | 1.0177427 | 0.5589770 | 0.7367097 |
| 11 | NA | NA | 1.5375948 | 1.5007556 |
| 12 | NA | NA | 1.5375948 | 1.6418951 |
The trend, for the most part, follows the other models, although I was surprised to see V5 ranking higher here, and the remaining variables also receive generally higher importance ratings than in the random forest. Next I'll fit Cubist models using the Cubist package, setting the committees argument to 100 to specify how many committee models are fit.
library(Cubist)
set.seed(891)
simulated2 <- simulated[-12] # drop duplicate1 so this set keeps only duplicate2
simulated3 <- simulated[-13] # drop duplicate2 so this set keeps only duplicate1
cb1 <- cubist(x = simulated[, 1:10], y = simulated$y, committees = 100)
cb2 <- cubist(x = simulated2[, names(simulated2) != "y"], y = simulated2$y, committees = 100)
cb3 <- cubist(x = simulated3[, names(simulated3) != "y"], y = simulated3$y, committees = 100)
#cb2 <- cubist(x = simulated2[, 1:10, 12], y = simulated2$y, committees = 100)
#cb3 <- cubist(x = simulated3[, 1:10, 12], y = simulated3$y, committees = 100)
cb.varimp1 <- varImp(cb1)
cb.varimp2 <- varImp(cb2)
cb.varimp3 <- varImp(cb3)
# I won't combine these into a data frame; I'll just look at each varImp output in turn.
cb.varimp1
## Overall
## V1 71.5
## V3 47.0
## V2 58.5
## V4 48.0
## V5 33.0
## V6 13.0
## V7 0.0
## V8 0.0
## V9 0.0
## V10 0.0
cb.varimp2
## Overall
## V3 48.5
## V2 58.5
## V1 58.0
## V4 47.5
## duplicate2 21.5
## V5 36.5
## V6 14.0
## V8 2.0
## V10 1.0
## V9 0.5
## V7 0.0
cb.varimp3
## Overall
## V1 50.0
## V3 46.0
## V2 58.0
## V4 47.5
## V5 30.5
## duplicate1 25.5
## V8 7.5
## V6 3.5
## V7 0.0
## V9 0.0
## V10 0.0
The Cubist models rank V1, V2, V3, V4, and V5 as the most used predictors, and, surprisingly, the duplicate correlated variables also receive substantial usage, again drawing importance away from V1.
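For reference, the Cubist fit itself records how often each predictor appears in rule conditions and in the terminal linear models, and caret's varImp summarizes those usage percentages; a quick look at the raw table (assuming the usage component of the cubist object):
head(cb1$usage) # per-predictor usage percentages for conditions and models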
8.2
Use a simulation to show tree bias with different granularities.
The text describes a phenomenon in which predictors with more potential split points (finer granularity) have a higher chance of being used near the top of a tree, even when they have little or no relationship with the response. To show this via simulation, we will create three variables: two predictors and one response. The first predictor, X1, takes only two distinct values and separates y into two groups; the second, X2, is a continuous noise variable with many distinct values and no relationship to the response. We will build a tree model and check variable importance: X1 should be clearly important and X2 should not be.
set.seed(147)
X1 <- rep(1:2, each = 100)
Y <- X1 + rnorm(200, mean = 0, sd = 4)
set.seed(148)
X2 <- rnorm(200, mean = 0, sd = 2)
simData <- data.frame(Y = Y, X1 = X1, X2 = X2)
library(rpart)
set.seed(149)
fit <- rpart(Y ~ ., data = simData)
varImp(fit)
## Overall
## X1 0.05409211
## X2 0.34896693
X2 should not have a higher importance score than X1, but the output above shows otherwise: the fine-grained noise variable X2 is ranked above the genuinely informative but coarse X1, illustrating the selection bias toward predictors with more split points.
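The bias tracks granularity: X2 offers far more candidate split points than X1, as a quick check shows:
length(unique(simData$X1)) # 2 distinct values
length(unique(simData$X2)) # essentially 200 distinct values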
8.3
In stochastic gradient boosting, the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect the magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (.1 and .9) and the learning rate (.1 and .9) for the solubility data. The left-hand plot has both parameters set to .1 and the right-hand plot has both set to .9:
8.3.a
Why does the model on the right focus its importance on just the first few of the predictors, whereas the model on the left spreads importance across more predictors?
As the learning rate increases toward 1, the model becomes greedier: each tree accounts for a larger share of the fit, so fewer predictors end up being identified as related to the response. Increasing the bagging fraction means each tree is built on more of the same data, so the trees are less diverse and keep selecting the same dominant predictors. Putting that together, when we increase both the learning rate and the bagging fraction from .1 to .9, the importance becomes concentrated on fewer predictors.
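To reproduce something like Fig. 8.24, here is a sketch assuming the solubility data from the AppliedPredictiveModeling package; the tree count and interaction depth below are assumptions, not the figure's exact settings:
library(gbm)
library(AppliedPredictiveModeling)
data(solubility) # provides solTrainXtrans and solTrainY
solData <- cbind(solTrainXtrans, Solubility = solTrainY)
gbmLeft <- gbm(Solubility ~ ., data = solData, distribution = "gaussian", n.trees = 100, interaction.depth = 1, shrinkage = 0.1, bag.fraction = 0.1) # left-hand settings
gbmRight <- gbm(Solubility ~ ., data = solData, distribution = "gaussian", n.trees = 100, interaction.depth = 1, shrinkage = 0.9, bag.fraction = 0.9) # right-hand settings
head(summary(gbmLeft, plotit = FALSE), 10) # importance spread across many predictors
head(summary(gbmRight, plotit = FALSE), 10) # importance concentrated in a few predictors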
8.3.b
Which model do you think would be more predictive of other samples?
I think the model with both parameters set to .1 would be more predictive of other samples: the .9/.9 model is greedier and leans on only a few predictors, so it is more likely to overfit the training data and generalize worse.
8.3.c
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Increasing the interaction depth lets each tree split on more predictors, which spreads variable importance across more of them; as a result, the slope of the predictor importance profile would flatten for either model.
8.7
Refer to Exercises 6.3 and 7.5, which describe a chemical manufacturing process. Use the same data imputation, data splitting, and preprocessing steps as before and train several tree-based models:
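For completeness, here is a sketch of the setup carried over from those exercises; the specific imputation method, split ratio, and resampling scheme below are assumptions standing in for whatever was chosen in Exercises 6.3/7.5:
library(AppliedPredictiveModeling)
library(caret)
data(ChemicalManufacturingProcess)
set.seed(123) # assumed seed
pp <- preProcess(ChemicalManufacturingProcess[, -1], method = "knnImpute") # impute (and center/scale) the predictors; Yield is column 1
predictors <- predict(pp, ChemicalManufacturingProcess[, -1])
inTrain <- createDataPartition(ChemicalManufacturingProcess$Yield, p = 0.75, list = FALSE)
trainPred.pp <- predictors[inTrain, ]
testPred.pp <- predictors[-inTrain, ]
trainingYield <- ChemicalManufacturingProcess$Yield[inTrain]
testYield <- ChemicalManufacturingProcess$Yield[-inTrain]
ctrl <- trainControl(method = "cv", number = 10)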
set.seed(123)
gbmGrid <- expand.grid(interaction.depth = seq(1, 6, by = 1), n.trees = seq(25, 200, by = 25), n.minobsinnode = 10, shrinkage = c(.01, .05, .1, .2))
gbmTune <- train(x = trainPred.pp, y = trainingYield, method = "gbm", metric = "RMSE",
tuneGrid = gbmGrid, trControl = ctrl, verbose = FALSE)
gbmTune$bestTune
## n.trees interaction.depth shrinkage n.minobsinnode
## 86 150 5 0.05 10
gbmTune$results
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 1 0.01 1 10 25 1.778412 0.3986286
## 49 0.05 1 10 25 1.514010 0.4611440
## 97 0.10 1 10 25 1.381630 0.4973205
## 145 0.20 1 10 25 1.331796 0.5054901
## 9 0.01 2 10 25 1.743178 0.4569941
## 57 0.05 2 10 25 1.423501 0.5100147
## 105 0.10 2 10 25 1.329810 0.5186382
## 153 0.20 2 10 25 1.343014 0.4981162
## 17 0.01 3 10 25 1.727755 0.4764758
## 65 0.05 3 10 25 1.396974 0.5180575
## 113 0.10 3 10 25 1.333900 0.5069763
## 161 0.20 3 10 25 1.298208 0.5282931
## 25 0.01 4 10 25 1.721070 0.4880272
## 73 0.05 4 10 25 1.401228 0.5112747
## 121 0.10 4 10 25 1.315872 0.5197589
## 169 0.20 4 10 25 1.290580 0.5314054
## 33 0.01 5 10 25 1.723672 0.4846978
## 81 0.05 5 10 25 1.397119 0.5221066
## 129 0.10 5 10 25 1.299476 0.5363788
## 177 0.20 5 10 25 1.293604 0.5343201
## 41 0.01 6 10 25 1.722685 0.4864388
## 89 0.05 6 10 25 1.414202 0.5021028
## 137 0.10 6 10 25 1.284932 0.5462492
## 185 0.20 6 10 25 1.304211 0.5278327
## 2 0.01 1 10 50 1.688350 0.4284615
## 50 0.05 1 10 50 1.381052 0.5001521
## 98 0.10 1 10 50 1.313916 0.5194359
## 146 0.20 1 10 50 1.306898 0.5235059
## 10 0.01 2 10 50 1.627944 0.4792936
## 58 0.05 2 10 50 1.319967 0.5296053
## 106 0.10 2 10 50 1.289170 0.5368360
## 154 0.20 2 10 50 1.335840 0.5092811
## 18 0.01 3 10 50 1.601997 0.4967075
## 66 0.05 3 10 50 1.299321 0.5404476
## 114 0.10 3 10 50 1.302241 0.5246229
## 162 0.20 3 10 50 1.289008 0.5393446
## 26 0.01 4 10 50 1.597197 0.5033510
## 74 0.05 4 10 50 1.302587 0.5383994
## 122 0.10 4 10 50 1.278267 0.5402557
## 170 0.20 4 10 50 1.278916 0.5426049
## 34 0.01 5 10 50 1.597557 0.5027525
## 82 0.05 5 10 50 1.295093 0.5466336
## 130 0.10 5 10 50 1.261100 0.5548275
## 178 0.20 5 10 50 1.288396 0.5420669
## 42 0.01 6 10 50 1.594543 0.5046193
## 90 0.05 6 10 50 1.312363 0.5288222
## 138 0.10 6 10 50 1.254614 0.5587836
## 186 0.20 6 10 50 1.293673 0.5376470
## 3 0.01 1 10 75 1.616986 0.4516339
## 51 0.05 1 10 75 1.328216 0.5167016
## 99 0.10 1 10 75 1.297066 0.5287199
## 147 0.20 1 10 75 1.316260 0.5206850
## 11 0.01 2 10 75 1.542054 0.4958405
## 59 0.05 2 10 75 1.283011 0.5433894
## 107 0.10 2 10 75 1.278615 0.5433393
## 155 0.20 2 10 75 1.330621 0.5153431
## 19 0.01 3 10 75 1.509795 0.5151268
## 67 0.05 3 10 75 1.272363 0.5499812
## 115 0.10 3 10 75 1.293895 0.5310727
## 163 0.20 3 10 75 1.289683 0.5398357
## 27 0.01 4 10 75 1.505716 0.5156066
## 75 0.05 4 10 75 1.274659 0.5488684
## 123 0.10 4 10 75 1.271194 0.5477553
## 171 0.20 4 10 75 1.280885 0.5427646
## 35 0.01 5 10 75 1.505821 0.5133578
## 83 0.05 5 10 75 1.261936 0.5605009
## 131 0.10 5 10 75 1.251176 0.5609713
## 179 0.20 5 10 75 1.289548 0.5417739
## 43 0.01 6 10 75 1.503701 0.5151265
## 91 0.05 6 10 75 1.281979 0.5425743
## 139 0.10 6 10 75 1.250867 0.5624803
## 187 0.20 6 10 75 1.295706 0.5360656
## 4 0.01 1 10 100 1.563215 0.4628727
## 52 0.05 1 10 100 1.308273 0.5235945
## 100 0.10 1 10 100 1.288955 0.5358950
## 148 0.20 1 10 100 1.321170 0.5189941
## 12 0.01 2 10 100 1.476955 0.5065972
## 60 0.05 2 10 100 1.268307 0.5499749
## 108 0.10 2 10 100 1.275273 0.5464375
## 156 0.20 2 10 100 1.329097 0.5185329
## 20 0.01 3 10 100 1.444081 0.5256392
## 68 0.05 3 10 100 1.257842 0.5576280
## 116 0.10 3 10 100 1.291979 0.5326537
## 164 0.20 3 10 100 1.290100 0.5399918
## 28 0.01 4 10 100 1.440421 0.5247165
## 76 0.05 4 10 100 1.266759 0.5529861
## 124 0.10 4 10 100 1.268702 0.5500801
## 172 0.20 4 10 100 1.278752 0.5453621
## 36 0.01 5 10 100 1.442315 0.5211541
## 84 0.05 5 10 100 1.250478 0.5659654
## 132 0.10 5 10 100 1.249027 0.5631706
## 180 0.20 5 10 100 1.293826 0.5396546
## 44 0.01 6 10 100 1.440497 0.5215845
## 92 0.05 6 10 100 1.268484 0.5498951
## 140 0.10 6 10 100 1.245981 0.5657715
## 188 0.20 6 10 100 1.294127 0.5375419
## 5 0.01 1 10 125 1.515009 0.4775145
## 53 0.05 1 10 125 1.299395 0.5276076
## 101 0.10 1 10 125 1.287549 0.5375280
## 149 0.20 1 10 125 1.325297 0.5200522
## 13 0.01 2 10 125 1.428355 0.5152646
## 61 0.05 2 10 125 1.257088 0.5569405
## 109 0.10 2 10 125 1.274384 0.5487823
## 157 0.20 2 10 125 1.330702 0.5175849
## 21 0.01 3 10 125 1.398541 0.5295717
## 69 0.05 3 10 125 1.253582 0.5599723
## 117 0.10 3 10 125 1.295221 0.5319299
## 165 0.20 3 10 125 1.290160 0.5404326
## 29 0.01 4 10 125 1.391825 0.5325631
## 77 0.05 4 10 125 1.262064 0.5569466
## 125 0.10 4 10 125 1.269000 0.5503452
## 173 0.20 4 10 125 1.279988 0.5453668
## 37 0.01 5 10 125 1.393035 0.5296534
## 85 0.05 5 10 125 1.244754 0.5689671
## 133 0.10 5 10 125 1.246546 0.5651067
## 181 0.20 5 10 125 1.295218 0.5394010
## 45 0.01 6 10 125 1.392869 0.5309316
## 93 0.05 6 10 125 1.260755 0.5544591
## 141 0.10 6 10 125 1.248199 0.5646084
## 189 0.20 6 10 125 1.294408 0.5381987
## 6 0.01 1 10 150 1.476411 0.4859998
## 54 0.05 1 10 150 1.290730 0.5331477
## 102 0.10 1 10 150 1.287909 0.5381063
## 150 0.20 1 10 150 1.337149 0.5151428
## 14 0.01 2 10 150 1.390061 0.5240002
## 62 0.05 2 10 150 1.252297 0.5601339
## 110 0.10 2 10 150 1.275642 0.5493604
## 158 0.20 2 10 150 1.329206 0.5203742
## 22 0.01 3 10 150 1.362304 0.5365823
## 70 0.05 3 10 150 1.250877 0.5620949
## 118 0.10 3 10 150 1.291988 0.5340247
## 166 0.20 3 10 150 1.291481 0.5401438
## 30 0.01 4 10 150 1.356656 0.5383769
## 78 0.05 4 10 150 1.255676 0.5605675
## 126 0.10 4 10 150 1.269663 0.5504748
## 174 0.20 4 10 150 1.280864 0.5450200
## 38 0.01 5 10 150 1.359447 0.5347478
## 86 0.05 5 10 150 1.240678 0.5712937
## 134 0.10 5 10 150 1.246903 0.5657151
## 182 0.20 5 10 150 1.296511 0.5385870
## 46 0.01 6 10 150 1.357357 0.5380483
## 94 0.05 6 10 150 1.259260 0.5555202
## 142 0.10 6 10 150 1.248445 0.5647663
## 190 0.20 6 10 150 1.294297 0.5385283
## 7 0.01 1 10 175 1.443785 0.4931555
## 55 0.05 1 10 175 1.288391 0.5349892
## 103 0.10 1 10 175 1.289153 0.5373635
## 151 0.20 1 10 175 1.341953 0.5133138
## 15 0.01 2 10 175 1.362582 0.5296550
## 63 0.05 2 10 175 1.249957 0.5616564
## 111 0.10 2 10 175 1.276198 0.5497492
## 159 0.20 2 10 175 1.329006 0.5201134
## 23 0.01 3 10 175 1.335619 0.5414551
## 71 0.05 3 10 175 1.248625 0.5639895
## 119 0.10 3 10 175 1.289711 0.5358368
## 167 0.20 3 10 175 1.288861 0.5420234
## 31 0.01 4 10 175 1.331859 0.5419981
## 79 0.05 4 10 175 1.256699 0.5599681
## 127 0.10 4 10 175 1.268308 0.5512828
## 175 0.20 4 10 175 1.280471 0.5455653
## 39 0.01 5 10 175 1.333300 0.5409870
## 87 0.05 5 10 175 1.240878 0.5704412
## 135 0.10 5 10 175 1.246045 0.5662565
## 183 0.20 5 10 175 1.295840 0.5392601
## 47 0.01 6 10 175 1.330907 0.5435326
## 95 0.05 6 10 175 1.259061 0.5562404
## 143 0.10 6 10 175 1.249802 0.5642235
## 191 0.20 6 10 175 1.293892 0.5388194
## 8 0.01 1 10 200 1.416870 0.4983737
## 56 0.05 1 10 200 1.283517 0.5373366
## 104 0.10 1 10 200 1.298543 0.5335723
## 152 0.20 1 10 200 1.345208 0.5122997
## 16 0.01 2 10 200 1.341843 0.5327212
## 64 0.05 2 10 200 1.250069 0.5625120
## 112 0.10 2 10 200 1.277813 0.5496735
## 160 0.20 2 10 200 1.328424 0.5208364
## 24 0.01 3 10 200 1.318046 0.5432836
## 72 0.05 3 10 200 1.247291 0.5649233
## 120 0.10 3 10 200 1.289452 0.5361220
## 168 0.20 3 10 200 1.289947 0.5415830
## 32 0.01 4 10 200 1.314871 0.5446487
## 80 0.05 4 10 200 1.256384 0.5601931
## 128 0.10 4 10 200 1.267004 0.5518451
## 176 0.20 4 10 200 1.279526 0.5463297
## 40 0.01 5 10 200 1.314759 0.5443951
## 88 0.05 5 10 200 1.241902 0.5697570
## 136 0.10 5 10 200 1.248000 0.5651198
## 184 0.20 5 10 200 1.295866 0.5394539
## 48 0.01 6 10 200 1.312478 0.5468268
## 96 0.05 6 10 200 1.259073 0.5564253
## 144 0.10 6 10 200 1.251129 0.5632205
## 192 0.20 6 10 200 1.294285 0.5386076
## MAE RMSESD RsquaredSD MAESD
## 1 1.4322535 0.13698792 0.07903455 0.09336246
## 49 1.2024338 0.14291156 0.08803578 0.09514732
## 97 1.0779468 0.13659929 0.08218126 0.08992939
## 145 1.0397687 0.10580353 0.08981908 0.09886336
## 9 1.4072609 0.13598863 0.07904775 0.09358166
## 57 1.1290000 0.12698269 0.08740983 0.08624516
## 105 1.0280368 0.11460908 0.08293091 0.09046718
## 153 1.0396302 0.13551369 0.07579974 0.10727936
## 17 1.3949222 0.13081354 0.07524071 0.08744511
## 65 1.0995280 0.12245445 0.07710628 0.07941596
## 113 1.0294253 0.11279794 0.08704525 0.08637984
## 161 1.0078175 0.12023044 0.09074427 0.07767724
## 25 1.3891569 0.13806351 0.08543464 0.09294494
## 73 1.1032806 0.12257077 0.08110714 0.07707372
## 121 1.0222550 0.11111006 0.08886277 0.08447210
## 169 1.0061920 0.11867457 0.08514914 0.08189648
## 33 1.3916750 0.13441423 0.08633628 0.09140876
## 81 1.0977070 0.13018193 0.07428150 0.08957868
## 129 1.0009860 0.13078942 0.09128776 0.10120357
## 177 1.0008555 0.10430766 0.08299562 0.08172214
## 41 1.3899894 0.13351597 0.07052664 0.09053166
## 89 1.1136787 0.12369063 0.08776778 0.08515129
## 137 0.9955253 0.10842598 0.07357079 0.07397338
## 185 1.0040759 0.12543235 0.09159003 0.08538456
## 2 1.3538200 0.13973144 0.08634911 0.09280640
## 50 1.0810609 0.13355257 0.07704950 0.08953896
## 98 1.0092428 0.11646841 0.08176297 0.09352250
## 146 1.0169374 0.11572367 0.08691838 0.09351020
## 10 1.3094977 0.13400398 0.07162072 0.09063932
## 58 1.0218186 0.11559251 0.08659634 0.08862743
## 106 0.9896518 0.11548894 0.08518725 0.09219100
## 154 1.0380867 0.12117214 0.07325281 0.09231426
## 18 1.2870305 0.12919890 0.07308743 0.08510682
## 66 1.0106686 0.10457195 0.07388360 0.07879918
## 114 1.0058641 0.10729529 0.08069172 0.08086066
## 162 0.9971029 0.11154711 0.08370974 0.07731461
## 26 1.2827545 0.13435460 0.07841626 0.08763715
## 74 1.0061303 0.11056827 0.07333396 0.07584690
## 122 0.9856480 0.10432036 0.08497513 0.07906314
## 170 0.9980378 0.11220886 0.08172087 0.07733102
## 34 1.2831568 0.12909320 0.08220623 0.08442878
## 82 0.9942813 0.11870639 0.07550137 0.08386768
## 130 0.9679545 0.12123696 0.08933711 0.09001168
## 178 0.9977820 0.10988938 0.08987644 0.09016164
## 42 1.2817405 0.12991624 0.07165078 0.08386699
## 90 1.0169554 0.11730531 0.08546611 0.08439176
## 138 0.9669190 0.10422746 0.07744747 0.07044370
## 186 0.9938641 0.12268798 0.08803315 0.07796052
## 3 1.2925238 0.13848450 0.08563957 0.08899735
## 51 1.0253535 0.12247932 0.07233687 0.08758023
## 99 0.9952587 0.11456411 0.08076116 0.08986417
## 147 1.0262660 0.11545038 0.09064602 0.09035161
## 11 1.2343464 0.13078416 0.07223515 0.08747838
## 59 0.9862869 0.10715432 0.08356679 0.08069713
## 107 0.9815825 0.11480437 0.08562260 0.08894938
## 155 1.0306424 0.11442769 0.07392500 0.08461696
## 19 1.2071520 0.12593784 0.07020440 0.08400900
## 67 0.9819192 0.10743465 0.07680728 0.07981634
## 115 1.0018412 0.10284679 0.08054247 0.07984699
## 163 0.9995826 0.11114404 0.09061064 0.07682580
## 27 1.2012302 0.13287414 0.08112834 0.08687563
## 75 0.9802995 0.10996119 0.07265226 0.07053496
## 123 0.9755438 0.10519751 0.08507280 0.07944325
## 171 0.9984738 0.11678324 0.08766688 0.08523913
## 35 1.2027376 0.12436096 0.07782281 0.08148926
## 83 0.9649259 0.11305940 0.07528966 0.08105518
## 131 0.9604675 0.12021330 0.08921930 0.08681863
## 179 1.0010294 0.12094664 0.09116313 0.09725043
## 43 1.2008194 0.12835531 0.07295335 0.08313154
## 91 0.9914360 0.11248160 0.08447294 0.07809046
## 139 0.9643715 0.10146538 0.07898344 0.06795023
## 187 0.9966389 0.12351472 0.08740955 0.07410512
## 4 1.2464305 0.13652086 0.08577509 0.08606749
## 52 1.0052725 0.11707603 0.07468370 0.08345844
## 100 0.9898955 0.11704468 0.08183165 0.08969713
## 148 1.0286487 0.11685165 0.08885401 0.09053771
## 12 1.1768835 0.13054225 0.07431193 0.08796644
## 60 0.9758133 0.10361755 0.08279813 0.07656882
## 108 0.9797407 0.12365732 0.09077104 0.09281517
## 156 1.0323467 0.11581875 0.07663666 0.08377418
## 20 1.1479896 0.12401167 0.07058195 0.08282258
## 68 0.9721145 0.10526499 0.07602490 0.07844676
## 116 1.0033211 0.10006556 0.08198940 0.07608863
## 164 0.9987456 0.11542518 0.09259741 0.07975349
## 28 1.1418846 0.12872407 0.08172538 0.08422987
## 76 0.9755058 0.10651264 0.07323314 0.07052737
## 124 0.9729261 0.10338232 0.08490457 0.07682409
## 172 0.9972292 0.11514791 0.08584792 0.08303147
## 36 1.1452838 0.12106117 0.07815408 0.07960944
## 84 0.9580548 0.10473691 0.07384176 0.07237104
## 132 0.9595187 0.11659171 0.08904328 0.08188223
## 180 1.0056535 0.11939357 0.08916159 0.09483227
## 44 1.1429796 0.12637212 0.07403639 0.08042947
## 92 0.9782846 0.11282306 0.08482954 0.07782648
## 140 0.9633490 0.09827898 0.08010985 0.06606115
## 188 0.9961186 0.12053136 0.08721537 0.07216058
## 5 1.2038763 0.13633546 0.08097897 0.08709981
## 53 0.9960582 0.10918518 0.07700849 0.08253926
## 101 0.9904768 0.11073208 0.08169618 0.08273114
## 149 1.0272074 0.11060622 0.08545803 0.08194401
## 13 1.1326603 0.12675248 0.07488466 0.08477358
## 61 0.9675405 0.10301549 0.08049083 0.07423800
## 109 0.9780985 0.12134937 0.09008091 0.09373903
## 157 1.0315211 0.11540637 0.07704636 0.08366468
## 21 1.1058195 0.12074481 0.07179757 0.07991060
## 69 0.9705609 0.10397296 0.07578976 0.07642101
## 117 1.0040013 0.10121729 0.08268080 0.07715323
## 165 0.9989349 0.11444523 0.09315164 0.07861537
## 29 1.0967462 0.12554622 0.08081302 0.08160382
## 77 0.9720777 0.10389337 0.07295110 0.06799467
## 125 0.9759251 0.10201938 0.08509476 0.07643627
## 173 0.9977997 0.11614088 0.08761623 0.08360356
## 37 1.0986240 0.11997952 0.08054887 0.08090416
## 85 0.9550834 0.10473724 0.07387799 0.07073407
## 133 0.9580460 0.11632454 0.08968117 0.07910281
## 181 1.0068456 0.11985393 0.09048160 0.09624709
## 45 1.0980796 0.12294343 0.07468370 0.07767317
## 93 0.9741422 0.11070853 0.08487828 0.07442711
## 141 0.9660662 0.09888168 0.08030903 0.06627780
## 189 0.9952411 0.11945004 0.08790396 0.07357201
## 6 1.1694901 0.13172637 0.08073489 0.08440955
## 54 0.9909024 0.10992672 0.07656454 0.08308017
## 102 0.9919966 0.10768625 0.07982799 0.07520948
## 150 1.0371842 0.10989524 0.08372425 0.07723873
## 14 1.0955445 0.12412746 0.07308212 0.08436940
## 62 0.9636273 0.10254860 0.07828079 0.07210618
## 110 0.9826253 0.11989399 0.08847117 0.09274801
## 158 1.0299267 0.11497373 0.07708027 0.08226583
## 22 1.0716784 0.11818454 0.07125184 0.07942530
## 70 0.9683906 0.10260548 0.07650254 0.07437590
## 118 1.0024985 0.10006814 0.08260918 0.07637724
## 166 1.0004957 0.11520834 0.09414598 0.07760469
## 30 1.0623017 0.12281723 0.07955622 0.08083236
## 78 0.9686495 0.10105453 0.07400918 0.06437286
## 126 0.9780063 0.10294651 0.08648100 0.07719392
## 174 0.9990374 0.11718345 0.08830467 0.08440008
## 38 1.0659103 0.11921105 0.08126728 0.08129819
## 86 0.9506507 0.10326975 0.07209525 0.06943989
## 134 0.9582314 0.11640620 0.09164849 0.07931350
## 182 1.0083507 0.11986553 0.08930307 0.09608266
## 46 1.0635796 0.11967340 0.07597345 0.07723335
## 94 0.9726936 0.11033119 0.08469711 0.07382225
## 142 0.9668681 0.10258071 0.08146020 0.06992817
## 190 0.9952778 0.12086540 0.08885566 0.07450963
## 7 1.1404645 0.12843112 0.07758445 0.08300737
## 55 0.9919051 0.10760724 0.07734075 0.08202638
## 103 0.9948400 0.10284730 0.07691819 0.07151963
## 151 1.0439255 0.11947637 0.09048615 0.08658979
## 15 1.0684269 0.12130727 0.07509076 0.08419543
## 63 0.9630092 0.10606958 0.07786392 0.07660014
## 111 0.9829093 0.12201289 0.08894729 0.09220784
## 159 1.0308622 0.11556789 0.07811726 0.08261569
## 23 1.0447654 0.11781258 0.07222280 0.08090832
## 71 0.9670475 0.10291247 0.07717177 0.07736388
## 119 1.0011669 0.10072804 0.08281072 0.07637967
## 167 0.9982944 0.11512409 0.09383791 0.07741579
## 31 1.0367344 0.12194468 0.08131320 0.08200019
## 79 0.9700212 0.10126971 0.07632909 0.06517627
## 127 0.9773240 0.10488456 0.08749220 0.07833838
## 175 0.9981965 0.11918428 0.08895288 0.08651097
## 39 1.0406847 0.11872641 0.07936414 0.08264008
## 87 0.9505215 0.10281853 0.07341320 0.06762995
## 135 0.9576054 0.11848731 0.09225123 0.07864204
## 183 1.0084571 0.12018809 0.08930283 0.09678208
## 47 1.0378272 0.11945098 0.07516362 0.07840772
## 95 0.9724750 0.10948421 0.08521803 0.07324542
## 143 0.9691290 0.10168441 0.08199122 0.07014804
## 191 0.9944395 0.12065576 0.08889442 0.07516118
## 8 1.1146333 0.12702683 0.07534002 0.08445383
## 56 0.9890101 0.10696267 0.07748995 0.07946691
## 104 1.0017540 0.10507017 0.07328949 0.07024656
## 152 1.0480088 0.11887099 0.08986159 0.08378574
## 16 1.0483848 0.11859714 0.07566300 0.08396822
## 64 0.9646882 0.10222750 0.07738834 0.07352206
## 112 0.9840310 0.12005072 0.08687741 0.08862949
## 160 1.0302006 0.11756757 0.07876973 0.08375992
## 24 1.0261720 0.11768907 0.07332940 0.08294983
## 72 0.9661762 0.10358875 0.07775481 0.07901400
## 120 0.9996461 0.10061451 0.08375883 0.07585393
## 168 0.9990297 0.11381124 0.09432483 0.07608146
## 32 1.0190338 0.12067827 0.08176840 0.08329563
## 80 0.9699794 0.10120986 0.07630180 0.06434184
## 128 0.9762944 0.10486242 0.08728599 0.07833207
## 176 0.9976508 0.12025267 0.08918870 0.08672007
## 40 1.0218863 0.11695156 0.07967020 0.08121775
## 88 0.9504654 0.10260301 0.07415649 0.06661331
## 136 0.9593787 0.11858991 0.09258928 0.07861690
## 184 1.0086358 0.12061599 0.08994303 0.09788795
## 48 1.0188667 0.11765899 0.07540304 0.07843897
## 96 0.9727662 0.10864292 0.08450959 0.07217377
## 144 0.9703957 0.10215201 0.08317846 0.06974274
## 192 0.9943541 0.12088494 0.08911791 0.07554989
set.seed(239)
cubistGrid <- expand.grid(committees = c(1, 5, 10, 15, 25, 50, 100), neighbors = c(0, 1, 3, 5, 7))
cubistTune <- train(x = trainPred.pp, y = trainingYield, method = "cubist", metric = "RMSE",
tuneGrid = cubistGrid, trControl = ctrl, verbose = FALSE)
cubistTune$bestTune
## committees neighbors
## 32 100 1
cubistTune$results
## committees neighbors RMSE Rsquared MAE RMSESD RsquaredSD
## 1 1 0 1.815817 0.3729893 1.3287102 0.4410520 0.14753091
## 2 1 1 1.823720 0.3777537 1.3164965 0.4180463 0.14287399
## 3 1 3 1.811003 0.3837828 1.3129488 0.4457070 0.15271315
## 4 1 5 1.811058 0.3835257 1.3162949 0.4549487 0.15440134
## 5 1 7 1.812864 0.3814421 1.3173620 0.4548548 0.15399021
## 6 5 0 1.339063 0.5428295 1.0392760 0.1650454 0.09377780
## 7 5 1 1.326775 0.5558176 1.0089725 0.1717408 0.09839337
## 8 5 3 1.326267 0.5530943 1.0199380 0.1712862 0.09830341
## 9 5 5 1.331409 0.5495098 1.0265840 0.1736262 0.09803668
## 10 5 7 1.333760 0.5479240 1.0313836 0.1712259 0.09754239
## 11 10 0 1.249081 0.5925068 0.9641738 0.1493271 0.08435457
## 12 10 1 1.231828 0.6073942 0.9294092 0.1751055 0.09584823
## 13 10 3 1.236307 0.6027264 0.9390496 0.1522294 0.08539121
## 14 10 5 1.240883 0.5988914 0.9457719 0.1514277 0.08338686
## 15 10 7 1.244827 0.5958626 0.9530015 0.1515803 0.08479786
## 16 15 0 1.207406 0.6137342 0.9376497 0.1512417 0.08386398
## 17 15 1 1.186471 0.6303127 0.8980299 0.1750167 0.09599709
## 18 15 3 1.191196 0.6255989 0.9104375 0.1598763 0.08917522
## 19 15 5 1.196608 0.6211787 0.9165546 0.1565933 0.08517451
## 20 15 7 1.200602 0.6181965 0.9237320 0.1544061 0.08506926
## 21 25 0 1.173682 0.6345458 0.9088167 0.1602742 0.08526183
## 22 25 1 1.149735 0.6500487 0.8678692 0.1803227 0.09480116
## 23 25 3 1.155165 0.6462308 0.8827286 0.1697128 0.09034740
## 24 25 5 1.161127 0.6420632 0.8887960 0.1647827 0.08592533
## 25 25 7 1.167231 0.6379653 0.8965336 0.1655586 0.08713961
## 26 50 0 1.157235 0.6470662 0.8886820 0.1695928 0.09080517
## 27 50 1 1.134742 0.6600635 0.8511781 0.1866305 0.09679261
## 28 50 3 1.140899 0.6561878 0.8651175 0.1787657 0.09504125
## 29 50 5 1.146688 0.6524589 0.8704164 0.1749121 0.09205127
## 30 50 7 1.151951 0.6493381 0.8779450 0.1740798 0.09190664
## 31 100 0 1.130171 0.6636874 0.8756053 0.1544431 0.08785870
## 32 100 1 1.106350 0.6754262 0.8308069 0.1710071 0.09295143
## 33 100 3 1.110268 0.6733689 0.8479971 0.1649592 0.09258312
## 34 100 5 1.117131 0.6694645 0.8557965 0.1623567 0.08966974
## 35 100 7 1.122534 0.6665816 0.8626227 0.1620758 0.09006781
## MAESD
## 1 0.2736448
## 2 0.2673535
## 3 0.2850348
## 4 0.2889125
## 5 0.2872165
## 6 0.1193115
## 7 0.1175439
## 8 0.1242035
## 9 0.1284891
## 10 0.1251167
## 11 0.1240041
## 12 0.1364519
## 13 0.1238115
## 14 0.1243396
## 15 0.1249116
## 16 0.1073923
## 17 0.1278246
## 18 0.1143906
## 19 0.1113541
## 20 0.1116598
## 21 0.1078867
## 22 0.1290473
## 23 0.1151643
## 24 0.1109704
## 25 0.1128112
## 26 0.1131293
## 27 0.1283744
## 28 0.1173866
## 29 0.1141493
## 30 0.1147991
## 31 0.1101001
## 32 0.1252730
## 33 0.1161649
## 34 0.1139281
## 35 0.1152729
rpartGrid <- expand.grid(maxdepth = seq(1, 10, by = 1))
rpartTune <- train(x = trainPred.pp, y = trainingYield, method = "rpart2", metric = "RMSE",
tuneGrid = rpartGrid, trControl = ctrl)
rpartTune$bestTune
## maxdepth
## 3 3
rpartTune$results
## maxdepth RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 1 1.625503 0.2938364 1.235695 0.1287611 0.09110426 0.10504613
## 2 2 1.577299 0.3432741 1.224915 0.1220109 0.08817430 0.09869234
## 3 3 1.540239 0.3816488 1.192859 0.1252318 0.07769938 0.09853502
## 4 4 1.567766 0.3711832 1.223452 0.1346474 0.06579136 0.09781720
## 5 5 1.558371 0.3852945 1.210444 0.1388159 0.06912356 0.10069130
## 6 6 1.551238 0.3888407 1.198838 0.1158610 0.07098808 0.07655960
## 7 7 1.546039 0.3940567 1.205011 0.1344655 0.07804167 0.08913624
## 8 8 1.553666 0.3939345 1.205712 0.1478528 0.08414120 0.09732488
## 9 9 1.557341 0.3942246 1.211759 0.1559561 0.08811109 0.10236428
## 10 10 1.558139 0.3946435 1.214121 0.1575044 0.08751819 0.11096117
From the above resampling results, the Cubist model gives the best cross-validated performance (RMSE of about 1.11 and R-squared of about 0.68 at 100 committees and 1 neighbor), followed by the boosted tree model (RMSE of about 1.24 at 150 trees, interaction depth 5, shrinkage 0.05), with the single CART tree clearly the weakest (RMSE of about 1.54 at maxdepth = 3).
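To compare test-set performance as well, here is a sketch assuming held-out objects like those in the setup sketch above (testPred.pp and testYield are assumed names):
postResample(pred = predict(gbmTune, newdata = testPred.pp), obs = testYield)
postResample(pred = predict(cubistTune, newdata = testPred.pp), obs = testYield)
postResample(pred = predict(rpartTune, newdata = testPred.pp), obs = testYield)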