library(tidyverse)
library(caret)
In this document, we work through exercises 8.1, 8.2, 8.3, and 8.7 from Applied Predictive Modeling (Kuhn and Johnson).
Recreate the simulated data from Exercise 7.2:
library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
library(caret)
model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
(rfImp1 <- varImp(model1, scale = FALSE))
## Overall
## V1 8.732235404
## V2 6.415369387
## V3 0.763591825
## V4 7.615118809
## V5 2.023524577
## V6 0.165111172
## V7 -0.005961659
## V8 -0.166362581
## V9 -0.095292651
## V10 -0.074944788
Did the random forest model significantly use the uninformative predictors (V6–V10)?
No: the uninformative predictors (V6–V10) all received importance scores near zero, so the random forest made little use of them.
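As a quick numeric check (a minimal sketch, assuming the rfImp1 object computed above is still in memory), we can compare the average importance score of the informative and uninformative groups:
# Sketch: mean importance of the informative vs. uninformative predictors
mean(rfImp1[paste0("V", 1:5), "Overall"])   # informative V1-V5
mean(rfImp1[paste0("V", 6:10), "Overall"])  # uninformative V6-V10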
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9460206
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
model2 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
(rfImp2 <- varImp(model2, scale = FALSE))
## Overall
## V1 5.69119973
## V2 6.06896061
## V3 0.62970218
## V4 7.04752238
## V5 1.87238438
## V6 0.13569065
## V7 -0.01345645
## V8 -0.04370565
## V9 0.00840438
## V10 0.02894814
## duplicate1 4.28331581
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1
model3 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
(rfImp3 <- varImp(model3, scale = FALSE))
## Overall
## V1 4.91687329
## V2 6.52816504
## V3 0.58711552
## V4 7.04870917
## V5 2.03115561
## V6 0.14213148
## V7 0.10991985
## V8 -0.08405687
## V9 -0.01075028
## V10 0.09230576
## duplicate1 3.80068234
## duplicate2 1.87721959
Each time a new, highly correlated predictor is added, the importance of the original feature (V1) drops, because the importance is effectively split among V1 and its duplicates. As a result, some variables end up looking less important than they really are and others relatively more important, which is why it is worth addressing collinearity in the data before fitting a random forest model.
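One way to screen for this before modeling is caret's findCorrelation(), shown below as a sketch; the 0.9 cutoff is an arbitrary choice for illustration, not a recommendation.
# Sketch: flag highly correlated predictors before fitting the forest
predCor  <- cor(simulated[, setdiff(names(simulated), "y")])
highCorr <- findCorrelation(predCor, cutoff = 0.9)  # column indices caret suggests dropping
colnames(predCor)[highCorr]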
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
library(party)
bagCtrl <- cforest_control(mtry = ncol(simulated) - 1)
model4 <- cforest(y ~ ., data = simulated, controls = bagCtrl)
varimp(model4, conditional = FALSE)
## V1 V2 V3 V4 V5 V6
## 3.885759655 7.668577261 0.022784363 10.187144892 1.944941733 0.001131697
## V7 V8 V9 V10 duplicate1 duplicate2
## 0.046943026 -0.049653633 0.018987655 -0.013785512 5.639384800 0.147354677
varimp(model4, conditional = TRUE)
## V1 V2 V3 V4 V5 V6
## 0.592703911 4.693514780 0.004655632 5.768559366 0.678054050 -0.006028802
## V7 V8 V9 V10 duplicate1 duplicate2
## 0.010717836 -0.002740081 0.003729004 -0.002720673 0.668166201 0.008386952
Once again the importance of V1 is diluted while the uncorrelated variables move up the ranking, under both settings. However, conditional importance shrinks V1 and its duplicates so much that they now look unimportant; this is because the conditional permutation scheme accounts for the heavy correlation among them. Interestingly, if we started removing variables based on this measure we might remove the original V1 before duplicate1, which would make the duplicate seem more important than the original. That is something to be wary of.
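Because varimp() involves random permutations, repeated calls give slightly different numbers; a side-by-side view (a small sketch reusing model4 from above, with values that will differ a little from those printed above) makes the contrast easier to read:
# Sketch: compare the two importance measures for the same cforest fit
data.frame(
  traditional = varimp(model4, conditional = FALSE),
  conditional = varimp(model4, conditional = TRUE)
)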
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
library(gbm)
model5 <- train(y ~ ., data = simulated, method = "gbm",
                distribution = "gaussian", verbose = FALSE)
plot(varImp(model5, scale = FALSE))
With a boosted tree model, V1 once again sits much lower than its original position at the top of the ranking, with the duplicates drawing importance away from it.
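The same numbers behind that plot can also be pulled directly from the underlying gbm fit (a sketch; summary.gbm() reports the relative influence that caret's varImp() is based on for gbm models):
# Sketch: relative influence straight from the gbm object inside the caret fit
summary(model5$finalModel, plotit = FALSE)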
library(Cubist)
model6 <- train(y ~ ., data = simulated, method = "cubist")
varImp(model6, scale = FALSE)
## cubist variable importance
##
## Overall
## V2 69.5
## V1 54.0
## V4 50.0
## V5 38.0
## V3 32.0
## duplicate1 25.0
## duplicate2 25.0
## V6 10.0
## V8 3.0
## V10 0.0
## V9 0.0
## V7 0.0
With the Cubist model, V1 again loses importance, although it retains it better than in the other tree-based models, and the two duplicate features receive identical scores. The overall lesson is the same regardless of which tree-based model you use: collinearity can make some features look more important, and others less important, than they really are.
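An informal way to see where V1's importance went is to total the random forest scores across the correlated group (a sketch using the rfImp3 object from earlier; permutation importance does not partition exactly, so this is only indicative):
# Sketch: combined importance of V1 and its near-duplicates (random forest scores)
sum(rfImp3[c("V1", "duplicate1", "duplicate2"), "Overall"])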
Use a simulation to show tree bias with different granularities.
What is tree bias?
(Finally, these trees suffer from selection bias: predictors with a higher number of distinct values are favored over more granular predictors (Loh and Shih 1997; Carolin et al. 2007; Loh 2010). Loh and Shih (1997) remarked that “The danger occurs when a data set consists of a mix of informative and noise variables, and the noise variables have many more splits than the informative variables. Then there is a high probability that the noise variables will be chosen to split the top nodes of the tree. Pruning will produce either a tree with misleading structure or no tree at all.”)
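Before turning to the Friedman data, here is a minimal, self-contained sketch of the effect the quote describes (all names and settings below are illustrative assumptions): the response depends only on a coarse two-level predictor, yet a pure-noise predictor with many distinct values still gets chosen for the top split a noticeable fraction of the time.
library(rpart)

set.seed(624)
n    <- 200
reps <- 500

firstSplit <- replicate(reps, {
  dat <- data.frame(
    x_coarse = sample(0:1, n, replace = TRUE),   # informative, only 2 distinct values
    x_fine   = runif(n)                          # pure noise, ~200 distinct values
  )
  dat$y <- dat$x_coarse + rnorm(n, sd = 2)       # weak signal, heavy noise
  fit <- rpart(y ~ ., data = dat, maxdepth = 1)  # single-split regression tree
  as.character(fit$frame$var[1])                 # variable chosen at the root
})

table(firstSplit)   # how often the noise variable wins the top split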
We will use a combination of rounding and the simulated data from above to demonstrate this type of tree bias. First, we remove the duplicate columns to get back to the original data frame.
simulated <- simulated %>%
  select(-duplicate1, -duplicate2)
Then we fit a single tree to visualize the model produced when even the noise variables have roughly the same granularity as the informative ones.
library(partykit)
rpart <- train(y ~ ., data = simulated,
               method = "rpart2", tuneLength = 5, trControl = trainControl(method = "cv"))
plot(as.party(rpart$finalModel))
Overall, the informative variables appear at every split except the one at node 23, which is a fairly good signal-to-noise ratio for the tree.
Next we reduce the granularity of our data by rounding the important variables to only three digits.
sim2 <- simulated %>%
  mutate(
    across(c(V1, V2, V3, V4, V5), ~ round(.x, digits = 3))
  )
head(sim2)
##      V1    V2    V3    V4    V5          V6           V7          V8
## 1 0.534 0.648 0.851 0.182 0.929 0.361790597 0.8266608594 0.421408064
## 2 0.584 0.438 0.673 0.669 0.164 0.453059313 0.6489600763 0.844623926
## 3 0.590 0.588 0.410 0.338 0.894 0.026819108 0.1785614495 0.349590781
## 4 0.691 0.226 0.033 0.067 0.637 0.525006367 0.5133613953 0.797025980
## 5 0.667 0.819 0.717 0.803 0.083 0.223441572 0.6644906041 0.903891937
## 6 0.839 0.386 0.646 0.861 0.630 0.437038906 0.3360117343 0.648917723
##           V9         V10         y
## 1 0.59111440 0.588621560 18.463980
## 2 0.92819306 0.758400814 16.098360
## 3 0.01759542 0.444118458 17.761647
## 4 0.68986918 0.445071622 13.787300
## 5 0.39696995 0.550080800 18.429836
## 6 0.53116033 0.906618237 20.858166
Interestingly enough, we get the exact same tree; reducing the granularity by this little is not enough to evoke tree bias.
rpart <- train(y ~ ., data = sim2,
               method = "rpart2", tuneLength = 5, trControl = trainControl(method = "cv"))
plot(as.party(rpart$finalModel))
Thus we round to only a single digit to reduce the number of distinct values even further.
sim3 <- simulated %>%
  mutate(
    across(c(V1, V2, V3, V4, V5), ~ round(.x, digits = 1))
  )
head(sim3)
##    V1  V2  V3  V4  V5          V6           V7          V8         V9
## 1 0.5 0.6 0.9 0.2 0.9 0.361790597 0.8266608594 0.421408064 0.59111440
## 2 0.6 0.4 0.7 0.7 0.2 0.453059313 0.6489600763 0.844623926 0.92819306
## 3 0.6 0.6 0.4 0.3 0.9 0.026819108 0.1785614495 0.349590781 0.01759542
## 4 0.7 0.2 0.0 0.1 0.6 0.525006367 0.5133613953 0.797025980 0.68986918
## 5 0.7 0.8 0.7 0.8 0.1 0.223441572 0.6644906041 0.903891937 0.39696995
## 6 0.8 0.4 0.6 0.9 0.6 0.437038906 0.3360117343 0.648917723 0.53116033
##           V10         y
## 1 0.588621560 18.463980
## 2 0.758400814 16.098360
## 3 0.444118458 17.761647
## 4 0.445071622 13.787300
## 5 0.550080800 18.429836
## 6 0.906618237 20.858166
We see a change in which extraneous variable appears, but all of the meaningful variables are still present. Thus, selection based on the number of distinct values does not seem to change much when we are only dealing with continuous predictors.
rpart <- train(y ~ ., data = sim3,
               method = "rpart2", tuneLength = 5, trControl = trainControl(method = "cv"))
plot(as.party(rpart$finalModel))
Next, we'll try making the variables genuinely discrete by converting them to character values.
(
  sim2 <- simulated %>%
    mutate(
      across(c(V1:V5), ~ as.character(round(.x, digits = 1))),
      across(c(V6:V10), ~ as.character(round(.x, digits = 6)))
    )
)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 y
## 1 0.5 0.6 0.9 0.2 0.9 0.361791 0.826661 0.421408 0.591114 0.588622 18.463980
## 2 0.6 0.4 0.7 0.7 0.2 0.453059 0.64896 0.844624 0.928193 0.758401 16.098360
## 3 0.6 0.6 0.4 0.3 0.9 0.026819 0.178561 0.349591 0.017595 0.444118 17.761647
## 4 0.7 0.2 0 0.1 0.6 0.525006 0.513361 0.797026 0.689869 0.445072 13.787300
## 5 0.7 0.8 0.7 0.8 0.1 0.223442 0.664491 0.903892 0.39697 0.550081 18.429836
## 6 0.8 0.4 0.6 0.9 0.6 0.437039 0.336012 0.648918 0.53116 0.906618 20.858166
## 7 0.7 0.1 0.8 0.9 0.5 0.990292 0.0085 0.072795 0.973958 0.440173 13.888401
## 8 0.1 0.8 0.2 0.4 0.7 0.662056 0.472257 0.381634 0.758775 0.710888 12.915431
## [ output truncated: rows 9-200 of the simulated predictors and response omitted ]
Yet, even with only around ten distinct values for V1 through V5 and roughly 1,000,000 possible values for V6 through V10, none of the noise variables are used in our tree. Our simulation may not be ideal, but selection bias toward the more granular predictors is not immediately apparent here.
rpart <- train(y ~ ., data = sim2,
               method = "rpart2", tuneLength = 5,
               trControl = trainControl(method = "cv"))
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
plot(as.party(rpart$finalModel))
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?
The model on the left uses a low learning rate and a small bagging fraction, so each tree contributes only a small correction and is fit to a different small subsample of the data. Over many iterations this gives many different predictors a chance to be selected, and each variable's contribution is refined gradually, so importance is spread across more predictors.
The model on the right uses a high learning rate and a large bagging fraction: each tree sees nearly all of the data and its contribution is weighted heavily, so the first few strong predictors dominate the early trees and subsequent trees keep reinforcing them. Importance therefore concentrates on just those few variables.
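To make the contrast concrete, here is a minimal sketch of how the two settings could be compared with the gbm package on the solubility data from AppliedPredictiveModeling. This is not the code behind Fig. 8.24; the object names and the choice of n.trees = 100 are my own assumptions.
library(gbm)
library(AppliedPredictiveModeling)
data(solubility)  # provides solTrainXtrans and solTrainY
solTrain <- cbind(solTrainXtrans, Solubility = solTrainY)
# Left-hand setting: small learning rate, small bagging fraction
set.seed(100)
gbm_low <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, shrinkage = 0.1, bag.fraction = 0.1)
# Right-hand setting: large learning rate, large bagging fraction
set.seed(100)
gbm_high <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
                n.trees = 100, shrinkage = 0.9, bag.fraction = 0.9)
# summary() returns the relative influence of each predictor; the 0.9/0.9 fit
# concentrates influence on a handful of predictors, while the 0.1/0.1 fit
# spreads it across many more.
head(summary(gbm_low, plotit = FALSE), 10)
head(summary(gbm_high, plotit = FALSE), 10)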
Which model do you think would be more predictive of other samples?
The model with the lower learning rate and bagging fraction should generalize better. The small learning rate lets the ensemble build up the finer structure in the data gradually rather than committing to a few large corrections, and the small bagging fraction injects randomness that helps prevent overfitting, so the model should transfer better to new samples from the same population.
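As a rough sanity check of that claim (not part of the original write-up), the two settings could be compared on cross-validated RMSE with caret. Everything in the grid other than the two extreme shrinkage/bag-fraction pairs from the figure, including n.trees = 500 and interaction.depth = 3, is an arbitrary assumption.
library(caret)
library(gbm)
library(AppliedPredictiveModeling)
data(solubility)
solTrain <- cbind(solTrainXtrans, Solubility = solTrainY)
ctrl <- trainControl(method = "cv", number = 10)
grid_low <- expand.grid(n.trees = 500, interaction.depth = 3,
                        shrinkage = 0.1, n.minobsinnode = 10)
grid_high <- expand.grid(n.trees = 500, interaction.depth = 3,
                         shrinkage = 0.9, n.minobsinnode = 10)
# bag.fraction is passed through caret to gbm
set.seed(100)
fit_low <- train(Solubility ~ ., data = solTrain, method = "gbm", verbose = FALSE,
                 tuneGrid = grid_low, bag.fraction = 0.1, trControl = ctrl)
set.seed(100)
fit_high <- train(Solubility ~ ., data = solTrain, method = "gbm", verbose = FALSE,
                  tuneGrid = grid_high, bag.fraction = 0.9, trControl = ctrl)
# A lower cross-validated RMSE for fit_low would support the argument above.
fit_low$results[, c("RMSE", "Rsquared")]
fit_high$results[, c("RMSE", "Rsquared")]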
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
The slope would become more gradual. With a larger interaction depth, each tree uses more predictors, so the weaker variables appear in more trees and receive more importance than they did before.
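A small sketch of how that could be checked, again on the solubility data; the depths of 1 and 10 and n.trees = 100 are arbitrary choices, not values from the book.
library(gbm)
library(AppliedPredictiveModeling)
data(solubility)
solTrain <- cbind(solTrainXtrans, Solubility = solTrainY)
set.seed(100)
gbm_d1 <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
              n.trees = 100, interaction.depth = 1)
set.seed(100)
gbm_d10 <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, interaction.depth = 10)
# Sort the relative influences and compare the profiles; deeper trees use more
# predictors per tree, so the importance curve should fall off more gradually.
inf1 <- sort(summary(gbm_d1, plotit = FALSE)$rel.inf, decreasing = TRUE)
inf10 <- sort(summary(gbm_d10, plotit = FALSE)$rel.inf, decreasing = TRUE)
plot(inf1, type = "l", ylim = range(c(inf1, inf10)),
     xlab = "Predictor rank", ylab = "Relative influence")
lines(inf10, lty = 2)
legend("topright", legend = c("interaction.depth = 1", "interaction.depth = 10"), lty = 1:2)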
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
(Exercise 6.3: A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1 % will boost revenue by approximately one hundred thousand dollars per batch.)
These are the same loading and preprocessing steps from 6.3:
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
# Impute, transform, center, and scale, then apply the transformations to the full data set
preProcess(ChemicalManufacturingProcess, method = c("knnImpute", "BoxCox", "center", "scale")) |>
  predict(ChemicalManufacturingProcess) -> cmp
# Hold out 25% of the rows as a test set
part <- createDataPartition(cmp$Yield, p = 0.75, list = FALSE)
cmp_train <- cmp[part,]
cmp_test <- cmp[-part,]
dim(cmp_train)
## [1] 132 58
First we train a single rpart tree model.
rpModel <- train(x = cmp_train[,-1],
y = cmp_train$Yield,
method = "rpart2",
tuneLength = 10)
## note: only 9 possible values of the max tree depth from the initial fit.
## Truncating the grid to 9 .
rpModel
## CART
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## maxdepth RMSE Rsquared MAE
## 1 0.8175537 0.3775736 0.6265683
## 2 0.8231189 0.3762344 0.6408011
## 3 0.8381809 0.3656687 0.6489826
## 4 0.8439256 0.3712713 0.6460844
## 5 0.8543423 0.3664662 0.6561471
## 6 0.8552462 0.3737762 0.6552344
## 7 0.8613777 0.3667645 0.6610742
## 8 0.8560087 0.3742493 0.6567316
## 9 0.8585462 0.3702118 0.6561662
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was maxdepth = 1.
Then we test the model on the test set.
predict(rpModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.8257154 0.2711733 0.6767816
Next we train a random forest model.
rfModel <- train(x = cmp_train[,-1],
y = cmp_train$Yield,
method = "rf")
rfModel
## Random Forest
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.7236138 0.5842830 0.5873090
## 29 0.6592865 0.5942617 0.5110928
## 57 0.6721925 0.5659395 0.5137634
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 29.
Our random forest model is a clear improvement over the single rpart tree: resampled R^2 rises from about 0.38 to 0.59, and on the test set below R^2 goes from 0.27 to 0.67 while RMSE drops from 0.83 to 0.56.
predict(rfModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.5591847 0.6656840 0.4470668
Next we train a stochastic gradient boosting model.
gbmModel <- train(x = cmp_train[,-1],
y = cmp_train$Yield,
method = "gbm",
verbose = FALSE)
gbmModel
## Stochastic Gradient Boosting
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared MAE
## 1 50 0.6747303 0.5455938 0.5280799
## 1 100 0.6670484 0.5549521 0.5247048
## 1 150 0.6697214 0.5532902 0.5276607
## 2 50 0.6716316 0.5495085 0.5270243
## 2 100 0.6634565 0.5596244 0.5234029
## 2 150 0.6648194 0.5604604 0.5258441
## 3 50 0.6635415 0.5577761 0.5174201
## 3 100 0.6614607 0.5604971 0.5162071
## 3 150 0.6615139 0.5606589 0.5176707
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 100, interaction.depth =
## 3, shrinkage = 0.1 and n.minobsinnode = 10.
The GBM's best resampled RMSE (0.662) is essentially tied with the random forest's (0.659), and on the test set below it comes out slightly ahead. The GBM also trains much faster than the random forest.
predict(gbmModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.5505060 0.6664411 0.4285254
Next we train a cubist model.
cubModel <- train(x = cmp_train[,-1],
y = cmp_train$Yield,
method = "cubist")
## Warning in cubist.default(x, y, committees = param$committees, ...): NAs
## introduced by coercion
## Warning in cubist.default(x, y, committees = param$committees, ...): NAs
## introduced by coercion
## Warning in cubist.default(x, y, committees = param$committees, ...): NAs
## introduced by coercion
cubModel
## Cubist
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.0203175 0.3374739 0.7488729
## 1 5 1.0181010 0.3431682 0.7427497
## 1 9 1.0183096 0.3408462 0.7449241
## 10 0 0.6989775 0.5600543 0.5463506
## 10 5 0.6936176 0.5672185 0.5403733
## 10 9 0.6966088 0.5631056 0.5433529
## 20 0 0.6694613 0.5910256 0.5255571
## 20 5 0.6621819 0.6003488 0.5185625
## 20 9 0.6667352 0.5947340 0.5223337
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
On the test set below, the Cubist model gives us our lowest RMSE and highest R^2 of any of the models.
predict(cubModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.4933203 0.7410197 0.3681058
Which tree-based regression model gives the optimal resampling and test set performance?
The Cubist model: its resampled RMSE (0.662) is essentially tied with the random forest (0.659) and GBM (0.661), and it clearly gives the best test-set performance (RMSE 0.493, R^2 0.741).
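For reference, a small helper of my own (not part of the original exercise) that gathers the test-set metrics of all four fitted models into one table:
models <- list(rpart = rpModel, rf = rfModel, gbm = gbmModel, cubist = cubModel)
# Apply postResample to each model's test-set predictions and stack the results
sapply(models, function(m) postResample(predict(m, cmp_test[,-1]), cmp_test$Yield)) |>
  t()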
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
Compared to our other models, the Cubist model places especially high importance on ManufacturingProcess32, roughly 2.5 times more than the second-ranked predictor, ManufacturingProcess33. Only a single biological material appears among the top 10 predictors. This differs from our optimal linear and nonlinear models, where roughly half of the top 10 important variables were biological. Some manufacturing process variables, such as ManufacturingProcess33, also rank much higher here than they did in the other models.
plot(varImp(cubModel), top = 25)
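A short sketch of how the top-10 breakdown could be tabulated directly; the pattern match below assumes the standard BiologicalMaterial / ManufacturingProcess column names in this data set.
# Count biological vs. process predictors among the 10 most important Cubist variables
varImp(cubModel)$importance %>%
  rownames_to_column("Predictor") %>%
  arrange(desc(Overall)) %>%
  slice_head(n = 10) %>%
  mutate(Type = if_else(str_detect(Predictor, "^Biological"), "Biological", "Process")) %>%
  count(Type)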
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
plot(as.party(rpModel$finalModel))
Although the plot is a bit squished and hides some detail, the single-tree visualization does add useful information: it shows how the predictors interact, a view that is missing from variable importance scores or the scatterplots we used for the other model types. For example, when ManufacturingProcess32 exceeds its split threshold of 0.764, decreasing ManufacturingProcess21 is associated with a further increase in yield, and the terminal-node distributions show a fairly consistent shift when that condition holds. The short tree also suggests that the effect of the biological material predictors on yield depends on several of the manufacturing process variables at once.