Recreate the simulated data from Exercise 7.2:
library(mlbench)
library(dplyr)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
library(caret)
set.seed(1)
model1 <- randomForest(y ~ ., data = simulated,
importance = TRUE,
ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1 <- rfImp1 %>%
mutate(Variable = rownames(rfImp1)) %>%
rename(Importance_before = Overall)
rfImp1 <- rfImp1[order(-rfImp1$Importance_before),]
rfImp1
## Importance_before Variable
## V1 8.64598308 V1
## V4 7.62987929 V4
## V2 6.81423882 V2
## V5 2.17385901 V5
## V3 0.72598030 V3
## V6 0.10559967 V6
## V7 0.06427442 V7
## V10 -0.02447215 V10
## V9 -0.08227548 V9
## V8 -0.10071802 V8
Did the random forest model significantly use the
uninformative predictors (V6 –
V10)?
The informative predictors (V1 – V5) have
higher importance values, while the uninformative predictors
(V6 – V10) have importance values near zero.
The random forest model did not significantly use the V6 –
V10 predictors.
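To put a number on "near zero", the scores printed above can be compared directly. This is a minimal sketch that hardcodes the values from the `rfImp1` output; the variable names (`informative`, `uninformative`, `signal_to_noise`) are illustrative and not part of the original analysis.

```r
# Importance scores copied from the rfImp1 output above.
importance <- c(V1 = 8.64598308, V4 = 7.62987929, V2 = 6.81423882,
                V5 = 2.17385901, V3 = 0.72598030, V6 = 0.10559967,
                V7 = 0.06427442, V10 = -0.02447215, V9 = -0.08227548,
                V8 = -0.10071802)

informative <- importance[paste0("V", 1:5)]
uninformative <- importance[paste0("V", 6:10)]

# Weakest informative score versus the largest uninformative score (in magnitude).
weakest_signal <- min(informative)
largest_noise <- max(abs(uninformative))
signal_to_noise <- weakest_signal / largest_noise
round(signal_to_noise, 1)
```

Even the weakest informative predictor (V3) scores several times higher than any of the uninformative predictors.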
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
set.seed(2)
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9385209
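The observed correlation can be checked against what the construction implies. The sketch below assumes, per the mlbench documentation, that the Friedman 1 predictors are independent Uniform(0, 1) draws; under that assumption the correlation between a predictor and its noisy copy is sqrt(Var(V1) / (Var(V1) + Var(noise))). The names `expected_cor` and `simulated_cor` are illustrative.

```r
# Assumption: V1 ~ Uniform(0, 1), so Var(V1) = 1/12.
var_v1 <- 1 / 12
var_noise <- 0.1^2          # variance of rnorm(200) * .1

# cor(V1, V1 + noise) when V1 and the noise are independent.
expected_cor <- sqrt(var_v1 / (var_v1 + var_noise))

# Large-sample check of the formula with fresh draws (not the exercise data).
set.seed(2)
v1 <- runif(1e5)
simulated_cor <- cor(v1, v1 + rnorm(1e5) * 0.1)

round(c(expected = expected_cor, simulated = simulated_cor), 4)
```

The expected value is close to the correlation reported above for 200 observations; the small gap is sampling variation.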
Fit another random forest model to these data. Did the
importance score for V1 change?
Before adding duplicate1, V1 was the
strongest predictor in the random forest model. After adding
duplicate1:
- V1's importance value dropped from about 8.6 to 5.3
- V4 became the strongest predictor
- the other informative predictors' importance values decreased slightly
- duplicate1 gained an importance value of about 4.4
V1 and duplicate1 are highly correlated, so
they carry almost the same information. The random forest splits
V1's predictive power between them.
set.seed(3)
model2 <- randomForest(y ~ ., data = simulated,
importance = TRUE,
ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2 <- rfImp2 %>%
mutate(Variable = rownames(rfImp2)) %>%
rename(Importance_after = Overall)
rfImp_combined <- merge(rfImp1, rfImp2, by = "Variable", all= TRUE)
rfImp_combined <- rfImp_combined[order(-rfImp_combined$Importance_after),]
rfImp_combined
## Variable Importance_before Importance_after
## 6 V4 7.62987929 6.37409556
## 4 V2 6.81423882 6.22814840
## 2 V1 8.64598308 5.34102073
## 1 duplicate1 NA 4.37373219
## 7 V5 2.17385901 1.92347519
## 5 V3 0.72598030 0.52023473
## 8 V6 0.10559967 0.23659142
## 3 V10 -0.02447215 0.11989685
## 9 V7 0.06427442 0.01111648
## 11 V9 -0.08227548 -0.01672550
## 10 V8 -0.10071802 -0.11847288
What happens when you add another predictor that is also highly correlated with V1?
Add another predictor that is highly correlated with one of the informative predictors.
set.seed(4)
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate2, simulated$V1)
## [1] 0.9473968
Before adding duplicate1 and duplicate2,
V1 was the strongest predictor (importance value = 8.6) in
the random forest model.
After adding duplicate2:
- V4 remains the strongest predictor
- all other predictors' importance values remain similar
- V1's importance value dropped further (8.6 -> 5.3 -> 4.3)
- duplicate1's importance value dropped from 4.4 to 3.8
- duplicate2 gained an importance value of about 2.0
The importance values of V1, duplicate1, and
duplicate2 sum to about 10.1 (4.3 + 3.8 + 2.0), which is
comparable to, though somewhat higher than, V1's original
importance value of 8.6.
set.seed(4)
model3 <- randomForest(y ~ ., data = simulated,
importance = TRUE,
ntree = 1000)
rfImp3 <- varImp(model3, scale = FALSE)
rfImp3 <- rfImp3 %>%
mutate(Variable = rownames(rfImp3)) %>%
rename(Importance_after2 = Overall)
rfImp_combined2 <- merge(rfImp_combined, rfImp3, by = "Variable", all= TRUE)
rfImp_combined2 <- rfImp_combined2[order(-rfImp_combined2$Importance_after2),]
rfImp_combined2
## Variable Importance_before Importance_after Importance_after2
## 7 V4 7.62987929 6.37409556 6.929337982
## 5 V2 6.81423882 6.22814840 6.709044774
## 3 V1 8.64598308 5.34102073 4.281019480
## 1 duplicate1 NA 4.37373219 3.813202454
## 8 V5 2.17385901 1.92347519 2.183434956
## 2 duplicate2 NA NA 1.978928422
## 6 V3 0.72598030 0.52023473 0.437590286
## 9 V6 0.10559967 0.23659142 0.144365953
## 4 V10 -0.02447215 0.11989685 0.018855843
## 12 V9 -0.08227548 -0.01672550 0.003648182
## 10 V7 0.06427442 0.01111648 -0.031073686
## 11 V8 -0.10071802 -0.11847288 -0.099575772
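The pooled importance of the V1 group can be tracked across the three fits. This is a small sketch that hardcodes the scores from the combined table above; `v1_group` and `pooled` are illustrative names.

```r
# Importance of V1 and its noisy copies in each fit, copied from the table above.
v1_group <- rbind(
  before = c(V1 = 8.64598308,  duplicate1 = NA,          duplicate2 = NA),
  after  = c(V1 = 5.34102073,  duplicate1 = 4.37373219,  duplicate2 = NA),
  after2 = c(V1 = 4.281019480, duplicate1 = 3.813202454, duplicate2 = 1.978928422)
)

# Sum across the correlated group, ignoring predictors not yet in the model.
pooled <- rowSums(v1_group, na.rm = TRUE)
round(pooled, 2)
```

The pooled score stays in the same range (roughly 8.6 to 10.1) while V1's individual score is cut in half, which is the dilution effect described in the text.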
Use the cforest function in the party package to
fit a random forest model using conditional inference trees. The party
package function varimp can calculate predictor importance.
The conditional argument of that function toggles between
the traditional importance measure and the modified version described in
Strobl et al. (2007). Do these importances show the same pattern as the
traditional random forest model?
The traditional (unconditional) importance behaves similarly to the
random forest model. The importance of V1 is split among V1,
duplicate1, and duplicate2. The
uninformative predictors (V6 – V10) continue
to have near-zero importance values. The importance of V3
dropped to about 0.04.
library(party)
set.seed(4)
model4 <- cforest(y ~ ., data = simulated)
rfImp4 <- varImp(model4, conditional = FALSE)
rfImp4 %>%
arrange(desc(Overall))
## Overall
## V4 6.583309522
## V2 5.881114391
## V1 4.053541290
## duplicate1 3.798439407
## V5 1.817428374
## duplicate2 1.757678596
## V7 0.056120073
## V3 0.040002274
## V6 0.006464814
## V9 -0.002232059
## V10 -0.034487197
## V8 -0.043319070
Conditional importance adjusts for correlation among the predictors.
The importance values of V1, duplicate1, and
duplicate2 now sum to about 3.4 (1.55 + 1.23 + 0.59), which is
no longer similar to V1's original importance value
(8.6).
rfImp5 <- varImp(model4, conditional = TRUE)
rfImp5 %>%
arrange(desc(Overall))
## Overall
## V4 5.208699787
## V2 4.583794176
## V1 1.547152586
## duplicate1 1.229387806
## V5 1.167797154
## duplicate2 0.585745567
## V7 0.030583956
## V9 0.014444673
## V3 0.010708149
## V6 0.009149605
## V8 -0.028654096
## V10 -0.030472259
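One way to see which predictors the conditional measure penalizes is to compute the share of the unconditional score that each predictor retains. The sketch below hardcodes the two sets of `cforest` scores printed above for the six largest predictors; `marginal`, `conditional`, and `retained` are illustrative names.

```r
# Scores copied from the conditional = FALSE and conditional = TRUE outputs above.
marginal <- c(V4 = 6.583309522, V2 = 5.881114391, V1 = 4.053541290,
              duplicate1 = 3.798439407, V5 = 1.817428374,
              duplicate2 = 1.757678596)
conditional <- c(V4 = 5.208699787, V2 = 4.583794176, V1 = 1.547152586,
                 duplicate1 = 1.229387806, V5 = 1.167797154,
                 duplicate2 = 0.585745567)

# Fraction of the unconditional importance that survives conditioning.
retained <- conditional[names(marginal)] / marginal
round(retained, 2)
```

V1 and its two copies keep roughly a third of their unconditional score, whereas V2, V4, and V5 keep about two thirds or more, so the adjustment falls mainly on the correlated group.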
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
library(gbm)
set.seed(5)
gbmModel <- gbm(y ~ ., data = simulated, distribution = "gaussian")
summary(gbmModel)
## var rel.inf
## V4 V4 29.2071475
## V2 V2 21.6898002
## duplicate1 duplicate1 14.9937496
## V1 V1 13.2567960
## V5 V5 11.1785877
## V3 V3 8.6389593
## duplicate2 duplicate2 0.5023758
## V6 V6 0.3949306
## V7 V7 0.1376532
## V8 V8 0.0000000
## V9 V9 0.0000000
## V10 V10 0.0000000
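For the boosted model, the relative influence of the V1 group can be pooled in the same way as for the random forest. This sketch hardcodes the `rel.inf` values printed above; `rel_inf` and `v1_group_share` are illustrative names.

```r
# Relative influence (percent) copied from the summary(gbmModel) output above.
rel_inf <- c(V4 = 29.2071475, V2 = 21.6898002, duplicate1 = 14.9937496,
             V1 = 13.2567960, V5 = 11.1785877, V3 = 8.6389593,
             duplicate2 = 0.5023758)

# Combined influence of V1 and its two noisy copies.
v1_group_share <- sum(rel_inf[c("V1", "duplicate1", "duplicate2")])
round(c(v1_group = v1_group_share, V4 = rel_inf[["V4"]]), 1)
```

The group together accounts for about 29% of the relative influence, on par with V4, but the split is uneven: duplicate1 and V1 share almost all of it and duplicate2 gets very little.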
library(Cubist)
set.seed(6)
predictors <- simulated[,setdiff(names(simulated),"y")]
cubistModel <- cubist(x = predictors, y = simulated$y)
varImp(cubistModel)
## Overall
## duplicate1 79.0
## V2 74.0
## V4 50.0
## V5 50.0
## V1 45.0
## V6 23.5
## V3 0.0
## V7 0.0
## V8 0.0
## V9 0.0
## V10 0.0
## duplicate2 0.0
Use a simulation to show tree bias with different granularities.
set.seed(10)
x <- runif(2000, min = 0, max= 1)
y <- sin(x) + rnorm(length(x)) * .25
x_noisy <- x + rnorm(length(x)) *0.1
data <- data.frame(
x_fine = x,
x_medium = round(x,1),
x_coarse = round(x, 0),
x_noisy = x_noisy,
y= y
)
library(rpart)
fit_fine <- rpart(y ~ x_fine + x_noisy, data = data)
fit_med <- rpart(y ~ x_medium+ x_noisy, data = data)
fit_coarse <- rpart(y ~ x_coarse+ x_noisy, data = data)
plot(fit_fine, uniform = TRUE)
text(fit_fine, use.n = TRUE)
plot(fit_med, uniform = TRUE)
text(fit_med, use.n = TRUE)
plot(fit_coarse, uniform = TRUE)
text(fit_coarse, use.n = TRUE)
We see more splits for fine granularity and fewer splits for coarse granularity. As the granularity becomes coarser, the tree overfits on the noisy data.
set.seed(11)
x <- runif(2000, min = 0, max= 1)
y <- sin(x) + rnorm(length(x)) * .25
x_noisy <- x + rnorm(length(x)) *0.1
x_binary <- rbinom(length(x),1, 0.5)
data <- data.frame(
x_fine = x,
x_medium = round(x,1),
x_coarse = round(x, 0),
x_noisy = x_noisy,
x_binary= x_binary,
y= y
)
library(rpart)
fit_fine <- rpart(y ~ x_fine + x_noisy + x_binary, data = data)
fit_med <- rpart(y ~ x_medium+ x_noisy + x_binary, data = data)
fit_coarse <- rpart(y ~ x_coarse+ x_noisy + x_binary, data = data)
plot(fit_fine, uniform = TRUE)
text(fit_fine, use.n = TRUE)
fit_fine$variable.importance
## x_fine x_noisy x_binary
## 109.8058141 90.2847522 0.4228861
plot(fit_med, uniform = TRUE)
text(fit_med, use.n = TRUE)
fit_med$variable.importance
## x_medium x_noisy
## 108.0585 88.1173
plot(fit_coarse, uniform = TRUE)
text(fit_coarse, use.n = TRUE)
fit_coarse$variable.importance
## x_coarse x_noisy x_binary
## 88.891683 88.314359 2.768394
Decision trees favor high-cardinality variables (x_fine). Even though a binary variable was included, it barely influenced the splits.
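The selection bias can also be isolated by making the response pure noise, so that no predictor deserves to be chosen. The sketch below is a base-R stand-in for the root split of a regression tree, not the `rpart` algorithm itself; `best_split_gain` is a hypothetical helper, and the sample size, repetition count, and seed are arbitrary choices.

```r
# Largest reduction in the sum of squared errors from one binary split on x.
best_split_gain <- function(x, y) {
  cuts <- sort(unique(x))
  if (length(cuts) < 2) return(0)           # constant predictor: no split possible
  total_ss <- sum((y - mean(y))^2)
  gains <- sapply(cuts[-length(cuts)], function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    total_ss - sum((left - mean(left))^2) - sum((right - mean(right))^2)
  })
  max(gains)
}

set.seed(100)
n_obs <- 100
n_reps <- 200
winner <- character(n_reps)

for (i in seq_len(n_reps)) {
  # Three independent predictors with different numbers of distinct values.
  predictors <- data.frame(
    x_fine = runif(n_obs),
    x_medium = round(runif(n_obs), 1),
    x_binary = rbinom(n_obs, 1, 0.5)
  )
  y <- rnorm(n_obs)                         # response unrelated to every predictor
  gains <- sapply(predictors, best_split_gain, y = y)
  winner[i] <- names(which.max(gains))
}

# How often each predictor would have been chosen for the root split.
root_counts <- sapply(c("x_fine", "x_medium", "x_binary"),
                      function(v) sum(winner == v))
root_counts
```

Because every predictor is uninformative here, any imbalance in `root_counts` comes only from the number of candidate split points each predictor offers.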
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?
The left model, with both parameters set to 0.1, is more heavily regularized and more random. A low learning rate means each tree makes only a small correction, so more trees are needed and more predictors get a chance to contribute, which makes for a more robust model. A low bagging fraction means each tree is trained on a small subset of the data, which introduces randomness and lets the model use a wider range of predictors. In the right model, each tree sees nearly the same data and makes a large correction, so the first few trees account for most of the fit and the importance concentrates on the few strongest predictors.
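The mechanics of the two settings can be sketched with a toy stochastic gradient boosting loop built from single-split trees. This is not the `gbm` implementation and does not use the solubility data: `fit_stump` and `boost_importance` are hypothetical helpers, the simulated data and seed are arbitrary, and importance is taken to be each predictor's share of the total squared-error reduction. The output is for inspection only and is not a reproduction of Figure 8.24.

```r
# Best single split (stump) on the residuals r, searched over all columns of x.
fit_stump <- function(x, r) {
  best <- list(gain = -Inf)
  total_ss <- sum((r - mean(r))^2)
  for (j in seq_len(ncol(x))) {
    cuts <- sort(unique(x[, j]))
    for (cut in cuts[-length(cuts)]) {
      left <- x[, j] <= cut
      gain <- total_ss - sum((r[left] - mean(r[left]))^2) -
        sum((r[!left] - mean(r[!left]))^2)
      if (gain > best$gain) {
        best <- list(var = j, cut = cut, gain = gain,
                     left = mean(r[left]), right = mean(r[!left]))
      }
    }
  }
  best
}

# Boost stumps on squared-error residuals and return percent importance.
boost_importance <- function(x, y, n_trees, shrinkage, bag_fraction) {
  pred <- rep(mean(y), length(y))
  gain <- setNames(numeric(ncol(x)), colnames(x))
  for (m in seq_len(n_trees)) {
    # Bagging fraction: each tree sees a random subset, drawn without replacement.
    rows <- sample(nrow(x), size = floor(bag_fraction * nrow(x)))
    stump <- fit_stump(x[rows, , drop = FALSE], (y - pred)[rows])
    # Learning rate: only a fraction of the stump's correction is applied.
    update <- ifelse(x[, stump$var] <= stump$cut, stump$left, stump$right)
    pred <- pred + shrinkage * update
    gain[stump$var] <- gain[stump$var] + stump$gain
  }
  100 * gain / sum(gain)
}

set.seed(824)
n_obs <- 200
x <- matrix(runif(n_obs * 5), ncol = 5,
            dimnames = list(NULL, paste0("x", 1:5)))
y <- 5 * x[, 1] + 3 * x[, 2] + x[, 3] + rnorm(n_obs, sd = 0.5)

left_panel <- boost_importance(x, y, n_trees = 100,
                               shrinkage = 0.1, bag_fraction = 0.1)
right_panel <- boost_importance(x, y, n_trees = 100,
                                shrinkage = 0.9, bag_fraction = 0.9)
round(rbind(left_panel, right_panel), 1)
```

Comparing the two rows shows how the same data can yield different importance profiles purely from the choice of learning rate and bagging fraction.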
Which model do you think would be more predictive of other samples?
The left model would likely be more predictive of other samples. Its strong regularization helps prevent overfitting to the training set.
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Increasing interaction depth allows the trees to capture more complex interactions. It would likely make the slope of predictor importance steeper for both models, but it would mostly affect the right model, where the top predictor would become even more dominant because of the high learning rate.
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
library(AppliedPredictiveModeling)
library(caret)
data(ChemicalManufacturingProcess)
# Referring to text p. 54
trans <- preProcess(ChemicalManufacturingProcess[,-1],
method = "knnImpute" )
transformed <- predict(trans, ChemicalManufacturingProcess[,-1])
sum(is.na(transformed))
## [1] 0
chem_filtered <- transformed[, -nearZeroVar(transformed)]
set.seed(6)
trainingRows <- createDataPartition(ChemicalManufacturingProcess$Yield,
p = .70,
list= FALSE)
TrainX <- chem_filtered[trainingRows, ]
TrainY <- ChemicalManufacturingProcess$Yield[trainingRows]
TestX <- chem_filtered[-trainingRows, ]
TestY <- ChemicalManufacturingProcess$Yield[-trainingRows]
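One caveat about the `transformed[, -nearZeroVar(transformed)]` idiom: it works here because a column was flagged (the model summaries report 56 predictors), but negative indexing with an empty index vector drops every column. The base-R sketch below illustrates the pitfall on a toy data frame; it assumes the filter returns `integer(0)` when nothing is flagged, and `toy`, `no_flags`, and `keep` are illustrative names.

```r
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Assumed result of a filter that flags no columns.
no_flags <- integer(0)

# -integer(0) is still integer(0), so this selects zero columns.
dropped_everything <- toy[, -no_flags, drop = FALSE]
ncol(dropped_everything)

# A guard that behaves the same whether or not anything was flagged.
keep <- setdiff(seq_len(ncol(toy)), no_flags)
kept_all <- toy[, keep, drop = FALSE]
ncol(kept_all)
```

Using `setdiff()` on the column positions gives the intended result in both cases.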
# Provided by the text
set.seed(111)
weightedknnModel2 <- train(x = TrainX,
y = TrainY,
method = "kknn",
preProc = c("center", "scale", "pca"),
tuneGrid = expand.grid(kmax = seq(5,25,5),
distance = 2,
kernel = c("rectangular", "triangular","epanechnikov")))
weightedknnModel2
## k-Nearest Neighbors
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56), principal component
## signal extraction (56)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## kmax kernel RMSE Rsquared MAE
## 5 rectangular 1.699732 0.2860582 1.306812
## 5 triangular 1.594407 0.3072062 1.230722
## 5 epanechnikov 1.618735 0.2967283 1.242189
## 10 rectangular 1.699732 0.2860582 1.306812
## 10 triangular 1.553165 0.3255548 1.214788
## 10 epanechnikov 1.599011 0.3028114 1.237059
## 15 rectangular 1.699732 0.2860582 1.306812
## 15 triangular 1.549762 0.3273930 1.213288
## 15 epanechnikov 1.599011 0.3028114 1.237059
## 20 rectangular 1.699732 0.2860582 1.306812
## 20 triangular 1.549762 0.3273930 1.213288
## 20 epanechnikov 1.599011 0.3028114 1.237059
## 25 rectangular 1.699732 0.2860582 1.306812
## 25 triangular 1.549762 0.3273930 1.213288
## 25 epanechnikov 1.599011 0.3028114 1.237059
##
## Tuning parameter 'distance' was held constant at a value of 2
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were kmax = 25, distance = 2 and kernel
## = triangular.
weightedknnPred2 <- predict(weightedknnModel2, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = weightedknnPred2, obs = TestY)
## RMSE Rsquared MAE
## 1.4676074 0.4733189 1.1051550
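For reference, the three numbers reported by `postResample` for a regression model can be reproduced in base R. This is a minimal sketch of the usual definitions (RMSE, squared correlation between predictions and observations, and mean absolute error); `regression_metrics` is a hypothetical helper and the toy vectors are made up.

```r
# RMSE, squared Pearson correlation, and MAE for numeric predictions.
regression_metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE = mean(abs(pred - obs)))
}

# Toy example: two exact predictions and one that is off by 1.
toy_metrics <- regression_metrics(pred = c(1, 2, 3), obs = c(1, 2, 4))
round(toy_metrics, 4)
```

Note that this R-squared is a squared correlation, so it measures how well the predictions track the observations rather than how close they are in absolute terms.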
set.seed(113)
#refer to text p. 166
svmModel2 <- train(x = TrainX,
y = TrainY,
method = "svmRadial",
preProc = c("center", "scale", "pca"),
tuneLength = 14,
trControl = trainControl(method = "cv"))
svmModel2
## Support Vector Machines with Radial Basis Function Kernel
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56), principal component
## signal extraction (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 110, 112, 112, 112, 112, 112, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 1.450085 0.5568588 1.1997266
## 0.50 1.332047 0.5647384 1.0980277
## 1.00 1.238790 0.6006898 1.0179477
## 2.00 1.200795 0.6201847 0.9773347
## 4.00 1.205578 0.6058129 0.9752402
## 8.00 1.222136 0.5942627 0.9882264
## 16.00 1.221243 0.5954775 0.9870167
## 32.00 1.221243 0.5954775 0.9870167
## 64.00 1.221243 0.5954775 0.9870167
## 128.00 1.221243 0.5954775 0.9870167
## 256.00 1.221243 0.5954775 0.9870167
## 512.00 1.221243 0.5954775 0.9870167
## 1024.00 1.221243 0.5954775 0.9870167
## 2048.00 1.221243 0.5954775 0.9870167
##
## Tuning parameter 'sigma' was held constant at a value of 0.02748059
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.02748059 and C = 2.
svmPred2 <- predict(svmModel2, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = svmPred2, obs = TestY)
## RMSE Rsquared MAE
## 1.2695080 0.6191610 0.9728057
set.seed(114)
#refer to text p. 165
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
marsModel2 <- train(x = TrainX,
y = TrainY,
method = "earth",
tuneGrid = marsGrid,
preProc = c("center", "scale"),
trControl = trainControl(method = "cv"))
## Loading required package: earth
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
marsModel2
## Multivariate Adaptive Regression Spline
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 112, 111, 112, 112, 112, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 1.420245 0.4161646 1.1223649
## 1 3 1.192370 0.6063616 0.9763954
## 1 4 1.171674 0.6163737 0.9433324
## 1 5 1.169215 0.6034139 0.9388695
## 1 6 1.223304 0.5669649 0.9703588
## 1 7 1.185700 0.6027056 0.9510157
## 1 8 1.180833 0.5932560 0.9827166
## 1 9 1.180792 0.5976978 0.9827534
## 1 10 1.185384 0.6048656 0.9685761
## 1 11 1.184590 0.6095915 0.9674617
## 1 12 1.373554 0.5963351 1.0829173
## 1 13 1.395727 0.5843234 1.0984551
## 1 14 1.404500 0.5844436 1.1004205
## 1 15 1.395147 0.5864584 1.0949697
## 1 16 1.405719 0.5837766 1.1066869
## 1 17 1.392827 0.5902839 1.0942656
## 1 18 1.392827 0.5902839 1.0942656
## 1 19 1.392827 0.5902839 1.0942656
## 1 20 1.415115 0.5837947 1.1080326
## 1 21 1.421045 0.5836208 1.1194441
## 1 22 1.421045 0.5836208 1.1194441
## 1 23 1.421045 0.5836208 1.1194441
## 1 24 1.421045 0.5836208 1.1194441
## 1 25 1.421045 0.5836208 1.1194441
## 1 26 1.421045 0.5836208 1.1194441
## 1 27 1.421045 0.5836208 1.1194441
## 1 28 1.421045 0.5836208 1.1194441
## 1 29 1.421045 0.5836208 1.1194441
## 1 30 1.421045 0.5836208 1.1194441
## 1 31 1.421045 0.5836208 1.1194441
## 1 32 1.421045 0.5836208 1.1194441
## 1 33 1.421045 0.5836208 1.1194441
## 1 34 1.421045 0.5836208 1.1194441
## 1 35 1.421045 0.5836208 1.1194441
## 1 36 1.421045 0.5836208 1.1194441
## 1 37 1.421045 0.5836208 1.1194441
## 1 38 1.421045 0.5836208 1.1194441
## 2 2 1.420245 0.4161646 1.1223649
## 2 3 1.184745 0.6076965 0.9602160
## 2 4 1.209515 0.5854936 0.9681314
## 2 5 1.254889 0.5406905 0.9757740
## 2 6 1.183539 0.5938054 0.9470884
## 2 7 1.156842 0.6058061 0.9096095
## 2 8 1.159008 0.6048214 0.9155860
## 2 9 1.158606 0.6017263 0.9002778
## 2 10 1.212862 0.5807610 0.9210734
## 2 11 1.398343 0.5475084 0.9810743
## 2 12 1.489289 0.5383656 1.0335915
## 2 13 1.510972 0.5529365 1.0257113
## 2 14 1.533395 0.5414225 1.0553174
## 2 15 1.532111 0.5254442 1.0506624
## 2 16 1.546251 0.5066151 1.0598119
## 2 17 1.516852 0.5173416 1.0311456
## 2 18 1.578750 0.5075698 1.0671888
## 2 19 1.574408 0.5085381 1.0617242
## 2 20 1.571305 0.5131123 1.0582440
## 2 21 1.583670 0.5129312 1.0444304
## 2 22 1.576575 0.5135234 1.0351308
## 2 23 1.585898 0.5104487 1.0460993
## 2 24 1.585898 0.5104487 1.0460993
## 2 25 1.585898 0.5104487 1.0460993
## 2 26 1.585898 0.5104487 1.0460993
## 2 27 1.585898 0.5104487 1.0460993
## 2 28 1.582272 0.5062776 1.0514079
## 2 29 1.594988 0.5027155 1.0586435
## 2 30 1.594988 0.5027155 1.0586435
## 2 31 1.594988 0.5027155 1.0586435
## 2 32 1.594988 0.5027155 1.0586435
## 2 33 1.594988 0.5027155 1.0586435
## 2 34 1.594988 0.5027155 1.0586435
## 2 35 1.594988 0.5027155 1.0586435
## 2 36 1.594988 0.5027155 1.0586435
## 2 37 1.594988 0.5027155 1.0586435
## 2 38 1.594988 0.5027155 1.0586435
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 7 and degree = 2.
marsPred2 <- predict(marsModel2, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = marsPred2, obs = TestY)
## RMSE Rsquared MAE
## 1.2278715 0.6159963 0.9913622
set.seed(115)
#refer to text p. 165
rfGrid <- expand.grid(.mtry = c(15, 18, 21, 24, 27))
rfModel <- train(x = TrainX,
y = TrainY,
method = "rf",
tuneGrid = rfGrid,
preProc = c("center", "scale"),
trControl = trainControl(method = "cv"),
ntree = 1000)
rfModel
## Random Forest
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 112, 112, 111, 111, 112, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 15 1.098227 0.6851523 0.8789189
## 18 1.092095 0.6878947 0.8735003
## 21 1.096146 0.6814199 0.8793522
## 24 1.097314 0.6791979 0.8761327
## 27 1.095722 0.6798135 0.8699768
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 18.
rfPred <- predict(rfModel, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = rfPred, obs = TestY)
## RMSE Rsquared MAE
## 1.3841889 0.5202457 1.1082526
set.seed(117)
#refer to text p. 165
gbmGrid <- expand.grid(.interaction.depth = seq(1, 7, by = 2),
.n.trees = seq(100, 1000, by = 50),
.shrinkage = c(0.01, 0.1),
.n.minobsinnode = 10)
gbmModel <- train(x = TrainX,
y = TrainY,
method = "gbm",
tuneGrid = gbmGrid,
preProc = c("center", "scale"),
trControl = trainControl(method = "cv"),
verbose = FALSE)
gbmModel
## Stochastic Gradient Boosting
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 112, 112, 112, 111, 112, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.trees RMSE Rsquared MAE
## 0.01 1 100 1.423289 0.5413911 1.1667342
## 0.01 1 150 1.331318 0.5617667 1.0901688
## 0.01 1 200 1.279718 0.5651657 1.0448565
## 0.01 1 250 1.247326 0.5695968 1.0101425
## 0.01 1 300 1.224454 0.5764725 0.9826139
## 0.01 1 350 1.203294 0.5875462 0.9569889
## 0.01 1 400 1.188215 0.5943973 0.9386811
## 0.01 1 450 1.175066 0.5997744 0.9227480
## 0.01 1 500 1.164687 0.6052734 0.9091128
## 0.01 1 550 1.158112 0.6083235 0.9018204
## 0.01 1 600 1.152841 0.6105767 0.8955818
## 0.01 1 650 1.147077 0.6148857 0.8890254
## 0.01 1 700 1.141007 0.6187003 0.8819512
## 0.01 1 750 1.135141 0.6224087 0.8770528
## 0.01 1 800 1.133440 0.6228413 0.8734346
## 0.01 1 850 1.132026 0.6247604 0.8697171
## 0.01 1 900 1.127433 0.6270981 0.8653934
## 0.01 1 950 1.125513 0.6284345 0.8643677
## 0.01 1 1000 1.125789 0.6279463 0.8627760
## 0.01 3 100 1.343599 0.5551676 1.0982510
## 0.01 3 150 1.267234 0.5725646 1.0266337
## 0.01 3 200 1.222567 0.5889824 0.9757780
## 0.01 3 250 1.195491 0.5992512 0.9460645
## 0.01 3 300 1.176967 0.6101907 0.9214132
## 0.01 3 350 1.167288 0.6145875 0.9077750
## 0.01 3 400 1.153302 0.6222767 0.8946053
## 0.01 3 450 1.149045 0.6253844 0.8887948
## 0.01 3 500 1.141792 0.6295808 0.8803786
## 0.01 3 550 1.136828 0.6322995 0.8767161
## 0.01 3 600 1.129970 0.6357851 0.8725452
## 0.01 3 650 1.128943 0.6363298 0.8716499
## 0.01 3 700 1.126405 0.6383188 0.8707491
## 0.01 3 750 1.123365 0.6404017 0.8692980
## 0.01 3 800 1.120793 0.6420349 0.8675805
## 0.01 3 850 1.122359 0.6407888 0.8691054
## 0.01 3 900 1.121090 0.6412225 0.8705442
## 0.01 3 950 1.118007 0.6433928 0.8679228
## 0.01 3 1000 1.117042 0.6429260 0.8686071
## 0.01 5 100 1.348762 0.5517591 1.1029438
## 0.01 5 150 1.268411 0.5661896 1.0199601
## 0.01 5 200 1.226719 0.5766824 0.9720797
## 0.01 5 250 1.201941 0.5882316 0.9401397
## 0.01 5 300 1.185710 0.5967537 0.9220089
## 0.01 5 350 1.167212 0.6072465 0.9028121
## 0.01 5 400 1.157383 0.6121039 0.8933403
## 0.01 5 450 1.148480 0.6170459 0.8823184
## 0.01 5 500 1.143795 0.6199530 0.8745758
## 0.01 5 550 1.140687 0.6212369 0.8727268
## 0.01 5 600 1.135444 0.6241541 0.8682690
## 0.01 5 650 1.136242 0.6234792 0.8694747
## 0.01 5 700 1.137893 0.6220183 0.8709401
## 0.01 5 750 1.135129 0.6235764 0.8682022
## 0.01 5 800 1.134210 0.6247533 0.8663565
## 0.01 5 850 1.131808 0.6258023 0.8643168
## 0.01 5 900 1.129189 0.6270516 0.8616205
## 0.01 5 950 1.127407 0.6279617 0.8606197
## 0.01 5 1000 1.127206 0.6283794 0.8599961
## 0.01 7 100 1.326611 0.5786090 1.0847976
## 0.01 7 150 1.244848 0.5911154 1.0035258
## 0.01 7 200 1.199004 0.6042228 0.9555810
## 0.01 7 250 1.170475 0.6143877 0.9223474
## 0.01 7 300 1.157853 0.6198942 0.9107050
## 0.01 7 350 1.146733 0.6251460 0.8983793
## 0.01 7 400 1.132562 0.6324873 0.8847830
## 0.01 7 450 1.125064 0.6367261 0.8763655
## 0.01 7 500 1.119617 0.6392291 0.8707603
## 0.01 7 550 1.117285 0.6411367 0.8671246
## 0.01 7 600 1.113380 0.6428982 0.8629368
## 0.01 7 650 1.109695 0.6451248 0.8599001
## 0.01 7 700 1.104532 0.6472997 0.8560713
## 0.01 7 750 1.104302 0.6470565 0.8552709
## 0.01 7 800 1.101732 0.6482944 0.8536485
## 0.01 7 850 1.100361 0.6497801 0.8530984
## 0.01 7 900 1.097763 0.6511390 0.8517636
## 0.01 7 950 1.097189 0.6516431 0.8522323
## 0.01 7 1000 1.094306 0.6528367 0.8497895
## 0.10 1 100 1.139782 0.6041604 0.8775208
## 0.10 1 150 1.132027 0.6142042 0.8691031
## 0.10 1 200 1.116140 0.6282207 0.8779192
## 0.10 1 250 1.112499 0.6302337 0.8761978
## 0.10 1 300 1.111850 0.6280776 0.8760848
## 0.10 1 350 1.117811 0.6266570 0.8751897
## 0.10 1 400 1.109572 0.6335824 0.8650768
## 0.10 1 450 1.110829 0.6309095 0.8626886
## 0.10 1 500 1.115151 0.6276409 0.8630946
## 0.10 1 550 1.113257 0.6295937 0.8626008
## 0.10 1 600 1.113128 0.6293539 0.8614671
## 0.10 1 650 1.117905 0.6264673 0.8650594
## 0.10 1 700 1.117931 0.6254488 0.8656066
## 0.10 1 750 1.123672 0.6219291 0.8676270
## 0.10 1 800 1.124216 0.6234135 0.8661076
## 0.10 1 850 1.121010 0.6247008 0.8654976
## 0.10 1 900 1.128553 0.6208985 0.8706600
## 0.10 1 950 1.131260 0.6197499 0.8751686
## 0.10 1 1000 1.134479 0.6175351 0.8788523
## 0.10 3 100 1.192937 0.5831494 0.9534042
## 0.10 3 150 1.176441 0.5942503 0.9442726
## 0.10 3 200 1.160444 0.6059577 0.9383723
## 0.10 3 250 1.151995 0.6100688 0.9303530
## 0.10 3 300 1.143422 0.6154580 0.9251949
## 0.10 3 350 1.137039 0.6190863 0.9183965
## 0.10 3 400 1.134891 0.6208552 0.9162557
## 0.10 3 450 1.130939 0.6233878 0.9130177
## 0.10 3 500 1.129706 0.6240104 0.9122626
## 0.10 3 550 1.129453 0.6242597 0.9124971
## 0.10 3 600 1.128395 0.6250840 0.9120618
## 0.10 3 650 1.127809 0.6255418 0.9114271
## 0.10 3 700 1.126992 0.6260127 0.9107517
## 0.10 3 750 1.127035 0.6259410 0.9107677
## 0.10 3 800 1.126634 0.6261981 0.9106233
## 0.10 3 850 1.126753 0.6261375 0.9107700
## 0.10 3 900 1.126545 0.6262471 0.9106498
## 0.10 3 950 1.126522 0.6262460 0.9105503
## 0.10 3 1000 1.126416 0.6263158 0.9104782
## 0.10 5 100 1.182585 0.5868787 0.9060703
## 0.10 5 150 1.157691 0.6021213 0.8882589
## 0.10 5 200 1.143956 0.6111160 0.8746710
## 0.10 5 250 1.133866 0.6170608 0.8663558
## 0.10 5 300 1.129412 0.6201409 0.8630462
## 0.10 5 350 1.126043 0.6218628 0.8622037
## 0.10 5 400 1.126764 0.6213953 0.8609887
## 0.10 5 450 1.125189 0.6221890 0.8603034
## 0.10 5 500 1.123235 0.6231935 0.8590695
## 0.10 5 550 1.121713 0.6242633 0.8580728
## 0.10 5 600 1.121014 0.6246869 0.8578194
## 0.10 5 650 1.120303 0.6251466 0.8573793
## 0.10 5 700 1.119608 0.6254000 0.8571426
## 0.10 5 750 1.119307 0.6255528 0.8571290
## 0.10 5 800 1.118725 0.6258675 0.8568384
## 0.10 5 850 1.118305 0.6261355 0.8568044
## 0.10 5 900 1.118353 0.6260849 0.8570405
## 0.10 5 950 1.118173 0.6262247 0.8570595
## 0.10 5 1000 1.117936 0.6263906 0.8570213
## 0.10 7 100 1.165168 0.6164119 0.9037594
## 0.10 7 150 1.147902 0.6262323 0.8964773
## 0.10 7 200 1.138103 0.6315710 0.8912544
## 0.10 7 250 1.135841 0.6327367 0.8906029
## 0.10 7 300 1.133264 0.6335503 0.8876008
## 0.10 7 350 1.131041 0.6344672 0.8865193
## 0.10 7 400 1.128790 0.6356978 0.8834832
## 0.10 7 450 1.126209 0.6372801 0.8821103
## 0.10 7 500 1.125467 0.6372491 0.8824050
## 0.10 7 550 1.124105 0.6380272 0.8813631
## 0.10 7 600 1.123055 0.6387294 0.8803206
## 0.10 7 650 1.122448 0.6391842 0.8801378
## 0.10 7 700 1.122021 0.6393152 0.8798143
## 0.10 7 750 1.121867 0.6393796 0.8799372
## 0.10 7 800 1.121683 0.6395168 0.8798424
## 0.10 7 850 1.121603 0.6395411 0.8797303
## 0.10 7 900 1.121495 0.6395804 0.8797665
## 0.10 7 950 1.121448 0.6396150 0.8796829
## 0.10 7 1000 1.121375 0.6396165 0.8796685
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth =
## 7, shrinkage = 0.01 and n.minobsinnode = 10.
gbmPred <- predict(gbmModel, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = gbmPred, obs = TestY)
## RMSE Rsquared MAE
## 1.3844945 0.5226213 0.9988662
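The size of the boosting grid explains the length of the resampling table. The sketch below rebuilds the same grid with base R (column names written without the leading dots) and counts the candidates; `gbm_grid`, `n_candidates`, and `n_tree_shapes` are illustrative names.

```r
# Same tuning values as the gbmGrid defined above.
gbm_grid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
                        n.trees = seq(100, 1000, by = 50),
                        shrinkage = c(0.01, 0.1),
                        n.minobsinnode = 10)

# Every row is one candidate model evaluated by resampling.
n_candidates <- nrow(gbm_grid)

# Distinct (depth, shrinkage) pairs; n.trees varies within each pair.
n_tree_shapes <- nrow(unique(gbm_grid[, c("interaction.depth", "shrinkage")]))

c(candidates = n_candidates, depth_shrinkage_pairs = n_tree_shapes)
```

There are 152 candidates but only 8 distinct depth and shrinkage pairs. As far as I know, `train` can score the smaller `n.trees` values from the largest fit in each pair, so the number of boosted models actually fitted per fold is closer to 8 than 152.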
set.seed(117)
#refer to text p. 165
cubistGrid <- expand.grid(committees = c(1, 5, 10, 20),
neighbors = c(0, 3, 5))
cubistModel <- train(x = TrainX,
y = TrainY,
method = "cubist",
tuneGrid = cubistGrid,
preProc = c("center", "scale"),
trControl = trainControl(method = "cv"))
cubistModel
## Cubist
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 112, 112, 112, 111, 112, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.307594 0.5003035 1.0431963
## 1 3 1.187560 0.5904275 0.9104327
## 1 5 1.211412 0.5699709 0.9321292
## 5 0 1.117679 0.6233613 0.8723219
## 5 3 1.041239 0.6850491 0.8005877
## 5 5 1.065681 0.6617667 0.8061668
## 10 0 1.103046 0.6554743 0.8612666
## 10 3 1.042496 0.7068475 0.7924773
## 10 5 1.062694 0.6875339 0.7969218
## 20 0 1.082658 0.6669867 0.8372747
## 20 3 1.031802 0.7054281 0.7817013
## 20 5 1.049982 0.6910679 0.7951145
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 3.
cubistPred <- predict(cubistModel, newdata = TestX)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = cubistPred, obs = TestY)
## RMSE Rsquared MAE
## 1.3129126 0.5603709 0.9849868
Which tree-based regression model gives the optimal resampling and test set performance?
| Model | \(R^2\) | RMSE |
|---|---|---|
| SVM | 0.6191610 | 1.2695080 |
| MARS | 0.6159963 | 1.2278715 |
| Cubist | 0.5603709 | 1.3129126 |
| GBM | 0.5226213 | 1.3844945 |
| Random Forest | 0.5202457 | 1.3841889 |
| Weighted KNN | 0.4733189 | 1.4676074 |
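On the test set, the SVM model has the highest \(R^2\) (0.619), while MARS has the lowest RMSE (1.228). Neither is an ideal model, as even the best explains only about 62% of the variance in the test data.
Among the tree-based regression models only, Cubist is the top performer on both resampling (cross-validated RMSE of 1.032, versus 1.092 for random forest and 1.094 for GBM) and the test set (RMSE of 1.313).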
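The test-set values in the table can be ranked on both metrics in a few lines. This sketch hardcodes the table; the `tree_based` flag and the other names are illustrative additions.

```r
# Test-set results copied from the table above.
results <- data.frame(
  model = c("SVM", "MARS", "Cubist", "GBM", "Random Forest", "Weighted KNN"),
  rsquared = c(0.6191610, 0.6159963, 0.5603709, 0.5226213, 0.5202457, 0.4733189),
  rmse = c(1.2695080, 1.2278715, 1.3129126, 1.3844945, 1.3841889, 1.4676074),
  tree_based = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Best model overall on each metric.
best_rsquared <- results$model[which.max(results$rsquared)]
best_rmse <- results$model[which.min(results$rmse)]

# Best model when only tree-based methods are considered.
trees <- results[results$tree_based, ]
best_tree <- trees$model[which.min(trees$rmse)]

c(best_rsquared = best_rsquared, best_rmse = best_rmse, best_tree = best_tree)
```

The two metrics do not agree on the overall winner, which is worth keeping in mind when a single model is declared best.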
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
varImp(cubistModel)
## cubist variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 62.03
## BiologicalMaterial12 59.49
## ManufacturingProcess09 50.63
## BiologicalMaterial03 50.63
## BiologicalMaterial02 43.04
## ManufacturingProcess30 39.24
## ManufacturingProcess04 25.32
## ManufacturingProcess29 25.32
## BiologicalMaterial06 25.32
## ManufacturingProcess26 24.05
## ManufacturingProcess25 24.05
## ManufacturingProcess17 22.78
## ManufacturingProcess39 21.52
## ManufacturingProcess28 18.99
## BiologicalMaterial09 15.19
## ManufacturingProcess42 13.92
## ManufacturingProcess27 11.39
## ManufacturingProcess06 11.39
## ManufacturingProcess33 11.39
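Whether biological or process variables dominate can be read off the top of the list by counting name prefixes. This sketch hardcodes the first ten rows of the output above, in the order printed (several scores are tied, so the tenth position is not unique); `top10` and `family` are illustrative names.

```r
# Ten highest-ranked predictors from the varImp(cubistModel) output above.
top10 <- c("ManufacturingProcess32", "ManufacturingProcess13",
           "BiologicalMaterial12", "ManufacturingProcess09",
           "BiologicalMaterial03", "BiologicalMaterial02",
           "ManufacturingProcess30", "ManufacturingProcess04",
           "ManufacturingProcess29", "BiologicalMaterial06")

# Classify each predictor by its name prefix.
family <- ifelse(grepl("^ManufacturingProcess", top10), "process", "biological")
table(family)
```

Process variables make up six of the top ten and biological variables four, with a process variable (ManufacturingProcess32) ranked first.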
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
cubistModel$finalModel
##
## Call:
## cubist.default(x = x, y = y, committees = param$committees)
##
## Number of samples: 124
## Number of predictors: 56
##
## Number of committees: 20
## Number of rules per committee: 1, 5, 1, 4, 1, 2, 1, 1, 3, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 4