Data 624 HW9: Regression Trees and Rule-Based Models
library(tidyverse)
library(fpp2)
library(urca)
library(rio)
library(gridExtra)
library(caret)
library(randomForest)
library(glmnet)
library(mlbench)
library(AppliedPredictiveModeling)
library(party)
library(gbm)
library(Cubist)
library(rpart)
seed <- 2001
Do problems 8.1, 8.2, 8.3, and 8.7 in Kuhn and Johnson. Please submit the Rpubs link along with the .rmd file.
1.1 Ex. 8.1
Recreate the simulated data from Exercise 7.2:
#library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
1.1.1 Part a
Fit a random forest model to all of the predictors, then estimate the variable importance scores. Did the random forest model significantly use the uninformative predictors (V6 – V10)?
Answer:
From the results below, the random forest model did not significantly use the uninformative predictors V6 to V10.
#library(randomForest)
#library(caret)
model1 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
1.1.2 Part b
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
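The new predictor comes from the code given in the exercise:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)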
## [1] 0.9460206
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
Answer:
When I added the predictor duplicate1 to the model, the importance of V1 dropped, as did the importance of most other predictors. V1's importance fell from 8.69 to 6.02, a decrease of 2.67 (about 30%). This results from adding the highly correlated predictor duplicate1 to the model.
model2 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2
1.1.3 Part c
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
Answer:
As the question directs, we fit a random forest of conditional inference trees with the cforest function.
The resulting importances show a different pattern from the traditional random forest model. The importance of V1 drops to about 1.83 once the correlation between V1 and duplicate1 is accounted for. V2 and V4 remain highly important, while the uninformative predictors V6 to V10 remain near zero.
This model handles highly correlated predictors better than the traditional random forest model.
model3 <- cforest(y ~ ., data = simulated, controls = cforest_unbiased(ntree=1000))
rfImp3 <- varimp(model3, conditional = TRUE)
rfImp3
## V1 V2 V3 V4 V5
## 1.832916297 4.771944496 0.014222304 5.933231943 1.055686258
## V6 V7 V8 V9 V10
## 0.009163044 0.004786348 -0.002685858 0.003285658 -0.017157917
## duplicate1
## 1.983987681
1.1.4 Part d
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
Answer:
Similar to the models above, the boosted tree and Cubist models also assign low importance to the predictors V6 to V10.
The boosted tree model (gbm) resembles the cforest model: V4 has the highest importance, because some of V1's importance is shared with the predictor duplicate1.
The Cubist model resembles the traditional random forest model: it still ranks V1 as the most important predictor, and it assigns zero importance to the predictor duplicate1.
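The boosted tree fit itself is not shown above; a minimal sketch, assuming Gaussian loss and 1000 trees (tuning values are illustrative, not from the original):
#gbm (sketch)
model4 <- gbm(y ~ ., data = simulated, distribution = "gaussian", n.trees = 1000)
summary(model4) # relative influence of each predictor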
#cubist
model5 <- cubist(x=simulated[, names(simulated)!="y"], y=simulated$y, committees=100)
rfImp5 <- varImp(model5)
rfImp5
1.2 Ex. 8.2
Use a simulation to show tree bias with different granularities.
Answer:
From the textbook, we know that trees suffer from selection bias: predictors with a higher number of distinct values are favored over more granular predictors. Also, as the number of missing values increases, the selection of predictors becomes more biased (Sec. 8.1, p. 182).
I will use the sample function to simulate five variables with different granularities.
From the results below, x1, which has the largest number of distinct values, is favored: it has the highest importance of all the predictors, while x2, x3, and x4 are much less important. x5 is constant (it takes only the value 1), so it is deemed unimportant. This confirms that tree models suffer from selection bias, favoring predictors with a higher number of distinct values.
set.seed(200)
x1 <- sample(0:10000, 500, replace=TRUE)
x2 <- sample(0:1000, 500, replace=TRUE)
x3 <- sample(0:100, 500, replace=TRUE)
x4 <- sample(0:10, 500, replace=TRUE)
x5 <- sample(1, 500, replace=TRUE)
y <- x1+x2+x3+x4+x5+rnorm(500)
df <- data.frame(x1,x2,x3,x4,x5,y)
str(df)
## 'data.frame': 500 obs. of 6 variables:
## $ x1: int 2213 5489 5870 5851 6324 1561 9781 9783 8730 3755 ...
## $ x2: int 409 115 880 59 617 940 343 465 685 570 ...
## $ x3: int 36 41 75 96 99 80 86 79 7 66 ...
## $ x4: int 5 4 7 6 8 4 8 1 2 4 ...
## $ x5: int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : num 2665 5649 6835 6014 7051 ...
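The tree fit behind these importance results is not shown; a minimal sketch, assuming a default rpart regression tree and caret's varImp:
#sketch: fit a CART tree on the simulated data and inspect importance
treeMod <- rpart(y ~ ., data = df)
varImp(treeMod) # x1 (most distinct values) should dominate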
1.3 Ex. 8.3
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
Fig. 8.24
1.3.1 Part a
Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?
Answer:
Boosting is a method of converting weak learners into strong learners. Each new tree is fit on a modified version of the original data set, so gradient boosting trains many models in a gradual, additive, and sequential manner. It identifies the shortcomings of the weak learners through the gradient of the loss function, where the loss function measures how well the model's predictions fit the underlying data.
The bagging fraction is the fraction of the training set observations randomly selected to propose the next tree in the expansion. With a large bagging fraction, each tree sees nearly the same (almost complete) data, so the same dominant predictors tend to be selected again and again and accumulate importance. A small bagging fraction injects randomness, allowing other variables to be modeled in subsequent steps.
The learning rate, also called shrinkage, reduces the impact of each additional fitted base learner (tree). It shrinks the size of the incremental steps, slows down the learning, and thus dampens the contribution of each consecutive iteration. It is usually better to improve a model by taking many small steps than a few large ones, and a small learning rate reduces overfitting.
Back to the question: the model on the left has both parameters set to 0.1, so each tree contributes only 10% of its fit and is built on a random 10% of the training data. More variables therefore enter the model across the iterations and share the final importance. The model on the right has both parameters set to 0.9: each step leans heavily on the previous fit and sees nearly the full data set, so the same few dominant variables are selected repeatedly and importance concentrates on just those.
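For reference, these two settings correspond to the shrinkage and bag.fraction arguments of gbm. A minimal sketch on the simulated data from Ex. 8.1, using the two extremes from the figure (this is illustrative, not the textbook's solubility code):
#sketch: the two extremes of Fig. 8.24 as gbm arguments
fitLeft <- gbm(y ~ ., data = simulated, distribution = "gaussian",
               n.trees = 100, shrinkage = 0.1, bag.fraction = 0.1)
fitRight <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 100, shrinkage = 0.9, bag.fraction = 0.9)
summary(fitLeft)  # importance spread across more predictors
summary(fitRight) # importance concentrated on a few predictors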
1.3.2 Part b
Which model do you think would be more predictive of other samples?
Answer:
A model with small bagging fraction and small learning rate allows better generalization as it considers more variables and takes small steps to learn in the process and has a lower chance of overfitting. The model on the left with both parameters set to 0.1 would be more predictive of other samples than the one with parameters set to 0.9.
1.3.3 Part c
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Answer:
Increasing the interaction depth, also called tree depth, allows more predictors to be included in the model, which spreads importance across more predictors, similar to the model on the left. The slope of the predictor-importance plot would therefore flatten (become smaller) for either model.
1.4 Ex. 8.7
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
(a.) Which tree-based regression model gives the optimal resampling and test set performance?
(b.) Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
(c.) Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
1.4.1 Data Pre-Processing
The matrix ChemicalManufacturingProcess contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs, plus the variable yield which contains the percent yield for each run.
A small percentage of cells in the predictor set contain missing values. I used a KNN imputation function to fill in these missing values.
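The summary below presumably comes from loading the data set and calling summary:
#load the data from AppliedPredictiveModeling and summarize
data(ChemicalManufacturingProcess)
summary(ChemicalManufacturingProcess)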
## Yield BiologicalMaterial01 BiologicalMaterial02
## Min. :35.25 Min. :4.580 Min. :46.87
## 1st Qu.:38.75 1st Qu.:5.978 1st Qu.:52.68
## Median :39.97 Median :6.305 Median :55.09
## Mean :40.18 Mean :6.411 Mean :55.69
## 3rd Qu.:41.48 3rd Qu.:6.870 3rd Qu.:58.74
## Max. :46.34 Max. :8.810 Max. :64.75
##
## BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
## Min. :56.97 Min. : 9.38 Min. :13.24
## 1st Qu.:64.98 1st Qu.:11.24 1st Qu.:17.23
## Median :67.22 Median :12.10 Median :18.49
## Mean :67.70 Mean :12.35 Mean :18.60
## 3rd Qu.:70.43 3rd Qu.:13.22 3rd Qu.:19.90
## Max. :78.25 Max. :23.09 Max. :24.85
##
## BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
## Min. :40.60 Min. :100.0 Min. :15.88
## 1st Qu.:46.05 1st Qu.:100.0 1st Qu.:17.06
## Median :48.46 Median :100.0 Median :17.51
## Mean :48.91 Mean :100.0 Mean :17.49
## 3rd Qu.:51.34 3rd Qu.:100.0 3rd Qu.:17.88
## Max. :59.38 Max. :100.8 Max. :19.14
##
## BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
## Min. :11.44 Min. :1.770 Min. :135.8
## 1st Qu.:12.60 1st Qu.:2.460 1st Qu.:143.8
## Median :12.84 Median :2.710 Median :146.1
## Mean :12.85 Mean :2.801 Mean :147.0
## 3rd Qu.:13.13 3rd Qu.:2.990 3rd Qu.:149.6
## Max. :14.08 Max. :6.870 Max. :158.7
##
## BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
## Min. :18.35 Min. : 0.00 Min. : 0.00
## 1st Qu.:19.73 1st Qu.:10.80 1st Qu.:19.30
## Median :20.12 Median :11.40 Median :21.00
## Mean :20.20 Mean :11.21 Mean :16.68
## 3rd Qu.:20.75 3rd Qu.:12.15 3rd Qu.:21.50
## Max. :22.21 Max. :14.10 Max. :22.50
## NA's :1 NA's :3
## ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
## Min. :1.47 Min. :911.0 Min. : 923.0
## 1st Qu.:1.53 1st Qu.:928.0 1st Qu.: 986.8
## Median :1.54 Median :934.0 Median : 999.2
## Mean :1.54 Mean :931.9 Mean :1001.7
## 3rd Qu.:1.55 3rd Qu.:936.0 3rd Qu.:1008.9
## Max. :1.60 Max. :946.0 Max. :1175.3
## NA's :15 NA's :1 NA's :1
## ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
## Min. :203.0 Min. :177.0 Min. :177.0
## 1st Qu.:205.7 1st Qu.:177.0 1st Qu.:177.0
## Median :206.8 Median :177.0 Median :178.0
## Mean :207.4 Mean :177.5 Mean :177.6
## 3rd Qu.:208.7 3rd Qu.:178.0 3rd Qu.:178.0
## Max. :227.4 Max. :178.0 Max. :178.0
## NA's :2 NA's :1 NA's :1
## ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
## Min. :38.89 Min. : 7.500 Min. : 7.500
## 1st Qu.:44.89 1st Qu.: 8.700 1st Qu.: 9.000
## Median :45.73 Median : 9.100 Median : 9.400
## Mean :45.66 Mean : 9.179 Mean : 9.386
## 3rd Qu.:46.52 3rd Qu.: 9.550 3rd Qu.: 9.900
## Max. :49.36 Max. :11.600 Max. :11.500
## NA's :9 NA's :10
## ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
## Min. : 0.0 Min. :32.10 Min. :4701
## 1st Qu.: 0.0 1st Qu.:33.90 1st Qu.:4828
## Median : 0.0 Median :34.60 Median :4856
## Mean : 857.8 Mean :34.51 Mean :4854
## 3rd Qu.: 0.0 3rd Qu.:35.20 3rd Qu.:4882
## Max. :4549.0 Max. :38.60 Max. :5055
## NA's :1 NA's :1
## ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
## Min. :5904 Min. : 0 Min. :31.30
## 1st Qu.:6010 1st Qu.:4561 1st Qu.:33.50
## Median :6032 Median :4588 Median :34.40
## Mean :6039 Mean :4566 Mean :34.34
## 3rd Qu.:6061 3rd Qu.:4619 3rd Qu.:35.10
## Max. :6233 Max. :4852 Max. :40.00
##
## ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
## Min. : 0 Min. :5890 Min. : 0
## 1st Qu.:4813 1st Qu.:6001 1st Qu.:4553
## Median :4835 Median :6022 Median :4582
## Mean :4810 Mean :6028 Mean :4556
## 3rd Qu.:4862 3rd Qu.:6050 3rd Qu.:4610
## Max. :4971 Max. :6146 Max. :4759
##
## ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## Min. :-1.8000 Min. : 0.000 Min. :0.000
## 1st Qu.:-0.6000 1st Qu.: 3.000 1st Qu.:2.000
## Median :-0.3000 Median : 5.000 Median :3.000
## Mean :-0.1642 Mean : 5.406 Mean :3.017
## 3rd Qu.: 0.0000 3rd Qu.: 8.000 3rd Qu.:4.000
## Max. : 3.6000 Max. :12.000 Max. :6.000
## NA's :1 NA's :1
## ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
## Min. : 0.000 Min. : 0 Min. : 0
## 1st Qu.: 4.000 1st Qu.:4832 1st Qu.:6020
## Median : 8.000 Median :4855 Median :6047
## Mean : 8.834 Mean :4828 Mean :6016
## 3rd Qu.:14.000 3rd Qu.:4877 3rd Qu.:6070
## Max. :23.000 Max. :4990 Max. :6161
## NA's :1 NA's :5 NA's :5
## ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
## Min. : 0 Min. : 0.000 Min. : 0.00
## 1st Qu.:4560 1st Qu.: 0.000 1st Qu.:19.70
## Median :4587 Median :10.400 Median :19.90
## Mean :4563 Mean : 6.592 Mean :20.01
## 3rd Qu.:4609 3rd Qu.:10.750 3rd Qu.:20.40
## Max. :4710 Max. :11.500 Max. :22.00
## NA's :5 NA's :5 NA's :5
## ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
## Min. : 0.000 Min. : 0.00 Min. :143.0
## 1st Qu.: 8.800 1st Qu.:70.10 1st Qu.:155.0
## Median : 9.100 Median :70.80 Median :158.0
## Mean : 9.161 Mean :70.18 Mean :158.5
## 3rd Qu.: 9.700 3rd Qu.:71.40 3rd Qu.:162.0
## Max. :11.200 Max. :72.50 Max. :173.0
## NA's :5 NA's :5
## ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
## Min. :56.00 Min. :2.300 Min. :463.0
## 1st Qu.:62.00 1st Qu.:2.500 1st Qu.:490.0
## Median :64.00 Median :2.500 Median :495.0
## Mean :63.54 Mean :2.494 Mean :495.6
## 3rd Qu.:65.00 3rd Qu.:2.500 3rd Qu.:501.5
## Max. :70.00 Max. :2.600 Max. :522.0
## NA's :5 NA's :5 NA's :5
## ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
## Min. :0.01700 Min. :0.000 Min. :0.000
## 1st Qu.:0.01900 1st Qu.:0.700 1st Qu.:2.000
## Median :0.02000 Median :1.000 Median :3.000
## Mean :0.01957 Mean :1.014 Mean :2.534
## 3rd Qu.:0.02000 3rd Qu.:1.300 3rd Qu.:3.000
## Max. :0.02200 Max. :2.300 Max. :3.000
## NA's :5
## ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
## Min. :0.000 Min. :0.00000 Min. :0.00000
## 1st Qu.:7.100 1st Qu.:0.00000 1st Qu.:0.00000
## Median :7.200 Median :0.00000 Median :0.00000
## Mean :6.851 Mean :0.01771 Mean :0.02371
## 3rd Qu.:7.300 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :7.500 Max. :0.10000 Max. :0.20000
## NA's :1 NA's :1
## ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
## Min. : 0.00 Min. : 0.0000 Min. :0.000
## 1st Qu.:11.40 1st Qu.: 0.6000 1st Qu.:1.800
## Median :11.60 Median : 0.8000 Median :1.900
## Mean :11.21 Mean : 0.9119 Mean :1.805
## 3rd Qu.:11.70 3rd Qu.: 1.0250 3rd Qu.:1.900
## Max. :12.10 Max. :11.0000 Max. :2.100
##
## ManufacturingProcess45
## Min. :0.000
## 1st Qu.:2.100
## Median :2.200
## Mean :2.138
## 3rd Qu.:2.300
## Max. :2.600
##
predictors <- ChemicalManufacturingProcess[,-c(1)]
#fill in missing values from textbook sec3.8
cmp_pre <- preProcess(predictors, method="knnImpute")
#apply the transformations
cmp_predictors <- predict(cmp_pre, predictors)
Split the data into a training and a test set, pre-process the data, and tune models:
- Pre-process the data with centering and scaling.
cmp_pre <- preProcess(cmp_predictors, method=c("center", "scale"))
cmp_predictors <- predict(cmp_pre, cmp_predictors)
- Train-test split at 70%
set.seed(200)
trainingRows <- createDataPartition(ChemicalManufacturingProcess$Yield,
p=0.70, list=FALSE) #caret, textbook sec4.9
train_X <- cmp_predictors[trainingRows, ]
train_Y <- ChemicalManufacturingProcess$Yield[trainingRows]
test_X <- cmp_predictors[-trainingRows, ]
test_Y <- ChemicalManufacturingProcess$Yield[-trainingRows]
1.4.2 Models
Single Tree
set.seed(seed)
stModel <- train(x = train_X, y = train_Y, method = "rpart",
tuneLength = 10, control=rpart.control(maxdepth=2))
stModel
## CART
##
## 124 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## cp RMSE Rsquared MAE
## 0.00000000 1.518103 0.3266369 1.202472
## 0.04395540 1.520401 0.3253569 1.206507
## 0.08791079 1.526268 0.3148964 1.214521
## 0.13186619 1.530143 0.3014006 1.213307
## 0.17582158 1.534424 0.2986228 1.220919
## 0.21977698 1.539533 0.2950447 1.226515
## 0.26373237 1.539533 0.2950447 1.226515
## 0.30768777 1.568565 0.2883506 1.251475
## 0.35164316 1.629782 0.2853446 1.308522
## 0.39559856 1.676726 0.2834380 1.351730
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.
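The importance table below presumably comes from caret's varImp; analogous calls produce the importance tables for the later models:
varImp(stModel)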
## rpart variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess31 100.00
## BiologicalMaterial12 75.47
## ManufacturingProcess17 70.32
## ManufacturingProcess28 64.15
## ManufacturingProcess32 60.73
## BiologicalMaterial11 37.69
## BiologicalMaterial03 37.00
## ManufacturingProcess09 34.69
## ManufacturingProcess15 28.21
## ManufacturingProcess06 28.17
## ManufacturingProcess24 0.00
## ManufacturingProcess40 0.00
## ManufacturingProcess21 0.00
## BiologicalMaterial02 0.00
## ManufacturingProcess26 0.00
## ManufacturingProcess39 0.00
## ManufacturingProcess18 0.00
## BiologicalMaterial05 0.00
## BiologicalMaterial09 0.00
## ManufacturingProcess04 0.00
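The test-set metrics that follow are presumably computed with caret's postResample; a sketch:
#single tree test-set performance (sketch)
stPred <- predict(stModel, newdata = test_X)
postResample(pred = stPred, obs = test_Y)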
## RMSE Rsquared MAE
## 1.4857317 0.4525266 1.1270232
Random Forest
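The fitting code is not shown; a sketch consistent with the printed grid (mtry values from 2 to 57 suggest tuneLength = 10):
set.seed(seed)
rfModel <- train(x = train_X, y = train_Y, method = "rf",
                 importance = TRUE, tuneLength = 10)
rfModel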
## Random Forest
##
## 124 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.316662 0.5519327 1.0379859
## 8 1.246818 0.5747840 0.9699150
## 14 1.233179 0.5773851 0.9522117
## 20 1.227264 0.5740008 0.9432728
## 26 1.222440 0.5738101 0.9356269
## 32 1.224043 0.5684739 0.9394338
## 38 1.226168 0.5641929 0.9410744
## 44 1.232027 0.5566863 0.9473137
## 50 1.239039 0.5485328 0.9527932
## 57 1.245284 0.5420043 0.9600014
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 26.
## rf variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess31 23.708
## BiologicalMaterial12 21.664
## BiologicalMaterial03 19.208
## ManufacturingProcess17 18.552
## BiologicalMaterial06 14.834
## ManufacturingProcess28 11.787
## ManufacturingProcess13 11.767
## ManufacturingProcess09 10.755
## ManufacturingProcess06 10.622
## BiologicalMaterial04 8.729
## BiologicalMaterial11 8.145
## ManufacturingProcess36 6.385
## ManufacturingProcess11 6.018
## ManufacturingProcess15 5.815
## BiologicalMaterial09 5.666
## ManufacturingProcess18 5.413
## BiologicalMaterial08 5.361
## ManufacturingProcess21 5.105
## ManufacturingProcess39 4.935
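Test-set performance for the random forest, again a postResample sketch:
postResample(predict(rfModel, test_X), test_Y)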
## RMSE Rsquared MAE
## 1.2303142 0.6837008 0.9528876
Gradient Boosting
set.seed(seed)
gbmGrid <- expand.grid(interaction.depth=seq(1,7,by=2),
n.trees=seq(100,1000,by=50),
shrinkage=c(0.01,0.1),
n.minobsinnode=c(5,10))
gbModel <- train(x = train_X, y = train_Y, method = "gbm", tuneGrid = gbmGrid, verbose=FALSE)
gbModel
## Stochastic Gradient Boosting
##
## 124 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE
## 0.01 1 5 100 1.480277
## 0.01 1 5 150 1.405650
## 0.01 1 5 200 1.355092
## 0.01 1 5 250 1.319146
## 0.01 1 5 300 1.295846
## 0.01 1 5 350 1.276031
## 0.01 1 5 400 1.262732
## 0.01 1 5 450 1.252730
## 0.01 1 5 500 1.245207
## 0.01 1 5 550 1.239166
## 0.01 1 5 600 1.233629
## 0.01 1 5 650 1.230084
## 0.01 1 5 700 1.227514
## 0.01 1 5 750 1.224667
## 0.01 1 5 800 1.222653
## 0.01 1 5 850 1.219236
## 0.01 1 5 900 1.217531
## 0.01 1 5 950 1.216385
## 0.01 1 5 1000 1.214763
## 0.01 1 10 100 1.476244
## 0.01 1 10 150 1.399333
## 0.01 1 10 200 1.350291
## 0.01 1 10 250 1.317951
## 0.01 1 10 300 1.296651
## 0.01 1 10 350 1.281203
## 0.01 1 10 400 1.270028
## 0.01 1 10 450 1.260320
## 0.01 1 10 500 1.253534
## 0.01 1 10 550 1.247147
## 0.01 1 10 600 1.242175
## 0.01 1 10 650 1.238170
## 0.01 1 10 700 1.235831
## 0.01 1 10 750 1.231925
## 0.01 1 10 800 1.229383
## 0.01 1 10 850 1.227090
## 0.01 1 10 900 1.224215
## 0.01 1 10 950 1.222430
## 0.01 1 10 1000 1.220555
## 0.01 3 5 100 1.391434
## 0.01 3 5 150 1.318215
## 0.01 3 5 200 1.275600
## 0.01 3 5 250 1.247046
## 0.01 3 5 300 1.230361
## 0.01 3 5 350 1.217753
## 0.01 3 5 400 1.207990
## 0.01 3 5 450 1.200779
## 0.01 3 5 500 1.194489
## 0.01 3 5 550 1.190947
## 0.01 3 5 600 1.186986
## 0.01 3 5 650 1.184187
## 0.01 3 5 700 1.181125
## 0.01 3 5 750 1.177897
## 0.01 3 5 800 1.175333
## 0.01 3 5 850 1.173383
## 0.01 3 5 900 1.171773
## 0.01 3 5 950 1.170630
## 0.01 3 5 1000 1.168734
## 0.01 3 10 100 1.389842
## 0.01 3 10 150 1.314983
## 0.01 3 10 200 1.275882
## 0.01 3 10 250 1.252611
## 0.01 3 10 300 1.237562
## 0.01 3 10 350 1.227288
## 0.01 3 10 400 1.219444
## 0.01 3 10 450 1.212378
## 0.01 3 10 500 1.207135
## 0.01 3 10 550 1.203530
## 0.01 3 10 600 1.199398
## 0.01 3 10 650 1.195837
## 0.01 3 10 700 1.193297
## 0.01 3 10 750 1.190485
## 0.01 3 10 800 1.188157
## 0.01 3 10 850 1.186210
## 0.01 3 10 900 1.184725
## 0.01 3 10 950 1.183036
## 0.01 3 10 1000 1.181317
## 0.01 5 5 100 1.377714
## 0.01 5 5 150 1.303842
## 0.01 5 5 200 1.264003
## 0.01 5 5 250 1.238545
## 0.01 5 5 300 1.221581
## 0.01 5 5 350 1.210239
## 0.01 5 5 400 1.202022
## 0.01 5 5 450 1.195110
## 0.01 5 5 500 1.190129
## 0.01 5 5 550 1.185702
## 0.01 5 5 600 1.182469
## 0.01 5 5 650 1.179294
## 0.01 5 5 700 1.176763
## 0.01 5 5 750 1.175219
## 0.01 5 5 800 1.173553
## 0.01 5 5 850 1.172043
## 0.01 5 5 900 1.170960
## 0.01 5 5 950 1.170169
## 0.01 5 5 1000 1.169261
## 0.01 5 10 100 1.381348
## 0.01 5 10 150 1.310417
## 0.01 5 10 200 1.272649
## 0.01 5 10 250 1.248859
## 0.01 5 10 300 1.233604
## 0.01 5 10 350 1.222545
## 0.01 5 10 400 1.213892
## 0.01 5 10 450 1.208156
## 0.01 5 10 500 1.202792
## 0.01 5 10 550 1.198690
## 0.01 5 10 600 1.194886
## 0.01 5 10 650 1.191626
## 0.01 5 10 700 1.189293
## 0.01 5 10 750 1.187027
## 0.01 5 10 800 1.185103
## 0.01 5 10 850 1.183326
## 0.01 5 10 900 1.182079
## 0.01 5 10 950 1.181316
## 0.01 5 10 1000 1.180262
## 0.01 7 5 100 1.369447
## 0.01 7 5 150 1.296516
## 0.01 7 5 200 1.258112
## 0.01 7 5 250 1.234191
## 0.01 7 5 300 1.218584
## 0.01 7 5 350 1.207082
## 0.01 7 5 400 1.199157
## 0.01 7 5 450 1.193588
## 0.01 7 5 500 1.187955
## 0.01 7 5 550 1.183680
## 0.01 7 5 600 1.180914
## 0.01 7 5 650 1.178196
## 0.01 7 5 700 1.176420
## 0.01 7 5 750 1.174903
## 0.01 7 5 800 1.173505
## 0.01 7 5 850 1.172021
## 0.01 7 5 900 1.171323
## 0.01 7 5 950 1.170875
## 0.01 7 5 1000 1.170281
## 0.01 7 10 100 1.388818
## 0.01 7 10 150 1.317273
## 0.01 7 10 200 1.280265
## 0.01 7 10 250 1.257073
## 0.01 7 10 300 1.241434
## 0.01 7 10 350 1.230961
## 0.01 7 10 400 1.221933
## 0.01 7 10 450 1.215610
## 0.01 7 10 500 1.209952
## 0.01 7 10 550 1.205849
## 0.01 7 10 600 1.202015
## 0.01 7 10 650 1.198899
## 0.01 7 10 700 1.196131
## 0.01 7 10 750 1.194262
## 0.01 7 10 800 1.191174
## 0.01 7 10 850 1.189093
## 0.01 7 10 900 1.188075
## 0.01 7 10 950 1.186651
## 0.01 7 10 1000 1.185664
## 0.10 1 5 100 1.236296
## 0.10 1 5 150 1.229715
## 0.10 1 5 200 1.226412
## 0.10 1 5 250 1.227304
## 0.10 1 5 300 1.229313
## 0.10 1 5 350 1.233080
## 0.10 1 5 400 1.233562
## 0.10 1 5 450 1.239474
## 0.10 1 5 500 1.240238
## 0.10 1 5 550 1.244110
## 0.10 1 5 600 1.245817
## 0.10 1 5 650 1.246932
## 0.10 1 5 700 1.248729
## 0.10 1 5 750 1.249701
## 0.10 1 5 800 1.250663
## 0.10 1 5 850 1.251227
## 0.10 1 5 900 1.251879
## 0.10 1 5 950 1.252762
## 0.10 1 5 1000 1.253236
## 0.10 1 10 100 1.229125
## 0.10 1 10 150 1.228103
## 0.10 1 10 200 1.224388
## 0.10 1 10 250 1.225189
## 0.10 1 10 300 1.224493
## 0.10 1 10 350 1.225377
## 0.10 1 10 400 1.225896
## 0.10 1 10 450 1.229155
## 0.10 1 10 500 1.230161
## 0.10 1 10 550 1.232920
## 0.10 1 10 600 1.233681
## 0.10 1 10 650 1.235450
## 0.10 1 10 700 1.235377
## 0.10 1 10 750 1.235835
## 0.10 1 10 800 1.236428
## 0.10 1 10 850 1.237547
## 0.10 1 10 900 1.237875
## 0.10 1 10 950 1.238537
## 0.10 1 10 1000 1.238830
## 0.10 3 5 100 1.188394
## 0.10 3 5 150 1.183286
## 0.10 3 5 200 1.181185
## 0.10 3 5 250 1.179614
## 0.10 3 5 300 1.179219
## 0.10 3 5 350 1.179188
## 0.10 3 5 400 1.179453
## 0.10 3 5 450 1.179529
## 0.10 3 5 500 1.179707
## 0.10 3 5 550 1.179728
## 0.10 3 5 600 1.179723
## 0.10 3 5 650 1.179758
## 0.10 3 5 700 1.179785
## 0.10 3 5 750 1.179821
## 0.10 3 5 800 1.179822
## 0.10 3 5 850 1.179828
## 0.10 3 5 900 1.179838
## 0.10 3 5 950 1.179827
## 0.10 3 5 1000 1.179835
## 0.10 3 10 100 1.202966
## 0.10 3 10 150 1.196589
## 0.10 3 10 200 1.198090
## 0.10 3 10 250 1.197598
## 0.10 3 10 300 1.197822
## 0.10 3 10 350 1.196939
## 0.10 3 10 400 1.196942
## 0.10 3 10 450 1.196721
## 0.10 3 10 500 1.196766
## 0.10 3 10 550 1.196736
## 0.10 3 10 600 1.196688
## 0.10 3 10 650 1.196753
## 0.10 3 10 700 1.196720
## 0.10 3 10 750 1.196622
## 0.10 3 10 800 1.196543
## 0.10 3 10 850 1.196544
## 0.10 3 10 900 1.196554
## 0.10 3 10 950 1.196558
## 0.10 3 10 1000 1.196536
## 0.10 5 5 100 1.205322
## 0.10 5 5 150 1.199964
## 0.10 5 5 200 1.196647
## 0.10 5 5 250 1.195825
## 0.10 5 5 300 1.195202
## 0.10 5 5 350 1.194632
## 0.10 5 5 400 1.194317
## 0.10 5 5 450 1.194007
## 0.10 5 5 500 1.193839
## 0.10 5 5 550 1.193806
## 0.10 5 5 600 1.193701
## 0.10 5 5 650 1.193564
## 0.10 5 5 700 1.193531
## 0.10 5 5 750 1.193516
## 0.10 5 5 800 1.193516
## 0.10 5 5 850 1.193505
## 0.10 5 5 900 1.193506
## 0.10 5 5 950 1.193498
## 0.10 5 5 1000 1.193490
## 0.10 5 10 100 1.209579
## 0.10 5 10 150 1.201874
## 0.10 5 10 200 1.200986
## 0.10 5 10 250 1.198996
## 0.10 5 10 300 1.197635
## 0.10 5 10 350 1.197149
## 0.10 5 10 400 1.197557
## 0.10 5 10 450 1.196966
## 0.10 5 10 500 1.196780
## 0.10 5 10 550 1.196480
## 0.10 5 10 600 1.196429
## 0.10 5 10 650 1.196497
## 0.10 5 10 700 1.196561
## 0.10 5 10 750 1.196547
## 0.10 5 10 800 1.196610
## 0.10 5 10 850 1.196643
## 0.10 5 10 900 1.196613
## 0.10 5 10 950 1.196637
## 0.10 5 10 1000 1.196680
## 0.10 7 5 100 1.195582
## 0.10 7 5 150 1.194296
## 0.10 7 5 200 1.192201
## 0.10 7 5 250 1.191540
## 0.10 7 5 300 1.190906
## 0.10 7 5 350 1.190655
## 0.10 7 5 400 1.190366
## 0.10 7 5 450 1.190335
## 0.10 7 5 500 1.190363
## 0.10 7 5 550 1.190190
## 0.10 7 5 600 1.190082
## 0.10 7 5 650 1.190036
## 0.10 7 5 700 1.189992
## 0.10 7 5 750 1.189970
## 0.10 7 5 800 1.189969
## 0.10 7 5 850 1.189972
## 0.10 7 5 900 1.189964
## 0.10 7 5 950 1.189970
## 0.10 7 5 1000 1.189965
## 0.10 7 10 100 1.211618
## 0.10 7 10 150 1.207120
## 0.10 7 10 200 1.204088
## 0.10 7 10 250 1.202100
## 0.10 7 10 300 1.201943
## 0.10 7 10 350 1.202077
## 0.10 7 10 400 1.200898
## 0.10 7 10 450 1.200365
## 0.10 7 10 500 1.200129
## 0.10 7 10 550 1.199716
## 0.10 7 10 600 1.199647
## 0.10 7 10 650 1.199753
## 0.10 7 10 700 1.199571
## 0.10 7 10 750 1.199469
## 0.10 7 10 800 1.199625
## 0.10 7 10 850 1.199719
## 0.10 7 10 900 1.199830
## 0.10 7 10 950 1.199885
## 0.10 7 10 1000 1.199920
## Rsquared MAE
## 0.4908325 1.1931899
## 0.5103237 1.1245025
## 0.5200618 1.0737862
## 0.5288867 1.0385217
## 0.5334796 1.0176124
## 0.5386618 0.9978822
## 0.5427609 0.9857718
## 0.5453326 0.9770929
## 0.5470237 0.9706041
## 0.5483820 0.9647363
## 0.5504850 0.9604681
## 0.5513261 0.9569613
## 0.5516462 0.9549625
## 0.5528817 0.9527118
## 0.5532762 0.9509207
## 0.5548490 0.9478902
## 0.5551491 0.9460249
## 0.5555386 0.9456493
## 0.5560613 0.9447399
## 0.4884585 1.1894450
## 0.5071834 1.1158803
## 0.5174216 1.0668578
## 0.5226412 1.0341976
## 0.5264303 1.0126520
## 0.5299600 0.9964057
## 0.5331289 0.9866965
## 0.5365916 0.9779900
## 0.5380827 0.9717191
## 0.5396867 0.9657190
## 0.5418053 0.9614864
## 0.5433572 0.9590986
## 0.5441150 0.9574631
## 0.5461394 0.9546771
## 0.5467197 0.9529351
## 0.5476160 0.9510764
## 0.5489032 0.9487403
## 0.5497959 0.9478753
## 0.5508385 0.9469974
## 0.5253746 1.1137493
## 0.5371687 1.0441789
## 0.5449372 1.0030779
## 0.5533355 0.9752265
## 0.5587876 0.9591325
## 0.5637729 0.9471088
## 0.5678988 0.9387539
## 0.5709212 0.9327625
## 0.5743061 0.9274757
## 0.5758459 0.9250448
## 0.5777817 0.9215359
## 0.5790080 0.9187230
## 0.5804325 0.9160588
## 0.5821706 0.9137929
## 0.5834469 0.9119795
## 0.5843144 0.9103300
## 0.5851808 0.9092155
## 0.5855080 0.9084002
## 0.5864923 0.9069573
## 0.5223767 1.1139420
## 0.5347878 1.0409448
## 0.5407576 1.0025027
## 0.5461754 0.9779471
## 0.5503623 0.9625670
## 0.5536057 0.9520911
## 0.5561751 0.9443170
## 0.5592364 0.9382266
## 0.5617349 0.9340961
## 0.5634000 0.9311316
## 0.5655795 0.9281181
## 0.5676945 0.9256045
## 0.5689972 0.9236491
## 0.5708303 0.9217909
## 0.5721534 0.9200079
## 0.5733982 0.9190758
## 0.5743821 0.9181317
## 0.5753122 0.9173104
## 0.5763228 0.9161348
## 0.5242327 1.1007058
## 0.5397913 1.0284910
## 0.5486446 0.9870598
## 0.5562431 0.9608080
## 0.5633055 0.9449055
## 0.5686654 0.9347102
## 0.5723489 0.9280612
## 0.5758848 0.9221731
## 0.5783005 0.9179424
## 0.5807967 0.9146446
## 0.5826104 0.9116928
## 0.5842859 0.9090868
## 0.5856436 0.9074319
## 0.5863889 0.9061914
## 0.5873318 0.9050684
## 0.5880629 0.9038106
## 0.5885878 0.9031556
## 0.5890708 0.9025576
## 0.5895472 0.9020410
## 0.5247189 1.1043148
## 0.5354504 1.0339834
## 0.5418032 0.9966811
## 0.5480337 0.9725938
## 0.5527884 0.9572641
## 0.5570909 0.9464999
## 0.5606948 0.9395797
## 0.5631491 0.9345870
## 0.5658379 0.9300802
## 0.5675371 0.9267840
## 0.5697878 0.9243803
## 0.5718704 0.9221024
## 0.5730387 0.9202456
## 0.5743161 0.9191866
## 0.5752236 0.9184038
## 0.5762575 0.9173323
## 0.5769503 0.9166611
## 0.5773602 0.9163521
## 0.5781538 0.9156080
## 0.5338744 1.0893997
## 0.5458778 1.0181469
## 0.5539123 0.9795895
## 0.5606404 0.9561740
## 0.5663679 0.9413084
## 0.5716825 0.9314444
## 0.5751438 0.9243343
## 0.5779934 0.9194282
## 0.5809698 0.9149178
## 0.5832469 0.9114761
## 0.5849602 0.9094936
## 0.5866478 0.9074566
## 0.5876183 0.9063568
## 0.5884269 0.9051404
## 0.5892274 0.9044433
## 0.5900716 0.9033013
## 0.5904011 0.9027985
## 0.5904439 0.9025145
## 0.5906703 0.9020785
## 0.5193600 1.1104417
## 0.5285172 1.0397835
## 0.5348050 1.0024346
## 0.5418272 0.9788711
## 0.5469370 0.9637361
## 0.5506809 0.9543742
## 0.5548878 0.9463987
## 0.5578626 0.9413567
## 0.5606873 0.9365738
## 0.5629653 0.9336666
## 0.5649020 0.9308903
## 0.5666439 0.9286554
## 0.5681137 0.9265941
## 0.5689678 0.9251281
## 0.5709455 0.9228442
## 0.5722629 0.9210798
## 0.5728198 0.9205647
## 0.5737098 0.9192394
## 0.5741672 0.9185819
## 0.5372629 0.9637514
## 0.5404137 0.9594601
## 0.5414620 0.9574969
## 0.5411750 0.9595110
## 0.5396162 0.9615799
## 0.5378739 0.9653306
## 0.5377329 0.9653213
## 0.5344389 0.9714472
## 0.5335452 0.9723211
## 0.5312124 0.9764990
## 0.5308413 0.9780840
## 0.5301167 0.9789977
## 0.5290555 0.9799077
## 0.5285096 0.9804556
## 0.5280481 0.9807729
## 0.5278040 0.9816271
## 0.5274085 0.9823637
## 0.5268913 0.9833122
## 0.5264715 0.9837781
## 0.5409253 0.9554961
## 0.5423221 0.9528773
## 0.5443743 0.9550134
## 0.5435756 0.9558658
## 0.5446493 0.9557889
## 0.5444307 0.9570971
## 0.5440504 0.9572776
## 0.5422206 0.9605232
## 0.5421337 0.9605281
## 0.5408592 0.9633225
## 0.5404975 0.9650603
## 0.5396567 0.9650508
## 0.5396509 0.9655000
## 0.5391657 0.9659862
## 0.5386921 0.9664930
## 0.5381817 0.9672363
## 0.5383907 0.9676737
## 0.5381919 0.9683416
## 0.5379569 0.9682445
## 0.5714082 0.9250431
## 0.5731176 0.9216732
## 0.5739989 0.9207203
## 0.5745290 0.9200016
## 0.5746479 0.9196219
## 0.5745796 0.9197168
## 0.5743516 0.9200909
## 0.5742723 0.9201164
## 0.5741313 0.9202934
## 0.5741050 0.9203314
## 0.5741017 0.9203665
## 0.5740778 0.9204311
## 0.5740597 0.9204489
## 0.5740376 0.9204712
## 0.5740375 0.9204639
## 0.5740345 0.9204604
## 0.5740265 0.9204676
## 0.5740335 0.9204546
## 0.5740271 0.9204599
## 0.5577999 0.9399620
## 0.5626126 0.9345153
## 0.5614444 0.9357624
## 0.5619953 0.9347600
## 0.5617574 0.9354543
## 0.5624130 0.9346478
## 0.5624637 0.9345707
## 0.5626111 0.9346582
## 0.5626148 0.9346653
## 0.5627025 0.9348068
## 0.5627453 0.9347937
## 0.5627323 0.9348611
## 0.5627729 0.9348638
## 0.5628389 0.9348618
## 0.5628775 0.9348037
## 0.5628945 0.9348893
## 0.5628922 0.9349046
## 0.5629021 0.9349102
## 0.5629204 0.9348764
## 0.5621312 0.9436128
## 0.5649367 0.9405323
## 0.5671021 0.9383911
## 0.5673524 0.9382306
## 0.5676558 0.9378540
## 0.5678913 0.9375442
## 0.5680341 0.9374395
## 0.5681602 0.9373256
## 0.5682552 0.9373237
## 0.5682304 0.9372828
## 0.5683128 0.9372654
## 0.5683924 0.9371842
## 0.5684116 0.9371828
## 0.5684060 0.9371707
## 0.5683946 0.9371710
## 0.5683981 0.9371755
## 0.5683943 0.9371718
## 0.5683950 0.9371699
## 0.5684002 0.9371617
## 0.5542034 0.9357214
## 0.5592937 0.9301293
## 0.5599679 0.9315695
## 0.5613445 0.9309785
## 0.5622354 0.9296579
## 0.5626483 0.9295774
## 0.5624368 0.9301805
## 0.5627549 0.9300119
## 0.5628734 0.9301154
## 0.5630306 0.9299221
## 0.5631073 0.9302270
## 0.5630819 0.9303946
## 0.5630952 0.9305430
## 0.5631320 0.9305767
## 0.5630989 0.9306711
## 0.5630972 0.9308512
## 0.5631219 0.9308361
## 0.5631271 0.9308800
## 0.5630853 0.9309518
## 0.5673104 0.9328421
## 0.5676770 0.9321263
## 0.5689402 0.9303719
## 0.5692543 0.9299531
## 0.5696439 0.9293259
## 0.5697125 0.9293223
## 0.5698579 0.9292341
## 0.5698725 0.9293235
## 0.5698573 0.9293814
## 0.5699639 0.9292623
## 0.5700266 0.9291855
## 0.5700594 0.9291788
## 0.5700864 0.9291572
## 0.5700939 0.9291476
## 0.5701008 0.9291585
## 0.5700993 0.9291602
## 0.5701006 0.9291522
## 0.5700963 0.9291566
## 0.5700987 0.9291542
## 0.5538418 0.9489126
## 0.5570075 0.9453812
## 0.5590230 0.9443949
## 0.5603893 0.9431830
## 0.5606448 0.9431956
## 0.5604910 0.9437100
## 0.5613122 0.9433862
## 0.5616436 0.9429939
## 0.5617800 0.9429204
## 0.5620990 0.9429994
## 0.5621016 0.9430984
## 0.5620536 0.9433932
## 0.5621531 0.9432156
## 0.5622207 0.9431898
## 0.5620885 0.9432770
## 0.5620199 0.9434294
## 0.5619402 0.9435134
## 0.5618997 0.9435909
## 0.5618711 0.9436321
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000,
## interaction.depth = 3, shrinkage = 0.01 and n.minobsinnode = 5.
gbModel$results[which(gbModel$results$n.trees==gbModel$bestTune$n.trees
& gbModel$results$interaction.depth==gbModel$bestTune$interaction.depth
& gbModel$results$shrinkage==gbModel$bestTune$shrinkage
& gbModel$results$n.minobsinnode==gbModel$bestTune$n.minobsinnode),]
## gbm variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess17 30.601
## ManufacturingProcess31 22.468
## ManufacturingProcess06 20.177
## BiologicalMaterial12 18.920
## BiologicalMaterial03 16.484
## ManufacturingProcess09 11.287
## ManufacturingProcess01 10.710
## ManufacturingProcess13 10.007
## BiologicalMaterial09 9.957
## ManufacturingProcess05 9.014
## ManufacturingProcess24 8.246
## ManufacturingProcess39 7.255
## BiologicalMaterial11 6.573
## ManufacturingProcess15 6.515
## ManufacturingProcess14 6.513
## ManufacturingProcess21 5.395
## ManufacturingProcess28 5.303
## ManufacturingProcess10 4.917
## ManufacturingProcess37 4.906
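The gbm test-set metrics below, again presumably via postResample:
postResample(predict(gbModel, test_X), test_Y)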
## RMSE Rsquared MAE
## 1.2267536 0.6746691 0.9548075
Cubist
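The Cubist fit is not shown; a sketch (caret's default grid matches the committees/neighbors values printed below):
set.seed(seed)
cubistModel <- train(x = train_X, y = train_Y, method = "cubist")
cubistModel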
## Cubist
##
## 124 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.743360 0.3421518 1.2502010
## 1 5 1.722870 0.3580408 1.2297350
## 1 9 1.724375 0.3538024 1.2321318
## 10 0 1.239189 0.5379959 0.9603813
## 10 5 1.222683 0.5509488 0.9471472
## 10 9 1.225073 0.5488191 0.9494012
## 20 0 1.184533 0.5801013 0.9204430
## 20 5 1.164153 0.5932035 0.9059211
## 20 9 1.170477 0.5896930 0.9096912
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
cubistModel$results[which(cubistModel$results$committees==cubistModel$bestTune$committees
& cubistModel$results$neighbors==cubistModel$bestTune$neighbors),]
## cubist variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess17 100.000
## ManufacturingProcess32 98.913
## BiologicalMaterial12 53.261
## ManufacturingProcess39 50.000
## ManufacturingProcess06 31.522
## ManufacturingProcess09 30.435
## BiologicalMaterial03 26.087
## ManufacturingProcess29 19.565
## ManufacturingProcess33 18.478
## ManufacturingProcess27 17.391
## ManufacturingProcess05 15.217
## ManufacturingProcess01 15.217
## ManufacturingProcess13 15.217
## ManufacturingProcess26 13.043
## ManufacturingProcess25 10.870
## BiologicalMaterial09 8.696
## ManufacturingProcess02 8.696
## BiologicalMaterial06 8.696
## ManufacturingProcess28 7.609
## ManufacturingProcess23 7.609
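The Cubist test-set metrics below, presumably via postResample:
postResample(predict(cubistModel, test_X), test_Y)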
## RMSE Rsquared MAE
## 1.0191358 0.7499820 0.7881474
1.4.3 Part a
Which tree-based regression model gives the optimal resampling and test set performance?
Answer:
According to the result statistics, the cubist model has the lowest RMSE and the largest \(R^2\) in both the resampling and the test-set performance. It performs best among the four tree-based regression models.
#single tree
## RMSE Rsquared MAE
## 1.4857317 0.4525266 1.1270232
#rf
## RMSE Rsquared MAE
## 1.2303142 0.6837008 0.9528876
#gbm
gbModel$results[which(gbModel$results$n.trees==gbModel$bestTune$n.trees
& gbModel$results$interaction.depth==gbModel$bestTune$interaction.depth
& gbModel$results$shrinkage==gbModel$bestTune$shrinkage
& gbModel$results$n.minobsinnode==gbModel$bestTune$n.minobsinnode),]
## RMSE Rsquared MAE
## 1.2267536 0.6746691 0.9548075
#cubist
cubistModel$results[which(cubistModel$results$committees==cubistModel$bestTune$committees
& cubistModel$results$neighbors==cubistModel$bestTune$neighbors),]
## RMSE Rsquared MAE
## 1.0191358 0.7499820 0.7881474
1.4.4 Part b
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
Answer:
Looking at the importance list, ManufacturingProcess17 and ManufacturingProcess32 are the most important predictors in the cubist model. Manufacturing process variables dominate the list, taking 8 of the top 10 spots.
From HW7, the optimal linear regression model was the elastic net. Its top 10 predictors were mostly ManufacturingProcess variables (6 of the top 10).
From HW8, the optimal nonlinear regression model was the SVM. Its top 10 predictors were also mostly ManufacturingProcess variables (6 of the top 10).
The top 10 important predictors in the cubist model differ slightly from those of the optimal linear and nonlinear models: although all three lists are dominated by process variables, the cubist model has 8 process variables in its top 10.
The top 10 important predictors in the optimal linear and nonlinear models were ManufacturingProcess32, ManufacturingProcess13, BiologicalMaterial03, BiologicalMaterial06, ManufacturingProcess17, ManufacturingProcess09, BiologicalMaterial12, BiologicalMaterial02, ManufacturingProcess36, and ManufacturingProcess06.
## cubist variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess17 100.000
## ManufacturingProcess32 98.913
## BiologicalMaterial12 53.261
## ManufacturingProcess39 50.000
## ManufacturingProcess06 31.522
## ManufacturingProcess09 30.435
## BiologicalMaterial03 26.087
## ManufacturingProcess29 19.565
## ManufacturingProcess33 18.478
## ManufacturingProcess27 17.391
## ManufacturingProcess01 15.217
## ManufacturingProcess13 15.217
## ManufacturingProcess05 15.217
## ManufacturingProcess26 13.043
## ManufacturingProcess25 10.870
## ManufacturingProcess02 8.696
## BiologicalMaterial06 8.696
## BiologicalMaterial09 8.696
## ManufacturingProcess28 7.609
## BiologicalMaterial01 7.609
1.4.5 Part c
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
Answer:
The final single-tree plot shows how each split is decided and the percentage of samples falling into each node. It adds knowledge about the biological and process predictors and their relationship with yield by exposing the decision at each split. For example, the higher the value of ManufacturingProcess32, the higher the yield.
- The root node starts at a mean yield of 40 and covers 100% of the samples.
- The first split is at ManufacturingProcess32 < 0.19. If yes, the node's mean yield is 39, covering 56% of the samples; if no, it is 41, covering 44%.
- When ManufacturingProcess32 < 0.19, the second split is at BiologicalMaterial12 < -0.18. If yes, the mean yield becomes 39, covering 33% of the samples; if no, 40, covering 23%. These two percentages come from the parent node's 56%.
- When ManufacturingProcess32 >= 0.19, the second split is at ManufacturingProcess31 >= 0.14. If yes, the mean yield becomes 41, covering 15% of the samples; if no, 42, covering 28%. These two percentages come from the parent node's 44%.
## n= 124
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 124 391.81030 40.17169
## 2) ManufacturingProcess32< 0.191596 70 137.98860 39.18971
## 4) BiologicalMaterial12< -0.1808383 41 63.08785 38.58293 *
## 5) BiologicalMaterial12>=-0.1808383 29 38.46253 40.04759 *
## 3) ManufacturingProcess32>=0.191596 54 98.82214 41.44463
## 6) ManufacturingProcess31>=0.1415756 19 19.63712 40.59789 *
## 7) ManufacturingProcess31< 0.1415756 35 58.16786 41.90429 *
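The plot itself is not reproduced in this text version; a sketch of how it could be drawn, assuming the rpart.plot package:
#sketch: plot the final CART tree with node means and sample percentages
library(rpart.plot)
rpart.plot(stModel$finalModel)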