8.1. Recreate the simulated data from Exercise 7.2:
library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
- Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
library(caret)
library(vip)
model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
## Overall
## V1 8.732235404
## V2 6.415369387
## V3 0.763591825
## V4 7.615118809
## V5 2.023524577
## V6 0.165111172
## V7 -0.005961659
## V8 -0.166362581
## V9 -0.095292651
## V10 -0.074944788
Did the random forest model significantly use the uninformative predictors (V6 – V10)?
No. The uninformative predictors V6 through V10 do appear in the forest, but their importance scores are all close to zero (between about -0.17 and 0.17), far below those of V1, V2, V4, and V5.
- Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9460206
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
model2 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2
## Overall
## V1 5.69119973
## V2 6.06896061
## V3 0.62970218
## V4 7.04752238
## V5 1.87238438
## V6 0.13569065
## V7 -0.01345645
## V8 -0.04370565
## V9 0.00840438
## V10 0.02894814
## duplicate1 4.28331581
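Yes. V1's importance drops from 8.73 to 5.69, and duplicate1 receives a score of 4.28, so the importance that V1 carried alone is now shared between the two correlated predictors. As a result, V4 and V2 now rank above V1.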
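The second half of the question (adding another predictor that is also highly correlated with V1) can be checked with a sketch like the one below. It rebuilds the data so that it runs on its own. The names sim_df, duplicate2, rf3, and imp3 are assumptions introduced for illustration and are not part of the analysis above; no output is shown because this fit was not run here.

```r
library(mlbench)
library(randomForest)

# Recreate the simulated data with the same recipe used at the top of 8.1
set.seed(200)
sim <- mlbench.friedman1(200, sd = 1)
sim_df <- as.data.frame(cbind(sim$x, sim$y))
colnames(sim_df)[ncol(sim_df)] <- "y"

# Two noisy copies of V1; duplicate2 is a hypothetical second copy
sim_df$duplicate1 <- sim_df$V1 + rnorm(200) * 0.1
sim_df$duplicate2 <- sim_df$V1 + rnorm(200) * 0.1

rf3 <- randomForest(y ~ ., data = sim_df, importance = TRUE, ntree = 1000)

# Unscaled permutation importance (%IncMSE) for V1 and its two copies
imp3 <- importance(rf3, type = 1, scale = FALSE)
imp3[c("V1", "duplicate1", "duplicate2"), , drop = FALSE]
```

If the importance is diluted as it was with one copy, the three rows should each be smaller than V1's original score.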
- Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
library(party)
library(tidyverse)
cformod <- cforest(y ~ ., data = simulated)
varimp(cformod) %>%
  sort(decreasing = TRUE)
## V4 V2 duplicate1 V1 V5
## 7.6223892727 6.0579730772 5.0941897280 4.6171158805 1.7161194047
## V7 V9 V3 V6 V10
## 0.0465374951 0.0046062409 0.0003116115 -0.0289427183 -0.0310326410
## V8
## -0.0380965511
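The pattern is similar but not identical. V4 and V2 are still the two most important predictors, but duplicate1 (5.09) now ranks third, ahead of V1 (4.62), whereas the traditional random forest ranked V1 third and duplicate1 fourth. V3 also drops to essentially zero. As before, V6 through V10 are unimportant. Note that varimp was called with its default, conditional = FALSE, so these are the traditional (unconditional) importances.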
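The exercise also asks about the conditional version of the importance measure. A minimal sketch of that comparison follows, assuming a smaller forest (ntree = 100, mtry = 3) than the default to keep the conditional computation quick. Those settings and the names sim_df, cf, imp_marginal, and imp_conditional are assumptions for illustration, so the numbers will not match the output above.

```r
library(mlbench)
library(party)

# Recreate the simulated data, including the correlated copy of V1
set.seed(200)
sim <- mlbench.friedman1(200, sd = 1)
sim_df <- as.data.frame(cbind(sim$x, sim$y))
colnames(sim_df)[ncol(sim_df)] <- "y"
sim_df$duplicate1 <- sim_df$V1 + rnorm(200) * 0.1

# Smaller forest than the default; these control values are assumed
cf <- cforest(y ~ ., data = sim_df,
              controls = cforest_unbiased(ntree = 100, mtry = 3))

# Traditional permutation importance versus the conditional version
imp_marginal    <- varimp(cf, conditional = FALSE)
imp_conditional <- varimp(cf, conditional = TRUE)

round(cbind(marginal = imp_marginal, conditional = imp_conditional), 3)
```

Placing the two columns side by side makes it easy to see whether conditioning changes the relative standing of V1 and duplicate1.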
- Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
Boosting
## var rel.inf
## V4 V4 28.4060721
## V2 V2 24.7780725
## V1 V1 17.0887380
## V5 V5 10.1864440
## duplicate1 duplicate1 9.8217115
## V3 V3 8.8594164
## V7 V7 0.4766545
## V6 V6 0.3828911
## V8 V8 0.0000000
## V9 V9 0.0000000
## V10 V10 0.0000000
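For the boosted model, V4 and V2 are again the most important predictors and V1 is third. duplicate1 takes about 9.8% of the relative influence, so V1's importance is diluted here as well, and V6 through V10 remain unimportant.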
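The code that produced the boosting output is not shown above. A minimal sketch of how such a table can be obtained with gbm follows. The tuning values (n.trees, interaction.depth, shrinkage) and the names sim_df, gbm_fit, and gbm_imp are assumptions, so the relative influences will differ from those listed.

```r
library(mlbench)
library(gbm)

# Recreate the simulated data, including the correlated copy of V1
set.seed(200)
sim <- mlbench.friedman1(200, sd = 1)
sim_df <- as.data.frame(cbind(sim$x, sim$y))
colnames(sim_df)[ncol(sim_df)] <- "y"
sim_df$duplicate1 <- sim_df$V1 + rnorm(200) * 0.1

# Assumed tuning values; the original settings are not given
gbm_fit <- gbm(y ~ ., data = sim_df, distribution = "gaussian",
               n.trees = 1000, interaction.depth = 3, shrinkage = 0.01)

# Relative influence, normalized to sum to 100 and sorted in decreasing order
gbm_imp <- summary(gbm_fit, n.trees = 1000, plotit = FALSE)
gbm_imp
```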
Cubist
##
## Call:
## cubist.default(x = x, y = y, committees = param$committees)
##
##
## Cubist [Release 2.07 GPL Edition] Sun May 02 23:00:22 2021
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 200 cases (12 attributes) from undefined.data
##
## Model 1:
##
## Rule 1/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 1.936506]
##
## outcome = 0.269253 + 8.9 V4 + 7.1 V2 + 5.1 V5 + 4.8 V1 + 3.2 duplicate1
##
## Model 2:
##
## Rule 2/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 1.990785]
##
## outcome = 0.826137 + 9 V4 + 8.3 V1 + 7.3 V2 + 5.2 V5 - 3 V6
##
## Model 3:
##
## Rule 3/1: [105 cases, mean 13.381248, range 3.55596 to 23.3956, est err 2.029922]
##
## if
## V1 <= 0.7340099
## V3 <= 0.654213
## then
## outcome = 2.658355 - 12.6 V3 + 11.6 duplicate1 + 10.2 V4 + 7.8 V2
## + 2.4 V6 + 1.5 V1 + 0.5 V5
##
## Rule 3/2: [20 cases, mean 14.639552, range 8.442596 to 21.62877, est err 2.450924]
##
## if
## V1 > 0.7340099
## V2 <= 0.5403168
## then
## outcome = 2.108552 + 35 V2 + 10.4 V4 - 6 V3 + 1.3 duplicate1 + 0.8 V5
##
## Rule 3/3: [57 cases, mean 14.914219, range 4.888355 to 28.38167, est err 2.814725]
##
## if
## V1 <= 0.7340099
## V3 > 0.654213
## then
## outcome = -21.377814 + 25.2 V3 + 11.3 V4 + 11 V1 + 8.1 V2 + 7.1 V5
##
## Rule 3/4: [18 cases, mean 18.628002, range 13.07191 to 23.57269, est err 2.682001]
##
## if
## V1 > 0.7340099
## V2 > 0.5403168
## then
## outcome = 43.992161 - 34.9 V2 + 0.2 V4
##
## Model 4:
##
## Rule 4/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 2.058539]
##
## outcome = 0.1879 + 9.1 V4 + 7.9 V1 + 7 V5 + 7.2 V2 - 3.1 V6
##
## Model 5:
##
## Rule 5/1: [106 cases, mean 12.285650, range 3.55596 to 23.3956, est err 3.237101]
##
## if
## V2 <= 0.5403168
## then
## outcome = -7.104052 + 28.4 V2 + 12.9 duplicate1 + 7.9 V4 + 0.3 V5
##
## Rule 5/2: [105 cases, mean 13.381248, range 3.55596 to 23.3956, est err 2.238507]
##
## if
## V1 <= 0.7340099
## V3 <= 0.654213
## then
## outcome = 3.509951 - 13.8 V3 + 12.8 duplicate1 + 9.9 V4 + 7.7 V2
## + 2.6 V6 + 0.4 V1
##
## Rule 5/3: [57 cases, mean 14.914219, range 4.888355 to 28.38167, est err 2.820595]
##
## if
## V1 <= 0.7340099
## V3 > 0.654213
## then
## outcome = -20.764964 + 25.3 V3 + 11.4 V1 + 11.2 V4 + 8.2 V2 + 5.3 V5
##
## Rule 5/4: [18 cases, mean 18.628002, range 13.07191 to 23.57269, est err 2.651464]
##
## if
## V1 > 0.7340099
## V2 > 0.5403168
## then
## outcome = 43.756229 - 34.5 V2 + 0.2 V4
##
## Model 6:
##
## Rule 6/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 2.317699]
##
## outcome = 1.218663 + 12.6 V1 + 10 V4 + 8.5 V5 - 5.4 duplicate1 + 4.1 V2
## - 2.6 V6
##
## Model 7:
##
## Rule 7/1: [14 cases, mean 11.101607, range 5.325261 to 17.15359, est err 3.645042]
##
## if
## V2 <= 0.9183624
## duplicate1 <= 0.07016665
## then
## outcome = -2.763226 + 43.1 duplicate1 + 12.8 V2 + 10.6 V4
##
## Rule 7/2: [12 cases, mean 14.461304, range 7.444598 to 19.79759, est err 3.295416]
##
## if
## V2 > 0.9183624
## then
## outcome = 5.665264 + 11.8 duplicate1 + 2.1 V2 + 1.5 V4
##
## Rule 7/3: [100 cases, mean 14.512158, range 3.55596 to 28.38167, est err 2.334867]
##
## if
## V3 > 0.4459752
## duplicate1 > 0.07016665
## then
## outcome = -14.769383 + 18.5 V3 + 13.7 V2 + 9.7 V4 + 7 duplicate1
## + 2.6 V5
##
## Rule 7/4: [83 cases, mean 15.070425, range 5.784235 to 23.57269, est err 3.290021]
##
## if
## V3 <= 0.4459752
## duplicate1 > 0.07016665
## then
## outcome = 6.19234 - 21.2 V3 + 14.9 duplicate1 + 13.1 V2 - 8.1 V1
## + 7.7 V4
##
## Model 8:
##
## Rule 8/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 2.479266]
##
## outcome = 0.203753 + 13 V1 + 10 V4 + 9.3 V5 - 5.7 duplicate1 + 2.7 V2
##
## Model 9:
##
## Rule 9/1: [17 cases, mean 10.657375, range 5.325261 to 17.15359, est err 4.792915]
##
## if
## duplicate1 <= 0.07016665
## then
## outcome = -2.702351 + 43.8 duplicate1 + 15.2 V2 + 10.1 V4
##
## Rule 9/2: [78 cases, mean 13.686872, range 3.55596 to 28.38167, est err 2.377873]
##
## if
## V2 <= 0.7803221
## V3 > 0.4459752
## duplicate1 > 0.07016665
## then
## outcome = -15.617722 + 20.1 V3 + 16.6 V2 + 10.1 V4 + 6.1 duplicate1
##
## Rule 9/3: [68 cases, mean 14.288445, range 5.784235 to 23.57269, est err 3.086318]
##
## if
## V2 <= 0.7803221
## V3 <= 0.4459752
## duplicate1 > 0.07016665
## then
## outcome = 4.558539 - 18.8 V3 + 16.2 V2 + 13.9 duplicate1 + 8.1 V4
## - 6.6 V1 - 0.2 V6
##
## Rule 9/4: [17 cases, mean 16.810188, range 7.444598 to 25.01616, est err 4.524103]
##
## if
## V1 <= 0.671787
## V2 > 0.7803221
## V3 > 0.6060145
## then
## outcome = 5.227253 + 36.2 V3 - 28.3 V2 + 15.8 duplicate1 + 6.3 V4
##
## Rule 9/5: [52 cases, mean 16.931290, range 8.442596 to 26.94567, est err 4.433290]
##
## if
## V1 > 0.671787
## then
## outcome = 29.735355 - 24.5 V1 + 7.4 V4 + 0.9 V2 + 0.7 duplicate1
## - 0.2 V6
##
## Rule 9/6: [16 cases, mean 17.710804, range 9.597466 to 22.05247, est err 4.757797]
##
## if
## V1 <= 0.671787
## V2 > 0.7803221
## V3 <= 0.6060145
## then
## outcome = 39.25704 - 27 V2 + 14.4 V1 - 11.7 V3 + 1.8 duplicate1
##
## Model 10:
##
## Rule 10/1: [200 cases, mean 14.416183, range 3.55596 to 28.38167, est err 2.432866]
##
## outcome = -1.808888 + 13.8 V1 + 10.1 V4 + 9.8 V5 - 3.7 duplicate1
## + 3.4 V2
##
## Model 11:
##
## Rule 11/1: [110 cases, mean 14.087054, range 3.55596 to 28.38167, est err 2.779835]
##
## if
## V3 > 0.4459752
## then
## outcome = -12.561468 + 17.6 V3 + 14.5 V2 + 9.5 V4 + 5.4 duplicate1
##
## Rule 11/2: [90 cases, mean 14.818451, range 5.325261 to 23.57269, est err 3.266586]
##
## if
## V3 <= 0.4459752
## then
## outcome = 6.060055 - 18.2 V3 + 16.1 duplicate1 + 14 V2 - 10.5 V1 + 8 V4
## - 0.2 V6
##
## Rule 11/3: [41 cases, mean 17.110712, range 7.444598 to 25.01616, est err 6.489881]
##
## if
## V2 > 0.7803221
## then
## outcome = 8.034255 + 2.3 V2 + 1.6 V4 + 1.4 duplicate1 - 0.6 V6
##
## Rule 11/4: [35 cases, mean 17.380442, range 7.444598 to 25.01616, est err 2.897774]
##
## if
## V1 <= 0.7421085
## V2 > 0.7803221
## then
## outcome = 40.756111 - 31.7 V2 + 11.1 V1 + 1 V4 + 0.9 duplicate1 - 0.4 V6
##
## Model 12:
##
## Rule 12/1: [164 cases, mean 13.949840, range 3.55596 to 28.38167, est err 2.349439]
##
## if
## V1 <= 0.7514832
## then
## outcome = -4.684432 + 15.1 V1 + 11.1 V4 + 9.8 V5 + 5.1 V2
##
## Rule 12/2: [36 cases, mean 16.540636, range 8.442596 to 23.57269, est err 2.349583]
##
## if
## V1 > 0.7514832
## then
## outcome = 0.145388 + 10.2 V4 + 7.4 V5 + 7 V1 + 2.9 V2
##
## Model 13:
##
## Rule 13/1: [110 cases, mean 14.087054, range 3.55596 to 28.38167, est err 2.850824]
##
## if
## V3 > 0.4459752
## then
## outcome = -11.696705 + 18.1 V3 + 14 V2 + 8.9 V4 + 3.9 V1
##
## Rule 13/2: [90 cases, mean 14.818451, range 5.325261 to 23.57269, est err 3.362806]
##
## if
## V3 <= 0.4459752
## then
## outcome = 6.742455 - 20.9 V3 + 12.8 duplicate1 + 13.4 V2 + 8.3 V4
## - 6.6 V1
##
## Rule 13/3: [41 cases, mean 17.110712, range 7.444598 to 25.01616, est err 2.771123]
##
## if
## V2 > 0.7803221
## then
## outcome = 44.851998 - 33.1 V2 + 1.5 V4 + 1.2 duplicate1
##
## Model 14:
##
## Rule 14/1: [164 cases, mean 13.949840, range 3.55596 to 28.38167, est err 2.508220]
##
## if
## V1 <= 0.7514832
## then
## outcome = -4.057296 + 15.4 V1 + 11 V4 + 10.1 V5 + 3.1 V2 - 0.4 V6
## + 0.2 V3
##
## Rule 14/2: [36 cases, mean 16.540636, range 8.442596 to 23.57269, est err 2.458088]
##
## if
## V1 > 0.7514832
## then
## outcome = -1.343076 + 10.7 V4 + 8.6 V1 + 8.4 V5 - 2.1 V6 + 1.9 V2
## + 1.2 V3
##
## Model 15:
##
## Rule 15/1: [85 cases, mean 13.413920, range 3.55596 to 28.38167, est err 2.607504]
##
## if
## V2 <= 0.7803221
## V3 > 0.4459752
## then
## outcome = -11.744776 + 17.3 V3 + 15.7 V2 + 8.7 V4 + 3.7 V1
## + 0.2 duplicate1
##
## Rule 15/2: [74 cases, mean 14.074516, range 5.325261 to 23.57269, est err 3.106939]
##
## if
## V2 <= 0.7803221
## V3 <= 0.4459752
## then
## outcome = 6.285717 - 21.4 V3 + 15.1 V2 + 8.2 V4 + 6.9 duplicate1
##
## Rule 15/3: [41 cases, mean 17.110712, range 7.444598 to 25.01616, est err 2.792759]
##
## if
## V2 > 0.7803221
## then
## outcome = 43.111191 - 30.3 V2 + 1.6 V4 + 1.3 duplicate1
##
## Model 16:
##
## Rule 16/1: [164 cases, mean 13.949840, range 3.55596 to 28.38167, est err 2.496711]
##
## if
## V1 <= 0.7514832
## then
## outcome = -4.747534 + 14.9 V1 + 11.7 V4 + 9.9 V5 + 3.6 V2
## + 0.5 duplicate1 + 0.2 V3
##
## Rule 16/2: [36 cases, mean 16.540636, range 8.442596 to 23.57269, est err 2.689233]
##
## if
## V1 > 0.7514832
## then
## outcome = -2.046392 + 10.8 V4 + 7.5 V5 + 4.3 V1 + 3.1 duplicate1
## + 2.1 V2 + 1.4 V3
##
## Model 17:
##
## Rule 17/1: [85 cases, mean 13.413920, range 3.55596 to 28.38167, est err 2.580971]
##
## if
## V2 <= 0.7803221
## V3 > 0.4459752
## then
## outcome = -11.940892 + 18.3 V3 + 15.5 V2 + 8.3 V4 + 4 duplicate1
##
## Rule 17/2: [74 cases, mean 14.074516, range 5.325261 to 23.57269, est err 3.121916]
##
## if
## V2 <= 0.7803221
## V3 <= 0.4459752
## then
## outcome = 6.54601 - 21.9 V3 + 14.6 V2 + 7.7 duplicate1 + 7.9 V4
##
## Rule 17/3: [41 cases, mean 17.110712, range 7.444598 to 25.01616, est err 2.776035]
##
## if
## V2 > 0.7803221
## then
## outcome = 43.613615 - 31 V2 + 1.4 duplicate1 + 1.4 V4
##
## Model 18:
##
## Rule 18/1: [164 cases, mean 13.949840, range 3.55596 to 28.38167, est err 2.499548]
##
## if
## V1 <= 0.7514832
## then
## outcome = -5.392214 + 15.7 V1 + 12.2 V4 + 9.7 V5 + 4.4 V2
##
## Rule 18/2: [36 cases, mean 16.540636, range 8.442596 to 23.57269, est err 2.627912]
##
## if
## V1 > 0.7514832
## then
## outcome = -1.4426 + 11.2 V4 + 7.6 V5 + 7.1 V1 + 2.6 V2
##
## Model 19:
##
## Rule 19/1: [84 cases, mean 13.374577, range 3.55596 to 28.38167, est err 2.586975]
##
## if
## V2 <= 0.770291
## V3 > 0.4459752
## then
## outcome = -11.572484 + 18.8 V3 + 15 V2 + 7.7 V4 + 4 duplicate1
##
## Rule 19/2: [73 cases, mean 13.980781, range 5.325261 to 23.57269, est err 3.218686]
##
## if
## V2 <= 0.770291
## V3 <= 0.4459752
## then
## outcome = 6.635832 - 20.7 V3 + 14.2 V2 + 13.1 duplicate1 + 7.7 V4
## - 6.1 V1
##
## Rule 19/3: [43 cases, mean 17.190119, range 7.444598 to 25.01616, est err 2.792291]
##
## if
## V2 > 0.770291
## then
## outcome = 44.540363 - 31.6 V2 + 0.9 duplicate1 + 0.9 V4
##
## Model 20:
##
## Rule 20/1: [164 cases, mean 13.949840, range 3.55596 to 28.38167, est err 2.555268]
##
## if
## V1 <= 0.7514832
## then
## outcome = -5.959895 + 16 V1 + 12.6 V4 + 9.7 V5 + 4.8 V2
##
## Rule 20/2: [36 cases, mean 16.540636, range 8.442596 to 23.57269, est err 3.267745]
##
## if
## V1 > 0.7514832
## then
## outcome = 6.763101 + 10.5 V5 + 10.6 V4 - 3.4 V6 + 0.4 V1
##
##
## Evaluation on training data (200 cases):
##
## Average |error| 1.563071
## Relative |error| 0.39
## Correlation coefficient 0.92
##
##
## Attribute usage:
## Conds Model
##
## 37% 47% V3
## 35% 76% V1
## 25% 99% V2
## 8% 66% duplicate1
## 100% V4
## 62% V5
## 31% V6
##
##
## Time: 0.1 secs
## cubist variable importance
##
## Overall
## V2 100.00
## V1 89.52
## V4 80.65
## V3 67.74
## duplicate1 59.68
## V5 50.00
## V6 25.00
## V7 0.00
## V8 0.00
## V9 0.00
## V10 0.00
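In the Cubist model, V2 is the most important predictor (100), followed by V1 (89.52) and V4 (80.65). V1 therefore ranks second, higher than in the random forest, cforest, and boosted models, while duplicate1 is fifth (59.68). V6 receives a score of 25 despite being uninformative, and V7 through V10 have zero importance. These scores are on a 0 to 100 scale, so only their ordering is comparable with the other models.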
8.2. Use a simulation to show tree bias with different granularities.
library(rpart)
Y1 <- runif(1000, 2,800)
Y2 <- rnorm(1000, 2,30)
Y3 <- rnorm(1000, 1,1000)
y <- Y2 - Y1
df <- data.frame(Y1, Y2, Y3, y)
## Overall
## Y1 3.41469691
## Y2 0.43375419
## Y3 0.04630341
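Y1 dominates the importance scores (3.41, versus 0.43 for Y2 and 0.05 for Y3). Because y is computed as Y2 - Y1 and Y1 has by far the widest spread, much of this dominance reflects real signal rather than selection bias. All three predictors are also continuous, so they do not actually differ in granularity.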
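A cleaner way to isolate the effect of granularity is to make the response independent of every predictor and let the predictors differ only in their number of distinct values. The sketch below does this with single-split rpart trees and counts which predictor is chosen at the root. The predictor names, sample size, and number of repetitions are assumptions for illustration.

```r
library(rpart)

set.seed(624)
n <- 200      # rows per simulated data set
reps <- 200   # number of simulated data sets
vars <- c("x_low", "x_mid", "x_high")

root_var <- character(reps)
for (i in seq_len(reps)) {
  sim <- data.frame(
    x_low  = sample(0:1, n, replace = TRUE),   # 2 distinct values
    x_mid  = sample(1:10, n, replace = TRUE),  # 10 distinct values
    x_high = runif(n),                         # about n distinct values
    y      = rnorm(n)                          # unrelated to every predictor
  )
  # cp = 0 keeps any split that improves the fit; maxdepth = 1 stops at the root
  stump <- rpart(y ~ ., data = sim,
                 control = rpart.control(maxdepth = 1, cp = 0, xval = 0))
  # The first row of `frame` is the root node; `var` names its split variable
  root_var[i] <- as.character(stump$frame$var[1])
}

# How often each predictor was chosen for the root split
root_counts <- table(factor(root_var, levels = c(vars, "<leaf>")))
root_counts
```

Since none of the predictors carries any information about y, an unbiased procedure would pick each of them about equally often; a strong preference for x_high reflects the larger number of candidate split points it offers.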
8.3. In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
- Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?
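The model on the right uses a learning rate of 0.9 and a bagging fraction of 0.9, versus 0.1 for both on the left. With a high learning rate, each tree contributes almost all of its fit, so the first few trees, and the predictors they split on, account for most of the reduction in error. With a high bagging fraction, every tree sees nearly the same data and keeps choosing the same strong predictors. Together these settings concentrate importance on a few predictors, whereas the slower, more randomized model on the left gives many predictors a chance to contribute.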
- Which model do you think would be more predictive of other samples?
The model on the left is likely to be more predictive of other samples. Its small learning rate and bagging fraction make it less greedy and less prone to overfitting, and it draws on more predictors rather than depending on just a few.
- How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Increasing interaction depth would flatten the slope for either model: deeper trees split on more predictors, so importance is spread across more of them.
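The effect of the two parameters on how concentrated the importance is can be explored with a small experiment. The sketch below uses simulated Friedman data instead of the solubility data, sets both parameters to the same value as in Figure 8.24, and reports the share of relative influence held by the top two predictors. The data set, the helper fit_gbm, and the number of trees are assumptions, and how sharp the contrast is will depend on the data.

```r
library(mlbench)
library(gbm)

# 1000 rows so that a bagging fraction of 0.1 still leaves enough data per tree
set.seed(200)
sim <- mlbench.friedman1(1000, sd = 1)
sim_df <- as.data.frame(cbind(sim$x, sim$y))
colnames(sim_df)[ncol(sim_df)] <- "y"

# Hypothetical helper: use one value for both learning rate and bagging fraction
fit_gbm <- function(rate) {
  gbm(y ~ ., data = sim_df, distribution = "gaussian",
      n.trees = 100, interaction.depth = 1,
      shrinkage = rate, bag.fraction = rate)
}

imp_low  <- summary(fit_gbm(0.1), plotit = FALSE)
imp_high <- summary(fit_gbm(0.9), plotit = FALSE)

# Share of total relative influence held by the two top-ranked predictors
c(low = sum(head(imp_low$rel.inf, 2)), high = sum(head(imp_high$rel.inf, 2)))
```

Changing interaction.depth in fit_gbm gives a quick way to check the third part of the question as well.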
8.7. Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
library(AppliedPredictiveModeling)
data("ChemicalManufacturingProcess")
library(tidyverse)
preP <- preProcess(ChemicalManufacturingProcess,
method = c( "knnImpute", "center", "scale"))
df <- predict(preP, ChemicalManufacturingProcess)
## Restore the response variable values to original
df$Yield = ChemicalManufacturingProcess$Yield
## Split the data into a training and a test set
trainRows <- createDataPartition(df$Yield, p = .80, list = FALSE)
df.train <- df[trainRows, ]
df.test <- df[-trainRows, ]
colYield <- which(colnames(df) == "Yield")
trainingX <- df.train[, -colYield]
trainingY <- df.train$Yield
testingX <- df.test[, -colYield]
testingY <- df.test$Yield
- Which tree-based regression model gives the optimal resampling and test set performance?
Random Forest
rfmod <- randomForest(trainingY ~ ., data = trainingX,
                      importance = TRUE,
                      ntree = 1000)
rfpredict <- predict(rfmod, newdata = testingX)
Bagging
Boosted
Cubist
cbmod <- train(trainingX, trainingY, method = "cubist")
cbpredict <- predict(cbmod, newdata = testingX)
library(kableExtra)
resample<-rbind(
"Random Forest" = postResample(pred = predict(rfmod), obs = trainingY),
"Bagging" = postResample(pred = predict(cfmod), obs = trainingY),
"Boosting" = postResample(pred = predict(gbmod), obs = trainingY),
"Cubist" = postResample(pred = predict(cbmod), obs = trainingY)
)
## Using 100 trees...
| | RMSE | Rsquared | MAE |
|---|---|---|---|
| Random Forest | 1.1725937 | 0.6395192 | 0.8881461 |
| Bagging | 1.0986633 | 0.7404666 | 0.8432799 |
| Boosting | 0.8297382 | 0.8155654 | 0.6206663 |
| Cubist | 0.1892204 | 0.9921570 | 0.1458568 |
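Cubist has the lowest error here (RMSE 0.19, R-squared 0.99), followed by boosting. These figures come from predicting the training data rather than from cross-validated resamples, so they are optimistic, and the near-perfect Cubist fit in particular should not be read as its expected performance.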
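The difference between a resampled estimate and an estimate from predicting the training rows can be seen in a small, self-contained sketch. It uses simulated data and a single rpart tree through caret, with 10-fold cross-validation as an assumed resampling scheme. The data and the names ctrl, tree_fit, cv_perf, and apparent_perf are illustrative and do not come from the analysis above.

```r
library(caret)

# Small simulated regression problem (illustrative only)
set.seed(100)
n <- 150
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- 2 * x$a - x$b + rnorm(n)

# 10-fold cross-validation as the resampling scheme
ctrl <- trainControl(method = "cv", number = 10)
tree_fit <- train(x, y, method = "rpart", trControl = ctrl)

# Cross-validated performance for the selected tuning value
cv_perf <- getTrainPerf(tree_fit)

# Apparent performance: predictions on the same rows used for fitting
apparent_perf <- postResample(pred = predict(tree_fit, x), obs = y)

cv_perf
apparent_perf
```

Applied to the models in this exercise, getTrainPerf on each train object would give the resampling comparison the question asks for.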
predictcmp<-rbind(
"Random Forest" = postResample(pred = rfpredict, obs = testingY),
"Bagging" = postResample(pred = cfpredict, obs = testingY),
"Boosting" = postResample(pred = gbpredict, obs = testingY),
"Cubist" = postResample(pred = cbpredict, obs = testingY)
)
predictcmp %>%
kable() %>%
kable_styling()
| | RMSE | Rsquared | MAE |
|---|---|---|---|
| Random Forest | 0.956162 | 0.6409486 | 0.7509213 |
| Bagging | 1.119648 | 0.5107594 | 0.9052674 |
| Boosting | 1.198602 | 0.4794783 | 0.8598609 |
| Cubist | 0.901844 | 0.6769447 | 0.7176895 |
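Cubist also performs best on the test set (RMSE 0.90, R-squared 0.68), with random forest close behind (RMSE 0.96, R-squared 0.64).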
- Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
## cubist variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess09 72.92
## ManufacturingProcess17 54.17
## ManufacturingProcess13 47.92
## BiologicalMaterial06 46.88
## BiologicalMaterial02 26.04
## BiologicalMaterial03 26.04
## ManufacturingProcess33 26.04
## ManufacturingProcess04 25.00
## ManufacturingProcess29 20.83
## ManufacturingProcess01 18.75
## ManufacturingProcess19 17.71
## ManufacturingProcess36 17.71
## ManufacturingProcess26 16.67
## ManufacturingProcess25 16.67
## ManufacturingProcess37 15.62
## BiologicalMaterial04 14.58
## ManufacturingProcess27 14.58
## BiologicalMaterial10 12.50
## ManufacturingProcess14 12.50
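ManufacturingProcess32 is the most important predictor, and the process variables dominate the list: 7 of the top 10 and 15 of the top 20 predictors are manufacturing process variables. Process variables were also the most important in the nonlinear models.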
- Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
Yes. It shows even more clearly that the manufacturing process predictors have a stronger relationship with yield than the biological predictors.