In this exercise we work with the simulated dataset from Exercise 7.2. We use the mlbench.friedman1 function from the mlbench library to simulate data for ten predictor variables \(V_1\) through \(V_{10}\) and one response variable \(y\) from the following nonlinear equation:
\[y = 10 \sin (\pi V_1 V_2) + 20 (V_3 -0.5)^2 + 10 V_4 + 5 V_5 + \epsilon\]
where the \(V_j\) are random variables uniformly distributed on [0, 1] and the error term \(\epsilon \sim N(0, \sigma^2)\) is normally distributed. Note that only the first five \(V_j\) variables enter the equation for the response \(y\); the remaining \(V_j\) variables are non-informative / noise variables. Our sample size is 200.
library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
For the first model, we fit a random forest model to the data using the randomForest function from the library of the same name. We use the following parameter values:
- mtry (number of randomly sampled variables considered at each split): default value, which in this case is 3
- ntree (number of trees to grow): 1000.

The variable importance scores are output below. We see that the most important variables in the random forest model are V1, V4, V2, and V5; none of the non-informative predictors (V6 through V10) are significant in the model.
set.seed(1012)
model1 <- randomForest(y ~ ., data = simulated,
importance = TRUE,
ntree = 1000, mtry = 3)
model1
##
## Call:
## randomForest(formula = y ~ ., data = simulated, importance = TRUE, ntree = 1000, mtry = 3)
## Type of random forest: regression
## Number of trees: 1000
## No. of variables tried at each split: 3
##
## Mean of squared residuals: 6.592795
## % Var explained: 72.96
plot(model1, main = "Error profile of random forest model1")
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1 %>% kable(digits = 3,
caption = "Variable importance scores for random forest model")
|  | Overall |
|---|---|
| V1 | 8.843 |
| V2 | 6.643 |
| V3 | 0.670 |
| V4 | 7.707 |
| V5 | 2.225 |
| V6 | 0.215 |
| V7 | -0.004 |
| V8 | -0.091 |
| V9 | 0.031 |
| V10 | -0.015 |
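Recall that the correlated predictors duplicate1 and duplicate2, referenced below, were added to the simulated data in the earlier part of this exercise; their construction is not repeated here. The sketch below shows the general approach, with illustrative noise scales that are assumptions rather than the exact values used earlier.
# illustrative sketch only: duplicate1 and duplicate2 are noisy copies of V1,
# with duplicate2 given more noise so that it is more weakly correlated with V1
# (the actual noise scales used earlier in this exercise may differ)
simulated$duplicate1 <- simulated$V1 + rnorm(200) * 0.1
simulated$duplicate2 <- simulated$V1 + rnorm(200) * 0.2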
Now we fit a random forest model with conditional inference trees to the data, using the cforest function from the party library. We use the controls argument to specify the same parameters used above for the base RF model:
- mtry: 3
- ntree: 1000.

We summarize the importance scores below, showing both the traditional and the conditional scores, denoted by “c”. We follow the same framework used above for the base RF model, in which we start with the 10 predictors, then add the first correlated variable (duplicate1), and then add the second correlated variable (duplicate2). Whether viewing the traditional or conditional importance scores, we see the same pattern observed for the base RF model:

- duplicate1 dilutes the importance score for V1
- duplicate2 dilutes the importance scores for V1 and duplicate1.

# first model without correlated variables
set.seed(1012)
cfor1 <- cforest(y ~ ., data = simulated[- (12:13)], # exclude duplicate1 & duplicate2
controls = cforest_unbiased(ntree = 1000, mtry = 3))
cfor1
##
## Random Forest using Conditional Inference Trees
##
## Number of trees: 1000
##
## Response: y
## Inputs: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10
## Number of observations: 200
cforImp1 <- varimp(cfor1, conditional = FALSE)
cforImp1c <- varimp(cfor1, conditional = TRUE)
cforImp <- cbind(cforImp1, cforImp1c)
# include duplicate1
set.seed(1012)
cfor2 <- cforest(y ~ ., data = simulated[- 13], # exclude duplicate2
controls = cforest_unbiased(ntree = 1000, mtry = 3))
cfor2
##
## Random Forest using Conditional Inference Trees
##
## Number of trees: 1000
##
## Response: y
## Inputs: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, duplicate1
## Number of observations: 200
cforImp2 <- varimp(cfor2, conditional = FALSE)
cforImp2c <- varimp(cfor2, conditional = TRUE)
cforImp <- merge(cforImp, cbind(cforImp2, cforImp2c), by = "row.names",
all = TRUE, sort = FALSE)
# include duplicate2
set.seed(1012)
cfor3 <- cforest(y ~ ., data = simulated,
controls = cforest_unbiased(ntree = 1000, mtry = 3))
cfor3
##
## Random Forest using Conditional Inference Trees
##
## Number of trees: 1000
##
## Response: y
## Inputs: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, duplicate1, duplicate2
## Number of observations: 200
cforImp3 <- varimp(cfor3, conditional = FALSE)
cforImp3c <- varimp(cfor3, conditional = TRUE)
cforImp <- merge(cforImp, cbind(cforImp3, cforImp3c),
by.x = "Row.names", by.y = "row.names",
all = TRUE, sort = FALSE)
# compile importance scores
colnames(cforImp) <- c("Variable",
paste0("CFOR", c("", "c", "_dup", "c_dup", "_dup2", "c_dup2")))
cforImp %>% kable(digits = 3,
caption = "Variable importance scores for conditional random forest model")
| Variable | CFOR | CFORc | CFOR_dup | CFORc_dup | CFOR_dup2 | CFORc_dup2 |
|---|---|---|---|---|---|---|
| V1 | 7.586 | 4.454 | 5.387 | 2.366 | 4.756 | 1.869 |
| V2 | 5.396 | 4.103 | 5.077 | 3.620 | 4.594 | 3.356 |
| V3 | 0.132 | 0.121 | 0.053 | 0.038 | 0.087 | 0.080 |
| V4 | 6.692 | 5.012 | 5.802 | 4.609 | 5.717 | 4.295 |
| V5 | 1.928 | 1.124 | 1.685 | 1.145 | 1.639 | 0.950 |
| V6 | -0.056 | -0.022 | -0.001 | 0.013 | 0.028 | 0.015 |
| V7 | 0.027 | -0.037 | -0.039 | -0.026 | -0.030 | -0.005 |
| V8 | 0.003 | 0.006 | -0.051 | -0.007 | -0.039 | -0.010 |
| V9 | -0.047 | 0.003 | -0.020 | -0.011 | -0.016 | -0.017 |
| V10 | -0.101 | -0.008 | -0.035 | -0.027 | -0.070 | -0.014 |
| duplicate1 | NA | NA | 3.509 | 1.734 | 2.558 | 1.119 |
| duplicate2 | NA | NA | NA | NA | 1.701 | 0.328 |
Next we repeat the same process with the boosted trees and Cubist models.
For the boosted trees model, we use the gbm function from the library of the same name. We specify the same number of trees used previously for the base RF and CFOR models, with other parameters set to their default values:
- n.trees: 1000
- interaction.depth: 1
- shrinkage: 0.1
- bag.fraction: 0.5.

From the table of variable importance scores below, we see the same pattern that was observed previously for the base RF and conditional RF models:

- duplicate1 dilutes the importance score for V1
- duplicate2 dilutes the importance scores for V1 and duplicate1.

### boosted trees model
# first model without correlated variables
set.seed(1012)
boost1 <- gbm(y ~ ., data = simulated[- (12:13)], # exclude duplicate1 & duplicate2
distribution = "gaussian",
n.trees = 1000)
boost1
## gbm(formula = y ~ ., distribution = "gaussian", data = simulated[-(12:13)],
## n.trees = 1000)
## A gradient boosted model with gaussian loss function.
## 1000 iterations were performed.
## There were 10 predictors of which 10 had non-zero influence.
boostImp1 <- summary(boost1, plotit = FALSE, order = FALSE)
# include duplicate1
set.seed(1012)
boost2 <- gbm(y ~ ., data = simulated[- 13], # exclude duplicate2
distribution = "gaussian",
n.trees = 1000)
boost2
## gbm(formula = y ~ ., distribution = "gaussian", data = simulated[-13],
## n.trees = 1000)
## A gradient boosted model with gaussian loss function.
## 1000 iterations were performed.
## There were 11 predictors of which 11 had non-zero influence.
boostImp2 <- summary(boost2, plotit = FALSE, order = FALSE)
boostImp <- merge(boostImp1, boostImp2, by = "var",
all = TRUE, sort = FALSE)
# include duplicate2
set.seed(1012)
boost3 <- gbm(y ~ ., data = simulated,
distribution = "gaussian",
n.trees = 1000)
boost3
## gbm(formula = y ~ ., distribution = "gaussian", data = simulated,
## n.trees = 1000)
## A gradient boosted model with gaussian loss function.
## 1000 iterations were performed.
## There were 12 predictors of which 12 had non-zero influence.
boostImp3 <- summary(boost3, plotit = FALSE, order = FALSE)
boostImp <- merge(boostImp, boostImp3, by = "var",
all = TRUE, sort = FALSE)
# compile importance scores
colnames(boostImp) <- c("Variable",
paste0("BOOST", c("", "_dup", "_dup2")))
boostImp %>% kable(digits = 3,
caption = "Variable importance scores for boosted trees model")
| Variable | BOOST | BOOST_dup | BOOST_dup2 |
|---|---|---|---|
| V1 | 22.335 | 17.547 | 16.978 |
| V2 | 19.978 | 19.802 | 19.371 |
| V3 | 10.048 | 10.107 | 10.176 |
| V4 | 25.167 | 24.933 | 24.850 |
| V5 | 11.465 | 10.971 | 10.748 |
| V6 | 2.501 | 2.459 | 2.145 |
| V7 | 3.147 | 2.763 | 2.404 |
| V8 | 1.514 | 1.361 | 1.474 |
| V9 | 1.862 | 1.954 | 1.744 |
| V10 | 1.983 | 1.756 | 1.635 |
| duplicate1 | NA | 6.347 | 6.229 |
| duplicate2 | NA | NA | 2.246 |
For the Cubist model, we use the cubist function from the Cubist library. We set the number of committees but use the default values for all other parameters, including:
- committees: 100
- neighbors: not used for fitting.

We see that the effect of adding the correlated predictors to the Cubist model is slightly different compared to the previous models:

- even before adding the correlated predictors, there is one non-informative variable (V6) that is borderline-significant; its importance score is almost 40% that of the weakest informative variable (V5)
- duplicate1 has almost no effect on the importance scores for V1, the other informative predictors, or the significant non-informative variable (V6); however, it causes a slight increase in the score for another of the non-informative predictors (V8). Interestingly, the importance score for duplicate1 itself remains insignificant.
- duplicate2 dilutes the importance score for V1 and, in addition, has a slight impact on the other informative variables. Again of interest, duplicate2 has a higher importance score than duplicate1 even though it is more weakly correlated with V1.

The relationship between adding the correlated predictors and their importance scores (as well as their impact on the importance scores of the other predictors) seems unusual and is worth further investigation.
### cubist model
# first model without correlated variables
set.seed(1012)
cub1 <- cubist(x = simulated[- (11:13)], # exclude duplicate1 & duplicate2
y = simulated$y,
committees = 100)
cub1
##
## Call:
## cubist.default(x = simulated[-(11:13)], y = simulated$y, committees = 100)
##
## Number of samples: 200
## Number of predictors: 10
##
## Number of committees: 100
## Number of rules per committee: 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4 ...
cubImp1 <- varImp(cub1)
# include duplicate1
set.seed(1012)
cub2 <- cubist(x = simulated[- c(11, 13)], # exclude duplicate2
y = simulated$y,
committees = 100)
cub2
##
## Call:
## cubist.default(x = simulated[-c(11, 13)], y = simulated$y, committees
## = 100)
##
## Number of samples: 200
## Number of predictors: 11
##
## Number of committees: 100
## Number of rules per committee: 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 2, 4 ...
cubImp2 <- varImp(cub2)
cubImp <- merge(cubImp1, cubImp2, by = "row.names",
all = TRUE, sort = FALSE)
# include duplicate2
set.seed(1012)
cub3 <- cubist(x = simulated[- 11],
y = simulated$y,
committees = 100)
cub3
##
## Call:
## cubist.default(x = simulated[-11], y = simulated$y, committees = 100)
##
## Number of samples: 200
## Number of predictors: 12
##
## Number of committees: 100
## Number of rules per committee: 1, 4, 1, 5, 1, 1, 4, 5, 5, 1, 4, 2, 4, 2, 6, 1, 4, 1, 4, 1 ...
cubImp3 <- varImp(cub3)
cubImp <- merge(cubImp, cubImp3,
by.x = "Row.names", by.y = "row.names",
all = TRUE, sort = FALSE)
# compile importance scores
colnames(cubImp) <- c("Variable",
paste0("CUBIST", c("", "_dup", "_dup2")))
cubImp %>% kable(digits = 3,
caption = "Variable importance scores for Cubist model")
| Variable | CUBIST | CUBIST_dup | CUBIST_dup2 |
|---|---|---|---|
| V1 | 71.5 | 71.0 | 65.0 |
| V3 | 47.0 | 46.5 | 49.0 |
| V2 | 58.5 | 59.0 | 62.0 |
| V4 | 48.0 | 48.0 | 46.0 |
| V5 | 33.0 | 32.5 | 30.5 |
| V6 | 13.0 | 12.0 | 12.0 |
| V7 | 0.0 | 0.0 | 0.0 |
| V8 | 0.0 | 1.0 | 4.0 |
| V9 | 0.0 | 0.0 | 0.0 |
| V10 | 0.0 | 0.0 | 0.0 |
| duplicate1 | NA | 0.0 | 1.0 |
| duplicate2 | NA | NA | 5.5 |
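As a quick first check (not part of the original output), we can verify that duplicate2 is indeed more weakly correlated with V1 than duplicate1 is:
# correlation of each correlated predictor with V1; duplicate2 should be the weaker
cor(simulated$V1, simulated$duplicate1)
cor(simulated$V1, simulated$duplicate2)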
We start by simulating several random variables with a sample size of 500:
Then we define the response variable \(y\) to be related to the informative variable \(x\) through a sine function with error term:
\[y = \sin(2 \pi x) + e\]
which is plotted below.
# random variables
set.seed(227)
x <- runif(500) # informative variable
z <- rnorm(500, sd = 2) # noise variable
e <- rnorm(500, sd = 4) # error
# response = function of x + error
f <- function(x) sin(2 * pi * x) + e
y <- f(x)
plot(x, y, main = "y = sin(2*pi*x) + e")
lines(sort(x), sin(2 * pi * sort(x)), col = 'red', lwd = 2)
Next we consider different granularity levels for the informative variable by rounding the x-values into different buckets, which is equivalent to partitioning the interval [0, 1] into different-sized increments (i.e., nearest increments of 1, 1/2, 1/3, etc.):
We can see from the pairs plot below that \(x_1\) rounds to binary data (0’s and 1’s), \(x_3\) rounds to the nearest third, and so on; by the time we get to \(x_{10}\), the distribution almost resembles the continuous distribution for \(x\).
# different granularity of informative variable
x1 <- round(x)
x2 <- round(2*x) / 2
x3 <- round(3*x) / 3
x5 <- round(5*x) / 5
x10 <- round(10*x) / 10
x20 <- round(20*x) / 20
x50 <- round(50*x) / 50
x100 <- round(100*x) / 100
xx <- data.frame(x1, x2, x3, x5, x10, x20, x50, x100, x)
# plot distributions
ggpairs(xx[c(1,3,4,5,9)]) + labs(title = "Pairs plot of selected x_j")
Now for different levels of granularity for \(x_j\), we do the following:
- compute the response \(y = f(x_j) = \sin(2 \pi x_j) + e\)
- fit a single regression tree of this response on \(x_j\) and the noise variable \(z\) (using the rpart function)
- record the variable importance scores for both predictors.

The results are shown below. We see that for \(x_1\) through \(x_5\), the variable importance scores are higher for the noise variable \(z\) than for the informative variable \(x_j\); whereas for \(x_{10}\) through \(x\), the pattern reverses. In fact, when \(x\) is binary (\(x_1\)) or rounded to the nearest 1/2 (\(x_2\)), the model is fitted solely on the \(z\) noise variable. This illustrates that for low levels of granularity (i.e., fewer distinct values) in the informative variable, the regression tree is biased toward selecting the continuous noise variable over the informative variable: a predictor with \(k\) distinct values offers only \(k - 1\) candidate split points, whereas the continuous noise variable offers nearly as many split points as there are observations.
imp <- matrix(nrow = 2, ncol = ncol(xx))
colnames(imp) <- colnames(xx)
rownames(imp) <- c("x", "z")
for (j in (1:ncol(xx))) {
fit <- rpart(f(xx[[j]]) ~ xx[[j]] + z)
imp[ , j] <- as.matrix(varImp(fit))
}
imp %>% kable(digits = 3,
caption = "Variable importance scores for different granularity levels of x")
|  | x1 | x2 | x3 | x5 | x10 | x20 | x50 | x100 | x |
|---|---|---|---|---|---|---|---|---|---|
| x | 0.000 | 0.001 | 0.033 | 0.034 | 0.032 | 0.052 | 0.061 | 0.285 | 0.116 |
| z | 0.022 | 0.022 | 0.049 | 0.047 | 0.011 | 0.040 | 0.048 | 0.095 | 0.097 |
Below we show the tree diagram for the case where \(x\) is rounded to the nearest 1/2 (\(x_2\)); in this instance, the tree splits are specified in terms of only the noise variable \(z\).
plot(as.party(rpart(f(x2) ~ x2 + z)), main = "Tree diagram for x2")
First let’s define terms, within the stochastic gradient boosting context:

- bagging fraction: the fraction of the training data that is randomly sampled (without replacement) to fit each tree in the boosting sequence
- learning rate (shrinkage): the fraction of each new tree’s predictions that is added to the running model at each iteration.
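In simplified form (for squared-error loss), the update at boosting iteration \(m\) is
\[f_m(x) = f_{m-1}(x) + \lambda \, T_m(x)\]
where \(T_m\) is a regression tree fit to the current residuals on a random subsample of the training data whose size is set by the bagging fraction, and \(\lambda\) is the learning rate.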
The model on the right has a higher bagging fraction (0.9) and higher learning rate (0.9) than the model on the left (where both parameters equal 0.1). This means that the model on the right is most likely over-fitting the training set: at each iteration the tree is fit using most (90%) of the training data, and most (90%) of that tree’s contribution is carried forward into the model. Because the same dominant predictors tend to be selected at every iteration under these settings, variable importance will be more concentrated in the second model (on the right) than in the first model (on the left).
For the same reasons given above, the model on the left (where both the bagging fraction and the learning rate equal 0.1) will most likely generalize better and have stronger predictive performance for new samples.
Increasing the interaction depth will most likely decrease the slope of the variable importance for either model. The reason for this is that increasing the interaction depth produces stronger / more complex learners at each iteration step, which will include more variables in each learner. As a result the final model will likely include a greater diversity of variables that are influential in the model, so that importance scores are less concentrated in the top predictors.
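As a rough illustration of these points (not part of the original exercise), the two parameter settings from the figure can be mimicked on the simulated Friedman data from the first exercise; the relative influence of the high bagging-fraction / high learning-rate fit should be concentrated in fewer predictors. The n.minobsinnode value below is an assumption, chosen so that the small 10% bags still allow splits.
# illustrative comparison only; settings mirror the figure (0.1/0.1 vs 0.9/0.9)
library(gbm)
set.seed(1012)
gbm_left <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 1000, interaction.depth = 1,
                shrinkage = 0.1, bag.fraction = 0.1, n.minobsinnode = 5)
gbm_right <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 n.trees = 1000, interaction.depth = 1,
                 shrinkage = 0.9, bag.fraction = 0.9, n.minobsinnode = 5)
# relative influence: expect a flatter profile for gbm_left, a steeper one for gbm_right
summary(gbm_left, plotit = FALSE)
summary(gbm_right, plotit = FALSE)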
In this exercise we work with the ChemicalManufacturingProcess dataset (from the AppliedPredictiveModeling package), using the caret library for model fitting. We apply the same imputation, data splitting, and pre-processing steps that we used in Exercise 6.3 (Homework 7) and Exercise 7.5 (Homework 8):

- impute missing values using bagged trees (preProcess function)
- split the data into training (75%) and test (25%) sets (createDataPartition function).

We then fit and evaluate seven tree-based models using the following algorithms and tuning parameters:

- CART: rpart library; tune the cp (complexity) parameter
- conditional inference tree: party library; tune the mincriterion (1 - P-value threshold) parameter
- M5 model tree: RWeka package; tune the pruned, smoothed, and rules parameters while holding the M (minimum sample size to split) parameter equal to 10
- bagged trees: ipred library; no parameter tuning, with 50 bootstrap samples
- random forest: randomForest library; tune mtry (number of randomly sampled predictors for each split) while holding the ntree (number of bootstrap samples) parameter equal to 500
- stochastic gradient boosting: gbm library; tune the n.trees (number of boosting iterations), interaction.depth (maximum tree depth), and shrinkage parameters while holding the n.minobsinnode (minimum terminal node size) parameter equal to 10
- Cubist: Cubist library; tune the committees (number of committees) and neighbors (number of nearest instances) parameters.

We specify a common resampling and tuning methodology for all models within the train function:

- pre-processing: center and scale the predictors and remove near-zero-variance predictors
- resampling: 10-fold cross-validation
- the optimal tuning parameters are selected based on the smallest resampled RMSE.
For each model, we show below the re-sampling output, the tuning profile, the final model summary, and the variable importance plot.
### load data
data("ChemicalManufacturingProcess")
#str(ChemicalManufacturingProcess)
# for ease of manipulation, split response and predictors & relabel columns
yield <- ChemicalManufacturingProcess[1]
cmp <- ChemicalManufacturingProcess[-1]
colnames(cmp) <- c(paste0("B", 1:12), paste0("M", 1:45))
#summary(yield)
#summary(cmp)
yield <- yield[[1]] # numeric
cmp <- as.matrix(cmp) # matrix
### impute missing values
set.seed(502)
impute <- preProcess(cmp, method = "bagImpute")
cmp <- predict(impute, cmp)
#summary(cmp)
### training/test split
set.seed(314)
train_idx <- createDataPartition(yield, p = 0.75, list = FALSE)
yield_train <- yield[train_idx]
yield_test <- yield[-train_idx]
cmp_train <- cmp[train_idx, ]
cmp_test <- cmp[-train_idx, ]
#summary(yield_train)
#summary(yield_test)
### pre-processing and resampling method
#prep <- c("BoxCox", "center", "scale", "nzv") <<< BOX-COX CAUSES NaN / Inf ERRORS
prep <- c("center", "scale", "nzv")
#ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5) <<< TAKES TOO LONG
ctrl <- trainControl(method = "cv", number = 10)
##############################################################################
# fit rpart model; tune complexity
##############################################################################
# fit & tune model
set.seed(1012)
rpartModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "rpart",
tuneLength = 15)
rpartModel
## CART
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## cp RMSE Rsquared MAE
## 0.00000000 1.644582 0.3969930 1.286968
## 0.02666366 1.642595 0.4024115 1.280128
## 0.05332731 1.558834 0.4224409 1.211878
## 0.07999097 1.591814 0.3893508 1.254702
## 0.10665463 1.595145 0.3481610 1.268602
## 0.13331829 1.628173 0.3350098 1.282216
## 0.15998194 1.629326 0.3338690 1.295754
## 0.18664560 1.629326 0.3338690 1.295754
## 0.21330926 1.629326 0.3338690 1.295754
## 0.23997291 1.629326 0.3338690 1.295754
## 0.26663657 1.629326 0.3338690 1.295754
## 0.29330023 1.629326 0.3338690 1.295754
## 0.31996389 1.629326 0.3338690 1.295754
## 0.34662754 1.704395 0.2909449 1.364751
## 0.37329120 1.842859 0.1814291 1.465252
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.05332731.
ggplot(rpartModel, highlight = TRUE) +
labs(title = paste0("Tuning profile: ", rpartModel$modelInfo$label))
# final model & variable importance
plot(as.party(rpartModel$finalModel),
main = paste0("Tree diagram: ", rpartModel$modelInfo$label))
ggplot(varImp(rpartModel), top = 20) +
labs(title = paste0("Variable importance: ", rpartModel$modelInfo$label))
# test set predictions
rpartPred <- predict(rpartModel, newdata = cmp_test)
rpartPerf <- postResample(pred = rpartPred, obs = yield_test)
##############################################################################
# fit ctree model; tune mincriterion
##############################################################################
# fit & tune model
set.seed(1012)
ctreeModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "ctree",
tuneLength = 15)
ctreeModel
## Conditional Inference Tree
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## mincriterion RMSE Rsquared MAE
## 0.01 1.520954 0.4542141 1.233950
## 0.08 1.516247 0.4583815 1.231029
## 0.15 1.518616 0.4553296 1.221299
## 0.22 1.536211 0.4436224 1.233415
## 0.29 1.526104 0.4370691 1.227696
## 0.36 1.549873 0.4229506 1.240391
## 0.43 1.561818 0.4088292 1.255816
## 0.50 1.561818 0.4088292 1.255816
## 0.57 1.569959 0.3981830 1.262282
## 0.64 1.559189 0.4067986 1.248650
## 0.71 1.571693 0.3962898 1.263112
## 0.78 1.575295 0.3919013 1.270514
## 0.85 1.631148 0.3515523 1.331550
## 0.92 1.624372 0.3538730 1.320035
## 0.99 1.647913 0.3331788 1.328366
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mincriterion = 0.08.
ggplot(ctreeModel, highlight = TRUE) +
labs(title = paste0("Tuning profile: ", ctreeModel$modelInfo$label))
# final model & variable importance
plot(ctreeModel$finalModel,
main = paste0("Tree diagram: ", ctreeModel$modelInfo$label))
ggplot(varImp(ctreeModel), top = 20) +
labs(title = paste0("Variable importance: ", ctreeModel$modelInfo$label))
# test set predictions
ctreePred <- predict(ctreeModel, newdata = cmp_test)
ctreePerf <- postResample(pred = ctreePred, obs = yield_test)
##############################################################################
# fit m5 model; tune pruned, smoothed & rules
##############################################################################
# fit & tune model
set.seed(1012)
m5Model <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "M5",
control = Weka_control(M = 10))
m5Model
## Model Tree
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## pruned smoothed rules RMSE Rsquared MAE
## Yes Yes Yes 1.379771 0.5343513 1.104119
## Yes Yes No 1.283890 0.5760348 1.074251
## Yes No Yes 1.390627 0.5345356 1.109995
## Yes No No 1.405642 0.5115142 1.147530
## No Yes Yes 1.411207 0.5346730 1.114986
## No Yes No 1.366413 0.5619962 1.108823
## No No Yes 1.655949 0.4389937 1.287995
## No No No 1.685049 0.3763196 1.286099
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were pruned = Yes, smoothed = Yes
## and rules = No.
plot(m5Model, main = paste0("Tuning profile: ", m5Model$modelInfo$label))
# final model & variable importance
m5Model$finalModel
## M5 pruned model tree:
## (using smoothed linear models)
##
## M32 <= 0.132 :
## | B12 <= -0.604 : LM1 (31/38.044%)
## | B12 > -0.604 :
## | | M17 <= -0.27 : LM2 (18/50.366%)
## | | M17 > -0.27 : LM3 (26/30.379%)
## M32 > 0.132 :
## | M6 <= 0.265 :
## | | B11 <= -0.349 :
## | | | B3 <= -0.342 : LM4 (4/52.545%)
## | | | B3 > -0.342 : LM5 (8/42.649%)
## | | B11 > -0.349 :
## | | | M31 <= -0.162 : LM6 (6/30.694%)
## | | | M31 > -0.162 :
## | | | | M15 <= 0.107 : LM7 (10/20.291%)
## | | | | M15 > 0.107 : LM8 (6/10.246%)
## | M6 > 0.265 : LM9 (23/47.538%)
##
## LM num: 1
## .outcome =
## 0.0508 * B3
## + 0.1542 * B9
## - 0.0376 * B11
## + 0.1444 * B12
## - 0.0752 * M2
## + 0.6592 * M6
## - 0.3502 * M17
## + 0.0626 * M30
## + 0.1609 * M32
## - 5.2393 * M39
## + 40.4592
##
## LM num: 2
## .outcome =
## 0.0508 * B3
## - 0.0376 * B11
## + 0.1231 * B12
## - 0.1564 * M2
## + 0.4254 * M6
## - 0.6111 * M17
## + 0.0626 * M30
## + 0.1609 * M32
## + 0.0377 * M39
## + 40.1661
##
## LM num: 3
## .outcome =
## 0.0508 * B3
## - 0.0376 * B11
## + 0.1231 * B12
## - 0.1373 * M2
## + 0.1254 * M6
## - 0.5497 * M17
## - 0.1513 * M22
## + 0.0626 * M30
## + 0.1609 * M32
## + 0.0377 * M39
## + 39.5935
##
## LM num: 4
## .outcome =
## 0.6197 * B3
## - 0.047 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## + 0.331 * M15
## - 2.0083 * M16
## - 0.2943 * M17
## + 0.0782 * M30
## - 0.159 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 40.8813
##
## LM num: 5
## .outcome =
## 0.5441 * B3
## - 0.047 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## + 0.331 * M15
## - 2.0083 * M16
## - 0.2943 * M17
## + 0.0782 * M30
## - 0.159 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 41.1235
##
## LM num: 6
## .outcome =
## 0.1848 * B3
## - 0.047 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## + 0.2415 * M15
## - 1.4655 * M16
## - 0.2943 * M17
## + 0.0782 * M30
## - 0.2472 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 40.9585
##
## LM num: 7
## .outcome =
## 0.1848 * B3
## - 0.047 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## + 0.1952 * M15
## - 1.2018 * M16
## - 0.2943 * M17
## + 0.0782 * M30
## - 0.2188 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 40.8325
##
## LM num: 8
## .outcome =
## 0.1848 * B3
## - 0.047 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## + 0.1864 * M15
## - 1.4655 * M16
## - 0.2943 * M17
## + 0.0782 * M30
## - 0.2188 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 40.7556
##
## LM num: 9
## .outcome =
## 0.2199 * B3
## + 0.1937 * B11
## + 0.0596 * B12
## + 0.0442 * M6
## - 0.7 * M17
## + 0.0782 * M30
## - 0.205 * M31
## + 0.2011 * M32
## + 0.0471 * M39
## + 41.3458
##
## Number of Rules : 9
#plot(m5Model$finalModel,
# main = paste0("Tree diagram: ", m5Model$modelInfo$label))
# error with tree diagram so plot manually: pruned = Yes, smoothed = Yes
m5plot <- M5P(yield_train ~ ., data = data.frame(yield_train, cmp_train),
control = Weka_control(M = 10, N = FALSE, U = FALSE))
plot(m5plot, main = "Tree diagram: M5 (raw predictors)")
ggplot(varImp(m5Model), top = 20) +
labs(title = paste0("Variable importance: ", m5Model$modelInfo$label))
# test set predictions
m5Pred <- predict(m5Model, newdata = cmp_test)
m5Perf <- postResample(pred = m5Pred, obs = yield_test)
##############################################################################
# fit bagged tree model; no tuning
##############################################################################
# fit & tune model
set.seed(1012)
bagModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "treebag",
nbagg = 50)
bagModel
## Bagged CART
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 1.274328 0.6045216 0.9850819
#ggplot(bagModel, highlight = TRUE) +
# labs(title = paste0("Tuning profile: ", bagModel$modelInfo$label))
# final model & variable importance
bagModel$finalModel
##
## Bagging regression trees with 50 bootstrap replications
ggplot(varImp(bagModel), top = 20) +
labs(title = paste0("Variable importance: ", bagModel$modelInfo$label))
# test set predictions
bagPred <- predict(bagModel, newdata = cmp_test)
bagPerf <- postResample(pred = bagPred, obs = yield_test)
##############################################################################
# fit random forest model; tune mtry
##############################################################################
# fit & tune model
rfGrid <- data.frame(mtry = seq(2, ncol(cmp) - 1, length = 10))
set.seed(1012)
rfModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "rf", importance = TRUE, ntree = 500,
tuneGrid = rfGrid)
rfModel
## Random Forest
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.256593 0.6753606 1.0280983
## 8 1.175725 0.6914582 0.9301040
## 14 1.162387 0.6859670 0.9118959
## 20 1.162259 0.6835265 0.9166543
## 26 1.163774 0.6805012 0.9122052
## 32 1.174409 0.6715481 0.9161034
## 38 1.183526 0.6647786 0.9201713
## 44 1.184037 0.6646573 0.9270198
## 50 1.193942 0.6577142 0.9323377
## 56 1.191157 0.6591370 0.9218105
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 20.
ggplot(rfModel, highlight = TRUE) +
labs(title = paste0("Tuning profile: ", rfModel$modelInfo$label))
# final model & variable importance
plot(rfModel$finalModel, main = paste0("Error profile: ", rfModel$modelInfo$label))
ggplot(varImp(rfModel), top = 20) +
labs(title = paste0("Variable importance: ", rfModel$modelInfo$label))
# test set predictions
rfPred <- predict(rfModel, newdata = cmp_test)
rfPerf <- postResample(pred = rfPred, obs = yield_test)
##############################################################################
# fit boosted tree model; tune interaction.depth, n.trees & shrinkage
##############################################################################
# fit & tune model
gbmGrid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
n.trees = seq(100, 1000, by = 100),
shrinkage = c(0.01, 0.1), n.minobsinnode = 10)
set.seed(1012)
gbmModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "gbm", verbose = FALSE,
tuneGrid = gbmGrid)
gbmModel
## Stochastic Gradient Boosting
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.trees RMSE Rsquared MAE
## 0.01 1 100 1.498650 0.5901162 1.2186318
## 0.01 1 200 1.321326 0.6312728 1.0622624
## 0.01 1 300 1.246167 0.6478029 0.9895680
## 0.01 1 400 1.208770 0.6628299 0.9533216
## 0.01 1 500 1.192092 0.6684800 0.9388970
## 0.01 1 600 1.181935 0.6710051 0.9288092
## 0.01 1 700 1.168935 0.6759267 0.9194513
## 0.01 1 800 1.156721 0.6819414 0.9051553
## 0.01 1 900 1.151742 0.6836909 0.8981264
## 0.01 1 1000 1.148110 0.6850797 0.8953117
## 0.01 3 100 1.384707 0.6146350 1.1281035
## 0.01 3 200 1.230898 0.6515550 0.9817653
## 0.01 3 300 1.179918 0.6699248 0.9284973
## 0.01 3 400 1.148254 0.6842008 0.9042542
## 0.01 3 500 1.129518 0.6927197 0.8898761
## 0.01 3 600 1.114233 0.6987633 0.8808828
## 0.01 3 700 1.104542 0.7026256 0.8739396
## 0.01 3 800 1.098531 0.7045647 0.8689153
## 0.01 3 900 1.092963 0.7069225 0.8648685
## 0.01 3 1000 1.089273 0.7083811 0.8656634
## 0.01 5 100 1.375900 0.6263831 1.1195141
## 0.01 5 200 1.240781 0.6477026 0.9917123
## 0.01 5 300 1.182059 0.6672434 0.9369984
## 0.01 5 400 1.151305 0.6816515 0.9085756
## 0.01 5 500 1.127197 0.6917683 0.8954229
## 0.01 5 600 1.115840 0.6958137 0.8863796
## 0.01 5 700 1.104935 0.6996919 0.8773043
## 0.01 5 800 1.097023 0.7033499 0.8716464
## 0.01 5 900 1.091912 0.7065807 0.8682391
## 0.01 5 1000 1.085987 0.7099831 0.8634081
## 0.01 7 100 1.369840 0.6349881 1.1146840
## 0.01 7 200 1.231200 0.6564487 0.9817977
## 0.01 7 300 1.183379 0.6705162 0.9348219
## 0.01 7 400 1.150104 0.6834322 0.9052179
## 0.01 7 500 1.126963 0.6926420 0.8841507
## 0.01 7 600 1.109488 0.7013380 0.8701628
## 0.01 7 700 1.099563 0.7046437 0.8617788
## 0.01 7 800 1.088393 0.7090501 0.8564584
## 0.01 7 900 1.085948 0.7092663 0.8568966
## 0.01 7 1000 1.080974 0.7117137 0.8536201
## 0.10 1 100 1.151952 0.6723921 0.9107895
## 0.10 1 200 1.139967 0.6800563 0.9062366
## 0.10 1 300 1.152613 0.6690931 0.9350862
## 0.10 1 400 1.152909 0.6669775 0.9494351
## 0.10 1 500 1.149212 0.6694937 0.9490025
## 0.10 1 600 1.160175 0.6671251 0.9497953
## 0.10 1 700 1.166699 0.6643580 0.9552319
## 0.10 1 800 1.167924 0.6637075 0.9548883
## 0.10 1 900 1.168252 0.6633959 0.9564782
## 0.10 1 1000 1.167384 0.6660688 0.9492150
## 0.10 3 100 1.139742 0.6938809 0.9008562
## 0.10 3 200 1.111269 0.6990786 0.8774888
## 0.10 3 300 1.105495 0.6971149 0.8730109
## 0.10 3 400 1.105066 0.6972192 0.8719661
## 0.10 3 500 1.105318 0.6970177 0.8733287
## 0.10 3 600 1.104597 0.6967659 0.8734629
## 0.10 3 700 1.104572 0.6969158 0.8727718
## 0.10 3 800 1.105370 0.6964025 0.8730812
## 0.10 3 900 1.105140 0.6965465 0.8728403
## 0.10 3 1000 1.105487 0.6962912 0.8728713
## 0.10 5 100 1.136620 0.6841275 0.8997597
## 0.10 5 200 1.138032 0.6802838 0.9018184
## 0.10 5 300 1.137156 0.6815714 0.9050817
## 0.10 5 400 1.137473 0.6806456 0.9055027
## 0.10 5 500 1.138379 0.6799225 0.9058674
## 0.10 5 600 1.138328 0.6798776 0.9066717
## 0.10 5 700 1.138298 0.6801197 0.9070434
## 0.10 5 800 1.138642 0.6799711 0.9074562
## 0.10 5 900 1.139345 0.6796498 0.9081762
## 0.10 5 1000 1.139497 0.6794466 0.9087549
## 0.10 7 100 1.134180 0.6851758 0.8871431
## 0.10 7 200 1.127551 0.6854471 0.8842233
## 0.10 7 300 1.126379 0.6852896 0.8849238
## 0.10 7 400 1.123417 0.6867847 0.8855397
## 0.10 7 500 1.121551 0.6874811 0.8838705
## 0.10 7 600 1.121040 0.6873256 0.8840421
## 0.10 7 700 1.120442 0.6877678 0.8839037
## 0.10 7 800 1.120144 0.6877076 0.8841339
## 0.10 7 900 1.120146 0.6876607 0.8842620
## 0.10 7 1000 1.120144 0.6875703 0.8843156
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000,
## interaction.depth = 7, shrinkage = 0.01 and n.minobsinnode = 10.
ggplot(gbmModel, highlight = TRUE) +
labs(title = paste0("Tuning profile: ", gbmModel$modelInfo$label))
# final model & variable importance
gbmModel$finalModel
## A gradient boosted model with gaussian loss function.
## 1000 iterations were performed.
## There were 56 predictors of which 55 had non-zero influence.
ggplot(varImp(gbmModel), top = 20) +
labs(title = paste0("Variable importance: ", gbmModel$modelInfo$label))
# test set predictions
gbmPred <- predict(gbmModel, newdata = cmp_test)
gbmPerf <- postResample(pred = gbmPred, obs = yield_test)
##############################################################################
# fit cubist model; tune committees & neighbors
##############################################################################
# fit & tune model
cubGrid <- expand.grid(committees = c(1:10, 15, 20),
neighbors = c(0, 1, 5, 9))
set.seed(1012)
cubModel <- train(x = cmp_train,
y = yield_train,
preProcess = prep,
trControl = ctrl,
method = "cubist",
tuneGrid = cubGrid)
cubModel
## Cubist
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (56), scaled (56), remove (1)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 117, 120, 120, 117, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.462101 0.5445383 1.0970479
## 1 1 1.544244 0.5505422 1.0726196
## 1 5 1.413605 0.5884390 1.0127159
## 1 9 1.415435 0.5757981 1.0311758
## 2 0 1.309348 0.5861699 1.0390401
## 2 1 1.266222 0.6314397 0.9606537
## 2 5 1.214394 0.6496652 0.9264805
## 2 9 1.231811 0.6392587 0.9517074
## 3 0 1.222276 0.6277949 0.9430797
## 3 1 1.206730 0.6483130 0.9280919
## 3 5 1.155068 0.6696490 0.8670846
## 3 9 1.159427 0.6663908 0.8898611
## 4 0 1.134353 0.6683059 0.9279043
## 4 1 1.161542 0.6783434 0.8898185
## 4 5 1.088536 0.7053267 0.8511029
## 4 9 1.091623 0.7026369 0.8761153
## 5 0 1.054738 0.7031342 0.8602937
## 5 1 1.114396 0.6884811 0.8542105
## 5 5 1.023758 0.7277670 0.7972337
## 5 9 1.028384 0.7263642 0.8180578
## 6 0 1.065000 0.6938469 0.8798195
## 6 1 1.116572 0.6829143 0.8557129
## 6 5 1.022836 0.7240784 0.8036790
## 6 9 1.028574 0.7217627 0.8287710
## 7 0 1.063161 0.6930327 0.8733267
## 7 1 1.124854 0.6811572 0.8660188
## 7 5 1.030245 0.7173850 0.8072162
## 7 9 1.032430 0.7173399 0.8216513
## 8 0 1.080282 0.6862694 0.8898215
## 8 1 1.117292 0.6888184 0.8594842
## 8 5 1.048409 0.7159723 0.8276938
## 8 9 1.043188 0.7145785 0.8317350
## 9 0 1.097547 0.6852917 0.9009156
## 9 1 1.133008 0.6864410 0.8694005
## 9 5 1.059014 0.7137707 0.8285612
## 9 9 1.062247 0.7113388 0.8418928
## 10 0 1.120652 0.6782645 0.9122784
## 10 1 1.142692 0.6850253 0.8768056
## 10 5 1.069575 0.7147424 0.8364076
## 10 9 1.074259 0.7112952 0.8512635
## 15 0 1.173084 0.6689301 0.9091464
## 15 1 1.196077 0.6731132 0.8635982
## 15 5 1.132594 0.7035524 0.8256041
## 15 9 1.134516 0.6975300 0.8545451
## 20 0 1.225224 0.6544912 0.9428941
## 20 1 1.247980 0.6654064 0.8984041
## 20 5 1.182058 0.6916699 0.8516604
## 20 9 1.180900 0.6846730 0.8809144
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 6 and neighbors = 5.
ggplot(cubModel, highlight = TRUE) +
labs(title = paste0("Tuning profile: ", cubModel$modelInfo$label))
# final model & variable importance
cubModel$finalModel
##
## Call:
## cubist.default(x = x, y = y, committees = param$committees)
##
## Number of samples: 132
## Number of predictors: 56
##
## Number of committees: 6
## Number of rules per committee: 2, 1, 3, 1, 4, 1
ggplot(varImp(cubModel), top = 20) +
labs(title = paste0("Variable importance: ", cubModel$modelInfo$label))
# test set predictions
cubPred <- predict(cubModel, newdata = cmp_test)
cubPerf <- postResample(pred = cubPred, obs = yield_test)
We evaluate predictive performance for the models based on the root mean squared error (RMSE), R-squared, and mean absolute error (MAE) metrics. We summarize these performance metrics in the table below based on the resampled holdout sets (prefixed as “res_”) as well as the test set (prefixed as “test_”) for all 7 tree-based models.
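For reference, with observed values \(y_i\), predictions \(\hat{y}_i\), and \(n\) samples, these metrics are
\[\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|,\]
and the postResample function reports R-squared as the square of the correlation between the observed and predicted values.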
We note several observations:

- the ensemble methods (BAG, RF, SGB, and CUB) outperform the single-tree CART and CIT models on the resampled metrics, with the M5 model tree in between
- the Cubist model has the best resampled metrics but the worst test set RMSE, suggesting that its resampling advantage does not carry over to the held-out samples
- based on the test set metrics, the random forest model performs best (lowest RMSE and MAE, highest R-squared).
We also plot below the resampled metrics for the various models.
# resampling & test set metrics
res_perf <- rbind(CART = colMeans(rpartModel$resample[-4]),
CIT = colMeans(ctreeModel$resample[-4]),
M5 = colMeans(m5Model$resample[-4]),
BAG = colMeans(bagModel$resample[-4]),
RF = colMeans(rfModel$resample[-4]),
SGB = colMeans(gbmModel$resample[-4]),
CUB = colMeans(cubModel$resample[-4]))
colnames(res_perf) <- paste0("res_", colnames(res_perf))
test_perf <- rbind(CART = rpartPerf, CIT = ctreePerf, M5 = m5Perf,
BAG = bagPerf, RF = rfPerf, SGB = gbmPerf, CUB = cubPerf)
colnames(test_perf) <- paste0("test_", colnames(test_perf))
cbind(res_perf, test_perf) %>%
kable(digits = 3,
caption = "Resampled & test set performance metrics")
|  | res_RMSE | res_Rsquared | res_MAE | test_RMSE | test_Rsquared | test_MAE |
|---|---|---|---|---|---|---|
| CART | 1.559 | 0.422 | 1.212 | 1.310 | 0.429 | 1.072 |
| CIT | 1.516 | 0.458 | 1.231 | 1.364 | 0.359 | 1.116 |
| M5 | 1.284 | 0.576 | 1.074 | 1.156 | 0.521 | 0.964 |
| BAG | 1.274 | 0.605 | 0.985 | 1.006 | 0.626 | 0.815 |
| RF | 1.162 | 0.684 | 0.917 | 0.972 | 0.661 | 0.795 |
| SGB | 1.081 | 0.712 | 0.854 | 1.020 | 0.632 | 0.849 |
| CUB | 1.023 | 0.724 | 0.804 | 1.495 | 0.416 | 0.880 |
# plot resampled metrics
list(CART = rpartModel, CIT = ctreeModel, M5 = m5Model,
BAG = bagModel, RF = rfModel, SGB = gbmModel, CUB = cubModel) %>%
resamples() %>%
bwplot(metric = c("RMSE", "Rsquared", "MAE"),
layout = c(3,1),
between = list(x = 1),
main = "Comparison of resampled performance metrics")
From the variable importance plot above, we see that the top 10 important predictors in the RF model include 6 manufacturing process and 4 biological variables: M32, M13, M17, M39, M9, and M36 (process), and B12, B3, B6, and B11 (biological).
The top 10 predictors are summarized in the table below for each of the 7 tree-based models as well as the SVM and elastic net (ENET) models, which were the optimal non-linear and linear models from the previous exercises. It is interesting to see that there is greater diversity of opinion among the tree-based models, compared to the output seen previously for the non-linear and linear models.
Reviewing variable importance for the RF model, we observe that:

- M32 is by far the most important predictor, followed by a mix of manufacturing process (M13, M17, M39, M9, M36) and biological (B12, B3, B6, B11) variables
- 8 of the RF model’s top 10 predictors also appear in the top 10 for the SVM and ENET models; the exceptions are M39 and B11, which take the places of M6 and B2.
top10 <- function(mod) {
varImp(mod)$importance %>%
mutate(Var = paste(rownames(varImp(mod)$importance),
round(Overall, 1), sep = " - ")) %>%
arrange(desc(Overall)) %>%
dplyr::select(Var) %>%
head(10)
}
prevtop10 <- c("M32", "M13", "M17", "B6", "B3", "M9", "M36", "M6", "B12", "B2")
cbind(CART = top10(rpartModel),
CIT = top10(ctreeModel),
M5 = top10(m5Model),
BAG = top10(bagModel),
RF = top10(rfModel),
SGB = top10(gbmModel),
CUB = top10(cubModel),
SVM = prevtop10,
ENET = prevtop10) %>%
kable(col.names = c("CART", "CIT", "M5", "BAG", "RF", "SGB", "CUB", "SVM", "ENET"),
caption = "Top 10 important variables")
| CART | CIT | M5 | BAG | RF | SGB | CUB | SVM | ENET |
|---|---|---|---|---|---|---|---|---|
| M17 - 100 | M32 - 100 | M32 - 100 | M17 - 100 | M32 - 100 | M32 - 100 | M32 - 100 | M32 | M32 |
| M11 - 62.4 | M13 - 89.9 | M13 - 89.9 | B12 - 94.7 | M13 - 61.5 | B12 - 35.1 | M13 - 80 | M13 | M13 |
| B12 - 57.9 | M17 - 82.6 | M17 - 82.6 | M9 - 91.3 | B12 - 58.2 | M13 - 31 | M17 - 61.1 | M17 | M17 |
| M13 - 49.6 | B6 - 70.9 | B6 - 70.9 | M32 - 89.4 | B3 - 52 | M6 - 24.5 | M29 - 40 | B6 | B6 |
| M25 - 49.4 | B3 - 63.9 | B3 - 63.9 | B6 - 76.9 | M17 - 49.8 | M9 - 20.8 | M18 - 37.8 | B3 | B3 |
| M30 - 39.8 | M9 - 62.9 | M9 - 62.9 | B3 - 74.1 | B6 - 47.9 | B3 - 16 | M4 - 35.6 | M9 | M9 |
| M18 - 37.7 | M36 - 61.4 | M36 - 61.4 | M6 - 71.4 | M39 - 41.9 | M17 - 15.4 | M25 - 35.6 | M36 | M36 |
| M32 - 37.2 | M6 - 60.2 | M6 - 60.2 | M31 - 62.5 | M9 - 41.3 | M31 - 14.8 | B8 - 27.8 | M6 | M6 |
| B11 - 27.4 | B12 - 59.6 | B12 - 59.6 | M13 - 62.3 | M36 - 39.9 | B9 - 13.5 | M20 - 22.2 | B12 | B12 |
| B3 - 25.3 | B2 - 51.2 | B2 - 51.2 | B11 - 50 | B11 - 39.8 | B6 - 8.7 | M10 - 17.8 | B2 | B2 |
The optimal single tree model based on the test set metrics above is the CART model (setting aside the M5 model tree, whose terminal nodes contain linear models rather than simple averages). The tree diagram is shown below; the tree splits and the distribution of yield in the terminal nodes indicate which predictor ranges are associated with higher or lower yield.
plot(as.party(rpartModel$finalModel),
main = paste0("Tree diagram: ", rpartModel$modelInfo$label))
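If desired, the split values and node means can also be read directly from the fitted rpart object (a simple supplementary check, not part of the original output):
# text listing of the CART splits and node means, plus rpart's built-in
# variable importance scores, to complement the tree diagram above
print(rpartModel$finalModel)
rpartModel$finalModel$variable.importance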