library(tidyverse)
library(mlbench)
library(randomForest)
library(caret)
library(party)
library(gbm)
library(Cubist)
library(rpart)
library(AppliedPredictiveModeling)
library(RWeka)
Recreate the simulated data from Exercise 7.2
set.seed(624)
simulated <- mlbench.friedman1(200, sd = 1) #200 observations
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
set.seed(624)
model1 <- randomForest(y ~ .,
data = simulated,
importance = TRUE,
ntree = 1000)
varImp(model1, scale = FALSE)
## Overall
## V1 7.06138525
## V2 4.76962217
## V3 1.01126851
## V4 9.88171245
## V5 2.05197889
## V6 0.09245359
## V7 -0.05564489
## V8 -0.07717705
## V9 0.03891580
## V10 -0.06427946
No, the uninformative predictors (V6-V10) are not important to this model.
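The conditional-importance output below also scores two duplicate predictors (duplicate1 and duplicate2) whose creation is not shown in the chunks above, even though a later chunk removes them. A minimal sketch of how they were presumably added, assuming they follow the exercise's construction of predictors highly correlated with V1; the noise scales are assumptions, not the original code.
# Hypothetical reconstruction -- the chunk that created these columns is not shown above.
# The exercise adds predictors highly correlated with V1; the noise multipliers are guesses.
set.seed(624)
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .2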
set.seed(624)
model4 <- cforest(y ~ ., data = simulated)
varimp(model4, conditional = TRUE)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 duplicate1 duplicate2
## 4.811084012 2.263260969 0.117301895 4.235852150 1.307494550 0.019542410 -0.001326794 -0.026687589 0.024102142 0.025047531 1.008603284 0.066807840
The use of conditional inference trees results in a model that places less importance on each of the informative predictors but still places little to no importance on the uninformative ones. V1 remains the most important, but with a lower score of approximately 4.81. The importance scores of V2, V4, and the first duplicate each decrease by roughly two points relative to their counterparts in the prior model.
Boosted Trees
set.seed(624)
simulated <- simulated %>% select(-c(duplicate1, duplicate2))
boosted_model1 <- gbm(y ~ .,
data = simulated,
distribution = "gaussian")
summary.gbm(boosted_model1)
## var rel.inf
## V4 V4 36.466310
## V1 V1 25.198851
## V2 V2 17.098097
## V5 V5 12.021480
## V3 V3 9.215263
## V6 V6 0.000000
## V7 V7 0.000000
## V8 V8 0.000000
## V9 V9 0.000000
## V10 V10 0.000000
A boosted trees model focuses on the informative predictors V4, V1, V2, V5, and V3, in descending order of relative importance.
set.seed(624)
simulated$duplicate1 <- simulated$V4 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V4)
## [1] 0.9439399
boosted_model2 <- gbm(y ~ .,
data = simulated,
distribution = "gaussian")
summary.gbm(boosted_model2)
## var rel.inf
## V4 V4 34.9746629
## V1 V1 25.4830902
## V2 V2 16.6363163
## V5 V5 11.8536331
## V3 V3 8.5933492
## duplicate1 duplicate1 1.9040810
## V10 V10 0.2088589
## V9 V9 0.1948116
## V8 V8 0.1511967
## V6 V6 0.0000000
## V7 V7 0.0000000
Adding a predictor (duplicate1) highly correlated with V4 results in no change in the ordering of the informative predictors, though the duplicate ranks above the uninformative predictors. As with the random forest model, the correlated predictor siphons off a little of the importance of the predictor it duplicates: V4's relative influence drops slightly, and most of the other informative predictors shift down as well.
Cubist
set.seed(624)
simulated <- simulated %>% select(-c(duplicate1))
cubist_model1 <- cubist(simulated[1:10], simulated$y)
summary(cubist_model1)
##
## Call:
## cubist.default(x = simulated[1:10], y = simulated$y)
##
##
## Cubist [Release 2.07 GPL Edition] Wed Nov 18 18:41:29 2020
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 200 cases (11 attributes) from undefined.data
##
## Model:
##
## Rule 1: [53 cases, mean 10.425206, range 1.93532 to 17.54472, est err 1.407606]
##
## if
## V1 <= 0.2136574
## V4 <= 0.934455
## then
## outcome = 2.358711 + 13.5 V1 + 9.7 V4 + 5 V5 - 2 V3 + 1.1 V2
##
## Rule 2: [55 cases, mean 12.290461, range 4.176228 to 22.66149, est err 1.248912]
##
## if
## V1 > 0.2136574
## V2 <= 0.4577003
## V3 > 0.07063606
## V4 <= 0.934455
## then
## outcome = -1.676048 + 18.6 V2 + 9 V4 + 6.5 V1 + 4 V5 - 0.7 V3
##
## Rule 3: [20 cases, mean 14.479267, range 10.15909 to 21.36569, est err 1.369244]
##
## if
## V1 > 0.2136574
## V1 <= 0.5269587
## V2 > 0.4577003
## V4 <= 0.934455
## V5 <= 0.5558813
## then
## outcome = 10.405348 + 8.7 V4 - 2.8 V9 - 1.9 V7 + 1.7 V10 + 1.4 V5
## + 1.1 V2 + 0.6 V1 - 0.3 V3
##
## Rule 4: [20 cases, mean 16.767288, range 10.33381 to 23.71413, est err 1.739599]
##
## if
## V1 > 0.5269587
## V2 > 0.4577003
## V4 <= 0.934455
## V5 <= 0.5558813
## then
## outcome = 31.8521 - 12.9 V1 - 12.4 V2 + 8.7 V4 + 0.5 V5
##
## Rule 5: [10 cases, mean 18.189348, range 10.8877 to 22.12831, est err 0.689396]
##
## if
## V1 > 0.2136574
## V2 <= 0.4577003
## V3 <= 0.07063606
## then
## outcome = 2.862449 + 16.9 V2 + 7.9 V4 + 6.6 V1 + 3.5 V5 - 2.8 V3
##
## Rule 6: [32 cases, mean 18.451132, range 10.53267 to 23.19134, est err 1.462592]
##
## if
## V1 > 0.2136574
## V2 > 0.4577003
## V4 <= 0.934455
## V5 > 0.5558813
## then
## outcome = 10.412436 + 7.9 V4 + 2.6 V2 + 2.4 V5 + 1.6 V1 - 0.8 V3
##
## Rule 7: [10 cases, mean 19.617697, range 15.73031 to 24.72045, est err 0.778461]
##
## if
## V4 > 0.934455
## then
## outcome = 125.80106 - 110.9 V4 + 2.9 V2 + 0.2 V5 + 0.2 V1
##
##
## Evaluation on training data (200 cases):
##
## Average |error| 1.355614
## Relative |error| 0.34
## Correlation coefficient 0.94
##
##
## Attribute usage:
## Conds Model
##
## 95% 100% V1
## 95% 100% V4
## 68% 100% V2
## 36% 100% V5
## 32% 85% V3
## 10% V7
## 10% V9
## 10% V10
##
##
## Time: 0.0 secs
A Cubist model focuses on the informative predictors V1, V4, V2, V5, and V3, in descending order of attribute usage; V1 and V4 swap places relative to the boosted trees models. A few uninformative predictors (V7, V9, and V10) do appear in the rule models, but with only 10% usage each.
set.seed(624)
simulated$duplicate1 <- simulated$V4 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V4)
## [1] 0.9439399
cubist_model2 <- cubist(simulated[-11], simulated$y)
summary(cubist_model2)
##
## Call:
## cubist.default(x = simulated[-11], y = simulated$y)
##
##
## Cubist [Release 2.07 GPL Edition] Wed Nov 18 18:41:29 2020
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 200 cases (12 attributes) from undefined.data
##
## Model:
##
## Rule 1: [53 cases, mean 10.425206, range 1.93532 to 17.54472, est err 1.407606]
##
## if
## V1 <= 0.2136574
## V4 <= 0.934455
## then
## outcome = 2.358711 + 13.5 V1 + 9.7 V4 + 5 V5 - 2 V3 + 1.1 V2
##
## Rule 2: [65 cases, mean 13.197982, range 4.176228 to 22.66149, est err 1.376401]
##
## if
## V1 > 0.2136574
## V2 <= 0.4577003
## V4 <= 0.934455
## then
## outcome = -1.180513 + 19.2 V2 + 9 V4 + 7.5 V1 + 4 V5 - 3.1 V3
##
## Rule 3: [72 cases, mean 16.880102, range 10.15909 to 23.71413, est err 1.842202]
##
## if
## V1 > 0.2136574
## V2 > 0.4577003
## V4 <= 0.934455
## then
## outcome = 7.40266 + 8.8 V4 + 5.5 V5 + 2.6 V2 + 1.8 V1 - 0.8 V3
##
## Rule 4: [10 cases, mean 19.617697, range 15.73031 to 24.72045, est err 0.778461]
##
## if
## V4 > 0.934455
## then
## outcome = 125.80106 - 110.9 V4 + 2.9 V2 + 0.2 V5 + 0.2 V1
##
##
## Evaluation on training data (200 cases):
##
## Average |error| 1.678127
## Relative |error| 0.42
## Correlation coefficient 0.90
##
##
## Attribute usage:
## Conds Model
##
## 100% 100% V4
## 95% 100% V1
## 68% 100% V2
## 100% V5
## 95% V3
##
##
## Time: 0.0 secs
Adding a predictor (duplicate1) highly correlated with V4 results in a new model that uses only the informative predictors; neither the duplicate nor any of the uninformative predictors appears. More specifically, the new model places the greatest usage on V4, V1, and V2.
Use a simulation to show tree bias with different granularities.
set.seed(624)
x1 <- seq(0.01, 1, 0.01)
x2 <- rep(1:2, each = 50)
x3 <- sample(c(seq(1, 50, 1), rep(1:2, each = 25)))
y <- x2 + rnorm(100)
varImp(rpart(y ~ x1 + x2 + x3))
## Overall
## x1 0.5268857
## x2 0.2619461
## x3 0.3989453
The simulation includes three predictors: x1, with 100 unique values; x2, with 2 unique values; and x3, with 50 unique values but a skewed frequency distribution (the values 1 and 2 are heavily repeated). The response y is created from x2 plus random noise. Using CART to model y on the three predictors produces a model that places the greatest importance on x1, which has the most distinct values, and the least importance on x2, which generated y but has the fewest distinct values. This tendency to favor splits on more granular predictors, even uninformative ones, illustrates the selection bias of trees.
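To make the bias more visible than a single fit allows, the data generation can be repeated and the predictor chosen for the first split tallied across replicates. This sketch extends the simulation above (it is not part of the original assignment); x1 should tend to win the root split disproportionately often even though only x2 informs y.
# Repeat the simulation and record which predictor rpart chooses for the root split.
# "<leaf>" marks replicates where no split was made.
set.seed(624)
first_split <- replicate(200, {
  dat <- data.frame(x1 = seq(0.01, 1, 0.01),
                    x2 = rep(1:2, each = 50),
                    x3 = sample(c(seq(1, 50, 1), rep(1:2, each = 25))))
  dat$y <- dat$x2 + rnorm(100)
  fit <- rpart(y ~ ., data = dat)
  as.character(fit$frame$var[1])
})
table(first_split)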
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9.
The model on the right uses extremely high values for both the bagging fraction--the proportion of the training data sampled to fit each iteration's tree--and the learning rate--the fraction of each new tree's prediction that is added to the running model. With both set to 0.9, every tree sees nearly the same data and each tree contributes heavily, so the model commits quickly to the structure found in the earliest iterations. Because boosting greedily selects the best split at each stage, those early, dominant trees concentrate on the same few strong predictors, and their importance is reinforced in subsequent stages.
By comparison, the left-hand model uses a much smaller bagging fraction and a much smaller (and typically more effective) learning rate. A lower proportion of the training data is used at each stage, and each stage contributes only a small correction to the previous predictions. As a result, importance is spread across a more diverse set of predictors, because different stages see different subsets of the data and no single tree dominates. A rough illustration of the two settings appears in the sketch below.
I expect the model with both parameters set to 0.1 to be more predictive of other samples: it will have been trained on more diverse subsamples and should generalize better to new data. The text suggests that lower values of the learning rate perform better, though such a low bagging fraction may be sub-optimal and could yield an overly simple model. The right-hand model, being greedier, could be prone to over-fitting. Regardless, I would choose between the two based on validation performance.
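The sketch below is illustrative only: the solubility data behind Figure 8.24 is not loaded here, so the two extreme settings are applied to the Friedman simulation from above instead (with duplicate1 dropped). n.minobsinnode is lowered to 5 so that the 0.1 bagging fraction still leaves enough observations to split on.
# Two boosted models on the simulated data: both tuning parameters low vs. both high.
set.seed(624)
gbm_low <- gbm(y ~ ., data = simulated %>% select(-duplicate1),
               distribution = "gaussian", n.trees = 100,
               shrinkage = 0.1, bag.fraction = 0.1, n.minobsinnode = 5)
gbm_high <- gbm(y ~ ., data = simulated %>% select(-duplicate1),
                distribution = "gaussian", n.trees = 100,
                shrinkage = 0.9, bag.fraction = 0.9, n.minobsinnode = 5)
# Compare how concentrated the relative influence is under each setting.
summary(gbm_low, plotit = FALSE)
summary(gbm_high, plotit = FALSE)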
Increasing the interaction depth would flatten the (still decreasing) slope of predictor importance. Greater depth means more splits per tree and therefore more interactions between split predictors, which in turn gives additional predictors the opportunity to enter the model and accrue importance. Allocating importance across a larger set of predictors lessens the reliance on a small set of particularly important ones.
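As a companion sketch, under the same assumptions as the block above, the effect of depth can be inspected by comparing relative influence at interaction depths of 1 and 7.
# Shallow vs. deep trees on the simulated data; deeper trees should spread
# relative influence across more predictors.
set.seed(624)
gbm_shallow <- gbm(y ~ ., data = simulated %>% select(-duplicate1),
                   distribution = "gaussian", n.trees = 100, interaction.depth = 1)
gbm_deep <- gbm(y ~ ., data = simulated %>% select(-duplicate1),
                distribution = "gaussian", n.trees = 100, interaction.depth = 7)
summary(gbm_shallow, plotit = FALSE)
summary(gbm_deep, plotit = FALSE)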
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models.
The code and descriptions below are adapted directly from the Week 10 Homework assignment.
data("ChemicalManufacturingProcess")
imputed <- impute::impute.knn(as.matrix(ChemicalManufacturingProcess), rng.seed = 624)
cmp <- as.data.frame(imputed$data)
sum(is.na(cmp))
## [1] 0
The data set contains 57 predictors (12 describing the input biological material and 45 describing the manufacturing process) for 176 manufacturing runs. The response, Yield, contains the percent yield for each run.
The data are assumed to be missing at random for the purposes of this exercise, and KNN imputation is used to estimate the 106 missing values in the set. The imputation uses 10 neighbors, as made explicit in the sketch below.
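The call above relies on the impute.knn default of 10 neighbors; an equivalent call with the neighbor count stated explicitly is shown only to document that assumption.
# Equivalent to the imputation above; k = 10 is the impute.knn default.
imputed <- impute::impute.knn(as.matrix(ChemicalManufacturingProcess),
                              k = 10, rng.seed = 624)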
set.seed(624)
index <- createDataPartition(cmp$Yield, p = .80, list = FALSE)
cmp_train <- cmp[index,] # 144 observations
cmp_test <- cmp[-index,] # 32 observations
An 80/20 split is used to create a training set of 144 runs and a test set of 32 runs.
ex <- nearZeroVar(cmp_train[-1], saveMetrics = TRUE)
ex %>% arrange(-freqRatio, percentUnique, -nzv) %>% head()
## freqRatio percentUnique zeroVar nzv
## BiologicalMaterial07 71.000000 1.388889 FALSE TRUE
## ManufacturingProcess41 6.500000 2.777778 FALSE FALSE
## ManufacturingProcess28 5.400000 14.583333 FALSE FALSE
## ManufacturingProcess12 4.760000 1.388889 FALSE FALSE
## ManufacturingProcess34 4.636364 6.250000 FALSE FALSE
## ManufacturingProcess40 4.333333 1.388889 FALSE FALSE
sum(ex$nzv)
## [1] 1
A check for near-zero variance predictors returns just one: BiologicalMaterial07, with a frequency ratio of approximately 71.
corr <- cor((cmp_train %>% select(-c("Yield","BiologicalMaterial07"))), method = "spearman")
corrplot::corrplot(corr)
hicorr <- findCorrelation(corr)
set.seed(624)
cmp_train_slim <- cmp_train %>% select(-c(Yield, BiologicalMaterial07)) %>% select(-all_of(hicorr))
cmp_train_transform <- cmp_train_slim %>% preProcess(method = c("BoxCox", "center", "scale")) %>% predict(cmp_train_slim) %>% cbind(cmp_train$Yield) %>% rename(Yield = "cmp_train$Yield")
cmp_test_slim <- cmp_test %>% select(-c(Yield, BiologicalMaterial07)) %>% select(-all_of(hicorr))
cmp_test_transform <- cmp_test_slim %>% preProcess(method = c("BoxCox", "center", "scale")) %>% predict(cmp_test_slim) %>% cbind(cmp_test$Yield) %>% rename(Yield = "cmp_test$Yield")
In pre-processing, predictors with near-zero variance or high correlations (using Spearman's \(\rho\)) are removed, and remaining predictors undergo a Box-Cox transformation as well as centering and scaling. The resulting training and test sets feature 51 predictors.
Single Tree
set.seed(624)
grid <- expand.grid(maxdepth = seq(1, 10, by = 1))
(single_cmp <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "rpart2",
tuneGrid = grid,
trControl = trainControl(method = "repeatedcv", repeats = 5)))
## CART
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## maxdepth RMSE Rsquared MAE
## 1 1.461753 0.3740462 1.167401
## 2 1.448095 0.3822608 1.161703
## 3 1.496788 0.3572653 1.198263
## 4 1.517058 0.3647183 1.202607
## 5 1.556054 0.3461312 1.221953
## 6 1.545414 0.3534624 1.202002
## 7 1.531284 0.3669317 1.174808
## 8 1.528409 0.3701504 1.164245
## 9 1.535316 0.3677410 1.169591
## 10 1.531337 0.3706694 1.161769
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was maxdepth = 2.
singlepred_cmp <- predict(single_cmp, newdata = cmp_test_transform)
postResample(pred = singlepred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.4593369 0.5400498 1.1604447
Minimizing RMSE (~1.45) under repeated 10-fold cross-validation selects a maximum depth of 2. Evaluating on the test set returns an RMSE of approximately 1.46.
Random Forest
set.seed(624)
(rf_cmp <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "rf",
trControl = trainControl(method = "repeatedcv", repeats = 5),
importance = TRUE))
## Random Forest
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.220578 0.6430496 0.9980242
## 26 1.138212 0.6445025 0.8997997
## 51 1.157702 0.6243623 0.9124376
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 26.
rfpred_cmp <- predict(rf_cmp, newdata = cmp_test_transform)
postResample(pred = rfpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.2659820 0.7610683 0.9470121
Minimizing RMSE (~1.14) under repeated 10-fold cross-validation selects mtry = 26 predictors randomly sampled at each split. Evaluating on the test set returns an RMSE of approximately 1.27.
Boosted Trees
set.seed(624)
boosted_grid <- expand.grid(.interaction.depth = seq(1, 7, by =2),
.n.trees = seq(100, 1000, by = 50),
.shrinkage = c(0.01, 0.1),
.n.minobsinnode = c(5, 10))
(boosted_cmp <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "gbm",
tuneGrid = boosted_grid,
trControl = trainControl(method = "repeatedcv", repeats = 5),
verbose = FALSE,
distribution = "gaussian"))
## Stochastic Gradient Boosting
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared MAE
## 0.01 1 5 100 1.427721 0.5372102 1.1649823
## 0.01 1 5 150 1.346093 0.5562413 1.0949981
## 0.01 1 5 200 1.291922 0.5689057 1.0438736
## 0.01 1 5 250 1.260184 0.5751826 1.0100025
## 0.01 1 5 300 1.239968 0.5795328 0.9869272
## 0.01 1 5 350 1.227185 0.5827024 0.9731314
## 0.01 1 5 400 1.216606 0.5868720 0.9617909
## 0.01 1 5 450 1.207433 0.5908080 0.9528824
## 0.01 1 5 500 1.199722 0.5940348 0.9461525
## 0.01 1 5 550 1.194491 0.5971129 0.9409722
## 0.01 1 5 600 1.189609 0.5994562 0.9353368
## 0.01 1 5 650 1.184690 0.6022769 0.9305125
## 0.01 1 5 700 1.180421 0.6042636 0.9259541
## 0.01 1 5 750 1.177870 0.6054348 0.9230573
## 0.01 1 5 800 1.174100 0.6070573 0.9190059
## 0.01 1 5 850 1.170615 0.6092221 0.9152021
## 0.01 1 5 900 1.167679 0.6110072 0.9124152
## 0.01 1 5 950 1.165259 0.6124905 0.9093425
## 0.01 1 5 1000 1.162631 0.6137581 0.9066604
## 0.01 1 10 100 1.430718 0.5351788 1.1697965
## 0.01 1 10 150 1.347425 0.5543554 1.0981603
## 0.01 1 10 200 1.293514 0.5672715 1.0469999
## 0.01 1 10 250 1.261516 0.5734375 1.0133415
## 0.01 1 10 300 1.242343 0.5776798 0.9898924
## 0.01 1 10 350 1.226934 0.5823603 0.9701843
## 0.01 1 10 400 1.218723 0.5846213 0.9588219
## 0.01 1 10 450 1.211388 0.5882841 0.9498190
## 0.01 1 10 500 1.206770 0.5904363 0.9443450
## 0.01 1 10 550 1.201310 0.5935291 0.9396121
## 0.01 1 10 600 1.197199 0.5950782 0.9357632
## 0.01 1 10 650 1.193007 0.5969303 0.9317260
## 0.01 1 10 700 1.189875 0.5983347 0.9290368
## 0.01 1 10 750 1.186269 0.6004149 0.9252790
## 0.01 1 10 800 1.183274 0.6020390 0.9214548
## 0.01 1 10 850 1.180326 0.6032591 0.9185015
## 0.01 1 10 900 1.178213 0.6048093 0.9157945
## 0.01 1 10 950 1.177492 0.6048333 0.9147729
## 0.01 1 10 1000 1.173987 0.6067682 0.9120401
## 0.01 3 5 100 1.341189 0.5781599 1.0951408
## 0.01 3 5 150 1.261639 0.5922409 1.0229614
## 0.01 3 5 200 1.215986 0.6054455 0.9766663
## 0.01 3 5 250 1.188927 0.6143525 0.9492845
## 0.01 3 5 300 1.169674 0.6228305 0.9308577
## 0.01 3 5 350 1.156187 0.6287682 0.9170908
## 0.01 3 5 400 1.146540 0.6328776 0.9070298
## 0.01 3 5 450 1.137410 0.6366442 0.8984058
## 0.01 3 5 500 1.130171 0.6395350 0.8906868
## 0.01 3 5 550 1.123249 0.6432943 0.8828832
## 0.01 3 5 600 1.118381 0.6458489 0.8771387
## 0.01 3 5 650 1.112654 0.6489951 0.8704841
## 0.01 3 5 700 1.108581 0.6513915 0.8658082
## 0.01 3 5 750 1.104946 0.6537154 0.8618190
## 0.01 3 5 800 1.101683 0.6556584 0.8585060
## 0.01 3 5 850 1.097812 0.6575985 0.8546428
## 0.01 3 5 900 1.095353 0.6588568 0.8517996
## 0.01 3 5 950 1.092516 0.6604139 0.8495586
## 0.01 3 5 1000 1.090565 0.6614816 0.8477378
## 0.01 3 10 100 1.344530 0.5709046 1.0954226
## 0.01 3 10 150 1.269268 0.5832928 1.0247813
## 0.01 3 10 200 1.227559 0.5925271 0.9817101
## 0.01 3 10 250 1.204203 0.5992447 0.9563339
## 0.01 3 10 300 1.187879 0.6054613 0.9393864
## 0.01 3 10 350 1.178072 0.6088769 0.9282799
## 0.01 3 10 400 1.168352 0.6138010 0.9181443
## 0.01 3 10 450 1.160359 0.6178563 0.9099285
## 0.01 3 10 500 1.153195 0.6218698 0.9027709
## 0.01 3 10 550 1.147271 0.6246241 0.8966618
## 0.01 3 10 600 1.143126 0.6267419 0.8932371
## 0.01 3 10 650 1.139550 0.6289213 0.8901652
## 0.01 3 10 700 1.135354 0.6312293 0.8863081
## 0.01 3 10 750 1.131851 0.6335354 0.8830729
## 0.01 3 10 800 1.129948 0.6349748 0.8813000
## 0.01 3 10 850 1.127933 0.6361663 0.8796885
## 0.01 3 10 900 1.126984 0.6368407 0.8789730
## 0.01 3 10 950 1.124901 0.6383726 0.8781243
## 0.01 3 10 1000 1.122757 0.6397135 0.8758066
## 0.01 5 5 100 1.320854 0.5893272 1.0787699
## 0.01 5 5 150 1.236864 0.6096302 1.0022836
## 0.01 5 5 200 1.190980 0.6213442 0.9586855
## 0.01 5 5 250 1.162328 0.6320830 0.9313011
## 0.01 5 5 300 1.141233 0.6410624 0.9113742
## 0.01 5 5 350 1.128032 0.6464326 0.8970834
## 0.01 5 5 400 1.117900 0.6507642 0.8866143
## 0.01 5 5 450 1.109422 0.6551295 0.8769114
## 0.01 5 5 500 1.100296 0.6604405 0.8671636
## 0.01 5 5 550 1.094523 0.6631980 0.8598450
## 0.01 5 5 600 1.089303 0.6655582 0.8541454
## 0.01 5 5 650 1.085266 0.6677532 0.8491781
## 0.01 5 5 700 1.081199 0.6700302 0.8444533
## 0.01 5 5 750 1.077784 0.6721817 0.8407880
## 0.01 5 5 800 1.074649 0.6737578 0.8373817
## 0.01 5 5 850 1.071930 0.6754401 0.8353929
## 0.01 5 5 900 1.069330 0.6768532 0.8337019
## 0.01 5 5 950 1.067298 0.6780592 0.8320116
## 0.01 5 5 1000 1.064959 0.6794012 0.8302676
## 0.01 5 10 100 1.334725 0.5774794 1.0872102
## 0.01 5 10 150 1.256511 0.5912033 1.0161543
## 0.01 5 10 200 1.212597 0.6031082 0.9704546
## 0.01 5 10 250 1.187118 0.6108277 0.9415820
## 0.01 5 10 300 1.168823 0.6189689 0.9227376
## 0.01 5 10 350 1.158372 0.6235842 0.9114831
## 0.01 5 10 400 1.149189 0.6279380 0.9016161
## 0.01 5 10 450 1.141560 0.6316506 0.8928099
## 0.01 5 10 500 1.135468 0.6346432 0.8863226
## 0.01 5 10 550 1.130429 0.6371292 0.8816861
## 0.01 5 10 600 1.127230 0.6388135 0.8785189
## 0.01 5 10 650 1.123189 0.6409727 0.8757199
## 0.01 5 10 700 1.118759 0.6436225 0.8718750
## 0.01 5 10 750 1.116078 0.6451100 0.8693085
## 0.01 5 10 800 1.114160 0.6461159 0.8669856
## 0.01 5 10 850 1.112224 0.6473042 0.8653907
## 0.01 5 10 900 1.110956 0.6482396 0.8648234
## 0.01 5 10 950 1.109271 0.6494480 0.8640239
## 0.01 5 10 1000 1.107787 0.6502579 0.8632946
## 0.01 7 5 100 1.307003 0.6055447 1.0678095
## 0.01 7 5 150 1.222831 0.6211639 0.9897895
## 0.01 7 5 200 1.175944 0.6333512 0.9413445
## 0.01 7 5 250 1.145687 0.6439610 0.9114432
## 0.01 7 5 300 1.126863 0.6503178 0.8920416
## 0.01 7 5 350 1.112537 0.6571885 0.8777051
## 0.01 7 5 400 1.101134 0.6625862 0.8665131
## 0.01 7 5 450 1.091912 0.6667622 0.8566489
## 0.01 7 5 500 1.084818 0.6702307 0.8491598
## 0.01 7 5 550 1.079717 0.6726313 0.8430410
## 0.01 7 5 600 1.075421 0.6746464 0.8384735
## 0.01 7 5 650 1.071004 0.6770645 0.8334904
## 0.01 7 5 700 1.067025 0.6793239 0.8296821
## 0.01 7 5 750 1.063902 0.6810934 0.8267113
## 0.01 7 5 800 1.061204 0.6824111 0.8241104
## 0.01 7 5 850 1.058741 0.6837976 0.8219455
## 0.01 7 5 900 1.056899 0.6848372 0.8203424
## 0.01 7 5 950 1.055008 0.6859554 0.8189743
## 0.01 7 5 1000 1.052993 0.6870680 0.8179537
## 0.01 7 10 100 1.331869 0.5804672 1.0863914
## 0.01 7 10 150 1.256325 0.5917392 1.0150732
## 0.01 7 10 200 1.213167 0.6009102 0.9714404
## 0.01 7 10 250 1.186659 0.6103508 0.9439939
## 0.01 7 10 300 1.170533 0.6164179 0.9269136
## 0.01 7 10 350 1.157858 0.6225617 0.9137971
## 0.01 7 10 400 1.147356 0.6275769 0.9031168
## 0.01 7 10 450 1.141269 0.6298271 0.8969305
## 0.01 7 10 500 1.134919 0.6332701 0.8905073
## 0.01 7 10 550 1.129641 0.6360193 0.8856572
## 0.01 7 10 600 1.124837 0.6389644 0.8803235
## 0.01 7 10 650 1.121352 0.6409815 0.8776442
## 0.01 7 10 700 1.118717 0.6425308 0.8748260
## 0.01 7 10 750 1.116387 0.6441264 0.8723908
## 0.01 7 10 800 1.114156 0.6455379 0.8700583
## 0.01 7 10 850 1.112080 0.6468245 0.8680341
## 0.01 7 10 900 1.109234 0.6488245 0.8658905
## 0.01 7 10 950 1.107222 0.6499643 0.8654522
## 0.01 7 10 1000 1.105364 0.6511462 0.8645189
## 0.10 1 5 100 1.189021 0.5956498 0.9344818
## 0.10 1 5 150 1.178125 0.6032055 0.9186806
## 0.10 1 5 200 1.173761 0.6043892 0.9100564
## 0.10 1 5 250 1.169437 0.6089336 0.9037489
## 0.10 1 5 300 1.174292 0.6068745 0.9059815
## 0.10 1 5 350 1.176658 0.6056341 0.9093323
## 0.10 1 5 400 1.170998 0.6094985 0.9017819
## 0.10 1 5 450 1.166908 0.6132904 0.8993088
## 0.10 1 5 500 1.164777 0.6147483 0.8984574
## 0.10 1 5 550 1.164095 0.6155460 0.8997818
## 0.10 1 5 600 1.162590 0.6169697 0.8989849
## 0.10 1 5 650 1.162716 0.6169379 0.8974344
## 0.10 1 5 700 1.162256 0.6184752 0.8974466
## 0.10 1 5 750 1.159087 0.6208312 0.8953293
## 0.10 1 5 800 1.160477 0.6202221 0.8963830
## 0.10 1 5 850 1.157762 0.6220146 0.8938432
## 0.10 1 5 900 1.156933 0.6230451 0.8929510
## 0.10 1 5 950 1.156874 0.6233522 0.8939382
## 0.10 1 5 1000 1.155715 0.6244395 0.8937127
## 0.10 1 10 100 1.196562 0.5890876 0.9321203
## 0.10 1 10 150 1.191884 0.5907804 0.9264165
## 0.10 1 10 200 1.187808 0.5915673 0.9210896
## 0.10 1 10 250 1.185724 0.5933218 0.9210656
## 0.10 1 10 300 1.187728 0.5940101 0.9196915
## 0.10 1 10 350 1.187453 0.5924768 0.9190276
## 0.10 1 10 400 1.185182 0.5972276 0.9203548
## 0.10 1 10 450 1.186503 0.5970572 0.9203066
## 0.10 1 10 500 1.182477 0.6000346 0.9172044
## 0.10 1 10 550 1.182385 0.6019572 0.9210270
## 0.10 1 10 600 1.181680 0.6034325 0.9206430
## 0.10 1 10 650 1.181371 0.6035280 0.9201149
## 0.10 1 10 700 1.183981 0.6023284 0.9227204
## 0.10 1 10 750 1.183313 0.6036948 0.9232864
## 0.10 1 10 800 1.186230 0.6034987 0.9266954
## 0.10 1 10 850 1.183598 0.6052658 0.9249682
## 0.10 1 10 900 1.182754 0.6062730 0.9252733
## 0.10 1 10 950 1.183869 0.6056868 0.9256990
## 0.10 1 10 1000 1.186154 0.6051804 0.9280313
## 0.10 3 5 100 1.126673 0.6309413 0.8778412
## 0.10 3 5 150 1.118608 0.6344207 0.8711840
## 0.10 3 5 200 1.111321 0.6396734 0.8679516
## 0.10 3 5 250 1.106877 0.6418787 0.8657773
## 0.10 3 5 300 1.103539 0.6440274 0.8630819
## 0.10 3 5 350 1.100967 0.6457262 0.8618612
## 0.10 3 5 400 1.100733 0.6459224 0.8624044
## 0.10 3 5 450 1.099570 0.6467314 0.8616397
## 0.10 3 5 500 1.098866 0.6472124 0.8611487
## 0.10 3 5 550 1.098451 0.6474745 0.8610017
## 0.10 3 5 600 1.098140 0.6476500 0.8608511
## 0.10 3 5 650 1.098080 0.6476705 0.8609299
## 0.10 3 5 700 1.098131 0.6476232 0.8609636
## 0.10 3 5 750 1.098025 0.6476836 0.8609171
## 0.10 3 5 800 1.097970 0.6477143 0.8609243
## 0.10 3 5 850 1.097941 0.6477330 0.8609185
## 0.10 3 5 900 1.097907 0.6477625 0.8609117
## 0.10 3 5 950 1.097911 0.6477644 0.8609318
## 0.10 3 5 1000 1.097905 0.6477734 0.8609294
## 0.10 3 10 100 1.142661 0.6272673 0.8970376
## 0.10 3 10 150 1.135790 0.6315620 0.8974571
## 0.10 3 10 200 1.130363 0.6364418 0.8931886
## 0.10 3 10 250 1.123547 0.6414206 0.8886820
## 0.10 3 10 300 1.118739 0.6440784 0.8866396
## 0.10 3 10 350 1.114966 0.6470856 0.8838787
## 0.10 3 10 400 1.113234 0.6482029 0.8835835
## 0.10 3 10 450 1.111906 0.6494541 0.8832368
## 0.10 3 10 500 1.110307 0.6504913 0.8821800
## 0.10 3 10 550 1.109801 0.6508576 0.8820233
## 0.10 3 10 600 1.109214 0.6512106 0.8817648
## 0.10 3 10 650 1.108676 0.6514447 0.8814342
## 0.10 3 10 700 1.108292 0.6517145 0.8812605
## 0.10 3 10 750 1.107870 0.6519886 0.8810405
## 0.10 3 10 800 1.107656 0.6521021 0.8809519
## 0.10 3 10 850 1.107521 0.6522160 0.8808688
## 0.10 3 10 900 1.107300 0.6523223 0.8807278
## 0.10 3 10 950 1.107137 0.6524157 0.8806744
## 0.10 3 10 1000 1.107003 0.6525077 0.8806093
## 0.10 5 5 100 1.110228 0.6464728 0.8706931
## 0.10 5 5 150 1.094429 0.6559326 0.8591453
## 0.10 5 5 200 1.089129 0.6594170 0.8563424
## 0.10 5 5 250 1.084335 0.6616264 0.8532406
## 0.10 5 5 300 1.081299 0.6631954 0.8512959
## 0.10 5 5 350 1.080151 0.6637837 0.8504221
## 0.10 5 5 400 1.079125 0.6643194 0.8498183
## 0.10 5 5 450 1.078372 0.6646770 0.8493239
## 0.10 5 5 500 1.077883 0.6649037 0.8490162
## 0.10 5 5 550 1.077596 0.6650377 0.8488405
## 0.10 5 5 600 1.077334 0.6651680 0.8486450
## 0.10 5 5 650 1.077171 0.6652489 0.8485479
## 0.10 5 5 700 1.077088 0.6652842 0.8485129
## 0.10 5 5 750 1.077005 0.6653221 0.8484477
## 0.10 5 5 800 1.076952 0.6653440 0.8484174
## 0.10 5 5 850 1.076912 0.6653634 0.8483940
## 0.10 5 5 900 1.076897 0.6653722 0.8483815
## 0.10 5 5 950 1.076872 0.6653834 0.8483668
## 0.10 5 5 1000 1.076853 0.6653936 0.8483529
## 0.10 5 10 100 1.143387 0.6244895 0.9082911
## 0.10 5 10 150 1.130882 0.6342497 0.9002799
## 0.10 5 10 200 1.128319 0.6363941 0.8966440
## 0.10 5 10 250 1.123485 0.6396760 0.8951774
## 0.10 5 10 300 1.119522 0.6424423 0.8935532
## 0.10 5 10 350 1.116497 0.6443116 0.8910050
## 0.10 5 10 400 1.115291 0.6450172 0.8905278
## 0.10 5 10 450 1.113397 0.6461334 0.8890954
## 0.10 5 10 500 1.112395 0.6467030 0.8886366
## 0.10 5 10 550 1.111445 0.6472506 0.8880727
## 0.10 5 10 600 1.110833 0.6474857 0.8876282
## 0.10 5 10 650 1.110356 0.6477130 0.8871843
## 0.10 5 10 700 1.110071 0.6478199 0.8871712
## 0.10 5 10 750 1.109561 0.6481382 0.8868893
## 0.10 5 10 800 1.109290 0.6482998 0.8866922
## 0.10 5 10 850 1.109141 0.6483626 0.8866334
## 0.10 5 10 900 1.108886 0.6485098 0.8864779
## 0.10 5 10 950 1.108744 0.6485841 0.8863830
## 0.10 5 10 1000 1.108594 0.6486610 0.8862972
## 0.10 7 5 100 1.089405 0.6620128 0.8535273
## 0.10 7 5 150 1.078684 0.6694235 0.8446272
## 0.10 7 5 200 1.072416 0.6725484 0.8392623
## 0.10 7 5 250 1.069538 0.6738973 0.8367786
## 0.10 7 5 300 1.067820 0.6748603 0.8357188
## 0.10 7 5 350 1.066777 0.6754413 0.8350811
## 0.10 7 5 400 1.065936 0.6758591 0.8344585
## 0.10 7 5 450 1.065447 0.6760288 0.8341605
## 0.10 7 5 500 1.065171 0.6761686 0.8339107
## 0.10 7 5 550 1.064816 0.6763610 0.8336768
## 0.10 7 5 600 1.064618 0.6764600 0.8335541
## 0.10 7 5 650 1.064479 0.6765316 0.8334642
## 0.10 7 5 700 1.064373 0.6765950 0.8333745
## 0.10 7 5 750 1.064329 0.6766160 0.8333385
## 0.10 7 5 800 1.064270 0.6766512 0.8332976
## 0.10 7 5 850 1.064235 0.6766676 0.8332654
## 0.10 7 5 900 1.064217 0.6766774 0.8332537
## 0.10 7 5 950 1.064197 0.6766883 0.8332455
## 0.10 7 5 1000 1.064194 0.6766887 0.8332439
## 0.10 7 10 100 1.161177 0.6089217 0.9167722
## 0.10 7 10 150 1.148935 0.6188672 0.9096452
## 0.10 7 10 200 1.144118 0.6235342 0.9071570
## 0.10 7 10 250 1.139228 0.6266509 0.9042905
## 0.10 7 10 300 1.137585 0.6277145 0.9029185
## 0.10 7 10 350 1.135365 0.6289793 0.9018301
## 0.10 7 10 400 1.133647 0.6300888 0.9009775
## 0.10 7 10 450 1.131108 0.6314978 0.8997108
## 0.10 7 10 500 1.130178 0.6322563 0.8991446
## 0.10 7 10 550 1.129738 0.6324890 0.8990676
## 0.10 7 10 600 1.129200 0.6327620 0.8988156
## 0.10 7 10 650 1.128651 0.6330775 0.8985912
## 0.10 7 10 700 1.128076 0.6333568 0.8982969
## 0.10 7 10 750 1.127922 0.6333845 0.8981867
## 0.10 7 10 800 1.127631 0.6335750 0.8980244
## 0.10 7 10 850 1.127460 0.6336513 0.8979988
## 0.10 7 10 900 1.127215 0.6337674 0.8978673
## 0.10 7 10 950 1.127072 0.6338214 0.8977727
## 0.10 7 10 1000 1.127024 0.6338374 0.8977656
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth = 7, shrinkage = 0.01 and n.minobsinnode = 5.
boostedpred_cmp <- predict(boosted_cmp, newdata = cmp_test_transform)
postResample(pred = boostedpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.1057507 0.7661067 0.8521719
Minimizing RMSE (~1.05) under repeated 10-fold cross-validation selects 1000 trees, an interaction depth of 7, a shrinkage of 0.01, and a minimum of 5 observations per terminal node. Evaluating on the test set returns an RMSE of approximately 1.11.
Cubist
set.seed(624)
(cubist_cmp <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "cubist",
trControl = trainControl(method = "repeatedcv", repeats = 5)))
## Cubist
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.2718257 0.5613352 1.0059226
## 1 5 1.0997059 0.6641557 0.8461829
## 1 9 1.1979266 0.6046555 0.9266776
## 10 0 1.1581101 0.6015906 0.9204779
## 10 5 1.0131135 0.6930819 0.7922289
## 10 9 1.1057110 0.6384356 0.8669104
## 20 0 1.1399322 0.6118317 0.9079318
## 20 5 0.9893243 0.7036421 0.7778457
## 20 9 1.0853852 0.6482333 0.8578460
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
cubistpred_cmp <- predict(cubist_cmp, newdata = cmp_test_transform)
postResample(pred = cubistpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 0.9300742 0.8138656 0.6736065
Minimizing RMSE (~0.99) under repeated 10-fold cross-validation selects 20 committees and 5 neighbors. Evaluating on the test set returns an RMSE of approximately 0.93, unusually lower than its cross-validated counterpart; experimenting with other random seeds suggests the gap is attributable to randomness, so the initial seed (624) is retained.
Of the four models, the Cubist model performs the best on the test set. Its test set RMSE is approximately 0.93.
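For a side-by-side view, the test-set metrics already computed above can be collected into a single table; this is a small convenience sketch rather than part of the original output.
# Gather the test-set performance of the four tree-based models into one matrix.
rbind(
  single_tree   = postResample(pred = singlepred_cmp,  obs = cmp_test_transform$Yield),
  random_forest = postResample(pred = rfpred_cmp,      obs = cmp_test_transform$Yield),
  boosted_trees = postResample(pred = boostedpred_cmp, obs = cmp_test_transform$Yield),
  cubist        = postResample(pred = cubistpred_cmp,  obs = cmp_test_transform$Yield)
)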
varImp(cubist_cmp)$importance %>%
arrange(-Overall) %>%
rownames_to_column("predictor") %>%
top_n(10) %>%
ggplot(aes(x = reorder(predictor, Overall), y = Overall)) +
geom_col() +
ggtitle("Top ten predictors of product yield, by importance in Cubist model") +
xlab(NULL) +
ylab("Importance") +
coord_flip()
set.seed(624)
svm_cmp <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "svmRadial",
tuneLength = 14,
trControl = trainControl(method = "repeatedcv", repeats = 5))
varImp(svm_cmp)$importance %>%
arrange(-Overall) %>%
rownames_to_column("predictor") %>%
top_n(10) %>%
ggplot(aes(x = reorder(predictor, Overall), y = Overall)) +
geom_col() +
ggtitle("Top ten predictors of product yield, by importance in SVM model") +
xlab(NULL) +
ylab("Importance") +
coord_flip()
set.seed(624)
cmp_pls <- caret::train(Yield ~ .,
data = cmp_train_transform,
method = "pls",
tuneLength = 20,
trControl = trainControl(method = "repeatedcv", repeats = 5)
)
varImp(cmp_pls)$importance %>%
arrange(-Overall) %>%
rownames_to_column("predictor") %>%
top_n(10) %>%
ggplot(aes(x = reorder(predictor, Overall), y = Overall)) +
geom_col() +
ggtitle("Top ten predictors of product yield, by importance in PLS model") +
xlab(NULL) +
ylab("Importance") +
coord_flip()
ManufacturingProcess32 and ManufacturingProcess09 are by far the most important predictors in the Cubist model. The process predictors still dominate, though less than they do in the linear partial least squares (PLS) model and more than they do in the non-linear support vector machine (SVM) model. Relative to the other optimal models, the Cubist model places importance on different process predictors--ManufacturingProcess33, ManufacturingProcess28, and ManufacturingProcess25--though each of these carries relatively low importance within the model. The biological predictors are generally the same across models.
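To make that comparison easier to scan, the top-ten importances of the three models can be stacked into one long table; a convenience sketch, assuming (as holds for these model types) that each importance data frame has an Overall column.
# Top ten predictors per model, combined for side-by-side inspection.
list(Cubist = cubist_cmp, SVM = svm_cmp, PLS = cmp_pls) %>%
  purrr::map_dfr(~ varImp(.x)$importance %>%
                   rownames_to_column("predictor") %>%
                   slice_max(Overall, n = 10),
                 .id = "model")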
single_cmp$finalModel
## n= 144
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 144 453.46990 40.15799
## 2) ManufacturingProcess32< 0.1888699 83 152.30240 39.23084
## 4) BiologicalMaterial12< -0.5895659 31 36.09462 38.32839 *
## 5) BiologicalMaterial12>=-0.5895659 52 75.90933 39.76885 *
## 3) ManufacturingProcess32>=0.1888699 61 132.74350 41.41951
## 6) BiologicalMaterial03< 1.089406 48 92.19183 41.08688 *
## 7) BiologicalMaterial03>=1.089406 13 15.63103 42.64769 *
plot(single_cmp$finalModel)
text(single_cmp$finalModel)
Plotting offers a more visual and intuitive depiction of the tree. The first split is on the process predictor ManufacturingProcess32 at approximately 0.19, and biological predictors then determine the terminal nodes on either side. Below the process split, BiologicalMaterial12 splits at approximately -0.59, leading to predicted yields of approximately 38.33 (below the split) and 39.77 (above it). Above the process split, BiologicalMaterial03 splits at approximately 1.09, leading to predicted yields of approximately 41.09 (below) and 42.65 (above). The terminal-node values are the mean yields of the observations in each partition.
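Those terminal-node means can also be read directly off the fitted rpart object; a small sketch, not part of the original analysis.
# Terminal nodes of the final single tree: node id, observation count, and mean yield.
single_cmp$finalModel$frame %>%
  rownames_to_column("node") %>%
  filter(var == "<leaf>") %>%
  select(node, n, mean_yield = yval)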
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. Springer. doi:10.1007/978-1-4614-6849-3