library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rpart)

EXERCISE 8.1

Recreate the simulated data from Exercise 7.2:

library(mlbench)
## Warning: package 'mlbench' was built under R version 4.3.3
set.seed(200)

simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

a. Fit a random forest model to all of the predictors, then estimate the variable importance scores. Did the random forest model significantly use the uninformative predictors (V6 – V10)?

library(randomForest)
## Warning: package 'randomForest' was built under R version 4.3.3
library(caret)
## Warning: package 'caret' was built under R version 4.3.3
set.seed(201)

model1 <- randomForest(y ~ ., data = simulated, 
                       importance = TRUE,
                       ntree = 1000)

rfImp1 <- varImp(model1, scale = FALSE)
rfImp1 %>% arrange(desc(Overall))
##          Overall
## V1   8.765986597
## V4   7.896367545
## V2   6.529195380
## V5   2.229576452
## V3   0.786661957
## V6   0.032819437
## V10 -0.007697345
## V7  -0.017614112
## V9  -0.056301236
## V8  -0.099819262

The importance scores for variables V6-V10 were very low, indicating that the random forest made little use of these uninformative predictors; the slightly negative scores (V7-V10) suggest they contributed nothing beyond noise.

b. Now add an additional predictor that is highly correlated with one of the informative predictors. For example:

set.seed(202)

simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1

cor(simulated$duplicate1, simulated$V1)
## [1] 0.9429382

Fit another random forest model to these data. Did the importance score for V1 change?

set.seed(203)

model2 <- randomForest(y ~ ., data = simulated, 
                       importance = TRUE,
                       ntree = 1000)

rfImp2 <- varImp(model2, scale = FALSE)


rfImp2 %>% arrange(desc(Overall))
##                Overall
## V4          6.81165073
## V2          6.42152045
## V1          5.43212415
## duplicate1  4.11403802
## V5          2.17263787
## V3          0.48198968
## V6          0.12768791
## V7         -0.05703721
## V10        -0.09164586
## V8         -0.10033492
## V9         -0.10843002

Adding a predictor highly correlated with V1 lowered V1's importance score and dropped its rank from first to third, since the importance credit is now shared with duplicate1.

What happens when you add another predictor that is also highly correlated with V1?

set.seed(204)

# Add another highly correlated predictor to V1
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1

# Calculate correlation
cor(simulated$duplicate2, simulated$V1)
## [1] 0.9423098
# Create another model 
model3 <- randomForest(y ~ ., data = simulated, 
                       importance = TRUE,
                       ntree = 1000)

# Calculate feature importance
rfImp3 <- varImp(model3, scale = FALSE)

# View importance scores
rfImp3 %>% arrange(desc(Overall))
##                 Overall
## V4          7.001205661
## V2          6.556167747
## V1          5.002164795
## duplicate1  3.512152248
## V5          2.008208171
## duplicate2  1.751890242
## V3          0.525777133
## V6          0.130936617
## V7          0.011110731
## V9         -0.006968616
## V10        -0.087405693
## V8         -0.105559825

Adding a second predictor (duplicate2) highly correlated with V1 reduced V1's importance score again, although this drop was smaller than the one caused by adding duplicate1. The uninformative predictors (V6-V10) remained at the bottom of the ranking with very low scores.
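A quick side check (a small sketch, not part of the original workflow) makes the mechanism visible: V1, duplicate1, and duplicate2 are all highly correlated with one another, so they carry nearly the same information and the forest splits the importance credit among them.

# Pairwise correlations among V1 and its two noisy copies
cor(simulated %>% select(V1, duplicate1, duplicate2))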

c. Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007).

library(party)
## Warning: package 'party' was built under R version 4.3.3
## Warning: package 'strucchange' was built under R version 4.3.3
## Warning: package 'sandwich' was built under R version 4.3.3
set.seed(205)

cforest_model <- cforest(formula = y ~ V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data = simulated,
        # Adjust parameters ntree and mtry to values used in the random forest model                 
        controls = cforest_control(ntree = 1000, mtry = 3))

cforest_model
## 
##   Random Forest using Conditional Inference Trees
## 
## Number of trees:  1000 
## 
## Response:  y 
## Inputs:  V1, V2, V3, V4, V5, V6, V7, V8, V9, V10 
## Number of observations:  200
varImp(cforest_model) %>% arrange(desc(Overall))
##          Overall
## V1   6.764646178
## V4   6.393648430
## V2   5.334192335
## V5   1.836656095
## V3   0.123622518
## V7   0.033265543
## V6   0.005995986
## V9  -0.008123966
## V10 -0.033533213
## V8  -0.035588481
rfImp1 %>% arrange(desc(Overall))
##          Overall
## V1   8.765986597
## V4   7.896367545
## V2   6.529195380
## V5   2.229576452
## V3   0.786661957
## V6   0.032819437
## V10 -0.007697345
## V7  -0.017614112
## V9  -0.056301236
## V8  -0.099819262

Do these importances show the same pattern as the traditional random forest model?

The same pattern occurs with the conditional inference random forest. The top five predictors remained in the same order, though with lower scores, and the bottom five changed order slightly while still showing low or negative scores.
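The exercise also points to party::varimp(), whose conditional argument switches to the conditional importance measure of Strobl et al. (2007). A minimal sketch (defaults kept; the conditional permutation scheme is slow, and output is not shown here):

set.seed(210)

# Conditional permutation importance from the party package
party::varimp(cforest_model, conditional = TRUE) %>% 
  sort(decreasing = TRUE)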

d. Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?

library(gbm)
## Warning: package 'gbm' was built under R version 4.3.3
## Loaded gbm 2.2.2
## This version of gbm is no longer under development. Consider transitioning to gbm3, https://github.com/gbm-developers/gbm3
# gbm model 
# gbmModel <- gbm(y ~ V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data = simulated,
#                 distribution = "gaussian",
#                 n.trees = 1000,
#                 shrinkage = 0.01)
set.seed(206)

gbmGrid <- expand.grid(interaction.depth = seq(1, 7, by = 2), 
                       n.trees = seq(100, 1000, by = 50),
                       shrinkage = c(0.01, 0.1), 
                       n.minobsinnode=10)

gbmTune <- train(simulated %>% select(-c(y, duplicate1, duplicate2)), 
                 simulated$y, 
                 method = "gbm",
                 tuneGrid = gbmGrid,
                 verbose = FALSE)
## The gbm() function produces copious amounts
## of  output, so pass in the verbose option
## to avoid printing a lot to the screen.
# Extract the best model 
gbmTune$bestTune
##     n.trees interaction.depth shrinkage n.minobsinnode
## 101     350                 3       0.1             10
# View important feature scores
varImp(gbmTune)
## gbm variable importance
## 
##     Overall
## V4  100.000
## V1   87.864
## V2   76.054
## V5   39.680
## V3   26.446
## V7    3.181
## V6    2.493
## V8    1.452
## V9    1.123
## V10   0.000

The same pattern occurs when fitting a boosted tree model: the five informative and five uninformative predictors remain the same, though in a slightly different order.

set.seed(207)

cubistGrid <- expand.grid(committees = c(1, 10, 50, 100), 
                          neighbors = c(0, 1, 5, 9))

cubistTuned <- train(simulated %>% select(-c(y, duplicate1, duplicate2)), 
                 simulated$y, 
                 method = "cubist",
                 tuneGrid = cubistGrid,
                 verbose = FALSE)
ggplot(cubistTuned)

# Extract the best model 
cubistTuned$bestTune
##    committees neighbors
## 16        100         9
# View important feature scores
varImp(cubistTuned)
## cubist variable importance
## 
##     Overall
## V1   100.00
## V2    81.82
## V4    67.13
## V3    65.73
## V5    46.15
## V6    18.18
## V9     0.00
## V7     0.00
## V10    0.00
## V8     0.00

The same pattern also holds for the rule-based Cubist model: the informative predictors (V1-V5) again receive the highest scores and the uninformative ones (V6-V10) the lowest, though the order and magnitudes differ somewhat.

EXERCISE 8.2

Use a simulation to show tree bias with different granularities.

To show tree bias, six predictors are created with varying levels of granularity. A response variable Y is then created by adding two of the predictors (X3 and X6) to a random draw from a sequence between 10 and 15 in 0.001 increments.

set.seed(100)

X1 <- sample(seq(5, 10, 1), 200, replace = TRUE) # 6 distinct values 
X2 <- sample(seq(400, 450, 5), 200, replace = TRUE) # 11 distinct values
X3 <- sample(seq(15, 16, 0.01), 200, replace = TRUE) # 101 distinct values
X4 <- sample(seq(1,3, 0.01), 200, replace = TRUE) # 201 distinct values
X5 <- sample(seq(0, 0.5, 0.001), 200, replace = TRUE) # 501 distinct values
X6 <- sample(seq(0.5, 2, 0.001), 200, replace = TRUE) # 1,501 distinct values
Y <- X6 + X3 + sample(seq(10, 15, 0.001), 200, replace = TRUE) # Y is X3 + X6 plus random noise between 10 and 15


# Create data frame 
simulated2 <- data.frame(X1,X2,X3,
                         X4,X5,X6,Y)


model_sim <-  rpart(Y ~., data=simulated2)

# Extract variable importance using varImp
varImp(model_sim) %>% arrange(desc(Overall))
##      Overall
## X5 1.0316934
## X6 0.9930010
## X3 0.6764543
## X4 0.5835573
## X2 0.5796439
## X1 0.5675177

There appears to be selection bias: X5 and X6, the two predictors with the most distinct values (the finest granularity), received the top two importance scores, even though X5 has no relationship with Y while X3, which does, ranks below both of them.
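To make the bias more explicit, a small follow-up sketch (an addition, reusing the same data-generating scheme) repeats the simulation and tallies which predictor rpart chooses for the root split; under selection bias the high-granularity predictors should win most often.

set.seed(101)

first_splits <- replicate(100, {
  n <- 200
  df <- data.frame(
    X1 = sample(seq(5, 10, 1), n, replace = TRUE),
    X2 = sample(seq(400, 450, 5), n, replace = TRUE),
    X3 = sample(seq(15, 16, 0.01), n, replace = TRUE),
    X4 = sample(seq(1, 3, 0.01), n, replace = TRUE),
    X5 = sample(seq(0, 0.5, 0.001), n, replace = TRUE),
    X6 = sample(seq(0.5, 2, 0.001), n, replace = TRUE)
  )
  df$Y <- df$X6 + df$X3 + sample(seq(10, 15, 0.001), n, replace = TRUE)
  # Variable used at the root node of the fitted tree
  as.character(rpart(Y ~ ., data = df)$frame$var[1])
})

table(first_splits)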

EXERCISE 8.3

In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data.

The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:

Figure 8.24

a. Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?

The bagging fraction controls how much of the training data each tree sees, and the learning rate controls how much of each tree's prediction is added to the ensemble. With both set to 0.9, every tree sees nearly the same data and contributes heavily, so the few strongest predictors are selected over and over and absorb most of the importance. With both set to 0.1, each tree sees a different small subsample and contributes only a small correction, which gives more predictors a chance to be used and spreads the importance across them.
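As an illustration of this effect (a sketch only; Fig. 8.24 uses the solubility data, which is not loaded here, so the Exercise 8.1 simulation stands in for it):

set.seed(208)

sim10 <- simulated %>% select(V1:V10, y)

# Both parameters low (0.1): importance spread over more predictors
gbm_low  <- gbm(y ~ ., data = sim10, distribution = "gaussian",
                n.trees = 500, interaction.depth = 3,
                shrinkage = 0.1, bag.fraction = 0.1)

# Both parameters high (0.9): importance concentrated in a few predictors
gbm_high <- gbm(y ~ ., data = sim10, distribution = "gaussian",
                n.trees = 500, interaction.depth = 3,
                shrinkage = 0.9, bag.fraction = 0.9)

summary(gbm_low,  plotit = FALSE)
summary(gbm_high, plotit = FALSE)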

b. Which model do you think would be more predictive of other samples?

Based on the learning rate and bagging fraction values, the model on the left would likely be more predictive of other samples. The lower values are less likely to produce an overfit model: they make the ensemble less dependent on the idiosyncrasies of any individual tree or subsample, so it should generalize better.

c. How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?

Increasing the interaction depth (tree depth) would let each tree capture more complex relationships and use more predictors per tree. This gives a larger number of predictors exposure and weight, increasing their importance values. As importance accrues to the seemingly less important predictors, the slope of the predictor importance profile would flatten for both models.
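A short companion sketch (same stand-in data and caveats as the block above) varies only the interaction depth to see this flattening:

set.seed(209)

sim10 <- simulated %>% select(V1:V10, y)

gbm_shallow <- gbm(y ~ ., data = sim10, distribution = "gaussian",
                   n.trees = 500, shrinkage = 0.1, interaction.depth = 1)
gbm_deep    <- gbm(y ~ ., data = sim10, distribution = "gaussian",
                   n.trees = 500, shrinkage = 0.1, interaction.depth = 10)

summary(gbm_shallow, plotit = FALSE)  # fewer predictors used, steeper importance profile
summary(gbm_deep,    plotit = FALSE)  # more predictors used, flatter importance profile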

EXERCISE 8.7

Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:

Load Data and Preprocess

# Load libraries
library(tidymodels)
library(AppliedPredictiveModeling)

# Load data 
data("ChemicalManufacturingProcess")

chemical_df <- ChemicalManufacturingProcess

#head(chemical_df)
# Find number of missing values in each predictor
chemical_df %>% 
  summarise_all(~ sum(is.na(.))) %>% 
  pivot_longer(cols = colnames(chemical_df), names_to = "Variable", values_to = "Counts") %>% 
  filter(Counts > 0)
## # A tibble: 28 × 2
##    Variable               Counts
##    <chr>                   <int>
##  1 ManufacturingProcess01      1
##  2 ManufacturingProcess02      3
##  3 ManufacturingProcess03     15
##  4 ManufacturingProcess04      1
##  5 ManufacturingProcess05      1
##  6 ManufacturingProcess06      2
##  7 ManufacturingProcess07      1
##  8 ManufacturingProcess08      1
##  9 ManufacturingProcess10      9
## 10 ManufacturingProcess11     10
## # ℹ 18 more rows
# Impute each predictor's missing values with that predictor's median

for(i in 1:ncol(chemical_df)) {
  chemical_df[ , i][is.na(chemical_df[ , i])] <- median(chemical_df[ , i], na.rm=TRUE)
}
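# Note (an alternative sketch, not used downstream): caret's preProcess()
# performs the same median imputation in one step.
# med_impute   <- preProcess(ChemicalManufacturingProcess, method = "medianImpute")
# chem_imputed <- predict(med_impute, ChemicalManufacturingProcess)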
# Get near-zero variance predictors 
nzv_labels <- nearZeroVar(chemical_df, names = TRUE)

print(paste0("The predictor/s with near zero variance are: ", nzv_labels))
## [1] "The predictor/s with near zero variance are: BiologicalMaterial07"
# Filter out the near-zero variance predictors 
filtered_chemical_df <- as_tibble(chemical_df) %>% select(-all_of(nzv_labels))
set.seed(300)

# Split data into train and test sets
chemical_split <- initial_split(filtered_chemical_df, 
                            prop = 0.7)

# Create the training data
chemical_train <- chemical_split %>%
  training()

chemXtrain <- data.frame(chemical_train %>% select(-Yield))
chemYtrain <- chemical_train %>% select(Yield)

# Create the test data
chemical_test <- chemical_split %>% 
  testing()

chemXtest <- data.frame(chemical_test %>% select(-Yield))
chemYtest <- chemical_test %>% select(Yield)

Train several tree-based models

ctrl <- trainControl(method = "cv", number = 10)

Single tree

# getModelInfo("rpart2")
set.seed(301)

rpartTune <- train(chemXtrain, chemYtrain$Yield, 
                   method = "rpart2",
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"),
                   metric = "Rsquared")
## note: only 9 possible values of the max tree depth from the initial fit.
##  Truncating the grid to 9 .
rpartTune
## CART 
## 
## 123 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 111, 111, 111, 110, 110, 111, ... 
## Resampling results across tuning parameters:
## 
##   maxdepth  RMSE      Rsquared   MAE     
##    1        1.503475  0.3688235  1.162670
##    2        1.500293  0.3871477  1.169326
##    3        1.520698  0.4047003  1.200330
##    4        1.463409  0.4703839  1.131641
##    6        1.436952  0.5185660  1.109244
##    7        1.418451  0.5362022  1.084360
##    8        1.442204  0.5313016  1.094717
##    9        1.453427  0.5238343  1.101385
##   10        1.453427  0.5238343  1.101385
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was maxdepth = 7.
rpartPred <- predict(rpartTune, newdata = chemXtest)

## The function 'postResample' can be used to get performance values
rpart_metrics <- postResample(pred = rpartPred, obs = chemYtest$Yield)

rpart_metrics
##      RMSE  Rsquared       MAE 
## 1.4292893 0.4249483 1.1197365

Random forest model

# getModelInfo("rf")
set.seed(302)

rfTune <- train(chemXtrain, chemYtrain$Yield, 
                   method = "rf",
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"),
                   metric = "Rsquared")

rfTune
## Random Forest 
## 
## 123 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 111, 111, 110, 111, 111, 110, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE      
##    2    1.277038  0.6261434  1.0265381
##    8    1.170290  0.6577711  0.9361999
##   14    1.141997  0.6662932  0.9092988
##   20    1.141338  0.6555855  0.9121757
##   26    1.144411  0.6498069  0.9105170
##   32    1.135027  0.6503234  0.8976756
##   38    1.148485  0.6387736  0.9087014
##   44    1.146302  0.6366574  0.9047472
##   50    1.146526  0.6364851  0.9055020
##   56    1.147323  0.6337794  0.9042186
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 14.
rfPred <- predict(rfTune, newdata = chemXtest)

## The function 'postResample' can be used to get performance values
rf_metrics <- postResample(pred = rfPred, obs = chemYtest$Yield)

rf_metrics
##      RMSE  Rsquared       MAE 
## 1.0158680 0.6932786 0.8100351

Bagged tree

set.seed(303)

bagTune <- train(chemXtrain, chemYtrain$Yield, 
                   method = "treebag",
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"),
                   metric = "Rsquared")

bagTune
## Bagged CART 
## 
## 123 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 111, 111, 110, 110, 111, 111, ... 
## Resampling results:
## 
##   RMSE      Rsquared  MAE     
##   1.258478  0.573961  1.006399
bagPred <- predict(bagTune, newdata = chemXtest)

## The function 'postResample' can be used to get performance values
bag_metrics <- postResample(pred = bagPred, obs = chemYtest$Yield)

bag_metrics
##      RMSE  Rsquared       MAE 
## 1.0043339 0.6843670 0.8267982

Boosted tree

# getModelInfo("gbm")
set.seed(304)

gbmGrid <- expand.grid(interaction.depth = seq(1, 7, by = 2), 
                       n.trees = seq(100, 1000, by = 50),
                       shrinkage = c(0.01, 0.1), 
                       n.minobsinnode=10)

gbmTune2 <- train(chemXtrain, 
                 chemYtrain$Yield, 
                 method = "gbm",
                 tuneGrid = gbmGrid,
                 trControl = trainControl(method = "cv"),
                 metric = "Rsquared",
                 verbose = FALSE)

gbmTune2$bestTune
##    n.trees interaction.depth shrinkage n.minobsinnode
## 55     900                 5      0.01             10
gbmPred <- predict(gbmTune2, newdata = chemXtest)

## The function 'postResample' can be used to get performance values
gbm_metrics <- postResample(pred = gbmPred, obs = chemYtest$Yield)

gbm_metrics
##      RMSE  Rsquared       MAE 
## 1.0146358 0.6816695 0.7605147

Cubist

# getModelInfo("cubist")
set.seed(305)

cubistGrid <- expand.grid(committees = c(1, 10, 50, 100), 
                          neighbors = c(0, 1, 5, 9))

cubistTuned <- train(chemXtrain, 
                chemYtrain$Yield, 
                 method = "cubist",
                 trControl = trainControl(method = "cv"),
                 tuneGrid = cubistGrid,
                 metric = "Rsquared",
                 verbose = FALSE)

cubistTuned
## Cubist 
## 
## 123 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 111, 110, 111, 109, 111, 111, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE      Rsquared   MAE      
##     1         0          1.422159  0.5071353  1.0684736
##     1         1          1.341539  0.5739216  0.9294287
##     1         5          1.344637  0.5745633  0.9928622
##     1         9          1.374678  0.5365619  1.0177208
##    10         0          1.279789  0.5862690  0.9885075
##    10         1          1.196680  0.6599332  0.8524738
##    10         5          1.198060  0.6516036  0.8843722
##    10         9          1.239305  0.6103248  0.9412864
##    50         0          1.132927  0.6330376  0.9225310
##    50         1          1.037712  0.7008581  0.7616969
##    50         5          1.043635  0.7051735  0.8120854
##    50         9          1.083676  0.6642215  0.8621034
##   100         0          1.141282  0.6345595  0.9294671
##   100         1          1.048064  0.7033339  0.7813476
##   100         5          1.058763  0.6976084  0.8252739
##   100         9          1.102042  0.6567972  0.8790249
## 
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were committees = 50 and neighbors = 5.
cubistPred <- predict(cubistTuned, newdata = chemXtest)

## The function 'postResample' can be used to get performance values
cubist_metrics <- postResample(pred = cubistPred, obs = chemYtest$Yield)

cubist_metrics
##      RMSE  Rsquared       MAE 
## 0.9198151 0.7376965 0.6614308

a. Which tree-based regression model gives the optimal re-sampling and test set performance?

models <- c("Single Tree", "Random Forest", "Bagged Tree", "Boosted Tree", "Cubist")
values <- c(rpart_metrics["Rsquared"], rf_metrics["Rsquared"], bag_metrics["Rsquared"], gbm_metrics["Rsquared"], cubist_metrics["Rsquared"])

tibble(Models = models, R2_Test = values) %>% arrange(desc(R2_Test))
## # A tibble: 5 × 2
##   Models        R2_Test
##   <chr>           <dbl>
## 1 Cubist          0.738
## 2 Random Forest   0.693
## 3 Bagged Tree     0.684
## 4 Boosted Tree    0.682
## 5 Single Tree     0.425

Five different methods were used to train models on the chemical manufacturing training set: a single regression tree, a random forest, a bagged tree, stochastic gradient boosting, and Cubist. The model with the best resampling and test set performance was the Cubist model. The final values used were committees = 50 and neighbors = 5, with a cross-validated R-squared of 0.705. On the test set the R-squared was 0.738, so the Cubist model explained about 73.8% of the variability in the target variable Yield.
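For completeness, a small sketch (using the postResample vectors computed above) gathers all three test-set metrics side by side:

bind_rows(lapply(list(`Single Tree`   = rpart_metrics,
                      `Random Forest` = rf_metrics,
                      `Bagged Tree`   = bag_metrics,
                      `Boosted Tree`  = gbm_metrics,
                      Cubist          = cubist_metrics),
                 as_tibble_row),
          .id = "Model") %>%
  arrange(desc(Rsquared))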

b1. Which predictors are most important in the optimal tree-based regression model?

varImp(cubistTuned)
## cubist variable importance
## 
##   only 20 most important variables shown (out of 56)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess17   68.18
## ManufacturingProcess13   60.23
## BiologicalMaterial03     51.14
## BiologicalMaterial06     42.05
## ManufacturingProcess04   40.91
## BiologicalMaterial12     40.91
## BiologicalMaterial02     39.77
## ManufacturingProcess11   30.68
## ManufacturingProcess33   23.86
## ManufacturingProcess29   22.73
## ManufacturingProcess09   20.45
## ManufacturingProcess39   18.18
## ManufacturingProcess37   18.18
## ManufacturingProcess19   17.05
## ManufacturingProcess02   15.91
## ManufacturingProcess27   13.64
## ManufacturingProcess20   13.64
## ManufacturingProcess16   13.64
## ManufacturingProcess26   13.64

b2. Do either the biological or process variables dominate the list?

Of the 20 most important predictors, 16 were manufacturing process variables and 4 were biological material measurements, so the process variables dominate the list.
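The split can be tallied directly from the importance table (a small sketch using the caret::varImp output shown above):

varImp(cubistTuned)$importance %>%
  rownames_to_column("Variable") %>%
  slice_max(Overall, n = 20) %>%
  mutate(Type = if_else(str_starts(Variable, "Manufacturing"),
                        "Process", "Biological")) %>%
  count(Type)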

b3. How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?

The optimal linear model was built with an elastic net and the optimal nonlinear model with a support vector machine. Both show a similar pattern to the Cubist model, with manufacturing process variables making up the majority of the most important predictors. Several predictors appear in the top 10 of all three models (the overlap is computed in the short sketch after the two lists below).

Elastic net model

ManufacturingProcess09
ManufacturingProcess32
ManufacturingProcess06
BiologicalMaterial02
BiologicalMaterial03
ManufacturingProcess31
ManufacturingProcess17
ManufacturingProcess13
ManufacturingProcess36

SVM

ManufacturingProcess32
ManufacturingProcess13
ManufacturingProcess17
BiologicalMaterial06
ManufacturingProcess09
BiologicalMaterial03
BiologicalMaterial12
ManufacturingProcess06
BiologicalMaterial02
ManufacturingProcess36
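A quick sketch to compute the overlap, with the top-10 names transcribed from the lists above (and the Cubist list from the varImp output earlier):

cubist_top <- c("ManufacturingProcess32", "ManufacturingProcess17",
                "ManufacturingProcess13", "BiologicalMaterial03",
                "BiologicalMaterial06", "ManufacturingProcess04",
                "BiologicalMaterial12", "BiologicalMaterial02",
                "ManufacturingProcess11", "ManufacturingProcess33")
enet_top   <- c("ManufacturingProcess09", "ManufacturingProcess32",
                "ManufacturingProcess06", "BiologicalMaterial02",
                "BiologicalMaterial03", "ManufacturingProcess31",
                "ManufacturingProcess17", "ManufacturingProcess13",
                "ManufacturingProcess36")
svm_top    <- c("ManufacturingProcess32", "ManufacturingProcess13",
                "ManufacturingProcess17", "BiologicalMaterial06",
                "ManufacturingProcess09", "BiologicalMaterial03",
                "BiologicalMaterial12", "ManufacturingProcess06",
                "BiologicalMaterial02", "ManufacturingProcess36")

# Predictors shared by all three models' top-10 lists
Reduce(intersect, list(cubist_top, enet_top, svm_top))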

c. Plot the optimal single tree with the distribution of yield in the terminal nodes.

library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.3.3
rpart.plot(rpartTune$finalModel, fallen.leaves = FALSE, main = "Decision Tree for Chemical Data Set")
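The rpart.plot above shows the mean Yield in each leaf; a small additional sketch (assuming the tuned rpartTune object and training data above) plots the full distribution of observed Yield within each terminal node:

# Terminal node assignment for each training observation
terminal_node <- factor(rpartTune$finalModel$where)

ggplot(data.frame(Node = terminal_node, Yield = chemYtrain$Yield),
       aes(x = Node, y = Yield)) +
  geom_boxplot() +
  labs(title = "Observed Yield by terminal node", x = "Terminal node")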

c2. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?

Viewing the decision tree does provide additional information about the thresholds at which the biological material and manufacturing process predictors positively or negatively affect Yield. For example, the two paths that produce the highest yield occur when ManufacturingProcess32 is over 160. At the next split, which evaluates ManufacturingProcess33, if that value is not greater than or equal to 33, the model predicts a Yield of 43, the highest of any leaf node. If the model performs well and we are confident in its strength, these cutoffs could help determine when to adjust certain variables as they move above or below the thresholds that optimize Yield.