In this assignment, Problems 8.1, 8.2, 8.3, and 8.7 from the Kuhn and Johnson book are solved.
library(mlbench)
library(randomForest)
library(caret)
library(party)
library(dplyr)
library(rpart)
#library(ipred)
library(partykit)
library(AppliedPredictiveModeling)
library(gbm)
library(Cubist)
#library(rpart.plot)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
head(simulated)
## V1 V2 V3 V4 V5 V6 V7
## 1 0.5337724 0.6478064 0.85078526 0.18159957 0.92903976 0.36179060 0.8266609
## 2 0.5837650 0.4381528 0.67272659 0.66924914 0.16379784 0.45305931 0.6489601
## 3 0.5895783 0.5879065 0.40967108 0.33812728 0.89409334 0.02681911 0.1785614
## 4 0.6910399 0.2259548 0.03335447 0.06691274 0.63744519 0.52500637 0.5133614
## 5 0.6673315 0.8188985 0.71676079 0.80324287 0.08306864 0.22344157 0.6644906
## 6 0.8392937 0.3862983 0.64618857 0.86105431 0.63038947 0.43703891 0.3360117
## V8 V9 V10 y
## 1 0.4214081 0.59111440 0.5886216 18.46398
## 2 0.8446239 0.92819306 0.7584008 16.09836
## 3 0.3495908 0.01759542 0.4441185 17.76165
## 4 0.7970260 0.68986918 0.4450716 13.78730
## 5 0.9038919 0.39696995 0.5500808 18.42984
## 6 0.6489177 0.53116033 0.9066182 20.85817
set.seed(200)
model1 <- randomForest(y ~ ., data = simulated,
importance = TRUE,ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
## Overall
## V1 8.605365900
## V2 6.831259165
## V3 0.741534943
## V4 7.883384091
## V5 2.244750293
## V6 0.136054182
## V7 0.055950944
## V8 -0.068195812
## V9 0.003196175
## V10 -0.054705900
The negative or near-zero importance scores for V6–V10 indicate that the random forest recognized these predictors as uninformative and did not rely on them meaningfully when making predictions.
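The exercise next adds a predictor that is highly correlated with V1 and refits the forest; that step is not shown above, but the predictor removed below was presumably created along these lines (a sketch following the book's setup, not rerun here):
# Sketch (assumed from the exercise): add a predictor highly correlated with V1
set.seed(200)
simulated$duplicate1 <- simulated$V1 + rnorm(200) * 0.1
cor(simulated$duplicate1, simulated$V1)  # correlation should be high (around 0.9)
# Refitting the forest would show V1's importance shared with duplicate1
model2 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
varImp(model2, scale = FALSE)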
# Remove the correlated predictor duplicate1 before refitting with conditional inference trees
simulated <- simulated |> select(-duplicate1)
set.seed(200)
model3 <- cforest(y ~ ., data = simulated,
ntree = 1000)
rfImp3 <- varimp(model3,conditional = TRUE)
rfImp3
## V1 V2 V3 V4 V5 V6
## 6.08762374 5.24473369 0.02006549 6.10961502 1.41465983 -0.19520548
## V7 V8 V9 V10
## -0.17739105 -0.34888062 -0.14389739 -0.20262827
As with the traditional random forest, predictors V1–V5 are again identified as important, and V6–V10, with negative or near-zero scores, are again identified as unimportant for predicting the response by the random forest built from conditional inference trees. However, the ordering of the importance scores changes: with the traditional random forest the top predictor was V1, whereas with conditional inference trees it became V4. The ascending orders of the importance scores for the two approaches are as follows:
Traditional:
order(as.numeric(unlist(rfImp1)))
## [1] 8 10 9 7 6 3 5 2 4 1
Conditional:
order(rfImp3)
## [1] 8 10 6 7 9 3 5 2 1 4
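For readability, the same orderings can be shown with predictor names rather than column indices (a small sketch using the objects computed above):
# Variables from least to most important, by name
rownames(rfImp1)[order(rfImp1$Overall)]   # traditional random forest
names(sort(rfImp3))                       # conditional inference forest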
Boosted trees:
set.seed(100)
gbmModel <- gbm(y ~ ., data = simulated, distribution = "gaussian")
gbmImp4 <- summary.gbm(gbmModel)
gbmImp4
## var rel.inf
## V4 V4 29.7926789
## V1 V1 26.0431191
## V2 V2 23.7400606
## V5 V5 11.1100184
## V3 V3 8.7680640
## V6 V6 0.3369801
## V8 V8 0.2090789
## V7 V7 0.0000000
## V9 V9 0.0000000
## V10 V10 0.0000000
Cubist:
cubistMod <- cubist(x=simulated[, 1:10],y=simulated$y,committees = 100)
cubistImp<-varImp(cubistMod)
cubistImp
## Overall
## V1 71.5
## V3 47.0
## V2 58.5
## V4 48.0
## V5 33.0
## V6 13.0
## V7 0.0
## V8 0.0
## V9 0.0
## V10 0.0
No, the pattern is not exactly the same for the boosted tree and Cubist models: although both also assign negligible importance to the uninformative predictors V6–V10, the ordering of the informative predictors differs from the random forest results (boosting ranks V4 highest, while Cubist ranks V1 highest).
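The differing orderings can be viewed side by side (a sketch built from the importance objects above):
# Predictors ranked from most to least important under each model
data.frame(
  rf     = rownames(rfImp1)[order(-rfImp1$Overall)],
  gbm    = as.character(gbmImp4$var),                      # already sorted by rel.inf
  cubist = rownames(cubistImp)[order(-cubistImp$Overall)]
)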
set.seed(100)
# Simulating different predictors
V1 <- sample(0:50, 200, replace = TRUE)
V2 <- sample(0:100, 200, replace = TRUE)
V3 <- sample(0:500, 200, replace = TRUE)
# Create response variable
y <- V1 + V2 + V3 + rnorm(200)
# Create simulated dataset
simulated2 <- data.frame(V1, V2, V3, y)
# Predictors' variances
var(V1)
## [1] 224.5988
var(V2)
## [1] 861.805
var(V3)
## [1] 22751.54
set.seed(100)
# fit random forest with simulated data
rfmodel1 <- randomForest(y ~., data = simulated2, importance = TRUE, ntree = 1000)
# See importance
varImp(rfmodel1, scale=FALSE)
## Overall
## V1 374.8066
## V2 1189.1852
## V3 37796.6048
In this simulation, tree bias across predictors of different granularities is examined with a random forest and three predictors, V1, V2, and V3, sampled from ranges of increasing size. V3 has the largest range and therefore the most distinct values, V2 a medium range, and V1 the smallest. The random forest assigns the highest importance to V3 and the lowest to V1, reflecting the tendency of tree-based models to favor predictors with higher granularity: predictors with more distinct values offer more candidate split points, so they are more likely to be selected for splits and to accumulate importance.
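The same bias can be seen in a single tree: a CART model fit to these data also favors the high-granularity predictor in its splits (a sketch; rpart is already loaded above):
# A single regression tree on the simulated data; variable.importance is
# dominated by V3, the predictor with the most distinct values and widest range
single_tree <- rpart(y ~ ., data = simulated2)
single_tree$variable.importance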
The model on the right side of Figure 8.24 uses a higher bagging fraction (0.9) and learning rate (0.9). With a higher bagging fraction, a larger portion of the training data is used to build each tree, so the samples underlying the trees are more alike and a few predictors come to dominate the importance rankings. The model on the left of Figure 8.24 uses a lower bagging fraction (0.1) and learning rate (0.1); the lower learning rate makes each tree's contribution smaller and the model less aggressive, which allows it to spread importance across more predictors.
The model with the lower bagging fraction and learning rate, i.e., the model on the left of Figure 8.24, would likely be more predictive of other samples, since it is less prone to overfitting the training data.
Increasing the interaction depth helps the model capture more complex patterns in the data. In the model with the higher bagging fraction (0.9) and learning rate (0.9), this would make the dominant predictors even more important, so the slope of the predictor-importance plot would become steeper. In the model with the lower bagging fraction (0.1) and learning rate (0.1), the effect would be smaller, since that model already spreads importance more evenly and is less aggressive in favoring particular predictors.
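One way to examine this empirically would be to refit the two settings from Figure 8.24 on the simulated Friedman data and compare how concentrated the relative influence becomes (a sketch; the interaction depth, number of trees, and n.minobsinnode used here are assumptions, not the book's exact settings):
set.seed(100)
gbm_left  <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 bag.fraction = 0.1, shrinkage = 0.1,
                 n.trees = 1000, interaction.depth = 3, n.minobsinnode = 5)
gbm_right <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 bag.fraction = 0.9, shrinkage = 0.9,
                 n.trees = 1000, interaction.depth = 3, n.minobsinnode = 5)
summary(gbm_left,  plotit = FALSE)  # influence spread over more predictors
summary(gbm_right, plotit = FALSE)  # influence concentrated in a few predictors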
Get data and do required pre-processing:
data("ChemicalManufacturingProcess")
dim(ChemicalManufacturingProcess)
## [1] 176 58
#head(ChemicalManufacturingProcess)
The data set contains 57 predictors (12 describing the input biological material and 45 describing the manufacturing process) for 176 manufacturing runs; the Yield column contains the percent yield of each run.
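A quick check of that breakdown (a small sketch):
# Count the biological and process predictors by column-name prefix
sum(grepl("^Biological", names(ChemicalManufacturingProcess)))     # 12 biological
sum(grepl("^Manufacturing", names(ChemicalManufacturingProcess)))  # 45 process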
A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values:
# Check for missing values in each column
missing_values <- colSums(is.na(ChemicalManufacturingProcess))
#print(missing_values)
# Apply KNN imputation
knn_impute_chemical <- preProcess(ChemicalManufacturingProcess, method=c('knnImpute'))
# Imputed dataset
imputed_chemical_df <- predict(knn_impute_chemical, ChemicalManufacturingProcess)
# Calculate total number of missing values after imputation
total_missing <- sum(is.na(imputed_chemical_df))
#print(total_missing)
Remove near-zero-variance predictors and split the data into training and test sets:
#dim(imputed_chemical_df)
imputed_chemical_df <- imputed_chemical_df %>%
select_at(vars(-one_of(nearZeroVar(., names = TRUE))))
set.seed(100)
train_chemical <-createDataPartition(imputed_chemical_df$Yield, times = 1, p = .70, list = FALSE)
train_chemical_x <- imputed_chemical_df[train_chemical, ][, -c(1)]
test_chemical_x <- imputed_chemical_df[-train_chemical, ][, -c(1)]
train_chemical_y<- imputed_chemical_df[train_chemical, ]$Yield
test_chemical_y <- imputed_chemical_df[-train_chemical, ]$Yield
The optimal linear regression model from the previous homework was ridge regression, which is retuned below:
set.seed(135)
ridgegrid <- data.frame(.lambda = seq(0,0.1,length=15))
ridge_model <- train(x=train_chemical_x,y=train_chemical_y,
method='ridge',
tuneGrid=ridgegrid,
trControl=trainControl(method='cv'),
preProc = c('center','scale')
)
ridge_model
## Ridge Regression
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 111, 112, 111, 112, 112, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000000000 4.054160 0.3911420 1.6139763
## 0.007142857 1.633270 0.5251733 0.8315422
## 0.014285714 1.767593 0.5375635 0.8611365
## 0.021428571 1.777055 0.5452884 0.8623404
## 0.028571429 1.764046 0.5507236 0.8590180
## 0.035714286 1.746125 0.5548928 0.8551684
## 0.042857143 1.727679 0.5582498 0.8507987
## 0.050000000 1.709956 0.5610352 0.8464616
## 0.057142857 1.693284 0.5633933 0.8425123
## 0.064285714 1.677696 0.5654189 0.8388062
## 0.071428571 1.663129 0.5671781 0.8355731
## 0.078571429 1.649496 0.5687194 0.8324824
## 0.085714286 1.636707 0.5700795 0.8295368
## 0.092857143 1.624683 0.5712868 0.8268756
## 0.100000000 1.613350 0.5723639 0.8243647
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1.
Make predictions on the test set and evaluate the performance of the ridge model:
ridgepred <- predict(ridge_model, newdata=test_chemical_x)
postResample(pred=ridgepred, obs=test_chemical_y)
## RMSE Rsquared MAE
## 1.6115319 0.1753037 0.8482122
Tune the optimal nonlinear regression model (SVM) from the previous homework:
set.seed(100)
svmRTuned <- train(x = train_chemical_x,
y = train_chemical_y,
method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 14,
trControl = trainControl(method = "cv"))
svmRTuned
## Support Vector Machines with Radial Basis Function Kernel
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 111, 112, 112, 112, 111, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.7950783 0.5019956 0.6355235
## 0.50 0.7206527 0.5675452 0.5786388
## 1.00 0.6620420 0.6391608 0.5329275
## 2.00 0.6199184 0.6705515 0.5039466
## 4.00 0.5997918 0.6797461 0.4884378
## 8.00 0.5916547 0.6873380 0.4857772
## 16.00 0.5916547 0.6873380 0.4857772
## 32.00 0.5916547 0.6873380 0.4857772
## 64.00 0.5916547 0.6873380 0.4857772
## 128.00 0.5916547 0.6873380 0.4857772
## 256.00 0.5916547 0.6873380 0.4857772
## 512.00 0.5916547 0.6873380 0.4857772
## 1024.00 0.5916547 0.6873380 0.4857772
## 2048.00 0.5916547 0.6873380 0.4857772
##
## Tuning parameter 'sigma' was held constant at a value of 0.01447582
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01447582 and C = 8.
Make predictions on the test set and evaluate the SVM model's performance:
svmRPred <- predict(svmRTuned, newdata = test_chemical_x)
postResample(pred = svmRPred, obs = test_chemical_y)
## RMSE Rsquared MAE
## 0.6809040 0.5165981 0.5419960
Fit a Recursive Partitioning Decision Tree:
set.seed(100)
rpartTune <- train(train_chemical_x, train_chemical_y,
method = "rpart2",
preProc = c("center", "scale"),
tuneLength = 10,
trControl = trainControl(method = "cv"))
## note: only 9 possible values of the max tree depth from the initial fit.
## Truncating the grid to 9 .
rpartTune
## CART
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 111, 112, 112, 112, 111, ...
## Resampling results across tuning parameters:
##
## maxdepth RMSE Rsquared MAE
## 1 0.8644371 0.3525877 0.6811590
## 2 0.8624244 0.3584603 0.7020611
## 3 0.8340208 0.4130831 0.6557903
## 4 0.7920130 0.4697482 0.6194825
## 5 0.7807961 0.4888475 0.6076105
## 6 0.7839218 0.4853527 0.5925733
## 7 0.8017760 0.4755253 0.5978002
## 8 0.8127784 0.4666723 0.6052341
## 9 0.8116171 0.4672931 0.6010899
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was maxdepth = 5.
Make predictions on the test set and evaluate the performance of the Recursive Partitioning Decision Tree model:
rpartpred <- predict(rpartTune, newdata=test_chemical_x)
postResample(pred=rpartpred, obs=test_chemical_y)
## RMSE Rsquared MAE
## 0.7373570 0.3977679 0.5922980
Fit Random Forest:
set.seed(100)
# Note: randomForest's argument is `ntree` (singular); `ntrees = 1000` is silently ignored here,
# so the default 500 trees are grown (see the printed output below)
rfmodel <- randomForest(train_chemical_x, train_chemical_y, importance = TRUE, ntrees = 1000)
rfmodel
##
## Call:
## randomForest(x = train_chemical_x, y = train_chemical_y, importance = TRUE, ntrees = 1000)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 18
##
## Mean of squared residuals: 0.4016283
## % Var explained: 62.4
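Because ntrees is not a randomForest argument (the correct name is ntree), the fit above used the default 500 trees; a corrected call would look like the following (a sketch, not rerun here):
# Corrected call: `ntree` (singular) actually controls the number of trees grown
set.seed(100)
rfmodel_1000 <- randomForest(train_chemical_x, train_chemical_y,
                             importance = TRUE, ntree = 1000)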
Make predictions on the test set and evaluate the performance of the Random Forest:
rfpred <- predict(rfmodel, newdata=test_chemical_x)
postResample(pred=rfpred, obs=test_chemical_y)
## RMSE Rsquared MAE
## 0.5519336 0.6419386 0.4468046
Fit Boosted Trees:
gbmGrid <- expand.grid(.interaction.depth = seq(1, 7, by = 2),
                       .n.trees = seq(100, 1000, by = 50),
                       .shrinkage = c(0.01, 0.1),
                       .n.minobsinnode = c(10, 20))
set.seed(100)
gbmTune <- train(train_chemical_x, train_chemical_y,
method = "gbm",
preProc = c("center", "scale"),
tuneGrid = gbmGrid,
verbose = FALSE)
gbmTune
## Stochastic Gradient Boosting
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 0.01 1 10 100 0.8619427 0.4961031
## 0.01 1 10 150 0.8147432 0.5128075
## 0.01 1 10 200 0.7835660 0.5246585
## 0.01 1 10 250 0.7629037 0.5335685
## 0.01 1 10 300 0.7505948 0.5389202
## 0.01 1 10 350 0.7417453 0.5419065
## 0.01 1 10 400 0.7366480 0.5436428
## 0.01 1 10 450 0.7332183 0.5448325
## 0.01 1 10 500 0.7299197 0.5462591
## 0.01 1 10 550 0.7274676 0.5472404
## 0.01 1 10 600 0.7251939 0.5480176
## 0.01 1 10 650 0.7233207 0.5488942
## 0.01 1 10 700 0.7222373 0.5494308
## 0.01 1 10 750 0.7200409 0.5511872
## 0.01 1 10 800 0.7194872 0.5514386
## 0.01 1 10 850 0.7182830 0.5518998
## 0.01 1 10 900 0.7178425 0.5521463
## 0.01 1 10 950 0.7175529 0.5519596
## 0.01 1 10 1000 0.7172759 0.5518846
## 0.01 1 20 100 0.8684749 0.4783788
## 0.01 1 20 150 0.8223527 0.4975089
## 0.01 1 20 200 0.7931652 0.5088669
## 0.01 1 20 250 0.7748864 0.5135702
## 0.01 1 20 300 0.7629090 0.5165410
## 0.01 1 20 350 0.7546083 0.5189928
## 0.01 1 20 400 0.7488313 0.5207284
## 0.01 1 20 450 0.7457348 0.5208224
## 0.01 1 20 500 0.7430661 0.5215792
## 0.01 1 20 550 0.7417082 0.5212001
## 0.01 1 20 600 0.7398886 0.5212005
## 0.01 1 20 650 0.7388621 0.5208721
## 0.01 1 20 700 0.7381801 0.5209677
## 0.01 1 20 750 0.7369238 0.5209005
## 0.01 1 20 800 0.7361333 0.5212499
## 0.01 1 20 850 0.7359617 0.5203852
## 0.01 1 20 900 0.7351879 0.5202388
## 0.01 1 20 950 0.7352199 0.5196898
## 0.01 1 20 1000 0.7348395 0.5194283
## 0.01 3 10 100 0.8101889 0.5229983
## 0.01 3 10 150 0.7690982 0.5327018
## 0.01 3 10 200 0.7468371 0.5398614
## 0.01 3 10 250 0.7342148 0.5452599
## 0.01 3 10 300 0.7261570 0.5495239
## 0.01 3 10 350 0.7202948 0.5534397
## 0.01 3 10 400 0.7162083 0.5557302
## 0.01 3 10 450 0.7136344 0.5570454
## 0.01 3 10 500 0.7107801 0.5593219
## 0.01 3 10 550 0.7082383 0.5613770
## 0.01 3 10 600 0.7064542 0.5626714
## 0.01 3 10 650 0.7050697 0.5636954
## 0.01 3 10 700 0.7036995 0.5651753
## 0.01 3 10 750 0.7026979 0.5659385
## 0.01 3 10 800 0.7017441 0.5668405
## 0.01 3 10 850 0.7013269 0.5670750
## 0.01 3 10 900 0.7012487 0.5671458
## 0.01 3 10 950 0.7005533 0.5675776
## 0.01 3 10 1000 0.7000768 0.5680647
## 0.01 3 20 100 0.8552634 0.4900206
## 0.01 3 20 150 0.8104619 0.5027118
## 0.01 3 20 200 0.7844226 0.5094961
## 0.01 3 20 250 0.7686843 0.5131210
## 0.01 3 20 300 0.7579430 0.5170305
## 0.01 3 20 350 0.7525156 0.5172783
## 0.01 3 20 400 0.7479366 0.5185796
## 0.01 3 20 450 0.7442613 0.5202874
## 0.01 3 20 500 0.7421446 0.5204887
## 0.01 3 20 550 0.7399813 0.5207975
## 0.01 3 20 600 0.7390178 0.5205588
## 0.01 3 20 650 0.7375460 0.5210134
## 0.01 3 20 700 0.7356886 0.5222227
## 0.01 3 20 750 0.7351315 0.5215755
## 0.01 3 20 800 0.7344454 0.5216805
## 0.01 3 20 850 0.7349301 0.5204034
## 0.01 3 20 900 0.7345114 0.5201655
## 0.01 3 20 950 0.7342266 0.5197294
## 0.01 3 20 1000 0.7342032 0.5190023
## 0.01 5 10 100 0.8099838 0.5196701
## 0.01 5 10 150 0.7664955 0.5336396
## 0.01 5 10 200 0.7443165 0.5424552
## 0.01 5 10 250 0.7315031 0.5486593
## 0.01 5 10 300 0.7229947 0.5534912
## 0.01 5 10 350 0.7165208 0.5574049
## 0.01 5 10 400 0.7119047 0.5607382
## 0.01 5 10 450 0.7089361 0.5625588
## 0.01 5 10 500 0.7060637 0.5645443
## 0.01 5 10 550 0.7040043 0.5658574
## 0.01 5 10 600 0.7025471 0.5668620
## 0.01 5 10 650 0.7013123 0.5679026
## 0.01 5 10 700 0.7002210 0.5688047
## 0.01 5 10 750 0.6988815 0.5698984
## 0.01 5 10 800 0.6978051 0.5709989
## 0.01 5 10 850 0.6968338 0.5719899
## 0.01 5 10 900 0.6957553 0.5731314
## 0.01 5 10 950 0.6956943 0.5732012
## 0.01 5 10 1000 0.6951357 0.5737641
## 0.01 5 20 100 0.8551111 0.4897872
## 0.01 5 20 150 0.8087220 0.5046218
## 0.01 5 20 200 0.7825776 0.5116426
## 0.01 5 20 250 0.7669653 0.5147037
## 0.01 5 20 300 0.7570251 0.5178158
## 0.01 5 20 350 0.7512534 0.5190385
## 0.01 5 20 400 0.7469635 0.5202314
## 0.01 5 20 450 0.7440647 0.5201334
## 0.01 5 20 500 0.7417145 0.5206497
## 0.01 5 20 550 0.7394494 0.5218859
## 0.01 5 20 600 0.7384177 0.5214764
## 0.01 5 20 650 0.7366376 0.5218347
## 0.01 5 20 700 0.7363315 0.5207254
## 0.01 5 20 750 0.7354775 0.5208958
## 0.01 5 20 800 0.7344882 0.5213555
## 0.01 5 20 850 0.7341582 0.5210159
## 0.01 5 20 900 0.7340244 0.5203763
## 0.01 5 20 950 0.7341274 0.5196927
## 0.01 5 20 1000 0.7335606 0.5197196
## 0.01 7 10 100 0.8082362 0.5248709
## 0.01 7 10 150 0.7646140 0.5388778
## 0.01 7 10 200 0.7416104 0.5484260
## 0.01 7 10 250 0.7283160 0.5540308
## 0.01 7 10 300 0.7203198 0.5583439
## 0.01 7 10 350 0.7154354 0.5604361
## 0.01 7 10 400 0.7113687 0.5632397
## 0.01 7 10 450 0.7077097 0.5664971
## 0.01 7 10 500 0.7054048 0.5679274
## 0.01 7 10 550 0.7032892 0.5693644
## 0.01 7 10 600 0.7017489 0.5702326
## 0.01 7 10 650 0.7004403 0.5710407
## 0.01 7 10 700 0.6997275 0.5714053
## 0.01 7 10 750 0.6989119 0.5719676
## 0.01 7 10 800 0.6981644 0.5725451
## 0.01 7 10 850 0.6974258 0.5732495
## 0.01 7 10 900 0.6967265 0.5739977
## 0.01 7 10 950 0.6965644 0.5742331
## 0.01 7 10 1000 0.6961831 0.5745076
## 0.01 7 20 100 0.8552297 0.4883830
## 0.01 7 20 150 0.8101163 0.5014834
## 0.01 7 20 200 0.7845376 0.5077401
## 0.01 7 20 250 0.7684001 0.5127391
## 0.01 7 20 300 0.7579139 0.5158798
## 0.01 7 20 350 0.7516809 0.5175866
## 0.01 7 20 400 0.7480381 0.5181748
## 0.01 7 20 450 0.7449675 0.5190462
## 0.01 7 20 500 0.7431517 0.5188939
## 0.01 7 20 550 0.7417291 0.5190313
## 0.01 7 20 600 0.7402981 0.5190832
## 0.01 7 20 650 0.7385103 0.5199205
## 0.01 7 20 700 0.7377899 0.5198690
## 0.01 7 20 750 0.7367947 0.5198067
## 0.01 7 20 800 0.7364050 0.5192006
## 0.01 7 20 850 0.7355238 0.5194831
## 0.01 7 20 900 0.7347639 0.5197915
## 0.01 7 20 950 0.7347416 0.5192253
## 0.01 7 20 1000 0.7350005 0.5184241
## 0.10 1 10 100 0.7275323 0.5352510
## 0.10 1 10 150 0.7279266 0.5323091
## 0.10 1 10 200 0.7295162 0.5310154
## 0.10 1 10 250 0.7338571 0.5269244
## 0.10 1 10 300 0.7334167 0.5279079
## 0.10 1 10 350 0.7369279 0.5244824
## 0.10 1 10 400 0.7394030 0.5216317
## 0.10 1 10 450 0.7408112 0.5203014
## 0.10 1 10 500 0.7401792 0.5214431
## 0.10 1 10 550 0.7418537 0.5199436
## 0.10 1 10 600 0.7439024 0.5180999
## 0.10 1 10 650 0.7446695 0.5173160
## 0.10 1 10 700 0.7455351 0.5161426
## 0.10 1 10 750 0.7462623 0.5152511
## 0.10 1 10 800 0.7464859 0.5152783
## 0.10 1 10 850 0.7474904 0.5145378
## 0.10 1 10 900 0.7478471 0.5139936
## 0.10 1 10 950 0.7487633 0.5131569
## 0.10 1 10 1000 0.7489365 0.5129501
## 0.10 1 20 100 0.7431509 0.5119040
## 0.10 1 20 150 0.7488784 0.5003224
## 0.10 1 20 200 0.7504600 0.4988738
## 0.10 1 20 250 0.7550166 0.4944737
## 0.10 1 20 300 0.7602555 0.4883015
## 0.10 1 20 350 0.7629998 0.4868599
## 0.10 1 20 400 0.7673899 0.4827199
## 0.10 1 20 450 0.7712596 0.4792313
## 0.10 1 20 500 0.7715267 0.4794554
## 0.10 1 20 550 0.7742775 0.4773005
## 0.10 1 20 600 0.7758234 0.4762470
## 0.10 1 20 650 0.7768118 0.4759240
## 0.10 1 20 700 0.7773467 0.4752894
## 0.10 1 20 750 0.7778800 0.4751854
## 0.10 1 20 800 0.7785847 0.4749088
## 0.10 1 20 850 0.7795916 0.4746016
## 0.10 1 20 900 0.7810432 0.4732491
## 0.10 1 20 950 0.7814590 0.4733112
## 0.10 1 20 1000 0.7828688 0.4722748
## 0.10 3 10 100 0.7088970 0.5526858
## 0.10 3 10 150 0.7056364 0.5571894
## 0.10 3 10 200 0.7055293 0.5584932
## 0.10 3 10 250 0.7056879 0.5582212
## 0.10 3 10 300 0.7054920 0.5586714
## 0.10 3 10 350 0.7056000 0.5587617
## 0.10 3 10 400 0.7054466 0.5591601
## 0.10 3 10 450 0.7057178 0.5589761
## 0.10 3 10 500 0.7053254 0.5593909
## 0.10 3 10 550 0.7053087 0.5595411
## 0.10 3 10 600 0.7050890 0.5598573
## 0.10 3 10 650 0.7051730 0.5598414
## 0.10 3 10 700 0.7050711 0.5599593
## 0.10 3 10 750 0.7050454 0.5600081
## 0.10 3 10 800 0.7050121 0.5600747
## 0.10 3 10 850 0.7050387 0.5600621
## 0.10 3 10 900 0.7050327 0.5600945
## 0.10 3 10 950 0.7050146 0.5601346
## 0.10 3 10 1000 0.7050362 0.5601241
## 0.10 3 20 100 0.7392738 0.5117369
## 0.10 3 20 150 0.7406613 0.5084034
## 0.10 3 20 200 0.7455712 0.5031134
## 0.10 3 20 250 0.7485909 0.5018524
## 0.10 3 20 300 0.7504264 0.5011576
## 0.10 3 20 350 0.7537975 0.4991533
## 0.10 3 20 400 0.7543302 0.4982676
## 0.10 3 20 450 0.7552320 0.4980858
## 0.10 3 20 500 0.7563803 0.4977284
## 0.10 3 20 550 0.7576535 0.4966238
## 0.10 3 20 600 0.7581226 0.4970064
## 0.10 3 20 650 0.7592556 0.4961807
## 0.10 3 20 700 0.7602633 0.4957032
## 0.10 3 20 750 0.7614354 0.4947545
## 0.10 3 20 800 0.7617726 0.4944720
## 0.10 3 20 850 0.7614385 0.4950398
## 0.10 3 20 900 0.7619100 0.4946527
## 0.10 3 20 950 0.7621800 0.4944959
## 0.10 3 20 1000 0.7623670 0.4945085
## 0.10 5 10 100 0.7096003 0.5572463
## 0.10 5 10 150 0.7086674 0.5575044
## 0.10 5 10 200 0.7077940 0.5589958
## 0.10 5 10 250 0.7083593 0.5585496
## 0.10 5 10 300 0.7083080 0.5588205
## 0.10 5 10 350 0.7081298 0.5591577
## 0.10 5 10 400 0.7083298 0.5591138
## 0.10 5 10 450 0.7084492 0.5591287
## 0.10 5 10 500 0.7084908 0.5592184
## 0.10 5 10 550 0.7083985 0.5593860
## 0.10 5 10 600 0.7084622 0.5593270
## 0.10 5 10 650 0.7085878 0.5592666
## 0.10 5 10 700 0.7086858 0.5592076
## 0.10 5 10 750 0.7087874 0.5591617
## 0.10 5 10 800 0.7088329 0.5591629
## 0.10 5 10 850 0.7088602 0.5591647
## 0.10 5 10 900 0.7088670 0.5591817
## 0.10 5 10 950 0.7089035 0.5591615
## 0.10 5 10 1000 0.7089559 0.5591108
## 0.10 5 20 100 0.7451326 0.5042425
## 0.10 5 20 150 0.7436587 0.5053513
## 0.10 5 20 200 0.7481337 0.5007987
## 0.10 5 20 250 0.7501660 0.4999680
## 0.10 5 20 300 0.7521845 0.4983645
## 0.10 5 20 350 0.7565896 0.4945845
## 0.10 5 20 400 0.7572699 0.4944394
## 0.10 5 20 450 0.7586502 0.4940749
## 0.10 5 20 500 0.7585619 0.4944812
## 0.10 5 20 550 0.7611195 0.4922538
## 0.10 5 20 600 0.7612814 0.4922562
## 0.10 5 20 650 0.7617362 0.4920931
## 0.10 5 20 700 0.7622562 0.4913072
## 0.10 5 20 750 0.7623991 0.4914863
## 0.10 5 20 800 0.7630963 0.4912227
## 0.10 5 20 850 0.7634993 0.4911920
## 0.10 5 20 900 0.7632487 0.4915922
## 0.10 5 20 950 0.7635985 0.4913546
## 0.10 5 20 1000 0.7642794 0.4906482
## 0.10 7 10 100 0.7083174 0.5552304
## 0.10 7 10 150 0.7054586 0.5596078
## 0.10 7 10 200 0.7048000 0.5612095
## 0.10 7 10 250 0.7042707 0.5620460
## 0.10 7 10 300 0.7043668 0.5624224
## 0.10 7 10 350 0.7043567 0.5626495
## 0.10 7 10 400 0.7039353 0.5632376
## 0.10 7 10 450 0.7041271 0.5631395
## 0.10 7 10 500 0.7041667 0.5631268
## 0.10 7 10 550 0.7041838 0.5632392
## 0.10 7 10 600 0.7041713 0.5632585
## 0.10 7 10 650 0.7042707 0.5631922
## 0.10 7 10 700 0.7042538 0.5633161
## 0.10 7 10 750 0.7042096 0.5633967
## 0.10 7 10 800 0.7042487 0.5633671
## 0.10 7 10 850 0.7042830 0.5633789
## 0.10 7 10 900 0.7043372 0.5633227
## 0.10 7 10 950 0.7043226 0.5633839
## 0.10 7 10 1000 0.7043076 0.5634143
## 0.10 7 20 100 0.7472499 0.4985965
## 0.10 7 20 150 0.7483026 0.4959972
## 0.10 7 20 200 0.7543290 0.4900388
## 0.10 7 20 250 0.7585461 0.4861161
## 0.10 7 20 300 0.7591736 0.4862691
## 0.10 7 20 350 0.7613460 0.4843779
## 0.10 7 20 400 0.7600581 0.4871606
## 0.10 7 20 450 0.7622616 0.4853936
## 0.10 7 20 500 0.7643479 0.4835444
## 0.10 7 20 550 0.7651031 0.4830860
## 0.10 7 20 600 0.7660236 0.4825360
## 0.10 7 20 650 0.7665230 0.4824743
## 0.10 7 20 700 0.7668062 0.4823541
## 0.10 7 20 750 0.7670085 0.4824451
## 0.10 7 20 800 0.7676877 0.4821189
## 0.10 7 20 850 0.7681026 0.4818954
## 0.10 7 20 900 0.7682819 0.4818895
## 0.10 7 20 950 0.7679894 0.4819427
## 0.10 7 20 1000 0.7684987 0.4818401
## MAE
## 0.6693536
## 0.6274496
## 0.5990362
## 0.5806300
## 0.5692617
## 0.5618710
## 0.5578578
## 0.5549705
## 0.5522950
## 0.5500891
## 0.5481299
## 0.5463266
## 0.5455160
## 0.5437695
## 0.5435762
## 0.5429259
## 0.5427975
## 0.5427577
## 0.5426541
## 0.6810479
## 0.6405847
## 0.6143237
## 0.5971396
## 0.5859434
## 0.5782820
## 0.5725587
## 0.5694167
## 0.5666103
## 0.5649419
## 0.5628340
## 0.5611790
## 0.5599578
## 0.5583834
## 0.5571357
## 0.5567332
## 0.5558540
## 0.5557011
## 0.5555212
## 0.6205456
## 0.5835808
## 0.5622831
## 0.5508115
## 0.5437736
## 0.5390406
## 0.5363979
## 0.5344535
## 0.5323755
## 0.5304821
## 0.5294357
## 0.5283121
## 0.5277132
## 0.5270900
## 0.5266832
## 0.5265111
## 0.5268211
## 0.5265770
## 0.5265215
## 0.6695530
## 0.6300367
## 0.6064017
## 0.5909745
## 0.5803969
## 0.5750286
## 0.5708530
## 0.5670699
## 0.5648574
## 0.5628058
## 0.5613395
## 0.5595374
## 0.5577044
## 0.5569553
## 0.5562781
## 0.5567306
## 0.5564601
## 0.5561453
## 0.5559356
## 0.6197370
## 0.5796928
## 0.5603241
## 0.5494230
## 0.5429427
## 0.5378745
## 0.5340388
## 0.5317976
## 0.5295642
## 0.5280563
## 0.5267435
## 0.5260471
## 0.5254545
## 0.5246981
## 0.5239339
## 0.5237143
## 0.5231572
## 0.5233957
## 0.5232101
## 0.6699282
## 0.6279786
## 0.6043387
## 0.5885560
## 0.5788536
## 0.5733976
## 0.5692084
## 0.5662876
## 0.5638733
## 0.5616239
## 0.5605242
## 0.5588479
## 0.5582222
## 0.5573152
## 0.5565258
## 0.5561486
## 0.5559258
## 0.5561572
## 0.5557056
## 0.6177703
## 0.5781390
## 0.5574636
## 0.5452604
## 0.5388041
## 0.5349349
## 0.5319535
## 0.5291170
## 0.5270831
## 0.5255024
## 0.5247650
## 0.5240813
## 0.5236829
## 0.5232816
## 0.5230103
## 0.5227653
## 0.5225928
## 0.5227819
## 0.5227327
## 0.6695407
## 0.6287360
## 0.6045540
## 0.5889442
## 0.5794382
## 0.5735577
## 0.5703986
## 0.5676093
## 0.5654919
## 0.5641587
## 0.5625836
## 0.5610258
## 0.5599017
## 0.5581880
## 0.5579995
## 0.5569359
## 0.5566738
## 0.5565242
## 0.5568133
## 0.5535232
## 0.5565581
## 0.5597103
## 0.5631183
## 0.5644581
## 0.5681129
## 0.5711066
## 0.5719502
## 0.5720739
## 0.5736196
## 0.5755803
## 0.5759443
## 0.5766939
## 0.5778021
## 0.5782349
## 0.5790959
## 0.5794930
## 0.5806018
## 0.5808018
## 0.5635400
## 0.5685202
## 0.5702298
## 0.5751029
## 0.5797174
## 0.5845101
## 0.5893326
## 0.5937401
## 0.5955435
## 0.5988326
## 0.6013750
## 0.6027988
## 0.6040039
## 0.6050816
## 0.6060279
## 0.6076136
## 0.6093571
## 0.6103493
## 0.6120680
## 0.5395250
## 0.5389886
## 0.5394795
## 0.5397209
## 0.5399441
## 0.5400663
## 0.5401641
## 0.5405018
## 0.5403660
## 0.5404457
## 0.5403869
## 0.5405612
## 0.5405987
## 0.5406563
## 0.5407018
## 0.5407625
## 0.5407826
## 0.5408205
## 0.5408707
## 0.5610795
## 0.5645953
## 0.5690792
## 0.5731626
## 0.5761319
## 0.5795435
## 0.5813704
## 0.5827195
## 0.5844897
## 0.5862410
## 0.5877619
## 0.5893819
## 0.5902934
## 0.5916969
## 0.5923694
## 0.5926506
## 0.5930527
## 0.5935363
## 0.5942392
## 0.5371885
## 0.5389483
## 0.5405084
## 0.5424078
## 0.5426563
## 0.5427944
## 0.5433265
## 0.5437527
## 0.5438916
## 0.5439480
## 0.5441288
## 0.5442476
## 0.5443388
## 0.5444813
## 0.5445662
## 0.5446081
## 0.5446137
## 0.5446503
## 0.5447035
## 0.5639665
## 0.5654285
## 0.5717677
## 0.5754112
## 0.5784644
## 0.5835478
## 0.5840624
## 0.5861608
## 0.5872658
## 0.5904127
## 0.5914346
## 0.5921018
## 0.5931346
## 0.5937509
## 0.5949836
## 0.5960804
## 0.5959579
## 0.5968853
## 0.5978519
## 0.5351406
## 0.5361751
## 0.5371364
## 0.5380257
## 0.5384851
## 0.5387527
## 0.5387313
## 0.5390157
## 0.5392294
## 0.5393146
## 0.5393858
## 0.5395080
## 0.5395514
## 0.5396037
## 0.5397062
## 0.5397496
## 0.5398181
## 0.5398299
## 0.5398482
## 0.5683145
## 0.5693486
## 0.5755584
## 0.5797045
## 0.5817843
## 0.5846819
## 0.5852263
## 0.5872347
## 0.5900088
## 0.5918224
## 0.5932891
## 0.5941301
## 0.5944857
## 0.5952697
## 0.5961450
## 0.5966344
## 0.5970321
## 0.5970143
## 0.5977415
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth =
## 5, shrinkage = 0.01 and n.minobsinnode = 10.
Make predictions on the test set and evaluate the performance of the Boosted Trees:
gbmpred <- predict(gbmTune, newdata=test_chemical_x)
postResample(pred=gbmpred, obs=test_chemical_y)
## RMSE Rsquared MAE
## 0.6095757 0.6183966 0.4866248
Fit Cubist model:
cubistGrid <- expand.grid(committees = c(1, 5, 10),
neighbors = c(0, 1, 3, 5))
set.seed(100)
cubistTune <- train(train_chemical_x, train_chemical_y,
method = "cubist",
preProc = c("center", "scale"),
tuneGrid = cubistGrid,
verbose = FALSE)
cubistTune
## Cubist
##
## 124 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.0002956 0.3554524 0.7248086
## 1 1 0.9899484 0.3824150 0.7010254
## 1 3 0.9770161 0.3818919 0.6939003
## 1 5 0.9829963 0.3743484 0.7036435
## 5 0 0.7638949 0.5146604 0.5882052
## 5 1 0.7386180 0.5445762 0.5574313
## 5 3 0.7401463 0.5381512 0.5619807
## 5 5 0.7449943 0.5324229 0.5690466
## 10 0 0.7207429 0.5569843 0.5578867
## 10 1 0.6919859 0.5889563 0.5218133
## 10 3 0.6947846 0.5836795 0.5306527
## 10 5 0.7001805 0.5775887 0.5379893
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 10 and neighbors = 1.
Make predictions on the test set and evaluate the performance of the Cubist model:
cubistpred <- predict(cubistTune, newdata=test_chemical_x)
postResample(pred=cubistpred, obs=test_chemical_y)
## RMSE Rsquared MAE
## 0.7045971 0.5916504 0.4809221
Combine results of different models into a single table:
# Get results
#ridge_res <- postResample(pred=ridgepred, obs=test_chemical_y)
#svm_res<-postResample(pred = svmRPred, obs = test_chemical_y)
rpart_res<- postResample(pred=rpartpred, obs=test_chemical_y)
rf_res <- postResample(pred=rfpred, obs=test_chemical_y)
gbm_res<- postResample(pred=gbmpred, obs=test_chemical_y)
cubist_res<-postResample(pred=cubistpred, obs=test_chemical_y)
# Combine into a single table
all_results<- rbind(
#ridge_linear = ridge_res,
#svm_nonlinear = svm_res,
rpart = rpart_res,
"random forest" = rf_res,
gbm = gbm_res,
cubist= cubist_res
)
# Convert to a data frame
results <- as.data.frame(all_results)
# See results
print(results)
## RMSE Rsquared MAE
## rpart 0.7373570 0.3977679 0.5922980
## random forest 0.5519336 0.6419386 0.4468046
## gbm 0.6095757 0.6183966 0.4866248
## cubist 0.7045971 0.5916504 0.4809221
The Random Forest model is observed to perform best, as it has the lowest RMSE value and the highest R-squared value among all models.
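For convenience, the same table can be sorted by test-set RMSE (a small sketch):
# Models ordered from best (smallest RMSE) to worst
results[order(results$RMSE), ]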
Top 10 predictors for Random Forest:
# Get predictors importance from random forest model
imp_scores <- varImp(rfmodel)
top10 <- head(imp_scores[order(-imp_scores$Overall), , drop = FALSE], 10)
top10
## Overall
## ManufacturingProcess32 21.930907
## BiologicalMaterial12 10.243346
## ManufacturingProcess31 9.851923
## ManufacturingProcess17 9.773773
## ManufacturingProcess13 8.650226
## ManufacturingProcess36 7.739404
## ManufacturingProcess09 7.258026
## BiologicalMaterial06 7.092092
## BiologicalMaterial03 7.089226
## BiologicalMaterial02 6.648094
Most important predictors for the optimal linear model (ridge):
varImp(ridge_model)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 94.64
## BiologicalMaterial06 76.19
## ManufacturingProcess17 73.26
## ManufacturingProcess31 69.73
## ManufacturingProcess09 69.65
## BiologicalMaterial12 66.34
## ManufacturingProcess36 66.21
## BiologicalMaterial02 66.06
## BiologicalMaterial03 63.32
## ManufacturingProcess11 57.92
## ManufacturingProcess06 53.69
## ManufacturingProcess30 50.51
## BiologicalMaterial04 50.26
## ManufacturingProcess29 44.79
## BiologicalMaterial08 44.11
## BiologicalMaterial09 42.22
## BiologicalMaterial11 39.01
## ManufacturingProcess33 38.69
## ManufacturingProcess02 37.76
Most important predictors for the optimal nonlinear model (SVM):
varImp(svmRTuned)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 94.64
## BiologicalMaterial06 76.19
## ManufacturingProcess17 73.26
## ManufacturingProcess31 69.73
## ManufacturingProcess09 69.65
## BiologicalMaterial12 66.34
## ManufacturingProcess36 66.21
## BiologicalMaterial02 66.06
## BiologicalMaterial03 63.32
## ManufacturingProcess11 57.92
## ManufacturingProcess06 53.69
## ManufacturingProcess30 50.51
## BiologicalMaterial04 50.26
## ManufacturingProcess29 44.79
## BiologicalMaterial08 44.11
## BiologicalMaterial09 42.22
## BiologicalMaterial11 39.01
## ManufacturingProcess33 38.69
## ManufacturingProcess02 37.76
The top five predictors for the Random Forest model, namely ManufacturingProcess32, BiologicalMaterial12, ManufacturingProcess31, ManufacturingProcess17, and ManufacturingProcess13, show that process variables dominate its list. Across all three models (random forest, optimal linear (ridge), and optimal nonlinear (SVM)), the set of top-10 variables is the same; only the ordering of nine of them differs, while the top predictor, ManufacturingProcess32, is consistently ranked first. Process variables also dominate the importance lists of all three models. (The ridge and SVM listings are identical because caret falls back to the same model-free, loess R-squared importance filter for both model types.)
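The overlap of the top-10 sets can be checked directly (a sketch using the objects computed above):
# Compare the top-10 predictor sets of the random forest and the ridge model
rf_top10    <- rownames(top10)
ridge_imp   <- varImp(ridge_model)$importance
ridge_top10 <- rownames(ridge_imp)[order(-ridge_imp$Overall)][1:10]
intersect(rf_top10, ridge_top10)  # the ten names appearing in both sets
setdiff(rf_top10, ridge_top10)    # character(0) when the sets coincide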
Plot the optimal single Recursive Partitioning Decision Tree:
plot(as.party(rpartTune$finalModel),gp=gpar(fontsize=8))
This view of the data provides valuable insight into the roles of the biological and process predictors in determining yield. The plot shows that the tree's root node splits on ManufacturingProcess32, so a process variable produces the first, broadest separation of the runs. Further down the tree, biological variables appear more often, and seven of the ten terminal nodes are reached through splits on biological variables. This suggests that the biological variables are important for the more detailed, specific predictions of yield.