library(tidyverse)
library(fpp2)
library(urca)
library(rio)
library(gridExtra)
library(caret)
library(randomForest)
library(glmnet)
library(mlbench)
library(AppliedPredictiveModeling)
library(party)
library(gbm)
library(Cubist)
library(rpart)
seed <- 200

1 HW9: Regression Trees and Rule-Based Models

Do problems 8.1, 8.2, 8.3, and 8.7 in Kuhn and Johnson. Please submit the Rpubs link along with the .rmd file.

1.1 Ex. 8.1

Recreate the simulated data from Exercise 7.2:

#library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

1.1.1 Part a

Fit a random forest model to all of the predictors, then estimate the variable importance scores. Did the random forest model significantly use the uninformative predictors (V6 – V10)?

Answer:

From the results below, the random forest model did not significantly use the uninformative predictors V6 to V10.

#library(randomForest)
#library(caret)
model1 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1

1.1.2 Part b

Now add an additional predictor that is highly correlated with one of the informative predictors. For example:

simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)

## [1] 0.9460206

Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?

Answer:

When I added another predictor duplicate1 into the model, the importance of V1 is reduced and so as all others. The importance of V1 is reduced from 8.69 to 6.02, by 2.67, which is about 30%. This is resulted from adding the highly correlated predictor duplicate1 to the model.

model2 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2

1.1.3 Part c

Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?

Answer:

As given by the question, we used conditional inference trees in this cforest function.

The result importances are in a different pattern as the traditional random forest model. The importance of V1 reduced to 3.25 as the correlation between predictors V1 and duplicate1 are being considered. The importance of V2 and V4 are still high while the unimportant predictors V6 to V10 are still low.

This model is better than traditional random forest model when there are highly correlated predictors.

model3 <- cforest(y ~ ., data = simulated, controls = cforest_unbiased(ntree=1000))
rfImp3 <- varimp(model3, conditional = TRUE)
rfImp3

##           V1           V2           V3           V4           V5 
##  1.832916297  4.771944496  0.014222304  5.933231943  1.055686258 
##           V6           V7           V8           V9          V10 
##  0.009163044  0.004786348 -0.002685858  0.003285658 -0.017157917 
##   duplicate1 
##  1.983987681

1.1.4 Part d

Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?

Answer:

Similar to the models above, the boosted trees model and Cubist model also have low importance in predictors V6 to V10.

Boosted Trees model (gbm) is similar to cforest model that it has V4 as the highest importance as the importance from V1 are slightly shared with the predictor duplicate1.

Cubist model (cubist) is similar to the traditional random forest model that still has V1 as the top important model but also ranked 0 importance on the predictor duplicate1.

#boosted trees
model4 <- gbm(y ~ ., data = simulated, distribution="gaussian")
summary(model4)

#cubist
model5 <- cubist(x=simulated[, names(simulated)!="y"], y=simulated$y, committees=100)
rfImp5 <- varImp(model5)
rfImp5

1.2 Ex. 8.2

Use a simulation to show tree bias with different granularities.

Answer:

From the textbook, we know that trees suffer from selection bias: Predictors with a higher number of distinct values are favored over more granular predictors. Also, as the number of missing values increases, the selection of predictors becomes more biased. (Sec. 8.1 page 182)

I will use sample function to simulate 5 variables with different granularities.

From the results below, we can see that x1 with higher number of distinct values are favored as it has the highest importance among all predictors, while X2, X3, and X4 are much less important. X5 has no distinct values except 1 so it was deemed unimportant. Thus, the result proves that tree model suffers from selection bias as it favors higher number of distinct values.

set.seed(200)

x1 <- sample(0:10000, 500, replace=TRUE)
x2 <- sample(0:1000,  500, replace=TRUE)
x3 <- sample(0:100,   500, replace=TRUE)
x4 <- sample(0:10,    500, replace=TRUE)
x5 <- sample(1,       500, replace=TRUE)
y <- x1+x2+x3+x4+x5+rnorm(500)

df <- data.frame(x1,x2,x3,x4,x5,y)
str(df)

## 'data.frame':    500 obs. of  6 variables:
##  $ x1: int  2213 5489 5870 5851 6324 1561 9781 9783 8730 3755 ...
##  $ x2: int  409 115 880 59 617 940 343 465 685 570 ...
##  $ x3: int  36 41 75 96 99 80 86 79 7 66 ...
##  $ x4: int  5 4 7 6 8 4 8 1 2 4 ...
##  $ x5: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ y : num  2665 5649 6835 6014 7051 ...

#rpart
tree <- rpart(y ~ ., data = df)
varImp(tree)

1.3 Ex. 8.3

In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:

Fig. 8.24

1.3.1 Part a

Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?

Answer:

Boosting is a method of converting weak learners into strong learners. In boosting, each new tree is a fit on a modified version of the original data set. Gradient Boosting trains many models in a gradual, additive and sequential manner. Gradient boosting performs the identifies the shortcomings of weak learners by using gradients in the loss function (y=ax+b+error). The loss function is a measure indicating how good are model’s coefficients are at fitting the underlying data.

Bagging fraction is the fraction of the training set observations randomly selected to propose the next tree in the expansion. Large bagging fraction may have the most explanatory variables involved which they may have too much importance. Small bagging fraction allows other variables to be modeled in the next step with randomness.

Learning rate, also called shrinkage, is used for reducing or shrinking the impact of each additional fitted base-learner (tree). It reduces the size of incremental steps, slows down the learning, and thus penalizes the importance of each consecutive iteration. It is better to improve a model by taking many small steps than by taking fewer large steps. Small learning rate reduces overfitting.

Back to the question, the model on the left has both parameters set to 0.1, which allows only 10% impact from the previous tree and allows 90% randomness of variables to be modeled in the next step. Thus more variables are being counted in each steps, and therefore in the final tree importance. The model on the right has both parameters set to 0.9, which each step focus 90% on the previous tree and uses 90% variables from the previous tree. Therefore it has most importance on only few variables.

1.3.2 Part b

Which model do you think would be more predictive of other samples?

Answer:

A model with small bagging fraction and small learning rate allows better generalization as it considers more variables and takes small steps to learn in the process and has a lower chance of overfitting. The model on the left with both parameters set to 0.1 would be more predictive of other samples than the one with parameters set to 0.9.

1.3.3 Part c

How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?

Answer:

Increasing interaction depth, which is also called tree depth, will allow more predictors to be included in the model, which would brings more importances across more predictors. That is, similar to the model on the left. The slope of the predictor importance will therefore be more flattened (with smaller slope).

1.4 Ex. 8.7

Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:

(a.) Which tree-based regression model gives the optimal resampling and test set performance?

(b.) Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?

(c.) Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?

1.4.1 Data Pre-Processing

The matrix ChemicalManufacturingProcess contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs, plus the variable yield which contains the percent yield for each run.

A small percentage of cells in the predictor set contain missing values. Used a knn imputation function to fill in these missing values.

data(ChemicalManufacturingProcess)
summary(ChemicalManufacturingProcess)

##      Yield       BiologicalMaterial01 BiologicalMaterial02
##  Min.   :35.25   Min.   :4.580        Min.   :46.87       
##  1st Qu.:38.75   1st Qu.:5.978        1st Qu.:52.68       
##  Median :39.97   Median :6.305        Median :55.09       
##  Mean   :40.18   Mean   :6.411        Mean   :55.69       
##  3rd Qu.:41.48   3rd Qu.:6.870        3rd Qu.:58.74       
##  Max.   :46.34   Max.   :8.810        Max.   :64.75       
##                                                           
##  BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
##  Min.   :56.97        Min.   : 9.38        Min.   :13.24       
##  1st Qu.:64.98        1st Qu.:11.24        1st Qu.:17.23       
##  Median :67.22        Median :12.10        Median :18.49       
##  Mean   :67.70        Mean   :12.35        Mean   :18.60       
##  3rd Qu.:70.43        3rd Qu.:13.22        3rd Qu.:19.90       
##  Max.   :78.25        Max.   :23.09        Max.   :24.85       
##                                                                
##  BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
##  Min.   :40.60        Min.   :100.0        Min.   :15.88       
##  1st Qu.:46.05        1st Qu.:100.0        1st Qu.:17.06       
##  Median :48.46        Median :100.0        Median :17.51       
##  Mean   :48.91        Mean   :100.0        Mean   :17.49       
##  3rd Qu.:51.34        3rd Qu.:100.0        3rd Qu.:17.88       
##  Max.   :59.38        Max.   :100.8        Max.   :19.14       
##                                                                
##  BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
##  Min.   :11.44        Min.   :1.770        Min.   :135.8       
##  1st Qu.:12.60        1st Qu.:2.460        1st Qu.:143.8       
##  Median :12.84        Median :2.710        Median :146.1       
##  Mean   :12.85        Mean   :2.801        Mean   :147.0       
##  3rd Qu.:13.13        3rd Qu.:2.990        3rd Qu.:149.6       
##  Max.   :14.08        Max.   :6.870        Max.   :158.7       
##                                                                
##  BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
##  Min.   :18.35        Min.   : 0.00          Min.   : 0.00         
##  1st Qu.:19.73        1st Qu.:10.80          1st Qu.:19.30         
##  Median :20.12        Median :11.40          Median :21.00         
##  Mean   :20.20        Mean   :11.21          Mean   :16.68         
##  3rd Qu.:20.75        3rd Qu.:12.15          3rd Qu.:21.50         
##  Max.   :22.21        Max.   :14.10          Max.   :22.50         
##                       NA's   :1              NA's   :3             
##  ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
##  Min.   :1.47           Min.   :911.0          Min.   : 923.0        
##  1st Qu.:1.53           1st Qu.:928.0          1st Qu.: 986.8        
##  Median :1.54           Median :934.0          Median : 999.2        
##  Mean   :1.54           Mean   :931.9          Mean   :1001.7        
##  3rd Qu.:1.55           3rd Qu.:936.0          3rd Qu.:1008.9        
##  Max.   :1.60           Max.   :946.0          Max.   :1175.3        
##  NA's   :15             NA's   :1              NA's   :1             
##  ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
##  Min.   :203.0          Min.   :177.0          Min.   :177.0         
##  1st Qu.:205.7          1st Qu.:177.0          1st Qu.:177.0         
##  Median :206.8          Median :177.0          Median :178.0         
##  Mean   :207.4          Mean   :177.5          Mean   :177.6         
##  3rd Qu.:208.7          3rd Qu.:178.0          3rd Qu.:178.0         
##  Max.   :227.4          Max.   :178.0          Max.   :178.0         
##  NA's   :2              NA's   :1              NA's   :1             
##  ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
##  Min.   :38.89          Min.   : 7.500         Min.   : 7.500        
##  1st Qu.:44.89          1st Qu.: 8.700         1st Qu.: 9.000        
##  Median :45.73          Median : 9.100         Median : 9.400        
##  Mean   :45.66          Mean   : 9.179         Mean   : 9.386        
##  3rd Qu.:46.52          3rd Qu.: 9.550         3rd Qu.: 9.900        
##  Max.   :49.36          Max.   :11.600         Max.   :11.500        
##                         NA's   :9              NA's   :10            
##  ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
##  Min.   :   0.0         Min.   :32.10          Min.   :4701          
##  1st Qu.:   0.0         1st Qu.:33.90          1st Qu.:4828          
##  Median :   0.0         Median :34.60          Median :4856          
##  Mean   : 857.8         Mean   :34.51          Mean   :4854          
##  3rd Qu.:   0.0         3rd Qu.:35.20          3rd Qu.:4882          
##  Max.   :4549.0         Max.   :38.60          Max.   :5055          
##  NA's   :1                                     NA's   :1             
##  ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
##  Min.   :5904           Min.   :   0           Min.   :31.30         
##  1st Qu.:6010           1st Qu.:4561           1st Qu.:33.50         
##  Median :6032           Median :4588           Median :34.40         
##  Mean   :6039           Mean   :4566           Mean   :34.34         
##  3rd Qu.:6061           3rd Qu.:4619           3rd Qu.:35.10         
##  Max.   :6233           Max.   :4852           Max.   :40.00         
##                                                                      
##  ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
##  Min.   :   0           Min.   :5890           Min.   :   0          
##  1st Qu.:4813           1st Qu.:6001           1st Qu.:4553          
##  Median :4835           Median :6022           Median :4582          
##  Mean   :4810           Mean   :6028           Mean   :4556          
##  3rd Qu.:4862           3rd Qu.:6050           3rd Qu.:4610          
##  Max.   :4971           Max.   :6146           Max.   :4759          
##                                                                      
##  ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
##  Min.   :-1.8000        Min.   : 0.000         Min.   :0.000         
##  1st Qu.:-0.6000        1st Qu.: 3.000         1st Qu.:2.000         
##  Median :-0.3000        Median : 5.000         Median :3.000         
##  Mean   :-0.1642        Mean   : 5.406         Mean   :3.017         
##  3rd Qu.: 0.0000        3rd Qu.: 8.000         3rd Qu.:4.000         
##  Max.   : 3.6000        Max.   :12.000         Max.   :6.000         
##                         NA's   :1              NA's   :1             
##  ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
##  Min.   : 0.000         Min.   :   0           Min.   :   0          
##  1st Qu.: 4.000         1st Qu.:4832           1st Qu.:6020          
##  Median : 8.000         Median :4855           Median :6047          
##  Mean   : 8.834         Mean   :4828           Mean   :6016          
##  3rd Qu.:14.000         3rd Qu.:4877           3rd Qu.:6070          
##  Max.   :23.000         Max.   :4990           Max.   :6161          
##  NA's   :1              NA's   :5              NA's   :5             
##  ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
##  Min.   :   0           Min.   : 0.000         Min.   : 0.00         
##  1st Qu.:4560           1st Qu.: 0.000         1st Qu.:19.70         
##  Median :4587           Median :10.400         Median :19.90         
##  Mean   :4563           Mean   : 6.592         Mean   :20.01         
##  3rd Qu.:4609           3rd Qu.:10.750         3rd Qu.:20.40         
##  Max.   :4710           Max.   :11.500         Max.   :22.00         
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
##  Min.   : 0.000         Min.   : 0.00          Min.   :143.0         
##  1st Qu.: 8.800         1st Qu.:70.10          1st Qu.:155.0         
##  Median : 9.100         Median :70.80          Median :158.0         
##  Mean   : 9.161         Mean   :70.18          Mean   :158.5         
##  3rd Qu.: 9.700         3rd Qu.:71.40          3rd Qu.:162.0         
##  Max.   :11.200         Max.   :72.50          Max.   :173.0         
##  NA's   :5              NA's   :5                                    
##  ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
##  Min.   :56.00          Min.   :2.300          Min.   :463.0         
##  1st Qu.:62.00          1st Qu.:2.500          1st Qu.:490.0         
##  Median :64.00          Median :2.500          Median :495.0         
##  Mean   :63.54          Mean   :2.494          Mean   :495.6         
##  3rd Qu.:65.00          3rd Qu.:2.500          3rd Qu.:501.5         
##  Max.   :70.00          Max.   :2.600          Max.   :522.0         
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
##  Min.   :0.01700        Min.   :0.000          Min.   :0.000         
##  1st Qu.:0.01900        1st Qu.:0.700          1st Qu.:2.000         
##  Median :0.02000        Median :1.000          Median :3.000         
##  Mean   :0.01957        Mean   :1.014          Mean   :2.534         
##  3rd Qu.:0.02000        3rd Qu.:1.300          3rd Qu.:3.000         
##  Max.   :0.02200        Max.   :2.300          Max.   :3.000         
##  NA's   :5                                                           
##  ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
##  Min.   :0.000          Min.   :0.00000        Min.   :0.00000       
##  1st Qu.:7.100          1st Qu.:0.00000        1st Qu.:0.00000       
##  Median :7.200          Median :0.00000        Median :0.00000       
##  Mean   :6.851          Mean   :0.01771        Mean   :0.02371       
##  3rd Qu.:7.300          3rd Qu.:0.00000        3rd Qu.:0.00000       
##  Max.   :7.500          Max.   :0.10000        Max.   :0.20000       
##                         NA's   :1              NA's   :1             
##  ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
##  Min.   : 0.00          Min.   : 0.0000        Min.   :0.000         
##  1st Qu.:11.40          1st Qu.: 0.6000        1st Qu.:1.800         
##  Median :11.60          Median : 0.8000        Median :1.900         
##  Mean   :11.21          Mean   : 0.9119        Mean   :1.805         
##  3rd Qu.:11.70          3rd Qu.: 1.0250        3rd Qu.:1.900         
##  Max.   :12.10          Max.   :11.0000        Max.   :2.100         
##                                                                      
##  ManufacturingProcess45
##  Min.   :0.000         
##  1st Qu.:2.100         
##  Median :2.200         
##  Mean   :2.138         
##  3rd Qu.:2.300         
##  Max.   :2.600         
##

predictors <- ChemicalManufacturingProcess[,-c(1)]

#fill in missing values from textbook sec3.8 
cmp_pre <- preProcess(predictors, method="knnImpute") 
#apply the transformations
cmp_predictors <- predict(cmp_pre, predictors)

Split the data into a training and a test set, pre-process the data, and tune models

Pre-process the data with centering and scaling.

cmp_pre <- preProcess(cmp_predictors, method=c("center", "scale"))
cmp_predictors <- predict(cmp_pre, cmp_predictors)

Train-test split at 70%

set.seed(200)
trainingRows <- createDataPartition(ChemicalManufacturingProcess$Yield, 
                                    p=0.70, list=FALSE) #caret, textbook sec4.9
train_X <- cmp_predictors[trainingRows, ]
train_Y <- ChemicalManufacturingProcess$Yield[trainingRows]
test_X <- cmp_predictors[-trainingRows, ]
test_Y <- ChemicalManufacturingProcess$Yield[-trainingRows]

1.4.2 Models

Single Tree

set.seed(seed)
stModel <- train(x = train_X, y = train_Y, method = "rpart",
                  tuneLength = 10, control=rpart.control(maxdepth=2))
stModel

## CART 
## 
## 124 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ... 
## Resampling results across tuning parameters:
## 
##   cp          RMSE      Rsquared   MAE     
##   0.00000000  1.518103  0.3266369  1.202472
##   0.04395540  1.520401  0.3253569  1.206507
##   0.08791079  1.526268  0.3148964  1.214521
##   0.13186619  1.530143  0.3014006  1.213307
##   0.17582158  1.534424  0.2986228  1.220919
##   0.21977698  1.539533  0.2950447  1.226515
##   0.26373237  1.539533  0.2950447  1.226515
##   0.30768777  1.568565  0.2883506  1.251475
##   0.35164316  1.629782  0.2853446  1.308522
##   0.39559856  1.676726  0.2834380  1.351730
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.

stModel$results[which(stModel$results$cp==stModel$bestTune$cp),]

varImp(stModel)

## rpart variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess31  100.00
## BiologicalMaterial12     75.47
## ManufacturingProcess17   70.32
## ManufacturingProcess28   64.15
## ManufacturingProcess32   60.73
## BiologicalMaterial11     37.69
## BiologicalMaterial03     37.00
## ManufacturingProcess09   34.69
## ManufacturingProcess15   28.21
## ManufacturingProcess06   28.17
## ManufacturingProcess24    0.00
## ManufacturingProcess40    0.00
## ManufacturingProcess21    0.00
## BiologicalMaterial02      0.00
## ManufacturingProcess26    0.00
## ManufacturingProcess39    0.00
## ManufacturingProcess18    0.00
## BiologicalMaterial05      0.00
## BiologicalMaterial09      0.00
## ManufacturingProcess04    0.00

stPred <- predict(stModel, newdata = test_X)
postResample(pred = stPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.4857317 0.4525266 1.1270232

Random Forest

set.seed(seed)

rfModel <- train(x = train_X, y = train_Y, method = "rf", tuneLength = 10)
rfModel

## Random Forest 
## 
## 124 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE      
##    2    1.316662  0.5519327  1.0379859
##    8    1.246818  0.5747840  0.9699150
##   14    1.233179  0.5773851  0.9522117
##   20    1.227264  0.5740008  0.9432728
##   26    1.222440  0.5738101  0.9356269
##   32    1.224043  0.5684739  0.9394338
##   38    1.226168  0.5641929  0.9410744
##   44    1.232027  0.5566863  0.9473137
##   50    1.239039  0.5485328  0.9527932
##   57    1.245284  0.5420043  0.9600014
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 26.

rfModel$results[which(rfModel$results$mtry==rfModel$bestTune$mtry),]

varImp(rfModel)

## rf variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess31  23.708
## BiologicalMaterial12    21.664
## BiologicalMaterial03    19.208
## ManufacturingProcess17  18.552
## BiologicalMaterial06    14.834
## ManufacturingProcess28  11.787
## ManufacturingProcess13  11.767
## ManufacturingProcess09  10.755
## ManufacturingProcess06  10.622
## BiologicalMaterial04     8.729
## BiologicalMaterial11     8.145
## ManufacturingProcess36   6.385
## ManufacturingProcess11   6.018
## ManufacturingProcess15   5.815
## BiologicalMaterial09     5.666
## ManufacturingProcess18   5.413
## BiologicalMaterial08     5.361
## ManufacturingProcess21   5.105
## ManufacturingProcess39   4.935

rfPred <- predict(rfModel, newdata = test_X)
postResample(pred = rfPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.2303142 0.6837008 0.9528876

Gradient Boosting

set.seed(seed)

gbmGrid <- expand.grid(interaction.depth=seq(1,7,by=2),
                       n.trees=seq(100,1000,by=50),
                       shrinkage=c(0.01,0.1),
                       n.minobsinnode=c(5,10))
gbModel <- train(x = train_X, y = train_Y, method = "gbm", tuneGrid = gbmGrid, verbose=FALSE)
gbModel

## Stochastic Gradient Boosting 
## 
## 124 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ... 
## Resampling results across tuning parameters:
## 
##   shrinkage  interaction.depth  n.minobsinnode  n.trees  RMSE    
##   0.01       1                   5               100     1.480277
##   0.01       1                   5               150     1.405650
##   0.01       1                   5               200     1.355092
##   0.01       1                   5               250     1.319146
##   0.01       1                   5               300     1.295846
##   0.01       1                   5               350     1.276031
##   0.01       1                   5               400     1.262732
##   0.01       1                   5               450     1.252730
##   0.01       1                   5               500     1.245207
##   0.01       1                   5               550     1.239166
##   0.01       1                   5               600     1.233629
##   0.01       1                   5               650     1.230084
##   0.01       1                   5               700     1.227514
##   0.01       1                   5               750     1.224667
##   0.01       1                   5               800     1.222653
##   0.01       1                   5               850     1.219236
##   0.01       1                   5               900     1.217531
##   0.01       1                   5               950     1.216385
##   0.01       1                   5              1000     1.214763
##   0.01       1                  10               100     1.476244
##   0.01       1                  10               150     1.399333
##   0.01       1                  10               200     1.350291
##   0.01       1                  10               250     1.317951
##   0.01       1                  10               300     1.296651
##   0.01       1                  10               350     1.281203
##   0.01       1                  10               400     1.270028
##   0.01       1                  10               450     1.260320
##   0.01       1                  10               500     1.253534
##   0.01       1                  10               550     1.247147
##   0.01       1                  10               600     1.242175
##   0.01       1                  10               650     1.238170
##   0.01       1                  10               700     1.235831
##   0.01       1                  10               750     1.231925
##   0.01       1                  10               800     1.229383
##   0.01       1                  10               850     1.227090
##   0.01       1                  10               900     1.224215
##   0.01       1                  10               950     1.222430
##   0.01       1                  10              1000     1.220555
##   0.01       3                   5               100     1.391434
##   0.01       3                   5               150     1.318215
##   0.01       3                   5               200     1.275600
##   0.01       3                   5               250     1.247046
##   0.01       3                   5               300     1.230361
##   0.01       3                   5               350     1.217753
##   0.01       3                   5               400     1.207990
##   0.01       3                   5               450     1.200779
##   0.01       3                   5               500     1.194489
##   0.01       3                   5               550     1.190947
##   0.01       3                   5               600     1.186986
##   0.01       3                   5               650     1.184187
##   0.01       3                   5               700     1.181125
##   0.01       3                   5               750     1.177897
##   0.01       3                   5               800     1.175333
##   0.01       3                   5               850     1.173383
##   0.01       3                   5               900     1.171773
##   0.01       3                   5               950     1.170630
##   0.01       3                   5              1000     1.168734
##   0.01       3                  10               100     1.389842
##   0.01       3                  10               150     1.314983
##   0.01       3                  10               200     1.275882
##   0.01       3                  10               250     1.252611
##   0.01       3                  10               300     1.237562
##   0.01       3                  10               350     1.227288
##   0.01       3                  10               400     1.219444
##   0.01       3                  10               450     1.212378
##   0.01       3                  10               500     1.207135
##   0.01       3                  10               550     1.203530
##   0.01       3                  10               600     1.199398
##   0.01       3                  10               650     1.195837
##   0.01       3                  10               700     1.193297
##   0.01       3                  10               750     1.190485
##   0.01       3                  10               800     1.188157
##   0.01       3                  10               850     1.186210
##   0.01       3                  10               900     1.184725
##   0.01       3                  10               950     1.183036
##   0.01       3                  10              1000     1.181317
##   0.01       5                   5               100     1.377714
##   0.01       5                   5               150     1.303842
##   0.01       5                   5               200     1.264003
##   0.01       5                   5               250     1.238545
##   0.01       5                   5               300     1.221581
##   0.01       5                   5               350     1.210239
##   0.01       5                   5               400     1.202022
##   0.01       5                   5               450     1.195110
##   0.01       5                   5               500     1.190129
##   0.01       5                   5               550     1.185702
##   0.01       5                   5               600     1.182469
##   0.01       5                   5               650     1.179294
##   0.01       5                   5               700     1.176763
##   0.01       5                   5               750     1.175219
##   0.01       5                   5               800     1.173553
##   0.01       5                   5               850     1.172043
##   0.01       5                   5               900     1.170960
##   0.01       5                   5               950     1.170169
##   0.01       5                   5              1000     1.169261
##   0.01       5                  10               100     1.381348
##   0.01       5                  10               150     1.310417
##   0.01       5                  10               200     1.272649
##   0.01       5                  10               250     1.248859
##   0.01       5                  10               300     1.233604
##   0.01       5                  10               350     1.222545
##   0.01       5                  10               400     1.213892
##   0.01       5                  10               450     1.208156
##   0.01       5                  10               500     1.202792
##   0.01       5                  10               550     1.198690
##   0.01       5                  10               600     1.194886
##   0.01       5                  10               650     1.191626
##   0.01       5                  10               700     1.189293
##   0.01       5                  10               750     1.187027
##   0.01       5                  10               800     1.185103
##   0.01       5                  10               850     1.183326
##   0.01       5                  10               900     1.182079
##   0.01       5                  10               950     1.181316
##   0.01       5                  10              1000     1.180262
##   0.01       7                   5               100     1.369447
##   0.01       7                   5               150     1.296516
##   0.01       7                   5               200     1.258112
##   0.01       7                   5               250     1.234191
##   0.01       7                   5               300     1.218584
##   0.01       7                   5               350     1.207082
##   0.01       7                   5               400     1.199157
##   0.01       7                   5               450     1.193588
##   0.01       7                   5               500     1.187955
##   0.01       7                   5               550     1.183680
##   0.01       7                   5               600     1.180914
##   0.01       7                   5               650     1.178196
##   0.01       7                   5               700     1.176420
##   0.01       7                   5               750     1.174903
##   0.01       7                   5               800     1.173505
##   0.01       7                   5               850     1.172021
##   0.01       7                   5               900     1.171323
##   0.01       7                   5               950     1.170875
##   0.01       7                   5              1000     1.170281
##   0.01       7                  10               100     1.388818
##   0.01       7                  10               150     1.317273
##   0.01       7                  10               200     1.280265
##   0.01       7                  10               250     1.257073
##   0.01       7                  10               300     1.241434
##   0.01       7                  10               350     1.230961
##   0.01       7                  10               400     1.221933
##   0.01       7                  10               450     1.215610
##   0.01       7                  10               500     1.209952
##   0.01       7                  10               550     1.205849
##   0.01       7                  10               600     1.202015
##   0.01       7                  10               650     1.198899
##   0.01       7                  10               700     1.196131
##   0.01       7                  10               750     1.194262
##   0.01       7                  10               800     1.191174
##   0.01       7                  10               850     1.189093
##   0.01       7                  10               900     1.188075
##   0.01       7                  10               950     1.186651
##   0.01       7                  10              1000     1.185664
##   0.10       1                   5               100     1.236296
##   0.10       1                   5               150     1.229715
##   0.10       1                   5               200     1.226412
##   0.10       1                   5               250     1.227304
##   0.10       1                   5               300     1.229313
##   0.10       1                   5               350     1.233080
##   0.10       1                   5               400     1.233562
##   0.10       1                   5               450     1.239474
##   0.10       1                   5               500     1.240238
##   0.10       1                   5               550     1.244110
##   0.10       1                   5               600     1.245817
##   0.10       1                   5               650     1.246932
##   0.10       1                   5               700     1.248729
##   0.10       1                   5               750     1.249701
##   0.10       1                   5               800     1.250663
##   0.10       1                   5               850     1.251227
##   0.10       1                   5               900     1.251879
##   0.10       1                   5               950     1.252762
##   0.10       1                   5              1000     1.253236
##   0.10       1                  10               100     1.229125
##   0.10       1                  10               150     1.228103
##   0.10       1                  10               200     1.224388
##   0.10       1                  10               250     1.225189
##   0.10       1                  10               300     1.224493
##   0.10       1                  10               350     1.225377
##   0.10       1                  10               400     1.225896
##   0.10       1                  10               450     1.229155
##   0.10       1                  10               500     1.230161
##   0.10       1                  10               550     1.232920
##   0.10       1                  10               600     1.233681
##   0.10       1                  10               650     1.235450
##   0.10       1                  10               700     1.235377
##   0.10       1                  10               750     1.235835
##   0.10       1                  10               800     1.236428
##   0.10       1                  10               850     1.237547
##   0.10       1                  10               900     1.237875
##   0.10       1                  10               950     1.238537
##   0.10       1                  10              1000     1.238830
##   0.10       3                   5               100     1.188394
##   0.10       3                   5               150     1.183286
##   0.10       3                   5               200     1.181185
##   0.10       3                   5               250     1.179614
##   0.10       3                   5               300     1.179219
##   0.10       3                   5               350     1.179188
##   0.10       3                   5               400     1.179453
##   0.10       3                   5               450     1.179529
##   0.10       3                   5               500     1.179707
##   0.10       3                   5               550     1.179728
##   0.10       3                   5               600     1.179723
##   0.10       3                   5               650     1.179758
##   0.10       3                   5               700     1.179785
##   0.10       3                   5               750     1.179821
##   0.10       3                   5               800     1.179822
##   0.10       3                   5               850     1.179828
##   0.10       3                   5               900     1.179838
##   0.10       3                   5               950     1.179827
##   0.10       3                   5              1000     1.179835
##   0.10       3                  10               100     1.202966
##   0.10       3                  10               150     1.196589
##   0.10       3                  10               200     1.198090
##   0.10       3                  10               250     1.197598
##   0.10       3                  10               300     1.197822
##   0.10       3                  10               350     1.196939
##   0.10       3                  10               400     1.196942
##   0.10       3                  10               450     1.196721
##   0.10       3                  10               500     1.196766
##   0.10       3                  10               550     1.196736
##   0.10       3                  10               600     1.196688
##   0.10       3                  10               650     1.196753
##   0.10       3                  10               700     1.196720
##   0.10       3                  10               750     1.196622
##   0.10       3                  10               800     1.196543
##   0.10       3                  10               850     1.196544
##   0.10       3                  10               900     1.196554
##   0.10       3                  10               950     1.196558
##   0.10       3                  10              1000     1.196536
##   0.10       5                   5               100     1.205322
##   0.10       5                   5               150     1.199964
##   0.10       5                   5               200     1.196647
##   0.10       5                   5               250     1.195825
##   0.10       5                   5               300     1.195202
##   0.10       5                   5               350     1.194632
##   0.10       5                   5               400     1.194317
##   0.10       5                   5               450     1.194007
##   0.10       5                   5               500     1.193839
##   0.10       5                   5               550     1.193806
##   0.10       5                   5               600     1.193701
##   0.10       5                   5               650     1.193564
##   0.10       5                   5               700     1.193531
##   0.10       5                   5               750     1.193516
##   0.10       5                   5               800     1.193516
##   0.10       5                   5               850     1.193505
##   0.10       5                   5               900     1.193506
##   0.10       5                   5               950     1.193498
##   0.10       5                   5              1000     1.193490
##   0.10       5                  10               100     1.209579
##   0.10       5                  10               150     1.201874
##   0.10       5                  10               200     1.200986
##   0.10       5                  10               250     1.198996
##   0.10       5                  10               300     1.197635
##   0.10       5                  10               350     1.197149
##   0.10       5                  10               400     1.197557
##   0.10       5                  10               450     1.196966
##   0.10       5                  10               500     1.196780
##   0.10       5                  10               550     1.196480
##   0.10       5                  10               600     1.196429
##   0.10       5                  10               650     1.196497
##   0.10       5                  10               700     1.196561
##   0.10       5                  10               750     1.196547
##   0.10       5                  10               800     1.196610
##   0.10       5                  10               850     1.196643
##   0.10       5                  10               900     1.196613
##   0.10       5                  10               950     1.196637
##   0.10       5                  10              1000     1.196680
##   0.10       7                   5               100     1.195582
##   0.10       7                   5               150     1.194296
##   0.10       7                   5               200     1.192201
##   0.10       7                   5               250     1.191540
##   0.10       7                   5               300     1.190906
##   0.10       7                   5               350     1.190655
##   0.10       7                   5               400     1.190366
##   0.10       7                   5               450     1.190335
##   0.10       7                   5               500     1.190363
##   0.10       7                   5               550     1.190190
##   0.10       7                   5               600     1.190082
##   0.10       7                   5               650     1.190036
##   0.10       7                   5               700     1.189992
##   0.10       7                   5               750     1.189970
##   0.10       7                   5               800     1.189969
##   0.10       7                   5               850     1.189972
##   0.10       7                   5               900     1.189964
##   0.10       7                   5               950     1.189970
##   0.10       7                   5              1000     1.189965
##   0.10       7                  10               100     1.211618
##   0.10       7                  10               150     1.207120
##   0.10       7                  10               200     1.204088
##   0.10       7                  10               250     1.202100
##   0.10       7                  10               300     1.201943
##   0.10       7                  10               350     1.202077
##   0.10       7                  10               400     1.200898
##   0.10       7                  10               450     1.200365
##   0.10       7                  10               500     1.200129
##   0.10       7                  10               550     1.199716
##   0.10       7                  10               600     1.199647
##   0.10       7                  10               650     1.199753
##   0.10       7                  10               700     1.199571
##   0.10       7                  10               750     1.199469
##   0.10       7                  10               800     1.199625
##   0.10       7                  10               850     1.199719
##   0.10       7                  10               900     1.199830
##   0.10       7                  10               950     1.199885
##   0.10       7                  10              1000     1.199920
##   Rsquared   MAE      
##   0.4908325  1.1931899
##   0.5103237  1.1245025
##   0.5200618  1.0737862
##   0.5288867  1.0385217
##   0.5334796  1.0176124
##   0.5386618  0.9978822
##   0.5427609  0.9857718
##   0.5453326  0.9770929
##   0.5470237  0.9706041
##   0.5483820  0.9647363
##   0.5504850  0.9604681
##   0.5513261  0.9569613
##   0.5516462  0.9549625
##   0.5528817  0.9527118
##   0.5532762  0.9509207
##   0.5548490  0.9478902
##   0.5551491  0.9460249
##   0.5555386  0.9456493
##   0.5560613  0.9447399
##   0.4884585  1.1894450
##   0.5071834  1.1158803
##   0.5174216  1.0668578
##   0.5226412  1.0341976
##   0.5264303  1.0126520
##   0.5299600  0.9964057
##   0.5331289  0.9866965
##   0.5365916  0.9779900
##   0.5380827  0.9717191
##   0.5396867  0.9657190
##   0.5418053  0.9614864
##   0.5433572  0.9590986
##   0.5441150  0.9574631
##   0.5461394  0.9546771
##   0.5467197  0.9529351
##   0.5476160  0.9510764
##   0.5489032  0.9487403
##   0.5497959  0.9478753
##   0.5508385  0.9469974
##   0.5253746  1.1137493
##   0.5371687  1.0441789
##   0.5449372  1.0030779
##   0.5533355  0.9752265
##   0.5587876  0.9591325
##   0.5637729  0.9471088
##   0.5678988  0.9387539
##   0.5709212  0.9327625
##   0.5743061  0.9274757
##   0.5758459  0.9250448
##   0.5777817  0.9215359
##   0.5790080  0.9187230
##   0.5804325  0.9160588
##   0.5821706  0.9137929
##   0.5834469  0.9119795
##   0.5843144  0.9103300
##   0.5851808  0.9092155
##   0.5855080  0.9084002
##   0.5864923  0.9069573
##   0.5223767  1.1139420
##   0.5347878  1.0409448
##   0.5407576  1.0025027
##   0.5461754  0.9779471
##   0.5503623  0.9625670
##   0.5536057  0.9520911
##   0.5561751  0.9443170
##   0.5592364  0.9382266
##   0.5617349  0.9340961
##   0.5634000  0.9311316
##   0.5655795  0.9281181
##   0.5676945  0.9256045
##   0.5689972  0.9236491
##   0.5708303  0.9217909
##   0.5721534  0.9200079
##   0.5733982  0.9190758
##   0.5743821  0.9181317
##   0.5753122  0.9173104
##   0.5763228  0.9161348
##   0.5242327  1.1007058
##   0.5397913  1.0284910
##   0.5486446  0.9870598
##   0.5562431  0.9608080
##   0.5633055  0.9449055
##   0.5686654  0.9347102
##   0.5723489  0.9280612
##   0.5758848  0.9221731
##   0.5783005  0.9179424
##   0.5807967  0.9146446
##   0.5826104  0.9116928
##   0.5842859  0.9090868
##   0.5856436  0.9074319
##   0.5863889  0.9061914
##   0.5873318  0.9050684
##   0.5880629  0.9038106
##   0.5885878  0.9031556
##   0.5890708  0.9025576
##   0.5895472  0.9020410
##   0.5247189  1.1043148
##   0.5354504  1.0339834
##   0.5418032  0.9966811
##   0.5480337  0.9725938
##   0.5527884  0.9572641
##   0.5570909  0.9464999
##   0.5606948  0.9395797
##   0.5631491  0.9345870
##   0.5658379  0.9300802
##   0.5675371  0.9267840
##   0.5697878  0.9243803
##   0.5718704  0.9221024
##   0.5730387  0.9202456
##   0.5743161  0.9191866
##   0.5752236  0.9184038
##   0.5762575  0.9173323
##   0.5769503  0.9166611
##   0.5773602  0.9163521
##   0.5781538  0.9156080
##   0.5338744  1.0893997
##   0.5458778  1.0181469
##   0.5539123  0.9795895
##   0.5606404  0.9561740
##   0.5663679  0.9413084
##   0.5716825  0.9314444
##   0.5751438  0.9243343
##   0.5779934  0.9194282
##   0.5809698  0.9149178
##   0.5832469  0.9114761
##   0.5849602  0.9094936
##   0.5866478  0.9074566
##   0.5876183  0.9063568
##   0.5884269  0.9051404
##   0.5892274  0.9044433
##   0.5900716  0.9033013
##   0.5904011  0.9027985
##   0.5904439  0.9025145
##   0.5906703  0.9020785
##   0.5193600  1.1104417
##   0.5285172  1.0397835
##   0.5348050  1.0024346
##   0.5418272  0.9788711
##   0.5469370  0.9637361
##   0.5506809  0.9543742
##   0.5548878  0.9463987
##   0.5578626  0.9413567
##   0.5606873  0.9365738
##   0.5629653  0.9336666
##   0.5649020  0.9308903
##   0.5666439  0.9286554
##   0.5681137  0.9265941
##   0.5689678  0.9251281
##   0.5709455  0.9228442
##   0.5722629  0.9210798
##   0.5728198  0.9205647
##   0.5737098  0.9192394
##   0.5741672  0.9185819
##   0.5372629  0.9637514
##   0.5404137  0.9594601
##   0.5414620  0.9574969
##   0.5411750  0.9595110
##   0.5396162  0.9615799
##   0.5378739  0.9653306
##   0.5377329  0.9653213
##   0.5344389  0.9714472
##   0.5335452  0.9723211
##   0.5312124  0.9764990
##   0.5308413  0.9780840
##   0.5301167  0.9789977
##   0.5290555  0.9799077
##   0.5285096  0.9804556
##   0.5280481  0.9807729
##   0.5278040  0.9816271
##   0.5274085  0.9823637
##   0.5268913  0.9833122
##   0.5264715  0.9837781
##   0.5409253  0.9554961
##   0.5423221  0.9528773
##   0.5443743  0.9550134
##   0.5435756  0.9558658
##   0.5446493  0.9557889
##   0.5444307  0.9570971
##   0.5440504  0.9572776
##   0.5422206  0.9605232
##   0.5421337  0.9605281
##   0.5408592  0.9633225
##   0.5404975  0.9650603
##   0.5396567  0.9650508
##   0.5396509  0.9655000
##   0.5391657  0.9659862
##   0.5386921  0.9664930
##   0.5381817  0.9672363
##   0.5383907  0.9676737
##   0.5381919  0.9683416
##   0.5379569  0.9682445
##   0.5714082  0.9250431
##   0.5731176  0.9216732
##   0.5739989  0.9207203
##   0.5745290  0.9200016
##   0.5746479  0.9196219
##   0.5745796  0.9197168
##   0.5743516  0.9200909
##   0.5742723  0.9201164
##   0.5741313  0.9202934
##   0.5741050  0.9203314
##   0.5741017  0.9203665
##   0.5740778  0.9204311
##   0.5740597  0.9204489
##   0.5740376  0.9204712
##   0.5740375  0.9204639
##   0.5740345  0.9204604
##   0.5740265  0.9204676
##   0.5740335  0.9204546
##   0.5740271  0.9204599
##   0.5577999  0.9399620
##   0.5626126  0.9345153
##   0.5614444  0.9357624
##   0.5619953  0.9347600
##   0.5617574  0.9354543
##   0.5624130  0.9346478
##   0.5624637  0.9345707
##   0.5626111  0.9346582
##   0.5626148  0.9346653
##   0.5627025  0.9348068
##   0.5627453  0.9347937
##   0.5627323  0.9348611
##   0.5627729  0.9348638
##   0.5628389  0.9348618
##   0.5628775  0.9348037
##   0.5628945  0.9348893
##   0.5628922  0.9349046
##   0.5629021  0.9349102
##   0.5629204  0.9348764
##   0.5621312  0.9436128
##   0.5649367  0.9405323
##   0.5671021  0.9383911
##   0.5673524  0.9382306
##   0.5676558  0.9378540
##   0.5678913  0.9375442
##   0.5680341  0.9374395
##   0.5681602  0.9373256
##   0.5682552  0.9373237
##   0.5682304  0.9372828
##   0.5683128  0.9372654
##   0.5683924  0.9371842
##   0.5684116  0.9371828
##   0.5684060  0.9371707
##   0.5683946  0.9371710
##   0.5683981  0.9371755
##   0.5683943  0.9371718
##   0.5683950  0.9371699
##   0.5684002  0.9371617
##   0.5542034  0.9357214
##   0.5592937  0.9301293
##   0.5599679  0.9315695
##   0.5613445  0.9309785
##   0.5622354  0.9296579
##   0.5626483  0.9295774
##   0.5624368  0.9301805
##   0.5627549  0.9300119
##   0.5628734  0.9301154
##   0.5630306  0.9299221
##   0.5631073  0.9302270
##   0.5630819  0.9303946
##   0.5630952  0.9305430
##   0.5631320  0.9305767
##   0.5630989  0.9306711
##   0.5630972  0.9308512
##   0.5631219  0.9308361
##   0.5631271  0.9308800
##   0.5630853  0.9309518
##   0.5673104  0.9328421
##   0.5676770  0.9321263
##   0.5689402  0.9303719
##   0.5692543  0.9299531
##   0.5696439  0.9293259
##   0.5697125  0.9293223
##   0.5698579  0.9292341
##   0.5698725  0.9293235
##   0.5698573  0.9293814
##   0.5699639  0.9292623
##   0.5700266  0.9291855
##   0.5700594  0.9291788
##   0.5700864  0.9291572
##   0.5700939  0.9291476
##   0.5701008  0.9291585
##   0.5700993  0.9291602
##   0.5701006  0.9291522
##   0.5700963  0.9291566
##   0.5700987  0.9291542
##   0.5538418  0.9489126
##   0.5570075  0.9453812
##   0.5590230  0.9443949
##   0.5603893  0.9431830
##   0.5606448  0.9431956
##   0.5604910  0.9437100
##   0.5613122  0.9433862
##   0.5616436  0.9429939
##   0.5617800  0.9429204
##   0.5620990  0.9429994
##   0.5621016  0.9430984
##   0.5620536  0.9433932
##   0.5621531  0.9432156
##   0.5622207  0.9431898
##   0.5620885  0.9432770
##   0.5620199  0.9434294
##   0.5619402  0.9435134
##   0.5618997  0.9435909
##   0.5618711  0.9436321
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000,
##  interaction.depth = 3, shrinkage = 0.01 and n.minobsinnode = 5.

gbModel$results[which(gbModel$results$n.trees==gbModel$bestTune$n.trees 
                      & gbModel$results$interaction.depth==gbModel$bestTune$interaction.depth
                      & gbModel$results$shrinkage==gbModel$bestTune$shrinkage
                      & gbModel$results$n.minobsinnode==gbModel$bestTune$n.minobsinnode),]

varImp(gbModel)

## gbm variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess17  30.601
## ManufacturingProcess31  22.468
## ManufacturingProcess06  20.177
## BiologicalMaterial12    18.920
## BiologicalMaterial03    16.484
## ManufacturingProcess09  11.287
## ManufacturingProcess01  10.710
## ManufacturingProcess13  10.007
## BiologicalMaterial09     9.957
## ManufacturingProcess05   9.014
## ManufacturingProcess24   8.246
## ManufacturingProcess39   7.255
## BiologicalMaterial11     6.573
## ManufacturingProcess15   6.515
## ManufacturingProcess14   6.513
## ManufacturingProcess21   5.395
## ManufacturingProcess28   5.303
## ManufacturingProcess10   4.917
## ManufacturingProcess37   4.906

gbPred <- predict(gbModel, newdata = test_X)
postResample(pred = gbPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.2267536 0.6746691 0.9548075

Cubist

set.seed(seed)

cubistModel <- train(x = train_X, y = train_Y, method = "cubist")
cubistModel

## Cubist 
## 
## 124 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE      Rsquared   MAE      
##    1          0          1.743360  0.3421518  1.2502010
##    1          5          1.722870  0.3580408  1.2297350
##    1          9          1.724375  0.3538024  1.2321318
##   10          0          1.239189  0.5379959  0.9603813
##   10          5          1.222683  0.5509488  0.9471472
##   10          9          1.225073  0.5488191  0.9494012
##   20          0          1.184533  0.5801013  0.9204430
##   20          5          1.164153  0.5932035  0.9059211
##   20          9          1.170477  0.5896930  0.9096912
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.

cubistModel$results[which(cubistModel$results$committees==cubistModel$bestTune$committees 
                          & cubistModel$results$neighbors==cubistModel$bestTune$neighbors),]

varImp(cubistModel)

## cubist variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess17 100.000
## ManufacturingProcess32  98.913
## BiologicalMaterial12    53.261
## ManufacturingProcess39  50.000
## ManufacturingProcess06  31.522
## ManufacturingProcess09  30.435
## BiologicalMaterial03    26.087
## ManufacturingProcess29  19.565
## ManufacturingProcess33  18.478
## ManufacturingProcess27  17.391
## ManufacturingProcess05  15.217
## ManufacturingProcess01  15.217
## ManufacturingProcess13  15.217
## ManufacturingProcess26  13.043
## ManufacturingProcess25  10.870
## BiologicalMaterial09     8.696
## ManufacturingProcess02   8.696
## BiologicalMaterial06     8.696
## ManufacturingProcess28   7.609
## ManufacturingProcess23   7.609

cubistPred <- predict(cubistModel, newdata = test_X)
postResample(pred = cubistPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.0191358 0.7499820 0.7881474

1.4.3 Part a

Which tree-based regression model gives the optimal resampling and test set performance?

Answer:

According to the result statistics, cubist model has the lowest RMSE and the largest \(R^2\) on both resampling and test set performances. It has the best performance among the four tree-based regression models.

#single tree
stModel$results[which(stModel$results$cp==stModel$bestTune$cp),]

postResample(pred = stPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.4857317 0.4525266 1.1270232

#random forest
rfModel$results[which(rfModel$results$mtry==rfModel$bestTune$mtry),]

postResample(pred = rfPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.2303142 0.6837008 0.9528876

#gbm
gbModel$results[which(gbModel$results$n.trees==gbModel$bestTune$n.trees 
                      & gbModel$results$interaction.depth==gbModel$bestTune$interaction.depth
                      & gbModel$results$shrinkage==gbModel$bestTune$shrinkage
                      & gbModel$results$n.minobsinnode==gbModel$bestTune$n.minobsinnode),]

postResample(pred = gbPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.2267536 0.6746691 0.9548075

#cubist
cubistModel$results[which(cubistModel$results$committees==cubistModel$bestTune$committees 
                          & cubistModel$results$neighbors==cubistModel$bestTune$neighbors),]

postResample(pred = cubistPred, obs = test_Y)

##      RMSE  Rsquared       MAE 
## 1.0191358 0.7499820 0.7881474

1.4.4 Part b

Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?

Answer:

By looking at the list of importance, ManufacturingProcess17 and the ManufacturingProcess32 are the most important predictors in the cubist model. Manufacturing process variables dominates the list by having 8 out of the top 10 variables.

From HW7, the optimal linear regression model was elastic net model. The top 10 predictors are mostly ManufacturingProcess predictors, there are 6 out of the top 10 predictors.

From HW8, the optimal non-linear regression model is SVM model. The top 10 predictors are mostly ManufacturingProcess predictors, there are 6 out of the top 10 predictors.

The top 10 important predictors in cubist model are slightly different with the optimal linear and nonlinear models. Although all 3 are dominated by process variables, there are 8 process variables out of the top 10 variables in cubist model.

The top 10 important predictors in the optimal linear and nonlinear models were ManufacturingProcess32, ManufacturingProcess13, BiologicalMaterial03, BiologicalMaterial06, ManufacturingProcess17, ManufacturingProcess09, BiologicalMaterial12, BiologicalMaterial02, ManufacturingProcess36, and ManufacturingProcess06.

varImp(cubistModel)

## cubist variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess17 100.000
## ManufacturingProcess32  98.913
## BiologicalMaterial12    53.261
## ManufacturingProcess39  50.000
## ManufacturingProcess06  31.522
## ManufacturingProcess09  30.435
## BiologicalMaterial03    26.087
## ManufacturingProcess29  19.565
## ManufacturingProcess33  18.478
## ManufacturingProcess27  17.391
## ManufacturingProcess01  15.217
## ManufacturingProcess13  15.217
## ManufacturingProcess05  15.217
## ManufacturingProcess26  13.043
## ManufacturingProcess25  10.870
## ManufacturingProcess02   8.696
## BiologicalMaterial06     8.696
## BiologicalMaterial09     8.696
## ManufacturingProcess28   7.609
## BiologicalMaterial01     7.609

1.4.5 Part c

Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?

Answer:

The final single tree model plot shows how the nodes are decided and their corresponding percentage at each level. It shows the knowledge about the biological or process predictors and their relationship with yield by showing the decision at each split. For example, the higher the value of ManufacturingProcess32, the higher the yield.

It starts at 40 with percentage 100%.
The first split happens at ManufacturingProcess32 < 0.19. If it is less than 0.19, the yield at this node will be 39 with chance of 56% in total. If it is greater than or equal to 0.19, the yield at this node will be 41 with chance of 44% in total.
If ManufacturingProcess32 < 0.19, the second split happens at BiologicalMaterial12 < -0.18. If it’s a Yes, the yield becomes 39 with chance of 33% in total. If it’s a No, the yield becomes 40 with chance of 23% in total. These two percentage comes from the previous node of 56%.
If ManufacturingProcess32 >= 0.19, the second split happens at ManufacturingProcess31 >= 0.14. If it’s a Yes, the yield becomes 41 with chance of 15% in total. If it’s a No, the yield becomes 42 with chance of 28% in total. These two percentage comes from the previous node of 44%.

stModel$finalModel

## n= 124 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
## 1) root 124 391.81030 40.17169  
##   2) ManufacturingProcess32< 0.191596 70 137.98860 39.18971  
##     4) BiologicalMaterial12< -0.1808383 41  63.08785 38.58293 *
##     5) BiologicalMaterial12>=-0.1808383 29  38.46253 40.04759 *
##   3) ManufacturingProcess32>=0.191596 54  98.82214 41.44463  
##     6) ManufacturingProcess31>=0.1415756 19  19.63712 40.59789 *
##     7) ManufacturingProcess31< 0.1415756 35  58.16786 41.90429 *

rpart.plot::rpart.plot(stModel$finalModel)

Data 624 HW9: Regression Trees and Rule-Based Models

Data 624 HW9: Regression Trees and Rule-Based Models

1 HW9: Regression Trees and Rule-Based Models

1.1 Ex. 8.1

1.1.1 Part a

1.1.2 Part b

1.1.3 Part c

1.1.4 Part d

1.2 Ex. 8.2

1.3 Ex. 8.3

1.3.1 Part a

1.3.2 Part b

1.3.3 Part c

1.4 Ex. 8.7

1.4.1 Data Pre-Processing

1.4.2 Models

1.4.3 Part a

1.4.4 Part b

1.4.5 Part c