Do problems 8.1, 8.2, 8.3, and 8.7
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
model1 <- randomForest(y ~., data = simulated, importance = TRUE, ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
## Overall
## V1 8.83890885
## V2 6.49023056
## V3 0.67583163
## V4 7.58822553
## V5 2.27426009
## V6 0.17436781
## V7 0.15136583
## V8 -0.03078937
## V9 -0.02989832
## V10 -0.08529218
Predictors V6 through V10 have very small importance scores, so the random forest model made essentially no use of them.
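As a quick aside (a sketch, not part of the exercise; rfReduced is a name introduced here), one way to check that V6 through V10 contribute little is to refit using only V1 through V5 and confirm the remaining importance scores stay in the same ballpark.
# Refit with only the informative predictors and compare against rfImp1.
rfReduced <- randomForest(y ~ V1 + V2 + V3 + V4 + V5, data = simulated,
                          importance = TRUE, ntree = 1000)
varImp(rfReduced, scale = FALSE)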
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9396216
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
model2 <- randomForest(y ~., data = simulated, importance = TRUE, ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2
## Overall
## V1 6.29780744
## V2 6.08038134
## V3 0.58410718
## V4 6.93924427
## V5 2.03104094
## V6 0.07947642
## V7 -0.02566414
## V8 -0.11007435
## V9 -0.08839463
## V10 -0.00715093
## duplicate1 3.56411581
The importance score of V1 has decreased. When another predictor that is highly correlated with V1 is added, the splits that would have gone to V1 are shared between V1 and the new predictor, so V1's importance score is diluted.
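To push this further (a sketch; duplicate2 and model2b are names introduced here), a second highly correlated copy of V1 can be added and the model refit, which should spread V1's importance across three columns. The extra column is dropped afterwards so the later models see the same predictors as above.
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1
model2b <- randomForest(y ~., data = simulated, importance = TRUE, ntree = 1000)
varImp(model2b, scale = FALSE)
# Drop the extra copy so the cforest, gbm, and Cubist fits below are unchanged.
simulated$duplicate2 <- NULL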
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
model3 <- cforest(y ~., data = simulated)
varimp(model3, conditional = TRUE)
## V1 V2 V3 V4 V5
## 3.173621e+00 4.954327e+00 -2.487929e-03 6.122763e+00 1.157286e+00
## V6 V7 V8 V9 V10
## 6.534901e-05 -2.353746e-02 6.846242e-03 1.737579e-02 1.154302e-02
## duplicate1
## 9.159232e-01
The cforest model shows a slightly different pattern from the traditional random forest, with different importance values but a broadly similar ordering; V4 is again the most important predictor.
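For comparison (a sketch), the unconditional importances can be pulled from the same fit; conditional = TRUE adjusts for correlated predictors, which typically shrinks the scores of V1 and duplicate1 relative to the default.
# Default (unconditional) permutation importance from the same cforest fit.
varimp(model3, conditional = FALSE)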
**Boosted Trees**
library(gbm)
## Loaded gbm 2.1.8
model4 <- gbm(y ~., data = simulated, distribution = "gaussian")
summary(model4)
## var rel.inf
## V4 V4 30.1882249
## V2 V2 23.3402488
## V1 V1 20.2136333
## V5 V5 10.9949556
## duplicate1 duplicate1 7.6076567
## V3 V3 7.3687812
## V7 V7 0.1678235
## V8 V8 0.1186761
## V6 V6 0.0000000
## V9 V9 0.0000000
## V10 V10 0.0000000
As with the previous models, the boosted tree model ranks V4 as the most important predictor.
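The fit above uses gbm's default settings; as a robustness check (a sketch; model4b is a name introduced here), the ranking can be re-checked with a slower learning rate and more trees.
model4b <- gbm(y ~., data = simulated, distribution = "gaussian",
               n.trees = 1000, shrinkage = 0.01)
head(summary(model4b, plotit = FALSE), 5)  # top predictors under slower learning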
**Cubist Model**
library(Cubist)
# Drop the y column (second to last, since duplicate1 was appended after y).
Model5 <- cubist(x = simulated[, -(ncol(simulated) - 1)], y = simulated$y, committees = 100)
varImp(Model5)
## Overall
## V1 64.5
## V3 41.0
## V2 60.0
## V4 48.0
## V5 31.0
## V6 9.0
## duplicate1 6.0
## V8 2.0
## V10 0.5
## V7 0.0
## V9 0.0
The Cubist model ranks the variables differently: V1 is the most important predictor, rather than V4 as in the earlier models.
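Sorting the Cubist scores (a sketch; cubImp is a name introduced here) makes the contrast with rfImp2 and the cforest ranking easier to see.
cubImp <- varImp(Model5)
cubImp[order(-cubImp$Overall), , drop = FALSE]  # highest-ranked predictors first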
set.seed(1)
x1 <- sample(0:10000 / 10000, 200, replace = T)
x2 <- sample(0:1000 / 1000, 200, replace = T)
x3 <- sample(0:100 / 100, 200, replace = T)
x4 <- sample(0:10 / 10, 200, replace = T)
y <- x1 + x4 + rnorm(200)
df <- data.frame(x1, x2, x3, x4, y)
library(rpart)
rpartTree <- rpart(y ~ ., data=df)
varImp(rpartTree)
## Overall
## x1 0.7443663
## x2 0.7563594
## x3 0.5903005
## x4 0.4806487
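Only x1 and x4 enter y, yet x2 ranks above x1 and x3 ranks above x4: the single tree is biased toward predictors with more distinct values. Counting the distinct values in the simulated data (a quick sketch) makes the granularity gap explicit.
# x1 is drawn from 10,001 possible values, x4 from only 11.
sapply(df[, c("x1", "x2", "x3", "x4")], function(col) length(unique(col)))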
Boosting fits trees sequentially: each tree is fit to the residuals of the trees that preceded it, so the trees are not independent. The learning rate controls how quickly the learner moves toward the optimal solution. If the step size is too large, it can overshoot the optimum; if it is too small, training takes longer to converge. With a higher learning rate each tree corrects a larger share of the residual, so importance concentrates on fewer predictors, as in the second plot with the 0.9 learning rate.
The model with the 0.1 learning rate should be more predictive of new samples, since a smaller bagging fraction and learning rate make the model less prone to overfitting and better able to generalize.
Increasing the interaction depth decreased the RMSE across the numbers of trees.
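The exercise's figure uses the solubility data, which is not loaded here, but the same concentration effect can be sketched on the simulated data from 8.1 (gbmSlow and gbmFast are names introduced here; the 0.1/0.1 versus 0.9/0.9 settings mirror the two panels in the text).
# n.minobsinnode is lowered because bag.fraction = 0.1 leaves only 20 rows per tree.
gbmSlow <- gbm(y ~., data = simulated, distribution = "gaussian",
               n.trees = 1000, shrinkage = 0.1, bag.fraction = 0.1, n.minobsinnode = 5)
gbmFast <- gbm(y ~., data = simulated, distribution = "gaussian",
               n.trees = 1000, shrinkage = 0.9, bag.fraction = 0.9, n.minobsinnode = 5)
# With the aggressive settings, relative influence tends to pile up on the
# first few predictors; with the conservative settings it is spread more evenly.
summary(gbmSlow, plotit = FALSE)
summary(gbmFast, plotit = FALSE)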
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
set.seed(56)
knnmodel2 <- preProcess(ChemicalManufacturingProcess, "knnImpute")
df <- predict(knnmodel2, ChemicalManufacturingProcess)
df <- df %>%
  select_at(vars(-one_of(nearZeroVar(., names = TRUE))))
in_train <- createDataPartition(df$Yield, times = 1, p = 0.8, list = FALSE)
train_df <- df[in_train, ]
test_df <- df[-in_train, ]
df.train.x = train_df[,-1]
df.train.y = train_df[,1]
df.test.x = test_df[,-1]
df.test.y = test_df[,1]
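A quick sanity check on the preprocessing (a sketch): the imputation should leave no missing values, and the difference in column counts shows how many near-zero-variance predictors were dropped.
sum(is.na(df))                                 # expect 0 after knnImpute
ncol(ChemicalManufacturingProcess) - ncol(df)  # near-zero-variance columns removed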
**Random Forest**
library(randomForest)
library(caret)
set.seed(10)
rfModel <- train(x = df.train.x,
                 y = df.train.y,
                 method = 'rf',
                 tuneLength = 10)
rfModel
## Random Forest
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.6887506 0.6034891 0.5369235
## 8 0.6458874 0.6237886 0.4966979
## 14 0.6399251 0.6223807 0.4892793
## 20 0.6384215 0.6179594 0.4864648
## 26 0.6427075 0.6083136 0.4867058
## 32 0.6421827 0.6067880 0.4856572
## 38 0.6445505 0.6017645 0.4862263
## 44 0.6524093 0.5890632 0.4918243
## 50 0.6547797 0.5848733 0.4929361
## 56 0.6587105 0.5785681 0.4949506
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 20.
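The mtry profile can also be inspected graphically straight from the train object (a sketch).
plot(rfModel, metric = "RMSE")  # resampled RMSE across the mtry values tried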
**Boosted**
set.seed(10)
grid <- expand.grid(n.trees = c(50, 100, 150, 200),
                    interaction.depth = c(1, 5, 10, 15),
                    shrinkage = c(0.01, 0.1, 0.5),
                    n.minobsinnode = c(5, 10, 15))
gbmModel <- train(x = df.train.x,
                  y = df.train.y,
                  method = 'gbm',
                  tuneGrid = grid,
                  verbose = FALSE)
gbmModel
## Stochastic Gradient Boosting
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 0.01 1 5 50 0.8861818 0.4429902
## 0.01 1 5 100 0.8122191 0.4898878
## 0.01 1 5 150 0.7641514 0.5171798
## 0.01 1 5 200 0.7311479 0.5349375
## 0.01 1 10 50 0.8865594 0.4541180
## 0.01 1 10 100 0.8106351 0.4943815
## 0.01 1 10 150 0.7621684 0.5158366
## 0.01 1 10 200 0.7305949 0.5299783
## 0.01 1 15 50 0.8875225 0.4488035
## 0.01 1 15 100 0.8124663 0.4930673
## 0.01 1 15 150 0.7621555 0.5187584
## 0.01 1 15 200 0.7298443 0.5332401
## 0.01 5 5 50 0.8298870 0.5342329
## 0.01 5 5 100 0.7418307 0.5549246
## 0.01 5 5 150 0.6962730 0.5721927
## 0.01 5 5 200 0.6702899 0.5863603
## 0.01 5 10 50 0.8344043 0.5176358
## 0.01 5 10 100 0.7478868 0.5390828
## 0.01 5 10 150 0.7050841 0.5542119
## 0.01 5 10 200 0.6830626 0.5643717
## 0.01 5 15 50 0.8465142 0.4980061
## 0.01 5 15 100 0.7628770 0.5258196
## 0.01 5 15 150 0.7183813 0.5437013
## 0.01 5 15 200 0.6946348 0.5550203
## 0.01 10 5 50 0.8222481 0.5492881
## 0.01 10 5 100 0.7337827 0.5631995
## 0.01 10 5 150 0.6871138 0.5810400
## 0.01 10 5 200 0.6628534 0.5928022
## 0.01 10 10 50 0.8335054 0.5215425
## 0.01 10 10 100 0.7478048 0.5396830
## 0.01 10 10 150 0.7040710 0.5557826
## 0.01 10 10 200 0.6818857 0.5656583
## 0.01 10 15 50 0.8458350 0.5031433
## 0.01 10 15 100 0.7629048 0.5259137
## 0.01 10 15 150 0.7204376 0.5398506
## 0.01 10 15 200 0.6968279 0.5511118
## 0.01 15 5 50 0.8265157 0.5393524
## 0.01 15 5 100 0.7345052 0.5640981
## 0.01 15 5 150 0.6892562 0.5785601
## 0.01 15 5 200 0.6649028 0.5910551
## 0.01 15 10 50 0.8335155 0.5227051
## 0.01 15 10 100 0.7468066 0.5437311
## 0.01 15 10 150 0.7023107 0.5589773
## 0.01 15 10 200 0.6791003 0.5701137
## 0.01 15 15 50 0.8467634 0.5013152
## 0.01 15 15 100 0.7631723 0.5291374
## 0.01 15 15 150 0.7193982 0.5427500
## 0.01 15 15 200 0.6953333 0.5537806
## 0.10 1 5 50 0.6847272 0.5476474
## 0.10 1 5 100 0.6778655 0.5535859
## 0.10 1 5 150 0.6768149 0.5574006
## 0.10 1 5 200 0.6758626 0.5601805
## 0.10 1 10 50 0.6901427 0.5412144
## 0.10 1 10 100 0.6822384 0.5487883
## 0.10 1 10 150 0.6777967 0.5563153
## 0.10 1 10 200 0.6767969 0.5594859
## 0.10 1 15 50 0.6940884 0.5315457
## 0.10 1 15 100 0.6852072 0.5432212
## 0.10 1 15 150 0.6796391 0.5504543
## 0.10 1 15 200 0.6790532 0.5536966
## 0.10 5 5 50 0.6529136 0.5839441
## 0.10 5 5 100 0.6397736 0.6016521
## 0.10 5 5 150 0.6367649 0.6063580
## 0.10 5 5 200 0.6356574 0.6078057
## 0.10 5 10 50 0.6522776 0.5845012
## 0.10 5 10 100 0.6409728 0.5996146
## 0.10 5 10 150 0.6329560 0.6103532
## 0.10 5 10 200 0.6297537 0.6146869
## 0.10 5 15 50 0.6710660 0.5604933
## 0.10 5 15 100 0.6550701 0.5806070
## 0.10 5 15 150 0.6456897 0.5931307
## 0.10 5 15 200 0.6401443 0.5996200
## 0.10 10 5 50 0.6518655 0.5873948
## 0.10 10 5 100 0.6430852 0.5999454
## 0.10 10 5 150 0.6411299 0.6027942
## 0.10 10 5 200 0.6404337 0.6039800
## 0.10 10 10 50 0.6659471 0.5708972
## 0.10 10 10 100 0.6484634 0.5937127
## 0.10 10 10 150 0.6424486 0.6026656
## 0.10 10 10 200 0.6422782 0.6035665
## 0.10 10 15 50 0.6692092 0.5635372
## 0.10 10 15 100 0.6511635 0.5844770
## 0.10 10 15 150 0.6434424 0.5950505
## 0.10 10 15 200 0.6370439 0.6029454
## 0.10 15 5 50 0.6445123 0.5958846
## 0.10 15 5 100 0.6350294 0.6080568
## 0.10 15 5 150 0.6334227 0.6105791
## 0.10 15 5 200 0.6332966 0.6111995
## 0.10 15 10 50 0.6556006 0.5808349
## 0.10 15 10 100 0.6413477 0.5990053
## 0.10 15 10 150 0.6356894 0.6058592
## 0.10 15 10 200 0.6328858 0.6097917
## 0.10 15 15 50 0.6624939 0.5701062
## 0.10 15 15 100 0.6482972 0.5888305
## 0.10 15 15 150 0.6419439 0.5964804
## 0.10 15 15 200 0.6383940 0.6009170
## 0.50 1 5 50 0.7649890 0.4746799
## 0.50 1 5 100 0.7684013 0.4822725
## 0.50 1 5 150 0.7672336 0.4872189
## 0.50 1 5 200 0.7651752 0.4896447
## 0.50 1 10 50 0.7393639 0.5010414
## 0.50 1 10 100 0.7305378 0.5189482
## 0.50 1 10 150 0.7290002 0.5241207
## 0.50 1 10 200 0.7309560 0.5230796
## 0.50 1 15 50 0.7308240 0.5030164
## 0.50 1 15 100 0.7404889 0.4990986
## 0.50 1 15 150 0.7399604 0.5011822
## 0.50 1 15 200 0.7442833 0.4982731
## 0.50 5 5 50 0.7652374 0.4823328
## 0.50 5 5 100 0.7641241 0.4836292
## 0.50 5 5 150 0.7639709 0.4838715
## 0.50 5 5 200 0.7639500 0.4839010
## 0.50 5 10 50 0.7606870 0.4761583
## 0.50 5 10 100 0.7596902 0.4783854
## 0.50 5 10 150 0.7599554 0.4784211
## 0.50 5 10 200 0.7598924 0.4786795
## 0.50 5 15 50 0.7380017 0.4975783
## 0.50 5 15 100 0.7384295 0.5004710
## 0.50 5 15 150 0.7383149 0.5010109
## 0.50 5 15 200 0.7390560 0.5005151
## 0.50 10 5 50 0.7704154 0.4734729
## 0.50 10 5 100 0.7692222 0.4747773
## 0.50 10 5 150 0.7692219 0.4748132
## 0.50 10 5 200 0.7692004 0.4748450
## 0.50 10 10 50 0.7692894 0.4684564
## 0.50 10 10 100 0.7657319 0.4735310
## 0.50 10 10 150 0.7655652 0.4747527
## 0.50 10 10 200 0.7654292 0.4750363
## 0.50 10 15 50 0.7464144 0.4916312
## 0.50 10 15 100 0.7456142 0.4961866
## 0.50 10 15 150 0.7435738 0.4989335
## 0.50 10 15 200 0.7434776 0.4993274
## 0.50 15 5 50 0.7970965 0.4377319
## 0.50 15 5 100 0.7956610 0.4396156
## 0.50 15 5 150 0.7953666 0.4398930
## 0.50 15 5 200 0.7955319 0.4397517
## 0.50 15 10 50 0.7345734 0.5013059
## 0.50 15 10 100 0.7331767 0.5038218
## 0.50 15 10 150 0.7338389 0.5036720
## 0.50 15 10 200 0.7337766 0.5036788
## 0.50 15 15 50 0.7488314 0.4859507
## 0.50 15 15 100 0.7457297 0.4918584
## 0.50 15 15 150 0.7440140 0.4948376
## 0.50 15 15 200 0.7440738 0.4950966
## MAE
## 0.7084569
## 0.6423555
## 0.5999429
## 0.5703862
## 0.7093705
## 0.6409595
## 0.5969848
## 0.5680730
## 0.7092613
## 0.6418178
## 0.5971861
## 0.5683856
## 0.6611786
## 0.5821977
## 0.5405946
## 0.5150489
## 0.6649608
## 0.5854547
## 0.5449346
## 0.5226105
## 0.6729642
## 0.5974332
## 0.5577710
## 0.5363203
## 0.6549598
## 0.5745619
## 0.5303373
## 0.5068978
## 0.6635435
## 0.5850749
## 0.5444182
## 0.5218759
## 0.6727529
## 0.5978940
## 0.5598675
## 0.5376782
## 0.6593590
## 0.5752664
## 0.5312511
## 0.5077834
## 0.6636590
## 0.5857195
## 0.5443379
## 0.5217813
## 0.6729232
## 0.5970193
## 0.5584931
## 0.5354213
## 0.5269139
## 0.5225728
## 0.5247743
## 0.5260974
## 0.5252831
## 0.5192826
## 0.5162082
## 0.5177700
## 0.5313570
## 0.5229483
## 0.5190232
## 0.5198918
## 0.5018773
## 0.4929476
## 0.4900240
## 0.4891554
## 0.5005073
## 0.4915170
## 0.4844472
## 0.4824698
## 0.5123633
## 0.5017698
## 0.4952874
## 0.4910297
## 0.4915354
## 0.4866941
## 0.4858184
## 0.4858829
## 0.5069258
## 0.4958276
## 0.4924167
## 0.4936399
## 0.5156449
## 0.5022434
## 0.4964176
## 0.4924929
## 0.4878967
## 0.4813249
## 0.4806941
## 0.4809243
## 0.4963140
## 0.4901471
## 0.4876518
## 0.4864212
## 0.5055348
## 0.4941241
## 0.4891849
## 0.4872332
## 0.6016989
## 0.6022836
## 0.6014021
## 0.5989816
## 0.5780994
## 0.5740305
## 0.5727691
## 0.5744938
## 0.5721260
## 0.5816357
## 0.5781600
## 0.5812817
## 0.5790967
## 0.5784724
## 0.5783322
## 0.5783135
## 0.5974886
## 0.5970656
## 0.5971459
## 0.5971986
## 0.5815954
## 0.5815951
## 0.5808630
## 0.5820511
## 0.6054156
## 0.6038618
## 0.6038410
## 0.6038177
## 0.5974770
## 0.5956630
## 0.5958917
## 0.5957671
## 0.5880511
## 0.5891991
## 0.5873446
## 0.5875142
## 0.6125050
## 0.6122914
## 0.6119624
## 0.6121229
## 0.5792508
## 0.5795749
## 0.5803924
## 0.5803430
## 0.5897518
## 0.5875889
## 0.5865808
## 0.5864940
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 200, interaction.depth =
## 5, shrinkage = 0.1 and n.minobsinnode = 10.
gbmModel$bestTune
## n.trees interaction.depth shrinkage n.minobsinnode
## 68 200 5 0.1 10
gbmModel$finalModel
## A gradient boosted model with gaussian loss function.
## 200 iterations were performed.
## There were 56 predictors of which 54 had non-zero influence.
**Cubist**
set.seed(1)
cubModel <- train(x = df.train.x,
                  y = df.train.y,
                  method = 'cubist')
cubModel
## Cubist
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 0.9855217 0.3720333 0.7021369
## 1 5 0.9678239 0.3922185 0.6775515
## 1 9 0.9714719 0.3869898 0.6851331
## 10 0 0.6985204 0.5798378 0.5258657
## 10 5 0.6813682 0.5998483 0.5088235
## 10 9 0.6869407 0.5935009 0.5148524
## 20 0 0.6560534 0.6266983 0.4945791
## 20 5 0.6378305 0.6468926 0.4769200
## 20 9 0.6441261 0.6398704 0.4832740
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
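The exercise also asks how the models do on held-out data; a sketch of how the 20% test split could be used to compare the three fits (no results shown, since they depend on the fitted objects and the seed).
# caret::postResample reports RMSE, R-squared, and MAE on the test split.
postResample(predict(rfModel, newdata = df.test.x), df.test.y)
postResample(predict(gbmModel, newdata = df.test.x), df.test.y)
postResample(predict(cubModel, newdata = df.test.x), df.test.y)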
Comparing resampled RMSE, the boosted tree model comes out best (about 0.630), slightly ahead of the Cubist (0.638) and random forest (0.638) models, so its variable importance is examined below.
varImp(gbmModel)
## gbm variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess09 22.925
## BiologicalMaterial12 21.277
## ManufacturingProcess31 17.960
## BiologicalMaterial03 17.644
## ManufacturingProcess17 15.682
## BiologicalMaterial11 14.622
## ManufacturingProcess13 12.238
## BiologicalMaterial09 9.518
## ManufacturingProcess06 8.842
## ManufacturingProcess01 5.852
## ManufacturingProcess18 5.764
## ManufacturingProcess29 5.658
## BiologicalMaterial02 5.579
## ManufacturingProcess10 4.691
## BiologicalMaterial06 4.644
## BiologicalMaterial05 4.633
## ManufacturingProcess02 4.434
## ManufacturingProcess14 4.241
## ManufacturingProcess33 4.164
The manufacturing process variables dominate the list of most important predictors. This mirrors the linear and nonlinear models from previous chapters, in which ManufacturingProcess32 was the most important predictor and the process variables outweighed the biological ones.
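To make that claim concrete (a sketch; gbmImp and top20 are names introduced here), the top 20 predictors from the boosted model can be tallied by type.
gbmImp <- varImp(gbmModel)$importance
top20 <- rownames(gbmImp)[order(-gbmImp$Overall)][1:20]
# Count how many of the top 20 are process versus biological predictors.
table(ifelse(grepl("^Manufacturing", top20), "Process", "Biological"))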