—————————————————————————

Student Name : Sachid Deshmukh

—————————————————————————

8.1 Recreate the simulated data from Exercise 7.2

Load data

library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

a. Fit a random forest model to all of the predictors, then estimate the variable importance scores:

library(randomForest)
library(caret)      # varImp
library(vip)        # variable importance plots
library(ggplot2)

model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp1 <- model1$importance   # caret alternative: varImp(model1, scale = FALSE)
vip(model1, color = 'red', fill = 'green') +
  ggtitle('Model1 Var Imp')

Did the random forest model significantly use the uninformative predictors ( V6 – V10 )?

From the variable importance chart above we can see that the random forest model did not make significant use of the uninformative predictors (V6-V10).
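
As a quick numeric check (an addition of my own, using randomForest's permutation importance), the raw scores can be ranked directly; V6-V10 should fall at the bottom of the list:

imp <- importance(model1, type = 1)   # type = 1: permutation importance (%IncMSE)
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]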

b. Now add an additional predictor that is highly correlated with one of the informative predictors. For example:

simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9460206

Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?

library(gridExtra)

model2 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
grid.arrange(vip(model1, color = 'red', fill = 'green') + ggtitle('Model1 Var Imp'),
             vip(model2, color = 'green', fill = 'red') + ggtitle('Model2 Var Imp'),
             ncol = 2)

We can see that the addition of a highly correlated variable changes the overall variable importance of the model. The importance score of V1 changed: it is roughly halved after adding the highly correlated duplicate, because the two variables share the splits that V1 alone used to receive. If we have highly correlated variables in the model input, the variable importance scores will be misleading.
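
To answer the second part of the question directly, here is a small sketch (my own addition, done on a copy of the data so the models below are unaffected) that adds a second predictor highly correlated with V1 and refits; the importance credited to V1 should be diluted even further as it is shared across the correlated copies:

sim2 <- simulated
sim2$duplicate2 <- sim2$V1 + rnorm(200) * .1

model2b <- randomForest(y ~ ., data = sim2, importance = TRUE, ntree = 1000)
varImp(model2b, scale = FALSE)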

c. Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?

library(party)

# party::cforest sets the number of trees through the controls argument
model3 <- cforest(y ~ ., data = simulated,
                  controls = cforest_unbiased(ntree = 100))

# Conditional variable importance (adjusts for correlations between predictors)
cfImp3 <- varimp(model3, conditional = TRUE)
# Un-conditional (traditional permutation) variable importance
cfImp4 <- varimp(model3, conditional = FALSE)
old.par <- par(mfrow=c(1, 2))
barplot(sort(cfImp3),horiz = TRUE, main = 'Conditional', col = rainbow(3))
barplot(sort(cfImp4),horiz = TRUE, main = 'Un-Conditional', col = rainbow(5))

par(old.par)

We can see that when variable importance is calculated conditionally, it takes into account the correlation between V1 and duplicate1 and adjusts the importance scores of these two variables accordingly. When variable importance is calculated un-conditionally, the two highly correlated variables (V1 and duplicate1) are treated as roughly equally important, which can be misleading.
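
For a side-by-side numeric view (a small sketch of my own), the two importance vectors can be placed in one table, which makes the down-weighting of V1 and duplicate1 under the conditional measure easier to see than in the separate barplots:

# Conditional vs. un-conditional importance for each predictor
round(data.frame(Conditional = cfImp3, Unconditional = cfImp4), 3)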

d. Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?

1] Cubist

library(Cubist)

model4 <- cubist(x = simulated[, names(simulated) != 'y'],
                 y = simulated$y)

# Note: varImp for Cubist has no `conditional` option, so the argument below is
# ignored and both calls return the same model-based importance scores
cfImp4 <- varImp(model4, conditional = TRUE)
cfImp5 <- varImp(model4, conditional = FALSE)
old.par <- par(mfrow=c(1, 2))
barplot((t(cfImp4)),horiz = TRUE, main = 'Conditional', col = rainbow(3))
barplot((t(cfImp5)),horiz = TRUE, main = 'Un-Conditional', col = rainbow(5))

par(old.par)

2] Boosted Trees

gbmGrid = expand.grid(interaction.depth = seq(1, 5, by = 2),
                      n.trees = seq(100, 1000, by = 100),
                      shrinkage = 0.1,
                      n.minobsinnode = 5)
model4 <- train(y ~ ., data = simulated, method = 'gbm',
                tuneGrid = gbmGrid, verbose = FALSE)


# As with Cubist, varImp ignores the `conditional` argument for boosted trees,
# so both calls return the same relative-influence scores
cfImp4 <- varImp(model4, conditional = TRUE)
cfImp5 <- varImp(model4, conditional = FALSE)
old.par <- par(mfrow=c(1, 2))
barplot((t(cfImp4$importance)),horiz = TRUE, main = 'Conditional', col = rainbow(3))
barplot((t(cfImp5$importance)),horiz = TRUE, main = 'Un-Conditional', col = rainbow(5))

par(old.par)

The "conditional" and "un-conditional" importance scores are identical for Cubist and boosted trees because their varImp methods have no conditional option; the extra argument is simply ignored. Unlike party's conditional importance, these model-specific measures therefore do not adjust for the correlation between V1 and duplicate1.
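
One quick way to confirm this (a sketch of my own, using the boosted-tree train object fitted above) is to compare the two results directly; they should match exactly because varImp has no conditional option for these models:

all.equal(varImp(model4, conditional = TRUE)$importance,
          varImp(model4, conditional = FALSE)$importance)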

8.2 Use a simulation to show tree bias with different granularities.

Let's create a simulated dataset where the output variable y is the sum of two input variables, V1 and V2. V1 spans a much wider range of values (uniform between 2 and 500) than V2 (normal with mean 2 and standard deviation 10), so V1 offers many more distinct split points.

V1 <- runif(1000, 2, 500)   # wide range of values
V2 <- rnorm(1000, 2, 10)    # narrow range of values
V3 <- rnorm(1000, 1, 1000)  # noise, unrelated to y
y <- V2 + V1

df <- data.frame(V1, V2, V3, y)
model3 <- cforest(y ~ ., data = df,
                  controls = cforest_unbiased(ntree = 10))

cfImp4 <- varimp(model3, conditional = FALSE)
barplot(sort(cfImp4),horiz = TRUE, main = 'Un-Conditional', col = rainbow(5))

We can see that the random forest gives the higher importance score to V1, the variable with the larger number of distinct split points.

Let's reverse the granularity of V1 and V2 (V1 now spans the narrow range and V2 the wide one) while keeping the output y = V1 + V2 unchanged. Let's refit the forest and observe the variable importance.

V1 <- runif(1000, 2, 10)    # now the narrow range
V2 <- rnorm(1000, 2, 500)   # now the wide range
V3 <- rnorm(1000, 1, 1000)
y <- V2 + V1

df <- data.frame(V1, V2, V3, y)
model3 <- cforest(y ~ ., data = df,
                  controls = cforest_unbiased(ntree = 10))

cfImp4 <- varimp(model3, conditional = FALSE)
barplot(sort(cfImp4),horiz = TRUE, main = 'Un-Conditional', col = rainbow(5))

We can see that the random forest now gives the higher score to V2, since it is now the more granular variable (it spans the much wider range of values).
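
A sharper way to isolate the granularity effect (my own sketch, separate from the cforest simulations above) is to generate two predictors that are both unrelated to the response but have very different numbers of distinct values, and tally which one a single CART tree splits on first; the higher-granularity predictor tends to be chosen far more often:

library(rpart)

set.seed(200)
first_split <- replicate(200, {
  dat <- data.frame(low  = sample(1:2, 100, replace = TRUE),  # 2 distinct values
                    high = rnorm(100),                        # ~100 distinct values
                    y    = rnorm(100))                        # pure-noise response
  fit <- rpart(y ~ ., data = dat,
               control = rpart.control(maxdepth = 1, cp = 0))
  as.character(fit$frame$var[1])   # variable used at the root split
})
table(first_split)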

8.3 In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9

a. Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?

The model on the right uses a high bagging fraction (0.9) and a high learning rate (0.9). With a large bagging fraction nearly every tree sees almost the same data, and with a large learning rate each tree's contribution is large, so diversity in variable selection is reduced and a few strong predictors dominate the importance scores. The left-hand model (both parameters 0.1) works on more varied subsamples and makes smaller updates, which spreads importance across more predictors.
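
Since the solubility data from Figure 8.24 are not loaded here, a rough illustration (a sketch of my own on the simulated Friedman data from 8.1) is to fit gbm with the two extreme settings and compare how concentrated the relative influence is:

library(gbm)

set.seed(200)
gbm_low  <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 100, shrinkage = 0.1, bag.fraction = 0.1,
                interaction.depth = 3, n.minobsinnode = 5)
gbm_high <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 100, shrinkage = 0.9, bag.fraction = 0.9,
                interaction.depth = 3, n.minobsinnode = 5)

summary(gbm_low,  plotit = FALSE)   # expected: influence spread more evenly
summary(gbm_high, plotit = FALSE)   # expected: influence concentrated in the top few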

b. Which model do you think would be more predictive of other samples?

The model with the lower learning rate and bagging fraction should be more predictive of other samples. Its importance is spread across a balanced mix of predictors rather than concentrated in a very few, whereas the right-hand model is more likely to be over-fit to the training data and fail to generalize to test data because of its strong fit to the training set.

c. How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?

As we increase the interaction depth, trees are allowed to grow deeper, so more predictors become candidates for splits within each tree. This spreads the variable importance across more variables and flattens the importance slope, rather than concentrating very high importance in a few predictors.
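
Similarly (another sketch on the simulated data, not the solubility data from the figure), refitting with a shallow versus a deep interaction depth shows whether the relative influence spreads further down the predictor list:

library(gbm)

set.seed(200)
gbm_shallow <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                   n.trees = 100, shrinkage = 0.1, interaction.depth = 1)
gbm_deep    <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                   n.trees = 100, shrinkage = 0.1, interaction.depth = 7)

summary(gbm_shallow, plotit = FALSE)
summary(gbm_deep,    plotit = FALSE)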

8.7 Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:

Prepare Data

library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
tmp.data <- mice(ChemicalManufacturingProcess,m=2,maxit=5,meth='pmm',seed=500)
ChemicalManufacturingProcess = complete(tmp.data)

# train test split
set.seed(100)
rows = nrow(ChemicalManufacturingProcess)
t.index <- sample(1:rows, size = round(0.75*rows), replace=FALSE)
df.train <- ChemicalManufacturingProcess[t.index ,]
df.test <- ChemicalManufacturingProcess[-t.index ,]
df.train.x = df.train[,-1]
df.train.y = df.train[,1]
df.test.x = df.test[,-1]
df.test.y = df.test[,1]
model.eval = function(modelmethod, gridSearch = NULL)
{
  # Tune with 10-fold CV on centered/scaled predictors, then report test-set performance
  Model = train(x = df.train.x, y = df.train.y,
                method = modelmethod,
                tuneGrid = gridSearch,
                preProcess = c('center', 'scale'),
                trControl = trainControl(method = 'cv'))
  Pred = predict(Model, newdata = df.test.x)
  modelperf = postResample(Pred, df.test.y)
  print(modelperf)
}

a. Which tree-based regression model gives the optimal resampling and test set performance?

1. Simple Tree

perftree = model.eval('rpart')
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
##     RMSE Rsquared      MAE 
## 1.486780 0.227814 1.210479

2. Random Forest

perfrf = model.eval('rf')
##      RMSE  Rsquared       MAE 
## 1.1522583 0.4554709 0.9178730

3. Boosting Trees

perfgbm = model.eval('gbm')
##      RMSE  Rsquared       MAE 
## 1.1267787 0.4844553 0.9468683

4. Cubist

perfcubist = model.eval('cubist')
##      RMSE  Rsquared       MAE 
## 1.0886085 0.5405831 0.8232466
df.perf = rbind(data.frame(Name = 'SimpleTree',   RMSE = perftree[1]),
                data.frame(Name = 'RandomForest', RMSE = perfrf[1]),
                data.frame(Name = 'BoostingTree', RMSE = perfgbm[1]),
                data.frame(Name = 'Cubist',       RMSE = perfcubist[1]))

ggplot(data = df.perf, aes(x = Name, y = RMSE, fill=Name)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=RMSE), vjust=1, color="white",
            position = position_dodge(0.9), size=3.5)

From the model performance chart above, we can see that the Cubist model gives the lowest RMSE on the test set, so Cubist is the best-performing tree-based model for this dataset.

b. Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?

cModel <- train(x = df.train.x,
                     y = df.train.y,
                     method = 'cubist')
vip(cModel, color = 'red', fill='purple')

We can see that the manufacturing process variables dominate the list of important predictors, which is consistent with the top predictors from the optimal linear and nonlinear models.
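
To make the comparison concrete, one way (a small sketch of my own; the importance lists from the linear and nonlinear models of Exercises 6.3 and 7.5 are assumed to be available separately) is to pull the top 10 predictors by importance from the Cubist fit:

cub_imp <- varImp(cModel)$importance
head(cub_imp[order(-cub_imp$Overall), , drop = FALSE], 10)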

c. Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?

library(rpart.plot)
multi.class.model  = rpart(Yield~., data=df.train)
rpart.plot(multi.class.model)

From the tree plot above, we can clearly see that high values of the manufacturing process variables are associated with higher Yield.
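
The question also asks about the distribution of yield in the terminal nodes; a small sketch (my own addition, using the terminal-node assignments stored in the rpart fit) is to boxplot the training yields by terminal node:

node <- factor(multi.class.model$where)   # terminal node of each training sample
boxplot(df.train$Yield ~ node, xlab = 'Terminal node', ylab = 'Yield')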