8.1. Recreate the simulated data from Exercise 7.2:
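A minimal sketch of the setup that produces the importance scores below, following the code given in the exercise text (the object name model1 is an assumption):

```r
library(mlbench)
library(randomForest)
library(caret)

# Simulate the data as in Exercise 7.2
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

# Fit a random forest and compute unscaled variable importance
model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE, ntree = 1000)
varImp(model1, scale = FALSE)
```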
##         Overall
## V1   8.83890885
## V2   6.49023056
## V3   0.67583163
## V4   7.58822553
## V5   2.27426009
## V6   0.17436781
## V7   0.15136583
## V8  -0.03078937
## V9  -0.02989832
## V10 -0.08529218
Did the random forest model significantly use the uninformative
predictors (V6 – V10)?
The random forest model did not make significant use of the uninformative
predictors: the importance scores for V6–V10 are all close to zero.
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
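A sketch of this step, using the construction suggested in the exercise text:

```r
# Add a predictor highly correlated with the informative V1
simulated$duplicate1 <- simulated$V1 + rnorm(200) * 0.1
cor(simulated$duplicate1, simulated$V1)
```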
## [1] 0.9396216
Fit another random forest model to these data. Did the importance
score for V1 change? What happens when you add another predictor that is
also highly correlated with V1?
The importance score of V1 decreased significantly, with part of its
importance shifting to the highly correlated duplicate predictors: each
duplicate receives a sizeable score of its own.
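Refitting the random forest with duplicate1 included gives the scores below (a sketch; model2 is a hypothetical object name):

```r
model2 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE, ntree = 1000)
varImp(model2, scale = FALSE)
```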
##                Overall
## V1          6.29780744
## V2          6.08038134
## V3          0.58410718
## V4          6.93924427
## V5          2.03104094
## V6          0.07947642
## V7         -0.02566414
## V8         -0.11007435
## V9         -0.08839463
## V10        -0.00715093
## duplicate1  3.56411581
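Adding a second predictor that is also highly correlated with V1 (same construction as duplicate1; a sketch):

```r
simulated$duplicate2 <- simulated$V1 + rnorm(200) * 0.1
cor(simulated$duplicate2, simulated$V1)
```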
## [1] 0.9312569
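Refitting once more with both duplicates included (model3 is a hypothetical object name):

```r
model3 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE, ntree = 1000)
varImp(model3, scale = FALSE)
```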
##                 Overall
## V1          5.656397024
## V2          6.957366954
## V3          0.539700105
## V4          7.280227792
## V5          2.094226861
## V6          0.141163232
## V7          0.092792498
## V8         -0.096325566
## V9         -0.007463533
## V10         0.016839393
## duplicate1  2.566313355
## duplicate2  2.654958084
Yes. Fitting a forest of conditional inference trees with the cforest function from the party package shows a similar importance pattern to the traditional random forest: the informative predictors (V1–V5) receive clearly higher scores than the uninformative predictors (V6–V10).
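A sketch of this fit (assuming the model is trained on the original ten predictors, excluding the duplicate columns), whose importance output follows:

```r
library(party)

# Conditional inference forest on the original ten predictors plus y
set.seed(200)
cforestModel <- cforest(y ~ ., data = simulated[, 1:11],
                        controls = cforest_unbiased(ntree = 1000))
varimp(cforestModel)
```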
##          V1          V2          V3          V4          V5          V6
##  6.29693233  5.31985191  0.08627462  5.91248630  1.45798478 -0.16535778
##          V7          V8          V9         V10
## -0.12086303 -0.50833336 -0.28273187 -0.16748484
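The pattern can also be checked with boosted trees. A sketch of one possible tuning run with caret; the grid here is a guess, chosen only so that it contains the selected values shown below:

```r
library(caret)
library(gbm)

set.seed(200)
gbmGrid <- expand.grid(n.trees = seq(100, 1000, by = 100),
                       interaction.depth = seq(1, 7, by = 2),
                       shrinkage = c(0.01, 0.07, 0.1),
                       n.minobsinnode = 10)
gbmModel <- train(y ~ ., data = simulated, method = "gbm",
                  tuneGrid = gbmGrid, verbose = FALSE)
gbmModel$bestTune
```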
##     n.trees interaction.depth shrinkage n.minobsinnode
## 505     600                 5      0.07             10
8.2. Use a simulation to show tree bias with different
granularities.
Here we use a random forest model on three simulated predictors, a, b,
and c, that differ in granularity (the number of distinct values they
take). The importance scores differ across a, b, and c in line with their
granularity, illustrating the selection bias of tree-based models toward
predictors with more distinct values.
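A sketch of one such simulation (the variable names a, b, c and the seed are assumptions):

```r
library(randomForest)

# Three predictors with different granularities; none is related to the
# response, so any importance reflects bias, not signal.
set.seed(624)
n <- 500
a <- sample(1:2,  n, replace = TRUE)   # coarse: 2 distinct values
b <- sample(1:10, n, replace = TRUE)   # medium: 10 distinct values
c <- rnorm(n)                          # fine: ~n distinct values
y <- rnorm(n)                          # response independent of a, b, c
sim <- data.frame(y = y, a = a, b = b, c = c)

rfBias <- randomForest(y ~ ., data = sim, importance = TRUE, ntree = 500)

# Node-purity importance tends to grow with granularity (c > b > a)
importance(rfBias)
```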
8.3. In stochastic gradient boosting the bagging fraction and
learning rate will govern the construction of the trees as they are
guided by the gradient. Although the optimal values of these parameters
should be obtained through the tuning process, it is helpful to
understand how the magnitudes of these parameters affect magnitudes of
variable importance. Figure 8.24 provides the variable importance plots
for boosting using two extreme values for the bagging fraction (0.1 and
0.9) and the learning rate (0.1 and 0.9) for the solubility data. The
left-hand plot has both parameters set to 0.1, and the right-hand plot
has both set to 0.9:
(a) Why does the model on the right focus its importance on just the
first few predictors, whereas the model on the left spreads importance
across more predictors?
The model on the right uses a bagging fraction and learning rate of 0.9:
each tree is fit to nearly all of the training data and each iteration
takes a large step along the gradient, so the few strongest predictors are
selected repeatedly and accumulate most of the importance. The model on
the left uses 0.1 for both parameters: each tree sees only a small random
fraction of the data and the model learns slowly over many iterations
(which also makes it slower to train), so a wider set of predictors gets a
chance to contribute, spreading importance across more of them.
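To make the contrast concrete, here is a sketch using the solubility data from the AppliedPredictiveModeling package; the two gbm fits and their names are illustrative assumptions rather than the exact models behind Fig. 8.24:

```r
library(AppliedPredictiveModeling)
library(gbm)

data(solubility)
solTrain <- cbind(solTrainXtrans, Solubility = solTrainY)

# Both tuning parameters at 0.1 (left-hand plot) vs. 0.9 (right-hand plot)
gbmLow  <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, shrinkage = 0.1, bag.fraction = 0.1)
gbmHigh <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, shrinkage = 0.9, bag.fraction = 0.9)

# Relative influence: the 0.9/0.9 model concentrates importance on far
# fewer predictors than the 0.1/0.1 model
head(summary(gbmLow,  plotit = FALSE), 10)
head(summary(gbmHigh, plotit = FALSE), 10)
```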
(b) Which model do you think would be more predictive of other samples?
The left-hand model would likely be more predictive of new samples. Its
low learning rate and low bagging fraction act as regularization:
predictive signal is spread across many predictors rather than
concentrated in a few, which reduces the risk of overfitting.
(c) How would increasing interaction depth affect the slope of predictor
importance for either model in Fig. 8.24?
Increasing the interaction depth grows deeper trees that split on more
predictors, so importance would be spread across a larger number of
predictors and the slope of the predictor importance plot would flatten
for either model.
8.7. Refer to Exercises 6.3 and 7.5, which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
(a) Which tree-based regression model gives the optimal resampling and test set performance?
Unfortunately, my code for question 8.7 would not run, so no results are reported here.
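For reference, a sketch of one possible approach, untested here; it assumes the ChemicalManufacturingProcess data from the AppliedPredictiveModeling package and the kNN imputation and 80/20 split used in Exercises 6.3 and 7.5:

```r
library(AppliedPredictiveModeling)
library(caret)

data(ChemicalManufacturingProcess)

# kNN imputation (also centers and scales), as in Exercise 6.3
imputed <- preProcess(ChemicalManufacturingProcess[, -1], method = "knnImpute")
predictors <- predict(imputed, ChemicalManufacturingProcess[, -1])
yield <- ChemicalManufacturingProcess$Yield

# 80/20 training/test split
set.seed(624)
inTrain <- createDataPartition(yield, p = 0.8, list = FALSE)
trainX <- predictors[inTrain, ];  trainY <- yield[inTrain]
testX  <- predictors[-inTrain, ]; testY  <- yield[-inTrain]

ctrl <- trainControl(method = "cv", number = 10)

# Train several tree-based models; reset the seed before each call so
# the cross-validation folds match
set.seed(624)
cartFit <- train(trainX, trainY, method = "rpart", tuneLength = 10,
                 trControl = ctrl)
set.seed(624)
rfFit   <- train(trainX, trainY, method = "rf", tuneLength = 5,
                 trControl = ctrl)
set.seed(624)
gbmFit  <- train(trainX, trainY, method = "gbm", tuneLength = 5,
                 trControl = ctrl, verbose = FALSE)

# Compare resampled performance, then check the winner on the test set
# (random forest shown as an example)
summary(resamples(list(CART = cartFit, RF = rfFit, GBM = gbmFit)))
postResample(predict(rfFit, testX), testY)
```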