8.1

Recreate the simulated data from Exercise 7.2:

library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
  1. Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(caret)
model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)

Did the random forest model significantly use the uninformative predictors (V6 – V10)?

rfImp1 %>% 
  mutate (var = rownames(rfImp1)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

The above grapph shows that the model did not use V6-V10 variables significantly.

  1. Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9460206

Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?

model2 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)

rfImp2 <- varImp(model2, scale = FALSE)

rfImp2 %>% 
  mutate (var = rownames(rfImp2)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

We see that V1 importance is dropped to the fourth position and lands next to its duplicate.

  1. Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
model3 <- cforest(y ~ ., data = simulated)

rfImp3 <- varimp(model3, conditional = TRUE) %>% as.data.frame()

rfImp3 %>% 
  rename(Overall = '.') %>%
  mutate (var = rownames(rfImp3)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

  1. Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?

The model continues to not significantly use V6-V10. Additionally, The variables importance that are used have equal values.

model4 <- cubist(simulated[, colnames(simulated)[colnames(simulated) != 'y']], 
                 simulated$y)

rfImp4 <- varImp(model4, scale = FALSE)

rfImp4 %>% 
  mutate (var = rownames(rfImp4)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

8.2

Use a simulation to show tree bias with different granularities.

The graphs show similar results, maybe due to the simulation data.

# using the friedman2 function with stand dev of 2.5
set.seed(300)
simulated2 <- mlbench.friedman2(200, sd = 2.5)
simulated2 <- cbind(simulated2$x, simulated2$y)
simulated2 <- as.data.frame(simulated2)
colnames(simulated2)[ncol(simulated2)] <- "y"


model5 <- cforest(y ~ ., data = simulated2)

rfImp5 <- varimp(model5, conditional = FALSE) %>% as.data.frame()

rfImp5 %>% 
  rename(Overall = '.') %>%
  mutate (var = rownames(rfImp5)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

rfImp6 <- varimp(model5, conditional = TRUE) %>% as.data.frame()

rfImp6 %>% 
  rename(Overall = '.') %>%
  mutate (var = rownames(rfImp6)) %>%
  ggplot(aes(Overall, reorder(var, Overall, sum), var)) + 
  geom_col(fill = 'blue') + 
  labs(title = 'Variable Importance' , y = 'Variable')

8.3

In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:

  1. Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?

The graph on the right, with parameter 0.9, shows a model that saw many more trees that narrowed down the variables of importance. While, the tree on the left, with parameter 0.1, saw many less trees resulting in a high variance.

  1. Which model do you think would be more predictive of other samples?

The model on the right should perform better on other samples as the bagging reduces variance and bias on a model.

  1. How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?

Increasing the interaction depth affect allows the trees to become larger. This may result in both models having more variables included intp their model and perhaps a higher performance.

8.7

Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:

data(ChemicalManufacturingProcess)
df <- ChemicalManufacturingProcess

init <- mice(df, maxit = 0, silent = TRUE)
predM <- init$predictorMatrix
set.seed(123)
imputed <- mice(df, method = 'pmm', predictorMatrix = predM, m=5, silent = TRUE)
## 
##  iter imp variable
##   1   1  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   1   2  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   1   3  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   1   4  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   1   5  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   2   1  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   2   2  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   2   3  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   2   4  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   2   5  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   3   1  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   3   2  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   3   3  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   3   4  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   3   5  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   4   1  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   4   2  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   4   3  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   4   4  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   4   5  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   5   1  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   5   2  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   5   3  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   5   4  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
##   5   5  ManufacturingProcess01  ManufacturingProcess02  ManufacturingProcess03  ManufacturingProcess04  ManufacturingProcess05  ManufacturingProcess06  ManufacturingProcess07  ManufacturingProcess08  ManufacturingProcess10  ManufacturingProcess11  ManufacturingProcess12  ManufacturingProcess14  ManufacturingProcess22  ManufacturingProcess23  ManufacturingProcess24  ManufacturingProcess25  ManufacturingProcess26  ManufacturingProcess27  ManufacturingProcess28  ManufacturingProcess29  ManufacturingProcess30  ManufacturingProcess31  ManufacturingProcess33  ManufacturingProcess34  ManufacturingProcess35  ManufacturingProcess36  ManufacturingProcess40  ManufacturingProcess41
## Warning: Number of logged events: 675
df <- complete(imputed ,silent = TRUE)

set.seed(123)
trans_df <- preProcess(df,
    method = c('center', 'scale'))

df <- predict(trans_df, df)

sample <- sample.split(df$Yield, SplitRatio = 0.75)
X_train = subset(df[,-1], sample == TRUE)
X_test = subset(df[,-1], sample == FALSE)

y_train <- subset(df[,1], sample == TRUE)
y_test <- subset(df[,1], sample == FALSE)
  1. Which tree-based regression model gives the optimal resampling and test set performance?

The random forest model has the highest performance.

models <- c('rf','gbm','cubist')

for(m in models){
  print(m)
  model <- train(X_train, y_train,
                 method = m,
                 trControl = trainControl(method = 'cv'),
                 verbose = FALSE)
  
  pred <- predict(model, newdata = X_test)
  perf <- postResample(pred, y_test)
  print(perf)
}
## [1] "rf"
##      RMSE  Rsquared       MAE 
## 0.4122269 0.7698316 0.3295748 
## [1] "gbm"
##      RMSE  Rsquared       MAE 
## 0.4715563 0.6788073 0.3565832 
## [1] "cubist"
##      RMSE  Rsquared       MAE 
## 0.4668316 0.6888499 0.3697297
  1. Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
# elastic net
enetGrid <- expand.grid(.lambda = c(0,0.01,0.1),
            .fraction = seq(0.05,1,length = 20))

set.seed(213)
enetTune <- train(X_train, y_train,
                  method = 'enet',
                  tuneGrid = enetGrid
                  )
varImp(enetTune)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   99.45
## ManufacturingProcess17   84.45
## BiologicalMaterial06     76.83
## ManufacturingProcess09   76.23
## BiologicalMaterial03     76.20
## ManufacturingProcess36   74.99
## ManufacturingProcess06   74.25
## BiologicalMaterial12     60.54
## BiologicalMaterial02     59.04
## BiologicalMaterial11     49.98
## ManufacturingProcess31   48.75
## ManufacturingProcess29   41.56
## ManufacturingProcess11   40.96
## BiologicalMaterial04     40.87
## ManufacturingProcess33   39.55
## ManufacturingProcess30   38.79
## BiologicalMaterial08     38.66
## BiologicalMaterial01     33.78
## ManufacturingProcess12   33.39
# neural network
nnetGrid <- expand.grid(.decay = c(0,0.01,.1),
                        .size = c(1:5),
                        .bag = FALSE)
nnetFit <- train(X_train, y_train,
                  method = 'avNNet',
                  tuneGrid = nnetGrid,
                  linout = TRUE,
                  trace = FALSE,
                  MaxNWts = 5 * (ncol(X_train) + 1 + 5 + 1),
                  maxit = 100
  
)
## Warning: executing %dopar% sequentially: no parallel backend registered
varImp(nnetFit)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   99.45
## ManufacturingProcess17   84.45
## BiologicalMaterial06     76.83
## ManufacturingProcess09   76.23
## BiologicalMaterial03     76.20
## ManufacturingProcess36   74.99
## ManufacturingProcess06   74.25
## BiologicalMaterial12     60.54
## BiologicalMaterial02     59.04
## BiologicalMaterial11     49.98
## ManufacturingProcess31   48.75
## ManufacturingProcess29   41.56
## ManufacturingProcess11   40.96
## BiologicalMaterial04     40.87
## ManufacturingProcess33   39.55
## ManufacturingProcess30   38.79
## BiologicalMaterial08     38.66
## BiologicalMaterial01     33.78
## ManufacturingProcess12   33.39
rfmodel <- train(X_train, y_train,
                 method = 'rf',
                 trControl = trainControl(method = 'cv'))

rfVarImp <- varImp(rfmodel)

The variable importance shows a similar pattern, in that the manufacturing process variables dominate the top ten list and we see that ManufacturProcess32 has the clear top importance.

  1. Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?

We can see again that the manufacturing process variables have higher importance as they are higher on the tree.

tree <- rpart(Yield ~ ., data = df)
rpart.plot(tree)