DATA 624 Homework 9
Question 8.1
Recreate the simulated data from Exercise 7.2
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
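The chunks below assume a number of packages are attached; the original setup chunk is not shown, so this list is my reconstruction based on the functions used throughout:
library(mlbench)       # mlbench.friedman1()
library(randomForest)  # randomForest()
library(caret)         # train(), varImp(), preProcess(), createDataPartition()
library(party)         # cforest()
library(gbm)           # boosted trees via caret method = "gbm"
library(Cubist)        # Cubist models via caret method = "cubist"
library(rpart)         # single regression trees
library(rattle)        # fancyRpartPlot()
library(tidyverse)     # pipes, dplyr verbs, tibble::rownames_to_column(), ggplot2
library(knitr)         # kable()
library(kableExtra)    # kable_styling()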
Part A
Fit a random forest model to all the predictors, then estimate the variable importance scores:
rf_model_1 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rf_imp_1 <- varImp(rf_model_1, scale = FALSE)
rf_imp_1 %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V1 | 8.7322354 |
V4 | 7.6151188 |
V2 | 6.4153694 |
V5 | 2.0235246 |
V3 | 0.7635918 |
V6 | 0.1651112 |
V7 | -0.0059617 |
V10 | -0.0749448 |
V9 | -0.0952927 |
V8 | -0.1663626 |
Did the random forest model significantly use the uninformative predictors (V6-V10)?
No. Their importance scores are very close to zero, indicating the model made little use of them.
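One way to check "significantly" more directly (my own addition, not part of the original chunks): randomForest stores a standard error for each permutation importance score, so the ratio of score to standard error acts like a z-statistic:
imp <- importance(rf_model_1, type = 1, scale = FALSE)  # unscaled permutation importance (%IncMSE)
imp_sd <- rf_model_1$importanceSD                       # standard errors of the permutation scores
round(imp / imp_sd, 2)                                  # ratios near zero for V6-V10 indicate noise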
Part B
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
0.9460206
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
rf_model_2 <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
rf_imp_2 <- varImp(rf_model_2, scale = FALSE)
rf_imp_2 %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V4 | 7.0475224 |
V2 | 6.0689606 |
V1 | 5.6911997 |
duplicate1 | 4.2833158 |
V5 | 1.8723844 |
V3 | 0.6297022 |
V6 | 0.1356906 |
V10 | 0.0289481 |
V9 | 0.0084044 |
V7 | -0.0134564 |
V8 | -0.0437056 |
The importance of V1 decreased significantly after adding the correlated predictor. Because the trees can split on either V1 or duplicate1 more or less interchangeably, the importance that formerly accrued to V1 alone is now split between the two variables.
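As a rough check (again my own addition), the combined importance of the correlated pair is comparable to V1's score before duplicate1 was added, consistent with the importance being split rather than lost:
rf_imp_2["V1", "Overall"] + rf_imp_2["duplicate1", "Overall"]  # ~9.97 here, vs. 8.73 for V1 alone in Part A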
Part C
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
rf_model_3 <- cforest(y ~ ., data = simulated)
rf_imp_3 <- varImp(rf_model_3, conditional = TRUE)
rf_imp_3 %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V4 | 6.0471707 |
V2 | 4.8021627 |
duplicate1 | 1.9703660 |
V1 | 1.8986240 |
V5 | 0.9850544 |
V3 | 0.0229993 |
V9 | 0.0004516 |
V10 | -0.0074653 |
V7 | -0.0104328 |
V8 | -0.0104863 |
V6 | -0.0119652 |
Yes, the importances show the same broad pattern: the uninformative predictors (V6-V10) score near zero, and the importance of V1 is again diluted by duplicate1. If anything, the conditional measure shrinks the correlated pair's scores even further than the traditional random forest did.
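For reference, the varimp function named in the question can also be called on a party::cforest fit directly (a sketch; the cforest_unbiased controls are my assumption, not necessarily the settings used above):
cf <- cforest(y ~ ., data = simulated, controls = cforest_unbiased(ntree = 1000))
party::varimp(cf, conditional = TRUE)  # conditional importance per Strobl et al. (2007)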
Part D
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
Boosted Tree
Training it without the duplicate predictor:
gbm_grid <- expand.grid(.interaction.depth = seq(1, 7, by = 2),
.n.trees = seq(100, 1000, by = 100),
.shrinkage = c(0.01, 0.1),
.n.minobsinnode = 10)
gbm_model_1 <- train(y ~ ., data = select(simulated, -duplicate1), method="gbm", tuneGrid = gbm_grid, verbose = FALSE)
gbm_imp_1 <- varImp(gbm_model_1)
gbm_imp_1$importance %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V4 | 100.0000000 |
V1 | 94.3492687 |
V2 | 85.3573190 |
V5 | 37.3645989 |
V3 | 31.2757450 |
V6 | 3.2412743 |
V7 | 0.9391656 |
V9 | 0.5040653 |
V10 | 0.1698201 |
V8 | 0.0000000 |
The boosted tree did not pick up the uninformative predictors. Now let’s see what happens when I train it WITH the duplicate predictor:
gbm_model_2 <- train(y ~ ., data = simulated, method="gbm", tuneGrid = gbm_grid, verbose = FALSE)
gbm_imp_2 <- varImp(gbm_model_2)
gbm_imp_2$importance %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V4 | 100.000000 |
V2 | 81.847724 |
V1 | 54.988209 |
V5 | 43.118057 |
duplicate1 | 39.179131 |
V3 | 33.518028 |
V6 | 2.929334 |
V8 | 2.052144 |
V7 | 2.045898 |
V10 | 1.305226 |
V9 | 0.000000 |
This model exhibits the same pattern: duplicate1 becomes one of the important variables, and the importance of V1 decreases.
Cubist
Again training it without the duplicate predictor:
cubist_model_1 <- train(y ~ ., data = select(simulated, -duplicate1), method="cubist")
cubist_imp_1 <- varImp(cubist_model_1)
cubist_imp_1$importance %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V1 | 100.00000 |
V2 | 75.69444 |
V4 | 68.05556 |
V3 | 58.33333 |
V5 | 55.55556 |
V6 | 15.27778 |
V7 | 0.00000 |
V8 | 0.00000 |
V9 | 0.00000 |
V10 | 0.00000 |
Once again, the Cubist model does not pick up the unimportant variables. Let’s train it WITH the duplicate predictor and see what happens:
cubist_model_2 <- train(y ~ ., data = simulated, method="cubist")
cubist_imp_2 <- varImp(cubist_model_2)
cubist_imp_2$importance %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(Variable = rowname) %>%
arrange(desc(Overall)) %>%
kable() %>%
kable_styling()
Variable | Overall |
---|---|
V2 | 100.00000 |
V1 | 89.51613 |
V4 | 80.64516 |
V3 | 67.74194 |
duplicate1 | 59.67742 |
V5 | 50.00000 |
V6 | 25.00000 |
V7 | 0.00000 |
V8 | 0.00000 |
V9 | 0.00000 |
V10 | 0.00000 |
The Cubist model exhibits behavior similar to the random forest model's, though the effect is less pronounced: duplicate1 takes on some importance and V1's score drops, but V1 remains near the top of the list.
Question 8.2
Use a simulation to show tree bias with different granularities.
I will create a simulated dataset generated by a non-linear function, train regression trees at varying maximum depths, and then compare the MSE on the training and test sets as the complexity (granularity) of the tree increases.
set.seed(42)
n_sample <- 350
nonlinear_function <- function(x){
sin(1.25 * x) + 2 * cos(.25*x)
}
x <- runif(n_sample, 1, 25)
f_of_x <- nonlinear_function(x)
noise <- rnorm(n_sample, 0, 2)
y <- f_of_x + noise
df <- data.frame(y=y, x=x)
in_train <- createDataPartition(df$y, p = .8, list = FALSE, times = 1)
train_df <- df[in_train,]
test_df <- df[-in_train,]
results <- data.frame(Granularity = c(NA), MSE = c(NA), data = c(NA)) %>% na.omit()
get_mse <- function(model, data){
y_hat <- predict(model, data)
mse <- mean((y_hat - data$y)^2)
return(mse)
}
for(depth in 1:10){
rtree_model <- rpart(y ~ x, data = train_df, control=rpart.control(maxdepth=depth))
results <- rbind(results, data.frame(Granularity = depth, MSE = get_mse(rtree_model, train_df), data = "Training"))
results <- rbind(results, data.frame(Granularity = depth, MSE = get_mse(rtree_model, test_df), data = "Test"))
}
ggplot(results, aes(Granularity, MSE, color = data, group = data)) +
geom_line() +
geom_point() +
scale_color_brewer(palette = "Set1") +
theme(legend.position = "bottom", legend.title = element_blank())
As the granularity of the tree increases, the MSE on the training set decreases monotonically. The MSE on the test set, however, initially declines and then rises again as the model starts to overfit the training data.
Question 8.3
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect the magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
Fig 8.24
Part A
Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?
With the learning rate and bagging fraction both set to 0.1, each tree contributes only a small correction and sees a different subsample of the data, so importance gets spread over more predictors. With both set to 0.9, each tree takes a large step on nearly the full data set, so the strongest predictors are chosen again and again and importance concentrates on a small set of variables.
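A minimal sketch of how the two extremes in Fig. 8.24 could be reproduced, assuming the solubility data from AppliedPredictiveModeling (solTrainXtrans / solTrainY); the tree count here is an arbitrary choice, not the book's setting:
library(gbm)
library(AppliedPredictiveModeling)
data(solubility)
sol_df <- cbind(solTrainXtrans, Solubility = solTrainY)
set.seed(100)
gbm_left <- gbm(Solubility ~ ., data = sol_df, distribution = "gaussian",
n.trees = 100, shrinkage = 0.1, bag.fraction = 0.1)
set.seed(100)
gbm_right <- gbm(Solubility ~ ., data = sol_df, distribution = "gaussian",
n.trees = 100, shrinkage = 0.9, bag.fraction = 0.9)
head(summary(gbm_left, plotit = FALSE), 10)   # importance spread across many predictors
head(summary(gbm_right, plotit = FALSE), 10)  # importance concentrated in the top few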
Part B
Which model do you think would be more predictive of other samples?
The model on the left. Its small learning rate and bagging fraction make each tree a weak learner trained on a different subsample, which tends to generalize better, while the model on the right is more likely to overfit the training data. Boosting works best as an ensemble of many weak learners.
Part C
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Increasing the interaction depth allows more predictors to enter each tree, so importance would be spread across more predictors and the slope of the importance plot would flatten for both models.
Question 8.7
Refer to Exercises 6.3 and 7.5, which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
Part A
Which tree-based regression model gives the optimal resampling and test set performance?
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
# Make this reproducible
set.seed(42)
knn_model <- preProcess(ChemicalManufacturingProcess, "knnImpute")
df <- predict(knn_model, ChemicalManufacturingProcess)
df <- df %>%
select_at(vars(-one_of(nearZeroVar(., names = TRUE))))
in_train <- createDataPartition(df$Yield, times = 1, p = 0.8, list = FALSE)
train_df <- df[in_train, ]
test_df <- df[-in_train, ]
pls_model <- train(
Yield ~ ., data = train_df, method = "pls",
center = TRUE,
scale = TRUE,
trControl = trainControl("cv", number = 10),
tuneLength = 25
)
pls_predictions <- predict(pls_model, test_df)
pls_in_sample <- pls_model$results[pls_model$results$ncomp == pls_model$bestTune$ncomp,]
results <- data.frame(t(postResample(pred = pls_predictions, obs = test_df$Yield))) %>%
mutate("In Sample RMSE" = pls_in_sample$RMSE,
"In Sample Rsquared" = pls_in_sample$Rsquared,
"In Sample MAE" = pls_in_sample$MAE,
"Model"= "PLS")
pls_model
Partial Least Squares
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 130, 129, 128, 129, 130, 129, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 0.8824790 0.3779221 0.6711462
2 1.1458456 0.4219806 0.7086431
3 0.7363066 0.5244517 0.5688553
4 0.8235294 0.5298005 0.5933120
5 0.9670735 0.4846010 0.6371199
6 0.9959036 0.4776684 0.6427478
7 0.9119517 0.4986338 0.6200233
8 0.9068621 0.5012144 0.6293371
9 0.8517370 0.5220166 0.6163795
10 0.8919356 0.5062912 0.6332243
11 0.9173758 0.4934557 0.6463164
12 0.9064125 0.4791526 0.6485663
13 0.9255289 0.4542181 0.6620193
14 1.0239913 0.4358371 0.6944056
15 1.0754710 0.4365214 0.7077991
16 1.1110579 0.4269065 0.7135684
17 1.1492855 0.4210485 0.7222868
18 1.1940639 0.4132534 0.7396357
19 1.2271867 0.4079005 0.7494818
20 1.2077102 0.4022859 0.7470327
21 1.2082648 0.4026711 0.7452969
22 1.2669285 0.3987044 0.7634170
23 1.3663033 0.3970188 0.7957514
24 1.4531634 0.3898475 0.8243034
25 1.5624265 0.3820102 0.8612935
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 3.
I’m going to fit the data with a series of tree-based models. I will use caret and try to match the same parameters so I can gauge their performance against the comparable PLS model.
Bagged Tree
set.seed(42)
bagControl <- bagControl(fit = ctreeBag$fit, predict = ctreeBag$pred, aggregate = ctreeBag$aggregate)
bag_model <- train(Yield ~ ., data = train_df, method="bag", bagControl = bagControl,
center = TRUE,
scale = TRUE,
trControl = trainControl("cv", number = 10),
tuneLength = 25)
bag_predictions <- predict(bag_model, test_df)
bag_in_sample <- merge(bag_model$results, bag_model$bestTune)
results <- data.frame(t(postResample(pred = bag_predictions, obs = test_df$Yield))) %>%
mutate("In Sample RMSE" = bag_in_sample$RMSE,
"In Sample Rsquared" = bag_in_sample$Rsquared,
"In Sample MAE" = bag_in_sample$MAE,
"Model"= "Bagged Tree") %>%
rbind(results)
bag_model
Bagged Model
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 129, 129, 130, 129, 130, 130, ...
Resampling results:
RMSE Rsquared MAE
0.7108683 0.4717657 0.5795483
Tuning parameter 'vars' was held constant at a value of 56
Boosted Tree
set.seed(42)
gbm_model <- train(Yield ~ ., data = train_df, method="gbm", verbose = FALSE,
trControl = trainControl("cv", number = 10),
tuneLength = 25)
gbm_predictions <- predict(gbm_model, test_df)
gbm_in_sample <- merge(gbm_model$results, gbm_model$bestTune)
results <- data.frame(t(postResample(pred = gbm_predictions, obs = test_df$Yield))) %>%
mutate("In Sample RMSE" = gbm_in_sample$RMSE,
"In Sample Rsquared" = gbm_in_sample$Rsquared,
"In Sample MAE" = gbm_in_sample$MAE,
"Model"= "Boosted Tree") %>%
rbind(results)
gbm_model
Stochastic Gradient Boosting
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 129, 129, 130, 129, 130, 130, ...
Resampling results across tuning parameters:
interaction.depth n.trees RMSE Rsquared MAE
1 50 0.6529787 0.5702357 0.5102064
1 100 0.6259745 0.5906080 0.4846082
1 150 0.6099166 0.6047640 0.4860877
1 200 0.6008357 0.6159204 0.4746704
1 250 0.6007583 0.6168513 0.4745437
1 300 0.6032872 0.6139154 0.4803398
1 350 0.6054569 0.6124108 0.4810323
1 400 0.6042296 0.6141087 0.4831893
1 450 0.6044401 0.6176478 0.4809852
1 500 0.6054335 0.6184590 0.4809517
1 550 0.6061891 0.6193797 0.4796327
1 600 0.6056039 0.6199645 0.4771155
1 650 0.6083802 0.6177853 0.4796680
1 700 0.6112744 0.6142575 0.4820985
1 750 0.6150272 0.6119709 0.4849456
1 800 0.6132495 0.6140022 0.4824066
1 850 0.6129022 0.6147128 0.4805281
1 900 0.6125136 0.6163699 0.4799557
1 950 0.6146513 0.6141389 0.4791453
1 1000 0.6158677 0.6136756 0.4810977
1 1050 0.6168063 0.6135476 0.4807789
1 1100 0.6190710 0.6114224 0.4824817
1 1150 0.6178899 0.6133618 0.4805481
1 1200 0.6182632 0.6126484 0.4816822
1 1250 0.6190874 0.6120263 0.4828504
2 50 0.6094282 0.6099501 0.4849066
2 100 0.5829629 0.6335127 0.4565230
2 150 0.5838239 0.6333416 0.4594031
2 200 0.5794192 0.6416762 0.4522288
2 250 0.5703484 0.6522903 0.4427181
2 300 0.5704290 0.6539219 0.4424843
2 350 0.5688872 0.6558810 0.4446840
2 400 0.5691133 0.6553808 0.4440911
2 450 0.5693332 0.6558510 0.4436964
2 500 0.5675464 0.6582781 0.4421581
2 550 0.5658889 0.6603078 0.4396887
2 600 0.5660413 0.6600440 0.4397280
2 650 0.5670866 0.6590858 0.4400525
2 700 0.5668798 0.6595438 0.4396349
2 750 0.5666876 0.6596874 0.4395221
2 800 0.5668211 0.6596993 0.4396469
2 850 0.5665125 0.6601553 0.4395041
2 900 0.5667711 0.6598908 0.4398854
2 950 0.5669461 0.6596749 0.4398769
2 1000 0.5668896 0.6597418 0.4397258
2 1050 0.5668324 0.6599450 0.4398214
2 1100 0.5665139 0.6602574 0.4396333
2 1150 0.5666514 0.6601821 0.4397815
2 1200 0.5668290 0.6600264 0.4399304
2 1250 0.5668629 0.6599759 0.4400462
3 50 0.5964119 0.6301569 0.4620894
3 100 0.5881904 0.6348098 0.4617679
3 150 0.5784579 0.6432397 0.4539545
3 200 0.5756343 0.6493645 0.4495719
3 250 0.5753652 0.6492650 0.4496511
3 300 0.5739679 0.6499200 0.4490757
3 350 0.5744236 0.6496053 0.4496752
3 400 0.5731088 0.6511659 0.4488762
3 450 0.5743237 0.6506949 0.4493729
3 500 0.5740582 0.6510667 0.4491126
3 550 0.5737973 0.6513584 0.4490028
3 600 0.5736313 0.6516578 0.4486977
3 650 0.5734303 0.6520309 0.4486500
3 700 0.5734154 0.6521713 0.4487097
3 750 0.5734448 0.6521538 0.4486533
3 800 0.5733412 0.6523101 0.4485162
3 850 0.5734461 0.6522107 0.4486230
3 900 0.5733855 0.6522783 0.4486340
3 950 0.5734464 0.6522402 0.4487534
3 1000 0.5734346 0.6522662 0.4487900
3 1050 0.5734458 0.6522690 0.4488393
3 1100 0.5735230 0.6521787 0.4489117
3 1150 0.5734965 0.6522345 0.4488955
3 1200 0.5734652 0.6522657 0.4489010
3 1250 0.5734822 0.6522470 0.4489104
4 50 0.6090419 0.6106657 0.4697679
4 100 0.5954252 0.6226990 0.4637289
4 150 0.5938504 0.6285557 0.4636849
4 200 0.5913247 0.6319587 0.4642710
4 250 0.5900504 0.6333095 0.4639411
4 300 0.5896414 0.6341805 0.4634896
4 350 0.5895137 0.6345933 0.4633461
4 400 0.5895641 0.6348717 0.4637587
4 450 0.5897582 0.6347995 0.4642484
4 500 0.5898016 0.6348652 0.4640096
4 550 0.5901324 0.6345240 0.4644099
4 600 0.5903075 0.6345846 0.4645382
4 650 0.5906552 0.6344835 0.4647610
4 700 0.5906150 0.6345581 0.4646932
4 750 0.5904937 0.6347022 0.4646343
4 800 0.5903896 0.6348218 0.4647111
4 850 0.5904498 0.6348500 0.4648654
4 900 0.5903898 0.6348947 0.4648645
4 950 0.5905029 0.6348003 0.4649357
4 1000 0.5904679 0.6348542 0.4649824
4 1050 0.5905540 0.6348192 0.4650847
4 1100 0.5904812 0.6348914 0.4651039
4 1150 0.5905393 0.6348489 0.4651573
4 1200 0.5905692 0.6348523 0.4652096
4 1250 0.5905788 0.6348482 0.4652295
5 50 0.6019146 0.6193173 0.4631369
5 100 0.5718941 0.6522686 0.4417011
5 150 0.5652677 0.6602092 0.4392289
5 200 0.5615494 0.6644658 0.4390921
5 250 0.5617600 0.6643786 0.4400462
5 300 0.5596646 0.6653772 0.4374645
5 350 0.5600163 0.6651947 0.4374533
5 400 0.5594723 0.6657756 0.4364679
5 450 0.5583903 0.6671558 0.4358012
5 500 0.5582863 0.6672867 0.4358657
5 550 0.5585848 0.6668710 0.4361121
5 600 0.5586374 0.6668762 0.4362419
5 650 0.5588214 0.6666570 0.4364302
5 700 0.5590060 0.6664883 0.4365978
5 750 0.5589198 0.6665628 0.4366476
5 800 0.5589112 0.6665959 0.4367284
5 850 0.5588357 0.6666270 0.4366683
5 900 0.5588534 0.6666288 0.4366668
5 950 0.5588659 0.6666386 0.4366916
5 1000 0.5588004 0.6667090 0.4366808
5 1050 0.5588444 0.6666598 0.4367332
5 1100 0.5588288 0.6667027 0.4367230
5 1150 0.5587907 0.6667457 0.4366934
5 1200 0.5587934 0.6667445 0.4367111
5 1250 0.5587895 0.6667425 0.4367281
6 50 0.6018491 0.6113981 0.4716015
6 100 0.5860809 0.6279896 0.4539058
6 150 0.5789599 0.6340675 0.4501756
6 200 0.5715997 0.6426798 0.4421709
6 250 0.5711400 0.6437785 0.4446479
6 300 0.5722309 0.6432605 0.4457324
6 350 0.5728877 0.6433494 0.4461208
6 400 0.5735122 0.6430660 0.4468629
6 450 0.5733378 0.6435084 0.4468686
6 500 0.5736811 0.6432701 0.4474261
6 550 0.5737036 0.6433590 0.4473096
6 600 0.5738540 0.6432025 0.4476653
6 650 0.5739788 0.6431127 0.4478362
6 700 0.5740895 0.6430439 0.4480447
6 750 0.5740191 0.6431101 0.4479443
6 800 0.5739330 0.6432375 0.4478419
6 850 0.5741192 0.6430959 0.4480611
6 900 0.5742059 0.6429815 0.4481779
6 950 0.5742488 0.6429632 0.4482300
6 1000 0.5742852 0.6429315 0.4482727
6 1050 0.5742550 0.6429979 0.4482394
6 1100 0.5743812 0.6428550 0.4483377
6 1150 0.5744985 0.6427302 0.4484260
6 1200 0.5745169 0.6427252 0.4484495
6 1250 0.5745410 0.6427004 0.4484620
7 50 0.6199541 0.5939358 0.4951611
7 100 0.5904056 0.6257145 0.4697978
7 150 0.5788458 0.6390945 0.4585648
7 200 0.5810272 0.6391831 0.4611296
7 250 0.5799441 0.6409543 0.4606679
7 300 0.5800624 0.6409086 0.4596577
7 350 0.5806014 0.6403463 0.4610040
7 400 0.5805843 0.6404279 0.4609606
7 450 0.5808030 0.6404105 0.4609477
7 500 0.5806460 0.6408355 0.4606353
7 550 0.5801929 0.6413834 0.4605745
7 600 0.5803194 0.6413844 0.4606553
7 650 0.5805963 0.6411193 0.4608334
7 700 0.5807950 0.6410027 0.4609297
7 750 0.5807957 0.6410385 0.4609906
7 800 0.5808788 0.6410234 0.4610401
7 850 0.5808231 0.6411133 0.4609822
7 900 0.5808365 0.6410813 0.4609990
7 950 0.5808931 0.6410494 0.4610153
7 1000 0.5809359 0.6410051 0.4610537
7 1050 0.5808882 0.6410698 0.4610093
7 1100 0.5809232 0.6410468 0.4610294
7 1150 0.5809114 0.6410602 0.4610554
7 1200 0.5808922 0.6410764 0.4610722
7 1250 0.5809613 0.6410143 0.4611178
8 50 0.5763016 0.6411581 0.4576852
8 100 0.5716689 0.6461974 0.4512790
8 150 0.5759381 0.6437704 0.4563453
8 200 0.5721415 0.6496414 0.4538074
8 250 0.5711192 0.6516050 0.4518189
8 300 0.5695114 0.6546120 0.4513573
8 350 0.5689287 0.6555732 0.4504089
8 400 0.5682439 0.6561605 0.4498706
8 450 0.5673344 0.6572600 0.4491474
8 500 0.5674381 0.6572173 0.4490904
8 550 0.5670931 0.6576854 0.4489108
8 600 0.5667498 0.6580690 0.4488511
8 650 0.5666761 0.6581715 0.4488999
8 700 0.5666796 0.6582255 0.4489354
8 750 0.5665660 0.6583684 0.4489615
8 800 0.5663936 0.6586085 0.4488358
8 850 0.5663429 0.6586493 0.4488768
8 900 0.5662484 0.6587890 0.4488474
8 950 0.5662163 0.6588234 0.4488634
8 1000 0.5662062 0.6588372 0.4488826
8 1050 0.5661932 0.6588655 0.4488774
8 1100 0.5661859 0.6588721 0.4489328
8 1150 0.5661797 0.6589055 0.4489740
8 1200 0.5661731 0.6589180 0.4489745
8 1250 0.5661145 0.6589825 0.4489516
9 50 0.6167604 0.5965443 0.4845607
9 100 0.6055422 0.6101877 0.4735896
9 150 0.6027718 0.6130671 0.4745875
9 200 0.6064713 0.6092537 0.4803259
9 250 0.6025144 0.6149249 0.4778493
9 300 0.6044546 0.6123663 0.4795388
9 350 0.6048698 0.6115446 0.4806716
9 400 0.6055499 0.6107397 0.4814121
9 450 0.6053731 0.6108421 0.4821199
9 500 0.6056554 0.6105514 0.4825302
9 550 0.6062967 0.6099543 0.4830778
9 600 0.6062888 0.6100437 0.4831844
9 650 0.6063035 0.6100249 0.4832666
9 700 0.6063524 0.6100332 0.4834070
9 750 0.6065921 0.6097913 0.4836381
9 800 0.6065971 0.6098266 0.4837327
9 850 0.6067575 0.6097239 0.4838410
9 900 0.6068762 0.6095239 0.4839607
9 950 0.6069309 0.6094834 0.4839398
9 1000 0.6070451 0.6093556 0.4840833
9 1050 0.6070858 0.6093011 0.4840669
9 1100 0.6071045 0.6092895 0.4841145
9 1150 0.6071044 0.6093033 0.4841501
9 1200 0.6071309 0.6092787 0.4841950
9 1250 0.6071436 0.6092698 0.4842299
10 50 0.5970134 0.6171065 0.4623301
10 100 0.5806102 0.6398879 0.4516936
10 150 0.5687076 0.6525233 0.4416021
10 200 0.5721478 0.6492513 0.4463209
10 250 0.5769418 0.6459730 0.4509770
10 300 0.5781494 0.6456177 0.4515166
10 350 0.5799840 0.6438356 0.4534752
10 400 0.5799262 0.6437909 0.4532906
10 450 0.5804882 0.6433379 0.4535657
10 500 0.5809207 0.6427982 0.4541454
10 550 0.5811243 0.6425842 0.4543062
10 600 0.5813681 0.6424302 0.4544117
10 650 0.5812722 0.6425518 0.4544515
10 700 0.5814412 0.6423282 0.4544904
10 750 0.5815164 0.6422026 0.4545465
10 800 0.5816406 0.6419893 0.4546413
10 850 0.5817970 0.6417711 0.4546952
10 900 0.5818765 0.6416974 0.4547529
10 950 0.5819495 0.6416149 0.4547905
10 1000 0.5819670 0.6416085 0.4547866
10 1050 0.5820416 0.6415048 0.4548642
10 1100 0.5821291 0.6413995 0.4549316
10 1150 0.5821326 0.6413981 0.4549135
10 1200 0.5821708 0.6413569 0.4549586
10 1250 0.5821957 0.6413248 0.4549680
11 50 0.6007956 0.6191324 0.4784111
11 100 0.5796223 0.6436577 0.4611684
11 150 0.5734513 0.6497914 0.4595051
11 200 0.5760920 0.6478978 0.4603030
11 250 0.5764815 0.6483934 0.4592201
11 300 0.5770282 0.6479640 0.4592198
11 350 0.5763090 0.6487221 0.4583147
11 400 0.5758337 0.6494612 0.4581384
11 450 0.5758699 0.6496084 0.4581600
11 500 0.5754348 0.6501079 0.4575764
11 550 0.5753623 0.6503882 0.4575615
11 600 0.5751066 0.6507755 0.4575770
11 650 0.5750791 0.6508957 0.4575429
11 700 0.5751269 0.6508885 0.4574905
11 750 0.5750977 0.6508783 0.4575510
11 800 0.5751921 0.6507287 0.4576147
11 850 0.5752069 0.6507458 0.4576468
11 900 0.5752048 0.6507309 0.4576749
11 950 0.5751624 0.6507659 0.4575584
11 1000 0.5751346 0.6508012 0.4575377
11 1050 0.5751284 0.6508137 0.4575096
11 1100 0.5751163 0.6508334 0.4575075
11 1150 0.5751286 0.6508189 0.4575395
11 1200 0.5750890 0.6508693 0.4575222
11 1250 0.5750407 0.6509139 0.4574603
12 50 0.6131404 0.6008363 0.4788888
12 100 0.5936990 0.6206144 0.4563439
12 150 0.5917700 0.6239359 0.4570861
12 200 0.5894059 0.6276284 0.4536889
12 250 0.5896794 0.6286413 0.4545724
12 300 0.5884641 0.6307350 0.4541598
12 350 0.5885899 0.6311055 0.4538239
12 400 0.5878622 0.6319971 0.4533814
12 450 0.5876460 0.6325633 0.4531581
12 500 0.5873139 0.6327759 0.4527693
12 550 0.5872261 0.6329262 0.4530022
12 600 0.5875665 0.6325250 0.4532315
12 650 0.5876798 0.6324167 0.4532977
12 700 0.5876485 0.6325664 0.4532492
12 750 0.5876823 0.6325857 0.4532712
12 800 0.5879170 0.6323502 0.4534515
12 850 0.5879436 0.6323289 0.4534417
12 900 0.5880458 0.6322095 0.4534969
12 950 0.5880053 0.6322652 0.4534268
12 1000 0.5880199 0.6322607 0.4534472
12 1050 0.5880897 0.6321977 0.4535073
12 1100 0.5880935 0.6321941 0.4534805
12 1150 0.5880697 0.6322349 0.4534862
12 1200 0.5881165 0.6321864 0.4535333
12 1250 0.5881200 0.6321888 0.4535375
13 50 0.5973542 0.6314045 0.4668008
13 100 0.5777799 0.6451084 0.4518852
13 150 0.5764224 0.6467808 0.4476571
13 200 0.5730664 0.6505212 0.4451696
13 250 0.5743130 0.6495475 0.4462130
13 300 0.5760493 0.6476727 0.4459779
13 350 0.5766260 0.6469370 0.4465765
13 400 0.5770238 0.6468019 0.4468128
13 450 0.5776063 0.6459700 0.4475754
13 500 0.5779502 0.6456783 0.4478369
13 550 0.5779233 0.6457420 0.4477710
13 600 0.5781181 0.6454777 0.4481283
13 650 0.5782368 0.6453590 0.4482780
13 700 0.5783686 0.6452265 0.4484776
13 750 0.5784837 0.6450636 0.4487266
13 800 0.5785037 0.6450326 0.4487840
13 850 0.5786308 0.6449255 0.4489184
13 900 0.5786399 0.6448911 0.4490071
13 950 0.5786643 0.6448690 0.4490676
13 1000 0.5787736 0.6447447 0.4491924
13 1050 0.5787891 0.6447278 0.4491872
13 1100 0.5788049 0.6447156 0.4492184
13 1150 0.5788276 0.6446876 0.4492568
13 1200 0.5788448 0.6446723 0.4492837
13 1250 0.5788611 0.6446606 0.4493125
14 50 0.5968688 0.6288067 0.4721340
14 100 0.5788231 0.6442392 0.4626092
14 150 0.5772605 0.6493066 0.4581304
14 200 0.5767852 0.6504154 0.4555545
14 250 0.5763805 0.6510261 0.4554558
14 300 0.5773394 0.6501146 0.4569615
14 350 0.5771256 0.6504539 0.4561920
14 400 0.5774954 0.6500725 0.4563451
14 450 0.5778210 0.6498560 0.4567000
14 500 0.5773189 0.6507293 0.4564541
14 550 0.5774635 0.6506929 0.4566913
14 600 0.5772649 0.6509344 0.4564884
14 650 0.5769523 0.6513005 0.4562758
14 700 0.5768395 0.6514243 0.4561250
14 750 0.5767784 0.6515344 0.4562432
14 800 0.5767428 0.6516513 0.4561898
14 850 0.5767717 0.6516118 0.4562955
14 900 0.5767780 0.6516089 0.4563007
14 950 0.5767695 0.6515845 0.4563348
14 1000 0.5768071 0.6515531 0.4563581
14 1050 0.5767456 0.6516372 0.4563169
14 1100 0.5767604 0.6516236 0.4563322
14 1150 0.5767172 0.6516701 0.4562795
14 1200 0.5766858 0.6517006 0.4562711
14 1250 0.5766666 0.6517396 0.4562572
15 50 0.5984904 0.6144377 0.4802185
15 100 0.5843983 0.6293893 0.4687758
15 150 0.5763789 0.6365162 0.4607338
15 200 0.5742064 0.6397660 0.4613239
15 250 0.5734581 0.6414163 0.4627411
15 300 0.5732190 0.6423065 0.4629835
15 350 0.5725697 0.6433780 0.4630833
15 400 0.5718269 0.6444742 0.4627995
15 450 0.5718025 0.6447416 0.4631744
15 500 0.5718086 0.6446100 0.4633308
15 550 0.5719365 0.6446516 0.4634589
15 600 0.5718646 0.6447368 0.4634482
15 650 0.5715927 0.6451576 0.4633120
15 700 0.5714562 0.6453500 0.4633540
15 750 0.5714828 0.6453266 0.4634053
15 800 0.5716535 0.6452634 0.4636229
15 850 0.5716339 0.6453263 0.4636592
15 900 0.5715742 0.6454348 0.4636541
15 950 0.5715357 0.6455050 0.4637054
15 1000 0.5715801 0.6454600 0.4637473
15 1050 0.5715296 0.6455215 0.4636962
15 1100 0.5714731 0.6456070 0.4636752
15 1150 0.5714961 0.6455984 0.4637025
15 1200 0.5714283 0.6456968 0.4636602
15 1250 0.5714074 0.6457320 0.4636621
16 50 0.5845598 0.6367314 0.4522037
16 100 0.5744352 0.6415904 0.4402636
16 150 0.5712235 0.6450458 0.4374079
16 200 0.5706646 0.6461059 0.4395498
16 250 0.5743295 0.6426962 0.4431993
16 300 0.5732029 0.6443089 0.4421847
16 350 0.5740836 0.6434172 0.4426110
16 400 0.5733769 0.6441363 0.4427320
16 450 0.5731878 0.6440416 0.4426181
16 500 0.5733640 0.6440319 0.4428262
16 550 0.5734077 0.6440675 0.4428775
16 600 0.5733253 0.6442436 0.4429795
16 650 0.5731203 0.6444578 0.4429265
16 700 0.5732675 0.6444389 0.4430143
16 750 0.5731228 0.6445695 0.4428747
16 800 0.5732311 0.6445276 0.4431213
16 850 0.5732604 0.6445157 0.4432044
16 900 0.5732764 0.6444899 0.4432273
16 950 0.5731499 0.6446022 0.4431385
16 1000 0.5732311 0.6445054 0.4432125
16 1050 0.5732592 0.6445116 0.4432604
16 1100 0.5732651 0.6444995 0.4432389
16 1150 0.5732933 0.6444539 0.4433079
16 1200 0.5733533 0.6443725 0.4433261
16 1250 0.5733543 0.6443782 0.4433468
17 50 0.5895081 0.6221239 0.4584180
17 100 0.5598563 0.6545783 0.4403988
17 150 0.5588101 0.6549230 0.4345365
17 200 0.5585532 0.6557803 0.4346063
17 250 0.5560925 0.6593080 0.4346842
17 300 0.5583117 0.6567464 0.4355277
17 350 0.5589839 0.6564977 0.4365358
17 400 0.5585887 0.6567415 0.4358248
17 450 0.5579936 0.6581041 0.4363710
17 500 0.5576780 0.6586560 0.4363073
17 550 0.5580179 0.6583182 0.4369656
17 600 0.5576574 0.6586239 0.4365961
17 650 0.5573910 0.6588929 0.4364291
17 700 0.5572706 0.6591183 0.4364193
17 750 0.5572105 0.6592226 0.4363436
17 800 0.5572080 0.6592986 0.4364028
17 850 0.5571794 0.6593780 0.4364372
17 900 0.5572748 0.6592692 0.4364702
17 950 0.5572903 0.6592425 0.4365164
17 1000 0.5573969 0.6591232 0.4366330
17 1050 0.5573523 0.6592176 0.4366466
17 1100 0.5573181 0.6592760 0.4366223
17 1150 0.5572999 0.6593232 0.4366195
17 1200 0.5573417 0.6592814 0.4366678
17 1250 0.5573217 0.6593154 0.4366610
18 50 0.5947040 0.6238031 0.4720874
18 100 0.5618769 0.6575285 0.4478613
18 150 0.5582539 0.6634045 0.4444225
18 200 0.5585195 0.6646903 0.4463124
18 250 0.5632000 0.6587717 0.4503918
18 300 0.5628079 0.6600173 0.4498614
18 350 0.5627775 0.6601182 0.4500888
18 400 0.5624382 0.6605836 0.4498481
18 450 0.5630086 0.6602329 0.4503341
18 500 0.5625749 0.6607663 0.4500855
18 550 0.5625871 0.6610884 0.4501850
18 600 0.5627920 0.6609869 0.4503990
18 650 0.5627910 0.6610874 0.4501716
18 700 0.5629293 0.6609290 0.4501911
18 750 0.5629849 0.6608811 0.4504471
18 800 0.5631089 0.6607372 0.4505341
18 850 0.5632879 0.6605472 0.4507397
18 900 0.5634311 0.6604467 0.4508773
18 950 0.5635290 0.6603438 0.4509308
18 1000 0.5636527 0.6602004 0.4510463
18 1050 0.5637119 0.6601604 0.4510915
18 1100 0.5637187 0.6601415 0.4510943
18 1150 0.5637947 0.6600683 0.4511788
18 1200 0.5637589 0.6601116 0.4511481
18 1250 0.5638153 0.6600584 0.4511662
19 50 0.6044116 0.6084186 0.4816248
19 100 0.5884591 0.6259706 0.4666916
19 150 0.5827201 0.6350315 0.4611196
19 200 0.5796750 0.6394943 0.4593404
19 250 0.5791659 0.6405009 0.4592163
19 300 0.5789062 0.6406001 0.4594678
19 350 0.5792869 0.6402798 0.4606605
19 400 0.5798080 0.6398658 0.4610008
19 450 0.5798865 0.6398399 0.4611347
19 500 0.5796314 0.6402471 0.4607525
19 550 0.5799321 0.6402072 0.4611976
19 600 0.5798439 0.6403842 0.4609075
19 650 0.5798640 0.6404004 0.4609272
19 700 0.5799447 0.6403739 0.4610496
19 750 0.5801595 0.6401531 0.4611883
19 800 0.5800469 0.6402696 0.4610050
19 850 0.5801075 0.6402613 0.4611237
19 900 0.5801117 0.6402842 0.4610773
19 950 0.5800802 0.6403546 0.4611275
19 1000 0.5801482 0.6402883 0.4612403
19 1050 0.5800854 0.6403723 0.4612398
19 1100 0.5800651 0.6404008 0.4612760
19 1150 0.5800433 0.6404542 0.4612893
19 1200 0.5800546 0.6404666 0.4613544
19 1250 0.5800492 0.6404551 0.4613605
20 50 0.6228995 0.5928646 0.4924395
20 100 0.6009344 0.6133889 0.4698906
20 150 0.5921427 0.6222307 0.4687025
20 200 0.5883624 0.6284961 0.4663482
20 250 0.5858482 0.6313708 0.4644570
20 300 0.5854396 0.6320113 0.4637267
20 350 0.5850790 0.6320588 0.4644387
20 400 0.5844287 0.6334042 0.4640988
20 450 0.5848621 0.6327404 0.4649257
20 500 0.5849726 0.6328714 0.4651716
20 550 0.5848708 0.6330092 0.4652380
20 600 0.5851218 0.6326644 0.4654536
20 650 0.5853962 0.6324062 0.4658945
20 700 0.5854769 0.6323859 0.4659694
20 750 0.5854284 0.6324044 0.4659240
20 800 0.5856137 0.6322070 0.4661607
20 850 0.5858048 0.6320155 0.4663029
20 900 0.5857902 0.6321231 0.4662757
20 950 0.5858142 0.6320788 0.4662986
20 1000 0.5859219 0.6319654 0.4663609
20 1050 0.5859580 0.6319822 0.4663748
20 1100 0.5860260 0.6319045 0.4664469
20 1150 0.5861297 0.6317874 0.4665715
20 1200 0.5861911 0.6317411 0.4666073
20 1250 0.5861864 0.6317425 0.4666093
21 50 0.6164913 0.5907092 0.4901770
21 100 0.5954901 0.6129761 0.4706662
21 150 0.5863253 0.6208539 0.4555843
21 200 0.5880713 0.6212227 0.4594044
21 250 0.5878737 0.6214425 0.4575625
21 300 0.5863312 0.6232168 0.4574849
21 350 0.5874788 0.6219323 0.4588074
21 400 0.5879728 0.6216350 0.4597942
21 450 0.5875109 0.6219386 0.4597765
21 500 0.5874788 0.6220713 0.4599140
21 550 0.5875259 0.6220454 0.4600137
21 600 0.5878106 0.6216276 0.4602113
21 650 0.5880222 0.6214441 0.4604127
21 700 0.5881523 0.6212943 0.4605623
21 750 0.5882456 0.6212172 0.4608117
21 800 0.5883022 0.6211843 0.4609491
21 850 0.5883325 0.6211449 0.4610155
21 900 0.5884684 0.6210333 0.4611763
21 950 0.5884012 0.6211157 0.4612100
21 1000 0.5884770 0.6210339 0.4613186
21 1050 0.5885063 0.6210078 0.4613713
21 1100 0.5885325 0.6209880 0.4613699
21 1150 0.5885974 0.6209189 0.4614618
21 1200 0.5885644 0.6209809 0.4614654
21 1250 0.5885900 0.6209606 0.4615024
22 50 0.5793334 0.6434253 0.4720771
22 100 0.5531895 0.6722105 0.4488228
22 150 0.5488134 0.6781810 0.4432762
22 200 0.5528279 0.6739731 0.4470342
22 250 0.5495858 0.6780192 0.4443609
22 300 0.5493595 0.6789946 0.4426837
22 350 0.5502390 0.6779176 0.4441018
22 400 0.5515463 0.6764037 0.4447485
22 450 0.5512482 0.6765891 0.4447522
22 500 0.5507834 0.6775022 0.4440172
22 550 0.5508227 0.6774242 0.4439655
22 600 0.5512931 0.6770631 0.4443603
22 650 0.5514199 0.6769012 0.4444753
22 700 0.5513916 0.6769650 0.4444461
22 750 0.5512924 0.6771605 0.4443759
22 800 0.5514432 0.6769484 0.4444075
22 850 0.5515481 0.6768914 0.4445889
22 900 0.5516099 0.6768201 0.4446169
22 950 0.5516678 0.6767554 0.4446328
22 1000 0.5516392 0.6768049 0.4446332
22 1050 0.5516028 0.6768805 0.4446273
22 1100 0.5515640 0.6769180 0.4445721
22 1150 0.5515192 0.6769851 0.4445507
22 1200 0.5515205 0.6769964 0.4445526
22 1250 0.5515608 0.6769583 0.4445800
23 50 0.6177465 0.5985763 0.4831558
23 100 0.5937322 0.6267084 0.4623962
23 150 0.6002425 0.6181106 0.4665577
23 200 0.5959912 0.6256254 0.4651166
23 250 0.5976436 0.6234317 0.4671482
23 300 0.5983712 0.6239639 0.4676933
23 350 0.5969445 0.6255335 0.4662105
23 400 0.5972253 0.6253612 0.4668455
23 450 0.5973307 0.6254383 0.4672388
23 500 0.5972284 0.6255382 0.4674735
23 550 0.5969328 0.6258570 0.4674252
23 600 0.5968363 0.6260482 0.4676239
23 650 0.5971095 0.6257743 0.4679849
23 700 0.5971499 0.6257096 0.4681145
23 750 0.5972320 0.6256212 0.4682903
23 800 0.5973860 0.6254261 0.4684687
23 850 0.5974018 0.6254181 0.4685405
23 900 0.5974908 0.6253233 0.4686105
23 950 0.5975844 0.6252652 0.4687825
23 1000 0.5976395 0.6251821 0.4688300
23 1050 0.5976240 0.6251807 0.4688325
23 1100 0.5976592 0.6251428 0.4689031
23 1150 0.5976260 0.6251732 0.4689478
23 1200 0.5976233 0.6251752 0.4689585
23 1250 0.5976330 0.6251617 0.4689663
24 50 0.6308858 0.5776525 0.4972108
24 100 0.6046511 0.6104362 0.4799937
24 150 0.5995390 0.6177781 0.4799586
24 200 0.5978228 0.6194035 0.4813903
24 250 0.5976961 0.6207397 0.4819837
24 300 0.5980968 0.6207244 0.4820177
24 350 0.5978139 0.6210982 0.4823949
24 400 0.5976232 0.6215346 0.4820746
24 450 0.5987533 0.6205269 0.4830158
24 500 0.5986787 0.6208225 0.4832110
24 550 0.5988664 0.6206171 0.4833016
24 600 0.5987075 0.6208757 0.4833147
24 650 0.5988573 0.6207512 0.4835311
24 700 0.5989057 0.6207085 0.4837492
24 750 0.5989222 0.6207121 0.4837938
24 800 0.5988335 0.6208467 0.4838003
24 850 0.5989063 0.6208014 0.4838899
24 900 0.5988832 0.6208370 0.4839103
24 950 0.5988937 0.6208618 0.4839207
24 1000 0.5989267 0.6208511 0.4839856
24 1050 0.5989133 0.6208810 0.4839989
24 1100 0.5988945 0.6209149 0.4839993
24 1150 0.5989049 0.6209117 0.4840228
24 1200 0.5989236 0.6209006 0.4840094
24 1250 0.5988978 0.6209304 0.4840227
25 50 0.5967884 0.6181418 0.4722004
25 100 0.5761255 0.6368342 0.4559817
25 150 0.5671111 0.6485046 0.4470742
25 200 0.5646577 0.6522515 0.4440080
25 250 0.5639557 0.6540791 0.4446264
25 300 0.5643908 0.6541095 0.4458751
25 350 0.5645825 0.6548046 0.4472483
25 400 0.5646346 0.6549075 0.4471908
25 450 0.5657594 0.6538895 0.4484154
25 500 0.5667385 0.6529176 0.4492661
25 550 0.5667006 0.6530126 0.4490882
25 600 0.5669719 0.6528129 0.4495295
25 650 0.5668937 0.6530017 0.4495543
25 700 0.5670696 0.6529264 0.4496043
25 750 0.5671854 0.6528701 0.4498735
25 800 0.5671214 0.6529534 0.4498998
25 850 0.5671270 0.6529565 0.4498827
25 900 0.5671469 0.6529419 0.4498852
25 950 0.5671453 0.6529481 0.4499353
25 1000 0.5671302 0.6530179 0.4499500
25 1050 0.5671623 0.6529877 0.4499665
25 1100 0.5671858 0.6529748 0.4499741
25 1150 0.5672006 0.6529891 0.4499975
25 1200 0.5672042 0.6530004 0.4500351
25 1250 0.5672439 0.6529631 0.4500798
Tuning parameter 'shrinkage' was held constant at a value of 0.1
Tuning parameter 'n.minobsinnode' was held constant at a value of 10
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were n.trees = 150, interaction.depth =
22, shrinkage = 0.1 and n.minobsinnode = 10.
Random Forest
set.seed(42)
rf_model <- train(Yield ~ ., data = train_df, method = "ranger",
scale = TRUE,
trControl = trainControl("cv", number = 10),
tuneLength = 25)
rf_predictions <- predict(rf_model, test_df)
rf_in_sample <- merge(rf_model$results, rf_model$bestTune)
results <- data.frame(t(postResample(pred = rf_predictions, obs = test_df$Yield))) %>%
mutate("In Sample RMSE" = rf_in_sample$RMSE,
"In Sample Rsquared" = rf_in_sample$Rsquared,
"In Sample MAE" = rf_in_sample$MAE,
"Model"= "Random Forest") %>%
rbind(results)
rf_model
Random Forest
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 129, 129, 130, 129, 130, 130, ...
Resampling results across tuning parameters:
mtry splitrule RMSE Rsquared MAE
2 variance 0.6497858 0.6438354 0.5345934
2 extratrees 0.7107005 0.5913499 0.5896222
4 variance 0.6171845 0.6644030 0.5076339
4 extratrees 0.6635219 0.6303318 0.5515317
6 variance 0.6040398 0.6653486 0.4934050
6 extratrees 0.6438055 0.6420607 0.5349385
8 variance 0.5955708 0.6700645 0.4806501
8 extratrees 0.6262944 0.6656202 0.5207843
11 variance 0.5898819 0.6693959 0.4744251
11 extratrees 0.6146979 0.6750476 0.5068688
13 variance 0.5903434 0.6629506 0.4755268
13 extratrees 0.6078146 0.6769346 0.5042883
15 variance 0.5875867 0.6652769 0.4701072
15 extratrees 0.6046625 0.6756825 0.4957882
17 variance 0.5840520 0.6641550 0.4649468
17 extratrees 0.6063868 0.6668625 0.4976930
20 variance 0.5890784 0.6532341 0.4674457
20 extratrees 0.5946979 0.6841960 0.4852246
22 variance 0.5879453 0.6542423 0.4622086
22 extratrees 0.5963783 0.6790233 0.4886738
24 variance 0.5858153 0.6551814 0.4636970
24 extratrees 0.5971505 0.6726166 0.4875783
26 variance 0.5880400 0.6496644 0.4626234
26 extratrees 0.5981009 0.6718938 0.4871614
29 variance 0.5895010 0.6475055 0.4643408
29 extratrees 0.5967249 0.6668901 0.4851825
31 variance 0.5951690 0.6376674 0.4677508
31 extratrees 0.5976439 0.6635536 0.4847677
33 variance 0.5928874 0.6411724 0.4669791
33 extratrees 0.5924849 0.6768113 0.4838468
35 variance 0.5867051 0.6489605 0.4611046
35 extratrees 0.5929398 0.6699430 0.4837254
38 variance 0.6004208 0.6315682 0.4740340
38 extratrees 0.5968329 0.6645022 0.4862917
40 variance 0.5964688 0.6348283 0.4651849
40 extratrees 0.5939651 0.6651959 0.4825585
42 variance 0.5923281 0.6392379 0.4630702
42 extratrees 0.5927962 0.6625781 0.4817914
44 variance 0.5956628 0.6331507 0.4641829
44 extratrees 0.5925441 0.6628893 0.4776745
47 variance 0.6043288 0.6220066 0.4688005
47 extratrees 0.5964637 0.6564644 0.4838434
49 variance 0.6009236 0.6281433 0.4683679
49 extratrees 0.5942796 0.6633846 0.4830116
51 variance 0.6009500 0.6258047 0.4727813
51 extratrees 0.5895383 0.6664917 0.4784356
53 variance 0.6023795 0.6254545 0.4713244
53 extratrees 0.5917877 0.6623634 0.4786651
56 variance 0.6031440 0.6269611 0.4686278
56 extratrees 0.5920145 0.6621812 0.4794944
Tuning parameter 'min.node.size' was held constant at a value of 5
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were mtry = 17, splitrule = variance
and min.node.size = 5.
Conditional Inference Random Forest
set.seed(42)
crf_model <- train(Yield ~ ., data = train_df, method = "cforest",
trControl = trainControl("cv", number = 10),
tuneLength = 25)
crf_predictions <- predict(crf_model, test_df)
crf_in_sample <- merge(crf_model$results, crf_model$bestTune)
results <- data.frame(t(postResample(pred = crf_predictions, obs = test_df$Yield))) %>%
mutate("In Sample RMSE" = crf_in_sample$RMSE,
"In Sample Rsquared" = crf_in_sample$Rsquared,
"In Sample MAE" = crf_in_sample$MAE,
"Model"= "Conditional Random Forest") %>%
rbind(results)
crf_model
Conditional Inference Random Forest
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 129, 129, 130, 129, 130, 130, ...
Resampling results across tuning parameters:
mtry RMSE Rsquared MAE
2 0.7898970 0.5184391 0.6525080
4 0.7046135 0.5646303 0.5780069
6 0.6725218 0.5887280 0.5500396
8 0.6602660 0.5914750 0.5383059
11 0.6542383 0.5867416 0.5291135
13 0.6475067 0.5904548 0.5256915
15 0.6498053 0.5834067 0.5264809
17 0.6515475 0.5761058 0.5269520
20 0.6511948 0.5722409 0.5237739
22 0.6461205 0.5776471 0.5218458
24 0.6476044 0.5719672 0.5225875
26 0.6514498 0.5667591 0.5257020
29 0.6503738 0.5666119 0.5229556
31 0.6513746 0.5628461 0.5228537
33 0.6531447 0.5604409 0.5239990
35 0.6532619 0.5584804 0.5256739
38 0.6564974 0.5523390 0.5270321
40 0.6590074 0.5488009 0.5304481
42 0.6557320 0.5530412 0.5254378
44 0.6584156 0.5500981 0.5268187
47 0.6578192 0.5469014 0.5257655
49 0.6625088 0.5418367 0.5306678
51 0.6614085 0.5431193 0.5296342
53 0.6631957 0.5386486 0.5286492
56 0.6677752 0.5318607 0.5336618
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 22.
Results
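The table below is produced from the accumulated results data frame; the rendering chunk is not shown in the original, but a sketch consistent with the tables above is:
results %>%
arrange(RMSE) %>%
kable() %>%
kable_styling()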
RMSE | Rsquared | MAE | In Sample RMSE | In Sample Rsquared | In Sample MAE | Model |
---|---|---|---|---|---|---|
0.6192577 | 0.6771122 | 0.5059984 | 0.7363066 | 0.5244517 | 0.5688553 | PLS |
0.6319866 | 0.7554852 | 0.4616119 | 0.5840520 | 0.6641550 | 0.4649468 | Random Forest |
0.6569218 | 0.6667230 | 0.4849000 | 0.5488134 | 0.6781810 | 0.4432762 | Boosted Tree |
0.6934245 | 0.6790718 | 0.5093532 | 0.6461205 | 0.5776471 | 0.5218458 | Conditional Random Forest |
0.7166524 | 0.6160449 | 0.5447146 | 0.7108683 | 0.4717657 | 0.5795483 | Bagged Tree |
All of the tree models fit the in-sample data better than the PLS model. However, they appear to have overfit somewhat: no tree model beats the PLS test-set RMSE, although the random forest and boosted tree do achieve lower test-set MAE, and the random forest has the highest test-set Rsquared. Among the tree models, the boosted tree gives the best resampling performance and the random forest the best test-set performance.
Part B
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors for the optimal linear and nonlinear models?
Here are the top predictors (the 20 most important are shown):
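The chunk that produced this table is missing. Its "loess r-squared" header matches caret's model-free filter importance, so a plausible reconstruction (an assumption on my part, including the choice of model object) is:
varImp(gbm_model, useModel = FALSE)  # filter-based importance via loess R-squared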
loess r-squared variable importance
only 20 most important variables shown (out of 56)
Overall
ManufacturingProcess32 100.00
ManufacturingProcess13 93.82
ManufacturingProcess09 89.93
ManufacturingProcess17 88.20
BiologicalMaterial06 82.61
BiologicalMaterial03 79.44
ManufacturingProcess36 73.85
BiologicalMaterial12 72.36
ManufacturingProcess06 69.00
ManufacturingProcess11 62.34
ManufacturingProcess31 56.39
BiologicalMaterial02 50.34
BiologicalMaterial11 48.53
BiologicalMaterial09 44.76
ManufacturingProcess30 41.87
BiologicalMaterial08 40.24
ManufacturingProcess29 38.54
ManufacturingProcess33 38.16
BiologicalMaterial04 36.92
ManufacturingProcess25 36.83
The manufacturing process variables continue to dominate the list. Largely the same variables appear in both lists, though in different orders of importance.
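The PLS importance table below likewise appears without its chunk; it presumably comes from the model fitted in Part A (a sketch):
varImp(pls_model)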
pls variable importance
only 20 most important variables shown (out of 56)
Overall
ManufacturingProcess32 100.00
ManufacturingProcess09 88.04
ManufacturingProcess36 82.20
ManufacturingProcess13 82.11
ManufacturingProcess17 80.25
ManufacturingProcess06 59.06
ManufacturingProcess11 55.93
BiologicalMaterial02 55.46
BiologicalMaterial06 54.64
BiologicalMaterial03 54.50
ManufacturingProcess33 53.91
ManufacturingProcess12 52.04
BiologicalMaterial08 49.76
BiologicalMaterial12 47.40
ManufacturingProcess34 45.47
BiologicalMaterial11 45.05
BiologicalMaterial01 44.18
BiologicalMaterial04 42.95
ManufacturingProcess04 39.94
ManufacturingProcess28 36.61
Part C
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
set.seed(1)
cart_model <- train(Yield ~ ., data = train_df, method = "rpart",
trControl = trainControl("cv", number = 10),
tuneLength = 25)
fancyRpartPlot(cart_model$finalModel, sub="")
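fancyRpartPlot draws the tree structure but not the distribution of yield in the terminal nodes. One way to get a boxplot of Yield per terminal node, as the question asks, is partykit's plot method (a sketch assuming the partykit package is installed):
library(partykit)
plot(as.party(cart_model$finalModel))  # terminal nodes rendered as boxplots of Yield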
This indicates that to maximize yield, ManufacturingProcess32 should be >= 0.19, ManufacturingProcess13 < -0.85, and BiologicalMaterial03 >= 0.49 (all on the centered and scaled preprocessed scale). That recipe leads to the terminal node with the greatest yield. Caution is warranted, however, as this single tree is likely overfitting the training data.