library(gridExtra)
library(ggplot2)
library(cowplot)
options(scipen=10000)
library(mlbench)
library(tidyverse)
library(corrplot)
library(AppliedPredictiveModeling)
library(caret)
library(DataExplorer)
library(kableExtra)
library(mice)
library(pls)
# library(party)
library(partykit)
library(Cubist)
library(gbm)
library(RWeka)
library(randomForest)
library(rpart)
Recreate the simulated data from Exercise 7.2:
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
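For reference, the Friedman 1 benchmark draws ten independent uniform predictors and builds the response from only the first five, which is why V6 through V10 are uninformative by construction. The base-R sketch below mirrors that structure; `friedman1_sketch` is a hypothetical helper written for illustration, not part of mlbench, and it is not guaranteed to reproduce the exact draws used above.

```r
# Minimal sketch of the Friedman 1 design (hypothetical helper, base R only).
friedman1_sketch <- function(n, sd = 1) {
  x <- matrix(runif(10 * n), ncol = 10)
  colnames(x) <- paste0("V", 1:10)
  # Only the first five columns enter the signal; V6-V10 are pure noise.
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5]
  data.frame(x, y = signal + rnorm(n, sd = sd))
}

set.seed(1)
sim <- friedman1_sketch(200)
# Marginal correlation of each predictor with the response
round(cor(sim[, paste0("V", 1:10)], sim$y), 2)
```

The correlations for V6 through V10 should sit near zero, which gives a quick sanity check before looking at model-based importance.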
(a) Fit a random forest model to all of the predictors, then estimate the variable importance scores:
model1 <- randomForest(y ~ ., data = simulated, importance = TRUE,ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
Did the random forest model significantly use the uninformative predictors (V6 – V10)?
rfImp1 |> kable() |> kable_styling() |> kable_classic(full_width = F)
| | Overall |
|---|---|
| V1 | 8.7322354 |
| V2 | 6.4153694 |
| V3 | 0.7635918 |
| V4 | 7.6151188 |
| V5 | 2.0235246 |
| V6 | 0.1651112 |
| V7 | -0.0059617 |
| V8 | -0.1663626 |
| V9 | -0.0952927 |
| V10 | -0.0749448 |
(b) Now add an additional predictor that is highly correlated with one of the informative predictors.
set.seed(200)
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9497025
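The correlation of roughly 0.95 can be anticipated from the construction. Assuming V1 is uniform on [0, 1] (variance 1/12) and the added noise is independent with standard deviation 0.1, the expected correlation is about 0.945. The sketch below computes that value and checks it against a large simulated sample; the sample size and seed are arbitrary choices for illustration.

```r
# Theoretical correlation between V1 and V1 + noise, assuming V1 ~ Uniform(0, 1)
var_v1 <- 1 / 12
var_noise <- 0.1^2
expected_cor <- sqrt(var_v1 / (var_v1 + var_noise))
expected_cor

# Empirical check on a large sample (seed and size are arbitrary)
set.seed(1)
v1 <- runif(1e5)
dup <- v1 + rnorm(1e5) * 0.1
cor(v1, dup)
```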
Fit another random forest model to these data. Did the importance score for V1 change?
model2 <- randomForest(y ~ ., data = simulated, importance = TRUE,ntree = 1000)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2 |> kable() |> kable_styling() |> kable_classic(full_width = F)
| | Overall |
|---|---|
| V1 | 6.0070978 |
| V2 | 6.0593790 |
| V3 | 0.5846529 |
| V4 | 6.8636329 |
| V5 | 2.1993989 |
| V6 | 0.1089804 |
| V7 | 0.0610421 |
| V8 | -0.0405920 |
| V9 | 0.0612366 |
| V10 | 0.0999934 |
| duplicate1 | 4.4332317 |
Adding the highly correlated predictor duplicate1 substantially changed the importance scores, particularly for V1. Before the addition, V1 was the most influential variable with an importance score of 8.73. After duplicate1 was introduced, V1’s importance dropped to 6.01, while duplicate1 gained an importance of 4.43. This redistribution indicates that duplicate1 carries much of the same information as V1, so individual trees can split on either variable almost interchangeably. The combined importance of V1 and duplicate1 (10.44) is only modestly higher than the original score of V1 alone, suggesting that the new variable added redundancy rather than unique predictive power.
The addition of duplicate1 had only a small effect on most other predictors. V2 remained stable, decreasing slightly from 6.42 to 6.06, which indicates that it continues to provide unique value. Similarly, V4, another highly ranked variable, saw only a small reduction from 7.62 to 6.86, showing that its contribution remains distinct. Among the uninformative predictors, V7, V9, and V10 moved from slightly negative to slightly positive scores, while V8 stayed slightly negative (-0.04). All of these values remain close to zero, so the shifts are more consistent with random variation between forests than with a real change in the usefulness of those predictors.
What happens when you add another predictor that is also highly correlated with V1?
set.seed(200)
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate2, simulated$V1)
## [1] 0.9497025
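The correlation printed here matches the earlier one to every digit because the seed is reset to 200 immediately before the same rnorm(200) call, so the noise added to V1 is the same as for duplicate1. The snippet below illustrates that behaviour. The alternative seed 201 is an arbitrary, hypothetical choice showing how a distinct second duplicate could be generated; the importance tables in this section were produced with the code as shown above, not with this variant.

```r
# Resetting the seed reproduces the same noise vector
set.seed(200)
noise1 <- rnorm(200) * 0.1
set.seed(200)
noise2 <- rnorm(200) * 0.1
identical(noise1, noise2)

# A different seed (hypothetical choice) yields different noise
set.seed(201)
noise3 <- rnorm(200) * 0.1
identical(noise1, noise3)
```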
model3 <- randomForest(y ~ ., data = simulated, importance = TRUE,ntree = 1000)
rfImp3 <- varImp(model3, scale = FALSE)
rfImp3 |> kable() |> kable_styling() |> kable_classic(full_width = F)
| | Overall |
|---|---|
| V1 | 4.8485712 |
| V2 | 6.5024265 |
| V3 | 0.5180883 |
| V4 | 7.2161682 |
| V5 | 2.1163163 |
| V6 | 0.2206636 |
| V7 | -0.0206681 |
| V8 | -0.0696477 |
| V9 | -0.0290896 |
| V10 | -0.0310594 |
| duplicate1 | 2.8416816 |
| duplicate2 | 2.9431803 |
When adding another highly correlated predictor, duplicate2, the importance scores shifted further, indicating increased redundancy among the correlated variables. Note that because the seed is reset to 200 before the same rnorm(200) call, duplicate2 is numerically identical to duplicate1, which is why its correlation with V1 is again 0.9497025. The most notable change is the continued decline in the importance of V1, which dropped from 6.01 (after adding duplicate1) to 4.85. Similarly, duplicate1 decreased from 4.43 to 2.84, while duplicate2 gained an importance of 2.94. This redistribution shows that the forest now splits roughly evenly among the three correlated predictors, diluting the original importance of V1 and fragmenting the total across redundant features.
Other predictors, such as V2 and V4, remain relatively stable, with slight changes in their importance scores (V2 increasing from 6.06 to 6.50 and V4 from 6.86 to 7.22). These variables appear largely unaffected by the redundancy introduced by the new predictors, suggesting that they provide unique and independent contributions to the model. The uninformative predictors V7 through V10 returned to slightly negative scores; such small fluctuations around zero are what one would expect from predictors that carry no signal.
The addition of duplicate2 compounds the issue of redundancy. With three highly correlated variables (V1, duplicate1, and duplicate2) now competing for splits, the interpretability of the model is further reduced. While these variables may collectively retain a high combined importance, their individual contributions are diluted, making it harder to pinpoint which variable is truly driving the predictions. Moreover, the additional complexity risks overfitting, as the tree may become overly tuned to subtle patterns in the training data that do not generalize well to unseen data.
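The dilution described above can be made concrete with a little arithmetic on the reported scores. The sketch below sums the importance assigned to V1 and its near-copies in each of the three random forest fits and computes the fraction that V1 itself retains; the values are copied from the tables above, and the labels fit1 to fit3 are just names chosen here.

```r
# Importance scores for V1 and its correlated copies, copied from the three tables
v1_family <- list(
  fit1 = c(V1 = 8.7322354),
  fit2 = c(V1 = 6.0070978, duplicate1 = 4.4332317),
  fit3 = c(V1 = 4.8485712, duplicate1 = 2.8416816, duplicate2 = 2.9431803)
)

# Total importance of the group, and the share that V1 keeps for itself
combined <- sapply(v1_family, sum)
v1_share <- sapply(v1_family, function(s) s[["V1"]] / sum(s))
round(rbind(combined, v1_share), 3)
```

The group total changes relatively little across fits while the share held by V1 falls, which is the pattern one would expect if the copies add redundancy rather than new information.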
(c) Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. Do these importances show the same pattern as the traditional random forest model?
cforestModel <- cforest(y~.,data = simulated)
cforestImp <- varimp(cforestModel)
cforestImp
## V1 V2 V3 V4 V5 V6
## 6.21845674 5.56546823 0.02466853 6.27940404 2.05636356 -0.04239662
## V7 V8 V9 V10 duplicate1 duplicate2
## 0.25299052 -0.18617935 -0.00109216 -0.04479538 4.34075599 4.47382273
The cforest model, built from conditional inference trees, produces variable importance scores that follow the same overall pattern as the traditional random forest. V4, V1, and V2 remain the most influential predictors, V5 is intermediate, and V6 through V10 stay close to zero. The correlated predictors are not suppressed: duplicate1 (4.34) and duplicate2 (4.47) still receive substantial importance, so the importance of the V1 signal is again split across three variables. One visible difference is V3, whose score falls to 0.02 compared with 0.52 in the random forest. Note that varimp was called with its default settings, which compute unconditional permutation importance; the conditional version (conditional = TRUE) is the option designed to adjust for correlation among predictors, so the bias reduction often attributed to cforest is not being exercised here.
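Both sets of forest importances discussed so far are permutation based: a predictor is shuffled and the resulting loss in accuracy is recorded. The sketch below shows that mechanism in base R under simplifying assumptions: it uses a linear model and in-sample predictions instead of a forest with out-of-bag samples, and `perm_importance` and the toy data are hypothetical names introduced only for this illustration.

```r
# Unconditional permutation importance: average increase in MSE after shuffling
# one predictor (hypothetical helper; in-sample predictions for simplicity).
perm_importance <- function(fit, data, response, var, n_perm = 20) {
  base_mse <- mean((data[[response]] - predict(fit, data))^2)
  increases <- replicate(n_perm, {
    shuffled <- data
    shuffled[[var]] <- sample(shuffled[[var]])
    mean((data[[response]] - predict(fit, shuffled))^2) - base_mse
  })
  mean(increases)
}

set.seed(1)
n <- 300
toy <- data.frame(x1 = runif(n), noise = runif(n))
toy$x1_copy <- toy$x1 + rnorm(n) * 0.1   # near-duplicate of x1
toy$y <- 10 * toy$x1 + rnorm(n)

fit <- lm(y ~ x1 + x1_copy + noise, data = toy)
imp <- sapply(c("x1", "x1_copy", "noise"),
              function(v) perm_importance(fit, toy, "y", v))
round(imp, 3)
```

Shuffling a predictor on its own ignores its correlation with the others, which is the limitation that conditional importance is meant to address.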
(d) Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
cubistMod <- cubist(x=simulated[,-11], y=simulated$y, committees = 100)
cubistImp <- varImp(cubistMod)
cubistImp |> kable() |> kable_styling() |> kable_classic(full_width = F)
| | Overall |
|---|---|
| V1 | 71.0 |
| V3 | 46.5 |
| V2 | 59.0 |
| V4 | 48.0 |
| V5 | 32.5 |
| V6 | 12.0 |
| V8 | 1.0 |
| V7 | 0.0 |
| V9 | 0.0 |
| V10 | 0.0 |
| duplicate1 | 0.0 |
| duplicate2 | 0.0 |
gbm_model <- gbm( y ~., data = simulated, distribution = "gaussian")
summary.gbm(gbm_model) |> kable() |> kable_styling() |> kable_classic(full_width = F)
| | var | rel.inf |
|---|---|---|
| V4 | V4 | 31.2421619 |
| V2 | V2 | 22.9371220 |
| V1 | V1 | 20.4583103 |
| V5 | V5 | 9.2107901 |
| duplicate1 | duplicate1 | 8.0500628 |
| V3 | V3 | 7.7284024 |
| V10 | V10 | 0.1293276 |
| V9 | V9 | 0.1227886 |
| V6 | V6 | 0.1210345 |
| V7 | V7 | 0.0000000 |
| V8 | V8 | 0.0000000 |
| duplicate2 | duplicate2 | 0.0000000 |
The variable importance patterns in Cubist and boosted tree models differ noticeably from those of cforest and the traditional random forest, particularly for correlated predictors. In the Cubist model, V1 (71.0), V2 (59.0), V4 (48.0), and V3 (46.5) lead the importance rankings, while the correlated predictors duplicate1 and duplicate2 are assigned zero importance. This is consistent with Cubist’s rule-based structure, in which a predictor only counts when it appears in a rule condition or a linear model, so redundant copies of V1 can be left out entirely. Among the uninformative predictors, V7, V9, and V10 also receive zero importance and V8 receives only 1.0, although V6 (12.0) is used to some extent.
Boosted trees fall in between. They prioritize top predictors like V4 (31.24), V2 (22.94), and V1 (20.46), while assigning a smaller but non-zero relative influence to one correlated variable, duplicate1 (8.05). duplicate2, V7, and V8 receive zero influence, and V6, V9, and V10 each receive about 0.12. This suggests that boosting spreads less importance across redundant features than the traditional random forest did, though it still allows some contribution from them.
Compared to Cubist and boosted trees, the traditional random forest and cforest distribute more importance among the correlated predictors. In those models duplicate1 and duplicate2 received scores of 2.84 and 2.94 (random forest) and 4.34 and 4.47 (cforest), which is at least as high as a moderately informative predictor such as V5. Cubist avoids this by ignoring the redundant features completely, while boosted trees strike a middle ground, preserving some contribution from duplicate1 without overemphasizing it.
Both Cubist and boosted trees largely exclude irrelevant variables: V7 through V10 receive zero or near-zero importance in each model (at most 1.0 in Cubist and about 0.13 in the boosted model). The traditional random forest behaves similarly in practice, with V6 through V10 scoring close to zero, but its permutation-based scores can be slightly positive or negative rather than exactly zero.
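Because each method reports importance on its own scale, one rough way to compare them is to normalise each set of scores and ask what fraction goes to V1 and its two copies. The sketch below does this with the values reported above (random forest scores from the third fit). Clipping negative scores to zero before normalising is an assumption made here for convenience, and `family_share` is a hypothetical helper.

```r
# Reported importance scores, copied from the output above
scores <- list(
  random_forest = c(V1 = 4.8485712, V2 = 6.5024265, V3 = 0.5180883,
                    V4 = 7.2161682, V5 = 2.1163163, V6 = 0.2206636,
                    V7 = -0.0206681, V8 = -0.0696477, V9 = -0.0290896,
                    V10 = -0.0310594, duplicate1 = 2.8416816,
                    duplicate2 = 2.9431803),
  cforest = c(V1 = 6.21845674, V2 = 5.56546823, V3 = 0.02466853,
              V4 = 6.27940404, V5 = 2.05636356, V6 = -0.04239662,
              V7 = 0.25299052, V8 = -0.18617935, V9 = -0.00109216,
              V10 = -0.04479538, duplicate1 = 4.34075599,
              duplicate2 = 4.47382273),
  cubist = c(V1 = 71, V2 = 59, V3 = 46.5, V4 = 48, V5 = 32.5, V6 = 12,
             V7 = 0, V8 = 1, V9 = 0, V10 = 0, duplicate1 = 0, duplicate2 = 0),
  gbm = c(V1 = 20.4583103, V2 = 22.9371220, V3 = 7.7284024, V4 = 31.2421619,
          V5 = 9.2107901, V6 = 0.1210345, V7 = 0, V8 = 0, V9 = 0.1227886,
          V10 = 0.1293276, duplicate1 = 8.0500628, duplicate2 = 0)
)

# Fraction of (non-negative) importance assigned to V1 and its copies
family_share <- function(imp) {
  imp <- pmax(imp, 0)  # assumption: treat negative scores as zero
  sum(imp[c("V1", "duplicate1", "duplicate2")]) / sum(imp)
}
shares <- sapply(scores, family_share)
round(shares, 3)
```

This is only a coarse comparison, since the underlying importance definitions differ between methods, but it puts the four models on a common footing.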
Use a simulation to show tree bias with different granularities.
This R code simulates a dataset (simData) with five predictors (a, b, c, d, and e) of varying cardinality, ranging from low (a: 1–10) to high (e: 1–100,000), and a continuous target variable (y). The target variable is constructed as a linear combination of the predictors, with added random noise (rnorm(500)) to introduce variability. The dataset contains 500 observations for each variable, making it suitable for analyzing the effects of predictor granularity on model performance. This setup is particularly useful for studying selection bias in decision trees, where high-cardinality variables may dominate splits, or for evaluating the performance of regularization techniques like ridge or lasso regression in handling predictors with differing granularities.
set.seed(624)
a <- sample(1:10, 500, replace = TRUE)
b <- sample(1:100, 500, replace = TRUE)
c <- sample(1:1000, 500, replace = TRUE)
d <- sample(1:10000, 500, replace = TRUE)
e <- sample(1:100000 , 500, replace = TRUE)
y <- a + b + c + d + e + rnorm(500)
simData <- data.frame(a,b,c,d,e,y)
rf_sim_model <- rpart(y ~., data = simData)
plot(as.party(rf_sim_model), gp = gpar(fontsize = 7))
varImp(rf_sim_model, scale = FALSE) |> kable() |> kable_styling() |> kable_classic()
| | Overall |
|---|---|
| a | 0.0857510 |
| b | 0.1101497 |
| c | 0.0987033 |
| d | 0.2777858 |
| e | 3.5363248 |
The results of the simulation show how strongly variable importance tracks predictor granularity. The predictor e, with the highest cardinality (1 to 100,000), dominates the model with an importance score of 3.536, far surpassing the other predictors. A predictor with many distinct values offers many more candidate split points, which is the mechanism behind selection bias in trees. One caveat applies to this particular design: because y is an unweighted sum of the predictors, e also accounts for nearly all of the variance in y, so its dominance reflects genuine signal as well as granularity, and the two effects cannot be separated from this simulation alone.
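To see how much of the response each predictor accounts for by scale alone, the variances of the five components can be compared directly. The sketch below regenerates the predictors with the same seed and sampling calls as above and reports each one's share of the summed predictor variance; it ignores the small sample covariances between predictors.

```r
set.seed(624)
a <- sample(1:10, 500, replace = TRUE)
b <- sample(1:100, 500, replace = TRUE)
c <- sample(1:1000, 500, replace = TRUE)
d <- sample(1:10000, 500, replace = TRUE)
e <- sample(1:100000, 500, replace = TRUE)

# Share of the total predictor variance contributed by each term of y
component_var <- sapply(data.frame(a, b, c, d, e), var)
variance_share <- component_var / sum(component_var)
round(variance_share, 4)
```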
The predictor d, with moderate cardinality (1 to 10,000), comes in second with an importance score of 0.278, significantly lower than e but still notable. This shows that while d contributes meaningfully, it is far less influential than e. Predictors with lower cardinality, such as b (1 to 100) and c (1 to 1,000), have more balanced importance scores (0.110 and 0.099, respectively), reflecting their reduced ability to dominate the model’s splits. Lastly, a (1 to 10), the predictor with the lowest cardinality, has the smallest importance score (0.086), highlighting how low-cardinality variables are often undervalued in tree-based models.
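A cleaner way to isolate selection bias is to make the response independent of every predictor, so that any preference for one variable can only come from its granularity. The base-R sketch below does this with an exhaustive best-split search using the squared-error criterion. `best_gain` is a hypothetical helper, and the two predictors, sample size, and number of repetitions are arbitrary choices for illustration rather than part of the analysis above.

```r
# Largest reduction in squared error achievable by one split on x
best_gain <- function(x, y) {
  o <- order(x)
  xs <- x[o]
  cs <- cumsum(y[o])
  n <- length(y)
  k <- seq_len(n - 1)
  # Between-group sum of squares when the first k sorted cases go left
  gain <- cs[k]^2 / k + (cs[n] - cs[k])^2 / (n - k) - cs[n]^2 / n
  valid <- xs[k] != xs[k + 1]  # splits are only possible between distinct values
  if (!any(valid)) return(0)
  max(gain[valid])
}

set.seed(624)
n <- 100
high_wins <- replicate(500, {
  y <- rnorm(n)                              # response unrelated to either predictor
  low <- sample(1:2, n, replace = TRUE)      # two distinct values
  high <- sample(1:1000, n, replace = TRUE)  # many distinct values
  best_gain(high, y) > best_gain(low, y)
})

# Proportion of runs in which the high-cardinality predictor would be chosen
mean(high_wins)
```

If the splitting rule were unbiased, each predictor would win about half the time; a proportion well above 0.5 points to a preference driven purely by the number of candidate split points.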
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance.
Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
(a) Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?
The model on the right, with higher bagging fraction (0.9) and learning rate (0.9), assigns most of its importance to the first few predictors because the larger bagging fraction reduces randomness in tree construction, allowing the model to focus heavily on the most prominent variables. Additionally, the high learning rate causes the model to quickly fit the strongest signals in the data, reinforcing the dominance of these variables early in the boosting process. In contrast, the model on the left, with a lower bagging fraction (0.1) and learning rate (0.1), introduces more randomness and slower learning, which forces the model to explore weaker predictors. This results in a more balanced distribution of importance across all predictors.
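The roles of the two parameters can be seen in a stripped-down version of stochastic gradient boosting. The sketch below boosts single-split stumps for squared-error loss in base R: `bag_fraction` controls how many rows each stump sees and `shrinkage` scales each update. It is an illustration under simplifying assumptions (stumps only, toy data, importance measured as summed split improvements), not the gbm implementation or the solubility analysis; all function and variable names are hypothetical.

```r
# Best single-split stump for residuals r over the columns of X
fit_stump <- function(X, r) {
  n <- length(r)
  k <- seq_len(n - 1)
  best <- list(gain = -Inf)
  for (j in seq_len(ncol(X))) {
    o <- order(X[, j])
    xs <- X[o, j]
    cs <- cumsum(r[o])
    # Reduction in squared error for splitting after the k-th sorted value
    gain <- cs[k]^2 / k + (cs[n] - cs[k])^2 / (n - k) - cs[n]^2 / n
    gain[xs[k] == xs[k + 1]] <- -Inf  # cannot split between tied values
    i <- which.max(gain)
    if (gain[i] > best$gain) {
      best <- list(gain = gain[i], var = j, cut = (xs[i] + xs[i + 1]) / 2,
                   left = cs[i] / i, right = (cs[n] - cs[i]) / (n - i))
    }
  }
  best
}

boost_stumps <- function(X, y, n_trees, shrinkage, bag_fraction) {
  pred <- rep(mean(y), length(y))
  importance <- setNames(numeric(ncol(X)), colnames(X))
  for (m in seq_len(n_trees)) {
    # Each stump is fit to the current residuals on a random subsample
    idx <- sample(length(y), max(2, floor(bag_fraction * length(y))))
    stump <- fit_stump(X[idx, , drop = FALSE], y[idx] - pred[idx])
    update <- ifelse(X[, stump$var] <= stump$cut, stump$left, stump$right)
    pred <- pred + shrinkage * update  # learning rate scales the step
    importance[stump$var] <- importance[stump$var] + stump$gain
  }
  list(pred = pred, importance = importance / sum(importance))
}

set.seed(1)
n <- 200
X <- matrix(runif(n * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * X[, 1] + X[, 2] + rnorm(n, sd = 0.5)

slow <- boost_stumps(X, y, n_trees = 100, shrinkage = 0.1, bag_fraction = 0.1)
fast <- boost_stumps(X, y, n_trees = 100, shrinkage = 0.9, bag_fraction = 0.9)
round(rbind(slow = slow$importance, fast = fast$importance), 3)
```

Comparing the two rows shows how the normalised importance profile depends on the two settings for this toy problem; results on real data such as the solubility set may differ.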
(b) Which model do you think would be more predictive of other samples?
The model on the left (with parameters set to 0.1) is likely to be more predictive of other samples because it avoids overfitting by incorporating more randomness (through the lower bagging fraction) and learning more gradually (due to the lower learning rate). This prevents the model from overemphasizing a few dominant predictors and instead allows it to capture broader patterns in the data. In contrast, the model on the right may overfit the training data by focusing too heavily on the top predictors, potentially neglecting weaker but relevant predictors, leading to poorer generalization to new samples.
(c) How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Increasing the interaction depth allows the trees in the boosting process to model more complex relationships between predictors. For the model on the left (low bagging fraction and learning rate), a higher interaction depth would likely spread variable importance more evenly because the model would uncover deeper, nonlinear relationships involving weaker predictors, thus reducing the dominance of the top variables. For the model on the right (high bagging fraction and learning rate), increasing interaction depth could exacerbate the dominance of the top predictors, as the model would continue to focus on fitting these variables more precisely while neglecting others. This would steepen the slope of predictor importance, making the top few predictors even more dominant in the model.
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
data("ChemicalManufacturingProcess")
cherNearZero <- nearZeroVar(ChemicalManufacturingProcess)
chermical <- ChemicalManufacturingProcess[,-cherNearZero]
Missing values are imputed with the mice() function from the mice package, which performs multiple imputation on the chermical dataset. The method used is predictive mean matching (method = ‘pmm’), which fills each missing entry with a value observed elsewhere in the same column, so imputed values are plausible and stay within the range of the observed data. Five imputed datasets are generated (m = 5), and the first one is used for modeling.
set.seed(2425) # For reproducibility
chermical_imp <- mice(chermical, m = 5, method = 'pmm', maxit = 5, seed = 123, printFlag = FALSE)
# chermical_imp <- mice(ChemicalManufacturingProcess, m = 5, method = 'pmm', maxit = 5, seed = 123, printFlag = FALSE)
chermical_comp <- complete(chermical_imp,1)
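The idea behind predictive mean matching can be shown with a single incomplete variable and one complete predictor. The sketch below is a simplified illustration, not the mice implementation, which is more elaborate (for example, it uses several predictors and accounts for parameter uncertainty). `pmm_impute` is a hypothetical helper and the toy data are simulated; the sketch assumes the predictor has no missing values and that there are at least `donors` observed cases.

```r
# Simplified predictive mean matching for one variable (hypothetical helper)
pmm_impute <- function(target, predictor, donors = 5) {
  observed <- !is.na(target)
  fit <- lm(target ~ predictor)  # rows with a missing target are dropped
  predicted <- coef(fit)[[1]] + coef(fit)[[2]] * predictor
  imputed <- target
  for (i in which(!observed)) {
    # Find the observed cases whose predicted values are closest to case i
    distance <- abs(predicted[observed] - predicted[i])
    nearest <- order(distance)[seq_len(donors)]
    donor_values <- target[observed][nearest]
    # Borrow the actual observed value of one randomly chosen donor
    imputed[i] <- donor_values[sample.int(donors, 1)]
  }
  imputed
}

set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y_missing <- y
y_missing[sample(100, 15)] <- NA
y_filled <- pmm_impute(y_missing, x)
summary(y_filled)
```

Because every imputed value is copied from an observed case, the filled-in column cannot contain values outside the observed range.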
Creating Training and Test Data
cherTrainIndex <- sample(1:nrow(chermical_comp), size = 0.8 * nrow(chermical_comp))
chermicalTrain <- chermical_comp[cherTrainIndex,]
chermicalTest <- chermical_comp[-cherTrainIndex,]
Single Tree
set.seed(100)
rpartTune <- train(chermicalTrain[,-1], chermicalTrain$Yield, method = "rpart2",tuneLength = 10, trControl = trainControl(method = "cv"))
rpartPred <- predict(rpartTune,newdata = chermicalTest[,-1])
rpartRes <- postResample(pred = rpartPred, obs = chermicalTest$Yield)
rpartTune
## CART
##
## 140 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 127, 126, 125, 127, 125, 127, ...
## Resampling results across tuning parameters:
##
## maxdepth RMSE Rsquared MAE
## 1 1.404739 0.4150347 1.140204
## 2 1.479797 0.3671338 1.191433
## 3 1.469447 0.3803790 1.175602
## 4 1.485177 0.3948714 1.146837
## 5 1.490764 0.4115412 1.154055
## 6 1.450839 0.4510370 1.121649
## 7 1.363147 0.5121741 1.040263
## 8 1.365615 0.5154297 1.034380
## 9 1.379205 0.5143133 1.047034
## 10 1.380839 0.5135127 1.045842
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was maxdepth = 7.
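The selection rule in the output above is simply the tuning value with the smallest resampled RMSE. The sketch below reproduces that choice from the printed table and also reports how far the simplest tree is from the best one; the RMSE values are copied from the output, and `rpart_cv` is a name chosen here.

```r
# Cross-validated RMSE by maxdepth, copied from the train() output above
rpart_cv <- data.frame(
  maxdepth = 1:10,
  RMSE = c(1.404739, 1.479797, 1.469447, 1.485177, 1.490764,
           1.450839, 1.363147, 1.365615, 1.379205, 1.380839)
)

# Row with the smallest RMSE, which is the rule train() reports using
best <- rpart_cv[which.min(rpart_cv$RMSE), ]
best

# Gap in RMSE between a depth-1 tree and the selected depth
rmse_gap <- rpart_cv$RMSE[rpart_cv$maxdepth == 1] - best$RMSE
rmse_gap
```

The gap is small relative to the RMSE itself, so the resampling profile is fairly flat and the selected depth should not be over-interpreted.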
Model Trees
set.seed(100)
modelTreeTune <- train(chermicalTrain[,-1], chermicalTrain$Yield, method = "M5",trControl = trainControl(method = "cv"), control = Weka_control(M=10))
modelTreePred <- predict(modelTreeTune,newdata = chermicalTest[,-1])
modelTreeRes <- postResample(pred = modelTreePred, obs = chermicalTest$Yield)
modelTreeRes
## RMSE Rsquared MAE
## 1.4643349 0.4431539 1.1980594
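The three test-set metrics printed above can be computed by hand. The sketch below assumes the usual definitions, with R-squared taken as the squared correlation between observed and predicted values, which is how caret's postResample is understood to compute it for numeric outcomes. `regression_metrics` is a hypothetical helper and the four data points are made up for the example.

```r
# RMSE, squared-correlation R-squared, and MAE for numeric predictions
regression_metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((obs - pred)^2)),
    Rsquared = cor(obs, pred)^2,
    MAE = mean(abs(obs - pred)))
}

obs <- c(1, 2, 3, 4)
pred <- c(1.5, 2.5, 2.5, 4.5)  # every prediction is off by 0.5
metrics <- regression_metrics(pred, obs)
metrics
```

Note that a squared correlation can be high even when predictions are systematically biased, so RMSE and MAE are the more direct measures of prediction error.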
Cubist
set.seed(100)
cubistModelTune <- train(chermicalTrain[,-1], chermicalTrain$Yield, method = "cubist",trControl = trainControl(method = "cv"))
cubistModelPred <- predict(cubistModelTune,newdata = chermicalTest[,-1])
cubistModelRes <- postResample(pred = cubistModelPred, obs = chermicalTest$Yield)
cubistModelTune
## Cubist
##
## 140 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 127, 126, 125, 127, 125, 127, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.2467712 0.5854664 0.9807772
## 1 5 1.0805349 0.6671514 0.8468379
## 1 9 1.1519368 0.6295097 0.8792709
## 10 0 1.0774857 0.6491037 0.8480869
## 10 5 0.9724057 0.7168211 0.7452267
## 10 9 1.0177043 0.6915741 0.7751079
## 20 0 1.0295093 0.6796523 0.8300451
## 20 5 0.9204291 0.7435137 0.7217447
## 20 9 0.9699223 0.7196101 0.7488811
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
Random Forest
set.seed(100)
rfModel <- train(chermicalTrain[,-1], chermicalTrain$Yield, method = "rf", tuneLength = 10)
rfModelPred <- predict(rfModel,newdata = chermicalTest[,-1])
rfModelRes <- postResample(pred = rfModelPred, obs = chermicalTest$Yield)
rfModel
## Random Forest
##
## 140 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 140, 140, 140, 140, 140, 140, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.272053 0.5687612 0.9949192
## 8 1.193187 0.5929994 0.9144980
## 14 1.185824 0.5907743 0.8993367
## 20 1.181076 0.5897573 0.8925643
## 26 1.187479 0.5833690 0.8970205
## 32 1.192287 0.5770820 0.8984233
## 38 1.196684 0.5733757 0.9001139
## 44 1.194891 0.5736438 0.8954857
## 50 1.209292 0.5618632 0.9063232
## 56 1.212015 0.5595457 0.9074975
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 20.
Gradient Boosting
set.seed(100)
gbmGrid <- expand.grid(.interaction.depth = seq(1, 7, by = 2), .n.trees = seq(100, 1000, by = 50),.shrinkage = c(0.01, 0.1),.n.minobsinnode = c(5, 10, 15))
boostedModel <- train(chermicalTrain[,-1], chermicalTrain$Yield, method = "gbm",verbose=F, tuneGrid = gbmGrid)
boostedModelPred <- predict(boostedModel,newdata = chermicalTest[,-1])
boostedModelRes <- postResample(pred = boostedModelPred, obs = chermicalTest$Yield)
boostedModel
## Stochastic Gradient Boosting
##
## 140 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 140, 140, 140, 140, 140, 140, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 0.01 1 5 100 1.447019 0.5097096
## 0.01 1 5 150 1.358553 0.5314230
## 0.01 1 5 200 1.299334 0.5462468
## 0.01 1 5 250 1.261279 0.5551794
## 0.01 1 5 300 1.236968 0.5603678
## 0.01 1 5 350 1.222449 0.5623446
## 0.01 1 5 400 1.213451 0.5633263
## 0.01 1 5 450 1.206464 0.5647281
## 0.01 1 5 500 1.202060 0.5651778
## 0.01 1 5 550 1.199200 0.5657047
## 0.01 1 5 600 1.196671 0.5661539
## 0.01 1 5 650 1.195742 0.5657342
## 0.01 1 5 700 1.194827 0.5659274
## 0.01 1 5 750 1.193871 0.5660770
## 0.01 1 5 800 1.194082 0.5658445
## 0.01 1 5 850 1.194818 0.5651834
## 0.01 1 5 900 1.194524 0.5652387
## 0.01 1 5 950 1.194858 0.5650373
## 0.01 1 5 1000 1.195719 0.5645459
## 0.01 1 10 100 1.447788 0.5077014
## 0.01 1 10 150 1.357803 0.5336095
## 0.01 1 10 200 1.298941 0.5473692
## 0.01 1 10 250 1.260294 0.5557958
## 0.01 1 10 300 1.235004 0.5609691
## 0.01 1 10 350 1.217779 0.5648960
## 0.01 1 10 400 1.206682 0.5670427
## 0.01 1 10 450 1.199791 0.5687225
## 0.01 1 10 500 1.195871 0.5693610
## 0.01 1 10 550 1.193377 0.5698539
## 0.01 1 10 600 1.192008 0.5697876
## 0.01 1 10 650 1.190924 0.5699239
## 0.01 1 10 700 1.190698 0.5696044
## 0.01 1 10 750 1.189831 0.5698301
## 0.01 1 10 800 1.189931 0.5696800
## 0.01 1 10 850 1.190005 0.5695517
## 0.01 1 10 900 1.189938 0.5696215
## 0.01 1 10 950 1.190675 0.5689813
## 0.01 1 10 1000 1.189804 0.5695037
## 0.01 1 15 100 1.449309 0.5088935
## 0.01 1 15 150 1.359646 0.5356379
## 0.01 1 15 200 1.301433 0.5482561
## 0.01 1 15 250 1.261365 0.5566188
## 0.01 1 15 300 1.236889 0.5610935
## 0.01 1 15 350 1.221021 0.5643455
## 0.01 1 15 400 1.210068 0.5667918
## 0.01 1 15 450 1.203022 0.5683345
## 0.01 1 15 500 1.197459 0.5698649
## 0.01 1 15 550 1.194667 0.5704195
## 0.01 1 15 600 1.193386 0.5703227
## 0.01 1 15 650 1.190649 0.5716199
## 0.01 1 15 700 1.189454 0.5721533
## 0.01 1 15 750 1.188338 0.5725178
## 0.01 1 15 800 1.187777 0.5726781
## 0.01 1 15 850 1.188085 0.5725662
## 0.01 1 15 900 1.187670 0.5728639
## 0.01 1 15 950 1.188041 0.5725399
## 0.01 1 15 1000 1.188164 0.5725290
## 0.01 3 5 100 1.351088 0.5489686
## 0.01 3 5 150 1.265762 0.5630884
## 0.01 3 5 200 1.224098 0.5706230
## 0.01 3 5 250 1.200265 0.5762423
## 0.01 3 5 300 1.187074 0.5796398
## 0.01 3 5 350 1.177404 0.5833314
## 0.01 3 5 400 1.171457 0.5853884
## 0.01 3 5 450 1.168036 0.5864780
## 0.01 3 5 500 1.165936 0.5874449
## 0.01 3 5 550 1.164418 0.5881616
## 0.01 3 5 600 1.163739 0.5885077
## 0.01 3 5 650 1.163589 0.5883912
## 0.01 3 5 700 1.163640 0.5882540
## 0.01 3 5 750 1.163762 0.5880769
## 0.01 3 5 800 1.164157 0.5879779
## 0.01 3 5 850 1.164273 0.5879539
## 0.01 3 5 900 1.165270 0.5874106
## 0.01 3 5 950 1.165366 0.5873577
## 0.01 3 5 1000 1.165458 0.5873277
## 0.01 3 10 100 1.350518 0.5483103
## 0.01 3 10 150 1.266488 0.5630378
## 0.01 3 10 200 1.220998 0.5720433
## 0.01 3 10 250 1.196903 0.5784577
## 0.01 3 10 300 1.183340 0.5821412
## 0.01 3 10 350 1.176463 0.5839239
## 0.01 3 10 400 1.172918 0.5847868
## 0.01 3 10 450 1.170279 0.5857589
## 0.01 3 10 500 1.168728 0.5863385
## 0.01 3 10 550 1.167706 0.5866853
## 0.01 3 10 600 1.167573 0.5867073
## 0.01 3 10 650 1.167631 0.5867300
## 0.01 3 10 700 1.167208 0.5871079
## 0.01 3 10 750 1.168214 0.5865256
## 0.01 3 10 800 1.169199 0.5858943
## 0.01 3 10 850 1.169807 0.5855358
## 0.01 3 10 900 1.169937 0.5855106
## 0.01 3 10 950 1.170669 0.5851434
## 0.01 3 10 1000 1.171326 0.5847266
## 0.01 3 15 100 1.369375 0.5367162
## 0.01 3 15 150 1.284723 0.5543923
## 0.01 3 15 200 1.235792 0.5669788
## 0.01 3 15 250 1.209795 0.5729684
## 0.01 3 15 300 1.195046 0.5771818
## 0.01 3 15 350 1.184278 0.5811509
## 0.01 3 15 400 1.176907 0.5837834
## 0.01 3 15 450 1.173487 0.5848524
## 0.01 3 15 500 1.171976 0.5852831
## 0.01 3 15 550 1.171529 0.5850815
## 0.01 3 15 600 1.170325 0.5856918
## 0.01 3 15 650 1.170429 0.5852573
## 0.01 3 15 700 1.170643 0.5850736
## 0.01 3 15 750 1.171124 0.5847420
## 0.01 3 15 800 1.171486 0.5845571
## 0.01 3 15 850 1.172756 0.5837029
## 0.01 3 15 900 1.173288 0.5834971
## 0.01 3 15 950 1.173748 0.5832846
## 0.01 3 15 1000 1.174601 0.5828940
## 0.01 5 5 100 1.324821 0.5669964
## 0.01 5 5 150 1.243708 0.5769858
## 0.01 5 5 200 1.202488 0.5837717
## 0.01 5 5 250 1.181541 0.5881880
## 0.01 5 5 300 1.170158 0.5914270
## 0.01 5 5 350 1.163526 0.5932798
## 0.01 5 5 400 1.160890 0.5937220
## 0.01 5 5 450 1.159576 0.5939151
## 0.01 5 5 500 1.158428 0.5943424
## 0.01 5 5 550 1.157625 0.5946665
## 0.01 5 5 600 1.157494 0.5945307
## 0.01 5 5 650 1.157448 0.5944577
## 0.01 5 5 700 1.157211 0.5945045
## 0.01 5 5 750 1.157398 0.5943318
## 0.01 5 5 800 1.157915 0.5939843
## 0.01 5 5 850 1.158533 0.5936079
## 0.01 5 5 900 1.158655 0.5935921
## 0.01 5 5 950 1.159069 0.5933886
## 0.01 5 5 1000 1.159196 0.5933502
## 0.01 5 10 100 1.336317 0.5516433
## 0.01 5 10 150 1.257194 0.5636310
## 0.01 5 10 200 1.214876 0.5728218
## 0.01 5 10 250 1.193877 0.5774997
## 0.01 5 10 300 1.181108 0.5817145
## 0.01 5 10 350 1.173908 0.5840806
## 0.01 5 10 400 1.169012 0.5861627
## 0.01 5 10 450 1.167320 0.5869138
## 0.01 5 10 500 1.165545 0.5877465
## 0.01 5 10 550 1.164532 0.5882261
## 0.01 5 10 600 1.164918 0.5878815
## 0.01 5 10 650 1.164553 0.5881695
## 0.01 5 10 700 1.164160 0.5884098
## 0.01 5 10 750 1.163812 0.5885900
## 0.01 5 10 800 1.163909 0.5886547
## 0.01 5 10 850 1.164502 0.5884103
## 0.01 5 10 900 1.165599 0.5878949
## 0.01 5 10 950 1.165681 0.5878726
## 0.01 5 10 1000 1.166104 0.5876986
## 0.01 5 15 100 1.367378 0.5378150
## 0.01 5 15 150 1.284614 0.5527230
## 0.01 5 15 200 1.237791 0.5637044
## 0.01 5 15 250 1.209840 0.5716853
## 0.01 5 15 300 1.195654 0.5752853
## 0.01 5 15 350 1.186250 0.5781255
## 0.01 5 15 400 1.180940 0.5797948
## 0.01 5 15 450 1.177926 0.5808413
## 0.01 5 15 500 1.175194 0.5821955
## 0.01 5 15 550 1.173779 0.5826264
## 0.01 5 15 600 1.173272 0.5829092
## 0.01 5 15 650 1.172749 0.5831216
## 0.01 5 15 700 1.172740 0.5831129
## 0.01 5 15 750 1.173532 0.5825371
## 0.01 5 15 800 1.174260 0.5821318
## 0.01 5 15 850 1.174483 0.5821781
## 0.01 5 15 900 1.174249 0.5826449
## 0.01 5 15 950 1.175638 0.5820092
## 0.01 5 15 1000 1.176002 0.5818369
## 0.01 7 5 100 1.317188 0.5672400
## 0.01 7 5 150 1.238418 0.5775221
## 0.01 7 5 200 1.197132 0.5859237
## 0.01 7 5 250 1.178057 0.5901079
## 0.01 7 5 300 1.167613 0.5928925
## 0.01 7 5 350 1.161855 0.5945070
## 0.01 7 5 400 1.159449 0.5950018
## 0.01 7 5 450 1.158583 0.5950415
## 0.01 7 5 500 1.158085 0.5949638
## 0.01 7 5 550 1.157951 0.5947579
## 0.01 7 5 600 1.157871 0.5946948
## 0.01 7 5 650 1.158025 0.5945250
## 0.01 7 5 700 1.158171 0.5943528
## 0.01 7 5 750 1.158475 0.5940874
## 0.01 7 5 800 1.159168 0.5936478
## 0.01 7 5 850 1.160003 0.5930391
## 0.01 7 5 900 1.160497 0.5927183
## 0.01 7 5 950 1.161207 0.5922627
## 0.01 7 5 1000 1.161442 0.5921746
## 0.01 7 10 100 1.341886 0.5488741
## 0.01 7 10 150 1.256352 0.5660375
## 0.01 7 10 200 1.210707 0.5775349
## 0.01 7 10 250 1.189918 0.5819357
## 0.01 7 10 300 1.177267 0.5858145
## 0.01 7 10 350 1.170416 0.5881924
## 0.01 7 10 400 1.167248 0.5889586
## 0.01 7 10 450 1.164318 0.5900515
## 0.01 7 10 500 1.162991 0.5906760
## 0.01 7 10 550 1.162531 0.5906447
## 0.01 7 10 600 1.162163 0.5908599
## 0.01 7 10 650 1.162516 0.5906535
## 0.01 7 10 700 1.162093 0.5907155
## 0.01 7 10 750 1.162033 0.5908185
## 0.01 7 10 800 1.162640 0.5905119
## 0.01 7 10 850 1.162547 0.5906005
## 0.01 7 10 900 1.163498 0.5899878
## 0.01 7 10 950 1.163635 0.5899847
## 0.01 7 10 1000 1.164233 0.5898024
## 0.01 7 15 100 1.371910 0.5359255
## 0.01 7 15 150 1.287636 0.5519301
## 0.01 7 15 200 1.240505 0.5634642
## 0.01 7 15 250 1.212434 0.5711180
## 0.01 7 15 300 1.196232 0.5758861
## 0.01 7 15 350 1.186502 0.5788916
## 0.01 7 15 400 1.182046 0.5794550
## 0.01 7 15 450 1.178100 0.5811160
## 0.01 7 15 500 1.174473 0.5831996
## 0.01 7 15 550 1.173462 0.5833860
## 0.01 7 15 600 1.172399 0.5840403
## 0.01 7 15 650 1.171879 0.5842382
## 0.01 7 15 700 1.171637 0.5845326
## 0.01 7 15 750 1.172348 0.5841003
## 0.01 7 15 800 1.172610 0.5841228
## 0.01 7 15 850 1.173301 0.5838210
## 0.01 7 15 900 1.173975 0.5836412
## 0.01 7 15 950 1.174985 0.5831624
## 0.01 7 15 1000 1.174859 0.5834006
## 0.10 1 5 100 1.213478 0.5521675
## 0.10 1 5 150 1.217279 0.5521443
## 0.10 1 5 200 1.224092 0.5489711
## 0.10 1 5 250 1.227786 0.5472584
## 0.10 1 5 300 1.230074 0.5476390
## 0.10 1 5 350 1.233547 0.5463142
## 0.10 1 5 400 1.238542 0.5437949
## 0.10 1 5 450 1.241687 0.5424108
## 0.10 1 5 500 1.243782 0.5418339
## 0.10 1 5 550 1.243524 0.5421721
## 0.10 1 5 600 1.244955 0.5418996
## 0.10 1 5 650 1.244547 0.5425414
## 0.10 1 5 700 1.245511 0.5422417
## 0.10 1 5 750 1.247148 0.5415002
## 0.10 1 5 800 1.247375 0.5414359
## 0.10 1 5 850 1.247557 0.5416919
## 0.10 1 5 900 1.247712 0.5416996
## 0.10 1 5 950 1.248938 0.5409682
## 0.10 1 5 1000 1.248862 0.5409830
## 0.10 1 10 100 1.189618 0.5712514
## 0.10 1 10 150 1.198923 0.5670858
## 0.10 1 10 200 1.203176 0.5649283
## 0.10 1 10 250 1.208502 0.5637223
## 0.10 1 10 300 1.215725 0.5599193
## 0.10 1 10 350 1.221801 0.5577290
## 0.10 1 10 400 1.226197 0.5561989
## 0.10 1 10 450 1.229809 0.5557138
## 0.10 1 10 500 1.234117 0.5532441
## 0.10 1 10 550 1.237764 0.5515347
## 0.10 1 10 600 1.239002 0.5507229
## 0.10 1 10 650 1.240340 0.5501422
## 0.10 1 10 700 1.242779 0.5492981
## 0.10 1 10 750 1.243885 0.5488031
## 0.10 1 10 800 1.245667 0.5482248
## 0.10 1 10 850 1.246837 0.5475410
## 0.10 1 10 900 1.247778 0.5472313
## 0.10 1 10 950 1.248687 0.5471619
## 0.10 1 10 1000 1.249411 0.5468654
## 0.10 1 15 100 1.221617 0.5474400
## 0.10 1 15 150 1.227837 0.5468158
## 0.10 1 15 200 1.236687 0.5423564
## 0.10 1 15 250 1.249332 0.5367201
## 0.10 1 15 300 1.258115 0.5333085
## 0.10 1 15 350 1.263176 0.5318863
## 0.10 1 15 400 1.264882 0.5319889
## 0.10 1 15 450 1.269593 0.5307591
## 0.10 1 15 500 1.274951 0.5282395
## 0.10 1 15 550 1.277938 0.5274828
## 0.10 1 15 600 1.282478 0.5251732
## 0.10 1 15 650 1.288308 0.5222678
## 0.10 1 15 700 1.292401 0.5206324
## 0.10 1 15 750 1.295222 0.5192267
## 0.10 1 15 800 1.298175 0.5181128
## 0.10 1 15 850 1.301562 0.5166859
## 0.10 1 15 900 1.304039 0.5156219
## 0.10 1 15 950 1.305862 0.5147897
## 0.10 1 15 1000 1.306305 0.5147948
## 0.10 3 5 100 1.190002 0.5696843
## 0.10 3 5 150 1.189779 0.5717183
## 0.10 3 5 200 1.189944 0.5720633
## 0.10 3 5 250 1.189752 0.5726229
## 0.10 3 5 300 1.189222 0.5731061
## 0.10 3 5 350 1.188799 0.5735828
## 0.10 3 5 400 1.188402 0.5738628
## 0.10 3 5 450 1.188522 0.5738511
## 0.10 3 5 500 1.188462 0.5739148
## 0.10 3 5 550 1.188441 0.5739639
## 0.10 3 5 600 1.188515 0.5739313
## 0.10 3 5 650 1.188487 0.5739563
## 0.10 3 5 700 1.188481 0.5739768
## 0.10 3 5 750 1.188473 0.5739828
## 0.10 3 5 800 1.188478 0.5739818
## 0.10 3 5 850 1.188480 0.5739827
## 0.10 3 5 900 1.188478 0.5739851
## 0.10 3 5 950 1.188474 0.5739898
## 0.10 3 5 1000 1.188474 0.5739917
## 0.10 3 10 100 1.190084 0.5713781
## 0.10 3 10 150 1.194718 0.5702439
## 0.10 3 10 200 1.197668 0.5691288
## 0.10 3 10 250 1.199619 0.5684872
## 0.10 3 10 300 1.200438 0.5682853
## 0.10 3 10 350 1.201222 0.5681834
## 0.10 3 10 400 1.201570 0.5681051
## 0.10 3 10 450 1.202104 0.5679376
## 0.10 3 10 500 1.202246 0.5679064
## 0.10 3 10 550 1.202466 0.5678422
## 0.10 3 10 600 1.202849 0.5677302
## 0.10 3 10 650 1.202896 0.5677351
## 0.10 3 10 700 1.203093 0.5676101
## 0.10 3 10 750 1.203230 0.5675682
## 0.10 3 10 800 1.203400 0.5674910
## 0.10 3 10 850 1.203595 0.5673902
## 0.10 3 10 900 1.203634 0.5673770
## 0.10 3 10 950 1.203707 0.5673578
## 0.10 3 10 1000 1.203786 0.5673294
## 0.10 3 15 100 1.192378 0.5702510
## 0.10 3 15 150 1.196793 0.5705297
## 0.10 3 15 200 1.198012 0.5711307
## 0.10 3 15 250 1.199340 0.5711871
## 0.10 3 15 300 1.202913 0.5697418
## 0.10 3 15 350 1.204230 0.5694243
## 0.10 3 15 400 1.206219 0.5686282
## 0.10 3 15 450 1.206436 0.5686687
## 0.10 3 15 500 1.206883 0.5686931
## 0.10 3 15 550 1.207613 0.5683935
## 0.10 3 15 600 1.208033 0.5683882
## 0.10 3 15 650 1.208112 0.5684069
## 0.10 3 15 700 1.208334 0.5683095
## 0.10 3 15 750 1.208864 0.5680689
## 0.10 3 15 800 1.208870 0.5681159
## 0.10 3 15 850 1.209229 0.5679685
## 0.10 3 15 900 1.209657 0.5677932
## 0.10 3 15 950 1.209915 0.5677082
## 0.10 3 15 1000 1.210098 0.5676514
## 0.10 5 5 100 1.181944 0.5751687
## 0.10 5 5 150 1.185614 0.5735566
## 0.10 5 5 200 1.188395 0.5722569
## 0.10 5 5 250 1.189282 0.5718395
## 0.10 5 5 300 1.190031 0.5714649
## 0.10 5 5 350 1.190367 0.5713403
## 0.10 5 5 400 1.190781 0.5711198
## 0.10 5 5 450 1.191008 0.5710021
## 0.10 5 5 500 1.191166 0.5709016
## 0.10 5 5 550 1.191267 0.5708551
## 0.10 5 5 600 1.191357 0.5708106
## 0.10 5 5 650 1.191448 0.5707583
## 0.10 5 5 700 1.191497 0.5707355
## 0.10 5 5 750 1.191534 0.5707107
## 0.10 5 5 800 1.191549 0.5707052
## 0.10 5 5 850 1.191562 0.5706985
## 0.10 5 5 900 1.191585 0.5706840
## 0.10 5 5 950 1.191593 0.5706808
## 0.10 5 5 1000 1.191600 0.5706771
## 0.10 5 10 100 1.195691 0.5682091
## 0.10 5 10 150 1.198749 0.5674704
## 0.10 5 10 200 1.199956 0.5672834
## 0.10 5 10 250 1.203219 0.5659653
## 0.10 5 10 300 1.204993 0.5654169
## 0.10 5 10 350 1.205576 0.5652971
## 0.10 5 10 400 1.206799 0.5647351
## 0.10 5 10 450 1.208084 0.5642656
## 0.10 5 10 500 1.208453 0.5641448
## 0.10 5 10 550 1.208928 0.5639762
## 0.10 5 10 600 1.209119 0.5639881
## 0.10 5 10 650 1.209361 0.5639385
## 0.10 5 10 700 1.209857 0.5636995
## 0.10 5 10 750 1.209961 0.5637183
## 0.10 5 10 800 1.210118 0.5636829
## 0.10 5 10 850 1.210355 0.5635753
## 0.10 5 10 900 1.210338 0.5636303
## 0.10 5 10 950 1.210507 0.5635612
## 0.10 5 10 1000 1.210608 0.5635399
## 0.10 5 15 100 1.178704 0.5796249
## 0.10 5 15 150 1.183493 0.5776990
## 0.10 5 15 200 1.188591 0.5756666
## 0.10 5 15 250 1.193852 0.5732207
## 0.10 5 15 300 1.197266 0.5720600
## 0.10 5 15 350 1.199637 0.5710328
## 0.10 5 15 400 1.201292 0.5704473
## 0.10 5 15 450 1.202853 0.5699279
## 0.10 5 15 500 1.204518 0.5690862
## 0.10 5 15 550 1.204733 0.5691566
## 0.10 5 15 600 1.206124 0.5685310
## 0.10 5 15 650 1.206758 0.5683214
## 0.10 5 15 700 1.207717 0.5679098
## 0.10 5 15 750 1.208639 0.5675066
## 0.10 5 15 800 1.209546 0.5670273
## 0.10 5 15 850 1.209869 0.5668843
## 0.10 5 15 900 1.210593 0.5665626
## 0.10 5 15 950 1.211140 0.5663361
## 0.10 5 15 1000 1.211330 0.5662371
## 0.10 7 5 100 1.180369 0.5799249
## 0.10 7 5 150 1.184182 0.5776166
## 0.10 7 5 200 1.185016 0.5770471
## 0.10 7 5 250 1.186530 0.5761418
## 0.10 7 5 300 1.187020 0.5759295
## 0.10 7 5 350 1.187771 0.5755367
## 0.10 7 5 400 1.187996 0.5754255
## 0.10 7 5 450 1.188263 0.5753154
## 0.10 7 5 500 1.188462 0.5752086
## 0.10 7 5 550 1.188570 0.5751475
## 0.10 7 5 600 1.188650 0.5750996
## 0.10 7 5 650 1.188711 0.5750699
## 0.10 7 5 700 1.188735 0.5750596
## 0.10 7 5 750 1.188734 0.5750637
## 0.10 7 5 800 1.188754 0.5750532
## 0.10 7 5 850 1.188764 0.5750463
## 0.10 7 5 900 1.188769 0.5750425
## 0.10 7 5 950 1.188776 0.5750390
## 0.10 7 5 1000 1.188779 0.5750380
## 0.10 7 10 100 1.178993 0.5819350
## 0.10 7 10 150 1.185590 0.5781003
## 0.10 7 10 200 1.188829 0.5769821
## 0.10 7 10 250 1.190866 0.5764092
## 0.10 7 10 300 1.191734 0.5761238
## 0.10 7 10 350 1.193329 0.5755589
## 0.10 7 10 400 1.194585 0.5749792
## 0.10 7 10 450 1.195277 0.5746825
## 0.10 7 10 500 1.195932 0.5744762
## 0.10 7 10 550 1.196230 0.5744514
## 0.10 7 10 600 1.196455 0.5743546
## 0.10 7 10 650 1.196722 0.5742549
## 0.10 7 10 700 1.196934 0.5741910
## 0.10 7 10 750 1.197172 0.5741195
## 0.10 7 10 800 1.197246 0.5741130
## 0.10 7 10 850 1.197464 0.5740240
## 0.10 7 10 900 1.197592 0.5739798
## 0.10 7 10 950 1.197692 0.5739649
## 0.10 7 10 1000 1.197754 0.5739479
## 0.10 7 15 100 1.198777 0.5673261
## 0.10 7 15 150 1.207289 0.5635011
## 0.10 7 15 200 1.214747 0.5601300
## 0.10 7 15 250 1.217301 0.5594996
## 0.10 7 15 300 1.220062 0.5583290
## 0.10 7 15 350 1.220438 0.5586161
## 0.10 7 15 400 1.222647 0.5578167
## 0.10 7 15 450 1.225133 0.5568037
## 0.10 7 15 500 1.225759 0.5567022
## 0.10 7 15 550 1.226756 0.5564129
## 0.10 7 15 600 1.227396 0.5562497
## 0.10 7 15 650 1.227656 0.5562340
## 0.10 7 15 700 1.227783 0.5561985
## 0.10 7 15 750 1.227980 0.5563385
## 0.10 7 15 800 1.228233 0.5562892
## 0.10 7 15 850 1.228693 0.5562098
## 0.10 7 15 900 1.229179 0.5559805
## 0.10 7 15 950 1.229341 0.5559678
## 0.10 7 15 1000 1.229516 0.5558708
## MAE
## 1.1424950
## 1.0686025
## 1.0164939
## 0.9808341
## 0.9561673
## 0.9400430
## 0.9286990
## 0.9206187
## 0.9158850
## 0.9119560
## 0.9090607
## 0.9073625
## 0.9060124
## 0.9040963
## 0.9034880
## 0.9028511
## 0.9013382
## 0.9005586
## 0.9000604
## 1.1432606
## 1.0683661
## 1.0168096
## 0.9815914
## 0.9568267
## 0.9392462
## 0.9273855
## 0.9197558
## 0.9136102
## 0.9099429
## 0.9084025
## 0.9068871
## 0.9065525
## 0.9047739
## 0.9040002
## 0.9037246
## 0.9029454
## 0.9034623
## 0.9024870
## 1.1460624
## 1.0722843
## 1.0214830
## 0.9841271
## 0.9588356
## 0.9424207
## 0.9300933
## 0.9212539
## 0.9147480
## 0.9113380
## 0.9092880
## 0.9059338
## 0.9042248
## 0.9023034
## 0.9020961
## 0.9018920
## 0.9016930
## 0.9016536
## 0.9009930
## 1.0622740
## 0.9875508
## 0.9458926
## 0.9192849
## 0.9050165
## 0.8954825
## 0.8877221
## 0.8823518
## 0.8785210
## 0.8752011
## 0.8729265
## 0.8712103
## 0.8703954
## 0.8692063
## 0.8682261
## 0.8674051
## 0.8671225
## 0.8665202
## 0.8659361
## 1.0621652
## 0.9894472
## 0.9453027
## 0.9205368
## 0.9043671
## 0.8956300
## 0.8911413
## 0.8880324
## 0.8853708
## 0.8837849
## 0.8827831
## 0.8820462
## 0.8812126
## 0.8815948
## 0.8822040
## 0.8821222
## 0.8813246
## 0.8815469
## 0.8815469
## 1.0797430
## 1.0072603
## 0.9600237
## 0.9309269
## 0.9154293
## 0.9034928
## 0.8960298
## 0.8915690
## 0.8893401
## 0.8877994
## 0.8852178
## 0.8840530
## 0.8834601
## 0.8835577
## 0.8836217
## 0.8843213
## 0.8844772
## 0.8845634
## 0.8851495
## 1.0369029
## 0.9643489
## 0.9232863
## 0.9013271
## 0.8885079
## 0.8800765
## 0.8751722
## 0.8718609
## 0.8692269
## 0.8677228
## 0.8664806
## 0.8652684
## 0.8641385
## 0.8634712
## 0.8630532
## 0.8625288
## 0.8617187
## 0.8613411
## 0.8610208
## 1.0484360
## 0.9763510
## 0.9354307
## 0.9143534
## 0.9000359
## 0.8917198
## 0.8861524
## 0.8845001
## 0.8825342
## 0.8812110
## 0.8811898
## 0.8804125
## 0.8794040
## 0.8785893
## 0.8782795
## 0.8784604
## 0.8790102
## 0.8787967
## 0.8790023
## 1.0779841
## 1.0073288
## 0.9615489
## 0.9316940
## 0.9156967
## 0.9048267
## 0.8983086
## 0.8952429
## 0.8924549
## 0.8902415
## 0.8890821
## 0.8877679
## 0.8870491
## 0.8867980
## 0.8864725
## 0.8860843
## 0.8856299
## 0.8864765
## 0.8863267
## 1.0337338
## 0.9603255
## 0.9178515
## 0.8967451
## 0.8851257
## 0.8775914
## 0.8736101
## 0.8706898
## 0.8689023
## 0.8677433
## 0.8669088
## 0.8660669
## 0.8652369
## 0.8647240
## 0.8644753
## 0.8644863
## 0.8644081
## 0.8642880
## 0.8641404
## 1.0565859
## 0.9812584
## 0.9358693
## 0.9122612
## 0.8990248
## 0.8906103
## 0.8871040
## 0.8835327
## 0.8815662
## 0.8804837
## 0.8792205
## 0.8789500
## 0.8783276
## 0.8780553
## 0.8782120
## 0.8777426
## 0.8780669
## 0.8782918
## 0.8784334
## 1.0805092
## 1.0078190
## 0.9627128
## 0.9329611
## 0.9162375
## 0.9058814
## 0.8992271
## 0.8952491
## 0.8913521
## 0.8893864
## 0.8877361
## 0.8864120
## 0.8855812
## 0.8859708
## 0.8859094
## 0.8864832
## 0.8864446
## 0.8868725
## 0.8866267
## 0.9160187
## 0.9155826
## 0.9180710
## 0.9188470
## 0.9191749
## 0.9205172
## 0.9227432
## 0.9239313
## 0.9254684
## 0.9244353
## 0.9250027
## 0.9235421
## 0.9244529
## 0.9252820
## 0.9255632
## 0.9256766
## 0.9258614
## 0.9266159
## 0.9267356
## 0.9002566
## 0.9040853
## 0.9064518
## 0.9086384
## 0.9136976
## 0.9168332
## 0.9201566
## 0.9228847
## 0.9265064
## 0.9287910
## 0.9292985
## 0.9302784
## 0.9320921
## 0.9328748
## 0.9339292
## 0.9355963
## 0.9362687
## 0.9364716
## 0.9374596
## 0.9298301
## 0.9305589
## 0.9373585
## 0.9474787
## 0.9533674
## 0.9573119
## 0.9586143
## 0.9611567
## 0.9656314
## 0.9686461
## 0.9721389
## 0.9769001
## 0.9806006
## 0.9831239
## 0.9851294
## 0.9883123
## 0.9901816
## 0.9913247
## 0.9922162
## 0.8874077
## 0.8864151
## 0.8857647
## 0.8851202
## 0.8845576
## 0.8837830
## 0.8831549
## 0.8831169
## 0.8829661
## 0.8829562
## 0.8830049
## 0.8829858
## 0.8829626
## 0.8829461
## 0.8829447
## 0.8829491
## 0.8829451
## 0.8829431
## 0.8829418
## 0.8967754
## 0.8993229
## 0.9018629
## 0.9031527
## 0.9042799
## 0.9052528
## 0.9058830
## 0.9061335
## 0.9064190
## 0.9065086
## 0.9068161
## 0.9068269
## 0.9070967
## 0.9073435
## 0.9075529
## 0.9077845
## 0.9078483
## 0.9079737
## 0.9080501
## 0.9039329
## 0.9059087
## 0.9046035
## 0.9038809
## 0.9053481
## 0.9059438
## 0.9065338
## 0.9069552
## 0.9071976
## 0.9074812
## 0.9076078
## 0.9075557
## 0.9076171
## 0.9080113
## 0.9079363
## 0.9082412
## 0.9085548
## 0.9088873
## 0.9089193
## 0.8853046
## 0.8863368
## 0.8874662
## 0.8873093
## 0.8876844
## 0.8877806
## 0.8879834
## 0.8880616
## 0.8881333
## 0.8881617
## 0.8881971
## 0.8882501
## 0.8882777
## 0.8882885
## 0.8882929
## 0.8882980
## 0.8883132
## 0.8883136
## 0.8883186
## 0.9049646
## 0.9064228
## 0.9060315
## 0.9078894
## 0.9091167
## 0.9094791
## 0.9105971
## 0.9113784
## 0.9118732
## 0.9123970
## 0.9125617
## 0.9127424
## 0.9132669
## 0.9132655
## 0.9133583
## 0.9134547
## 0.9134267
## 0.9135285
## 0.9135814
## 0.8974175
## 0.8971247
## 0.8984428
## 0.9012319
## 0.9036256
## 0.9055068
## 0.9064895
## 0.9075488
## 0.9088913
## 0.9090566
## 0.9101083
## 0.9104716
## 0.9114942
## 0.9121050
## 0.9128210
## 0.9130568
## 0.9135046
## 0.9139611
## 0.9141048
## 0.8855182
## 0.8853238
## 0.8852935
## 0.8861529
## 0.8865381
## 0.8871750
## 0.8872716
## 0.8873792
## 0.8875097
## 0.8875275
## 0.8875514
## 0.8875721
## 0.8875673
## 0.8875506
## 0.8875704
## 0.8875720
## 0.8875713
## 0.8875745
## 0.8875744
## 0.8817915
## 0.8867196
## 0.8882159
## 0.8888184
## 0.8901346
## 0.8917135
## 0.8928126
## 0.8933995
## 0.8937912
## 0.8942770
## 0.8947290
## 0.8949682
## 0.8952665
## 0.8954472
## 0.8956582
## 0.8958679
## 0.8960044
## 0.8960695
## 0.8961230
## 0.8990625
## 0.9047394
## 0.9095113
## 0.9097451
## 0.9124456
## 0.9129181
## 0.9147280
## 0.9166897
## 0.9173845
## 0.9181225
## 0.9188402
## 0.9192195
## 0.9192111
## 0.9194587
## 0.9196979
## 0.9201118
## 0.9206616
## 0.9208454
## 0.9211276
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 700, interaction.depth =
## 5, shrinkage = 0.01 and n.minobsinnode = 5.
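The MAE column above has 456 entries, one per parameter combination. A tuning grid of that size can be sketched as below. The vectors are inferred from the printed rows, and `gbmGrid` is a hypothetical name rather than one taken from the original code.

```r
# Hypothetical reconstruction of the tuning grid implied by the printout.
# Values are inferred from the visible rows, not copied from the original code.
gbmGrid <- expand.grid(
  shrinkage         = c(0.01, 0.10),
  interaction.depth = c(1, 3, 5, 7),
  n.minobsinnode    = c(5, 10, 15),
  n.trees           = seq(100, 1000, by = 50)
)

# Number of candidate models evaluated by resampling
nrow(gbmGrid)
```

A grid like this would normally be supplied to `caret::train()` through its `tuneGrid` argument.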
(a) Which tree-based regression model gives the optimal resampling and test set performance?
headers <- c("RMSE","Rsquared","MAE","Model")
df1 <- as.data.frame(rpartRes) |> transpose() |> as.data.frame()
df1$Model <- "Single Tree"
df2 <- as.data.frame(modelTreeRes) |> transpose() |> as.data.frame()
df2$Model <- "Model Trees"
df3 <- as.data.frame(cubistModelRes) |> transpose() |> as.data.frame()
df3$Model <- "Cubist"
df4 <- as.data.frame(rfModelRes) |> transpose() |> as.data.frame()
df4$Model <- "Random Forest"
df5 <- as.data.frame(boostedModelRes) |> transpose() |> as.data.frame()
df5$Model <- "Gradient Boosted Tree"
names(df1) <- headers
names(df2) <- headers
names(df3) <- headers
names(df4) <- headers
names(df5) <- headers
bind_rows(
df1,
df2,
df3,
df4,
df5
)
## RMSE Rsquared MAE Model
## 1 1.496860 0.4162677 1.1601514 Single Tree
## 2 1.464335 0.4431539 1.1980594 Model Trees
## 3 1.035197 0.7032927 0.8366871 Cubist
## 4 1.385352 0.5079031 1.1242788 Random Forest
## 5 1.258575 0.5788920 1.0227603 Gradient Boosted Tree
The Cubist model performs best on both the resampling and test set metrics. In resampling, it achieves the lowest RMSE (0.9202), the highest R-squared (0.7435), and the lowest MAE (0.7208) with 20 committees and 5 neighbors. On the test set it again has the lowest RMSE (1.0352), the highest R-squared (0.7033), and the lowest MAE (0.8367), well ahead of the next-best model, the gradient boosted tree (RMSE 1.2586). The test RMSE is somewhat higher than the resampled estimate, as is typical, but Cubist remains clearly on top, making it the most suitable choice for this analysis.
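The comparison table for part (a) can also be assembled in one step. The sketch below is a possible alternative, not the code used above: `res_list` is a hypothetical stand-in for the five result vectors (`rpartRes`, `modelTreeRes`, and so on), filled with the test-set values printed in the table.

```r
# Hypothetical named list standing in for the five *Res objects;
# the numbers are copied from the test-set table above.
res_list <- list(
  "Single Tree"           = c(RMSE = 1.496860, Rsquared = 0.4162677, MAE = 1.1601514),
  "Model Trees"           = c(RMSE = 1.464335, Rsquared = 0.4431539, MAE = 1.1980594),
  "Cubist"                = c(RMSE = 1.035197, Rsquared = 0.7032927, MAE = 0.8366871),
  "Random Forest"         = c(RMSE = 1.385352, Rsquared = 0.5079031, MAE = 1.1242788),
  "Gradient Boosted Tree" = c(RMSE = 1.258575, Rsquared = 0.5788920, MAE = 1.0227603)
)

# Stack the vectors: one row per model, one column per metric
comparison <- as.data.frame(do.call(rbind, res_list))
comparison$Model <- rownames(comparison)
rownames(comparison) <- NULL

# Sort by test RMSE so the best model comes first
comparison <- comparison[order(comparison$RMSE), ]
best_model <- comparison$Model[1]
```

Sorting by RMSE makes the ranking explicit instead of leaving it to visual inspection of the table.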
(b) Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
# Extract the Cubist variable importance scores and sort them in descending order
predictorImportance <- varImp(cubistModelTune)$importance
predictorImportance$Name <- rownames(predictorImportance)
predictorImportance <- predictorImportance[order(-predictorImportance$Overall),]
rownames(predictorImportance) <- NULL
predictorImportance |> head(10) |> kable() |> kable_styling() |> kable_classic(full_width = F)
| Overall | Name |
|---|---|
| 100.00000 | ManufacturingProcess32 |
| 55.55556 | BiologicalMaterial06 |
| 45.37037 | ManufacturingProcess09 |
| 37.96296 | ManufacturingProcess33 |
| 37.03704 | BiologicalMaterial02 |
| 31.48148 | ManufacturingProcess17 |
| 25.00000 | BiologicalMaterial03 |
| 25.00000 | ManufacturingProcess04 |
| 23.14815 | BiologicalMaterial11 |
| 21.29630 | ManufacturingProcess29 |
plot(varImp(cubistModelTune), top=10, main="Cubist")
Manufacturing process variables account for six of the top ten predictors in the optimal tree-based regression model, led by ManufacturingProcess32 (100.00) and followed by ManufacturingProcess09 (45.37), ManufacturingProcess33 (37.96), and ManufacturingProcess17 (31.48). Biological variables still play a substantial role: BiologicalMaterial06 ranks second overall with a score of 55.56, and BiologicalMaterial02 (37.04), BiologicalMaterial03 (25.00), and BiologicalMaterial11 (23.15) also appear in the list. Both types of variables contribute meaningfully, but process variables dominate overall, and ManufacturingProcess32 stands well apart from every other predictor.
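The split between process and biological variables can be checked by counting name prefixes. This is a minimal sketch that hard-codes the ten names from the table above; `top10` and `type_counts` are illustrative names.

```r
# Top 10 Cubist predictors, copied from the importance table above
top10 <- c(
  "ManufacturingProcess32", "BiologicalMaterial06", "ManufacturingProcess09",
  "ManufacturingProcess33", "BiologicalMaterial02", "ManufacturingProcess17",
  "BiologicalMaterial03", "ManufacturingProcess04", "BiologicalMaterial11",
  "ManufacturingProcess29"
)

# Classify each predictor by its name prefix and tabulate the two groups
type <- ifelse(grepl("^Manufacturing", top10), "Process", "Biological")
type_counts <- table(type)
type_counts
```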
plot(varImp(rpartTune), top=10,main="Single Tree")
plot(varImp(modelTreeTune), top=10,main="Model Tree")
plot(varImp(rfModel), top=10, main="Random Forest")
plot(varImp(boostedModel), top=10,main="Gradient Boosted")
The top 10 important predictors in the other models share similarities with the Cubist model but also exhibit distinct differences in how importance is distributed. Across all models, ManufacturingProcess32 consistently ranks as the most important predictor, confirming its critical role in the dataset. However, the treatment of secondary predictors, such as BiologicalMaterial06, ManufacturingProcess09, and others, varies across models, reflecting their underlying methodologies.
In the Cubist model, the top 10 predictors are well-distributed, with secondary predictors like BiologicalMaterial06 (55.56), ManufacturingProcess09 (45.37), and BiologicalMaterial02 (37.04) maintaining significant importance, albeit at levels noticeably below ManufacturingProcess32. This balance highlights Cubist’s ability to capture contributions from multiple variables while maintaining clear differentiation.
The gradient boosted model prioritizes ManufacturingProcess32 (100.00) but assigns nearly uniform and low importance (<20) to other predictors, showing less differentiation within the top 10. This suggests a focus on marginal improvements from weaker predictors rather than emphasizing a hierarchy, contrasting with Cubist’s more structured importance ranking.
The random forest model aligns more closely with Cubist in terms of the key predictors identified, such as BiologicalMaterial06 (~37) and BiologicalMaterial03 (~27), but it exhibits a steeper decline in importance after the top variables. Predictors beyond the top three fall between 10 and 20, indicating a sharper focus on a few dominant features while capturing less contribution from weaker ones compared to Cubist.
The model trees approach assigns much higher importance to secondary predictors than the other models. For example, BiologicalMaterial06 (90) and variables like ManufacturingProcess09 and BiologicalMaterial03 (50-75) remain significant. The smaller gap between the top and secondary predictors gives a flatter ranking than Cubist's, where importance tapers gradually across variables.
In the single tree model, the importance is concentrated among fewer variables, with BiologicalMaterial06 (~85), BiologicalMaterial03 (~85), and ManufacturingProcess31 (~73) receiving substantial weight. This reflects the simplicity of single trees, which rely heavily on a limited number of dominant predictors. Unlike Cubist or random forests, a single tree does not average over many trees or committees, so its ranking depends on one particular set of splits and gives a less nuanced importance distribution.
Overall, while all models agree on the dominance of ManufacturingProcess32, their treatment of secondary predictors varies. Cubist excels at balancing importance across multiple predictors, providing nuanced insights into their contributions. Gradient boosting focuses heavily on the strongest predictor, with less differentiation among others. Random forests and model trees give moderate emphasis to secondary predictors but with less balance than Cubist. The single tree simplifies the ranking further, concentrating importance on a few dominant variables. These differences illustrate how model methodologies influence the perceived importance of the top 10 predictors.
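One way to make such comparisons concrete is to intersect the top-n predictor names from two importance tables. The helper below is a sketch under the assumption that each table has the shape of `varImp(model)$importance` (row names are predictors, one `Overall` column). The function name `top_names` and the toy tables are invented for illustration and carry no results from this analysis.

```r
# Return the names of the n highest-scoring predictors from an importance
# data frame whose row names are predictors and whose score column is Overall.
top_names <- function(imp, n = 10) {
  rownames(imp)[order(-imp$Overall)][seq_len(min(n, nrow(imp)))]
}

# Toy importance tables with made-up predictors and scores (illustration only)
imp_a <- data.frame(Overall = c(100, 60, 40, 10),
                    row.names = c("P1", "P2", "P3", "P4"))
imp_b <- data.frame(Overall = c(100, 5, 70, 30),
                    row.names = c("P1", "P2", "P4", "P5"))

# Predictors that appear in the top 3 of both tables
shared <- intersect(top_names(imp_a, 3), top_names(imp_b, 3))
shared
```

With the fitted models available, the same helper could be applied to the importance tables of any two of them.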
(c) Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
set.seed(100)
pred <- predictorImportance |> select(Name) |> unique() |> head(10)
chermicalSub <- chermicalTrain |> select(all_of(pred$Name))
chermicalSub$Yield <- chermicalTrain$Yield
cart_model <- rpart(Yield ~ ., data = chermicalSub)
plot(as.party(cart_model),ip_args = list(abbreviate = 4), gp = gpar(fontsize = 8))
This tree was fit with rpart on the ten most important Cubist predictors rather than extracted from the tuned single-tree model, so it shows how those ten variables partition yield. Process variables, particularly ManufacturingProcess32, dominate its structure, with additional splits on ManufacturingProcess17 and ManufacturingProcess09 refining the predictions. These variables establish critical thresholds, such as ManufacturingProcess32 < 158.5, which segment the data and highlight their dominant role in influencing yield.
Biological variables, such as BiologicalMaterial11 and BiologicalMaterial06, also play a meaningful role, especially in refining subsets of data. For instance, BiologicalMaterial11 is critical when ManufacturingProcess32 is less than 158.5, and BiologicalMaterial06 influences predictions when ManufacturingProcess32 is greater than or equal to 158.5. These variables complement the process predictors, providing a more nuanced understanding of yield variability.
The tree highlights key interactions between biological and process predictors, showing that their combined effects significantly shape yield outcomes. Thresholds and segmentation within the tree provide actionable insights, with biological variables adding precision where process variables alone are insufficient. Overall, the tree demonstrates that while process variables dominate, the interplay with biological predictors is crucial for accurate yield predictions.
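The terminal-node distributions shown in the plot can also be summarized numerically. The sketch below uses the built-in `mtcars` data as a stand-in, since the chemical data are not reproduced here. `leaf_summary` and `split_vars` are illustrative names.

```r
library(rpart)

# mtcars is a stand-in dataset; in the analysis above the data would be
# chermicalSub with Yield as the response.
fit <- rpart(mpg ~ ., data = mtcars)

# fit$where gives, for each training row, the row of fit$frame that holds
# the terminal node the observation falls into.
leaf <- fit$where

# Size, mean, and spread of the response within each terminal node
leaf_summary <- data.frame(
  n    = as.vector(table(leaf)),
  mean = as.vector(tapply(mtcars$mpg, leaf, mean)),
  sd   = as.vector(tapply(mtcars$mpg, leaf, sd))
)
rownames(leaf_summary) <- rownames(fit$frame)[sort(unique(leaf))]  # node numbers

# Variables actually used for splitting
split_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

leaf_summary
split_vars
```

Listing the split variables alongside the per-node summaries shows which predictors the tree relies on and how far apart the resulting groups are.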