# Import required R libraries
library(AppliedPredictiveModeling)
library(tidyverse)
#library(pls)
#library(elasticnet)
#library(corrplot)
# libraries for Chapter 8
library(caret)
library(Cubist)
library(gbm)
library(ipred)
library(party)
library(partykit)
library(randomForest)
library(rpart)
library(RWeka)
library(kableExtra)
# Set seed for assignment
set.seed(200)
Recreate the simulated data from Exercise 7.2:
library(mlbench)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
#simulated
Fit a random forest model to all of the predictors, then estimate the variable importance scores:
model1 <- randomForest(y ~ .,
                       data = simulated,
                       importance = TRUE,
                       ntree = 1000)

rfImp1 <- varImp(model1, scale = FALSE)
rfImp1
## Overall
## V1 8.83890885
## V2 6.49023056
## V3 0.67583163
## V4 7.58822553
## V5 2.27426009
## V6 0.17436781
## V7 0.15136583
## V8 -0.03078937
## V9 -0.02989832
## V10 -0.08529218
Did the random forest model significantly use the uninformative predictors (V6 – V10)?
Answer: No. Based on the variable importance output above, the uninformative predictors (V6–V10) all score very close to zero.
Note: The importance value of V1 is 8.83890885, which is the highest score.
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
## [1] 0.9396216
Result: The correlation between V1 and duplicate1 is about 0.94, so the two are highly correlated.
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
model2 <- randomForest(y ~ .,
                       data = simulated,
                       importance = TRUE,
                       ntree = 1000)

rfImp2 <- varImp(model2, scale = FALSE)
rfImp2
## Overall
## V1 6.29780744
## V2 6.08038134
## V3 0.58410718
## V4 6.93924427
## V5 2.03104094
## V6 0.07947642
## V7 -0.02566414
## V8 -0.11007435
## V9 -0.08839463
## V10 -0.00715093
## duplicate1 3.56411581
Answer: The importance value of V1 decreased from 8.83890885 to 6.29780744 with the addition of the duplicate1 variable, a drop of almost 29%.
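A quick arithmetic check of that drop, using the importance values printed above:
# Relative drop in V1 importance after adding duplicate1
(8.83890885 - 6.29780744) / 8.83890885 # roughly 0.287, i.e. a drop of almost 29%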
simulated$duplicate2 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate2, simulated$V1)
## [1] 0.9312569
Result: The correlation between V1 and duplicate2 is also above 0.93.
model3 <- randomForest(y ~ .,
                       data = simulated,
                       importance = TRUE,
                       ntree = 1000)

rfImp3 <- varImp(model3, scale = FALSE)
rfImp3
## Overall
## V1 5.656397024
## V2 6.957366954
## V3 0.539700105
## V4 7.280227792
## V5 2.094226861
## V6 0.141163232
## V7 0.092792498
## V8 -0.096325566
## V9 -0.007463533
## V10 0.016839393
## duplicate1 2.566313355
## duplicate2 2.654958084
Answer: Again, the importance of V1 has decreased, this time from 6.29780744 to 5.656397024, with the addition of a second highly correlated variable.
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
bagCtrl <- cforest_control(mtry = ncol(simulated) - 1)

model4 <- party::cforest(y ~ ., data = simulated, controls = bagCtrl)

rfImp4_condFalse <- party::varimp(model4, conditional = FALSE)
rfImp4_condTrue <- party::varimp(model4, conditional = TRUE)
rfImp4_condFalse
## V1 V2 V3 V4 V5 V6
## 7.510642428 7.879691281 0.009474249 10.471205233 2.392096372 -0.023157745
## V7 V8 V9 V10 duplicate1 duplicate2
## 0.050012089 -0.045518711 -0.006739151 0.016405977 1.004973199 1.090971178
rfImp4_condTrue
## V1 V2 V3 V4 V5
## 1.1272658632 4.5746654682 0.0047509127 5.8358626840 0.8328357246
## V6 V7 V8 V9 V10
## 0.0205719357 0.0213153736 -0.0038610208 -0.0008314953 0.0173073153
## duplicate1 duplicate2
## 0.0559177535 0.0660916006
Answer: When the conditional parameter is set to FALSE, the importance of V1 is roughly in line with the models above, at about 7.51. When conditional is TRUE, however, the importance of V1 is much lower, at about 1.13, and does not follow the pattern of the traditional random forest model.
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
gbmModel <- gbm(y ~ .,
                data = simulated,
                distribution = "gaussian")
summary.gbm(gbmModel)
## var rel.inf
## V4 V4 28.8808582
## V2 V2 25.2042271
## V1 V1 15.6087062
## V5 V5 11.3450756
## duplicate1 duplicate1 8.4640500
## V3 V3 6.9876974
## duplicate2 duplicate2 2.8899739
## V6 V6 0.4534957
## V7 V7 0.1659160
## V8 V8 0.0000000
## V9 V9 0.0000000
## V10 V10 0.0000000
Answer: For boosted trees, a slightly different pattern emerges. Because of the two duplicate predictor variables, V1 has only the third-highest relative influence. The sum of V1 and the two duplicates totals 26.96% overall influence, which puts the combined total second. As expected, the uninformative predictors V6–V10 have little to no relative influence.
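A quick check of that combined figure, using the relative influence values from the summary above:
# Combined relative influence of V1 and its two duplicates
15.6087062 + 8.4640500 + 2.8899739 # roughly 26.96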
pred_vars <- simulated %>% select(-c(y))

cubistMod <- cubist(pred_vars, simulated$y, committees = 100)
#summary(cubistMod)
varImp(cubistMod)
## Overall
## V1 65.5
## V3 44.0
## V2 59.0
## V4 48.0
## V5 30.5
## duplicate1 7.5
## V6 4.0
## V8 1.5
## duplicate2 0.5
## V7 0.0
## V9 0.0
## V10 0.0
Answer: The Cubist model shows a different pattern than the models above: V1 has clearly the highest overall importance at 65.5 despite the inclusion of the two duplicate variables. As for the uninformative variables, the scores are close to zero, as expected.
Use a simulation to show tree bias with different granularities.
I’ve created three vectors, each of 100 values. The x1 vector contains values from 1-100 inclusive, and thus has the highest number of distinct values. The x2 vector contains values from 1-25 inclusive. The x3 vector contains values from 1-10 inclusive, and thus has the lowest number of distinct values.
set.seed(200)
x1 <- trunc(runif(100, 1, 100)) # 100 values between 1-100
x2 <- trunc(runif(100, 1, 25))  # 100 values between 1-25
x3 <- trunc(runif(100, 1, 10))  # 100 values between 1-10
y  <- trunc(runif(100, 1, 50))

df <- as.data.frame(cbind(y, x1, x2, x3))

rpartTree <- rpart(y ~ ., data = df)

sim_var_imp <- varImp(rpartTree)
sim_var_imp
## Overall
## x1 0.5076691
## x2 0.3711703
## x3 0.2530505
Answer: The variable importance results do reflect the tree bias expected from the textbook description: “trees suffer from selection bias: predictors with a higher number of distinct values are favored over more granular predictors.” Variable x1 has the highest number of distinct values and also the highest variable importance, while x3 has the lowest number of distinct values and the lowest variable importance.
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
Why does the model on the right focus its importance on just the first few predictors, whereas the model on the left spreads importance across more predictors?
From page 207 of the textbook, “The importance profile for boosting has a much steeper importance slope than the one for random forests. This is due to the fact that the trees from boosting are dependent on each other and hence will have correlated structures as the method follows by the gradient. Therefore many of the same predictors will be selected across the trees, increasing their contribution to the importance metric.”
Answer: With a high learning rate (0.9), the model on the right uses fewer predictors because a much larger fraction of each tree’s prediction is added at each iteration, and, as noted in the textbook passage above, the trees from boosting are dependent on each other. The higher learning rate leads to more correlated tree structures, which concentrates the importance among fewer variables.
With a low learning rate (0.1), the model on the left adds only a small fraction of each tree’s prediction at each iteration, making it likely that more predictors show importance in the overall model despite the correlation of the tree structures.
The bagging fraction also plays a role: a high bagging fraction of 0.9 means that each iteration trains on essentially the same data, again concentrating importance on a few variables. A lower bagging fraction allows more variables to play a role because the training samples differ from iteration to iteration.
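As a rough sketch (not the book’s exact code), the two extreme settings could be reproduced on the solubility data from the AppliedPredictiveModeling package; the n.trees and interaction.depth values here are my own assumptions, chosen only to illustrate the contrast:
# Sketch: fit boosted trees at the two extreme settings of Fig. 8.24
# (bag.fraction and shrinkage both 0.1 vs. both 0.9) on the solubility data
data(solubility)
solTrain <- cbind(solTrainXtrans, Solubility = solTrainY)
gbmLow  <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, interaction.depth = 7,
               shrinkage = 0.1, bag.fraction = 0.1)
gbmHigh <- gbm(Solubility ~ ., data = solTrain, distribution = "gaussian",
               n.trees = 100, interaction.depth = 7,
               shrinkage = 0.9, bag.fraction = 0.9)
# The high-setting model should concentrate its relative influence on far
# fewer predictors than the low-setting model
head(summary(gbmLow, plotit = FALSE), 10)
head(summary(gbmHigh, plotit = FALSE), 10)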
Which model do you think would be more predictive of other samples?
Answer: The model on the left with the lower learning rate and the lower bagging fraction should be more predictive of other samples. The smaller value of the learning rate typically works better. Regarding the bagging fraction, according to the textbook, the recommendation is 0.5, so given the options of 0.1 or 0.9, I think the lower bagging fraction in conjunction with the lower learning rate would be more predictive of the two options.
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Answer: I believe increasing the interaction depth would have a two-fold effect: the more important variables may actually increase in importance, and some variables may rise above zero importance. Overall, increasing the interaction depth (i.e., the tree depth) should lower the slope of predictor importance for either model. Deeper trees use more nodes, so more variables are used in each tree and therefore receive some importance. At the same time, the most important variables should see an increase in importance as they are re-used more frequently across trees. I would expect the interaction depth to have a greater impact on the right-hand plot, which currently has the steeper importance slope.
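To illustrate, a minimal sketch on the simulated data from earlier, holding everything constant except the interaction depth (the n.trees and shrinkage values are assumptions for illustration only):
# Sketch: compare importance profiles for shallow vs. deep boosted trees
gbmDepth1 <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 n.trees = 500, shrinkage = 0.1, interaction.depth = 1)
gbmDepth7 <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 n.trees = 500, shrinkage = 0.1, interaction.depth = 7)
# If the reasoning above holds, the depth-7 model should spread relative
# influence across more predictors, flattening the importance slope
summary(gbmDepth1, plotit = FALSE)
summary(gbmDepth7, plotit = FALSE)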
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
The data setup below is borrowed from my work in the previous assignment. The set.seed value is reset to match the previous assignment.
# Reset seed from previous assignment
set.seed(8675309) # Jenny, I got your number
# From Exercise 6.3
data(ChemicalManufacturingProcess)
cmp_data <- as.data.frame(ChemicalManufacturingProcess)
# Let's try to impute using preprocess function
# And make sure not to transform the 'Yield' column which is the result
cmp_preprocess_data <- preProcess(cmp_data[, -c(1)], method = "knnImpute")
cmp_full_data <- predict(cmp_preprocess_data, cmp_data[, -c(1)])
cmp_full_data$Yield <- cmp_data$Yield
# Identify near zero variance columns for removal
nzv_cols <- nearZeroVar(cmp_full_data)
length(nzv_cols)
## [1] 1
# From: https://stackoverflow.com/questions/28043393/nearzerovar-function-in-caret
if(length(nzv_cols) > 0) cmp_full_data <- cmp_full_data[, -nzv_cols]
dim(cmp_full_data)
## [1] 176 57
trainingRows <- createDataPartition(cmp_full_data$Yield, p = .80, list = FALSE)

# Training set
training_data <- cmp_full_data[trainingRows, ]

# Test set
test_data <- cmp_full_data[-trainingRows, ]
With the goal of assessing tree-based regression models on the chemical manufacturing process data, I have fit a single tree, model trees, a random forest, boosted trees, and a Cubist model.
The single tree approach is based on the CART methodology.
rpartTune <- train(Yield ~ .,
                   data = training_data,
                   method = "rpart2",
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"))
# Output model
rpartTune
## CART
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 130, 129, 129, 130, 130, 130, ...
## Resampling results across tuning parameters:
##
## maxdepth RMSE Rsquared MAE
## 1 1.389880 0.4653667 1.126537
## 2 1.431259 0.4171625 1.181201
## 3 1.443377 0.4014534 1.195729
## 4 1.412495 0.4282584 1.157631
## 5 1.405632 0.4258202 1.131534
## 6 1.448676 0.3904679 1.158047
## 7 1.447808 0.4022848 1.146806
## 8 1.436965 0.4090945 1.140129
## 10 1.439249 0.4099080 1.144641
## 11 1.448046 0.4072394 1.149807
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was maxdepth = 1.
# Plot model
ggplot(rpartTune) + labs(title="Single Tree Model With Tuning")
# Make predictions on Test set
rpartPred <- predict(rpartTune, newdata = test_data)
# Output prediction performance
rpart_test_perf <- postResample(pred = rpartPred, obs = test_data$Yield)
rpart_test_perf
## RMSE Rsquared MAE
## 1.5819650 0.3384325 1.2220962
# Variable importance
rpart_var_imp <- varImp(rpartTune)
rpart_var_imp
## rpart2 variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess36 68.37
## ManufacturingProcess31 60.51
## BiologicalMaterial12 60.35
## BiologicalMaterial03 59.76
## ManufacturingProcess11 0.00
## ManufacturingProcess25 0.00
## ManufacturingProcess19 0.00
## ManufacturingProcess34 0.00
## BiologicalMaterial11 0.00
## ManufacturingProcess14 0.00
## BiologicalMaterial09 0.00
## BiologicalMaterial04 0.00
## ManufacturingProcess06 0.00
## BiologicalMaterial02 0.00
## ManufacturingProcess44 0.00
## ManufacturingProcess37 0.00
## ManufacturingProcess26 0.00
## ManufacturingProcess43 0.00
## ManufacturingProcess13 0.00
Based on RMSE, the best-performing CART single tree had a depth of 1.
The model trees approach tunes over the rule-based version of the model as well as smoothing and pruning.
m5Tune <- train(Yield ~ .,
                data = training_data,
                method = "M5",
                trControl = trainControl(method = "cv"),
                control = Weka_control(M = 10))
# Output model
m5Tune
## Model Tree
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 132, 129, 128, 128, 132, 130, ...
## Resampling results across tuning parameters:
##
## pruned smoothed rules RMSE Rsquared MAE
## Yes Yes Yes 1.253332 0.5566076 1.005235
## Yes Yes No 1.260009 0.5778243 1.017559
## Yes No Yes 1.267118 0.5530795 1.021220
## Yes No No 1.263376 0.5708022 1.028094
## No Yes Yes 1.365960 0.5141380 1.113892
## No Yes No 1.306701 0.5485344 1.040095
## No No Yes 1.463067 0.4525543 1.139629
## No No No 1.672658 0.3686095 1.304882
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were pruned = Yes, smoothed = Yes and
## rules = Yes.
# Plot model
ggplot(m5Tune) + labs(title="Model Trees With Tuning")
# Make predictions on Test set
m5Pred <- predict(m5Tune, newdata = test_data)
# Output prediction performance
m5_test_perf <- postResample(pred = m5Pred, obs = test_data$Yield)
m5_test_perf
## RMSE Rsquared MAE
## 1.1786721 0.6393052 0.9437814
# Variable importance
m5_var_imp <- varImp(m5Tune)
m5_var_imp
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## BiologicalMaterial06 85.65
## ManufacturingProcess36 81.77
## ManufacturingProcess09 81.57
## ManufacturingProcess13 79.95
## BiologicalMaterial03 77.35
## ManufacturingProcess17 76.96
## ManufacturingProcess06 69.87
## BiologicalMaterial12 63.49
## ManufacturingProcess11 62.07
## ManufacturingProcess31 61.21
## BiologicalMaterial02 58.82
## BiologicalMaterial11 50.59
## ManufacturingProcess29 50.08
## BiologicalMaterial04 45.29
## ManufacturingProcess33 45.21
## BiologicalMaterial08 40.36
## ManufacturingProcess30 38.55
## ManufacturingProcess25 38.25
## ManufacturingProcess18 38.03
Based on RMSE, the best-performing model trees configuration uses pruned = Yes, smoothed = Yes, and rules = Yes.
The random forest approach relies on the standard randomForest implementation rather than conditional inference trees.
rfTune <- train(Yield ~ .,
                data = training_data,
                method = "rf",
                tuneLength = 10,
                trControl = trainControl(method = "cv"))
# Output model
rfTune
## Random Forest
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 130, 129, 130, 129, 128, 129, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.234097 0.6205612 1.0285958
## 8 1.126726 0.6593576 0.9196352
## 14 1.118264 0.6450870 0.9038643
## 20 1.117407 0.6459550 0.8943228
## 26 1.114241 0.6391623 0.8897950
## 32 1.124437 0.6303202 0.8929571
## 38 1.122500 0.6272881 0.8906916
## 44 1.131409 0.6227422 0.8909169
## 50 1.125099 0.6252333 0.8823751
## 56 1.146035 0.6100467 0.8963882
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 26.
# Plot model
ggplot(rfTune) + labs(title="Random Forest With Tuning")
# Make predictions on Test set
rfPred <- predict(rfTune, newdata = test_data)
# Output prediction performance
rf_test_perf <- postResample(pred = rfPred, obs = test_data$Yield)
rf_test_perf
## RMSE Rsquared MAE
## 1.2446461 0.6239050 0.9174017
# Variable importance
rf_var_imp <- varImp(rfTune)
rf_var_imp
## rf variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.000
## BiologicalMaterial12 21.353
## ManufacturingProcess36 20.477
## BiologicalMaterial03 19.002
## ManufacturingProcess09 17.743
## ManufacturingProcess31 16.977
## ManufacturingProcess17 13.161
## BiologicalMaterial06 11.751
## ManufacturingProcess06 10.789
## ManufacturingProcess13 10.219
## ManufacturingProcess11 9.898
## BiologicalMaterial11 7.726
## ManufacturingProcess18 6.689
## BiologicalMaterial08 6.108
## ManufacturingProcess20 6.044
## BiologicalMaterial05 5.774
## BiologicalMaterial02 5.452
## ManufacturingProcess27 5.235
## ManufacturingProcess30 5.039
## BiologicalMaterial04 4.891
Based on RMSE, the best-performing random forest model uses an mtry of 26, which is the number of predictors randomly sampled as candidates at each split.
The boosting regression approach relies on stochastic gradient boosting machines.
gbmGrid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
                       n.trees = seq(100, 1000, by = 50),
                       shrinkage = c(0.01, 0.1),
                       n.minobsinnode = c(5, 10, 15))

gbmTune <- train(Yield ~ .,
                 data = training_data,
                 method = "gbm",
                 tuneGrid = gbmGrid,
                 verbose = FALSE)
# Output model
gbmTune
## Stochastic Gradient Boosting
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 0.01 1 5 100 1.486191 0.5289106
## 0.01 1 5 150 1.401307 0.5518511
## 0.01 1 5 200 1.342989 0.5640043
## 0.01 1 5 250 1.302806 0.5720254
## 0.01 1 5 300 1.276742 0.5767385
## 0.01 1 5 350 1.260119 0.5787804
## 0.01 1 5 400 1.248760 0.5805243
## 0.01 1 5 450 1.240183 0.5823591
## 0.01 1 5 500 1.233879 0.5831129
## 0.01 1 5 550 1.228067 0.5847739
## 0.01 1 5 600 1.224250 0.5856267
## 0.01 1 5 650 1.220633 0.5865702
## 0.01 1 5 700 1.217782 0.5871462
## 0.01 1 5 750 1.215754 0.5878428
## 0.01 1 5 800 1.214243 0.5878859
## 0.01 1 5 850 1.212542 0.5882409
## 0.01 1 5 900 1.210500 0.5889584
## 0.01 1 5 950 1.208764 0.5893807
## 0.01 1 5 1000 1.206774 0.5907783
## 0.01 1 10 100 1.482906 0.5353488
## 0.01 1 10 150 1.394317 0.5536670
## 0.01 1 10 200 1.336033 0.5667349
## 0.01 1 10 250 1.296756 0.5739240
## 0.01 1 10 300 1.271788 0.5778757
## 0.01 1 10 350 1.256757 0.5805566
## 0.01 1 10 400 1.245506 0.5825555
## 0.01 1 10 450 1.237618 0.5833315
## 0.01 1 10 500 1.230770 0.5848255
## 0.01 1 10 550 1.225583 0.5861517
## 0.01 1 10 600 1.221196 0.5871370
## 0.01 1 10 650 1.217924 0.5878846
## 0.01 1 10 700 1.215506 0.5885577
## 0.01 1 10 750 1.214211 0.5887033
## 0.01 1 10 800 1.213059 0.5884271
## 0.01 1 10 850 1.211293 0.5889067
## 0.01 1 10 900 1.210169 0.5890128
## 0.01 1 10 950 1.208614 0.5896844
## 0.01 1 10 1000 1.207811 0.5898940
## 0.01 1 15 100 1.482078 0.5323431
## 0.01 1 15 150 1.391562 0.5565844
## 0.01 1 15 200 1.330068 0.5700020
## 0.01 1 15 250 1.292391 0.5757887
## 0.01 1 15 300 1.269185 0.5782970
## 0.01 1 15 350 1.254787 0.5791608
## 0.01 1 15 400 1.243875 0.5800162
## 0.01 1 15 450 1.235647 0.5820913
## 0.01 1 15 500 1.229084 0.5840102
## 0.01 1 15 550 1.225108 0.5846308
## 0.01 1 15 600 1.221996 0.5851981
## 0.01 1 15 650 1.218997 0.5863983
## 0.01 1 15 700 1.216474 0.5864515
## 0.01 1 15 750 1.216169 0.5857462
## 0.01 1 15 800 1.214416 0.5861157
## 0.01 1 15 850 1.214541 0.5853230
## 0.01 1 15 900 1.212670 0.5861380
## 0.01 1 15 950 1.212407 0.5860198
## 0.01 1 15 1000 1.212037 0.5860741
## 0.01 3 5 100 1.389170 0.5655057
## 0.01 3 5 150 1.305572 0.5766748
## 0.01 3 5 200 1.257904 0.5868872
## 0.01 3 5 250 1.231296 0.5922672
## 0.01 3 5 300 1.214491 0.5973219
## 0.01 3 5 350 1.204917 0.5995709
## 0.01 3 5 400 1.197191 0.6022541
## 0.01 3 5 450 1.191423 0.6045971
## 0.01 3 5 500 1.187479 0.6059702
## 0.01 3 5 550 1.183225 0.6078217
## 0.01 3 5 600 1.180492 0.6089279
## 0.01 3 5 650 1.177732 0.6103151
## 0.01 3 5 700 1.175220 0.6116065
## 0.01 3 5 750 1.174223 0.6119615
## 0.01 3 5 800 1.173098 0.6123731
## 0.01 3 5 850 1.171874 0.6129646
## 0.01 3 5 900 1.170826 0.6135819
## 0.01 3 5 950 1.169311 0.6145080
## 0.01 3 5 1000 1.168420 0.6148545
## 0.01 3 10 100 1.391926 0.5586724
## 0.01 3 10 150 1.309779 0.5699783
## 0.01 3 10 200 1.265805 0.5777569
## 0.01 3 10 250 1.240904 0.5833732
## 0.01 3 10 300 1.225687 0.5870080
## 0.01 3 10 350 1.215585 0.5903705
## 0.01 3 10 400 1.208467 0.5927178
## 0.01 3 10 450 1.203870 0.5945746
## 0.01 3 10 500 1.198453 0.5969290
## 0.01 3 10 550 1.194480 0.5989529
## 0.01 3 10 600 1.191602 0.6003391
## 0.01 3 10 650 1.188826 0.6017275
## 0.01 3 10 700 1.186224 0.6033629
## 0.01 3 10 750 1.183729 0.6047517
## 0.01 3 10 800 1.181590 0.6058808
## 0.01 3 10 850 1.179884 0.6068177
## 0.01 3 10 900 1.178520 0.6074003
## 0.01 3 10 950 1.176826 0.6085035
## 0.01 3 10 1000 1.175477 0.6092609
## 0.01 3 15 100 1.400181 0.5578812
## 0.01 3 15 150 1.316454 0.5704691
## 0.01 3 15 200 1.271078 0.5771279
## 0.01 3 15 250 1.246257 0.5808345
## 0.01 3 15 300 1.232198 0.5830896
## 0.01 3 15 350 1.222799 0.5850863
## 0.01 3 15 400 1.216362 0.5866558
## 0.01 3 15 450 1.212159 0.5879651
## 0.01 3 15 500 1.208356 0.5892022
## 0.01 3 15 550 1.205759 0.5899204
## 0.01 3 15 600 1.203542 0.5908966
## 0.01 3 15 650 1.202446 0.5912559
## 0.01 3 15 700 1.201284 0.5915295
## 0.01 3 15 750 1.200239 0.5916646
## 0.01 3 15 800 1.198790 0.5923331
## 0.01 3 15 850 1.198079 0.5927417
## 0.01 3 15 900 1.196776 0.5933454
## 0.01 3 15 950 1.196804 0.5930703
## 0.01 3 15 1000 1.196209 0.5934084
## 0.01 5 5 100 1.369009 0.5667470
## 0.01 5 5 150 1.286456 0.5796775
## 0.01 5 5 200 1.244627 0.5870586
## 0.01 5 5 250 1.220542 0.5926281
## 0.01 5 5 300 1.205749 0.5975096
## 0.01 5 5 350 1.196059 0.6010382
## 0.01 5 5 400 1.189228 0.6037565
## 0.01 5 5 450 1.184001 0.6061067
## 0.01 5 5 500 1.180093 0.6080828
## 0.01 5 5 550 1.177483 0.6093484
## 0.01 5 5 600 1.174767 0.6108127
## 0.01 5 5 650 1.172730 0.6118515
## 0.01 5 5 700 1.170783 0.6130235
## 0.01 5 5 750 1.169202 0.6140092
## 0.01 5 5 800 1.168300 0.6144967
## 0.01 5 5 850 1.167229 0.6150153
## 0.01 5 5 900 1.166417 0.6155037
## 0.01 5 5 950 1.165403 0.6161035
## 0.01 5 5 1000 1.164614 0.6166326
## 0.01 5 10 100 1.378814 0.5590624
## 0.01 5 10 150 1.300403 0.5692073
## 0.01 5 10 200 1.259746 0.5760212
## 0.01 5 10 250 1.237738 0.5807777
## 0.01 5 10 300 1.224039 0.5848682
## 0.01 5 10 350 1.214036 0.5883988
## 0.01 5 10 400 1.207526 0.5910539
## 0.01 5 10 450 1.203586 0.5924703
## 0.01 5 10 500 1.200130 0.5937711
## 0.01 5 10 550 1.196980 0.5955693
## 0.01 5 10 600 1.193676 0.5975735
## 0.01 5 10 650 1.191018 0.5990889
## 0.01 5 10 700 1.188816 0.6004937
## 0.01 5 10 750 1.187077 0.6014315
## 0.01 5 10 800 1.185671 0.6022739
## 0.01 5 10 850 1.184459 0.6030332
## 0.01 5 10 900 1.183553 0.6033892
## 0.01 5 10 950 1.182640 0.6037095
## 0.01 5 10 1000 1.181487 0.6043854
## 0.01 5 15 100 1.398524 0.5633108
## 0.01 5 15 150 1.315901 0.5733371
## 0.01 5 15 200 1.270384 0.5795302
## 0.01 5 15 250 1.245529 0.5821243
## 0.01 5 15 300 1.231968 0.5837819
## 0.01 5 15 350 1.222852 0.5858522
## 0.01 5 15 400 1.216977 0.5875228
## 0.01 5 15 450 1.212155 0.5890289
## 0.01 5 15 500 1.208697 0.5900647
## 0.01 5 15 550 1.204831 0.5916757
## 0.01 5 15 600 1.202317 0.5925356
## 0.01 5 15 650 1.200640 0.5931611
## 0.01 5 15 700 1.199302 0.5937382
## 0.01 5 15 750 1.197414 0.5943505
## 0.01 5 15 800 1.197398 0.5941070
## 0.01 5 15 850 1.196931 0.5942230
## 0.01 5 15 900 1.196324 0.5946298
## 0.01 5 15 950 1.195934 0.5947724
## 0.01 5 15 1000 1.195343 0.5949952
## 0.01 7 5 100 1.358591 0.5750438
## 0.01 7 5 150 1.277928 0.5848194
## 0.01 7 5 200 1.235347 0.5930020
## 0.01 7 5 250 1.211391 0.5990637
## 0.01 7 5 300 1.196980 0.6039667
## 0.01 7 5 350 1.188218 0.6069974
## 0.01 7 5 400 1.182558 0.6092448
## 0.01 7 5 450 1.177286 0.6115300
## 0.01 7 5 500 1.173541 0.6133831
## 0.01 7 5 550 1.171123 0.6145779
## 0.01 7 5 600 1.169065 0.6156666
## 0.01 7 5 650 1.167314 0.6165023
## 0.01 7 5 700 1.165418 0.6175279
## 0.01 7 5 750 1.164055 0.6183079
## 0.01 7 5 800 1.162944 0.6189748
## 0.01 7 5 850 1.162116 0.6195358
## 0.01 7 5 900 1.161108 0.6202019
## 0.01 7 5 950 1.160355 0.6206625
## 0.01 7 5 1000 1.159766 0.6210294
## 0.01 7 10 100 1.377584 0.5633100
## 0.01 7 10 150 1.297447 0.5738978
## 0.01 7 10 200 1.255290 0.5808215
## 0.01 7 10 250 1.231347 0.5859233
## 0.01 7 10 300 1.218277 0.5891211
## 0.01 7 10 350 1.210005 0.5912148
## 0.01 7 10 400 1.203249 0.5937933
## 0.01 7 10 450 1.197859 0.5961164
## 0.01 7 10 500 1.195112 0.5970242
## 0.01 7 10 550 1.191842 0.5987782
## 0.01 7 10 600 1.188743 0.6006136
## 0.01 7 10 650 1.186515 0.6017020
## 0.01 7 10 700 1.184580 0.6026610
## 0.01 7 10 750 1.183068 0.6036677
## 0.01 7 10 800 1.181898 0.6042910
## 0.01 7 10 850 1.180635 0.6050317
## 0.01 7 10 900 1.178872 0.6059856
## 0.01 7 10 950 1.177872 0.6065719
## 0.01 7 10 1000 1.176578 0.6072278
## 0.01 7 15 100 1.399310 0.5595357
## 0.01 7 15 150 1.316882 0.5721242
## 0.01 7 15 200 1.271176 0.5783682
## 0.01 7 15 250 1.247319 0.5819936
## 0.01 7 15 300 1.232223 0.5847026
## 0.01 7 15 350 1.223316 0.5858624
## 0.01 7 15 400 1.217643 0.5868887
## 0.01 7 15 450 1.213601 0.5877975
## 0.01 7 15 500 1.210308 0.5886473
## 0.01 7 15 550 1.208225 0.5891899
## 0.01 7 15 600 1.206996 0.5892104
## 0.01 7 15 650 1.204342 0.5902924
## 0.01 7 15 700 1.202274 0.5912604
## 0.01 7 15 750 1.201158 0.5917787
## 0.01 7 15 800 1.200690 0.5919218
## 0.01 7 15 850 1.199689 0.5922783
## 0.01 7 15 900 1.199202 0.5924881
## 0.01 7 15 950 1.198040 0.5931035
## 0.01 7 15 1000 1.197785 0.5930329
## 0.10 1 5 100 1.220862 0.5791785
## 0.10 1 5 150 1.222883 0.5762153
## 0.10 1 5 200 1.223112 0.5758237
## 0.10 1 5 250 1.220087 0.5765737
## 0.10 1 5 300 1.220684 0.5761106
## 0.10 1 5 350 1.222383 0.5740244
## 0.10 1 5 400 1.225556 0.5725945
## 0.10 1 5 450 1.225032 0.5734623
## 0.10 1 5 500 1.224644 0.5732587
## 0.10 1 5 550 1.224012 0.5744082
## 0.10 1 5 600 1.223546 0.5743317
## 0.10 1 5 650 1.223449 0.5742545
## 0.10 1 5 700 1.224021 0.5737377
## 0.10 1 5 750 1.224921 0.5732040
## 0.10 1 5 800 1.225107 0.5734814
## 0.10 1 5 850 1.226301 0.5726728
## 0.10 1 5 900 1.226778 0.5723561
## 0.10 1 5 950 1.227322 0.5722500
## 0.10 1 5 1000 1.227184 0.5725341
## 0.10 1 10 100 1.215834 0.5811189
## 0.10 1 10 150 1.208154 0.5842033
## 0.10 1 10 200 1.206243 0.5851210
## 0.10 1 10 250 1.205844 0.5845499
## 0.10 1 10 300 1.207032 0.5837367
## 0.10 1 10 350 1.204914 0.5846532
## 0.10 1 10 400 1.203610 0.5858531
## 0.10 1 10 450 1.206256 0.5839931
## 0.10 1 10 500 1.207264 0.5834030
## 0.10 1 10 550 1.210568 0.5812689
## 0.10 1 10 600 1.211994 0.5806051
## 0.10 1 10 650 1.212829 0.5803741
## 0.10 1 10 700 1.213550 0.5801360
## 0.10 1 10 750 1.215205 0.5792676
## 0.10 1 10 800 1.215876 0.5788634
## 0.10 1 10 850 1.215833 0.5789892
## 0.10 1 10 900 1.217579 0.5777541
## 0.10 1 10 950 1.217481 0.5780386
## 0.10 1 10 1000 1.217866 0.5777837
## 0.10 1 15 100 1.223739 0.5768645
## 0.10 1 15 150 1.224656 0.5737249
## 0.10 1 15 200 1.224771 0.5734668
## 0.10 1 15 250 1.227301 0.5717879
## 0.10 1 15 300 1.231995 0.5687680
## 0.10 1 15 350 1.236037 0.5663090
## 0.10 1 15 400 1.237758 0.5651562
## 0.10 1 15 450 1.239542 0.5642582
## 0.10 1 15 500 1.240700 0.5639358
## 0.10 1 15 550 1.242925 0.5625911
## 0.10 1 15 600 1.245256 0.5616828
## 0.10 1 15 650 1.245439 0.5614085
## 0.10 1 15 700 1.247038 0.5609119
## 0.10 1 15 750 1.246600 0.5612112
## 0.10 1 15 800 1.248657 0.5605709
## 0.10 1 15 850 1.248907 0.5603258
## 0.10 1 15 900 1.250909 0.5590567
## 0.10 1 15 950 1.252975 0.5580344
## 0.10 1 15 1000 1.254052 0.5574813
## 0.10 3 5 100 1.209037 0.5856045
## 0.10 3 5 150 1.203242 0.5895538
## 0.10 3 5 200 1.201287 0.5903753
## 0.10 3 5 250 1.201929 0.5901633
## 0.10 3 5 300 1.201392 0.5905560
## 0.10 3 5 350 1.200812 0.5908816
## 0.10 3 5 400 1.200510 0.5910834
## 0.10 3 5 450 1.200435 0.5911400
## 0.10 3 5 500 1.200245 0.5912764
## 0.10 3 5 550 1.200171 0.5912961
## 0.10 3 5 600 1.200086 0.5913603
## 0.10 3 5 650 1.199998 0.5914057
## 0.10 3 5 700 1.199987 0.5914179
## 0.10 3 5 750 1.199989 0.5914139
## 0.10 3 5 800 1.199985 0.5914157
## 0.10 3 5 850 1.199990 0.5914124
## 0.10 3 5 900 1.199988 0.5914140
## 0.10 3 5 950 1.199988 0.5914131
## 0.10 3 5 1000 1.199986 0.5914151
## 0.10 3 10 100 1.209154 0.5871871
## 0.10 3 10 150 1.198627 0.5936404
## 0.10 3 10 200 1.196318 0.5947584
## 0.10 3 10 250 1.195902 0.5950076
## 0.10 3 10 300 1.195179 0.5954142
## 0.10 3 10 350 1.195374 0.5951636
## 0.10 3 10 400 1.195248 0.5951690
## 0.10 3 10 450 1.195207 0.5950877
## 0.10 3 10 500 1.194749 0.5953571
## 0.10 3 10 550 1.194474 0.5954983
## 0.10 3 10 600 1.194285 0.5955788
## 0.10 3 10 650 1.194251 0.5955393
## 0.10 3 10 700 1.194150 0.5955820
## 0.10 3 10 750 1.194073 0.5956003
## 0.10 3 10 800 1.194075 0.5955990
## 0.10 3 10 850 1.194043 0.5956129
## 0.10 3 10 900 1.193988 0.5956334
## 0.10 3 10 950 1.193983 0.5956202
## 0.10 3 10 1000 1.193981 0.5956141
## 0.10 3 15 100 1.220890 0.5750431
## 0.10 3 15 150 1.217206 0.5768986
## 0.10 3 15 200 1.214575 0.5787180
## 0.10 3 15 250 1.214091 0.5787759
## 0.10 3 15 300 1.213506 0.5790269
## 0.10 3 15 350 1.214077 0.5789369
## 0.10 3 15 400 1.214801 0.5785315
## 0.10 3 15 450 1.215048 0.5782228
## 0.10 3 15 500 1.214936 0.5781119
## 0.10 3 15 550 1.215887 0.5774881
## 0.10 3 15 600 1.216281 0.5771967
## 0.10 3 15 650 1.215980 0.5774030
## 0.10 3 15 700 1.216589 0.5769979
## 0.10 3 15 750 1.216692 0.5769319
## 0.10 3 15 800 1.216489 0.5770667
## 0.10 3 15 850 1.216619 0.5770202
## 0.10 3 15 900 1.216798 0.5768992
## 0.10 3 15 950 1.216941 0.5767932
## 0.10 3 15 1000 1.217057 0.5767471
## 0.10 5 5 100 1.189808 0.5984193
## 0.10 5 5 150 1.184494 0.6016229
## 0.10 5 5 200 1.183176 0.6025854
## 0.10 5 5 250 1.182392 0.6029769
## 0.10 5 5 300 1.182160 0.6030956
## 0.10 5 5 350 1.182028 0.6031677
## 0.10 5 5 400 1.181852 0.6032969
## 0.10 5 5 450 1.181686 0.6033868
## 0.10 5 5 500 1.181581 0.6034511
## 0.10 5 5 550 1.181551 0.6034701
## 0.10 5 5 600 1.181515 0.6034882
## 0.10 5 5 650 1.181505 0.6034865
## 0.10 5 5 700 1.181517 0.6034793
## 0.10 5 5 750 1.181498 0.6034882
## 0.10 5 5 800 1.181501 0.6034856
## 0.10 5 5 850 1.181506 0.6034809
## 0.10 5 5 900 1.181503 0.6034833
## 0.10 5 5 950 1.181505 0.6034827
## 0.10 5 5 1000 1.181505 0.6034819
## 0.10 5 10 100 1.206678 0.5839751
## 0.10 5 10 150 1.205002 0.5848032
## 0.10 5 10 200 1.201586 0.5871663
## 0.10 5 10 250 1.199668 0.5882358
## 0.10 5 10 300 1.199014 0.5886142
## 0.10 5 10 350 1.198930 0.5885237
## 0.10 5 10 400 1.198907 0.5886676
## 0.10 5 10 450 1.198998 0.5885662
## 0.10 5 10 500 1.198813 0.5886678
## 0.10 5 10 550 1.198493 0.5888504
## 0.10 5 10 600 1.198278 0.5889473
## 0.10 5 10 650 1.198066 0.5890513
## 0.10 5 10 700 1.198044 0.5890823
## 0.10 5 10 750 1.197941 0.5891485
## 0.10 5 10 800 1.197914 0.5891511
## 0.10 5 10 850 1.197807 0.5892262
## 0.10 5 10 900 1.197792 0.5892334
## 0.10 5 10 950 1.197738 0.5892815
## 0.10 5 10 1000 1.197675 0.5893152
## 0.10 5 15 100 1.220497 0.5752335
## 0.10 5 15 150 1.217947 0.5784203
## 0.10 5 15 200 1.215805 0.5799735
## 0.10 5 15 250 1.217767 0.5787783
## 0.10 5 15 300 1.219427 0.5777262
## 0.10 5 15 350 1.221591 0.5763257
## 0.10 5 15 400 1.222153 0.5760085
## 0.10 5 15 450 1.222764 0.5755597
## 0.10 5 15 500 1.223457 0.5751227
## 0.10 5 15 550 1.223776 0.5749613
## 0.10 5 15 600 1.224361 0.5746852
## 0.10 5 15 650 1.224706 0.5745046
## 0.10 5 15 700 1.224954 0.5743541
## 0.10 5 15 750 1.225194 0.5742116
## 0.10 5 15 800 1.225407 0.5741520
## 0.10 5 15 850 1.225580 0.5740641
## 0.10 5 15 900 1.225807 0.5739181
## 0.10 5 15 950 1.225904 0.5739047
## 0.10 5 15 1000 1.225835 0.5739630
## 0.10 7 5 100 1.201282 0.5886417
## 0.10 7 5 150 1.197295 0.5910847
## 0.10 7 5 200 1.195644 0.5922598
## 0.10 7 5 250 1.194906 0.5927700
## 0.10 7 5 300 1.194614 0.5930004
## 0.10 7 5 350 1.194496 0.5930489
## 0.10 7 5 400 1.194117 0.5932852
## 0.10 7 5 450 1.193982 0.5934218
## 0.10 7 5 500 1.193913 0.5934656
## 0.10 7 5 550 1.193831 0.5935223
## 0.10 7 5 600 1.193788 0.5935536
## 0.10 7 5 650 1.193723 0.5935993
## 0.10 7 5 700 1.193708 0.5936130
## 0.10 7 5 750 1.193691 0.5936253
## 0.10 7 5 800 1.193687 0.5936300
## 0.10 7 5 850 1.193677 0.5936372
## 0.10 7 5 900 1.193672 0.5936414
## 0.10 7 5 950 1.193672 0.5936415
## 0.10 7 5 1000 1.193671 0.5936419
## 0.10 7 10 100 1.210655 0.5836213
## 0.10 7 10 150 1.204394 0.5881904
## 0.10 7 10 200 1.203134 0.5890219
## 0.10 7 10 250 1.203077 0.5889877
## 0.10 7 10 300 1.202250 0.5893976
## 0.10 7 10 350 1.202679 0.5891469
## 0.10 7 10 400 1.202619 0.5891073
## 0.10 7 10 450 1.202251 0.5892622
## 0.10 7 10 500 1.201594 0.5896960
## 0.10 7 10 550 1.201511 0.5897692
## 0.10 7 10 600 1.201716 0.5896236
## 0.10 7 10 650 1.201515 0.5897288
## 0.10 7 10 700 1.201436 0.5897774
## 0.10 7 10 750 1.201407 0.5898224
## 0.10 7 10 800 1.201252 0.5899128
## 0.10 7 10 850 1.201232 0.5899221
## 0.10 7 10 900 1.201215 0.5899418
## 0.10 7 10 950 1.201175 0.5899589
## 0.10 7 10 1000 1.201115 0.5900016
## 0.10 7 15 100 1.198908 0.5927353
## 0.10 7 15 150 1.194254 0.5954321
## 0.10 7 15 200 1.193689 0.5953343
## 0.10 7 15 250 1.194858 0.5948362
## 0.10 7 15 300 1.195437 0.5943372
## 0.10 7 15 350 1.195195 0.5944245
## 0.10 7 15 400 1.196056 0.5938096
## 0.10 7 15 450 1.196657 0.5931584
## 0.10 7 15 500 1.197051 0.5928638
## 0.10 7 15 550 1.197417 0.5925315
## 0.10 7 15 600 1.197625 0.5923343
## 0.10 7 15 650 1.198260 0.5919098
## 0.10 7 15 700 1.198070 0.5920757
## 0.10 7 15 750 1.198079 0.5920714
## 0.10 7 15 800 1.198266 0.5919419
## 0.10 7 15 850 1.198350 0.5918944
## 0.10 7 15 900 1.198424 0.5918169
## 0.10 7 15 950 1.198594 0.5917227
## 0.10 7 15 1000 1.198738 0.5916117
## MAE
## 1.1980078
## 1.1241427
## 1.0705710
## 1.0335115
## 1.0096095
## 0.9930084
## 0.9814230
## 0.9735850
## 0.9671369
## 0.9626785
## 0.9585874
## 0.9553978
## 0.9528511
## 0.9511649
## 0.9502423
## 0.9487247
## 0.9469112
## 0.9452688
## 0.9439205
## 1.1945608
## 1.1172299
## 1.0643763
## 1.0268654
## 1.0011534
## 0.9860865
## 0.9746878
## 0.9666453
## 0.9594476
## 0.9537611
## 0.9496775
## 0.9461265
## 0.9439708
## 0.9430290
## 0.9417023
## 0.9409060
## 0.9396965
## 0.9392644
## 0.9392383
## 1.1961789
## 1.1163396
## 1.0598665
## 1.0219660
## 0.9973412
## 0.9810942
## 0.9691898
## 0.9601067
## 0.9537832
## 0.9496912
## 0.9468356
## 0.9448614
## 0.9426933
## 0.9428579
## 0.9410519
## 0.9417856
## 0.9398508
## 0.9399741
## 0.9399722
## 1.1142610
## 1.0345632
## 0.9879186
## 0.9621472
## 0.9447841
## 0.9364736
## 0.9301432
## 0.9253512
## 0.9213404
## 0.9177669
## 0.9157041
## 0.9132500
## 0.9106485
## 0.9095552
## 0.9087959
## 0.9073843
## 0.9063005
## 0.9052810
## 0.9042718
## 1.1143446
## 1.0352030
## 0.9908174
## 0.9662343
## 0.9523870
## 0.9435000
## 0.9367954
## 0.9329804
## 0.9288381
## 0.9257454
## 0.9234202
## 0.9218419
## 0.9199567
## 0.9183057
## 0.9167632
## 0.9154250
## 0.9144487
## 0.9135872
## 0.9129193
## 1.1231826
## 1.0438361
## 0.9963876
## 0.9696105
## 0.9557144
## 0.9467956
## 0.9412883
## 0.9372730
## 0.9339848
## 0.9315182
## 0.9301842
## 0.9296902
## 0.9290098
## 0.9282349
## 0.9275514
## 0.9267930
## 0.9261309
## 0.9259999
## 0.9265365
## 1.0933417
## 1.0143193
## 0.9726428
## 0.9494503
## 0.9343799
## 0.9259097
## 0.9193031
## 0.9146636
## 0.9113116
## 0.9088995
## 0.9069346
## 0.9053814
## 0.9038898
## 0.9030778
## 0.9021168
## 0.9013726
## 0.9006915
## 0.8999206
## 0.8991076
## 1.0993790
## 1.0222068
## 0.9798925
## 0.9576126
## 0.9454624
## 0.9373113
## 0.9325427
## 0.9294620
## 0.9268563
## 0.9246455
## 0.9224793
## 0.9209638
## 0.9196056
## 0.9189767
## 0.9182208
## 0.9178629
## 0.9173296
## 0.9169470
## 0.9160923
## 1.1237102
## 1.0442629
## 0.9949497
## 0.9660416
## 0.9526147
## 0.9445244
## 0.9385816
## 0.9345159
## 0.9318758
## 0.9290223
## 0.9277855
## 0.9266358
## 0.9255319
## 0.9244565
## 0.9250011
## 0.9251206
## 0.9253136
## 0.9255181
## 0.9253318
## 1.0848499
## 1.0051322
## 0.9623429
## 0.9396587
## 0.9272271
## 0.9189799
## 0.9144450
## 0.9098527
## 0.9068857
## 0.9049088
## 0.9030294
## 0.9013246
## 0.8998127
## 0.8986482
## 0.8978243
## 0.8971973
## 0.8966736
## 0.8961744
## 0.8957674
## 1.0999086
## 1.0218635
## 0.9774323
## 0.9546560
## 0.9420321
## 0.9344042
## 0.9281589
## 0.9237295
## 0.9215264
## 0.9186178
## 0.9166900
## 0.9154702
## 0.9140752
## 0.9130254
## 0.9127362
## 0.9120911
## 0.9110674
## 0.9104173
## 0.9098527
## 1.1239685
## 1.0456245
## 0.9968575
## 0.9705284
## 0.9552792
## 0.9466344
## 0.9415912
## 0.9381065
## 0.9348268
## 0.9335215
## 0.9323308
## 0.9304486
## 0.9288022
## 0.9286793
## 0.9288730
## 0.9282318
## 0.9285748
## 0.9278612
## 0.9279825
## 0.9523391
## 0.9563405
## 0.9557215
## 0.9546504
## 0.9540956
## 0.9546290
## 0.9580923
## 0.9588314
## 0.9582217
## 0.9573005
## 0.9573240
## 0.9575193
## 0.9580538
## 0.9590089
## 0.9592189
## 0.9603942
## 0.9610067
## 0.9617838
## 0.9618542
## 0.9508340
## 0.9473952
## 0.9502691
## 0.9476806
## 0.9482283
## 0.9458655
## 0.9459336
## 0.9493548
## 0.9503741
## 0.9530442
## 0.9546284
## 0.9553313
## 0.9551868
## 0.9572946
## 0.9585338
## 0.9596960
## 0.9609860
## 0.9613494
## 0.9617083
## 0.9480402
## 0.9505354
## 0.9552555
## 0.9602607
## 0.9660370
## 0.9685400
## 0.9704134
## 0.9717707
## 0.9740413
## 0.9758944
## 0.9777863
## 0.9788845
## 0.9802005
## 0.9806088
## 0.9827962
## 0.9833492
## 0.9852029
## 0.9870391
## 0.9874426
## 0.9349699
## 0.9292499
## 0.9262426
## 0.9262988
## 0.9260852
## 0.9256338
## 0.9254694
## 0.9254178
## 0.9252093
## 0.9251152
## 0.9250106
## 0.9249493
## 0.9249302
## 0.9249324
## 0.9249260
## 0.9249309
## 0.9249285
## 0.9249288
## 0.9249253
## 0.9504537
## 0.9454988
## 0.9443664
## 0.9435981
## 0.9436973
## 0.9442260
## 0.9442549
## 0.9441322
## 0.9439019
## 0.9437161
## 0.9437746
## 0.9438179
## 0.9436932
## 0.9436853
## 0.9437437
## 0.9437581
## 0.9437428
## 0.9437760
## 0.9438147
## 0.9480999
## 0.9452054
## 0.9427890
## 0.9421529
## 0.9418489
## 0.9421110
## 0.9430712
## 0.9433279
## 0.9428728
## 0.9438472
## 0.9438848
## 0.9435142
## 0.9441390
## 0.9442920
## 0.9440308
## 0.9443187
## 0.9444606
## 0.9445193
## 0.9446364
## 0.9233914
## 0.9190343
## 0.9172399
## 0.9162931
## 0.9158528
## 0.9157004
## 0.9155014
## 0.9153737
## 0.9152531
## 0.9152166
## 0.9151830
## 0.9151812
## 0.9151898
## 0.9151716
## 0.9151724
## 0.9151757
## 0.9151691
## 0.9151699
## 0.9151686
## 0.9360083
## 0.9363427
## 0.9341728
## 0.9328730
## 0.9329019
## 0.9328195
## 0.9328104
## 0.9329377
## 0.9328500
## 0.9327061
## 0.9324250
## 0.9323022
## 0.9322448
## 0.9321333
## 0.9321477
## 0.9320878
## 0.9320738
## 0.9320222
## 0.9319748
## 0.9510769
## 0.9523332
## 0.9524862
## 0.9552634
## 0.9561574
## 0.9580378
## 0.9587949
## 0.9592158
## 0.9597637
## 0.9599311
## 0.9604920
## 0.9607581
## 0.9609085
## 0.9609875
## 0.9613079
## 0.9615420
## 0.9618428
## 0.9618962
## 0.9619851
## 0.9290613
## 0.9251948
## 0.9240048
## 0.9237396
## 0.9235022
## 0.9234165
## 0.9230921
## 0.9229787
## 0.9229274
## 0.9228667
## 0.9228422
## 0.9228061
## 0.9228025
## 0.9227915
## 0.9227932
## 0.9227878
## 0.9227858
## 0.9227880
## 0.9227852
## 0.9445866
## 0.9409862
## 0.9403392
## 0.9412500
## 0.9411214
## 0.9417927
## 0.9421017
## 0.9420654
## 0.9415821
## 0.9415071
## 0.9416296
## 0.9414494
## 0.9413945
## 0.9414283
## 0.9413656
## 0.9414096
## 0.9414817
## 0.9415137
## 0.9415034
## 0.9259884
## 0.9232390
## 0.9261780
## 0.9287268
## 0.9299715
## 0.9298665
## 0.9310427
## 0.9318633
## 0.9318487
## 0.9323804
## 0.9323349
## 0.9329280
## 0.9328924
## 0.9327282
## 0.9328171
## 0.9327821
## 0.9328369
## 0.9329665
## 0.9331365
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth =
## 7, shrinkage = 0.01 and n.minobsinnode = 5.
# Plot model
ggplot(gbmTune) + labs(title="Boosted Trees With Tuning")
# Make predictions on Test set
gbmPred <- predict(gbmTune, newdata = test_data)
# Output prediction performance
gbm_test_perf <- postResample(pred = gbmPred, obs = test_data$Yield)
gbm_test_perf
## RMSE Rsquared MAE
## 1.1694813 0.6490858 0.8882202
# Variable importance
gbm_var_imp <- varImp(gbmTune)
gbm_var_imp
## gbm variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess09 20.525
## BiologicalMaterial12 19.468
## ManufacturingProcess06 15.996
## ManufacturingProcess17 15.798
## ManufacturingProcess31 13.514
## BiologicalMaterial09 12.595
## BiologicalMaterial03 10.440
## ManufacturingProcess11 9.701
## ManufacturingProcess36 7.878
## ManufacturingProcess13 7.836
## BiologicalMaterial05 7.243
## BiologicalMaterial08 6.975
## BiologicalMaterial06 6.596
## BiologicalMaterial11 6.481
## ManufacturingProcess43 6.392
## ManufacturingProcess20 6.253
## ManufacturingProcess04 5.828
## ManufacturingProcess21 5.678
## ManufacturingProcess18 5.538
Based on RMSE, the best-performing boosted trees model used n.trees = 1000, interaction.depth = 7, shrinkage = 0.01, and n.minobsinnode = 5.
The Cubist approach is a rule-based model, as outlined in the textbook; caret tunes it over the number of committees and the number of neighbors used for the instance-based adjustment.
cubistTuned <- train(Yield ~ .,
                     data = training_data,
                     method = "cubist")
# Output model
cubistTuned
## Cubist
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 1.723425 0.3545747 1.2896683
## 1 5 1.705292 0.3644032 1.2706784
## 1 9 1.713043 0.3592913 1.2787201
## 10 0 1.224689 0.5636035 0.9684193
## 10 5 1.195020 0.5817505 0.9407648
## 10 9 1.207613 0.5736908 0.9526139
## 20 0 1.172915 0.5947093 0.9267703
## 20 5 1.144283 0.6121806 0.9012686
## 20 9 1.156161 0.6044030 0.9127624
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
# Plot model
ggplot(cubistTuned) + labs(title="Cubist Model With Tuning")
# Make predictions on Test set
cubPred <- predict(cubistTuned, newdata = test_data)
# Output prediction performance
cub_test_perf <- postResample(pred = cubPred, obs = test_data$Yield)
cub_test_perf
## RMSE Rsquared MAE
## 1.0278054 0.7528798 0.7754060
# Variable importance
cub_var_imp <- varImp(cubistTuned)
cub_var_imp
## cubist variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess09 51.26
## ManufacturingProcess17 43.70
## BiologicalMaterial10 32.77
## BiologicalMaterial06 31.09
## ManufacturingProcess26 25.21
## ManufacturingProcess11 23.53
## ManufacturingProcess04 21.85
## ManufacturingProcess25 21.01
## BiologicalMaterial03 21.01
## BiologicalMaterial11 19.33
## ManufacturingProcess31 18.49
## ManufacturingProcess27 18.49
## BiologicalMaterial12 18.49
## ManufacturingProcess33 18.49
## BiologicalMaterial02 17.65
## ManufacturingProcess13 16.81
## ManufacturingProcess28 15.97
## ManufacturingProcess29 14.29
## BiologicalMaterial08 13.45
Based on RMSE, the best-performing Cubist model uses 20 committees and 5 neighbors.
Which tree-based regression model gives the optimal re-sampling and test set performance?
perf_results <- data.frame(Single_Tree = rpart_test_perf,
                           Model_Trees = m5_test_perf,
                           Random_Forest = rf_test_perf,
                           Boosted_Trees = gbm_test_perf,
                           Cubist = cub_test_perf)

perf_results %>% t() %>%
  kable(caption = "Comparison of Model Performance on Test Data", digits = 4) %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
| Model         | RMSE   | Rsquared | MAE    |
|---------------|--------|----------|--------|
| Single_Tree   | 1.5820 | 0.3384   | 1.2221 |
| Model_Trees   | 1.1787 | 0.6393   | 0.9438 |
| Random_Forest | 1.2446 | 0.6239   | 0.9174 |
| Boosted_Trees | 1.1695 | 0.6491   | 0.8882 |
| Cubist        | 1.0278 | 0.7529   | 0.7754 |
Answer: As seen in the table above, the cubist model approach performed the best on the test data set with an RMSE of 1.0278 and an \(R^2\) of 0.7529. The M5 trees model, primary random forest model and gradient boosted machines model also performed well, each with an \(R^2\) above 0.62. The optimal single tree model did not fare as well on the test data, resulting in an \(R^2\) of 0.3384. Not a surprising result for the single tree model, as the optimal single tree model on the training data only had a depth of one.
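The re-sampling side of the comparison can be pulled directly from the tuned caret objects; a brief sketch (caret's resamples() requires a common resampling scheme, so only the three 10-fold-CV models are combined here, and gbmTune and cubistTuned, which used the default bootstrap, would need to be refit with trControl = trainControl(method = "cv") to be included):
# Compare re-sampling (10-fold CV) performance of the CV-tuned models
resamps <- resamples(list(Single_Tree = rpartTune,
                          Model_Trees = m5Tune,
                          Random_Forest = rfTune))
summary(resamps)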
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
Answer: The optimal tree-based regression model is the cubist model with important predictors listed below (the same as above).
cub_var_imp
## cubist variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess09 51.26
## ManufacturingProcess17 43.70
## BiologicalMaterial10 32.77
## BiologicalMaterial06 31.09
## ManufacturingProcess26 25.21
## ManufacturingProcess11 23.53
## ManufacturingProcess04 21.85
## ManufacturingProcess25 21.01
## BiologicalMaterial03 21.01
## BiologicalMaterial11 19.33
## ManufacturingProcess33 18.49
## ManufacturingProcess31 18.49
## ManufacturingProcess27 18.49
## BiologicalMaterial12 18.49
## BiologicalMaterial02 17.65
## ManufacturingProcess13 16.81
## ManufacturingProcess28 15.97
## ManufacturingProcess29 14.29
## BiologicalMaterial08 13.45
More Answer: Of the top 20 important predictors, manufacturing variables account for 13 while biological variables account for the remaining 7. That 13–7 split is the same breakdown as in the optimal nonlinear regression model (SVM), so by count neither type dominates here, in contrast to the optimal linear model, where the manufacturing variables dominate the list. Evaluating the actual scores, however, the manufacturing variables clearly play a dominant role in this model: only 3 variables score above 40%, and all three are manufacturing variables. ManufacturingProcess32 scores 100% and ManufacturingProcess09 scores 51%, indicating that the manufacturing variables carry most of the weight. By comparison, the top 10 variables of both the optimal linear and nonlinear models all scored roughly 50% importance or higher. The tree-based method therefore shows a much heavier reliance on just a few variables than the other model types.
Top Ten Variables of SVM: Nonlinear Regression Model
## Overall
## ManufacturingProcess32 100.00
## BiologicalMaterial06 85.65
## ManufacturingProcess36 81.77
## ManufacturingProcess09 81.57
## ManufacturingProcess13 79.95
## BiologicalMaterial03 77.35
## ManufacturingProcess17 76.96
## ManufacturingProcess06 69.87
## BiologicalMaterial12 63.49
## ManufacturingProcess11 62.07
Top Ten Variables of PLS: Linear Regression Model
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess09 85.19
## ManufacturingProcess36 84.84
## ManufacturingProcess13 79.27
## ManufacturingProcess17 77.75
## ManufacturingProcess06 64.16
## ManufacturingProcess11 60.24
## ManufacturingProcess33 54.36
## BiologicalMaterial02 53.90
## BiologicalMaterial08 53.64
More Answer: The top 3 variables of the Cubist model (ManufacturingProcess32, ManufacturingProcess09, and ManufacturingProcess17) are also found in the top 10 of both the optimal linear and nonlinear models. Interestingly, the Cubist and SVM models share two biological variables in the top 10, BiologicalMaterial06 and BiologicalMaterial03, while the linear model does not share any top-performing biological variables. As noted above, the biggest observation is the large drop in importance after the first variable in the tree-based model as compared to the optimal linear and nonlinear models.
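To make the top-10 overlap concrete, the variable names can be intersected programmatically; a sketch in which svmTune and plsTune are hypothetical names for the tuned nonlinear and linear models from the earlier exercises:
# Helper: top 10 predictor names by caret variable importance
top10 <- function(fit) {
  imp <- varImp(fit)$importance
  rownames(imp)[order(imp$Overall, decreasing = TRUE)][1:10]
}
intersect(top10(cubistTuned), top10(svmTune)) # shared with the nonlinear model
intersect(top10(cubistTuned), top10(plsTune)) # shared with the linear model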
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
rpartTune$finalModel
## n= 144
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 144 473.7514 40.12944
## 2) ManufacturingProcess32< 0.191596 88 157.8237 39.20886 *
## 3) ManufacturingProcess32>=0.191596 56 124.1575 41.57607 *
plot(as.party(rpartTune$finalModel), gp=gpar(fontsize=10))
Answer: As it turns out, the optimal single tree from the trained model above has a depth of 1. The split for the single decision node is, unsurprisingly, the manufacturing process variable ManufacturingProcess32, the most important variable across all of the optimal model types. The node sends observations with values of 0.192 or above to one terminal node and values below 0.192 to the other. The boxplots of the terminal nodes show that higher values of ManufacturingProcess32 correspond to higher Yield values. As seen in previous assignments, a higher value of this manufacturing variable generally means a higher resulting yield, and the plot reiterates that notion with the most important variable. Given the absence of the biological variables from the tree, I can’t provide any evaluation of those.