library(tidyverse)
library(caret)
library(plotmo)
library(earth)
library(kernlab)
library(forecast)
library(ipred)
library(mlbench)
library(AppliedPredictiveModeling)
Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations uses the following nonlinear equation to generate data:
\[y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\]
where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five additional non-informative predictors). The mlbench package contains a function called mlbench.friedman1 that simulates these data:
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)
featurePlot(trainingData$x, trainingData$y)
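The test-set comparisons near the end of this exercise use testData, which is never created above. Following the book's standard setup for this exercise, a large test set can be simulated the same way:
# Simulate a large test set to estimate the true error rate with good precision
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)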
Tune several models on these data.
# Define the candidate models to test
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
set.seed(100)
marsTuned <- train(trainingData$x, trainingData$y,
                   method = "earth",
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv", number = 10))
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 4.327937 0.2544880 3.600474
## 1 3 3.572450 0.4912720 2.895811
## 1 4 2.596841 0.7183600 2.106341
## 1 5 2.370161 0.7659777 1.918669
## 1 6 2.276141 0.7881481 1.810001
## 1 7 1.766728 0.8751831 1.390215
## 1 8 1.780946 0.8723243 1.401345
## 1 9 1.665091 0.8819775 1.325515
## 1 10 1.663804 0.8821283 1.327657
## 1 11 1.657738 0.8822967 1.331730
## 1 12 1.653784 0.8827903 1.331504
## 1 13 1.648496 0.8823663 1.316407
## 1 14 1.639073 0.8841742 1.312833
## 1 15 1.639073 0.8841742 1.312833
## 1 16 1.639073 0.8841742 1.312833
## 1 17 1.639073 0.8841742 1.312833
## 1 18 1.639073 0.8841742 1.312833
## 1 19 1.639073 0.8841742 1.312833
## 1 20 1.639073 0.8841742 1.312833
## 1 21 1.639073 0.8841742 1.312833
## 1 22 1.639073 0.8841742 1.312833
## 1 23 1.639073 0.8841742 1.312833
## 1 24 1.639073 0.8841742 1.312833
## 1 25 1.639073 0.8841742 1.312833
## 1 26 1.639073 0.8841742 1.312833
## 1 27 1.639073 0.8841742 1.312833
## 1 28 1.639073 0.8841742 1.312833
## 1 29 1.639073 0.8841742 1.312833
## 1 30 1.639073 0.8841742 1.312833
## 1 31 1.639073 0.8841742 1.312833
## 1 32 1.639073 0.8841742 1.312833
## 1 33 1.639073 0.8841742 1.312833
## 1 34 1.639073 0.8841742 1.312833
## 1 35 1.639073 0.8841742 1.312833
## 1 36 1.639073 0.8841742 1.312833
## 1 37 1.639073 0.8841742 1.312833
## 1 38 1.639073 0.8841742 1.312833
## 2 2 4.327937 0.2544880 3.600474
## 2 3 3.572450 0.4912720 2.895811
## 2 4 2.661826 0.7070510 2.173471
## 2 5 2.404015 0.7578971 1.975387
## 2 6 2.243927 0.7914805 1.783072
## 2 7 1.856336 0.8605482 1.435682
## 2 8 1.754607 0.8763186 1.396841
## 2 9 1.603578 0.8938666 1.261361
## 2 10 1.492421 0.9084998 1.168700
## 2 11 1.317350 0.9292504 1.033926
## 2 12 1.304327 0.9320133 1.019108
## 2 13 1.277510 0.9323681 1.002927
## 2 14 1.269626 0.9350024 1.003346
## 2 15 1.266217 0.9359400 1.013893
## 2 16 1.268470 0.9354868 1.011414
## 2 17 1.268470 0.9354868 1.011414
## 2 18 1.268470 0.9354868 1.011414
## 2 19 1.268470 0.9354868 1.011414
## 2 20 1.268470 0.9354868 1.011414
## 2 21 1.268470 0.9354868 1.011414
## 2 22 1.268470 0.9354868 1.011414
## 2 23 1.268470 0.9354868 1.011414
## 2 24 1.268470 0.9354868 1.011414
## 2 25 1.268470 0.9354868 1.011414
## 2 26 1.268470 0.9354868 1.011414
## 2 27 1.268470 0.9354868 1.011414
## 2 28 1.268470 0.9354868 1.011414
## 2 29 1.268470 0.9354868 1.011414
## 2 30 1.268470 0.9354868 1.011414
## 2 31 1.268470 0.9354868 1.011414
## 2 32 1.268470 0.9354868 1.011414
## 2 33 1.268470 0.9354868 1.011414
## 2 34 1.268470 0.9354868 1.011414
## 2 35 1.268470 0.9354868 1.011414
## 2 36 1.268470 0.9354868 1.011414
## 2 37 1.268470 0.9354868 1.011414
## 2 38 1.268470 0.9354868 1.011414
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 15 and degree = 2.
## nprune degree
## 51 15 2
With MARS, the optimal model retains 15 terms and includes up to second-degree interactions. This is confirmed by the final-model summary and variable importance below:
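These summaries come from inspecting the fitted model object and its variable importance, presumably via calls like the following (the generating code was not echoed):
marsTuned$finalModel  # selected terms plus GCV and R-squared statistics
varImp(marsTuned)     # earth variable importance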
## Selected 15 of 18 terms, and 5 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 10 4
## GCV 1.618197 RSS 217.6151 GRSq 0.9343005 RSq 0.9553786
## earth variable importance
##
## Overall
## X1 100.00
## X4 85.14
## X2 69.24
## X5 49.31
## X3 40.00
## X9 0.00
## X6 0.00
## X8 0.00
## X7 0.00
## X10 0.00
The plotmo function plots regression surfaces for a model. It creates a separate plot for each variable, showing the predicted response as that predictor changes. Further details can be found in the plotmo package documentation.
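The grid printed below presumably comes from a call such as:
plotmo(marsTuned$finalModel)  # one plot per predictor, others held at their medians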
## plotmo grid: X1 X2 X3 X4 X5 X6 X7
## 0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
## X8 X9 X10
## 0.497961 0.5288716 0.5359218
In the plotmo figure, only five variables produce non-flat plots. Consistent with the variable importance output above, only X1, X2, X3, X4, and X5 contribute to the model.
# Create a specific candidate set of models to evaluate:
nnetGrid <- expand.grid(decay = c(0, 0.01, .1), size = c(1:10), bag = FALSE)
set.seed(100)
nnetTuned <- train(trainingData$x, trainingData$y,
                   method = "avNNet",
                   tuneGrid = nnetGrid,
                   trControl = trainControl(method = "cv", number = 10),
                   preProc = c("center", "scale"),
                   linout = TRUE, trace = FALSE,
                   MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
                   maxit = 500)
## Model Averaged Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 2.392711 0.7610354 1.897330
## 0.00 2 2.410532 0.7567109 1.907478
## 0.00 3 2.043957 0.8224281 1.630751
## 0.00 4 2.289347 0.8130639 1.749187
## 0.00 5 2.445600 0.7709399 1.824446
## 0.00 6 2.898295 0.7388800 2.052725
## 0.00 7 3.351563 0.6644147 2.460366
## 0.00 8 6.513566 0.4418645 3.563297
## 0.00 9 4.484215 0.5644107 2.877950
## 0.00 10 3.422545 0.6247430 2.439739
## 0.01 1 2.385381 0.7602926 1.887906
## 0.01 2 2.425125 0.7510903 1.935991
## 0.01 3 2.151209 0.8016018 1.701951
## 0.01 4 2.091925 0.8154383 1.676653
## 0.01 5 2.169742 0.7999255 1.738715
## 0.01 6 2.262032 0.8056619 1.817195
## 0.01 7 2.318301 0.7861811 1.856908
## 0.01 8 2.413847 0.7772629 1.938009
## 0.01 9 2.317190 0.7847500 1.857641
## 0.01 10 2.480407 0.7408505 1.995656
## 0.10 1 2.393965 0.7596431 1.894191
## 0.10 2 2.423612 0.7525959 1.935872
## 0.10 3 2.169914 0.7982380 1.726854
## 0.10 4 2.059080 0.8224160 1.648610
## 0.10 5 1.975656 0.8394000 1.578979
## 0.10 6 2.152198 0.8098015 1.693056
## 0.10 7 2.161512 0.8163011 1.693526
## 0.10 8 2.273716 0.7922525 1.822713
## 0.10 9 2.315333 0.7811273 1.785409
## 0.10 10 2.334803 0.7692182 1.872733
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.
## size decay bag
## 25 5 0.1 FALSE
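The final network, its importance scores, and the plotmo grid below can presumably be obtained with:
nnetTuned$finalModel
varImp(nnetTuned)  # caret falls back to the model-free loess R-squared importance for avNNet
plotmo(nnetTuned)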
## Model Averaged Neural Network with 5 Repeats
##
## a 10-5-1 network with 61 weights
## options were - linear output units decay=0.1
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 95.5047
## X2 89.6186
## X5 45.2170
## X3 29.9330
## X9 6.3299
## X10 5.5182
## X8 3.2527
## X6 0.8884
## X7 0.0000
## plotmo grid: X1 X2 X3 X4 X5 X6 X7
## 0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
## X8 X9 X10
## 0.497961 0.5288716 0.5359218
With the averaged neural network (avNNet), the model-free (loess R-squared) importance scores rank X4, X1, and X2 highest, while the uninformative predictors X6 through X10 receive scores near zero.
svmTuned <- train(trainingData$x, trainingData$y,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 14,
                  trControl = trainControl(method = "cv"))
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.485823 0.8027195 1.994537
## 0.50 2.229138 0.8161918 1.785385
## 1.00 2.054083 0.8349587 1.631458
## 2.00 1.947733 0.8487502 1.535577
## 4.00 1.900736 0.8551255 1.492755
## 8.00 1.870409 0.8595887 1.478525
## 16.00 1.863295 0.8606827 1.477106
## 32.00 1.863295 0.8606827 1.477106
## 64.00 1.863295 0.8606827 1.477106
## 128.00 1.863295 0.8606827 1.477106
## 256.00 1.863295 0.8606827 1.477106
## 512.00 1.863295 0.8606827 1.477106
## 1024.00 1.863295 0.8606827 1.477106
## 2048.00 1.863295 0.8606827 1.477106
##
## Tuning parameter 'sigma' was held constant at a value of 0.06219643
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06219643 and C = 16.
## sigma C
## 7 0.06219643 16
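The ksvm object, importance scores, and plotmo grid below presumably come from:
svmTuned$finalModel
varImp(svmTuned)
plotmo(svmTuned)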
## Support Vector Machine object of class "ksvm"
##
## SV type: eps-svr (regression)
## parameter : epsilon = 0.1 cost C = 16
##
## Gaussian Radial Basis kernel function.
## Hyperparameter : sigma = 0.0621964330848378
##
## Number of Support Vectors : 152
##
## Objective Function Value : -74.9011
## Training error : 0.008485
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 95.5047
## X2 89.6186
## X5 45.2170
## X3 29.9330
## X9 6.3299
## X10 5.5182
## X8 3.2527
## X6 0.8884
## X7 0.0000
## plotmo grid: X1 X2 X3 X4 X5 X6 X7
## 0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
## X8 X9 X10
## 0.497961 0.5288716 0.5359218
knnTune <- train(trainingData$x,
                 trainingData$y,
                 method = "knn",
                 preProc = c("center", "scale"),
                 tuneGrid = data.frame(.k = 1:20),
                 trControl = trainControl(method = "cv"))
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 4.155101 0.3952878 3.426372
## 2 3.329600 0.5487936 2.725990
## 3 3.340273 0.5474137 2.754800
## 4 3.303963 0.5572073 2.716737
## 5 3.220465 0.5929498 2.676269
## 6 3.183393 0.6075960 2.630516
## 7 3.143601 0.6303101 2.584373
## 8 3.138957 0.6428108 2.536260
## 9 3.131241 0.6525861 2.544453
## 10 3.088974 0.6791399 2.511649
## 11 3.092000 0.6866965 2.517350
## 12 3.088752 0.6885886 2.502131
## 13 3.077457 0.6980979 2.484329
## 14 3.119033 0.6876853 2.528248
## 15 3.103129 0.6979160 2.520581
## 16 3.094172 0.7068200 2.516521
## 17 3.129654 0.7112779 2.547965
## 18 3.150793 0.7057773 2.575678
## 19 3.140874 0.7140958 2.549941
## 20 3.160866 0.7178340 2.568251
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 13.
## k
## 13 13
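Likewise, the KNN output below presumably comes from:
knnTune$finalModel
varImp(knnTune)
plotmo(knnTune)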
## 13-nearest neighbor regression model
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 95.5047
## X2 89.6186
## X5 45.2170
## X3 29.9330
## X9 6.3299
## X10 5.5182
## X8 3.2527
## X6 0.8884
## X7 0.0000
## plotmo grid: X1 X2 X3 X4 X5 X6 X7
## 0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
## X8 X9 X10
## 0.497961 0.5288716 0.5359218
# MARS
marspred <- predict(marsTuned, newdata = testData$x)
marspv <- postResample(pred = marspred, obs = testData$y) #performance values
# NNET
nnpred <- predict(nnetTuned, newdata = testData$x)
nnpv <- postResample(pred = nnpred, obs = testData$y)
# SVM
svmpred <- predict(svmTuned, newdata = testData$x)
svmpv <- postResample(pred = svmpred, obs = testData$y)
#KNN
knnpred <- predict(knnTune, newdata = testData$x)
knnpv <- postResample(pred = knnpred, obs = testData$y)
data.frame(marspv, nnpv, svmpv, knnpv) %>%
  kableExtra::kable() %>%
  kableExtra::kable_styling(bootstrap_options = "striped")
|          | marspv    | nnpv      | svmpv     | knnpv     |
|:---------|----------:|----------:|----------:|----------:|
| RMSE     | 1.1589948 | 2.1113956 | 2.0718255 | 3.1481557 |
| Rsquared | 0.9460418 | 0.8277556 | 0.8259563 | 0.6747755 |
| MAE      | 0.9250230 | 1.5739011 | 1.5737503 | 2.5236041 |
The MARS model performed best on the test data, with the lowest RMSE and MAE and the highest R-squared in the table above.
Does MARS select the informative predictors (those named X1–X5)?
Yes, and it selects only those five predictors. Note that MARS ranks X1 as the most important predictor, while the other models (whose importance scores come from the model-free loess method) place X4 at the top.
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
data("ChemicalManufacturingProcess")
preprocessing <- preProcess(ChemicalManufacturingProcess[, -1],
                            method = c("center", "scale", "knnImpute", "corr", "nzv"))
Xpreprocess <- predict(preprocessing, ChemicalManufacturingProcess[, -1])
yield <- as.matrix(ChemicalManufacturingProcess$Yield)
set.seed(789)
split2 <- yield %>%
  createDataPartition(p = 0.8, list = FALSE, times = 1)
Xtrain.data <- Xpreprocess[split2, ]   # chem train
xtest.data <- Xpreprocess[-split2, ]   # chem test
Ytrain.data <- yield[split2, ]         # yield train
ytest.data <- yield[-split2, ]         # yield test
nnetGrid <- expand.grid(decay = c(0, 0.01, .1), size = c(1:10), bag = FALSE)
set.seed(200)
chem_nnet_tuned <- train(Xtrain.data, Ytrain.data,
                         method = "avNNet",
                         tuneGrid = nnetGrid,
                         trControl = trainControl(method = "cv", number = 10),
                         linout = TRUE, trace = FALSE,
                         MaxNWts = 10 * (ncol(Xtrain.data) + 1) + 10 + 1,
                         maxit = 500)
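The MARS, SVM, and KNN models used below were tuned on the same training data, but their training code is not shown. A sketch of how they were presumably fit; the MARS grid is inferred from the resampling output further down, the seeds and the SVM/KNN settings are assumed to mirror the earlier exercise:
set.seed(200)
chem_mars_tuned <- train(Xtrain.data, Ytrain.data,
                         method = "earth",
                         tuneGrid = expand.grid(.degree = 1:3, .nprune = 2:100),
                         trControl = trainControl(method = "cv", number = 10))
set.seed(200)
chem_svm_tuned <- train(Xtrain.data, Ytrain.data,
                        method = "svmRadial",
                        tuneLength = 14,
                        trControl = trainControl(method = "cv", number = 10))
set.seed(200)
chem_knn_tuned <- train(Xtrain.data, Ytrain.data,
                        method = "knn",
                        tuneGrid = data.frame(.k = 1:20),
                        trControl = trainControl(method = "cv", number = 10))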
nnpred2 <- predict(chem_nnet_tuned, newdata = xtest.data)
nnpv2 <- postResample(pred = nnpred2, obs = ytest.data)
marspred2 <- predict(chem_mars_tuned, newdata = xtest.data)
marspv2 <- postResample(pred = marspred2, obs = ytest.data)
svmpred2 <- predict(chem_svm_tuned, newdata = xtest.data)
svmpv2 <- postResample(pred = svmpred2, obs = ytest.data)
knnpred2 <- predict(chem_knn_tuned, newdata = xtest.data)
knnpv2 <- postResample(pred = knnpred2, obs = ytest.data)
data.frame(nnpv2, marspv2, svmpv2, knnpv2) %>%
  kableExtra::kable() %>%
  kableExtra::kable_styling(bootstrap_options = "striped")
|          | nnpv2     | marspv2   | svmpv2    | knnpv2    |
|:---------|----------:|----------:|----------:|----------:|
| RMSE     | 2.1113956 | 1.1589948 | 2.0718255 | 1.3461979 |
| Rsquared | 0.8277556 | 0.9460418 | 0.8259563 | 0.3373993 |
| MAE      | 1.5739011 | 0.9250230 | 1.5737503 | 1.1138750 |
By these measures, MARS is again the best-performing model.
Output from the MARS model:
## Multivariate Adaptive Regression Spline
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 130, 129, 129, 130, 131, 131, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 1.436075 0.4577748 1.1408022
## 1 3 1.193294 0.6049739 0.9687530
## 1 4 1.193574 0.5759799 0.9516904
## 1 5 1.220425 0.5652127 0.9877018
## 1 6 1.233390 0.5544421 0.9929996
## 1 7 1.286913 0.5331624 1.0363345
## 1 8 1.257745 0.5563799 1.0121255
## 1 9 1.321295 0.5343454 1.0427887
## 1 10 1.297365 0.5482386 1.0330644
## 1 11 1.322958 0.5465284 1.0297670
## 1 12 1.335926 0.5363627 1.0406954
## 1 13 1.300053 0.5491050 1.0060914
## 1 14 1.311556 0.5461103 1.0154344
## 1 15 1.315674 0.5459365 1.0156035
## 1 16 1.306160 0.5491764 0.9961855
## 1 17 1.311054 0.5446757 0.9949384
## 1 18 1.311054 0.5446757 0.9949384
## 1 19 1.311054 0.5446757 0.9949384
## 1 20 1.311054 0.5446757 0.9949384
## 1 21 1.311054 0.5446757 0.9949384
## 1 22 1.311054 0.5446757 0.9949384
## 1 23 1.311054 0.5446757 0.9949384
## 1 24 1.311054 0.5446757 0.9949384
## 1 25 1.311054 0.5446757 0.9949384
## 1 26 1.311054 0.5446757 0.9949384
## 1 27 1.311054 0.5446757 0.9949384
## 1 28 1.311054 0.5446757 0.9949384
## 1 29 1.311054 0.5446757 0.9949384
## 1 30 1.311054 0.5446757 0.9949384
## 1 31 1.311054 0.5446757 0.9949384
## 1 32 1.311054 0.5446757 0.9949384
## 1 33 1.311054 0.5446757 0.9949384
## 1 34 1.311054 0.5446757 0.9949384
## 1 35 1.311054 0.5446757 0.9949384
## 1 36 1.311054 0.5446757 0.9949384
## 1 37 1.311054 0.5446757 0.9949384
## 1 38 1.311054 0.5446757 0.9949384
## 1 39 1.311054 0.5446757 0.9949384
## 1 40 1.311054 0.5446757 0.9949384
## 1 41 1.311054 0.5446757 0.9949384
## 1 42 1.311054 0.5446757 0.9949384
## 1 43 1.311054 0.5446757 0.9949384
## 1 44 1.311054 0.5446757 0.9949384
## 1 45 1.311054 0.5446757 0.9949384
## 1 46 1.311054 0.5446757 0.9949384
## 1 47 1.311054 0.5446757 0.9949384
## 1 48 1.311054 0.5446757 0.9949384
## 1 49 1.311054 0.5446757 0.9949384
## 1 50 1.311054 0.5446757 0.9949384
## 1 51 1.311054 0.5446757 0.9949384
## 1 52 1.311054 0.5446757 0.9949384
## 1 53 1.311054 0.5446757 0.9949384
## 1 54 1.311054 0.5446757 0.9949384
## 1 55 1.311054 0.5446757 0.9949384
## 1 56 1.311054 0.5446757 0.9949384
## 1 57 1.311054 0.5446757 0.9949384
## 1 58 1.311054 0.5446757 0.9949384
## 1 59 1.311054 0.5446757 0.9949384
## 1 60 1.311054 0.5446757 0.9949384
## 1 61 1.311054 0.5446757 0.9949384
## 1 62 1.311054 0.5446757 0.9949384
## 1 63 1.311054 0.5446757 0.9949384
## 1 64 1.311054 0.5446757 0.9949384
## 1 65 1.311054 0.5446757 0.9949384
## 1 66 1.311054 0.5446757 0.9949384
## 1 67 1.311054 0.5446757 0.9949384
## 1 68 1.311054 0.5446757 0.9949384
## 1 69 1.311054 0.5446757 0.9949384
## 1 70 1.311054 0.5446757 0.9949384
## 1 71 1.311054 0.5446757 0.9949384
## 1 72 1.311054 0.5446757 0.9949384
## 1 73 1.311054 0.5446757 0.9949384
## 1 74 1.311054 0.5446757 0.9949384
## 1 75 1.311054 0.5446757 0.9949384
## 1 76 1.311054 0.5446757 0.9949384
## 1 77 1.311054 0.5446757 0.9949384
## 1 78 1.311054 0.5446757 0.9949384
## 1 79 1.311054 0.5446757 0.9949384
## 1 80 1.311054 0.5446757 0.9949384
## 1 81 1.311054 0.5446757 0.9949384
## 1 82 1.311054 0.5446757 0.9949384
## 1 83 1.311054 0.5446757 0.9949384
## 1 84 1.311054 0.5446757 0.9949384
## 1 85 1.311054 0.5446757 0.9949384
## 1 86 1.311054 0.5446757 0.9949384
## 1 87 1.311054 0.5446757 0.9949384
## 1 88 1.311054 0.5446757 0.9949384
## 1 89 1.311054 0.5446757 0.9949384
## 1 90 1.311054 0.5446757 0.9949384
## 1 91 1.311054 0.5446757 0.9949384
## 1 92 1.311054 0.5446757 0.9949384
## 1 93 1.311054 0.5446757 0.9949384
## 1 94 1.311054 0.5446757 0.9949384
## 1 95 1.311054 0.5446757 0.9949384
## 1 96 1.311054 0.5446757 0.9949384
## 1 97 1.311054 0.5446757 0.9949384
## 1 98 1.311054 0.5446757 0.9949384
## 1 99 1.311054 0.5446757 0.9949384
## 1 100 1.311054 0.5446757 0.9949384
## 2 2 1.436075 0.4577748 1.1408022
## 2 3 1.208087 0.5969342 0.9862939
## 2 4 1.172703 0.5917301 0.9470231
## 2 5 1.174369 0.5826173 0.9531073
## 2 6 1.220553 0.5562528 0.9801425
## 2 7 1.239515 0.5637542 0.9767852
## 2 8 1.227054 0.5592244 0.9723114
## 2 9 1.224300 0.5785571 0.9752012
## 2 10 1.235363 0.5566674 0.9961565
## 2 11 1.252294 0.5527595 1.0096506
## 2 12 1.224109 0.5791192 0.9812716
## 2 13 1.482177 0.5807626 1.0659295
## 2 14 1.517563 0.5774507 1.0878785
## 2 15 1.562717 0.5644419 1.1173774
## 2 16 1.585096 0.5553231 1.1395206
## 2 17 1.691484 0.5585704 1.1843662
## 2 18 1.701229 0.5588211 1.1878379
## 2 19 1.707018 0.5682174 1.1889003
## 2 20 1.741695 0.5635167 1.2296257
## 2 21 1.798015 0.5624517 1.2531234
## 2 22 1.796379 0.5632940 1.2566458
## 2 23 2.062025 0.5620930 1.3718895
## 2 24 2.093221 0.5621082 1.3896171
## 2 25 2.158256 0.5620288 1.4209758
## 2 26 2.158256 0.5620288 1.4209758
## 2 27 2.179565 0.5619943 1.4329294
## 2 28 2.179565 0.5619943 1.4329294
## 2 29 2.179565 0.5619943 1.4329294
## 2 30 2.179565 0.5619943 1.4329294
## 2 31 2.179565 0.5619943 1.4329294
## 2 32 2.179565 0.5619943 1.4329294
## 2 33 2.179565 0.5619943 1.4329294
## 2 34 2.179565 0.5619943 1.4329294
## 2 35 2.179565 0.5619943 1.4329294
## 2 36 2.179565 0.5619943 1.4329294
## 2 37 2.179565 0.5619943 1.4329294
## 2 38 2.179565 0.5619943 1.4329294
## 2 39 2.179565 0.5619943 1.4329294
## 2 40 2.179565 0.5619943 1.4329294
## 2 41 2.179565 0.5619943 1.4329294
## 2 42 2.179565 0.5619943 1.4329294
## 2 43 2.179565 0.5619943 1.4329294
## 2 44 2.179565 0.5619943 1.4329294
## 2 45 2.179565 0.5619943 1.4329294
## 2 46 2.179565 0.5619943 1.4329294
## 2 47 2.179565 0.5619943 1.4329294
## 2 48 2.179565 0.5619943 1.4329294
## 2 49 2.179565 0.5619943 1.4329294
## 2 50 2.179565 0.5619943 1.4329294
## 2 51 2.179565 0.5619943 1.4329294
## 2 52 2.179565 0.5619943 1.4329294
## 2 53 2.179565 0.5619943 1.4329294
## 2 54 2.179565 0.5619943 1.4329294
## 2 55 2.179565 0.5619943 1.4329294
## 2 56 2.179565 0.5619943 1.4329294
## 2 57 2.179565 0.5619943 1.4329294
## 2 58 2.179565 0.5619943 1.4329294
## 2 59 2.179565 0.5619943 1.4329294
## 2 60 2.179565 0.5619943 1.4329294
## 2 61 2.179565 0.5619943 1.4329294
## 2 62 2.179565 0.5619943 1.4329294
## 2 63 2.179565 0.5619943 1.4329294
## 2 64 2.179565 0.5619943 1.4329294
## 2 65 2.179565 0.5619943 1.4329294
## 2 66 2.179565 0.5619943 1.4329294
## 2 67 2.179565 0.5619943 1.4329294
## 2 68 2.179565 0.5619943 1.4329294
## 2 69 2.179565 0.5619943 1.4329294
## 2 70 2.179565 0.5619943 1.4329294
## 2 71 2.179565 0.5619943 1.4329294
## 2 72 2.179565 0.5619943 1.4329294
## 2 73 2.179565 0.5619943 1.4329294
## 2 74 2.179565 0.5619943 1.4329294
## 2 75 2.179565 0.5619943 1.4329294
## 2 76 2.179565 0.5619943 1.4329294
## 2 77 2.179565 0.5619943 1.4329294
## 2 78 2.179565 0.5619943 1.4329294
## 2 79 2.179565 0.5619943 1.4329294
## 2 80 2.179565 0.5619943 1.4329294
## 2 81 2.179565 0.5619943 1.4329294
## 2 82 2.179565 0.5619943 1.4329294
## 2 83 2.179565 0.5619943 1.4329294
## 2 84 2.179565 0.5619943 1.4329294
## 2 85 2.179565 0.5619943 1.4329294
## 2 86 2.179565 0.5619943 1.4329294
## 2 87 2.179565 0.5619943 1.4329294
## 2 88 2.179565 0.5619943 1.4329294
## 2 89 2.179565 0.5619943 1.4329294
## 2 90 2.179565 0.5619943 1.4329294
## 2 91 2.179565 0.5619943 1.4329294
## 2 92 2.179565 0.5619943 1.4329294
## 2 93 2.179565 0.5619943 1.4329294
## 2 94 2.179565 0.5619943 1.4329294
## 2 95 2.179565 0.5619943 1.4329294
## 2 96 2.179565 0.5619943 1.4329294
## 2 97 2.179565 0.5619943 1.4329294
## 2 98 2.179565 0.5619943 1.4329294
## 2 99 2.179565 0.5619943 1.4329294
## 2 100 2.179565 0.5619943 1.4329294
## 3 2 1.436075 0.4577748 1.1408022
## 3 3 1.357914 0.5399819 1.0828529
## 3 4 1.324153 0.5406578 1.0577133
## 3 5 1.208242 0.6015042 0.9650823
## 3 6 1.245846 0.5631733 0.9926445
## 3 7 1.305641 0.5445680 1.0421780
## 3 8 1.293331 0.5598225 1.0159859
## 3 9 1.257642 0.5746593 0.9988200
## 3 10 1.302785 0.5647584 1.0042240
## 3 11 1.316448 0.5752579 1.0229778
## 3 12 1.358234 0.5565035 1.0296341
## 3 13 1.379716 0.5475452 1.0326861
## 3 14 1.393334 0.5474919 1.0527433
## 3 15 1.311340 0.5899059 1.0041141
## 3 16 1.319323 0.5807196 1.0151112
## 3 17 1.338858 0.5773579 1.0245406
## 3 18 1.396909 0.5552912 1.0534804
## 3 19 1.414198 0.5488446 1.0660197
## 3 20 1.434678 0.5366410 1.0664162
## 3 21 4.311283 0.4639792 1.8833911
## 3 22 4.616280 0.4532189 1.9875788
## 3 23 6.328470 0.4483235 2.4796212
## 3 24 6.436223 0.4474803 2.5422822
## 3 25 6.418235 0.4551389 2.5373112
## 3 26 6.398090 0.4639987 2.5237233
## 3 27 6.381858 0.4666206 2.5122745
## 3 28 6.387109 0.4655397 2.5182825
## 3 29 6.387109 0.4655397 2.5182825
## 3 30 6.387109 0.4655397 2.5182825
## 3 31 6.387109 0.4655397 2.5182825
## 3 32 6.387109 0.4655397 2.5182825
## 3 33 6.387109 0.4655397 2.5182825
## 3 34 6.387109 0.4655397 2.5182825
## 3 35 6.387109 0.4655397 2.5182825
## 3 36 6.387109 0.4655397 2.5182825
## 3 37 6.387109 0.4655397 2.5182825
## 3 38 6.387109 0.4655397 2.5182825
## 3 39 6.387109 0.4655397 2.5182825
## 3 40 6.387109 0.4655397 2.5182825
## 3 41 6.387109 0.4655397 2.5182825
## 3 42 6.387109 0.4655397 2.5182825
## 3 43 6.387109 0.4655397 2.5182825
## 3 44 6.387109 0.4655397 2.5182825
## 3 45 6.387109 0.4655397 2.5182825
## 3 46 6.387109 0.4655397 2.5182825
## 3 47 6.387109 0.4655397 2.5182825
## 3 48 6.387109 0.4655397 2.5182825
## 3 49 6.387109 0.4655397 2.5182825
## 3 50 6.387109 0.4655397 2.5182825
## 3 51 6.387109 0.4655397 2.5182825
## 3 52 6.387109 0.4655397 2.5182825
## 3 53 6.387109 0.4655397 2.5182825
## 3 54 6.387109 0.4655397 2.5182825
## 3 55 6.387109 0.4655397 2.5182825
## 3 56 6.387109 0.4655397 2.5182825
## 3 57 6.387109 0.4655397 2.5182825
## 3 58 6.387109 0.4655397 2.5182825
## 3 59 6.387109 0.4655397 2.5182825
## 3 60 6.387109 0.4655397 2.5182825
## 3 61 6.387109 0.4655397 2.5182825
## 3 62 6.387109 0.4655397 2.5182825
## 3 63 6.387109 0.4655397 2.5182825
## 3 64 6.387109 0.4655397 2.5182825
## 3 65 6.387109 0.4655397 2.5182825
## 3 66 6.387109 0.4655397 2.5182825
## 3 67 6.387109 0.4655397 2.5182825
## 3 68 6.387109 0.4655397 2.5182825
## 3 69 6.387109 0.4655397 2.5182825
## 3 70 6.387109 0.4655397 2.5182825
## 3 71 6.387109 0.4655397 2.5182825
## 3 72 6.387109 0.4655397 2.5182825
## 3 73 6.387109 0.4655397 2.5182825
## 3 74 6.387109 0.4655397 2.5182825
## 3 75 6.387109 0.4655397 2.5182825
## 3 76 6.387109 0.4655397 2.5182825
## 3 77 6.387109 0.4655397 2.5182825
## 3 78 6.387109 0.4655397 2.5182825
## 3 79 6.387109 0.4655397 2.5182825
## 3 80 6.387109 0.4655397 2.5182825
## 3 81 6.387109 0.4655397 2.5182825
## 3 82 6.387109 0.4655397 2.5182825
## 3 83 6.387109 0.4655397 2.5182825
## 3 84 6.387109 0.4655397 2.5182825
## 3 85 6.387109 0.4655397 2.5182825
## 3 86 6.387109 0.4655397 2.5182825
## 3 87 6.387109 0.4655397 2.5182825
## 3 88 6.387109 0.4655397 2.5182825
## 3 89 6.387109 0.4655397 2.5182825
## 3 90 6.387109 0.4655397 2.5182825
## 3 91 6.387109 0.4655397 2.5182825
## 3 92 6.387109 0.4655397 2.5182825
## 3 93 6.387109 0.4655397 2.5182825
## 3 94 6.387109 0.4655397 2.5182825
## 3 95 6.387109 0.4655397 2.5182825
## 3 96 6.387109 0.4655397 2.5182825
## 3 97 6.387109 0.4655397 2.5182825
## 3 98 6.387109 0.4655397 2.5182825
## 3 99 6.387109 0.4655397 2.5182825
## 3 100 6.387109 0.4655397 2.5182825
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 4 and degree = 2.
## nprune degree
## 102 4 2
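The selected terms and the plotmo grid below presumably come from:
chem_mars_tuned$finalModel
plotmo(chem_mars_tuned)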
## Selected 4 of 47 terms, and 2 of 56 predictors
## Termination condition: RSq changed by less than 0.001 at 47 terms
## Importance: ManufacturingProcess32, ManufacturingProcess13, ...
## Number of terms at each degree of interaction: 1 3 (additive model)
## GCV 1.47066 RSS 187.5118 GRSq 0.5928938 RSq 0.6344774
## plotmo grid: BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## -0.1070431 -0.04306519 -0.1062217
## BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## -0.07283723 -0.004683137 -0.09620684
## BiologicalMaterial08 BiologicalMaterial09 BiologicalMaterial10
## 0.06681 -0.04830923 -0.1178766
## BiologicalMaterial11 BiologicalMaterial12 ManufacturingProcess01
## -0.1012727 -0.03863564 0.1056672
## ManufacturingProcess02 ManufacturingProcess03 ManufacturingProcess04
## 0.5096271 0.1087038 0.3424324
## ManufacturingProcess05 ManufacturingProcess06 ManufacturingProcess07
## -0.06529069 -0.2229103 0.4390925
## ManufacturingProcess08 ManufacturingProcess09 ManufacturingProcess10
## 0.8941637 0.1066231 -0.1030952
## ManufacturingProcess11 ManufacturingProcess12 ManufacturingProcess13
## 0.02020002 -0.4806937 -0.007834829
## ManufacturingProcess14 ManufacturingProcess15 ManufacturingProcess16
## 0.04826216 -0.09295527 0.06169755
## ManufacturingProcess17 ManufacturingProcess18 ManufacturingProcess19
## 0.005007187 0.06617593 -0.1360039
## ManufacturingProcess20 ManufacturingProcess21 ManufacturingProcess22
## 0.0688801 -0.1744786 -0.1218132
## ManufacturingProcess23 ManufacturingProcess24 ManufacturingProcess25
## -0.01031118 -0.1438567 0.0651293
## ManufacturingProcess26 ManufacturingProcess27 ManufacturingProcess28
## 0.06432695 0.06918722 0.7255096
## ManufacturingProcess29 ManufacturingProcess30 ManufacturingProcess31
## -0.0066778 0.03954225 0.09273307
## ManufacturingProcess32 ManufacturingProcess33 ManufacturingProcess34
## -0.08632349 0.1836771 0.1182687
## ManufacturingProcess35 ManufacturingProcess36 ManufacturingProcess37
## -0.05513017 0.4884872 -0.03063781
## ManufacturingProcess38 ManufacturingProcess39 ManufacturingProcess40
## 0.7174727 0.231727 -0.4626528
## ManufacturingProcess41 ManufacturingProcess42 ManufacturingProcess43
## -0.4405878 0.2027957 -0.1289558
## ManufacturingProcess44 ManufacturingProcess45
## 0.2946725 0.1522024
Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
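The variable importance below is from the tuned MARS model, presumably via:
varImp(chem_mars_tuned)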
## earth variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.0
## ManufacturingProcess13 58.9
## BiologicalMaterial02 0.0
## BiologicalMaterial11 0.0
## ManufacturingProcess31 0.0
## ManufacturingProcess44 0.0
## ManufacturingProcess25 0.0
## BiologicalMaterial09 0.0
## ManufacturingProcess39 0.0
## ManufacturingProcess42 0.0
## ManufacturingProcess30 0.0
## BiologicalMaterial12 0.0
## ManufacturingProcess14 0.0
## BiologicalMaterial04 0.0
## ManufacturingProcess45 0.0
## ManufacturingProcess35 0.0
## BiologicalMaterial10 0.0
## BiologicalMaterial06 0.0
## ManufacturingProcess06 0.0
## ManufacturingProcess43 0.0
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
set.seed(101)
ridgeRegFit <- train(Xtrain.data, Ytrain.data,
                     method = "ridge",
                     tuneGrid = ridgeGrid,
                     trControl = trainControl(method = "cv", number = 10))
varImp(ridgeRegFit)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess13 100.00
## ManufacturingProcess32 97.82
## ManufacturingProcess17 86.84
## BiologicalMaterial06 83.64
## BiologicalMaterial03 78.13
## ManufacturingProcess09 72.37
## BiologicalMaterial12 72.20
## ManufacturingProcess36 70.51
## BiologicalMaterial02 63.10
## ManufacturingProcess06 61.88
## ManufacturingProcess31 58.39
## BiologicalMaterial11 56.85
## ManufacturingProcess33 47.06
## ManufacturingProcess11 45.94
## BiologicalMaterial04 45.43
## ManufacturingProcess29 44.71
## BiologicalMaterial08 44.36
## ManufacturingProcess12 38.22
## BiologicalMaterial01 35.56
## BiologicalMaterial09 33.79
predictions <- ridgeRegFit %>% predict(xtest.data)
cbind(
RMSE = RMSE(predictions, ytest.data),
R_squared = caret::R2(predictions, ytest.data)
)
## RMSE R_squared
## [1,] 1.06489 0.5657402
Both the optimal nonlinear and linear models rank manufacturing predictors as the most important variables. However, the MARS model considers only manufacturing process variables important, with ManufacturingProcess32 first and ManufacturingProcess13 second. The linear model also places those two predictors at the top, but in the reverse order, and it additionally includes biological predictors in its top ten.
Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
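The two values printed below are the correlations between yield and the two predictors MARS retained. A sketch of how they were presumably computed, assuming the training objects defined above:
# Correlation of yield with each MARS-selected predictor
cor(Xtrain.data$ManufacturingProcess32, as.matrix(Ytrain.data))  # positive, ~0.61
cor(Xtrain.data$ManufacturingProcess13, as.matrix(Ytrain.data))  # negative, ~-0.50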
## [,1]
## [1,] 0.6083321
## [,1]
## [1,] -0.5036797
Manufacturing processes are presumably the steps taken to create the end product, whose quality is measured by yield. Since only manufacturing processes are important in this model, the correlations above are informative. ManufacturingProcess32 is positively correlated with yield (about 0.61), which makes sense: if that process step performs well, the product improves. Yield is negatively correlated with ManufacturingProcess13 (about -0.50), meaning that higher values of that process variable are associated with lower yield.