Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:
\(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\)
where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
library(mlbench)
library(caret)  # provides featurePlot(), train(), and postResample()
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)
## or other methods.
## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
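The equation above can also be written out directly. The following is a minimal base-R sketch of the simulation, assuming 10 independent Uniform(0, 1) predictors of which only the first five enter the response. The helper name `simulate_friedman1` is hypothetical, and the draws will not match the values produced by mlbench.friedman1 for the same seed.

```r
# Hypothetical helper: a base-R sketch of the Friedman 1 simulation.
simulate_friedman1 <- function(n, sd = 1, p = 10) {
  stopifnot(p >= 5)
  # All predictors are independent Uniform(0, 1); only the first five enter y.
  x <- matrix(runif(n * p), nrow = n, ncol = p)
  colnames(x) <- paste0("X", seq_len(p))
  # Noise-free part of the response, term by term as in the equation.
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5]
  # Add Gaussian noise with standard deviation sd.
  y <- signal + rnorm(n, mean = 0, sd = sd)
  list(x = as.data.frame(x), y = y, signal = signal)
}

set.seed(200)
sim <- simulate_friedman1(200, sd = 1)
dim(sim$x)
```

Keeping the noise-free `signal` alongside `y` makes it easy to see how much of the test error is irreducible noise.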
Tune several models on these data.
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
Neural Network
nnetGrid <- expand.grid(.decay = c(0,0.01,.1),
.size = c(1:5),
.bag = FALSE)
nnetFit <- train(trainingData$x, trainingData$y,
method = 'avNNet',
tuneGrid = nnetGrid,
preProc = c('center','scale'),
linout = TRUE,
trace = FALSE,
MaxNWts = 5 * (ncol(trainingData$x) + 1 + 5 + 1),
maxit = 100
)
## Warning: executing %dopar% sequentially: no parallel backend registered
nnetFit
## Model Averaged Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 2.473423 0.7539089 1.936474
## 0.00 2 2.507141 0.7476210 1.957269
## 0.00 3 2.346503 0.7779602 1.865790
## 0.00 4 2.434863 0.7625743 1.921202
## 0.00 5 2.709936 0.7215371 2.093228
## 0.01 1 2.465089 0.7523657 1.922437
## 0.01 2 2.506239 0.7461192 1.965544
## 0.01 3 2.360384 0.7758343 1.876507
## 0.01 4 2.390931 0.7706564 1.909652
## 0.01 5 2.547943 0.7454483 2.004900
## 0.10 1 2.446652 0.7563434 1.898926
## 0.10 2 2.500327 0.7477902 1.963910
## 0.10 3 2.283814 0.7878965 1.807668
## 0.10 4 2.322818 0.7847297 1.864023
## 0.10 5 2.362975 0.7752266 1.887780
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 3, decay = 0.1 and bag = FALSE.
nnetPred <- predict(nnetFit, newdata = testData$x)
postResample(pred = nnetPred, obs = testData$y)
## RMSE Rsquared MAE
## 2.1852646 0.8101321 1.6309365
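The MaxNWts argument in the avNNet call above caps the number of weights the network may use. As a rough check, a single-hidden-layer network with P predictors, H hidden units, one output, and bias terms has H(P + 1) + H + 1 weights. The sketch below uses a hypothetical helper, `n_weights`, and assumes no skip-layer connections.

```r
# Hypothetical helper: number of weights in a single-hidden-layer network
# with p inputs, h hidden units, one output, and bias terms throughout.
n_weights <- function(p, h) {
  h * (p + 1) + h + 1
}

# Largest network in the tuning grid above: 10 predictors, 5 hidden units.
needed <- n_weights(p = 10, h = 5)

# Value passed as MaxNWts in the call above.
cap <- 5 * (10 + 1 + 5 + 1)

c(needed = needed, cap = cap)
```

Under these assumptions the largest network needs 61 weights and the cap is 85, so the limit is never reached during tuning.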
MARS model
## Create the tuning grid: additive and two-way interaction models, 2 to 28 retained terms
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:28)
set.seed(100)
marsTuned <- train(trainingData$x, trainingData$y,
method = 'earth',
tuneGrid = marsGrid,
trControl = trainControl(method = 'cv'))
marsTuned
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 4.327937 0.2544880 3.600474
## 1 3 3.572450 0.4912720 2.895811
## 1 4 2.596841 0.7183600 2.106341
## 1 5 2.370161 0.7659777 1.918669
## 1 6 2.276141 0.7881481 1.810001
## 1 7 1.766728 0.8751831 1.390215
## 1 8 1.780946 0.8723243 1.401345
## 1 9 1.665091 0.8819775 1.325515
## 1 10 1.663804 0.8821283 1.327657
## 1 11 1.657738 0.8822967 1.331730
## 1 12 1.653784 0.8827903 1.331504
## 1 13 1.648496 0.8823663 1.316407
## 1 14 1.639073 0.8841742 1.312833
## 1 15 1.639073 0.8841742 1.312833
## 1 16 1.639073 0.8841742 1.312833
## 1 17 1.639073 0.8841742 1.312833
## 1 18 1.639073 0.8841742 1.312833
## 1 19 1.639073 0.8841742 1.312833
## 1 20 1.639073 0.8841742 1.312833
## 1 21 1.639073 0.8841742 1.312833
## 1 22 1.639073 0.8841742 1.312833
## 1 23 1.639073 0.8841742 1.312833
## 1 24 1.639073 0.8841742 1.312833
## 1 25 1.639073 0.8841742 1.312833
## 1 26 1.639073 0.8841742 1.312833
## 1 27 1.639073 0.8841742 1.312833
## 1 28 1.639073 0.8841742 1.312833
## 2 2 4.327937 0.2544880 3.600474
## 2 3 3.572450 0.4912720 2.895811
## 2 4 2.661826 0.7070510 2.173471
## 2 5 2.404015 0.7578971 1.975387
## 2 6 2.243927 0.7914805 1.783072
## 2 7 1.856336 0.8605482 1.435682
## 2 8 1.754607 0.8763186 1.396841
## 2 9 1.603578 0.8938666 1.261361
## 2 10 1.492421 0.9084998 1.168700
## 2 11 1.317350 0.9292504 1.033926
## 2 12 1.304327 0.9320133 1.019108
## 2 13 1.277510 0.9323681 1.002927
## 2 14 1.269626 0.9350024 1.003346
## 2 15 1.266217 0.9359400 1.013893
## 2 16 1.268470 0.9354868 1.011414
## 2 17 1.268470 0.9354868 1.011414
## 2 18 1.268470 0.9354868 1.011414
## 2 19 1.268470 0.9354868 1.011414
## 2 20 1.268470 0.9354868 1.011414
## 2 21 1.268470 0.9354868 1.011414
## 2 22 1.268470 0.9354868 1.011414
## 2 23 1.268470 0.9354868 1.011414
## 2 24 1.268470 0.9354868 1.011414
## 2 25 1.268470 0.9354868 1.011414
## 2 26 1.268470 0.9354868 1.011414
## 2 27 1.268470 0.9354868 1.011414
## 2 28 1.268470 0.9354868 1.011414
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 15 and degree = 2.
varImp(marsTuned)
## earth variable importance
##
## Overall
## X1 100.00
## X4 75.24
## X2 48.73
## X5 15.52
## X3 0.00
marsPred <- predict(marsTuned, newdata = testData$x)
postResample(pred = marsPred, obs = testData$y)
## RMSE Rsquared MAE
## 1.1589948 0.9460418 0.9250230
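The three numbers printed by postResample can be reproduced by hand. This is a minimal base-R sketch, assuming R-squared is computed as the squared Pearson correlation between predictions and observations. The helper name `regression_metrics` is hypothetical and the vectors are toy values, not model output.

```r
# Hypothetical helper mirroring the three columns printed by postResample().
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),   # root mean squared error
    Rsquared = cor(pred, obs)^2,             # squared Pearson correlation
    MAE      = mean(abs(pred - obs)))        # mean absolute error
}

# Toy vectors for illustration only (not taken from the models above).
obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.5, 2, 2.5, 4, 5.5)
m <- regression_metrics(pred, obs)
m
```

Because R-squared here is a squared correlation, it measures how well predictions track the observations rather than how close they are, so it should be read alongside RMSE.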
Support Vector Machines
svmRTuned <- train(trainingData$x, trainingData$y,
method = 'svmRadial',
preProc = c('center','scale'),
tuneLength = 14,
trControl = trainControl(method = 'cv'))
svmRTuned
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.490737 0.8009120 1.982118
## 0.50 2.246868 0.8153042 1.774454
## 1.00 2.051872 0.8400992 1.614368
## 2.00 1.949707 0.8534618 1.524201
## 4.00 1.886125 0.8610205 1.465373
## 8.00 1.849240 0.8654699 1.436630
## 16.00 1.834604 0.8673639 1.429807
## 32.00 1.833221 0.8675754 1.428687
## 64.00 1.833221 0.8675754 1.428687
## 128.00 1.833221 0.8675754 1.428687
## 256.00 1.833221 0.8675754 1.428687
## 512.00 1.833221 0.8675754 1.428687
## 1024.00 1.833221 0.8675754 1.428687
## 2048.00 1.833221 0.8675754 1.428687
##
## Tuning parameter 'sigma' was held constant at a value of 0.06315483
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06315483 and C = 32.
svmPred <- predict(svmRTuned, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
## RMSE Rsquared MAE
## 2.0741473 0.8255848 1.5755185
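The cost values in the SVM tuning table above are consistent with a grid of powers of two. The sketch below rebuilds that sequence in base R, assuming that tuneLength = 14 yields costs from 2^-2 to 2^11. The variable name `costGrid` is hypothetical.

```r
# Assumed cost grid: 14 powers of two, starting at 2^-2.
tune_length <- 14
costGrid <- 2^(seq_len(tune_length) - 3)

# Smallest and largest cost in the sequence.
range(costGrid)
```

In the table above the resampled RMSE stops changing from C = 32 onward, so extending the grid further would not be expected to help.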
K-Nearest Neighbors
knnTune <- train(trainingData$x, trainingData$y,
method = 'knn',
preProc = c('center','scale'),
tuneGrid = data.frame(.k = 1:20),
trControl = trainControl(method = 'cv'))
knnTune
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 4.299219 0.3772830 3.525540
## 2 3.551955 0.5041055 2.975169
## 3 3.468087 0.5137546 2.847178
## 4 3.311858 0.5590615 2.696377
## 5 3.219196 0.5848916 2.625896
## 6 3.197284 0.6003245 2.576550
## 7 3.164256 0.6116347 2.565735
## 8 3.157394 0.6246817 2.557959
## 9 3.174966 0.6293168 2.557658
## 10 3.140138 0.6441006 2.528967
## 11 3.069367 0.6730751 2.457660
## 12 3.069337 0.6830170 2.477946
## 13 3.086832 0.6926565 2.488467
## 14 3.082621 0.7033583 2.492675
## 15 3.089916 0.7033831 2.491969
## 16 3.115603 0.7110773 2.510279
## 17 3.117067 0.7169598 2.522369
## 18 3.137995 0.7137976 2.542800
## 19 3.137240 0.7211267 2.542361
## 20 3.141507 0.7221221 2.532482
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 12.
knnPred <- predict(knnTune, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)
## RMSE Rsquared MAE
## 3.1307150 0.6731293 2.5070032
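The MARS model gives the best results, with a test-set RMSE of 1.16 and an R-squared of 0.946, followed by the SVM (R-squared 0.826), the neural network (0.810), and KNN (0.673). The variable importance output for MARS lists only the informative predictors X1–X5, with X3 ranked lowest; none of the non-informative predictors appear.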
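The test-set results reported above can be collected into one table for comparison. This sketch copies the values from the four postResample outputs by hand. The data frame name `testResults` and the model labels are hypothetical.

```r
# Test-set metrics copied from the postResample() outputs above.
testResults <- data.frame(
  model    = c("avNNet", "MARS", "SVM (radial)", "KNN"),
  RMSE     = c(2.1852646, 1.1589948, 2.0741473, 3.1307150),
  Rsquared = c(0.8101321, 0.9460418, 0.8255848, 0.6731293),
  MAE      = c(1.6309365, 0.9250230, 1.5755185, 2.5070032)
)

# Order from best (lowest RMSE) to worst.
ranked <- testResults[order(testResults$RMSE), ]
ranked
```

The ordering is the same whichever of the three metrics is used to rank the models.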
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
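library(AppliedPredictiveModeling)
library(mice)     # mice() and complete() for imputation
library(caTools)  # sample.split() for the train/test split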
data(ChemicalManufacturingProcess)
df <- ChemicalManufacturingProcess
init <- mice(df, maxit = 0)
predM <- init$predictorMatrix
set.seed(123)
## printFlag = FALSE suppresses the per-iteration log (5 iterations x 5 imputations)
imputed <- mice(df, method = 'pmm', predictorMatrix = predM, m = 5, printFlag = FALSE)
## Warning: Number of logged events: 675
df <- complete(imputed)
set.seed(123)
trans_df <- preProcess(df,
method = c('center', 'scale'))
df <- predict(trans_df, df)
## Logical index: TRUE marks the training rows (about 75% of the data)
split_idx <- sample.split(df$Yield, SplitRatio = 0.75)
X_train <- subset(df[,-1], split_idx == TRUE)
X_test <- subset(df[,-1], split_idx == FALSE)
y_train <- subset(df[,1], split_idx == TRUE)
y_test <- subset(df[,1], split_idx == FALSE)
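Because preProcess above is applied to the whole data frame, Yield itself is centered and scaled, so the error metrics that follow are in standard-deviation units of Yield. The sketch below shows how such an RMSE could be converted back to original units. The response values and predictions are toy numbers standing in for the real data, and all names are hypothetical.

```r
# Toy response standing in for the original (unscaled) Yield values.
yield_raw <- c(38.0, 42.4, 41.4, 39.7, 40.2, 36.8)

# Center and scale, as method = c('center', 'scale') does for each column.
yield_mean   <- mean(yield_raw)
yield_sd     <- sd(yield_raw)
yield_scaled <- (yield_raw - yield_mean) / yield_sd

# Hypothetical predictions on the scaled response.
pred_scaled <- yield_scaled + c(0.2, -0.1, 0.3, 0.0, -0.2, 0.1)

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
rmse_scaled <- rmse(pred_scaled, yield_scaled)

# Multiplying by the standard deviation recovers RMSE in original units.
rmse_original <- rmse_scaled * yield_sd

# Check against back-transformed predictions.
pred_raw <- pred_scaled * yield_sd + yield_mean
all.equal(rmse_original, rmse(pred_raw, yield_raw))
```

R-squared is unaffected by this rescaling, so only RMSE and MAE need converting when results are compared with models fit to the unscaled response.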
enetGrid <- expand.grid(.lambda = c(0,0.01,0.1),
.fraction = seq(0.05,1,length = 20))
set.seed(213)
enetTune <- train(X_train, y_train,
method = 'enet',
tuneGrid = enetGrid
)
enetTune
## Elasticnet
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.00 0.05 0.7270126 0.56132318 0.5629656
## 0.00 0.10 1.1259552 0.41659738 0.6744747
## 0.00 0.15 1.8955366 0.31904340 0.8499337
## 0.00 0.20 3.0144268 0.22447697 1.0705791
## 0.00 0.25 4.0516707 0.21514663 1.2564735
## 0.00 0.30 5.0180356 0.19349592 1.4396472
## 0.00 0.35 5.9864366 0.16262706 1.6117368
## 0.00 0.40 6.7661921 0.13050553 1.7596472
## 0.00 0.45 7.3886632 0.12020733 1.8803490
## 0.00 0.50 7.8721167 0.12096628 1.9739936
## 0.00 0.55 8.3264538 0.11484857 2.0605206
## 0.00 0.60 8.7285206 0.10703066 2.1449555
## 0.00 0.65 9.1484752 0.09959733 2.2333224
## 0.00 0.70 9.5106009 0.09423044 2.3072698
## 0.00 0.75 9.8946656 0.09136830 2.3868264
## 0.00 0.80 10.2920822 0.09056733 2.4655756
## 0.00 0.85 10.6911097 0.09031922 2.5419748
## 0.00 0.90 11.0311795 0.09057419 2.6037477
## 0.00 0.95 11.9174103 0.09069926 2.8770231
## 0.00 1.00 12.3285571 0.08941398 2.9562052
## 0.01 0.05 0.8415365 0.51958153 0.6824794
## 0.01 0.10 0.7292621 0.54190264 0.5913988
## 0.01 0.15 0.7395966 0.54138071 0.5659928
## 0.01 0.20 0.7489049 0.54744324 0.5628365
## 0.01 0.25 0.7875392 0.53512654 0.5709283
## 0.01 0.30 0.8667019 0.52140556 0.5921187
## 0.01 0.35 0.9431457 0.51335662 0.6125160
## 0.01 0.40 1.0802047 0.47259183 0.6476523
## 0.01 0.45 1.2088266 0.44330947 0.6826894
## 0.01 0.50 1.3289880 0.42612155 0.7138401
## 0.01 0.55 1.4549337 0.39679940 0.7485097
## 0.01 0.60 1.5864313 0.36831855 0.7812921
## 0.01 0.65 1.7126590 0.34755877 0.8130701
## 0.01 0.70 1.8479794 0.33197542 0.8452571
## 0.01 0.75 2.0049644 0.31222127 0.8840289
## 0.01 0.80 2.1771623 0.29141116 0.9246542
## 0.01 0.85 2.3768469 0.27059595 0.9683999
## 0.01 0.90 2.5894120 0.25282089 1.0119540
## 0.01 0.95 2.7828747 0.24269053 1.0501841
## 0.01 1.00 3.0226233 0.22078752 1.1065328
## 0.10 0.05 0.9358302 0.44646524 0.7544072
## 0.10 0.10 0.8306045 0.52549088 0.6735563
## 0.10 0.15 0.7550218 0.54448298 0.6159927
## 0.10 0.20 0.7283200 0.53960719 0.5840903
## 0.10 0.25 0.7320465 0.53798968 0.5701453
## 0.10 0.30 0.7447597 0.54066899 0.5651241
## 0.10 0.35 0.7392803 0.54525390 0.5636148
## 0.10 0.40 0.7433433 0.54529271 0.5657410
## 0.10 0.45 0.7712882 0.53747107 0.5726082
## 0.10 0.50 0.8030717 0.53640796 0.5799205
## 0.10 0.55 0.8384202 0.53638723 0.5882022
## 0.10 0.60 0.8877581 0.52533907 0.6007169
## 0.10 0.65 0.9401590 0.50779837 0.6175740
## 0.10 0.70 1.0102028 0.48376848 0.6384253
## 0.10 0.75 1.0914346 0.45646923 0.6619463
## 0.10 0.80 1.1877890 0.42705249 0.6878470
## 0.10 0.85 1.2830180 0.40204089 0.7123231
## 0.10 0.90 1.3732466 0.38258314 0.7345218
## 0.10 0.95 1.4570607 0.36674113 0.7552740
## 0.10 1.00 1.5738429 0.34139562 0.7907798
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.05 and lambda = 0.
Neural Network
nnetFit <- train(X_train, y_train,
method = 'avNNet',
tuneGrid = nnetGrid,
linout = TRUE,
trace = FALSE,
MaxNWts = 5 * (ncol(X_train) + 1 + 5 + 1),
maxit = 100
)
nnetFit
## Model Averaged Neural Network
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.8243357 0.4377872 0.6509041
## 0.00 2 0.7504338 0.5211635 0.5929630
## 0.00 3 0.7827993 0.4986901 0.6288736
## 0.00 4 0.7880309 0.5050492 0.6168466
## 0.00 5 0.7868572 0.5046784 0.6242823
## 0.01 1 0.8524163 0.4366716 0.6756025
## 0.01 2 0.7954921 0.5013045 0.6250465
## 0.01 3 0.7669991 0.5287450 0.6049580
## 0.01 4 0.7563192 0.5355756 0.5973879
## 0.01 5 0.7693275 0.5267339 0.6083181
## 0.10 1 0.9054549 0.4292684 0.7213151
## 0.10 2 0.8222747 0.5022365 0.6457569
## 0.10 3 0.7727497 0.5369488 0.6154186
## 0.10 4 0.7536887 0.5464417 0.5996231
## 0.10 5 0.7496252 0.5442816 0.5957507
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.
nnetPred <- predict(nnetFit, newdata = X_test)
postResample(pred = nnetPred, obs = y_test)
## RMSE Rsquared MAE
## 0.5554940 0.6108372 0.4494205
MARS model
marsTuned <- train(X_train, y_train,
method = 'earth',
tuneGrid = marsGrid,
trControl = trainControl(method = 'cv'))
marsTuned
## Multivariate Adaptive Regression Spline
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 119, 120, 118, 118, 120, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.7904048 0.4507351 0.6275043
## 1 3 0.6794020 0.6006938 0.5595096
## 1 4 0.6668748 0.6456619 0.5480074
## 1 5 0.6721475 0.6342758 0.5478607
## 1 6 0.6659013 0.6405694 0.5501379
## 1 7 0.6992911 0.6190223 0.5764416
## 1 8 0.7379095 0.5871325 0.6076607
## 1 9 0.7475117 0.5805781 0.6090211
## 1 10 0.7479548 0.5821133 0.6156703
## 1 11 0.7656027 0.5669601 0.6275956
## 1 12 0.7763067 0.5558621 0.6414700
## 1 13 0.7614792 0.5751865 0.6347228
## 1 14 0.7511342 0.5905119 0.6204779
## 1 15 0.7539206 0.5916118 0.6251120
## 1 16 0.7492616 0.5953842 0.6207587
## 1 17 0.7492616 0.5953842 0.6207587
## 1 18 0.7492616 0.5953842 0.6207587
## 1 19 0.7492616 0.5953842 0.6207587
## 1 20 0.7492616 0.5953842 0.6207587
## 1 21 0.7492616 0.5953842 0.6207587
## 1 22 0.7492616 0.5953842 0.6207587
## 1 23 0.7492616 0.5953842 0.6207587
## 1 24 0.7492616 0.5953842 0.6207587
## 1 25 0.7492616 0.5953842 0.6207587
## 1 26 0.7492616 0.5953842 0.6207587
## 1 27 0.7492616 0.5953842 0.6207587
## 1 28 0.7492616 0.5953842 0.6207587
## 2 2 0.7904048 0.4507351 0.6275043
## 2 3 0.6948443 0.5814909 0.5696606
## 2 4 0.6975476 0.5941590 0.5701798
## 2 5 0.7211229 0.5897593 0.5936467
## 2 6 0.7572959 0.5655344 0.6224076
## 2 7 0.7296402 0.5923496 0.6016324
## 2 8 0.7648427 0.5759890 0.5979807
## 2 9 0.7723891 0.5783378 0.5937947
## 2 10 0.7863662 0.5650889 0.6005829
## 2 11 0.7832637 0.5801301 0.6050668
## 2 12 0.7947543 0.5723002 0.6193604
## 2 13 0.8188549 0.5692406 0.6265319
## 2 14 34.9506631 0.4992262 10.5176534
## 2 15 0.9044488 0.5341156 0.6707949
## 2 16 0.9009244 0.5391817 0.6752735
## 2 17 38.2771710 0.4719446 11.5165907
## 2 18 38.2439523 0.4900186 11.5051393
## 2 19 38.2511792 0.4910174 11.4962234
## 2 20 38.2448459 0.4920223 11.4947142
## 2 21 38.2444561 0.4923313 11.4955496
## 2 22 38.2378403 0.4993308 11.4892405
## 2 23 38.2385832 0.5011157 11.4899425
## 2 24 38.2385832 0.5011157 11.4899425
## 2 25 38.2498546 0.4926230 11.4983649
## 2 26 38.2498546 0.4926230 11.4983649
## 2 27 38.2498546 0.4926230 11.4983649
## 2 28 38.2498546 0.4926230 11.4983649
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 6 and degree = 1.
marsPred <- predict(marsTuned, newdata = X_test)
postResample(pred = marsPred, obs = y_test)
## RMSE Rsquared MAE
## 0.6084064 0.5203117 0.4713370
Support Vector Machines
svmRTuned <- train(X_train, y_train,
method = 'svmRadial',
preProc = c('center','scale'),
tuneLength = 14,
trControl = trainControl(method = 'cv'))
svmRTuned
## Support Vector Machines with Radial Basis Function Kernel
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 119, 119, 119, 117, 118, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.8003593 0.5875267 0.6509437
## 0.50 0.7106184 0.6278583 0.5860788
## 1.00 0.6486632 0.6649921 0.5316189
## 2.00 0.6174800 0.6862592 0.4981234
## 4.00 0.5959579 0.7008934 0.4751655
## 8.00 0.5808417 0.7129948 0.4650281
## 16.00 0.5750742 0.7165059 0.4612686
## 32.00 0.5750742 0.7165059 0.4612686
## 64.00 0.5750742 0.7165059 0.4612686
## 128.00 0.5750742 0.7165059 0.4612686
## 256.00 0.5750742 0.7165059 0.4612686
## 512.00 0.5750742 0.7165059 0.4612686
## 1024.00 0.5750742 0.7165059 0.4612686
## 2048.00 0.5750742 0.7165059 0.4612686
##
## Tuning parameter 'sigma' was held constant at a value of 0.01108002
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01108002 and C = 16.
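The cost values in the table above are consistent with caret expanding tuneLength = 14 into successive powers of two for svmRadial. A minimal base R sketch of that grid, assuming the default rule is 2^((1:len) - 3); the name costGrid is illustrative only and is not used elsewhere in this analysis:

```r
# Assumed default cost grid for method = 'svmRadial' with tuneLength = 14:
# powers of two running from 2^-2 up to 2^11
len <- 14
costGrid <- 2^((1:len) - 3)
costGrid
```

RMSE stops changing from C = 16 upward, so the larger cost values in this grid add nothing for these data.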
svmPred <- predict(svmRTuned, newdata = X_test)
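## The original call passed marsPred here, so the values printed below
## repeat the MARS test results; re-run with svmPred for the SVM metrics.
postResample(pred = svmPred, obs = y_test)
## RMSE Rsquared MAE
## 0.6084064 0.5203117 0.4713370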
K-Nearest Neighbors
knnTune <- train(X_train, y_train,
method = 'knn',
preProc = c('center','scale'),
tuneGrid = data.frame(.k = 1:20),
trControl = trainControl(method = 'cv'))
knnTune
## k-Nearest Neighbors
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 120, 118, 118, 118, 119, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.8629536 0.4299310 0.6889696
## 2 0.7347385 0.5161024 0.5795412
## 3 0.7106262 0.5413118 0.5743603
## 4 0.7254470 0.5389945 0.5889178
## 5 0.7480052 0.4987857 0.6166085
## 6 0.7641720 0.4844230 0.6372895
## 7 0.7518329 0.5167120 0.6303624
## 8 0.7598386 0.5149700 0.6295734
## 9 0.7582677 0.5208215 0.6260547
## 10 0.7524152 0.5388093 0.6223745
## 11 0.7613537 0.5248970 0.6312540
## 12 0.7763461 0.5075823 0.6431757
## 13 0.7769667 0.5077997 0.6465760
## 14 0.7810270 0.5117965 0.6502142
## 15 0.7802343 0.5090971 0.6446063
## 16 0.7871757 0.5065479 0.6524623
## 17 0.7839005 0.5179838 0.6481946
## 18 0.7879388 0.5220966 0.6476796
## 19 0.7941630 0.5115538 0.6513305
## 20 0.8050150 0.5051792 0.6571662
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 3.
knnPred <- predict(knnTune, newdata = X_test)
postResample(pred = knnPred, obs = y_test)
## RMSE Rsquared MAE
## 0.6479240 0.4326222 0.5183313
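Of the models with test-set results above, the neural network performs best, with an RMSE of 0.555 and an \(R^2\) value of 0.6108372, compared with an \(R^2\) of 0.520 for MARS and 0.433 for KNN. The SVM cannot be ranked from the output above, since the test metrics printed for it are identical to the MARS values, although it had the lowest cross-validated RMSE (0.575).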
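Collecting the test metrics for every model in one table makes this comparison easier to check and harder to get wrong by reusing a prediction vector. The sketch below uses only base R and toy predictions; test_metrics, modelA and modelB are hypothetical names, and the three statistics are computed the way postResample reports them (RMSE, squared correlation, MAE):

```r
# Hypothetical helper: RMSE, squared correlation and MAE for one set of predictions
test_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(obs - pred)))
}

# Toy stand-ins for the observed test response and two models' predictions
set.seed(1)
obs <- rnorm(50)
preds <- list(
  modelA = obs + rnorm(50, sd = 0.3),  # accurate model
  modelB = obs + rnorm(50, sd = 0.8)   # noisier model
)

# One row per model, sorted so the lowest RMSE comes first
results <- t(sapply(preds, test_metrics, obs = obs))
results <- results[order(results[, "RMSE"]), ]
results
```

With the objects from this analysis, the list would hold nnetPred, marsPred, svmPred and knnPred, and obs would be y_test.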
The manufacturing process variables dominate the variable importance rankings (12 of the top 20 predictors), as they did for the optimal linear model.
varImp(nnetFit, 10)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 99.45
## ManufacturingProcess17 84.45
## BiologicalMaterial06 76.83
## ManufacturingProcess09 76.23
## BiologicalMaterial03 76.20
## ManufacturingProcess36 74.99
## ManufacturingProcess06 74.25
## BiologicalMaterial12 60.54
## BiologicalMaterial02 59.04
## BiologicalMaterial11 49.98
## ManufacturingProcess31 48.75
## ManufacturingProcess29 41.56
## ManufacturingProcess11 40.96
## BiologicalMaterial04 40.87
## ManufacturingProcess33 39.55
## ManufacturingProcess30 38.79
## BiologicalMaterial08 38.66
## BiologicalMaterial01 33.78
## ManufacturingProcess12 33.39
varImp(enetTune, 10)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 99.45
## ManufacturingProcess17 84.45
## BiologicalMaterial06 76.83
## ManufacturingProcess09 76.23
## BiologicalMaterial03 76.20
## ManufacturingProcess36 74.99
## ManufacturingProcess06 74.25
## BiologicalMaterial12 60.54
## BiologicalMaterial02 59.04
## BiologicalMaterial11 49.98
## ManufacturingProcess31 48.75
## ManufacturingProcess29 41.56
## ManufacturingProcess11 40.96
## BiologicalMaterial04 40.87
## ManufacturingProcess33 39.55
## ManufacturingProcess30 38.79
## BiologicalMaterial08 38.66
## BiologicalMaterial01 33.78
## ManufacturingProcess12 33.39
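The two importance tables above are identical, which is what one would expect if the "loess r-squared" scores are computed from the predictors and the response alone, without using either fitted model. A minimal base R sketch of that idea on simulated data follows; loess_r2 and the toy predictors signal and noise are hypothetical names, and the min-max rescaling to 0-100 is an assumption about how the scores are scaled, not a copy of caret's code:

```r
# Simulated data: one informative predictor and one pure-noise predictor
set.seed(42)
n <- 100
X <- data.frame(signal = runif(n), noise = runif(n))
y <- sin(2 * pi * X$signal) + rnorm(n, sd = 0.2)

# Pseudo R-squared of a loess smooth of the response on a single predictor
loess_r2 <- function(x, y) {
  fit <- loess(y ~ x)
  1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
}

# Score every predictor separately, then rescale so the scores span 0 to 100
raw <- sapply(X, loess_r2, y = y)
scaled <- 100 * (raw - min(raw)) / (max(raw) - min(raw))
sort(scaled, decreasing = TRUE)
```

Because no fitted model enters the calculation, any two models trained on the same predictors and response would receive the same ranking under this kind of score.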
Plotting the ten most important predictors against the response suggests that they have a relatively strong relationship with it.
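## Rank predictors by importance and keep the ten highest
## (base R ordering keeps the row names, which hold the predictor names)
imp <- varImp(nnetFit)$importance
vars <- rownames(imp)[order(imp$Overall, decreasing = TRUE)][1:10]
## df is the full data set; its first column is assumed to be the response
plotX <- df[, vars]
plotY <- df[, 1]
featurePlot(plotX, plotY)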