Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations uses the following nonlinear equation to generate data:
\[ y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2) \]
where the x values are independent random variables uniformly distributed on [0, 1]. The simulation also creates 5 additional non-informative predictors that do not enter the equation.
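To make the formula concrete, here is a minimal sketch of generating such a data set by hand (illustration only; the object names are ours, and we use mlbench.friedman1 below):
# Direct simulation of the Friedman (1991) equation; x6-x10 are pure noise columns
set.seed(1)
n <- 200
x <- matrix(runif(n * 10), ncol = 10)
y <- 10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
  10 * x[, 4] + 5 * x[, 5] + rnorm(n, sd = 1)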
The package mlbench contains a function called mlbench.friedman1 that simulates these data.
The code below creates a 200-sample training set and a 5,000-sample test set. The large test set is intended to estimate the true error rate with good precision.
Below we visualize just the training data.
# Check out packages
library(mlbench)
library(caret)
library(earth)
library(AppliedPredictiveModeling)
library(ggplot2)
library(gridExtra)
# Create and display training data
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)
featurePlot(trainingData$x, trainingData$y)
# Create test data
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
Tune several models on these data.
We’ll tune kNN, SVM, MARS and neural networks.
kNN was the provided example. We're arriving at a test-set RMSE of 3.2040595 and an R-squared of 0.6819919.
# Train the kNN model
knnModel <- train(x = trainingData$x,
y = trainingData$y,
method = "knn",
preProc = c("center", "scale"),
tuneLength = 10)
knnModel
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.466085 0.5121775 2.816838
## 7 3.349428 0.5452823 2.727410
## 9 3.264276 0.5785990 2.660026
## 11 3.214216 0.6024244 2.603767
## 13 3.196510 0.6176570 2.591935
## 15 3.184173 0.6305506 2.577482
## 17 3.183130 0.6425367 2.567787
## 19 3.198752 0.6483184 2.592683
## 21 3.188993 0.6611428 2.588787
## 23 3.200458 0.6638353 2.604529
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
# Apply our kNN to the test data
knnPred <- predict(knnModel, newdata = testData$x)
# Get the test set performance values
postResample(pred = knnPred, obs = testData$y)
## RMSE Rsquared MAE
## 3.2040595 0.6819919 2.5683461
SVM offers an improvement over kNN. We're arriving at an RMSE of 2.0864652 and an R-squared of 0.8236735.
# Train the SVM model
svmModel <- train(x = trainingData$x,
y = trainingData$y,
method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 14,
trControl = trainControl(method = "cv"))
svmModel
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.505383 0.8031869 1.999381
## 0.50 2.290725 0.8103140 1.829703
## 1.00 2.105086 0.8302040 1.677851
## 2.00 2.014620 0.8418576 1.598814
## 4.00 1.965196 0.8491165 1.567327
## 8.00 1.927699 0.8538883 1.542308
## 16.00 1.924236 0.8545355 1.539240
## 32.00 1.924236 0.8545355 1.539240
## 64.00 1.924236 0.8545355 1.539240
## 128.00 1.924236 0.8545355 1.539240
## 256.00 1.924236 0.8545355 1.539240
## 512.00 1.924236 0.8545355 1.539240
## 1024.00 1.924236 0.8545355 1.539240
## 2048.00 1.924236 0.8545355 1.539240
##
## Tuning parameter 'sigma' was held constant at a value of 0.06802164
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06802164 and C = 16.
# Apply our SVM to the test data
svmPred <- predict(svmModel, newdata = testData$x)
# Get the test set performance values
postResample(pred = svmPred, obs = testData$y)
## RMSE Rsquared MAE
## 2.0864652 0.8236735 1.5854649
MARS offers an improvement over SVM and a significant improvement over kNN. We’re arriving at an RMSE of 1.1722635 and an R-squared of 0.9448890.
Interestingly, centering and scaling are not required for MARS: the model builds its basis functions on one predictor at a time, so centering and scaling do not improve accuracy or interpretability.
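Each MARS basis function is a hinge, h(u) = max(0, u), applied to one predictor at a knot, so rescaling a predictor only shifts the knot location rather than changing the fit. A small illustration (the knot value here is arbitrary):
# A reflected pair of hinge (hockey-stick) basis functions at an arbitrary knot
h <- function(u) pmax(0, u)
xg <- seq(0, 1, by = 0.01)
knot <- 0.45
plot(xg, h(xg - knot), type = "l", ylab = "basis value",
     main = "MARS hinge pair at a knot")
lines(xg, h(knot - xg), lty = 2)  # mirrored partner basis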
# Train the MARS model
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:15)
set.seed(175175327)
marsModel <- train(x = trainingData$x,
y = trainingData$y,
method = "earth",
tuneGrid = marsGrid,
trControl = trainControl(method = "cv"))
marsModel
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 4.443532 0.2061700 3.6246656
## 1 3 3.686061 0.4612860 2.9883447
## 1 4 2.670236 0.7300728 2.1601427
## 1 5 2.420196 0.7727260 1.9640585
## 1 6 2.272829 0.8053442 1.8250384
## 1 7 1.831139 0.8722483 1.4510742
## 1 8 1.681888 0.8882541 1.3498742
## 1 9 1.614483 0.8982518 1.3047595
## 1 10 1.615624 0.8962895 1.2828182
## 1 11 1.652344 0.8915774 1.3212514
## 1 12 1.607395 0.9013007 1.2841058
## 1 13 1.627154 0.8978113 1.3033838
## 1 14 1.653526 0.8940506 1.3172375
## 1 15 1.653526 0.8940506 1.3172375
## 2 2 4.443532 0.2061700 3.6246656
## 2 3 3.686061 0.4612860 2.9883447
## 2 4 2.614201 0.7369499 2.1382652
## 2 5 2.304309 0.7990862 1.8534738
## 2 6 2.294634 0.7991220 1.8222764
## 2 7 1.889211 0.8603579 1.5092850
## 2 8 1.655904 0.8877936 1.3196488
## 2 9 1.405765 0.9191741 1.1334302
## 2 10 1.410037 0.9173800 1.1271211
## 2 11 1.339156 0.9270169 1.0660183
## 2 12 1.292837 0.9320201 1.0320115
## 2 13 1.234550 0.9382399 0.9806515
## 2 14 1.221632 0.9400491 0.9646931
## 2 15 1.239291 0.9377224 0.9707171
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 14 and degree = 2.
# Apply our MARS to the test data
marsPred <- predict(marsModel, newdata = testData$x)
# Get the test set performance values
postResample(pred = marsPred, obs = testData$y)
## RMSE Rsquared MAE
## 1.1722635 0.9448890 0.9324923
Our neural network outperformed kNN but underperformed SVM and MARS. We’re arriving at an RMSE of 2.2136108 and an R-squared of 0.8074696.
# No predictors to exclude because of high pair-wise correlation
#findCorrelation(cor(trainingData$x), cutoff = .75)
# Train the neural network model
nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
.size = c(1:10),
.bag = FALSE)
set.seed(175175327)
nnetModel <- train(x = trainingData$x,
y = trainingData$y,
method = "avNNet",
tuneGrid = nnetGrid,
trControl = trainControl(method = "cv"),
preProc = c("center", "scale"),
linout = TRUE,
trace = FALSE,
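# MaxNWts caps the largest network tried: a single-hidden-layer net with H hidden
# units and p inputs has H * (p + 1) + H + 1 weights; here H = 10 and p = 10 give 121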
MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
maxit = 500)
nnetModel
## Model Averaged Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 2.454713 0.7687836 1.941236
## 0.00 2 2.378413 0.7781909 1.880364
## 0.00 3 2.115133 0.8180688 1.632097
## 0.00 4 2.080778 0.8281506 1.599236
## 0.00 5 3.051090 0.6877508 2.146463
## 0.00 6 3.248364 0.7007759 2.275962
## 0.00 7 3.460356 0.6444719 2.493511
## 0.00 8 4.233361 0.6293905 2.817352
## 0.00 9 4.296703 0.6311299 2.840973
## 0.00 10 4.460691 0.5612145 2.858168
## 0.01 1 2.377944 0.7769301 1.871507
## 0.01 2 2.380376 0.7770345 1.872891
## 0.01 3 2.079243 0.8179574 1.622305
## 0.01 4 2.085465 0.8232845 1.625832
## 0.01 5 2.082200 0.8313365 1.631450
## 0.01 6 2.183693 0.8060375 1.727257
## 0.01 7 2.380156 0.7852934 1.912939
## 0.01 8 2.443240 0.7716346 1.930034
## 0.01 9 2.536389 0.7684754 2.019504
## 0.01 10 2.473653 0.7602672 2.010368
## 0.10 1 2.382646 0.7742213 1.873647
## 0.10 2 2.427876 0.7697958 1.923026
## 0.10 3 2.174409 0.8031399 1.699955
## 0.10 4 2.060485 0.8253459 1.607417
## 0.10 5 2.150869 0.8175745 1.712457
## 0.10 6 2.054195 0.8278467 1.596078
## 0.10 7 2.276615 0.7992966 1.757261
## 0.10 8 2.329648 0.7889581 1.831415
## 0.10 9 2.304847 0.7817968 1.816082
## 0.10 10 2.417341 0.7633641 1.938319
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 6, decay = 0.1 and bag = FALSE.
# Apply our neural network to the test data
nnetPred <- predict(nnetModel, newdata = testData$x)
# Get the test set performance values
postResample(pred = nnetPred, obs = testData$y)
## RMSE Rsquared MAE
## 2.2136108 0.8074696 1.6822294
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
In summary, MARS outperformed the other models. It selected the informative predictors and correctly excluded the non-informative ones, which were generated at random and do not enter the formula for the target value. See the table below for a ranking of performance.
Rank | Model Type | Test RMSE | Test R-squared |
---|---|---|---|
#1 | MARS | 1.1722635 | 0.9448890 |
#2 | SVM | 2.0864652 | 0.8236735 |
#3 | Neural Network | 2.2136108 | 0.8074696 |
#4 | kNN | 3.2040595 | 0.6819919 |
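As a complement to the hand-built test-set table above, caret's resamples() can summarize the cross-validated training performance of the fitted models side by side. This is only a sketch; kNN is omitted because it was tuned with bootstrap resampling rather than 10-fold CV.
# Compare resampled (10-fold CV) training performance of the CV-tuned models
resamps <- resamples(list(SVM = svmModel, MARS = marsModel, NNet = nnetModel))
summary(resamps)
dotplot(resamps, metric = "RMSE")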
Yes, MARS selected the informative predictors X1–X5. Note in the model summary below that X3 is last in importance, just ahead of X6 through X10, which are marked unused. However, when we graph the relative importance of X1–X5, X3 is shown at zero. This means it is somehow used in the model, but its contribution to the model's predictive power is minimal.
Since MARS builds each split (basis function) on an individual predictor, which is also why we don't have to center and scale the data for MARS, I didn't think X3 could influence the other predictors. It turns out, however, that MARS allows predictors to be combined by multiplying basis functions, which may be why X3 appears in the model even though it contributes little to the model's predictions.
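Another way to check how the final MARS model ranks X3 is earth's own importance summary, evimp(), applied to the final model (output not shown):
# earth's built-in importance estimates (nsubsets, GCV, and RSS criteria)
evimp(marsModel$finalModel)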
# Show variance importance
#marsVarImp <- varImp(marsModel)
#print(marsVarImp)
# Show final model summary results
summary(marsModel$finalModel)
## Call: earth(x=data.frame[200,10], y=c(18.46,16.1,17...), keepxy=TRUE, degree=2,
## nprune=14)
##
## coefficients
## (Intercept) 21.905319
## h(0.621722-X1) -15.726181
## h(X1-0.621722) 9.234027
## h(0.601063-X2) -18.253527
## h(X2-0.601063) 10.448545
## h(0.447442-X3) 9.700589
## h(X3-0.606015) 12.674694
## h(0.734892-X4) -9.863526
## h(X4-0.734892) 10.297964
## h(0.850094-X5) -5.324175
## h(0.218266-X1) * h(X2-0.601063) -55.358726
## h(X1-0.218266) * h(X2-0.601063) -29.100250
## h(X1-0.621722) * h(X2-0.295997) -26.833129
## h(0.649253-X1) * h(0.601063-X2) 27.120721
##
## Selected 14 of 18 terms, and 5 of 10 predictors (nprune=14)
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 9 4
## GCV 1.62945 RSS 225.8601 GRSq 0.9338437 RSq 0.953688
# Display variance importance of the predictors used in the model
plot(varImp(marsModel), main = "Variable Importance - MARS Model")
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
Here we load, impute, and split the data the same way as last time so the results are comparable.
# Load the data
data("ChemicalManufacturingProcess")
# Impute missing values using kNN
imputeModel <- preProcess(ChemicalManufacturingProcess, method = c("knnImpute"))
CMP <- predict(imputeModel, ChemicalManufacturingProcess)
# Split data
set.seed(522)
trainingRows <- createDataPartition(CMP[,1], p = .80, list=FALSE)
trainPredictors <- CMP[trainingRows,-1]
trainTarget <- CMP[trainingRows,1]
testPredictors <- CMP[-trainingRows,-1]
testTarget <- CMP[-trainingRows,1]
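A quick sanity check (output not shown) that the imputation left no missing values and that the partition is roughly 80/20:
# Confirm there are no remaining NAs and check the split sizes
sum(is.na(CMP))
c(train = length(trainTarget), test = length(testTarget))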
With kNN, we’re arriving at an RMSE of 0.7579 and an R-squared of 0.4851
# Train the kNN model
set.seed(175175327)
knnModel <- train(x = trainPredictors,
y = trainTarget,
method = "knn",
preProc = c("center", "scale","nzv","corr"),
tuneLength = 10)
knnModel
## k-Nearest Neighbors
##
## 144 samples
## 57 predictor
##
## Pre-processing: centered (46), scaled (46), remove (11)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.8403217 0.3354140 0.6620180
## 7 0.8443433 0.3303946 0.6693365
## 9 0.8317434 0.3501138 0.6647881
## 11 0.8252569 0.3612905 0.6614803
## 13 0.8261834 0.3621206 0.6624814
## 15 0.8198583 0.3756646 0.6597847
## 17 0.8108831 0.3970905 0.6558287
## 19 0.8110869 0.4032304 0.6542984
## 21 0.8088851 0.4137685 0.6525860
## 23 0.8134998 0.4094164 0.6561710
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
# Apply our kNN to the test data
knnPred <- predict(knnModel, newdata = testPredictors)
# Get the test set performance values
postResample(pred = knnPred, obs = testTarget)
## RMSE Rsquared MAE
## 0.7579062 0.4850835 0.6170100
SVM offers a significant improvement over kNN. We're arriving at an RMSE of 0.6061 and an R-squared of 0.6320.
# Train the SVM model
set.seed(175175327)
svmModel <- train(x = trainPredictors,
y = trainTarget,
method = "svmRadial",
preProc = c("center", "scale","nzv","corr"),
tuneLength = 14,
trControl = trainControl(method = "cv"))
svmModel
## Support Vector Machines with Radial Basis Function Kernel
##
## 144 samples
## 57 predictor
##
## Pre-processing: centered (46), scaled (46), remove (11)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.7691059 0.5097537 0.6197299
## 0.50 0.7152946 0.5509478 0.5742437
## 1.00 0.6616520 0.6063983 0.5336181
## 2.00 0.6367372 0.6496179 0.5126332
## 4.00 0.6125438 0.6839575 0.4893630
## 8.00 0.6148353 0.6833447 0.4900243
## 16.00 0.6145696 0.6835409 0.4898285
## 32.00 0.6145696 0.6835409 0.4898285
## 64.00 0.6145696 0.6835409 0.4898285
## 128.00 0.6145696 0.6835409 0.4898285
## 256.00 0.6145696 0.6835409 0.4898285
## 512.00 0.6145696 0.6835409 0.4898285
## 1024.00 0.6145696 0.6835409 0.4898285
## 2048.00 0.6145696 0.6835409 0.4898285
##
## Tuning parameter 'sigma' was held constant at a value of 0.01756631
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01756631 and C = 4.
# Apply our SVM to the test data
svmPred <- predict(svmModel, newdata = testPredictors)
# Get the test set performance values
postResample(pred = svmPred, obs = testTarget)
## RMSE Rsquared MAE
## 0.6061169 0.6319977 0.4959181
On the test data, MARS performed about the same as kNN, with SVM still in the lead. MARS arrived at an RMSE of 0.7531 and an R-squared of 0.4851.
# Train the MARS model
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:15)
set.seed(175175327)
marsModel <- train(x = trainPredictors,
y = trainTarget,
method = "earth",
preProcess = c("nzv", "corr"),
tuneGrid = marsGrid,
trControl = trainControl(method = "cv"))
marsModel
## Multivariate Adaptive Regression Spline
##
## 144 samples
## 57 predictor
##
## Pre-processing: remove (11)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.7858023 0.4128738 0.6170093
## 1 3 0.7447863 0.4654854 0.5823625
## 1 4 0.6788166 0.5392729 0.5413664
## 1 5 0.6863361 0.5421763 0.5527994
## 1 6 0.6756801 0.5529367 0.5347030
## 1 7 0.6693112 0.5560002 0.5371421
## 1 8 0.6907235 0.5291965 0.5389524
## 1 9 0.6958703 0.5359418 0.5335775
## 1 10 0.6801193 0.5486922 0.5189771
## 1 11 0.6938671 0.5387392 0.5462674
## 1 12 0.6746041 0.5516506 0.5353572
## 1 13 0.6652235 0.5602077 0.5319404
## 1 14 0.6757268 0.5499120 0.5380170
## 1 15 0.6975093 0.5361280 0.5488372
## 2 2 0.7858023 0.4128738 0.6170093
## 2 3 0.7188345 0.4996679 0.5607470
## 2 4 0.7136738 0.5042911 0.5604273
## 2 5 0.6819435 0.5398210 0.5390034
## 2 6 0.7136103 0.4966587 0.5551547
## 2 7 0.7049913 0.5146528 0.5553919
## 2 8 0.7635640 0.4831671 0.5808472
## 2 9 0.7784452 0.4555035 0.5858035
## 2 10 0.7542570 0.4952230 0.5808509
## 2 11 0.7703341 0.4800442 0.5983735
## 2 12 0.7671642 0.4911891 0.5893508
## 2 13 0.7501213 0.4987074 0.5846191
## 2 14 0.7593959 0.4886779 0.5951316
## 2 15 0.7602186 0.4910301 0.6028569
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 13 and degree = 1.
# Apply our MARS to the test data
marsPred <- predict(marsModel, newdata = testPredictors)
# Get the test set performance values
postResample(pred = marsPred, obs = testTarget)
## RMSE Rsquared MAE
## 0.7530975 0.4850939 0.5746503
The neural network wins by the slightest margin over SVM. We're arriving at an RMSE of 0.6013 and an R-squared of 0.6391.
# Train the neural network model
nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
.size = c(1:10),
.bag = FALSE)
set.seed(175175327)
nnetModel <- train(x = trainPredictors,
y = trainTarget,
method = "avNNet",
tuneGrid = nnetGrid,
trControl = trainControl(method = "cv"),
preProc = c("center", "scale", "nzv", "corr"),
linout = TRUE,
trace = FALSE,
MaxNWts = 10 * (ncol(trainPredictors) + 1) + 10 + 1,
maxit = 500)
nnetModel
## Model Averaged Neural Network
##
## 144 samples
## 57 predictor
##
## Pre-processing: centered (46), scaled (46), remove (11)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.7773132 0.4513908 0.6226808
## 0.00 2 0.7418367 0.5202791 0.5989981
## 0.00 3 0.7303030 0.5395246 0.5768155
## 0.00 4 0.7262604 0.5354406 0.6066786
## 0.00 5 0.7640151 0.5511149 0.6185083
## 0.00 6 0.6944983 0.5923382 0.5534747
## 0.00 7 0.6712731 0.6025720 0.5263522
## 0.00 8 0.6499201 0.6192002 0.5090941
## 0.00 9 0.6957927 0.5556648 0.5569141
## 0.00 10 0.6192305 0.6503882 0.4943724
## 0.01 1 0.8634470 0.4434500 0.6515523
## 0.01 2 0.7594050 0.5110027 0.6159487
## 0.01 3 0.7451579 0.5318768 0.5900924
## 0.01 4 0.6434050 0.6345503 0.5118380
## 0.01 5 0.6277314 0.6459746 0.5126876
## 0.01 6 0.5626685 0.7024774 0.4432965
## 0.01 7 0.6128103 0.6505940 0.4993619
## 0.01 8 0.6023294 0.6768808 0.4894758
## 0.01 9 0.6063032 0.6683167 0.4912192
## 0.01 10 0.6066756 0.6665995 0.4965936
## 0.10 1 0.7757303 0.4812221 0.6184119
## 0.10 2 0.6454558 0.6335956 0.5254793
## 0.10 3 0.6663638 0.6172061 0.5477837
## 0.10 4 0.6211094 0.6530521 0.5131886
## 0.10 5 0.5879941 0.6870835 0.4747346
## 0.10 6 0.6150274 0.6570889 0.4947629
## 0.10 7 0.6143467 0.6605081 0.5005305
## 0.10 8 0.6140094 0.6399764 0.4912167
## 0.10 9 0.6083117 0.6601753 0.4916095
## 0.10 10 0.6020755 0.6710211 0.4785764
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 6, decay = 0.01 and bag = FALSE.
# Apply our neural network to the test data
nnetPred <- predict(nnetModel, newdata = testPredictors)
# Get the test set performance values
postResample(pred = nnetPred, obs = testTarget)
## RMSE Rsquared MAE
## 0.6012957 0.6390797 0.5008649
Which nonlinear regression model gives the optimal resampling and test set performance?
The neural network performs best on both the test set and the resampled training data. Here is the table, in the order the models appear above:
Model | Test RMSE | Test R-squared | Resampled RMSE | Resampled R-squared |
---|---|---|---|---|
kNN | 0.7579 | 0.4851 | 0.8089 | 0.4138 |
SVM | 0.6061 | 0.632 | 0.6125 | 0.684 |
MARS | 0.7531 | 0.4851 | 0.6652 | 0.5602 |
Neural Network | 0.6013 | 0.6391 | 0.5627 | 0.7025 |
Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
Below are the top ten most important predictors. Manufacturing process variables beat out biological materials for importance, seven to three, the same split as with the linear regression model at the end of the prior homework: https://rpubs.com/pkofy/1240790
The neural network model has ManufacturingProcess31 and BiologicalMaterial12 among its top ten predictors, but neither has a coefficient in our elastic-net linear regression model referenced above. Conversely, ManufacturingProcess11 and ManufacturingProcess33 are among the top ten coefficients by absolute value in that elastic-net model, but they are not in the neural network's top ten predictors.
# Print the top ten most important predictors
nnetVarImp <- varImp(nnetModel)
top10_nnet <- nnetVarImp$importance |>
as.data.frame() |>
dplyr::arrange(desc(Overall)) |>
head(10)
print(top10_nnet)
## Overall
## ManufacturingProcess13 100.00000
## ManufacturingProcess32 92.87049
## ManufacturingProcess17 81.90448
## ManufacturingProcess09 80.21338
## BiologicalMaterial06 72.87876
## ManufacturingProcess31 68.97342
## BiologicalMaterial03 68.34276
## ManufacturingProcess06 67.67449
## ManufacturingProcess36 67.24545
## BiologicalMaterial12 62.43653
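To make the overlap with the linear model explicit, the two top-ten lists could be compared directly. This is only a sketch: top10_enet is a hypothetical character vector of the elastic-net model's top-ten predictors saved from the prior assignment, not something re-fit here.
# Hypothetical sketch: assumes top10_enet was saved from the earlier elastic-net model
if (exists("top10_enet")) {
  print(setdiff(rownames(top10_nnet), top10_enet))  # in the neural net's top ten only
  print(setdiff(top10_enet, rownames(top10_nnet)))  # in the elastic net's top ten only
}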
Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
What the comparison of predictors between our neural network and the elastic-net linear regression shows is that neural networks can surface latent, complex relationships in the data that are not visible or tractable through linear methods.
Neural networks are more of a black box than more traditional statistical learning methods. We can't examine the relationships between the top predictors the way we could if they were coefficients in a linear model. However, we can use the neural network to identify factors in the process that may influence Yield and that should be on our radar.
For example, BiologicalMaterial12 is one of the meaningful predictors newly surfaced by the neural network model. It appears to have a positive correlation with Yield, although the relationship seems to fan out at higher values of BiologicalMaterial12. It may be that the material no longer affects the process once a necessary and sufficient amount is present, while below that threshold it can hurt Yield. Decision trees might handle this kind of relationship better.
ManufacturingProcess13 was the most important predictor of Yield. High positive values of the process correspond to the worst Yield outcomes, and strongly negative values correspond to the best. It may be that the company should invest in new equipment that lets the chemicals be run consistently at the low end of this process, or that more work is needed to understand how the other base materials and processes dictate which setting of ManufacturingProcess13 can be applied. That insight might be obtainable through decision tree methods.
Next steps would be to run this problem through different types of decision trees.
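As a quick numeric complement to the scatter plots below, the marginal correlations of the two highlighted predictors with Yield can be computed (values not shown here):
# Pearson correlations of the two highlighted predictors with Yield
cor(CMP$ManufacturingProcess13, CMP$Yield)
cor(CMP$BiologicalMaterial12, CMP$Yield)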
# Create scatter plots for each predictor against yield
p1 <- ggplot(CMP, aes(x = ManufacturingProcess13, y = Yield)) +
geom_point() +
ggtitle("ManufacturingProcess13 vs Yield") +
theme_minimal()
p2 <- ggplot(CMP, aes(x = BiologicalMaterial12, y = Yield)) +
geom_point() +
ggtitle("BiologicalMaterial12 vs Yield") +
theme_minimal()
# Arrange plots in a grid
grid.arrange(p1, p2, ncol = 2)
These exercises come from ‘Applied Predictive Modeling’ by Max Kuhn and Kjell Johnson.