Exercise 7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\[ y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2) \]

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five additional non-informative predictors).


Simulate data

The mlbench package contains a function called mlbench.friedman1 that simulates these data.

The provided code creates a 200-sample training set and a 5,000-sample test set. The oversized test set is intended to estimate the true error rate with good precision.

Below we plot each predictor in the training set against the response.

# Check out packages
library(mlbench)
library(caret)
library(earth)
library(AppliedPredictiveModeling)
library(ggplot2)
library(gridExtra)

# Create and display training data
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)
featurePlot(trainingData$x, trainingData$y)

# Create test data
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
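
For intuition, the same data can be generated by hand straight from the equation above; a minimal sketch (simulate_friedman1 is just an illustrative name, not part of mlbench):

# Hand-rolled version of the Friedman (1991) simulation, for illustration
simulate_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10)   # ten U(0, 1) predictors; only X1-X5 are informative
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5] +
    rnorm(n, sd = sd)                     # the N(0, sigma^2) noise term
  list(x = data.frame(x), y = y)
}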


Tune models

Tune several models on these data.

We’ll tune kNN, SVM, MARS and neural networks.


kNN

kNN was the provided example. We’re arriving at an RMSE of 3.2040595 and an R-squared of 0.6819919.

# Train the kNN model
knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
# Apply our kNN to the test data
knnPred <- predict(knnModel, newdata = testData$x)

# Get the test set performance values
postResample(pred = knnPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461


Support Vector Machines

SVM offers an improvement over kNN. We’re arriving at an RMSE of 2.0864652 and an R-squared of 0.8236735.

# Train the SVM model
svmModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 14,
                  trControl = trainControl(method = "cv"))
svmModel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE      Rsquared   MAE     
##      0.25  2.505383  0.8031869  1.999381
##      0.50  2.290725  0.8103140  1.829703
##      1.00  2.105086  0.8302040  1.677851
##      2.00  2.014620  0.8418576  1.598814
##      4.00  1.965196  0.8491165  1.567327
##      8.00  1.927699  0.8538883  1.542308
##     16.00  1.924236  0.8545355  1.539240
##     32.00  1.924236  0.8545355  1.539240
##     64.00  1.924236  0.8545355  1.539240
##    128.00  1.924236  0.8545355  1.539240
##    256.00  1.924236  0.8545355  1.539240
##    512.00  1.924236  0.8545355  1.539240
##   1024.00  1.924236  0.8545355  1.539240
##   2048.00  1.924236  0.8545355  1.539240
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06802164
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06802164 and C = 16.
# Apply our SVM to the test data
svmPred <- predict(svmModel, newdata = testData$x)

# Get the test set performance values
postResample(pred = svmPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.0864652 0.8236735 1.5854649
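
A note on the constant sigma reported above: for method = "svmRadial", caret estimates the RBF kernel’s inverse-width once with kernlab::sigest() and holds it fixed while tuning C. A sketch of that estimate (sigest subsamples the data, so the value is only approximate):

# Approximate caret's fixed sigma: sigest() returns the 10th, 50th and 90th
# percentile estimates of the inverse kernel width; caret averages the outer two
sigmas <- kernlab::sigest(as.matrix(trainingData$x), scaled = TRUE)
mean(sigmas[-2])   # roughly the 0.068 used above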


Multivariate Adaptive Regression Splines

MARS offers an improvement over SVM and a significant improvement over kNN. We’re arriving at an RMSE of 1.1722635 and an R-squared of 0.9448890. The grid below tunes degree (the maximum interaction order) and nprune (the maximum number of terms retained after pruning).

Interestingly, centering and scaling are not required for MARS: because the model builds its hinge functions on one predictor at a time, rescaling a predictor simply shifts the candidate knots, and neither accuracy nor interpretability improves.
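
A quick sketch to illustrate the invariance (using earth directly with default settings; the two fits should agree up to numerical noise):

# MARS knots shift along with any linear rescaling of a predictor,
# so fits on raw and standardized data give the same predictions
fitRaw    <- earth(trainingData$x, trainingData$y, degree = 2)
xScaled   <- as.data.frame(scale(trainingData$x))
fitScaled <- earth(xScaled, trainingData$y, degree = 2)
max(abs(predict(fitRaw, trainingData$x) - predict(fitScaled, xScaled)))   # ~0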

# Train the MARS model
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:15)
set.seed(175175327)
marsModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "earth",
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv"))
marsModel
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE      
##   1        2      4.443532  0.2061700  3.6246656
##   1        3      3.686061  0.4612860  2.9883447
##   1        4      2.670236  0.7300728  2.1601427
##   1        5      2.420196  0.7727260  1.9640585
##   1        6      2.272829  0.8053442  1.8250384
##   1        7      1.831139  0.8722483  1.4510742
##   1        8      1.681888  0.8882541  1.3498742
##   1        9      1.614483  0.8982518  1.3047595
##   1       10      1.615624  0.8962895  1.2828182
##   1       11      1.652344  0.8915774  1.3212514
##   1       12      1.607395  0.9013007  1.2841058
##   1       13      1.627154  0.8978113  1.3033838
##   1       14      1.653526  0.8940506  1.3172375
##   1       15      1.653526  0.8940506  1.3172375
##   2        2      4.443532  0.2061700  3.6246656
##   2        3      3.686061  0.4612860  2.9883447
##   2        4      2.614201  0.7369499  2.1382652
##   2        5      2.304309  0.7990862  1.8534738
##   2        6      2.294634  0.7991220  1.8222764
##   2        7      1.889211  0.8603579  1.5092850
##   2        8      1.655904  0.8877936  1.3196488
##   2        9      1.405765  0.9191741  1.1334302
##   2       10      1.410037  0.9173800  1.1271211
##   2       11      1.339156  0.9270169  1.0660183
##   2       12      1.292837  0.9320201  1.0320115
##   2       13      1.234550  0.9382399  0.9806515
##   2       14      1.221632  0.9400491  0.9646931
##   2       15      1.239291  0.9377224  0.9707171
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 14 and degree = 2.
# Apply our MARS to the test data
marsPred <- predict(marsModel, newdata = testData$x)

# Get the test set performance values
postResample(pred = marsPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 1.1722635 0.9448890 0.9324923


Neural Networks

Our neural network outperformed kNN but underperformed SVM and MARS. We’re arriving at an RMSE of 2.2136108 and an R-squared of 0.8074696.
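
An aside on the MaxNWts argument below: it caps nnet’s weight count at the size of the largest network in the tuning grid (10 hidden units). The arithmetic, as a sketch:

# Weights in a single-hidden-layer network with 10 hidden units: each hidden
# unit connects to every input plus a bias, and the single linear output
# connects to every hidden unit plus a bias
hidden <- 10
p <- ncol(trainingData$x)          # 10 predictors
hidden * (p + 1) + (hidden + 1)    # 121, matching MaxNWts below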

# No predictors to exclude because of high pair-wise correlation
#findCorrelation(cor(trainingData$x), cutoff = .75)

# Train the neural network model
nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(175175327)
nnetModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "avNNet",
                   tuneGrid = nnetGrid,
                   trControl = trainControl(method = "cv"),
                   preProc = c("center", "scale"),
                   linout = TRUE,
                   trace = FALSE,
                   MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
                   maxit = 500)
nnetModel
## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.454713  0.7687836  1.941236
##   0.00    2    2.378413  0.7781909  1.880364
##   0.00    3    2.115133  0.8180688  1.632097
##   0.00    4    2.080778  0.8281506  1.599236
##   0.00    5    3.051090  0.6877508  2.146463
##   0.00    6    3.248364  0.7007759  2.275962
##   0.00    7    3.460356  0.6444719  2.493511
##   0.00    8    4.233361  0.6293905  2.817352
##   0.00    9    4.296703  0.6311299  2.840973
##   0.00   10    4.460691  0.5612145  2.858168
##   0.01    1    2.377944  0.7769301  1.871507
##   0.01    2    2.380376  0.7770345  1.872891
##   0.01    3    2.079243  0.8179574  1.622305
##   0.01    4    2.085465  0.8232845  1.625832
##   0.01    5    2.082200  0.8313365  1.631450
##   0.01    6    2.183693  0.8060375  1.727257
##   0.01    7    2.380156  0.7852934  1.912939
##   0.01    8    2.443240  0.7716346  1.930034
##   0.01    9    2.536389  0.7684754  2.019504
##   0.01   10    2.473653  0.7602672  2.010368
##   0.10    1    2.382646  0.7742213  1.873647
##   0.10    2    2.427876  0.7697958  1.923026
##   0.10    3    2.174409  0.8031399  1.699955
##   0.10    4    2.060485  0.8253459  1.607417
##   0.10    5    2.150869  0.8175745  1.712457
##   0.10    6    2.054195  0.8278467  1.596078
##   0.10    7    2.276615  0.7992966  1.757261
##   0.10    8    2.329648  0.7889581  1.831415
##   0.10    9    2.304847  0.7817968  1.816082
##   0.10   10    2.417341  0.7633641  1.938319
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 6, decay = 0.1 and bag = FALSE.
# Apply our neural network to the test data
nnetPred <- predict(nnetModel, newdata = testData$x)

# Get the test set performance values
postResample(pred = nnetPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.2136108 0.8074696 1.6822294


Evaluate models

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

In summary, MARS outperformed the other models. It selected the informative predictors and correctly ignored the non-informative predictors, which were generated at random and do not appear in the formula for the target value.


Best performance

Our best performing model was MARS. See the table below for a ranking of performance.

Performance  Model Type      RMSE       R-squared
#1           MARS            1.1722635  0.9448890
#2           SVM             2.0864652  0.8236735
#3           Neural Network  2.2136108  0.8074696
#4           kNN             3.2040595  0.6819919


Informative predictors

Yes, MARS selected the informative predictors, X1–X5. Note in the summary below that X3 is last in importance, just ahead of X6 through X10, which are marked unused. However, when we graph the relative importance of X1–X5, X3 is zero. This means it is used somewhere in the model, but its contribution to the model’s predictive power is minimal.

Since MARS builds its basis functions on one predictor at a time (which is also why we don’t have to center and scale the predictors), I didn’t think X3 could influence the other predictors. But it turns out MARS allows combining predictors by multiplying basis functions together, and that may be why X3 is retained in the model even though it contributes little to the model’s predictions.
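
As a cross-check, earth’s own evimp() ranks the predictors retained by the final model (a sketch; per the importance order in the summary below, X3 should come last among the five):

# Rank retained predictors by earth's built-in importance criteria
evimp(marsModel$finalModel)   # columns: nsubsets, gcv, rss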

# Show variance importance
#marsVarImp <- varImp(marsModel)
#print(marsVarImp)

# Show final model summary results
summary(marsModel$finalModel)
## Call: earth(x=data.frame[200,10], y=c(18.46,16.1,17...), keepxy=TRUE, degree=2,
##             nprune=14)
## 
##                                 coefficients
## (Intercept)                        21.905319
## h(0.621722-X1)                    -15.726181
## h(X1-0.621722)                      9.234027
## h(0.601063-X2)                    -18.253527
## h(X2-0.601063)                     10.448545
## h(0.447442-X3)                      9.700589
## h(X3-0.606015)                     12.674694
## h(0.734892-X4)                     -9.863526
## h(X4-0.734892)                     10.297964
## h(0.850094-X5)                     -5.324175
## h(0.218266-X1) * h(X2-0.601063)   -55.358726
## h(X1-0.218266) * h(X2-0.601063)   -29.100250
## h(X1-0.621722) * h(X2-0.295997)   -26.833129
## h(0.649253-X1) * h(0.601063-X2)    27.120721
## 
## Selected 14 of 18 terms, and 5 of 10 predictors (nprune=14)
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 9 4
## GCV 1.62945    RSS 225.8601    GRSq 0.9338437    RSq 0.953688
# Display variance importance of the predictors used in the model
plot(varImp(marsModel), main = "Variable Importance - MARS Model")




Exercise 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.


Set up data

Here we load, impute, and split the data the same as last time for comparison. Note that caret’s knnImpute pre-processing centers and scales the data as part of the imputation, so Yield and the predictors end up standardized; the RMSE values below are therefore in standard-deviation units of Yield.

# Load the data
data("ChemicalManufacturingProcess")

# Impute missing values using kNN
imputeModel <- preProcess(ChemicalManufacturingProcess, method = c("knnImpute"))
CMP <- predict(imputeModel, ChemicalManufacturingProcess)

# Split data
set.seed(522)
trainingRows <- createDataPartition(CMP[,1], p = .80, list=FALSE)
trainPredictors <- CMP[trainingRows,-1]
trainTarget <- CMP[trainingRows,1]
testPredictors <- CMP[-trainingRows,-1]
testTarget <- CMP[-trainingRows,1]
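
As a quick sanity check (a small sketch; stopifnot() halts if any missing values survived imputation):

# Confirm the imputation left no missing values before modeling
stopifnot(!anyNA(CMP))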


kNN

With kNN, we’re arriving at an RMSE of 0.7579 and an R-squared of 0.4851.

# Train the kNN model
set.seed(175175327)
knnModel <- train(x = trainPredictors,
                  y = trainTarget,
                  method = "knn",
                  preProc = c("center", "scale","nzv","corr"),
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors 
## 
## 144 samples
##  57 predictor
## 
## Pre-processing: centered (46), scaled (46), remove (11) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.8403217  0.3354140  0.6620180
##    7  0.8443433  0.3303946  0.6693365
##    9  0.8317434  0.3501138  0.6647881
##   11  0.8252569  0.3612905  0.6614803
##   13  0.8261834  0.3621206  0.6624814
##   15  0.8198583  0.3756646  0.6597847
##   17  0.8108831  0.3970905  0.6558287
##   19  0.8110869  0.4032304  0.6542984
##   21  0.8088851  0.4137685  0.6525860
##   23  0.8134998  0.4094164  0.6561710
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
# Apply our kNN to the test data
knnPred <- predict(knnModel, newdata = testPredictors)

# Get the test set performance values
postResample(pred = knnPred, obs = testTarget)
##      RMSE  Rsquared       MAE 
## 0.7579062 0.4850835 0.6170100


Support Vector Machines

SVM offers a significant improvement over kNN. We’re arriving at an RMSE of 0.6061 and an R-squared of 0.6320.

# Train the SVM model
set.seed(175175327)
svmModel <- train(x = trainPredictors,
                  y = trainTarget,
                  method = "svmRadial",
                  preProc = c("center", "scale","nzv","corr"),
                  tuneLength = 14,
                  trControl = trainControl(method = "cv"))
svmModel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 144 samples
##  57 predictor
## 
## Pre-processing: centered (46), scaled (46), remove (11) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   MAE      
##      0.25  0.7691059  0.5097537  0.6197299
##      0.50  0.7152946  0.5509478  0.5742437
##      1.00  0.6616520  0.6063983  0.5336181
##      2.00  0.6367372  0.6496179  0.5126332
##      4.00  0.6125438  0.6839575  0.4893630
##      8.00  0.6148353  0.6833447  0.4900243
##     16.00  0.6145696  0.6835409  0.4898285
##     32.00  0.6145696  0.6835409  0.4898285
##     64.00  0.6145696  0.6835409  0.4898285
##    128.00  0.6145696  0.6835409  0.4898285
##    256.00  0.6145696  0.6835409  0.4898285
##    512.00  0.6145696  0.6835409  0.4898285
##   1024.00  0.6145696  0.6835409  0.4898285
##   2048.00  0.6145696  0.6835409  0.4898285
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01756631
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01756631 and C = 4.
# Apply our SVM to the test data
svmPred <- predict(svmModel, newdata = testPredictors)

# Get the test set performance values
postResample(pred = svmPred, obs = testTarget)
##      RMSE  Rsquared       MAE 
## 0.6061169 0.6319977 0.4959181


Multivariate Adaptive Regression Splines

MARS performed roughly the same as kNN on the test data, with SVM still in the lead. MARS arrived at an RMSE of 0.7531 and an R-squared of 0.4851.

# Train the MARS model
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:15)
set.seed(175175327)
marsModel <- train(x = trainPredictors,
                   y = trainTarget,
                   method = "earth",
                   preProcess = c("nzv", "corr"),
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv"))
marsModel
## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  57 predictor
## 
## Pre-processing: remove (11) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.7858023  0.4128738  0.6170093
##   1        3      0.7447863  0.4654854  0.5823625
##   1        4      0.6788166  0.5392729  0.5413664
##   1        5      0.6863361  0.5421763  0.5527994
##   1        6      0.6756801  0.5529367  0.5347030
##   1        7      0.6693112  0.5560002  0.5371421
##   1        8      0.6907235  0.5291965  0.5389524
##   1        9      0.6958703  0.5359418  0.5335775
##   1       10      0.6801193  0.5486922  0.5189771
##   1       11      0.6938671  0.5387392  0.5462674
##   1       12      0.6746041  0.5516506  0.5353572
##   1       13      0.6652235  0.5602077  0.5319404
##   1       14      0.6757268  0.5499120  0.5380170
##   1       15      0.6975093  0.5361280  0.5488372
##   2        2      0.7858023  0.4128738  0.6170093
##   2        3      0.7188345  0.4996679  0.5607470
##   2        4      0.7136738  0.5042911  0.5604273
##   2        5      0.6819435  0.5398210  0.5390034
##   2        6      0.7136103  0.4966587  0.5551547
##   2        7      0.7049913  0.5146528  0.5553919
##   2        8      0.7635640  0.4831671  0.5808472
##   2        9      0.7784452  0.4555035  0.5858035
##   2       10      0.7542570  0.4952230  0.5808509
##   2       11      0.7703341  0.4800442  0.5983735
##   2       12      0.7671642  0.4911891  0.5893508
##   2       13      0.7501213  0.4987074  0.5846191
##   2       14      0.7593959  0.4886779  0.5951316
##   2       15      0.7602186  0.4910301  0.6028569
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 13 and degree = 1.
# Apply our MARS to the test data
marsPred <- predict(marsModel, newdata = testPredictors)

# Get the test set performance values
postResample(pred = marsPred, obs = testTarget)
##      RMSE  Rsquared       MAE 
## 0.7530975 0.4850939 0.5746503


Neural Networks

The neural network wins with the slightest advantage over SVM. We’re arriving at an RMSE of 0.6013 and an R-squared of 0.6391.

# Train the neural network model
nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(175175327)
nnetModel <- train(x = trainPredictors,
                   y = trainTarget,
                   method = "avNNet",
                   tuneGrid = nnetGrid,
                   trControl = trainControl(method = "cv"),
                   preProc = c("center", "scale", "nzv", "corr"),
                   linout = TRUE,
                   trace = FALSE,
                   MaxNWts = 10 * (ncol(trainPredictors) + 1) + 10 + 1,
                   maxit = 500)
nnetModel
## Model Averaged Neural Network 
## 
## 144 samples
##  57 predictor
## 
## Pre-processing: centered (46), scaled (46), remove (11) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 130, 129, 131, 131, 129, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.7773132  0.4513908  0.6226808
##   0.00    2    0.7418367  0.5202791  0.5989981
##   0.00    3    0.7303030  0.5395246  0.5768155
##   0.00    4    0.7262604  0.5354406  0.6066786
##   0.00    5    0.7640151  0.5511149  0.6185083
##   0.00    6    0.6944983  0.5923382  0.5534747
##   0.00    7    0.6712731  0.6025720  0.5263522
##   0.00    8    0.6499201  0.6192002  0.5090941
##   0.00    9    0.6957927  0.5556648  0.5569141
##   0.00   10    0.6192305  0.6503882  0.4943724
##   0.01    1    0.8634470  0.4434500  0.6515523
##   0.01    2    0.7594050  0.5110027  0.6159487
##   0.01    3    0.7451579  0.5318768  0.5900924
##   0.01    4    0.6434050  0.6345503  0.5118380
##   0.01    5    0.6277314  0.6459746  0.5126876
##   0.01    6    0.5626685  0.7024774  0.4432965
##   0.01    7    0.6128103  0.6505940  0.4993619
##   0.01    8    0.6023294  0.6768808  0.4894758
##   0.01    9    0.6063032  0.6683167  0.4912192
##   0.01   10    0.6066756  0.6665995  0.4965936
##   0.10    1    0.7757303  0.4812221  0.6184119
##   0.10    2    0.6454558  0.6335956  0.5254793
##   0.10    3    0.6663638  0.6172061  0.5477837
##   0.10    4    0.6211094  0.6530521  0.5131886
##   0.10    5    0.5879941  0.6870835  0.4747346
##   0.10    6    0.6150274  0.6570889  0.4947629
##   0.10    7    0.6143467  0.6605081  0.5005305
##   0.10    8    0.6140094  0.6399764  0.4912167
##   0.10    9    0.6083117  0.6601753  0.4916095
##   0.10   10    0.6020755  0.6710211  0.4785764
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 6, decay = 0.01 and bag = FALSE.
# Apply our neural network to the test data
nnetPred <- predict(nnetModel, newdata = testPredictors)

# Get the test set performance values
postResample(pred = nnetPred, obs = testTarget)
##      RMSE  Rsquared       MAE 
## 0.6012957 0.6390797 0.5008649


Part a

Which nonlinear regression model gives the optimal resampling and test set performance?

The neural network performs best on both the test set and the optimal resampling of the training data. Here’s the table, in order of appearance:

Model           Test RMSE  Test R-squared  Train RMSE  Train R-squared
kNN             0.7579     0.4851          0.8089      0.4138
SVM             0.6061     0.6320          0.6125      0.6840
MARS            0.7531     0.4851          0.6652      0.5602
Neural Network  0.6013     0.6391          0.5627      0.7025
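
Since the SVM, MARS, and neural network fits above all used 10-fold CV with the same seed (so their folds line up; kNN used bootstrap resampling and is left out), caret’s resamples() can compare the resampling distributions directly. A quick sketch:

# Collect the matched cross-validation results side by side
resamps <- resamples(list(SVM = svmModel, MARS = marsModel, NNet = nnetModel))
summary(resamps)
# bwplot(resamps, metric = "RMSE")   # optional lattice plot of the spread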


Part b

Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

Below are the top ten most important predictors. Manufacturing processes beat out biological materials seven to three, the same split as in the linear regression model at the end of the prior homework: https://rpubs.com/pkofy/1240790

The neural network model has ManufacturingProcess31 and BiologicalMaterial12 among its top ten predictors, but neither has a coefficient in the elastic-net linear regression model referenced above.

Conversely, ManufacturingProcess11 and ManufacturingProcess33 have top-ten coefficients by absolute value in the elastic-net model, but they aren’t among the neural network’s top ten predictors.

# Print the top ten most important predictors
nnetVarImp <- varImp(nnetModel)
top10_nnet <- nnetVarImp$importance |>
  as.data.frame() |>
  dplyr::arrange(desc(Overall)) |>
  head(10)
print(top10_nnet)
##                          Overall
## ManufacturingProcess13 100.00000
## ManufacturingProcess32  92.87049
## ManufacturingProcess17  81.90448
## ManufacturingProcess09  80.21338
## BiologicalMaterial06    72.87876
## ManufacturingProcess31  68.97342
## BiologicalMaterial03    68.34276
## ManufacturingProcess06  67.67449
## ManufacturingProcess36  67.24545
## BiologicalMaterial12    62.43653
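
A hypothetical sketch of that comparison in code (enetTop10 stands in for the elastic-net top ten from the prior homework; its values aren’t reproduced here, so these lines stay commented out):

# nnetTop10 <- rownames(top10_nnet)
# setdiff(nnetTop10, enetTop10)   # predictors unique to the neural network
# setdiff(enetTop10, nnetTop10)   # predictors unique to the elastic net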


Part c

Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

The comparison of predictors between our neural network and the elastic-net linear regression shows that neural networks can surface latent, complex relationships in the data that are not visible or tractable through linear methods.

Neural networks are more of a black box than classical statistical learning methods: we can’t read off the relationships between the top predictors the way we could with coefficients in a linear model. However, we can use the neural network to identify factors in the process that may influence Yield but weren’t previously on our radar.

For example, BiologicalMaterial12 was one of the meaningful predictors newly surfaced by the neural network model. It looks like it could have a positive correlation with Yield, though the relationship appears to fan out at higher values of BiologicalMaterial12. It may be that the material stops affecting the process once a necessary and sufficient amount is present, while below that threshold it depresses Yield. Decision trees might handle this kind of relationship better.

ManufacturingProcess13 was the most significant predictor of Yield. High positive values of the process correspond to the worst Yield outcomes, and strongly negative values correspond to the best. It may be that the company should invest in new equipment that lets the chemicals consistently run at the low end of this process, or more needs to be understood about how the other base materials and processes dictate which setting of ManufacturingProcess13 can be applied. That insight might be obtainable through decision tree methods.

Next steps would be to run this problem through different types of decision trees.

# Create scatter plots for each predictor against yield
p1 <- ggplot(CMP, aes(x = ManufacturingProcess13, y = Yield)) +
  geom_point() +
  ggtitle("ManufacturingProcess13 vs Yield") +
  theme_minimal()
p2 <- ggplot(CMP, aes(x = BiologicalMaterial12, y = Yield)) +
  geom_point() +
  ggtitle("BiologicalMaterial12 vs Yield") +
  theme_minimal()

# Arrange plots in a grid
grid.arrange(p1, p2, ncol = 2)




References

These exercises come from ‘Applied Predictive Modeling’ by Max Kuhn and Kjell Johnson. Friedman (1991) refers to ‘Multivariate Adaptive Regression Splines,’ The Annals of Statistics, 19(1), 1–67.