Exercises

Applied Predictive Modeling
Chapter 7 Nonlinear Regression Models

7.2

Friedman (1991) introduced several benchmark data sets create by simulation. One of these simulations used the following nonlinear equation to create data:
y = 10 sin(πx1x2) + 20(x3 − 0.5)2 + 10x4 + 5x5 + N(0, σ2)

where the x values are random variables uniformly distributed between [0, 1] (there are also 5 other non-informative variables also created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data
featurePlot(trainingData$x, trainingData$y)

## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Tune several models on these data.

KNN

knnModel <- train(
  x = trainingData$x,
  y = trainingData$y,
  method = "knn",
  preProc = c("center", "scale"),
  tuneLength = 10)
knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
knnPred <- predict(knnModel, newdata = testData$x)
# The function 'postResample' can be used to get the test set perforamnce values
knn_results = postResample(pred = knnPred, obs = testData$y)
knn_results
##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461

The optimal number of neighbors in the KNN model is 19 based on having the lowest RMSE value.


Neural Network

Neural Networks are often adversely affected by high correlation among the predictor variables due to the use of gradients to optimize the model parameters.

Before fitting the model, predictors associated with high correlations are identified and filtered out.

# identify predictors to remove to ensure that the maximum absolute pairwise
# correlation between the predictors is less than 0.75.
tooHigh <- findCorrelation(cor(trainingData$x), cutoff = .75)
tooHigh
## integer(0)

None of the predictors are significantly correlated; thus, no predictors need to be removed from the data.

## Create a specific candidate set of models to evaluate:
nnetGrid <- expand.grid(
  .decay = c(0, 0.01, .1),.size = c(1:10),
  .bag = FALSE)

ctrl <- trainControl(method = "cv", number = 10)

set.seed(100)
# mimic approach of choosing the number of hidden units and the amount of weight decay via resampling
nnetTune <- train(trainingData$x, trainingData$y,
  method = "avNNet",
  tuneGrid = nnetGrid,
  trControl = ctrl,
  # Automatically standardize data prior to modeling and prediction
  preProc = c("center", "scale"),
  linout = TRUE,
  trace = FALSE,
  MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
  maxit = 500)

nnetTune
## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.399958  0.7613127  1.893172
##   0.00    2    2.417296  0.7555717  1.913551
##   0.00    3    2.044313  0.8222590  1.631380
##   0.00    4    2.346892  0.8036428  1.791547
##   0.00    5    2.558811  0.7550840  1.856888
##   0.00    6    3.319656  0.6865631  2.213605
##   0.00    7    3.797724  0.6325088  2.639918
##   0.00    8    6.233372  0.5076975  3.538537
##   0.00    9    4.349457  0.5878049  2.841743
##   0.00   10    3.585916  0.6419600  2.532514
##   0.01    1    2.385355  0.7603048  1.887716
##   0.01    2    2.417179  0.7524496  1.930389
##   0.01    3    2.151198  0.8016034  1.701939
##   0.01    4    2.091926  0.8154384  1.676653
##   0.01    5    2.192652  0.7945339  1.755759
##   0.01    6    2.254642  0.8059197  1.808248
##   0.01    7    2.362394  0.7788018  1.894616
##   0.01    8    2.465421  0.7657129  1.952369
##   0.01    9    2.398000  0.7650013  1.948351
##   0.01   10    2.480909  0.7437663  1.983851
##   0.10    1    2.393966  0.7596430  1.894195
##   0.10    2    2.431003  0.7520281  1.929497
##   0.10    3    2.169934  0.7982368  1.726892
##   0.10    4    2.059080  0.8224159  1.648607
##   0.10    5    1.991051  0.8346028  1.599298
##   0.10    6    2.143273  0.8101609  1.684721
##   0.10    7    2.121592  0.8190956  1.649157
##   0.10    8    2.302167  0.7881632  1.817938
##   0.10    9    2.348140  0.7747370  1.817227
##   0.10   10    2.340229  0.7677416  1.858249
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.
# review performance metrics
nnetTunePred <- predict(nnetTune, newdata = testData$x)
nnet_results = postResample(pred = nnetTunePred, obs = testData$y)
nnet_results
##      RMSE  Rsquared       MAE 
## 1.9823587 0.8458179 1.4858719

The optimal neural network model based on the lowest RMSE value used 5 hidden units and a decay of 0.1


MARS

# Define the candidate models to test
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
# Fix the seed so that the results can be reproduced
set.seed(100)
marsTuned <- train(trainingData$x, trainingData$y,
  method = "earth",
  # Explicitly declare the candidate models to test
  tuneGrid = marsGrid,
  trControl = trainControl(method = "cv"))

marsTuned
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.327937  0.2544880  3.600474
##   1        3      3.572450  0.4912720  2.895811
##   1        4      2.596841  0.7183600  2.106341
##   1        5      2.370161  0.7659777  1.918669
##   1        6      2.276141  0.7881481  1.810001
##   1        7      1.766728  0.8751831  1.390215
##   1        8      1.780946  0.8723243  1.401345
##   1        9      1.665091  0.8819775  1.325515
##   1       10      1.663804  0.8821283  1.327657
##   1       11      1.657738  0.8822967  1.331730
##   1       12      1.653784  0.8827903  1.331504
##   1       13      1.648496  0.8823663  1.316407
##   1       14      1.639073  0.8841742  1.312833
##   1       15      1.639073  0.8841742  1.312833
##   1       16      1.639073  0.8841742  1.312833
##   1       17      1.639073  0.8841742  1.312833
##   1       18      1.639073  0.8841742  1.312833
##   1       19      1.639073  0.8841742  1.312833
##   1       20      1.639073  0.8841742  1.312833
##   1       21      1.639073  0.8841742  1.312833
##   1       22      1.639073  0.8841742  1.312833
##   1       23      1.639073  0.8841742  1.312833
##   1       24      1.639073  0.8841742  1.312833
##   1       25      1.639073  0.8841742  1.312833
##   1       26      1.639073  0.8841742  1.312833
##   1       27      1.639073  0.8841742  1.312833
##   1       28      1.639073  0.8841742  1.312833
##   1       29      1.639073  0.8841742  1.312833
##   1       30      1.639073  0.8841742  1.312833
##   1       31      1.639073  0.8841742  1.312833
##   1       32      1.639073  0.8841742  1.312833
##   1       33      1.639073  0.8841742  1.312833
##   1       34      1.639073  0.8841742  1.312833
##   1       35      1.639073  0.8841742  1.312833
##   1       36      1.639073  0.8841742  1.312833
##   1       37      1.639073  0.8841742  1.312833
##   1       38      1.639073  0.8841742  1.312833
##   2        2      4.327937  0.2544880  3.600474
##   2        3      3.572450  0.4912720  2.895811
##   2        4      2.661826  0.7070510  2.173471
##   2        5      2.404015  0.7578971  1.975387
##   2        6      2.243927  0.7914805  1.783072
##   2        7      1.856336  0.8605482  1.435682
##   2        8      1.754607  0.8763186  1.396841
##   2        9      1.603578  0.8938666  1.261361
##   2       10      1.492421  0.9084998  1.168700
##   2       11      1.317350  0.9292504  1.033926
##   2       12      1.304327  0.9320133  1.019108
##   2       13      1.277510  0.9323681  1.002927
##   2       14      1.269626  0.9350024  1.003346
##   2       15      1.266217  0.9359400  1.013893
##   2       16      1.268470  0.9354868  1.011414
##   2       17      1.268470  0.9354868  1.011414
##   2       18      1.268470  0.9354868  1.011414
##   2       19      1.268470  0.9354868  1.011414
##   2       20      1.268470  0.9354868  1.011414
##   2       21      1.268470  0.9354868  1.011414
##   2       22      1.268470  0.9354868  1.011414
##   2       23      1.268470  0.9354868  1.011414
##   2       24      1.268470  0.9354868  1.011414
##   2       25      1.268470  0.9354868  1.011414
##   2       26      1.268470  0.9354868  1.011414
##   2       27      1.268470  0.9354868  1.011414
##   2       28      1.268470  0.9354868  1.011414
##   2       29      1.268470  0.9354868  1.011414
##   2       30      1.268470  0.9354868  1.011414
##   2       31      1.268470  0.9354868  1.011414
##   2       32      1.268470  0.9354868  1.011414
##   2       33      1.268470  0.9354868  1.011414
##   2       34      1.268470  0.9354868  1.011414
##   2       35      1.268470  0.9354868  1.011414
##   2       36      1.268470  0.9354868  1.011414
##   2       37      1.268470  0.9354868  1.011414
##   2       38      1.268470  0.9354868  1.011414
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 15 and degree = 2.
# review performance metrics
marsTunedPred <- predict(marsTuned, newdata = testData$x)
mars_results = postResample(pred = marsTunedPred, obs = testData$y)
mars_results
##      RMSE  Rsquared       MAE 
## 1.1589948 0.9460418 0.9250230

The most optimal MARS model based on the lowest RMSE value is a second-degree model with 15 terms.


SVM

svmRTuned <- train(trainingData$x, trainingData$y,
  method = "svmRadial",
  preProc = c("center", "scale"),
  tuneLength = 14,
  trControl = trainControl(method = "cv"))

svmRTuned$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 32 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0631548298520486 
## 
## Number of Support Vectors : 150 
## 
## Objective Function Value : -73.2538 
## Training error : 0.008502
# review performance metrics
svmRTunedPred <- predict(svmRTuned, newdata = testData$x)
svm_results = postResample(pred = svmRTunedPred, obs = testData$y)
svm_results
##      RMSE  Rsquared       MAE 
## 2.0741633 0.8255819 1.5755127

Cross-validation results for the SVM model fit a model with a scale factor of 0.1 and a cost value of 32.


Questions

Which models appear to give the best performance?

# compile all model performance results
final_results = rbind(
  knn_results,
  nnet_results,
  mars_results,
  svm_results
)
# add model name column
final_results = data.frame(cbind(Model = c('KNN', 'NNET', 'MARS', 'SVM'), final_results))
rownames(final_results) <- NULL
# output results
knit_table(final_results |> arrange(RMSE), 'Model Performance Metrics')
Model Performance Metrics
Model RMSE Rsquared MAE
MARS 1.15899475879587 0.946041768997385 0.925023038062675
NNET 1.98235867606548 0.845817911406657 1.48587190378097
SVM 2.07416325523646 0.825581870904828 1.57551266804197
KNN 3.20405949374908 0.681991908733613 2.56834607044141

The MARS model appears to have best performance due to having the lowest RMSE.


Does MARS select the informative predictors (those named X1–X5)?

# estimate the importance of each predictor
mars_importance = varImp(marsTuned)[1]$importance
mars_importance = rownames_to_column(mars_importance, var = "Predictor")
# output results
knit_table(mars_importance, 'MARS Predictor Estimated Importance')
MARS Predictor Estimated Importance
Predictor Overall
X1 100.00000
X4 75.23592
X2 48.72974
X5 15.51884
X3 0.00000
plot(varImp(marsTuned), main = "MARS Predictor Importance - Top 10")

The MARS model did select the informative predictors (those named X1–X5) based on the estimated importance of each predictor. X1 appears to be the most significant followed by X4, X2, X5, and X3 respectively. X3 appears to have have little to no significance due to have an importance overall importance of 0.


7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Data Prep

Impute, Transform, and Scale the Data

  • The predictor BiologicalMaterial07 was identified to have near zero variance due to having very few unique values relative to the number of samples. This columns was removed from the data.
data(ChemicalManufacturingProcess)

# impute, transform, and standardize the data
data_prep_process <- preProcess(
  ChemicalManufacturingProcess, 
  method = c("knnImpute", "BoxCox", "center", "scale"))
prep_data <- predict(data_prep_process, ChemicalManufacturingProcess)

# identify predictors with zero or near zero variance
colnames(prep_data)[nearZeroVar(prep_data)]
## [1] "BiologicalMaterial07"
# remove columns with zero or near zero variance 
prep_data = prep_data[,-nearZeroVar(prep_data)]

Split the Data

# Set the random number seed so we can reproduce the results
set.seed(20)
trainingRows <- createDataPartition(prep_data$Yield, p = .80, list= FALSE)

#split data into train and test sets
train_chem <- prep_data[trainingRows, ]
test_chem <- prep_data[-trainingRows, ]

KNN

knnModel_chem <- train(Yield ~., data = train_chem,
  method = "knn",
  preProcess = c("center", "scale"),
  tuneLength = 10
  )
knnModel_chem
## k-Nearest Neighbors 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.8015017  0.3823492  0.6458311
##    7  0.8012521  0.3820255  0.6478658
##    9  0.7948816  0.3885799  0.6431372
##   11  0.7946423  0.3899859  0.6414679
##   13  0.7965878  0.3892959  0.6433097
##   15  0.7994008  0.3858255  0.6479619
##   17  0.7985998  0.3912364  0.6470940
##   19  0.7971403  0.3980982  0.6451050
##   21  0.8006246  0.3959270  0.6485058
##   23  0.8053417  0.3901015  0.6514357
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 11.
knnPred_chem <- predict(knnModel_chem, newdata = test_chem)
knn_results_chem = postResample(pred = knnPred_chem, obs = test_chem$Yield)
knn_results_chem
##      RMSE  Rsquared       MAE 
## 0.6492212 0.4874072 0.4683866

The optimal number of neighbors in the KNN model is 11 based on having the lowest RMSE value.

Neural Network

Neural Networks are often adversely affected by high correlation among the predictor variables due to the use of gradients to optimize the model parameters.

Before fitting the model, predictors associated with high correlations are identified and filtered out.

# identify predictors to remove to ensure that the maximum absolute pairwise
# correlation between the predictors is less than 0.75.
tooHigh <- findCorrelation(cor(train_chem), cutoff = .75)
knit_table(tibble(`Highly Correlated Predictors` = colnames(train_chem)[tooHigh]))
Highly Correlated Predictors
BiologicalMaterial02
BiologicalMaterial06
BiologicalMaterial01
BiologicalMaterial08
BiologicalMaterial04
BiologicalMaterial12
ManufacturingProcess32
ManufacturingProcess15
ManufacturingProcess29
ManufacturingProcess13
ManufacturingProcess10
ManufacturingProcess42
ManufacturingProcess45
ManufacturingProcess26
ManufacturingProcess39
ManufacturingProcess27
ManufacturingProcess31
ManufacturingProcess18
ManufacturingProcess40
# Remove column with high correlations to keep all pair-wise correlations below a 0.75 threshold
train_nnet_chem <- train_chem[, -tooHigh]
test_nnet_chem <- test_chem[, -tooHigh]

19 predictors were identified to have pair-wise correlations greater than 0.75. These columns were removed from the data before fitting the neural network model.

# mimic approach of choosing the number of hidden units and the amount of weight decay via resampling
nnetTune_chem <- train(Yield ~ ., data = train_nnet_chem,
  method = "avNNet",
  tuneGrid = nnetGrid,
  trControl = ctrl,
  preProc = c("center", "scale"),
  linout = TRUE,
  trace = FALSE,
  MaxNWts = 10 * (ncol(train_nnet_chem) + 1) + 10 + 1,
  maxit = 500)

nnetTune_chem
## Model Averaged Neural Network 
## 
## 144 samples
##  37 predictor
## 
## Pre-processing: centered (37), scaled (37) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 129, 130, 129, 131, 130, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.9215431  0.4085737  0.6936327
##   0.00    2    0.7530974  0.4798629  0.6166415
##   0.00    3    0.8347486  0.4546008  0.6615904
##   0.00    4    0.7986780  0.4759001  0.6187075
##   0.00    5    0.8240804  0.4415768  0.6695649
##   0.00    6    0.8709329  0.3934973  0.7096894
##   0.00    7    0.8296750  0.4654343  0.6773977
##   0.00    8    0.7559354  0.5047882  0.6215966
##   0.00    9    0.7590205  0.4899559  0.6037898
##   0.00   10    0.7598074  0.4991927  0.6129294
##   0.01    1    0.7964384  0.4556596  0.6335519
##   0.01    2    0.8240593  0.4510977  0.6801288
##   0.01    3    0.7587551  0.4874756  0.6200164
##   0.01    4    0.7838466  0.4572082  0.6262609
##   0.01    5    0.7062712  0.5628523  0.5675729
##   0.01    6    0.7124832  0.5505332  0.5641237
##   0.01    7    0.6879704  0.5772658  0.5401702
##   0.01    8    0.6791100  0.5891584  0.5206540
##   0.01    9    0.7181114  0.5493945  0.5868633
##   0.01   10    0.7282227  0.5512819  0.5789078
##   0.10    1    0.8200261  0.4562204  0.6530313
##   0.10    2    0.7654329  0.4771717  0.6208142
##   0.10    3    0.7548395  0.4814655  0.6071775
##   0.10    4    0.7594984  0.5231600  0.6168862
##   0.10    5    0.6882829  0.5729120  0.5397624
##   0.10    6    0.6805450  0.5765358  0.5365855
##   0.10    7    0.6745016  0.5840402  0.5553380
##   0.10    8    0.7126941  0.5357889  0.5719314
##   0.10    9    0.7431749  0.5330285  0.5987037
##   0.10   10    0.7215311  0.5379258  0.5893741
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 7, decay = 0.1 and bag = FALSE.
# review performance metrics
nnetTunePred_chem <- predict(nnetTune_chem, newdata = test_nnet_chem)
nnet_results_chem = postResample(pred = nnetTunePred_chem, obs = test_nnet_chem$Yield)
nnet_results_chem
##      RMSE  Rsquared       MAE 
## 0.5496806 0.6633652 0.4538905

The optimal neural network model based on the lowest RMSE value used 7 hidden units and a decay of 0.1

MARS

# Fix the seed so that the results can be reproduced
set.seed(100)
marsTuned_chem <- train(Yield ~ ., data = train_chem,
  method = "earth",
  # Explicitly declare the candidate models to test
  tuneGrid = marsGrid,
  trControl = trainControl(method = "cv"))

marsTuned_chem
## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 130, 130, 130, 130, 130, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.7740598  0.4733556  0.6087136
##   1        3      0.6996593  0.5417369  0.5490306
##   1        4      0.6605515  0.5834685  0.5378210
##   1        5      0.6814633  0.5435487  0.5672462
##   1        6      0.7024723  0.5376756  0.5669634
##   1        7      0.6743175  0.5767315  0.5401251
##   1        8      0.6764564  0.5843267  0.5427274
##   1        9      0.6758093  0.5853744  0.5395543
##   1       10      0.7000308  0.5631781  0.5453968
##   1       11      0.6827439  0.5894983  0.5211233
##   1       12      0.6712193  0.6039004  0.5194857
##   1       13      0.6831724  0.5943378  0.5325571
##   1       14      0.6792247  0.6016117  0.5241257
##   1       15      0.6724057  0.6059972  0.5228352
##   1       16      0.6754366  0.6027430  0.5231209
##   1       17      0.6737388  0.6048366  0.5192630
##   1       18      0.6737388  0.6048366  0.5192630
##   1       19      0.6737388  0.6048366  0.5192630
##   1       20      0.6737388  0.6048366  0.5192630
##   1       21      0.6737388  0.6048366  0.5192630
##   1       22      0.6737388  0.6048366  0.5192630
##   1       23      0.6737388  0.6048366  0.5192630
##   1       24      0.6737388  0.6048366  0.5192630
##   1       25      0.6737388  0.6048366  0.5192630
##   1       26      0.6737388  0.6048366  0.5192630
##   1       27      0.6737388  0.6048366  0.5192630
##   1       28      0.6737388  0.6048366  0.5192630
##   1       29      0.6737388  0.6048366  0.5192630
##   1       30      0.6737388  0.6048366  0.5192630
##   1       31      0.6737388  0.6048366  0.5192630
##   1       32      0.6737388  0.6048366  0.5192630
##   1       33      0.6737388  0.6048366  0.5192630
##   1       34      0.6737388  0.6048366  0.5192630
##   1       35      0.6737388  0.6048366  0.5192630
##   1       36      0.6737388  0.6048366  0.5192630
##   1       37      0.6737388  0.6048366  0.5192630
##   1       38      0.6737388  0.6048366  0.5192630
##   2        2      0.7740598  0.4733556  0.6087136
##   2        3      0.6865599  0.5550566  0.5484936
##   2        4      0.6485393  0.6060205  0.5202082
##   2        5      0.6583152  0.5847700  0.5180857
##   2        6      0.6619458  0.5748836  0.5411086
##   2        7      0.6593264  0.5849328  0.5448480
##   2        8      0.7455957  0.5147119  0.5922504
##   2        9      0.7443781  0.5214152  0.5801832
##   2       10      0.7295405  0.5459154  0.5654232
##   2       11      0.7604953  0.5071400  0.5721723
##   2       12      0.7566631  0.5213203  0.5707833
##   2       13      0.7541511  0.5290848  0.5728947
##   2       14      0.7634419  0.5177030  0.5804917
##   2       15      0.7975741  0.5027785  0.5971431
##   2       16      0.8589678  0.4888981  0.6209762
##   2       17      0.9017096  0.4842544  0.6289030
##   2       18      0.9260934  0.4680455  0.6453277
##   2       19      1.5936833  0.4330744  0.8559668
##   2       20      1.6512573  0.4213563  0.8863791
##   2       21      1.6478361  0.4222369  0.8801275
##   2       22      1.6473637  0.4213893  0.8821900
##   2       23      1.6482950  0.4214995  0.8840967
##   2       24      1.6516864  0.4178207  0.8862218
##   2       25      1.6516864  0.4178207  0.8862218
##   2       26      1.6516864  0.4178207  0.8862218
##   2       27      1.6516864  0.4178207  0.8862218
##   2       28      1.6516864  0.4178207  0.8862218
##   2       29      1.6516864  0.4178207  0.8862218
##   2       30      1.6516864  0.4178207  0.8862218
##   2       31      1.6516864  0.4178207  0.8862218
##   2       32      1.6516864  0.4178207  0.8862218
##   2       33      1.6516864  0.4178207  0.8862218
##   2       34      1.6516864  0.4178207  0.8862218
##   2       35      1.6516864  0.4178207  0.8862218
##   2       36      1.6516864  0.4178207  0.8862218
##   2       37      1.6516864  0.4178207  0.8862218
##   2       38      1.6516864  0.4178207  0.8862218
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 4 and degree = 2.
# review performance metrics
marsTunedPred_chem <- predict(marsTuned_chem, newdata = test_chem)
mars_results_chem = postResample(pred = marsTunedPred_chem, obs = test_chem$Yield)
mars_results_chem
##      RMSE  Rsquared       MAE 
## 0.5580107 0.6202870 0.4588720

The most optimal MARS model based on the lowest RMSE value is a second-degree model with 4 terms.

SVM

svmRTuned_chem <- train(Yield ~ ., data = train_chem,
  method = "svmRadial",
  preProc = c("center", "scale"),
  tuneLength = 14,
  trControl = trainControl(method = "cv"))

svmRTuned_chem
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 129, 130, 131, 130, 128, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   MAE      
##      0.25  0.7571299  0.5092540  0.6254305
##      0.50  0.6990910  0.5537348  0.5747492
##      1.00  0.6639040  0.5901893  0.5412185
##      2.00  0.6380184  0.6177798  0.5112582
##      4.00  0.6296507  0.6267349  0.5091241
##      8.00  0.6218346  0.6345227  0.5035030
##     16.00  0.6201751  0.6352947  0.5025944
##     32.00  0.6201751  0.6352947  0.5025944
##     64.00  0.6201751  0.6352947  0.5025944
##    128.00  0.6201751  0.6352947  0.5025944
##    256.00  0.6201751  0.6352947  0.5025944
##    512.00  0.6201751  0.6352947  0.5025944
##   1024.00  0.6201751  0.6352947  0.5025944
##   2048.00  0.6201751  0.6352947  0.5025944
## 
## Tuning parameter 'sigma' was held constant at a value of 0.0138412
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.0138412 and C = 16.
svmRTuned_chem$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 16 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0138411996697324 
## 
## Number of Support Vectors : 119 
## 
## Objective Function Value : -93.3045 
## Training error : 0.008903
# review performance metrics
svmRTunedPred_chem <- predict(svmRTuned_chem, newdata = test_chem)
svm_results_chem = postResample(pred = svmRTunedPred_chem, obs = test_chem$Yield)
svm_results_chem
##      RMSE  Rsquared       MAE 
## 0.5156197 0.6943808 0.4294884

Cross-validation results for the SVM model fit a model with a scale factor of 0.1 and a cost value of 16.

Questions

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

# compile all model performance results
final_results_chem = rbind(
  knn_results_chem,
  nnet_results_chem,
  mars_results_chem,
  svm_results_chem
)
# add model name column
final_results_chem = data.frame(cbind(Model = c('KNN', 'NNET', 'MARS', 'SVM'), final_results_chem))
rownames(final_results_chem) <- NULL
# output results
knit_table(final_results_chem |> arrange(RMSE), 'Model Performance Metrics')
Model Performance Metrics
Model RMSE Rsquared MAE
SVM 0.515619707472275 0.694380777283164 0.429488407257436
NNET 0.549680579464381 0.663365169364669 0.453890505095549
MARS 0.558010711328633 0.620287015285582 0.458871959004251
KNN 0.649221205563854 0.487407206700223 0.468386557728143

The SVM model appears to have best overall nonlinear regression performance due to having the lowest RMSE.


(b)

  • Which predictors are most important in the optimal nonlinear regression model?
  • Do either the biological or process variables dominate the list?
  • How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
plot(varImp(svmRTuned_chem), top = 10,
     main = "SVM Model Important Predictors - Top 10")

Process variables dominate the top 10 most important predictors of the SVM model. Process variables make up 60% of the most important predictors in both the top 10 and the top 5 important predictors. This same pattern is seen within the top 10 most important predictors of the optimal linear model that was fit to this data in the previous assignment.


(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

svm_chem_importance = varImp(svmRTuned_chem)$importance
svm_chem_importance = rownames_to_column(svm_chem_importance, var = "Predictor")

svm_chem_importance = svm_chem_importance |>
  arrange(desc(Overall))|>
  head(10)

top_chem_var = svm_chem_importance$Predictor
svm_chem_importance_long = prep_data |>
  select(all_of(top_chem_var), Yield) |>
  pivot_longer(cols = -Yield, names_to = "predictor", values_to = "value")

# top predictors vs yield
svm_chem_importance_long |>
  ggplot(aes(x = value, y = Yield)) +
  geom_point() +
  geom_smooth(method = "lm", formula = 'y ~ x', color = "red") +
  facet_wrap(~predictor, scales = "free_x", nrow = 3) +
  theme_minimal() +
  labs(title = "Top Predictors vs Yield")

# pair plots
top_pred_df = prep_data |>
  select(all_of(top_chem_var), Yield)
colnames(top_pred_df) = gsub("Manufacturing|Material", "", colnames(top_pred_df))
ggpairs(
  top_pred_df,
  title="Pair Plot - Top Predictors & Yield",
  progress = FALSE)

ggpairs(
  top_pred_df |> select(matches('Biological|Yield')),
  lower.panel=panel.smooth,
  title="Pair Plot - Top Biological Predictors & Yield",
  progress = FALSE,)
## Warning: The `...` argument of `ggpais()` is deprecated as of GGally 2.3.0.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

process_df = top_pred_df |>
  select(matches('Process|Yield'))
colnames(process_df) = gsub("Manufacturing", "", colnames(process_df))
ggpairs(
  process_df,
  title="Pair Plot - Top Process Predictors & Yield",
  progress = FALSE,)

  • The biological predictors all have a positive linear trend across the same y-axis scale when compared to each other, as well as a high positive correlation value. Yield is also on the same scale as the biological predictors
  • The process predictors have a lot more variance in distribution and y-axis scale than the biological predictors. While the process predictors tend to have a more clustered distribution, there does not seem to be a common trend between all of them.
  • Some predictors clearly drive yield (positive or negative)
    • Process32: strong positive correlation
    • Biological02: strong positive correlation
    • Process36: strong negative correlation
  • Predictors with weak linear correlation but visible structure in scatterplots are good candidates for being “unique to the nonlinear model”. These predictors may be important for defining the decision boundary in an SVM.
  • Predictors like Process36 (with a messy but clear downward trend) may be modeled more accurately by SVM than by linear regression