Assignment 8: Non-Linear Regression

Author

Amanda Rose Knudsen

Published

April 13, 2025

This assignment covers Exercises 7.2 and 7.5 from Kuhn and Johnson’s Applied Predictive Modeling. While there are only two problems, each includes multiple steps. Link to Applied Predictive Modeling for reference.

library(tidyverse)
library(caret)
library(pls)
library(glmnet)
library(corrplot)
library(e1071)
library(lattice)
library(car)
library(RANN)
library(AppliedPredictiveModeling)
library(mlbench)
library(earth)
library(kernlab)

7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

y = 10 sin(π x1 x2) + 20 (x3 − 0.5)² + 10 x4 + 5 x5 + N(0, σ²)

where the x values are random variables uniformly distributed between [0, 1] (there are also 5 other non-informative variables created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

set.seed(5889)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names. 
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)

## or other methods.
set.seed(5889)
## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1) 
testData$x <- data.frame(testData$x)

We’ll train each model separately using the same resampling structure, similar to our approaches for exercises in linear regression.

ctrl <- trainControl(method = "cv", number = 10)

Tune several models on these data. For example:

K-Nearest Neighbors (KNN) – tuneLength = 10

set.seed(5889)
knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10,
                  trControl = ctrl)
knnModel
k-Nearest Neighbors 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   5  3.168231  0.5800258  2.582620
   7  3.046177  0.6413380  2.501907
   9  3.085816  0.6354590  2.518231
  11  3.101118  0.6430259  2.508910
  13  3.135146  0.6585310  2.545920
  15  3.169694  0.6457537  2.564948
  17  3.181696  0.6562069  2.581792
  19  3.235198  0.6530707  2.652866
  21  3.291182  0.6429373  2.703551
  23  3.338198  0.6327673  2.736384

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 7.

Below we predict and evaluate:

knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set 
## performance values
knnResults <- postResample(pred = knnPred, obs = testData$y)
knnResults
     RMSE  Rsquared       MAE 
3.2900978 0.6120825 2.6377038 

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

Before we move on to different models, let’s look a bit more at what’s going on with the KNN model. We saw in the results above that the final value used for the model was k = 7 (seven neighbors). We can also confirm this directly:

knnModel$bestTune
  k
2 7
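
If it helps to see the tuning profile, caret’s plot method for train objects shows the resampled RMSE across the candidate values of k (a quick sketch using the knnModel object fit above):

# Resampled RMSE versus the number of neighbors k
plot(knnModel)
# A ggplot2 version of the same tuning profile
ggplot(knnModel)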

MARS Model

First we’ll fit the MARS (Multivariate Adaptive Regression Spline) model. We won’t center and scale the predictors because MARS (via the earth package) doesn’t require it. MARS works by placing knots on the original scale of each predictor and fitting piecewise linear basis functions, so rescaling the predictors would change the meaning of those basis functions. By default, caret doesn’t center or scale data for MARS unless we explicitly ask it to, and Applied Predictive Modeling doesn’t recommend doing so.
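
To make the idea of a hinge (basis) function concrete, here is a small standalone sketch (not part of the model fit) of the MARS basis pair h(x − c) and h(c − x) for a hypothetical knot at c = 0.5:

# Illustrative only: a MARS hinge pair for a hypothetical knot at c = 0.5
x_grid  <- seq(0, 1, length.out = 100)
c_knot  <- 0.5
h_right <- pmax(0, x_grid - c_knot)  # h(x - c): zero left of the knot, linear to the right
h_left  <- pmax(0, c_knot - x_grid)  # h(c - x): linear left of the knot, zero to the right
plot(x_grid, h_right, type = "l", xlab = "x", ylab = "basis value")
lines(x_grid, h_left, lty = 2)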

set.seed(5889)
marsFit <- train(x = trainingData$x, y = trainingData$y,
                 method = "earth",
                 trControl = ctrl)

marsFit
Multivariate Adaptive Regression Spline 

200 samples
 10 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  nprune  RMSE      Rsquared   MAE     
   2      3.958849  0.3133991  3.318788
   8      2.133757  0.8047589  1.685374
  14      1.944990  0.8365229  1.557951

Tuning parameter 'degree' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 14 and degree = 1.

Now we’ll generate predictions

marsPred <- predict(marsFit, newdata = testData$x)

And now we’ll evaluate performance:

marsResults <- postResample(pred = marsPred, obs = testData$y)
marsResults
     RMSE  Rsquared       MAE 
1.9211835 0.8539689 1.5109182 

For MARS, we didn’t specify a tuneGrid or tuneLength, so caret defaulted to degree = 1 and internally generated a range of values for nprune (the number of terms to retain in the final model).

marsFit$results
  degree nprune     RMSE  Rsquared      MAE    RMSESD RsquaredSD     MAESD
1      1      2 3.958849 0.3133991 3.318788 0.5040435 0.12002011 0.4943766
2      1      8 2.133757 0.8047589 1.685374 0.4117511 0.07529377 0.2879605
3      1     14 1.944990 0.8365229 1.557951 0.4226748 0.06829416 0.2977976

Each row in the results above shows a model built with a different number of retained terms (basis functions), along with its resampled performance.

To view the best nprune:

marsFit$bestTune
  nprune degree
3     14      1

Above we can see that caret found that a 14-term additive model, using just main effects, gave the best cross-validated performance.

summary(marsFit$finalModel)
Call: earth(x=data.frame[200,10], y=c(10.28,6.983,6...), keepxy=TRUE, degree=1,
            nprune=14)

                coefficients
(Intercept)        20.287415
h(X1-0.286353)     -9.142444
h(0.801181-X1)    -15.416844
h(0.317457-X2)    -16.509359
h(X2-0.317457)      4.744872
h(X2-0.906716)    -40.586588
h(X3-0.308289)     10.115611
h(0.629261-X3)     11.646076
h(0.851428-X4)    -10.513003
h(0.193468-X5)    -14.963935
h(X5-0.193468)      3.883228
h(X8-0.0911924)    -1.169287

Selected 12 of 17 terms, and 6 of 10 predictors (nprune=14)
Termination condition: Reached nk 21
Importance: X4, X2, X1, X5, X3, X8, X6-unused, X7-unused, X9-unused, ...
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 3.481862    RSS 545.4163    GRSq 0.8454129    RSq 0.8777036

This allows us to see what predictors are driving the predictions we saw.
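
We can also ask caret for a variable importance summary of the MARS fit, which ranks the same predictors seen in the summary above (a brief sketch):

# Variable importance for the tuned MARS model
marsImp <- varImp(marsFit)
marsImp
plot(marsImp)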

SVM Model

Now let’s try an SVM (Support Vector Machine) model. We’ll use an RBF kernel (method = "svmRadial") with internal tuning, and we’ll center and scale the data, as recommended. Also as recommended, we’ll have caret tune over cost (C) while estimating the kernel width (sigma) from the data, which happens automatically when we use method = "svmRadial".

First we’ll fit the SVM model:

set.seed(5889)
svmFit_rbf <- train(x = trainingData$x, y = trainingData$y,
                    method = "svmRadial",
                    preProc = c("center", "scale"),
                    tuneLength = 10,
                    trControl = ctrl)

svmFit_rbf
Support Vector Machines with Radial Basis Function Kernel 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  C       RMSE      Rsquared   MAE     
    0.25  2.659727  0.7450721  2.095504
    0.50  2.390573  0.7671454  1.860514
    1.00  2.211788  0.7852377  1.668816
    2.00  2.144982  0.7933801  1.616089
    4.00  2.182419  0.7874943  1.633426
    8.00  2.244198  0.7769900  1.689583
   16.00  2.302930  0.7684958  1.764967
   32.00  2.348664  0.7608682  1.790210
   64.00  2.348664  0.7608682  1.790210
  128.00  2.348664  0.7608682  1.790210

Tuning parameter 'sigma' was held constant at a value of 0.05935436
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.05935436 and C = 2.

What’s happening is that caret estimates a single reasonable value of sigma (which defines the width of the RBF kernel) from the training predictors and holds it constant, then generates a grid of 10 candidate values of C (the cost, which controls the penalty for prediction errors), and performs 10-fold CV to find the best combination.
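
For reference, the fixed sigma comes from kernlab’s sigest() function applied to the predictors; a minimal sketch, assuming caret’s default behavior (sigest() samples the data, so the exact numbers can vary slightly from 0.0594):

# sigest() suggests a plausible range (10th, 50th, 90th percentiles) for the RBF kernel width;
# caret derives its single fixed sigma from this estimated range
set.seed(5889)
sigest(as.matrix(trainingData$x), scaled = TRUE)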

We can see what values are tried:

svmFit_rbf$results  
        sigma      C     RMSE  Rsquared      MAE    RMSESD RsquaredSD     MAESD
1  0.05935436   0.25 2.659727 0.7450721 2.095504 0.3627161 0.08018576 0.2280934
2  0.05935436   0.50 2.390573 0.7671454 1.860514 0.3641536 0.07748763 0.2246471
3  0.05935436   1.00 2.211788 0.7852377 1.668816 0.4191699 0.08709607 0.2800315
4  0.05935436   2.00 2.144982 0.7933801 1.616089 0.4417028 0.08461644 0.2993606
5  0.05935436   4.00 2.182419 0.7874943 1.633426 0.4286569 0.08708807 0.2802668
6  0.05935436   8.00 2.244198 0.7769900 1.689583 0.4432195 0.09374840 0.2808045
7  0.05935436  16.00 2.302930 0.7684958 1.764967 0.4679148 0.09888332 0.2977682
8  0.05935436  32.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964
9  0.05935436  64.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964
10 0.05935436 128.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964

We can also see the final best combination chosen:

svmFit_rbf$bestTune
       sigma C
4 0.05935436 2

Now we’ll predict and evaluate on the test data using SVM:

# Predict on test data
svmPred_rbf <- predict(svmFit_rbf, newdata = testData$x)

# Evaluate performance
svmResults_rbf <- postResample(pred = svmPred_rbf, obs = testData$y)
svmResults_rbf
     RMSE  Rsquared       MAE 
2.3019533 0.7987158 1.7493925 

Neural Network

Now we’ll move on to Neural Networks. We’ll use the “nnet” method which trains a single hidden layer neural net.

set.seed(5889)
nnetFit <- train(x = trainingData$x, y = trainingData$y,
                 method = "nnet",
                 preProc = c("center", "scale"),
                 linout = TRUE,    
                 trace = FALSE, 
                 tuneLength = 10,
                 trControl = ctrl)

Here caret is tuning size (the number of hidden units) and decay (the weight decay, or regularization, parameter). This matters because size controls model flexibility and decay helps avoid overfitting by shrinking large weights. The linout = TRUE setting is recommended in Applied Predictive Modeling because it enables a linear output layer for regression. (The ‘nonlinear’ in neural nets comes from the hidden layer, not the output. Hidden units apply nonlinear activation functions to combinations of the predictors, which transforms the input space in nonlinear ways; the output layer then combines these hidden-unit outputs linearly to form the prediction.)
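
If we wanted more control than tuneLength = 10 gives us, one option (a sketch of an alternative, not what was run above) is to pass an explicit grid of candidate size and decay values:

# Illustrative alternative: an explicit tuning grid for nnet
nnetGrid <- expand.grid(size  = c(1, 3, 5, 7),
                        decay = c(0, 0.01, 0.1))
# Supplying tuneGrid = nnetGrid to train() would replace tuneLength = 10
nnetGrid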

Now we’ll predict on the test set using neural networks and then evaluate the test performance.

nnetPred <- predict(nnetFit, newdata = testData$x)
nnetResults <- postResample(pred = nnetPred, obs = testData$y)
nnetResults
     RMSE  Rsquared       MAE 
2.6720455 0.7199412 2.0665596 

To view all tuning combinations we can use the below, which will show us the size, decay, and associated results:

nnetFit$results
    size        decay     RMSE  Rsquared      MAE    RMSESD RsquaredSD
1      1 0.0000000000 2.960066 0.5911846 2.345260 0.8030337 0.21548934
2      1 0.0001000000 2.754333 0.6628673 2.179794 0.6261467 0.12532593
3      1 0.0002371374 2.683998 0.6798453 2.130446 0.5174341 0.11518671
4      1 0.0005623413 2.959790 0.6095024 2.383364 0.7901749 0.18077456
5      1 0.0013335214 2.707720 0.6718096 2.145022 0.6574551 0.14968160
6      1 0.0031622777 2.622062 0.6964640 2.084730 0.4781297 0.09451252
7      1 0.0074989421 2.852715 0.6392365 2.272274 0.8309276 0.16149795
8      1 0.0177827941 2.561983 0.7106418 2.019489 0.5212574 0.09835629
9      1 0.0421696503 2.559745 0.7109906 2.016831 0.5241593 0.09889309
10     1 0.1000000000 2.557483 0.7112542 2.014034 0.5289092 0.09987091
11     3 0.0000000000 2.566719 0.7295778 2.014360 0.3375631 0.05266269
12     3 0.0001000000 2.521089 0.7342866 1.994573 0.6355167 0.11180401
13     3 0.0002371374 2.657924 0.7024136 2.139791 0.5086297 0.10657603
14     3 0.0005623413 2.737867 0.6740424 2.177606 0.5602990 0.11523765
15     3 0.0013335214 2.894886 0.6564949 2.253619 0.6027926 0.14297946
16     3 0.0031622777 2.616297 0.6919107 2.064000 0.3603812 0.10463876
17     3 0.0074989421 2.792477 0.6660136 2.265722 0.6002464 0.13110662
18     3 0.0177827941 2.593983 0.7065830 2.103752 0.5376259 0.12275188
19     3 0.0421696503 2.712234 0.6898698 2.159340 0.6101531 0.12323786
20     3 0.1000000000 2.567064 0.7122182 2.044009 0.5984428 0.14138634
21     5 0.0000000000 3.441548 0.5851411 2.564986 1.3626999 0.17766988
22     5 0.0001000000 3.137969 0.6486411 2.556474 0.5741337 0.09316172
23     5 0.0002371374 2.787413 0.7012234 2.205887 0.5547889 0.10087250
24     5 0.0005623413 3.312832 0.6062596 2.580191 0.9665075 0.16546731
25     5 0.0013335214 2.794876 0.6857186 2.203715 0.2613675 0.08920459
26     5 0.0031622777 2.803137 0.6679732 2.253740 0.3467603 0.11000014
27     5 0.0074989421 3.328568 0.5786464 2.630923 0.7481979 0.14926777
28     5 0.0177827941 2.903495 0.6597503 2.336283 0.4788614 0.13759762
29     5 0.0421696503 2.912730 0.6435797 2.320075 0.8859488 0.17594765
30     5 0.1000000000 2.641473 0.7014636 2.128984 0.2978718 0.10308623
31     7 0.0000000000 3.801656 0.5353312 2.841621 1.5288182 0.22090076
32     7 0.0001000000 3.456479 0.5250305 2.732063 0.6245788 0.15209604
33     7 0.0002371374 3.690990 0.4981262 2.788840 0.9039976 0.19109467
34     7 0.0005623413 3.614065 0.5589267 2.828468 1.1860625 0.19265880
35     7 0.0013335214 3.105627 0.6254515 2.332971 0.4876116 0.08846038
36     7 0.0031622777 3.508658 0.5417562 2.698946 1.0495877 0.26020452
37     7 0.0074989421 3.610851 0.5463546 2.733308 0.8432205 0.16913401
38     7 0.0177827941 3.396124 0.5958741 2.688354 0.5922410 0.12835764
39     7 0.0421696503 3.481736 0.5669023 2.735132 0.6819072 0.16443693
40     7 0.1000000000 3.455629 0.5768330 2.739779 0.7912482 0.14945609
41     9 0.0000000000 3.751406 0.4901791 2.794844 0.9870253 0.16353641
42     9 0.0001000000 3.201354 0.5989188 2.530696 0.4757652 0.12687101
43     9 0.0002371374 3.647137 0.5336765 2.862189 0.6835737 0.14656417
44     9 0.0005623413 3.364257 0.5909571 2.615956 0.4084434 0.10259914
45     9 0.0013335214 3.506493 0.5684274 2.660703 0.6847518 0.14569658
46     9 0.0031622777 3.421854 0.5693096 2.804027 0.6282507 0.14948863
47     9 0.0074989421 4.024629 0.4842706 3.055307 1.2154853 0.18295640
48     9 0.0177827941 3.372910 0.5698929 2.653723 0.4088231 0.08911905
49     9 0.0421696503 3.865285 0.5286587 2.954066 0.8721052 0.10974794
50     9 0.1000000000 3.427846 0.5969991 2.761035 0.6672111 0.12659825
51    11 0.0000000000 3.774715 0.5093192 3.041021 0.5146279 0.14521014
52    11 0.0001000000 3.836037 0.5075262 2.983011 0.6004038 0.14687441
53    11 0.0002371374 3.467494 0.5617958 2.819735 0.8156509 0.20392925
54    11 0.0005623413 3.801431 0.5294862 3.002152 0.8289512 0.16583883
55    11 0.0013335214 4.094073 0.5096713 3.175053 0.8531817 0.11725832
56    11 0.0031622777 3.795282 0.5477407 2.975173 1.1429714 0.18300116
57    11 0.0074989421 3.639152 0.5128064 2.954849 0.4677694 0.10569186
58    11 0.0177827941 3.395383 0.5815988 2.702883 0.6434576 0.15892129
59    11 0.0421696503 3.179837 0.6283658 2.594157 0.2723401 0.06895899
60    11 0.1000000000 3.216177 0.6003395 2.553010 0.7947188 0.19952629
61    13 0.0000000000 3.763628 0.5420391 3.040120 0.6187630 0.12016371
62    13 0.0001000000 3.358993 0.5995859 2.587192 0.6422878 0.15327953
63    13 0.0002371374 3.725119 0.5242935 3.055166 0.7823044 0.16010368
64    13 0.0005623413 3.497053 0.5715562 2.812181 0.5676922 0.14753425
65    13 0.0013335214 3.243017 0.6028445 2.573356 0.5032160 0.13653429
66    13 0.0031622777 3.839781 0.5326146 3.095219 0.8135036 0.15342268
67    13 0.0074989421 3.274441 0.6013188 2.647078 0.5416352 0.17508324
68    13 0.0177827941 3.401031 0.5943611 2.738969 0.4951447 0.13324139
69    13 0.0421696503 3.230933 0.6146475 2.589246 0.4775377 0.08982416
70    13 0.1000000000 2.987737 0.6792251 2.454073 0.5258678 0.10690720
71    15 0.0000000000 3.733509 0.5027581 2.958709 0.8348844 0.20633637
72    15 0.0001000000 3.795677 0.5013014 3.034377 0.6794316 0.15282640
73    15 0.0002371374 3.456956 0.5802399 2.763125 0.6550600 0.15126532
74    15 0.0005623413 3.491071 0.5503835 2.791689 0.7908174 0.17703876
75    15 0.0013335214 3.613316 0.5373694 2.885987 0.7077221 0.16630385
76    15 0.0031622777 3.896860 0.5104539 3.108518 0.7266770 0.15067889
77    15 0.0074989421 3.488939 0.6136542 2.854495 0.9653271 0.16391058
78    15 0.0177827941 3.257427 0.5795064 2.664384 0.5908838 0.13792338
79    15 0.0421696503 3.250316 0.5945034 2.620743 0.3762659 0.09201788
80    15 0.1000000000 2.812831 0.6750570 2.226116 0.4736826 0.12500414
81    17 0.0000000000 3.509804 0.5736527 2.870158 0.7386233 0.17200021
82    17 0.0001000000 3.492733 0.5741774 2.884995 0.7075357 0.14501999
83    17 0.0002371374 3.557179 0.5559779 2.934753 0.4070211 0.13833713
84    17 0.0005623413 3.952102 0.4463663 3.126522 0.8145069 0.20933664
85    17 0.0013335214 3.042922 0.6432055 2.436510 0.6063014 0.11991384
86    17 0.0031622777 3.406596 0.5926317 2.742714 0.2864579 0.09978513
87    17 0.0074989421 3.195985 0.6340622 2.566064 0.2134780 0.07467178
88    17 0.0177827941 3.350057 0.5956858 2.692250 0.4368769 0.08833437
89    17 0.0421696503 3.083675 0.6335208 2.489995 0.4810994 0.10535660
90    17 0.1000000000 2.920911 0.6499619 2.296324 0.5291126 0.12372427
91    19 0.0000000000 3.401777 0.5686377 2.750448 0.6179573 0.12632473
92    19 0.0001000000 3.531387 0.5699739 2.843451 0.5435198 0.12925707
93    19 0.0002371374 3.152140 0.6609787 2.515176 0.7193125 0.14927450
94    19 0.0005623413 3.410183 0.5785954 2.796352 0.6550564 0.11885304
95    19 0.0013335214 3.402767 0.5957950 2.740472 0.6029413 0.15117808
96    19 0.0031622777 2.912603 0.6831519 2.355858 0.4580279 0.09521777
97    19 0.0074989421 3.033494 0.6489829 2.396184 0.6470104 0.14926975
98    19 0.0177827941 2.796227 0.7122032 2.254410 0.5387210 0.09975262
99    19 0.0421696503 3.012921 0.6640488 2.425036 0.5642545 0.14045877
100   19 0.1000000000 2.866087 0.6677864 2.311866 0.4883501 0.12170358
        MAESD
1   0.5992424
2   0.4855741
3   0.3765276
4   0.6604292
5   0.5346169
6   0.3447922
7   0.6754237
8   0.3889165
9   0.3912795
10  0.3932790
11  0.2384788
12  0.4754516
13  0.4776141
14  0.4676890
15  0.5165422
16  0.2427645
17  0.5260913
18  0.4971889
19  0.5472803
20  0.4749452
21  0.5637074
22  0.4905325
23  0.4271008
24  0.6811314
25  0.1686097
26  0.3075922
27  0.5008935
28  0.3703118
29  0.7529418
30  0.3200940
31  0.8042517
32  0.5049081
33  0.4933523
34  0.7306010
35  0.2250105
36  0.6546970
37  0.5602881
38  0.5243964
39  0.4908017
40  0.7279732
41  0.6774809
42  0.4550003
43  0.4607706
44  0.2718525
45  0.4374228
46  0.5442785
47  0.7255873
48  0.3490047
49  0.4631607
50  0.5864094
51  0.3974149
52  0.3993882
53  0.6711094
54  0.5796917
55  0.6921532
56  0.9225482
57  0.3417509
58  0.5011927
59  0.2514347
60  0.6855944
61  0.5272609
62  0.4524071
63  0.6256756
64  0.4977655
65  0.4608053
66  0.6973014
67  0.5572159
68  0.4294268
69  0.3394830
70  0.4698464
71  0.5877863
72  0.4990896
73  0.6390899
74  0.5941240
75  0.5689859
76  0.5106953
77  0.8179532
78  0.5014137
79  0.2786581
80  0.3227207
81  0.5563664
82  0.7202894
83  0.3419887
84  0.6662232
85  0.5002796
86  0.3158302
87  0.2598377
88  0.4054319
89  0.4452585
90  0.4804259
91  0.4980604
92  0.4387668
93  0.6234566
94  0.5741260
95  0.5213977
96  0.4279582
97  0.4668406
98  0.4204016
99  0.4959834
100 0.3803286

Then to view the best combination:

nnetFit$bestTune
   size decay
12    3 1e-04

To view the model weights:

nnetFit$finalModel
a 10-3-1 network with 37 weights
inputs: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 
output(s): .outcome 
options were - linear output units  decay=1e-04

10 is the number of input units, 3 is the number of hidden units in the single hidden layer (matching the size of 3 from the best tuning combination above), and 1 is the number of output units. We get to 37 weights because 10 inputs × 3 hidden units = 30 weights, plus a bias term for each hidden unit (3 biases), plus a weight from each hidden unit to the output node (3 weights), plus 1 bias term for the output node, for a total of 37, which includes every connection and bias in the model.
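
As a quick check of that arithmetic (a minimal sketch):

# Weight count for a single-hidden-layer network:
# (inputs + 1 bias) * hidden units + (hidden units + 1 bias) * output units
n_inputs <- 10; n_hidden <- 3; n_outputs <- 1
(n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs   # 33 + 4 = 37 weights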

Now let’s compare test set performance. We’ve evaluated each model on the 5,000-observation test set; to summarize our results:

# Create a data frame comparing performance across models
model_comparison <- data.frame(
  Model = c("MARS", "SVM (RBF)", "Neural Net", "KNN"),
  RMSE = c(marsResults["RMSE"],
           svmResults_rbf["RMSE"],
           nnetResults["RMSE"],
           knnResults["RMSE"]),
  Rsquared = c(marsResults["Rsquared"],
               svmResults_rbf["Rsquared"],
               nnetResults["Rsquared"],
               knnResults["Rsquared"]),
  MAE = c(marsResults["MAE"],
          svmResults_rbf["MAE"],
          nnetResults["MAE"],
          knnResults["MAE"])
)

model_comparison
       Model     RMSE  Rsquared      MAE
1       MARS 1.921183 0.8539689 1.510918
2  SVM (RBF) 2.301953 0.7987158 1.749393
3 Neural Net 2.672046 0.7199412 2.066560
4        KNN 3.290098 0.6120825 2.637704

We can see that MARS performs best, with the lowest RMSE and MAE, and highest R-squared. SVM performed reasonably well, but with a higher RMSE and lower R-squared compared to MARS. Neural Net performs decently, but worse than both MARS and SVM. KNN performs worst — likely because it’s more sensitive to noise and less able to learn smooth nonlinear functions efficiently.

As we know, RMSE (root mean squared error) measures the average magnitude of the error between predicted and actual values; it penalizes larger errors more heavily due to squaring. We’re looking for the lowest RMSE (lower is always better), which answers the question of how far off, on average, our model’s predictions are. Since MARS had the lowest RMSE of the models compared, its predictions are the most accurate.

Another reminder of the terms we’re looking at: MAE (Mean Absolute Error) is the average of the absolute differences between predicted and actual values. It doesn’t square the errors, so it’s less sensitive to outliers. Like RMSE, lower is better, and again MARS performed best.

When we look at R-squared, we know it is the proportion of variance in the outcome variable that the model can explain, and it ranges from 0 to 1; higher is better. It essentially tells us how much of the data’s variance the model explains. An R-squared of about 0.854 for MARS means the model explains roughly 85.4% of the variation in the outcome.
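
For reference, these three metrics can be computed directly from the test-set predictions; a small sketch for the MARS predictions that should match the postResample() values above (postResample() reports R-squared as the squared correlation between predicted and observed values):

# Hand-computed test-set metrics for the MARS predictions
errs        <- testData$y - as.vector(marsPred)
rmse_manual <- sqrt(mean(errs^2))                       # root mean squared error
mae_manual  <- mean(abs(errs))                          # mean absolute error
r2_manual   <- cor(as.vector(marsPred), testData$y)^2   # squared correlation
c(RMSE = rmse_manual, Rsquared = r2_manual, MAE = mae_manual)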

To answer the question of whether MARS selected the informative predictors, we can see in the final model results that X1 through X5 were all used and ranked as important.

summary(marsFit$finalModel)
Call: earth(x=data.frame[200,10], y=c(10.28,6.983,6...), keepxy=TRUE, degree=1,
            nprune=14)

                coefficients
(Intercept)        20.287415
h(X1-0.286353)     -9.142444
h(0.801181-X1)    -15.416844
h(0.317457-X2)    -16.509359
h(X2-0.317457)      4.744872
h(X2-0.906716)    -40.586588
h(X3-0.308289)     10.115611
h(0.629261-X3)     11.646076
h(0.851428-X4)    -10.513003
h(0.193468-X5)    -14.963935
h(X5-0.193468)      3.883228
h(X8-0.0911924)    -1.169287

Selected 12 of 17 terms, and 6 of 10 predictors (nprune=14)
Termination condition: Reached nk 21
Importance: X4, X2, X1, X5, X3, X8, X6-unused, X7-unused, X9-unused, ...
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 3.481862    RSS 545.4163    GRSq 0.8454129    RSq 0.8777036

We can confirm that MARS successfully identified the informative predictors X1-X5.

7.5 Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

We’ll recreate what we did in 6.3:

data(ChemicalManufacturingProcess)

dim(ChemicalManufacturingProcess)
[1] 176  58

As a reminder, the matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.

head(ChemicalManufacturingProcess)
  Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
1 38.00                 6.25                49.58                56.97
2 42.44                 8.01                60.97                67.48
3 42.03                 8.01                60.97                67.48
4 41.42                 8.01                60.97                67.48
5 42.49                 7.47                63.33                72.25
6 43.57                 6.12                58.36                65.31
  BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
1                12.74                19.51                43.73
2                14.65                19.36                53.14
3                14.65                19.36                53.14
4                14.65                19.36                53.14
5                14.02                17.91                54.66
6                15.17                21.79                51.23
  BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
1                  100                16.66                11.44
2                  100                19.04                12.55
3                  100                19.04                12.55
4                  100                19.04                12.55
5                  100                18.22                12.80
6                  100                18.30                12.13
  BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
1                 3.46               138.09                18.83
2                 3.46               153.67                21.05
3                 3.46               153.67                21.05
4                 3.46               153.67                21.05
5                 3.05               147.61                21.05
6                 3.78               151.88                20.76
  ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
1                     NA                     NA                     NA
2                    0.0                      0                     NA
3                    0.0                      0                     NA
4                    0.0                      0                     NA
5                   10.7                      0                     NA
6                   12.0                      0                     NA
  ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
1                     NA                     NA                     NA
2                    917                 1032.2                  210.0
3                    912                 1003.6                  207.1
4                    911                 1014.6                  213.3
5                    918                 1027.5                  205.7
6                    924                 1016.8                  208.9
  ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
1                     NA                     NA                  43.00
2                    177                    178                  46.57
3                    178                    178                  45.07
4                    177                    177                  44.92
5                    178                    178                  44.96
6                    178                    178                  45.32
  ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
1                     NA                     NA                     NA
2                     NA                     NA                      0
3                     NA                     NA                      0
4                     NA                     NA                      0
5                     NA                     NA                      0
6                     NA                     NA                      0
  ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
1                   35.5                   4898                   6108
2                   34.0                   4869                   6095
3                   34.8                   4878                   6087
4                   34.8                   4897                   6102
5                   34.6                   4992                   6233
6                   34.0                   4985                   6222
  ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
1                   4682                   35.5                   4865
2                   4617                   34.0                   4867
3                   4617                   34.8                   4877
4                   4635                   34.8                   4872
5                   4733                   33.9                   4886
6                   4786                   33.4                   4862
  ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
1                   6049                   4665                    0.0
2                   6097                   4621                    0.0
3                   6078                   4621                    0.0
4                   6073                   4611                    0.0
5                   6102                   4659                   -0.7
6                   6115                   4696                   -0.6
  ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
1                     NA                     NA                     NA
2                      3                      0                      3
3                      4                      1                      4
4                      5                      2                      5
5                      8                      4                     18
6                      9                      1                      1
  ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
1                   4873                   6074                   4685
2                   4869                   6107                   4630
3                   4897                   6116                   4637
4                   4892                   6111                   4630
5                   4930                   6151                   4684
6                   4871                   6128                   4687
  ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
1                   10.7                   21.0                    9.9
2                   11.2                   21.4                    9.9
3                   11.1                   21.3                    9.4
4                   11.1                   21.3                    9.4
5                   11.3                   21.6                    9.0
6                   11.4                   21.7                   10.1
  ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
1                   69.1                    156                     66
2                   68.7                    169                     66
3                   69.3                    173                     66
4                   69.3                    171                     68
5                   69.4                    171                     70
6                   68.2                    173                     70
  ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
1                    2.4                    486                  0.019
2                    2.6                    508                  0.019
3                    2.6                    509                  0.018
4                    2.5                    496                  0.018
5                    2.5                    468                  0.017
6                    2.5                    490                  0.018
  ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
1                    0.5                      3                    7.2
2                    2.0                      2                    7.2
3                    0.7                      2                    7.2
4                    1.2                      2                    7.2
5                    0.2                      2                    7.3
6                    0.4                      2                    7.2
  ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
1                     NA                     NA                   11.6
2                    0.1                   0.15                   11.1
3                    0.0                   0.00                   12.0
4                    0.0                   0.00                   10.6
5                    0.0                   0.00                   11.0
6                    0.0                   0.00                   11.5
  ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
1                    3.0                    1.8                    2.4
2                    0.9                    1.9                    2.2
3                    1.0                    1.8                    2.3
4                    1.1                    1.8                    2.1
5                    1.1                    1.7                    2.1
6                    2.2                    1.8                    2.0

Now we’ll separate predictors and outcome:

cmpY <- ChemicalManufacturingProcess$Yield
cmpX <- ChemicalManufacturingProcess |> select(-Yield)

Now we’ll handle missing data with KNN imputation, as we did in 6.3. As a reminder, we first checked for missing values to confirm this KNN imputation is needed.

sapply(cmpX, function(x) sum(is.na(x)))
  BiologicalMaterial01   BiologicalMaterial02   BiologicalMaterial03 
                     0                      0                      0 
  BiologicalMaterial04   BiologicalMaterial05   BiologicalMaterial06 
                     0                      0                      0 
  BiologicalMaterial07   BiologicalMaterial08   BiologicalMaterial09 
                     0                      0                      0 
  BiologicalMaterial10   BiologicalMaterial11   BiologicalMaterial12 
                     0                      0                      0 
ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03 
                     1                      3                     15 
ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06 
                     1                      1                      2 
ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09 
                     1                      1                      0 
ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12 
                     9                     10                      1 
ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15 
                     0                      1                      0 
ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18 
                     0                      0                      0 
ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21 
                     0                      0                      0 
ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24 
                     1                      1                      1 
ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27 
                     5                      5                      5 
ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30 
                     5                      5                      5 
ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33 
                     5                      0                      5 
ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36 
                     5                      5                      5 
ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39 
                     0                      0                      0 
ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42 
                     1                      1                      0 
ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45 
                     0                      0                      0 

And now, like before, we apply KNN imputation to the predictor matrix:

set.seed(5889)
cmp_impute <- preProcess(cmpX, method = "knnImpute")

cmpX_imputed <- predict(cmp_impute, newdata = cmpX)

After this, we combine back with the yield:

cmp_data <- cmpX_imputed |> 
  mutate(Yield = cmpY)
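
As a quick sanity check (a small sketch), we can confirm that the imputation left no missing values behind:

# Confirm that KNN imputation removed all missing values
sum(is.na(cmpX_imputed))
anyNA(cmp_data)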

And now we can split the data into test and train:

set.seed(5889)
cmp_index <- createDataPartition(cmp_data$Yield, p = 0.8, list = FALSE)

cmp_train <- cmp_data[cmp_index, ]
cmp_test <- cmp_data[-cmp_index, ]

Since we previously defined ctrl for the computations above, and we want to repeat the repeated cross-validation setup from 6.3, we’ll define a new control object:

ctrl75 <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

Now we’re ready to fit the nonlinear models: KNN, SVM, MARS, and a neural network. We’ll use the appropriate resampling (repeated 10-fold CV), evaluate performance (resampled and test set), compare results across models, check variable importance, and then identify which model performs best. We’ll name the created objects with a ‘75’ suffix to indicate that they belong to 7.5.

KNN

set.seed(5889)
knnFit_75 <- train(
  Yield ~ ., 
  data = cmp_train,
  method = "knn",
  preProc = c("center", "scale"),
  tuneLength = 10,
  trControl = ctrl75
)

A side note: we got a warning about some variables having zero variance, a helpful note from caret telling us it detected zero-variance predictors while centering and scaling. We don’t need to stop and remove them manually for this fit, though we can inspect or drop them ourselves, as sketched below.
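
If we want to inspect (or drop) such columns ourselves, caret’s nearZeroVar() is one option; a brief sketch (it may not flag exactly the same columns caret warned about, since those warnings arise within individual resamples):

# Inspect near-zero-variance predictors in the training set
nzv_metrics <- nearZeroVar(cmp_train, saveMetrics = TRUE)
nzv_metrics[nzv_metrics$nzv, ]
# These could be dropped up front, or handled via preProc = c("nzv", "center", "scale")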

Now we’ll predict on the test set and then evaluate the performance.

knnPred_75 <- predict(knnFit_75, newdata = cmp_test)

knnResults_75 <- postResample(pred = knnPred_75, obs = cmp_test$Yield)
knnResults_75
     RMSE  Rsquared       MAE 
0.9759157 0.7822784 0.7465625 

The output below shows the resampled RMSE, R-squared, and MAE for each k, followed by the best value of k.

knnFit_75$results
    k     RMSE  Rsquared      MAE    RMSESD RsquaredSD     MAESD
1   5 1.369477 0.4758722 1.118146 0.2812549  0.1703713 0.2268628
2   7 1.399697 0.4574371 1.156011 0.2731749  0.1797775 0.2202406
3   9 1.395386 0.4677463 1.147372 0.2564692  0.1729406 0.2006311
4  11 1.394304 0.4714680 1.143972 0.2735801  0.1691190 0.2124469
5  13 1.408877 0.4695424 1.153873 0.2680810  0.1692628 0.2064313
6  15 1.430685 0.4507518 1.174528 0.2676840  0.1666715 0.2119196
7  17 1.454419 0.4350647 1.195953 0.2640404  0.1563835 0.2070533
8  19 1.460056 0.4339833 1.198976 0.2613643  0.1559580 0.1983905
9  21 1.470713 0.4285200 1.208499 0.2632833  0.1601728 0.2016800
10 23 1.477354 0.4302337 1.212584 0.2673339  0.1664492 0.2047399
knnFit_75$bestTune
  k
1 5

MARS

set.seed(5889)
marsFit_75 <- train(
  Yield ~ .,
  data = cmp_train,
  method = "earth",
  trControl = ctrl75
)

As we know from the previous exercise, MARS doesn’t require centering or scaling, so no preProc is needed.

Let’s predict on the test set and then evaluate:

marsPred_75 <- predict(marsFit_75, newdata = cmp_test)

marsResults_75 <- postResample(pred = marsPred_75, obs = cmp_test$Yield)
marsResults_75
     RMSE  Rsquared       MAE 
1.0615075 0.6452677 0.8247697 

Let’s view the model summary (which predictors and basis functions were selected)

summary(marsFit_75$finalModel)
Call: earth(x=matrix[144,57], y=c(42.03,41.42,4...), keepxy=TRUE, degree=1,
            nprune=9)

                                     coefficients
(Intercept)                             38.420422
h(ManufacturingProcess01- -0.827151)     0.559446
h(-1.17697-ManufacturingProcess09)      -1.101866
h(ManufacturingProcess09- -1.17697)      0.638666
h(-1.28827-ManufacturingProcess13)       2.211612
h(ManufacturingProcess32- -0.827442)     1.296397
h(0.0324569-ManufacturingProcess39)     -0.470706
h(ManufacturingProcess39-0.0324569)     -3.641748

Selected 8 of 21 terms, and 5 of 57 predictors (nprune=9)
Termination condition: RSq changed by less than 0.001 at 21 terms
Importance: ManufacturingProcess32, ManufacturingProcess09, ...
Number of terms at each degree of interaction: 1 7 (additive model)
GCV 1.228672    RSS 141.9884    GRSq 0.6521719    RSq 0.7169442

SVM

As we know, preProc = c("center", "scale") is needed for SVMs, since the RBF kernel is sensitive to the scale of the predictors.

set.seed(5889)
svmFit_75 <- train(
  Yield ~ .,
  data = cmp_train,
  method = "svmRadial",
  preProc = c("center", "scale"),  # Required for SVMs
  tuneLength = 10,
  trControl = ctrl75
)

Now we’ll predict and evaluate

svmPred_75 <- predict(svmFit_75, newdata = cmp_test)

svmResults_75 <- postResample(pred = svmPred_75, obs = cmp_test$Yield)
svmResults_75
     RMSE  Rsquared       MAE 
0.9263787 0.7180831 0.7106366 

Let’s see the tuning grid results and best parameters (C and sigma):

# Tuning grid results
svmFit_75$results
        sigma      C     RMSE  Rsquared       MAE    RMSESD RsquaredSD
1  0.01556562   0.25 1.465567 0.4787810 1.1908372 0.2922118  0.1679766
2  0.01556562   0.50 1.354287 0.5183367 1.1002152 0.2794040  0.1616677
3  0.01556562   1.00 1.262472 0.5721336 1.0244130 0.2581267  0.1464872
4  0.01556562   2.00 1.220557 0.5964907 0.9866445 0.2300747  0.1286965
5  0.01556562   4.00 1.199934 0.6103198 0.9738609 0.2170270  0.1191414
6  0.01556562   8.00 1.170481 0.6280943 0.9535726 0.2245423  0.1159037
7  0.01556562  16.00 1.169451 0.6287219 0.9529020 0.2246152  0.1160286
8  0.01556562  32.00 1.169451 0.6287219 0.9529020 0.2246152  0.1160286
9  0.01556562  64.00 1.169451 0.6287219 0.9529020 0.2246152  0.1160286
10 0.01556562 128.00 1.169451 0.6287219 0.9529020 0.2246152  0.1160286
       MAESD
1  0.2211193
2  0.2173428
3  0.2039449
4  0.1826212
5  0.1640918
6  0.1661511
7  0.1667926
8  0.1667926
9  0.1667926
10 0.1667926
# Best parameters (C and sigma)
svmFit_75$bestTune
       sigma  C
7 0.01556562 16

Neural Network

set.seed(5889)
nnetFit_75 <- train(
  Yield ~ .,
  data = cmp_train,
  method = "nnet",
  preProc = c("center", "scale"),
  linout = TRUE,  
  trace = FALSE,  
  tuneLength = 10,
  trControl = ctrl75
)

Now we predict and evaluate:

nnetPred_75 <- predict(nnetFit_75, newdata = cmp_test)

nnetResults_75 <- postResample(pred = nnetPred_75, obs = cmp_test$Yield)
nnetResults_75
     RMSE  Rsquared       MAE 
1.6338375 0.3096302 1.2340037 

Let’s see the results:

nnetFit_75$results
    size        decay      RMSE   Rsquared       MAE     RMSESD RsquaredSD
1      1 0.0000000000  1.735132 0.18988869  1.413048  0.3198043 0.16717812
2      1 0.0001000000  1.712157 0.21086250  1.381305  0.3146858 0.15197733
3      1 0.0002371374  1.670801 0.25904059  1.359907  0.3424025 0.16771038
4      1 0.0005623413  1.606098 0.30442617  1.304261  0.3306714 0.19697895
5      1 0.0013335214  1.625161 0.29228693  1.335344  0.3562655 0.20678114
6      1 0.0031622777  1.553486 0.35841981  1.268865  0.3375925 0.15811488
7      1 0.0074989421  1.520943 0.39098164  1.228656  0.3112334 0.18041485
8      1 0.0177827941  1.615357 0.34694016  1.312344  0.3332564 0.18859652
9      1 0.0421696503  1.566762 0.39760813  1.268838  0.3108324 0.15463503
10     1 0.1000000000  1.468939 0.47570873  1.200686  0.3238952 0.14265828
11     3 0.0000000000  2.216031 0.27039545  1.713184  1.0657612 0.20153346
12     3 0.0001000000  2.502176 0.22990966  1.939308  1.3682944 0.16060924
13     3 0.0002371374  2.497017 0.25861974  1.872851  1.4050034 0.18604466
14     3 0.0005623413  2.418707 0.26935619  1.843016  1.4828924 0.19289591
15     3 0.0013335214  2.467291 0.23383688  1.933121  1.3214576 0.19803300
16     3 0.0031622777  2.620427 0.26424729  2.000634  1.6353961 0.18461965
17     3 0.0074989421  2.480239 0.32496514  1.862744  1.5163080 0.20994708
18     3 0.0177827941  2.016613 0.34306965  1.572395  0.8641634 0.20457992
19     3 0.0421696503  2.435774 0.29464173  1.889162  1.4064705 0.20460592
20     3 0.1000000000  2.215347 0.27015350  1.712746  0.8498539 0.19010541
21     5 0.0000000000  3.181954 0.22369014  2.500705  0.7813749 0.18899285
22     5 0.0001000000  3.151589 0.19836636  2.424363  0.9855112 0.17126421
23     5 0.0002371374  3.221840 0.17920061  2.446661  1.0908986 0.16021084
24     5 0.0005623413  3.141597 0.17289007  2.418582  1.0187761 0.17957810
25     5 0.0013335214  3.058769 0.22373827  2.390483  0.7242279 0.17934710
26     5 0.0031622777  3.241395 0.17188436  2.494768  0.7903091 0.16150863
27     5 0.0074989421  3.280037 0.17379038  2.528194  0.9288774 0.14866793
28     5 0.0177827941  3.318161 0.19129239  2.561806  0.9148724 0.17536518
29     5 0.0421696503  2.931991 0.21853604  2.326737  0.6943278 0.17732788
30     5 0.1000000000  2.847944 0.23438393  2.193826  0.8118014 0.15754998
31     7 0.0000000000  4.761861 0.15272273  3.556010  1.8783635 0.13970644
32     7 0.0001000000  4.217727 0.17124576  3.208297  1.5979752 0.17162747
33     7 0.0002371374  5.019670 0.16903521  3.615459  2.4943514 0.17737180
34     7 0.0005623413  4.663628 0.14662904  3.608764  2.2364215 0.14669209
35     7 0.0013335214  4.777947 0.15173493  3.427383  2.3229430 0.17006427
36     7 0.0031622777  4.901006 0.16376499  3.538207  1.8899257 0.14091284
37     7 0.0074989421  4.397003 0.15878872  3.332445  1.5052179 0.15084788
38     7 0.0177827941  4.647657 0.14426842  3.459640  1.8985653 0.14391218
39     7 0.0421696503  4.224630 0.16476217  3.147239  1.3169477 0.15715119
40     7 0.1000000000  3.869777 0.18585266  2.960576  1.8744326 0.17876724
41     9 0.0000000000  7.435039 0.12782070  5.290614  4.2093640 0.12040754
42     9 0.0001000000  7.934411 0.13991963  5.446565  4.6684130 0.15467852
43     9 0.0002371374  7.847353 0.11774350  5.474588  4.4193355 0.14571278
44     9 0.0005623413  7.316996 0.14038956  4.915736  3.5125211 0.13259268
45     9 0.0013335214  8.123370 0.14712322  5.529229  4.5892790 0.16824381
46     9 0.0031622777  7.812948 0.08307844  5.083599  3.9359874 0.11420496
47     9 0.0074989421  7.923966 0.11927073  5.535078  5.0497639 0.13995416
48     9 0.0177827941  8.128094 0.08967816  5.417294  4.1711337 0.09532866
49     9 0.0421696503  7.575739 0.12539824  5.103917  4.9689741 0.17214022
50     9 0.1000000000  7.753442 0.13652728  5.383150  5.4642225 0.14208246
51    11 0.0000000000 11.438572 0.12147154  7.416054  7.4052133 0.14146196
52    11 0.0001000000 12.767415 0.09308860  7.992963  8.4018954 0.09847411
53    11 0.0002371374 11.346876 0.12564468  6.898611  7.3036528 0.16467903
54    11 0.0005623413 11.680181 0.09284698  7.159124  9.1435708 0.10155799
55    11 0.0013335214 12.592482 0.11982919  8.365890  7.9005750 0.16057680
56    11 0.0031622777 11.797845 0.11031256  7.099793  8.6565466 0.15394595
57    11 0.0074989421 12.414732 0.09588333  7.618211  6.6484052 0.13221886
58    11 0.0177827941 10.484097 0.11973375  6.569796  6.6118033 0.14658405
59    11 0.0421696503 11.121429 0.12266810  6.646377  8.4096752 0.12365149
60    11 0.1000000000 13.072860 0.08436240  8.412216  8.8707459 0.10001150
61    13 0.0000000000 15.235380 0.10457622  9.142022 11.7882053 0.11564877
62    13 0.0001000000 15.313099 0.09484282  9.373113 10.4702751 0.14478900
63    13 0.0002371374 17.679435 0.12819601 10.107094 15.3872892 0.15139018
64    13 0.0005623413 16.052872 0.10614256 10.092961 11.9745972 0.12559984
65    13 0.0013335214 15.918112 0.07629482  9.733242 12.0144539 0.09896697
66    13 0.0031622777 16.301325 0.09162202  9.540227 13.3879571 0.10264684
67    13 0.0074989421 13.008434 0.13433133  7.968517 10.8634050 0.14810119
68    13 0.0177827941 16.044412 0.10717297  9.993763 10.4711079 0.13782335
69    13 0.0421696503 17.140067 0.07741993  9.793529 11.3576684 0.11826825
70    13 0.1000000000 13.080648 0.12675923  8.097826  7.8471717 0.14210575
71    15 0.0000000000  9.544475 0.18057144  5.926233 12.1056661 0.17848233
72    15 0.0001000000  9.272899 0.17365968  5.711893  9.3903405 0.16563614
73    15 0.0002371374  8.670968 0.18263543  5.681464 10.1853857 0.18450139
74    15 0.0005623413  7.538814 0.17066275  4.933647 10.2577054 0.17098396
75    15 0.0013335214  7.552275 0.18036350  5.270901  8.2457346 0.18175507
76    15 0.0031622777  9.579385 0.22288085  5.939304 13.2808035 0.20914132
77    15 0.0074989421  6.259005 0.24801214  4.358716  9.3235562 0.19774048
78    15 0.0177827941  6.622698 0.22047484  4.259485  7.2825948 0.19451275
79    15 0.0421696503  9.975412 0.13983293  6.667825 10.9097275 0.13315065
80    15 0.1000000000  7.699578 0.17366088  4.812874  7.7597001 0.17566573
81    17 0.0000000000       NaN        NaN       NaN         NA         NA
82    17 0.0001000000       NaN        NaN       NaN         NA         NA
83    17 0.0002371374       NaN        NaN       NaN         NA         NA
84    17 0.0005623413       NaN        NaN       NaN         NA         NA
85    17 0.0013335214       NaN        NaN       NaN         NA         NA
86    17 0.0031622777       NaN        NaN       NaN         NA         NA
87    17 0.0074989421       NaN        NaN       NaN         NA         NA
88    17 0.0177827941       NaN        NaN       NaN         NA         NA
89    17 0.0421696503       NaN        NaN       NaN         NA         NA
90    17 0.1000000000       NaN        NaN       NaN         NA         NA
91    19 0.0000000000       NaN        NaN       NaN         NA         NA
92    19 0.0001000000       NaN        NaN       NaN         NA         NA
93    19 0.0002371374       NaN        NaN       NaN         NA         NA
94    19 0.0005623413       NaN        NaN       NaN         NA         NA
95    19 0.0013335214       NaN        NaN       NaN         NA         NA
96    19 0.0031622777       NaN        NaN       NaN         NA         NA
97    19 0.0074989421       NaN        NaN       NaN         NA         NA
98    19 0.0177827941       NaN        NaN       NaN         NA         NA
99    19 0.0421696503       NaN        NaN       NaN         NA         NA
100   19 0.1000000000       NaN        NaN       NaN         NA         NA
        MAESD
1   0.2456481
2   0.2503160
3   0.2783087
4   0.2591465
5   0.2893763
6   0.2841276
7   0.2707129
8   0.2872509
9   0.2590597
10  0.2664049
11  0.7456361
12  1.0114718
13  0.9404681
14  1.0273395
15  1.0294414
16  1.0950296
17  1.0667873
18  0.6831634
19  1.0903657
20  0.6034777
21  0.6443980
22  0.7157954
23  0.6983542
24  0.7299129
25  0.5483201
26  0.5976007
27  0.6918177
28  0.6927003
29  0.5446861
30  0.4920005
31  1.2322731
32  1.1278759
33  1.4752104
34  1.5599947
35  1.2887658
36  1.2056551
37  1.1230219
38  1.2711847
39  0.8249666
40  1.3827140
41  2.6763245
42  3.1994032
43  2.8117213
44  1.9651988
45  2.6218379
46  2.0891668
47  3.0793289
48  2.2508044
49  2.6389294
50  3.4318402
51  4.4840503
52  4.7376594
53  4.0210889
54  4.3163143
55  5.0798476
56  4.4045719
57  3.4501614
58  3.8479727
59  4.3242596
60  5.4803537
61  6.1440850
62  5.5563733
63  6.9778219
64  6.4001374
65  6.0115620
66  7.0465017
67  6.2251570
68  5.6325553
69  5.4445065
70  4.1528617
71  6.4769801
72  5.0579974
73  5.8455663
74  5.8421825
75  5.2315487
76  7.0256517
77  6.0167370
78  3.7669302
79  6.8067854
80  4.2497187
81         NA
82         NA
83         NA
84         NA
85         NA
86         NA
87         NA
88         NA
89         NA
90         NA
91         NA
92         NA
93         NA
94         NA
95         NA
96         NA
97         NA
98         NA
99         NA
100        NA

And now let’s see the best tuning combination of size (hidden units) and decay (regularization)

nnetFit_75$bestTune
   size decay
10    1   0.1

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

model_comparison <- data.frame(
  Model = c("KNN", "MARS", "SVM", "Neural Network"),
  RMSE = c(0.9759, 1.0615, 0.9264, 1.6338),
  Rsquared = c(0.7823, 0.6453, 0.7181, 0.3096),
  MAE = c(0.7466, 0.8248, 0.7106, 1.2340)
)

model_comparison |> mutate(across(where(is.numeric), \(x) round(x, 3)))
           Model  RMSE Rsquared   MAE
1            KNN 0.976    0.782 0.747
2           MARS 1.062    0.645 0.825
3            SVM 0.926    0.718 0.711
4 Neural Network 1.634    0.310 1.234

Based on these test set results, we’ll treat KNN (k-Nearest Neighbors) as the optimal nonlinear regression model: it has the highest R-squared (it explains the most variance in Yield), and its RMSE and MAE are close behind SVM, which posts the lowest test-set RMSE and MAE. The Neural Network, which by far took the longest to run, also performed the worst (much higher RMSE and MAE and a low R-squared, so it explains little of the variance in the data). The Neural Network likely suffered from instability during training, which we saw in the many warnings suppressed in the .Rmd file and in how long it took to train. SVM performed better than MARS, suggesting the flexibility of the RBF kernel helps on these data, while MARS performed decently but not well enough to be considered optimal here.
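
To also compare the resampled (cross-validation) performance directly, rather than only the test-set numbers, one option is caret’s resamples() helper; a sketch assuming the four fitted train objects above (they share the same seed and trainControl, so their resamples line up):

# Compare resampled performance across the four models
resamps_75 <- resamples(list(KNN  = knnFit_75,
                             MARS = marsFit_75,
                             SVM  = svmFit_75,
                             NNet = nnetFit_75))
summary(resamps_75)
bwplot(resamps_75, metric = "RMSE")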

(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

For KNN, the plot below shows the most important predictors in the KNN model. Manufacturing process variables dominate the top of the list, taking the top four spots, which is similar to what we saw in the linear regression results in 6.3. In fact, the same two variables lead for KNN as for the optimal linear model: ManufacturingProcess32 and ManufacturingProcess13. The #3 and #4 spots are reversed relative to the linear model: here #3 is ManufacturingProcess17, followed by ManufacturingProcess09 at #4. What’s different is that, while manufacturing process variables still dominate the very top, this optimal nonlinear model places a biological variable at #5, whereas the optimal linear model didn’t include a biological variable until #8. This shift may reflect nonlinear relationships that KNN captures and linear regression does not, suggesting the biological variables have more complex or “local” effects that were not well captured by the linear model.

knnImp_75 <- varImp(knnFit_75, scale = FALSE)
plot(knnImp_75, top = 20)
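
To pull the top ten predictors out as a table for a side-by-side comparison with the 6.3 linear model (a sketch; the 6.3 model object itself isn’t refit here):

# Top ten predictors by KNN variable importance
knnImp_75$importance |>
  rownames_to_column("Predictor") |>
  arrange(desc(Overall)) |>
  slice_head(n = 10)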

(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

For part (c), I examined the predictors that were highly ranked in the KNN model but did not appear in the top 10 of the optimal linear model from Exercise 6.3.

One example is BiologicalMaterial06, which was ranked #5 in the KNN model but was #11 in importance in the optimal linear regression model. Additionally, BiologicalMaterial03 is #7 in the KNN model but was #14 in importance in the optimal linear regression model. Then we also have ManufacturingProcess31 which is #8 in the KNN model but appeared nowhere in the top 20 important features in the optimal linear regression model.

unique_knn_vars <- c("BiologicalMaterial06", "BiologicalMaterial03", 
                     "ManufacturingProcess31")

for (var in unique_knn_vars) {
  print(
    ggplot(cmp_train, aes(x = .data[[var]], y = Yield)) +
      geom_point(alpha = 0.6) +
      geom_smooth(method = "loess", se = FALSE, color = "goldenrod") +
      labs(title = paste("Yield vs.", var))
  )
}
`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

When plotting Yield vs. BiologicalMaterial06, the relationship appears nonlinear and slightly curved, suggesting a pattern that KNN could detect but the linear model could not. For each of these variables we see a nonlinear relationship that KNN captured, and it makes sense that it wasn’t captured by the optimal linear regression model. Likewise, it makes sense that the curved ManufacturingProcess31 relationship did not appear at all among the top 20 variables in the linear model. Overall, these plots support the idea that some biological and process variables have nonlinear or localized effects on yield, which helps explain why a nonlinear model like KNN identified different important predictors than the linear approach.