library(tidyverse)
library(caret)
library(pls)
library(glmnet)
library(corrplot)
library(e1071)
library(lattice)
library(car)
library(RANN)
library(AppliedPredictiveModeling)
library(mlbench)
library(earth)
library(kernlab)
Assignment 8: Non-Linear Regression
This assignment covers Exercises 7.2 and 7.5 from Kuhn and Johnson’s Applied Predictive Modeling. While there are only two problems, each includes multiple steps. Link to Applied Predictive Modeling for reference.
7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:
$$y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2)$$
where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
set.seed(5889)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)
## or other methods.
set.seed(5889)
## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
We’ll train each model separately using the same resampling structure, similar to our approach for the linear regression exercises.
ctrl <- trainControl(method = "cv", number = 10)
Tune several models on these data. For example:
K-Nearest Neighbors (KNN) – tuneLength = 10
set.seed(5889)
knnModel <- train(x = trainingData$x,
y = trainingData$y,
method = "knn",
preProc = c("center", "scale"),
tuneLength = 10,
trControl = ctrl)
knnModel
k-Nearest Neighbors
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 3.168231 0.5800258 2.582620
7 3.046177 0.6413380 2.501907
9 3.085816 0.6354590 2.518231
11 3.101118 0.6430259 2.508910
13 3.135146 0.6585310 2.545920
15 3.169694 0.6457537 2.564948
17 3.181696 0.6562069 2.581792
19 3.235198 0.6530707 2.652866
21 3.291182 0.6429373 2.703551
23 3.338198 0.6327673 2.736384
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 7.
Below we predict and evaluate:
knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set
## performance values
knnResults <- postResample(pred = knnPred, obs = testData$y)
knnResults
RMSE Rsquared MAE
3.2900978 0.6120825 2.6377038
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
Before we move on to different models, let’s look a bit more at what’s going on with the KNN model. We saw the results above, and that the final value used for the model was k = 7 (the number of neighbors = 7). We can also see that using the below:
knnModel$bestTune
k
2 7
MARS Model
First we’ll fit the MARS (Multivariate Adaptive Regression Splines) model. We won’t center and scale here because MARS (via the earth package) doesn’t require it: MARS places knots on each predictor’s own scale and fits piecewise linear basis functions, so rescaling only changes where the knots are reported. caret applies no pre-processing unless we ask for it through preProc, and we leave it out here.
set.seed(5889)
marsFit <- train(x = trainingData$x, y = trainingData$y,
method = "earth",
trControl = ctrl)
marsFit
Multivariate Adaptive Regression Spline
200 samples
10 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
nprune RMSE Rsquared MAE
2 3.958849 0.3133991 3.318788
8 2.133757 0.8047589 1.685374
14 1.944990 0.8365229 1.557951
Tuning parameter 'degree' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 14 and degree = 1.
Now we’ll generate predictions:
marsPred <- predict(marsFit, newdata = testData$x)
And now we’ll evaluate performance:
marsResults <- postResample(pred = marsPred, obs = testData$y)
marsResults
RMSE Rsquared MAE
1.9211835 0.8539689 1.5109182
For MARS, we didn’t specify a tuneGrid or tuneLength, so caret defaulted to degree = 1 and internally generated a range of values for nprune (the maximum number of terms to retain in the final model).
marsFit$results
degree nprune RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 1 2 3.958849 0.3133991 3.318788 0.5040435 0.12002011 0.4943766
2 1 8 2.133757 0.8047589 1.685374 0.4117511 0.07529377 0.2879605
3 1 14 1.944990 0.8365229 1.557951 0.4226748 0.06829416 0.2977976
Each row in the results above shows a model built with a different limit on the number of basis functions (terms), along with its performance.
To view the best nprune:
marsFit$bestTune
nprune degree
3 14 1
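Above we can see that caret chose nprune = 14 with degree = 1, that is, an additive model using main effects only, as the best CV performer. Note that nprune is an upper bound on the number of terms: the summary below shows that the final model kept 12 of them.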
summary(marsFit$finalModel)
Call: earth(x=data.frame[200,10], y=c(10.28,6.983,6...), keepxy=TRUE, degree=1,
nprune=14)
coefficients
(Intercept) 20.287415
h(X1-0.286353) -9.142444
h(0.801181-X1) -15.416844
h(0.317457-X2) -16.509359
h(X2-0.317457) 4.744872
h(X2-0.906716) -40.586588
h(X3-0.308289) 10.115611
h(0.629261-X3) 11.646076
h(0.851428-X4) -10.513003
h(0.193468-X5) -14.963935
h(X5-0.193468) 3.883228
h(X8-0.0911924) -1.169287
Selected 12 of 17 terms, and 6 of 10 predictors (nprune=14)
Termination condition: Reached nk 21
Importance: X4, X2, X1, X5, X3, X8, X6-unused, X7-unused, X9-unused, ...
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 3.481862 RSS 545.4163 GRSq 0.8454129 RSq 0.8777036
This allows us to see what predictors are driving the predictions we saw.
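To make the h(...) terms in the summary concrete, here is a minimal sketch of the hinge function and of how the two X1 terms combine into X1's contribution to the prediction. The coefficients and knots are copied from the summary above; the helper name hinge and the example X1 values are illustrative assumptions.

```r
# Hinge function used by MARS: h(z) = max(0, z), applied elementwise
hinge <- function(z) pmax(0, z)

# Example X1 values (illustrative, not taken from the data)
x1 <- c(0.1, 0.5, 0.9)

# Contribution of X1 to the prediction, using the two X1 terms
# reported in the model summary above
x1_contribution <- -9.142444 * hinge(x1 - 0.286353) -
  15.416844 * hinge(0.801181 - x1)

x1_contribution
```

Each term is zero on one side of its knot and linear on the other, which is how an additive MARS model builds a piecewise linear curve for each predictor.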
SVM Model
Now let’s try an SVM (Support Vector Machine) model. We’ll use an RBF kernel (method = "svmRadial") and center and scale the predictors, as is recommended. With method = "svmRadial" and tuneLength, caret estimates the kernel width (sigma) from the training data, holds it fixed, and tunes over the cost (C) automatically.
First we’ll fit the SVM model:
set.seed(5889)
svmFit_rbf <- train(x = trainingData$x, y = trainingData$y,
method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 10,
trControl = ctrl)
svmFit_rbf
Support Vector Machines with Radial Basis Function Kernel
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 2.659727 0.7450721 2.095504
0.50 2.390573 0.7671454 1.860514
1.00 2.211788 0.7852377 1.668816
2.00 2.144982 0.7933801 1.616089
4.00 2.182419 0.7874943 1.633426
8.00 2.244198 0.7769900 1.689583
16.00 2.302930 0.7684958 1.764967
32.00 2.348664 0.7608682 1.790210
64.00 2.348664 0.7608682 1.790210
128.00 2.348664 0.7608682 1.790210
Tuning parameter 'sigma' was held constant at a value of 0.05935436
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.05935436 and C = 2.
What’s happening is that caret estimates a single value for sigma (which defines the width of the RBF kernel function) and holds it constant, then builds a grid of 10 values of C (cost, which controls the penalty for prediction error) from 0.25 to 128, doubling at each step. It then performs 10-fold CV to find the best value of C.
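The C values in the printed results (0.25, 0.5, ..., 128) are consecutive powers of two, with sigma held constant. A small sketch that reproduces that sequence in base R; the variable names are illustrative, and the formula is inferred from the printed grid rather than quoted from caret's source.

```r
# Number of candidate values requested via tuneLength
tune_length <- 10

# Powers of two from 2^-2 up to 2^7, matching the C column in the output above
C_grid <- 2^(seq_len(tune_length) - 3)

C_grid
```

An explicit grid like this could also be passed to train() through tuneGrid, together with a sigma column, if we wanted to tune both parameters rather than C alone.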
We can see what values are tried:
svmFit_rbf$results
sigma C RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 0.05935436 0.25 2.659727 0.7450721 2.095504 0.3627161 0.08018576 0.2280934
2 0.05935436 0.50 2.390573 0.7671454 1.860514 0.3641536 0.07748763 0.2246471
3 0.05935436 1.00 2.211788 0.7852377 1.668816 0.4191699 0.08709607 0.2800315
4 0.05935436 2.00 2.144982 0.7933801 1.616089 0.4417028 0.08461644 0.2993606
5 0.05935436 4.00 2.182419 0.7874943 1.633426 0.4286569 0.08708807 0.2802668
6 0.05935436 8.00 2.244198 0.7769900 1.689583 0.4432195 0.09374840 0.2808045
7 0.05935436 16.00 2.302930 0.7684958 1.764967 0.4679148 0.09888332 0.2977682
8 0.05935436 32.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964
9 0.05935436 64.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964
10 0.05935436 128.00 2.348664 0.7608682 1.790210 0.4634991 0.09737264 0.3054964
We can also see the final best combination chosen:
svmFit_rbf$bestTune
sigma C
4 0.05935436 2
Now we’ll predict and evaluate on the test data using SVM:
# Predict on test data
svmPred_rbf <- predict(svmFit_rbf, newdata = testData$x)
# Evaluate performance
svmResults_rbf <- postResample(pred = svmPred_rbf, obs = testData$y)
svmResults_rbf
RMSE Rsquared MAE
2.3019533 0.7987158 1.7493925
Neural Network
Now we’ll move on to Neural Networks. We’ll use the “nnet” method which trains a single hidden layer neural net.
set.seed(5889)
nnetFit <- train(x = trainingData$x, y = trainingData$y,
method = "nnet",
preProc = c("center", "scale"),
linout = TRUE,
trace = FALSE,
tuneLength = 10,
trControl = ctrl)
caret is tuning size (the number of hidden units) and decay (the weight decay, or regularization, parameter). This matters because size controls model flexibility and decay helps avoid overfitting by shrinking large weights. Setting linout = TRUE is recommended in Applied Predictive Modeling because it gives a linear output unit, which regression needs. (The ‘nonlinear’ in neural nets comes from the hidden layer, not the output. Hidden units apply nonlinear activation functions to linear combinations of the predictors, which transforms the input space in nonlinear ways. The output unit then combines the hidden unit outputs linearly to form the prediction.)
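As an illustration of the hidden-layer and linear-output description above, here is a minimal base-R sketch of a forward pass through a single-hidden-layer network. The function names, the random weights, and the input vector are all assumptions made for the example; this is not the fitted nnetFit model.

```r
# Logistic (sigmoid) activation applied by each hidden unit
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass for one hidden layer and one linear output unit.
# x:  numeric vector of p predictors
# W1: h x p matrix of input-to-hidden weights; b1: h hidden biases
# w2: h hidden-to-output weights; b2: output bias
forward <- function(x, W1, b1, w2, b2) {
  hidden <- sigmoid(as.vector(W1 %*% x) + b1)  # nonlinear transformation
  sum(w2 * hidden) + b2                        # linear output, as with linout = TRUE
}

# Illustrative weights for a 10-3-1 network (random, not estimated)
set.seed(1)
p <- 10
h <- 3
W1 <- matrix(rnorm(h * p, sd = 0.5), nrow = h)
b1 <- rnorm(h)
w2 <- rnorm(h)
b2 <- 0.1

x <- runif(p)
y_hat <- forward(x, W1, b1, w2, b2)
y_hat
```

Because the output is a plain weighted sum, predictions are not confined to the (0, 1) range of the sigmoid, which is why a linear output suits a numeric outcome.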
Now we’ll predict on the test set using neural networks and then evaluate the test performance.
nnetPred <- predict(nnetFit, newdata = testData$x)
nnetResults <- postResample(pred = nnetPred, obs = testData$y)
nnetResults
RMSE Rsquared MAE
2.6720455 0.7199412 2.0665596
To view all tuning combinations we can use the below, which will show us the size, decay, and associated results:
nnetFit$results
size decay RMSE Rsquared MAE RMSESD RsquaredSD
1 1 0.0000000000 2.960066 0.5911846 2.345260 0.8030337 0.21548934
2 1 0.0001000000 2.754333 0.6628673 2.179794 0.6261467 0.12532593
3 1 0.0002371374 2.683998 0.6798453 2.130446 0.5174341 0.11518671
4 1 0.0005623413 2.959790 0.6095024 2.383364 0.7901749 0.18077456
5 1 0.0013335214 2.707720 0.6718096 2.145022 0.6574551 0.14968160
6 1 0.0031622777 2.622062 0.6964640 2.084730 0.4781297 0.09451252
7 1 0.0074989421 2.852715 0.6392365 2.272274 0.8309276 0.16149795
8 1 0.0177827941 2.561983 0.7106418 2.019489 0.5212574 0.09835629
9 1 0.0421696503 2.559745 0.7109906 2.016831 0.5241593 0.09889309
10 1 0.1000000000 2.557483 0.7112542 2.014034 0.5289092 0.09987091
11 3 0.0000000000 2.566719 0.7295778 2.014360 0.3375631 0.05266269
12 3 0.0001000000 2.521089 0.7342866 1.994573 0.6355167 0.11180401
13 3 0.0002371374 2.657924 0.7024136 2.139791 0.5086297 0.10657603
14 3 0.0005623413 2.737867 0.6740424 2.177606 0.5602990 0.11523765
15 3 0.0013335214 2.894886 0.6564949 2.253619 0.6027926 0.14297946
16 3 0.0031622777 2.616297 0.6919107 2.064000 0.3603812 0.10463876
17 3 0.0074989421 2.792477 0.6660136 2.265722 0.6002464 0.13110662
18 3 0.0177827941 2.593983 0.7065830 2.103752 0.5376259 0.12275188
19 3 0.0421696503 2.712234 0.6898698 2.159340 0.6101531 0.12323786
20 3 0.1000000000 2.567064 0.7122182 2.044009 0.5984428 0.14138634
21 5 0.0000000000 3.441548 0.5851411 2.564986 1.3626999 0.17766988
22 5 0.0001000000 3.137969 0.6486411 2.556474 0.5741337 0.09316172
23 5 0.0002371374 2.787413 0.7012234 2.205887 0.5547889 0.10087250
24 5 0.0005623413 3.312832 0.6062596 2.580191 0.9665075 0.16546731
25 5 0.0013335214 2.794876 0.6857186 2.203715 0.2613675 0.08920459
26 5 0.0031622777 2.803137 0.6679732 2.253740 0.3467603 0.11000014
27 5 0.0074989421 3.328568 0.5786464 2.630923 0.7481979 0.14926777
28 5 0.0177827941 2.903495 0.6597503 2.336283 0.4788614 0.13759762
29 5 0.0421696503 2.912730 0.6435797 2.320075 0.8859488 0.17594765
30 5 0.1000000000 2.641473 0.7014636 2.128984 0.2978718 0.10308623
31 7 0.0000000000 3.801656 0.5353312 2.841621 1.5288182 0.22090076
32 7 0.0001000000 3.456479 0.5250305 2.732063 0.6245788 0.15209604
33 7 0.0002371374 3.690990 0.4981262 2.788840 0.9039976 0.19109467
34 7 0.0005623413 3.614065 0.5589267 2.828468 1.1860625 0.19265880
35 7 0.0013335214 3.105627 0.6254515 2.332971 0.4876116 0.08846038
36 7 0.0031622777 3.508658 0.5417562 2.698946 1.0495877 0.26020452
37 7 0.0074989421 3.610851 0.5463546 2.733308 0.8432205 0.16913401
38 7 0.0177827941 3.396124 0.5958741 2.688354 0.5922410 0.12835764
39 7 0.0421696503 3.481736 0.5669023 2.735132 0.6819072 0.16443693
40 7 0.1000000000 3.455629 0.5768330 2.739779 0.7912482 0.14945609
41 9 0.0000000000 3.751406 0.4901791 2.794844 0.9870253 0.16353641
42 9 0.0001000000 3.201354 0.5989188 2.530696 0.4757652 0.12687101
43 9 0.0002371374 3.647137 0.5336765 2.862189 0.6835737 0.14656417
44 9 0.0005623413 3.364257 0.5909571 2.615956 0.4084434 0.10259914
45 9 0.0013335214 3.506493 0.5684274 2.660703 0.6847518 0.14569658
46 9 0.0031622777 3.421854 0.5693096 2.804027 0.6282507 0.14948863
47 9 0.0074989421 4.024629 0.4842706 3.055307 1.2154853 0.18295640
48 9 0.0177827941 3.372910 0.5698929 2.653723 0.4088231 0.08911905
49 9 0.0421696503 3.865285 0.5286587 2.954066 0.8721052 0.10974794
50 9 0.1000000000 3.427846 0.5969991 2.761035 0.6672111 0.12659825
51 11 0.0000000000 3.774715 0.5093192 3.041021 0.5146279 0.14521014
52 11 0.0001000000 3.836037 0.5075262 2.983011 0.6004038 0.14687441
53 11 0.0002371374 3.467494 0.5617958 2.819735 0.8156509 0.20392925
54 11 0.0005623413 3.801431 0.5294862 3.002152 0.8289512 0.16583883
55 11 0.0013335214 4.094073 0.5096713 3.175053 0.8531817 0.11725832
56 11 0.0031622777 3.795282 0.5477407 2.975173 1.1429714 0.18300116
57 11 0.0074989421 3.639152 0.5128064 2.954849 0.4677694 0.10569186
58 11 0.0177827941 3.395383 0.5815988 2.702883 0.6434576 0.15892129
59 11 0.0421696503 3.179837 0.6283658 2.594157 0.2723401 0.06895899
60 11 0.1000000000 3.216177 0.6003395 2.553010 0.7947188 0.19952629
61 13 0.0000000000 3.763628 0.5420391 3.040120 0.6187630 0.12016371
62 13 0.0001000000 3.358993 0.5995859 2.587192 0.6422878 0.15327953
63 13 0.0002371374 3.725119 0.5242935 3.055166 0.7823044 0.16010368
64 13 0.0005623413 3.497053 0.5715562 2.812181 0.5676922 0.14753425
65 13 0.0013335214 3.243017 0.6028445 2.573356 0.5032160 0.13653429
66 13 0.0031622777 3.839781 0.5326146 3.095219 0.8135036 0.15342268
67 13 0.0074989421 3.274441 0.6013188 2.647078 0.5416352 0.17508324
68 13 0.0177827941 3.401031 0.5943611 2.738969 0.4951447 0.13324139
69 13 0.0421696503 3.230933 0.6146475 2.589246 0.4775377 0.08982416
70 13 0.1000000000 2.987737 0.6792251 2.454073 0.5258678 0.10690720
71 15 0.0000000000 3.733509 0.5027581 2.958709 0.8348844 0.20633637
72 15 0.0001000000 3.795677 0.5013014 3.034377 0.6794316 0.15282640
73 15 0.0002371374 3.456956 0.5802399 2.763125 0.6550600 0.15126532
74 15 0.0005623413 3.491071 0.5503835 2.791689 0.7908174 0.17703876
75 15 0.0013335214 3.613316 0.5373694 2.885987 0.7077221 0.16630385
76 15 0.0031622777 3.896860 0.5104539 3.108518 0.7266770 0.15067889
77 15 0.0074989421 3.488939 0.6136542 2.854495 0.9653271 0.16391058
78 15 0.0177827941 3.257427 0.5795064 2.664384 0.5908838 0.13792338
79 15 0.0421696503 3.250316 0.5945034 2.620743 0.3762659 0.09201788
80 15 0.1000000000 2.812831 0.6750570 2.226116 0.4736826 0.12500414
81 17 0.0000000000 3.509804 0.5736527 2.870158 0.7386233 0.17200021
82 17 0.0001000000 3.492733 0.5741774 2.884995 0.7075357 0.14501999
83 17 0.0002371374 3.557179 0.5559779 2.934753 0.4070211 0.13833713
84 17 0.0005623413 3.952102 0.4463663 3.126522 0.8145069 0.20933664
85 17 0.0013335214 3.042922 0.6432055 2.436510 0.6063014 0.11991384
86 17 0.0031622777 3.406596 0.5926317 2.742714 0.2864579 0.09978513
87 17 0.0074989421 3.195985 0.6340622 2.566064 0.2134780 0.07467178
88 17 0.0177827941 3.350057 0.5956858 2.692250 0.4368769 0.08833437
89 17 0.0421696503 3.083675 0.6335208 2.489995 0.4810994 0.10535660
90 17 0.1000000000 2.920911 0.6499619 2.296324 0.5291126 0.12372427
91 19 0.0000000000 3.401777 0.5686377 2.750448 0.6179573 0.12632473
92 19 0.0001000000 3.531387 0.5699739 2.843451 0.5435198 0.12925707
93 19 0.0002371374 3.152140 0.6609787 2.515176 0.7193125 0.14927450
94 19 0.0005623413 3.410183 0.5785954 2.796352 0.6550564 0.11885304
95 19 0.0013335214 3.402767 0.5957950 2.740472 0.6029413 0.15117808
96 19 0.0031622777 2.912603 0.6831519 2.355858 0.4580279 0.09521777
97 19 0.0074989421 3.033494 0.6489829 2.396184 0.6470104 0.14926975
98 19 0.0177827941 2.796227 0.7122032 2.254410 0.5387210 0.09975262
99 19 0.0421696503 3.012921 0.6640488 2.425036 0.5642545 0.14045877
100 19 0.1000000000 2.866087 0.6677864 2.311866 0.4883501 0.12170358
MAESD
1 0.5992424
2 0.4855741
3 0.3765276
4 0.6604292
5 0.5346169
6 0.3447922
7 0.6754237
8 0.3889165
9 0.3912795
10 0.3932790
11 0.2384788
12 0.4754516
13 0.4776141
14 0.4676890
15 0.5165422
16 0.2427645
17 0.5260913
18 0.4971889
19 0.5472803
20 0.4749452
21 0.5637074
22 0.4905325
23 0.4271008
24 0.6811314
25 0.1686097
26 0.3075922
27 0.5008935
28 0.3703118
29 0.7529418
30 0.3200940
31 0.8042517
32 0.5049081
33 0.4933523
34 0.7306010
35 0.2250105
36 0.6546970
37 0.5602881
38 0.5243964
39 0.4908017
40 0.7279732
41 0.6774809
42 0.4550003
43 0.4607706
44 0.2718525
45 0.4374228
46 0.5442785
47 0.7255873
48 0.3490047
49 0.4631607
50 0.5864094
51 0.3974149
52 0.3993882
53 0.6711094
54 0.5796917
55 0.6921532
56 0.9225482
57 0.3417509
58 0.5011927
59 0.2514347
60 0.6855944
61 0.5272609
62 0.4524071
63 0.6256756
64 0.4977655
65 0.4608053
66 0.6973014
67 0.5572159
68 0.4294268
69 0.3394830
70 0.4698464
71 0.5877863
72 0.4990896
73 0.6390899
74 0.5941240
75 0.5689859
76 0.5106953
77 0.8179532
78 0.5014137
79 0.2786581
80 0.3227207
81 0.5563664
82 0.7202894
83 0.3419887
84 0.6662232
85 0.5002796
86 0.3158302
87 0.2598377
88 0.4054319
89 0.4452585
90 0.4804259
91 0.4980604
92 0.4387668
93 0.6234566
94 0.5741260
95 0.5213977
96 0.4279582
97 0.4668406
98 0.4204016
99 0.4959834
100 0.3803286
Then to view the best combination:
nnetFit$bestTune
size decay
12 3 1e-04
To view the structure of the fitted network:
nnetFit$finalModel
a 10-3-1 network with 37 weights
inputs: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
output(s): .outcome
options were - linear output units decay=1e-04
10 is the number of input units, 3 is the number of hidden units in the single hidden layer (matching the size of 3 in the best combination above), and 1 is the number of output units. The 37 weights break down as follows: 10 inputs x 3 hidden units = 30 weights, plus one bias per hidden unit (3), plus one weight from each hidden unit to the output node (3), plus 1 bias for the output node. That total, 30 + 3 + 3 + 1 = 37, covers every connection and bias in the model.
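The same count can be written as a small formula. The helper below is a hypothetical convenience function, not part of nnet or caret; it assumes a single hidden layer with a bias on every hidden and output unit and no skip-layer connections.

```r
# Number of weights in a single-hidden-layer network:
# each hidden unit has one weight per input plus a bias,
# and each output unit has one weight per hidden unit plus a bias
count_nnet_weights <- function(n_inputs, n_hidden, n_outputs = 1) {
  (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
}

count_nnet_weights(n_inputs = 10, n_hidden = 3)  # 37
```

The count grows quickly with size, which is one reason larger networks are easy to overfit on 200 training samples.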
Now let’s compare test set performance. We evaluated each model on the 5,000-observation test set; to summarize our results:
# Create a data frame comparing performance across models
model_comparison <- data.frame(
Model = c("MARS", "SVM (RBF)", "Neural Net", "KNN"),
RMSE = c(marsResults["RMSE"],
svmResults_rbf["RMSE"],
nnetResults["RMSE"],
knnResults["RMSE"]),
Rsquared = c(marsResults["Rsquared"],
svmResults_rbf["Rsquared"],
nnetResults["Rsquared"],
knnResults["Rsquared"]),
MAE = c(marsResults["MAE"],
svmResults_rbf["MAE"],
nnetResults["MAE"],
knnResults["MAE"])
)
model_comparison
Model RMSE Rsquared MAE
1 MARS 1.921183 0.8539689 1.510918
2 SVM (RBF) 2.301953 0.7987158 1.749393
3 Neural Net 2.672046 0.7199412 2.066560
4 KNN 3.290098 0.6120825 2.637704
We can see that MARS performs best, with the lowest RMSE and MAE and the highest R-squared. SVM performs reasonably well, but with a higher RMSE and lower R-squared than MARS. The neural net performs decently, but worse than both MARS and SVM. KNN performs worst, likely because it’s more sensitive to noise (every predictor, informative or not, counts equally toward the distance between neighbors) and less able to learn smooth nonlinear functions efficiently.
As we know, the RMSE (root mean squared error) summarizes the typical magnitude of error between predicted and actual values; it penalizes larger errors more heavily because the errors are squared before averaging. Lower is better, so we look for the lowest RMSE to answer the question of how far off, on average, a model’s predictions are. MARS had the lowest RMSE of the models compared, which means its predictions are the most accurate by this measure.
MAE (mean absolute error) is the average of the absolute differences between predicted and actual values. It doesn’t square the errors, so it’s less sensitive to outliers. Like RMSE, lower is better, and again MARS performed best.
R-squared is the proportion of variance in the outcome that the model explains; it ranges from 0 to 1, and higher is better. (caret’s postResample reports it as the squared correlation between observed and predicted values.) For MARS, an R-squared of 0.854 means the model explains about 85.4% of the variation in the outcome.
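To tie these three definitions to code, here is a minimal base-R sketch that computes each metric by hand. The function names and the small obs and pred vectors are made up for illustration; R-squared is computed as the squared correlation, which is how postResample reports it by default.

```r
# Root mean squared error: squares the errors before averaging
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# Mean absolute error: averages the absolute errors
mae <- function(pred, obs) mean(abs(obs - pred))

# R-squared as the squared correlation between predictions and observations
rsq <- function(pred, obs) cor(pred, obs)^2

# Toy values (illustrative only)
obs  <- c(10, 12, 15, 20)
pred <- c(11, 11, 16, 18)

c(RMSE = rmse(pred, obs), Rsquared = rsq(pred, obs), MAE = mae(pred, obs))
```

In this toy case the single error of size 2 pulls RMSE above MAE, which illustrates why RMSE is the more outlier-sensitive of the two.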
To answer the question of if MARS selected the informative predictors, we can see in the final model results that X1 to X5 were important and used.
summary(marsFit$finalModel)
Call: earth(x=data.frame[200,10], y=c(10.28,6.983,6...), keepxy=TRUE, degree=1,
nprune=14)
coefficients
(Intercept) 20.287415
h(X1-0.286353) -9.142444
h(0.801181-X1) -15.416844
h(0.317457-X2) -16.509359
h(X2-0.317457) 4.744872
h(X2-0.906716) -40.586588
h(X3-0.308289) 10.115611
h(0.629261-X3) 11.646076
h(0.851428-X4) -10.513003
h(0.193468-X5) -14.963935
h(X5-0.193468) 3.883228
h(X8-0.0911924) -1.169287
Selected 12 of 17 terms, and 6 of 10 predictors (nprune=14)
Termination condition: Reached nk 21
Importance: X4, X2, X1, X5, X3, X8, X6-unused, X7-unused, X9-unused, ...
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 3.481862 RSS 545.4163 GRSq 0.8454129 RSq 0.8777036
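We can confirm that MARS identified the informative predictors X1-X5: all five appear in the final model and lead the importance ranking. It also kept one term in X8, a non-informative predictor, but with a small coefficient (-1.17) and the lowest importance among the predictors used.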
7.5 Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
We’ll recreate what we did in 6.3:
data(ChemicalManufacturingProcess)
dim(ChemicalManufacturingProcess)
[1] 176 58
As a reminder, ChemicalManufacturingProcess is a data frame with 57 predictors (12 describing the input biological material and 45 describing the manufacturing process) for the 176 manufacturing runs. The Yield column contains the percent yield for each run.
head(ChemicalManufacturingProcess)
Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
1 38.00 6.25 49.58 56.97
2 42.44 8.01 60.97 67.48
3 42.03 8.01 60.97 67.48
4 41.42 8.01 60.97 67.48
5 42.49 7.47 63.33 72.25
6 43.57 6.12 58.36 65.31
BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
1 12.74 19.51 43.73
2 14.65 19.36 53.14
3 14.65 19.36 53.14
4 14.65 19.36 53.14
5 14.02 17.91 54.66
6 15.17 21.79 51.23
BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
1 100 16.66 11.44
2 100 19.04 12.55
3 100 19.04 12.55
4 100 19.04 12.55
5 100 18.22 12.80
6 100 18.30 12.13
BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
1 3.46 138.09 18.83
2 3.46 153.67 21.05
3 3.46 153.67 21.05
4 3.46 153.67 21.05
5 3.05 147.61 21.05
6 3.78 151.88 20.76
ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
1 NA NA NA
2 0.0 0 NA
3 0.0 0 NA
4 0.0 0 NA
5 10.7 0 NA
6 12.0 0 NA
ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
1 NA NA NA
2 917 1032.2 210.0
3 912 1003.6 207.1
4 911 1014.6 213.3
5 918 1027.5 205.7
6 924 1016.8 208.9
ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
1 NA NA 43.00
2 177 178 46.57
3 178 178 45.07
4 177 177 44.92
5 178 178 44.96
6 178 178 45.32
ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
1 NA NA NA
2 NA NA 0
3 NA NA 0
4 NA NA 0
5 NA NA 0
6 NA NA 0
ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
1 35.5 4898 6108
2 34.0 4869 6095
3 34.8 4878 6087
4 34.8 4897 6102
5 34.6 4992 6233
6 34.0 4985 6222
ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
1 4682 35.5 4865
2 4617 34.0 4867
3 4617 34.8 4877
4 4635 34.8 4872
5 4733 33.9 4886
6 4786 33.4 4862
ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
1 6049 4665 0.0
2 6097 4621 0.0
3 6078 4621 0.0
4 6073 4611 0.0
5 6102 4659 -0.7
6 6115 4696 -0.6
ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
1 NA NA NA
2 3 0 3
3 4 1 4
4 5 2 5
5 8 4 18
6 9 1 1
ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
1 4873 6074 4685
2 4869 6107 4630
3 4897 6116 4637
4 4892 6111 4630
5 4930 6151 4684
6 4871 6128 4687
ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
1 10.7 21.0 9.9
2 11.2 21.4 9.9
3 11.1 21.3 9.4
4 11.1 21.3 9.4
5 11.3 21.6 9.0
6 11.4 21.7 10.1
ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
1 69.1 156 66
2 68.7 169 66
3 69.3 173 66
4 69.3 171 68
5 69.4 171 70
6 68.2 173 70
ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
1 2.4 486 0.019
2 2.6 508 0.019
3 2.6 509 0.018
4 2.5 496 0.018
5 2.5 468 0.017
6 2.5 490 0.018
ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
1 0.5 3 7.2
2 2.0 2 7.2
3 0.7 2 7.2
4 1.2 2 7.2
5 0.2 2 7.3
6 0.4 2 7.2
ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
1 NA NA 11.6
2 0.1 0.15 11.1
3 0.0 0.00 12.0
4 0.0 0.00 10.6
5 0.0 0.00 11.0
6 0.0 0.00 11.5
ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
1 3.0 1.8 2.4
2 0.9 1.9 2.2
3 1.0 1.8 2.3
4 1.1 1.8 2.1
5 1.1 1.7 2.1
6 2.2 1.8 2.0
Now we’ll separate predictors and outcome:
cmpY <- ChemicalManufacturingProcess$Yield
cmpX <- ChemicalManufacturingProcess |> select(-Yield)
Now we’ll handle missing data with KNN imputation, as we did in 6.3. As a reminder, we first check for missing values to confirm that imputation is needed.
sapply(cmpX, function(x) sum(is.na(x)))
BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
0 0 0
BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
0 0 0
BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
0 0 0
BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
0 0 0
ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
1 3 15
ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
1 1 2
ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
1 1 0
ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
9 10 1
ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
0 1 0
ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
0 0 0
ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
0 0 0
ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
1 1 1
ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
5 5 5
ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
5 5 5
ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
5 0 5
ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
5 5 5
ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
0 0 0
ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
1 1 0
ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
0 0 0
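As a reminder of the idea behind KNN imputation, here is a simplified base-R sketch on a toy data frame: a missing cell is filled with the average of that column over the k most similar complete rows, with similarity measured on standardized predictors. The helper impute_cell_knn and the toy data are hypothetical, and this is not caret's implementation (for one thing, caret returns imputed values on the standardized scale).

```r
# Toy data: row 1 is missing its value in column "b"
toy <- data.frame(
  a = c(1.0, 1.1, 0.9, 5.0, 5.2),
  b = c(NA, 2.1, 1.9, 8.0, 8.3),
  c = c(3.0, 3.2, 2.9, 9.0, 9.1)
)

# Hypothetical helper: impute one cell from its k nearest complete rows
impute_cell_knn <- function(df, row, col, k = 2) {
  scaled <- scale(df)                      # center and scale; NAs are ignored
  others <- setdiff(names(df), col)        # columns used to measure distance
  candidates <- which(complete.cases(df))  # rows with no missing values
  # Euclidean distance from the target row to each candidate row
  d <- apply(scaled[candidates, others, drop = FALSE], 1,
             function(r) sqrt(sum((r - scaled[row, others])^2)))
  nearest <- candidates[order(d)[seq_len(k)]]
  mean(df[nearest, col])                   # average of the neighbors' values
}

toy$b[1] <- impute_cell_knn(toy, row = 1, col = "b", k = 2)
toy
```

Rows 2 and 3 are the closest to row 1 on columns a and c, so the missing value is filled with the average of their b values.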
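And now, like before, we apply KNN imputation to the predictors. Note that method = "knnImpute" also centers and scales the predictors, so cmpX_imputed is on a standardized scale: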
set.seed(5889)
cmp_impute <- preProcess(cmpX, method = "knnImpute")
cmpX_imputed <- predict(cmp_impute, newdata = cmpX)
After this, we combine back with the yield:
cmp_data <- cmpX_imputed |>
mutate(Yield = cmpY)
And now we can split the data into training and test sets:
set.seed(5889)
cmp_index <- createDataPartition(cmp_data$Yield, p = 0.8, list = FALSE)
cmp_train <- cmp_data[cmp_index, ]
cmp_test <- cmp_data[-cmp_index, ]
Since ctrl is already defined for the computations above, and we want to repeat what we did in 6.3, we’ll define a separate control object, ctrl75.
ctrl75 <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
Now we’re ready to fit the nonlinear models: KNN, SVM, MARS, and a neural network. We’ll use the same resampling for each (repeated 10-fold CV), evaluate performance (resampled and test set), compare results across models, check variable importance, and then identify which model performs best. We’ll suffix the objects created below with ‘75’ to mark them as belonging to 7.5.
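To illustrate what repeated 10-fold CV with 5 repeats implies, here is a minimal base-R sketch that builds the fold assignments by hand. It assumes 144 training rows (the figure reported in the MARS summary for this exercise) and uses simple random folds; caret's own fold creation also stratifies on the outcome, which this sketch does not attempt.

```r
set.seed(5889)

n_train   <- 144  # assumed number of training rows
n_folds   <- 10
n_repeats <- 5

# One random fold assignment per repeat (unstratified, for illustration)
fold_ids <- lapply(seq_len(n_repeats), function(r) {
  sample(rep(seq_len(n_folds), length.out = n_train))
})

# Each candidate tuning value is fit once per fold per repeat
fits_per_candidate <- n_folds * n_repeats

# Rows held out in fold 1 of repeat 1
holdout_rows <- which(fold_ids[[1]] == 1)

fits_per_candidate
length(holdout_rows)
```

With only about 14 or 15 rows held out per fold, individual fold estimates are noisy, which is the motivation for averaging over repeats.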
KNN
set.seed(5889)
knnFit_75 <- train(
Yield ~ .,
data = cmp_train,
method = "knn",
preProc = c("center", "scale"),
tuneLength = 10,
trControl = ctrl75
)
A side note: caret warned that some variables have zero variance. This is caret telling us that it found constant predictors while applying preProc = c("center", "scale") within some of the resamples. To drop such predictors explicitly, "zv" can be added to preProc.
Now we’ll predict on the test set and then evaluate the performance.
knnPred_75 <- predict(knnFit_75, newdata = cmp_test)
knnResults_75 <- postResample(pred = knnPred_75, obs = cmp_test$Yield)
knnResults_75
RMSE Rsquared MAE
0.9759157 0.7822784 0.7465625
The below will show us the RMSE, R-squared, MAE for each k; and the best value of k.
knnFit_75$results
k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 5 1.369477 0.4758722 1.118146 0.2812549 0.1703713 0.2268628
2 7 1.399697 0.4574371 1.156011 0.2731749 0.1797775 0.2202406
3 9 1.395386 0.4677463 1.147372 0.2564692 0.1729406 0.2006311
4 11 1.394304 0.4714680 1.143972 0.2735801 0.1691190 0.2124469
5 13 1.408877 0.4695424 1.153873 0.2680810 0.1692628 0.2064313
6 15 1.430685 0.4507518 1.174528 0.2676840 0.1666715 0.2119196
7 17 1.454419 0.4350647 1.195953 0.2640404 0.1563835 0.2070533
8 19 1.460056 0.4339833 1.198976 0.2613643 0.1559580 0.1983905
9 21 1.470713 0.4285200 1.208499 0.2632833 0.1601728 0.2016800
10 23 1.477354 0.4302337 1.212584 0.2673339 0.1664492 0.2047399
knnFit_75$bestTune
k
1 5
MARS
set.seed(5889)
marsFit_75 <- train(
Yield ~ .,
data = cmp_train,
method = "earth",
trControl = ctrl75
)
As we know from the previous exercise, MARS doesn’t require centering or scaling, so no preProc is needed.
Let’s predict on the test set and then evaluate:
marsPred_75 <- predict(marsFit_75, newdata = cmp_test)
marsResults_75 <- postResample(pred = marsPred_75, obs = cmp_test$Yield)
marsResults_75
RMSE Rsquared MAE
1.0615075 0.6452677 0.8247697
Let’s view the model summary (which predictors and basis functions were selected):
summary(marsFit_75$finalModel)
Call: earth(x=matrix[144,57], y=c(42.03,41.42,4...), keepxy=TRUE, degree=1,
nprune=9)
coefficients
(Intercept) 38.420422
h(ManufacturingProcess01- -0.827151) 0.559446
h(-1.17697-ManufacturingProcess09) -1.101866
h(ManufacturingProcess09- -1.17697) 0.638666
h(-1.28827-ManufacturingProcess13) 2.211612
h(ManufacturingProcess32- -0.827442) 1.296397
h(0.0324569-ManufacturingProcess39) -0.470706
h(ManufacturingProcess39-0.0324569) -3.641748
Selected 8 of 21 terms, and 5 of 57 predictors (nprune=9)
Termination condition: RSq changed by less than 0.001 at 21 terms
Importance: ManufacturingProcess32, ManufacturingProcess09, ...
Number of terms at each degree of interaction: 1 7 (additive model)
GCV 1.228672 RSS 141.9884 GRSq 0.6521719 RSq 0.7169442
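Each h(...) term in the summary above is a hinge function, h(z) = max(0, z), so the fitted additive model can be rebuilt by hand. The following is a minimal sketch, not the earth implementation: `mars_by_hand` is a hypothetical helper, its coefficients and knots are copied from the printed summary, and the inputs are assumed to be on the same scale as the columns of cmp_train.

```r
# Hinge (basis) function used by MARS: h(z) = max(0, z), vectorised.
h <- function(z) pmax(0, z)

# Hypothetical helper: rebuilds the additive model from the coefficients
# printed by summary(marsFit_75$finalModel). Arguments are the five selected
# predictors, assumed to be on the same scale as cmp_train.
mars_by_hand <- function(mp01, mp09, mp13, mp32, mp39) {
  38.420422 +
    0.559446 * h(mp01 - (-0.827151)) -
    1.101866 * h(-1.17697 - mp09) +
    0.638666 * h(mp09 - (-1.17697)) +
    2.211612 * h(-1.28827 - mp13) +
    1.296397 * h(mp32 - (-0.827442)) -
    0.470706 * h(0.0324569 - mp39) -
    3.641748 * h(mp39 - 0.0324569)
}

# At the knots every hinge is zero, so the prediction is just the intercept.
at_knots <- mars_by_hand(-0.827151, -1.17697, -1.28827, -0.827442, 0.0324569)

# Moving ManufacturingProcess32 one unit above its knot adds its coefficient.
mp32_up <- mars_by_hand(-0.827151, -1.17697, -1.28827, -0.827442 + 1, 0.0324569)
mp32_up - at_knots
```

Reading the model this way makes the signs easy to interpret: predicted Yield rises as ManufacturingProcess32 moves above its knot and falls as ManufacturingProcess39 moves above its knot.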
SVM
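As we know, preProc = c("center", "scale") is needed for SVMs, because the radial kernel depends on distances between predictor values and would otherwise be dominated by the predictors with the largest scales.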
set.seed(5889)
svmFit_75 <- train(
Yield ~ .,
data = cmp_train,
method = "svmRadial",
preProc = c("center", "scale"), # Required for SVMs
tuneLength = 10,
trControl = ctrl75
)
Now we’ll predict and evaluate:
svmPred_75 <- predict(svmFit_75, newdata = cmp_test)
svmResults_75 <- postResample(pred = svmPred_75, obs = cmp_test$Yield)
svmResults_75
RMSE Rsquared MAE
0.9263787 0.7180831 0.7106366
Let’s see the tuning grid results and best parameters (C and sigma):
# Tuning grid results
svmFit_75$results
sigma C RMSE Rsquared MAE RMSESD RsquaredSD
1 0.01556562 0.25 1.465567 0.4787810 1.1908372 0.2922118 0.1679766
2 0.01556562 0.50 1.354287 0.5183367 1.1002152 0.2794040 0.1616677
3 0.01556562 1.00 1.262472 0.5721336 1.0244130 0.2581267 0.1464872
4 0.01556562 2.00 1.220557 0.5964907 0.9866445 0.2300747 0.1286965
5 0.01556562 4.00 1.199934 0.6103198 0.9738609 0.2170270 0.1191414
6 0.01556562 8.00 1.170481 0.6280943 0.9535726 0.2245423 0.1159037
7 0.01556562 16.00 1.169451 0.6287219 0.9529020 0.2246152 0.1160286
8 0.01556562 32.00 1.169451 0.6287219 0.9529020 0.2246152 0.1160286
9 0.01556562 64.00 1.169451 0.6287219 0.9529020 0.2246152 0.1160286
10 0.01556562 128.00 1.169451 0.6287219 0.9529020 0.2246152 0.1160286
MAESD
1 0.2211193
2 0.2173428
3 0.2039449
4 0.1826212
5 0.1640918
6 0.1661511
7 0.1667926
8 0.1667926
9 0.1667926
10 0.1667926
# Best parameters (C and sigma)
svmFit_75$bestTune
sigma C
7 0.01556562 16
Neural Network
set.seed(5889)
nnetFit_75 <- train(
Yield ~ .,
data = cmp_train,
method = "nnet",
preProc = c("center", "scale"),
linout = TRUE,
trace = FALSE,
tuneLength = 10,
trControl = ctrl75
)
Now we predict and evaluate:
nnetPred_75 <- predict(nnetFit_75, newdata = cmp_test)
nnetResults_75 <- postResample(pred = nnetPred_75, obs = cmp_test$Yield)
nnetResults_75
RMSE Rsquared MAE
1.6338375 0.3096302 1.2340037
Let’s see the results:
nnetFit_75$results
size decay RMSE Rsquared MAE RMSESD RsquaredSD
1 1 0.0000000000 1.735132 0.18988869 1.413048 0.3198043 0.16717812
2 1 0.0001000000 1.712157 0.21086250 1.381305 0.3146858 0.15197733
3 1 0.0002371374 1.670801 0.25904059 1.359907 0.3424025 0.16771038
4 1 0.0005623413 1.606098 0.30442617 1.304261 0.3306714 0.19697895
5 1 0.0013335214 1.625161 0.29228693 1.335344 0.3562655 0.20678114
6 1 0.0031622777 1.553486 0.35841981 1.268865 0.3375925 0.15811488
7 1 0.0074989421 1.520943 0.39098164 1.228656 0.3112334 0.18041485
8 1 0.0177827941 1.615357 0.34694016 1.312344 0.3332564 0.18859652
9 1 0.0421696503 1.566762 0.39760813 1.268838 0.3108324 0.15463503
10 1 0.1000000000 1.468939 0.47570873 1.200686 0.3238952 0.14265828
11 3 0.0000000000 2.216031 0.27039545 1.713184 1.0657612 0.20153346
12 3 0.0001000000 2.502176 0.22990966 1.939308 1.3682944 0.16060924
13 3 0.0002371374 2.497017 0.25861974 1.872851 1.4050034 0.18604466
14 3 0.0005623413 2.418707 0.26935619 1.843016 1.4828924 0.19289591
15 3 0.0013335214 2.467291 0.23383688 1.933121 1.3214576 0.19803300
16 3 0.0031622777 2.620427 0.26424729 2.000634 1.6353961 0.18461965
17 3 0.0074989421 2.480239 0.32496514 1.862744 1.5163080 0.20994708
18 3 0.0177827941 2.016613 0.34306965 1.572395 0.8641634 0.20457992
19 3 0.0421696503 2.435774 0.29464173 1.889162 1.4064705 0.20460592
20 3 0.1000000000 2.215347 0.27015350 1.712746 0.8498539 0.19010541
21 5 0.0000000000 3.181954 0.22369014 2.500705 0.7813749 0.18899285
22 5 0.0001000000 3.151589 0.19836636 2.424363 0.9855112 0.17126421
23 5 0.0002371374 3.221840 0.17920061 2.446661 1.0908986 0.16021084
24 5 0.0005623413 3.141597 0.17289007 2.418582 1.0187761 0.17957810
25 5 0.0013335214 3.058769 0.22373827 2.390483 0.7242279 0.17934710
26 5 0.0031622777 3.241395 0.17188436 2.494768 0.7903091 0.16150863
27 5 0.0074989421 3.280037 0.17379038 2.528194 0.9288774 0.14866793
28 5 0.0177827941 3.318161 0.19129239 2.561806 0.9148724 0.17536518
29 5 0.0421696503 2.931991 0.21853604 2.326737 0.6943278 0.17732788
30 5 0.1000000000 2.847944 0.23438393 2.193826 0.8118014 0.15754998
31 7 0.0000000000 4.761861 0.15272273 3.556010 1.8783635 0.13970644
32 7 0.0001000000 4.217727 0.17124576 3.208297 1.5979752 0.17162747
33 7 0.0002371374 5.019670 0.16903521 3.615459 2.4943514 0.17737180
34 7 0.0005623413 4.663628 0.14662904 3.608764 2.2364215 0.14669209
35 7 0.0013335214 4.777947 0.15173493 3.427383 2.3229430 0.17006427
36 7 0.0031622777 4.901006 0.16376499 3.538207 1.8899257 0.14091284
37 7 0.0074989421 4.397003 0.15878872 3.332445 1.5052179 0.15084788
38 7 0.0177827941 4.647657 0.14426842 3.459640 1.8985653 0.14391218
39 7 0.0421696503 4.224630 0.16476217 3.147239 1.3169477 0.15715119
40 7 0.1000000000 3.869777 0.18585266 2.960576 1.8744326 0.17876724
41 9 0.0000000000 7.435039 0.12782070 5.290614 4.2093640 0.12040754
42 9 0.0001000000 7.934411 0.13991963 5.446565 4.6684130 0.15467852
43 9 0.0002371374 7.847353 0.11774350 5.474588 4.4193355 0.14571278
44 9 0.0005623413 7.316996 0.14038956 4.915736 3.5125211 0.13259268
45 9 0.0013335214 8.123370 0.14712322 5.529229 4.5892790 0.16824381
46 9 0.0031622777 7.812948 0.08307844 5.083599 3.9359874 0.11420496
47 9 0.0074989421 7.923966 0.11927073 5.535078 5.0497639 0.13995416
48 9 0.0177827941 8.128094 0.08967816 5.417294 4.1711337 0.09532866
49 9 0.0421696503 7.575739 0.12539824 5.103917 4.9689741 0.17214022
50 9 0.1000000000 7.753442 0.13652728 5.383150 5.4642225 0.14208246
51 11 0.0000000000 11.438572 0.12147154 7.416054 7.4052133 0.14146196
52 11 0.0001000000 12.767415 0.09308860 7.992963 8.4018954 0.09847411
53 11 0.0002371374 11.346876 0.12564468 6.898611 7.3036528 0.16467903
54 11 0.0005623413 11.680181 0.09284698 7.159124 9.1435708 0.10155799
55 11 0.0013335214 12.592482 0.11982919 8.365890 7.9005750 0.16057680
56 11 0.0031622777 11.797845 0.11031256 7.099793 8.6565466 0.15394595
57 11 0.0074989421 12.414732 0.09588333 7.618211 6.6484052 0.13221886
58 11 0.0177827941 10.484097 0.11973375 6.569796 6.6118033 0.14658405
59 11 0.0421696503 11.121429 0.12266810 6.646377 8.4096752 0.12365149
60 11 0.1000000000 13.072860 0.08436240 8.412216 8.8707459 0.10001150
61 13 0.0000000000 15.235380 0.10457622 9.142022 11.7882053 0.11564877
62 13 0.0001000000 15.313099 0.09484282 9.373113 10.4702751 0.14478900
63 13 0.0002371374 17.679435 0.12819601 10.107094 15.3872892 0.15139018
64 13 0.0005623413 16.052872 0.10614256 10.092961 11.9745972 0.12559984
65 13 0.0013335214 15.918112 0.07629482 9.733242 12.0144539 0.09896697
66 13 0.0031622777 16.301325 0.09162202 9.540227 13.3879571 0.10264684
67 13 0.0074989421 13.008434 0.13433133 7.968517 10.8634050 0.14810119
68 13 0.0177827941 16.044412 0.10717297 9.993763 10.4711079 0.13782335
69 13 0.0421696503 17.140067 0.07741993 9.793529 11.3576684 0.11826825
70 13 0.1000000000 13.080648 0.12675923 8.097826 7.8471717 0.14210575
71 15 0.0000000000 9.544475 0.18057144 5.926233 12.1056661 0.17848233
72 15 0.0001000000 9.272899 0.17365968 5.711893 9.3903405 0.16563614
73 15 0.0002371374 8.670968 0.18263543 5.681464 10.1853857 0.18450139
74 15 0.0005623413 7.538814 0.17066275 4.933647 10.2577054 0.17098396
75 15 0.0013335214 7.552275 0.18036350 5.270901 8.2457346 0.18175507
76 15 0.0031622777 9.579385 0.22288085 5.939304 13.2808035 0.20914132
77 15 0.0074989421 6.259005 0.24801214 4.358716 9.3235562 0.19774048
78 15 0.0177827941 6.622698 0.22047484 4.259485 7.2825948 0.19451275
79 15 0.0421696503 9.975412 0.13983293 6.667825 10.9097275 0.13315065
80 15 0.1000000000 7.699578 0.17366088 4.812874 7.7597001 0.17566573
81 17 0.0000000000 NaN NaN NaN NA NA
82 17 0.0001000000 NaN NaN NaN NA NA
83 17 0.0002371374 NaN NaN NaN NA NA
84 17 0.0005623413 NaN NaN NaN NA NA
85 17 0.0013335214 NaN NaN NaN NA NA
86 17 0.0031622777 NaN NaN NaN NA NA
87 17 0.0074989421 NaN NaN NaN NA NA
88 17 0.0177827941 NaN NaN NaN NA NA
89 17 0.0421696503 NaN NaN NaN NA NA
90 17 0.1000000000 NaN NaN NaN NA NA
91 19 0.0000000000 NaN NaN NaN NA NA
92 19 0.0001000000 NaN NaN NaN NA NA
93 19 0.0002371374 NaN NaN NaN NA NA
94 19 0.0005623413 NaN NaN NaN NA NA
95 19 0.0013335214 NaN NaN NaN NA NA
96 19 0.0031622777 NaN NaN NaN NA NA
97 19 0.0074989421 NaN NaN NaN NA NA
98 19 0.0177827941 NaN NaN NaN NA NA
99 19 0.0421696503 NaN NaN NaN NA NA
100 19 0.1000000000 NaN NaN NaN NA NA
MAESD
1 0.2456481
2 0.2503160
3 0.2783087
4 0.2591465
5 0.2893763
6 0.2841276
7 0.2707129
8 0.2872509
9 0.2590597
10 0.2664049
11 0.7456361
12 1.0114718
13 0.9404681
14 1.0273395
15 1.0294414
16 1.0950296
17 1.0667873
18 0.6831634
19 1.0903657
20 0.6034777
21 0.6443980
22 0.7157954
23 0.6983542
24 0.7299129
25 0.5483201
26 0.5976007
27 0.6918177
28 0.6927003
29 0.5446861
30 0.4920005
31 1.2322731
32 1.1278759
33 1.4752104
34 1.5599947
35 1.2887658
36 1.2056551
37 1.1230219
38 1.2711847
39 0.8249666
40 1.3827140
41 2.6763245
42 3.1994032
43 2.8117213
44 1.9651988
45 2.6218379
46 2.0891668
47 3.0793289
48 2.2508044
49 2.6389294
50 3.4318402
51 4.4840503
52 4.7376594
53 4.0210889
54 4.3163143
55 5.0798476
56 4.4045719
57 3.4501614
58 3.8479727
59 4.3242596
60 5.4803537
61 6.1440850
62 5.5563733
63 6.9778219
64 6.4001374
65 6.0115620
66 7.0465017
67 6.2251570
68 5.6325553
69 5.4445065
70 4.1528617
71 6.4769801
72 5.0579974
73 5.8455663
74 5.8421825
75 5.2315487
76 7.0256517
77 6.0167370
78 3.7669302
79 6.8067854
80 4.2497187
81 NA
82 NA
83 NA
84 NA
85 NA
86 NA
87 NA
88 NA
89 NA
90 NA
91 NA
92 NA
93 NA
94 NA
95 NA
96 NA
97 NA
98 NA
99 NA
100 NA
And now let’s see the best tuning combination of size (hidden units) and decay (regularization):
nnetFit_75$bestTune
size decay
10 1 0.1
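The NaN rows for size = 17 and size = 19 in the tuning results have a likely explanation: nnet refuses to fit a network with more weights than its MaxNWts limit, which defaults to 1000. The sketch below counts the weights of a single-hidden-layer network with one output and no skip-layer connections. `n_weights` is a hypothetical helper, and the predictor count of 57 is taken from the earth summary (matrix[144,57]), assuming the same predictors were passed to nnet.

```r
# Hypothetical helper: number of weights in a single-hidden-layer network
# with p inputs, `size` hidden units, one output unit, and no skip layer.
#   hidden layer: size * (p + 1)   (each hidden unit has p weights + a bias)
#   output layer: size + 1         (one weight per hidden unit + a bias)
n_weights <- function(p, size) size * (p + 1) + (size + 1)

p <- 57                       # assumed predictor count, from matrix[144,57]
sizes <- seq(1, 19, by = 2)   # the sizes tried with tuneLength = 10

weight_table <- data.frame(
  size    = sizes,
  weights = n_weights(p, sizes),
  fits    = n_weights(p, sizes) <= 1000   # nnet's default MaxNWts
)
weight_table
```

Under these assumptions size = 15 needs 886 weights, while size = 17 needs 1004 and size = 19 needs 1122, so only the last two exceed the default limit. One option would be to pass a larger MaxNWts through train() to nnet, although the resampled RMSE already worsens steadily beyond size = 1, so larger networks are unlikely to help here.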
(a) Which nonlinear regression model gives the optimal resampling and test set performance?
model_comparison <- data.frame(
Model = c("KNN", "MARS", "SVM", "Neural Network"),
RMSE = c(0.9759, 1.0615, 0.9264, 1.6338),
Rsquared = c(0.7823, 0.6453, 0.7181, 0.3096),
MAE = c(0.7466, 0.8248, 0.7106, 1.2340)
)
model_comparison |> mutate(across(where(is.numeric), \(x) round(x, 3)))
Model RMSE Rsquared MAE
1 KNN 0.976 0.782 0.747
2 MARS 1.062 0.645 0.825
3 SVM 0.926 0.718 0.711
4 Neural Network 1.634 0.310 1.234
Judged on test-set R-squared, the best nonlinear regression model is KNN (k-Nearest Neighbors): its R-squared of 0.782 is the highest of the four, so it explains the most variance in Yield. The choice is close, however. SVM has the lowest test-set RMSE (0.926 vs. 0.976 for KNN) and the lowest MAE (0.711 vs. 0.747), and its best cross-validated RMSE (1.169) is also lower than KNN’s (1.369), so SVM is the stronger model on resampling performance. I proceed with KNN as the optimal model in the parts below.
MARS performed reasonably well but trails both KNN and SVM on every test-set metric. The Neural Network, which took by far the longest to run, performed worst, with a much higher RMSE and MAE and a low R-squared (0.310). It likely suffered from instability during training, which is consistent with the many warnings (suppressed in the .Rmd file) and the long training time.
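To make the per-metric ranking explicit, here is a small sketch that picks the best model for each test-set metric. The SVM, MARS, and Neural Network values are taken from the postResample() outputs above (rounded to four decimals) and the KNN values from the comparison table; `test_metrics` and `best_by_metric` are illustrative names, not objects from the original analysis.

```r
# Test-set metrics for the four models, rounded to four decimals.
test_metrics <- data.frame(
  Model    = c("KNN", "MARS", "SVM", "Neural Network"),
  RMSE     = c(0.9759, 1.0615, 0.9264, 1.6338),
  Rsquared = c(0.7823, 0.6453, 0.7181, 0.3096),
  MAE      = c(0.7466, 0.8248, 0.7106, 1.2340),
  stringsAsFactors = FALSE
)

# Lower is better for RMSE and MAE; higher is better for R-squared.
best_by_metric <- c(
  RMSE     = test_metrics$Model[which.min(test_metrics$RMSE)],
  Rsquared = test_metrics$Model[which.max(test_metrics$Rsquared)],
  MAE      = test_metrics$Model[which.min(test_metrics$MAE)]
)
best_by_metric
```

The metrics do not agree on a single winner, which is why the choice between KNN and SVM depends on which metric is given priority.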
(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
The plot below shows the most important predictors for the KNN model. Note that caret has no KNN-specific importance measure, so varImp() falls back on a model-free filter score computed for each predictor separately. Manufacturing process variables dominate the list, taking the top 4 spots, which is similar to what we saw for the linear regression model in Exercise 6.3. In fact, the same two variables top both lists: ManufacturingProcess32 and ManufacturingProcess13. The #3 and #4 spots are swapped: here #3 is ManufacturingProcess17 followed by ManufacturingProcess09 at #4, the reverse of their order in the optimal linear model. What differs is that, while manufacturing process variables still dominate the very top, a biological variable appears at #5 here, whereas the optimal linear model had no biological variable until #8 in importance. This shift may reflect nonlinear relationships that linear regression does not capture, suggesting that biological variables may have more complex or “local” effects on yield.
knnImp_75 <- varImp(knnFit_75, scale = FALSE)
plot(knnImp_75, top = 20)
(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
For part (c), I examined the predictors that were highly ranked in the KNN model but did not appear in the top 10 of the optimal linear model from Exercise 6.3.
One example is BiologicalMaterial06, which ranks #5 in the KNN model but only #11 in the optimal linear regression model. Similarly, BiologicalMaterial03 is #7 in the KNN model but #14 in the linear model. Finally, ManufacturingProcess31 is #8 in the KNN model but does not appear anywhere in the top 20 features of the optimal linear regression model.
unique_knn_vars <- c("BiologicalMaterial06", "BiologicalMaterial03",
"ManufacturingProcess31")
for (var in unique_knn_vars) {
print(
# aes_string() is deprecated; the .data pronoun looks up the column by name
ggplot(cmp_train, aes(x = .data[[var]], y = Yield)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "loess", se = FALSE, color = "goldenrod") +
labs(title = paste("Yield vs.", var), x = var)
)
}
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
In the plot of Yield vs. BiologicalMaterial06, the relationship is nonlinear and slightly curved, a pattern that KNN can pick up through local averaging but that a linear model cannot. The other two variables show similarly nonlinear relationships, which explains why they rank lower in the optimal linear regression model. In particular, the curved ManufacturingProcess31 relationship fits with that variable being absent from the linear model’s top 20. Overall, these plots support the idea that some biological and process variables have nonlinear or localized effects on yield, which helps explain why a nonlinear model like KNN identified different important predictors than the linear approach.
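The point about curved relationships can be illustrated on simulated data. The sketch below does not use the chemical manufacturing data: it generates a symmetric, curved relationship (an assumption chosen purely for illustration) and compares how much variance a straight line and a loess smoother each explain.

```r
set.seed(5889)

# Simulated predictor and a curved (inverted-U) response; illustrative only.
x <- seq(-2, 2, length.out = 150)
y <- 40 - x^2 + rnorm(150, sd = 0.5)

lin_fit   <- lm(y ~ x)      # straight-line fit
loess_fit <- loess(y ~ x)   # local regression, as in geom_smooth(method = "loess")

# Proportion of variance explained by a set of fitted values.
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

r2_linear <- r2(y, fitted(lin_fit))
r2_loess  <- r2(y, fitted(loess_fit))
c(linear = r2_linear, loess = r2_loess)
```

Because the simulated curve is symmetric around zero, the straight line explains almost none of the variance while the local fit explains most of it. A predictor like this would look unimportant to a linear model and important to a method that responds to local structure.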