Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:
\[
y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + \mathcal{N}(0, \sigma^2)
\]
where the \(x\) values are random variables uniformly distributed on \([0, 1]\) (five additional non-informative variables are also created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data. Which models appear to give the best performance? Does MARS select the informative predictors (those named x1-x5, which appear as V1-V5 in the data frames below)?
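To make the simulation concrete, here is a minimal base-R sketch of the data-generating process written directly from the equation above. The helper simulate_friedman1() is hypothetical; it is not the mlbench source code, and mlbench.friedman1 may differ in details. Setting sd = 0 returns the noise-free mean, which is a quick check of the formula.

```r
# Hypothetical helper: simulate the Friedman (1991) benchmark from the equation.
simulate_friedman1 <- function(n, sd = 1, n_noise = 5) {
  p <- 5 + n_noise
  # every predictor is an independent Uniform(0, 1) draw; only the first five enter y
  x <- matrix(runif(n * p), nrow = n, ncol = p)
  mu <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5]
  y <- mu + rnorm(n, mean = 0, sd = sd)   # additive Gaussian noise
  list(x = as.data.frame(x), y = y, mu = mu)
}

set.seed(1)
sim <- simulate_friedman1(200, sd = 1)
dim(sim$x)          # 200 rows, 10 columns
names(sim$x)[1:5]   # as.data.frame() names the columns V1..V10
```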
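library(mlbench)    # mlbench.friedman1()
library(caret)      # train(), preProcess(), postResample(), varImp(), resamples()
library(earth)      # MARS; also called directly via earth()
library(kernlab)    # backend for caret's svmRadial and svmPoly methods
library(AppliedPredictiveModeling)  # ChemicalManufacturingProcess data
library(dplyr)      # %>%, select(), arrange(), mutate(), bind_rows(), full_join(), count()
library(tibble)     # rownames_to_column(), tibble()
library(knitr)      # kable()
library(ggplot2)    # scatterplots for Exercise 7.5
set.seed(523)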
# training data with 200 observations
training_data <- mlbench.friedman1(200, sd = 1)
training_data$x <- as.data.frame(training_data$x)
# large test set for a stable estimate of test performance
test_data <- mlbench.friedman1(5000, sd = 1)
test_data$x <- as.data.frame(test_data$x)
dim(training_data$x)
## [1] 200 10
head(training_data$x)
head(training_data$y)
## [1] 13.79226 14.26570 12.75271 10.42632 17.89828 13.88635
set.seed(711)
# KNN needs centering and scaling because distance matters
knn_model <- train(
x = training_data$x,
y = training_data$y,
method = "knn",
preProcess = c("center", "scale"),
tuneLength = 10
)
# make predictions on the test set
knn_pred <- predict(knn_model, newdata = test_data$x)
# evaluate test performance
knn_perf <- postResample(pred = knn_pred, obs = test_data$y)
knn_perf
## RMSE Rsquared MAE
## 3.067497 0.670777 2.447238
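The comment that KNN needs centering and scaling can be illustrated with a toy example. The four rows below are made-up values and nearest_neighbor() is a hypothetical helper; the point is that unscaled Euclidean distance is dominated by whichever predictor has the larger numeric range.

```r
# Made-up data: predictor a lives on a 0-1 scale, predictor b in the hundreds.
x <- data.frame(
  a = c(0.10, 0.90, 0.11, 0.50),
  b = c(500, 502, 520, 800)
)

# Hypothetical helper: index of the row closest (Euclidean) to row i.
nearest_neighbor <- function(m, i) {
  d <- as.matrix(dist(m))   # all pairwise Euclidean distances
  d[i, i] <- Inf            # a row is not its own neighbor
  which.min(d[i, ])
}

raw_nn    <- nearest_neighbor(x, 1)         # distance driven almost entirely by b
scaled_nn <- nearest_neighbor(scale(x), 1)  # both predictors contribute
c(raw = unname(raw_nn), scaled = unname(scaled_nn))   # raw = 2, scaled = 3
```

On the raw scale, row 2 wins because its b value is only 2 units away, even though it is far away on a; after scaling, row 3, which is close on both predictors, becomes the nearest neighbor.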
set.seed(711)
# MARS fits piecewise-linear hinge terms; with tuneLength, caret tunes nprune
# and holds degree = 1 (an additive model without interactions)
mars_model <- train(
x = training_data$x,
y = training_data$y,
method = "earth",
tuneLength = 10
)
mars_model
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## nprune RMSE Rsquared MAE
## 2 4.228762 0.3565304 3.457980
## 3 3.799064 0.4876387 3.036414
## 4 3.429216 0.5941799 2.716858
## 6 2.691678 0.7417994 2.131112
## 7 2.266327 0.8147079 1.765858
## 9 1.913156 0.8708335 1.505295
## 10 1.852916 0.8786086 1.449118
## 12 1.833128 0.8818429 1.412838
## 13 1.864663 0.8771211 1.432693
## 15 1.880327 0.8757602 1.453217
##
## Tuning parameter 'degree' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 12 and degree = 1.
mars_pred <- predict(mars_model, newdata = test_data$x)
mars_perf <- postResample(pred = mars_pred, obs = test_data$y)
mars_perf
## RMSE Rsquared MAE
## 1.8549447 0.8657453 1.4339010
set.seed(711)
# SVM also benefits from centering and scaling
svm_model <- train(
x = training_data$x,
y = training_data$y,
method = "svmRadial",
preProcess = c("center", "scale"),
tuneLength = 8
)
svm_model
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.943605 0.7343084 2.336887
## 0.50 2.652884 0.7600243 2.121254
## 1.00 2.448094 0.7880937 1.955265
## 2.00 2.286777 0.8130825 1.817966
## 4.00 2.226374 0.8229231 1.776091
## 8.00 2.231532 0.8224507 1.786547
## 16.00 2.232409 0.8221225 1.785002
## 32.00 2.232409 0.8221225 1.785002
##
## Tuning parameter 'sigma' was held constant at a value of 0.06468623
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06468623 and C = 4.
svm_pred <- predict(svm_model, newdata = test_data$x)
svm_perf <- postResample(pred = svm_pred, obs = test_data$y)
svm_perf
## RMSE Rsquared MAE
## 2.0063671 0.8423032 1.5450528
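The radial SVM compares observations through a Gaussian kernel of their Euclidean distance, which is why centering and scaling matter here as they do for KNN. This sketch assumes kernlab's rbfdot parameterization, \(k(x, z) = \exp(-\sigma \lVert x - z \rVert^2)\), which is the kernel behind caret's svmRadial method; the points are hypothetical, and sigma is the value caret held fixed in the output above.

```r
# Gaussian (RBF) kernel in kernlab's rbfdot form: exp(-sigma * ||x - z||^2)
rbf_kernel <- function(x, z, sigma) {
  exp(-sigma * sum((x - z)^2))
}

sigma <- 0.06468623   # value held constant by caret above

# hypothetical points in a centered-and-scaled space with 10 predictors
origin <- rep(0, 10)
near   <- rep(0.1, 10)   # squared distance from origin = 0.1
far    <- rep(1, 10)     # squared distance from origin = 10

rbf_kernel(origin, origin, sigma)   # identical points give similarity 1
rbf_kernel(origin, near, sigma)     # close points stay near 1
rbf_kernel(origin, far, sigma)      # distant points are less similar
```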
perf_compare <- data.frame(
Model = c("KNN", "MARS", "SVM Radial"),
RMSE = c(knn_perf["RMSE"], mars_perf["RMSE"], svm_perf["RMSE"]),
Rsquared = c(knn_perf["Rsquared"], mars_perf["Rsquared"], svm_perf["Rsquared"]),
MAE = c(knn_perf["MAE"], mars_perf["MAE"], svm_perf["MAE"])
)
kable(perf_compare, digits = 3, caption = "Exercise 7.2 Test Set Performance")
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| KNN | 3.067 | 0.671 | 2.447 |
| MARS | 1.855 | 0.866 | 1.434 |
| SVM Radial | 2.006 | 0.842 | 1.545 |
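For reference, the three columns can be computed by hand. This base-R sketch assumes caret's default definitions, in which Rsquared is the squared Pearson correlation between predictions and observations rather than \(1 - \text{SSE}/\text{SST}\); the numbers are made up.

```r
# Base-R versions of the statistics that postResample() reports
regression_metrics <- function(pred, obs) {
  c(
    RMSE     = sqrt(mean((obs - pred)^2)),   # root mean squared error
    Rsquared = cor(obs, pred)^2,             # squared correlation
    MAE      = mean(abs(obs - pred))         # mean absolute error
  )
}

obs  <- c(10, 12, 14, 16)
pred <- c(11, 12, 13, 18)
regression_metrics(pred, obs)   # RMSE = sqrt(1.5), Rsquared = 484/580, MAE = 1

# a constant offset leaves the squared correlation at 1
regression_metrics(obs + 5, obs)
```

Because this Rsquared is a squared correlation, it can stay near 1 even when every prediction is off by the same amount; RMSE and MAE catch that kind of bias.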
Based on the test-set results, MARS performed best among the three nonlinear models. It achieved the lowest RMSE (1.855), the highest \(R^2\) (0.866), and the lowest MAE (1.434). The radial SVM model ranked second, with RMSE = 2.006, \(R^2 = 0.842\), and MAE = 1.545, while KNN gave the weakest performance, with RMSE = 3.067, \(R^2 = 0.671\), and MAE = 2.447.
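These results suggest that MARS recovered the nonlinear structure in the simulated Friedman data best. This is plausible: apart from the \(\sin(\pi x_1 x_2)\) term, the true function is additive, with a quadratic effect of \(x_3\) and linear effects of \(x_4\) and \(x_5\), and MARS approximates such effects well with piecewise-linear hinge functions. Because caret held degree = 1, the fitted model is additive and can only approximate the \(x_1 x_2\) interaction with separate terms in V1 and V2.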
# variable importance helps us see which predictors MARS used most
mars_imp <- varImp(mars_model)$importance %>%
rownames_to_column("Predictor") %>%
arrange(desc(Overall))
kable(mars_imp, digits = 2, caption = "Exercise 7.2 MARS Variable Importance")
| Predictor | Overall |
|---|---|
| V4 | 100.00 |
| V1 | 63.17 |
| V2 | 41.56 |
| V5 | 21.79 |
| V3 | 0.00 |
# refit the final MARS model to inspect which variables were used
mars_fit <- earth(
x = training_data$x,
y = training_data$y,
nprune = mars_model$bestTune$nprune,
degree = mars_model$bestTune$degree
)
summary(mars_fit)
## Call: earth(x=training_data$x, y=training_data$y,
## degree=mars_model$bestTune$degree,
## nprune=mars_model$bestTune$nprune)
##
## coefficients
## (Intercept) 29.406789
## h(0.413077-V1) -13.606744
## h(V1-0.413077) 5.596838
## h(V1-0.783373) -17.457891
## h(0.660692-V2) -9.588465
## h(0.488376-V3) 11.939197
## h(V3-0.488376) 11.232006
## h(V4-0.127252) -11.766258
## h(0.943702-V4) -21.938402
## h(0.695462-V5) -5.156652
## h(V5-0.695462) 8.301894
##
## Selected 11 of 18 terms, and 5 of 10 predictors (nprune=12)
## Termination condition: Reached nk 21
## Importance: V4, V1, V2, V5, V3, V6-unused, V7-unused, V8-unused, V9-unused, ...
## Number of terms at each degree of interaction: 1 10 (additive model)
## GCV 2.798772 RSS 448.3773 GRSq 0.8989645 RSq 0.9182526
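The paired V3 terms in the summary, h(0.488376-V3) and h(V3-0.488376), both have positive coefficients, so together they form a V shape. A minimal sketch, assuming for simplicity a knot at exactly 0.5 and fitting only the \(x_3\) term of the Friedman equation by least squares, shows how such a pair approximates \(20(x_3 - 0.5)^2\). The hinge() helper is written here for illustration and is not part of earth.

```r
# MARS hinge basis function: h(u) = max(0, u)
hinge <- function(u) pmax(0, u)

# the x3 term of the Friedman equation on a fine grid over [0, 1]
x3 <- seq(0, 1, length.out = 201)
target <- 20 * (x3 - 0.5)^2

# mirrored hinge pair; the knot is set to 0.5 here only to keep the example
# symmetric (the fitted earth model placed its knot at 0.488376)
left  <- hinge(0.5 - x3)
right <- hinge(x3 - 0.5)
fit <- lm(target ~ left + right)

coef(fit)                        # both hinge slopes come out positive
max(abs(fitted(fit) - target))   # worst-case error of the V-shaped fit
```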
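The MARS variable-importance output ranks V4, V1, V2, V5, and V3 as the predictors used by the model. The final model summary states that 5 of 10 predictors were used, so V6 through V10 were left out. Since V1-V5 correspond to the informative predictors x1-x5, MARS identified all five informative predictors and ignored the five noise variables.
V3 has a scaled importance of 0.00, but it still appears in the final model through the two hinge terms h(0.488376-V3) and h(V3-0.488376). Both have positive coefficients, so together they form a V shape centered near 0.49 that approximates the quadratic term \(20(x_3 - 0.5)^2\). The zero importance reflects caret's rescaling of importance to a 0-100 range, which assigns 0 to the least important retained predictor; it does not mean that V3 was dropped.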
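caret's varImp() rescales importance so that the most important predictor gets 100 and the least important one gets 0. The sketch below applies that rescaling to hypothetical raw scores (not the values earth computed) to show why a predictor that is used, like V3, can still be reported as 0.

```r
# Hypothetical raw importance scores for five predictors used by a model
raw_imp <- c(V4 = 40, V1 = 27, V2 = 19, V5 = 11, V3 = 2)

# rescale to 0-100: smallest score maps to 0, largest to 100
rescale_0_100 <- function(x) 100 * (x - min(x)) / (max(x) - min(x))

round(rescale_0_100(raw_imp), 2)   # V3 becomes 0 although its raw score is positive
```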
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
data(ChemicalManufacturingProcess)
predictors <- ChemicalManufacturingProcess %>% select(-Yield)
yield <- ChemicalManufacturingProcess$Yield
data_overview <- data.frame(
Rows = nrow(predictors),
Predictors = ncol(predictors),
Yield_Mean = mean(yield),
Yield_SD = sd(yield)
)
kable(data_overview, digits = 2, caption = "Exercise 7.5 Data Overview")
| Rows | Predictors | Yield_Mean | Yield_SD |
|---|---|---|---|
| 176 | 57 | 40.18 | 1.85 |
set.seed(901)
training_rows <- createDataPartition(yield, p = 0.7, list = FALSE)
train_predictors <- predictors[training_rows, ]
train_yield <- yield[training_rows]
test_predictors <- predictors[-training_rows, ]
test_yield <- yield[-training_rows]
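For a numeric outcome, createDataPartition() does not draw a simple random sample: it splits the outcome into groups at its percentiles and samples within each group, so the training and test sets have similar yield distributions. The helper below is a hypothetical base-R sketch of that idea with five groups; it is not caret's implementation and will not reproduce the same rows, and the outcome is simulated rather than the real Yield.

```r
# Hypothetical helper: stratified split of a numeric outcome at its quantiles
stratified_split <- function(y, p = 0.7, groups = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # sample a fraction p of the row indices inside each quantile bin
  idx <- unlist(lapply(split(seq_along(y), bins), function(rows) {
    n_take <- ceiling(p * length(rows))
    rows[sample.int(length(rows), n_take)]
  }))
  sort(unname(idx))
}

set.seed(1)
y_demo <- rnorm(176, mean = 40, sd = 1.85)   # simulated, not the real Yield
train_idx <- stratified_split(y_demo, p = 0.7)
length(train_idx)
```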
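# same preprocessing as Exercise 6.3: Yeo-Johnson transform, centering, scaling,
# and k-nearest-neighbor imputation, estimated on the training set only and
# then applied to both sets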
pp <- preProcess(
train_predictors,
method = c("YeoJohnson", "center", "scale", "knnImpute")
)
pp_train_predictors <- predict(pp, train_predictors)
pp_test_predictors <- predict(pp, test_predictors)
# remove near-zero variance predictors
nzvpp <- nearZeroVar(pp_train_predictors)
if(length(nzvpp) > 0) {
pp_train_predictors <- pp_train_predictors[, -nzvpp]
pp_test_predictors <- pp_test_predictors[, -nzvpp]
}
# remove highly correlated predictors (findCorrelation's default cutoff is 0.90)
predcorr <- cor(pp_train_predictors)
highCorrpp <- findCorrelation(predcorr)
if(length(highCorrpp) > 0) {
pp_train_predictors <- pp_train_predictors[, -highCorrpp]
pp_test_predictors <- pp_test_predictors[, -highCorrpp]
}
preprocess_overview <- data.frame(
Training_Rows = nrow(pp_train_predictors),
Training_Predictors = ncol(pp_train_predictors),
Test_Rows = nrow(pp_test_predictors),
Test_Predictors = ncol(pp_test_predictors)
)
kable(preprocess_overview, caption = "Exercise 7.5 Data After Preprocessing")
| Training_Rows | Training_Predictors | Test_Rows | Test_Predictors |
|---|---|---|---|
| 124 | 46 | 52 | 46 |
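The Yeo-Johnson step in preProcess() estimates a separate \(\lambda\) for each predictor from the training set and applies the same transformation to the test set. Unlike Box-Cox, it also handles zero and negative values. A base-R sketch of the transformation for a fixed \(\lambda\) (the estimation of \(\lambda\) is omitted) is:

```r
# Yeo-Johnson transform of a numeric vector for a given lambda
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # branch for non-negative values
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # branch for negative values
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log1p(-y[!pos])
  }
  out
}

y <- c(-3, -1, 0, 1, 3)
yeo_johnson(y, lambda = 1)     # lambda = 1 leaves the values unchanged
yeo_johnson(y, lambda = 0.5)   # compresses both tails
```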
set.seed(901)
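# 25 bootstrap resamples; the seed is reset to 415 before each train() call
# below so that every model is evaluated on the same resamples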
ctrl <- trainControl(method = "boot", number = 25)
set.seed(415)
mars_chem_grid <- expand.grid(
degree = 1:2,
nprune = 2:10
)
mars_chem_tune <- train(
x = pp_train_predictors,
y = train_yield,
method = "earth",
tuneGrid = mars_chem_grid,
trControl = ctrl
)
mars_chem_tune
## Multivariate Adaptive Regression Spline
##
## 124 samples
## 46 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 1.421239 0.3555088 1.1274650
## 1 3 1.317376 0.4776361 0.9971114
## 1 4 1.710509 0.4716973 1.0557262
## 1 5 1.782419 0.4954470 1.0612259
## 1 6 1.851577 0.4485230 1.1130193
## 1 7 3.106691 0.4233550 1.3307674
## 1 8 3.061793 0.4445200 1.2969232
## 1 9 4.435348 0.4029023 1.5229229
## 1 10 4.181450 0.4323397 1.4693020
## 2 2 1.414421 0.3625596 1.1218329
## 2 3 1.307061 0.4503190 1.0339823
## 2 4 1.410912 0.4692309 1.0426146
## 2 5 1.422248 0.4713860 1.0506640
## 2 6 1.500273 0.4369123 1.0979399
## 2 7 1.662403 0.4365046 1.1417185
## 2 8 1.737990 0.4215994 1.1918019
## 2 9 1.729978 0.4319843 1.1980618
## 2 10 1.782667 0.4261091 1.2146607
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 3 and degree = 2.
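Each RMSE, \(R^2\), and MAE in these tables is an average over 25 bootstrap resamples. In each resample, the model is fit to 124 rows drawn with replacement (matching "Summary of sample sizes: 124, ...") and evaluated on the rows that were not drawn. A base-R sketch of that scheme, with a simple linear model and simulated data standing in for MARS and the chemical data, is:

```r
# Simulated stand-in data with 124 rows
set.seed(42)
n <- 124
dat <- data.frame(x = runif(n))
dat$y <- 3 + 2 * dat$x + rnorm(n, sd = 0.5)

# bootstrap estimate of RMSE with out-of-bag evaluation
boot_rmse <- function(data, B = 25) {
  vapply(seq_len(B), function(b) {
    in_bag <- sample.int(nrow(data), replace = TRUE)   # n draws with replacement
    oob <- setdiff(seq_len(nrow(data)), in_bag)        # rows never drawn
    fit <- lm(y ~ x, data = data[in_bag, ])
    pred <- predict(fit, newdata = data[oob, ])
    sqrt(mean((data$y[oob] - pred)^2))
  }, numeric(1))
}

rmse_b <- boot_rmse(dat)
summary(rmse_b)   # one RMSE per resample
mean(rmse_b)      # the kind of average train() reports per tuning value
```

Because the reported value is a mean over resamples, one or two resamples with very large errors can inflate it. That is one plausible reading of the jumps in RMSE for degree = 1 at larger nprune values, where \(R^2\) barely changes.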
set.seed(415)
psvm_tune_grid <- expand.grid(
degree = c(1, 2),
scale = c(0.25, 0.5, 1),
C = c(0.01, 0.05, 0.1)
)
psvm_chem_tune <- train(
x = pp_train_predictors,
y = train_yield,
method = "svmPoly",
trControl = ctrl,
tuneGrid = psvm_tune_grid
)
psvm_chem_tune
## Support Vector Machines with Polynomial Kernel
##
## 124 samples
## 46 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## degree scale C RMSE Rsquared MAE
## 1 0.25 0.01 1.606975 0.3404999 1.172610
## 1 0.25 0.05 2.131784 0.3240448 1.220070
## 1 0.25 0.10 2.655741 0.2941084 1.329853
## 1 0.50 0.01 1.733798 0.3400106 1.153279
## 1 0.50 0.05 2.655704 0.2941243 1.329840
## 1 0.50 0.10 3.152961 0.2736269 1.434653
## 1 1.00 0.01 1.991684 0.3327148 1.192845
## 1 1.00 0.05 3.152936 0.2736284 1.434646
## 1 1.00 0.10 3.855858 0.2607261 1.572457
## 2 0.25 0.01 7.590887 0.2767245 2.062433
## 2 0.25 0.05 8.029920 0.2274698 2.159394
## 2 0.25 0.10 8.000849 0.2218754 2.165128
## 2 0.50 0.01 9.688751 0.2067014 2.452452
## 2 0.50 0.05 9.711800 0.1885894 2.479941
## 2 0.50 0.10 9.711800 0.1885894 2.479941
## 2 1.00 0.01 11.245890 0.1704498 2.768059
## 2 1.00 0.05 11.245890 0.1704498 2.768059
## 2 1.00 0.10 11.245890 0.1704498 2.768059
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were degree = 1, scale = 0.25 and C = 0.01.
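caret's svmPoly method uses kernlab's polynomial kernel, \(k(x, z) = (\text{scale} \cdot \langle x, z \rangle + \text{offset})^{\text{degree}}\); this sketch assumes caret fixes offset = 1. With the selected degree = 1, the kernel is an affine function of the inner product, so the fitted SVM is a linear function of the preprocessed predictors.

```r
# Polynomial kernel in kernlab's polydot form (offset assumed to be 1)
poly_kernel <- function(x, z, degree, scale, offset = 1) {
  (scale * sum(x * z) + offset)^degree
}

x <- c(1, -2, 0.5)
z <- c(0.5, 1, 2)   # <x, z> = 0.5 - 2 + 1 = -0.5

poly_kernel(x, z, degree = 1, scale = 0.25)   # 0.25 * -0.5 + 1 = 0.875
poly_kernel(x, z, degree = 2, scale = 0.25)   # 0.875^2 = 0.765625

# with degree = 1, doubling x doubles the inner-product part of the kernel
poly_kernel(2 * x, z, degree = 1, scale = 0.25)   # 0.75
```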
set.seed(415)
knn_chem_tune <- train(
x = pp_train_predictors,
y = train_yield,
method = "knn",
tuneLength = 10,
trControl = ctrl
)
knn_chem_tune
## k-Nearest Neighbors
##
## 124 samples
## 46 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 1.405772 0.3745983 1.145204
## 7 1.376566 0.3880314 1.125607
## 9 1.345643 0.4161709 1.101996
## 11 1.339045 0.4229833 1.098540
## 13 1.343674 0.4205954 1.102025
## 15 1.343667 0.4250135 1.100131
## 17 1.351936 0.4207890 1.110264
## 19 1.353119 0.4219766 1.112926
## 21 1.362666 0.4192643 1.121974
## 23 1.371645 0.4146175 1.130539
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 11.
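KNN regression has no fitted coefficients: a prediction is the average outcome of the k training rows closest to the new point. A hypothetical base-R sketch with Euclidean distance and toy data (not caret's own code) is:

```r
# Hypothetical helper: k-nearest-neighbor regression by averaging outcomes
knn_regress <- function(train_x, train_y, new_x, k) {
  train_x <- as.matrix(train_x)
  apply(as.matrix(new_x), 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))   # distance to every training row
    mean(train_y[order(d)[seq_len(k)]])        # average outcome of the k nearest
  })
}

train_x <- matrix(c(0, 1, 2, 3, 10), ncol = 1)
train_y <- c(1, 2, 3, 4, 50)
new_x <- matrix(1.2, ncol = 1)

knn_regress(train_x, train_y, new_x, k = 3)   # mean of 2, 3, 1 = 2
knn_regress(train_x, train_y, new_x, k = 5)   # the outlier pulls this up to 12
```

The outlying outcome (50) has no influence with k = 3 but dominates with k = 5, which is why k is tuned; here resampling chose k = 11.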
best_tune_75 <- bind_rows(
data.frame(Model = "MARS", mars_chem_tune$bestTune),
data.frame(Model = "SVM Poly", psvm_chem_tune$bestTune),
data.frame(Model = "KNN", knn_chem_tune$bestTune)
)
kable(best_tune_75, caption = "Exercise 7.5 Best Tuning Parameters")
| Model | nprune | degree | scale | C | k |
|---|---|---|---|---|---|
| MARS | 3 | 2 | NA | NA | NA |
| SVM Poly | NA | 1 | 0.25 | 0.01 | NA |
| KNN | NA | NA | NA | NA | 11 |
resamp <- resamples(list(
MARS = mars_chem_tune,
SVM_Poly = psvm_chem_tune,
KNN = knn_chem_tune
))
summary(resamp)
##
## Call:
## summary.resamples(object = resamp)
##
## Models: MARS, SVM_Poly, KNN
## Number of resamples: 25
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## MARS 0.7141849 0.9542644 1.030962 1.033982 1.108773 1.405970 0
## SVM_Poly 0.9438352 1.0896824 1.148821 1.172610 1.317141 1.414758 0
## KNN 0.8845607 0.9910457 1.112735 1.098540 1.190447 1.326746 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## MARS 0.930888 1.228278 1.287889 1.307061 1.399731 1.768076 0
## SVM_Poly 1.144107 1.324773 1.501958 1.606975 1.775840 3.186983 0
## KNN 1.140694 1.226457 1.328108 1.339045 1.406592 1.572455 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## MARS 0.21268136 0.3846315 0.4310200 0.4503190 0.5018176 0.7071742 0
## SVM_Poly 0.04309999 0.1939825 0.3431186 0.3404999 0.4660888 0.6265679 0
## KNN 0.25623654 0.3579855 0.4105657 0.4229833 0.4814752 0.6256223 0
bwplot(resamp, metric = "RMSE")
dotplot(resamp, metric = "Rsquared")
mars_pred <- predict(mars_chem_tune, newdata = pp_test_predictors)
svm_pred <- predict(psvm_chem_tune, newdata = pp_test_predictors)
knn_pred <- predict(knn_chem_tune, newdata = pp_test_predictors)
mars_perf <- postResample(mars_pred, test_yield)
svm_perf <- postResample(svm_pred, test_yield)
knn_perf <- postResample(knn_pred, test_yield)
chem_compare <- data.frame(
Model = c("MARS", "SVM Poly", "KNN"),
RMSE = c(mars_perf["RMSE"], svm_perf["RMSE"], knn_perf["RMSE"]),
Rsquared = c(mars_perf["Rsquared"], svm_perf["Rsquared"], knn_perf["Rsquared"]),
MAE = c(mars_perf["MAE"], svm_perf["MAE"], knn_perf["MAE"])
)
chem_compare_display <- chem_compare %>%
mutate(
RMSE = format(round(RMSE, 3), big.mark = ",", scientific = FALSE),
Rsquared = sprintf("%.3f", Rsquared),
MAE = format(round(MAE, 3), big.mark = ",", scientific = FALSE)
)
kable(chem_compare_display, caption = "Exercise 7.5 Test-Set Performance")
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| MARS | 1.294 | 0.611 | 1.065 |
| SVM Poly | 22,487,073,217.711 | 0.001 | 3,118,395,982.684 |
| KNN | 1.580 | 0.403 | 1.297 |
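Among the nonlinear models, MARS gave the best overall performance. In the resampling comparison, MARS had the lowest mean RMSE (1.307) and the highest mean \(R^2\) (0.450). On the test set, MARS also performed best, with RMSE = 1.294, \(R^2 = 0.611\), and MAE = 1.065.
KNN ranked second in both comparisons, with a mean resampled RMSE of 1.339 and test-set RMSE = 1.580, \(R^2 = 0.403\), and MAE = 1.297. The polynomial SVM had the weakest resampling results (mean RMSE = 1.607, mean \(R^2\) = 0.340), and its selected tuning values (degree = 1, scale = 0.25, C = 0.01) all lie at the lower edge of the grid, so a wider grid might improve it. On the test set it failed badly, with an RMSE of about \(2.2 \times 10^{10}\) and \(R^2 = 0.001\). The test RMSE is almost exactly \(\sqrt{52} \approx 7.21\) times the test MAE, which indicates that a single extreme prediction accounts for nearly all of the error; that test sample and its preprocessed predictor values should be checked before drawing firm conclusions about how the SVM generalizes.
Overall, MARS (degree = 2, nprune = 3) gave the best resampling and test-set performance of the three nonlinear models.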
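The polynomial SVM's test-set figures can be checked with a little arithmetic. For any \(n\) errors, \(\text{RMSE} \le \sqrt{n}\,\text{MAE}\), with equality only when a single error is nonzero; if one error \(e\) dominates, \(\text{RMSE} \approx |e|/\sqrt{n}\) and \(\text{MAE} \approx |e|/n\). The values below are copied from the test-set table, with \(n = 52\) test samples.

```r
# Polynomial SVM test-set results from the table above
n_test <- 52
rmse <- 22487073217.711
mae  <- 3118395982.684

rmse / mae      # ratio of the two error summaries
sqrt(n_test)    # the largest ratio possible for 52 errors

# size of the single dominant error implied by each statistic
c(from_rmse = rmse * sqrt(n_test), from_mae = mae * n_test)
```

Both routes give an error of roughly \(1.6 \times 10^{11}\) for one test sample.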
mars_imp <- varImp(mars_chem_tune)$importance %>%
rownames_to_column("Predictor") %>%
arrange(desc(Overall)) %>%
filter(Overall > 0 | row_number() <= 2)
kable(mars_imp, digits = 3, caption = "Key Predictors in the Final MARS Model")
| Predictor | Overall |
|---|---|
| ManufacturingProcess32 | 100 |
| ManufacturingProcess09 | 0 |
# top 10 predictors from the optimal linear (PLS) model in Exercise 6.3,
# entered by hand from that analysis
linear_top10 <- tibble(
Predictor = c(
"ManufacturingProcess32",
"ManufacturingProcess36",
"ManufacturingProcess13",
"ManufacturingProcess09",
"ManufacturingProcess17",
"BiologicalMaterial06",
"ManufacturingProcess33",
"BiologicalMaterial08",
"BiologicalMaterial01",
"BiologicalMaterial03"
),
Linear_Importance = c(
0.15328137,
0.12314752,
0.12221124,
0.11978856,
0.11768643,
0.10420115,
0.10329736,
0.09362826,
0.09014032,
0.08934011
),
Predictor_Type = c(
"Process", "Process", "Process", "Process", "Process",
"Biological", "Process", "Biological", "Biological", "Biological"
)
)
kable(linear_top10, digits = 3, caption = "Top 10 Predictors from the Optimal Linear Model")
| Predictor | Linear_Importance | Predictor_Type |
|---|---|---|
| ManufacturingProcess32 | 0.153 | Process |
| ManufacturingProcess36 | 0.123 | Process |
| ManufacturingProcess13 | 0.122 | Process |
| ManufacturingProcess09 | 0.120 | Process |
| ManufacturingProcess17 | 0.118 | Process |
| BiologicalMaterial06 | 0.104 | Biological |
| ManufacturingProcess33 | 0.103 | Process |
| BiologicalMaterial08 | 0.094 | Biological |
| BiologicalMaterial01 | 0.090 | Biological |
| BiologicalMaterial03 | 0.089 | Biological |
comparison_75 <- full_join(
mars_imp %>% rename(Nonlinear_Importance = Overall),
linear_top10,
by = "Predictor"
)
kable(comparison_75, digits = 3, caption = "Comparison of Nonlinear and Linear Model Predictors")
| Predictor | Nonlinear_Importance | Linear_Importance | Predictor_Type |
|---|---|---|---|
| ManufacturingProcess32 | 100 | 0.153 | Process |
| ManufacturingProcess09 | 0 | 0.120 | Process |
| ManufacturingProcess36 | NA | 0.123 | Process |
| ManufacturingProcess13 | NA | 0.122 | Process |
| ManufacturingProcess17 | NA | 0.118 | Process |
| BiologicalMaterial06 | NA | 0.104 | Biological |
| ManufacturingProcess33 | NA | 0.103 | Process |
| BiologicalMaterial08 | NA | 0.094 | Biological |
| BiologicalMaterial01 | NA | 0.090 | Biological |
| BiologicalMaterial03 | NA | 0.089 | Biological |
linear_type_summary <- linear_top10 %>%
count(Predictor_Type)
kable(linear_type_summary, caption = "Variable Types in the Top 10 Linear Predictors")
| Predictor_Type | n |
|---|---|
| Biological | 4 |
| Process | 6 |
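The variable-importance output shows that ManufacturingProcess32 dominates the final MARS model: it is the only predictor with nonzero scaled importance (100). ManufacturingProcess09 is listed second with a scaled importance of 0. Because caret rescales importance to a 0-100 range, that 0 alone does not show whether ManufacturingProcess09 is used in the final model; summary(mars_chem_tune$finalModel) would settle this. Such a sparse model is expected, since nprune = 3 allows at most three terms, including the intercept. Either way, the predictors the nonlinear model relies on are manufacturing process variables, and no biological variable receives nonzero importance.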
Compared with the optimal linear model from Exercise 6.3, there is clear overlap. The optimal linear model was a PLS model, and its top predictors included ManufacturingProcess32, ManufacturingProcess36, ManufacturingProcess13, ManufacturingProcess09, ManufacturingProcess17, BiologicalMaterial06, ManufacturingProcess33, BiologicalMaterial08, BiologicalMaterial01, and BiologicalMaterial03. Among those top 10 linear predictors, 6 were process variables and 4 were biological variables, and the first five predictors in the linear ranking were all process variables.
The two models therefore agree that ManufacturingProcess32 is the most important predictor, and ManufacturingProcess09, which ranks fourth in the linear model, is also the second-ranked predictor in the MARS output. The MARS model is far more compact, concentrating its importance in one or two predictors, while the PLS model spreads importance across a broader group of variables. Even so, both models suggest that process variables are the main drivers of yield.
top2 <- mars_imp$Predictor[1:2]
top2
## [1] "ManufacturingProcess32" "ManufacturingProcess09"
plot_data <- data.frame(
x1 = pp_train_predictors[[top2[1]]],
x2 = pp_train_predictors[[top2[2]]],
Yield = train_yield
)
# predictors are on the preprocessed (Yeo-Johnson, centered, scaled) scale
ggplot(plot_data, aes(x = x1, y = Yield)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
labs(title = paste("Yield vs", top2[1]), x = paste(top2[1], "(preprocessed)"), y = "Yield")
ggplot(plot_data, aes(x = x2, y = Yield)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
labs(title = paste("Yield vs", top2[2]), x = paste(top2[2], "(preprocessed)"), y = "Yield")
The scatterplots and LOESS curves show that both ManufacturingProcess32 and ManufacturingProcess09 are positively associated with yield overall, although the patterns differ. Both predictors are plotted on their preprocessed (Yeo-Johnson transformed, centered, and scaled) values. For ManufacturingProcess32, the relationship appears mildly nonlinear: yield is relatively low at low-to-middle values of the predictor, rises as the predictor increases, and levels off slightly at the high end. For ManufacturingProcess09, yield rises more steadily as the predictor increases.
These plots suggest that the relationship between the important process predictors and yield is not perfectly linear, especially for ManufacturingProcess32, where the smooth curve shows noticeable curvature. This may help explain why a nonlinear model such as MARS performed well in this comparison.
At the same time, neither predictor is unique to the nonlinear model: ManufacturingProcess32 and ManufacturingProcess09 ranked first and fourth among the top predictors of the optimal linear model from Exercise 6.3. The plots therefore do not reveal a separate set of nonlinear-only drivers. Instead, they show that the nonlinear model places stronger emphasis on two process predictors that were already important in the linear analysis.