Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to generate the data: \[ y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2) \] where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five additional non-informative predictors). The mlbench package contains a function, mlbench.friedman1, that simulates these data:
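For readers who want to see the generating equation directly in code, the following is a minimal sketch of the same simulation (the function name sim_friedman1 is illustrative; the analysis below uses mlbench.friedman1 itself):
# Minimal sketch of the Friedman #1 simulation; only the first five predictors carry signal
sim_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10)
  y <- 10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5] + rnorm(n, sd = sd)
  list(x = x, y = y)
}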
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: ggplot2
## Loading required package: lattice
library(earth)
## Warning: package 'earth' was built under R version 4.4.3
## Loading required package: Formula
## Loading required package: plotmo
## Warning: package 'plotmo' was built under R version 4.4.3
## Loading required package: plotrix
library(mlbench)
## Warning: package 'mlbench' was built under R version 4.4.3
library(e1071)
library(nnet)
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## This creates a list with a vector 'y' and a matrix of predictors 'x'.
## We convert the 'x' data from a matrix to a data frame;
## one reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)
## or other methods.
## Also simulate a large test set to estimate the
## true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
Model 1: k-Nearest Neighbors (kNN)
set.seed(200)
knnModel <- train(
x = trainingData$x, y = trainingData$y,
method = "knn",
preProcess = c("center", "scale"),
tuneLength = 10
)
# Predict and evaluate on test set
knnPred <- predict(knnModel, testData$x)
knnPerf <- postResample(pred = knnPred, obs = testData$y)
knnModel
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.654912 0.4779838 2.958475
## 7 3.529432 0.5118581 2.861742
## 9 3.446330 0.5425096 2.780756
## 11 3.378049 0.5723793 2.719410
## 13 3.332339 0.5953773 2.692863
## 15 3.309235 0.6111389 2.663046
## 17 3.317408 0.6201421 2.678898
## 19 3.311667 0.6333800 2.682098
## 21 3.316340 0.6407537 2.688887
## 23 3.326040 0.6491480 2.705915
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 15.
knnPerf
## RMSE Rsquared MAE
## 3.1750657 0.6785946 2.5443169
Model 2: Multivariate Adaptive Regression Splines (MARS)
marsModel <- earth(trainingData$x, trainingData$y)
marsPred <- predict(marsModel, testData$x)
marsPerf <- postResample(pred = marsPred, obs = testData$y)
summary(marsModel)
## Call: earth(x=trainingData$x, y=trainingData$y)
##
## coefficients
## (Intercept) 18.451984
## h(0.621722-X1) -11.074396
## h(0.601063-X2) -10.744225
## h(X3-0.281766) 20.607853
## h(0.447442-X3) 17.880232
## h(X3-0.447442) -23.282007
## h(X3-0.636458) 15.150350
## h(0.734892-X4) -10.027487
## h(X4-0.734892) 9.092045
## h(0.850094-X5) -4.723407
## h(X5-0.850094) 10.832932
## h(X6-0.361791) -1.956821
##
## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556 RSS 397.9654 GRSq 0.8968524 RSq 0.9183982
marsPerf
## RMSE Rsquared MAE
## 1.8136467 0.8677298 1.3911836
Model 3: Support Vector Machine (RBF kernel)
set.seed(200)
svmModel <- train(
x = trainingData$x, y = trainingData$y,
method = "svmRadial",
preProcess = c("center", "scale"),
tuneLength = 14,
trControl = trainControl(method = "cv")
)
svmPred <- predict(svmModel, testData$x)
svmPerf <- postResample(pred = svmPred, obs = testData$y)
svmModel
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.525164 0.7810576 2.010680
## 0.50 2.270567 0.7944850 1.794902
## 1.00 2.099356 0.8155574 1.659376
## 2.00 2.005858 0.8302852 1.578799
## 4.00 1.934650 0.8435677 1.528373
## 8.00 1.915665 0.8475605 1.528648
## 16.00 1.923914 0.8463074 1.535991
## 32.00 1.923914 0.8463074 1.535991
## 64.00 1.923914 0.8463074 1.535991
## 128.00 1.923914 0.8463074 1.535991
## 256.00 1.923914 0.8463074 1.535991
## 512.00 1.923914 0.8463074 1.535991
## 1024.00 1.923914 0.8463074 1.535991
## 2048.00 1.923914 0.8463074 1.535991
##
## Tuning parameter 'sigma' was held constant at a value of 0.06299324
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06299324 and C = 8.
svmPerf
## RMSE Rsquared MAE
## 2.0541197 0.8290353 1.5586411
Model 4: Bayesian Regularized Neural Network (brnn)
# install.packages("brnn")
library(brnn)
## Warning: package 'brnn' was built under R version 4.4.3
## Loading required package: truncnorm
## Warning: package 'truncnorm' was built under R version 4.4.3
set.seed(200)
brnnModel <- train(
x = trainingData$x, y = trainingData$y,
method = "brnn",
tuneLength = 2,
trControl = trainControl(method = "cv")
)
## Number of parameters (weights and biases) to estimate: 12
## Nguyen-Widrow method
## Scaling factor= 0.7
## gamma= 11.431 alpha= 3.7963 beta= 13.2473
## Number of parameters (weights and biases) to estimate: 24
## Nguyen-Widrow method
## Scaling factor= 0.7027008
## gamma= 22.6038 alpha= 1.8541 beta= 20.8575
## (similar Nguyen-Widrow initialization messages are printed for the remaining cross-validation folds and the final refit; verbose output truncated)
brnnPred <- predict(brnnModel, testData$x)
brnnPerf <- postResample(pred = brnnPred, obs = testData$y)
brnnModel
## Bayesian Regularized Neural Networks
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## neurons RMSE Rsquared MAE
## 1 2.407648 0.7593635 1.936012
## 2 1.959947 0.8290583 1.559422
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was neurons = 2.
brnnPerf
## RMSE Rsquared MAE
## 2.293683 0.790615 1.699080
Model 5: Random Forest
set.seed(200)
rfModel <- train(
x = trainingData$x, y = trainingData$y,
method = "rf",
tuneLength = 5,
trControl = trainControl(method = "cv")
)
rfPred <- predict(rfModel, testData$x)
rfPerf <- postResample(pred = rfPred, obs = testData$y)
rfModel
## Random Forest
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 2.824419 0.8150845 2.332915
## 4 2.473928 0.8167031 2.044099
## 6 2.409182 0.8032251 2.001063
## 8 2.386901 0.7978693 1.965533
## 10 2.389432 0.7908445 1.953503
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 8.
rfPerf
## RMSE Rsquared MAE
## 2.4218795 0.7900169 1.9134177
# Combine results into a data frame
results <- data.frame(
Model = c("kNN", "MARS", "SVM (RBF)", "BRNN", "Random Forest"),
RMSE = c(knnPerf["RMSE"], marsPerf["RMSE"], svmPerf["RMSE"], brnnPerf["RMSE"], rfPerf["RMSE"]),
Rsquared = c(knnPerf["Rsquared"], marsPerf["Rsquared"], svmPerf["Rsquared"], brnnPerf["Rsquared"], rfPerf["Rsquared"]),
MAE = c(knnPerf["MAE"], marsPerf["MAE"], svmPerf["MAE"], brnnPerf["MAE"], rfPerf["MAE"])
)
# View performance table
print(results)
## Model RMSE Rsquared MAE
## 1 kNN 3.175066 0.6785946 2.544317
## 2 MARS 1.813647 0.8677298 1.391184
## 3 SVM (RBF) 2.054120 0.8290353 1.558641
## 4 BRNN 2.293683 0.7906150 1.699080
## 5 Random Forest 2.421879 0.7900169 1.913418
# Optional: visualize performance
library(ggplot2)
results_long <- reshape2::melt(results, id.vars = "Model")
ggplot(results_long, aes(x = reorder(Model, value), y = value, fill = variable)) +
geom_col(position = "dodge") +
facet_wrap(~ variable, scales = "free") +
coord_flip() +
theme_minimal() +
labs(
title = "Model Performance Comparison",
x = "Model", y = "Metric Value"
)
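As a complement to the test-set table, caret's resamples() can compare the cross-validated results of the models that were tuned with the same 10-fold scheme. A sketch (kNN is omitted because it used the default bootstrap resampling, and MARS was fit with earth() outside caret; cvModels is an illustrative name):
# Compare CV resampling distributions of the caret-tuned models
cvModels <- resamples(list(SVM = svmModel, BRNN = brnnModel, RF = rfModel))
summary(cvModels)
bwplot(cvModels, metric = "RMSE")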
### Model-by-Model Analysis

1. k-Nearest Neighbors (kNN):
The kNN model, while simple and intuitive, had the weakest performance among all the models tested. It produced the highest RMSE (3.18) and MAE (2.54), and the lowest R² value (0.68), indicating that it struggled to capture the nonlinear structure of the data. This result is expected because kNN is highly sensitive to irrelevant or noisy features, and in the Friedman dataset, only the first five out of ten predictors are informative. Without a strong mechanism for feature selection or weighting, kNN treats all features equally, which likely diluted its predictive power. Additionally, kNN tends to perform poorly in high-dimensional spaces unless feature engineering or dimensionality reduction is applied beforehand.
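One way to probe that explanation is to refit kNN using only the five informative predictors and check whether the test error improves; a quick sketch (knnInform is an illustrative name, not part of the original analysis):
# Refit kNN on X1-X5 only to gauge the cost of the noise variables
set.seed(200)
knnInform <- train(
  x = trainingData$x[, 1:5], y = trainingData$y,
  method = "knn",
  preProcess = c("center", "scale"),
  tuneLength = 10
)
postResample(predict(knnInform, testData$x[, 1:5]), testData$y)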
2. Multivariate Adaptive Regression Splines (MARS): MARS clearly outperformed all other models, achieving the lowest RMSE (1.81), lowest MAE (1.39), and highest R² (0.87). This demonstrates that it was best at approximating the complex, nonlinear relationships in the data. MARS works by fitting piecewise linear regressions and automatically selecting both relevant predictors and interaction terms, which aligns well with the known structure of the Friedman function. Since only the first five variables truly contribute to the outcome, MARS’s embedded feature selection allowed it to ignore the irrelevant ones and focus its capacity on modeling the important relationships. Moreover, the model’s interpretability and its ability to report variable importance make it a strong candidate for both performance and explainability.
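Here MARS was fit with earth() at its default settings. If resampling-based tuning is preferred, caret exposes the same model as method = "earth", with the interaction degree and number of retained terms as tuning parameters; a sketch (marsTuned is an illustrative name):
# Tune MARS over interaction degree and number of retained terms
set.seed(200)
marsTuned <- train(
  x = trainingData$x, y = trainingData$y,
  method = "earth",
  tuneGrid = expand.grid(degree = 1:2, nprune = 2:20),
  trControl = trainControl(method = "cv")
)
marsTuned$bestTune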
3. Support Vector Machine (RBF kernel): The SVM with an RBF kernel also performed well, with an RMSE of 2.05, MAE of 1.56, and R² of 0.83. This model is particularly adept at handling nonlinear relationships because the RBF kernel implicitly maps the predictors into a higher-dimensional space where a linear fit suffices. While the SVM did not match MARS in accuracy, it still captured the underlying structure reasonably well. The slight underperformance relative to MARS may be attributed to its sensitivity to the tuning parameters, cost (C) and the kernel width (sigma), and to its inability to perform feature selection directly. Nevertheless, the SVM remained a strong contender in terms of overall performance and generalization to unseen data.
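If sensitivity to the tuning parameters is a concern, the grid can be specified explicitly instead of relying on tuneLength; kernlab's sigest() provides a reasonable range for sigma. A sketch (svmGrid and svmTuned are illustrative names):
# Explicit tuning grid for the radial SVM
library(kernlab)
set.seed(200)
sigmaRange <- sigest(as.matrix(trainingData$x), frac = 1)  # low/median/high sigma estimates
svmGrid <- expand.grid(sigma = sigmaRange, C = 2^(-2:7))
svmTuned <- train(
  x = trainingData$x, y = trainingData$y,
  method = "svmRadial",
  preProcess = c("center", "scale"),
  tuneGrid = svmGrid,
  trControl = trainControl(method = "cv")
)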
4. Bayesian Regularized Neural Network (BRNN): The BRNN model showed moderately strong results, with an RMSE of 2.29, MAE of 1.70, and R² of 0.79. Bayesian regularization helps prevent overfitting by controlling model complexity through prior distributions on the weights. Although this regularization helps the network generalize better than an unregularized backpropagation network typically would, its performance still lagged behind MARS and SVM. This may be due to the relatively small training set (n = 200), which limits how complex a function a neural network can learn unless paired with substantial regularization and tuning. Even so, the BRNN approximated the nonlinear patterns better than kNN or the random forest in this specific context.
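Only one and two hidden neurons were evaluated above (tuneLength = 2); a slightly wider grid would show whether the network is capacity-limited rather than data-limited. A sketch (brnnTuned is an illustrative name; expect additional compute time):
# Try a few more hidden-layer sizes for the Bayesian regularized network
set.seed(200)
brnnTuned <- train(
  x = trainingData$x, y = trainingData$y,
  method = "brnn",
  tuneGrid = data.frame(neurons = 1:5),
  trControl = trainControl(method = "cv")
)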
5. Random Forest: The random forest, somewhat surprisingly, did not perform as well as expected. It had an RMSE of 2.42, an MAE of 1.91, and an R² of 0.79, close to BRNN but noticeably worse than MARS and SVM. Random forests are generally robust and handle noisy and irrelevant features better than many models. However, their axis-aligned splits can be an inefficient way to approximate the smooth, additive structure of the Friedman function, and any splits on the five uninformative predictors add variance without adding signal. It is also possible that the model was simply under-tuned. While random forests are often a safe and strong baseline, in this case they did not capture the Friedman function as effectively as the other models.
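A quick way to check whether the forest spent splits on the noise variables is to inspect its variable importance; a sketch (the ranking, not the exact values, is the point here):
# Inspect which predictors the forest relied on; X6-X10 should rank near the bottom
plot(varImp(rfModel), top = 10)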
### Conclusion

Based on the model comparison, Multivariate Adaptive Regression Splines (MARS) is the clear winner for this task. It not only delivered the most accurate predictions but did so with a high level of interpretability. MARS excelled because it aligns well with the structure of the Friedman data set, where only a subset of features drives the response and the relationships are nonlinear and additive. The SVM with an RBF kernel also performed admirably, showing strong generalization and relatively low error, making it a good alternative in scenarios where interpretability is less critical.
On the other hand, kNN’s poor performance highlights the importance of accounting for feature relevance and dataset structure when choosing a model. Its equal treatment of all variables caused it to be misled by irrelevant features. BRNN and Random Forests fell into the middle tier of performance, both showing adequate modeling ability but falling short of the leading contenders, possibly due to insufficient tuning, data size, or sensitivity to feature noise.
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
# Load packages
library(AppliedPredictiveModeling)
## Warning: package 'AppliedPredictiveModeling' was built under R version 4.4.3
library(caret)
library(earth)
library(e1071)
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.4.2
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
library(gbm)
## Warning: package 'gbm' was built under R version 4.4.3
## Loaded gbm 2.2.2
## This version of gbm is no longer under development. Consider transitioning to gbm3, https://github.com/gbm-developers/gbm3
library(nnet)
# Load dataset
data(ChemicalManufacturingProcess)
df <- ChemicalManufacturingProcess
# Impute missing values
for (col in names(df)) {
df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
}
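An alternative to the loop above is caret's preProcess() with method = "medianImpute", which can also be fit on the training rows only so that test-set medians never influence the imputation; a sketch for comparison (imputer and predictors_imputed are illustrative names, and the loop-based imputation is what is actually used below):
# Alternative: median imputation via caret::preProcess (not used below)
imputer <- preProcess(ChemicalManufacturingProcess[, -1], method = "medianImpute")
predictors_imputed <- predict(imputer, ChemicalManufacturingProcess[, -1])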
# Split data into train (75%) and test (25%)
set.seed(123)
split <- createDataPartition(df$Yield, p = 0.75, list = FALSE)
train <- df[split, ]
test <- df[-split, ]
x_train <- train[, -1]
y_train <- train$Yield
x_test <- test[, -1]
y_test <- test$Yield
# Set up training control
ctrl <- trainControl(method = "cv", number = 10)
# Model 1: k-Nearest Neighbors
set.seed(123)
knn_model <- train(x_train, y_train, method = "knn",
preProcess = c("center", "scale"),
tuneLength = 10, trControl = ctrl)
knn_pred <- predict(knn_model, x_test)
knn_perf <- postResample(knn_pred, y_test)
# Model 2: MARS
mars_model <- earth(x_train, y_train)
mars_pred <- predict(mars_model, x_test)
mars_perf <- postResample(mars_pred, y_test)
# Model 3: SVM (Radial)
set.seed(123)
svm_model <- train(x_train, y_train, method = "svmRadial",
preProcess = c("center", "scale"),
tuneLength = 10, trControl = ctrl)
svm_pred <- predict(svm_model, x_test)
svm_perf <- postResample(svm_pred, y_test)
# Model 4: Random Forest
set.seed(123)
rf_model <- train(x_train, y_train, method = "rf",
importance = TRUE, tuneLength = 5, trControl = ctrl)
rf_pred <- predict(rf_model, x_test)
rf_perf <- postResample(rf_pred, y_test)
# Model 5: Gradient Boosting Machine
set.seed(123)
gbm_model <- train(x_train, y_train, method = "gbm",
verbose = FALSE, tuneLength = 5, trControl = ctrl)
gbm_pred <- predict(gbm_model, x_test)
gbm_perf <- postResample(gbm_pred, y_test)
# Compare model performance
model_results <- rbind(
KNN = knn_perf,
MARS = mars_perf,
SVM = svm_perf,
RandomForest = rf_perf,
GBM = gbm_perf
)
print(model_results)
## RMSE Rsquared MAE
## KNN 1.370642 0.3934243 1.0947652
## MARS 1.221511 0.5774516 0.9370294
## SVM 1.121974 0.5882149 0.8789167
## RandomForest 1.210482 0.5215893 0.8796533
## GBM 1.145438 0.5758184 0.8532827
The optimal model based on test set performance is the one with the lowest RMSE and the highest R². In this case:

* SVM has the lowest RMSE (1.121974) and the highest R² (0.5882149).
* GBM has a slightly higher RMSE but the lowest MAE (0.8532827).
So, the SVM (radial) is the optimal nonlinear model based on the lowest RMSE and highest R². However, the radial SVM does not provide a model-specific variable importance measure, so we will examine variable importance from the second-best nonlinear model (GBM), which provides it directly.
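For completeness, calling varImp() on the SVM still returns a ranking, but because the radial SVM has no model-specific measure, caret falls back to a model-free, filter-based score that reflects each predictor's marginal association with the outcome rather than what the SVM actually used; a quick check (svm_imp is an illustrative name):
# Filter-based importance for the SVM (marginal association, not model-internal)
svm_imp <- varImp(svm_model, scale = FALSE)
plot(svm_imp, top = 20, main = "Top 20 Predictors (filter-based, SVM)")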
The code to extract and plot variable importance from the GBM model:
# Variable importance from GBM model
gbm_imp <- varImp(gbm_model, scale = FALSE)
# Plot top 20 important predictors
plot(gbm_imp, top = 20, main = "Top 20 Important Predictors (GBM)")
Now classify the top predictors as biological or process:
# Identify top 10 important predictors
top10_vars <- rownames(gbm_imp$importance)[order(gbm_imp$importance$Overall, decreasing = TRUE)][1:10]
# Count how many are biological vs process variables
bio_count <- sum(grepl("^BiologicalMaterial", top10_vars))
proc_count <- sum(grepl("^ManufacturingProcess", top10_vars))
cat("Biological predictors in top 10:", bio_count, "\n")
## Biological predictors in top 10: 4
cat("Process predictors in top 10:", proc_count, "\n")
## Process predictors in top 10: 6
The most important predictors identified by the GBM model include both biological material and manufacturing process variables. Among the top 10 predictors there were 4 biological and 6 process variables, suggesting that both raw material quality and adjustable process conditions significantly influence the yield.
The optimal nonlinear regression model based on the test set performance was the Support Vector Machine (SVM) with a radial kernel. However, since SVM does not provide direct variable importance measures, we examined the importance from the Gradient Boosting Machine (GBM) model, which was the second-best performing nonlinear model. Based on the output, among the top 10 most important predictors identified by the GBM model, there were 4 biological material variables and 6 manufacturing process variables. This suggests that both the quality of the raw materials and the conditions of the manufacturing process are influential in determining the product yield, with a slight dominance of process-related factors in the top 10 according to GBM.
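Before plotting, a simple numeric summary is also informative: the linear correlation of each top-10 predictor with Yield hints at the direction of each effect, even though the relationships explored below are nonlinear. A one-line sketch:
# Linear correlation of each top-10 GBM predictor with Yield
sort(sapply(df[, top10_vars], function(p) cor(p, df$Yield)), decreasing = TRUE)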
# Select top 10 variables from GBM importance
top10_vars <- rownames(gbm_imp$importance)[order(gbm_imp$importance$Overall, decreasing = TRUE)][1:10]
# Combine with Yield
top_data <- data.frame(Yield = df$Yield, df[, top10_vars])
# Reshape for ggplot
library(tidyr)
library(ggplot2)
top_data_long <- pivot_longer(top_data, -Yield, names_to = "Predictor", values_to = "Value")
# Create plots
ggplot(top_data_long, aes(x = Value, y = Yield)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "loess", se = FALSE, color = "blue") +
facet_wrap(~ Predictor, scales = "free_x") +
theme_minimal() +
labs(title = "Yield vs Top GBM Predictors",
x = "Predictor Value", y = "Yield")
## `geom_smooth()` using formula = 'y ~ x'
Visual inspection of the top predictors reveals nonlinear relationships with the Yield. Some variables show curved or threshold-like patterns, suggesting interaction effects or diminishing returns. These insights reinforce the use of nonlinear models like GBM and SVM for capturing complex dependencies in the data.
The attached plot, titled “Yield vs Top GBM Predictors,” displays scatter plots with LOESS trend lines for the top 10 predictors identified by the GBM model against the product yield. Visual inspection of these plots shows that many of the top predictors exhibit nonlinear relationships with the yield. Several panels show curvature, indicating that the effect of these predictors on the yield is not constant across their range; for instance, some predictors show an increasing yield up to a certain point, after which the yield plateaus or decreases. These nonlinear patterns, such as those observed for BiologicalMaterial03, ManufacturingProcess17, and ManufacturingProcess30, suggest that the relationship between these biological and process factors and the final yield is complex and cannot be adequately captured by a simple linear model, reinforcing the utility of nonlinear regression models like GBM and SVM for understanding and potentially optimizing the chemical manufacturing process.