Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to generate data: y = 10 sin(π x1 x2) + 20 (x3 − 0.5)² + 10 x4 + 5 x5 + N(0, σ²), where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five additional non-informative predictors, x6 through x10).
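To make the data-generating process concrete, here is a minimal hand-rolled sketch of the same simulation; friedman1_sim is a hypothetical helper, not the actual internals of mlbench:
# Minimal sketch of the Friedman (1991) generator (hypothetical helper)
friedman1_sim <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10)   # 10 uniform predictors on [0, 1]
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +   # only x1-x5 carry signal
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5] +
    rnorm(n, sd = sd)                     # Gaussian noise N(0, sd^2)
  list(x = x, y = y)                      # same shape as mlbench.friedman1
}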
The package mlbench contains a function called mlbench.friedman1 that simulates these data.
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(mlbench)
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## mlbench.friedman1 creates a list with a vector 'y' and a matrix
## of predictors 'x'. We convert the 'x' data from a matrix to a
## data frame; one reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)
## Also simulate a large test set to estimate the true error
## rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
#install.packages("earth")
library(earth)
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
marsModel <- earth(x = trainingData$x, y = trainingData$y)
summary(marsModel) # Check for predictor importance
## Call: earth(x=trainingData$x, y=trainingData$y)
##
## coefficients
## (Intercept) 18.451984
## h(0.621722-X1) -11.074396
## h(0.601063-X2) -10.744225
## h(X3-0.281766) 20.607853
## h(0.447442-X3) 17.880232
## h(X3-0.447442) -23.282007
## h(X3-0.636458) 15.150350
## h(0.734892-X4) -10.027487
## h(X4-0.734892) 9.092045
## h(0.850094-X5) -4.723407
## h(X5-0.850094) 10.832932
## h(X6-0.361791) -1.956821
##
## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556 RSS 397.9654 GRSq 0.8968524 RSq 0.9183982
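In the summary above, h(.) denotes the MARS hinge function; a minimal definition for reference:
h <- function(x) pmax(0, x)  # hinge: h(x) = max(0, x), elementwise
# e.g. the basis term h(0.621722 - X1) is nonzero only when X1 < 0.621722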
# Understanding the influence of the predictors
plotmo(marsModel)
## plotmo grid: X1 X2 X3 X4 X5 X6 X7
## 0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
## X8 X9 X10
## 0.497961 0.5288716 0.5359218
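earth also ships evimp(), which ranks the predictors used by the model; a quick check (output not shown):
evimp(marsModel)  # importance by the nsubsets, GCV, and RSS criteria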
library(magrittr)
knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.631337 0.4736866 2.952051
## 7 3.459177 0.5226908 2.803521
## 9 3.345159 0.5615803 2.708064
## 11 3.266588 0.5936149 2.647474
## 13 3.241379 0.6129913 2.612002
## 15 3.231158 0.6289031 2.604464
## 17 3.247220 0.6348270 2.614427
## 19 3.264636 0.6423761 2.639587
## 21 3.271335 0.6510671 2.650748
## 23 3.282586 0.6575603 2.664574
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 15.
# Testing the kNN model on the large test set
predict(knnModel, testData$x) %>%
postResample(pred = ., obs = testData$y)
## RMSE Rsquared MAE
## 3.1750657 0.6785946 2.5443169
# Training a linear regression model
lmModel <- train(x = trainingData$x,
                 y = trainingData$y,
                 method = "lm",
                 preProc = c("center", "scale"),
                 tuneLength = 10)
lmModel
## Linear Regression
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 2.466242 0.7610647 1.955361
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
While k-nearest neighbors (KNN) predicts from the proximity of points in feature space, treating every predictor as equally important (so the five non-informative variables dilute its distance metric), linear regression achieves better results here because it explicitly models the relationship between each predictor and the response, effectively down-weighting the uninformative ones.
predict(lmModel, testData$x) %>%
postResample(pred = ., obs = testData$y)
## RMSE Rsquared MAE
## 2.6970680 0.7084666 2.0600540
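A convenient way to compare the tuned models is caret's resamples(); a sketch (output not shown; strictly, the models should share resampling indices via trainControl(index = ...) for the comparison to be exact):
# Compare the bootstrap resampling distributions side by side
comp <- resamples(list(kNN = knnModel, LM = lmModel))
summary(comp)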
svmrModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14)
svmrModel
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.591797 0.7652097 2.040022
## 0.50 2.383115 0.7830132 1.858793
## 1.00 2.243293 0.8018514 1.738839
## 2.00 2.131571 0.8182703 1.655132
## 4.00 2.093475 0.8232072 1.626784
## 8.00 2.061948 0.8277847 1.602788
## 16.00 2.051696 0.8294611 1.594581
## 32.00 2.051355 0.8295187 1.594280
## 64.00 2.051355 0.8295187 1.594280
## 128.00 2.051355 0.8295187 1.594280
## 256.00 2.051355 0.8295187 1.594280
## 512.00 2.051355 0.8295187 1.594280
## 1024.00 2.051355 0.8295187 1.594280
## 2048.00 2.051355 0.8295187 1.594280
##
## Tuning parameter 'sigma' was held constant at a value of 0.05732269
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05732269 and C = 32.
# Testing the radial SVM model
predict(svmrModel, testData$x) %>%
postResample(pred = ., obs = testData$y)
## RMSE Rsquared MAE
## 2.0617418 0.8276253 1.5668772
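Since the earlier MARS fit is still available, its test set performance can be checked the same way (output not shown):
predict(marsModel, testData$x) %>%
  postResample(pred = ., obs = testData$y)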
Exercise 7.5: Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models. (a) Which nonlinear regression model gives the optimal resampling and test set performance? (b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model? (c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
Exercise 6.3 (for reference): A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials as predictors, measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch.
# Steps used to prepare the data in Exercise 6.3
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
preProcess(ChemicalManufacturingProcess,
           method = c("knnImpute", "BoxCox", "center", "scale")) |>
  predict(ChemicalManufacturingProcess) -> cmp
part <- createDataPartition(cmp$Yield, p = 0.75, list = FALSE)
cmp_train <- cmp[part,]
cmp_test <- cmp[-part,]
dim(cmp_train)
## [1] 132 58
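Note that preProcess() above was fit on the full data set before splitting, so the test rows influence the imputation and scaling. A leakage-free sketch, assuming the partition `part` is created first, would fit the recipe on the training rows only:
# Fit the preprocessing recipe on the training partition only,
# then apply it to both partitions (avoids test-set leakage)
pp <- preProcess(ChemicalManufacturingProcess[part, ],
                 method = c("knnImpute", "BoxCox", "center", "scale"))
cmp_train <- predict(pp, ChemicalManufacturingProcess[part, ])
cmp_test <- predict(pp, ChemicalManufacturingProcess[-part, ])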
# Training a kNN model
knnModel <- train(x = cmp_train[,-1],
                  y = cmp_train$Yield,
                  method = "knn",
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.7966925 0.3596492 0.6357015
## 7 0.7736043 0.3885559 0.6232738
## 9 0.7642366 0.4054645 0.6148636
## 11 0.7627292 0.4088394 0.6168555
## 13 0.7688236 0.3989811 0.6261143
## 15 0.7682704 0.4024564 0.6265736
## 17 0.7713378 0.4012508 0.6259949
## 19 0.7757745 0.3951154 0.6282825
## 21 0.7831882 0.3855136 0.6332646
## 23 0.7891889 0.3787429 0.6398937
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 11.
# Testing the kNN model on the held-out test set
predict(knnModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.7691319 0.5412774 0.6394610
# Training a radial SVM model
svmrModel <- train(x = cmp_train[,-1],
                   y = cmp_train$Yield,
                   method = "svmRadial",
                   tuneLength = 14)
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## (identical warning repeated for every resample; duplicates collapsed)
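The warnings above arise because some bootstrap resamples contain a constant predictor column, which kernlab cannot scale. One way to avoid them, as a sketch, is to drop near-zero-variance predictors before training with caret's nearZeroVar():
# Drop near-zero-variance predictors, if any, before training
nzv <- nearZeroVar(cmp_train[, -1])
if (length(nzv) > 0) cmp_train_x <- cmp_train[, -1][, -nzv]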
svmrModel
## Support Vector Machines with Radial Basis Function Kernel
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.7310350 0.4774658 0.6020568
## 0.50 0.6894586 0.5148293 0.5621725
## 1.00 0.6616257 0.5452638 0.5324502
## 2.00 0.6499563 0.5532766 0.5203021
## 4.00 0.6533609 0.5485653 0.5227168
## 8.00 0.6528786 0.5499635 0.5231998
## 16.00 0.6523688 0.5506442 0.5226165
## 32.00 0.6523688 0.5506442 0.5226165
## 64.00 0.6523688 0.5506442 0.5226165
## 128.00 0.6523688 0.5506442 0.5226165
## 256.00 0.6523688 0.5506442 0.5226165
## 512.00 0.6523688 0.5506442 0.5226165
## 1024.00 0.6523688 0.5506442 0.5226165
## 2048.00 0.6523688 0.5506442 0.5226165
##
## Tuning parameter 'sigma' was held constant at a value of 0.01130335
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01130335 and C = 2.
predict(svmrModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.6755068 0.5925273 0.5773412
The radial SVM produced better results than kNN (test RMSE 0.676 vs. 0.769). Following this, a linear support vector machine was trained. Its RMSE was substantially higher than that of both the radial SVM and kNN, providing evidence that the relationship between the predictors and yield is likely nonlinear, rendering a linear kernel suboptimal.
svmModel <- train(x = cmp_train[,-1],
                  y = cmp_train$Yield,
                  method = "svmLinear",
                  preProc = c("center", "scale"),
                  tuneLength = 14)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: BiologicalMaterial07
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## (this pair of warnings repeats across resamples; duplicates collapsed)
svmModel
## Support Vector Machines with Linear Kernel
##
## 132 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 3.467424 0.1940873 1.169497
##
## Tuning parameter 'C' was held constant at a value of 1
# Testing the linear SVM model
predict(svmModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 2.5636119 0.1260975 0.9884771
Next, a MARS model is trained over a grid of interaction degrees and pruning sizes.
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:14)
marsModel <- train(x = cmp_train[,-1],
                   y = cmp_train$Yield,
                   method = "earth",
                   tuneGrid = marsGrid)
marsModel
## Multivariate Adaptive Regression Spline
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.8138017 0.3531127 0.6463840
## 1 3 0.6704387 0.5553252 0.5448596
## 1 4 0.6916390 0.5356241 0.5519542
## 1 5 0.7315700 0.5115560 0.5723532
## 1 6 0.7388899 0.5070988 0.5741744
## 1 7 0.7835975 0.4719523 0.6003233
## 1 8 0.8039370 0.4620699 0.6090559
## 1 9 0.8746786 0.4495768 0.6168025
## 1 10 0.8674741 0.4373438 0.6208033
## 1 11 0.8791291 0.4284318 0.6308472
## 1 12 0.8948957 0.4293195 0.6356232
## 1 13 0.8965397 0.4380694 0.6358690
## 1 14 1.1667197 0.4061846 0.6925281
## 2 2 0.8412510 0.3109642 0.6719476
## 2 3 0.7173671 0.4971387 0.5748290
## 2 4 0.7483763 0.4717177 0.5868740
## 2 5 0.7493999 0.4711173 0.5866434
## 2 6 0.7871970 0.4490325 0.6136162
## 2 7 1.0303484 0.4328551 0.6496558
## 2 8 0.8025959 0.4516194 0.6188355
## 2 9 0.8036918 0.4616256 0.6175595
## 2 10 0.8308936 0.4462361 0.6307570
## 2 11 1.0493039 0.4302409 0.6735992
## 2 12 1.1103994 0.4115791 0.6901853
## 2 13 1.1371685 0.3987766 0.7074884
## 2 14 0.9352758 0.4124943 0.6781520
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 3 and degree = 1.
# Test set performance of the MARS model
predict(marsModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.6192644 0.6547952 0.5041792
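Since MARS posts the best test RMSE so far, its variable importance is worth inspecting as well; a sketch (output not shown):
varImp(marsModel)             # caret wraps earth's importance measures
plot(varImp(marsModel), top = 20)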
# Training a Neural Network Model
nnetModel <- train(x = cmp_train[,-1],
                   y = cmp_train$Yield,
                   method = "nnet",
                   trace = FALSE,
                   linout = TRUE)
nnetModel
## Neural Network
##
## 132 samples
## 57 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## size decay RMSE Rsquared MAE
## 1 0e+00 1.0125380 0.2507108 0.7998582
## 1 1e-04 1.0036946 0.2562198 0.7984913
## 1 1e-01 0.9582983 0.3286442 0.7520286
## 3 0e+00 1.1046055 0.2654114 0.8763299
## 3 1e-04 1.0092634 0.3062618 0.8105773
## 3 1e-01 0.8516790 0.4190122 0.6789180
## 5 0e+00 1.0022394 0.3250860 0.8010552
## 5 1e-04 0.9625833 0.3579471 0.7608243
## 5 1e-01 0.7835509 0.4776804 0.6231332
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5 and decay = 0.1.
# The neural network's test RMSE and R-squared fall mid-pack among the models tried.
predict(nnetModel, cmp_test[,-1]) %>%
postResample(pred = ., obs = cmp_test$Yield)
## RMSE Rsquared MAE
## 0.8921480 0.5300504 0.6703499
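To answer part (a) systematically, the resampling distributions of all five models can be compared in one place; a sketch (output not shown; an exact comparison again assumes shared resampling indices):
comp <- resamples(list(kNN = knnModel, SVMradial = svmrModel,
                       SVMlinear = svmModel, MARS = marsModel,
                       NNet = nnetModel))
summary(comp)
bwplot(comp, metric = "RMSE")  # lattice box-and-whisker plot of RMSE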
Turning to the questions posed in the exercise:
(a) Among the nonlinear models, the radial SVM achieved the lowest resampling RMSE (0.650), while MARS achieved the lowest test set RMSE (0.619), with the radial SVM close behind (0.676). The linear PLS model from Exercise 6.3 (reported test RMSE 0.634) remained competitive, which suggests that substantial linear structure still drives the predictions.
(b) The analysis revealed that the two most important predictors are manufacturing process variables, and manufacturing process variables outnumber biological ones in the top ten. Notably, the PLS model from Exercise 6.3 identified its top six predictors as exclusively manufacturing processes, followed by a marked drop in importance where the biological materials began to appear.
With the exception of BiologicalMaterial12 entering the top ten in place of BiologicalMaterial08, the list of important predictors remained consistent across the two models, although the ordering differed. This illustrates both the overlap and the differences between linear and nonlinear approaches to ranking feature importance.
# Variable importance for the radial SVM model
plot(varImp(svmrModel), top = 20)
ggplot(data = ChemicalManufacturingProcess,
       aes(y = Yield, x = BiologicalMaterial12)) +
  geom_point() +
  labs(title = "Yield vs. Biological Material 12",
       y = "Yield (%)",
       x = "Biological Material 12 (Units)",
       caption = "Source: Chemical Manufacturing Process Data") +
  theme_minimal()
(c) Visualizing BiologicalMaterial12, the predictor unique to the nonlinear model's top ten, makes it clear why it carried more weight there than in the linear model. Although a slight positive linear trend is visible, the relationship looks closer to parabolic: yield rises as BiologicalMaterial12 increases from about 18 to 21 units, then tends to fall beyond that point. To maximize yield, holding BiologicalMaterial12 near 21 units therefore appears advisable.
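A smoother makes the curvature easier to judge; a sketch using ggplot2's loess fit:
ggplot(ChemicalManufacturingProcess,
       aes(x = BiologicalMaterial12, y = Yield)) +
  geom_point() +
  geom_smooth(method = "loess", se = TRUE) +  # local fit highlights the bend
  theme_minimal()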