7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\)

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five additional non-informative predictors). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

library(mlbench)
library(caret)  # for featurePlot(), train(), and postResample()
set.seed(888)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)

## or other methods.

## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Tune several models on these data. For example:

knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)

knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.860616  0.4601607  3.130380
##    7  3.755009  0.4910550  3.077735
##    9  3.666459  0.5193907  3.006640
##   11  3.621065  0.5395638  2.972521
##   13  3.601361  0.5532723  2.957626
##   15  3.602722  0.5613166  2.959610
##   17  3.592971  0.5750973  2.941233
##   19  3.600254  0.5815636  2.954610
##   21  3.594421  0.5930504  2.946189
##   23  3.606967  0.5976664  2.959164
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = knnPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 3.2174703 0.6840352 2.5792401

knn

As noted above, in the example model given for this exercise:

  • The final value used for the model was k = 17
  • The \(R^2\) tells us that 68% of the variation in the test data can be predicted using this model (it was 58% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 3.217 (it was 3.593 in training); the formula is given below
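
For reference, the RMSE quoted throughout these exercises is

\(RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\)

where \(\hat{y}_i\) is the model's prediction for observation \(i\), so the error is reported on the same scale as the response.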

We can plot the model to see how the RMSE varied with the choice of k.

plot(knnModel)

MARS

Trying a MARS model:

  • The final values used for the model were nprune = 10 and degree = 2
  • The \(R^2\) tells us that 90% of the variation in the test data can be predicted using this model (it was 88% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 1.577 (it was 1.819 in training)

This is therefore an improvement over the knn model.

Plotting the MARS model, we can see how the RMSE varied with the number of terms (nprune) and the degree.

varImp() is used to show the importance of variables in the MARS model.

marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:10)

mars <- train(x = trainingData$x,
              y = trainingData$y,
              method = "earth",
              tuneGrid = marsGrid)

mars
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.282476  0.3405743  3.544470
##   1        3      3.511856  0.5556841  2.749261
##   1        4      3.059520  0.6636445  2.383107
##   1        5      2.752090  0.7297090  2.196361
##   1        6      2.576652  0.7652359  2.051358
##   1        7      2.218664  0.8230322  1.743615
##   1        8      2.081278  0.8454350  1.598307
##   1        9      2.005701  0.8568815  1.513628
##   1       10      2.038621  0.8511071  1.519090
##   2        2      4.282476  0.3405743  3.544470
##   2        3      3.579351  0.5386879  2.823563
##   2        4      3.185307  0.6385522  2.497635
##   2        5      2.919317  0.7003977  2.316884
##   2        6      2.750357  0.7324939  2.162438
##   2        7      2.480682  0.7799509  1.956363
##   2        8      2.325418  0.8057844  1.796541
##   2        9      2.156192  0.8340017  1.655812
##   2       10      1.819411  0.8803590  1.409258
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 10 and degree = 2.
marsPred <- predict(mars, newdata = testData$x)

postResample(pred = marsPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 1.5771575 0.9046325 1.2648048
plot(mars)

varImp(mars)
## earth variable importance
## 
##    Overall
## X4  100.00
## X2   66.22
## X1   56.29
## X5   35.11
## X3    0.00

SVM

Trying an SVM model:

  • The final values used for the model were \(\sigma\) = 0.05907177 and C = 16
  • The \(R^2\) tells us that 82% of the variation in the test data can be predicted using this model (it was 84% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 2.154 (it was 2.090 in training)

This is better than knn, but not as good as MARS.
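
For context, the radial basis kernel used by method = "svmRadial" (from kernlab) is \(K(x, x') = e^{-\sigma \lVert x - x' \rVert^2}\), so larger values of \(\sigma\) make the kernel more local. caret estimates a sensible \(\sigma\) with kernlab's sigest() and tunes only the cost C, which is why sigma is reported as "held constant" in the output below.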

svm <- train(x = trainingData$x,
             y = trainingData$y,
             method = "svmRadial",
             preProc = c("center", "scale"),
             tuneLength = 14,
             trControl = trainControl(method = "cv"))


svm
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE      Rsquared   MAE     
##      0.25  2.844930  0.7549411  2.291793
##      0.50  2.576504  0.7733652  2.046986
##      1.00  2.410927  0.7899479  1.913474
##      2.00  2.230765  0.8186050  1.759713
##      4.00  2.123520  0.8369284  1.669599
##      8.00  2.105953  0.8426725  1.669878
##     16.00  2.090359  0.8439848  1.664350
##     32.00  2.100793  0.8413428  1.675672
##     64.00  2.100793  0.8413428  1.675672
##    128.00  2.100793  0.8413428  1.675672
##    256.00  2.100793  0.8413428  1.675672
##    512.00  2.100793  0.8413428  1.675672
##   1024.00  2.100793  0.8413428  1.675672
##   2048.00  2.100793  0.8413428  1.675672
## 
## Tuning parameter 'sigma' was held constant at a value of 0.05907177
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05907177 and C = 16.
svmPred <- predict(svm, newdata = testData$x)

postResample(pred = svmPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.1542198 0.8158619 1.6843622
svm$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 16 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0590717746467405 
## 
## Number of Support Vectors : 158 
## 
## Objective Function Value : -99.637 
## Training error : 0.008854

nnet

With an nnet model:

  • The final values used for the model were size = 3, decay = 0.01 and bag = FALSE
  • The \(R^2\) tells us that 85% of the variation in the test data can be predicted using this model (it was 74% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 1.943 (it was 2.699 in training)

Better than both knn and SVM on the test set, but still not as good as MARS.
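
A note on the MaxNWts argument in the code below: a single-hidden-layer nnet with \(p\) inputs, \(h\) hidden units, and one linear output has \(h(p + 1) + (h + 1)\) weights, so the expression 10*(ncol(trainingData$x)+1)+10+1 = 121 is exactly the number of weights needed by the largest candidate network here (size = 10, p = 10).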

nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
                        .size = c(1:10),
                        .bag = FALSE)

nnet <- train(x = trainingData$x,
              y = trainingData$y,
              tuneGrid = nnetGrid,
              method = "avNNet",
              preProc = c("center", "scale"),
              linout = TRUE,
              trace = FALSE,
              MaxNWts = 10*(ncol(trainingData$x)+1)+10+1,
              maxit = 500)


nnet
## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.910359  0.7052616  2.291559
##   0.00    2    2.891316  0.7059847  2.275294
##   0.00    3    2.826052  0.7256298  2.127671
##   0.00    4    3.056024  0.6914895  2.316239
##   0.00    5    3.528615  0.6370737  2.489970
##   0.00    6    5.169135  0.4783460  3.372186
##   0.00    7    5.661825  0.4427716  3.555873
##   0.00    8    5.143869  0.4705707  3.520602
##   0.00    9    3.400229  0.6333138  2.551793
##   0.00   10    3.111584  0.6641882  2.461360
##   0.01    1    2.867377  0.7131098  2.235117
##   0.01    2    2.886182  0.7071470  2.269346
##   0.01    3    2.699269  0.7427663  2.096798
##   0.01    4    2.761483  0.7324298  2.159188
##   0.01    5    2.834302  0.7196884  2.200141
##   0.01    6    3.060042  0.6864469  2.406364
##   0.01    7    3.088678  0.6793872  2.423030
##   0.01    8    2.993779  0.6915515  2.336350
##   0.01    9    2.961494  0.6974550  2.318784
##   0.01   10    3.052464  0.6788714  2.388105
##   0.10    1    2.831871  0.7169271  2.210673
##   0.10    2    2.908004  0.7019150  2.266091
##   0.10    3    2.701724  0.7391899  2.094884
##   0.10    4    2.707163  0.7418123  2.082118
##   0.10    5    2.789224  0.7244677  2.203423
##   0.10    6    2.928365  0.7043891  2.268106
##   0.10    7    2.957806  0.7035750  2.308985
##   0.10    8    2.975297  0.6960755  2.313736
##   0.10    9    2.862723  0.7131205  2.232737
##   0.10   10    2.761268  0.7298741  2.165462
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 3, decay = 0.01 and bag = FALSE.
nnetPred <- predict(nnet, newdata = testData$x)

postResample(pred = nnetPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 1.9425148 0.8536435 1.5311263

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

The MARS model appears to give the best performance (test RMSE 1.577, \(R^2\) = 0.90).

Yes, the MARS model selects only the informative predictors X1–X5 and none of the noise variables X6–X10, although X3 receives an importance score of zero.
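
One quick way to confirm this programmatically (a sketch, assuming the mars train object from above is still in the session) is caret's predictors() generic, which lists the variables appearing in the final model:

## The noise variables X6-X10 should be absent from this list
predictors(mars)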

7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Pre-processing used in 6.3

library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)

## Impute missing values with k-nearest neighbors
cmp <- preProcess(ChemicalManufacturingProcess,
                  method = "knnImpute")
cmp <- predict(cmp, ChemicalManufacturingProcess)

## Split the data 80/20
set.seed(888)

train_rw2 <- createDataPartition(cmp$Yield,
                                 p = 0.8,
                                 list = FALSE)

train_cmp <- cmp[train_rw2, -1]
train_yld <- cmp$Yield[train_rw2]

test_cmp <- cmp[-train_rw2, -1]
test_yld <- cmp$Yield[-train_rw2]

## Drop near-zero-variance predictors (identified on the training set)
train_cmp2 <- train_cmp[, -nearZeroVar(train_cmp)]
test_cmp2 <- test_cmp[, -nearZeroVar(train_cmp)]

knn

Here, the knn model results were:

  • The final value used for the model was k = 21
  • The \(R^2\) tells us that 36% of the variation in the test data can be predicted using this model (it was 44% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 0.725 (it was 0.796 in training)

The plot shows how the RMSE varied with choice of k.

set.seed(888)
knn2 <- train(train_cmp2,
              train_yld,
              method = "knn",
              tuneLength = 10,
              preProcess = c("center", "scale"))

knn2
## k-Nearest Neighbors 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.8038570  0.3941336  0.6260692
##    7  0.8073666  0.3877304  0.6384033
##    9  0.7978396  0.4033837  0.6404796
##   11  0.8003472  0.4052019  0.6437881
##   13  0.7990918  0.4095493  0.6430512
##   15  0.8020644  0.4105543  0.6487954
##   17  0.7986863  0.4198387  0.6457158
##   19  0.7966959  0.4275125  0.6432641
##   21  0.7962042  0.4373037  0.6418541
##   23  0.7986513  0.4423130  0.6449602
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
knn2Pred <- predict(knn2, newdata = test_cmp2)

postResample(pred = knn2Pred, obs = test_yld)
##      RMSE  Rsquared       MAE 
## 0.7254909 0.3602139 0.5779304
plot(knn2)

MARS

  • The final values used for the model were nprune = 4 and degree = 1
  • The \(R^2\) tells us that 46% of the variation in the test data can be predicted (it was 68% in training)
  • RMSE is 0.668 (0.589 in training)

This is an improvement over the knn model.

marsGrid2 <- expand.grid(.degree = 1:2, .nprune = 2:10)

mars2 <- train(x = train_cmp2,
               y = train_yld,
               method = "earth",
               tuneGrid = marsGrid2,
               trControl = trainControl(method = "cv"),
               preProcess = c("center", "scale"))

mars2
## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 128, 130, 128, 129, 130, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.7756591  0.4781901  0.6026055
##   1        3      0.6260804  0.6564901  0.4958041
##   1        4      0.5888955  0.6778981  0.4751589
##   1        5      0.6186650  0.6467308  0.4961045
##   1        6      0.6177312  0.6470263  0.5047212
##   1        7      0.6290476  0.6295109  0.5184378
##   1        8      0.6280382  0.6274032  0.5128726
##   1        9      0.6447224  0.6234985  0.5245790
##   1       10      0.6807522  0.5950872  0.5614279
##   2        2      0.7756591  0.4781901  0.6026055
##   2        3      0.6483686  0.6230736  0.5139154
##   2        4      0.6167836  0.6465433  0.4914388
##   2        5      0.6298447  0.6354757  0.5006078
##   2        6      0.6583111  0.6206833  0.5101689
##   2        7      0.6677018  0.6143335  0.5181517
##   2        8      0.6820080  0.6036207  0.5254510
##   2        9      0.6757066  0.6164303  0.5208933
##   2       10      0.6970403  0.5989016  0.5326307
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 4 and degree = 1.
mars2Pred <- predict(mars2, newdata = test_cmp2)

postResample(pred = mars2Pred, obs = test_yld)
##      RMSE  Rsquared       MAE 
## 0.6682436 0.4618526 0.5367781
varImp(mars2)
## earth variable importance
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess09   49.18
## ManufacturingProcess13    0.00
plot(mars2)
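
Since only three predictors register any importance, it can also help to inspect the hinge functions the final model actually kept. A quick check, assuming the mars2 object from above:

## Print the selected MARS terms and their coefficients
summary(mars2$finalModel)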

SVM

Trying an SVM model:

  • The final values used for the model were \(\sigma\) = 0.01542519 and C = 8
  • The \(R^2\) tells us that 65% of the variation in the test data can be predicted using this model (it was 66% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 0.552 (it was 0.625 in training)

This is better than both knn and MARS, making it the best model so far.

svm2 <- train(x = train_cmp2,
              y = train_yld,
              method = "svmRadial",
              preProc = c("center", "scale"),
              tuneLength = 14,
              trControl = trainControl(method = "cv"))


svm2
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 129, 129, 128, 130, 131, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   MAE      
##      0.25  0.7811359  0.5062586  0.6469410
##      0.50  0.7105396  0.5592116  0.5864744
##      1.00  0.6633066  0.6066053  0.5410931
##      2.00  0.6458475  0.6295793  0.5214597
##      4.00  0.6296182  0.6517828  0.5003175
##      8.00  0.6249926  0.6556208  0.5004709
##     16.00  0.6249926  0.6556208  0.5004709
##     32.00  0.6249926  0.6556208  0.5004709
##     64.00  0.6249926  0.6556208  0.5004709
##    128.00  0.6249926  0.6556208  0.5004709
##    256.00  0.6249926  0.6556208  0.5004709
##    512.00  0.6249926  0.6556208  0.5004709
##   1024.00  0.6249926  0.6556208  0.5004709
##   2048.00  0.6249926  0.6556208  0.5004709
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01542519
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01542519 and C = 8.
svm2Pred <- predict(svm2, newdata = test_cmp2)

postResample(pred = svm2Pred, obs = test_yld)
##      RMSE  Rsquared       MAE 
## 0.5524226 0.6527462 0.4592831
svm2$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 8 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0154251875282437 
## 
## Number of Support Vectors : 121 
## 
## Objective Function Value : -78.5658 
## Training error : 0.008928

nnet

The nnet model:

  • The final values used for the model were size = 2, decay = 0.1 and bag = FALSE
  • The \(R^2\) tells us that 49% of the variation in the test data can be predicted using this model (it was 50% in training)
  • RMSE tells us that the square root of the average squared difference between predictions and actual observations on the test data is 0.676 (it was 0.810 in training)

This is better than knn and comparable to MARS (slightly worse test RMSE, slightly better \(R^2\)), but not as good as SVM.

nnet2Grid <- expand.grid(.decay = c(0, 0.01, .1),
                         .size = c(1:10),
                         .bag = FALSE)
set.seed(888)

nnet2 <- train(x = train_cmp2,
               y = train_yld,
               tuneGrid = nnet2Grid,
               method = "avNNet",
               preProc = c("center", "scale"),
               linout = TRUE,
               trace = FALSE,
               ## note: carried over from 7.2; caps the network at 121 weights
               MaxNWts = 10*(ncol(trainingData$x)+1)+10+1,
               maxit = 500)
nnet2
## Model Averaged Neural Network 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 144, 144, 144, 144, 144, 144, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.9078320  0.3255887  0.7218284
##   0.00    2    1.0186784  0.3559190  0.7899344
##   0.00    3          NaN        NaN        NaN
##   0.00    4          NaN        NaN        NaN
##   0.00    5          NaN        NaN        NaN
##   0.00    6          NaN        NaN        NaN
##   0.00    7          NaN        NaN        NaN
##   0.00    8          NaN        NaN        NaN
##   0.00    9          NaN        NaN        NaN
##   0.00   10          NaN        NaN        NaN
##   0.01    1    0.9559450  0.3337787  0.7444062
##   0.01    2    0.9291364  0.4082701  0.7273691
##   0.01    3          NaN        NaN        NaN
##   0.01    4          NaN        NaN        NaN
##   0.01    5          NaN        NaN        NaN
##   0.01    6          NaN        NaN        NaN
##   0.01    7          NaN        NaN        NaN
##   0.01    8          NaN        NaN        NaN
##   0.01    9          NaN        NaN        NaN
##   0.01   10          NaN        NaN        NaN
##   0.10    1    0.9680074  0.3514304  0.7543096
##   0.10    2    0.8098539  0.5010686  0.6325175
##   0.10    3          NaN        NaN        NaN
##   0.10    4          NaN        NaN        NaN
##   0.10    5          NaN        NaN        NaN
##   0.10    6          NaN        NaN        NaN
##   0.10    7          NaN        NaN        NaN
##   0.10    8          NaN        NaN        NaN
##   0.10    9          NaN        NaN        NaN
##   0.10   10          NaN        NaN        NaN
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 2, decay = 0.1 and bag = FALSE.
nnet2Pred <- predict(nnet2, newdata = test_cmp2)

postResample(pred = nnet2Pred, obs = test_yld)
##      RMSE  Rsquared       MAE 
## 0.6763186 0.4862515 0.5483451
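
The NaN rows in the resampling table occur because MaxNWts was carried over from Exercise 7.2: with 10 predictors it evaluates to 121 weights, but a network on these 56 predictors needs \(h(56 + 1) + (h + 1)\) weights (175 already at size = 3), so every size of 3 or more failed to fit. A sketch of a corrected call (an assumption on my part; rerunning it would change the results shown above):

nnet2b <- train(x = train_cmp2,
                y = train_yld,
                tuneGrid = nnet2Grid,
                method = "avNNet",
                preProc = c("center", "scale"),
                linout = TRUE,
                trace = FALSE,
                ## large enough for the biggest candidate network (size = 10)
                MaxNWts = 10*(ncol(train_cmp2)+1)+10+1,
                maxit = 500)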

(a)

Which nonlinear regression model gives the optimal resampling and test set performance?

The SVM model gave the best resampling performance (RMSE 0.625) and the best test set performance (RMSE 0.552, \(R^2\) = 0.65).
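
A compact side-by-side view of the test set results (a sketch, assuming the prediction objects computed above are still in the session):

## Test set RMSE, R-squared, and MAE for all four models
rbind(knn  = postResample(knn2Pred,  obs = test_yld),
      MARS = postResample(mars2Pred, obs = test_yld),
      SVM  = postResample(svm2Pred,  obs = test_yld),
      nnet = postResample(nnet2Pred, obs = test_yld))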

(b)

Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

See the list of predictors below. There are 6 Manufacturing and 4 Biological predictors in the top 10, so Manufacturing is more important, though I am not sure I would say it 'dominates'.

Comparing to last week's results with the linear model:

  • ManufacturingProcess (MP) 13 and 32 are again the top 2, though their order is switched
  • MP17, MP09 and MP36 are also in the top 10 both here and with the linear model
  • Everything in the top 10 here is in the top 20 for the linear model, but some predictors rank much lower there
  • The ratio of Manufacturing to Biological predictors in the top 10 was 7:3 for the linear model (vs. 6:4 here), so Manufacturing was more 'dominant' there (a sketch for reproducing this comparison follows the importance table below)
varImp(svm2)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 56)
## 
##                        Overall
## ManufacturingProcess13  100.00
## ManufacturingProcess32   97.84
## ManufacturingProcess17   92.13
## BiologicalMaterial06     84.18
## BiologicalMaterial12     79.26
## ManufacturingProcess09   77.42
## ManufacturingProcess36   74.90
## BiologicalMaterial03     71.81
## BiologicalMaterial02     67.35
## ManufacturingProcess06   61.14
## ManufacturingProcess31   57.83
## ManufacturingProcess11   54.63
## ManufacturingProcess33   49.86
## BiologicalMaterial11     47.61
## ManufacturingProcess29   45.99
## ManufacturingProcess12   40.70
## BiologicalMaterial01     40.67
## BiologicalMaterial04     40.23
## ManufacturingProcess30   37.13
## BiologicalMaterial08     36.81
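
To reproduce the top-ten comparison, something like the following works (a sketch: linear_fit is a hypothetical stand-in for the optimal linear model object from Exercise 6.3, which is not defined in this document):

## Extract the ten highest-ranked predictors from a caret train object
top10 <- function(fit) {
  imp <- varImp(fit)$importance
  head(rownames(imp)[order(imp$Overall, decreasing = TRUE)], 10)
}
top10(svm2)                                 # nonlinear model's top ten
intersect(top10(svm2), top10(linear_fit))   # overlap with the linear model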

(c)

Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

There are 4 predictors in the top 20 that appear in the optimal nonlinear regression model but did not appear in the optimal linear regression model. They are all Manufacturing Process (MP) variables: MP31, MP29, MP12, and MP30.

Plotting them vs. Yield shows that:

  • MP12 takes a value of either -0.5 or 2, and as Yield increases it is more likely to be 2
  • MP30 looks to have one outlier, but otherwise clusters between -2 and 2, offering no clear intuition about Yield
  • MP31 and MP29 both have a couple of outliers, but otherwise cluster very close to 0, again offering no intuition about Yield

A corrplot reveals much the same.

library(dplyr)     # for select() and rename()
library(corrplot)  # for corrplot.mixed()

uniq <- cmp |>
  select(Yield,
         ManufacturingProcess31,
         ManufacturingProcess29,
         ManufacturingProcess12,
         ManufacturingProcess30) |>
  rename(MP31 = ManufacturingProcess31,
         MP29 = ManufacturingProcess29,
         MP12 = ManufacturingProcess12,
         MP30 = ManufacturingProcess30)

featurePlot(uniq[, -1], uniq$Yield)

uniq |>
  cor() |>
  corrplot.mixed(tl.srt = 45, tl.cex = 0.5, na.label = "square",
                 na.label.col = "lightgrey", tl.col = 'black',
                 number.cex = 0.5)
  corrplot.mixed(tl.srt = 45, tl.cex = 0.5, na.label = "square", na.label.col = "lightgrey", tl.col = 'black', number.cex = 0.5)