library(tidyverse)
library(corrplot)
library(caret)
library(e1071)
library(kernlab)
library(earth)
library(nnet)
library(RSNNS)
library(mlbench)
library(AppliedPredictiveModeling)
Friedman (1991) introduced several benchmark data sets created by simulation. ... Tune several models on these data. Which models appear to give the best performance? Does MARS select the informative predictors (those named X1-X5)?
set.seed(624)
train_f1 <- mlbench.friedman1(200, sd = 1) #200 observations
train_f1$x <- data.frame(train_f1$x)
test_f1 <- mlbench.friedman1(5000, sd = 1) #5000 observations
test_f1$x <- data.frame(test_f1$x)
featurePlot(train_f1$x, train_f1$y)
In general, the predictors do not appear strongly related to the response. Notably, predictor X4 displays a weakly positive relationship with the response.
sum(is.na(train_f1$x))
## [1] 0
train_f1$x %>%
  gather() %>%
  ggplot(aes(x = "", y = value)) +
  facet_wrap(~ key, scales = "free") +
  geom_boxplot() +
  coord_flip() +
  labs(x = NULL, y = NULL)
corr <- cor(train_f1$x, method = "spearman")
corrplot::corrplot(corr)
The data are simulated, so zero missingness is expected, and each predictor's distribution is symmetrical and essentially the same in spread. Further, per Spearman's correlation, none of the predictors are highly correlated with any others. Beyond centering and scaling, which are generally recommended for the subsequent nonlinear methods, additional pre-processing appears unnecessary.
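Since every model below reuses the same resampling scheme and the same pre-processing recipe, both can be defined once up front. The lines below are a minimal sketch of that idea (the names ctrl_cv and pp are placeholders, not part of the analysis that follows):
ctrl_cv <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation, reusable in each train() call
pp      <- c("center", "scale")                      # standardize predictors before fitting
# e.g., caret::train(x = train_f1$x, y = train_f1$y, method = "knn",
#                    preProc = pp, tuneLength = 10, trControl = ctrl_cv)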
set.seed(624)
(knn_f1 <- caret::train(x = train_f1$x,
                        y = train_f1$y,
                        method = "knn",
                        preProc = c("center", "scale"),
                        tuneLength = 10,
                        trControl = trainControl(method = "cv")))
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.069204 0.6088491 2.433501
## 7 3.011768 0.6567422 2.399211
## 9 3.037390 0.6618269 2.452561
## 11 3.038967 0.6719378 2.478658
## 13 3.084122 0.6617994 2.526191
## 15 3.152288 0.6453826 2.556748
## 17 3.148509 0.6620085 2.560961
## 19 3.123090 0.6905532 2.541291
## 21 3.172754 0.6951162 2.578561
## 23 3.230072 0.6895217 2.625250
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 7.
Using 10-fold cross-validation to tune the k parameter, the optimal KNN model with 7 neighbors minimizes RMSE at approximately 3.01.
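As a quick visual check of the tuning results, caret's ggplot method can draw the RMSE profile across k directly from the fitted train object; a minimal sketch:
ggplot(knn_f1) +
  labs(title = "KNN tuning profile (10-fold CV)", y = "RMSE")  # RMSE vs. number of neighbors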
set.seed(624)
knnpred_f1 <- predict(knn_f1, newdata = test_f1$x)
postResample(pred = knnpred_f1, obs = test_f1$y)
## RMSE Rsquared MAE
## 3.285951 0.618920 2.636732
Evaluating on the test set returns an RMSE of approximately 3.29, which is, as expected, higher than the cross-validated training RMSE.
set.seed(624)
(svm_f1 <- caret::train(x = train_f1$x,
                        y = train_f1$y,
                        method = "svmRadial",
                        preProc = c("center", "scale"),
                        tuneLength = 14,
                        trControl = trainControl(method = "cv")))
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.584231 0.7797669 2.091152
## 0.50 2.360069 0.7912991 1.943972
## 1.00 2.113758 0.8196184 1.750612
## 2.00 2.017285 0.8317397 1.648217
## 4.00 1.965667 0.8395102 1.582769
## 8.00 1.984661 0.8331146 1.567490
## 16.00 2.025350 0.8255122 1.602328
## 32.00 2.027394 0.8251813 1.605477
## 64.00 2.027394 0.8251813 1.605477
## 128.00 2.027394 0.8251813 1.605477
## 256.00 2.027394 0.8251813 1.605477
## 512.00 2.027394 0.8251813 1.605477
## 1024.00 2.027394 0.8251813 1.605477
## 2048.00 2.027394 0.8251813 1.605477
##
## Tuning parameter 'sigma' was held constant at a value of 0.05960274
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05960274 and C = 4.
Kuhn and Johnson note that, of the various kernel functions, the radial basis function is generally effective. Using that kernel along with 10-fold cross-validation to tune \(\sigma\) and cost, the optimal SVM model with \(\sigma\) of approximately 0.06 and cost of 4 minimizes RMSE at approximately 1.97.
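The fixed \(\sigma\) reported above is not tuned over a grid; caret's svmRadial method estimates it analytically from the data. A sketch of the presumed underlying calculation, using kernlab::sigest:
sig_range <- kernlab::sigest(as.matrix(train_f1$x), scaled = TRUE)  # low/median/high percentile estimates of sigma
sig_range
mean(sig_range[-2])  # roughly the value held constant during tuning (~0.06 for these data)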
set.seed(624)
svmpred_f1 <- predict(svm_f1, newdata = test_f1$x)
postResample(pred = svmpred_f1, obs = test_f1$y)
## RMSE Rsquared MAE
## 2.1310844 0.8281178 1.6484015
Evaluating on the test set returns an RMSE of approximately 2.13, again higher than the cross-validated training RMSE but lower than the corresponding test RMSE for the optimal KNN model.
set.seed(624)
(mars_f1 <- caret::train(x = train_f1$x,
                         y = train_f1$y,
                         method = "earth",
                         trControl = trainControl(method = "cv")))
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## nprune RMSE Rsquared MAE
## 2 3.922423 0.3603280 3.329203
## 8 1.886486 0.8643516 1.521238
## 15 1.907564 0.8652765 1.544721
##
## Tuning parameter 'degree' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 8 and degree = 1.
Using 10-fold cross-validation to tune the number of retained terms (degree is held constant at 1), the optimal, first-degree MARS model with 8 terms minimizes RMSE at approximately 1.89.
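The grid above relies on earth's default of degree = 1. If interactions among predictors were of interest, the degree could be tuned as well; a sketch (not run here) of an expanded grid:
mars_grid <- expand.grid(degree = 1:2, nprune = 2:20)  # allow second-degree hinge interactions
# mars2_f1 <- caret::train(x = train_f1$x, y = train_f1$y, method = "earth",
#                          tuneGrid = mars_grid, trControl = trainControl(method = "cv"))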
set.seed(624)
marspred_f1 <- predict(mars_f1, newdata = test_f1$x)
postResample(pred = marspred_f1, obs = test_f1$y)
## RMSE Rsquared MAE
## 1.9506231 0.8595922 1.5182805
Evaluating on the test set returns an RMSE of approximately 1.95, which is, as expected, higher than the cross-validated training RMSE. It is also lower than the test RMSE for both the KNN and SVM models.
varImp(mars_f1)
## earth variable importance
##
## Overall
## X4 100.00
## X1 70.32
## X2 46.62
## X5 28.97
## X3 0.00
The optimal MARS model does select the informative predictors: its importance table contains only X1-X5, and the uninformative X6-X10 are excluded entirely, although X3 receives zero importance in this first-degree fit.
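earth's own importance measures and the retained hinge terms tell the same story; a short sketch for cross-checking the caret-based ranking above:
earth::evimp(mars_f1$finalModel)  # nsubsets / GCV / RSS importance from earth itself
summary(mars_f1$finalModel)       # lists the hinge functions kept after pruning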
set.seed(624)
nnetGrid <- expand.grid(.decay = c(0, 0.01, 1),
                        .size = 1:10)
(nnet_f1 <- caret::train(x = train_f1$x,
                         y = train_f1$y,
                         method = "nnet",
                         tuneGrid = nnetGrid,
                         preProc = c("center", "scale"),
                         trControl = trainControl(method = "cv"),
                         linout = TRUE,
                         trace = FALSE))
## Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 2.752369 0.6832636 2.204780
## 0.00 2 2.745469 0.6902325 2.184996
## 0.00 3 2.660248 0.6957751 2.187869
## 0.00 4 2.462959 0.7707206 1.973344
## 0.00 5 2.827899 0.7053522 2.285789
## 0.00 6 2.933040 0.7208237 2.285767
## 0.00 7 3.173320 0.6119132 2.570983
## 0.00 8 3.320302 0.6200110 2.596053
## 0.00 9 3.515911 0.6126600 2.681617
## 0.00 10 3.764635 0.5856651 3.045548
## 0.01 1 2.570137 0.7233154 2.093350
## 0.01 2 2.697154 0.6935491 2.224286
## 0.01 3 2.434795 0.7538776 1.974477
## 0.01 4 2.341151 0.7808289 1.865100
## 0.01 5 2.608191 0.7329522 2.146297
## 0.01 6 3.162233 0.6445225 2.455594
## 0.01 7 3.408134 0.5635911 2.591209
## 0.01 8 3.767024 0.5421410 2.783466
## 0.01 9 3.273421 0.6185443 2.597598
## 0.01 10 3.258314 0.6202967 2.537709
## 1.00 1 2.535983 0.7299049 2.042844
## 1.00 2 2.452274 0.7476484 1.955862
## 1.00 3 2.196975 0.7940627 1.790236
## 1.00 4 2.194446 0.8023363 1.785340
## 1.00 5 2.262647 0.7870007 1.836427
## 1.00 6 2.279762 0.7931883 1.804620
## 1.00 7 2.228026 0.7867747 1.799120
## 1.00 8 2.610867 0.7127715 2.080618
## 1.00 9 2.262709 0.7864614 1.842691
## 1.00 10 2.246225 0.7957875 1.825513
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 4 and decay = 1.
Using the book-suggested candidate set of models as well as 10-fold cross-validation to tune the parameters, the optimal neural network model with size of 4 and decay of 1 minimizes RMSE at approximately 2.19.
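Kuhn and Johnson also suggest averaging several networks fit from different random starting weights, which caret exposes as method = "avNNet". A sketch of that variant (not run here; the bag = FALSE and maxit settings are assumptions):
# avnnet_f1 <- caret::train(x = train_f1$x, y = train_f1$y,
#                           method = "avNNet",
#                           tuneGrid = expand.grid(.decay = c(0, 0.01, 1),
#                                                  .size = 1:10,
#                                                  .bag = FALSE),
#                           preProc = c("center", "scale"),
#                           trControl = trainControl(method = "cv"),
#                           linout = TRUE, trace = FALSE, maxit = 500)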
set.seed(624)
nnetpred_f1 <- predict(nnet_f1, newdata = test_f1$x)
postResample(pred = nnetpred_f1, obs = test_f1$y)
## RMSE Rsquared MAE
## 2.3875982 0.7794368 1.8641057
Evaluating on the test set returns an RMSE of approximately 2.39, higher than the cross-validated training RMSE and also higher than the MARS model's test RMSE.
Of the four models, the MARS model returns the smallest test RMSE (~1.95). It is also the model that selects the informative predictors (X1-X5) while assigning no importance to the noise predictors X6-X10.
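Because each model was tuned with the same seed and the same 10-fold scheme, the fold-level resampling results should line up and can be compared directly; a sketch using caret's resamples():
resamps_f1 <- resamples(list(KNN = knn_f1, SVM = svm_f1, MARS = mars_f1, NNet = nnet_f1))
summary(resamps_f1)                   # fold-level RMSE / R-squared / MAE for each model
dotplot(resamps_f1, metric = "RMSE")  # visual comparison of cross-validated RMSE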
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
The code and descriptions below are adapted directly from the Week 10 Homework assignment.
data("ChemicalManufacturingProcess")
imputed <- impute::impute.knn(as.matrix(ChemicalManufacturingProcess), rng.seed = 624)
cmp <- as.data.frame(imputed$data)
sum(is.na(cmp))
## [1] 0
The data set contains 57 predictors (12 describing the input biological material and 45 describing the manufacturing process) for 176 manufacturing runs. Yield contains the percent yield for each run.
Data are assumed to be (at least) missing at random for the purposes of this exercise, and KNN imputation with 10 neighbors is used to estimate the 106 missing values in the set.
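For reference, a quick sketch of where those missing values sit before imputation:
colSums(is.na(ChemicalManufacturingProcess)) %>%
  sort(decreasing = TRUE) %>%
  head(10)   # predictors with the most missing values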
set.seed(624)
index <- createDataPartition(cmp$Yield, p = .80, list = FALSE)
cmp_train <- cmp[index,] # 144 observations
cmp_test <- cmp[-index,] # 32 observations
An 80/20 split is used to create a training set of 144 runs and a test set of 32 runs.
ex <- nearZeroVar(cmp_train[-1], saveMetrics = TRUE)
ex %>% arrange(-freqRatio, percentUnique, -nzv) %>% head()
## freqRatio percentUnique zeroVar nzv
## BiologicalMaterial07 71.000000 1.388889 FALSE TRUE
## ManufacturingProcess41 6.500000 2.777778 FALSE FALSE
## ManufacturingProcess28 5.400000 14.583333 FALSE FALSE
## ManufacturingProcess12 4.760000 1.388889 FALSE FALSE
## ManufacturingProcess34 4.636364 6.250000 FALSE FALSE
## ManufacturingProcess40 4.333333 1.388889 FALSE FALSE
sum(ex$nzv)
## [1] 1
A check for near-zero variance predictors returns just one: BiologicalMaterial07, with a frequency ratio of approximately 71.
corr <- cor((cmp_train %>% select(-c("Yield","BiologicalMaterial07"))), method = "spearman")
corrplot::corrplot(corr)
hicorr <- findCorrelation(corr)
set.seed(624)
cmp_train_slim <- cmp_train %>%
  select(-c(Yield, BiologicalMaterial07)) %>%
  select(-all_of(hicorr))
cmp_train_transform <- cmp_train_slim %>%
  preProcess(method = c("BoxCox", "center", "scale")) %>%
  predict(cmp_train_slim) %>%
  cbind(cmp_train$Yield) %>%
  rename(Yield = "cmp_train$Yield")

cmp_test_slim <- cmp_test %>%
  select(-c(Yield, BiologicalMaterial07)) %>%
  select(-all_of(hicorr))
cmp_test_transform <- cmp_test_slim %>%
  preProcess(method = c("BoxCox", "center", "scale")) %>%
  predict(cmp_test_slim) %>%
  cbind(cmp_test$Yield) %>%
  rename(Yield = "cmp_test$Yield")
In pre-processing, predictors with near-zero variance or high correlations (using Spearman's \(\rho\)) are removed, and remaining predictors undergo a Box-Cox transformation as well as centering and scaling. The resulting training and test sets feature 51 predictors.
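A common alternative, shown only as a sketch here (the object names pp_cmp and cmp_test_alt are placeholders), is to fit the transformation on the training predictors and reuse that same object on the test predictors, so the test set never informs the Box-Cox, centering, or scaling parameters:
pp_cmp <- preProcess(cmp_train_slim, method = c("BoxCox", "center", "scale"))  # fit on training data only
cmp_test_alt <- predict(pp_cmp, cmp_test_slim) %>%
  mutate(Yield = cmp_test$Yield)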
K-Nearest Neighbors (KNN)
set.seed(624)
(knn_cmp <- caret::train(Yield ~ .,
                         data = cmp_train_transform,
                         method = "knn",
                         tuneLength = 10,
                         trControl = trainControl(method = "repeatedcv", repeats = 5)))
## k-Nearest Neighbors
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 1.260486 0.5369754 1.048647
## 7 1.298180 0.5102898 1.078990
## 9 1.344889 0.4683830 1.119324
## 11 1.372963 0.4420748 1.144277
## 13 1.375529 0.4469999 1.142038
## 15 1.371445 0.4582999 1.133800
## 17 1.378592 0.4601082 1.139604
## 19 1.394591 0.4527576 1.154889
## 21 1.408886 0.4457260 1.167704
## 23 1.414787 0.4500894 1.166490
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnpred_cmp <- predict(knn_cmp, newdata = cmp_test_transform)
postResample(pred = knnpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.3113527 0.6600128 1.0633125
Using repeated 10-fold cross-validation to tune the k parameter, the optimal KNN model with 5 neighbors minimizes RMSE at approximately 1.26. Evaluating on the test set returns an RMSE of approximately 1.31.
Support Vector Machine (SVM)
set.seed(624)
(svm_cmp <- caret::train(Yield ~ .,
                         data = cmp_train_transform,
                         method = "svmRadial",
                         tuneLength = 14,
                         trControl = trainControl(method = "repeatedcv", repeats = 5)))
## Support Vector Machines with Radial Basis Function Kernel
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 1.386272 0.4938595 1.1409501
## 0.50 1.279571 0.5341400 1.0539208
## 1.00 1.201502 0.5779977 0.9830663
## 2.00 1.158854 0.6055503 0.9477725
## 4.00 1.137874 0.6161072 0.9250061
## 8.00 1.136175 0.6166123 0.9212457
## 16.00 1.143851 0.6112416 0.9272377
## 32.00 1.143851 0.6112416 0.9272377
## 64.00 1.143851 0.6112416 0.9272377
## 128.00 1.143851 0.6112416 0.9272377
## 256.00 1.143851 0.6112416 0.9272377
## 512.00 1.143851 0.6112416 0.9272377
## 1024.00 1.143851 0.6112416 0.9272377
## 2048.00 1.143851 0.6112416 0.9272377
##
## Tuning parameter 'sigma' was held constant at a value of 0.01381135
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01381135 and C = 8.
svmpred_cmp <- predict(svm_cmp, newdata = cmp_test_transform)
postResample(pred = svmpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.2362638 0.7072089 0.8728152
Using the radial basis kernel function along with repeated 10-fold cross-validation to tune \(\sigma\) and cost, the optimal SVM model with \(\sigma\) of approximately 0.01 and cost of 8 minimizes RMSE at approximately 1.14. Evaluating on the test set returns an RMSE of approximately 1.24.
Multivariate Adaptive Regression Splines (MARS)
set.seed(624)
(mars_cmp <- caret::train(Yield ~ .,
                          data = cmp_train_transform,
                          method = "earth",
                          trControl = trainControl(method = "repeatedcv", repeats = 5)))
## Multivariate Adaptive Regression Spline
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## nprune RMSE Rsquared MAE
## 2 1.392380 0.4413063 1.107060
## 9 1.302645 0.5279347 1.056024
## 17 1.382676 0.5027627 1.131236
##
## Tuning parameter 'degree' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 9 and degree = 1.
marspred_cmp <- predict(mars_cmp, newdata = cmp_test_transform)
postResample(pred = marspred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.8727433 0.5251501 1.2875771
Using repeated 10-fold cross-validation to tune the number of retained terms (degree is held constant at 1), the optimal, first-degree MARS model with 9 terms minimizes RMSE at approximately 1.30. Evaluating on the test set returns an RMSE of approximately 1.87.
Neural Network
set.seed(624)
nnetGrid <- expand.grid(.decay = c(0, 0.01, 1),
                        .size = 1:10)
(nnet_cmp <- caret::train(Yield ~ .,
                          data = cmp_train_transform,
                          method = "nnet",
                          tuneGrid = nnetGrid,
                          trControl = trainControl(method = "repeatedcv", repeats = 5),
                          linout = TRUE,
                          trace = FALSE))
## Neural Network
##
## 144 samples
## 51 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 130, 130, 130, 130, 129, 130, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 1.670009 0.1806237 1.374443
## 0.00 2 1.657755 0.2454971 1.356228
## 0.00 3 2.664469 0.2315111 1.969068
## 0.00 4 2.962083 0.2194136 2.306097
## 0.00 5 3.170609 0.2055507 2.496803
## 0.00 6 3.864795 0.1529160 2.957014
## 0.00 7 4.869968 0.1751694 3.596567
## 0.00 8 5.723303 0.1671704 3.995768
## 0.00 9 7.574280 0.1278161 5.138112
## 0.00 10 11.668788 0.1288191 7.232222
## 0.01 1 1.455108 0.3961894 1.166489
## 0.01 2 1.783189 0.3756717 1.363308
## 0.01 3 2.188949 0.3405498 1.647693
## 0.01 4 2.709422 0.2216519 2.105344
## 0.01 5 3.130002 0.1736650 2.432992
## 0.01 6 3.922071 0.1329444 3.023401
## 0.01 7 4.649875 0.1823120 3.342745
## 0.01 8 5.930056 0.1174045 4.170495
## 0.01 9 7.566669 0.1211547 5.034450
## 0.01 10 10.825161 0.1300644 6.800165
## 1.00 1 2.604097 0.2515688 1.717115
## 1.00 2 2.701495 0.2529938 1.886564
## 1.00 3 2.926110 0.2609018 2.111644
## 1.00 4 2.709226 0.2385605 2.074288
## 1.00 5 2.512027 0.2802414 1.989836
## 1.00 6 2.700054 0.2521452 2.135603
## 1.00 7 3.276308 0.2301583 2.495083
## 1.00 8 4.158249 0.2249644 2.908250
## 1.00 9 5.241916 0.1671743 3.566585
## 1.00 10 5.839900 0.1555386 3.967025
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 1 and decay = 0.01.
nnetpred_cmp <- predict(nnet_cmp, newdata = cmp_test_transform)
postResample(pred = nnetpred_cmp, obs = cmp_test_transform$Yield)
## RMSE Rsquared MAE
## 1.6005267 0.4401126 1.1712126
Using the book-suggested candidate set of models as well as repeated 10-fold cross-validation to tune the parameters, the optimal neural network model with size of 1 and decay of 0.01 minimizes RMSE at approximately 1.46. Evaluating on the test set returns an RMSE of approximately 1.60.
Of the four models, the SVM model performs the best on the test set. Its test set RMSE is approximately 1.24.
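As with the simulated data, the cross-validated results for the four models were produced with a shared seed and resampling scheme, so they can be compared on their common folds; a sketch:
resamps_cmp <- resamples(list(KNN = knn_cmp, SVM = svm_cmp, MARS = mars_cmp, NNet = nnet_cmp))
summary(resamps_cmp)
bwplot(resamps_cmp, metric = "RMSE")  # box-and-whisker comparison of fold-level RMSE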
varImp(svm_cmp)$importance %>%
  arrange(-Overall) %>%
  rownames_to_column("predictor") %>%
  top_n(10) %>%
  ggplot(aes(x = reorder(predictor, Overall), y = Overall)) +
  geom_col() +
  ggtitle("Top ten predictors of product yield, by importance in SVM model") +
  xlab(NULL) +
  ylab("Importance") +
  coord_flip()
set.seed(624)
cmp_pls <- caret::train(Yield ~ .,
                        data = cmp_train_transform,
                        method = "pls",
                        tuneLength = 20,
                        trControl = trainControl(method = "repeatedcv", repeats = 5))
varImp(cmp_pls)$importance %>%
  arrange(-Overall) %>%
  rownames_to_column("predictor") %>%
  top_n(10) %>%
  ggplot(aes(x = reorder(predictor, Overall), y = Overall)) +
  geom_col() +
  ggtitle("Top ten predictors of product yield, by importance in PLS model") +
  xlab(NULL) +
  ylab("Importance") +
  coord_flip()
Generally, the same set of predictors is important in both the nonlinear SVM and linear partial least squares (PLS) models. Both models are led by ManufacturingProcess32 and place greater importance on the manufacturing predictors. However, the biological predictors carry more weight in the SVM model, with two of them, BiologicalMaterial06 and BiologicalMaterial03, among the three most important predictors. These two are also important in the PLS model, though less so.
svm_top10 <- varImp(svm_cmp)$importance %>%
  arrange(-Overall) %>%
  rownames_to_column("predictor") %>%
  top_n(10) %>%
  select(-Overall)
pls_top10 <- varImp(cmp_pls)$importance %>%
  arrange(-Overall) %>%
  rownames_to_column("predictor") %>%
  top_n(10) %>%
  select(-Overall)
(svm_unique <- unname(unlist(anti_join(svm_top10, pls_top10))))
## [1] "ManufacturingProcess31" "BiologicalMaterial12"
featurePlot(cmp_train_transform[, svm_unique], cmp_train_transform$Yield)
cor(cmp_train_transform[, svm_unique], cmp_train_transform$Yield, method = "spearman")
## [,1]
## ManufacturingProcess31 -0.3800313
## BiologicalMaterial12 0.4206077
Two of the top ten predictors for the SVM model are unique relative to the PLS model: BiologicalMaterial12 and ManufacturingProcess31. Using the training set, plotting Yield against these two predictors reveals a weak, if any, relationship for the biological predictor and at least one clear outlier for the manufacturing predictor; the latter was not addressed by the Box-Cox transformation in pre-processing. Spearman's \(\rho\) for each predictor confirms the visual patterns, though the outlier may still influence the estimate for the manufacturing predictor. Overall, these plots do not reveal much about the relationships between the predictors and Yield, and additional cleaning might be useful.
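One possible follow-up, sketched below (the boxplot rule is an arbitrary choice for flagging the outlying run, not part of the original analysis), is to drop the flagged observation(s) and recompute Spearman's \(\rho\) for ManufacturingProcess31:
mp31 <- cmp_train_transform$ManufacturingProcess31
out_rows <- which(mp31 %in% boxplot.stats(mp31)$out)            # flag boxplot-rule outliers
keep <- if (length(out_rows)) -out_rows else seq_along(mp31)    # guard against an empty flag set
cor(mp31[keep], cmp_train_transform$Yield[keep], method = "spearman")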
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. Springer. https://doi.org/10.1007/978-1-4614-6849-3