Instructions

In Kuhn and Johnson do problems 7.2 and 7.5. There are only two but they consist of many parts. Please submit both a link to your Rpubs and the .rmd file.

Question 7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

y = 10 sin(π x1 x2) + 20 (x3 − 0.5)^2 + 10 x4 + 5 x5 + N(0, σ^2)

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

Code Given(1):

library(mlbench)
## Warning: package 'mlbench' was built under R version 4.4.2
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame. One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using, for example, caret's featurePlot()
library(caret)   # featurePlot() comes from caret
print(featurePlot(trainingData$x, trainingData$y))

## or other methods.
## This creates a list with a vector 'y' and a matrix of predictors 'x'. Also simulate a large test set to estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
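
As a quick sanity check on the simulation (this assumes the standard Friedman (1991) equation written above; it is not part of the assigned code), the deterministic part of y can be recomputed from the simulated predictors and compared with the noisy response:

## Recompute the noiseless Friedman (1991) signal from the test predictors and
## compare it with the simulated (noisy) response; the correlation should be
## close to 1 because the noise sd (1) is small relative to the signal.
f_signal <- with(testData$x, 10 * sin(pi * X1 * X2) + 20 * (X3 - 0.5)^2 + 10 * X4 + 5 * X5)
cor(f_signal, testData$y)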

Tune several models on these data. For example:

Code Given(2):

library(caret)
knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set performance values
postResample(pred = knnPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461

Start of my model work with the other model types mentioned in the chapter

Working with MARS Model

## Setting up the MARS tuning grid: additive terms (degree 1) as well as two-way interactions (degree 2), with a wide range of values for the number of retained (pruned) terms.
marsGrid <- expand.grid(.degree = 1:2,  .nprune = 2:50) 
set.seed(100)
marsFit <- train(x = trainingData$x,
                 y = trainingData$y,
                 method = "earth",
                 tuneGrid = marsGrid,
                 preProc = c("center", "scale"),
                 trControl = trainControl(method = "cv"))


print(summary(marsFit$finalModel))
## Call: earth(x=data.frame[200,10], y=c(18.46,16.1,17...), keepxy=TRUE, degree=2,
##             nprune=14)
## 
##                                   coefficients
## (Intercept)                         22.0339290
## h(0.507267-X1)                      -4.5157241
## h(X1-0.507267)                       2.6841329
## h(0.325504-X2)                      -5.4438679
## h(-0.216741-X3)                      3.3683600
## h(X3- -0.216741)                     2.0371575
## h(0.953812-X4)                      -2.7853388
## h(X4-0.953812)                       2.7366442
## h(1.17878-X5)                       -1.5636213
## h(X1- -0.951872) * h(X2-0.325504)   -0.7790969
## h(X1-0.507267) * h(X2- -0.798188)   -2.6276789
## h(0.606835-X1) * h(0.325504-X2)      2.1773145
## h(0.325504-X2) * h(X3-0.795427)      1.7739671
## h(X2-0.325504) * h(X3- -0.917499)    0.5726623
## 
## Selected 14 of 21 terms, and 5 of 10 predictors (nprune=14)
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 8 5
## GCV 1.841667    RSS 255.2757    GRSq 0.9252276    RSq 0.9476564
#Predictions
marsPred <- predict(marsFit, newdata = testData$x)
print(postResample(pred = marsPred, obs = testData$y))
##      RMSE  Rsquared       MAE 
## 1.2779993 0.9338365 1.0147070
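
To confirm which predictors the MARS model actually uses, the caret fit can also be queried directly (a quick check that mirrors the Importance line in the summary above; output omitted here):

## Tuning profile over degree/nprune and variable importance for the MARS fit;
## only the informative predictors X1-X5 should receive non-zero importance.
print(plot(marsFit))
print(varImp(marsFit))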

Working with Neural Network Model

### Neural net model: tuning 1-10 hidden units, keeping in line with the textbook's decay and bagging suggestions
set.seed(100)
netGrid <- expand.grid(.decay = c(0, 0.01, 0.1),.size = 1:10,.bag = FALSE)

nnetTune <- train(trainingData$x, trainingData$y,method = "avNNet",  tuneGrid = netGrid,
                  trControl = trainControl(method = "cv"),
                  preProc = c("center", "scale"),
                  linout = TRUE,
                  trace = FALSE,maxit = 500)
## Warning: executing %dopar% sequentially: no parallel backend registered
print(summary(nnetTune$bestTune))
##       size       decay        bag         
##  Min.   :5   Min.   :0.1   Mode :logical  
##  1st Qu.:5   1st Qu.:0.1   FALSE:1        
##  Median :5   Median :0.1                  
##  Mean   :5   Mean   :0.1                  
##  3rd Qu.:5   3rd Qu.:0.1                  
##  Max.   :5   Max.   :0.1
print(summary(nnetTune$finalModel))
##             Length Class      Mode     
## model        5     -none-     list     
## repeats      1     -none-     numeric  
## bag          1     -none-     logical  
## seeds        5     -none-     numeric  
## names       10     -none-     character
## terms        3     terms      call     
## coefnames   10     -none-     character
## xlevels      0     -none-     list     
## xNames      10     -none-     character
## problemType  1     -none-     character
## tuneValue    3     data.frame list     
## obsLevels    1     -none-     logical  
## param        3     -none-     list
print(nnetTune$results)
##    decay size   bag     RMSE  Rsquared      MAE    RMSESD RsquaredSD     MAESD
## 1   0.00    1 FALSE 2.392711 0.7610354 1.897330 0.5399693 0.17419578 0.4641855
## 11  0.01    1 FALSE 2.385381 0.7602926 1.887906 0.5431008 0.17543766 0.4628980
## 21  0.10    1 FALSE 2.393965 0.7596431 1.894191 0.5491281 0.17609869 0.4603040
## 2   0.00    2 FALSE 2.410532 0.7567109 1.907478 0.5537566 0.16917116 0.4919607
## 12  0.01    2 FALSE 2.425125 0.7510903 1.935991 0.5736786 0.17578732 0.4898851
## 22  0.10    2 FALSE 2.423612 0.7525959 1.935872 0.5655335 0.16649632 0.4884680
## 3   0.00    3 FALSE 2.043957 0.8224281 1.630751 0.5108291 0.12133400 0.3649992
## 13  0.01    3 FALSE 2.151209 0.8016018 1.701951 0.5304276 0.14210227 0.3833116
## 23  0.10    3 FALSE 2.169914 0.7982380 1.726854 0.5615105 0.14535072 0.4255456
## 4   0.00    4 FALSE 2.289347 0.8130639 1.749187 1.2305832 0.12997012 0.7232752
## 14  0.01    4 FALSE 2.091925 0.8154383 1.676653 0.4658305 0.10671510 0.3146718
## 24  0.10    4 FALSE 2.059080 0.8224160 1.648610 0.4180618 0.11004539 0.2603083
## 5   0.00    5 FALSE 2.445600 0.7709399 1.824446 0.6475054 0.15948604 0.4273511
## 15  0.01    5 FALSE 2.169742 0.7999255 1.738715 0.4283499 0.11256277 0.3392557
## 25  0.10    5 FALSE 1.975656 0.8394000 1.578979 0.4361593 0.10985162 0.3170231
## 6   0.00    6 FALSE 2.898295 0.7388800 2.052725 1.7653452 0.22072875 0.9189415
## 16  0.01    6 FALSE 2.262032 0.8056619 1.817195 0.3837171 0.08434748 0.2665117
## 26  0.10    6 FALSE 2.152198 0.8098015 1.693056 0.4125129 0.11092059 0.2616601
## 7   0.00    7 FALSE 3.351563 0.6644147 2.460366 1.0264160 0.12674850 0.5790942
## 17  0.01    7 FALSE 2.318301 0.7861811 1.856908 0.4915103 0.12867124 0.3401214
## 27  0.10    7 FALSE 2.161512 0.8163011 1.693526 0.4391635 0.08772441 0.2562003
## 8   0.00    8 FALSE 6.513566 0.4418645 3.563297 5.3438631 0.28363806 1.9345711
## 18  0.01    8 FALSE 2.413847 0.7772629 1.938009 0.5063174 0.11741514 0.3397851
## 28  0.10    8 FALSE 2.273716 0.7922525 1.822713 0.3808189 0.10972838 0.2184895
## 9   0.00    9 FALSE 4.484215 0.5644107 2.877950 2.3737627 0.22792454 0.9598735
## 19  0.01    9 FALSE 2.317190 0.7847500 1.857641 0.5353001 0.13951823 0.4198470
## 29  0.10    9 FALSE 2.315333 0.7811273 1.785409 0.5720570 0.13029365 0.3979768
## 10  0.00   10 FALSE 3.422545 0.6247430 2.439739 1.5522354 0.22789677 0.6384685
## 20  0.01   10 FALSE 2.480407 0.7408505 1.995656 0.5927652 0.16156925 0.4396292
## 30  0.10   10 FALSE 2.334803 0.7692182 1.872733 0.5149409 0.13810727 0.4015950
print(plot(nnetTune))

### Executing on Test Data 
nnetPred <- predict(nnetTune, newdata = testData$x)
print(postResample(pred = nnetPred, obs = testData$y))
##      RMSE  Rsquared       MAE 
## 2.1113956 0.8277556 1.5739011
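
The warning above about %dopar% running sequentially just means no parallel backend was registered; avNNet's repeated fits can be sped up by registering one. A minimal sketch (not run for the results shown here; the worker count is arbitrary):

## Register a parallel backend so caret/avNNet can fit the repeated networks in parallel
library(doParallel)
cl <- makePSOCKcluster(4)   # 4 workers chosen arbitrarily
registerDoParallel(cl)
## ... re-run train() here ...
stopCluster(cl)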

Working with SVM Model

set.seed(100)
svmRTuned <- train(trainingData$x, trainingData$y, 
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))


print(summary(svmRTuned$finalModel))
## Length  Class   Mode 
##      1   ksvm     S4
print(svmRTuned$results)
##         sigma       C     RMSE  Rsquared      MAE    RMSESD RsquaredSD
## 1  0.06509124    0.25 2.530787 0.7922715 2.013175 0.2696024 0.12291001
## 2  0.06509124    0.50 2.259539 0.8064569 1.789962 0.3313399 0.11781033
## 3  0.06509124    1.00 2.099789 0.8274242 1.656154 0.3481393 0.10198585
## 4  0.06509124    2.00 2.002943 0.8412934 1.583791 0.3311129 0.08451813
## 5  0.06509124    4.00 1.943618 0.8504425 1.546586 0.3292231 0.07589338
## 6  0.06509124    8.00 1.918711 0.8547582 1.532981 0.3162050 0.07104595
## 7  0.06509124   16.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 8  0.06509124   32.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 9  0.06509124   64.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 10 0.06509124  128.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 11 0.06509124  256.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 12 0.06509124  512.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 13 0.06509124 1024.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
## 14 0.06509124 2048.00 1.920651 0.8536189 1.536116 0.3185506 0.07191691
##        MAESD
## 1  0.2661719
## 2  0.3248628
## 3  0.3121841
## 4  0.2760162
## 5  0.2546977
## 6  0.2460647
## 7  0.2495101
## 8  0.2495101
## 9  0.2495101
## 10 0.2495101
## 11 0.2495101
## 12 0.2495101
## 13 0.2495101
## 14 0.2495101
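
## Note: the grid above varies only the cost C; caret holds sigma fixed at an
## analytical estimate derived from kernlab::sigest(). A quick side check of the
## suggested sigma range (not part of the tuning; output omitted):
library(kernlab)
sigest(as.matrix(trainingData$x), scaled = TRUE)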
## Test Data
svmPred <- predict(svmRTuned, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.0631908 0.8275736 1.5662213

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

Answer

Here are the output results of each of my models:

Model            RMSE    R^2     MAE
MARS             1.2780  0.9338  1.0147
SVM (Radial)     2.0632  0.8276  1.5662
Neural Network   2.1114  0.8278  1.5739

The MARS model is by far the best, with the lowest root mean squared error and the highest R^2. It explains about 93 percent of the variance in the target variable with the least error, and its MAE is also the lowest. The SVM model edges out the neural network for second place on RMSE and MAE, with the two models' R^2 values essentially tied.
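
For reference, the same comparison table can be assembled directly from the test-set predictions computed above (a small sketch using the objects already created in this document):

## Collect the 7.2 test-set metrics for the three tuned models into one table
results_72 <- rbind(
  MARS             = postResample(pred = marsPred, obs = testData$y),
  `SVM (Radial)`   = postResample(pred = svmPred, obs = testData$y),
  `Neural Network` = postResample(pred = nnetPred, obs = testData$y)
)
print(round(results_72, 4))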

With respect to the MARS model, yes, the model chose the X variables 1 through 5. The variables ranked by importance were X1, X4, X2, X5, X3. This generally matches the formula provided at the start, with the MARS model ignoring the variables that are not relevant.

Question 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Training the data as in Exercise 6.3

library(AppliedPredictiveModeling)
## Warning: package 'AppliedPredictiveModeling' was built under R version 4.4.2
data(ChemicalManufacturingProcess)


# Prepping the data 
set.seed(100)

## Imputation: keeping it simple by filling each column's missing values with its median, since most columns with NAs have only a single NA
imputed <- preProcess(ChemicalManufacturingProcess, method = "medianImpute")
chem_df_imputed <- predict(imputed, ChemicalManufacturingProcess)

## Establishing the x and y vars
X <- chem_df_imputed[, -which(names(chem_df_imputed) == "Yield")]
y <- chem_df_imputed$Yield

## Putting the pieces together for modeling & splitting
ChemData <- list(x = X, y = y)

## Splitting 70/30
splitIndex <- createDataPartition(ChemData$y, p = 0.7, list = FALSE)

ChemtrainingData <- list(x = data.frame(ChemData$x[splitIndex, ]),y = ChemData$y[splitIndex])
ChemtestData <- list( x = data.frame(ChemData$x[-splitIndex, ]),y = ChemData$y[-splitIndex])
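
One optional refinement, not applied to the results shown below: BiologicalMaterial07 turns out to have (near-)zero variance in some resamples (see the warnings under the KNN fit), so it could be screened out up front with caret's nearZeroVar(). A sketch:

## Identify and drop near-zero-variance predictors before modeling (optional;
## the results in this document were produced without this step)
nzv <- nearZeroVar(ChemtrainingData$x)
if (length(nzv) > 0) {
  ChemtrainingData$x <- ChemtrainingData$x[, -nzv, drop = FALSE]
  ChemtestData$x     <- ChemtestData$x[, -nzv, drop = FALSE]
}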

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

# Fitting several nonlinear models on the predictor variables to see which best predicts Yield
set.seed(100)

### First, the neural net model
netGrid <- expand.grid(.decay = c(0, 0.01, 0.1),.size = 1:10,.bag = FALSE)

nnetTune <- train( ChemtrainingData$x, 
                   ChemtrainingData$y,
                  method = "avNNet",  tuneGrid = netGrid,
                  trControl = trainControl(method = "cv"),
                  preProc = c("center", "scale"),
                  linout = TRUE,
                  trace = FALSE,maxit = 500)
#----Evaluation
nnetPred <- predict(nnetTune, newdata = ChemtestData$x)
print(postResample(pred = nnetPred, obs = ChemtestData$y))
##      RMSE  Rsquared       MAE 
## 1.3310979 0.4190896 1.0918633

### MARS model
marsGrid <- expand.grid(.degree = 1:2,  .nprune = 2:50) 
marsFit <- train(x = ChemtrainingData$x,
                 y = ChemtrainingData$y,
                 method = "earth",
                 tuneGrid = marsGrid,
                 preProc = c("center", "scale"),
                 trControl = trainControl(method = "cv"))
#----Evaluation
marsPred <- predict(marsFit, newdata = ChemtestData$x)
print(postResample(pred = marsPred, obs = ChemtestData$y))
##      RMSE  Rsquared       MAE 
## 1.4072009 0.4851382 1.1332918

#KNN Model
knnModel <- train(x = ChemtrainingData$x,
                  y = ChemtrainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: BiologicalMaterial07
## (the same warning repeats for each resample; duplicates omitted)
#----Evaluation
knnPred <- predict(knnModel, newdata = ChemtestData$x)
print(postResample(pred = knnPred, obs = ChemtestData$y))
##      RMSE  Rsquared       MAE 
## 1.2268780 0.4636449 1.0207692

##SVM model
svmRTuned <- train(ChemtrainingData$x, ChemtrainingData$y, 
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))


#----Evaluation
svmPred <- predict(svmRTuned, newdata = ChemtestData$x)
print(postResample(pred = svmPred, obs = ChemtestData$y))
##      RMSE  Rsquared       MAE 
## 1.2175550 0.5212596 0.9650886
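
As with 7.2, the test-set metrics for all four models can be gathered into one table from the predictions above (a sketch using the objects created in this section; by this point the *Pred objects hold the chemical-manufacturing predictions):

## Collect the 7.5 test-set metrics for the four tuned models into one table
results_75 <- rbind(
  `Neural Network` = postResample(pred = nnetPred, obs = ChemtestData$y),
  MARS             = postResample(pred = marsPred, obs = ChemtestData$y),
  KNN              = postResample(pred = knnPred, obs = ChemtestData$y),
  `SVM (Radial)`   = postResample(pred = svmPred, obs = ChemtestData$y)
)
print(round(results_75, 4))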

(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

Of all the models run on the data, the SVM model is the best performing. Its test-set R^2 was 0.521, its RMSE was 1.218, and its MAE was 0.965, the best values across the board. KNN was the closest competitor on RMSE and MAE (1.227 and 1.021), while MARS had the next-highest R^2 (0.485). Taking a look at the SVM model, the top ten most important predictors include both biological and process variables. Out of the ten, six are process variables and four are biological, so process variables are somewhat more dominant among the "most important" predictors. Compared with the results of the linear model from Exercise 6.3, many more biological variables appear near the top of the SVM ranking. However, the ManufacturingProcess32 predictor is still number one, and ManufacturingProcess13 is still in the top three.

## SVM Model results for important variables
print(varImp(svmRTuned))
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   94.64
## BiologicalMaterial06     76.19
## ManufacturingProcess17   73.26
## ManufacturingProcess09   69.65
## BiologicalMaterial12     66.34
## BiologicalMaterial02     66.06
## ManufacturingProcess36   65.18
## ManufacturingProcess31   64.17
## BiologicalMaterial03     63.32
## ManufacturingProcess11   54.80
## ManufacturingProcess06   53.67
## BiologicalMaterial04     50.26
## ManufacturingProcess30   49.24
## ManufacturingProcess29   44.38
## BiologicalMaterial08     44.11
## BiologicalMaterial09     42.22
## BiologicalMaterial11     39.01
## ManufacturingProcess33   38.36
## ManufacturingProcess02   36.08
Linear Model Important Variables (variable importance plot from the Exercise 6.3 linear model)
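
If the linear model object from Exercise 6.3 were loaded in the same session, the overlap between the two top-ten lists could be computed directly. A hypothetical sketch (`linearFit` is an assumed name for the 6.3 caret fit; it is not re-created in this document):

## Top ten predictors from the SVM fit
svm_imp   <- varImp(svmRTuned)$importance
svm_top10 <- rownames(svm_imp)[order(svm_imp$Overall, decreasing = TRUE)][1:10]
svm_top10
## Hypothetical comparison against the Exercise 6.3 linear model (object not available here)
# lin_imp   <- varImp(linearFit)$importance
# lin_top10 <- rownames(lin_imp)[order(lin_imp$Overall, decreasing = TRUE)][1:10]
# intersect(svm_top10, lin_top10)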

(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

I looked at BiologicalMaterial06, BiologicalMaterial03, and BiologicalMaterial12. The first two of these variables were in the top ten for both models but ranked much higher in the SVM model than in the linear model. The third was not in the linear model's top ten but was the sixth most important in the SVM model. When plotted against yield, all of these variables show somewhat ambiguous, far-from-linear patterns, which helps explain the discrepancy between the two models' variable rankings.

library(ggplot2)

vars_to_plot <- c("BiologicalMaterial06", "BiologicalMaterial03", "BiologicalMaterial12")

for (var in vars_to_plot) {
  print(
    ggplot(ChemicalManufacturingProcess, aes(x = .data[[var]], y = Yield)) +
      geom_point() +
      #geom_smooth(method = "loess", se = FALSE, color = "blue") +
      labs(title = paste("Yield vs", var))
  )
}
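
As a numeric complement to the plots (a small sketch, not required by the exercise), a rank-based correlation registers monotone-but-nonlinear association between each of these biological predictors and Yield:

## Spearman rank correlations between the plotted predictors and Yield
sapply(vars_to_plot, function(v)
  cor(ChemicalManufacturingProcess[[v]], ChemicalManufacturingProcess$Yield,
      method = "spearman", use = "complete.obs"))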