Question 6.2 A

library(AppliedPredictiveModeling)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(elasticnet)
## Loading required package: lars
## Loaded lars 1.3
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tibble)
library(tidyr)
library(ggplot2)
data(permeability)

Part B

After subtracting using NearZeroVar, we only have 388 predictors left.

nzv <- nearZeroVar(fingerprints)
fingerprints_filtered <- fingerprints[, -nzv]
dim(fingerprints_filtered)
## [1] 165 388

Part C Split the data into training and test sets and then fit a PLS model on the training data using 10 fold cross validation. The tuning parameter was evaluated from 1 to 20 components. Based on our resampling results the optimal model selected by cross-validation used 9 latent variables and had a resampled R^2 of 0.4373. The highest observed cross-validated R^2 was 0.4443 at 5 components, but the lowest RMSE occurred at 9 components, which is why this would be our optimal number of latent variables.

#First we set our seed and 
set.seed(62462)

x <- fingerprints[, -nzv]
y <- permeability

#Split data 80%/20%
Trainsplit <- createDataPartition(y, p = 0.8, list = FALSE)

train_x <- x[Trainsplit, ]
test_x  <- x[-Trainsplit, ]
train_y <- y[Trainsplit]
test_y  <- y[-Trainsplit]

ctrl <- trainControl(method = "cv", number = 10)

pls_model <- train(
  x = train_x,
  y = train_y,
  method = "pls",
  preProcess = c("center", "scale"),
  tuneLength = 20,
  trControl = ctrl
)

pls_model
## Partial Least Squares 
## 
## 133 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 120, 118, 121, 119, 119, 118, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE      
##    1     13.58830  0.2701593  10.484186
##    2     12.51249  0.3866277   8.874014
##    3     12.35552  0.4091924   9.303344
##    4     12.31505  0.4283548   9.169979
##    5     12.16807  0.4443312   9.063389
##    6     12.15799  0.4365662   9.162581
##    7     12.29198  0.4244414   9.384539
##    8     12.30901  0.4229727   9.429921
##    9     12.16223  0.4373211   9.224731
##   10     12.52104  0.4312315   9.417766
##   11     12.62894  0.4260757   9.503149
##   12     12.95416  0.4101189   9.819188
##   13     13.35056  0.3912642  10.006916
##   14     13.76414  0.3731831  10.282876
##   15     14.40740  0.3380444  10.610761
##   16     14.98058  0.3103440  10.922962
##   17     15.30738  0.2999879  11.048193
##   18     15.53869  0.2917046  11.230411
##   19     15.51836  0.2908946  11.262496
##   20     15.95305  0.2750077  11.645145
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 6.

Part D The tuned PLS model was then used to predict permeability on the held-out test set. The test-set performance was RMSE = 8.4201, MAE = 6.4669, and R^2 = 0.6805. The model explained about 68% of the variation indicating strong predictive performance on the test set. I would have expected it to be a bit closer to the test RMSE so I must assume that these were very strong predictors or there may have been some more variable compounds in the prediction dataset as well. It is likely a combination of these that caused this outcome.

pls_pred <- predict(pls_model, newdata = test_x)
postResample(pred = pls_pred, obs = test_y)
##      RMSE  Rsquared       MAE 
## 8.4200782 0.6804608 6.4668817

Part E

I compared the tuned PLS model with ridge regression, lasso regression, and elastic net using the same training and test split. On the test set, the PLS model had RMSE = 8.42, R^2 = 0.6805, and MAE = 6.47. Ridge regression had a higher R^2 = 0.7785, but it also had a worse RMSE = 10.43 and MAE = 7.64. Elastic net also had worse prediction errors than PLS, with RMSE = 10.50 and MAE = 7.07. The lasso model produced extremely large errors, suggesting an unstable or poor fit, so it was not considered here. Our results suggest PLS provided the best predictive performance on this split because it had the lowest test-set error measures.

# Ridge regression
ridge_grid <- expand.grid(lambda = seq(0, 0.10, length = 20))

ridge_model <- train(
  x = train_x,
  y = train_y,
  method = "ridge",
  preProcess = c("center", "scale"),
  tuneGrid = ridge_grid,
  trControl = ctrl
)

ridge_model
## Ridge Regression 
## 
## 133 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 120, 119, 119, 121, 118, 121, ... 
## Resampling results across tuning parameters:
## 
##   lambda       RMSE          Rsquared   MAE         
##   0.000000000  1.570704e+26  0.1357507  5.654761e+25
##   0.005263158  4.916346e+01  0.2261837  2.893677e+01
##   0.010526316  4.114272e+01  0.2778971  2.554478e+01
##   0.015789474  1.692617e+01  0.2875033  1.220246e+01
##   0.021052632  1.924429e+03  0.2917511  1.073490e+03
##   0.026315789  1.576733e+01  0.3031166  1.150796e+01
##   0.031578947  1.538408e+01  0.3149947  1.126935e+01
##   0.036842105  1.520018e+01  0.3194208  1.119145e+01
##   0.042105263  1.487557e+01  0.3337439  1.096383e+01
##   0.047368421  1.468251e+01  0.3425391  1.085727e+01
##   0.052631579  1.453659e+01  0.3494269  1.076514e+01
##   0.057894737  1.456310e+01  0.3480253  1.076029e+01
##   0.063157895  1.431162e+01  0.3629589  1.064745e+01
##   0.068421053  1.421177e+01  0.3687409  1.058548e+01
##   0.073684211  1.412136e+01  0.3740305  1.052565e+01
##   0.078947368  1.405344e+01  0.3783054  1.048605e+01
##   0.084210526  1.397682e+01  0.3834068  1.042852e+01
##   0.089473684  1.389516e+01  0.3883824  1.036535e+01
##   0.094736842  1.383332e+01  0.3923924  1.031843e+01
##   0.100000000  1.379852e+01  0.3955399  1.029681e+01
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1.
ridge_model$bestTune
##    lambda
## 20    0.1
ridge_pred <- predict(ridge_model, newdata = test_x)
ridge_test <- postResample(pred = ridge_pred, obs = test_y)
ridge_test
##      RMSE  Rsquared       MAE 
## 8.8923630 0.6637158 7.0802748
# Lasso regression
lasso_grid <- expand.grid(fraction = seq(0.05, 1.00, length = 20))

lasso_model <- train(
  x = train_x,
  y = train_y,
  method = "lasso",
  preProcess = c("center", "scale"),
  tuneGrid = lasso_grid,
  trControl = ctrl
)

lasso_model
## The lasso 
## 
## 133 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 120, 118, 119, 120, 119, 121, ... 
## Resampling results across tuning parameters:
## 
##   fraction  RMSE          Rsquared   MAE         
##   0.05      7.125985e+14  0.3913785  2.790099e+14
##   0.10      1.436233e+15  0.3830256  5.612075e+14
##   0.15      2.159869e+15  0.3874314  8.434050e+14
##   0.20      2.883505e+15  0.3887548  1.125603e+15
##   0.25      3.607141e+15  0.3840712  1.407800e+15
##   0.30      4.330778e+15  0.3811369  1.689998e+15
##   0.35      5.054414e+15  0.3764348  1.972195e+15
##   0.40      5.778051e+15  0.3678653  2.254393e+15
##   0.45      6.501687e+15  0.3581032  2.536590e+15
##   0.50      7.225324e+15  0.3483971  2.818788e+15
##   0.55      7.948960e+15  0.3434420  3.100986e+15
##   0.60      8.672597e+15  0.3365944  3.383183e+15
##   0.65      9.396234e+15  0.3337979  3.665381e+15
##   0.70      1.011987e+16  0.3349652  3.947578e+15
##   0.75      1.084351e+16  0.3348344  4.229776e+15
##   0.80      1.156714e+16  0.3317065  4.511973e+15
##   0.85      1.229078e+16  0.3276144  4.794171e+15
##   0.90      1.301442e+16  0.3251199  5.076368e+15
##   0.95      1.373805e+16  0.3188556  5.358565e+15
##   1.00      1.446169e+16  0.3111218  5.640761e+15
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was fraction = 0.05.
lasso_model$bestTune
##   fraction
## 1     0.05
lasso_pred <- predict(lasso_model, newdata = test_x)
lasso_test <- postResample(pred = lasso_pred, obs = test_y)
lasso_test
##         RMSE     Rsquared          MAE 
## 1.146737e+04 5.586020e-02 3.472114e+03
# Elastic net
enet_grid <- expand.grid(
  fraction = seq(0.05, 1.00, length = 10),
  lambda   = seq(0.001, 0.10, length = 10)
)

enet_model <- train(
  x = train_x,
  y = train_y,
  method = "enet",
  preProcess = c("center", "scale"),
  tuneGrid = enet_grid,
  trControl = ctrl
)

enet_model
## Elasticnet 
## 
## 133 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 119, 121, 121, 120, 118, 119, ... 
## Resampling results across tuning parameters:
## 
##   lambda  fraction   RMSE        Rsquared   MAE        
##   0.001   0.0500000   118.00689  0.2045016    69.856020
##   0.001   0.1555556   329.05047  0.1912810   182.247634
##   0.001   0.2611111   538.15192  0.1717057   301.049642
##   0.001   0.3666667   758.65020  0.1545316   427.466080
##   0.001   0.4722222   981.53946  0.1388149   552.433267
##   0.001   0.5777778  1203.42782  0.1264437   674.703059
##   0.001   0.6833333  1427.41287  0.1155196   797.223614
##   0.001   0.7888889  1653.89416  0.1093290   921.038571
##   0.001   0.8944444  1883.44311  0.1061554  1045.340636
##   0.001   1.0000000  2110.19188  0.1053302  1168.933946
##   0.012   0.0500000    14.75561  0.3976749    10.729054
##   0.012   0.1555556    22.94609  0.3934940    16.102293
##   0.012   0.2611111    31.28186  0.3768562    21.242523
##   0.012   0.3666667    39.09975  0.3508812    25.538881
##   0.012   0.4722222    47.39641  0.3255760    29.884548
##   0.012   0.5777778    55.78782  0.3043199    34.330082
##   0.012   0.6833333    64.20138  0.2929942    38.751785
##   0.012   0.7888889    72.63021  0.2877169    43.413641
##   0.012   0.8944444    80.98266  0.2832547    47.992833
##   0.012   1.0000000    89.27559  0.2803421    52.491375
##   0.023   0.0500000    12.44939  0.4170920     8.892478
##   0.023   0.1555556    12.39041  0.4048264     8.855836
##   0.023   0.2611111    12.50736  0.4085065     9.125323
##   0.023   0.3666667    12.94110  0.3890827     9.342383
##   0.023   0.4722222    13.44217  0.3648879     9.535783
##   0.023   0.5777778    13.90789  0.3441756     9.906125
##   0.023   0.6833333    14.43625  0.3269003    10.333523
##   0.023   0.7888889    14.97204  0.3142979    10.736179
##   0.023   0.8944444    15.43489  0.3060791    11.107463
##   0.023   1.0000000    15.82875  0.2991945    11.402738
##   0.034   0.0500000    12.47636  0.4218148     9.038863
##   0.034   0.1555556    12.53653  0.3942653     8.865798
##   0.034   0.2611111    12.52110  0.4086231     9.126980
##   0.034   0.3666667    12.86031  0.3969946     9.358264
##   0.034   0.4722222    13.21704  0.3819855     9.466610
##   0.034   0.5777778    13.59810  0.3658623     9.663813
##   0.034   0.6833333    13.92188  0.3570046     9.910022
##   0.034   0.7888889    14.34750  0.3454552    10.257172
##   0.034   0.8944444    14.74981  0.3343056    10.603892
##   0.034   1.0000000    15.10433  0.3254961    10.879889
##   0.045   0.0500000    12.56344  0.4250384     9.203076
##   0.045   0.1555556    12.54785  0.3943201     8.814199
##   0.045   0.2611111    12.39672  0.4163723     9.036362
##   0.045   0.3666667    12.72419  0.4024322     9.278699
##   0.045   0.4722222    12.97930  0.3946764     9.363929
##   0.045   0.5777778    13.29848  0.3816799     9.472416
##   0.045   0.6833333    13.49552  0.3779408     9.577994
##   0.045   0.7888889    13.83033  0.3693305     9.812806
##   0.045   0.8944444    14.19647  0.3570046    10.111674
##   0.045   1.0000000    14.51402  0.3482228    10.376781
##   0.056   0.0500000    12.60020  0.4251990     9.281039
##   0.056   0.1555556    12.51650  0.3971114     8.779484
##   0.056   0.2611111    12.32401  0.4208399     8.961922
##   0.056   0.3666667    12.64842  0.4069165     9.215172
##   0.056   0.4722222    12.84404  0.4027014     9.290319
##   0.056   0.5777778    13.11463  0.3932367     9.367557
##   0.056   0.6833333    13.31621  0.3894322     9.462072
##   0.056   0.7888889    13.60181  0.3826295     9.653833
##   0.056   0.8944444    13.93910  0.3701522     9.920964
##   0.056   1.0000000    14.23457  0.3612952    10.143099
##   0.067   0.0500000    12.65488  0.4288773     9.391299
##   0.067   0.1555556    12.55285  0.3963772     8.782547
##   0.067   0.2611111    12.37359  0.4189880     8.963664
##   0.067   0.3666667    12.65950  0.4080633     9.233022
##   0.067   0.4722222    12.83073  0.4040766     9.294671
##   0.067   0.5777778    13.03496  0.3981701     9.332239
##   0.067   0.6833333    13.20324  0.3963600     9.394521
##   0.067   0.7888889    13.41139  0.3936857     9.496656
##   0.067   0.8944444    13.71684  0.3828756     9.733943
##   0.067   1.0000000    13.98900  0.3740494     9.959850
##   0.078   0.0500000    12.69368  0.4302793     9.457133
##   0.078   0.1555556    12.54608  0.3979219     8.771262
##   0.078   0.2611111    12.37537  0.4185778     8.924040
##   0.078   0.3666667    12.62398  0.4109466     9.199665
##   0.078   0.4722222    12.82236  0.4051801     9.300259
##   0.078   0.5777778    12.98363  0.4021263     9.315936
##   0.078   0.6833333    13.12536  0.4019012     9.366369
##   0.078   0.7888889    13.29809  0.4004761     9.448096
##   0.078   0.8944444    13.56184  0.3921885     9.616327
##   0.078   1.0000000    13.82490  0.3831107     9.834757
##   0.089   0.0500000    12.76805  0.4301631     9.553760
##   0.089   0.1555556    12.52206  0.4011014     8.740802
##   0.089   0.2611111    12.40075  0.4169507     8.904942
##   0.089   0.3666667    12.60812  0.4134609     9.194771
##   0.089   0.4722222    12.82023  0.4061166     9.310851
##   0.089   0.5777778    12.94414  0.4056864     9.307764
##   0.089   0.6833333    13.06684  0.4063374     9.346933
##   0.089   0.7888889    13.21561  0.4057463     9.413205
##   0.089   0.8944444    13.43999  0.3996871     9.536338
##   0.089   1.0000000    13.68994  0.3912227     9.729456
##   0.100   0.0500000    12.81281  0.4304290     9.624466
##   0.100   0.1555556    12.51065  0.4035470     8.720264
##   0.100   0.2611111    12.43278  0.4145059     8.894274
##   0.100   0.3666667    12.61310  0.4130824     9.184413
##   0.100   0.4722222    12.84834  0.4027745     9.325883
##   0.100   0.5777778    12.94386  0.4050024     9.314861
##   0.100   0.6833333    13.03499  0.4084090     9.343808
##   0.100   0.7888889    13.15677  0.4093087     9.397065
##   0.100   0.8944444    13.35397  0.4051781     9.486202
##   0.100   1.0000000    13.58956  0.3976765     9.656569
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.2611111 and lambda
##  = 0.056.
enet_model$bestTune
##     fraction lambda
## 53 0.2611111  0.056
enet_pred <- predict(enet_model, newdata = test_x)
enet_test <- postResample(pred = enet_pred, obs = test_y)
enet_test
##      RMSE  Rsquared       MAE 
## 9.7528530 0.5801447 7.8950955
#Results
results <- rbind(
  PLS   = c(RMSE = 8.4200782, Rsquared = 0.6804608, MAE = 6.4668817),
  Ridge = ridge_test,
  Lasso = lasso_test,
  Enet  = enet_test
)

results <- as.data.frame(results)
results
##               RMSE  Rsquared         MAE
## PLS       8.420078 0.6804608    6.466882
## Ridge     8.892363 0.6637158    7.080275
## Lasso 11467.370062 0.0558602 3472.114238
## Enet      9.752853 0.5801447    7.895095

Part F

In this analysis, the PLS model showed reasonably good predictive ability on the test set, but not enough certainty to fully replace laboratory measurement. Our best R^2 was .68 and RMSE of 8.42 I don’t believe would be accurate enough to actually replace our permeability lab experiment. It would be more appropriate as a way to reduce cost and time by identifying compounds to be tested in the permeability experiment/

6.3 A

library(AppliedPredictiveModeling)
data("ChemicalManufacturingProcess")

Part B Missing predictor values were imputed using median imputation. After splitting the data into training and test datasets I applied preProcess() with medianImpute, center, and scale to the predictors.

chem <- ChemicalManufacturingProcess

ProcessPredictors <- chem[, names(chem) != "Yield"]
yield <- chem$Yield


Training_split <- createDataPartition(yield, p = 0.80, list = FALSE)

trainsplit_x <- ProcessPredictors[Training_split, ]
testsplit_x  <- ProcessPredictors[-Training_split, ]

trainsplit_y <- yield[Training_split]
testsplit_y  <- yield[-Training_split]

medprocess <- preProcess(
  trainsplit_x,
  method = c("medianImpute", "center", "scale")
)

training_x_f <- predict(medprocess, trainsplit_x)
test_x_f    <- predict(medprocess, testsplit_x)

Part C The optimal model used 4 latent variables, based on the smallest RMSE. The outcome at 4 was RMSE = 1.1945, R^2 = 0.6342, and MAE = 0.9822. In our PLS we seem to gain in RMSE and R^2 values up to 4 with a drop off beginning after that point.

ctrl <- trainControl(
  method = "cv",
  number = 10
)

chem_pls <- train(
  x = training_x_f,
  y = trainsplit_y,
  method = "pls",
  tuneLength = 20,
  metric = "RMSE",
  trControl = ctrl
)

chem_pls
## Partial Least Squares 
## 
## 144 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 130, 131, 129, 128, 129, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE      
##    1     1.390444  0.4808496  1.1440788
##    2     1.274229  0.5542536  1.0295985
##    3     1.223829  0.5801451  0.9732079
##    4     1.196021  0.6049435  0.9724990
##    5     1.200570  0.6174447  0.9847294
##    6     1.211527  0.6093765  0.9917656
##    7     1.201692  0.6240207  0.9940188
##    8     1.244765  0.6106207  1.0317198
##    9     1.256870  0.6103914  1.0415963
##   10     1.275109  0.6071681  1.0547934
##   11     1.277922  0.6013721  1.0495775
##   12     1.278733  0.6042408  1.0545566
##   13     1.285990  0.6011582  1.0657477
##   14     1.288896  0.6030001  1.0683622
##   15     1.298976  0.5993073  1.0773815
##   16     1.302490  0.5960829  1.0727553
##   17     1.297523  0.5985248  1.0672503
##   18     1.300664  0.5954899  1.0676782
##   19     1.306795  0.5911569  1.0712583
##   20     1.309907  0.5876253  1.0715018
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 4.

Part D The PLS model was used to predict yield on the test set. The test set showed was RMSE = 2.4906, R^2 = 0.2189, and MAE = 1.4280. Compared with the cross validated training from part c, this shows a noticeable decline in accuracy. The resampled training results were RMSE = 1.1945 and R^2 = 0.6342 which suggests the model performed much worse on unseen data. This concludes that the model did not generalize as well as expected due to the substantial R^2 difference.

pls_pred <- predict(chem_pls, newdata = test_x_f)
postResample(pred = pls_pred, obs = testsplit_y)
##      RMSE  Rsquared       MAE 
## 2.2962372 0.3001268 1.1916439

Part E

The PLS model indicated that the most important predictors were ManufacturingProcess32, ManufacturingProcess36, ManufacturingProcess13, ManufacturingProcess09, ManufacturingProcess17, and ManufacturingProcess33, followed by BiologicalMaterial02, BiologicalMaterial03, BiologicalMaterial08, and BiologicalMaterial06. The manufacturing process predictors were the majority of the list with 6 of the top 10 predictors coming from the process variables and the 6 most important predictors overall were all manufacturing variables. This suggests that manufacturing process conditions (process predicters) probably have a stronger relationship with yield than the biological characteristics of the materials.

chem_imp <- varImp(chem_pls, scale = TRUE)
## Warning: package 'pls' was built under R version 4.5.2
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:stats':
## 
##     loadings
chem_imp
## pls variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   87.13
## ManufacturingProcess09   82.17
## ManufacturingProcess36   81.32
## ManufacturingProcess17   77.34
## ManufacturingProcess06   62.30
## BiologicalMaterial02     60.89
## BiologicalMaterial03     60.01
## BiologicalMaterial06     58.51
## BiologicalMaterial08     58.19
## ManufacturingProcess12   58.00
## ManufacturingProcess31   55.69
## ManufacturingProcess33   55.26
## BiologicalMaterial12     54.60
## BiologicalMaterial11     52.12
## ManufacturingProcess11   49.63
## ManufacturingProcess30   49.60
## BiologicalMaterial04     49.34
## ManufacturingProcess29   48.99
## ManufacturingProcess28   47.72
plot(chem_imp, top = 15)

Part F The relationships between the most important predictors and yield can help us see which variables are associated with the strongest with a higher or lower yield. Since the top predictors are mostly manufacturing process variables this suggests that improvements in yield would come more from optimizing processes than from differences in raw material biological factors. Biological predictors are still useful because they can help assess the quality of the raw materials before production begins. Examining scatterplots of the top predictors versus yield may reveal trends, thresholds, or optimal operating ranges that can guide future process adjustments and improve product yield. This may also be a way to chose predictors for other models.

top_vars <- c(
  "ManufacturingProcess32",
  "ManufacturingProcess36",
  "ManufacturingProcess13",
  "ManufacturingProcess09",
  "ManufacturingProcess17",
  "ManufacturingProcess33",
  "BiologicalMaterial02",
  "BiologicalMaterial03"
)

plot_data <- data.frame(training_x_f, Yield = trainsplit_y) |>
  select(all_of(top_vars), Yield) |>
  pivot_longer(
    cols = -Yield,
    names_to = "Predictor",
    values_to = "Value"
  )

ggplot(plot_data, aes(x = Value, y = Yield)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~ Predictor, scales = "free_x") +
  labs(
    title = "Yield vs Top Predictors",
    x = "Predictor Value",
    y = "Yield"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.