---
title: "DS624_HW7_JagdishChhabria"
author: "Jagdish Chhabria"
date: "11/4/2021"
output:
  html_document: default
---

6.2. Developing a model to predict permeability could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:

  1. Start R and use these commands to load the data: library(AppliedPredictiveModeling) followed by data(permeability).

The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.

## [1]  165 1107
## [1] 165   1
  2. The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse, meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?
## [1] 165 389
## [1] 165 388

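After filtering with nearZeroVar, 388 of the original 1,107 predictors remain for modeling (the 389-column object above appears to also include the permeability response as an extra column).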
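As a minimal sketch of what the filter checks, the base-R function below approximates the default rule described in caret's nearZeroVar documentation: a predictor is flagged when the frequency of its most common value divided by that of its second most common value exceeds freqCut (95/5 by default) and its percentage of distinct values is below uniqueCut (10 by default); a predictor with a single distinct value is always flagged. The name nzv_flags and the toy matrix are hypothetical, and caret's own implementation may handle edge cases differently.

```r
# Hypothetical helper approximating the default caret::nearZeroVar rule.
nzv_flags <- function(X, freqCut = 95 / 5, uniqueCut = 10) {
  apply(X, 2, function(col) {
    counts <- sort(table(col), decreasing = TRUE)  # value frequencies, most common first
    if (length(counts) < 2) return(TRUE)           # zero variance: only one distinct value
    freq_ratio <- counts[[1]] / counts[[2]]        # most common vs. second most common
    pct_unique <- 100 * length(counts) / length(col)
    freq_ratio > freqCut && pct_unique < uniqueCut
  })
}

# Toy binary "fingerprint" matrix with 165 rows (hypothetical data).
set.seed(42)
X <- cbind(common = rbinom(165, 1, 0.5),
           rare   = c(rep(0, 163), 1, 1),
           const  = 0)
flags <- nzv_flags(X)
X_filtered <- X[, !flags, drop = FALSE]  # keeps only the "common" column
```

On the real data, the equivalent caret call would be along the lines of fingerprints[, -nearZeroVar(fingerprints)], since nearZeroVar returns the column indices to drop by default.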

  3. Split the data into a training and a test set, pre-process the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of R2?
## [1] 123 389
## [1]  42 389
## [1] 165   1
## [1] 123 388
## [1]  42 388
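The split can be sketched in base R as stratified random sampling, in the spirit of caret's createDataPartition, which for a numeric outcome samples within percentile-based groups. This is an illustration under assumptions: the function name stratified_split, the 75% training fraction and the five groups are hypothetical choices, not necessarily what was used to produce the 123/42 split above.

```r
# Hypothetical stratified train/test split for a numeric response.
stratified_split <- function(y, p = 0.75, groups = 5, seed = 1) {
  set.seed(seed)
  # Bin the response into quantile groups so each range of y is represented.
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Sample the same fraction p of rows within each bin.
  idx <- unlist(lapply(split(seq_along(y), bins), function(i) {
    i[sample.int(length(i), size = ceiling(p * length(i)))]
  }), use.names = FALSE)
  sort(idx)
}

y <- 1:100
train_idx <- stratified_split(y, p = 0.75)
y_train <- y[train_idx]
y_test  <- y[-train_idx]
```

Stratifying on the response keeps the low and high permeability values represented in both sets, which matters with only 165 compounds.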
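# ctrl <- trainControl(method = "cv", number = 10)

# plsTune <- train(fp.X.train, permeability, method = "pls", tuneLength = 10, trControl = ctrl, preProc = c("center", "scale"))
# (If fp.X.train holds the 123 training rows, the response here should be the matching 123 training values, not the full 165-row permeability.)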

The cross-validation results indicate that the optimal number of latent variables is 3, based on the RMSEP (root mean squared error of prediction); the validation summary below confirms this.

## Data:    X dimension: 123 388 
##  Y dimension: 123 1
## Fit method: kernelpls
## Number of components considered: 10
## 
## VALIDATION: RMSEP
## Cross-validated using 123 leave-one-out segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           14.56    12.56    11.62    11.59    11.80    11.68    11.67
## adjCV        14.56    12.56    11.61    11.58    11.79    11.65    11.66
##        7 comps  8 comps  9 comps  10 comps
## CV       11.62    11.71    11.84     12.07
## adjCV    11.61    11.69    11.82     12.04
## 
## TRAINING: % variance explained
##               1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X               23.23    34.99    39.42    44.49    48.81    57.46    62.17
## permeability    31.23    48.98    58.64    64.50    70.68    72.90    75.00
##               8 comps  9 comps  10 comps
## X               65.33    69.44     71.94
## permeability    77.43    79.19     80.64

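As noted above, the cross-validated RMSEP is lowest (11.59) with 3 components. With 3 components, the model explains 39.42% of the variance in the predictors and 58.64% of the variance in permeability on the training data.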
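The question also asks for the resampled R2, which the summary above does not print directly. As a rough sketch, it can be approximated from the leave-one-out RMSEP values as 1 - (RMSEP_k / RMSEP_0)^2, treating the intercept-only RMSEP as the total spread of the response. This approximation is our assumption, not the author's computation; calling pls::R2(model, estimate = "CV") on the fitted model would give the package's own estimate, which may differ slightly.

```r
# Cross-validated RMSEP values copied from the pls summary above.
rmsep_cv <- c(intercept = 14.56,
              12.56, 11.62, 11.59, 11.80, 11.68,
              11.67, 11.62, 11.71, 11.84, 12.07)

best_ncomp <- which.min(rmsep_cv[-1])  # search components 1..10 only
# Approximate resampled R2 for each number of components (our assumption).
r2_cv <- 1 - (rmsep_cv[-1] / rmsep_cv[["intercept"]])^2
round(r2_cv[best_ncomp], 3)
```

With these values the sketch gives roughly 0.37 for 3 components, well below the 58.64% of training-set variance explained, as one would expect from a resampled rather than an in-sample estimate.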

  4. Predict the response for the test set. What is the test set estimate of R2?

Below, we predict the target variable for the test data using the fitted PLS model.

##       RMSE   Rsquared        MAE 
## 13.0100889  0.4674614  9.7768923

The R2 for the test set is 0.467 and the RMSE is 13.01.
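For reference, the three test-set numbers above can be reproduced from predictions and observed values. The sketch below assumes caret's postResample convention, in which Rsquared is the squared correlation between predictions and observations rather than 1 - SSE/SST; the helper name test_metrics and the toy vectors are hypothetical.

```r
# Hypothetical helper mirroring the RMSE / Rsquared / MAE summary.
test_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation (caret-style)
    MAE      = mean(abs(obs - pred)))       # mean absolute error
}

obs  <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 37)
m <- test_metrics(pred, obs)
m
```

Because Rsquared here is a squared correlation, it can stay high even when predictions are systematically biased, so it is worth reading alongside RMSE and MAE.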

  5. Try building other models discussed in this chapter. Do any have better predictive performance?

Below, we fit penalized models and compare their results with the PLS model. The glmnet method in the caret package tunes an alpha parameter that determines which penalized model is fit: alpha = 0 gives a ridge regression model, alpha = 1 gives a lasso model, and values in between give an elastic net. For each alpha, the penalty strength lambda is then tuned by cross-validation; in the fits below, caret selected the lambda with the largest resampled R2.
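To show what alpha and lambda control, here is a small base-R sketch that minimizes a glmnet-style elastic-net objective by cyclic coordinate descent. It is an illustration only, assuming a centered response and no intercept: glmnet itself adds intercepts, standardization, warm starts along a lambda path and other refinements. The names soft_threshold and enet_cd are hypothetical.

```r
# Illustrative sketch (hypothetical names), not glmnet's implementation.
# Objective, no intercept:
#   (1 / (2n)) * sum((y - X b)^2) + lambda * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(abs(b)))
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

enet_cd <- function(X, y, lambda, alpha, n_iter = 1000, tol = 1e-12) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  r <- y                                # residuals y - X b, with b = 0
  s <- colSums(X^2) / n                 # per-column scale, x_j' x_j / n
  for (it in seq_len(n_iter)) {
    b_old <- b
    for (j in seq_len(ncol(X))) {
      # Inner product of x_j with the partial residual (x_j's own term added back)
      z <- sum(X[, j] * r) / n + s[j] * b[j]
      b_new <- soft_threshold(z, lambda * alpha) / (s[j] + lambda * (1 - alpha))
      r <- r - X[, j] * (b_new - b[j])  # keep residuals in sync with b
      b[j] <- b_new
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

set.seed(1)
n <- 100; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.vector(X %*% c(3, -2, 0, 0, 1) + rnorm(n))
y <- y - mean(y)

b_ridge <- enet_cd(X, y, lambda = 0.5, alpha = 0)  # alpha = 0: ridge
b_lasso <- enet_cd(X, y, lambda = 0.5, alpha = 1)  # alpha = 1: lasso
```

The lasso's soft-thresholding step is what sets the two noise coefficients exactly to zero, while the ridge penalty only shrinks them. Lambda values depend on how the objective is scaled, so numbers from a different parameterization are not directly comparable.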

For the ridge model (alpha = 0), caret chose lambda as the value with the largest resampled R2.

##    alpha   lambda
## 29     0 91.31313

The best lambda for the ridge model is 91.31.

##    rsquared     rmse
## 1 0.4668815 11.02298

This model has a resampled RMSE of 11.02 and explains about 46.69 percent of the variability in the target variable, permeability.
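The lambda search itself can be sketched as k-fold cross-validation over a grid. This is a hypothetical sketch, not what caret does internally: it uses closed-form ridge with an unscaled penalty (X'X + lambda I), so its lambda values are not on glmnet's scale, and it selects the smallest CV RMSE rather than the largest R2. The names ridge_fit and cv_ridge are ours.

```r
# Hypothetical sketch: unscaled ridge penalty, selection by smallest CV RMSE.
ridge_fit <- function(X, y, lambda) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)                 # center the predictors
  b <- as.vector(solve(crossprod(Xc) + lambda * diag(ncol(X)),
                       crossprod(Xc, y - ym)))
  list(b = b, a = ym - sum(xm * b))     # slopes and intercept
}

cv_ridge <- function(X, y, lambdas, k = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(rep_len(seq_len(k), nrow(X)))  # random fold labels
  cv_rmse <- sapply(lambdas, function(lam) {
    sq_err <- numeric(0)
    for (f in seq_len(k)) {
      held_out <- folds == f
      fit <- ridge_fit(X[!held_out, , drop = FALSE], y[!held_out], lam)
      pred <- fit$a + as.vector(X[held_out, , drop = FALSE] %*% fit$b)
      sq_err <- c(sq_err, (y[held_out] - pred)^2)
    }
    sqrt(mean(sq_err))                  # pooled CV RMSE for this lambda
  })
  list(best_lambda = lambdas[which.min(cv_rmse)], cv_rmse = cv_rmse)
}

set.seed(2)
X <- matrix(rnorm(60 * 20), 60, 20)
y <- as.vector(X[, 1:3] %*% c(2, -1, 1) + rnorm(60, sd = 2))
res <- cv_ridge(X, y, lambdas = c(0.01, 0.1, 1, 10, 100, 1000))
res$best_lambda
```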

##     alpha lambda
## 100     1      3

For the lasso model (alpha = 1), the best lambda chosen is 3.

##    rsquared     rmse
## 1 0.5226225 10.85566

This model has a resampled RMSE of 10.86 and explains about 52.26 percent of the variability in permeability, slightly better than the ridge model on resampling.

## $pls
##      RMSE  Rsquared       MAE 
##        NA 0.2200368        NA 
## 
## $ridge
##       RMSE   Rsquared        MAE 
## 13.6195164  0.4583612  9.7819882 
## 
## $lasso
##       RMSE   Rsquared        MAE 
## 14.7132555  0.3513113 10.5706987

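On the test set, the ridge model outperforms the lasso model (RMSE 13.62 vs 14.71; R-squared 0.458 vs 0.351), even though the lasso had the better resampled performance. The PLS entry in this comparison has an RMSE of NA and an R-squared of 0.22, which does not match the earlier PLS test-set result (RMSE 13.01, R-squared 0.467) and suggests a problem with the PLS predictions passed to this comparison. Using the earlier PLS figures, PLS performs about as well as, or slightly better than, ridge on the test set.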

  6. Would you recommend any of your models to replace the permeability laboratory experiment?

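I would not recommend replacing the permeability laboratory experiment. The best resampled R-squared among the three models was only 0.52 (lasso), and the best test-set R-squared was about 0.47, so roughly half of the variability in permeability is left unexplained. That level of accuracy does not justify replacing the laboratory experiment.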

6.3. A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:

  1. Start R and use these commands to load the data: library(AppliedPredictiveModeling) followed by data(ChemicalManufacturingProcess).

The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs, and yield contains the percent yield for each run.

##   Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 38.00                 6.25                49.58                56.97
## 2 42.44                 8.01                60.97                67.48
## 3 42.03                 8.01                60.97                67.48
## 4 41.42                 8.01                60.97                67.48
## 5 42.49                 7.47                63.33                72.25
## 6 43.57                 6.12                58.36                65.31
##   BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1                12.74                19.51                43.73
## 2                14.65                19.36                53.14
## 3                14.65                19.36                53.14
## 4                14.65                19.36                53.14
## 5                14.02                17.91                54.66
## 6                15.17                21.79                51.23
##   BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1                  100                16.66                11.44
## 2                  100                19.04                12.55
## 3                  100                19.04                12.55
## 4                  100                19.04                12.55
## 5                  100                18.22                12.80
## 6                  100                18.30                12.13
##   BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1                 3.46               138.09                18.83
## 2                 3.46               153.67                21.05
## 3                 3.46               153.67                21.05
## 4                 3.46               153.67                21.05
## 5                 3.05               147.61                21.05
## 6                 3.78               151.88                20.76
##   ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1                     NA                     NA                     NA
## 2                    0.0                      0                     NA
## 3                    0.0                      0                     NA
## 4                    0.0                      0                     NA
## 5                   10.7                      0                     NA
## 6                   12.0                      0                     NA
##   ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1                     NA                     NA                     NA
## 2                    917                 1032.2                  210.0
## 3                    912                 1003.6                  207.1
## 4                    911                 1014.6                  213.3
## 5                    918                 1027.5                  205.7
## 6                    924                 1016.8                  208.9
##   ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1                     NA                     NA                  43.00
## 2                    177                    178                  46.57
## 3                    178                    178                  45.07
## 4                    177                    177                  44.92
## 5                    178                    178                  44.96
## 6                    178                    178                  45.32
##   ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1                     NA                     NA                     NA
## 2                     NA                     NA                      0
## 3                     NA                     NA                      0
## 4                     NA                     NA                      0
## 5                     NA                     NA                      0
## 6                     NA                     NA                      0
##   ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1                   35.5                   4898                   6108
## 2                   34.0                   4869                   6095
## 3                   34.8                   4878                   6087
## 4                   34.8                   4897                   6102
## 5                   34.6                   4992                   6233
## 6                   34.0                   4985                   6222
##   ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1                   4682                   35.5                   4865
## 2                   4617                   34.0                   4867
## 3                   4617                   34.8                   4877
## 4                   4635                   34.8                   4872
## 5                   4733                   33.9                   4886
## 6                   4786                   33.4                   4862
##   ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1                   6049                   4665                    0.0
## 2                   6097                   4621                    0.0
## 3                   6078                   4621                    0.0
## 4                   6073                   4611                    0.0
## 5                   6102                   4659                   -0.7
## 6                   6115                   4696                   -0.6
##   ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1                     NA                     NA                     NA
## 2                      3                      0                      3
## 3                      4                      1                      4
## 4                      5                      2                      5
## 5                      8                      4                     18
## 6                      9                      1                      1
##   ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1                   4873                   6074                   4685
## 2                   4869                   6107                   4630
## 3                   4897                   6116                   4637
## 4                   4892                   6111                   4630
## 5                   4930                   6151                   4684
## 6                   4871                   6128                   4687
##   ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1                   10.7                   21.0                    9.9
## 2                   11.2                   21.4                    9.9
## 3                   11.1                   21.3                    9.4
## 4                   11.1                   21.3                    9.4
## 5                   11.3                   21.6                    9.0
## 6                   11.4                   21.7                   10.1
##   ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1                   69.1                    156                     66
## 2                   68.7                    169                     66
## 3                   69.3                    173                     66
## 4                   69.3                    171                     68
## 5                   69.4                    171                     70
## 6                   68.2                    173                     70
##   ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1                    2.4                    486                  0.019
## 2                    2.6                    508                  0.019
## 3                    2.6                    509                  0.018
## 4                    2.5                    496                  0.018
## 5                    2.5                    468                  0.017
## 6                    2.5                    490                  0.018
##   ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1                    0.5                      3                    7.2
## 2                    2.0                      2                    7.2
## 3                    0.7                      2                    7.2
## 4                    1.2                      2                    7.2
## 5                    0.2                      2                    7.3
## 6                    0.4                      2                    7.2
##   ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1                     NA                     NA                   11.6
## 2                    0.1                   0.15                   11.1
## 3                    0.0                   0.00                   12.0
## 4                    0.0                   0.00                   10.6
## 5                    0.0                   0.00                   11.0
## 6                    0.0                   0.00                   11.5
##   ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1                    3.0                    1.8                    2.4
## 2                    0.9                    1.9                    2.2
## 3                    1.0                    1.8                    2.3
## 4                    1.1                    1.8                    2.1
## 5                    1.1                    1.7                    2.1
## 6                    2.2                    1.8                    2.0

The first column, Yield (the percent yield), is the target variable.

Next, we inspect summary statistics for each variable.

| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 1 | 176 | 40.1765341 | 1.8456664 | 39.970 | 40.1150000 | 1.9718580 | 35.250 | 46.340 | 11.090 | 0.3109596 | -0.1132944 | 0.1391223 |
| BiologicalMaterial01 | 2 | 176 | 6.4114205 | 0.7139225 | 6.305 | 6.3933803 | 0.6745830 | 4.580 | 8.810 | 4.230 | 0.2733165 | 0.4567758 | 0.0538139 |
| BiologicalMaterial02 | 3 | 176 | 55.6887500 | 4.0345806 | 55.090 | 55.5810563 | 4.5812340 | 46.870 | 64.750 | 17.880 | 0.2441269 | -0.7050911 | 0.3041180 |
| BiologicalMaterial03 | 4 | 176 | 67.7050000 | 4.0010641 | 67.220 | 67.6780986 | 4.2773010 | 56.970 | 78.250 | 21.280 | 0.0285108 | -0.1235203 | 0.3015916 |
| BiologicalMaterial04 | 5 | 176 | 12.3492614 | 1.7746607 | 12.100 | 12.1860563 | 1.3714050 | 9.380 | 23.090 | 13.710 | 1.7323153 | 7.0564614 | 0.1337701 |
| BiologicalMaterial05 | 6 | 176 | 18.5986364 | 1.8441408 | 18.490 | 18.5488732 | 1.8829020 | 13.240 | 24.850 | 11.610 | 0.3040053 | 0.2198005 | 0.1390073 |
| BiologicalMaterial06 | 7 | 176 | 48.9103977 | 3.7460718 | 48.460 | 48.7379577 | 3.9437160 | 40.600 | 59.380 | 18.780 | 0.3685344 | -0.3654933 | 0.2823708 |
| BiologicalMaterial07 | 8 | 176 | 100.0141477 | 0.1077423 | 100.000 | 100.0000000 | 0.0000000 | 100.000 | 100.830 | 0.830 | 7.3986642 | 53.0417012 | 0.0081214 |
| BiologicalMaterial08 | 9 | 176 | 17.4947727 | 0.6769536 | 17.510 | 17.4687324 | 0.5930400 | 15.880 | 19.140 | 3.260 | 0.2200539 | 0.0627721 | 0.0510273 |
| BiologicalMaterial09 | 10 | 176 | 12.8500568 | 0.4151757 | 12.835 | 12.8635211 | 0.4225410 | 11.440 | 14.080 | 2.640 | -0.2684177 | 0.2927765 | 0.0312950 |
| BiologicalMaterial10 | 11 | 176 | 2.8006250 | 0.5991433 | 2.710 | 2.7328873 | 0.4003020 | 1.770 | 6.870 | 5.100 | 2.4023783 | 11.6471845 | 0.0451621 |
| BiologicalMaterial11 | 12 | 176 | 146.9531818 | 4.8204704 | 146.080 | 146.7863380 | 4.1142150 | 135.810 | 158.730 | 22.920 | 0.3588211 | 0.0162456 | 0.3633566 |
| BiologicalMaterial12 | 13 | 176 | 20.1998864 | 0.7735440 | 20.120 | 20.1776056 | 0.6671700 | 18.350 | 22.210 | 3.860 | 0.3038443 | 0.0146595 | 0.0583081 |
| ManufacturingProcess01 | 14 | 175 | 11.2074286 | 1.8224342 | 11.400 | 11.4078014 | 1.0378200 | 0.000 | 14.100 | 14.100 | -3.9201855 | 21.8688069 | 0.1377631 |
| ManufacturingProcess02 | 15 | 173 | 16.6826590 | 8.4715694 | 21.000 | 18.0575540 | 1.4826000 | 0.000 | 22.500 | 22.500 | -1.4307675 | 0.1062466 | 0.6440815 |
| ManufacturingProcess03 | 16 | 161 | 1.5395652 | 0.0223983 | 1.540 | 1.5410078 | 0.0148260 | 1.470 | 1.600 | 0.130 | -0.4799447 | 1.7280557 | 0.0017652 |
| ManufacturingProcess04 | 17 | 175 | 931.8514286 | 6.2744406 | 934.000 | 932.2836879 | 5.9304000 | 911.000 | 946.000 | 35.000 | -0.6979357 | 0.0631282 | 0.4743031 |
| ManufacturingProcess05 | 18 | 175 | 1001.6931429 | 30.5272134 | 999.200 | 998.6248227 | 17.3464200 | 923.000 | 1175.300 | 252.300 | 2.5872769 | 11.7446904 | 2.3076404 |
| ManufacturingProcess06 | 19 | 174 | 207.4017241 | 2.6993999 | 206.800 | 207.0928571 | 1.9273800 | 203.000 | 227.400 | 24.400 | 3.0419007 | 17.3764864 | 0.2046410 |
| ManufacturingProcess07 | 20 | 175 | 177.4800000 | 0.5010334 | 177.000 | 177.4751773 | 0.0000000 | 177.000 | 178.000 | 1.000 | 0.0793788 | -2.0050587 | 0.0378746 |
| ManufacturingProcess08 | 21 | 175 | 177.5542857 | 0.4984706 | 178.000 | 177.5673759 | 0.0000000 | 177.000 | 178.000 | 1.000 | -0.2165645 | -1.9642262 | 0.0376808 |
| ManufacturingProcess09 | 22 | 176 | 45.6601136 | 1.5464407 | 45.730 | 45.7188732 | 1.2157320 | 38.890 | 49.360 | 10.470 | -0.9406685 | 3.2701986 | 0.1165674 |
| ManufacturingProcess10 | 23 | 167 | 9.1790419 | 0.7666884 | 9.100 | 9.1318519 | 0.5930400 | 7.500 | 11.600 | 4.100 | 0.6492504 | 0.6317264 | 0.0593281 |
| ManufacturingProcess11 | 24 | 166 | 9.3855422 | 0.7157336 | 9.400 | 9.3932836 | 0.6671700 | 7.500 | 11.500 | 4.000 | -0.0193109 | 0.3227966 | 0.0555517 |
| ManufacturingProcess12 | 25 | 175 | 857.8114286 | 1784.5282624 | 0.000 | 516.1985816 | 0.0000000 | 0.000 | 4549.000 | 4549.000 | 1.5786729 | 0.4951353 | 134.8976569 |
| ManufacturingProcess13 | 26 | 176 | 34.5079545 | 1.0152800 | 34.600 | 34.5119718 | 0.8895600 | 32.100 | 38.600 | 6.500 | 0.4802776 | 1.9593883 | 0.0765296 |
| ManufacturingProcess14 | 27 | 175 | 4853.8685714 | 54.5236412 | 4856.000 | 4854.5744681 | 40.0302000 | 4701.000 | 5055.000 | 354.000 | -0.0109687 | 1.0781378 | 4.1215999 |
| ManufacturingProcess15 | 28 | 176 | 6038.9204545 | 58.3125023 | 6031.500 | 6035.5211268 | 40.7715000 | 5904.000 | 6233.000 | 329.000 | 0.6743478 | 1.2162163 | 4.3954702 |
| ManufacturingProcess16 | 29 | 176 | 4565.8011364 | 351.6973215 | 4588.000 | 4588.3591549 | 42.9954000 | 0.000 | 4852.000 | 4852.000 | -12.4202248 | 158.3981993 | 26.5101831 |
| ManufacturingProcess17 | 30 | 176 | 34.3437500 | 1.2482059 | 34.400 | 34.3126761 | 1.1860800 | 31.300 | 40.000 | 8.700 | 1.1629715 | 4.6626982 | 0.0940871 |
| ManufacturingProcess18 | 31 | 176 | 4809.6818182 | 367.4777364 | 4835.000 | 4837.0704225 | 34.8411000 | 0.000 | 4971.000 | 4971.000 | -12.7361378 | 163.7375845 | 27.6996766 |
| ManufacturingProcess19 | 32 | 176 | 6028.1988636 | 45.5785689 | 6022.000 | 6026.1549296 | 36.3237000 | 5890.000 | 6146.000 | 256.000 | 0.2973414 | 0.2962151 | 3.4356139 |
| ManufacturingProcess20 | 33 | 176 | 4556.4602273 | 349.0089784 | 4582.000 | 4580.9788732 | 42.9954000 | 0.000 | 4759.000 | 4759.000 | -12.6383268 | 162.0663905 | 26.3075416 |
| ManufacturingProcess21 | 34 | 176 | -0.1642045 | 0.7782930 | -0.300 | -0.2556338 | 0.4447800 | -1.800 | 3.600 | 5.400 | 1.7291140 | 5.0274763 | 0.0586660 |
| ManufacturingProcess22 | 35 | 175 | 5.4057143 | 3.3306262 | 5.000 | 5.2482270 | 4.4478000 | 0.000 | 12.000 | 12.000 | 0.3148909 | -1.0175458 | 0.2517717 |
| ManufacturingProcess23 | 36 | 175 | 3.0171429 | 1.6625499 | 3.000 | 2.9432624 | 1.4826000 | 0.000 | 6.000 | 6.000 | 0.1967985 | -0.9975572 | 0.1256770 |
| ManufacturingProcess24 | 37 | 175 | 8.8342857 | 5.7994224 | 8.000 | 8.5744681 | 7.4130000 | 0.000 | 23.000 | 23.000 | 0.3593200 | -1.0207362 | 0.4383951 |
| ManufacturingProcess25 | 38 | 171 | 4828.1754386 | 373.4810865 | 4855.000 | 4855.5620438 | 34.0998000 | 0.000 | 4990.000 | 4990.000 | -12.6310220 | 160.3293620 | 28.5608125 |
| ManufacturingProcess26 | 39 | 171 | 6015.5964912 | 464.8674900 | 6047.000 | 6048.5547445 | 38.5476000 | 0.000 | 6161.000 | 6161.000 | -12.6694398 | 160.9849144 | 35.5493055 |
| ManufacturingProcess27 | 40 | 171 | 4562.5087719 | 353.9848679 | 4587.000 | 4587.4452555 | 35.5824000 | 0.000 | 4710.000 | 4710.000 | -12.5174778 | 158.3931091 | 27.0698994 |
| ManufacturingProcess28 | 41 | 171 | 6.5918129 | 5.2489823 | 10.400 | 6.8248175 | 1.0378200 | 0.000 | 11.500 | 11.500 | -0.4556335 | -1.7907822 | 0.4013997 |
| ManufacturingProcess29 | 42 | 171 | 20.0111111 | 1.6638879 | 19.900 | 20.0437956 | 0.4447800 | 0.000 | 22.000 | 22.000 | -10.0848133 | 119.4378857 | 0.1272407 |
| ManufacturingProcess30 | 43 | 171 | 9.1614035 | 0.9760824 | 9.100 | 9.2145985 | 0.7413000 | 0.000 | 11.200 | 11.200 | -4.7557268 | 43.0848842 | 0.0746429 |
| ManufacturingProcess31 | 44 | 171 | 70.1847953 | 5.5557816 | 70.800 | 70.7240876 | 0.8895600 | 0.000 | 72.500 | 72.500 | -11.8231008 | 146.0094297 | 0.4248612 |
| ManufacturingProcess32 | 45 | 176 | 158.4659091 | 5.3972456 | 158.000 | 158.3380282 | 4.4478000 | 143.000 | 173.000 | 30.000 | 0.2112252 | 0.0602714 | 0.4068327 |
| ManufacturingProcess33 | 46 | 171 | 63.5438596 | 2.4833813 | 64.000 | 63.5474453 | 1.4826000 | 56.000 | 70.000 | 14.000 | -0.1310030 | 0.2740324 | 0.1899089 |
| ManufacturingProcess34 | 47 | 171 | 2.4935673 | 0.0543910 | 2.500 | 2.4927007 | 0.0000000 | 2.300 | 2.600 | 0.300 | -0.2634497 | 1.0013075 | 0.0041594 |
| ManufacturingProcess35 | 48 | 171 | 495.5964912 | 10.8196874 | 495.000 | 495.7445255 | 8.8956000 | 463.000 | 522.000 | 59.000 | -0.1556154 | 0.4130958 | 0.8274022 |
| ManufacturingProcess36 | 49 | 171 | 0.0195731 | 0.0008739 | 0.020 | 0.0195620 | 0.0014826 | 0.017 | 0.022 | 0.005 | 0.1453141 | -0.0557822 | 0.0000668 |
| ManufacturingProcess37 | 50 | 176 | 1.0136364 | 0.4450828 | 1.000 | 0.9964789 | 0.4447800 | 0.000 | 2.300 | 2.300 | 0.3783578 | 0.0698597 | 0.0335494 |
| ManufacturingProcess38 | 51 | 176 | 2.5340909 | 0.6493753 | 3.000 | 2.6126761 | 0.0000000 | 0.000 | 3.000 | 3.000 | -1.6818052 | 3.9189211 | 0.0489485 |
| ManufacturingProcess39 | 52 | 176 | 6.8511364 | 1.5054943 | 7.200 | 7.1718310 | 0.1482600 | 0.000 | 7.500 | 7.500 | -4.2691214 | 16.4987895 | 0.1134809 |
| ManufacturingProcess40 | 53 | 175 | 0.0177143 | 0.0382885 | 0.000 | 0.0099291 | 0.0000000 | 0.000 | 0.100 | 0.100 | 1.6768073 | 0.8164458 | 0.0028943 |
| ManufacturingProcess41 | 54 | 175 | 0.0237143 | 0.0538242 | 0.000 | 0.0106383 | 0.0000000 | 0.000 | 0.200 | 0.200 | 2.1686898 | 3.6290714 | 0.0040687 |
| ManufacturingProcess42 | 55 | 176 | 11.2062500 | 1.9416092 | 11.600 | 11.5429577 | 0.2965200 | 0.000 | 12.100 | 12.100 | -5.4500082 | 28.5288867 | 0.1463543 |
| ManufacturingProcess43 | 56 | 176 | 0.9119318 | 0.8679860 | 0.800 | 0.8077465 | 0.2965200 | 0.000 | 11.000 | 11.000 | 9.0548747 | 101.0332345 | 0.0654269 |
| ManufacturingProcess44 | 57 | 176 | 1.8051136 | 0.3220062 | 1.900 | 1.8549296 | 0.1482600 | 0.000 | 2.100 | 2.100 | -4.9703552 | 25.0876065 | 0.0242721 |
| ManufacturingProcess45 | 58 | 176 | 2.1380682 | 0.4069043 | 2.200 | 2.2042254 | 0.1482600 | 0.000 | 2.600 | 2.600 | -4.0779411 | 18.7565001 | 0.0306716 |

  2. A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values.

Missing Values

| variable | n_miss | pct_miss |
|---|---|---|
| ManufacturingProcess03 | 15 | 8.5227273 |
| ManufacturingProcess11 | 10 | 5.6818182 |
| ManufacturingProcess10 | 9 | 5.1136364 |
| ManufacturingProcess25 | 5 | 2.8409091 |
| ManufacturingProcess26 | 5 | 2.8409091 |
| ManufacturingProcess27 | 5 | 2.8409091 |
| ManufacturingProcess28 | 5 | 2.8409091 |
| ManufacturingProcess29 | 5 | 2.8409091 |
| ManufacturingProcess30 | 5 | 2.8409091 |
| ManufacturingProcess31 | 5 | 2.8409091 |
| ManufacturingProcess33 | 5 | 2.8409091 |
| ManufacturingProcess34 | 5 | 2.8409091 |
| ManufacturingProcess35 | 5 | 2.8409091 |
| ManufacturingProcess36 | 5 | 2.8409091 |
| ManufacturingProcess02 | 3 | 1.7045455 |
| ManufacturingProcess06 | 2 | 1.1363636 |
| ManufacturingProcess01 | 1 | 0.5681818 |
| ManufacturingProcess04 | 1 | 0.5681818 |
| ManufacturingProcess05 | 1 | 0.5681818 |
| ManufacturingProcess07 | 1 | 0.5681818 |
| ManufacturingProcess08 | 1 | 0.5681818 |
| ManufacturingProcess12 | 1 | 0.5681818 |
| ManufacturingProcess14 | 1 | 0.5681818 |
| ManufacturingProcess22 | 1 | 0.5681818 |
| ManufacturingProcess23 | 1 | 0.5681818 |
| ManufacturingProcess24 | 1 | 0.5681818 |
| ManufacturingProcess40 | 1 | 0.5681818 |
| ManufacturingProcess41 | 1 | 0.5681818 |

miss_var_summary(chemical)%>%filter(n_miss>0)%>%kable(caption = 'Missing Values', format="html", table.attr="style='width:50%;'")%>%kable_styling()

Missing Values

| variable | n_miss | pct_miss |
|---|---|---|

After imputation, there are no more missing values.
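The imputation method used above is not shown. One common choice is k-nearest-neighbour imputation, which caret's preProcess() offers as method = "knnImpute" (alongside "medianImpute" and "bagImpute"). The base-R sketch below is a simplified, hypothetical version (knn_impute is our name): it uses only complete rows as donors, measures Euclidean distance on the columns observed in the incomplete row, and assumes the columns are already on comparable scales, for example after centering and scaling.

```r
# Hypothetical simplified kNN imputation; assumes columns on comparable scales.
knn_impute <- function(df, k = 5) {
  X <- as.matrix(df)
  donors <- which(complete.cases(X))            # rows with no missing values
  for (i in which(!complete.cases(X))) {
    obs <- !is.na(X[i, ])                       # columns observed in row i
    # Euclidean distance from row i to every donor, on row i's observed columns only
    diffs <- t(X[donors, obs, drop = FALSE]) - X[i, obs]
    d <- sqrt(colSums(diffs^2))
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    miss <- which(!obs)
    # Fill each missing value with the mean of the k nearest donors
    X[i, miss] <- colMeans(X[nn, miss, drop = FALSE])
  }
  X
}

df <- data.frame(a = c(1, 2, 3, 10, NA),
                 b = c(1, 2, 3, 10, 2.1))
imputed <- knn_impute(df, k = 2)
imputed[5, "a"]
```

Row 5 is closest to rows 2 and 3 on column b, so its missing a is filled with the mean of their a values.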

  3. Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric?
## [1] 7
##  [1]  2  4 11 40 53 38 36 42 29 51
## [1] 176  46
##   BiologicalMaterial01 BiologicalMaterial03 BiologicalMaterial05
## 1           -0.2261036          -2.68303622            0.4941942
## 2            2.2391498          -0.05623504            0.4128555
## 3            2.2391498          -0.05623504            0.4128555
## 4            2.2391498          -0.05623504            0.4128555
## 5            1.4827653           1.13594780           -0.3734185
## 6           -0.4081962          -0.59859075            1.7305423
##   BiologicalMaterial06 BiologicalMaterial08 BiologicalMaterial09
## 1           -1.3828880            -1.233131           -3.3962895
## 2            1.1290767             2.282619           -0.7227225
## 3            1.1290767             2.282619           -0.7227225
## 4            1.1290767             2.282619           -0.7227225
## 5            1.5348350             1.071310           -0.1205678
## 6            0.6192092             1.189487           -1.7343424
##   BiologicalMaterial10 BiologicalMaterial11 ManufacturingProcess01
## 1            1.1005296            -1.838655              0.2154105
## 2            1.1005296             1.393395             -6.1497028
## 3            1.1005296             1.393395             -6.1497028
## 4            1.1005296             1.393395             -6.1497028
## 5            0.4162193             0.136256             -0.2784345
## 6            1.6346255             1.022062              0.4348971
##   ManufacturingProcess02 ManufacturingProcess03 ManufacturingProcess04
## 1              0.5662872              0.3765810              0.5655598
## 2             -1.9692525              0.1979962             -2.3669726
## 3             -1.9692525              0.1087038             -3.1638563
## 4             -1.9692525              0.4658734             -3.3232331
## 5             -1.9692525              0.1087038             -2.2075958
## 6             -1.9692525              0.5551658             -1.2513352
##   ManufacturingProcess05 ManufacturingProcess06 ManufacturingProcess07
## 1            -0.44593467             -0.5414997             -0.1596700
## 2             0.99933318              0.9625383             -0.9580199
## 3             0.06246417             -0.1117745              1.0378549
## 4             0.42279841              2.1850322             -0.9580199
## 5             0.84537219             -0.6304083              1.0378549
## 6             0.49486525              0.5550403              1.0378549
##   ManufacturingProcess08 ManufacturingProcess09 ManufacturingProcess10
## 1             -0.3095182             -1.7201524            -0.07700901
## 2              0.8941637              0.5883746             0.52297397
## 3              0.8941637             -0.3815947             0.31428424
## 4             -1.1119728             -0.4785917            -0.02483658
## 5              0.8941637             -0.4527258            -0.39004361
## 6              0.8941637             -0.2199332             0.28819802
##   ManufacturingProcess11 ManufacturingProcess12 ManufacturingProcess13
## 1            -0.09157342             -0.4806937             0.97711512
## 2             1.08204765             -0.4806937            -0.50030980
## 3             0.55112383             -0.4806937             0.28765016
## 4             0.80261406             -0.4806937             0.28765016
## 5             0.10403009             -0.4806937             0.09066017
## 6             1.41736795             -0.4806937            -0.50030980
##   ManufacturingProcess14 ManufacturingProcess15 ManufacturingProcess16
## 1              0.8093999              1.1846438              0.3303945
## 2              0.2775205              0.9617071              0.1455765
## 3              0.4425865              0.8245152              0.1455765
## 4              0.7910592              1.0817499              0.1967569
## 5              2.5334227              3.3282665              0.4754056
## 6              2.4050380              3.1396277              0.6261033
##   ManufacturingProcess17 ManufacturingProcess19 ManufacturingProcess20
## 1              0.9263296              0.4563798              0.3109942
## 2             -0.2753953              1.5095063              0.1849230
## 3              0.3655246              1.0926437              0.1849230
## 4              0.3655246              0.9829430              0.1562704
## 5             -0.3555103              1.6192070              0.2938027
## 6             -0.7560852              1.9044287              0.3998171
##   ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## 1              0.2109804             0.05833309              0.8317688
## 2              0.2109804            -0.72230090             -1.8147683
## 3              0.2109804            -0.42205706             -1.2132826
## 4              0.2109804            -0.12181322             -0.6117969
## 5             -0.6884239             0.77891831              0.5911745
## 6             -0.5599376             1.07916216             -1.2132826
##   ManufacturingProcess24 ManufacturingProcess26 ManufacturingProcess28
## 1              0.8907291              0.1256347              0.7826636
## 2             -1.0060115              0.1966227              0.8779201
## 3             -0.8335805              0.2159831              0.8588688
## 4             -0.6611496              0.2052273              0.8588688
## 5              1.5804530              0.2912733              0.8969714
## 6             -1.3508734              0.2417969              0.9160227
##   ManufacturingProcess30 ManufacturingProcess32 ManufacturingProcess33
## 1              0.7566948             -0.4568829              0.9890307
## 2              0.7566948              1.9517531              0.9890307
## 3              0.2444430              2.6928719              0.9890307
## 4              0.2444430              2.3223125              1.7943843
## 5             -0.1653585              2.3223125              2.5997378
## 6              0.9615956              2.6928719              2.5997378
##   ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1             -1.7202722            -0.88694718             -0.6557774
## 2              1.9568096             1.14638329             -0.6557774
## 3              1.9568096             1.23880740             -1.8000420
## 4              0.1182687             0.03729394             -1.8000420
## 5              0.1182687            -2.55058120             -2.9443066
## 6              0.1182687            -0.51725073             -1.8000420
##   ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1             -1.1540243              0.7174727              0.2317270
## 2              2.2161351             -0.8224687              0.2317270
## 3             -0.7046697             -0.8224687              0.2317270
## 4              0.4187168             -0.8224687              0.2317270
## 5             -1.8280562             -0.8224687              0.2981503
## 6             -1.3787016             -0.8224687              0.2317270
##   ManufacturingProcess41 ManufacturingProcess43 ManufacturingProcess44
## 1            -0.06900773             2.40564734            -0.01588055
## 2             2.34626280            -0.01374656             0.29467248
## 3            -0.44058781             0.10146268            -0.01588055
## 4            -0.44058781             0.21667191            -0.01588055
## 5            -0.44058781             0.21667191            -0.32643359
## 6            -0.44058781             1.48397347            -0.01588055
##   ManufacturingProcess45
## 1             0.64371849
## 2             0.15220242
## 3             0.39796046
## 4            -0.09355562
## 5            -0.09355562
## 6            -0.33931365
## [1] 176  46
## [1] 176   1
  1. Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?
##    alpha     lambda
## 75   0.8 0.01584038

From the elastic net model fitted above, the best tune is alpha = 0.8 and lambda ≈ 0.0158; R2 was used to select the optimal model, choosing the combination with the largest value.
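For reference, the glmnet documentation gives the objective that the elastic net minimizes for a Gaussian response as shown below; caret's `glmnet` method tunes the mixing parameter α (`alpha`) and the penalty strength λ (`lambda`). This is a summary of the documented objective, not output from this homework:

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]
$$

With α = 1 the penalty is pure lasso and with α = 0 it is pure ridge. The selected α = 0.8 therefore weights the penalty mostly toward the lasso (L1) term, which can shrink some coefficients exactly to zero, while λ ≈ 0.0158 sets the overall strength of the penalty.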

##    rsquared     rmse
## 1 0.6846746 1.150071

On the training set, the resampled R2 is 0.685 and the corresponding RMSE is 1.150.

##       RMSE   Rsquared        MAE 
## 11.2216556  0.0264722  2.8782738

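For the test-set predictions, R2 is very low at 0.026 and the RMSE is much higher at 11.22 (MAE 2.88). Performance on the held-out data is therefore far worse than the resampled estimate on the training set (R2 0.685, RMSE 1.150), and the large gap between RMSE and MAE suggests that a few very large prediction errors dominate the test RMSE.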
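The three metrics printed above can be computed by hand, which also shows why RMSE can be much larger than MAE. The helper `perf_metrics` below is a hypothetical name; it follows the formulas caret's `postResample()` reports (RMSE, squared correlation as Rsquared, MAE). The observed and predicted values are made-up toy numbers with one large miss, not the homework data:

```r
# Hypothetical helper mirroring the RMSE / Rsquared / MAE output above
perf_metrics <- function(obs, pred) {
  err <- pred - obs
  c(RMSE     = sqrt(mean(err^2)),   # squares errors, so large misses dominate
    Rsquared = cor(obs, pred)^2,    # squared correlation between observed and predicted
    MAE      = mean(abs(err)))      # average absolute error, less sensitive to outliers
}

# Toy data: nine predictions off by 0.5 and one prediction off by 30
obs  <- c(40, 41, 39, 42, 38, 40, 41, 39, 42, 40)
pred <- c(40.5, 40.5, 39.5, 41.5, 38.5, 40.5, 40.5, 39.5, 41.5, 70)

m <- perf_metrics(obs, pred)
print(round(m, 3))
```

A single large error pushes the RMSE to more than twice the MAE here, which is the same pattern seen in the test-set results.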

  1. Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list?
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 46)
## 
##                        Overall
## ManufacturingProcess26 5.10807
## ManufacturingProcess20 1.46344
## ManufacturingProcess32 1.42492
## ManufacturingProcess33 0.68191
## ManufacturingProcess30 0.62083
## ManufacturingProcess13 0.53432
## BiologicalMaterial06   0.34837
## ManufacturingProcess04 0.32541
## ManufacturingProcess28 0.25215
## BiologicalMaterial10   0.23152
## ManufacturingProcess09 0.23113
## ManufacturingProcess45 0.22319
## ManufacturingProcess37 0.20234
## ManufacturingProcess14 0.15921
## ManufacturingProcess03 0.15001
## ManufacturingProcess43 0.14675
## ManufacturingProcess39 0.11041
## ManufacturingProcess38 0.10098
## ManufacturingProcess05 0.09859
## BiologicalMaterial11   0.09777

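In the fitted model, the magnitude of each predictor's coefficient reflects its impact on the target variable. ManufacturingProcess26 is the most important of the 46 variables, since it has the largest absolute coefficient. The process predictors dominate the list: 17 of the 20 most important variables are manufacturing process predictors, and only 3 (BiologicalMaterial06, BiologicalMaterial10 and BiologicalMaterial11) are biological.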
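caret's `varImp()` for a glmnet fit reports the absolute value of each coefficient, as the two tables in this section show (for example, 0.23152 for BiologicalMaterial10, whose coefficient is -0.23151548). The split between process and biological predictors in the top 20 can be checked by name prefix. This minimal sketch copies the 20 names printed above; the object names are illustrative:

```r
# The 20 predictor names printed by varImp() above, in the order shown
top20 <- c("ManufacturingProcess26", "ManufacturingProcess20", "ManufacturingProcess32",
           "ManufacturingProcess33", "ManufacturingProcess30", "ManufacturingProcess13",
           "BiologicalMaterial06",   "ManufacturingProcess04", "ManufacturingProcess28",
           "BiologicalMaterial10",   "ManufacturingProcess09", "ManufacturingProcess45",
           "ManufacturingProcess37", "ManufacturingProcess14", "ManufacturingProcess03",
           "ManufacturingProcess43", "ManufacturingProcess39", "ManufacturingProcess38",
           "ManufacturingProcess05", "BiologicalMaterial11")

# Label each predictor by its name prefix and tabulate the two groups
group <- ifelse(startsWith(top20, "ManufacturingProcess"), "Process", "Biological")
print(table(group))
```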

##                 variables        coef
## 1             (Intercept) 39.79681844
## 2    BiologicalMaterial05  0.09069360
## 3    BiologicalMaterial06  0.34837004
## 4    BiologicalMaterial10 -0.23151548
## 5    BiologicalMaterial11  0.09776693
## 6  ManufacturingProcess01  0.08332074
## 7  ManufacturingProcess03 -0.15000924
## 8  ManufacturingProcess04  0.32541241
## 9  ManufacturingProcess05 -0.09859363
## 10 ManufacturingProcess06  0.01738979
## 11 ManufacturingProcess08 -0.07798636
## 12 ManufacturingProcess09  0.23112864
## 13 ManufacturingProcess10 -0.07586241
## 14 ManufacturingProcess12  0.04056577
## 15 ManufacturingProcess13 -0.53432158
## 16 ManufacturingProcess14  0.15921151
## 17 ManufacturingProcess16  0.04629846
## 18 ManufacturingProcess17 -0.05009535
## 19 ManufacturingProcess19  0.06461070
## 20 ManufacturingProcess20 -1.46344013
## 21 ManufacturingProcess22 -0.05807113
## 22 ManufacturingProcess23  0.01732239
## 23 ManufacturingProcess26  5.10806895
## 24 ManufacturingProcess28 -0.25214565
## 25 ManufacturingProcess30  0.62083037
## 26 ManufacturingProcess32  1.42491729
## 27 ManufacturingProcess33 -0.68191245
## 28 ManufacturingProcess34  0.01700857
## 29 ManufacturingProcess35 -0.06432492
## 30 ManufacturingProcess37 -0.20234413
## 31 ManufacturingProcess38 -0.10097678
## 32 ManufacturingProcess39  0.11040612
## 33 ManufacturingProcess43  0.14675431
## 34 ManufacturingProcess45  0.22318556

The table above shows the intercept and the coefficients of the 33 predictors retained in the final tuned model; the remaining 13 of the 46 predictors were shrunk to zero by the elastic net penalty.

  1. Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?

A positive coefficient indicates that as the value of the predictor increases, the mean of the response variable also tends to increase. A negative coefficient indicates that as the predictor increases, the response variable tends to decrease. The coefficient value indicates how much the mean Yield changes for a one-unit shift in the predictor while all other variables in the model are held constant; since the predictors were centered and scaled before fitting, one unit here corresponds to one standard deviation of the predictor. Holding the other variables constant is important because it allows the effect of each variable to be assessed separately from the others.
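To make the sign interpretation concrete, here is a minimal sketch on simulated data (not the homework data). It uses ordinary least squares via base R's `lm()` rather than the elastic net, so the coefficients are not shrunk; the names `x1`, `x2` and the true effects of 2 and -3 are illustrative assumptions:

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# Simulated response that rises with x1 and falls with x2
y <- 40 + 2 * x1 - 3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
print(round(b, 2))

# Holding x2 fixed at 0.5, raising x1 by one unit changes the predicted mean by b["x1"]
newdat <- data.frame(x1 = c(0, 1), x2 = c(0.5, 0.5))
shift  <- diff(predict(fit, newdat))
print(shift)
```

The estimated coefficient on `x1` is positive and the one on `x2` is negative, and the change in the prediction when only `x1` moves equals the `x1` coefficient.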

##                 variables        coef
## 1             (Intercept) 39.79681844
## 2    BiologicalMaterial05  0.09069360
## 3    BiologicalMaterial06  0.34837004
## 4    BiologicalMaterial11  0.09776693
## 5  ManufacturingProcess01  0.08332074
## 6  ManufacturingProcess04  0.32541241
## 7  ManufacturingProcess06  0.01738979
## 8  ManufacturingProcess09  0.23112864
## 9  ManufacturingProcess12  0.04056577
## 10 ManufacturingProcess14  0.15921151
## 11 ManufacturingProcess16  0.04629846
## 12 ManufacturingProcess19  0.06461070
## 13 ManufacturingProcess23  0.01732239
## 14 ManufacturingProcess26  5.10806895
## 15 ManufacturingProcess30  0.62083037
## 16 ManufacturingProcess32  1.42491729
## 17 ManufacturingProcess34  0.01700857
## 18 ManufacturingProcess39  0.11040612
## 19 ManufacturingProcess43  0.14675431
## 20 ManufacturingProcess45  0.22318556
##                 variables        coef
## 4    BiologicalMaterial10 -0.23151548
## 7  ManufacturingProcess03 -0.15000924
## 9  ManufacturingProcess05 -0.09859363
## 11 ManufacturingProcess08 -0.07798636
## 13 ManufacturingProcess10 -0.07586241
## 15 ManufacturingProcess13 -0.53432158
## 18 ManufacturingProcess17 -0.05009535
## 20 ManufacturingProcess20 -1.46344013
## 21 ManufacturingProcess22 -0.05807113
## 24 ManufacturingProcess28 -0.25214565
## 27 ManufacturingProcess33 -0.68191245
## 29 ManufacturingProcess35 -0.06432492
## 30 ManufacturingProcess37 -0.20234413
## 31 ManufacturingProcess38 -0.10097678

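Of the 19 predictors with positive coefficients, ManufacturingProcess26 has by far the largest (5.11), so higher values of this predictor are associated with the largest increase in yield. Of the 14 predictors with negative coefficients, ManufacturingProcess20 has the largest magnitude (-1.46), followed by ManufacturingProcess33 (-0.68) and ManufacturingProcess13 (-0.53). This information can be used to guide changes to the manufacturing process aimed at increasing the yield.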
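The sign split and the ranking within each group can be reproduced directly from the coefficient table. This sketch copies the 33 non-intercept coefficients printed above into a named vector (the object names are illustrative), then also keeps only the process predictors, since those are the ones that can be adjusted:

```r
# Coefficients copied from the elastic net table above (intercept omitted)
coefs <- c(
  BiologicalMaterial05   =  0.09069360, BiologicalMaterial06   =  0.34837004,
  BiologicalMaterial10   = -0.23151548, BiologicalMaterial11   =  0.09776693,
  ManufacturingProcess01 =  0.08332074, ManufacturingProcess03 = -0.15000924,
  ManufacturingProcess04 =  0.32541241, ManufacturingProcess05 = -0.09859363,
  ManufacturingProcess06 =  0.01738979, ManufacturingProcess08 = -0.07798636,
  ManufacturingProcess09 =  0.23112864, ManufacturingProcess10 = -0.07586241,
  ManufacturingProcess12 =  0.04056577, ManufacturingProcess13 = -0.53432158,
  ManufacturingProcess14 =  0.15921151, ManufacturingProcess16 =  0.04629846,
  ManufacturingProcess17 = -0.05009535, ManufacturingProcess19 =  0.06461070,
  ManufacturingProcess20 = -1.46344013, ManufacturingProcess22 = -0.05807113,
  ManufacturingProcess23 =  0.01732239, ManufacturingProcess26 =  5.10806895,
  ManufacturingProcess28 = -0.25214565, ManufacturingProcess30 =  0.62083037,
  ManufacturingProcess32 =  1.42491729, ManufacturingProcess33 = -0.68191245,
  ManufacturingProcess34 =  0.01700857, ManufacturingProcess35 = -0.06432492,
  ManufacturingProcess37 = -0.20234413, ManufacturingProcess38 = -0.10097678,
  ManufacturingProcess39 =  0.11040612, ManufacturingProcess43 =  0.14675431,
  ManufacturingProcess45 =  0.22318556
)

pos <- sort(coefs[coefs > 0], decreasing = TRUE)  # largest positive first
neg <- sort(coefs[coefs < 0])                     # most negative first
cat("positive:", length(pos), " negative:", length(neg), "\n")
cat("largest positive:", names(pos)[1], " largest negative:", names(neg)[1], "\n")

# Process predictors only, ranked by absolute coefficient
process <- coefs[startsWith(names(coefs), "ManufacturingProcess")]
print(head(process[order(abs(process), decreasing = TRUE)], 5))
```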

Since the biological predictors cannot be altered, we should focus on the manufacturing process predictors and explore further how they can be adjusted to improve the yield.