---
title: "DS624_HW7_JagdishChhabria"
author: "Jagdish Chhabria"
date: "11/4/2021"
output:
  html_document: default
---
The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.
## [1] 165 1107
## [1] 165 1
## [1] 165 389
## [1] 165 388
From the above, we can see that we’re now left with 388 predictors from the original 1107.
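The filtering step itself is not shown in this output. As a rough illustration, the base-R sketch below flags low-frequency columns in the spirit of caret's `nearZeroVar`. The simulated matrix `fp_demo` and the helper `low_freq_cols` are hypothetical names, and the 95/5 frequency-ratio and 10 percent-unique cutoffs are assumed to mirror caret's documented defaults rather than taken from the original analysis.

```r
# Flag columns whose most common value dominates the second most common one.
low_freq_cols <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- vapply(seq_len(ncol(x)), function(j) {
    counts <- sort(table(x[, j]), decreasing = TRUE)
    if (length(counts) < 2) return(TRUE)          # zero-variance column
    freq_ratio <- counts[[1]] / counts[[2]]
    pct_unique <- 100 * length(counts) / nrow(x)
    freq_ratio > freq_cut && pct_unique < unique_cut
  }, logical(1))
  which(flagged)
}

# Simulated binary fingerprints (illustration only, not the real data)
set.seed(1)
n <- 165
fp_demo <- cbind(
  balanced = rbinom(n, 1, 0.5),       # roughly half ones, should be kept
  rare     = c(rep(0, n - 2), 1, 1),  # only 2 ones out of 165
  constant = rep(0, n)                # never varies
)

drop_idx    <- low_freq_cols(fp_demo)
keep_idx    <- setdiff(seq_len(ncol(fp_demo)), drop_idx)
fp_filtered <- fp_demo[, keep_idx, drop = FALSE]
dim(fp_filtered)
```

Using `setdiff` rather than negative indexing keeps the code safe when no column is flagged.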
## [1] 123 389
## [1] 42 389
## [1] 165 1
## [1] 123 388
## [1] 42 388
#ctrl<-trainControl(method = "cv", number = 10)
#plsTune<-train(fp.X.train, permeability, method = "pls", tuneLength = 10, trControl = ctrl, preProc = c("center", "scale"))
The above results show that the optimal number of latent variables is 3, based on the RMSEP (root mean squared error of prediction).
The above reiterates that the optimal number of components is 3, based on the RMSEP.
## Data: X dimension: 123 388
## Y dimension: 123 1
## Fit method: kernelpls
## Number of components considered: 10
##
## VALIDATION: RMSEP
## Cross-validated using 123 leave-one-out segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 14.56 12.56 11.62 11.59 11.80 11.68 11.67
## adjCV 14.56 12.56 11.61 11.58 11.79 11.65 11.66
## 7 comps 8 comps 9 comps 10 comps
## CV 11.62 11.71 11.84 12.07
## adjCV 11.61 11.69 11.82 12.04
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 23.23 34.99 39.42 44.49 48.81 57.46 62.17
## permeability 31.23 48.98 58.64 64.50 70.68 72.90 75.00
## 8 comps 9 comps 10 comps
## X 65.33 69.44 71.94
## permeability 77.43 79.19 80.64
As discussed above, the cross-validated RMSEP is lowest (11.59) at 3 components.
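The choice of 3 components can be checked directly from the cross-validated RMSEP row of the summary. The short sketch below copies those CV values by hand; the variable names are illustrative.

```r
# Leave-one-out CV RMSEP values copied from the summary above;
# the first entry is the intercept-only model (0 components).
cv_rmsep <- c(14.56, 12.56, 11.62, 11.59, 11.80, 11.68, 11.67,
              11.62, 11.71, 11.84, 12.07)
names(cv_rmsep) <- 0:10

# Number of components with the smallest cross-validated error
best_ncomp <- as.integer(names(cv_rmsep)[which.min(cv_rmsep)])
best_ncomp

# Gap between the best model and the runner-up, to judge how flat the curve is
sort(cv_rmsep)[2] - min(cv_rmsep)
```

The gap to the runner-up is small, so the minimum is shallow and a model with 2 components would perform almost as well.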
Below, we predict the target variable for the test data using the fitted PLS model.
## RMSE Rsquared MAE
## 13.0100889 0.4674614 9.7768923
The R2 for the test set is 0.467 and the RMSE is 13.01.
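For reference, the three test-set metrics can be computed by hand as sketched below. The helper `test_metrics` is a hypothetical name, the input vectors are toy values rather than the permeability data, and R2 is taken as the squared correlation between predictions and observations, which is assumed here to match how caret's `postResample` reports it.

```r
# Compute RMSE, squared-correlation R^2, and MAE for a set of predictions.
test_metrics <- function(obs, pred) {
  err <- pred - obs
  c(RMSE     = sqrt(mean(err^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(err)))
}

# Toy values for illustration only (not the permeability data)
obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 2, 2.5, 4)
test_metrics(obs, pred)
```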
Below, we fit penalized models and compare their results to the PLS model. The glmnet method in the caret package has an alpha argument that determines which penalized model is fit, i.e. ridge or lasso. If alpha = 0 then a ridge regression model is fit, and if alpha = 1 then a lasso model is fit. The best lambda is then defined as the one that minimizes the cross-validated prediction error.
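To make the role of alpha concrete, the sketch below evaluates the elastic-net penalty in the parameterisation documented for glmnet on a toy coefficient vector. The function name `enet_penalty` and the numbers are illustrative assumptions, not part of the original analysis.

```r
# Elastic-net penalty in the glmnet parameterisation:
#   lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(1, -2)  # toy coefficient vector

ridge_pen <- enet_penalty(beta, lambda = 2, alpha = 0)    # squared term only
lasso_pen <- enet_penalty(beta, lambda = 2, alpha = 1)    # absolute term only
mixed_pen <- enet_penalty(beta, lambda = 2, alpha = 0.5)  # blend of both
c(ridge = ridge_pen, lasso = lasso_pen, mixed = mixed_pen)
```

With alpha = 0 only the squared (ridge) term contributes, and with alpha = 1 only the absolute-value (lasso) term does. Intermediate values blend the two.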
From the ridge model below, the best tune is lambda = 91.31; the optimal model was selected as the one with the largest R2. Its R2 is 0.467, while its RMSE is 11.02.
## alpha lambda
## 29 0 91.31313
The best lambda parameter chosen is 91.31.
## rsquared rmse
## 1 0.4668815 11.02298
This model results in an RMSE of 11.02 and explains 46.69 percent of the variability in the target variable, i.e. permeability.
## alpha lambda
## 100 1 3
The best lambda parameter chosen is 3.
## rsquared rmse
## 1 0.5226225 10.85566
This model results in an RMSE of 10.86 and explains 52.26 percent of the variability in the target variable, i.e. permeability.
## $pls
## RMSE Rsquared MAE
## NA 0.2200368 NA
##
## $ridge
## RMSE Rsquared MAE
## 13.6195164 0.4583612 9.7819882
##
## $lasso
## RMSE Rsquared MAE
## 14.7132555 0.3513113 10.5706987
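In the test-set comparison above, the ridge model outperforms the lasso model, with a lower RMSE (13.62 vs 14.71) and a higher R-squared (0.458 vs 0.351). The PLS entry in this table cannot be compared directly because its RMSE and MAE are NA. The earlier PLS test-set evaluation gave an RMSE of 13.01 and an R-squared of 0.467, which is comparable to, and slightly better than, the ridge model.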
I would not recommend replacing the permeability laboratory experiment: the highest R-squared among the 3 models tried was 0.52, which is too low to justify replacing the laboratory experiment.
The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
## Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 38.00 6.25 49.58 56.97
## 2 42.44 8.01 60.97 67.48
## 3 42.03 8.01 60.97 67.48
## 4 41.42 8.01 60.97 67.48
## 5 42.49 7.47 63.33 72.25
## 6 43.57 6.12 58.36 65.31
## BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1 12.74 19.51 43.73
## 2 14.65 19.36 53.14
## 3 14.65 19.36 53.14
## 4 14.65 19.36 53.14
## 5 14.02 17.91 54.66
## 6 15.17 21.79 51.23
## BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1 100 16.66 11.44
## 2 100 19.04 12.55
## 3 100 19.04 12.55
## 4 100 19.04 12.55
## 5 100 18.22 12.80
## 6 100 18.30 12.13
## BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1 3.46 138.09 18.83
## 2 3.46 153.67 21.05
## 3 3.46 153.67 21.05
## 4 3.46 153.67 21.05
## 5 3.05 147.61 21.05
## 6 3.78 151.88 20.76
## ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1 NA NA NA
## 2 0.0 0 NA
## 3 0.0 0 NA
## 4 0.0 0 NA
## 5 10.7 0 NA
## 6 12.0 0 NA
## ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1 NA NA NA
## 2 917 1032.2 210.0
## 3 912 1003.6 207.1
## 4 911 1014.6 213.3
## 5 918 1027.5 205.7
## 6 924 1016.8 208.9
## ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1 NA NA 43.00
## 2 177 178 46.57
## 3 178 178 45.07
## 4 177 177 44.92
## 5 178 178 44.96
## 6 178 178 45.32
## ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1 NA NA NA
## 2 NA NA 0
## 3 NA NA 0
## 4 NA NA 0
## 5 NA NA 0
## 6 NA NA 0
## ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1 35.5 4898 6108
## 2 34.0 4869 6095
## 3 34.8 4878 6087
## 4 34.8 4897 6102
## 5 34.6 4992 6233
## 6 34.0 4985 6222
## ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1 4682 35.5 4865
## 2 4617 34.0 4867
## 3 4617 34.8 4877
## 4 4635 34.8 4872
## 5 4733 33.9 4886
## 6 4786 33.4 4862
## ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1 6049 4665 0.0
## 2 6097 4621 0.0
## 3 6078 4621 0.0
## 4 6073 4611 0.0
## 5 6102 4659 -0.7
## 6 6115 4696 -0.6
## ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1 NA NA NA
## 2 3 0 3
## 3 4 1 4
## 4 5 2 5
## 5 8 4 18
## 6 9 1 1
## ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1 4873 6074 4685
## 2 4869 6107 4630
## 3 4897 6116 4637
## 4 4892 6111 4630
## 5 4930 6151 4684
## 6 4871 6128 4687
## ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1 10.7 21.0 9.9
## 2 11.2 21.4 9.9
## 3 11.1 21.3 9.4
## 4 11.1 21.3 9.4
## 5 11.3 21.6 9.0
## 6 11.4 21.7 10.1
## ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1 69.1 156 66
## 2 68.7 169 66
## 3 69.3 173 66
## 4 69.3 171 68
## 5 69.4 171 70
## 6 68.2 173 70
## ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1 2.4 486 0.019
## 2 2.6 508 0.019
## 3 2.6 509 0.018
## 4 2.5 496 0.018
## 5 2.5 468 0.017
## 6 2.5 490 0.018
## ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1 0.5 3 7.2
## 2 2.0 2 7.2
## 3 0.7 2 7.2
## 4 1.2 2 7.2
## 5 0.2 2 7.3
## 6 0.4 2 7.2
## ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1 NA NA 11.6
## 2 0.1 0.15 11.1
## 3 0.0 0.00 12.0
## 4 0.0 0.00 10.6
## 5 0.0 0.00 11.0
## 6 0.0 0.00 11.5
## ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1 3.0 1.8 2.4
## 2 0.9 1.9 2.2
## 3 1.0 1.8 2.3
## 4 1.1 1.8 2.1
## 5 1.1 1.7 2.1
## 6 2.2 1.8 2.0
The first column, i.e. percent yield, is the target variable.
Let’s inspect the summary statistics of the data.
| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 1 | 176 | 40.1765341 | 1.8456664 | 39.970 | 40.1150000 | 1.9718580 | 35.250 | 46.340 | 11.090 | 0.3109596 | -0.1132944 | 0.1391223 |
| BiologicalMaterial01 | 2 | 176 | 6.4114205 | 0.7139225 | 6.305 | 6.3933803 | 0.6745830 | 4.580 | 8.810 | 4.230 | 0.2733165 | 0.4567758 | 0.0538139 |
| BiologicalMaterial02 | 3 | 176 | 55.6887500 | 4.0345806 | 55.090 | 55.5810563 | 4.5812340 | 46.870 | 64.750 | 17.880 | 0.2441269 | -0.7050911 | 0.3041180 |
| BiologicalMaterial03 | 4 | 176 | 67.7050000 | 4.0010641 | 67.220 | 67.6780986 | 4.2773010 | 56.970 | 78.250 | 21.280 | 0.0285108 | -0.1235203 | 0.3015916 |
| BiologicalMaterial04 | 5 | 176 | 12.3492614 | 1.7746607 | 12.100 | 12.1860563 | 1.3714050 | 9.380 | 23.090 | 13.710 | 1.7323153 | 7.0564614 | 0.1337701 |
| BiologicalMaterial05 | 6 | 176 | 18.5986364 | 1.8441408 | 18.490 | 18.5488732 | 1.8829020 | 13.240 | 24.850 | 11.610 | 0.3040053 | 0.2198005 | 0.1390073 |
| BiologicalMaterial06 | 7 | 176 | 48.9103977 | 3.7460718 | 48.460 | 48.7379577 | 3.9437160 | 40.600 | 59.380 | 18.780 | 0.3685344 | -0.3654933 | 0.2823708 |
| BiologicalMaterial07 | 8 | 176 | 100.0141477 | 0.1077423 | 100.000 | 100.0000000 | 0.0000000 | 100.000 | 100.830 | 0.830 | 7.3986642 | 53.0417012 | 0.0081214 |
| BiologicalMaterial08 | 9 | 176 | 17.4947727 | 0.6769536 | 17.510 | 17.4687324 | 0.5930400 | 15.880 | 19.140 | 3.260 | 0.2200539 | 0.0627721 | 0.0510273 |
| BiologicalMaterial09 | 10 | 176 | 12.8500568 | 0.4151757 | 12.835 | 12.8635211 | 0.4225410 | 11.440 | 14.080 | 2.640 | -0.2684177 | 0.2927765 | 0.0312950 |
| BiologicalMaterial10 | 11 | 176 | 2.8006250 | 0.5991433 | 2.710 | 2.7328873 | 0.4003020 | 1.770 | 6.870 | 5.100 | 2.4023783 | 11.6471845 | 0.0451621 |
| BiologicalMaterial11 | 12 | 176 | 146.9531818 | 4.8204704 | 146.080 | 146.7863380 | 4.1142150 | 135.810 | 158.730 | 22.920 | 0.3588211 | 0.0162456 | 0.3633566 |
| BiologicalMaterial12 | 13 | 176 | 20.1998864 | 0.7735440 | 20.120 | 20.1776056 | 0.6671700 | 18.350 | 22.210 | 3.860 | 0.3038443 | 0.0146595 | 0.0583081 |
| ManufacturingProcess01 | 14 | 175 | 11.2074286 | 1.8224342 | 11.400 | 11.4078014 | 1.0378200 | 0.000 | 14.100 | 14.100 | -3.9201855 | 21.8688069 | 0.1377631 |
| ManufacturingProcess02 | 15 | 173 | 16.6826590 | 8.4715694 | 21.000 | 18.0575540 | 1.4826000 | 0.000 | 22.500 | 22.500 | -1.4307675 | 0.1062466 | 0.6440815 |
| ManufacturingProcess03 | 16 | 161 | 1.5395652 | 0.0223983 | 1.540 | 1.5410078 | 0.0148260 | 1.470 | 1.600 | 0.130 | -0.4799447 | 1.7280557 | 0.0017652 |
| ManufacturingProcess04 | 17 | 175 | 931.8514286 | 6.2744406 | 934.000 | 932.2836879 | 5.9304000 | 911.000 | 946.000 | 35.000 | -0.6979357 | 0.0631282 | 0.4743031 |
| ManufacturingProcess05 | 18 | 175 | 1001.6931429 | 30.5272134 | 999.200 | 998.6248227 | 17.3464200 | 923.000 | 1175.300 | 252.300 | 2.5872769 | 11.7446904 | 2.3076404 |
| ManufacturingProcess06 | 19 | 174 | 207.4017241 | 2.6993999 | 206.800 | 207.0928571 | 1.9273800 | 203.000 | 227.400 | 24.400 | 3.0419007 | 17.3764864 | 0.2046410 |
| ManufacturingProcess07 | 20 | 175 | 177.4800000 | 0.5010334 | 177.000 | 177.4751773 | 0.0000000 | 177.000 | 178.000 | 1.000 | 0.0793788 | -2.0050587 | 0.0378746 |
| ManufacturingProcess08 | 21 | 175 | 177.5542857 | 0.4984706 | 178.000 | 177.5673759 | 0.0000000 | 177.000 | 178.000 | 1.000 | -0.2165645 | -1.9642262 | 0.0376808 |
| ManufacturingProcess09 | 22 | 176 | 45.6601136 | 1.5464407 | 45.730 | 45.7188732 | 1.2157320 | 38.890 | 49.360 | 10.470 | -0.9406685 | 3.2701986 | 0.1165674 |
| ManufacturingProcess10 | 23 | 167 | 9.1790419 | 0.7666884 | 9.100 | 9.1318519 | 0.5930400 | 7.500 | 11.600 | 4.100 | 0.6492504 | 0.6317264 | 0.0593281 |
| ManufacturingProcess11 | 24 | 166 | 9.3855422 | 0.7157336 | 9.400 | 9.3932836 | 0.6671700 | 7.500 | 11.500 | 4.000 | -0.0193109 | 0.3227966 | 0.0555517 |
| ManufacturingProcess12 | 25 | 175 | 857.8114286 | 1784.5282624 | 0.000 | 516.1985816 | 0.0000000 | 0.000 | 4549.000 | 4549.000 | 1.5786729 | 0.4951353 | 134.8976569 |
| ManufacturingProcess13 | 26 | 176 | 34.5079545 | 1.0152800 | 34.600 | 34.5119718 | 0.8895600 | 32.100 | 38.600 | 6.500 | 0.4802776 | 1.9593883 | 0.0765296 |
| ManufacturingProcess14 | 27 | 175 | 4853.8685714 | 54.5236412 | 4856.000 | 4854.5744681 | 40.0302000 | 4701.000 | 5055.000 | 354.000 | -0.0109687 | 1.0781378 | 4.1215999 |
| ManufacturingProcess15 | 28 | 176 | 6038.9204545 | 58.3125023 | 6031.500 | 6035.5211268 | 40.7715000 | 5904.000 | 6233.000 | 329.000 | 0.6743478 | 1.2162163 | 4.3954702 |
| ManufacturingProcess16 | 29 | 176 | 4565.8011364 | 351.6973215 | 4588.000 | 4588.3591549 | 42.9954000 | 0.000 | 4852.000 | 4852.000 | -12.4202248 | 158.3981993 | 26.5101831 |
| ManufacturingProcess17 | 30 | 176 | 34.3437500 | 1.2482059 | 34.400 | 34.3126761 | 1.1860800 | 31.300 | 40.000 | 8.700 | 1.1629715 | 4.6626982 | 0.0940871 |
| ManufacturingProcess18 | 31 | 176 | 4809.6818182 | 367.4777364 | 4835.000 | 4837.0704225 | 34.8411000 | 0.000 | 4971.000 | 4971.000 | -12.7361378 | 163.7375845 | 27.6996766 |
| ManufacturingProcess19 | 32 | 176 | 6028.1988636 | 45.5785689 | 6022.000 | 6026.1549296 | 36.3237000 | 5890.000 | 6146.000 | 256.000 | 0.2973414 | 0.2962151 | 3.4356139 |
| ManufacturingProcess20 | 33 | 176 | 4556.4602273 | 349.0089784 | 4582.000 | 4580.9788732 | 42.9954000 | 0.000 | 4759.000 | 4759.000 | -12.6383268 | 162.0663905 | 26.3075416 |
| ManufacturingProcess21 | 34 | 176 | -0.1642045 | 0.7782930 | -0.300 | -0.2556338 | 0.4447800 | -1.800 | 3.600 | 5.400 | 1.7291140 | 5.0274763 | 0.0586660 |
| ManufacturingProcess22 | 35 | 175 | 5.4057143 | 3.3306262 | 5.000 | 5.2482270 | 4.4478000 | 0.000 | 12.000 | 12.000 | 0.3148909 | -1.0175458 | 0.2517717 |
| ManufacturingProcess23 | 36 | 175 | 3.0171429 | 1.6625499 | 3.000 | 2.9432624 | 1.4826000 | 0.000 | 6.000 | 6.000 | 0.1967985 | -0.9975572 | 0.1256770 |
| ManufacturingProcess24 | 37 | 175 | 8.8342857 | 5.7994224 | 8.000 | 8.5744681 | 7.4130000 | 0.000 | 23.000 | 23.000 | 0.3593200 | -1.0207362 | 0.4383951 |
| ManufacturingProcess25 | 38 | 171 | 4828.1754386 | 373.4810865 | 4855.000 | 4855.5620438 | 34.0998000 | 0.000 | 4990.000 | 4990.000 | -12.6310220 | 160.3293620 | 28.5608125 |
| ManufacturingProcess26 | 39 | 171 | 6015.5964912 | 464.8674900 | 6047.000 | 6048.5547445 | 38.5476000 | 0.000 | 6161.000 | 6161.000 | -12.6694398 | 160.9849144 | 35.5493055 |
| ManufacturingProcess27 | 40 | 171 | 4562.5087719 | 353.9848679 | 4587.000 | 4587.4452555 | 35.5824000 | 0.000 | 4710.000 | 4710.000 | -12.5174778 | 158.3931091 | 27.0698994 |
| ManufacturingProcess28 | 41 | 171 | 6.5918129 | 5.2489823 | 10.400 | 6.8248175 | 1.0378200 | 0.000 | 11.500 | 11.500 | -0.4556335 | -1.7907822 | 0.4013997 |
| ManufacturingProcess29 | 42 | 171 | 20.0111111 | 1.6638879 | 19.900 | 20.0437956 | 0.4447800 | 0.000 | 22.000 | 22.000 | -10.0848133 | 119.4378857 | 0.1272407 |
| ManufacturingProcess30 | 43 | 171 | 9.1614035 | 0.9760824 | 9.100 | 9.2145985 | 0.7413000 | 0.000 | 11.200 | 11.200 | -4.7557268 | 43.0848842 | 0.0746429 |
| ManufacturingProcess31 | 44 | 171 | 70.1847953 | 5.5557816 | 70.800 | 70.7240876 | 0.8895600 | 0.000 | 72.500 | 72.500 | -11.8231008 | 146.0094297 | 0.4248612 |
| ManufacturingProcess32 | 45 | 176 | 158.4659091 | 5.3972456 | 158.000 | 158.3380282 | 4.4478000 | 143.000 | 173.000 | 30.000 | 0.2112252 | 0.0602714 | 0.4068327 |
| ManufacturingProcess33 | 46 | 171 | 63.5438596 | 2.4833813 | 64.000 | 63.5474453 | 1.4826000 | 56.000 | 70.000 | 14.000 | -0.1310030 | 0.2740324 | 0.1899089 |
| ManufacturingProcess34 | 47 | 171 | 2.4935673 | 0.0543910 | 2.500 | 2.4927007 | 0.0000000 | 2.300 | 2.600 | 0.300 | -0.2634497 | 1.0013075 | 0.0041594 |
| ManufacturingProcess35 | 48 | 171 | 495.5964912 | 10.8196874 | 495.000 | 495.7445255 | 8.8956000 | 463.000 | 522.000 | 59.000 | -0.1556154 | 0.4130958 | 0.8274022 |
| ManufacturingProcess36 | 49 | 171 | 0.0195731 | 0.0008739 | 0.020 | 0.0195620 | 0.0014826 | 0.017 | 0.022 | 0.005 | 0.1453141 | -0.0557822 | 0.0000668 |
| ManufacturingProcess37 | 50 | 176 | 1.0136364 | 0.4450828 | 1.000 | 0.9964789 | 0.4447800 | 0.000 | 2.300 | 2.300 | 0.3783578 | 0.0698597 | 0.0335494 |
| ManufacturingProcess38 | 51 | 176 | 2.5340909 | 0.6493753 | 3.000 | 2.6126761 | 0.0000000 | 0.000 | 3.000 | 3.000 | -1.6818052 | 3.9189211 | 0.0489485 |
| ManufacturingProcess39 | 52 | 176 | 6.8511364 | 1.5054943 | 7.200 | 7.1718310 | 0.1482600 | 0.000 | 7.500 | 7.500 | -4.2691214 | 16.4987895 | 0.1134809 |
| ManufacturingProcess40 | 53 | 175 | 0.0177143 | 0.0382885 | 0.000 | 0.0099291 | 0.0000000 | 0.000 | 0.100 | 0.100 | 1.6768073 | 0.8164458 | 0.0028943 |
| ManufacturingProcess41 | 54 | 175 | 0.0237143 | 0.0538242 | 0.000 | 0.0106383 | 0.0000000 | 0.000 | 0.200 | 0.200 | 2.1686898 | 3.6290714 | 0.0040687 |
| ManufacturingProcess42 | 55 | 176 | 11.2062500 | 1.9416092 | 11.600 | 11.5429577 | 0.2965200 | 0.000 | 12.100 | 12.100 | -5.4500082 | 28.5288867 | 0.1463543 |
| ManufacturingProcess43 | 56 | 176 | 0.9119318 | 0.8679860 | 0.800 | 0.8077465 | 0.2965200 | 0.000 | 11.000 | 11.000 | 9.0548747 | 101.0332345 | 0.0654269 |
| ManufacturingProcess44 | 57 | 176 | 1.8051136 | 0.3220062 | 1.900 | 1.8549296 | 0.1482600 | 0.000 | 2.100 | 2.100 | -4.9703552 | 25.0876065 | 0.0242721 |
| ManufacturingProcess45 | 58 | 176 | 2.1380682 | 0.4069043 | 2.200 | 2.2042254 | 0.1482600 | 0.000 | 2.600 | 2.600 | -4.0779411 | 18.7565001 | 0.0306716 |
| variable | n_miss | pct_miss |
|---|---|---|
| ManufacturingProcess03 | 15 | 8.5227273 |
| ManufacturingProcess11 | 10 | 5.6818182 |
| ManufacturingProcess10 | 9 | 5.1136364 |
| ManufacturingProcess25 | 5 | 2.8409091 |
| ManufacturingProcess26 | 5 | 2.8409091 |
| ManufacturingProcess27 | 5 | 2.8409091 |
| ManufacturingProcess28 | 5 | 2.8409091 |
| ManufacturingProcess29 | 5 | 2.8409091 |
| ManufacturingProcess30 | 5 | 2.8409091 |
| ManufacturingProcess31 | 5 | 2.8409091 |
| ManufacturingProcess33 | 5 | 2.8409091 |
| ManufacturingProcess34 | 5 | 2.8409091 |
| ManufacturingProcess35 | 5 | 2.8409091 |
| ManufacturingProcess36 | 5 | 2.8409091 |
| ManufacturingProcess02 | 3 | 1.7045455 |
| ManufacturingProcess06 | 2 | 1.1363636 |
| ManufacturingProcess01 | 1 | 0.5681818 |
| ManufacturingProcess04 | 1 | 0.5681818 |
| ManufacturingProcess05 | 1 | 0.5681818 |
| ManufacturingProcess07 | 1 | 0.5681818 |
| ManufacturingProcess08 | 1 | 0.5681818 |
| ManufacturingProcess12 | 1 | 0.5681818 |
| ManufacturingProcess14 | 1 | 0.5681818 |
| ManufacturingProcess22 | 1 | 0.5681818 |
| ManufacturingProcess23 | 1 | 0.5681818 |
| ManufacturingProcess24 | 1 | 0.5681818 |
| ManufacturingProcess40 | 1 | 0.5681818 |
| ManufacturingProcess41 | 1 | 0.5681818 |
miss_var_summary(chemical)%>%filter(n_miss>0)%>%kable(caption = 'Missing Values', format="html", table.attr="style='width:50%;'")%>%kable_styling()
| variable | n_miss | pct_miss |
|---|---|---|
After imputation, there are no more missing values.
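The imputation method used is not shown in this output. The sketch below illustrates the general pattern on a toy data frame: first tabulate missingness per column, then impute. Median imputation is used purely as a stand-in, and the column names are hypothetical.

```r
# Toy data frame with missing values (column names are illustrative)
demo <- data.frame(
  ProcA = c(1.2, NA, 3.1, 4.0, NA),
  ProcB = c(10, 11, NA, 13, 14),
  ProcC = c(5, 6, 7, 8, 9)
)

# Per-column missing counts and percentages, sorted by count
n_miss   <- colSums(is.na(demo))
miss_tbl <- data.frame(variable = names(n_miss),
                       n_miss   = as.integer(n_miss),
                       pct_miss = 100 * as.numeric(n_miss) / nrow(demo))
miss_tbl <- miss_tbl[miss_tbl$n_miss > 0, ]
miss_tbl <- miss_tbl[order(-miss_tbl$n_miss), ]
miss_tbl

# Stand-in imputation: replace each NA with the column median
imputed <- as.data.frame(lapply(demo, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}))

sum(is.na(imputed))  # no missing values remain
```

A model-based method such as k-nearest-neighbour imputation would usually be preferable when predictors are correlated. The median is used here only to keep the example dependency-free.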
## [1] 7
## [1] 2 4 11 40 53 38 36 42 29 51
## [1] 176 46
## BiologicalMaterial01 BiologicalMaterial03 BiologicalMaterial05
## 1 -0.2261036 -2.68303622 0.4941942
## 2 2.2391498 -0.05623504 0.4128555
## 3 2.2391498 -0.05623504 0.4128555
## 4 2.2391498 -0.05623504 0.4128555
## 5 1.4827653 1.13594780 -0.3734185
## 6 -0.4081962 -0.59859075 1.7305423
## BiologicalMaterial06 BiologicalMaterial08 BiologicalMaterial09
## 1 -1.3828880 -1.233131 -3.3962895
## 2 1.1290767 2.282619 -0.7227225
## 3 1.1290767 2.282619 -0.7227225
## 4 1.1290767 2.282619 -0.7227225
## 5 1.5348350 1.071310 -0.1205678
## 6 0.6192092 1.189487 -1.7343424
## BiologicalMaterial10 BiologicalMaterial11 ManufacturingProcess01
## 1 1.1005296 -1.838655 0.2154105
## 2 1.1005296 1.393395 -6.1497028
## 3 1.1005296 1.393395 -6.1497028
## 4 1.1005296 1.393395 -6.1497028
## 5 0.4162193 0.136256 -0.2784345
## 6 1.6346255 1.022062 0.4348971
## ManufacturingProcess02 ManufacturingProcess03 ManufacturingProcess04
## 1 0.5662872 0.3765810 0.5655598
## 2 -1.9692525 0.1979962 -2.3669726
## 3 -1.9692525 0.1087038 -3.1638563
## 4 -1.9692525 0.4658734 -3.3232331
## 5 -1.9692525 0.1087038 -2.2075958
## 6 -1.9692525 0.5551658 -1.2513352
## ManufacturingProcess05 ManufacturingProcess06 ManufacturingProcess07
## 1 -0.44593467 -0.5414997 -0.1596700
## 2 0.99933318 0.9625383 -0.9580199
## 3 0.06246417 -0.1117745 1.0378549
## 4 0.42279841 2.1850322 -0.9580199
## 5 0.84537219 -0.6304083 1.0378549
## 6 0.49486525 0.5550403 1.0378549
## ManufacturingProcess08 ManufacturingProcess09 ManufacturingProcess10
## 1 -0.3095182 -1.7201524 -0.07700901
## 2 0.8941637 0.5883746 0.52297397
## 3 0.8941637 -0.3815947 0.31428424
## 4 -1.1119728 -0.4785917 -0.02483658
## 5 0.8941637 -0.4527258 -0.39004361
## 6 0.8941637 -0.2199332 0.28819802
## ManufacturingProcess11 ManufacturingProcess12 ManufacturingProcess13
## 1 -0.09157342 -0.4806937 0.97711512
## 2 1.08204765 -0.4806937 -0.50030980
## 3 0.55112383 -0.4806937 0.28765016
## 4 0.80261406 -0.4806937 0.28765016
## 5 0.10403009 -0.4806937 0.09066017
## 6 1.41736795 -0.4806937 -0.50030980
## ManufacturingProcess14 ManufacturingProcess15 ManufacturingProcess16
## 1 0.8093999 1.1846438 0.3303945
## 2 0.2775205 0.9617071 0.1455765
## 3 0.4425865 0.8245152 0.1455765
## 4 0.7910592 1.0817499 0.1967569
## 5 2.5334227 3.3282665 0.4754056
## 6 2.4050380 3.1396277 0.6261033
## ManufacturingProcess17 ManufacturingProcess19 ManufacturingProcess20
## 1 0.9263296 0.4563798 0.3109942
## 2 -0.2753953 1.5095063 0.1849230
## 3 0.3655246 1.0926437 0.1849230
## 4 0.3655246 0.9829430 0.1562704
## 5 -0.3555103 1.6192070 0.2938027
## 6 -0.7560852 1.9044287 0.3998171
## ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## 1 0.2109804 0.05833309 0.8317688
## 2 0.2109804 -0.72230090 -1.8147683
## 3 0.2109804 -0.42205706 -1.2132826
## 4 0.2109804 -0.12181322 -0.6117969
## 5 -0.6884239 0.77891831 0.5911745
## 6 -0.5599376 1.07916216 -1.2132826
## ManufacturingProcess24 ManufacturingProcess26 ManufacturingProcess28
## 1 0.8907291 0.1256347 0.7826636
## 2 -1.0060115 0.1966227 0.8779201
## 3 -0.8335805 0.2159831 0.8588688
## 4 -0.6611496 0.2052273 0.8588688
## 5 1.5804530 0.2912733 0.8969714
## 6 -1.3508734 0.2417969 0.9160227
## ManufacturingProcess30 ManufacturingProcess32 ManufacturingProcess33
## 1 0.7566948 -0.4568829 0.9890307
## 2 0.7566948 1.9517531 0.9890307
## 3 0.2444430 2.6928719 0.9890307
## 4 0.2444430 2.3223125 1.7943843
## 5 -0.1653585 2.3223125 2.5997378
## 6 0.9615956 2.6928719 2.5997378
## ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1 -1.7202722 -0.88694718 -0.6557774
## 2 1.9568096 1.14638329 -0.6557774
## 3 1.9568096 1.23880740 -1.8000420
## 4 0.1182687 0.03729394 -1.8000420
## 5 0.1182687 -2.55058120 -2.9443066
## 6 0.1182687 -0.51725073 -1.8000420
## ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1 -1.1540243 0.7174727 0.2317270
## 2 2.2161351 -0.8224687 0.2317270
## 3 -0.7046697 -0.8224687 0.2317270
## 4 0.4187168 -0.8224687 0.2317270
## 5 -1.8280562 -0.8224687 0.2981503
## 6 -1.3787016 -0.8224687 0.2317270
## ManufacturingProcess41 ManufacturingProcess43 ManufacturingProcess44
## 1 -0.06900773 2.40564734 -0.01588055
## 2 2.34626280 -0.01374656 0.29467248
## 3 -0.44058781 0.10146268 -0.01588055
## 4 -0.44058781 0.21667191 -0.01588055
## 5 -0.44058781 0.21667191 -0.32643359
## 6 -0.44058781 1.48397347 -0.01588055
## ManufacturingProcess45
## 1 0.64371849
## 2 0.15220242
## 3 0.39796046
## 4 -0.09355562
## 5 -0.09355562
## 6 -0.33931365
## [1] 176 46
## [1] 176 1
## alpha lambda
## 75 0.8 0.01584038
From the elastic net model fitted above, we can see that the best tune is alpha = 0.8 and lambda = 0.0158, selected as the combination with the largest R2.
## rsquared rmse
## 1 0.6846746 1.150071
R2 is equal to 0.685, while the RMSE is 1.150.
## RMSE Rsquared MAE
## 11.2216556 0.0264722 2.8782738
For the predicted values, we can see that the R-squared is very low at 0.026 and the RMSE is much higher at 11.22.
## glmnet variable importance
##
## only 20 most important variables shown (out of 46)
##
## Overall
## ManufacturingProcess26 5.10807
## ManufacturingProcess20 1.46344
## ManufacturingProcess32 1.42492
## ManufacturingProcess33 0.68191
## ManufacturingProcess30 0.62083
## ManufacturingProcess13 0.53432
## BiologicalMaterial06 0.34837
## ManufacturingProcess04 0.32541
## ManufacturingProcess28 0.25215
## BiologicalMaterial10 0.23152
## ManufacturingProcess09 0.23113
## ManufacturingProcess45 0.22319
## ManufacturingProcess37 0.20234
## ManufacturingProcess14 0.15921
## ManufacturingProcess03 0.15001
## ManufacturingProcess43 0.14675
## ManufacturingProcess39 0.11041
## ManufacturingProcess38 0.10098
## ManufacturingProcess05 0.09859
## BiologicalMaterial11 0.09777
From the fitted model, the coefficient of each predictor describes its impact on the target variable. From the above, we can see that ManufacturingProcess26 is the most important of the 46 variables, since it has the largest absolute coefficient.
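Because the predictors were centered and scaled, coefficient magnitudes are directly comparable, and the importance scores listed above equal the absolute coefficients. The sketch below reproduces that ranking for the seven largest coefficients reported for this model; the object names are illustrative.

```r
# Largest coefficients, copied from the fitted model's coefficient listing
coefs <- c(ManufacturingProcess26 =  5.10806895,
           ManufacturingProcess20 = -1.46344013,
           ManufacturingProcess32 =  1.42491729,
           ManufacturingProcess33 = -0.68191245,
           ManufacturingProcess30 =  0.62083037,
           ManufacturingProcess13 = -0.53432158,
           BiologicalMaterial06   =  0.34837004)

# Importance = magnitude of the coefficient; the sign is kept for reference
ranking <- data.frame(variable   = names(coefs),
                      importance = abs(coefs),
                      direction  = ifelse(coefs > 0, "positive", "negative"),
                      row.names  = NULL)
ranking <- ranking[order(-ranking$importance), ]
ranking
```

Keeping the sign next to the magnitude matters here: the second most important predictor, ManufacturingProcess20, is associated with lower yield.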
## variables coef
## 1 (Intercept) 39.79681844
## 2 BiologicalMaterial05 0.09069360
## 3 BiologicalMaterial06 0.34837004
## 4 BiologicalMaterial10 -0.23151548
## 5 BiologicalMaterial11 0.09776693
## 6 ManufacturingProcess01 0.08332074
## 7 ManufacturingProcess03 -0.15000924
## 8 ManufacturingProcess04 0.32541241
## 9 ManufacturingProcess05 -0.09859363
## 10 ManufacturingProcess06 0.01738979
## 11 ManufacturingProcess08 -0.07798636
## 12 ManufacturingProcess09 0.23112864
## 13 ManufacturingProcess10 -0.07586241
## 14 ManufacturingProcess12 0.04056577
## 15 ManufacturingProcess13 -0.53432158
## 16 ManufacturingProcess14 0.15921151
## 17 ManufacturingProcess16 0.04629846
## 18 ManufacturingProcess17 -0.05009535
## 19 ManufacturingProcess19 0.06461070
## 20 ManufacturingProcess20 -1.46344013
## 21 ManufacturingProcess22 -0.05807113
## 22 ManufacturingProcess23 0.01732239
## 23 ManufacturingProcess26 5.10806895
## 24 ManufacturingProcess28 -0.25214565
## 25 ManufacturingProcess30 0.62083037
## 26 ManufacturingProcess32 1.42491729
## 27 ManufacturingProcess33 -0.68191245
## 28 ManufacturingProcess34 0.01700857
## 29 ManufacturingProcess35 -0.06432492
## 30 ManufacturingProcess37 -0.20234413
## 31 ManufacturingProcess38 -0.10097678
## 32 ManufacturingProcess39 0.11040612
## 33 ManufacturingProcess43 0.14675431
## 34 ManufacturingProcess45 0.22318556
The table above shows the intercept and the coefficients for each of the predictors included in the final tuned model.
A positive coefficient indicates that as the value of the predictor increases, the mean of the response variable also tends to increase. A negative coefficient indicates that as the predictor increases, the response variable tends to decrease. The coefficient value signifies how much the mean of the Yield changes given a one-unit shift in the predictor variable while all other variables in the model are held constant. Because the predictors were centered and scaled, one unit here corresponds to one standard deviation of the original predictor. Holding the other variables constant is important because it allows the effect of each variable to be assessed in isolation from the others.
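As a worked example of this interpretation, the sketch below converts one coefficient to the predictor's original units. It assumes the predictors were only centered and scaled before fitting, and it takes the standard deviation of ManufacturingProcess32 from the summary statistics table; the variable names are illustrative.

```r
# Coefficient on the standardized scale (from the coefficient listing) and the
# predictor's standard deviation (from the summary statistics table)
coef_std <- 1.42491729  # ManufacturingProcess32
sd_raw   <- 5.3972456   # sd of ManufacturingProcess32 before scaling

# Expected change in Yield for a one-standard-deviation increase
effect_per_sd <- coef_std

# Expected change in Yield for a one-unit increase on the original scale
effect_per_unit <- coef_std / sd_raw
round(c(per_sd = effect_per_sd, per_unit = effect_per_unit), 3)
```

Both figures describe associations in the fitted model with the other predictors held fixed. They are not guaranteed effects of changing the process.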
## variables coef
## 1 (Intercept) 39.79681844
## 2 BiologicalMaterial05 0.09069360
## 3 BiologicalMaterial06 0.34837004
## 4 BiologicalMaterial11 0.09776693
## 5 ManufacturingProcess01 0.08332074
## 6 ManufacturingProcess04 0.32541241
## 7 ManufacturingProcess06 0.01738979
## 8 ManufacturingProcess09 0.23112864
## 9 ManufacturingProcess12 0.04056577
## 10 ManufacturingProcess14 0.15921151
## 11 ManufacturingProcess16 0.04629846
## 12 ManufacturingProcess19 0.06461070
## 13 ManufacturingProcess23 0.01732239
## 14 ManufacturingProcess26 5.10806895
## 15 ManufacturingProcess30 0.62083037
## 16 ManufacturingProcess32 1.42491729
## 17 ManufacturingProcess34 0.01700857
## 18 ManufacturingProcess39 0.11040612
## 19 ManufacturingProcess43 0.14675431
## 20 ManufacturingProcess45 0.22318556
## variables coef
## 4 BiologicalMaterial10 -0.23151548
## 7 ManufacturingProcess03 -0.15000924
## 9 ManufacturingProcess05 -0.09859363
## 11 ManufacturingProcess08 -0.07798636
## 13 ManufacturingProcess10 -0.07586241
## 15 ManufacturingProcess13 -0.53432158
## 18 ManufacturingProcess17 -0.05009535
## 20 ManufacturingProcess20 -1.46344013
## 21 ManufacturingProcess22 -0.05807113
## 24 ManufacturingProcess28 -0.25214565
## 27 ManufacturingProcess33 -0.68191245
## 29 ManufacturingProcess35 -0.06432492
## 30 ManufacturingProcess37 -0.20234413
## 31 ManufacturingProcess38 -0.10097678
Of the 19 positive coefficients, ManufacturingProcess26 is by far the largest (5.11), and of the 14 negative coefficients, ManufacturingProcess20 is the most negative (-1.46). This information can be used to decide which manufacturing steps to adjust in order to increase the yield.
Since biological predictors cannot be altered, we should focus on the manufacturing process-related predictors and explore further how those can be adjusted in order to improve the yield.
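The split by sign and the focus on controllable predictors can be reproduced from the coefficient listing, as sketched below. The coefficients are copied by hand from the final model's output, and the object names are illustrative.

```r
# Non-zero coefficients of the final model (intercept omitted)
coefs <- c(
  BiologicalMaterial05 = 0.09069360, BiologicalMaterial06 = 0.34837004,
  BiologicalMaterial10 = -0.23151548, BiologicalMaterial11 = 0.09776693,
  ManufacturingProcess01 = 0.08332074, ManufacturingProcess03 = -0.15000924,
  ManufacturingProcess04 = 0.32541241, ManufacturingProcess05 = -0.09859363,
  ManufacturingProcess06 = 0.01738979, ManufacturingProcess08 = -0.07798636,
  ManufacturingProcess09 = 0.23112864, ManufacturingProcess10 = -0.07586241,
  ManufacturingProcess12 = 0.04056577, ManufacturingProcess13 = -0.53432158,
  ManufacturingProcess14 = 0.15921151, ManufacturingProcess16 = 0.04629846,
  ManufacturingProcess17 = -0.05009535, ManufacturingProcess19 = 0.06461070,
  ManufacturingProcess20 = -1.46344013, ManufacturingProcess22 = -0.05807113,
  ManufacturingProcess23 = 0.01732239, ManufacturingProcess26 = 5.10806895,
  ManufacturingProcess28 = -0.25214565, ManufacturingProcess30 = 0.62083037,
  ManufacturingProcess32 = 1.42491729, ManufacturingProcess33 = -0.68191245,
  ManufacturingProcess34 = 0.01700857, ManufacturingProcess35 = -0.06432492,
  ManufacturingProcess37 = -0.20234413, ManufacturingProcess38 = -0.10097678,
  ManufacturingProcess39 = 0.11040612, ManufacturingProcess43 = 0.14675431,
  ManufacturingProcess45 = 0.22318556
)

# Count positive and negative coefficients over all retained predictors
n_pos <- sum(coefs > 0)
n_neg <- sum(coefs < 0)
c(positive = n_pos, negative = n_neg)

# Keep only the process predictors, since those are the ones that can be changed
process <- coefs[grepl("^ManufacturingProcess", names(coefs))]
names(which.max(process))  # largest positive process coefficient
names(which.min(process))  # most negative process coefficient
```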