Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:
library(dplyr)
library(varImp)
library(elasticnet)
library(AppliedPredictiveModeling)
data(permeability)
str(permeability)
## num [1:165, 1] 12.52 1.12 19.41 1.73 1.68 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:165] "1" "2" "3" "4" ...
## ..$ : chr "permeability"
The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:varImp':
##
## varImp
## The following objects are masked from 'package:measures':
##
## MAE, RMSE
dim(fingerprints)
## [1] 165 1107
We have 1107 predictors.
fp <- fingerprints[, -nearZeroVar(fingerprints)]
dim(fp)
## [1] 165 388
After filtering with nearZeroVar, 388 predictors remain for modeling; 719 near-zero-variance columns were removed.
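The filtering rule can be sketched in base R. This is a minimal illustration, assuming caret's default cutoffs (freqCut = 95/5 and uniqueCut = 10); the helper name nzv_flags and the toy matrix are made up for the example and are not part of caret or of the permeability data.

```r
# Hypothetical helper: flag a column when it is constant, or when the most
# common value dominates the second most common one (frequency ratio above
# freq_cut) and the share of distinct values is below unique_cut percent.
nzv_flags <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  apply(x, 2, function(col) {
    counts <- sort(table(col), decreasing = TRUE)
    if (length(counts) == 1) return(TRUE)            # zero variance
    freq_ratio <- counts[[1]] / counts[[2]]
    pct_unique <- 100 * length(counts) / length(col)
    freq_ratio > freq_cut && pct_unique < unique_cut
  })
}

# Toy binary matrix (made-up data, not the fingerprints)
toy <- cbind(
  balanced = rep(c(0, 1), each = 50),   # 50/50 split, kept
  rare     = c(rep(0, 99), 1),          # 99/1 split, flagged
  constant = rep(0, 100)                # single value, flagged
)
flags <- nzv_flags(toy)
toy_kept <- toy[, !flags, drop = FALSE]
print(flags)
```

One caveat with negative indexing on the result of nearZeroVar: if nothing is flagged, the function returns integer(0), and x[, -integer(0)] selects zero columns rather than all of them. Checking the length of the result first is safer when the outcome is not known in advance.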
Split the data and build the PLS model:
set.seed(1975)
trainingRows <- createDataPartition(permeability, p = .80, list= FALSE)
x_train <- fp[trainingRows, ]
y_train <- permeability[trainingRows]
x_test <- fp[-trainingRows, ]
y_test <- permeability[-trainingRows]
Pls_Fit <- train(x=x_train,
y=y_train,
method='pls',
metric='Rsquared',
tuneLength=20,
trControl=trainControl(method='cv'),
preProcess=c('center', 'scale')
)
Pls_Result <- Pls_Fit$results
Pls_Fit
## Partial Least Squares
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 119, 119, 120, 118, 121, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 12.82437 0.2838024 9.717487
## 2 11.74004 0.4121155 8.380243
## 3 11.75878 0.4183513 8.819616
## 4 11.72542 0.4267349 8.822729
## 5 11.41756 0.4501351 8.535854
## 6 11.36243 0.4558707 8.447026
## 7 11.42753 0.4565745 8.602487
## 8 11.38306 0.4594429 8.520737
## 9 11.43938 0.4652513 8.557379
## 10 11.59106 0.4600797 8.684356
## 11 11.78819 0.4460928 8.779366
## 12 11.79820 0.4460045 8.931621
## 13 12.05356 0.4299189 9.075983
## 14 12.33810 0.4165925 9.265023
## 15 12.66426 0.3920415 9.555767
## 16 13.04760 0.3737891 9.696449
## 17 13.35336 0.3700379 9.785863
## 18 13.56518 0.3693013 9.896085
## 19 13.85977 0.3560775 10.031240
## 20 13.97978 0.3618402 10.046974
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 9.
The optimal ncomp value is 9, with a cross-validated R2 of 0.4652513.
plot(Pls_Fit, col ="blue")
plsPred <- predict(Pls_Fit, newdata=x_test)
postResample(pred=plsPred, obs=y_test)
## RMSE Rsquared MAE
## 10.6879771 0.6455296 7.9698362
The R2 for the test set predictions is 0.6455296.
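The three numbers reported for the test set can be reproduced by hand. The sketch below assumes caret's default regression summary, in which Rsquared is the squared Pearson correlation between predictions and observations; the helper name regression_metrics and the toy vectors are illustrative only.

```r
# Hypothetical helper mirroring the three metrics reported above
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(pred - obs)))
}

# Made-up example: every prediction is off by exactly one unit
obs  <- c(1, 2, 3, 4)
pred <- obs + 1
m <- regression_metrics(pred, obs)
print(m)
```

In this toy case the squared correlation is perfect even though every prediction is biased, which is why a model selected on Rsquared does not necessarily have the lowest RMSE.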
Build the ridge model, tuning lambda from 0 to 1 in steps of 0.1:
set.seed(1978)
ridgeFit <- train(x=x_train,
y=y_train,
method='ridge',
metric='Rsquared',
tuneGrid=data.frame(.lambda = seq(0, 1, by=0.1)),
trControl=trainControl(method='cv'),
preProcess=c('center','scale')
)
ridgeFit
## Ridge Regression
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 121, 120, 120, 118, 119, 121, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.0 13.30262 0.3699179 9.717486
## 0.1 12.51231 0.3946092 9.073459
## 0.2 12.51692 0.4144288 9.160555
## 0.3 12.70677 0.4281878 9.399541
## 0.4 12.99653 0.4383417 9.732002
## 0.5 13.35574 0.4462718 10.070762
## 0.6 13.77281 0.4525722 10.449211
## 0.7 14.23676 0.4577417 10.843842
## 0.8 14.74042 0.4620608 11.282368
## 0.9 15.28080 0.4656637 11.735541
## 1.0 15.84399 0.4689070 12.184461
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was lambda = 1.
plot(ridgeFit, col = "blue")
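The choice of selection metric matters for the ridge fit. As a quick check, the sketch below copies the RMSE and Rsquared columns from the resampling table above into a data frame (the name ridge_results is made up for the example) and compares which lambda each metric would pick.

```r
# Resampling results copied from the ridge output above
ridge_results <- data.frame(
  lambda   = seq(0, 1, by = 0.1),
  RMSE     = c(13.30262, 12.51231, 12.51692, 12.70677, 12.99653, 13.35574,
               13.77281, 14.23676, 14.74042, 15.28080, 15.84399),
  Rsquared = c(0.3699179, 0.3946092, 0.4144288, 0.4281878, 0.4383417,
               0.4462718, 0.4525722, 0.4577417, 0.4620608, 0.4656637,
               0.4689070)
)

# lambda chosen when maximizing Rsquared versus minimizing RMSE
by_r2   <- ridge_results$lambda[which.max(ridge_results$Rsquared)]
by_rmse <- ridge_results$lambda[which.min(ridge_results$RMSE)]
print(c(by_r2 = by_r2, by_rmse = by_rmse))
```

Rsquared selects lambda = 1, the setting with the largest cross-validated RMSE in the grid, while RMSE would select lambda = 0.1. Passing metric = 'RMSE' to train() is one way to select on prediction error instead.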
Build the lasso model, tuning fraction from 0 to 0.5 in steps of 0.05:
set.seed(1979)
lassoFit <- train(x=x_train,
y=y_train,
method='lasso',
metric='Rsquared',
tuneGrid=data.frame(.fraction = seq(0, 0.5, by=0.05)),
trControl=trainControl(method='cv'),
preProcess=c('center','scale')
)
lassoFit
## The lasso
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 120, 120, 120, 119, 121, ...
## Resampling results across tuning parameters:
##
## fraction RMSE Rsquared MAE
## 0.00 15.30333 NaN 12.101925
## 0.05 12.72322 0.4513424 9.198406
## 0.10 12.45164 0.4791129 8.533772
## 0.15 12.44457 0.4678709 8.615643
## 0.20 12.40386 0.4583045 8.515930
## 0.25 12.38774 0.4502046 8.539393
## 0.30 12.32033 0.4561066 8.509060
## 0.35 12.31641 0.4478454 8.580255
## 0.40 12.49048 0.4265954 8.774272
## 0.45 12.76620 0.4017907 8.951147
## 0.50 12.94915 0.3869022 9.029649
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was fraction = 0.1.
plot(lassoFit, col = "blue")
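The fraction parameter is the L1 norm of the shrunken coefficient vector divided by the L1 norm of the full, unpenalized solution. The sketch below illustrates the idea with soft-thresholding, which is the lasso solution only under the simplifying assumption of orthonormal predictors; the coefficient values are made up and the threshold of 1 is arbitrary.

```r
# Soft-thresholding: the lasso solution when predictors are orthonormal
soft_threshold <- function(b, t) sign(b) * pmax(abs(b) - t, 0)

b_full  <- c(4, -2.5, 1, 0.3, -0.1)       # made-up unpenalized coefficients
b_lasso <- soft_threshold(b_full, t = 1)  # small coefficients become zero

# "fraction": L1 norm of the shrunken solution relative to the full solution
fraction  <- sum(abs(b_lasso)) / sum(abs(b_full))
n_nonzero <- sum(b_lasso != 0)

# At fraction = 0 every coefficient is zero, predictions are constant,
# and the squared correlation with the observations is undefined
r2_at_zero <- suppressWarnings(cor(rep(5, 4), c(1, 2, 3, 4))^2)
print(list(fraction = fraction, n_nonzero = n_nonzero, r2_at_zero = r2_at_zero))
```

The last line is presumably why the fraction = 0 row of the resampling table has no Rsquared value: a constant prediction has zero variance, so the correlation cannot be computed.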
Build the elastic net model, tuning fraction and lambda over a two-dimensional grid, each from 0 to 1 in steps of 0.1:
set.seed(1980)
enetFit <- train(x=x_train,
y=y_train,
method='enet',
metric='Rsquared',
tuneGrid=expand.grid(.fraction = seq(0, 1, by=0.1),
.lambda = seq(0, 1, by=0.1)),
trControl=trainControl(method='cv'),
preProcess=c('center','scale')
)
enetFit
## Elasticnet
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 120, 120, 121, 121, 120, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.0 0.0 14.92814 NaN 11.809472
## 0.0 0.1 12.06302 0.3868374 8.709104
## 0.0 0.2 12.51837 0.3755235 9.059791
## 0.0 0.3 12.58383 0.3840898 9.136017
## 0.0 0.4 12.46541 0.3955282 8.993267
## 0.0 0.5 12.47292 0.4053663 8.981291
## 0.0 0.6 12.77650 0.4028777 9.114889
## 0.0 0.7 13.23335 0.3963114 9.410939
## 0.0 0.8 13.79899 0.3825910 9.795211
## 0.0 0.9 14.24226 0.3746923 10.105872
## 0.0 1.0 14.66576 0.3672121 10.353615
## 0.1 0.0 14.92814 NaN 11.809472
## 0.1 0.1 11.82105 0.4070263 8.352050
## 0.1 0.2 11.77844 0.4072313 8.614019
## 0.1 0.3 11.63624 0.4244399 8.539275
## 0.1 0.4 11.69945 0.4265356 8.455723
## 0.1 0.5 11.80332 0.4293193 8.333748
## 0.1 0.6 11.93407 0.4289248 8.382518
## 0.1 0.7 12.02487 0.4308984 8.544490
## 0.1 0.8 12.12739 0.4322087 8.699894
## 0.1 0.9 12.23057 0.4332321 8.809751
## 0.1 1.0 12.35601 0.4309401 8.906825
## 0.2 0.0 14.92814 NaN 11.809472
## 0.2 0.1 11.81124 0.4111448 8.399745
## 0.2 0.2 11.85087 0.4120686 8.602259
## 0.2 0.3 11.70084 0.4310323 8.624337
## 0.2 0.4 11.71323 0.4360912 8.547983
## 0.2 0.5 11.78824 0.4395082 8.438153
## 0.2 0.6 11.86567 0.4447895 8.473867
## 0.2 0.7 11.99120 0.4452297 8.663189
## 0.2 0.8 12.09507 0.4467855 8.800416
## 0.2 0.9 12.18409 0.4491048 8.922343
## 0.2 1.0 12.27557 0.4498422 9.033042
## 0.3 0.0 14.92814 NaN 11.809472
## 0.3 0.1 11.81084 0.4115808 8.409769
## 0.3 0.2 11.91123 0.4172757 8.543041
## 0.3 0.3 11.82585 0.4344429 8.715412
## 0.3 0.4 11.81445 0.4435193 8.664466
## 0.3 0.5 11.90026 0.4464677 8.626396
## 0.3 0.6 11.98878 0.4516578 8.664102
## 0.3 0.7 12.11464 0.4540042 8.854863
## 0.3 0.8 12.26144 0.4540805 9.060447
## 0.3 0.9 12.36255 0.4564871 9.200626
## 0.3 1.0 12.45419 0.4587318 9.309379
## 0.4 0.0 14.92814 NaN 11.809472
## 0.4 0.1 11.81583 0.4107636 8.409205
## 0.4 0.2 11.98713 0.4209486 8.528233
## 0.4 0.3 11.99155 0.4360406 8.803195
## 0.4 0.4 12.00741 0.4467751 8.825949
## 0.4 0.5 12.10092 0.4506162 8.832692
## 0.4 0.6 12.21504 0.4553178 8.907371
## 0.4 0.7 12.35393 0.4583580 9.093223
## 0.4 0.8 12.52861 0.4585143 9.327192
## 0.4 0.9 12.64886 0.4612142 9.503352
## 0.4 1.0 12.75461 0.4640106 9.642399
## 0.5 0.0 14.92814 NaN 11.809472
## 0.5 0.1 11.82933 0.4090958 8.390258
## 0.5 0.2 12.09390 0.4214220 8.519929
## 0.5 0.3 12.19263 0.4370645 8.923711
## 0.5 0.4 12.24304 0.4488127 8.993902
## 0.5 0.5 12.35957 0.4536972 9.070980
## 0.5 0.6 12.50942 0.4576811 9.182375
## 0.5 0.7 12.67240 0.4607841 9.365318
## 0.5 0.8 12.86390 0.4617598 9.652265
## 0.5 0.9 13.00840 0.4644079 9.862298
## 0.5 1.0 13.13305 0.4676140 10.035603
## 0.6 0.0 14.92814 NaN 11.809472
## 0.6 0.1 11.85094 0.4071436 8.371064
## 0.6 0.2 12.20654 0.4219231 8.534682
## 0.6 0.3 12.41786 0.4378925 9.054915
## 0.6 0.4 12.52176 0.4501736 9.201354
## 0.6 0.5 12.66871 0.4559463 9.357358
## 0.6 0.6 12.85748 0.4594138 9.502046
## 0.6 0.7 13.05411 0.4623182 9.725584
## 0.6 0.8 13.25974 0.4641673 10.025985
## 0.6 0.9 13.42804 0.4668672 10.276843
## 0.6 1.0 13.57348 0.4702076 10.482250
## 0.7 0.0 14.92814 NaN 11.809472
## 0.7 0.1 11.88425 0.4048350 8.367710
## 0.7 0.2 12.33872 0.4221091 8.574851
## 0.7 0.3 12.66816 0.4385333 9.204526
## 0.7 0.4 12.83923 0.4511248 9.432889
## 0.7 0.5 13.02277 0.4572653 9.650004
## 0.7 0.6 13.24790 0.4607659 9.833800
## 0.7 0.7 13.48167 0.4634726 10.101674
## 0.7 0.8 13.70447 0.4659228 10.430506
## 0.7 0.9 13.89593 0.4687864 10.715262
## 0.7 1.0 14.06326 0.4721350 10.935175
## 0.8 0.0 14.92814 NaN 11.809472
## 0.8 0.1 11.92548 0.4024975 8.375759
## 0.8 0.2 12.48844 0.4223267 8.621213
## 0.8 0.3 12.94072 0.4390992 9.362509
## 0.8 0.4 13.19033 0.4515667 9.683356
## 0.8 0.5 13.41449 0.4581204 9.964285
## 0.8 0.6 13.67308 0.4619623 10.181490
## 0.8 0.7 13.94345 0.4644661 10.495549
## 0.8 0.8 14.18686 0.4673771 10.844149
## 0.8 0.9 14.40497 0.4702601 11.149337
## 0.8 1.0 14.59394 0.4735999 11.387036
## 0.9 0.0 14.92814 NaN 11.809472
## 0.9 0.1 11.97323 0.4002734 8.382751
## 0.9 0.2 12.65161 0.4221899 8.678556
## 0.9 0.3 13.23662 0.4393919 9.525255
## 0.9 0.4 13.56997 0.4515703 9.952608
## 0.9 0.5 13.83662 0.4587367 10.278642
## 0.9 0.6 14.13356 0.4627831 10.554721
## 0.9 0.7 14.43922 0.4651515 10.910095
## 0.9 0.8 14.70715 0.4682803 11.259464
## 0.9 0.9 14.94868 0.4713706 11.579854
## 0.9 1.0 15.15859 0.4747033 11.833890
## 1.0 0.0 14.92814 NaN 11.809472
## 1.0 0.1 12.02696 0.3982054 8.387693
## 1.0 0.2 12.82758 0.4219231 8.747192
## 1.0 0.3 13.54789 0.4397650 9.704109
## 1.0 0.4 13.97145 0.4513976 10.225134
## 1.0 0.5 14.28742 0.4590379 10.602224
## 1.0 0.6 14.62330 0.4633605 10.946498
## 1.0 0.7 14.96271 0.4656276 11.333849
## 1.0 0.8 15.25936 0.4688519 11.676644
## 1.0 0.9 15.52007 0.4722818 12.007571
## 1.0 1.0 15.75182 0.4755504 12.276304
##
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were fraction = 1 and lambda = 1.
plot(enetFit)
Compare models:
multiResample <- function(models, newdata, obs){
res = list()
methods = c()
i = 1
for (model in models){
pred <- predict(model, newdata=newdata)
metrics <- postResample(pred=pred, obs=obs)
res[[i]] <- metrics
methods[[i]] <- model$method
i <- 1 + i
}
names(res) <- methods
return(res)
}
models <- list(ridgeFit, lassoFit, enetFit)
(resampleResult <- multiResample(models, x_test, y_test))
## $ridge
## RMSE Rsquared MAE
## 15.6805328 0.6088296 12.3338009
##
## $lasso
## RMSE Rsquared MAE
## 10.510243 0.684831 7.236435
##
## $enet
## RMSE Rsquared MAE
## 15.6805328 0.6088296 12.3338009
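On the test set, the lasso is the best of these models, with the highest R2 (0.684831) and the lowest RMSE (10.510243). It also edges out the PLS model (R2 0.6455296, RMSE 10.6879771). The ridge and elastic net results are identical because the elastic net was tuned to fraction = 1 and lambda = 1, which is the ridge model with lambda = 1.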
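For a side-by-side view, the test set metrics can be stacked into one table. The sketch below copies the values printed above, including the PLS test set results from earlier in this section; the object names are made up for the example.

```r
# Test set metrics copied from the postResample outputs above
test_metrics <- list(
  pls   = c(RMSE = 10.6879771, Rsquared = 0.6455296, MAE = 7.9698362),
  ridge = c(RMSE = 15.6805328, Rsquared = 0.6088296, MAE = 12.3338009),
  lasso = c(RMSE = 10.510243,  Rsquared = 0.684831,  MAE = 7.236435),
  enet  = c(RMSE = 15.6805328, Rsquared = 0.6088296, MAE = 12.3338009)
)

# One row per model, sorted so the lowest test RMSE comes first
comparison <- do.call(rbind, test_metrics)
ranked <- comparison[order(comparison[, "RMSE"]), ]
print(ranked)
```

With the fitted objects at hand, the same table could be built by applying do.call(rbind, ...) to the list returned by multiResample.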
Plot a histogram to see the distribution of the target variable permeability:
hist(permeability, col="lightblue")
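The histogram of permeability shows that most values are below 10, and many are below 5. Every model's test set RMSE is above 10, so the prediction error is as large as a typical permeability value. For that reason I would not recommend replacing the permeability laboratory experiment with any of these models.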
A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
str(ChemicalManufacturingProcess)
## 'data.frame': 176 obs. of 58 variables:
## $ Yield : num 38 42.4 42 41.4 42.5 ...
## $ BiologicalMaterial01 : num 6.25 8.01 8.01 8.01 7.47 6.12 7.48 6.94 6.94 6.94 ...
## $ BiologicalMaterial02 : num 49.6 61 61 61 63.3 ...
## $ BiologicalMaterial03 : num 57 67.5 67.5 67.5 72.2 ...
## $ BiologicalMaterial04 : num 12.7 14.6 14.6 14.6 14 ...
## $ BiologicalMaterial05 : num 19.5 19.4 19.4 19.4 17.9 ...
## $ BiologicalMaterial06 : num 43.7 53.1 53.1 53.1 54.7 ...
## $ BiologicalMaterial07 : num 100 100 100 100 100 100 100 100 100 100 ...
## $ BiologicalMaterial08 : num 16.7 19 19 19 18.2 ...
## $ BiologicalMaterial09 : num 11.4 12.6 12.6 12.6 12.8 ...
## $ BiologicalMaterial10 : num 3.46 3.46 3.46 3.46 3.05 3.78 3.04 3.85 3.85 3.85 ...
## $ BiologicalMaterial11 : num 138 154 154 154 148 ...
## $ BiologicalMaterial12 : num 18.8 21.1 21.1 21.1 21.1 ...
## $ ManufacturingProcess01: num NA 0 0 0 10.7 12 11.5 12 12 12 ...
## $ ManufacturingProcess02: num NA 0 0 0 0 0 0 0 0 0 ...
## $ ManufacturingProcess03: num NA NA NA NA NA NA 1.56 1.55 1.56 1.55 ...
## $ ManufacturingProcess04: num NA 917 912 911 918 924 933 929 928 938 ...
## $ ManufacturingProcess05: num NA 1032 1004 1015 1028 ...
## $ ManufacturingProcess06: num NA 210 207 213 206 ...
## $ ManufacturingProcess07: num NA 177 178 177 178 178 177 178 177 177 ...
## $ ManufacturingProcess08: num NA 178 178 177 178 178 178 178 177 177 ...
## $ ManufacturingProcess09: num 43 46.6 45.1 44.9 45 ...
## $ ManufacturingProcess10: num NA NA NA NA NA NA 11.6 10.2 9.7 10.1 ...
## $ ManufacturingProcess11: num NA NA NA NA NA NA 11.5 11.3 11.1 10.2 ...
## $ ManufacturingProcess12: num NA 0 0 0 0 0 0 0 0 0 ...
## $ ManufacturingProcess13: num 35.5 34 34.8 34.8 34.6 34 32.4 33.6 33.9 34.3 ...
## $ ManufacturingProcess14: num 4898 4869 4878 4897 4992 ...
## $ ManufacturingProcess15: num 6108 6095 6087 6102 6233 ...
## $ ManufacturingProcess16: num 4682 4617 4617 4635 4733 ...
## $ ManufacturingProcess17: num 35.5 34 34.8 34.8 33.9 33.4 33.8 33.6 33.9 35.3 ...
## $ ManufacturingProcess18: num 4865 4867 4877 4872 4886 ...
## $ ManufacturingProcess19: num 6049 6097 6078 6073 6102 ...
## $ ManufacturingProcess20: num 4665 4621 4621 4611 4659 ...
## $ ManufacturingProcess21: num 0 0 0 0 -0.7 -0.6 1.4 0 0 1 ...
## $ ManufacturingProcess22: num NA 3 4 5 8 9 1 2 3 4 ...
## $ ManufacturingProcess23: num NA 0 1 2 4 1 1 2 3 1 ...
## $ ManufacturingProcess24: num NA 3 4 5 18 1 1 2 3 4 ...
## $ ManufacturingProcess25: num 4873 4869 4897 4892 4930 ...
## $ ManufacturingProcess26: num 6074 6107 6116 6111 6151 ...
## $ ManufacturingProcess27: num 4685 4630 4637 4630 4684 ...
## $ ManufacturingProcess28: num 10.7 11.2 11.1 11.1 11.3 11.4 11.2 11.1 11.3 11.4 ...
## $ ManufacturingProcess29: num 21 21.4 21.3 21.3 21.6 21.7 21.2 21.2 21.5 21.7 ...
## $ ManufacturingProcess30: num 9.9 9.9 9.4 9.4 9 10.1 11.2 10.9 10.5 9.8 ...
## $ ManufacturingProcess31: num 69.1 68.7 69.3 69.3 69.4 68.2 67.6 67.9 68 68.5 ...
## $ ManufacturingProcess32: num 156 169 173 171 171 173 159 161 160 164 ...
## $ ManufacturingProcess33: num 66 66 66 68 70 70 65 65 65 66 ...
## $ ManufacturingProcess34: num 2.4 2.6 2.6 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
## $ ManufacturingProcess35: num 486 508 509 496 468 490 475 478 491 488 ...
## $ ManufacturingProcess36: num 0.019 0.019 0.018 0.018 0.017 0.018 0.019 0.019 0.019 0.019 ...
## $ ManufacturingProcess37: num 0.5 2 0.7 1.2 0.2 0.4 0.8 1 1.2 1.8 ...
## $ ManufacturingProcess38: num 3 2 2 2 2 2 2 2 3 3 ...
## $ ManufacturingProcess39: num 7.2 7.2 7.2 7.2 7.3 7.2 7.3 7.3 7.4 7.1 ...
## $ ManufacturingProcess40: num NA 0.1 0 0 0 0 0 0 0 0 ...
## $ ManufacturingProcess41: num NA 0.15 0 0 0 0 0 0 0 0 ...
## $ ManufacturingProcess42: num 11.6 11.1 12 10.6 11 11.5 11.7 11.4 11.4 11.3 ...
## $ ManufacturingProcess43: num 3 0.9 1 1.1 1.1 2.2 0.7 0.8 0.9 0.8 ...
## $ ManufacturingProcess44: num 1.8 1.9 1.8 1.8 1.7 1.8 2 2 1.9 1.9 ...
## $ ManufacturingProcess45: num 2.4 2.2 2.3 2.1 2.1 2 2.2 2.2 2.1 2.4 ...
The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
We will use the missmap function from the Amelia package to visualize the missing values:
library(Amelia)
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
missmap(ChemicalManufacturingProcess, col = c("purple", "lightblue"))
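Alongside the missingness map, a per-column count of missing values gives exact numbers. The sketch below uses a small made-up data frame rather than the manufacturing data, and relies only on base R.

```r
# Toy data frame with a few missing entries (made-up values)
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, 6),
  c = 1:3
)

# Number of missing values per column, then only the affected columns
na_counts <- colSums(is.na(toy))
affected  <- na_counts[na_counts > 0]
print(affected)
```

Applied to the real data, the same call would be colSums(is.na(ChemicalManufacturingProcess)).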
Use the bagImpute method (bagged tree imputation) to fill in the missing values, then split the data into training and test sets:
cmpImpute <- preProcess(ChemicalManufacturingProcess[,-c(1)], method=c('bagImpute'))
cmpImpute
## Created from 152 samples and 57 variables
##
## Pre-processing:
## - bagged tree imputation (57)
## - ignored (0)
cmp <- predict(cmpImpute, ChemicalManufacturingProcess[,-c(1)])
set.seed(1977)
trainRow <- createDataPartition(ChemicalManufacturingProcess$Yield, p=0.8, list=FALSE)
x_train <- cmp[trainRow, ]
y_train <- ChemicalManufacturingProcess$Yield[trainRow]
x_test <- cmp[-trainRow, ]
y_test <- ChemicalManufacturingProcess$Yield[-trainRow]
Build the elastic net model, tuning fraction and lambda each from 0 to 1 in steps of 0.1. RMSE is used as the selection metric:
set.seed(1981)
enetFit <- train(x=x_train,
y=y_train,
method='enet',
metric='RMSE',
tuneGrid=expand.grid(.fraction = seq(0, 1, by=0.1),
.lambda = seq(0, 1, by=0.1)),
trControl=trainControl(method='cv'),
preProcess=c('center','scale')
)
enetFit
## Elasticnet
##
## 144 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 130, 129, 129, 129, 129, 129, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.0 0.0 1.864708 NaN 1.5124430
## 0.0 0.1 1.239645 0.6135148 1.0033249
## 0.0 0.2 1.849843 0.5972989 1.1663513
## 0.0 0.3 3.057038 0.4832820 1.5325351
## 0.0 0.4 3.508008 0.4603980 1.6644773
## 0.0 0.5 3.704933 0.4488960 1.7383326
## 0.0 0.6 3.793850 0.4390087 1.7894433
## 0.0 0.7 3.737190 0.4318544 1.8004578
## 0.0 0.8 4.606455 0.4194939 2.0526604
## 0.0 0.9 5.364310 0.4097940 2.2733560
## 0.0 1.0 6.025003 0.4075752 2.4531389
## 0.1 0.0 1.864708 NaN 1.5124430
## 0.1 0.1 1.502226 0.5920297 1.2346175
## 0.1 0.2 1.268324 0.5999394 1.0521655
## 0.1 0.3 1.212464 0.6099627 0.9970262
## 0.1 0.4 1.243106 0.6122161 0.9914471
## 0.1 0.5 1.346872 0.5995365 1.0207451
## 0.1 0.6 1.405440 0.5970689 1.0455283
## 0.1 0.7 1.560328 0.5692091 1.0974347
## 0.1 0.8 1.757807 0.5502589 1.1592417
## 0.1 0.9 2.008009 0.5284495 1.2322544
## 0.1 1.0 2.185558 0.5169575 1.2832988
## 0.2 0.0 1.864708 NaN 1.5124430
## 0.2 0.1 1.535985 0.5877786 1.2596079
## 0.2 0.2 1.302940 0.5986808 1.0758782
## 0.2 0.3 1.215772 0.6016854 1.0141343
## 0.2 0.4 1.215240 0.6117582 0.9881412
## 0.2 0.5 1.332598 0.5972761 1.0203729
## 0.2 0.6 1.485125 0.5863286 1.0618717
## 0.2 0.7 1.565605 0.5859770 1.0916767
## 0.2 0.8 1.664744 0.5632208 1.1357395
## 0.2 0.9 1.740270 0.5534231 1.1604963
## 0.2 1.0 1.891957 0.5478582 1.2030571
## 0.3 0.0 1.864708 NaN 1.5124430
## 0.3 0.1 1.545179 0.5857402 1.2661119
## 0.3 0.2 1.312776 0.5982781 1.0838782
## 0.3 0.3 1.217187 0.5995322 1.0185969
## 0.3 0.4 1.201215 0.6125325 0.9843902
## 0.3 0.5 1.291184 0.6001112 1.0112346
## 0.3 0.6 1.484359 0.5816377 1.0708700
## 0.3 0.7 1.576937 0.5815186 1.0953902
## 0.3 0.8 1.617727 0.5773968 1.1245459
## 0.3 0.9 1.693710 0.5604714 1.1595983
## 0.3 1.0 1.773722 0.5583247 1.1842533
## 0.4 0.0 1.864708 NaN 1.5124430
## 0.4 0.1 1.546977 0.5851944 1.2669847
## 0.4 0.2 1.314960 0.5978081 1.0860271
## 0.4 0.3 1.217042 0.5983458 1.0195366
## 0.4 0.4 1.201600 0.6099615 0.9832249
## 0.4 0.5 1.271859 0.6019177 1.0087425
## 0.4 0.6 1.476389 0.5807753 1.0760080
## 0.4 0.7 1.574377 0.5779499 1.1089203
## 0.4 0.8 1.593516 0.5814813 1.1245041
## 0.4 0.9 1.672430 0.5632543 1.1656748
## 0.4 1.0 1.722069 0.5626953 1.1854640
## 0.5 0.0 1.864708 NaN 1.5124430
## 0.5 0.1 1.545827 0.5852066 1.2659469
## 0.5 0.2 1.314156 0.5972491 1.0859746
## 0.5 0.3 1.215507 0.5982284 1.0182419
## 0.5 0.4 1.207684 0.6062406 0.9835650
## 0.5 0.5 1.264389 0.6039307 1.0105478
## 0.5 0.6 1.471034 0.5807430 1.0838435
## 0.5 0.7 1.569032 0.5769134 1.1211745
## 0.5 0.8 1.590807 0.5824616 1.1354444
## 0.5 0.9 1.672189 0.5654378 1.1772818
## 0.5 1.0 1.707850 0.5640827 1.1952201
## 0.6 0.0 1.864708 NaN 1.5124430
## 0.6 0.1 1.543498 0.5854435 1.2640469
## 0.6 0.2 1.311875 0.5968162 1.0846070
## 0.6 0.3 1.214763 0.5975884 1.0168327
## 0.6 0.4 1.216154 0.6026549 0.9856500
## 0.6 0.5 1.266337 0.6052448 1.0164073
## 0.6 0.6 1.468271 0.5809156 1.0925265
## 0.6 0.7 1.571173 0.5768353 1.1356662
## 0.6 0.8 1.598625 0.5826159 1.1516533
## 0.6 0.9 1.676701 0.5688602 1.1939424
## 0.6 1.0 1.717129 0.5642442 1.2123446
## 0.7 0.0 1.864708 NaN 1.5124430
## 0.7 0.1 1.540392 0.5854592 1.2615660
## 0.7 0.2 1.308661 0.5964326 1.0823773
## 0.7 0.3 1.214875 0.5963475 1.0154703
## 0.7 0.4 1.225498 0.5997091 0.9888900
## 0.7 0.5 1.275268 0.6054795 1.0225005
## 0.7 0.6 1.469162 0.5812881 1.1036176
## 0.7 0.7 1.579302 0.5772512 1.1538967
## 0.7 0.8 1.614867 0.5819056 1.1753395
## 0.7 0.9 1.684216 0.5714734 1.2163170
## 0.7 1.0 1.741943 0.5641199 1.2408560
## 0.8 0.0 1.864708 NaN 1.5124430
## 0.8 0.1 1.537101 0.5851944 1.2589236
## 0.8 0.2 1.304940 0.5960604 1.0797883
## 0.8 0.3 1.215243 0.5950762 1.0139589
## 0.8 0.4 1.234480 0.5977314 0.9909266
## 0.8 0.5 1.289135 0.6047042 1.0298212
## 0.8 0.6 1.475169 0.5818443 1.1174357
## 0.8 0.7 1.591459 0.5781491 1.1734493
## 0.8 0.8 1.638014 0.5811556 1.2011245
## 0.8 0.9 1.702300 0.5731856 1.2420994
## 0.8 1.0 1.777503 0.5641104 1.2748777
## 0.9 0.0 1.864708 NaN 1.5124430
## 0.9 0.1 1.533588 0.5848772 1.2561131
## 0.9 0.2 1.301250 0.5954486 1.0778693
## 0.9 0.3 1.216052 0.5936714 1.0125888
## 0.9 0.4 1.244299 0.5959415 0.9937499
## 0.9 0.5 1.306452 0.6032804 1.0404248
## 0.9 0.6 1.487025 0.5821498 1.1345525
## 0.9 0.7 1.610996 0.5787033 1.1956919
## 0.9 0.8 1.666228 0.5805678 1.2279459
## 0.9 0.9 1.729621 0.5742054 1.2722547
## 0.9 1.0 1.820801 0.5643297 1.3123049
## 1.0 0.0 1.864708 NaN 1.5124430
## 1.0 0.1 1.529873 0.5845239 1.2531391
## 1.0 0.2 1.297481 0.5948839 1.0759408
## 1.0 0.3 1.217124 0.5923127 1.0114886
## 1.0 0.4 1.254227 0.5944094 0.9967872
## 1.0 0.5 1.325435 0.6016949 1.0543388
## 1.0 0.6 1.503499 0.5821198 1.1523663
## 1.0 0.7 1.633108 0.5790515 1.2187505
## 1.0 0.8 1.699105 0.5799548 1.2575073
## 1.0 0.9 1.774791 0.5727015 1.3102632
## 1.0 1.0 1.869861 0.5647588 1.3554041
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.4 and lambda = 0.3.
plot(enetFit)
The best parameter combination is fraction = 0.4 and lambda = 0.3, with a cross-validated RMSE of 1.201215.
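Because the table lists lambda before fraction, it is easy to read the wrong row. The sketch below copies the four rows around the selected combination from the resampling table above (the data frame name tuning_rows is made up) and picks the minimum programmatically.

```r
# Four neighbouring rows copied from the elastic net resampling table
tuning_rows <- data.frame(
  lambda   = c(0.3, 0.3, 0.4, 0.4),
  fraction = c(0.3, 0.4, 0.3, 0.4),
  RMSE     = c(1.217187, 1.201215, 1.217042, 1.201600)
)

# Row with the smallest cross-validated RMSE
best <- tuning_rows[which.min(tuning_rows$RMSE), ]
print(best)
```

On the fitted object, the same lookup can be done on the full grid stored in enetFit$results.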
enet_Pred <- predict(enetFit, newdata=x_test)
(predResult <- postResample(pred=enet_Pred, obs=y_test))
## RMSE Rsquared MAE
## 1.0235202 0.6221641 0.7542969
The test set RMSE is 1.0235202, which is lower than the cross-validated RMSE obtained on the training set, so the model performs slightly better on the held-out data than resampling suggested.
coeffs <- predict.enet(enetFit$finalModel, s=enetFit$bestTune[1, "fraction"], type="coef", mode="fraction")$coefficients
Display the coefficients:
coeffs
## BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 0.00000000 0.14523788 0.02967554
## BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 0.00000000 0.00000000 0.06689327
## BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 0.00000000 0.01045532 0.00000000
## BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 0.00000000 0.02390760 0.00000000
## ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 0.00000000 0.00000000 0.14352750
## ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 0.00000000 0.00000000 0.39165718
## ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 0.00000000 0.05717589 0.00000000
## ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## -0.29560835 0.00000000 0.07651259
## ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 0.00000000 -0.24378641 0.00000000
## ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 0.00000000 0.57871912 0.00000000
## ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 0.12555693 0.00000000 -0.28454206
## ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## -0.03954640 0.00000000 0.01710914
## ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 0.00000000 0.00000000 0.00000000
## ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 0.00000000 0.03429687 0.06347790
Based on the results above, 39 of the 57 coefficients are exactly zero, so the elastic net keeps only 18 predictors.
Let's rank the remaining predictors by the absolute size of their coefficients:
coeffs.sorted <- abs(coeffs)
coeffs.sorted <- coeffs.sorted[coeffs.sorted>0]
(coeffs.sorted <- sort(coeffs.sorted, decreasing = T))
## ManufacturingProcess32 ManufacturingProcess09 ManufacturingProcess13
## 0.57871912 0.39165718 0.29560835
## ManufacturingProcess36 ManufacturingProcess17 BiologicalMaterial02
## 0.28454206 0.24378641 0.14523788
## ManufacturingProcess06 ManufacturingProcess34 ManufacturingProcess15
## 0.14352750 0.12555693 0.07651259
## BiologicalMaterial06 ManufacturingProcess45 ManufacturingProcess11
## 0.06689327 0.06347790 0.05717589
## ManufacturingProcess37 ManufacturingProcess44 BiologicalMaterial03
## 0.03954640 0.03429687 0.02967554
## BiologicalMaterial11 ManufacturingProcess39 BiologicalMaterial08
## 0.02390760 0.01710914 0.01045532
(temp <- varImp(enetFit))
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess13 100.00
## ManufacturingProcess32 96.86
## ManufacturingProcess17 92.26
## BiologicalMaterial06 85.32
## ManufacturingProcess09 84.61
## BiologicalMaterial12 78.03
## ManufacturingProcess36 76.70
## ManufacturingProcess06 74.87
## BiologicalMaterial03 74.63
## ManufacturingProcess31 68.87
## BiologicalMaterial02 68.73
## BiologicalMaterial11 59.06
## ManufacturingProcess29 54.91
## ManufacturingProcess11 54.00
## BiologicalMaterial08 47.54
## BiologicalMaterial04 45.77
## BiologicalMaterial01 44.54
## ManufacturingProcess33 44.36
## BiologicalMaterial09 42.68
## ManufacturingProcess30 41.52
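varImp lists the top 20 predictors out of 57. Of these, 11 are manufacturing process predictors and 9 are biological material predictors, and the three most important are all manufacturing process predictors. Note that this ranking is a loess R-squared importance computed for each predictor separately, so it does not match the coefficient ranking exactly.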
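The sign of each coefficient shows the direction in which a predictor moves the predicted yield: increasing a predictor with a positive coefficient raises the predicted yield, while increasing one with a negative coefficient lowers it. Because manufacturing process predictors can be changed, they are the ones worth acting on.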
Positive coefficients for Manufacturing Process:
coeffs_mp <- coeffs.sorted[grep('ManufacturingProcess', names(coeffs.sorted))] %>% names() %>% coeffs[.]
coeffs_mp[coeffs_mp>0]
## ManufacturingProcess32 ManufacturingProcess09 ManufacturingProcess06
## 0.57871912 0.39165718 0.14352750
## ManufacturingProcess34 ManufacturingProcess15 ManufacturingProcess45
## 0.12555693 0.07651259 0.06347790
## ManufacturingProcess11 ManufacturingProcess44 ManufacturingProcess39
## 0.05717589 0.03429687 0.01710914
Negative coefficients for Manufacturing Process:
coeffs_mp[coeffs_mp<0]
## ManufacturingProcess13 ManufacturingProcess36 ManufacturingProcess17
## -0.2956083 -0.2845421 -0.2437864
## ManufacturingProcess37
## -0.0395464
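The two groups can also be produced in one step without the pipe. The sketch below copies the manufacturing process coefficients printed above into a named vector and splits the names by sign; the group labels are made up for readability.

```r
# Non-zero manufacturing process coefficients copied from the output above
coeffs_mp <- c(
  ManufacturingProcess32 =  0.57871912, ManufacturingProcess09 =  0.39165718,
  ManufacturingProcess13 = -0.29560835, ManufacturingProcess36 = -0.28454206,
  ManufacturingProcess17 = -0.24378641, ManufacturingProcess06 =  0.14352750,
  ManufacturingProcess34 =  0.12555693, ManufacturingProcess15 =  0.07651259,
  ManufacturingProcess45 =  0.06347790, ManufacturingProcess11 =  0.05717589,
  ManufacturingProcess37 = -0.03954640, ManufacturingProcess44 =  0.03429687,
  ManufacturingProcess39 =  0.01710914
)

# Label each predictor by the direction that would raise predicted yield
direction <- ifelse(coeffs_mp > 0, "increase_to_raise_yield",
                    "decrease_to_raise_yield")
by_direction <- split(names(coeffs_mp), direction)
print(lengths(by_direction))
```

These directions describe the fitted linear model only. They are associations in the observed runs, so any process change they suggest would still need to be confirmed experimentally.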