knitr::opts_chunk$set(echo = TRUE, warning = FALSE, error = FALSE)
In Kuhn and Johnson do problems 6.2 and 6.3. There are only two but they consist of many parts. Please submit a link to your Rpubs and submit the .rmd file as well.
6.2. Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:
The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains permeability response.
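library(AppliedPredictiveModeling)
library(caret)   # nearZeroVar(), createDataPartition(), train(), postResample()
library(dplyr)   # %>% and bind_cols()
data(permeability)
dim(fingerprints)
[1] 165 1107
# Identify and drop predictors with near-zero variance
filter_predictors <- nearZeroVar(fingerprints)
fingerprints_filter <- fingerprints[, -filter_predictors]
dim(fingerprints_filter)
[1] 165 388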
ANSWER: After filtering, 388 predictors remain.
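To make the filtering step concrete, here is a minimal base R sketch of a near-zero-variance check on a toy matrix. It is not caret's implementation. The function name `near_zero_var_sketch` and the toy data are hypothetical, and the thresholds are assumed to match the defaults documented for `nearZeroVar()` (`freqCut = 95/5`, `uniqueCut = 10`).

```r
# Base R sketch of a near-zero-variance check (hypothetical helper, not caret's code).
# Thresholds are assumed to mirror caret's documented defaults.
near_zero_var_sketch <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- logical(ncol(x))
  for (j in seq_len(ncol(x))) {
    counts <- sort(table(x[, j]), decreasing = TRUE)
    # Count of the most common value divided by the count of the second most common
    freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
    # Distinct values as a percentage of the number of rows
    pct_unique <- 100 * length(counts) / nrow(x)
    flagged[j] <- freq_ratio > freq_cut && pct_unique <= unique_cut
  }
  which(flagged)
}

# Toy data: 100 rows; one balanced column, one almost constant, one constant
toy <- cbind(
  balanced = rep(c(0, 1), each = 50),
  sparse   = c(rep(0, 98), 1, 1),
  constant = rep(0, 100)
)
drop_cols <- near_zero_var_sketch(toy)

# Guard against an empty index: x[, -integer(0)] would drop every column
toy_filtered <- if (length(drop_cols) > 0) toy[, -drop_cols, drop = FALSE] else toy
dim(toy_filtered)
```

In this toy case the `sparse` and `constant` columns are flagged and only `balanced` is kept.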
model_df <- as.data.frame(fingerprints_filter) %>% bind_cols(permeability)
#library(mdatools)
#install.packages("pls")
library(pls)
set.seed(123)
train_index <- createDataPartition(model_df$permeability , p=.8, list=F)
train <- model_df[ train_index,]
test <- model_df[-train_index,]
pls_modeldf <- train(
permeability ~ ., data = train, method = "pls",
trControl = trainControl("cv", number = 10),
tuneLength = 20,preProc = c("center", "scale")
)
(pls_modeldf)
Partial Least Squares
133 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 121, 121, 118, 119, 119, 119, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 13.31894 0.3442124 10.254018
2 11.78898 0.4830504 8.534741
3 11.98818 0.4792649 9.219285
4 12.04349 0.4923322 9.448926
5 11.79823 0.5193195 9.049121
6 11.53275 0.5335956 8.658301
7 11.64053 0.5229621 8.878265
8 11.86459 0.5144801 9.265252
9 11.98385 0.5188205 9.218594
10 12.55634 0.4808614 9.610747
11 12.69674 0.4758068 9.702325
12 13.01534 0.4538906 9.956623
13 13.12637 0.4367362 9.878017
14 13.44865 0.4140715 10.065088
15 13.60135 0.4034269 10.188150
16 13.79361 0.3943904 10.247160
17 14.00756 0.3845119 10.412776
18 14.18113 0.3711378 10.587027
19 14.25674 0.3703610 10.575726
20 14.33121 0.3723176 10.679764
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 6.
plot(pls_modeldf)
summary(pls_modeldf)
Data: X dimension: 133 388
Y dimension: 133 1
Fit method: oscorespls
Number of components considered: 6
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
X 22.98 34.61 40.51 46.13 53.69 58.12
.outcome 33.73 55.03 61.84 67.77 71.65 75.73
pls_pred <- predict(pls_modeldf, newdata=test)
postResample(pred = pls_pred, obs = test[, "permeability"])
RMSE Rsquared MAE
12.3486900 0.3244542 8.2881075
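For readers unfamiliar with the three test-set metrics, the following base R sketch computes them by hand on toy vectors. The numbers are illustrative only, and R-squared is taken as the squared Pearson correlation between predictions and observations, which is assumed here to match the default definition documented for `postResample()`.

```r
# Base R sketch of RMSE, R-squared and MAE on toy vectors (illustrative values only)
obs  <- c(2, 4, 6, 8)
pred <- c(3, 3, 7, 7)

rmse     <- sqrt(mean((pred - obs)^2))  # root mean squared error
rsquared <- cor(pred, obs)^2            # squared Pearson correlation (assumed definition)
mae      <- mean(abs(pred - obs))       # mean absolute error

c(RMSE = rmse, Rsquared = rsquared, MAE = mae)
```

Note that R-squared defined this way measures how well predictions track observations, not how close they are, so it can be high even when RMSE is large.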
set.seed(1001)
ctrl <- trainControl(method = "cv", number = 10)
#PCR
pls_modeldf1 <- train(
permeability ~ ., data = train, method = "pcr",
trControl = trainControl("cv", number = 10),
tuneLength = 20,preProc = c("center", "scale")
)
ls_pred <- predict(pls_modeldf1, newdata=test)
postResample(pred = ls_pred, obs = test[, "permeability"])
RMSE Rsquared MAE
12.2076841 0.2922108 8.2251123
plot(pls_modeldf1)
pls_modeldf1
Principal Component Analysis
133 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 120, 120, 117, 121, 119, 121, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 15.14668 0.1315485 11.952810
2 15.13213 0.1349274 11.963591
3 14.21507 0.2701217 10.991641
4 14.35656 0.2257860 11.249392
5 12.43672 0.4331662 8.830270
6 12.62043 0.4238408 8.971598
7 12.66619 0.4152104 9.007578
8 12.13353 0.4598869 8.534978
9 12.11428 0.4613030 8.655395
10 12.00776 0.4752162 8.735052
11 11.79698 0.4846552 8.585593
12 11.90393 0.4770794 8.683035
13 11.94076 0.4759730 8.672465
14 12.06253 0.4659794 8.771641
15 12.05160 0.4691230 8.808072
16 12.06959 0.4704286 8.856250
17 12.07411 0.4704429 8.870894
18 12.24831 0.4567389 9.069925
19 12.04358 0.4750833 8.896307
20 11.92057 0.4871423 8.832622
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 11.
#GLMNET
set.seed(1001)
enetGrid <- expand.grid(alpha=seq(0,1,by=0.05),
lambda=seq(0,1,by=0.05))
enetTune <- train(permeability ~ ., data = train,
method = 'glmnet',
tuneGrid = enetGrid,
trControl = ctrl,
preProc = c('center','scale'))
enet_predict <- predict(enetTune, newdata=test)
postResample(pred = enet_predict, obs = test[, "permeability"])
RMSE Rsquared MAE
10.9472977 0.3973954 7.2993799
enetTune
glmnet
133 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 120, 120, 117, 121, 119, 121, ...
Resampling results across tuning parameters:
alpha lambda RMSE Rsquared MAE
0.00 0.00 11.53793 0.5250415 8.515914
0.00 0.05 11.53793 0.5250415 8.515914
0.00 0.10 11.53793 0.5250415 8.515914
0.00 0.15 11.53793 0.5250415 8.515914
0.00 0.20 11.53793 0.5250415 8.515914
0.00 0.25 11.53793 0.5250415 8.515914
0.00 0.30 11.53793 0.5250415 8.515914
0.00 0.35 11.53793 0.5250415 8.515914
0.00 0.40 11.53793 0.5250415 8.515914
0.00 0.45 11.53793 0.5250415 8.515914
0.00 0.50 11.53793 0.5250415 8.515914
0.00 0.55 11.53793 0.5250415 8.515914
0.00 0.60 11.53793 0.5250415 8.515914
0.00 0.65 11.53793 0.5250415 8.515914
0.00 0.70 11.53793 0.5250415 8.515914
0.00 0.75 11.53793 0.5250415 8.515914
0.00 0.80 11.53793 0.5250415 8.515914
0.00 0.85 11.53793 0.5250415 8.515914
0.00 0.90 11.53793 0.5250415 8.515914
0.00 0.95 11.53793 0.5250415 8.515914
0.00 1.00 11.53793 0.5250415 8.515914
0.05 0.00 11.51188 0.5193477 8.411928
0.05 0.05 11.51188 0.5193477 8.411928
0.05 0.10 11.51188 0.5193477 8.411928
0.05 0.15 11.51188 0.5193477 8.411928
0.05 0.20 11.51188 0.5193477 8.411928
0.05 0.25 11.51188 0.5193477 8.411928
0.05 0.30 11.51188 0.5193477 8.411928
0.05 0.35 11.51188 0.5193477 8.411928
0.05 0.40 11.51188 0.5193477 8.411928
0.05 0.45 11.51188 0.5193477 8.411928
0.05 0.50 11.51188 0.5193477 8.411928
0.05 0.55 11.51188 0.5193477 8.411928
0.05 0.60 11.51188 0.5193477 8.411928
0.05 0.65 11.51188 0.5193477 8.411928
0.05 0.70 11.51188 0.5193477 8.411928
0.05 0.75 11.51188 0.5193477 8.411928
0.05 0.80 11.51188 0.5193477 8.411928
0.05 0.85 11.51188 0.5193477 8.411928
0.05 0.90 11.51188 0.5193477 8.411928
0.05 0.95 11.51188 0.5193477 8.411928
0.05 1.00 11.51188 0.5193477 8.411928
0.10 0.00 11.87388 0.4931600 8.612027
0.10 0.05 11.87388 0.4931600 8.612027
0.10 0.10 11.87388 0.4931600 8.612027
0.10 0.15 11.87388 0.4931600 8.612027
0.10 0.20 11.87388 0.4931600 8.612027
0.10 0.25 11.87388 0.4931600 8.612027
0.10 0.30 11.87388 0.4931600 8.612027
0.10 0.35 11.87388 0.4931600 8.612027
0.10 0.40 11.87388 0.4931600 8.612027
0.10 0.45 11.87388 0.4931600 8.612027
0.10 0.50 11.87388 0.4931600 8.612027
0.10 0.55 11.87388 0.4931600 8.612027
0.10 0.60 11.87388 0.4931600 8.612027
0.10 0.65 11.87388 0.4931600 8.612027
0.10 0.70 11.87388 0.4931600 8.612027
0.10 0.75 11.87388 0.4931600 8.612027
0.10 0.80 11.87388 0.4931600 8.612027
0.10 0.85 11.87388 0.4931600 8.612027
0.10 0.90 11.87388 0.4931600 8.612027
0.10 0.95 11.87388 0.4931600 8.612027
0.10 1.00 11.87388 0.4931600 8.612027
0.15 0.00 12.11703 0.4756167 8.724019
0.15 0.05 12.11703 0.4756167 8.724019
0.15 0.10 12.11703 0.4756167 8.724019
0.15 0.15 12.11703 0.4756167 8.724019
0.15 0.20 12.11703 0.4756167 8.724019
0.15 0.25 12.11703 0.4756167 8.724019
0.15 0.30 12.11703 0.4756167 8.724019
0.15 0.35 12.11703 0.4756167 8.724019
0.15 0.40 12.11703 0.4756167 8.724019
0.15 0.45 12.11703 0.4756167 8.724019
0.15 0.50 12.11703 0.4756167 8.724019
0.15 0.55 12.11703 0.4756167 8.724019
0.15 0.60 12.11703 0.4756167 8.724019
0.15 0.65 12.11703 0.4756167 8.724019
0.15 0.70 12.10403 0.4767119 8.718872
0.15 0.75 12.02291 0.4821018 8.672541
0.15 0.80 11.93966 0.4876726 8.635465
0.15 0.85 11.86257 0.4930314 8.604027
0.15 0.90 11.79192 0.4980014 8.570039
0.15 0.95 11.73101 0.5022725 8.534427
0.15 1.00 11.67477 0.5063446 8.501190
0.20 0.00 12.24848 0.4663328 8.773614
0.20 0.05 12.24848 0.4663328 8.773614
0.20 0.10 12.24848 0.4663328 8.773614
0.20 0.15 12.24848 0.4663328 8.773614
0.20 0.20 12.24848 0.4663328 8.773614
0.20 0.25 12.24848 0.4663328 8.773614
0.20 0.30 12.24848 0.4663328 8.773614
0.20 0.35 12.24848 0.4663328 8.773614
0.20 0.40 12.24848 0.4663328 8.773614
0.20 0.45 12.24848 0.4663328 8.773614
0.20 0.50 12.24848 0.4663328 8.773614
0.20 0.55 12.17897 0.4709977 8.740790
0.20 0.60 12.06661 0.4782224 8.685266
0.20 0.65 11.96414 0.4849993 8.635803
0.20 0.70 11.87225 0.4914678 8.593015
0.20 0.75 11.79410 0.4972047 8.549929
0.20 0.80 11.72640 0.5023259 8.507920
0.20 0.85 11.66564 0.5071148 8.467147
0.20 0.90 11.60921 0.5115616 8.432457
0.20 0.95 11.56092 0.5153977 8.402606
0.20 1.00 11.51688 0.5187266 8.370980
0.25 0.00 12.31877 0.4615697 8.799004
0.25 0.05 12.31877 0.4615697 8.799004
0.25 0.10 12.31877 0.4615697 8.799004
0.25 0.15 12.31877 0.4615697 8.799004
0.25 0.20 12.31877 0.4615697 8.799004
0.25 0.25 12.31877 0.4615697 8.799004
0.25 0.30 12.31877 0.4615697 8.799004
0.25 0.35 12.31877 0.4615697 8.799004
0.25 0.40 12.31877 0.4615697 8.799004
0.25 0.45 12.22714 0.4673565 8.748050
0.25 0.50 12.09642 0.4756758 8.678306
0.25 0.55 11.97531 0.4838085 8.625506
0.25 0.60 11.86323 0.4918342 8.567662
0.25 0.65 11.77947 0.4982574 8.521258
0.25 0.70 11.70193 0.5045454 8.474541
0.25 0.75 11.62845 0.5103499 8.435073
0.25 0.80 11.56933 0.5150707 8.403723
0.25 0.85 11.51438 0.5192589 8.368416
0.25 0.90 11.46338 0.5232182 8.335977
0.25 0.95 11.41629 0.5266781 8.310120
0.25 1.00 11.37580 0.5296153 8.280928
0.30 0.00 12.35501 0.4591300 8.804308
0.30 0.05 12.35501 0.4591300 8.804308
0.30 0.10 12.35501 0.4591300 8.804308
0.30 0.15 12.35501 0.4591300 8.804308
0.30 0.20 12.35501 0.4591300 8.804308
0.30 0.25 12.35501 0.4591300 8.804308
0.30 0.30 12.35501 0.4591300 8.804308
0.30 0.35 12.34402 0.4599569 8.800258
0.30 0.40 12.19981 0.4687427 8.714704
0.30 0.45 12.05790 0.4780679 8.649481
0.30 0.50 11.91188 0.4884659 8.580495
0.30 0.55 11.80472 0.4964037 8.523671
0.30 0.60 11.70176 0.5047292 8.467669
0.30 0.65 11.61859 0.5113518 8.422116
0.30 0.70 11.55437 0.5163965 8.388502
0.30 0.75 11.49675 0.5208619 8.355595
0.30 0.80 11.44276 0.5247331 8.325311
0.30 0.85 11.39872 0.5278216 8.296165
0.30 0.90 11.36329 0.5301583 8.271249
0.30 0.95 11.33762 0.5315246 8.261173
0.30 1.00 11.31488 0.5325901 8.248856
0.35 0.00 12.38580 0.4568722 8.803290
0.35 0.05 12.38580 0.4568722 8.803290
0.35 0.10 12.38580 0.4568722 8.803290
0.35 0.15 12.38580 0.4568722 8.803290
0.35 0.20 12.38580 0.4568722 8.803290
0.35 0.25 12.38580 0.4568722 8.803290
0.35 0.30 12.37348 0.4577637 8.798540
0.35 0.35 12.21275 0.4673477 8.702618
0.35 0.40 12.04581 0.4785900 8.625206
0.35 0.45 11.89168 0.4898816 8.549936
0.35 0.50 11.76583 0.4995036 8.486959
0.35 0.55 11.65307 0.5084587 8.426953
0.35 0.60 11.57949 0.5143359 8.393130
0.35 0.65 11.51819 0.5190539 8.373933
0.35 0.70 11.45740 0.5234114 8.341405
0.35 0.75 11.40759 0.5268800 8.306631
0.35 0.80 11.36929 0.5291104 8.286790
0.35 0.85 11.34235 0.5303545 8.272736
0.35 0.90 11.31986 0.5312583 8.257601
0.35 0.95 11.29949 0.5320984 8.242515
0.35 1.00 11.28183 0.5329384 8.229440
0.40 0.00 12.40375 0.4553822 8.813358
0.40 0.05 12.40375 0.4553822 8.813358
0.40 0.10 12.40375 0.4553822 8.813358
0.40 0.15 12.40375 0.4553822 8.813358
0.40 0.20 12.40375 0.4553822 8.813358
0.40 0.25 12.40375 0.4553822 8.813358
0.40 0.30 12.25574 0.4641974 8.726030
0.40 0.35 12.07146 0.4764203 8.637286
0.40 0.40 11.91325 0.4880842 8.552235
0.40 0.45 11.76179 0.4997488 8.477598
0.40 0.50 11.64914 0.5087558 8.420247
0.40 0.55 11.57389 0.5147961 8.393213
0.40 0.60 11.50371 0.5200827 8.365838
0.40 0.65 11.43519 0.5246959 8.325514
0.40 0.70 11.38925 0.5274167 8.302926
0.40 0.75 11.36032 0.5287328 8.282293
0.40 0.80 11.33531 0.5295769 8.262606
0.40 0.85 11.31433 0.5303465 8.247694
0.40 0.90 11.29579 0.5312627 8.236844
0.40 0.95 11.27191 0.5327982 8.222481
0.40 1.00 11.25009 0.5342904 8.210286
0.45 0.00 12.41421 0.4544498 8.803154
0.45 0.05 12.41421 0.4544498 8.803154
0.45 0.10 12.41421 0.4544498 8.803154
0.45 0.15 12.41421 0.4544498 8.803154
0.45 0.20 12.41421 0.4544498 8.803154
0.45 0.25 12.33235 0.4592815 8.751044
0.45 0.30 12.13214 0.4716177 8.643181
0.45 0.35 11.95471 0.4845195 8.551787
0.45 0.40 11.78602 0.4975038 8.470291
0.45 0.45 11.66075 0.5075564 8.425050
0.45 0.50 11.57659 0.5142913 8.402620
0.45 0.55 11.49295 0.5204237 8.370108
0.45 0.60 11.42884 0.5246253 8.338893
0.45 0.65 11.38770 0.5267068 8.314548
0.45 0.70 11.35847 0.5277200 8.285585
0.45 0.75 11.33681 0.5284087 8.266647
0.45 0.80 11.31106 0.5298820 8.249939
0.45 0.85 11.28318 0.5317364 8.231283
0.45 0.90 11.26129 0.5332358 8.221858
0.45 0.95 11.24195 0.5350083 8.206785
0.45 1.00 11.23810 0.5355378 8.198364
0.50 0.00 12.42555 0.4537188 8.809092
0.50 0.05 12.42555 0.4537188 8.809092
0.50 0.10 12.42555 0.4537188 8.809092
0.50 0.15 12.42555 0.4537188 8.809092
0.50 0.20 12.42555 0.4537188 8.809092
0.50 0.25 12.22778 0.4651758 8.685305
0.50 0.30 12.01653 0.4796907 8.580682
0.50 0.35 11.83927 0.4933150 8.491722
0.50 0.40 11.68898 0.5053107 8.426051
0.50 0.45 11.58669 0.5132974 8.398244
0.50 0.50 11.49188 0.5202644 8.362166
0.50 0.55 11.42753 0.5243783 8.338511
0.50 0.60 11.39113 0.5257732 8.310539
0.50 0.65 11.36053 0.5268023 8.280276
0.50 0.70 11.33473 0.5280177 8.263263
0.50 0.75 11.30409 0.5300525 8.247595
0.50 0.80 11.27970 0.5317976 8.236227
0.50 0.85 11.25551 0.5338377 8.219442
0.50 0.90 11.25139 0.5344164 8.210266
0.50 0.95 11.25581 0.5342874 8.202729
0.50 1.00 11.26047 0.5342598 8.191695
0.55 0.00 12.44335 0.4524284 8.814507
0.55 0.05 12.44335 0.4524284 8.814507
0.55 0.10 12.44335 0.4524284 8.814507
0.55 0.15 12.44335 0.4524284 8.814507
0.55 0.20 12.37910 0.4561825 8.775409
0.55 0.25 12.13236 0.4711394 8.633629
0.55 0.30 11.92756 0.4864176 8.519790
0.55 0.35 11.73675 0.5012893 8.439699
0.55 0.40 11.61591 0.5108870 8.404480
0.55 0.45 11.50449 0.5191634 8.362960
0.55 0.50 11.43927 0.5232602 8.346430
0.55 0.55 11.39513 0.5250129 8.310013
0.55 0.60 11.36506 0.5260435 8.282241
0.55 0.65 11.33160 0.5280040 8.263716
0.55 0.70 11.30446 0.5298864 8.250004
0.55 0.75 11.27868 0.5317796 8.236361
0.55 0.80 11.26093 0.5334356 8.221950
0.55 0.85 11.26602 0.5333093 8.216717
0.55 0.90 11.27161 0.5332251 8.206439
0.55 0.95 11.27114 0.5334219 8.186413
0.55 1.00 11.26927 0.5337369 8.172354
0.60 0.00 12.47331 0.4507460 8.835912
0.60 0.05 12.47331 0.4507460 8.835912
0.60 0.10 12.47331 0.4507460 8.835912
0.60 0.15 12.47331 0.4507460 8.835912
0.60 0.20 12.30199 0.4604742 8.717974
0.60 0.25 12.05801 0.4766056 8.603982
0.60 0.30 11.83431 0.4937326 8.482620
0.60 0.35 11.67140 0.5065258 8.414784
0.60 0.40 11.53615 0.5166853 8.371462
0.60 0.45 11.45583 0.5219123 8.350406
0.60 0.50 11.40482 0.5241991 8.314230
0.60 0.55 11.37111 0.5254148 8.284465
0.60 0.60 11.33504 0.5276331 8.266806
0.60 0.65 11.30718 0.5295493 8.254408
0.60 0.70 11.28320 0.5313404 8.238537
0.60 0.75 11.27413 0.5323843 8.230810
0.60 0.80 11.27935 0.5323330 8.222415
0.60 0.85 11.28132 0.5324667 8.206477
0.60 0.90 11.28037 0.5326847 8.187330
0.60 0.95 11.28075 0.5329440 8.176776
0.60 1.00 11.27876 0.5334150 8.163125
0.65 0.00 12.48460 0.4501898 8.834668
0.65 0.05 12.48460 0.4501898 8.834668
0.65 0.10 12.48460 0.4501898 8.834668
0.65 0.15 12.48460 0.4501898 8.834668
0.65 0.20 12.22876 0.4644966 8.662618
0.65 0.25 11.96961 0.4828294 8.531299
0.65 0.30 11.75523 0.4996815 8.442373
0.65 0.35 11.60031 0.5117776 8.398337
0.65 0.40 11.48713 0.5196491 8.367479
0.65 0.45 11.42926 0.5224803 8.334205
0.65 0.50 11.38465 0.5244189 8.295704
0.65 0.55 11.34085 0.5271020 8.270013
0.65 0.60 11.31266 0.5289928 8.259918
0.65 0.65 11.29092 0.5306391 8.242795
0.65 0.70 11.28627 0.5314347 8.240068
0.65 0.75 11.29068 0.5314457 8.228580
0.65 0.80 11.28906 0.5318016 8.209590
0.65 0.85 11.29068 0.5319039 8.194202
0.65 0.90 11.28754 0.5324562 8.179138
0.65 0.95 11.28910 0.5326146 8.166889
0.65 1.00 11.29827 0.5322598 8.156901
0.70 0.00 12.49749 0.4494513 8.848617
0.70 0.05 12.49749 0.4494513 8.848617
0.70 0.10 12.49749 0.4494513 8.848617
0.70 0.15 12.48337 0.4503052 8.844427
0.70 0.20 12.16177 0.4688465 8.639601
0.70 0.25 11.88754 0.4892225 8.492364
0.70 0.30 11.68639 0.5051840 8.409086
0.70 0.35 11.53119 0.5166740 8.362567
0.70 0.40 11.44983 0.5212037 8.338807
0.70 0.45 11.40365 0.5231939 8.299077
0.70 0.50 11.35762 0.5259004 8.277247
0.70 0.55 11.31766 0.5284623 8.261502
0.70 0.60 11.30130 0.5296811 8.251245
0.70 0.65 11.29589 0.5306006 8.248815
0.70 0.70 11.29736 0.5309075 8.233501
0.70 0.75 11.29691 0.5311578 8.215740
0.70 0.80 11.29876 0.5312879 8.198663
0.70 0.85 11.29570 0.5318110 8.184278
0.70 0.90 11.30080 0.5316792 8.172830
0.70 0.95 11.31459 0.5310160 8.160210
0.70 1.00 11.34117 0.5293599 8.156729
0.75 0.00 12.52788 0.4476468 8.873164
0.75 0.05 12.52788 0.4476468 8.873164
0.75 0.10 12.52788 0.4476468 8.873164
0.75 0.15 12.42986 0.4528574 8.810499
0.75 0.20 12.10083 0.4732677 8.616552
0.75 0.25 11.81878 0.4947967 8.469014
0.75 0.30 11.62579 0.5097137 8.392247
0.75 0.35 11.49506 0.5183514 8.365566
0.75 0.40 11.42342 0.5220947 8.309935
0.75 0.45 11.37535 0.5247165 8.281197
0.75 0.50 11.33040 0.5275653 8.263144
0.75 0.55 11.31014 0.5288388 8.258475
0.75 0.60 11.30102 0.5300301 8.254030
0.75 0.65 11.30516 0.5302770 8.241267
0.75 0.70 11.30429 0.5305451 8.221971
0.75 0.75 11.30436 0.5308641 8.205565
0.75 0.80 11.30302 0.5312254 8.189711
0.75 0.85 11.31136 0.5308926 8.177592
0.75 0.90 11.32915 0.5299199 8.163900
0.75 0.95 11.36112 0.5279413 8.161977
0.75 1.00 11.40149 0.5251406 8.165720
0.80 0.00 12.54000 0.4468539 8.873734
0.80 0.05 12.54000 0.4468539 8.873734
0.80 0.10 12.54000 0.4468539 8.873734
0.80 0.15 12.36307 0.4565334 8.742192
0.80 0.20 12.02404 0.4785603 8.557303
0.80 0.25 11.76134 0.4992987 8.432701
0.80 0.30 11.57194 0.5134562 8.380611
0.80 0.35 11.46387 0.5196827 8.346879
0.80 0.40 11.39699 0.5233409 8.293668
0.80 0.45 11.35275 0.5261122 8.270987
0.80 0.50 11.31889 0.5280267 8.261904
0.80 0.55 11.30490 0.5294804 8.257773
0.80 0.60 11.31305 0.5295496 8.248625
0.80 0.65 11.31121 0.5299650 8.231328
0.80 0.70 11.31072 0.5302944 8.213271
0.80 0.75 11.31090 0.5305524 8.197348
0.80 0.80 11.32150 0.5300571 8.184373
0.80 0.85 11.34020 0.5290321 8.169324
0.80 0.90 11.37652 0.5267793 8.167839
0.80 0.95 11.41946 0.5236841 8.173839
0.80 1.00 11.45911 0.5206614 8.183411
0.85 0.00 12.54803 0.4461196 8.852025
0.85 0.05 12.54803 0.4461196 8.852025
0.85 0.10 12.54803 0.4461196 8.852025
0.85 0.15 12.31247 0.4585734 8.670953
0.85 0.20 11.95894 0.4834174 8.510166
0.85 0.25 11.70876 0.5032555 8.433600
0.85 0.30 11.53136 0.5156476 8.381240
0.85 0.35 11.44048 0.5205888 8.322306
0.85 0.40 11.37784 0.5243200 8.277037
0.85 0.45 11.32786 0.5273018 8.260366
0.85 0.50 11.31525 0.5283616 8.262300
0.85 0.55 11.31891 0.5288733 8.256804
0.85 0.60 11.31754 0.5294385 8.239075
0.85 0.65 11.31554 0.5298325 8.219069
0.85 0.70 11.31773 0.5299914 8.206214
0.85 0.75 11.32862 0.5294489 8.192200
0.85 0.80 11.34804 0.5283704 8.175150
0.85 0.85 11.38708 0.5259155 8.174393
0.85 0.90 11.43254 0.5225868 8.181934
0.85 0.95 11.47399 0.5194248 8.192436
0.85 1.00 11.51905 0.5158442 8.204177
0.90 0.00 12.57181 0.4447620 8.878599
0.90 0.05 12.57181 0.4447620 8.878599
0.90 0.10 12.57181 0.4447620 8.878599
0.90 0.15 12.26752 0.4611327 8.642921
0.90 0.20 11.90618 0.4879739 8.483744
0.90 0.25 11.65398 0.5073772 8.411530
0.90 0.30 11.50192 0.5169928 8.370120
0.90 0.35 11.41111 0.5221526 8.303676
0.90 0.40 11.35461 0.5256853 8.267778
0.90 0.45 11.32438 0.5273323 8.261434
0.90 0.50 11.31906 0.5284532 8.259224
0.90 0.55 11.32531 0.5287524 8.248572
0.90 0.60 11.32212 0.5291427 8.224983
0.90 0.65 11.32206 0.5295867 8.215843
0.90 0.70 11.33433 0.5289026 8.205041
0.90 0.75 11.35398 0.5278053 8.185521
0.90 0.80 11.39477 0.5252169 8.183693
0.90 0.85 11.44204 0.5217380 8.191303
0.90 0.90 11.48631 0.5183558 8.201823
0.90 0.95 11.53607 0.5143657 8.214532
0.90 1.00 11.56761 0.5119473 8.217216
0.95 0.00 12.64299 0.4408521 8.959014
0.95 0.05 12.64299 0.4408521 8.959014
0.95 0.10 12.64299 0.4408521 8.959014
0.95 0.15 12.24337 0.4630925 8.667176
0.95 0.20 11.85848 0.4921552 8.487227
0.95 0.25 11.61332 0.5100353 8.405217
0.95 0.30 11.48060 0.5177301 8.354104
0.95 0.35 11.39027 0.5233023 8.293152
0.95 0.40 11.32992 0.5269098 8.258529
0.95 0.45 11.32616 0.5273702 8.257551
0.95 0.50 11.33454 0.5276900 8.258231
0.95 0.55 11.33337 0.5281933 8.239576
0.95 0.60 11.33211 0.5287148 8.227804
0.95 0.65 11.34308 0.5281104 8.218278
0.95 0.70 11.36098 0.5272382 8.197175
0.95 0.75 11.39817 0.5248119 8.189957
0.95 0.80 11.44759 0.5212445 8.195806
0.95 0.85 11.49467 0.5176776 8.204240
0.95 0.90 11.54795 0.5134197 8.217127
0.95 0.95 11.58307 0.5106644 8.226298
0.95 1.00 11.59774 0.5092634 8.237056
1.00 0.00 12.69102 0.4379992 8.998774
1.00 0.05 12.69102 0.4379992 8.998774
1.00 0.10 12.69102 0.4379992 8.998774
1.00 0.15 12.21426 0.4653128 8.673577
1.00 0.20 11.81040 0.4958095 8.477271
1.00 0.25 11.58852 0.5112907 8.415173
1.00 0.30 11.45379 0.5192040 8.334757
1.00 0.35 11.38265 0.5237455 8.288194
1.00 0.40 11.34250 0.5257692 8.268232
1.00 0.45 11.34255 0.5265540 8.261116
1.00 0.50 11.34772 0.5271131 8.246642
1.00 0.55 11.34530 0.5275501 8.221234
1.00 0.60 11.34833 0.5277125 8.205959
1.00 0.65 11.36951 0.5265951 8.192786
1.00 0.70 11.40030 0.5246927 8.182277
1.00 0.75 11.45252 0.5209767 8.199225
1.00 0.80 11.50430 0.5170904 8.220566
1.00 0.85 11.56128 0.5126118 8.246597
1.00 0.90 11.60323 0.5093905 8.268168
1.00 0.95 11.61898 0.5078498 8.277227
1.00 1.00 11.63148 0.5070128 8.290492
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 0.45 and lambda = 1.
plot(enetTune)
#LARS
set.seed(1001)
larsTune <- train(permeability ~ ., data = train, method = "lars", metric = "Rsquared",
tuneLength = 20, trControl = ctrl, preProc = c("center", "scale"))
larsTune
Least Angle Regression
133 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 120, 120, 117, 121, 119, 121, ...
Resampling results across tuning parameters:
fraction RMSE Rsquared MAE
0.05 11.53166 0.5027853 8.350140
0.10 11.41717 0.5146005 8.317320
0.15 11.59310 0.5015690 8.300499
0.20 12.10994 0.4671489 8.606010
0.25 12.71180 0.4338619 8.904153
0.30 13.11411 0.4122326 9.128523
0.35 13.65877 0.3850360 9.539137
0.40 14.39596 0.3509255 10.072081
0.45 15.33677 0.3137628 10.677424
0.50 16.30043 0.2827741 11.319450
0.55 17.24957 0.2562205 11.874363
0.60 18.31506 0.2322732 12.533598
0.65 19.12677 0.2166460 12.989689
0.70 19.90471 0.2052300 13.273240
0.75 20.60832 0.1972836 13.556998
0.80 21.35375 0.1905442 13.935738
0.85 22.06860 0.1838469 14.356435
0.90 22.81429 0.1791615 14.772341
0.95 23.49268 0.1775721 15.169671
1.00 24.23783 0.1745248 15.642677
Rsquared was used to select the optimal model using the largest value.
The final value used for the model was fraction = 0.1.
plot(larsTune)
lars_predict <- predict(larsTune, test)
postResample(pred = lars_predict, obs = test[, "permeability"])
RMSE Rsquared MAE
11.0018629 0.3887295 7.3843906
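The four test-set results for problem 6.2 are easier to compare side by side. The sketch below simply copies the `postResample()` values reported above into a data frame and sorts by RMSE; the object names `test_results` and `ranked` are illustrative. With only 32 held-out compounds (165 minus 133), small differences between models should be read with caution.

```r
# Test-set metrics copied from the postResample() outputs reported above
test_results <- data.frame(
  model    = c("PLS", "PCR", "glmnet", "LARS"),
  RMSE     = c(12.3486900, 12.2076841, 10.9472977, 11.0018629),
  Rsquared = c(0.3244542, 0.2922108, 0.3973954, 0.3887295),
  MAE      = c(8.2881075, 8.2251123, 7.2993799, 7.3843906),
  stringsAsFactors = FALSE
)

# Order from lowest to highest test RMSE
ranked <- test_results[order(test_results$RMSE), ]
ranked
```

On these numbers glmnet has the lowest test RMSE and MAE and the highest R-squared, with LARS close behind.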
6.3. A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
set.seed(2424)
sum(is.na(ChemicalManufacturingProcess))
[1] 106
missing1 <- preProcess(ChemicalManufacturingProcess, method = "bagImpute")
Chemical <- predict(missing1, ChemicalManufacturingProcess)
sum(is.na(Chemical))
[1] 0
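Before imputing, it can help to see where the missing values sit. The base R sketch below counts missing values per column on a toy data frame and then fills them with the column median. Median imputation is a deliberately simpler stand-in, not the bagged-tree imputation used above, and the column names and the helper `impute_median` are hypothetical.

```r
# Toy data frame with a few missing values (hypothetical column names)
toy <- data.frame(
  yield  = c(38, 42, 40, 41, 39),
  mat_a  = c(6.2, NA, 6.8, 7.1, 6.5),
  proc_b = c(NA, 210, 205, NA, 208)
)

# Count missing values per column to see where the gaps are
na_per_column <- colSums(is.na(toy))
na_per_column

# Median imputation: a simpler stand-in, NOT the bagged-tree method used above
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
toy_imputed <- as.data.frame(lapply(toy, impute_median))

sum(is.na(toy_imputed))
```

Bagged-tree imputation uses the other predictors to estimate each missing value, so it will generally give different fills from the per-column median shown here.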
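Of the models tuned below, the best elastic net parameter combination is lambda = 0.00 and fraction = 0.25, with a cross-validated RMSE of 1.153193 and R-squared of 0.6316451. Its R-squared is higher than that of the best PLS model (0.6070342 at ncomp = 6), so we choose the elastic net (enet) for this exercise.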
set.seed(2424)
Chemical <- Chemical[, -nearZeroVar(Chemical)]
# index for training
index <- createDataPartition(Chemical$Yield, p = .8, list = FALSE)
# train
train_chem <- Chemical[index, ]
# test
test_chem <- Chemical[-index, ]
plsTune <- train(Yield ~ ., train_chem , method = "pls",
tuneLength = 20, trControl = ctrl, preProc = c("center", "scale"))
plot(plsTune)
plsTune
Partial Least Squares
144 samples
56 predictor
Pre-processing: centered (56), scaled (56)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 129, 130, 130, 130, 128, 129, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 1.533401 0.4082590 1.1843972
2 1.615060 0.4933855 1.1246271
3 1.375594 0.5461950 1.0520376
4 1.293071 0.5673878 1.0212218
5 1.231708 0.5973083 0.9993092
6 1.214868 0.6070342 0.9793783
7 1.219530 0.6016678 0.9843603
8 1.279174 0.5704773 1.0135515
9 1.394046 0.5395742 1.0620331
10 1.506136 0.5259329 1.0961468
11 1.558271 0.5209210 1.1196984
12 1.649394 0.5199287 1.1541972
13 1.694805 0.5209227 1.1516763
14 1.778950 0.5076492 1.1834368
15 1.914717 0.5012890 1.2294413
16 2.051858 0.5052564 1.2658658
17 2.145115 0.5010709 1.2906665
18 2.170917 0.5012588 1.2980514
19 2.204924 0.5059257 1.3000658
20 2.223914 0.5042411 1.3013299
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 6.
set.seed(2424)
# grid of penalties
enetGrid <- expand.grid(.lambda = c(0, 0.01, .1), .fraction = seq(.05, 1, length = 20))
enetTune <- train(Yield ~ ., train_chem , method = "enet",
tuneGrid = enetGrid, trControl = ctrl, preProc = c("center", "scale"))
plot(enetTune)
enetTune
Elasticnet
144 samples
56 predictor
Pre-processing: centered (56), scaled (56)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 130, 130, 130, 129, 131, 129, ...
Resampling results across tuning parameters:
lambda fraction RMSE Rsquared MAE
0.00 0.05 1.358003 0.5790569 1.0975625
0.00 0.10 1.216383 0.5977077 0.9768607
0.00 0.15 1.179433 0.6136766 0.9564745
0.00 0.20 1.167768 0.6215296 0.9664152
0.00 0.25 1.153193 0.6316451 0.9516922
0.00 0.30 1.182192 0.6190988 0.9649380
0.00 0.35 1.229640 0.5968054 0.9919681
0.00 0.40 1.283114 0.5710265 1.0294122
0.00 0.45 1.333830 0.5486191 1.0618781
0.00 0.50 1.374127 0.5333748 1.0854533
0.00 0.55 1.424459 0.5179247 1.1237876
0.00 0.60 1.503522 0.5058301 1.1703727
0.00 0.65 1.590342 0.4992850 1.2095564
0.00 0.70 1.696133 0.4939354 1.2514632
0.00 0.75 1.811158 0.4894001 1.2943707
0.00 0.80 1.925386 0.4854254 1.3363208
0.00 0.85 2.122677 0.4823981 1.3990628
0.00 0.90 2.235418 0.4793635 1.4365919
0.00 0.95 2.316803 0.4762688 1.4646934
0.00 1.00 2.327246 0.4734243 1.4720562
0.01 0.05 1.555118 0.5559403 1.2478543
0.01 0.10 1.331648 0.5839038 1.0765140
0.01 0.15 1.245270 0.5887826 1.0093502
0.01 0.20 1.214734 0.6007588 0.9780824
0.01 0.25 1.190497 0.6114435 0.9608251
0.01 0.30 1.175243 0.6174477 0.9548463
0.01 0.35 1.171149 0.6216265 0.9652828
0.01 0.40 1.180036 0.6194803 0.9751146
0.01 0.45 1.198305 0.6160819 0.9823155
0.01 0.50 1.180566 0.6223725 0.9735382
0.01 0.55 1.199985 0.6125234 0.9817319
0.01 0.60 1.225700 0.5998531 0.9964828
0.01 0.65 1.253750 0.5857629 1.0130091
0.01 0.70 1.296249 0.5653053 1.0434394
0.01 0.75 1.335749 0.5516099 1.0661418
0.01 0.80 1.368026 0.5419973 1.0851297
0.01 0.85 1.398722 0.5339207 1.1025537
0.01 0.90 1.433964 0.5263868 1.1212462
0.01 0.95 1.509941 0.5194268 1.1518631
0.01 1.00 1.610456 0.5138765 1.1876712
0.10 0.05 1.674865 0.5010854 1.3427495
0.10 0.10 1.508754 0.5667889 1.2125779
0.10 0.15 1.377788 0.5832363 1.1056545
0.10 0.20 1.294493 0.5817700 1.0552534
0.10 0.25 1.251688 0.5877561 1.0152858
0.10 0.30 1.228132 0.5954235 0.9952406
0.10 0.35 1.211536 0.6035302 0.9817192
0.10 0.40 1.199738 0.6075301 0.9677512
0.10 0.45 1.195464 0.6083768 0.9643027
0.10 0.50 1.194239 0.6089545 0.9655652
0.10 0.55 1.195380 0.6089864 0.9726889
0.10 0.60 1.198415 0.6089931 0.9773074
0.10 0.65 1.203158 0.6081404 0.9820957
0.10 0.70 1.214772 0.6035954 0.9923240
0.10 0.75 1.238868 0.5940245 1.0102928
0.10 0.80 1.248068 0.5901695 1.0184832
0.10 0.85 1.244725 0.5912813 1.0175108
0.10 0.90 1.247201 0.5909072 1.0168099
0.10 0.95 1.253798 0.5881365 1.0167359
0.10 1.00 1.261200 0.5847390 1.0235644
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.25 and lambda = 0.
set.seed(2424)
lt <- predict(enetTune , test_chem[ ,-1])
postResample(lt, test_chem[ ,1])
RMSE Rsquared MAE
1.2099108 0.6335565 0.9005472
varImp(enetTune)
loess r-squared variable importance
only 20 most important variables shown (out of 56)
Overall
ManufacturingProcess32 100.00
ManufacturingProcess13 99.20
ManufacturingProcess17 80.95
BiologicalMaterial06 75.70
ManufacturingProcess36 74.33
ManufacturingProcess09 72.15
BiologicalMaterial12 68.43
BiologicalMaterial03 68.34
ManufacturingProcess31 64.65
ManufacturingProcess06 55.15
BiologicalMaterial02 53.86
ManufacturingProcess33 50.33
ManufacturingProcess11 46.61
BiologicalMaterial09 46.34
ManufacturingProcess30 44.75
BiologicalMaterial11 44.46
ManufacturingProcess29 41.69
BiologicalMaterial04 39.47
BiologicalMaterial08 38.22
ManufacturingProcess12 37.94
plot(varImp(enetTune))
library(caret)
library(stats)
library(corrplot)
top <- varImp(enetTune)$importance %>%
arrange(-Overall) %>%
head(10)
t<- Chemical %>%
dplyr::select(c("Yield", row.names(top)))
cor_matrix <- cor(t)
cor_matrix
Yield ManufacturingProcess32
Yield 1.00000000 0.60833215
ManufacturingProcess32 0.60833215 1.00000000
ManufacturingProcess13 -0.50367972 -0.10120679
ManufacturingProcess17 -0.42580687 0.01604178
BiologicalMaterial06 0.47816342 0.60059580
ManufacturingProcess36 -0.53133449 -0.79563175
ManufacturingProcess09 0.50347051 0.04100301
BiologicalMaterial12 0.36749764 0.38777603
BiologicalMaterial03 0.44508598 0.53185738
ManufacturingProcess31 -0.06881574 -0.00500481
ManufacturingProcess06 0.39202539 0.21105758
ManufacturingProcess13 ManufacturingProcess17
Yield -0.50367972 -0.425806872
ManufacturingProcess32 -0.10120679 0.016041778
ManufacturingProcess13 1.00000000 0.782413453
ManufacturingProcess17 0.78241345 1.000000000
BiologicalMaterial06 -0.12186756 0.006004003
ManufacturingProcess36 0.09842477 -0.007108255
ManufacturingProcess09 -0.79135366 -0.715456036
BiologicalMaterial12 -0.11198335 0.018842856
BiologicalMaterial03 -0.13369531 -0.097605022
ManufacturingProcess31 0.06995808 0.031559524
ManufacturingProcess06 -0.41350289 -0.258261829
BiologicalMaterial06 ManufacturingProcess36
Yield 0.478163422 -0.531334494
ManufacturingProcess32 0.600595801 -0.795631748
ManufacturingProcess13 -0.121867557 0.098424769
ManufacturingProcess17 0.006004003 -0.007108255
BiologicalMaterial06 1.000000000 -0.532150776
ManufacturingProcess36 -0.532150776 1.000000000
ManufacturingProcess09 0.230059682 -0.053526128
BiologicalMaterial12 0.812853967 -0.374906225
BiologicalMaterial03 0.872363670 -0.471490584
ManufacturingProcess31 -0.044909168 0.090604821
ManufacturingProcess06 0.235333884 -0.253580875
ManufacturingProcess09 BiologicalMaterial12
Yield 0.50347051 0.36749764
ManufacturingProcess32 0.04100301 0.38777603
ManufacturingProcess13 -0.79135366 -0.11198335
ManufacturingProcess17 -0.71545604 0.01884286
BiologicalMaterial06 0.23005968 0.81285397
ManufacturingProcess36 -0.05352613 -0.37490623
ManufacturingProcess09 1.00000000 0.24585610
BiologicalMaterial12 0.24585610 1.00000000
BiologicalMaterial03 0.21460099 0.69731478
ManufacturingProcess31 -0.11321706 -0.10229244
ManufacturingProcess06 0.37247636 0.26219908
BiologicalMaterial03 ManufacturingProcess31
Yield 0.44508598 -0.06881574
ManufacturingProcess32 0.53185738 -0.00500481
ManufacturingProcess13 -0.13369531 0.06995808
ManufacturingProcess17 -0.09760502 0.03155952
BiologicalMaterial06 0.87236367 -0.04490917
ManufacturingProcess36 -0.47149058 0.09060482
ManufacturingProcess09 0.21460099 -0.11321706
BiologicalMaterial12 0.69731478 -0.10229244
BiologicalMaterial03 1.00000000 0.01168781
ManufacturingProcess31 0.01168781 1.00000000
ManufacturingProcess06 0.19598067 -0.09085930
ManufacturingProcess06
Yield 0.3920254
ManufacturingProcess32 0.2110576
ManufacturingProcess13 -0.4135029
ManufacturingProcess17 -0.2582618
BiologicalMaterial06 0.2353339
ManufacturingProcess36 -0.2535809
ManufacturingProcess09 0.3724764
BiologicalMaterial12 0.2621991
BiologicalMaterial03 0.1959807
ManufacturingProcess31 -0.0908593
ManufacturingProcess06 1.0000000
corrplot(cor_matrix)#,
# method="color",
# type="upper")
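The correlation matrix above is wide, so strongly related predictor pairs are easy to miss. As a rough sketch, the base R code below rebuilds a four-variable slice of `cor_matrix` (values rounded to four decimals from the printed output) and lists the pairs whose absolute correlation exceeds a cutoff. The cutoff of 0.75 and the object names `cm` and `high_pairs` are assumptions chosen for illustration.

```r
# Four-variable slice of the printed correlation matrix, rounded to 4 decimals
vars <- c("ManufacturingProcess32", "ManufacturingProcess36",
          "BiologicalMaterial06", "BiologicalMaterial03")
cm <- matrix(c(
   1.0000, -0.7956,  0.6006,  0.5319,
  -0.7956,  1.0000, -0.5322, -0.4715,
   0.6006, -0.5322,  1.0000,  0.8724,
   0.5319, -0.4715,  0.8724,  1.0000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and flag |r| above a hypothetical cutoff
cutoff <- 0.75
idx <- which(upper.tri(cm) & abs(cm) > cutoff, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx],
  stringsAsFactors = FALSE
)
high_pairs
```

In this slice the flagged pairs are ManufacturingProcess32 with ManufacturingProcess36 (negative) and BiologicalMaterial06 with BiologicalMaterial03 (positive), which suggests some of the top-ranked predictors carry overlapping information.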