CUNY DATA624 HW7
6.2 Developing a model to predict permeability could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:
a) The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.
b) The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse, meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?
As stated in part a), there are 1,107 predictors to start with, which we can confirm with the ncol function (output below).
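A minimal sketch of that check, assuming the data come from the AppliedPredictiveModeling package (its data(permeability) call loads both the fingerprints matrix and the permeability response):
library(caret)
library(AppliedPredictiveModeling)

data(permeability)  # loads fingerprints and permeability
ncol(fingerprints)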
## [1] 1107
When we use the nearZeroVar function, there are 719 predictors to remove, leaving us with 388 predictors for modeling (below).
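A sketch of that filtering step, saving the reduced matrix as fp (the name the modeling code below uses); nzv is a helper name assumed here:
nzv <- nearZeroVar(fingerprints)  # indices of the 719 near-zero-variance columns
fp <- fingerprints[, -nzv]        # keep the remaining informative predictors
ncol(fp)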
## [1] 388
We’ll save the matrix with the relevant predictors for future use.
c) Split the data into a training and a test set, pre-process the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of \(R^2\)?
Since we're trying to predict permeability, we use the createDataPartition function to create our training and test sets.
set.seed(42)
trainrows <- createDataPartition(permeability,
                                 p = 0.8,
                                 list = FALSE)
train_fp <- fp[trainrows,]
train_perm <- permeability[trainrows,]
test_fp <- fp[-trainrows,]
test_perm <- permeability[-trainrows,]
Now we tune our PLS model using 10-fold cross-validation; the output below shows the optimal number of latent variables.
set.seed(42)
ctrl = trainControl(method='cv', number = 10)
plsTune <- train(x = train_fp, y = train_perm,
                 method = "pls",
                 tuneLength = 20,
                 trControl = ctrl,
                 preProc = c("center", "scale"))
plot(plsTune)
## [1] "The optimal number of latent variables = 5"
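The printed line can also be recovered directly from the tuned object's bestTune slot, e.g. (a one-line sketch):
paste("The optimal number of latent variables =", plsTune$bestTune$ncomp)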
From the table below, we can see that for ncomp = 5 the corresponding resampled \(R^2 \approx 0.491\)
## Partial Least Squares
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 121, 120, 120, 118, 119, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 13.12631 0.3028742 10.029920
## 2 12.11593 0.4426055 8.774476
## 3 11.85556 0.4509721 9.039633
## 4 11.95439 0.4470189 9.005672
## 5 11.32556 0.4905685 8.470917
## 6 11.44232 0.4981471 8.512476
## 7 11.36593 0.5016860 8.579412
## 8 11.43091 0.4951021 8.738870
## 9 11.41819 0.4954867 8.602458
## 10 11.50184 0.4928341 8.669320
## 11 11.65389 0.4880699 8.692119
## 12 11.76185 0.4765640 8.755928
## 13 12.00174 0.4563971 8.992871
## 14 12.41083 0.4338308 9.146298
## 15 12.44980 0.4373552 9.173314
## 16 12.66798 0.4239454 9.278853
## 17 12.91954 0.4038774 9.570894
## 18 13.18323 0.3927971 9.693686
## 19 13.45573 0.3790540 9.875498
## 20 13.52435 0.3803545 9.867799
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 5.
d) Predict the response for the test set. What is the test set estimate of \(R^2\)?
Below we can see that the \(R^2\) estimate for the test set is 0.286, well below the resampled estimate of 0.491.
fp_pred <- predict(plsTune, newdata = test_fp)
defaultSummary(data.frame(obs = test_perm, pred = fp_pred))
##       RMSE   Rsquared        MAE
## 14.3499136 0.2855738 11.1917749
e) Try building other models discussed in this chapter. Do any have better predictive performance?
Below, I tried both ridge and lasso (elastic net) regression. The elastic net actually achieved a better cross-validated RMSE than PLS (10.42 versus 11.33), and both penalized models also edged out PLS slightly on the test set.
Ridge Regression
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
set.seed(42)
ridgeRegFit <- train(x = train_fp, y = train_perm,
                     method = "ridge",
                     tuneGrid = ridgeGrid,
                     trControl = ctrl,
                     preProc = c("center", "scale"))
ridgeRegFit
## Ridge Regression
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 121, 120, 120, 118, 119, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000000000 24.66318 0.2757693 18.096185
## 0.007142857 15.52673 0.3245990 11.494276
## 0.014285714 139.18240 0.3409874 108.414446
## 0.021428571 13.61766 0.3868480 9.994901
## 0.028571429 13.19319 0.4045492 9.658507
## 0.035714286 12.96935 0.4155138 9.455701
## 0.042857143 12.71796 0.4268010 9.272863
## 0.050000000 12.51029 0.4380482 9.121286
## 0.057142857 12.37443 0.4450249 9.023325
## 0.064285714 12.27362 0.4514116 8.950980
## 0.071428571 12.23244 0.4550147 8.933299
## 0.078571429 12.12523 0.4613185 8.855556
## 0.085714286 12.08238 0.4642299 8.828290
## 0.092857143 12.09525 0.4654539 8.862810
## 0.100000000 12.02008 0.4703192 8.819515
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1.
fp_ridgepred <- predict(ridgeRegFit, newdata = test_fp)
defaultSummary(data.frame(obs = test_perm, pred = fp_ridgepred))
##       RMSE   Rsquared        MAE
## 14.2806823 0.3177461 10.5800037
Lasso Regression
enetGrid <- expand.grid(.lambda = c(0, 0.01, .1),
                        .fraction = seq(.05, 1, length = 20))
set.seed(42)
enetTune <- train(train_fp, train_perm,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune
## Elasticnet
##
## 133 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 119, 121, 120, 120, 118, 119, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.00 0.05 11.43782 0.4992148 8.847282
## 0.00 0.10 11.25501 0.5129622 8.533042
## 0.00 0.15 12.03110 0.4771312 8.774326
## 0.00 0.20 12.76238 0.4533056 9.028350
## 0.00 0.25 13.27042 0.4477735 9.428899
## 0.00 0.30 13.87833 0.4289005 9.958050
## 0.00 0.35 14.62201 0.4133093 10.570820
## 0.00 0.40 15.50348 0.4012624 11.299566
## 0.00 0.45 16.49710 0.3864137 12.093590
## 0.00 0.50 17.64229 0.3704828 12.940777
## 0.00 0.55 18.64877 0.3582644 13.678525
## 0.00 0.60 19.48843 0.3507808 14.256270
## 0.00 0.65 20.14874 0.3405494 14.739689
## 0.00 0.70 20.78344 0.3311630 15.237609
## 0.00 0.75 21.40064 0.3225830 15.686945
## 0.00 0.80 22.05634 0.3133142 16.189308
## 0.00 0.85 22.75321 0.3032592 16.722482
## 0.00 0.90 23.34474 0.2943714 17.164903
## 0.00 0.95 24.00617 0.2838052 17.660912
## 0.00 1.00 24.66318 0.2757693 18.096185
## 0.01 0.05 11.30772 0.4942483 8.093209
## 0.01 0.10 10.49617 0.5453325 7.476541
## 0.01 0.15 10.57148 0.5437003 7.812613
## 0.01 0.20 10.69096 0.5345984 8.045101
## 0.01 0.25 11.15396 0.5016138 8.348004
## 0.01 0.30 11.61816 0.4704004 8.601899
## 0.01 0.35 12.00717 0.4460259 8.858209
## 0.01 0.40 12.31704 0.4303885 9.109076
## 0.01 0.45 12.54078 0.4221279 9.299680
## 0.01 0.50 12.78854 0.4134597 9.483983
## 0.01 0.55 13.06522 0.4040495 9.725846
## 0.01 0.60 13.33346 0.3943784 9.959184
## 0.01 0.65 13.60415 0.3846914 10.150502
## 0.01 0.70 13.86627 0.3759544 10.296316
## 0.01 0.75 14.13677 0.3660339 10.466103
## 0.01 0.80 14.38367 0.3562923 10.651182
## 0.01 0.85 14.60850 0.3482641 10.819588
## 0.01 0.90 14.80564 0.3415264 10.963649
## 0.01 0.95 14.97014 0.3369076 11.076676
## 0.01 1.00 15.13126 0.3326039 11.171433
## 0.10 0.05 12.54680 0.4242606 9.427604
## 0.10 0.10 11.20048 0.4958875 8.037751
## 0.10 0.15 10.63530 0.5319108 7.517095
## 0.10 0.20 10.50390 0.5461102 7.544521
## 0.10 0.25 10.49472 0.5509441 7.702546
## 0.10 0.30 10.41925 0.5587123 7.759176
## 0.10 0.35 10.52712 0.5542362 7.920797
## 0.10 0.40 10.75165 0.5400271 8.099932
## 0.10 0.45 10.94363 0.5263853 8.246305
## 0.10 0.50 11.09861 0.5155610 8.331537
## 0.10 0.55 11.20544 0.5086556 8.375126
## 0.10 0.60 11.29184 0.5037099 8.416906
## 0.10 0.65 11.36553 0.5005304 8.468361
## 0.10 0.70 11.45314 0.4967980 8.534663
## 0.10 0.75 11.54955 0.4926189 8.599881
## 0.10 0.80 11.65627 0.4876138 8.658895
## 0.10 0.85 11.75883 0.4826742 8.707970
## 0.10 0.90 11.85808 0.4779313 8.752449
## 0.10 0.95 11.94750 0.4736723 8.793308
## 0.10 1.00 12.02008 0.4703192 8.819515
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.3 and lambda = 0.1.
lasso_pred <- predict(enetTune, newdata = test_fp)
defaultSummary(data.frame(obs = test_perm, pred = lasso_pred))
##       RMSE   Rsquared        MAE
## 13.5445765 0.2975322 11.1324739
f) Would you recommend any of your models to replace the permeability laboratory experiment?
Judged purely on the numbers, the lasso/elastic net model performed best, but I'm not sure what would be considered an acceptable \(R^2\) for this study. An acceptable \(R^2\) depends on how much of the variability simply cannot be explained. Because my knowledge of biology and pharmaceuticals is limited, I can't say whether a test-set \(R^2\) of roughly 0.3 is sufficient to recommend replacing the laboratory experiment.
6.3 A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:
a) The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
b) A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values
As in previous assignments, a kNN imputation method will be used to impute the missing values.
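A minimal sketch of that step, assuming the raw data frame is ChemicalManufacturingProcess from the AppliedPredictiveModeling package and that the imputed copy is stored as cmpImp (the name used in the code below); note that caret's "knnImpute" method also centers and scales the columns it processes:
library(AppliedPredictiveModeling)

data(ChemicalManufacturingProcess)
knn_pp <- preProcess(ChemicalManufacturingProcess, method = "knnImpute")
cmpImp <- predict(knn_pp, ChemicalManufacturingProcess)
sum(is.na(cmpImp))  # 0 once every missing cell has been filled in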
c) Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric?
First, we'll reuse the nearZeroVar function from earlier and then do a train/test split on the data. Once the data have been prepared, I'll run them through several of the models introduced in this chapter and report the optimal value of the performance metric for each.
cmp_non_pred <- nearZeroVar(cmpImp)
cmp <- cmpImp[,-cmp_non_pred]
set.seed(8)
trainrows <- createDataPartition(cmp$Yield,
                                 p = 0.8,
                                 list = FALSE)
cmp_train <- cmp[trainrows,]
cmp_test <- cmp[-trainrows,]
X_train <- cmp_train[,-1]
Y_train <- cmp_train[,1]
X_test <- cmp_test[,-1]
Y_test <- cmp_test[,1]
# 10-fold cross-validation
ctrl = trainControl(method = 'cv', number = 10)
Linear Regression
set.seed(8)
lm_cmpTune <- train(x = X_train, y = Y_train,
                    method = "lm",
                    trControl = ctrl)
lm_cmpTune
## Linear Regression
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 132, 129, 128, 131, 131, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 7.462568 0.383472 2.948277
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
Linear regression has no tuning parameter; as stated above, "Tuning parameter intercept was held constant at a value of TRUE". The relevant resampled metric is the cross-validated RMSE of 7.46 (\(R^2 = 0.38\)).
PLS
set.seed(8)
pls_cmpTune <- train(x = X_train, y = Y_train,
                     method = "pls",
                     tuneLength = 20,
                     trControl = ctrl,
                     preProc = c("center", "scale"))
pls_cmpTune
## Partial Least Squares
##
## 144 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 132, 129, 128, 131, 131, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 1.554941 0.4483190 1.193826
## 2 2.055282 0.4678454 1.301299
## 3 1.596828 0.5187550 1.155859
## 4 1.767286 0.5178720 1.217920
## 5 2.054411 0.5106762 1.309666
## 6 2.182990 0.5052719 1.353758
## 7 2.525829 0.4942351 1.476046
## 8 2.662967 0.4957308 1.518107
## 9 2.919284 0.4774127 1.634768
## 10 3.193686 0.4711553 1.720623
## 11 3.501326 0.4606444 1.809975
## 12 3.647455 0.4501590 1.857916
## 13 3.633360 0.4554329 1.840247
## 14 3.658999 0.4474023 1.851775
## 15 3.620189 0.4470239 1.833028
## 16 3.482063 0.4487003 1.778911
## 17 3.341519 0.4502064 1.734531
## 18 3.244179 0.4552209 1.701815
## 19 3.097408 0.4575057 1.648129
## 20 3.098581 0.4601752 1.636867
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 1.
For PLS, the optimal number of latent variables (selected on RMSE) is 1, giving a resampled RMSE of 1.55 (\(R^2 = 0.45\)).
Ridge Regression
ridgeGrid <- data.frame(.lambda = seq(0.3, .5, length = 20))
set.seed(8)
ridge_cmpTune <- train(x = X_train, y = Y_train,
                       method = "ridge",
                       tuneGrid = ridgeGrid,
                       trControl = ctrl,
                       preProc = c("center", "scale"))
ridge_cmpTune
## Ridge Regression
##
## 144 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 132, 129, 128, 131, 131, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.3000000 2.171350 0.5215160 1.369490
## 0.3105263 2.165935 0.5217617 1.369538
## 0.3210526 2.161043 0.5219942 1.369735
## 0.3315789 2.156640 0.5222149 1.370085
## 0.3421053 2.152695 0.5224252 1.370611
## 0.3526316 2.149180 0.5226263 1.371462
## 0.3631579 2.146068 0.5228191 1.372406
## 0.3736842 2.143337 0.5230047 1.373563
## 0.3842105 2.140964 0.5231839 1.374892
## 0.3947368 2.138930 0.5233574 1.376320
## 0.4052632 2.137216 0.5235260 1.377904
## 0.4157895 2.135805 0.5236901 1.379553
## 0.4263158 2.134683 0.5238504 1.381261
## 0.4368421 2.133833 0.5240073 1.383056
## 0.4473684 2.133244 0.5241613 1.385372
## 0.4578947 2.132902 0.5243127 1.387750
## 0.4684211 2.132796 0.5244619 1.390172
## 0.4789474 2.132916 0.5246092 1.392766
## 0.4894737 2.133250 0.5247549 1.395413
## 0.5000000 2.133790 0.5248992 1.398096
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.4684211.
The optimal model had a lambda value of 0.4684211, giving a resampled RMSE of 2.13 (\(R^2 = 0.52\)).
Lasso
enetGrid <- expand.grid(.lambda = c(0, 0.01, .1),
                        .fraction = seq(.05, 1, length = 20))
set.seed(8)
enet_cmpTune <- train(x = X_train, y = Y_train,
                      method = "enet",
                      tuneGrid = enetGrid,
                      trControl = ctrl,
                      preProc = c("center", "scale"))
enet_cmpTune
## Elasticnet
##
## 144 samples
## 56 predictor
##
## Pre-processing: centered (56), scaled (56)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 132, 129, 128, 131, 131, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.00 0.05 1.215442 0.6120655 1.0061326
## 0.00 0.10 1.278747 0.5842531 1.0038815
## 0.00 0.15 1.531577 0.5613581 1.0492453
## 0.00 0.20 1.585318 0.5570600 1.0665089
## 0.00 0.25 1.454973 0.5546742 1.0548253
## 0.00 0.30 1.517125 0.5331071 1.1233330
## 0.00 0.35 1.750319 0.5037711 1.2209377
## 0.00 0.40 1.939982 0.4869322 1.2956932
## 0.00 0.45 1.917775 0.4804730 1.3056247
## 0.00 0.50 2.374497 0.4615946 1.4353601
## 0.00 0.55 3.348818 0.4446601 1.6992079
## 0.00 0.60 4.308300 0.4322957 1.9514471
## 0.00 0.65 5.182976 0.4243114 2.1817724
## 0.00 0.70 6.022617 0.4163797 2.4142604
## 0.00 0.75 6.597587 0.4094283 2.5891565
## 0.00 0.80 6.770507 0.4030721 2.6653127
## 0.00 0.85 6.939576 0.3970876 2.7390958
## 0.00 0.90 7.057579 0.3918822 2.7987088
## 0.00 0.95 7.241703 0.3873771 2.8706171
## 0.00 1.00 7.462568 0.3834720 2.9482766
## 0.01 0.05 1.502638 0.5751176 1.2346498
## 0.01 0.10 1.270991 0.6026829 1.0587577
## 0.01 0.15 1.265598 0.5737832 1.0180169
## 0.01 0.20 1.364192 0.5645903 1.0235409
## 0.01 0.25 1.600102 0.5454874 1.0869972
## 0.01 0.30 1.775445 0.5366448 1.1460554
## 0.01 0.35 1.784454 0.5334887 1.1543766
## 0.01 0.40 1.755650 0.5366938 1.1507519
## 0.01 0.45 1.723746 0.5422050 1.1442372
## 0.01 0.50 1.708300 0.5476298 1.1461731
## 0.01 0.55 1.799308 0.5396728 1.1909662
## 0.01 0.60 1.924147 0.5285191 1.2403883
## 0.01 0.65 2.021887 0.5192113 1.2828267
## 0.01 0.70 2.252929 0.5090508 1.3569929
## 0.01 0.75 2.467770 0.5013379 1.4252814
## 0.01 0.80 2.605230 0.4945886 1.4677347
## 0.01 0.85 2.739536 0.4882602 1.5121611
## 0.01 0.90 2.988542 0.4812531 1.5878352
## 0.01 0.95 3.235669 0.4753663 1.6611628
## 0.01 1.00 3.470767 0.4701015 1.7296431
## 0.10 0.05 1.621044 0.5365171 1.3299694
## 0.10 0.10 1.445478 0.5885097 1.1920142
## 0.10 0.15 1.311479 0.5981651 1.0910337
## 0.10 0.20 1.234110 0.5940398 1.0341257
## 0.10 0.25 1.235512 0.5844133 0.9986076
## 0.10 0.30 1.365497 0.5690259 1.0147910
## 0.10 0.35 1.500354 0.5622917 1.0485666
## 0.10 0.40 1.604483 0.5552707 1.0796564
## 0.10 0.45 1.682237 0.5514107 1.1108060
## 0.10 0.50 1.702451 0.5503190 1.1289963
## 0.10 0.55 1.743950 0.5500092 1.1499204
## 0.10 0.60 1.789034 0.5478768 1.1730868
## 0.10 0.65 1.834246 0.5446151 1.1955993
## 0.10 0.70 1.886039 0.5380923 1.2259580
## 0.10 0.75 1.950352 0.5312070 1.2591265
## 0.10 0.80 2.053357 0.5249776 1.2993386
## 0.10 0.85 2.179490 0.5205151 1.3439724
## 0.10 0.90 2.293311 0.5165944 1.3830474
## 0.10 0.95 2.387531 0.5128940 1.4157880
## 0.10 1.00 2.471390 0.5093985 1.4458673
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.05 and lambda = 0.
The optimal lasso model had fraction = 0.05 and lambda = 0, giving a resampled RMSE of 1.22 (\(R^2 = 0.61\)).
d) Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?
We’ll create a function to evaluate how our model performs against the test set so we can reuse later in the markdown.
m_eval <- function(tuned_m){
  predictions <- predict(tuned_m, newdata = X_test)
  return(defaultSummary(data.frame(obs = Y_test, pred = predictions)))
}
Below, comparing the test-set metrics against the resampled training metrics, the test-set results were as good as or better for the outputs shown: linear regression improved dramatically (test RMSE of 1.18 versus a resampled 7.46), and the lasso's test \(R^2\) of 0.64 slightly beat its resampled 0.61.
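For example, the linear regression test metrics below presumably correspond to a call of the form:
m_eval(lm_cmpTune)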
Linear Regression
Test
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
## RMSE Rsquared MAE
## 1.1785319 0.5659899 0.9414049
PLS
Ridge
Lasso
Test
## RMSE Rsquared MAE
## 1.2836058 0.6360101 1.0379887
e) Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list?
Of the models above, the best performing on the test set (by \(R^2\)) was the lasso model, so we'll evaluate its most important predictors. Below, we keep only the predictors whose coefficients are non-zero.
library(elasticnet)
set.seed(8)
lasso_m <- enet(x = as.matrix(X_train), y = Y_train,
                lambda = 0.01, normalize = TRUE)
lasso_pred <- predict(lasso_m, newx = as.matrix(X_test),
                      s = .1, mode = "fraction",
                      type = "coefficients")
coeff <- lasso_pred$coefficients
coeff[coeff != 0]
## ManufacturingProcess09 ManufacturingProcess13 ManufacturingProcess32
## 0.2034032 -0.1959582 0.1132021
From the above, only five predictors retain non-zero coefficients, and all of them are manufacturing process predictors.
f) Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?
The assumed goal is that the company wants to maximize yield. Based on the coefficient signs above, future runs should aim to increase ManufacturingProcess09 and ManufacturingProcess32 (positive coefficients) while reducing ManufacturingProcess13, 17, and 36 (negative coefficients), with the caveat that these coefficients reflect associations in the historical runs rather than guaranteed causal effects.
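One quick way to explore those relationships is to plot each top predictor against yield, e.g. with caret's featurePlot (a sketch; the predictor names are taken from the coefficient listing above):
featurePlot(x = X_train[, c("ManufacturingProcess09",
                            "ManufacturingProcess13",
                            "ManufacturingProcess32")],
            y = Y_train,
            plot = "scatter",
            type = c("p", "smooth"))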