library(AppliedPredictiveModeling)
data(permeability)
fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.
Filter out the low-frequency predictors using the nearZeroVar function from the caret package. How many predictors are left for modeling?
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
dim(fingerprints)
## [1] 165 1107
fp <- fingerprints[, -nearZeroVar(fingerprints)]
dim(fp)
## [1] 165 388
After filtering with nearZeroVar, 388 of the original 1,107 predictors remain.
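To make the filter less of a black box, here is a minimal base-R sketch of the two statistics that nearZeroVar is documented to use: the frequency ratio (count of the most common value over the count of the second most common) and the percentage of unique values. The toy matrix and the helper names `freq_ratio` and `pct_unique` are hypothetical, and the cutoffs (95/5 and 10) are assumed to match caret's defaults.

```r
# Hypothetical toy binary matrix: 30 rows, 3 columns (not the fingerprints data)
toy <- cbind(
  balanced = rep(c(0, 1), each = 15),  # 15 zeros, 15 ones
  sparse   = c(rep(0, 29), 1),         # 29 zeros, 1 one
  constant = rep(0, 30)                # a single distinct value
)

# Count of the most common value divided by the count of the runner-up
freq_ratio <- function(v) {
  counts <- sort(table(v), decreasing = TRUE)
  if (length(counts) < 2) return(0)    # only one distinct value
  as.numeric(counts[1] / counts[2])
}

# Distinct values as a percentage of the number of rows
pct_unique <- function(v) 100 * length(unique(v)) / length(v)

fr <- apply(toy, 2, freq_ratio)
pu <- apply(toy, 2, pct_unique)
zero_var <- apply(toy, 2, function(v) length(unique(v)) == 1)

# Assumed rule: high frequency ratio AND few unique values, or no variance at all
flagged <- (fr > 95 / 5 & pu <= 10) | zero_var
flagged
```

On the real data, `nearZeroVar(fingerprints, saveMetrics = TRUE)` returns these statistics per column. It is also worth checking that nearZeroVar flags at least one column before using negative indexing, because `x[, -integer(0)]` drops every column.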
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
##
## R2
## The following object is masked from 'package:stats':
##
## loadings
set.seed(100)
fp <- as.data.frame(fp)
# Hold out 25% of the compounds for testing
smp <- floor(0.75 * nrow(fp))
x <- sample(seq_len(nrow(fp)), size = smp)
# Caution: y is drawn independently of x, so the rows of fpTrain and pTrain are not paired
y <- sample(seq_len(nrow(permeability)), size = smp)
#x <- createDataPartition(fp$X1, p = .75, list = FALSE)
#y <- createDataPartition(permeability, p = .75, list = FALSE)
fpTrain <- fp[x,]
fpTest <- fp[-x,]
pTrain <- permeability[y,]
pTest <- permeability[-y,]
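The split above calls `sample` twice, so the predictor rows and the response values are selected by different index vectors. A minimal sketch of a paired split, using one index vector for both, might look like this; the toy data and the names `toy_x`, `toy_y`, and `idx` are hypothetical.

```r
set.seed(100)

# Hypothetical toy data: 8 samples, 2 predictors, and a response tied to each row
toy_x <- data.frame(a = 1:8, b = (1:8) * 10)
toy_y <- (1:8) * 100

# Draw ONE index vector and reuse it for both predictors and response
n_train <- floor(0.75 * nrow(toy_x))
idx <- sample(seq_len(nrow(toy_x)), size = n_train)

x_train <- toy_x[idx, ]
x_test  <- toy_x[-idx, ]
y_train <- toy_y[idx]
y_test  <- toy_y[-idx]

# Each training response still belongs to its own predictor row
all(y_train == x_train$a * 100)
```

Applied to the objects above, this would mean indexing both `fp` and `permeability` with `x`. The printed results in this report come from the split as originally written.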
ctrl <- trainControl(method = "cv", number = 10)
plsTune <- train(fpTrain, pTrain,
                 method = "pls",
                 tuneLength = 20, trControl = ctrl,
                 preProc = c("center", "scale"))
plsTune
## Partial Least Squares
##
## 123 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 111, 110, 111, 110, 111, 111, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 15.68753 0.07384986 12.33870
## 2 15.11851 0.08123332 11.72096
## 3 15.41619 0.08618764 12.18908
## 4 15.22515 0.11307945 11.99908
## 5 15.67184 0.09944382 12.38331
## 6 15.48768 0.09404962 12.13557
## 7 16.03970 0.07151399 12.66405
## 8 16.56358 0.05268528 13.00251
## 9 16.66535 0.05799473 13.09828
## 10 17.37128 0.06056492 13.78941
## 11 17.92980 0.05852196 14.23970
## 12 18.83832 0.04997884 14.98638
## 13 19.34537 0.04078344 15.59109
## 14 19.45846 0.04847269 15.57522
## 15 19.80906 0.04963971 15.83474
## 16 20.07748 0.05520941 15.95051
## 17 20.44437 0.05958450 16.19336
## 18 20.64521 0.06211526 16.40822
## 19 20.79998 0.06972072 16.69736
## 20 21.33585 0.06149764 17.05324
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 2.
The cross-validated \(R^2\) for the chosen model (ncomp = 2) is 0.08123332.
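The selection rule in the output ("smallest RMSE") can be reproduced by hand. The sketch below copies the first six cross-validated RMSE values from the table above; the remaining rows are all larger, so truncating does not change the answer. The variable names are hypothetical.

```r
# Cross-validated RMSE for ncomp = 1..6, copied from the resampling table above
cv_rmse <- c(15.68753, 15.11851, 15.41619, 15.22515, 15.67184, 15.48768)

# Position of the smallest RMSE is the selected number of components
best_ncomp <- which.min(cv_rmse)
best_ncomp
```

On the fitted object itself, the same table is stored in `plsTune$results` and the selected row in `plsTune$bestTune`.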
pls.pred <- predict(plsTune, newdata = fpTest)
postResample(pred = pls.pred, obs = pTest)
## RMSE Rsquared MAE
## 18.68255753 0.01121507 14.09012196
The test-set \(R^2\) is only 0.01121507.
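For reference, the three numbers reported by postResample can be computed directly: RMSE, \(R^2\) as the squared correlation between observed and predicted values, and MAE. A minimal sketch on hypothetical vectors:

```r
# Hypothetical observed values and predictions (not the permeability results)
obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 1.5, 3.5, 3.5)

rmse <- sqrt(mean((obs - pred)^2))  # root mean squared error
rsq  <- cor(obs, pred)^2            # squared Pearson correlation
mae  <- mean(abs(obs - pred))       # mean absolute error

c(RMSE = rmse, Rsquared = rsq, MAE = mae)
```

Because this \(R^2\) is a squared correlation, it measures how well predictions track the observations linearly, not whether they are on the right scale. That is why it should be read alongside RMSE.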
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
ridgeRegFit <- train(fpTrain, pTrain, method = "ridge",
                     tuneGrid = ridgeGrid, trControl = ctrl,
                     preProc = c("center", "scale"))
## Warning: model fit failed for Fold01: lambda=0.000000 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed
## Warning: model fit failed for Fold02: lambda=0.000000 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed
## Warning: model fit failed for Fold04: lambda=0.000000 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
ridgeRegFit
## Ridge Regression
##
## 123 samples
## 388 predictors
##
## Pre-processing: centered (388), scaled (388)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 111, 111, 111, 111, 111, 111, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000000000 20.51098 0.06905898 16.41406
## 0.007142857 194.28218 0.09077920 130.66816
## 0.014285714 20.66957 0.07865991 16.47118
## 0.021428571 644.30493 0.08393098 530.58719
## 0.028571429 113.58739 0.05258217 89.28957
## 0.035714286 19.54970 0.07040693 15.41411
## 0.042857143 19.24159 0.06840641 15.19483
## 0.050000000 19.04286 0.06565406 15.02679
## 0.057142857 19.01522 0.06335369 14.96659
## 0.064285714 18.81534 0.06142334 14.83894
## 0.071428571 18.72133 0.05974456 14.73841
## 0.078571429 18.62610 0.05779726 14.65993
## 0.085714286 18.55481 0.05628458 14.59050
## 0.092857143 18.49479 0.05491843 14.52929
## 0.100000000 18.44258 0.05369448 14.47271
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1.
rr.pred <- predict(ridgeRegFit, newdata = fpTest)
postResample(pred = rr.pred, obs = pTest)
## RMSE Rsquared MAE
## 23.542228994 0.008685525 18.737656451
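The test-set \(R^2\) for the penalized model, in this case ridge regression, is 0.008685525. The selected lambda = 0.1 sits at the upper edge of the tuning grid, and the cross-validated RMSE is erratic at small lambda values, so a wider grid may be worth trying.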
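With 388 predictors and roughly 111 training rows per fold, unpenalized least squares has no unique solution, which is one plausible reading of the failed lambda = 0 fits. The sketch below illustrates the idea with the textbook closed-form ridge estimate on hypothetical toy data. It is not the algorithm used by the "ridge" method in caret, whose lambda scaling may differ, and `ridge_coef` is a hypothetical helper.

```r
set.seed(1)

# Hypothetical toy problem with more predictors (p) than samples (n)
n <- 5
p <- 8
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))  # centered and scaled columns
y <- rnorm(n)
y <- y - mean(y)

# X'X is p x p, but its rank is limited by n, so it cannot be inverted on its own
rank_xtx <- qr(crossprod(X))$rank

# Textbook ridge estimate: (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

beta_small <- ridge_coef(X, y, lambda = 0.1)
beta_large <- ridge_coef(X, y, lambda = 10)

# Rank deficiency without the penalty
rank_xtx < p
# A larger penalty shrinks the coefficient vector toward zero
sum(beta_large^2) < sum(beta_small^2)
```

Adding lambda to the diagonal makes the system solvable for any positive lambda, at the cost of biasing the coefficients toward zero.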
qqnorm(pls.pred, main = "PLS")
qqline(pls.pred)
qqnorm(rr.pred, main = "Ridge-Regression")
qqline(rr.pred)
Though the \(R^2\) values for both models are low, we should also examine the residuals of both to see how well each model truly performs. Comparing the two, the PLS model appears to fit the data better than the ridge-regression model, which consistently under- and over-predicts.
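The qqnorm calls above are applied to the predictions themselves. A residual plot is a complementary check. The following is a minimal sketch on hypothetical observed and predicted vectors, not on the results reported here.

```r
# Hypothetical observed values and predictions (not the permeability results)
obs  <- c(2.0, 4.0, 6.0, 8.0, 10.0)
pred <- c(2.5, 3.5, 6.5, 7.0, 10.5)

# Residual = observed minus predicted
res <- obs - pred

# Residuals against predictions: a well-behaved model shows no trend around zero
plot(pred, res, xlab = "Predicted", ylab = "Residual",
     main = "Residuals vs. predicted")
abline(h = 0, lty = 2)

# A mean residual far from zero suggests systematic over- or under-prediction
mean(res)
```

With the objects from this section, the residuals would be `pTest - pls.pred` and `pTest - rr.pred`.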
Both models leave considerable room for improvement. Would either replace the permeability laboratory experiment? Probably not at this juncture, but the potential is certainly there.
data(ChemicalManufacturingProcess)
processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
We first need to see which of the predictors have missing values.
summary(ChemicalManufacturingProcess)
## Yield BiologicalMaterial01 BiologicalMaterial02
## Min. :35.25 Min. :4.580 Min. :46.87
## 1st Qu.:38.75 1st Qu.:5.978 1st Qu.:52.68
## Median :39.97 Median :6.305 Median :55.09
## Mean :40.18 Mean :6.411 Mean :55.69
## 3rd Qu.:41.48 3rd Qu.:6.870 3rd Qu.:58.74
## Max. :46.34 Max. :8.810 Max. :64.75
##
## BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
## Min. :56.97 Min. : 9.38 Min. :13.24
## 1st Qu.:64.98 1st Qu.:11.24 1st Qu.:17.23
## Median :67.22 Median :12.10 Median :18.49
## Mean :67.70 Mean :12.35 Mean :18.60
## 3rd Qu.:70.43 3rd Qu.:13.22 3rd Qu.:19.90
## Max. :78.25 Max. :23.09 Max. :24.85
##
## BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
## Min. :40.60 Min. :100.0 Min. :15.88
## 1st Qu.:46.05 1st Qu.:100.0 1st Qu.:17.06
## Median :48.46 Median :100.0 Median :17.51
## Mean :48.91 Mean :100.0 Mean :17.49
## 3rd Qu.:51.34 3rd Qu.:100.0 3rd Qu.:17.88
## Max. :59.38 Max. :100.8 Max. :19.14
##
## BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
## Min. :11.44 Min. :1.770 Min. :135.8
## 1st Qu.:12.60 1st Qu.:2.460 1st Qu.:143.8
## Median :12.84 Median :2.710 Median :146.1
## Mean :12.85 Mean :2.801 Mean :147.0
## 3rd Qu.:13.13 3rd Qu.:2.990 3rd Qu.:149.6
## Max. :14.08 Max. :6.870 Max. :158.7
##
## BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
## Min. :18.35 Min. : 0.00 Min. : 0.00
## 1st Qu.:19.73 1st Qu.:10.80 1st Qu.:19.30
## Median :20.12 Median :11.40 Median :21.00
## Mean :20.20 Mean :11.21 Mean :16.68
## 3rd Qu.:20.75 3rd Qu.:12.15 3rd Qu.:21.50
## Max. :22.21 Max. :14.10 Max. :22.50
## NA's :1 NA's :3
## ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
## Min. :1.47 Min. :911.0 Min. : 923.0
## 1st Qu.:1.53 1st Qu.:928.0 1st Qu.: 986.8
## Median :1.54 Median :934.0 Median : 999.2
## Mean :1.54 Mean :931.9 Mean :1001.7
## 3rd Qu.:1.55 3rd Qu.:936.0 3rd Qu.:1008.9
## Max. :1.60 Max. :946.0 Max. :1175.3
## NA's :15 NA's :1 NA's :1
## ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
## Min. :203.0 Min. :177.0 Min. :177.0
## 1st Qu.:205.7 1st Qu.:177.0 1st Qu.:177.0
## Median :206.8 Median :177.0 Median :178.0
## Mean :207.4 Mean :177.5 Mean :177.6
## 3rd Qu.:208.7 3rd Qu.:178.0 3rd Qu.:178.0
## Max. :227.4 Max. :178.0 Max. :178.0
## NA's :2 NA's :1 NA's :1
## ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
## Min. :38.89 Min. : 7.500 Min. : 7.500
## 1st Qu.:44.89 1st Qu.: 8.700 1st Qu.: 9.000
## Median :45.73 Median : 9.100 Median : 9.400
## Mean :45.66 Mean : 9.179 Mean : 9.386
## 3rd Qu.:46.52 3rd Qu.: 9.550 3rd Qu.: 9.900
## Max. :49.36 Max. :11.600 Max. :11.500
## NA's :9 NA's :10
## ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
## Min. : 0.0 Min. :32.10 Min. :4701
## 1st Qu.: 0.0 1st Qu.:33.90 1st Qu.:4828
## Median : 0.0 Median :34.60 Median :4856
## Mean : 857.8 Mean :34.51 Mean :4854
## 3rd Qu.: 0.0 3rd Qu.:35.20 3rd Qu.:4882
## Max. :4549.0 Max. :38.60 Max. :5055
## NA's :1 NA's :1
## ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
## Min. :5904 Min. : 0 Min. :31.30
## 1st Qu.:6010 1st Qu.:4561 1st Qu.:33.50
## Median :6032 Median :4588 Median :34.40
## Mean :6039 Mean :4566 Mean :34.34
## 3rd Qu.:6061 3rd Qu.:4619 3rd Qu.:35.10
## Max. :6233 Max. :4852 Max. :40.00
##
## ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
## Min. : 0 Min. :5890 Min. : 0
## 1st Qu.:4813 1st Qu.:6001 1st Qu.:4553
## Median :4835 Median :6022 Median :4582
## Mean :4810 Mean :6028 Mean :4556
## 3rd Qu.:4862 3rd Qu.:6050 3rd Qu.:4610
## Max. :4971 Max. :6146 Max. :4759
##
## ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## Min. :-1.8000 Min. : 0.000 Min. :0.000
## 1st Qu.:-0.6000 1st Qu.: 3.000 1st Qu.:2.000
## Median :-0.3000 Median : 5.000 Median :3.000
## Mean :-0.1642 Mean : 5.406 Mean :3.017
## 3rd Qu.: 0.0000 3rd Qu.: 8.000 3rd Qu.:4.000
## Max. : 3.6000 Max. :12.000 Max. :6.000
## NA's :1 NA's :1
## ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
## Min. : 0.000 Min. : 0 Min. : 0
## 1st Qu.: 4.000 1st Qu.:4832 1st Qu.:6020
## Median : 8.000 Median :4855 Median :6047
## Mean : 8.834 Mean :4828 Mean :6016
## 3rd Qu.:14.000 3rd Qu.:4877 3rd Qu.:6070
## Max. :23.000 Max. :4990 Max. :6161
## NA's :1 NA's :5 NA's :5
## ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
## Min. : 0 Min. : 0.000 Min. : 0.00
## 1st Qu.:4560 1st Qu.: 0.000 1st Qu.:19.70
## Median :4587 Median :10.400 Median :19.90
## Mean :4563 Mean : 6.592 Mean :20.01
## 3rd Qu.:4609 3rd Qu.:10.750 3rd Qu.:20.40
## Max. :4710 Max. :11.500 Max. :22.00
## NA's :5 NA's :5 NA's :5
## ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
## Min. : 0.000 Min. : 0.00 Min. :143.0
## 1st Qu.: 8.800 1st Qu.:70.10 1st Qu.:155.0
## Median : 9.100 Median :70.80 Median :158.0
## Mean : 9.161 Mean :70.18 Mean :158.5
## 3rd Qu.: 9.700 3rd Qu.:71.40 3rd Qu.:162.0
## Max. :11.200 Max. :72.50 Max. :173.0
## NA's :5 NA's :5
## ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
## Min. :56.00 Min. :2.300 Min. :463.0
## 1st Qu.:62.00 1st Qu.:2.500 1st Qu.:490.0
## Median :64.00 Median :2.500 Median :495.0
## Mean :63.54 Mean :2.494 Mean :495.6
## 3rd Qu.:65.00 3rd Qu.:2.500 3rd Qu.:501.5
## Max. :70.00 Max. :2.600 Max. :522.0
## NA's :5 NA's :5 NA's :5
## ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
## Min. :0.01700 Min. :0.000 Min. :0.000
## 1st Qu.:0.01900 1st Qu.:0.700 1st Qu.:2.000
## Median :0.02000 Median :1.000 Median :3.000
## Mean :0.01957 Mean :1.014 Mean :2.534
## 3rd Qu.:0.02000 3rd Qu.:1.300 3rd Qu.:3.000
## Max. :0.02200 Max. :2.300 Max. :3.000
## NA's :5
## ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
## Min. :0.000 Min. :0.00000 Min. :0.00000
## 1st Qu.:7.100 1st Qu.:0.00000 1st Qu.:0.00000
## Median :7.200 Median :0.00000 Median :0.00000
## Mean :6.851 Mean :0.01771 Mean :0.02371
## 3rd Qu.:7.300 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :7.500 Max. :0.10000 Max. :0.20000
## NA's :1 NA's :1
## ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
## Min. : 0.00 Min. : 0.0000 Min. :0.000
## 1st Qu.:11.40 1st Qu.: 0.6000 1st Qu.:1.800
## Median :11.60 Median : 0.8000 Median :1.900
## Mean :11.21 Mean : 0.9119 Mean :1.805
## 3rd Qu.:11.70 3rd Qu.: 1.0250 3rd Qu.:1.900
## Max. :12.10 Max. :11.0000 Max. :2.100
##
## ManufacturingProcess45
## Min. :0.000
## 1st Qu.:2.100
## Median :2.200
## Mean :2.138
## 3rd Qu.:2.300
## Max. :2.600
##
We see that ManufacturingProcess01 through ManufacturingProcess08, ManufacturingProcess10 through ManufacturingProcess12, ManufacturingProcess14, ManufacturingProcess22 through ManufacturingProcess31, ManufacturingProcess33 through ManufacturingProcess36, ManufacturingProcess40, and ManufacturingProcess41 contain missing values and need to be imputed.
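Reading NA counts off a long summary is error-prone. A shorter route is to count them per column, sketched here on a hypothetical toy data frame:

```r
# Hypothetical toy data frame (not the manufacturing data)
toy <- data.frame(
  Yield    = c(40.1, 39.5, 41.2, 38.9),
  ProcessA = c(1.2, NA, 1.4, NA),
  ProcessB = c(10, 11, 12, 13),
  ProcessC = c(NA, 5, 6, 7)
)

# Count missing values per column, then keep only the columns that have any
na_counts <- colSums(is.na(toy))
na_counts[na_counts > 0]
```

The same two lines applied to `ChemicalManufacturingProcess` would list the affected predictors and their NA counts directly.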
library(impute)
cmp <- impute.knn(as.matrix(ChemicalManufacturingProcess))
cmp <- as.data.frame(cmp$data)
summary(cmp)
## Yield BiologicalMaterial01 BiologicalMaterial02
## Min. :35.25 Min. :4.580 Min. :46.87
## 1st Qu.:38.75 1st Qu.:5.978 1st Qu.:52.68
## Median :39.97 Median :6.305 Median :55.09
## Mean :40.18 Mean :6.411 Mean :55.69
## 3rd Qu.:41.48 3rd Qu.:6.870 3rd Qu.:58.74
## Max. :46.34 Max. :8.810 Max. :64.75
## BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
## Min. :56.97 Min. : 9.38 Min. :13.24
## 1st Qu.:64.98 1st Qu.:11.24 1st Qu.:17.23
## Median :67.22 Median :12.10 Median :18.49
## Mean :67.70 Mean :12.35 Mean :18.60
## 3rd Qu.:70.43 3rd Qu.:13.22 3rd Qu.:19.90
## Max. :78.25 Max. :23.09 Max. :24.85
## BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
## Min. :40.60 Min. :100.0 Min. :15.88
## 1st Qu.:46.05 1st Qu.:100.0 1st Qu.:17.06
## Median :48.46 Median :100.0 Median :17.51
## Mean :48.91 Mean :100.0 Mean :17.49
## 3rd Qu.:51.34 3rd Qu.:100.0 3rd Qu.:17.88
## Max. :59.38 Max. :100.8 Max. :19.14
## BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
## Min. :11.44 Min. :1.770 Min. :135.8
## 1st Qu.:12.60 1st Qu.:2.460 1st Qu.:143.8
## Median :12.84 Median :2.710 Median :146.1
## Mean :12.85 Mean :2.801 Mean :147.0
## 3rd Qu.:13.13 3rd Qu.:2.990 3rd Qu.:149.6
## Max. :14.08 Max. :6.870 Max. :158.7
## BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
## Min. :18.35 Min. : 0.00 Min. : 0.00
## 1st Qu.:19.73 1st Qu.:10.78 1st Qu.:19.17
## Median :20.12 Median :11.40 Median :21.00
## Mean :20.20 Mean :11.20 Mean :16.66
## 3rd Qu.:20.75 3rd Qu.:12.12 3rd Qu.:21.50
## Max. :22.21 Max. :14.10 Max. :22.50
## ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
## Min. :1.470 Min. :911.0 Min. : 923.0
## 1st Qu.:1.530 1st Qu.:928.0 1st Qu.: 986.8
## Median :1.544 Median :934.0 Median : 999.4
## Mean :1.540 Mean :931.8 Mean :1001.8
## 3rd Qu.:1.550 3rd Qu.:936.0 3rd Qu.:1009.2
## Max. :1.600 Max. :946.0 Max. :1175.3
## ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
## Min. :203.0 Min. :177.0 Min. :177.0
## 1st Qu.:205.7 1st Qu.:177.0 1st Qu.:177.0
## Median :206.8 Median :177.0 Median :178.0
## Mean :207.4 Mean :177.5 Mean :177.6
## 3rd Qu.:208.7 3rd Qu.:178.0 3rd Qu.:178.0
## Max. :227.4 Max. :178.0 Max. :178.0
## ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
## Min. :38.89 Min. : 7.500 Min. : 7.500
## 1st Qu.:44.89 1st Qu.: 8.700 1st Qu.: 9.000
## Median :45.73 Median : 9.100 Median : 9.400
## Mean :45.66 Mean : 9.186 Mean : 9.396
## 3rd Qu.:46.52 3rd Qu.: 9.525 3rd Qu.: 9.900
## Max. :49.36 Max. :11.600 Max. :11.500
## ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
## Min. : 0.0 Min. :32.10 Min. :4701
## 1st Qu.: 0.0 1st Qu.:33.90 1st Qu.:4827
## Median : 0.0 Median :34.60 Median :4856
## Mean : 852.9 Mean :34.51 Mean :4854
## 3rd Qu.: 0.0 3rd Qu.:35.20 3rd Qu.:4882
## Max. :4549.0 Max. :38.60 Max. :5055
## ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
## Min. :5904 Min. : 0 Min. :31.30
## 1st Qu.:6010 1st Qu.:4561 1st Qu.:33.50
## Median :6032 Median :4588 Median :34.40
## Mean :6039 Mean :4566 Mean :34.34
## 3rd Qu.:6061 3rd Qu.:4619 3rd Qu.:35.10
## Max. :6233 Max. :4852 Max. :40.00
## ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
## Min. : 0 Min. :5890 Min. : 0
## 1st Qu.:4813 1st Qu.:6001 1st Qu.:4553
## Median :4835 Median :6022 Median :4582
## Mean :4810 Mean :6028 Mean :4556
## 3rd Qu.:4862 3rd Qu.:6050 3rd Qu.:4610
## Max. :4971 Max. :6146 Max. :4759
## ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## Min. :-1.8000 Min. : 0.000 Min. :0.000
## 1st Qu.:-0.6000 1st Qu.: 3.000 1st Qu.:2.000
## Median :-0.3000 Median : 5.000 Median :3.000
## Mean :-0.1642 Mean : 5.406 Mean :3.011
## 3rd Qu.: 0.0000 3rd Qu.: 8.000 3rd Qu.:4.000
## Max. : 3.6000 Max. :12.000 Max. :6.000
## ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
## Min. : 0.000 Min. : 0 Min. : 0
## 1st Qu.: 4.000 1st Qu.:4831 1st Qu.:6020
## Median : 8.000 Median :4854 Median :6046
## Mean : 8.823 Mean :4825 Mean :6013
## 3rd Qu.:14.000 3rd Qu.:4876 3rd Qu.:6069
## Max. :23.000 Max. :4990 Max. :6161
## ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.:4561 1st Qu.: 0.000 1st Qu.:19.7
## Median :4588 Median :10.400 Median :19.9
## Mean :4561 Mean : 6.444 Mean :20.0
## 3rd Qu.:4609 3rd Qu.:10.700 3rd Qu.:20.4
## Max. :4710 Max. :11.500 Max. :22.0
## ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
## Min. : 0.000 Min. : 0.00 Min. :143.0
## 1st Qu.: 8.800 1st Qu.:70.10 1st Qu.:155.0
## Median : 9.200 Median :70.80 Median :158.0
## Mean : 9.167 Mean :70.16 Mean :158.5
## 3rd Qu.: 9.700 3rd Qu.:71.40 3rd Qu.:162.0
## Max. :11.200 Max. :72.50 Max. :173.0
## ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
## Min. :56.00 Min. :2.300 Min. :463.0
## 1st Qu.:62.00 1st Qu.:2.500 1st Qu.:490.0
## Median :64.00 Median :2.500 Median :495.5
## Mean :63.49 Mean :2.493 Mean :495.7
## 3rd Qu.:65.00 3rd Qu.:2.500 3rd Qu.:501.2
## Max. :70.00 Max. :2.600 Max. :522.0
## ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
## Min. :0.01700 Min. :0.000 Min. :0.000
## 1st Qu.:0.01900 1st Qu.:0.700 1st Qu.:2.000
## Median :0.02000 Median :1.000 Median :3.000
## Mean :0.01959 Mean :1.014 Mean :2.534
## 3rd Qu.:0.02000 3rd Qu.:1.300 3rd Qu.:3.000
## Max. :0.02200 Max. :2.300 Max. :3.000
## ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
## Min. :0.000 Min. :0.00000 Min. :0.00000
## 1st Qu.:7.100 1st Qu.:0.00000 1st Qu.:0.00000
## Median :7.200 Median :0.00000 Median :0.00000
## Mean :6.851 Mean :0.01761 Mean :0.02358
## 3rd Qu.:7.300 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :7.500 Max. :0.10000 Max. :0.20000
## ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
## Min. : 0.00 Min. : 0.0000 Min. :0.000
## 1st Qu.:11.40 1st Qu.: 0.6000 1st Qu.:1.800
## Median :11.60 Median : 0.8000 Median :1.900
## Mean :11.21 Mean : 0.9119 Mean :1.805
## 3rd Qu.:11.70 3rd Qu.: 1.0250 3rd Qu.:1.900
## Max. :12.10 Max. :11.0000 Max. :2.100
## ManufacturingProcess45
## Min. :0.000
## 1st Qu.:2.100
## Median :2.200
## Mean :2.138
## 3rd Qu.:2.300
## Max. :2.600
Now we have no NA’s.
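As a rough illustration of what nearest-neighbour imputation does, the sketch below fills one missing value with the average of that column over the two most similar complete rows. It captures only the core idea: impute.knn itself handles more cases (for example rows with many missing values) and uses k = 10 by default. The toy matrix and the helper `knn_impute_row` are hypothetical.

```r
# Hypothetical toy matrix with one missing value in row 3
m <- rbind(
  c(1.0, 2.0, 3.0),
  c(1.1, 2.1, 3.1),
  c(0.9, 1.9, NA),
  c(9.0, 9.0, 9.0)
)

knn_impute_row <- function(m, i, k) {
  miss <- is.na(m[i, ])
  # Candidate neighbours: the other rows that have no missing values
  cand <- setdiff(which(complete.cases(m)), i)
  # Euclidean distance on the columns that are observed in row i
  d <- sapply(cand, function(j) sqrt(sum((m[i, !miss] - m[j, !miss])^2)))
  nn <- cand[order(d)][seq_len(min(k, length(cand)))]
  # Replace each missing entry with the neighbours' column mean
  m[i, miss] <- colMeans(m[nn, miss, drop = FALSE])
  m[i, ]
}

m[3, ] <- knn_impute_row(m, 3, k = 2)
m[3, 3]
```

Rows 1 and 2 are the closest to row 3, so the imputed value is the mean of 3.0 and 3.1.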
We’ll use Partial Least Squares, as it performed better in the previous question.
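smp1 <- floor(0.75 * nrow(cmp))
# Caution: size uses smp (123, from the permeability data) rather than smp1,
# and x1 and y1 are drawn independently, so predictor and response rows are not paired
x1 <- sample(seq_len(nrow(cmp)), size = smp)
y1 <- sample(seq_len(nrow(cmp)), size = smp)
# Column 1 is Yield (the response); the remaining columns are the predictors
cmpTrain <- cmp[x1,-1]
cmpTest <- cmp[-x1,-1]
yTrain <- cmp[y1,1]
yTest <- cmp[-y1,1]
cmpTune <- train(cmpTrain, yTrain,
                 method = "pls",
                 tuneLength = 20, trControl = ctrl,
                 preProc = c("center", "scale"))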
cmpTune
## Partial Least Squares
##
## 123 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 111, 109, 111, 111, 111, 111, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 1.990940 0.08872065 1.584375
## 2 2.542706 0.03976928 1.828412
## 3 2.952260 0.08348326 2.050262
## 4 3.566405 0.07761645 2.243481
## 5 3.799803 0.06841577 2.348720
## 6 3.771451 0.06546726 2.345041
## 7 3.833465 0.06413688 2.358874
## 8 4.100629 0.07865771 2.502442
## 9 3.812132 0.07371600 2.448421
## 10 3.424531 0.04191420 2.308153
## 11 3.346966 0.06570311 2.322883
## 12 3.127500 0.06660960 2.241396
## 13 2.973043 0.06205586 2.177309
## 14 3.049624 0.05758056 2.220079
## 15 3.204395 0.05281387 2.281923
## 16 3.162355 0.06017971 2.282734
## 17 3.278946 0.06524080 2.354455
## 18 3.383178 0.06943381 2.385199
## 19 3.692067 0.06929113 2.493352
## 20 3.785993 0.07096712 2.521672
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 1.
In this instance, the optimal value was ncomp = 1, which has a cross-validated \(R^2\) of 0.08872065.
cmpPred <- predict(cmpTune, newdata = cmpTest)
postResample(pred = cmpPred, obs = yTest)
## RMSE Rsquared MAE
## 1.4462084 0.2041162 1.2537531
The test-set \(R^2\) is a low 0.2041162.
cmpTune$finalModel$coefficients
## , , 1 comps
##
## .outcome
## BiologicalMaterial01 0.0608653757
## BiologicalMaterial02 0.0553578126
## BiologicalMaterial03 0.0545395762
## BiologicalMaterial04 0.0400824643
## BiologicalMaterial05 0.0031263824
## BiologicalMaterial06 0.0517050408
## BiologicalMaterial07 -0.0026399147
## BiologicalMaterial08 0.0532354895
## BiologicalMaterial09 0.0491323388
## BiologicalMaterial10 0.0335511643
## BiologicalMaterial11 0.0278888413
## BiologicalMaterial12 0.0404087506
## ManufacturingProcess01 0.0135255227
## ManufacturingProcess02 -0.0186387429
## ManufacturingProcess03 -0.0243446535
## ManufacturingProcess04 -0.0157808228
## ManufacturingProcess05 0.0319798266
## ManufacturingProcess06 0.0608121273
## ManufacturingProcess07 -0.0114531023
## ManufacturingProcess08 -0.0307741942
## ManufacturingProcess09 0.0534116518
## ManufacturingProcess10 0.0131215054
## ManufacturingProcess11 0.0173837131
## ManufacturingProcess12 0.0083950179
## ManufacturingProcess13 -0.0402197652
## ManufacturingProcess14 -0.0049238624
## ManufacturingProcess15 0.0115572171
## ManufacturingProcess16 0.0039142476
## ManufacturingProcess17 -0.0205987386
## ManufacturingProcess18 -0.0501901868
## ManufacturingProcess19 0.0111998087
## ManufacturingProcess20 -0.0488868804
## ManufacturingProcess21 0.0184004041
## ManufacturingProcess22 -0.0233300208
## ManufacturingProcess23 -0.0125285815
## ManufacturingProcess24 -0.0143699415
## ManufacturingProcess25 -0.0011906891
## ManufacturingProcess26 -0.0008204135
## ManufacturingProcess27 -0.0018846585
## ManufacturingProcess28 0.0272632465
## ManufacturingProcess29 0.0046614355
## ManufacturingProcess30 0.0054802385
## ManufacturingProcess31 -0.0064500459
## ManufacturingProcess32 0.0267722840
## ManufacturingProcess33 0.0117218025
## ManufacturingProcess34 -0.0302936245
## ManufacturingProcess35 -0.0591382575
## ManufacturingProcess36 -0.0545803525
## ManufacturingProcess37 0.0105183772
## ManufacturingProcess38 -0.0302796235
## ManufacturingProcess39 -0.0443561795
## ManufacturingProcess40 -0.0140099119
## ManufacturingProcess41 -0.0127288010
## ManufacturingProcess42 -0.0194486336
## ManufacturingProcess43 -0.0067860655
## ManufacturingProcess44 -0.0162936766
## ManufacturingProcess45 -0.0204819420
With the single retained component, all coefficients are small. BiologicalMaterial01 has the largest coefficient at 0.0608653757, with ManufacturingProcess06 essentially tied at 0.0608121273. In this fit the biological materials generally carry larger coefficients than the manufacturing processes; the largest negative coefficients belong to ManufacturingProcess35 (-0.0591382575) and ManufacturingProcess36 (-0.0545803525).
In general, though the biological materials cannot be changed during the refinement process, identifying which materials are more vital will help to ensure a higher yield, as the company can focus on obtaining high-quality batches of those materials. Likewise, knowing the most important manufacturing process steps allows the company to pinpoint where it can start fine-tuning the procedure.
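Because the predictors were centered and scaled, the coefficients can be compared by absolute size. The sketch below ranks a handful of values copied from the 1-component output above; it is only an illustration on a subset, and the variable names are hypothetical.

```r
# A handful of coefficients copied from the 1-component output above
coefs <- c(
  BiologicalMaterial01   =  0.0608653757,
  BiologicalMaterial02   =  0.0553578126,
  ManufacturingProcess06 =  0.0608121273,
  ManufacturingProcess35 = -0.0591382575,
  ManufacturingProcess36 = -0.0545803525,
  ManufacturingProcess40 = -0.0140099119
)

# Rank by absolute size; the sign only gives the direction of the association
ranked <- coefs[order(abs(coefs), decreasing = TRUE)]
ranked
```

On the fitted object, the full vector would be `cmpTune$finalModel$coefficients[, 1, 1]`, and caret's `varImp(cmpTune)` offers an alternative importance ranking.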