Question: A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:

  1. The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
 library(AppliedPredictiveModeling)
library(VIM)
library(caret)
data(ChemicalManufacturingProcess)
  1. A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values (e.g., see Sect. 3.8).

We will use the kNN imputation method from the VIM package to impute the missing values.
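Before imputing, it can help to confirm how sparse the missingness actually is. The following is a minimal sketch on a made-up data frame; `na_report` and `toy` are hypothetical names that are not part of the original analysis, and the same helper could be pointed at `ChemicalManufacturingProcess`.

```r
# Hypothetical helper: count missing cells per column and their overall share.
na_report <- function(df) {
  per_column <- colSums(is.na(df))
  list(
    per_column = per_column[per_column > 0],               # only columns with NAs
    share      = sum(per_column) / (nrow(df) * ncol(df))   # fraction of all cells
  )
}

# Toy data standing in for the real predictors (values are illustrative only).
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, 2, 5),
                  c = c(7, 8, 9, 10))

report <- na_report(toy)
print(report$per_column)
print(report$share)
```

A small overall share, concentrated in a few columns, is the situation in which a neighbour-based imputation is usually considered reasonable.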

summary(ChemicalManufacturingProcess)
##      Yield       BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
##  Min.   :35.25   Min.   :4.580        Min.   :46.87        Min.   :56.97       
##  1st Qu.:38.75   1st Qu.:5.978        1st Qu.:52.68        1st Qu.:64.98       
##  Median :39.97   Median :6.305        Median :55.09        Median :67.22       
##  Mean   :40.18   Mean   :6.411        Mean   :55.69        Mean   :67.70       
##  3rd Qu.:41.48   3rd Qu.:6.870        3rd Qu.:58.74        3rd Qu.:70.43       
##  Max.   :46.34   Max.   :8.810        Max.   :64.75        Max.   :78.25       
##                                                                                
##  BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
##  Min.   : 9.38        Min.   :13.24        Min.   :40.60       
##  1st Qu.:11.24        1st Qu.:17.23        1st Qu.:46.05       
##  Median :12.10        Median :18.49        Median :48.46       
##  Mean   :12.35        Mean   :18.60        Mean   :48.91       
##  3rd Qu.:13.22        3rd Qu.:19.90        3rd Qu.:51.34       
##  Max.   :23.09        Max.   :24.85        Max.   :59.38       
##                                                                
##  BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
##  Min.   :100.0        Min.   :15.88        Min.   :11.44       
##  1st Qu.:100.0        1st Qu.:17.06        1st Qu.:12.60       
##  Median :100.0        Median :17.51        Median :12.84       
##  Mean   :100.0        Mean   :17.49        Mean   :12.85       
##  3rd Qu.:100.0        3rd Qu.:17.88        3rd Qu.:13.13       
##  Max.   :100.8        Max.   :19.14        Max.   :14.08       
##                                                                
##  BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
##  Min.   :1.770        Min.   :135.8        Min.   :18.35       
##  1st Qu.:2.460        1st Qu.:143.8        1st Qu.:19.73       
##  Median :2.710        Median :146.1        Median :20.12       
##  Mean   :2.801        Mean   :147.0        Mean   :20.20       
##  3rd Qu.:2.990        3rd Qu.:149.6        3rd Qu.:20.75       
##  Max.   :6.870        Max.   :158.7        Max.   :22.21       
##                                                                
##  ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
##  Min.   : 0.00          Min.   : 0.00          Min.   :1.47          
##  1st Qu.:10.80          1st Qu.:19.30          1st Qu.:1.53          
##  Median :11.40          Median :21.00          Median :1.54          
##  Mean   :11.21          Mean   :16.68          Mean   :1.54          
##  3rd Qu.:12.15          3rd Qu.:21.50          3rd Qu.:1.55          
##  Max.   :14.10          Max.   :22.50          Max.   :1.60          
##  NA's   :1              NA's   :3              NA's   :15            
##  ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
##  Min.   :911.0          Min.   : 923.0         Min.   :203.0         
##  1st Qu.:928.0          1st Qu.: 986.8         1st Qu.:205.7         
##  Median :934.0          Median : 999.2         Median :206.8         
##  Mean   :931.9          Mean   :1001.7         Mean   :207.4         
##  3rd Qu.:936.0          3rd Qu.:1008.9         3rd Qu.:208.7         
##  Max.   :946.0          Max.   :1175.3         Max.   :227.4         
##  NA's   :1              NA's   :1              NA's   :2             
##  ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
##  Min.   :177.0          Min.   :177.0          Min.   :38.89         
##  1st Qu.:177.0          1st Qu.:177.0          1st Qu.:44.89         
##  Median :177.0          Median :178.0          Median :45.73         
##  Mean   :177.5          Mean   :177.6          Mean   :45.66         
##  3rd Qu.:178.0          3rd Qu.:178.0          3rd Qu.:46.52         
##  Max.   :178.0          Max.   :178.0          Max.   :49.36         
##  NA's   :1              NA's   :1                                    
##  ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
##  Min.   : 7.500         Min.   : 7.500         Min.   :   0.0        
##  1st Qu.: 8.700         1st Qu.: 9.000         1st Qu.:   0.0        
##  Median : 9.100         Median : 9.400         Median :   0.0        
##  Mean   : 9.179         Mean   : 9.386         Mean   : 857.8        
##  3rd Qu.: 9.550         3rd Qu.: 9.900         3rd Qu.:   0.0        
##  Max.   :11.600         Max.   :11.500         Max.   :4549.0        
##  NA's   :9              NA's   :10             NA's   :1             
##  ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
##  Min.   :32.10          Min.   :4701           Min.   :5904          
##  1st Qu.:33.90          1st Qu.:4828           1st Qu.:6010          
##  Median :34.60          Median :4856           Median :6032          
##  Mean   :34.51          Mean   :4854           Mean   :6039          
##  3rd Qu.:35.20          3rd Qu.:4882           3rd Qu.:6061          
##  Max.   :38.60          Max.   :5055           Max.   :6233          
##                         NA's   :1                                    
##  ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
##  Min.   :   0           Min.   :31.30          Min.   :   0          
##  1st Qu.:4561           1st Qu.:33.50          1st Qu.:4813          
##  Median :4588           Median :34.40          Median :4835          
##  Mean   :4566           Mean   :34.34          Mean   :4810          
##  3rd Qu.:4619           3rd Qu.:35.10          3rd Qu.:4862          
##  Max.   :4852           Max.   :40.00          Max.   :4971          
##                                                                      
##  ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
##  Min.   :5890           Min.   :   0           Min.   :-1.8000       
##  1st Qu.:6001           1st Qu.:4553           1st Qu.:-0.6000       
##  Median :6022           Median :4582           Median :-0.3000       
##  Mean   :6028           Mean   :4556           Mean   :-0.1642       
##  3rd Qu.:6050           3rd Qu.:4610           3rd Qu.: 0.0000       
##  Max.   :6146           Max.   :4759           Max.   : 3.6000       
##                                                                      
##  ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
##  Min.   : 0.000         Min.   :0.000          Min.   : 0.000        
##  1st Qu.: 3.000         1st Qu.:2.000          1st Qu.: 4.000        
##  Median : 5.000         Median :3.000          Median : 8.000        
##  Mean   : 5.406         Mean   :3.017          Mean   : 8.834        
##  3rd Qu.: 8.000         3rd Qu.:4.000          3rd Qu.:14.000        
##  Max.   :12.000         Max.   :6.000          Max.   :23.000        
##  NA's   :1              NA's   :1              NA's   :1             
##  ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
##  Min.   :   0           Min.   :   0           Min.   :   0          
##  1st Qu.:4832           1st Qu.:6020           1st Qu.:4560          
##  Median :4855           Median :6047           Median :4587          
##  Mean   :4828           Mean   :6016           Mean   :4563          
##  3rd Qu.:4877           3rd Qu.:6070           3rd Qu.:4609          
##  Max.   :4990           Max.   :6161           Max.   :4710          
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
##  Min.   : 0.000         Min.   : 0.00          Min.   : 0.000        
##  1st Qu.: 0.000         1st Qu.:19.70          1st Qu.: 8.800        
##  Median :10.400         Median :19.90          Median : 9.100        
##  Mean   : 6.592         Mean   :20.01          Mean   : 9.161        
##  3rd Qu.:10.750         3rd Qu.:20.40          3rd Qu.: 9.700        
##  Max.   :11.500         Max.   :22.00          Max.   :11.200        
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
##  Min.   : 0.00          Min.   :143.0          Min.   :56.00         
##  1st Qu.:70.10          1st Qu.:155.0          1st Qu.:62.00         
##  Median :70.80          Median :158.0          Median :64.00         
##  Mean   :70.18          Mean   :158.5          Mean   :63.54         
##  3rd Qu.:71.40          3rd Qu.:162.0          3rd Qu.:65.00         
##  Max.   :72.50          Max.   :173.0          Max.   :70.00         
##  NA's   :5                                     NA's   :5             
##  ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
##  Min.   :2.300          Min.   :463.0          Min.   :0.01700       
##  1st Qu.:2.500          1st Qu.:490.0          1st Qu.:0.01900       
##  Median :2.500          Median :495.0          Median :0.02000       
##  Mean   :2.494          Mean   :495.6          Mean   :0.01957       
##  3rd Qu.:2.500          3rd Qu.:501.5          3rd Qu.:0.02000       
##  Max.   :2.600          Max.   :522.0          Max.   :0.02200       
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
##  Min.   :0.000          Min.   :0.000          Min.   :0.000         
##  1st Qu.:0.700          1st Qu.:2.000          1st Qu.:7.100         
##  Median :1.000          Median :3.000          Median :7.200         
##  Mean   :1.014          Mean   :2.534          Mean   :6.851         
##  3rd Qu.:1.300          3rd Qu.:3.000          3rd Qu.:7.300         
##  Max.   :2.300          Max.   :3.000          Max.   :7.500         
##                                                                      
##  ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
##  Min.   :0.00000        Min.   :0.00000        Min.   : 0.00         
##  1st Qu.:0.00000        1st Qu.:0.00000        1st Qu.:11.40         
##  Median :0.00000        Median :0.00000        Median :11.60         
##  Mean   :0.01771        Mean   :0.02371        Mean   :11.21         
##  3rd Qu.:0.00000        3rd Qu.:0.00000        3rd Qu.:11.70         
##  Max.   :0.10000        Max.   :0.20000        Max.   :12.10         
##  NA's   :1              NA's   :1                                    
##  ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
##  Min.   : 0.0000        Min.   :0.000          Min.   :0.000         
##  1st Qu.: 0.6000        1st Qu.:1.800          1st Qu.:2.100         
##  Median : 0.8000        Median :1.900          Median :2.200         
##  Mean   : 0.9119        Mean   :1.805          Mean   :2.138         
##  3rd Qu.: 1.0250        3rd Qu.:1.900          3rd Qu.:2.300         
##  Max.   :11.0000        Max.   :2.100          Max.   :2.600         
## 
impu_data <- kNN(ChemicalManufacturingProcess, imp_var = FALSE)


summary(ChemicalManufacturingProcess$ManufacturingProcess02)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   19.30   21.00   16.68   21.50   22.50       3
summary(impu_data$ManufacturingProcess02)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   19.30   21.00   16.76   21.50   22.50
data.frame("OLD"=ChemicalManufacturingProcess$ManufacturingProcess02, 
            "Imputed"=impu_data$ManufacturingProcess02)
##      OLD Imputed
## 1     NA    21.0
## 2    0.0     0.0
## 3    0.0     0.0
## 4    0.0     0.0
## 5    0.0     0.0
## 6    0.0     0.0
## 7    0.0     0.0
## 8    0.0     0.0
## 9    0.0     0.0
## 10   0.0     0.0
## 11   0.0     0.0
## 12   0.0     0.0
## 13   0.0     0.0
## 14   0.0     0.0
## 15   0.0     0.0
## 16   0.0     0.0
## 17   0.0     0.0
## 18   0.0     0.0
## 19   0.0     0.0
## 20   0.0     0.0
## 21   0.0     0.0
## 22   0.0     0.0
## 23   0.0     0.0
## 24   0.0     0.0
## 25   0.0     0.0
## 26   0.0     0.0
## 27   0.0     0.0
## 28   0.0     0.0
## 29   0.0     0.0
## 30   0.0     0.0
## 31   0.0     0.0
## 32   0.0     0.0
## 33   0.0     0.0
## 34   0.0     0.0
## 35   0.0     0.0
## 36   0.0     0.0
## 37  19.7    19.7
## 38  19.9    19.9
## 39  19.3    19.3
## 40  19.5    19.5
## 41  19.3    19.3
## 42  22.5    22.5
## 43  20.5    20.5
## 44  21.5    21.5
## 45  20.5    20.5
## 46  20.5    20.5
## 47  20.5    20.5
## 48  20.0    20.0
## 49  18.0    18.0
## 50  19.0    19.0
## 51  18.0    18.0
## 52  19.5    19.5
## 53  19.5    19.5
## 54  19.5    19.5
## 55  19.5    19.5
## 56  19.5    19.5
## 57  19.5    19.5
## 58  19.5    19.5
## 59  18.0    18.0
## 60  20.0    20.0
## 61  19.0    19.0
## 62  20.0    20.0
## 63  19.5    19.5
## 64  19.5    19.5
## 65  20.0    20.0
## 66  19.5    19.5
## 67  19.5    19.5
## 68  19.5    19.5
## 69  20.0    20.0
## 70  19.0    19.0
## 71  19.0    19.0
## 72  19.0    19.0
## 73  19.5    19.5
## 74  21.5    21.5
## 75  22.2    22.2
## 76  22.0    22.0
## 77  22.5    22.5
## 78  21.5    21.5
## 79  21.5    21.5
## 80  22.0    22.0
## 81  22.0    22.0
## 82  22.0    22.0
## 83  20.5    20.5
## 84  21.0    21.0
## 85  22.0    22.0
## 86  21.0    21.0
## 87  21.5    21.5
## 88  21.5    21.5
## 89  21.5    21.5
## 90  21.5    21.5
## 91  21.7    21.7
## 92  22.0    22.0
## 93  21.5    21.5
## 94  21.5    21.5
## 95  21.5    21.5
## 96  22.0    22.0
## 97  22.0    22.0
## 98  20.9    20.9
## 99  22.0    22.0
## 100 21.0    21.0
## 101 21.5    21.5
## 102 21.9    21.9
## 103 21.7    21.7
## 104 21.6    21.6
## 105 21.8    21.8
## 106 20.8    20.8
## 107 22.0    22.0
## 108 21.9    21.9
## 109 22.4    22.4
## 110 22.0    22.0
## 111 20.5    20.5
## 112 22.2    22.2
## 113 22.3    22.3
## 114 22.0    22.0
## 115 21.2    21.2
## 116 21.1    21.1
## 117 21.0    21.0
## 118 21.0    21.0
## 119 20.9    20.9
## 120 21.1    21.1
## 121 21.2    21.2
## 122 21.5    21.5
## 123 21.2    21.2
## 124 20.8    20.8
## 125 20.9    20.9
## 126 21.2    21.2
## 127 21.3    21.3
## 128 21.3    21.3
## 129 21.4    21.4
## 130 21.5    21.5
## 131 21.4    21.4
## 132 21.5    21.5
## 133 21.2    21.2
## 134   NA    21.4
## 135 21.4    21.4
## 136 21.3    21.3
## 137 21.3    21.3
## 138 21.6    21.6
## 139   NA    20.9
## 140 21.3    21.3
## 141 21.2    21.2
## 142 21.2    21.2
## 143 21.4    21.4
## 144 21.4    21.4
## 145 21.4    21.4
## 146 21.6    21.6
## 147 21.6    21.6
## 148 21.4    21.4
## 149 21.4    21.4
## 150 21.4    21.4
## 151 21.1    21.1
## 152 21.5    21.5
## 153 21.7    21.7
## 154 21.3    21.3
## 155 21.2    21.2
## 156 21.3    21.3
## 157 21.0    21.0
## 158 21.2    21.2
## 159 21.4    21.4
## 160 21.3    21.3
## 161 21.5    21.5
## 162 21.1    21.1
## 163 21.0    21.0
## 164 21.2    21.2
## 165 21.2    21.2
## 166 21.2    21.2
## 167 21.2    21.2
## 168 20.0    20.0
## 169 20.8    20.8
## 170 19.9    19.9
## 171 20.0    20.0
## 172 21.5    21.5
## 173 21.5    21.5
## 174 20.4    20.4
## 175 21.6    21.6
## 176 20.8    20.8
  1. Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric?
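# 80/20 train/test split. No seed is set here, so the split (and every
# result below) will differ from run to run.
n <- nrow(impu_data)
i.training <- sort(sample(n, round(n * 0.8)))
L.training <- impu_data[i.training, ]
L.test <- impu_data[-i.training, ]

# Yield is the first column; the remaining 57 columns are predictors.
X_train <- L.training[, -1]
Y_train <- L.training[, 1]

X_test <- L.test[, -1]
Y_test <- L.test[, 1]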
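Because the split above is random, fixing the seed first is one way to make it repeatable. This is a minimal sketch on toy data, assuming the same 80/20 rule; the seed value and the `toy_*` names are arbitrary choices for illustration.

```r
# Sketch of a seeded 80/20 split; the seed value is arbitrary.
set.seed(123)

# Toy stand-in for the imputed data: response in column 1, as in the original.
toy <- data.frame(Yield = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))

n         <- nrow(toy)
train_idx <- sort(sample(n, round(n * 0.8)))   # same rule as the code above

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# The two pieces partition the rows: sizes add up and no row is shared.
print(c(train = nrow(toy_train), test = nrow(toy_test)))
```

Re-running the block with the same seed gives the same rows in each set, which makes the resampled and test-set metrics comparable across sessions.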

# Baseline: ordinary least squares on all 57 predictors.
model_lm <- lm(Yield ~ ., data = L.training)

summary(model_lm)
## 
## Call:
## lm(formula = Yield ~ ., data = L.training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.81632 -0.49431 -0.06162  0.48675  2.19256 
## 
## Coefficients: (1 not defined because of singularities)
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             6.855e+01  2.099e+02   0.327 0.744751    
## BiologicalMaterial01    3.407e-01  3.979e-01   0.856 0.394245    
## BiologicalMaterial02   -1.860e-01  1.499e-01  -1.241 0.218073    
## BiologicalMaterial03    5.356e-01  3.095e-01   1.730 0.087269 .  
## BiologicalMaterial04   -4.979e-01  6.468e-01  -0.770 0.443579    
## BiologicalMaterial05    2.422e-01  1.262e-01   1.918 0.058449 .  
## BiologicalMaterial06   -3.594e-01  3.957e-01  -0.908 0.366402    
## BiologicalMaterial07   -1.803e+00  1.187e+00  -1.518 0.132694    
## BiologicalMaterial08    9.236e-01  7.973e-01   1.158 0.249993    
## BiologicalMaterial09   -2.204e+00  1.769e+00  -1.246 0.216266    
## BiologicalMaterial10    9.662e-01  1.679e+00   0.576 0.566451    
## BiologicalMaterial11   -1.208e-01  9.490e-02  -1.272 0.206709    
## BiologicalMaterial12    8.466e-01  7.290e-01   1.161 0.248803    
## ManufacturingProcess01  8.844e-02  1.088e-01   0.813 0.418652    
## ManufacturingProcess02 -9.604e-03  6.142e-02  -0.156 0.876123    
## ManufacturingProcess03 -4.287e+00  6.737e+00  -0.636 0.526299    
## ManufacturingProcess04  6.468e-02  3.600e-02   1.797 0.075967 .  
## ManufacturingProcess05  8.987e-04  4.405e-03   0.204 0.838853    
## ManufacturingProcess06 -1.584e-03  4.955e-02  -0.032 0.974574    
## ManufacturingProcess07 -2.312e-01  2.471e-01  -0.936 0.352127    
## ManufacturingProcess08 -9.281e-02  3.378e-01  -0.275 0.784210    
## ManufacturingProcess09  2.221e-01  2.073e-01   1.072 0.286940    
## ManufacturingProcess10  2.259e-01  6.229e-01   0.363 0.717743    
## ManufacturingProcess11  5.030e-01  8.072e-01   0.623 0.534882    
## ManufacturingProcess12  1.593e-04  1.395e-04   1.142 0.256645    
## ManufacturingProcess13 -3.586e-01  4.603e-01  -0.779 0.438077    
## ManufacturingProcess14  4.716e-03  1.175e-02   0.401 0.689193    
## ManufacturingProcess15  2.474e-04  1.163e-02   0.021 0.983085    
## ManufacturingProcess16  2.627e-04  5.335e-04   0.493 0.623648    
## ManufacturingProcess17  1.099e-02  3.860e-01   0.028 0.977360    
## ManufacturingProcess18  8.869e-03  6.270e-03   1.415 0.160897    
## ManufacturingProcess19 -9.901e-03  1.246e-02  -0.795 0.428930    
## ManufacturingProcess20 -2.624e-03  1.039e-02  -0.252 0.801302    
## ManufacturingProcess21         NA         NA      NA       NA    
## ManufacturingProcess22 -3.883e-03  4.977e-02  -0.078 0.937990    
## ManufacturingProcess23 -2.561e-03  1.018e-01  -0.025 0.979988    
## ManufacturingProcess24 -3.015e-02  2.726e-02  -1.106 0.271886    
## ManufacturingProcess25  2.042e-02  2.590e-02   0.788 0.432643    
## ManufacturingProcess26  1.031e-02  1.416e-02   0.728 0.468573    
## ManufacturingProcess27 -1.413e-02  1.298e-02  -1.088 0.279585    
## ManufacturingProcess28 -1.317e-01  3.981e-02  -3.307 0.001387 ** 
## ManufacturingProcess29  8.536e-01  1.602e+00   0.533 0.595511    
## ManufacturingProcess30  7.183e-01  1.476e+00   0.487 0.627846    
## ManufacturingProcess31  8.014e-02  1.296e-01   0.619 0.537869    
## ManufacturingProcess32  2.875e-01  7.361e-02   3.906 0.000189 ***
## ManufacturingProcess33 -2.954e-01  1.452e-01  -2.035 0.045020 *  
## ManufacturingProcess34 -3.541e-01  3.172e+00  -0.112 0.911370    
## ManufacturingProcess35 -9.016e-03  1.922e-02  -0.469 0.640182    
## ManufacturingProcess36  2.794e+02  3.451e+02   0.810 0.420408    
## ManufacturingProcess37 -4.519e-01  3.496e-01  -1.293 0.199666    
## ManufacturingProcess38 -5.445e-01  3.022e-01  -1.802 0.075181 .  
## ManufacturingProcess39  2.857e-02  1.609e-01   0.178 0.859478    
## ManufacturingProcess40  2.308e+00  7.962e+00   0.290 0.772601    
## ManufacturingProcess41 -6.233e-01  5.650e+00  -0.110 0.912414    
## ManufacturingProcess42  1.229e-01  2.531e-01   0.486 0.628398    
## ManufacturingProcess43  2.330e-01  4.020e-01   0.580 0.563721    
## ManufacturingProcess44 -1.908e-01  1.412e+00  -0.135 0.892840    
## ManufacturingProcess45  1.073e+00  6.444e-01   1.665 0.099599 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 84 degrees of freedom
## Multiple R-squared:  0.7948, Adjusted R-squared:  0.6581 
## F-statistic: 5.812 on 56 and 84 DF,  p-value: 3.462e-13
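The note "1 not defined because of singularities" and the `NA` row for ManufacturingProcess21 indicate that, in this training set, that column is numerically a linear combination of other predictors. The sketch below reproduces the effect on toy data (all names are hypothetical) and shows one base-R way to spot it, by comparing the rank of the model matrix with its column count.

```r
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$x3 <- toy$x1 - toy$x2          # exact linear combination of x1 and x2
toy$y  <- 2 * toy$x1 + rnorm(20)

fit <- lm(y ~ ., data = toy)

# lm() cannot estimate the aliased column and reports NA for its coefficient.
print(coef(fit))

# The model matrix has fewer independent columns than columns overall.
X <- model.matrix(fit)
print(c(columns = ncol(X), rank = qr(X)$rank))
```

On the real data, `caret::findLinearCombos()` is one option for listing such columns before fitting; whether dropping them is appropriate here is a judgment call rather than something the output settles.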
# The train function generates a resampling estimate of performance. Here,
# 10-fold cross-validation is used: each fit uses roughly 127 of the 141
# training rows and predicts the held-out rows. The function trainControl
# specifies the type of resampling:
ctrl <- trainControl(method = "cv", number = 10)
model_lm1 <- train(x = X_train, y = Y_train, method = "lm", trControl = ctrl)
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

model_lm1
## Linear Regression 
## 
## 141 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 126, 126, 126, 128, 126, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   1.598888  0.4552606  1.222505
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
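To make the resampled RMSE above less of a black box, here is a base-R sketch of 10-fold cross-validation for a linear model. It is an illustration on toy data, not caret's implementation; `cv_rmse` is a hypothetical helper, and it averages the per-fold RMSE values, which is how the reported figure is understood here.

```r
# Base-R sketch of k-fold cross-validated RMSE for a linear model.
cv_rmse <- function(data, formula, k = 10) {
  n        <- nrow(data)
  folds    <- sample(rep(seq_len(k), length.out = n))  # random fold label per row
  response <- all.vars(formula)[1]

  fold_rmse <- vapply(seq_len(k), function(i) {
    held_out <- data[folds == i, , drop = FALSE]
    fit      <- lm(formula, data = data[folds != i, , drop = FALSE])
    err      <- held_out[[response]] - predict(fit, newdata = held_out)
    sqrt(mean(err^2))
  }, numeric(1))

  mean(fold_rmse)   # average of the per-fold RMSE values
}

set.seed(42)
toy   <- data.frame(x = rnorm(100))
toy$y <- 1 + 2 * toy$x + rnorm(100, sd = 0.5)

cv_result <- cv_rmse(toy, y ~ x, k = 10)
print(cv_result)
```

The gap between this kind of held-out error and the in-sample residual standard error is what the comparison between the `lm` summary and the `train` output is measuring.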
xyplot(Y_train ~ predict(model_lm1),
       # plot the points (type = 'p') and a background grid ('g')
       type = c("p", "g"),
       xlab = "Predicted", ylab = "Observed")
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(resid(model_lm1) ~ predict(model_lm1),
       type = c("p", "g"),
       xlab = "Predicted", ylab = "Residuals")
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

# To build a smaller model, remove predictors with extremely high pairwise correlations (absolute correlation above 0.9).

corThresh <- .9
tooHigh <- findCorrelation(cor(X_train), corThresh)
print(names(X_train)[tooHigh])
## [1] "BiologicalMaterial02"   "ManufacturingProcess26" "BiologicalMaterial11"  
## [4] "BiologicalMaterial04"   "ManufacturingProcess11" "ManufacturingProcess20"
## [7] "ManufacturingProcess42" "ManufacturingProcess40"
corrPred <- names(X_train)[tooHigh]
X_train_no_cor <- X_train[, -tooHigh]
X_test_no_cor <- X_test[, -tooHigh]
model_lm1_no_cor <- train(X_train_no_cor, Y_train, method = "lm",
 trControl = ctrl)
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

model_lm1_no_cor
## Linear Regression 
## 
## 141 samples
##  49 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 127, 125, 126, 127, 128, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   1.430235  0.4855718  1.112113
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
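The correlation filter used above returns column indices to drop, but not which pairs triggered the removal. The following base-R sketch lists the offending pairs on toy data. It is not a reimplementation of `findCorrelation`, and `high_cor_pairs` is a hypothetical helper.

```r
# List predictor pairs whose absolute correlation exceeds a cutoff.
high_cor_pairs <- function(x, cutoff = 0.9) {
  cm <- abs(cor(x))
  cm[lower.tri(cm, diag = TRUE)] <- 0        # keep each pair once
  idx <- which(cm > cutoff, arr.ind = TRUE)
  data.frame(var1    = rownames(cm)[idx[, 1]],
             var2    = colnames(cm)[idx[, 2]],
             abs_cor = cm[idx])
}

set.seed(7)
toy   <- data.frame(a = rnorm(200))
toy$b <- toy$a + rnorm(200, sd = 0.1)   # nearly a copy of a
toy$c <- rnorm(200)                     # unrelated to a and b

pairs_found <- high_cor_pairs(toy)
print(pairs_found)
```

Applied to `X_train`, a listing like this would show which partner of each removed predictor stays in the model, which matters when interpreting coefficients later.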
xyplot(Y_train ~ predict(model_lm1_no_cor),
       # plot the points (type = 'p') and a background grid ('g')
       type = c("p", "g"),
       xlab = "Predicted", ylab = "Observed")
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(resid(model_lm1_no_cor) ~ predict(model_lm1_no_cor),
       type = c("p", "g"),
       xlab = "Predicted", ylab = "Residuals")
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

# PLS
# Use train to perform pre-processing and tuning together. The function first pre-processes the training set by centering and scaling it, then uses 10-fold cross-validation to evaluate PLS models with 1 to 20 components (latent variables).
 model_pls_no_cor <- train(x=X_train_no_cor, y=Y_train,
                     method = "pls",
                     tuneLength = 20,
                     metric='Rsquared',
                     trControl = ctrl,
                     preProc = c("center", "scale"))


model_pls_no_cor
## Partial Least Squares 
## 
## 141 samples
##  49 predictor
## 
## Pre-processing: centered (49), scaled (49) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 127, 128, 127, 128, 126, 128, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##    1     1.352559  0.4237399  1.119060
##    2     1.306717  0.4916857  1.038064
##    3     1.280462  0.5205842  1.026928
##    4     1.358211  0.5082162  1.066627
##    5     1.496745  0.4939937  1.098108
##    6     1.535209  0.4863527  1.133805
##    7     1.595225  0.4877445  1.140873
##    8     1.665308  0.4821861  1.160681
##    9     1.728322  0.4855999  1.166969
##   10     1.760620  0.4812682  1.168557
##   11     1.772001  0.4801923  1.175224
##   12     1.803225  0.4745867  1.182951
##   13     1.795151  0.4771398  1.177729
##   14     1.802661  0.4816548  1.176691
##   15     1.806402  0.4852451  1.181149
##   16     1.802705  0.4839794  1.178522
##   17     1.785123  0.4848508  1.170325
##   18     1.748027  0.4905052  1.154816
##   19     1.706958  0.4938117  1.139354
##   20     1.667764  0.4959363  1.127674
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 3.
summary(model_pls_no_cor)
## Data:    X dimension: 141 49 
##  Y dimension: 141 1
## Fit method: oscorespls
## Number of components considered: 3
## TRAINING: % variance explained
##           1 comps  2 comps  3 comps
## X           15.42    27.72    35.52
## .outcome    48.31    59.06    65.74
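The PLS model above was selected on `Rsquared` rather than the default RMSE. A quick check, using the first five rows copied from the resampling table printed above, suggests both criteria point to the same number of components in this run; the remaining rows have higher RMSE and lower R-squared than row 3, so they are omitted from the sketch.

```r
# First five rows of the PLS resampling table printed above.
pls_cv <- data.frame(
  ncomp    = 1:5,
  RMSE     = c(1.352559, 1.306717, 1.280462, 1.358211, 1.496745),
  Rsquared = c(0.4237399, 0.4916857, 0.5205842, 0.5082162, 0.4939937)
)

best_by_r2   <- pls_cv$ncomp[which.max(pls_cv$Rsquared)]
best_by_rmse <- pls_cv$ncomp[which.min(pls_cv$RMSE)]

print(c(by_Rsquared = best_by_r2, by_RMSE = best_by_rmse))
```

Since the split is unseeded, a different run could in principle make the two criteria disagree, in which case the choice of `metric` would change the selected model.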
# Elastic net
# In the run shown below, the selected model had fraction = 0.2 and lambda = 0 (with lambda = 0 the elastic net reduces to the lasso).
enetGrid <- expand.grid(.lambda = c(0, 0.01, .1),
                        .fraction = seq(.05, 1, length = 20))

model_ener_no_cor <- train(x=X_train_no_cor, y=Y_train,
                      method = "enet",
                      tuneGrid = enetGrid,
                      trControl = ctrl,
                      preProc = c("center", "scale"))
model_ener_no_cor 
## Elasticnet 
## 
## 141 samples
##  49 predictor
## 
## Pre-processing: centered (49), scaled (49) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 126, 126, 126, 129, 127, 126, ... 
## Resampling results across tuning parameters:
## 
##   lambda  fraction  RMSE      Rsquared   MAE      
##   0.00    0.05      1.360451  0.5649771  1.0987539
##   0.00    0.10      1.203622  0.5969312  0.9915067
##   0.00    0.15      1.150417  0.6336913  0.9491414
##   0.00    0.20      1.121107  0.6619976  0.9166624
##   0.00    0.25      1.202878  0.6444083  0.9416305
##   0.00    0.30      1.299302  0.6280623  0.9653624
##   0.00    0.35      1.317518  0.6218992  0.9787508
##   0.00    0.40      1.348375  0.6134200  0.9992157
##   0.00    0.45      1.389790  0.6013857  1.0241425
##   0.00    0.50      1.425620  0.5882141  1.0481056
##   0.00    0.55      1.455330  0.5771522  1.0682668
##   0.00    0.60      1.467587  0.5691349  1.0815281
##   0.00    0.65      1.458131  0.5622400  1.0837084
##   0.00    0.70      1.478584  0.5498588  1.1011325
##   0.00    0.75      1.482400  0.5425422  1.1109586
##   0.00    0.80      1.492231  0.5364563  1.1215561
##   0.00    0.85      1.503322  0.5306192  1.1329810
##   0.00    0.90      1.513973  0.5257701  1.1440097
##   0.00    0.95      1.521371  0.5227101  1.1525218
##   0.00    1.00      1.529539  0.5202044  1.1606609
##   0.01    0.05      1.487815  0.5512335  1.2082114
##   0.01    0.10      1.293331  0.5909135  1.0516703
##   0.01    0.15      1.202649  0.6016451  0.9958021
##   0.01    0.20      1.170199  0.6136118  0.9677014
##   0.01    0.25      1.147817  0.6299576  0.9508650
##   0.01    0.30      1.127903  0.6473051  0.9294303
##   0.01    0.35      1.127934  0.6547241  0.9247375
##   0.01    0.40      1.144058  0.6538066  0.9351020
##   0.01    0.45      1.238617  0.6216153  0.9651042
##   0.01    0.50      1.301059  0.6085674  0.9815664
##   0.01    0.55      1.342435  0.6023693  0.9931510
##   0.01    0.60      1.362799  0.5984926  1.0056976
##   0.01    0.65      1.381257  0.5926075  1.0205206
##   0.01    0.70      1.418299  0.5834401  1.0430805
##   0.01    0.75      1.448181  0.5745608  1.0631203
##   0.01    0.80      1.468924  0.5674403  1.0776766
##   0.01    0.85      1.486189  0.5616798  1.0891677
##   0.01    0.90      1.502679  0.5562293  1.1000668
##   0.01    0.95      1.517548  0.5517330  1.1109006
##   0.01    1.00      1.515870  0.5492250  1.1163279
##   0.10    0.05      1.591250  0.5022393  1.2909029
##   0.10    0.10      1.442471  0.5613759  1.1695561
##   0.10    0.15      1.320348  0.5891172  1.0713270
##   0.10    0.20      1.232800  0.5988620  1.0139788
##   0.10    0.25      1.199567  0.6007628  0.9959314
##   0.10    0.30      1.177890  0.6075256  0.9764046
##   0.10    0.35      1.164760  0.6157855  0.9624020
##   0.10    0.40      1.151994  0.6264089  0.9506656
##   0.10    0.45      1.139487  0.6377487  0.9395184
##   0.10    0.50      1.136233  0.6440727  0.9360874
##   0.10    0.55      1.149283  0.6438029  0.9482037
##   0.10    0.60      1.203148  0.6250287  0.9717133
##   0.10    0.65      1.260161  0.6103432  0.9906007
##   0.10    0.70      1.297422  0.6024558  1.0034303
##   0.10    0.75      1.326758  0.5963442  1.0144447
##   0.10    0.80      1.359087  0.5904091  1.0271307
##   0.10    0.85      1.390280  0.5850707  1.0400624
##   0.10    0.90      1.418278  0.5814148  1.0512445
##   0.10    0.95      1.446559  0.5781043  1.0619710
##   0.10    1.00      1.473148  0.5748002  1.0718274
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.2 and lambda = 0.
test_model <- function(modelName, predData, obs = Y_test) {
  # Predict on the held-out predictors; suppressWarnings() silences warnings for
  # this call only, instead of changing the global warn option
  predicted_result <- suppressWarnings(predict(modelName, predData))

  # Collect the observed and predicted values into a data frame, then use
  # the caret function defaultSummary to estimate the test set performance
  DT_model_pred <- data.frame(obs = obs, pred = predicted_result)
  return(defaultSummary(DT_model_pred))
}
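For reference, the three numbers that `defaultSummary` reports for a regression model can be reproduced by hand. The sketch below is a minimal base R version, assuming the Rsquared value is the squared Pearson correlation between observed and predicted values (caret's default). `manual_summary` and the toy vectors are hypothetical names and values used only for illustration.

```r
# Hypothetical helper: recompute RMSE, Rsquared and MAE from observed and
# predicted values, mirroring what caret::defaultSummary reports for regression.
manual_summary <- function(obs, pred) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation (assumed formula)
    MAE      = mean(abs(pred - obs)))       # mean absolute error
}

# Toy values, not taken from the manufacturing data
obs  <- c(40, 41, 39, 42)
pred <- c(41, 41, 38, 44)
res  <- manual_summary(obs, pred)
res
```

Because RMSE squares the errors, a few very poor predictions can inflate it far more than MAE, which is one way to read a large gap between the two on the test set.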
  1. Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?
model_lm1$results[,2:4]
##       RMSE  Rsquared      MAE
## 1 1.598888 0.4552606 1.222505
test_model(model_lm1,X_test)
##         RMSE     Rsquared          MAE 
## 21.796607337  0.002641053  5.433700967
model_lm1_no_cor$results[,2:4]
##       RMSE  Rsquared      MAE
## 1 1.430235 0.4855718 1.112113
test_model(model_lm1_no_cor,X_test)
##         RMSE     Rsquared          MAE 
## 11.478834110  0.006228559  3.648167475
model_pls_no_cor$results[3,2:4]
##       RMSE  Rsquared      MAE
## 3 1.280462 0.5205842 1.026928
test_model(model_pls_no_cor,X_test_no_cor)
##      RMSE  Rsquared       MAE 
## 1.3619794 0.5796786 1.0054176
model_ener_no_cor$results[2,2:4]
##    fraction     RMSE  Rsquared
## 21     0.05 1.487815 0.5512335
test_model(model_ener_no_cor,X_test_no_cor)
##      RMSE  Rsquared       MAE 
## 1.4990011 0.4716679 1.0660539
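Indexing `results` by row position is fragile, because the row order depends on the tuning grid. A more robust option is to join `results` with `bestTune`, so the row for the selected tuning parameters is returned whatever its position. The sketch below uses a small stand-in data frame (three rows copied from the elastic net resampling table above) rather than the fitted model; `results` and `best_tune` are hypothetical stand-ins for `model_ener_no_cor$results` and `model_ener_no_cor$bestTune`.

```r
# Stand-in for model_ener_no_cor$results (three rows from the table above)
results <- data.frame(
  lambda   = c(0.00, 0.00, 0.01),
  fraction = c(0.15, 0.20, 0.05),
  RMSE     = c(1.150417, 1.121107, 1.487815),
  Rsquared = c(0.6336913, 0.6619976, 0.5512335),
  MAE      = c(0.9491414, 0.9166624, 1.2082114)
)

# Stand-in for model_ener_no_cor$bestTune (fraction = 0.2, lambda = 0)
best_tune <- data.frame(fraction = 0.20, lambda = 0.00)

# Keep only the row whose tuning parameters match the selected ones
best_row <- merge(results, best_tune, by = c("lambda", "fraction"))
best_row[, c("RMSE", "Rsquared", "MAE")]
```

With the fitted objects, the same idea would be `merge(model_ener_no_cor$results, model_ener_no_cor$bestTune)`.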
  1. Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list?
model_pls_no_cor$finalModel$coefficients
## , , 1 comps
## 
##                             .outcome
## BiologicalMaterial01    0.0811917468
## BiologicalMaterial03    0.1106421140
## BiologicalMaterial05    0.0622645198
## BiologicalMaterial06    0.1171179762
## BiologicalMaterial07   -0.0257500462
## BiologicalMaterial08    0.0875351619
## BiologicalMaterial09    0.0147886029
## BiologicalMaterial10    0.0415764598
## BiologicalMaterial12    0.0869532281
## ManufacturingProcess01 -0.0222539211
## ManufacturingProcess02 -0.0387653149
## ManufacturingProcess03 -0.0266235258
## ManufacturingProcess04 -0.0641538334
## ManufacturingProcess05  0.0281483292
## ManufacturingProcess06  0.0979870252
## ManufacturingProcess07 -0.0091753827
## ManufacturingProcess08 -0.0002137590
## ManufacturingProcess09  0.1193612782
## ManufacturingProcess10  0.0608774544
## ManufacturingProcess12  0.1007042305
## ManufacturingProcess13 -0.1353524933
## ManufacturingProcess14 -0.0107695509
## ManufacturingProcess15  0.0440265789
## ManufacturingProcess16 -0.0109265287
## ManufacturingProcess17 -0.1141015834
## ManufacturingProcess18 -0.0326011136
## ManufacturingProcess19  0.0287368824
## ManufacturingProcess21 -0.0074632652
## ManufacturingProcess22  0.0110370864
## ManufacturingProcess23 -0.0247043182
## ManufacturingProcess24 -0.0570364481
## ManufacturingProcess25 -0.0200951042
## ManufacturingProcess27 -0.0367147669
## ManufacturingProcess28  0.0571427772
## ManufacturingProcess29  0.0866245715
## ManufacturingProcess30  0.0720875065
## ManufacturingProcess31 -0.0820753867
## ManufacturingProcess32  0.1681311671
## ManufacturingProcess33  0.1143910170
## ManufacturingProcess34  0.0466278258
## ManufacturingProcess35 -0.0486929273
## ManufacturingProcess36 -0.1534944068
## ManufacturingProcess37 -0.0379425395
## ManufacturingProcess38 -0.0218278231
## ManufacturingProcess39  0.0135977527
## ManufacturingProcess41  0.0008307844
## ManufacturingProcess43  0.0459291539
## ManufacturingProcess44  0.0196029935
## ManufacturingProcess45  0.0099836621
## 
## , , 2 comps
## 
##                            .outcome
## BiologicalMaterial01    0.038728848
## BiologicalMaterial03    0.125294125
## BiologicalMaterial05    0.045689593
## BiologicalMaterial06    0.101309761
## BiologicalMaterial07   -0.068979671
## BiologicalMaterial08    0.039333790
## BiologicalMaterial09   -0.011806305
## BiologicalMaterial10   -0.011027118
## BiologicalMaterial12    0.045596186
## ManufacturingProcess01  0.011953281
## ManufacturingProcess02  0.026584953
## ManufacturingProcess03 -0.028776611
## ManufacturingProcess04 -0.005191144
## ManufacturingProcess05 -0.003271607
## ManufacturingProcess06  0.140708135
## ManufacturingProcess07 -0.012400682
## ManufacturingProcess08  0.023415046
## ManufacturingProcess09  0.188809339
## ManufacturingProcess10  0.060623045
## ManufacturingProcess12  0.132390227
## ManufacturingProcess13 -0.225619811
## ManufacturingProcess14 -0.021704397
## ManufacturingProcess15  0.042825857
## ManufacturingProcess16 -0.016027994
## ManufacturingProcess17 -0.221037575
## ManufacturingProcess18 -0.073063809
## ManufacturingProcess19 -0.005623751
## ManufacturingProcess21 -0.064868419
## ManufacturingProcess22  0.030744952
## ManufacturingProcess23 -0.019827781
## ManufacturingProcess24 -0.051491311
## ManufacturingProcess25 -0.057138217
## ManufacturingProcess27 -0.093870931
## ManufacturingProcess28 -0.010603997
## ManufacturingProcess29  0.069599750
## ManufacturingProcess30  0.097386951
## ManufacturingProcess31 -0.075013646
## ManufacturingProcess32  0.254288589
## ManufacturingProcess33  0.144151279
## ManufacturingProcess34  0.110326436
## ManufacturingProcess35 -0.054750386
## ManufacturingProcess36 -0.222137286
## ManufacturingProcess37 -0.102647668
## ManufacturingProcess38 -0.018903746
## ManufacturingProcess39  0.061484189
## ManufacturingProcess41 -0.016901673
## ManufacturingProcess43  0.028799213
## ManufacturingProcess44  0.069634004
## ManufacturingProcess45  0.051078205
## 
## , , 3 comps
## 
##                            .outcome
## BiologicalMaterial01    0.016519671
## BiologicalMaterial03    0.138197492
## BiologicalMaterial05    0.061528275
## BiologicalMaterial06    0.097940280
## BiologicalMaterial07   -0.129465235
## BiologicalMaterial08    0.006614475
## BiologicalMaterial09   -0.061456843
## BiologicalMaterial10   -0.041986656
## BiologicalMaterial12    0.013111877
## ManufacturingProcess01  0.033499363
## ManufacturingProcess02  0.044413703
## ManufacturingProcess03 -0.011359615
## ManufacturingProcess04  0.086305210
## ManufacturingProcess05 -0.029540856
## ManufacturingProcess06  0.149073712
## ManufacturingProcess07 -0.033639851
## ManufacturingProcess08  0.037474073
## ManufacturingProcess09  0.196218539
## ManufacturingProcess10  0.017759283
## ManufacturingProcess12  0.091342167
## ManufacturingProcess13 -0.244466778
## ManufacturingProcess14  0.030057896
## ManufacturingProcess15  0.097093565
## ManufacturingProcess16 -0.032197078
## ManufacturingProcess17 -0.249101118
## ManufacturingProcess18 -0.019991511
## ManufacturingProcess19  0.052289701
## ManufacturingProcess21 -0.086605459
## ManufacturingProcess22  0.042136105
## ManufacturingProcess23 -0.021173787
## ManufacturingProcess24 -0.043474516
## ManufacturingProcess25  0.008451286
## ManufacturingProcess27 -0.072953203
## ManufacturingProcess28 -0.093560592
## ManufacturingProcess29  0.118542636
## ManufacturingProcess30  0.061625205
## ManufacturingProcess31 -0.043179763
## ManufacturingProcess32  0.389278924
## ManufacturingProcess33  0.197038942
## ManufacturingProcess34  0.186939703
## ManufacturingProcess35 -0.048284180
## ManufacturingProcess36 -0.316363155
## ManufacturingProcess37 -0.179202964
## ManufacturingProcess38 -0.014733380
## ManufacturingProcess39  0.135910129
## ManufacturingProcess41 -0.046145686
## ManufacturingProcess43  0.027099839
## ManufacturingProcess44  0.135790131
## ManufacturingProcess45  0.113996708
# It appears that the ManufacturingProcess predictors are more important. Alternatively, the varImp function can be used to rank the importance of predictors:
varImp(model_ener_no_cor)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 49)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   88.93
## ManufacturingProcess36   83.35
## ManufacturingProcess17   76.17
## BiologicalMaterial06     69.06
## BiologicalMaterial03     61.18
## BiologicalMaterial12     55.41
## ManufacturingProcess09   54.30
## ManufacturingProcess31   48.38
## ManufacturingProcess06   46.80
## ManufacturingProcess33   46.29
## ManufacturingProcess12   35.88
## BiologicalMaterial08     33.97
## BiologicalMaterial09     33.38
## ManufacturingProcess27   32.82
## ManufacturingProcess18   30.36
## ManufacturingProcess29   26.55
## ManufacturingProcess25   25.90
## BiologicalMaterial01     25.46
## ManufacturingProcess01   24.56
varImp(model_pls_no_cor)
## Warning: package 'pls' was built under R version 3.5.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:stats':
## 
##     loadings
## pls variable importance
## 
##   only 20 most important variables shown (out of 49)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess36   87.83
## ManufacturingProcess13   75.86
## ManufacturingProcess17   67.67
## ManufacturingProcess09   64.68
## ManufacturingProcess33   60.89
## BiologicalMaterial06     57.25
## BiologicalMaterial03     54.53
## ManufacturingProcess12   53.39
## ManufacturingProcess06   50.99
## BiologicalMaterial08     48.03
## BiologicalMaterial12     46.95
## ManufacturingProcess29   45.21
## BiologicalMaterial01     43.46
## ManufacturingProcess04   41.56
## ManufacturingProcess31   40.61
## ManufacturingProcess28   38.42
## ManufacturingProcess30   37.90
## ManufacturingProcess34   32.22
## BiologicalMaterial05     30.58

Looking only at the 3-component coefficients, the manufacturing process predictors appear to be the most important, since their coefficients are generally larger in magnitude than those of the biological materials. ManufacturingProcess32 has the largest coefficient, 0.389278924.

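The evaluation on the test set suggests that the PLS model is best, with R^2 = 0.5796786. When all of the models are applied to the reduced predictor set (the `no_cor` data, with highly correlated predictors removed), the PLS model has the lowest test RMSE and the highest test Rsquared of the models compared. Train (resampled): RMSE: 1.280462, Rsquared: 0.5205842. Test: RMSE: 1.3619794, Rsquared: 0.5796786. Note that the elastic net row printed above (lambda = 0.01, fraction = 0.05) is not the tuned setting; the selected setting (fraction = 0.2, lambda = 0) has a resampled RMSE of 1.121107, although its test RMSE (1.4990011) is still higher than that of PLS.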

14 of the 20 predictors in each variable importance list are ManufacturingProcess predictors, so the process predictors dominate the BiologicalMaterial predictors.
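The split between the two predictor groups can be tallied directly from the predictor names. The sketch below hard-codes the 20 names from the PLS importance list above. With the fitted model available, the names could instead be taken from the row names of `varImp(model_pls_no_cor)$importance` after sorting by `Overall`; that is an assumption about how one might extract them, not code from the original analysis.

```r
# The 20 predictors shown in the PLS variable importance output, in order
top20 <- c("ManufacturingProcess32", "ManufacturingProcess36",
           "ManufacturingProcess13", "ManufacturingProcess17",
           "ManufacturingProcess09", "ManufacturingProcess33",
           "BiologicalMaterial06",   "BiologicalMaterial03",
           "ManufacturingProcess12", "ManufacturingProcess06",
           "BiologicalMaterial08",   "BiologicalMaterial12",
           "ManufacturingProcess29", "BiologicalMaterial01",
           "ManufacturingProcess04", "ManufacturingProcess31",
           "ManufacturingProcess28", "ManufacturingProcess30",
           "ManufacturingProcess34", "BiologicalMaterial05")

# Strip the trailing digits to recover the predictor group, then count
group  <- sub("[0-9]+$", "", top20)
counts <- table(group)
counts
```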

  1. Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?

Elastic net is a linear regression model, so its non-zero coefficients directly describe how each predictor is associated with the target. Increasing a predictor with a positive coefficient is associated with a higher yield, while increasing one with a negative coefficient is associated with a lower yield.

coeffs <- elasticnet::predict.enet(model_ener_no_cor$finalModel, s=model_ener_no_cor$bestTune[1, "fraction"], type="coef", mode="fraction")$coefficients

# We can compare the non-zero coefficients by taking their absolute value, and then sorting them:
coeffs.sorted <- abs(coeffs) 
coeffs.sorted <- coeffs.sorted[coeffs.sorted>0]
(coeffs.sorted <- sort(coeffs.sorted, decreasing = TRUE))
## ManufacturingProcess32 ManufacturingProcess09 ManufacturingProcess13 
##           0.9154689661           0.3505546312           0.2680540630 
## ManufacturingProcess17 ManufacturingProcess28 ManufacturingProcess29 
##           0.2504668291           0.2135184795           0.2047236372 
## ManufacturingProcess39   BiologicalMaterial05 ManufacturingProcess04 
##           0.2039668257           0.1604384955           0.1570178065 
## ManufacturingProcess37 ManufacturingProcess34   BiologicalMaterial03 
##           0.1483763891           0.1451908440           0.1318903648 
## ManufacturingProcess45 ManufacturingProcess36 ManufacturingProcess07 
##           0.0991057249           0.0748144128           0.0618134692 
## ManufacturingProcess35 ManufacturingProcess03 ManufacturingProcess06 
##           0.0594179081           0.0408640219           0.0226108232 
## ManufacturingProcess15 ManufacturingProcess01   BiologicalMaterial12 
##           0.0217963478           0.0167725117           0.0128089987 
##   BiologicalMaterial10   BiologicalMaterial07 ManufacturingProcess44 
##           0.0101762033           0.0066443935           0.0004514611
coeffs.mp <-  coeffs[names(coeffs.sorted[grep('ManufacturingProcess', names(coeffs.sorted))])] 
coeffs.mp[coeffs.mp>0]
## ManufacturingProcess32 ManufacturingProcess09 ManufacturingProcess29 
##           0.9154689661           0.3505546312           0.2047236372 
## ManufacturingProcess39 ManufacturingProcess04 ManufacturingProcess34 
##           0.2039668257           0.1570178065           0.1451908440 
## ManufacturingProcess45 ManufacturingProcess06 ManufacturingProcess15 
##           0.0991057249           0.0226108232           0.0217963478 
## ManufacturingProcess01 ManufacturingProcess44 
##           0.0167725117           0.0004514611
coeffs.mp[coeffs.mp<0]
## ManufacturingProcess13 ManufacturingProcess17 ManufacturingProcess28 
##            -0.26805406            -0.25046683            -0.21351848 
## ManufacturingProcess37 ManufacturingProcess36 ManufacturingProcess07 
##            -0.14837639            -0.07481441            -0.06181347 
## ManufacturingProcess35 ManufacturingProcess03 
##            -0.05941791            -0.04086402

For the ManufacturingProcess predictors with negative coefficients, we would adjust the process to lower their values, since increasing them is associated with a decrease in yield. Similarly, raising the ManufacturingProcess predictors with positive coefficients should help increase the yield.
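Because the model is linear, the predicted change in yield from adjusting several predictors is the sum of each coefficient times the change in that predictor. The sketch below works this through with four coefficients copied from the output above. The shifts in `delta_x` are hypothetical, and they are assumed to be expressed on whatever scale the predictors had when the model was fit (for example, standardized units if the predictors were centered and scaled).

```r
# Signed elastic net coefficients copied from the output above
coefs <- c(ManufacturingProcess32 =  0.9154689661,
           ManufacturingProcess09 =  0.3505546312,
           ManufacturingProcess13 = -0.2680540630,
           ManufacturingProcess17 = -0.2504668291)

# Hypothetical adjustments: raise predictors with positive coefficients,
# lower predictors with negative coefficients
delta_x <- c(ManufacturingProcess32 =  0.5,
             ManufacturingProcess09 =  0.5,
             ManufacturingProcess13 = -0.5,
             ManufacturingProcess17 = -0.5)

# Predicted change in yield = sum of coefficient * change, matched by name
delta_yield <- sum(coefs * delta_x[names(coefs)])
delta_yield
```

Every term in the sum is positive here because each shift has the same sign as its coefficient, which is the pattern the paragraph above recommends.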