$$y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)$$
where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data.
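To make the formula concrete, here is a minimal base-R sketch of the same simulation. The helper name `simulate_friedman1`, the seed, and the sample sizes are assumptions chosen for illustration; they are not taken from the original analysis, which relies on `mlbench.friedman1`.

```r
# Base-R sketch of the simulation described above.
# simulate_friedman1 is a hypothetical helper name, not part of mlbench.
simulate_friedman1 <- function(n, sd = 1) {
  # 10 uniform predictors: X1-X5 are informative, X6-X10 are pure noise
  x <- matrix(runif(n * 10), ncol = 10)
  colnames(x) <- paste0("X", 1:10)
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
  # Add Gaussian noise with standard deviation sd
  y <- signal + rnorm(n, mean = 0, sd = sd)
  list(x = as.data.frame(x), y = y)
}

set.seed(200)                                    # assumed seed
trainingData <- simulate_friedman1(200, sd = 1)  # assumed training size
testData <- simulate_friedman1(5000, sd = 1)     # assumed test size
str(trainingData$x)
```

A large test set relative to the training set keeps the test-set error estimates precise, which is useful when comparing several models.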
Tune several models on these data.
Let’s start with the K-Nearest Neighbor (KNN) model. Because the response is continuous, this is a regression problem: KNN predicts the response for a new point as the average response of its k nearest (in Euclidean distance) neighbors in the training set.
## k
## 3 9
## rsquared rmse
## 1 0.6701072 3.07972
The number of neighbors that gives the smallest resampled root mean squared error for the KNN model is k = 9, with RMSE = 3.08 and R² = 0.67. The top 5 informative predictors are X4, X1, X2, X5, and X3. The model's test-set performance follows.
## RMSE Rsquared MAE
## 3.1172319 0.6556622 2.4899907
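The tuning code for the KNN model is not shown in this write-up. As an illustration of the idea only, the following base-R sketch predicts by averaging the k nearest responses and picks k by the smallest cross-validated RMSE. The helper names `knn_predict` and `cv_rmse`, the toy data, and the grid of k values are assumptions; the sketch pools squared errors over folds and does no centering or scaling, so it will not reproduce the numbers above.

```r
# knn_predict and cv_rmse are hypothetical helper names for illustration.
knn_predict <- function(train_x, train_y, new_x, k) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  apply(new_x, 1, function(row) {
    # Euclidean distance from this point to every training point
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    # Regression prediction: mean response of the k closest training points
    mean(train_y[order(d)[seq_len(k)]])
  })
}

cv_rmse <- function(x, y, k, fold_id) {
  sq_err <- numeric(nrow(x))
  for (f in unique(fold_id)) {
    held_out <- fold_id == f
    pred <- knn_predict(x[!held_out, , drop = FALSE], y[!held_out],
                        x[held_out, , drop = FALSE], k)
    sq_err[held_out] <- (y[held_out] - pred)^2
  }
  sqrt(mean(sq_err))  # RMSE pooled over all held-out points
}

set.seed(1)
x <- matrix(runif(200 * 3), ncol = 3)        # toy predictors
y <- 10 * x[, 1] + 5 * x[, 2] + rnorm(200)   # toy response
fold_id <- sample(rep(1:10, length.out = nrow(x)))  # same folds for every k

k_grid <- c(1, 3, 5, 7, 9, 11)
rmse_by_k <- sapply(k_grid, function(k) cv_rmse(x, y, k, fold_id))
best_k <- k_grid[which.min(rmse_by_k)]
data.frame(k = k_grid, rmse = rmse_by_k)
best_k
```

Using the same fold assignment for every candidate k keeps the comparison fair, since each value is scored on identical held-out points.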
## sigma C
## 6 0.06509124 8
## rsquared rmse
## 1 0.8547582 1.918711
RMSE is used to select the optimal model using the smallest value. For the SVM model, the cost value with the smallest resampled root mean squared error is C = 8; the tuning parameter ‘sigma’ was held constant at a value of 0.065. This model has RMSE = 1.92 and R² = 0.855, the lowest error among the cost values tried. The top 5 informative predictors are X4, X1, X2, X5, and X3.
#plot(svmModel)
## nprune degree
## 43 15 2
## rsquared rmse
## 1 0.8841742 1.639073
RMSE was used to select the optimal model using the smallest value. For the MARS model, the smallest resampled root mean squared error is obtained with 2 degrees of interaction and 15 retained terms. This model has RMSE = 1.64 and R² = 0.884, the lowest error among the parameter combinations tried. The top 5 informative predictors are X1, X4, X2, X5, and X3.
## size decay bag
## 25 5 0.1 FALSE
## rsquared rmse
## 1 0.5644107 4.484215
RMSE is used to select the optimal model using the smallest value. For the NNET model, the smallest resampled root mean squared error is obtained with 5 units in the hidden layer, a weight decay (the regularization parameter that guards against over-fitting) of 0.1, and no bagging (bag = FALSE). This model has RMSE = 4.48 and R² = 0.564, the lowest error among the parameter combinations tried. The top 5 informative predictors are X4, X1, X2, X5, and X3.
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
## RMSE Rsquared MAE
## KNN 3.117232 0.6556622 2.489991
## SVM 2.063191 0.8275736 1.566221
## MARS 1.158995 0.9460418 0.925023
## NNET 2.111396 0.8277556 1.573901
The results above suggest that the MARS model performs best. On the test data it has the smallest error of the four models (RMSE = 1.16) and the largest R² (0.946). SVM and NNET perform similarly to each other (RMSE = 2.06 and 2.11), and KNN is the weakest (RMSE = 3.12). The top five MARS predictors are X1–X5, so MARS does select the informative predictors. The Multivariate Adaptive Regression Splines model therefore predicts the test data better than the K-Nearest Neighbors, Support Vector Machine, and Neural Network models.
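The three columns in the comparison above can be computed from observed and predicted values in a few lines. The sketch below uses a hypothetical helper `regression_metrics` and made-up numbers; it computes R² as the squared correlation between observed and predicted values, which is the default convention in caret's `postResample`.

```r
# regression_metrics is a hypothetical helper mirroring the RMSE, Rsquared,
# and MAE columns; the observed and predicted values below are made up.
regression_metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((obs - pred)^2)),   # root mean squared error
    Rsquared = cor(obs, pred)^2,         # squared correlation
    MAE = mean(abs(obs - pred)))         # mean absolute error
}

obs <- c(3.0, 5.0, 7.5, 10.0)
pred <- c(2.5, 5.5, 7.0, 11.0)
m <- regression_metrics(pred, obs)
m
```

RMSE penalizes large errors more heavily than MAE does, so a model with a few large misses can rank differently under the two metrics.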
| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 1 | 176 | 40.1765341 | 1.8456664 | 39.970 | 40.1150000 | 1.9718580 | 35.250 | 46.340 | 11.090 | 0.3109596 | -0.1132944 | 0.1391223 |
| BiologicalMaterial01 | 2 | 176 | 6.4114205 | 0.7139225 | 6.305 | 6.3933803 | 0.6745830 | 4.580 | 8.810 | 4.230 | 0.2733165 | 0.4567758 | 0.0538139 |
| BiologicalMaterial02 | 3 | 176 | 55.6887500 | 4.0345806 | 55.090 | 55.5810563 | 4.5812340 | 46.870 | 64.750 | 17.880 | 0.2441269 | -0.7050911 | 0.3041180 |
| BiologicalMaterial03 | 4 | 176 | 67.7050000 | 4.0010641 | 67.220 | 67.6780986 | 4.2773010 | 56.970 | 78.250 | 21.280 | 0.0285108 | -0.1235203 | 0.3015916 |
| BiologicalMaterial04 | 5 | 176 | 12.3492614 | 1.7746607 | 12.100 | 12.1860563 | 1.3714050 | 9.380 | 23.090 | 13.710 | 1.7323153 | 7.0564614 | 0.1337701 |
| BiologicalMaterial05 | 6 | 176 | 18.5986364 | 1.8441408 | 18.490 | 18.5488732 | 1.8829020 | 13.240 | 24.850 | 11.610 | 0.3040053 | 0.2198005 | 0.1390073 |
| BiologicalMaterial06 | 7 | 176 | 48.9103977 | 3.7460718 | 48.460 | 48.7379577 | 3.9437160 | 40.600 | 59.380 | 18.780 | 0.3685344 | -0.3654933 | 0.2823708 |
| BiologicalMaterial07 | 8 | 176 | 100.0141477 | 0.1077423 | 100.000 | 100.0000000 | 0.0000000 | 100.000 | 100.830 | 0.830 | 7.3986642 | 53.0417012 | 0.0081214 |
| BiologicalMaterial08 | 9 | 176 | 17.4947727 | 0.6769536 | 17.510 | 17.4687324 | 0.5930400 | 15.880 | 19.140 | 3.260 | 0.2200539 | 0.0627721 | 0.0510273 |
| BiologicalMaterial09 | 10 | 176 | 12.8500568 | 0.4151757 | 12.835 | 12.8635211 | 0.4225410 | 11.440 | 14.080 | 2.640 | -0.2684177 | 0.2927765 | 0.0312950 |
| BiologicalMaterial10 | 11 | 176 | 2.8006250 | 0.5991433 | 2.710 | 2.7328873 | 0.4003020 | 1.770 | 6.870 | 5.100 | 2.4023783 | 11.6471845 | 0.0451621 |
| BiologicalMaterial11 | 12 | 176 | 146.9531818 | 4.8204704 | 146.080 | 146.7863380 | 4.1142150 | 135.810 | 158.730 | 22.920 | 0.3588211 | 0.0162456 | 0.3633566 |
| BiologicalMaterial12 | 13 | 176 | 20.1998864 | 0.7735440 | 20.120 | 20.1776056 | 0.6671700 | 18.350 | 22.210 | 3.860 | 0.3038443 | 0.0146595 | 0.0583081 |
| ManufacturingProcess01 | 14 | 175 | 11.2074286 | 1.8224342 | 11.400 | 11.4078014 | 1.0378200 | 0.000 | 14.100 | 14.100 | -3.9201855 | 21.8688069 | 0.1377631 |
| ManufacturingProcess02 | 15 | 173 | 16.6826590 | 8.4715694 | 21.000 | 18.0575540 | 1.4826000 | 0.000 | 22.500 | 22.500 | -1.4307675 | 0.1062466 | 0.6440815 |
| ManufacturingProcess03 | 16 | 161 | 1.5395652 | 0.0223983 | 1.540 | 1.5410078 | 0.0148260 | 1.470 | 1.600 | 0.130 | -0.4799447 | 1.7280557 | 0.0017652 |
| ManufacturingProcess04 | 17 | 175 | 931.8514286 | 6.2744406 | 934.000 | 932.2836879 | 5.9304000 | 911.000 | 946.000 | 35.000 | -0.6979357 | 0.0631282 | 0.4743031 |
| ManufacturingProcess05 | 18 | 175 | 1001.6931429 | 30.5272134 | 999.200 | 998.6248227 | 17.3464200 | 923.000 | 1175.300 | 252.300 | 2.5872769 | 11.7446904 | 2.3076404 |
| ManufacturingProcess06 | 19 | 174 | 207.4017241 | 2.6993999 | 206.800 | 207.0928571 | 1.9273800 | 203.000 | 227.400 | 24.400 | 3.0419007 | 17.3764864 | 0.2046410 |
| ManufacturingProcess07 | 20 | 175 | 177.4800000 | 0.5010334 | 177.000 | 177.4751773 | 0.0000000 | 177.000 | 178.000 | 1.000 | 0.0793788 | -2.0050587 | 0.0378746 |
| ManufacturingProcess08 | 21 | 175 | 177.5542857 | 0.4984706 | 178.000 | 177.5673759 | 0.0000000 | 177.000 | 178.000 | 1.000 | -0.2165645 | -1.9642262 | 0.0376808 |
| ManufacturingProcess09 | 22 | 176 | 45.6601136 | 1.5464407 | 45.730 | 45.7188732 | 1.2157320 | 38.890 | 49.360 | 10.470 | -0.9406685 | 3.2701986 | 0.1165674 |
| ManufacturingProcess10 | 23 | 167 | 9.1790419 | 0.7666884 | 9.100 | 9.1318519 | 0.5930400 | 7.500 | 11.600 | 4.100 | 0.6492504 | 0.6317264 | 0.0593281 |
| ManufacturingProcess11 | 24 | 166 | 9.3855422 | 0.7157336 | 9.400 | 9.3932836 | 0.6671700 | 7.500 | 11.500 | 4.000 | -0.0193109 | 0.3227966 | 0.0555517 |
| ManufacturingProcess12 | 25 | 175 | 857.8114286 | 1784.5282624 | 0.000 | 516.1985816 | 0.0000000 | 0.000 | 4549.000 | 4549.000 | 1.5786729 | 0.4951353 | 134.8976569 |
| ManufacturingProcess13 | 26 | 176 | 34.5079545 | 1.0152800 | 34.600 | 34.5119718 | 0.8895600 | 32.100 | 38.600 | 6.500 | 0.4802776 | 1.9593883 | 0.0765296 |
| ManufacturingProcess14 | 27 | 175 | 4853.8685714 | 54.5236412 | 4856.000 | 4854.5744681 | 40.0302000 | 4701.000 | 5055.000 | 354.000 | -0.0109687 | 1.0781378 | 4.1215999 |
| ManufacturingProcess15 | 28 | 176 | 6038.9204545 | 58.3125023 | 6031.500 | 6035.5211268 | 40.7715000 | 5904.000 | 6233.000 | 329.000 | 0.6743478 | 1.2162163 | 4.3954702 |
| ManufacturingProcess16 | 29 | 176 | 4565.8011364 | 351.6973215 | 4588.000 | 4588.3591549 | 42.9954000 | 0.000 | 4852.000 | 4852.000 | -12.4202248 | 158.3981993 | 26.5101831 |
| ManufacturingProcess17 | 30 | 176 | 34.3437500 | 1.2482059 | 34.400 | 34.3126761 | 1.1860800 | 31.300 | 40.000 | 8.700 | 1.1629715 | 4.6626982 | 0.0940871 |
| ManufacturingProcess18 | 31 | 176 | 4809.6818182 | 367.4777364 | 4835.000 | 4837.0704225 | 34.8411000 | 0.000 | 4971.000 | 4971.000 | -12.7361378 | 163.7375845 | 27.6996766 |
| ManufacturingProcess19 | 32 | 176 | 6028.1988636 | 45.5785689 | 6022.000 | 6026.1549296 | 36.3237000 | 5890.000 | 6146.000 | 256.000 | 0.2973414 | 0.2962151 | 3.4356139 |
| ManufacturingProcess20 | 33 | 176 | 4556.4602273 | 349.0089784 | 4582.000 | 4580.9788732 | 42.9954000 | 0.000 | 4759.000 | 4759.000 | -12.6383268 | 162.0663905 | 26.3075416 |
| ManufacturingProcess21 | 34 | 176 | -0.1642045 | 0.7782930 | -0.300 | -0.2556338 | 0.4447800 | -1.800 | 3.600 | 5.400 | 1.7291140 | 5.0274763 | 0.0586660 |
| ManufacturingProcess22 | 35 | 175 | 5.4057143 | 3.3306262 | 5.000 | 5.2482270 | 4.4478000 | 0.000 | 12.000 | 12.000 | 0.3148909 | -1.0175458 | 0.2517717 |
| ManufacturingProcess23 | 36 | 175 | 3.0171429 | 1.6625499 | 3.000 | 2.9432624 | 1.4826000 | 0.000 | 6.000 | 6.000 | 0.1967985 | -0.9975572 | 0.1256770 |
| ManufacturingProcess24 | 37 | 175 | 8.8342857 | 5.7994224 | 8.000 | 8.5744681 | 7.4130000 | 0.000 | 23.000 | 23.000 | 0.3593200 | -1.0207362 | 0.4383951 |
| ManufacturingProcess25 | 38 | 171 | 4828.1754386 | 373.4810865 | 4855.000 | 4855.5620438 | 34.0998000 | 0.000 | 4990.000 | 4990.000 | -12.6310220 | 160.3293620 | 28.5608125 |
| ManufacturingProcess26 | 39 | 171 | 6015.5964912 | 464.8674900 | 6047.000 | 6048.5547445 | 38.5476000 | 0.000 | 6161.000 | 6161.000 | -12.6694398 | 160.9849144 | 35.5493055 |
| ManufacturingProcess27 | 40 | 171 | 4562.5087719 | 353.9848679 | 4587.000 | 4587.4452555 | 35.5824000 | 0.000 | 4710.000 | 4710.000 | -12.5174778 | 158.3931091 | 27.0698994 |
| ManufacturingProcess28 | 41 | 171 | 6.5918129 | 5.2489823 | 10.400 | 6.8248175 | 1.0378200 | 0.000 | 11.500 | 11.500 | -0.4556335 | -1.7907822 | 0.4013997 |
| ManufacturingProcess29 | 42 | 171 | 20.0111111 | 1.6638879 | 19.900 | 20.0437956 | 0.4447800 | 0.000 | 22.000 | 22.000 | -10.0848133 | 119.4378857 | 0.1272407 |
| ManufacturingProcess30 | 43 | 171 | 9.1614035 | 0.9760824 | 9.100 | 9.2145985 | 0.7413000 | 0.000 | 11.200 | 11.200 | -4.7557268 | 43.0848842 | 0.0746429 |
| ManufacturingProcess31 | 44 | 171 | 70.1847953 | 5.5557816 | 70.800 | 70.7240876 | 0.8895600 | 0.000 | 72.500 | 72.500 | -11.8231008 | 146.0094297 | 0.4248612 |
| ManufacturingProcess32 | 45 | 176 | 158.4659091 | 5.3972456 | 158.000 | 158.3380282 | 4.4478000 | 143.000 | 173.000 | 30.000 | 0.2112252 | 0.0602714 | 0.4068327 |
| ManufacturingProcess33 | 46 | 171 | 63.5438596 | 2.4833813 | 64.000 | 63.5474453 | 1.4826000 | 56.000 | 70.000 | 14.000 | -0.1310030 | 0.2740324 | 0.1899089 |
| ManufacturingProcess34 | 47 | 171 | 2.4935673 | 0.0543910 | 2.500 | 2.4927007 | 0.0000000 | 2.300 | 2.600 | 0.300 | -0.2634497 | 1.0013075 | 0.0041594 |
| ManufacturingProcess35 | 48 | 171 | 495.5964912 | 10.8196874 | 495.000 | 495.7445255 | 8.8956000 | 463.000 | 522.000 | 59.000 | -0.1556154 | 0.4130958 | 0.8274022 |
| ManufacturingProcess36 | 49 | 171 | 0.0195731 | 0.0008739 | 0.020 | 0.0195620 | 0.0014826 | 0.017 | 0.022 | 0.005 | 0.1453141 | -0.0557822 | 0.0000668 |
| ManufacturingProcess37 | 50 | 176 | 1.0136364 | 0.4450828 | 1.000 | 0.9964789 | 0.4447800 | 0.000 | 2.300 | 2.300 | 0.3783578 | 0.0698597 | 0.0335494 |
| ManufacturingProcess38 | 51 | 176 | 2.5340909 | 0.6493753 | 3.000 | 2.6126761 | 0.0000000 | 0.000 | 3.000 | 3.000 | -1.6818052 | 3.9189211 | 0.0489485 |
| ManufacturingProcess39 | 52 | 176 | 6.8511364 | 1.5054943 | 7.200 | 7.1718310 | 0.1482600 | 0.000 | 7.500 | 7.500 | -4.2691214 | 16.4987895 | 0.1134809 |
| ManufacturingProcess40 | 53 | 175 | 0.0177143 | 0.0382885 | 0.000 | 0.0099291 | 0.0000000 | 0.000 | 0.100 | 0.100 | 1.6768073 | 0.8164458 | 0.0028943 |
| ManufacturingProcess41 | 54 | 175 | 0.0237143 | 0.0538242 | 0.000 | 0.0106383 | 0.0000000 | 0.000 | 0.200 | 0.200 | 2.1686898 | 3.6290714 | 0.0040687 |
| ManufacturingProcess42 | 55 | 176 | 11.2062500 | 1.9416092 | 11.600 | 11.5429577 | 0.2965200 | 0.000 | 12.100 | 12.100 | -5.4500082 | 28.5288867 | 0.1463543 |
| ManufacturingProcess43 | 56 | 176 | 0.9119318 | 0.8679860 | 0.800 | 0.8077465 | 0.2965200 | 0.000 | 11.000 | 11.000 | 9.0548747 | 101.0332345 | 0.0654269 |
| ManufacturingProcess44 | 57 | 176 | 1.8051136 | 0.3220062 | 1.900 | 1.8549296 | 0.1482600 | 0.000 | 2.100 | 2.100 | -4.9703552 | 25.0876065 | 0.0242721 |
| ManufacturingProcess45 | 58 | 176 | 2.1380682 | 0.4069043 | 2.200 | 2.2042254 | 0.1482600 | 0.000 | 2.600 | 2.600 | -4.0779411 | 18.7565001 | 0.0306716 |
| variable | n_miss | pct_miss |
|---|---|---|
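In the summary table, the n column falls below 176 for many of the ManufacturingProcess variables, which indicates missing values. A per-variable missingness summary with the columns named in the header above can be sketched in base R as follows. The toy data frame is made up and stands in for the real data; the original may have used a dedicated package for this step.

```r
# Toy data frame standing in for the process data (values are made up)
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, NA, 3, 4),
                 c = 1:4)

# Count missing values per column and express them as a percentage of rows
n_miss <- colSums(is.na(df))
miss_summary <- data.frame(variable = names(n_miss),
                           n_miss = as.integer(n_miss),
                           pct_miss = 100 * as.numeric(n_miss) / nrow(df))

# Most incomplete variables first
miss_summary <- miss_summary[order(-miss_summary$n_miss), ]
print(miss_summary, row.names = FALSE)
```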
## X1 X2 value
## 1997 ManufacturingProcess26 ManufacturingProcess25 0.9975339
## 2052 ManufacturingProcess25 ManufacturingProcess26 0.9975339
## 2054 ManufacturingProcess27 ManufacturingProcess26 0.9960721
## 2109 ManufacturingProcess26 ManufacturingProcess27 0.9960721
## 1998 ManufacturingProcess27 ManufacturingProcess25 0.9934932
## 2108 ManufacturingProcess25 ManufacturingProcess27 0.9934932
## 1599 ManufacturingProcess20 ManufacturingProcess18 0.9917474
## 1709 ManufacturingProcess18 ManufacturingProcess20 0.9917474
## 2002 ManufacturingProcess31 ManufacturingProcess25 0.9706780
## 2332 ManufacturingProcess25 ManufacturingProcess31 0.9706780
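The listing above shows each highly correlated pair twice, once in each order. A minimal base-R sketch of how such a listing can be produced, keeping each pair only once, is given below. The toy data and variable names are assumptions for illustration.

```r
set.seed(42)
n <- 100
p1 <- rnorm(n)
toy <- data.frame(p1 = p1,
                  p2 = p1 + rnorm(n, sd = 0.05),  # nearly a copy of p1
                  p3 = rnorm(n))                  # unrelated predictor

# Pairwise correlations, tolerating missing values
cor_mat <- cor(toy, use = "pairwise.complete.obs")

# Long format: one row per (row variable, column variable) combination
pairs_long <- data.frame(X1 = rownames(cor_mat)[row(cor_mat)],
                         X2 = colnames(cor_mat)[col(cor_mat)],
                         value = as.vector(cor_mat))

# Keep the upper triangle only: drops the diagonal and the mirrored duplicates
pairs_long <- pairs_long[as.vector(upper.tri(cor_mat)), ]

# Strongest absolute correlations first
pairs_long <- pairs_long[order(-abs(pairs_long$value)), ]
head(pairs_long, 10)
```

Near-duplicate predictors like these carry almost the same information, so one of each pair is a natural candidate for removal before fitting models that are sensitive to collinearity.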
Let’s train some non-linear models next.
## k
## 1 5
## rsquared rmse
## 1 0.4743023 1.355025
The top 5 important variables are ManufacturingProcess13, ManufacturingProcess32, BiologicalMaterial06, ManufacturingProcess17 and BiologicalMaterial03.
The best tune for the KNN model, meaning the one with the smallest resampled root mean squared error, uses 5 nearest neighbors. It has RMSE = 1.36 and R² = 0.47. The residuals are fairly small, and a few of the top predictors have importance scores above 35%.
## sigma C
## 7 0.01328094 16
## rsquared rmse
## 1 0.6573563 1.083869
RMSE is used to select the optimal model using the smallest value. For the SVM model, the cost value with the smallest resampled root mean squared error is C = 16; the tuning parameter ‘sigma’ was held constant at a value of 0.013. This model has RMSE = 1.08 and R² = 0.66, the lowest error among the cost values tried.
The top 5 important variables are ManufacturingProcess13, ManufacturingProcess32, BiologicalMaterial06, ManufacturingProcess17 and BiologicalMaterial03.
## nprune degree
## 4 5 1
## rsquared rmse
## 1 0.5170738 1.317713
RMSE is used to select the optimal model using the smallest value. For the MARS model, the smallest resampled root mean squared error is obtained with 1 degree of interaction (an additive model) and 5 retained terms. This model has RMSE = 1.32 and R² = 0.52, the lowest error among the parameter combinations tried. Only 3 predictors are flagged as informative in this case: ManufacturingProcess32, ManufacturingProcess09 and ManufacturingProcess13.
## size decay bag
## 5 5 0.01 FALSE
## rsquared rmse
## 1 0.3985338 1.68192
RMSE is used to select the optimal model using the smallest value. For the NNET model, the smallest resampled root mean squared error is obtained with 5 units in the hidden layer and a weight decay (the regularization parameter that guards against over-fitting) of 0.01. This model has RMSE = 1.68 and R² = 0.40. That is the lowest error among the parameter combinations tried, even though the model explains only about 40% of the variability in yield.
The top 5 important variables are ManufacturingProcess13, ManufacturingProcess32, BiologicalMaterial06, ManufacturingProcess17 and BiologicalMaterial03.
Using resampling, the performance metrics are calculated below and compared to select the model that best fits the data. The results suggest that the SVM model had the largest mean R² (0.66) across the 10 resamples. It also produced the smallest mean errors (RMSE = 1.08, MAE = 0.86). The SVM model therefore fits the data better than the KNN, MARS, and NNET models.
##
## Call:
## summary.resamples(object = resamples(list(knn = knnModel, svm = svmModel,
## mars = marsModel, nnet = nnetModel)))
##
## Models: knn, svm, mars, nnet
## Number of resamples: 10
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## knn 0.7970000 1.0039667 1.1231429 1.0941885 1.1817286 1.279714 0
## svm 0.5528591 0.7857382 0.9124654 0.8615100 0.9403178 1.068548 0
## mars 0.7289428 0.8619740 0.9237133 0.9751312 1.1126344 1.230238 0
## nnet 0.8064562 1.2567945 1.4186034 1.4077181 1.5926685 1.933445 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## knn 1.0521074 1.266880 1.316138 1.355025 1.476499 1.661273 0
## svm 0.8083948 1.004793 1.072562 1.083869 1.182588 1.435115 0
## mars 0.9865612 1.139639 1.163285 1.215740 1.364162 1.503633 0
## nnet 0.9927290 1.520522 1.705051 1.681920 1.895587 2.148641 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## knn 0.1556778 0.2418591 0.5541821 0.4743023 0.6647513 0.7209630 0
## svm 0.4302169 0.5847018 0.6834006 0.6573563 0.7289782 0.8408634 0
## mars 0.3651698 0.4501465 0.6024610 0.5754682 0.6868517 0.7685032 0
## nnet 0.1631993 0.3515936 0.4320122 0.3985338 0.4704340 0.6250521 0
Now let’s calculate the prediction accuracy on the test data.
## $knn
## RMSE Rsquared MAE
## 1.1490426 0.5268106 0.9010625
##
## $svm
## RMSE Rsquared MAE
## 0.9209008 0.6970918 0.7439161
##
## $mars
## RMSE Rsquared MAE
## 1.0983490 0.5614890 0.8468836
##
## $nnet
## RMSE Rsquared MAE
## 1.9327283 0.3821101 1.3210742
From the results above, we can see that the SVM model predicts the test response with the best accuracy. It has R² = 0.70 and RMSE = 0.92.
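The list of important predictors is shown below. Its header indicates a loess R² score, which is computed for each predictor separately rather than from the fitted SVM; this likely explains why the KNN, SVM, and NNET models report the same top 5 variables.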
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 46)
##
## Overall
## ManufacturingProcess13 100.00
## ManufacturingProcess32 99.64
## BiologicalMaterial06 83.61
## ManufacturingProcess17 78.70
## BiologicalMaterial03 77.88
## ManufacturingProcess36 74.36
## ManufacturingProcess09 69.04
## ManufacturingProcess06 59.92
## ManufacturingProcess33 51.79
## ManufacturingProcess11 46.74
## BiologicalMaterial08 42.75
## BiologicalMaterial11 42.55
## BiologicalMaterial01 42.47
## BiologicalMaterial09 40.49
## ManufacturingProcess02 37.58
## ManufacturingProcess12 36.38
## ManufacturingProcess30 32.79
## ManufacturingProcess20 27.36
## ManufacturingProcess15 23.26
## BiologicalMaterial10 23.22
As can be seen, the process variables dominate the list, with 8 of the top 10 variables and 5 of the next 10. In the previous assignment, we found that the elastic net model provided the best fit among the linear models.
Now, let’s re-check the key predictors for this model.
## glmnet variable importance
##
## only 20 most important variables shown (out of 46)
##
## Overall
## ManufacturingProcess32 0.84511
## ManufacturingProcess09 0.37092
## ManufacturingProcess13 0.34844
## ManufacturingProcess06 0.19245
## ManufacturingProcess15 0.13481
## ManufacturingProcess07 0.12152
## ManufacturingProcess17 0.12095
## ManufacturingProcess39 0.10437
## ManufacturingProcess37 0.10047
## ManufacturingProcess04 0.08981
## ManufacturingProcess34 0.08618
## BiologicalMaterial03 0.08296
## ManufacturingProcess36 0.08243
## BiologicalMaterial05 0.07407
## ManufacturingProcess44 0.05441
## ManufacturingProcess28 0.01030
## ManufacturingProcess19 0.00000
## ManufacturingProcess23 0.00000
## ManufacturingProcess16 0.00000
## BiologicalMaterial10 0.00000
Comparing the top 10 predictors according to the best linear and non-linear models, we see that 5 of them are common to both lists: ManufacturingProcess13, ManufacturingProcess32, ManufacturingProcess17, ManufacturingProcess09 and ManufacturingProcess06. For the remaining 5, the SVM list contains 2 biological variables and 3 process variables, while the elastic net model selects only process variables.
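The overlap between the two top-10 lists can be checked directly with set operations. The sketch below copies the variable names from the two importance listings above; the vector names `svm_top10` and `enet_top10` are chosen here for illustration.

```r
# Top 10 from the loess r-squared listing reported for the SVM model
svm_top10 <- c("ManufacturingProcess13", "ManufacturingProcess32",
               "BiologicalMaterial06", "ManufacturingProcess17",
               "BiologicalMaterial03", "ManufacturingProcess36",
               "ManufacturingProcess09", "ManufacturingProcess06",
               "ManufacturingProcess33", "ManufacturingProcess11")

# Top 10 from the glmnet (elastic net) listing
enet_top10 <- c("ManufacturingProcess32", "ManufacturingProcess09",
                "ManufacturingProcess13", "ManufacturingProcess06",
                "ManufacturingProcess15", "ManufacturingProcess07",
                "ManufacturingProcess17", "ManufacturingProcess39",
                "ManufacturingProcess37", "ManufacturingProcess04")

common <- intersect(svm_top10, enet_top10)     # in both lists
only_svm <- setdiff(svm_top10, enet_top10)     # SVM list only
only_enet <- setdiff(enet_top10, svm_top10)    # elastic net list only

print(common)
print(only_svm)
print(only_enet)
```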
Let’s look at the correlations between the top 10 predictors and the response variable, yield.
## [,1]
## ManufacturingProcess13 -0.5264633
## ManufacturingProcess32 0.6180817
## BiologicalMaterial06 0.5030526
## ManufacturingProcess17 -0.4550352
## BiologicalMaterial03 0.4755898
## ManufacturingProcess36 -0.5339554
## ManufacturingProcess09 0.5020150
## ManufacturingProcess06 0.4467170
## ManufacturingProcess33 0.4456051
## ManufacturingProcess11 0.3520161
From the above we can see that 7 of the top 10 predictors (for example, ManufacturingProcess32) are positively correlated with yield, while 3 (ManufacturingProcess13, ManufacturingProcess17 and ManufacturingProcess36) are negatively correlated with yield. Among the positive correlations, ManufacturingProcess32 is the strongest at 0.62, so higher values of ManufacturingProcess32 tend to go with higher yield. Among the negative correlations, ManufacturingProcess36 (-0.53) and ManufacturingProcess13 (-0.53) are the strongest, so higher values of these variables tend to go with lower yield. A correlation coefficient measures the strength and direction of a linear association; it is not the mean change in yield per additional unit of the predictor, which would be a regression slope.
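To illustrate the difference between a correlation and a per-unit effect, the following base-R sketch computes predictor-response correlations on made-up data and then rescales one predictor. All names and values are assumptions for illustration; the correlation is unchanged by the rescaling, while the regression slope shrinks by the same factor.

```r
set.seed(7)
n <- 150
predictors <- data.frame(up = rnorm(n), down = rnorm(n), noise = rnorm(n))
# Toy response: rises with 'up', falls with 'down'
yield <- 2 * predictors$up - 1.5 * predictors$down + rnorm(n)

# One-column matrix of correlations, in the same layout as the output above
r <- cor(predictors, yield)
r

# Rescale one predictor by 10: correlation stays the same, slope does not
x <- predictors$up
x10 <- 10 * x
slope <- unname(coef(lm(yield ~ x))[2])
slope10 <- unname(coef(lm(yield ~ x10))[2])
c(cor_x = cor(x, yield), cor_x10 = cor(x10, yield),
  slope_x = slope, slope_x10 = slope10)
```

The sign of each correlation gives the direction of the association, while the per-unit change in yield has to be read from a fitted model such as a regression.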