The goal of this report is to better understand ABC Beverage company’s manufacturing process by analyzing the predictive factors and building the best predictive model of PH.
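Distribution of the variables: some are approximately normal, others are skewed, and a few show bimodal peaks. For example, MFR (skew -5.09) and Filler Speed (skew -2.87) are strongly left-skewed, while Oxygen Filler (skew 2.66) is strongly right-skewed.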
## Rows: 2,571
## Columns: 33
## $ `Brand Code` <fct> B, A, B, A, A, A, A, B, B, B, B, B, B, B, B, B, C,…
## $ `Carb Volume` <dbl> 5.340000, 5.426667, 5.286667, 5.440000, 5.486667, …
## $ `Fill Ounces` <dbl> 23.96667, 24.00667, 24.06000, 24.00667, 24.31333, …
## $ `PC Volume` <dbl> 0.2633333, 0.2386667, 0.2633333, 0.2933333, 0.1113…
## $ `Carb Pressure` <dbl> 68.2, 68.4, 70.8, 63.0, 67.2, 66.6, 64.2, 67.6, 64…
## $ `Carb Temp` <dbl> 141.2, 139.6, 144.8, 132.6, 136.8, 138.4, 136.8, 1…
## $ PSC <dbl> 0.104, 0.124, 0.090, NA, 0.026, 0.090, 0.128, 0.15…
## $ `PSC Fill` <dbl> 0.26, 0.22, 0.34, 0.42, 0.16, 0.24, 0.40, 0.34, 0.…
## $ `PSC CO2` <dbl> 0.04, 0.04, 0.16, 0.04, 0.12, 0.04, 0.04, 0.04, 0.…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 118.8, 121.6, 120.2, 115.2, 118.4, 119.6, 122.2, 1…
## $ `Fill Pressure` <dbl> 46.0, 46.0, 46.0, 46.4, 45.8, 45.6, 51.8, 46.8, 46…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4` <dbl> 118, 106, 82, 92, 92, 116, 124, 132, 90, 108, 94, …
## $ `Filler Level` <dbl> 121.2, 118.6, 120.0, 117.8, 118.6, 120.2, 123.4, 1…
## $ `Filler Speed` <dbl> 4002, 3986, 4020, 4012, 4010, 4014, NA, 1004, 4014…
## $ Temperature <dbl> 66.0, 67.6, 67.0, 65.6, 65.6, 66.2, 65.8, 65.2, 65…
## $ `Usage cont` <dbl> 16.18, 19.90, 17.76, 17.42, 17.68, 23.82, 20.74, 1…
## $ `Carb Flow` <dbl> 2932, 3144, 2914, 3062, 3054, 2948, 30, 684, 2902,…
## $ Density <dbl> 0.88, 0.92, 1.58, 1.54, 1.54, 1.52, 0.84, 0.84, 0.…
## $ MFR <dbl> 725.0, 726.8, 735.0, 730.6, 722.8, 738.8, NA, NA, …
## $ Balling <dbl> 1.398, 1.498, 3.142, 3.042, 3.042, 2.992, 1.298, 1…
## $ `Pressure Vacuum` <dbl> -4.0, -4.0, -3.8, -4.4, -4.4, -4.4, -4.4, -4.4, -4…
## $ PH <dbl> 8.36, 8.26, 8.94, 8.24, 8.26, 8.32, 8.40, 8.38, 8.…
## $ `Oxygen Filler` <dbl> 0.022, 0.026, 0.024, 0.030, 0.030, 0.024, 0.066, 0…
## $ `Bowl Setpoint` <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46.4, 46.8, 46.6, 46.0, 46.0, 46.0, 46.0, 46.0, 46…
## $ `Air Pressurer` <dbl> 142.6, 143.0, 142.0, 146.2, 146.2, 146.6, 146.2, 1…
## $ `Alch Rel` <dbl> 6.58, 6.56, 7.66, 7.14, 7.14, 7.16, 6.54, 6.52, 6.…
## $ `Carb Rel` <dbl> 5.32, 5.30, 5.84, 5.42, 5.44, 5.44, 5.38, 5.34, 5.…
## $ `Balling Lvl` <dbl> 1.48, 1.56, 3.28, 3.04, 3.04, 3.02, 1.44, 1.44, 1.…
## n mean sd median min max range skew
## Brand Code* 2451 2.51 1.00 2.00 1.00 4.00 3.00 0.38
## Carb Volume 2561 5.37 0.11 5.35 5.04 5.70 0.66 0.39
## Fill Ounces 2533 23.97 0.09 23.97 23.63 24.32 0.69 -0.02
## PC Volume 2532 0.28 0.06 0.27 0.08 0.48 0.40 0.34
## Carb Pressure 2544 68.19 3.54 68.20 57.00 79.40 22.40 0.18
## Carb Temp 2545 141.09 4.04 140.80 128.60 154.00 25.40 0.25
## PSC 2538 0.08 0.05 0.08 0.00 0.27 0.27 0.85
## PSC Fill 2548 0.20 0.12 0.18 0.00 0.62 0.62 0.93
## PSC CO2 2532 0.06 0.04 0.04 0.00 0.24 0.24 1.73
## Mnf Flow 2569 24.57 119.48 65.20 -100.20 229.40 329.60 0.00
## Carb Pressure1 2539 122.59 4.74 123.20 105.60 140.20 34.60 0.05
## Fill Pressure 2549 47.92 3.18 46.40 34.60 60.40 25.80 0.55
## Hyd Pressure1 2560 12.44 12.43 11.40 -0.80 58.00 58.80 0.78
## Hyd Pressure2 2556 20.96 16.39 28.60 0.00 59.40 59.40 -0.30
## Hyd Pressure3 2556 20.46 15.98 27.60 -1.20 50.00 51.20 -0.32
## Hyd Pressure4 2541 96.29 13.12 96.00 52.00 142.00 90.00 0.55
## Filler Level 2551 109.25 15.70 118.40 55.80 161.20 105.40 -0.85
## Filler Speed 2514 3687.20 770.82 3982.00 998.00 4030.00 3032.00 -2.87
## Temperature 2557 65.97 1.38 65.60 63.60 76.20 12.60 2.39
## Usage cont 2566 20.99 2.98 21.79 12.08 25.90 13.82 -0.54
## Carb Flow 2569 2468.35 1073.70 3028.00 26.00 5104.00 5078.00 -0.99
## Density 2570 1.17 0.38 0.98 0.24 1.92 1.68 0.53
## MFR 2359 704.05 73.90 724.00 31.40 868.60 837.20 -5.09
## Balling 2570 2.20 0.93 1.65 -0.17 4.01 4.18 0.59
## Pressure Vacuum 2571 -5.22 0.57 -5.40 -6.60 -3.60 3.00 0.53
## PH 2567 8.55 0.17 8.54 7.88 9.36 1.48 -0.29
## Oxygen Filler 2559 0.05 0.05 0.03 0.00 0.40 0.40 2.66
## Bowl Setpoint 2569 109.33 15.30 120.00 70.00 140.00 70.00 -0.97
## Pressure Setpoint 2559 47.62 2.04 46.00 44.00 52.00 8.00 0.20
## Air Pressurer 2571 142.83 1.21 142.60 140.80 148.20 7.40 2.25
## Alch Rel 2562 6.90 0.51 6.56 5.28 8.62 3.34 0.88
## Carb Rel 2561 5.44 0.13 5.40 4.96 6.06 1.10 0.50
## Balling Lvl 2570 2.05 0.87 1.48 0.00 3.66 3.66 0.59
## kurtosis
## Brand Code* -1.06
## Carb Volume -0.47
## Fill Ounces 0.86
## PC Volume 0.67
## Carb Pressure -0.01
## Carb Temp 0.24
## PSC 0.65
## PSC Fill 0.77
## PSC CO2 3.73
## Mnf Flow -1.87
## Carb Pressure1 0.14
## Fill Pressure 1.41
## Hyd Pressure1 -0.14
## Hyd Pressure2 -1.56
## Hyd Pressure3 -1.57
## Hyd Pressure4 0.63
## Filler Level 0.05
## Filler Speed 6.71
## Temperature 10.16
## Usage cont -1.02
## Carb Flow -0.58
## Density -1.20
## MFR 30.46
## Balling -1.39
## Pressure Vacuum -0.03
## PH 0.06
## Oxygen Filler 11.09
## Bowl Setpoint -0.06
## Pressure Setpoint -1.60
## Air Pressurer 4.73
## Alch Rel -0.85
## Carb Rel -0.29
## Balling Lvl -1.49
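The skew and kurtosis columns above can be illustrated with a small Python sketch. It uses the plain moment-based definitions; the report's own summary function may apply a slightly different small-sample correction, so treat this as an approximation of the idea rather than a reproduction of the table. The function name and the two toy samples are hypothetical.

```python
import math

def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis, ignoring missing values."""
    xs = [v for v in values if v is not None and not math.isnan(v)]
    n = len(xs)
    mean = sum(xs) / n
    # Central moments of order 2, 3 and 4 (population form, divide by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Hypothetical toy samples: one symmetric, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]

sym_skew, sym_kurt = skew_and_excess_kurtosis(symmetric)
tail_skew, tail_kurt = skew_and_excess_kurtosis(right_tailed)
print(sym_skew, sym_kurt)
print(tail_skew, tail_kurt)
```

A skew near zero indicates a roughly symmetric variable, while large positive or negative values flag the long tails noted above.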
The counts below show the number of missing values in each variable. MFR (212) and Brand Code (120) have the most; the response, PH, is missing in 4 rows.
## Brand Code Carb Volume Fill Ounces PC Volume
## 120 10 38 39
## Carb Pressure Carb Temp PSC PSC Fill
## 27 26 33 23
## PSC CO2 Mnf Flow Carb Pressure1 Fill Pressure
## 39 2 32 22
## Hyd Pressure1 Hyd Pressure2 Hyd Pressure3 Hyd Pressure4
## 11 15 15 30
## Filler Level Filler Speed Temperature Usage cont
## 20 57 14 5
## Carb Flow Density MFR Balling
## 2 1 212 1
## Pressure Vacuum PH Oxygen Filler Bowl Setpoint
## 0 4 12 2
## Pressure Setpoint Air Pressurer Alch Rel Carb Rel
## 12 0 9 10
## Balling Lvl
## 1
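As a minimal sketch of how such per-variable missing counts can be produced, the Python snippet below tallies missing entries column by column. The rows are the first four observations of three variables as shown in the data preview above, with `None` standing in for `NA`; the helper name `count_missing` is hypothetical and this is not the code used in the report.

```python
import math

def count_missing(rows):
    """Count missing entries (None or NaN) for each column name."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            is_missing = val is None or (isinstance(val, float) and math.isnan(val))
            counts[col] = counts.get(col, 0) + int(is_missing)
    return counts

# First four rows of three variables from the preview, NA written as None.
rows = [
    {"PSC": 0.104, "Hyd Pressure2": None, "Filler Speed": 4002.0},
    {"PSC": 0.124, "Hyd Pressure2": None, "Filler Speed": 3986.0},
    {"PSC": 0.090, "Hyd Pressure2": None, "Filler Speed": 4020.0},
    {"PSC": None, "Hyd Pressure2": 0.0, "Filler Speed": 4012.0},
]
missing = count_missing(rows)
print(missing)

# Share of rows missing MFR, using the counts reported above (212 of 2,571).
mfr_missing_pct = round(100 * 212 / 2571, 1)
print(mfr_missing_pct)
```

Even the worst variable, MFR, is missing in under a tenth of the rows, which is consistent with imputing rather than dropping it.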
Some variables are strongly correlated with one another. Here we look only at the numeric variables.
Here we observe outliers in the data. Filler Speed and Carb Flow show the most extreme outliers.
## Warning: Removed 724 rows containing non-finite values (`stat_boxplot()`).
## Warning: Number of logged events: 1
Here the data is partitioned into 70% training and 30% testing sets, with the predictors (independent variables) kept separate from the response variable, PH.
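A 70/30 partition can be sketched in Python as a shuffled index split. This is an illustration under assumptions: the function name, the seed, and the use of a plain (unstratified) shuffle are all hypothetical, and the report's training set of 1,802 rows suggests its split was drawn differently, so the counts below are not expected to match it exactly.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle row indices and cut them into training and testing sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Stand-in for the 2,571 rows of the data set (row ids only).
row_ids = list(range(2571))
train_ids, test_ids = train_test_split(row_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed matters here: every model below should be trained and scored on the same partition for the comparison to be fair.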
The linear model explains only about 39% of the variance in PH (adjusted R-squared = 0.39).
##
## Call:
## lm(formula = y.studData ~ ., data = x.studData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49946 -0.07937 0.01417 0.09057 0.77228
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.927e+02 1.995e+03 -0.247 0.804986
## Brand.Code.A -4.581e-02 1.833e-02 -2.500 0.012513 *
## Brand.Code.B -2.366e-02 2.769e-02 -0.855 0.392943
## Brand.Code.C -1.583e-01 2.837e-02 -5.581 2.76e-08 ***
## Brand.Code.D NA NA NA NA
## Carb.Volume -7.800e-02 8.457e-02 -0.922 0.356459
## Fill.Ounces -1.127e-01 3.889e-02 -2.898 0.003806 **
## PC.Volume -1.071e-01 1.009e-01 -1.061 0.288739
## Carb.Pressure 1.037e-01 6.303e-01 0.165 0.869352
## Carb.Temp 9.231e+02 3.675e+03 0.251 0.801687
## PSC -1.750e-01 6.839e-02 -2.559 0.010584 *
## PSC.Fill -8.037e-02 5.930e-02 -1.355 0.175464
## PSC.CO2 -9.963e-02 7.659e-02 -1.301 0.193481
## Mnf.Flow -6.776e-04 5.516e-05 -12.283 < 2e-16 ***
## Carb.Pressure1 3.013e-02 3.717e-03 8.106 9.63e-16 ***
## Fill.Pressure 1.329e+00 6.233e-01 2.132 0.033124 *
## Hyd.Pressure2 7.037e-03 1.277e-03 5.512 4.07e-08 ***
## Hyd.Pressure4 -1.521e-01 1.334e-01 -1.140 0.254513
## Filler.Speed 1.939e-07 9.209e-06 0.021 0.983201
## Temperature -8.800e-03 2.591e-03 -3.396 0.000699 ***
## Usage.cont -5.878e-03 1.372e-03 -4.285 1.92e-05 ***
## Carb.Flow 1.068e-06 4.726e-07 2.260 0.023973 *
## Density -5.052e-01 1.088e-01 -4.645 3.65e-06 ***
## MFR -2.943e-05 5.242e-05 -0.561 0.574633
## Pressure.Vacuum -2.935e-05 1.811e-04 -0.162 0.871317
## Oxygen.Filler -2.953e-01 7.750e-02 -3.810 0.000144 ***
## Bowl.Setpoint 1.927e-03 3.196e-04 6.030 1.99e-09 ***
## Pressure.Setpoint -8.456e-03 2.391e-03 -3.537 0.000416 ***
## Air.Pressurer 1.755e-03 2.768e-03 0.634 0.526226
## Alch.Rel 6.160e-02 2.254e-02 2.733 0.006342 **
## Carb.Rel 3.434e-02 5.640e-02 0.609 0.542668
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1355 on 1772 degrees of freedom
## Multiple R-squared: 0.4006, Adjusted R-squared: 0.3908
## F-statistic: 40.84 on 29 and 1772 DF, p-value: < 2.2e-16
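The adjusted R-squared in the summary above can be checked by hand from the multiple R-squared and the degrees of freedom. The short Python sketch below applies the standard formula, 1 - (1 - R²)(n - 1)/(n - p - 1); the function name is hypothetical, and the inputs are the rounded values printed in the summary.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of fitted predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# From the summary: 29 model degrees of freedom and 1772 residual degrees
# of freedom, so n = 1772 + 29 + 1 (intercept) = 1802 training rows.
n_obs = 1772 + 29 + 1
adj = adjusted_r_squared(0.4006, n_obs, 29)
print(n_obs, round(adj, 4))
```

The small gap between 0.4006 and the adjusted value reflects the modest penalty for fitting 29 terms to roughly 1,800 rows.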
The PLS model still explains only about 38% of the variance (R-squared = 0.38). Tuning selected ncomp = 20, and the RMSE is small at about 0.137.
## Partial Least Squares
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 0.1713245 0.02904804 0.1372979
## 2 0.1697293 0.04409482 0.1366806
## 3 0.1557012 0.19763254 0.1225521
## 4 0.1554119 0.20027558 0.1221232
## 5 0.1544327 0.20959762 0.1217433
## 6 0.1539928 0.21449336 0.1213981
## 7 0.1485014 0.26831439 0.1161607
## 8 0.1448417 0.30441999 0.1127449
## 9 0.1425598 0.32542137 0.1104592
## 10 0.1410537 0.33947245 0.1093773
## 11 0.1393603 0.35544460 0.1086594
## 12 0.1378897 0.36849570 0.1082404
## 13 0.1378106 0.36917934 0.1082171
## 14 0.1377927 0.36918346 0.1080344
## 15 0.1375947 0.37085743 0.1079677
## 16 0.1371678 0.37503941 0.1073795
## 17 0.1365672 0.38094020 0.1065596
## 18 0.1368167 0.37903405 0.1067908
## 19 0.1366654 0.38078891 0.1066763
## 20 0.1366556 0.38094217 0.1065843
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 20.
## ncomp
## 20 20
## ncomp RMSE Rsquared
## 1 20 0.1366556 0.3809422
## Rsquared RMSE
## 1 0.3809422 0.1366556
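The output above states that the largest R-squared was used to pick ncomp. As a small Python sketch of what that choice implies, the snippet below takes the last six rows of the resampling table (the earlier rows are worse on both metrics) and compares the winner by R-squared with the winner by RMSE. The variable names are hypothetical.

```python
# (ncomp, RMSE, Rsquared) copied from the resampling table for ncomp 15-20.
pls_results = [
    (15, 0.1375947, 0.37085743),
    (16, 0.1371678, 0.37503941),
    (17, 0.1365672, 0.38094020),
    (18, 0.1368167, 0.37903405),
    (19, 0.1366654, 0.38078891),
    (20, 0.1366556, 0.38094217),
]

# Selection rule reported in the output: largest R-squared wins.
best_by_rsquared = max(pls_results, key=lambda row: row[2])[0]
# Alternative rule used for the later models: smallest RMSE wins.
best_by_rmse = min(pls_results, key=lambda row: row[1])[0]

print(best_by_rsquared, best_by_rmse)
```

The two rules disagree only marginally (20 versus 17 components), and the metrics are nearly flat beyond about 12 components, so a smaller model would likely perform about as well.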
The MARS model does much better, explaining close to 50% of the variance: the reported R-squared is 0.46 (0.52 on the training fit). The best tune was nprune = 30 with degree = 2.
## nprune degree
## 58 30 2
## Call: earth(x=data.frame[1802,30], y=c(8.26,8.24,8.3...), keepxy=TRUE,
## degree=2, nprune=30)
##
## coefficients
## (Intercept) 8.3728581
## Brand.Code.C -0.2855665
## h(0.199349-Mnf.Flow) 0.0015335
## h(Mnf.Flow-0.199349) 0.0013336
## h(68.4-Temperature) 0.0345019
## h(Temperature-68.4) 0.0195519
## h(Bowl.Setpoint-90) 0.0009516
## Brand.Code.C * h(0.190029-Hyd.Pressure2) -0.8571696
## Brand.Code.C * h(65.4-Temperature) -0.2284343
## Brand.Code.C * h(Density-0.365487) 1.4456453
## Brand.Code.C * h(0.365487-Density) 3.5458547
## Brand.Code.C * h(Pressure.Vacuum- -87.1375) 0.0046425
## Brand.Code.C * h(-87.1375-Pressure.Vacuum) 0.0020037
## h(0.199349-Mnf.Flow) * h(Pressure.Vacuum- -70.9711) -0.0000386
## h(0.199349-Mnf.Flow) * h(-70.9711-Pressure.Vacuum) -0.0000195
## h(0.199349-Mnf.Flow) * h(Air.Pressurer-143.8) -0.0003153
## h(Mnf.Flow-0.199349) * h(146.4-Air.Pressurer) -0.0003623
## h(0.199349-Mnf.Flow) * h(Alch.Rel-7.12) 0.0013575
## h(Mnf.Flow-0.199349) * h(Alch.Rel-7.16) 0.0019209
## h(68.4-Temperature) * h(Usage.cont-21.76) -0.0086194
## h(0.589456-Density) * h(Bowl.Setpoint-90) 0.0190281
## h(Density-0.589456) * h(Bowl.Setpoint-90) -0.0995469
##
## Selected 22 of 30 terms, and 10 of 30 predictors (nprune=30)
## Termination condition: RSq changed by less than 0.001 at 30 terms
## Importance: Mnf.Flow, Brand.Code.C, Alch.Rel, Air.Pressurer, Hyd.Pressure2, ...
## Number of terms at each degree of interaction: 1 6 15
## GCV 0.01548611 RSS 26.27356 GRSq 0.4861794 RSq 0.515699
## Rsquared RMSE
## 1 0.4581286 0.1282684
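The `h(...)` terms in the MARS output are hinge functions, h(x) = max(0, x). The Python sketch below evaluates only the two main-effect Temperature terms from the coefficient table, to show how a pair of hinges produces a piecewise-linear effect with a knot at 68.4. It deliberately ignores the intercept and every other term, so it is not a PH prediction; the function names are hypothetical.

```python
def hinge(x):
    """MARS hinge function: zero below the knot, linear above it."""
    return max(0.0, x)

def temperature_contribution(temperature):
    """Sum of the two main-effect Temperature terms from the fitted model."""
    below_knot = 0.0345019 * hinge(68.4 - temperature)
    above_knot = 0.0195519 * hinge(temperature - 68.4)
    return below_knot + above_knot

for temp in (66.0, 68.4, 70.0):
    print(temp, round(temperature_contribution(temp), 6))
```

Both coefficients are positive, so these two terms alone are smallest at the knot and grow as Temperature moves away from 68.4 in either direction.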
The support vector machine with a radial basis kernel improves further, with R-squared = 0.51 at C = 4.
## Support Vector Machines with Radial Basis Function Kernel
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.1296327 0.4526929 0.09664701
## 0.50 0.1266832 0.4747169 0.09366711
## 1.00 0.1242716 0.4922324 0.09126480
## 2.00 0.1227377 0.5036585 0.08982547
## 4.00 0.1220135 0.5102544 0.08938673
## 8.00 0.1226486 0.5088387 0.08990746
## 16.00 0.1257302 0.4931350 0.09271659
## 32.00 0.1298161 0.4749069 0.09571814
## 64.00 0.1364763 0.4447209 0.10127156
## 128.00 0.1437624 0.4121773 0.10700773
##
## Tuning parameter 'sigma' was held constant at a value of 0.02244293
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.02244293 and C = 4.
## Length Class Mode
## 1 ksvm S4
## Rsquared RMSE
## 1 0.5102544 0.1220135
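For readers unfamiliar with the radial basis function kernel, here is a minimal Python sketch of the similarity it computes, assuming the common parameterization k(x, y) = exp(-sigma * ||x - y||²) and the sigma value held constant in the output above. The function name and the toy points are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma=0.02244293):
    """Radial basis function kernel: similarity decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)

# Hypothetical two-feature points, only to show the decay with distance.
origin = [0.0, 0.0]
near = [1.0, 0.0]
far = [10.0, 0.0]

k_same = rbf_kernel(origin, origin)
k_near = rbf_kernel(origin, near)
k_far = rbf_kernel(origin, far)
print(k_same, k_near, k_far)
```

A small sigma such as this one makes the kernel decay slowly, so the cost parameter C does most of the work in controlling how closely the model fits the training data.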
The CART model explains only about 39% of the variance (R-squared = 0.39), much lower than the Support Vector Machines model.
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
## CART
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ...
## Resampling results across tuning parameters:
##
## cp RMSE Rsquared MAE
## 0.01187216 0.1364632 0.3874982 0.1060713
## 0.01216548 0.1360866 0.3904339 0.1056655
## 0.01285880 0.1372723 0.3796730 0.1061757
## 0.01386854 0.1380792 0.3708841 0.1069508
## 0.01439357 0.1394621 0.3585176 0.1084625
## 0.01771630 0.1401329 0.3493895 0.1100357
## 0.03004458 0.1423411 0.3280546 0.1114517
## 0.04275824 0.1460013 0.2930598 0.1149739
## 0.06388653 0.1520005 0.2365264 0.1189231
## 0.21183446 0.1660484 0.1684701 0.1316049
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.01216548.
## cp
## 2 0.01216548
## Rsquared RMSE
## 1 0.3904339 0.1360866
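The three metrics reported for every model (RMSE, R-squared, MAE) can be sketched in a few lines of Python. R-squared is computed here as the squared Pearson correlation between observed and predicted values, which is assumed to be the convention behind the tables above. The observed values are the first four PH readings from the data preview; the predictions are hypothetical numbers invented for illustration.

```python
import math

def regression_metrics(observed, predicted):
    """Return (RMSE, R-squared as squared correlation, MAE)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Constant predictions have zero variance, so the correlation is undefined.
    if var_o == 0.0 or var_p == 0.0:
        return rmse, float("nan"), mae
    return rmse, cov ** 2 / (var_o * var_p), mae

observed_ph = [8.36, 8.26, 8.94, 8.24]   # first four PH values in the data
predicted_ph = [8.40, 8.30, 8.80, 8.30]  # hypothetical predictions
rmse, r_squared, mae = regression_metrics(observed_ph, predicted_ph)
print(rmse, r_squared, mae)
```

The undefined case is worth noting for tree models: a heavily pruned tree can predict a single value within a fold, which is one plausible source of missing resampled performance measures.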
The gradient boosting model is substantially better: R-squared is 0.58 in the summary below (0.59 for the selected tuning parameters), so it explains roughly 58% of the variance.
## Stochastic Gradient Boosting
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.minobsinnode n.trees RMSE Rsquared MAE
## 5 5 100 0.1199091 0.5242820 0.09070372
## 5 5 200 0.1162544 0.5505175 0.08722424
## 5 5 300 0.1151418 0.5594973 0.08599164
## 5 5 400 0.1144831 0.5647117 0.08556658
## 5 5 500 0.1140816 0.5684267 0.08552277
## 5 5 600 0.1138611 0.5703591 0.08532112
## 5 5 700 0.1136130 0.5723102 0.08501534
## 5 5 800 0.1138383 0.5709910 0.08526792
## 5 5 900 0.1134374 0.5741716 0.08505661
## 5 5 1000 0.1133797 0.5747984 0.08508351
## 5 10 100 0.1198591 0.5248985 0.09020298
## 5 10 200 0.1169332 0.5461175 0.08715302
## 5 10 300 0.1147076 0.5626667 0.08534338
## 5 10 400 0.1137000 0.5704367 0.08459977
## 5 10 500 0.1134989 0.5724447 0.08428993
## 5 10 600 0.1132744 0.5744628 0.08400791
## 5 10 700 0.1129588 0.5769682 0.08382788
## 5 10 800 0.1128918 0.5777364 0.08384482
## 5 10 900 0.1131412 0.5760118 0.08386667
## 5 10 1000 0.1128383 0.5785628 0.08362460
## 10 5 100 0.1132872 0.5747481 0.08473948
## 10 5 200 0.1117614 0.5849693 0.08292171
## 10 5 300 0.1115873 0.5867385 0.08276946
## 10 5 400 0.1114115 0.5880941 0.08281834
## 10 5 500 0.1110481 0.5906765 0.08256492
## 10 5 600 0.1110415 0.5908941 0.08257919
## 10 5 700 0.1110486 0.5908621 0.08250908
## 10 5 800 0.1110940 0.5904492 0.08253846
## 10 5 900 0.1110153 0.5910191 0.08251189
## 10 5 1000 0.1109802 0.5912940 0.08252443
## 10 10 100 0.1142389 0.5671225 0.08499040
## 10 10 200 0.1120675 0.5829485 0.08303796
## 10 10 300 0.1117855 0.5851979 0.08271858
## 10 10 400 0.1118071 0.5850589 0.08270012
## 10 10 500 0.1117272 0.5859739 0.08258634
## 10 10 600 0.1116540 0.5865121 0.08253732
## 10 10 700 0.1117027 0.5861833 0.08260377
## 10 10 800 0.1117855 0.5855298 0.08269215
## 10 10 900 0.1119326 0.5845723 0.08279999
## 10 10 1000 0.1119575 0.5843887 0.08283964
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth
## = 10, shrinkage = 0.1 and n.minobsinnode = 5.
## n.trees interaction.depth shrinkage n.minobsinnode
## 30 1000 10 0.1 5
## Rsquared RMSE
## 1 0.5777364 0.1128918
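The tuning table above covers every combination of interaction depth, minimum node size, and number of trees, with shrinkage held at 0.1. A short Python sketch of how such a grid is enumerated follows; the dictionary keys mirror the parameter names in the output, while the variable names are hypothetical.

```python
from itertools import product

depths = [5, 10]
min_node_sizes = [5, 10]
tree_counts = range(100, 1001, 100)  # 100, 200, ..., 1000

# Cartesian product of the three tuned parameters; shrinkage stays fixed.
grid = [
    {"interaction.depth": d, "n.minobsinnode": m, "n.trees": t, "shrinkage": 0.1}
    for d, m, t in product(depths, min_node_sizes, tree_counts)
]

# Final values reported as selected in the output above.
selected = {"interaction.depth": 10, "n.minobsinnode": 5,
            "n.trees": 1000, "shrinkage": 0.1}
print(len(grid), selected in grid)
```

With 10-fold cross-validation, each of the 40 combinations is evaluated ten times, which explains why this model takes noticeably longer to tune than the earlier ones.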
The Cubist model's R-squared of 0.63 is substantially better than that of all previous models.
## Warning in cubist.default(x, y, committees = param$committees, ...): NAs
## introduced by coercion
## Cubist
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 0.1311210 0.4783302 0.09187410
## 1 5 0.1311210 0.4783302 0.09187410
## 1 9 0.1311210 0.4783302 0.09187410
## 10 0 0.1081583 0.6144236 0.08017720
## 10 5 0.1081583 0.6144236 0.08017720
## 10 9 0.1081583 0.6144236 0.08017720
## 20 0 0.1065343 0.6282301 0.07868027
## 20 5 0.1065343 0.6282301 0.07868027
## 20 9 0.1065343 0.6282301 0.07868027
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 0.
## committees neighbors
## 7 20 0
## Rsquared RMSE
## 1 0.6282301 0.1065343
Random Forest has the best R-squared so far, 0.65 (65% of the variance explained). Its RMSE of 0.10 is also the lowest. This is the best model so far.
## Random Forest
##
## 1802 samples
## 30 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1621, 1622, 1621, 1622, 1621, 1622, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.1167803 0.5981781 0.08917200
## 5 0.1087382 0.6377895 0.08103808
## 8 0.1062868 0.6479370 0.07842360
## 11 0.1053870 0.6495279 0.07717344
## 14 0.1044003 0.6535880 0.07631755
## 17 0.1041866 0.6526610 0.07569956
## 20 0.1042424 0.6508385 0.07538026
## 23 0.1041572 0.6497579 0.07532373
## 26 0.1040774 0.6486613 0.07516248
## 30 0.1040685 0.6476317 0.07505072
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 30.
## mtry
## 10 30
## Rsquared RMSE
## 1 0.6476317 0.1040685
## rf variable importance
##
## only 20 most important variables shown (out of 30)
##
## Overall
## Mnf.Flow 100.000
## Brand.Code.C 33.690
## Oxygen.Filler 22.146
## Air.Pressurer 21.648
## Alch.Rel 20.987
## Pressure.Vacuum 19.354
## Density 16.166
## Temperature 15.511
## Carb.Pressure1 14.294
## Carb.Flow 14.146
## Usage.cont 13.885
## Carb.Rel 13.175
## Filler.Speed 10.882
## Fill.Ounces 8.665
## PC.Volume 8.632
## Bowl.Setpoint 8.399
## MFR 6.793
## Hyd.Pressure2 6.646
## Fill.Pressure 6.325
## Carb.Volume 6.078
XGBoost reaches an R-squared of about 0.60, which is still high but no improvement. Random Forest still has the best R-squared and RMSE values, making it the best model.
## [21:18:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## eXtreme Gradient Boosting
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1443, 1442, 1443, 1441, 1439
## Resampling results across tuning parameters:
##
##   max_depth  colsample_bytree  nrounds  RMSE       Rsquared   MAE
##   10         0.5               100      0.1101306  0.5985438  0.08073673
##   10         0.5               200      0.1095880  0.6023085  0.08013195
##   10         0.6               100      0.1116859  0.5876458  0.08105096
##   10         0.6               200      0.1112895  0.5902825  0.08053533
##   10         0.7               100      0.1101404  0.5996566  0.08073727
##   10         0.7               200      0.1096444  0.6029766  0.08017499
##   10         0.8               100      0.1103995  0.5969781  0.07978522
##   10         0.8               200      0.1097685  0.6013542  0.07922762
##   10         0.9               100      0.1116225  0.5879890  0.08035341
##   10         0.9               200      0.1114361  0.5892884  0.08010886
##   15         0.5               100      0.1128714  0.5791446  0.08236451
##   15         0.5               200      0.1127293  0.5799052  0.08221163
##   15         0.6               100      0.1116849  0.5871629  0.08172433
##   15         0.6               200      0.1115435  0.5878909  0.08154356
##   15         0.7               100      0.1117988  0.5869304  0.08116509
##   15         0.7               200      0.1117237  0.5871620  0.08101293
##   15         0.8               100      0.1115817  0.5895437  0.08083917
##   15         0.8               200      0.1115216  0.5896520  0.08070959
##   15         0.9               100      0.1102138  0.5992230  0.07993898
##   15         0.9               200      0.1102105  0.5989384  0.07983175
##   20         0.5               100      0.1116027  0.5887801  0.08188634
##   20         0.5               200      0.1115135  0.5890656  0.08176921
##   20         0.6               100      0.1127241  0.5810126  0.08150658
##   20         0.6               200      0.1126268  0.5813337  0.08136949
##   20         0.7               100      0.1127185  0.5806046  0.08123352
##   20         0.7               200      0.1126354  0.5808332  0.08112543
##   20         0.8               100      0.1112426  0.5914091  0.08063050
##   20         0.8               200      0.1111642  0.5916313  0.08054062
##   20         0.9               100      0.1136288  0.5753711  0.08167075
##   20         0.9               200      0.1136113  0.5751702  0.08157242
##   25         0.5               100      0.1101120  0.6009393  0.08051768
##   25         0.5               200      0.1100148  0.6012815  0.08039126
##   25         0.6               100      0.1129253  0.5798910  0.08191238
##   25         0.6               200      0.1128338  0.5801636  0.08177973
##   25         0.7               100      0.1113004  0.5912984  0.08080646
##   25         0.7               200      0.1112228  0.5914509  0.08069122
##   25         0.8               100      0.1110339  0.5925923  0.08092132
##   25         0.8               200      0.1109708  0.5926770  0.08082922
##   25         0.9               100      0.1130638  0.5774251  0.08144635
##   25         0.9               200      0.1130320  0.5773058  0.08134804
##
## Tuning parameter 'eta' was held constant at a value of 0.1
## Tuning parameter 'min_child_weight' was held constant at a value of 1
##
## Tuning parameter 'subsample' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 200, max_depth = 10, eta
## = 0.1, gamma = 0, colsample_bytree = 0.5, min_child_weight = 1 and subsample
## = 1.
##   nrounds max_depth eta gamma colsample_bytree min_child_weight subsample
## 2     200        10 0.1     0              0.5                1         1
##    Rsquared      RMSE
## 1 0.5876458 0.1116859
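The selection rule reported in the output above (smallest cross-validated RMSE) can be checked by hand. The following is a minimal base-R sketch, not the code used in the report: the data frame name `cv_results` is hypothetical, and only a handful of rows copied from the resampling table are included.

```r
# A subset of the cross-validation results printed above (values copied from the table).
cv_results <- data.frame(
  max_depth        = c(10, 10, 10, 15, 20, 25),
  colsample_bytree = c(0.5, 0.5, 0.7, 0.9, 0.8, 0.5),
  nrounds          = c(100, 200, 200, 200, 200, 200),
  RMSE             = c(0.1101306, 0.1095880, 0.1096444, 0.1102105, 0.1111642, 0.1100148),
  Rsquared         = c(0.5985438, 0.6023085, 0.6029766, 0.5989384, 0.5916313, 0.6012815)
)

# Pick the row with the smallest RMSE, the criterion named in the output.
best <- cv_results[which.min(cv_results$RMSE), ]
print(best)

# For comparison: the row that a highest-R-squared criterion would pick.
best_r2 <- cv_results[which.max(cv_results$Rsquared), ]
print(best_r2)
```

The two criteria do not pick the same row here: RMSE selects `colsample_bytree = 0.5`, while R-squared would select `colsample_bytree = 0.7`, though the gap between the two rows is very small.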
The Random Forest model is selected because it performed the best, with the highest R-squared value (0.683) as well as the lowest RMSE (0.097) and MAE (0.069).
The comparison table below confirms that Random Forest is the best-performing model based on R-squared. RMSE values are more or less similar among the models, ranging from about 0.097 to 0.135, with Random Forest, Cubist, and Boosting at the low end. The predicted PH values are 8 or above, suggesting that this is the typical PH of beverages in this manufacturing process given the predictor variables in the data.
##                RMSE  Rsquared        MAE
## PLS      0.13488381 0.3763929 0.10647830
## MARS     0.12966561 0.4292487 0.09772683
## SVM      0.11875390 0.5214721 0.08501554
## SingTree 0.13273943 0.4018938 0.10168252
## RandFrst 0.09692278 0.6828094 0.06908257
## Boosting 0.10462754 0.6275910 0.07875976
## Cubist   0.10262016 0.6423931 0.07484190
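To make the ranking explicit, the comparison table can be sorted and the three metrics checked for agreement. This is a small base-R sketch with the values copied from the table above; the object names `model_metrics`, `ranked`, and `best_by` are hypothetical and do not come from the report's code.

```r
# Metrics copied from the comparison table above.
model_metrics <- data.frame(
  model    = c("PLS", "MARS", "SVM", "SingTree", "RandFrst", "Boosting", "Cubist"),
  RMSE     = c(0.13488381, 0.12966561, 0.11875390, 0.13273943,
               0.09692278, 0.10462754, 0.10262016),
  Rsquared = c(0.3763929, 0.4292487, 0.5214721, 0.4018938,
               0.6828094, 0.6275910, 0.6423931),
  MAE      = c(0.10647830, 0.09772683, 0.08501554, 0.10168252,
               0.06908257, 0.07875976, 0.07484190),
  stringsAsFactors = FALSE
)

# Sort from best to worst by RMSE (lower is better).
ranked <- model_metrics[order(model_metrics$RMSE), ]
print(ranked, row.names = FALSE)

# Check whether the three metrics agree on the top model.
best_by <- c(
  RMSE     = ranked$model[1],
  Rsquared = model_metrics$model[which.max(model_metrics$Rsquared)],
  MAE      = model_metrics$model[which.min(model_metrics$MAE)]
)
print(best_by)
```

All three metrics point to the same model, so the choice does not depend on which one is used as the selection criterion.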
Here we remove PH from the eval data and use the selected model
to predict the final PH values.
## [1] 8.572175 8.548214 8.548472 8.558963 8.518947 8.523043
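The prediction step can be sketched as follows. This is an illustration under stated assumptions rather than the report's code: the data are simulated, the column names and the objects `train_data`, `eval_data`, `fit`, and `ph_pred` are hypothetical, and a plain linear regression stands in for the Random Forest so that the example runs in base R without extra packages.

```r
set.seed(1)

# Simulated training data standing in for the real process data (assumption).
train_data <- data.frame(
  Temperature = rnorm(100, mean = 66, sd = 1),
  Density     = runif(100, min = 0.8, max = 1.6)
)
train_data$PH <- 8.5 + 0.01 * (train_data$Temperature - 66) -
  0.05 * train_data$Density + rnorm(100, sd = 0.05)

# Simulated evaluation data; PH is assumed to be present but empty.
eval_data <- data.frame(
  Temperature = rnorm(10, mean = 66, sd = 1),
  Density     = runif(10, min = 0.8, max = 1.6),
  PH          = NA_real_
)

# Stand-in model: linear regression instead of the Random Forest.
fit <- lm(PH ~ Temperature + Density, data = train_data)

# Drop the PH column, then predict PH for every evaluation row.
eval_predictors <- eval_data[, setdiff(names(eval_data), "PH"), drop = FALSE]
ph_pred <- predict(fit, newdata = eval_predictors)
head(ph_pred)
```

Dropping the response column before calling `predict()` makes it explicit that only predictor variables are passed to the model.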