Summary

The goal of this report is to better understand ABC Beverage's manufacturing process by analyzing which factors predict PH and building the best-performing predictive model of PH.

Data Acquisition

Data Exploration

Histogram

Distributions of the variables. Some are approximately normal and others are skewed; several also show bimodal peaks.

## Rows: 2,571
## Columns: 33
## $ `Brand Code`        <fct> B, A, B, A, A, A, A, B, B, B, B, B, B, B, B, B, C,…
## $ `Carb Volume`       <dbl> 5.340000, 5.426667, 5.286667, 5.440000, 5.486667, …
## $ `Fill Ounces`       <dbl> 23.96667, 24.00667, 24.06000, 24.00667, 24.31333, …
## $ `PC Volume`         <dbl> 0.2633333, 0.2386667, 0.2633333, 0.2933333, 0.1113…
## $ `Carb Pressure`     <dbl> 68.2, 68.4, 70.8, 63.0, 67.2, 66.6, 64.2, 67.6, 64…
## $ `Carb Temp`         <dbl> 141.2, 139.6, 144.8, 132.6, 136.8, 138.4, 136.8, 1…
## $ PSC                 <dbl> 0.104, 0.124, 0.090, NA, 0.026, 0.090, 0.128, 0.15…
## $ `PSC Fill`          <dbl> 0.26, 0.22, 0.34, 0.42, 0.16, 0.24, 0.40, 0.34, 0.…
## $ `PSC CO2`           <dbl> 0.04, 0.04, 0.16, 0.04, 0.12, 0.04, 0.04, 0.04, 0.…
## $ `Mnf Flow`          <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1`    <dbl> 118.8, 121.6, 120.2, 115.2, 118.4, 119.6, 122.2, 1…
## $ `Fill Pressure`     <dbl> 46.0, 46.0, 46.0, 46.4, 45.8, 45.6, 51.8, 46.8, 46…
## $ `Hyd Pressure1`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2`     <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3`     <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4`     <dbl> 118, 106, 82, 92, 92, 116, 124, 132, 90, 108, 94, …
## $ `Filler Level`      <dbl> 121.2, 118.6, 120.0, 117.8, 118.6, 120.2, 123.4, 1…
## $ `Filler Speed`      <dbl> 4002, 3986, 4020, 4012, 4010, 4014, NA, 1004, 4014…
## $ Temperature         <dbl> 66.0, 67.6, 67.0, 65.6, 65.6, 66.2, 65.8, 65.2, 65…
## $ `Usage cont`        <dbl> 16.18, 19.90, 17.76, 17.42, 17.68, 23.82, 20.74, 1…
## $ `Carb Flow`         <dbl> 2932, 3144, 2914, 3062, 3054, 2948, 30, 684, 2902,…
## $ Density             <dbl> 0.88, 0.92, 1.58, 1.54, 1.54, 1.52, 0.84, 0.84, 0.…
## $ MFR                 <dbl> 725.0, 726.8, 735.0, 730.6, 722.8, 738.8, NA, NA, …
## $ Balling             <dbl> 1.398, 1.498, 3.142, 3.042, 3.042, 2.992, 1.298, 1…
## $ `Pressure Vacuum`   <dbl> -4.0, -4.0, -3.8, -4.4, -4.4, -4.4, -4.4, -4.4, -4…
## $ PH                  <dbl> 8.36, 8.26, 8.94, 8.24, 8.26, 8.32, 8.40, 8.38, 8.…
## $ `Oxygen Filler`     <dbl> 0.022, 0.026, 0.024, 0.030, 0.030, 0.024, 0.066, 0…
## $ `Bowl Setpoint`     <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46.4, 46.8, 46.6, 46.0, 46.0, 46.0, 46.0, 46.0, 46…
## $ `Air Pressurer`     <dbl> 142.6, 143.0, 142.0, 146.2, 146.2, 146.6, 146.2, 1…
## $ `Alch Rel`          <dbl> 6.58, 6.56, 7.66, 7.14, 7.14, 7.16, 6.54, 6.52, 6.…
## $ `Carb Rel`          <dbl> 5.32, 5.30, 5.84, 5.42, 5.44, 5.44, 5.38, 5.34, 5.…
## $ `Balling Lvl`       <dbl> 1.48, 1.56, 3.28, 3.04, 3.04, 3.02, 1.44, 1.44, 1.…
##                      n    mean      sd  median     min     max   range  skew
## Brand Code*       2451    2.51    1.00    2.00    1.00    4.00    3.00  0.38
## Carb Volume       2561    5.37    0.11    5.35    5.04    5.70    0.66  0.39
## Fill Ounces       2533   23.97    0.09   23.97   23.63   24.32    0.69 -0.02
## PC Volume         2532    0.28    0.06    0.27    0.08    0.48    0.40  0.34
## Carb Pressure     2544   68.19    3.54   68.20   57.00   79.40   22.40  0.18
## Carb Temp         2545  141.09    4.04  140.80  128.60  154.00   25.40  0.25
## PSC               2538    0.08    0.05    0.08    0.00    0.27    0.27  0.85
## PSC Fill          2548    0.20    0.12    0.18    0.00    0.62    0.62  0.93
## PSC CO2           2532    0.06    0.04    0.04    0.00    0.24    0.24  1.73
## Mnf Flow          2569   24.57  119.48   65.20 -100.20  229.40  329.60  0.00
## Carb Pressure1    2539  122.59    4.74  123.20  105.60  140.20   34.60  0.05
## Fill Pressure     2549   47.92    3.18   46.40   34.60   60.40   25.80  0.55
## Hyd Pressure1     2560   12.44   12.43   11.40   -0.80   58.00   58.80  0.78
## Hyd Pressure2     2556   20.96   16.39   28.60    0.00   59.40   59.40 -0.30
## Hyd Pressure3     2556   20.46   15.98   27.60   -1.20   50.00   51.20 -0.32
## Hyd Pressure4     2541   96.29   13.12   96.00   52.00  142.00   90.00  0.55
## Filler Level      2551  109.25   15.70  118.40   55.80  161.20  105.40 -0.85
## Filler Speed      2514 3687.20  770.82 3982.00  998.00 4030.00 3032.00 -2.87
## Temperature       2557   65.97    1.38   65.60   63.60   76.20   12.60  2.39
## Usage cont        2566   20.99    2.98   21.79   12.08   25.90   13.82 -0.54
## Carb Flow         2569 2468.35 1073.70 3028.00   26.00 5104.00 5078.00 -0.99
## Density           2570    1.17    0.38    0.98    0.24    1.92    1.68  0.53
## MFR               2359  704.05   73.90  724.00   31.40  868.60  837.20 -5.09
## Balling           2570    2.20    0.93    1.65   -0.17    4.01    4.18  0.59
## Pressure Vacuum   2571   -5.22    0.57   -5.40   -6.60   -3.60    3.00  0.53
## PH                2567    8.55    0.17    8.54    7.88    9.36    1.48 -0.29
## Oxygen Filler     2559    0.05    0.05    0.03    0.00    0.40    0.40  2.66
## Bowl Setpoint     2569  109.33   15.30  120.00   70.00  140.00   70.00 -0.97
## Pressure Setpoint 2559   47.62    2.04   46.00   44.00   52.00    8.00  0.20
## Air Pressurer     2571  142.83    1.21  142.60  140.80  148.20    7.40  2.25
## Alch Rel          2562    6.90    0.51    6.56    5.28    8.62    3.34  0.88
## Carb Rel          2561    5.44    0.13    5.40    4.96    6.06    1.10  0.50
## Balling Lvl       2570    2.05    0.87    1.48    0.00    3.66    3.66  0.59
##                   kurtosis
## Brand Code*          -1.06
## Carb Volume          -0.47
## Fill Ounces           0.86
## PC Volume             0.67
## Carb Pressure        -0.01
## Carb Temp             0.24
## PSC                   0.65
## PSC Fill              0.77
## PSC CO2               3.73
## Mnf Flow             -1.87
## Carb Pressure1        0.14
## Fill Pressure         1.41
## Hyd Pressure1        -0.14
## Hyd Pressure2        -1.56
## Hyd Pressure3        -1.57
## Hyd Pressure4         0.63
## Filler Level          0.05
## Filler Speed          6.71
## Temperature          10.16
## Usage cont           -1.02
## Carb Flow            -0.58
## Density              -1.20
## MFR                  30.46
## Balling              -1.39
## Pressure Vacuum      -0.03
## PH                    0.06
## Oxygen Filler        11.09
## Bowl Setpoint        -0.06
## Pressure Setpoint    -1.60
## Air Pressurer         4.73
## Alch Rel             -0.85
## Carb Rel             -0.29
## Balling Lvl          -1.49
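The skew and kurtosis columns above can be read with a small worked example. This is a minimal sketch of moment-based skewness and excess kurtosis in base R; the toy vectors are hypothetical, and the summary function used in the report may apply a small-sample correction, so values need not match to the last digit.

```r
# Moment-based skewness and excess kurtosis (no small-sample correction).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical toy vectors, not taken from the plant data.
symmetric <- c(1, 2, 3, 4, 5)
left_tail <- c(1, 9, 10, 10, 10)

print(skewness(symmetric))         # zero for a symmetric sample
print(skewness(left_tail))         # negative: long tail to the left
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```

A strongly negative skew, as reported for MFR and Filler Speed, means a long tail of unusually low values.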

Missing data plot and Correlation Plot

We can see how much missing data there is for each predictor.

##        Brand Code       Carb Volume       Fill Ounces         PC Volume 
##               120                10                38                39 
##     Carb Pressure         Carb Temp               PSC          PSC Fill 
##                27                26                33                23 
##           PSC CO2          Mnf Flow    Carb Pressure1     Fill Pressure 
##                39                 2                32                22 
##     Hyd Pressure1     Hyd Pressure2     Hyd Pressure3     Hyd Pressure4 
##                11                15                15                30 
##      Filler Level      Filler Speed       Temperature        Usage cont 
##                20                57                14                 5 
##         Carb Flow           Density               MFR           Balling 
##                 2                 1               212                 1 
##   Pressure Vacuum                PH     Oxygen Filler     Bowl Setpoint 
##                 0                 4                12                 2 
## Pressure Setpoint     Air Pressurer          Alch Rel          Carb Rel 
##                12                 0                 9                10 
##       Balling Lvl 
##                 1
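Counts like these can be produced with `colSums(is.na(...))`. The sketch below uses a hypothetical toy data frame (the report's own data object is not shown) and also converts the MFR count from the table above into a percentage of the 2,571 rows.

```r
# Hypothetical toy data frame standing in for the plant data.
toy <- data.frame(
  PH          = c(8.36, NA, 8.94, 8.24),
  MFR         = c(725.0, NA, NA, 730.6),
  Temperature = c(66.0, 67.6, 67.0, 65.6)
)

# Number and percentage of missing values per column.
na_counts <- colSums(is.na(toy))
na_share  <- round(100 * na_counts / nrow(toy), 1)
print(data.frame(missing = na_counts, percent = na_share))

# Share of missing MFR values, using the counts reported above.
mfr_missing_pct <- round(100 * 212 / 2571, 1)
print(mfr_missing_pct)
```

MFR and Brand Code have the most missing values; most other predictors are missing in under 2% of rows.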

Several of the numeric variables are strongly correlated with one another. Only numeric variables are included in the correlation plot.
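A minimal sketch of how strongly correlated pairs could be listed, assuming pairwise-complete observations are used so that missing values do not drop whole rows. The toy data, the variable names `a`, `b`, `c`, and the 0.9 cutoff are all hypothetical.

```r
set.seed(1)
n <- 200
a <- rnorm(n)

# Hypothetical data: b is nearly a copy of a, c is unrelated noise.
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.1),
  c = rnorm(n)
)
toy$a[c(3, 7)] <- NA  # a few missing values, as in the real data

# Correlation matrix that tolerates missing values.
cm <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and flag |r| above the cutoff.
idx <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high_pairs)
```

Pairs flagged this way are candidates for removal before fitting models that are sensitive to collinearity, such as ordinary linear regression.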

Boxplot

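Here we observe outliers in the data. Filler Speed and Carb Flow show the most pronounced outliers. The warning below matches the 724 missing values across the numeric variables, which are dropped before plotting.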

## Warning: Removed 724 rows containing non-finite values (`stat_boxplot()`).
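The points a boxplot draws as outliers follow the 1.5 x IQR rule, which can be checked numerically with `boxplot.stats()`. This sketch uses the first few Filler Speed values shown in the data preview above; it is an illustration only, not the report's plotting code.

```r
# First Filler Speed values from the data preview (one is missing).
filler_speed <- c(4002, 3986, 4020, 4012, 4010, 4014, NA, 1004, 4014)

# Drop missing values, then apply the boxplot rule (coef = 1.5 by default).
x <- filler_speed[!is.na(filler_speed)]
stats <- boxplot.stats(x)

print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$out)    # values beyond the whiskers
```

The low reading of 1004 is flagged, consistent with the strong negative skew reported for Filler Speed.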

Data Cleaning

## Warning: Number of logged events: 1

Data Partition

Here the data is partitioned into 70% training and 30% testing sets, each split into the predictor variables (x) and the response, PH (y).
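A minimal sketch of a 70/30 split in base R. The report's actual partitioning call is not shown, so the seed and the simple random sampling used here are assumptions; a stratified split would give slightly different counts, which may be why the models below list 1802 training samples.

```r
set.seed(123)  # hypothetical seed, for reproducibility only

n <- 2571      # number of rows reported in the data summary
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# Sizes of the two partitions.
print(length(train_idx))
print(length(test_idx))
```

The index vectors would then be used to subset both the predictor data frame and the PH vector, so that rows stay aligned between x and y.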

Models

Linear Regression

Simple Linear Regression

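The model explains only about 39% of the variability in PH (adjusted R-squared = 0.39). One Brand Code coefficient is reported as NA, most likely because the four brand indicator columns are linearly dependent once the intercept is included.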

## 
## Call:
## lm(formula = y.studData ~ ., data = x.studData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49946 -0.07937  0.01417  0.09057  0.77228 
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -4.927e+02  1.995e+03  -0.247 0.804986    
## Brand.Code.A      -4.581e-02  1.833e-02  -2.500 0.012513 *  
## Brand.Code.B      -2.366e-02  2.769e-02  -0.855 0.392943    
## Brand.Code.C      -1.583e-01  2.837e-02  -5.581 2.76e-08 ***
## Brand.Code.D              NA         NA      NA       NA    
## Carb.Volume       -7.800e-02  8.457e-02  -0.922 0.356459    
## Fill.Ounces       -1.127e-01  3.889e-02  -2.898 0.003806 ** 
## PC.Volume         -1.071e-01  1.009e-01  -1.061 0.288739    
## Carb.Pressure      1.037e-01  6.303e-01   0.165 0.869352    
## Carb.Temp          9.231e+02  3.675e+03   0.251 0.801687    
## PSC               -1.750e-01  6.839e-02  -2.559 0.010584 *  
## PSC.Fill          -8.037e-02  5.930e-02  -1.355 0.175464    
## PSC.CO2           -9.963e-02  7.659e-02  -1.301 0.193481    
## Mnf.Flow          -6.776e-04  5.516e-05 -12.283  < 2e-16 ***
## Carb.Pressure1     3.013e-02  3.717e-03   8.106 9.63e-16 ***
## Fill.Pressure      1.329e+00  6.233e-01   2.132 0.033124 *  
## Hyd.Pressure2      7.037e-03  1.277e-03   5.512 4.07e-08 ***
## Hyd.Pressure4     -1.521e-01  1.334e-01  -1.140 0.254513    
## Filler.Speed       1.939e-07  9.209e-06   0.021 0.983201    
## Temperature       -8.800e-03  2.591e-03  -3.396 0.000699 ***
## Usage.cont        -5.878e-03  1.372e-03  -4.285 1.92e-05 ***
## Carb.Flow          1.068e-06  4.726e-07   2.260 0.023973 *  
## Density           -5.052e-01  1.088e-01  -4.645 3.65e-06 ***
## MFR               -2.943e-05  5.242e-05  -0.561 0.574633    
## Pressure.Vacuum   -2.935e-05  1.811e-04  -0.162 0.871317    
## Oxygen.Filler     -2.953e-01  7.750e-02  -3.810 0.000144 ***
## Bowl.Setpoint      1.927e-03  3.196e-04   6.030 1.99e-09 ***
## Pressure.Setpoint -8.456e-03  2.391e-03  -3.537 0.000416 ***
## Air.Pressurer      1.755e-03  2.768e-03   0.634 0.526226    
## Alch.Rel           6.160e-02  2.254e-02   2.733 0.006342 ** 
## Carb.Rel           3.434e-02  5.640e-02   0.609 0.542668    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1355 on 1772 degrees of freedom
## Multiple R-squared:  0.4006, Adjusted R-squared:  0.3908 
## F-statistic: 40.84 on 29 and 1772 DF,  p-value: < 2.2e-16
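The adjusted R-squared and F-statistic in this summary follow directly from the multiple R-squared and the degrees of freedom. The worked arithmetic below uses only numbers printed above: 29 estimated slopes and 1772 residual degrees of freedom, which implies 1802 training rows.

```r
r2 <- 0.4006          # multiple R-squared from the summary
p  <- 29              # model degrees of freedom (estimated slopes)
df_resid <- 1772      # residual degrees of freedom
n  <- df_resid + p + 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic for the regression.
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(n)
print(round(adj_r2, 4))
print(round(f_stat, 2))
```

Because n is large relative to p, the adjustment is small, so the gap between 0.40 and 0.39 is not a sign of heavy overfitting.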

Partial Least Squares

The fit is similar to linear regression: with ncomp = 20 selected as the best tuning value, the cross-validated R-squared is 0.38, so the model explains about 38% of the variability in PH. The RMSE is 0.137.

## Partial Least Squares 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared    MAE      
##    1     0.1713245  0.02904804  0.1372979
##    2     0.1697293  0.04409482  0.1366806
##    3     0.1557012  0.19763254  0.1225521
##    4     0.1554119  0.20027558  0.1221232
##    5     0.1544327  0.20959762  0.1217433
##    6     0.1539928  0.21449336  0.1213981
##    7     0.1485014  0.26831439  0.1161607
##    8     0.1448417  0.30441999  0.1127449
##    9     0.1425598  0.32542137  0.1104592
##   10     0.1410537  0.33947245  0.1093773
##   11     0.1393603  0.35544460  0.1086594
##   12     0.1378897  0.36849570  0.1082404
##   13     0.1378106  0.36917934  0.1082171
##   14     0.1377927  0.36918346  0.1080344
##   15     0.1375947  0.37085743  0.1079677
##   16     0.1371678  0.37503941  0.1073795
##   17     0.1365672  0.38094020  0.1065596
##   18     0.1368167  0.37903405  0.1067908
##   19     0.1366654  0.38078891  0.1066763
##   20     0.1366556  0.38094217  0.1065843
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 20.
##    ncomp
## 20    20

##   ncomp      RMSE  Rsquared
## 1    20 0.1366556 0.3809422
##    Rsquared      RMSE
## 1 0.3809422 0.1366556
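The RMSE and Rsquared columns in these caret-style tables can be reproduced from observed and predicted values. A minimal sketch, assuming R-squared is computed as the squared correlation between predictions and observations (caret's default for regression); the `pred` values are hypothetical.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# R-squared as the squared correlation of predictions with observations.
rsq <- function(obs, pred) cor(obs, pred)^2

# First four PH values from the data preview, with hypothetical predictions.
obs  <- c(8.36, 8.26, 8.94, 8.24)
pred <- c(8.40, 8.30, 8.80, 8.30)

print(rmse(obs, pred))
print(rsq(obs, pred))
```

RMSE is in PH units, so an RMSE near 0.14 means predictions are typically off by about 0.14 PH.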

Non Linear Regression

MARS

The fit is better here: the cross-validated R-squared is 0.46, so the model explains close to 50% of the variability in PH. The best tuning values are nprune = 30 and degree = 2.

##    nprune degree
## 58     30      2

## Call: earth(x=data.frame[1802,30], y=c(8.26,8.24,8.3...), keepxy=TRUE,
##             degree=2, nprune=30)
## 
##                                                     coefficients
## (Intercept)                                            8.3728581
## Brand.Code.C                                          -0.2855665
## h(0.199349-Mnf.Flow)                                   0.0015335
## h(Mnf.Flow-0.199349)                                   0.0013336
## h(68.4-Temperature)                                    0.0345019
## h(Temperature-68.4)                                    0.0195519
## h(Bowl.Setpoint-90)                                    0.0009516
## Brand.Code.C * h(0.190029-Hyd.Pressure2)              -0.8571696
## Brand.Code.C * h(65.4-Temperature)                    -0.2284343
## Brand.Code.C * h(Density-0.365487)                     1.4456453
## Brand.Code.C * h(0.365487-Density)                     3.5458547
## Brand.Code.C * h(Pressure.Vacuum- -87.1375)            0.0046425
## Brand.Code.C * h(-87.1375-Pressure.Vacuum)             0.0020037
## h(0.199349-Mnf.Flow) * h(Pressure.Vacuum- -70.9711)   -0.0000386
## h(0.199349-Mnf.Flow) * h(-70.9711-Pressure.Vacuum)    -0.0000195
## h(0.199349-Mnf.Flow) * h(Air.Pressurer-143.8)         -0.0003153
## h(Mnf.Flow-0.199349) * h(146.4-Air.Pressurer)         -0.0003623
## h(0.199349-Mnf.Flow) * h(Alch.Rel-7.12)                0.0013575
## h(Mnf.Flow-0.199349) * h(Alch.Rel-7.16)                0.0019209
## h(68.4-Temperature) * h(Usage.cont-21.76)             -0.0086194
## h(0.589456-Density) * h(Bowl.Setpoint-90)              0.0190281
## h(Density-0.589456) * h(Bowl.Setpoint-90)             -0.0995469
## 
## Selected 22 of 30 terms, and 10 of 30 predictors (nprune=30)
## Termination condition: RSq changed by less than 0.001 at 30 terms
## Importance: Mnf.Flow, Brand.Code.C, Alch.Rel, Air.Pressurer, Hyd.Pressure2, ...
## Number of terms at each degree of interaction: 1 6 15
## GCV 0.01548611    RSS 26.27356    GRSq 0.4861794    RSq 0.515699
##    Rsquared      RMSE
## 1 0.4581286 0.1282684
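The `h(...)` terms in the MARS output are hinge functions, h(z) = max(0, z). As an illustration, the sketch below evaluates only the two main-effect Temperature terms from the coefficient table above; interaction terms and the intercept are deliberately left out, so this is not the full model prediction.

```r
# Hinge function used by MARS: zero below the knot, linear above it.
h <- function(z) pmax(0, z)

# Main-effect contribution of Temperature, using the coefficients above.
temp_effect <- function(temperature) {
  0.0345019 * h(68.4 - temperature) +
    0.0195519 * h(temperature - 68.4)
}

# Contribution at a few temperatures around the knot at 68.4.
temps <- c(66, 68.4, 70)
print(data.frame(temperature = temps, contribution = temp_effect(temps)))
```

Both coefficients are positive, so this pair of terms is smallest at the knot and rises on either side of 68.4.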

Support Vector Machines

The fit improves further here, with a cross-validated R-squared of 0.51 at C = 4.

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE       Rsquared   MAE       
##     0.25  0.1296327  0.4526929  0.09664701
##     0.50  0.1266832  0.4747169  0.09366711
##     1.00  0.1242716  0.4922324  0.09126480
##     2.00  0.1227377  0.5036585  0.08982547
##     4.00  0.1220135  0.5102544  0.08938673
##     8.00  0.1226486  0.5088387  0.08990746
##    16.00  0.1257302  0.4931350  0.09271659
##    32.00  0.1298161  0.4749069  0.09571814
##    64.00  0.1364763  0.4447209  0.10127156
##   128.00  0.1437624  0.4121773  0.10700773
## 
## Tuning parameter 'sigma' was held constant at a value of 0.02244293
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.02244293 and C = 4.
## Length  Class   Mode 
##      1   ksvm     S4

##    Rsquared      RMSE
## 1 0.5102544 0.1220135
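The radial basis function kernel measures similarity between two observations as exp(-sigma * squared distance). A minimal base R sketch using the sigma value held constant in the tuning output above; the two input vectors are hypothetical.

```r
sigma <- 0.02244293  # value reported in the tuning output

# Radial basis function kernel between two numeric vectors.
rbf_kernel <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

# Hypothetical two-dimensional points with squared distance 25.
x <- c(0, 0)
y <- c(3, 4)

print(rbf_kernel(x, x, sigma))  # identical points give similarity 1
print(rbf_kernel(x, y, sigma))  # similarity decays with distance
```

A small sigma makes the kernel decay slowly, giving a smoother fit, while the cost parameter C controls how strongly large residuals are penalized.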

Trees

Single Tree

The single tree explains only about 39% of the variability in PH (R-squared = 0.39). This is much lower than the Support Vector Machines model.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
## CART 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ... 
## Resampling results across tuning parameters:
## 
##   cp          RMSE       Rsquared   MAE      
##   0.01187216  0.1364632  0.3874982  0.1060713
##   0.01216548  0.1360866  0.3904339  0.1056655
##   0.01285880  0.1372723  0.3796730  0.1061757
##   0.01386854  0.1380792  0.3708841  0.1069508
##   0.01439357  0.1394621  0.3585176  0.1084625
##   0.01771630  0.1401329  0.3493895  0.1100357
##   0.03004458  0.1423411  0.3280546  0.1114517
##   0.04275824  0.1460013  0.2930598  0.1149739
##   0.06388653  0.1520005  0.2365264  0.1189231
##   0.21183446  0.1660484  0.1684701  0.1316049
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.01216548.
##           cp
## 2 0.01216548

##    Rsquared      RMSE
## 1 0.3904339 0.1360866

Boosted Tree

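R-squared is about 0.58 in the summary row (0.59 for the selected tuning values in the resampling table), so the boosted tree explains substantially more of the variability in PH than the single tree.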

## Stochastic Gradient Boosting 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ... 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.minobsinnode  n.trees  RMSE       Rsquared   MAE       
##    5                  5               100     0.1199091  0.5242820  0.09070372
##    5                  5               200     0.1162544  0.5505175  0.08722424
##    5                  5               300     0.1151418  0.5594973  0.08599164
##    5                  5               400     0.1144831  0.5647117  0.08556658
##    5                  5               500     0.1140816  0.5684267  0.08552277
##    5                  5               600     0.1138611  0.5703591  0.08532112
##    5                  5               700     0.1136130  0.5723102  0.08501534
##    5                  5               800     0.1138383  0.5709910  0.08526792
##    5                  5               900     0.1134374  0.5741716  0.08505661
##    5                  5              1000     0.1133797  0.5747984  0.08508351
##    5                 10               100     0.1198591  0.5248985  0.09020298
##    5                 10               200     0.1169332  0.5461175  0.08715302
##    5                 10               300     0.1147076  0.5626667  0.08534338
##    5                 10               400     0.1137000  0.5704367  0.08459977
##    5                 10               500     0.1134989  0.5724447  0.08428993
##    5                 10               600     0.1132744  0.5744628  0.08400791
##    5                 10               700     0.1129588  0.5769682  0.08382788
##    5                 10               800     0.1128918  0.5777364  0.08384482
##    5                 10               900     0.1131412  0.5760118  0.08386667
##    5                 10              1000     0.1128383  0.5785628  0.08362460
##   10                  5               100     0.1132872  0.5747481  0.08473948
##   10                  5               200     0.1117614  0.5849693  0.08292171
##   10                  5               300     0.1115873  0.5867385  0.08276946
##   10                  5               400     0.1114115  0.5880941  0.08281834
##   10                  5               500     0.1110481  0.5906765  0.08256492
##   10                  5               600     0.1110415  0.5908941  0.08257919
##   10                  5               700     0.1110486  0.5908621  0.08250908
##   10                  5               800     0.1110940  0.5904492  0.08253846
##   10                  5               900     0.1110153  0.5910191  0.08251189
##   10                  5              1000     0.1109802  0.5912940  0.08252443
##   10                 10               100     0.1142389  0.5671225  0.08499040
##   10                 10               200     0.1120675  0.5829485  0.08303796
##   10                 10               300     0.1117855  0.5851979  0.08271858
##   10                 10               400     0.1118071  0.5850589  0.08270012
##   10                 10               500     0.1117272  0.5859739  0.08258634
##   10                 10               600     0.1116540  0.5865121  0.08253732
##   10                 10               700     0.1117027  0.5861833  0.08260377
##   10                 10               800     0.1117855  0.5855298  0.08269215
##   10                 10               900     0.1119326  0.5845723  0.08279999
##   10                 10              1000     0.1119575  0.5843887  0.08283964
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1000, interaction.depth
##  = 10, shrinkage = 0.1 and n.minobsinnode = 5.
##    n.trees interaction.depth shrinkage n.minobsinnode
## 30    1000                10       0.1              5

##    Rsquared      RMSE
## 1 0.5777364 0.1128918

Cubist

The R-squared value of 0.63 is better than that of all previous models.

## Warning in cubist.default(x, y, committees = param$committees, ...): NAs
## introduced by coercion

## Cubist 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1623, 1622, 1622, 1623, 1622, 1622, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE       Rsquared   MAE       
##    1          0          0.1311210  0.4783302  0.09187410
##    1          5          0.1311210  0.4783302  0.09187410
##    1          9          0.1311210  0.4783302  0.09187410
##   10          0          0.1081583  0.6144236  0.08017720
##   10          5          0.1081583  0.6144236  0.08017720
##   10          9          0.1081583  0.6144236  0.08017720
##   20          0          0.1065343  0.6282301  0.07868027
##   20          5          0.1065343  0.6282301  0.07868027
##   20          9          0.1065343  0.6282301  0.07868027
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 0.
##   committees neighbors
## 7         20         0

##    Rsquared      RMSE
## 1 0.6282301 0.1065343

Random Forest

Random Forest has the best R-squared so far, 0.65, meaning it explains about 65% of the variability in PH. Its RMSE of 0.10 is also the lowest. This is the best model so far.

## Random Forest 
## 
## 1802 samples
##   30 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1621, 1622, 1621, 1622, 1621, 1622, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE       Rsquared   MAE       
##    2    0.1167803  0.5981781  0.08917200
##    5    0.1087382  0.6377895  0.08103808
##    8    0.1062868  0.6479370  0.07842360
##   11    0.1053870  0.6495279  0.07717344
##   14    0.1044003  0.6535880  0.07631755
##   17    0.1041866  0.6526610  0.07569956
##   20    0.1042424  0.6508385  0.07538026
##   23    0.1041572  0.6497579  0.07532373
##   26    0.1040774  0.6486613  0.07516248
##   30    0.1040685  0.6476317  0.07505072
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 30.
##    mtry
## 10   30

##    Rsquared      RMSE
## 1 0.6476317 0.1040685
## rf variable importance
## 
##   only 20 most important variables shown (out of 30)
## 
##                 Overall
## Mnf.Flow        100.000
## Brand.Code.C     33.690
## Oxygen.Filler    22.146
## Air.Pressurer    21.648
## Alch.Rel         20.987
## Pressure.Vacuum  19.354
## Density          16.166
## Temperature      15.511
## Carb.Pressure1   14.294
## Carb.Flow        14.146
## Usage.cont       13.885
## Carb.Rel         13.175
## Filler.Speed     10.882
## Fill.Ounces       8.665
## PC.Volume         8.632
## Bowl.Setpoint     8.399
## MFR               6.793
## Hyd.Pressure2     6.646
## Fill.Pressure     6.325
## Carb.Volume       6.078
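To compare the models side by side, the summary rows printed after each model can be gathered into one table and ranked. The sketch below copies the Rsquared and RMSE values from those summary rows; linear regression is left out because only its in-sample R-squared is shown, and XGB Tree is left out because its results table is not shown.

```r
# Cross-validated metrics copied from the summary row of each model.
results <- data.frame(
  model = c("PLS", "MARS", "SVM (radial)", "Single tree",
            "Boosted tree", "Cubist", "Random Forest"),
  Rsquared = c(0.3809422, 0.4581286, 0.5102544, 0.3904339,
               0.5777364, 0.6282301, 0.6476317),
  RMSE = c(0.1366556, 0.1282684, 0.1220135, 0.1360866,
           0.1128918, 0.1065343, 0.1040685)
)

# Rank by RMSE, smallest (best) first.
ranked <- results[order(results$RMSE), ]
print(ranked, row.names = FALSE)
```

Random Forest ranks first on both metrics, with Cubist close behind.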

XGB Tree

The R-squared value of 0.60 is still high but does not improve on the previous model. Random Forest still has the best R-squared and RMSE values, making it the best model.

## [21:18:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:18:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [21:19:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## eXtreme Gradient Boosting 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1443, 1442, 1443, 1441, 1439 
## Resampling results across tuning parameters:
## 
##   max_depth  colsample_bytree  nrounds  RMSE       Rsquared   MAE       
##   10         0.5               100      0.1101306  0.5985438  0.08073673
##   10         0.5               200      0.1095880  0.6023085  0.08013195
##   10         0.6               100      0.1116859  0.5876458  0.08105096
##   10         0.6               200      0.1112895  0.5902825  0.08053533
##   10         0.7               100      0.1101404  0.5996566  0.08073727
##   10         0.7               200      0.1096444  0.6029766  0.08017499
##   10         0.8               100      0.1103995  0.5969781  0.07978522
##   10         0.8               200      0.1097685  0.6013542  0.07922762
##   10         0.9               100      0.1116225  0.5879890  0.08035341
##   10         0.9               200      0.1114361  0.5892884  0.08010886
##   15         0.5               100      0.1128714  0.5791446  0.08236451
##   15         0.5               200      0.1127293  0.5799052  0.08221163
##   15         0.6               100      0.1116849  0.5871629  0.08172433
##   15         0.6               200      0.1115435  0.5878909  0.08154356
##   15         0.7               100      0.1117988  0.5869304  0.08116509
##   15         0.7               200      0.1117237  0.5871620  0.08101293
##   15         0.8               100      0.1115817  0.5895437  0.08083917
##   15         0.8               200      0.1115216  0.5896520  0.08070959
##   15         0.9               100      0.1102138  0.5992230  0.07993898
##   15         0.9               200      0.1102105  0.5989384  0.07983175
##   20         0.5               100      0.1116027  0.5887801  0.08188634
##   20         0.5               200      0.1115135  0.5890656  0.08176921
##   20         0.6               100      0.1127241  0.5810126  0.08150658
##   20         0.6               200      0.1126268  0.5813337  0.08136949
##   20         0.7               100      0.1127185  0.5806046  0.08123352
##   20         0.7               200      0.1126354  0.5808332  0.08112543
##   20         0.8               100      0.1112426  0.5914091  0.08063050
##   20         0.8               200      0.1111642  0.5916313  0.08054062
##   20         0.9               100      0.1136288  0.5753711  0.08167075
##   20         0.9               200      0.1136113  0.5751702  0.08157242
##   25         0.5               100      0.1101120  0.6009393  0.08051768
##   25         0.5               200      0.1100148  0.6012815  0.08039126
##   25         0.6               100      0.1129253  0.5798910  0.08191238
##   25         0.6               200      0.1128338  0.5801636  0.08177973
##   25         0.7               100      0.1113004  0.5912984  0.08080646
##   25         0.7               200      0.1112228  0.5914509  0.08069122
##   25         0.8               100      0.1110339  0.5925923  0.08092132
##   25         0.8               200      0.1109708  0.5926770  0.08082922
##   25         0.9               100      0.1130638  0.5774251  0.08144635
##   25         0.9               200      0.1130320  0.5773058  0.08134804
## 
## Tuning parameter 'eta' was held constant at a value of 0.1
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Tuning parameter 'subsample' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 200, max_depth = 10,
##  eta = 0.1, gamma = 0, colsample_bytree = 0.5, min_child_weight = 1 and
##  subsample = 1.
##   nrounds max_depth eta gamma colsample_bytree min_child_weight subsample
## 2     200        10 0.1     0              0.5                1         1
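
The tuning summary above can be outlined with a sketch like the following. The grid values are read directly off the printed results, and 5-fold cross-validation matches the resampling line. The data frame name `train_df`, the helper name `fit_xgb`, and the formula `PH ~ .` are assumptions for illustration, since the original chunk is not shown.

```r
# Tuning grid read off the printed results: 4 depths x 5 column-sampling
# rates x 2 round counts = 40 candidates; the other parameters are fixed.
xgb_grid <- expand.grid(
  nrounds          = c(100, 200),
  max_depth        = c(10, 15, 20, 25),
  eta              = 0.1,
  gamma            = 0,
  colsample_bytree = c(0.5, 0.6, 0.7, 0.8, 0.9),
  min_child_weight = 1,
  subsample        = 1
)

# Hypothetical helper: `train_df` is assumed to hold PH plus the predictors.
# Needs the caret and xgboost packages; nothing is fitted until it is called.
fit_xgb <- function(train_df, grid = xgb_grid) {
  caret::train(
    PH ~ ., data = train_df,
    method    = "xgbTree",
    trControl = caret::trainControl(method = "cv", number = 5),
    tuneGrid  = grid
  )
}

nrow(xgb_grid)  # number of candidate parameter combinations
```

Every max_depth value in the grid is 10 or more, and the smallest one was selected, so shallower trees may also be worth adding to the grid.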

##    Rsquared      RMSE
## 1 0.5876458 0.1116859
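
For reference, the three metrics reported throughout this section can be computed in base R as sketched below. The definitions follow those commonly used by caret's `postResample`, which reports R-squared as the squared correlation between predictions and observations. Whether the original chunk used that function is an assumption, and the helper name `regression_metrics` is hypothetical.

```r
# RMSE, R-squared (squared Pearson correlation), and MAE for a set of
# predictions; `obs` and `pred` are numeric vectors of equal length.
regression_metrics <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  err <- pred - obs
  c(
    RMSE     = sqrt(mean(err^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(err))
  )
}

# Made-up numbers, only to show the call; these are not values from the report.
obs  <- c(8.36, 8.26, 8.94, 8.24, 8.26)
pred <- c(8.40, 8.30, 8.80, 8.30, 8.20)
round(regression_metrics(obs, pred), 4)
```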

Model Selection

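The Random Forest model is selected because it performed best among the seven candidate models, with the highest R-squared (0.683) and the lowest RMSE (0.097) and MAE (0.069) in the model comparison table.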

Predictions and Conclusion

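The comparison table below confirms that Random Forest is the best-performing model: it has the highest R-squared (0.683), followed by Cubist (0.642) and Boosting (0.628). RMSE values are more or less similar across the models, ranging from about 0.097 to 0.135, with the tree- and rule-based ensembles (Random Forest, Cubist, Boosting) at the low end. The predicted PH values are 8 or above (about 8.5 for the first six predictions shown at the end of this section), suggesting that this is the typical PH of beverages in this manufacturing process given the predictor variables in the data.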

##                RMSE  Rsquared        MAE
## PLS      0.13488381 0.3763929 0.10647830
## MARS     0.12966561 0.4292487 0.09772683
## SVM      0.11875390 0.5214721 0.08501554
## SingTree 0.13273943 0.4018938 0.10168252
## RandFrst 0.09692278 0.6828094 0.06908257
## Boosting 0.10462754 0.6275910 0.07875976
## Cubist   0.10262016 0.6423931 0.07484190
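
The selection can be checked directly from the table above. The sketch below copies the reported metrics into a data frame and ranks the models; the object names (`results`, `ranked`, and so on) are illustrative and not taken from the original code.

```r
# Metrics copied from the comparison table above.
results <- data.frame(
  Model    = c("PLS", "MARS", "SVM", "SingTree", "RandFrst", "Boosting", "Cubist"),
  RMSE     = c(0.13488381, 0.12966561, 0.11875390, 0.13273943,
               0.09692278, 0.10462754, 0.10262016),
  Rsquared = c(0.3763929, 0.4292487, 0.5214721, 0.4018938,
               0.6828094, 0.6275910, 0.6423931),
  MAE      = c(0.10647830, 0.09772683, 0.08501554, 0.10168252,
               0.06908257, 0.07875976, 0.07484190)
)

# Rank from best to worst by RMSE, then check that the other criteria agree.
ranked       <- results[order(results$RMSE), ]
best_by_rmse <- ranked$Model[1]
best_by_r2   <- results$Model[which.max(results$Rsquared)]
best_by_mae  <- results$Model[which.min(results$MAE)]

print(ranked, row.names = FALSE)
```

All three criteria point to the same model here, so the choice does not depend on which metric is given priority.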

Here we remove the PH column from the evaluation data and use the selected model to predict the final PH values. The first six predictions are shown below.

## [1] 8.572175 8.548214 8.548472 8.558963 8.518947 8.523043
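
The prediction step described above can be sketched as follows. The helper `predict_ph` is hypothetical, and the demonstration uses synthetic data with a plain linear model as a stand-in, because the fitted Random Forest and the evaluation file are not reproduced here. The same helper should work with any fitted model whose `predict` method accepts a `newdata` argument.

```r
# Hypothetical helper: drop the (empty) PH column from the evaluation data
# and predict it with an already fitted model.
predict_ph <- function(model, eval_df) {
  predictors <- eval_df[, setdiff(names(eval_df), "PH"), drop = FALSE]
  as.numeric(predict(model, newdata = predictors))
}

# Stand-in demonstration on synthetic data with a linear model; the report
# itself uses the Random Forest fit, which is not reproduced here.
set.seed(1)
toy_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy_train$PH <- 8.5 + 0.1 * toy_train$x1 - 0.05 * toy_train$x2 +
  rnorm(50, sd = 0.02)
toy_eval <- data.frame(x1 = rnorm(6), x2 = rnorm(6), PH = NA_real_)

toy_fit <- lm(PH ~ x1 + x2, data = toy_train)
head(predict_ph(toy_fit, toy_eval))
```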