This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations are requiring us to understand our manufacturing process, the predictive factors and be able to report to them our predictive model of PH.
Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business friendly readable document and your predictions in an Excel readable format. The technical report should show clearly the models you tested and how you selected your final approach.
## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC
## 1 B 5.340000 23.96667 0.2633333 68.2 141.2 0.104
## 2 A 5.426667 24.00667 0.2386667 68.4 139.6 0.124
## 3 B 5.286667 24.06000 0.2633333 70.8 144.8 0.090
## 4 A 5.440000 24.00667 0.2933333 63.0 132.6 NA
## 5 A 5.486667 24.31333 0.1113333 67.2 136.8 0.026
## 6 A 5.380000 23.92667 0.2693333 66.6 138.4 0.090
## PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1
## 1 0.26 0.04 -100 118.8 46.0 0
## 2 0.22 0.04 -100 121.6 46.0 0
## 3 0.34 0.16 -100 120.2 46.0 0
## 4 0.42 0.04 -100 115.2 46.4 0
## 5 0.16 0.12 -100 118.4 45.8 0
## 6 0.24 0.04 -100 119.6 45.6 0
## Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed
## 1 NA NA 118 121.2 4002
## 2 NA NA 106 118.6 3986
## 3 NA NA 82 120.0 4020
## 4 0 0 92 117.8 4012
## 5 0 0 92 118.6 4010
## 6 0 0 116 120.2 4014
## Temperature Usage.cont Carb.Flow Density MFR Balling Pressure.Vacuum PH
## 1 66.0 16.18 2932 0.88 725.0 1.398 -4.0 8.36
## 2 67.6 19.90 3144 0.92 726.8 1.498 -4.0 8.26
## 3 67.0 17.76 2914 1.58 735.0 3.142 -3.8 8.94
## 4 65.6 17.42 3062 1.54 730.6 3.042 -4.4 8.24
## 5 65.6 17.68 3054 1.54 722.8 3.042 -4.4 8.26
## 6 66.2 23.82 2948 1.52 738.8 2.992 -4.4 8.32
## Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Air.Pressurer Alch.Rel Carb.Rel
## 1 0.022 120 46.4 142.6 6.58 5.32
## 2 0.026 120 46.8 143.0 6.56 5.30
## 3 0.024 120 46.6 142.0 7.66 5.84
## 4 0.030 120 46.0 146.2 7.14 5.42
## 5 0.030 120 46.0 146.2 7.14 5.44
## 6 0.024 120 46.0 146.6 7.16 5.44
## Balling.Lvl
## 1 1.48
## 2 1.56
## 3 3.28
## 4 3.04
## 5 3.04
## 6 3.02
EXCEL
The first thing we did was open the Excel files containing the training and test data that were provided, just to get a general idea of what we are looking at. We looked at the training data first. We have 1 response variable (PH) and 32 predictor variables (all numerical except the categorical Brand Code) with 2571 observations. One thing we noticed immediately using the filter function was that about 120 values of the predictor variable “Brand Code” are missing. We also noticed that 4 values of our response variable, PH, are missing. In addition, we realized that it may be beneficial to convert PH from numerical to categorical based on its value (i.e., basic, acidic, neutral). We know that anything below 7 is acidic while anything above 7 is basic, although our data ranges from 7.88 to 9.36, so every observation is basic. Below are the summary statistics we obtained from our Excel dive.
Column.Name | MIN | MAX | MEAN | MEDIAN | NA.S |
---|---|---|---|---|---|
Carb Volume | 5.0400 | 5.700 | 5.370 | 5.347 | 10 |
Fill Ounces | 23.6300 | 24.320 | 23.970 | 23.973 | 34 |
PC Volume | 0.0790 | 0.478 | 0.277 | 0.271 | 39 |
Carb Pressure | 57.0000 | 79.400 | 68.190 | 68.200 | 27 |
Carb Temp | 128.6000 | 154.000 | 141.090 | 140.800 | 26 |
PSC | 0.0000 | 0.270 | 0.080 | 0.080 | 33 |
PSC Fill | 0.0000 | 0.620 | 0.200 | 0.180 | 23 |
PSC CO2 | 0.0000 | 0.240 | 0.060 | 0.040 | 39 |
Mnf Flow | -100.2000 | 229.400 | 24.590 | 65.200 | 2 |
Carb Pressure 1 | 105.6000 | 140.200 | 122.590 | 123.200 | 32 |
Fill Pressure | 34.6000 | 60.400 | 47.920 | 46.400 | 22 |
Hyd Pressure 1 | -0.8000 | 58.000 | 12.440 | 11.400 | 11 |
Hyd Pressure 2 | 0.0000 | 59.400 | 20.960 | 28.600 | 11 |
Hyd Pressure 3 | -1.2000 | 50.000 | 20.460 | 27.600 | 15 |
Hyd Pressure 4 | 52.0000 | 142.000 | 96.290 | 96.000 | 30 |
Filler Level | 55.8000 | 161.200 | 109.250 | 118.400 | 20 |
Filler Speed | 998.0000 | 4030.000 | 3687.200 | 3982.000 | 57 |
Temperature | 63.6000 | 76.200 | 65.970 | 65.600 | 14 |
Usage Cont | 12.0800 | 25.900 | 20.990 | 21.790 | 5 |
Carb Flow | 26.0000 | 5104.000 | 2468.350 | 3028.000 | 2 |
Density | 0.2400 | 1.920 | 1.170 | 0.980 | 1 |
MFR | 31.4000 | 868.600 | 704.050 | 724.000 | 212 |
Balling | -0.1700 | 4.012 | 2.200 | 1.650 | 1 |
Pressure Vacuum | -6.6000 | -3.600 | -5.220 | -5.400 | 0 |
PH | 7.8800 | 9.360 | 8.550 | 8.540 | 4 |
Oxygen Filler | 0.0024 | 0.400 | 0.047 | 0.030 | 12 |
Bowl Setpoint | 70.0000 | 130.000 | 109.340 | 120.000 | 2 |
Pressure Setpoint | 44.0000 | 52.000 | 47.620 | 46.000 | 12 |
Air Pressurer | 140.8000 | 148.200 | 142.830 | 142.600 | 0 |
Alch Rel | 5.2800 | 8.620 | 6.900 | 6.560 | 9 |
Carb Rel | 4.9600 | 6.060 | 5.440 | 5.400 | 10 |
Balling Lvl | 0.0000 | 3.660 | 2.050 | 1.480 | 1 |
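The same kind of summary can be produced in R rather than by hand in Excel. The sketch below is illustrative only: `toy` is a made-up data frame standing in for the training data, and `summarise_columns` is a hypothetical helper that is not part of the original analysis.

```r
# Toy stand-in for the training data (values are invented for illustration).
toy <- data.frame(
  Carb.Volume = c(5.34, 5.43, NA, 5.44),
  PH          = c(8.36, 8.26, 8.94, NA)
)

# Hypothetical helper: one row of summary statistics per numeric column.
summarise_columns <- function(df) {
  stats <- lapply(df, function(x) {
    c(MIN    = min(x, na.rm = TRUE),
      MAX    = max(x, na.rm = TRUE),
      MEAN   = mean(x, na.rm = TRUE),
      MEDIAN = median(x, na.rm = TRUE),
      NA.S   = sum(is.na(x)))
  })
  out <- as.data.frame(do.call(rbind, stats))
  data.frame(Column.Name = names(df), out, row.names = NULL)
}

excel_like_summary <- summarise_columns(toy)
print(excel_like_summary)
```

Computing the table in code makes slips such as a misplaced decimal point easier to catch, because the values can be compared directly against other summaries of the same data.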
Aside from the missing response values, quite a few of the predictor variables have missing values. MFR has a total of 212 missing values, while some, like “Pressure Vacuum” and “Air Pressurer”, have none. We will go ahead and impute the missing values for the predictor variables. There are a few variables that we worry may have outliers because of the wide range between the min and the max. One such variable is “Carb Flow”, with a min of 26 and a max of 5104. Another is MFR, with a min of 31.4 and a max of 868.6.
Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | IQR | Q0.25 | Q0.75 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Brand.Code* | 1 | 2451 | 2.5063239 | 0.9956337 | 2.0000000 | 2.5079041 | 0.0000000 | 1.0000000 | 4.000 | 3.0000000 | 0.3818872 | -1.0613288 | 0.0201107 | 2.0000000 | 2.0000000 | 4.000000 |
Carb.Volume | 2 | 2561 | 5.3701978 | 0.1063852 | 5.3466667 | 5.3654840 | 0.1087240 | 5.0400000 | 5.700 | 0.6600000 | 0.3922121 | -0.4669916 | 0.0021022 | 0.1600000 | 5.2933333 | 5.453333 |
Fill.Ounces | 3 | 2533 | 23.9747546 | 0.0875299 | 23.9733333 | 23.9751390 | 0.0790720 | 23.6333333 | 24.320 | 0.6866667 | -0.0215452 | 0.8624714 | 0.0017392 | 0.1066667 | 23.9200000 | 24.026667 |
PC.Volume | 4 | 2532 | 0.2771187 | 0.0606953 | 0.2713333 | 0.2745818 | 0.0523852 | 0.0793333 | 0.478 | 0.3986667 | 0.3396269 | 0.6699690 | 0.0012062 | 0.0728333 | 0.2391667 | 0.312000 |
Carb.Pressure | 5 | 2544 | 68.1895755 | 3.5382039 | 68.2000000 | 68.1212574 | 3.5582400 | 57.0000000 | 79.400 | 22.4000000 | 0.1822162 | -0.0138046 | 0.0701495 | 5.0000000 | 65.6000000 | 70.600000 |
Carb.Temp | 6 | 2545 | 141.0949234 | 4.0373861 | 140.8000000 | 140.9912617 | 3.8547600 | 128.6000000 | 154.000 | 25.4000000 | 0.2468280 | 0.2375822 | 0.0800307 | 5.4000000 | 138.4000000 | 143.800000 |
PSC | 7 | 2538 | 0.0845737 | 0.0492690 | 0.0760000 | 0.0802746 | 0.0474432 | 0.0020000 | 0.270 | 0.2680000 | 0.8491445 | 0.6480498 | 0.0009780 | 0.0640000 | 0.0480000 | 0.112000 |
PSC.Fill | 8 | 2548 | 0.1953689 | 0.1177817 | 0.1800000 | 0.1837059 | 0.1186080 | 0.0000000 | 0.620 | 0.6200000 | 0.9334450 | 0.7691466 | 0.0023333 | 0.1600000 | 0.1000000 | 0.260000 |
PSC.CO2 | 9 | 2532 | 0.0564139 | 0.0430387 | 0.0400000 | 0.0494965 | 0.0296520 | 0.0000000 | 0.240 | 0.2400000 | 1.7288937 | 3.7250025 | 0.0008553 | 0.0600000 | 0.0200000 | 0.080000 |
Mnf.Flow | 10 | 2569 | 24.5689373 | 119.4811263 | 65.2000000 | 21.0679631 | 169.0164000 | -100.2000000 | 229.400 | 329.6000000 | 0.0041430 | -1.8697072 | 2.3573130 | 240.8000000 | -100.0000000 | 140.800000 |
Carb.Pressure1 | 11 | 2539 | 122.5863726 | 4.7428819 | 123.2000000 | 122.5379242 | 4.4478000 | 105.6000000 | 140.200 | 34.6000000 | 0.0543587 | 0.1418265 | 0.0941263 | 6.4000000 | 119.0000000 | 125.400000 |
Fill.Pressure | 12 | 2549 | 47.9221656 | 3.1775457 | 46.4000000 | 47.7071044 | 2.3721600 | 34.6000000 | 60.400 | 25.8000000 | 0.5471107 | 1.4067532 | 0.0629371 | 4.0000000 | 46.0000000 | 50.000000 |
Hyd.Pressure1 | 13 | 2560 | 12.4375781 | 12.4332538 | 11.4000000 | 10.8374023 | 16.9016400 | -0.8000000 | 58.000 | 58.8000000 | 0.7798043 | -0.1426463 | 0.2457338 | 20.2000000 | 0.0000000 | 20.200000 |
Hyd.Pressure2 | 14 | 2556 | 20.9610329 | 16.3863066 | 28.6000000 | 21.0519062 | 13.3434000 | 0.0000000 | 59.400 | 59.4000000 | -0.3019570 | -1.5592984 | 0.3241161 | 34.6000000 | 0.0000000 | 34.600000 |
Hyd.Pressure3 | 15 | 2556 | 20.4584507 | 15.9757236 | 27.6000000 | 20.5052786 | 13.9364400 | -1.2000000 | 50.000 | 51.2000000 | -0.3189061 | -1.5745834 | 0.3159949 | 33.4000000 | 0.0000000 | 33.400000 |
Hyd.Pressure4 | 16 | 2541 | 96.2888627 | 13.1225594 | 96.0000000 | 95.4530251 | 11.8608000 | 52.0000000 | 142.000 | 90.0000000 | 0.5459786 | 0.6340041 | 0.2603252 | 16.0000000 | 86.0000000 | 102.000000 |
Filler.Level | 17 | 2551 | 109.2523716 | 15.6984241 | 118.4000000 | 111.0417442 | 9.1921200 | 55.8000000 | 161.200 | 105.4000000 | -0.8482847 | 0.0460488 | 0.3108142 | 21.7000000 | 98.3000000 | 120.000000 |
Filler.Speed | 18 | 2514 | 3687.1988862 | 770.8200208 | 3982.0000000 | 3919.9870775 | 47.4432000 | 998.0000000 | 4030.000 | 3032.0000000 | -2.8700359 | 6.7059692 | 15.3734149 | 110.0000000 | 3888.0000000 | 3998.000000 |
Temperature | 19 | 2557 | 65.9675401 | 1.3827783 | 65.6000000 | 65.7986321 | 0.8895600 | 63.6000000 | 76.200 | 12.6000000 | 2.3869732 | 10.1612904 | 0.0273456 | 1.2000000 | 65.2000000 | 66.400000 |
Usage.cont | 20 | 2566 | 20.9929618 | 2.9779364 | 21.7900000 | 21.2517819 | 3.1875900 | 12.0800000 | 25.900 | 13.8200000 | -0.5353253 | -1.0170230 | 0.0587878 | 5.3950000 | 18.3600000 | 23.755000 |
Carb.Flow | 21 | 2569 | 2468.3542234 | 1073.6964743 | 3028.0000000 | 2601.1356344 | 326.1720000 | 26.0000000 | 5104.000 | 5078.0000000 | -0.9877287 | -0.5826893 | 21.1835857 | 2042.0000000 | 1144.0000000 | 3186.000000 |
Density | 22 | 2570 | 1.1736498 | 0.3775269 | 0.9800000 | 1.1533463 | 0.1482600 | 0.2400000 | 1.920 | 1.6800000 | 0.5260149 | -1.1992070 | 0.0074470 | 0.7200000 | 0.9000000 | 1.620000 |
MFR | 23 | 2359 | 704.0492582 | 73.8983094 | 724.0000000 | 718.1566967 | 15.4190400 | 31.4000000 | 868.600 | 837.2000000 | -5.0917729 | 30.4558939 | 1.5214950 | 24.7000000 | 706.3000000 | 731.000000 |
Balling | 24 | 2570 | 2.1977696 | 0.9310914 | 1.6480000 | 2.1287189 | 0.3706500 | -0.1700000 | 4.012 | 4.1820000 | 0.5939224 | -1.3855651 | 0.0183665 | 1.7960000 | 1.4960000 | 3.292000 |
Pressure.Vacuum | 25 | 2571 | -5.2161027 | 0.5699933 | -5.4000000 | -5.2521147 | 0.5930400 | -6.6000000 | -3.600 | 3.0000000 | 0.5256608 | -0.0313126 | 0.0112414 | 0.6000000 | -5.6000000 | -5.000000 |
PH | 26 | 2567 | 8.5456486 | 0.1725162 | 8.5400000 | 8.5516788 | 0.1779120 | 7.8800000 | 9.360 | 1.4800000 | -0.2906437 | 0.0644294 | 0.0034050 | 0.2400000 | 8.4400000 | 8.680000 |
Oxygen.Filler | 27 | 2559 | 0.0468426 | 0.0466436 | 0.0334000 | 0.0388837 | 0.0249077 | 0.0024000 | 0.400 | 0.3976000 | 2.6603955 | 11.0882098 | 0.0009221 | 0.0380000 | 0.0220000 | 0.060000 |
Bowl.Setpoint | 28 | 2569 | 109.3265862 | 15.3031541 | 120.0000000 | 111.3466213 | 0.0000000 | 70.0000000 | 140.000 | 70.0000000 | -0.9743842 | -0.0564212 | 0.3019249 | 20.0000000 | 100.0000000 | 120.000000 |
Pressure.Setpoint | 29 | 2559 | 47.6153966 | 2.0390474 | 46.0000000 | 47.6026354 | 0.0000000 | 44.0000000 | 52.000 | 8.0000000 | 0.2031970 | -1.6012622 | 0.0403081 | 4.0000000 | 46.0000000 | 50.000000 |
Air.Pressurer | 30 | 2571 | 142.8339946 | 1.2119170 | 142.6000000 | 142.5812348 | 0.5930400 | 140.8000000 | 148.200 | 7.4000000 | 2.2521053 | 4.7336291 | 0.0239013 | 0.8000000 | 142.2000000 | 143.000000 |
Alch.Rel | 31 | 2562 | 6.8974161 | 0.5052753 | 6.5600000 | 6.8384390 | 0.0593040 | 5.2800000 | 8.620 | 3.3400000 | 0.8836378 | -0.8506221 | 0.0099825 | 0.7000000 | 6.5400000 | 7.240000 |
Carb.Rel | 32 | 2561 | 5.4367825 | 0.1287183 | 5.4000000 | 5.4301318 | 0.1186080 | 4.9600000 | 6.060 | 1.1000000 | 0.5032472 | -0.2949480 | 0.0025435 | 0.2000000 | 5.3400000 | 5.540000 |
Balling.Lvl | 33 | 2570 | 2.0500078 | 0.8703089 | 1.4800000 | 1.9827237 | 0.2075640 | 0.0000000 | 3.660 | 3.6600000 | 0.5858456 | -1.4858636 | 0.0171675 | 1.7600000 | 1.3800000 | 3.140000 |
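The `skew` column in the table above is positive for distributions with a long right tail and negative for those with a long left tail. As a minimal sketch, the plain moment-based estimator can be written in base R as follows. The function name `sample_skewness` and the example vectors are made up for illustration, and psych's `describe` may use a slightly different small-sample convention, so its values need not match this formula exactly.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
# Assumption: this is the plain textbook estimator, not psych's exact code.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

right_tailed <- c(1, 1, 1, 2, 2, 3, 10)  # long tail to the right
left_tailed  <- -right_tailed            # mirror image of the same data

sample_skewness(right_tailed)  # positive
sample_skewness(left_tailed)   # negative
```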
The describe function from the psych package gives us a more detailed summary, including skewness. We see that some variables are right skewed (PSC CO2, PSC Fill, and Temperature) while some are left skewed (Filler Speed, Carb Flow, and MFR). We will perform some transformations later to address the skewness of the data. First, let’s do some plots and further exploration of our predictors.
library(DataExplorer)
#create_report(bev, y = "PH")
DataExplorer::plot_histogram(bev, nrow = 3L, ncol = 4L)
Looking at the plots, a few things jump out at us immediately. Many of the variables do not appear to be normally distributed. A few of them have spikes that we think might be outliers and will be explored further. A few of the distributions appear to be bimodal; we will create dummy variables to flag these. We will definitely need to do some pre-processing before throwing the data into a model. We’d also like to take a look at the correlation plot to see if we have highly correlated predictors, and we will remove those that are.
library(dplyr)
# Replace each bimodal predictor with a 0/1 flag marking its lower mode
bev.new <- bev %>%
  mutate(Mnf.Flow = if_else(Mnf.Flow < 0, 1, 0)) %>%
  mutate(Hyd.Pressure1 = if_else(Hyd.Pressure1 <= 0, 1, 0)) %>%
  mutate(Hyd.Pressure2 = if_else(Hyd.Pressure2 <= 0, 1, 0)) %>%
  mutate(Filler.Speed = if_else(Filler.Speed < 2500, 1, 0)) %>%
  mutate(Carb.Flow = if_else(Carb.Flow < 2000, 1, 0)) %>%
  mutate(Balling = if_else(Balling < 2.5, 1, 0))
Now we’ll take a look at a correlation plot.
library(corrplot)
cor.plt <- cor(bev.new %>% dplyr::select(-Brand.Code), use = "pairwise.complete.obs", method = "pearson")
corrplot(cor.plt, method = "color", type = "upper", order = "original", number.cex = .6, addCoef.col = "black", tl.srt = 90, diag = TRUE)
bev.remove <- names(bev.new) %in% c("Density", "Balling", "Carb.Rel", "Alch.Rel")
bev.new <- bev.new[!bev.remove]
head(bev.new)
## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC
## 1 B 5.340000 23.96667 0.2633333 68.2 141.2 0.104
## 2 A 5.426667 24.00667 0.2386667 68.4 139.6 0.124
## 3 B 5.286667 24.06000 0.2633333 70.8 144.8 0.090
## 4 A 5.440000 24.00667 0.2933333 63.0 132.6 NA
## 5 A 5.486667 24.31333 0.1113333 67.2 136.8 0.026
## 6 A 5.380000 23.92667 0.2693333 66.6 138.4 0.090
## PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1
## 1 0.26 0.04 1 118.8 46.0 1
## 2 0.22 0.04 1 121.6 46.0 1
## 3 0.34 0.16 1 120.2 46.0 1
## 4 0.42 0.04 1 115.2 46.4 1
## 5 0.16 0.12 1 118.4 45.8 1
## 6 0.24 0.04 1 119.6 45.6 1
## Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed
## 1 NA NA 118 121.2 0
## 2 NA NA 106 118.6 0
## 3 NA NA 82 120.0 0
## 4 1 0 92 117.8 0
## 5 1 0 92 118.6 0
## 6 1 0 116 120.2 0
## Temperature Usage.cont Carb.Flow MFR Pressure.Vacuum PH Oxygen.Filler
## 1 66.0 16.18 0 725.0 -4.0 8.36 0.022
## 2 67.6 19.90 0 726.8 -4.0 8.26 0.026
## 3 67.0 17.76 0 735.0 -3.8 8.94 0.024
## 4 65.6 17.42 0 730.6 -4.4 8.24 0.030
## 5 65.6 17.68 0 722.8 -4.4 8.26 0.030
## 6 66.2 23.82 0 738.8 -4.4 8.32 0.024
## Bowl.Setpoint Pressure.Setpoint Air.Pressurer Balling.Lvl
## 1 120 46.4 142.6 1.48
## 2 120 46.8 143.0 1.56
## 3 120 46.6 142.0 3.28
## 4 120 46.0 146.2 3.04
## 5 120 46.0 146.2 3.04
## 6 120 46.0 146.6 3.02
#library(ggplot2)
#plot_correlation(bev.new, type = c("all", "discrete", "continuous"),
#maxcat = 20L, cor_args = list(), geom_text_args = list(),
#title = NULL, ggtheme = theme_gray(),
#theme_config = list(legend.position = "bottom", axis.text.x =
#element_text(angle = 90)))
From the plot, we notice that Density, Balling, Carb.Rel, and Alch.Rel are highly correlated, so we decided to remove those variables. As we stated earlier, Brand Code was missing about 120 values. We first converted the Brand.Code predictor to a factor so that it would be compatible with random forest imputation.
We then filtered out the 4 records with a missing response (PH) value and imputed the remaining missing values using random forest imputation.
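The code for this step is not shown in the report, so the following is only a sketch of how the three steps (factor conversion, dropping rows with a missing response, random forest imputation) might look. The data frame `toy` and its values are invented, and the call to `missForest` is guarded so that the example still runs when the package is not installed.

```r
# Toy stand-in for the training data (invented values).
toy <- data.frame(
  Brand.Code  = rep(c("A", "B", "C", "D"), times = 10),
  Carb.Volume = seq(5.1, 5.6, length.out = 40),
  PH          = seq(8.2, 8.8, length.out = 40),
  stringsAsFactors = FALSE
)
toy$Brand.Code[c(1, 2)]  <- NA   # missing predictor values
toy$Carb.Volume[c(5, 9)] <- NA
toy$PH[c(3, 7)]          <- NA   # missing response values

# 1. Brand.Code must be a factor for random forest imputation.
toy$Brand.Code <- factor(toy$Brand.Code)

# 2. Drop rows whose response is missing; only predictors get imputed.
toy_complete_y <- toy[!is.na(toy$PH), ]

# 3. Impute the remaining gaps (skipped when missForest is not installed).
if (requireNamespace("missForest", quietly = TRUE)) {
  set.seed(123)
  toy_imputed <- missForest::missForest(toy_complete_y)$ximp
} else {
  toy_imputed <- toy_complete_y
}

nrow(toy_complete_y)
```

`missForest` returns a list, and the imputed data frame is its `ximp` element, which is why the sketch extracts it explicitly.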
Using missForest to impute took much longer than rfImpute, but it works better for our purposes. Initially, we wanted to convert our response variable to a categorical one, but at this point we decided against it as it would lead to a loss of information. Next, let’s check whether we have any near-zero-variance variables. These are variables with very few unique values relative to the number of observations (caret's nearZeroVar uses a default cutoff of 10% unique values) and in which the most common value heavily outnumbers the second most common one (a frequency ratio above 95/5 by default).
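To make the near-zero-variance check concrete, here is a base-R sketch of the two diagnostics involved: the share of unique values and the ratio between the two most frequent values. The helper names `nzv_metrics` and `is_near_zero_var` are hypothetical. The cutoffs 95/5 and 10 are the defaults documented for caret's `nearZeroVar`, but this is not caret's own code and edge cases may be handled differently there.

```r
# Two diagnostics in the spirit of caret::nearZeroVar (a sketch, not caret's code).
nzv_metrics <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  # Ratio of the most common value's count to the second most common one.
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
  # Distinct values as a percentage of all observations.
  percent_unique <- 100 * length(counts) / length(x)
  c(freq_ratio = freq_ratio, percent_unique = percent_unique)
}

# Flag a variable only when both cutoffs are crossed.
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  m <- nzv_metrics(x)
  unname(m["freq_ratio"] > freq_cut && m["percent_unique"] <= unique_cut)
}

almost_constant <- c(rep(0, 99), 1)  # 99 zeros and a single 1
varied          <- 1:100             # every value distinct

is_near_zero_var(almost_constant)
is_near_zero_var(varied)
```

A low share of unique values alone is not enough to flag a variable: a 0/1 indicator with balanced classes has few unique values but a frequency ratio near 1, so it is kept.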
We notice that no variable is flagged TRUE for near-zero variance (nzv), so we will move on toward splitting our dataset. Before that, recall that a few of our variables exhibited some skewness. We will apply a Box-Cox transformation to those variables (PSC, PSC.Fill, PSC.CO2, and Oxygen.Filler). Because the Box-Cox transformation requires strictly positive values and PSC.Fill and PSC.CO2 contain zeros, we add a small offset to them first.
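For reference, the standard Box-Cox transform is (x^lambda - 1) / lambda for lambda not equal to 0, and log(x) for lambda equal to 0. The base-R sketch below shows why zeros need an offset first. The function `box_cox` and the example values are illustrative assumptions, not the implementation used by the forecast package.

```r
# Box-Cox transform for strictly positive x:
#   (x^lambda - 1) / lambda  when lambda != 0,  log(x) when lambda == 0.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0, na.rm = TRUE))  # undefined for zeros and negatives
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

psc_fill <- c(0, 0.16, 0.26, 0.62)  # contains a zero, like PSC.Fill
shifted  <- psc_fill + 1e-7         # small offset makes every value positive

log_like  <- box_cox(shifted, lambda = 0)    # behaves like a log transform
sqrt_like <- box_cox(shifted, lambda = 0.5)  # behaves like a square root
```

Note that with a very small offset and a lambda near 0, the former zeros map to large negative values, so it is worth re-checking the histograms after transforming.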
#lambda <- BoxCox.lambda(bev.imp.missForest)
#bev.boxcox <- BoxCox(bev.imp.missForest, lambda)
library(forecast)
bev.boxcox <- bev.imp.missForest
offset <- .0000001
bev.boxcox$PSC.Fill <- bev.boxcox$PSC.Fill + offset
bev.boxcox$PSC.CO2 <- bev.boxcox$PSC.CO2 + offset
#psc.boxcox <- boxcox(bev.boxcox$PSC ~ 1, lamda = seq(-6, 6, .1))
#pscfill.boxcox <- boxcox(bev.boxcox$PSC.Fill ~ 1, lambda = seq(-6, 6, 0.1))
#psccos.boxcox <- boxcox(bev.boxcox$PSC.CO2 ~ 1, lambda = seq(-6, 6, 0.1))
#oxygenfiller.boxcox <- boxcox(bev.boxcox$Oxygen.Filler ~ 1, lambda = seq(-6, 6, .1))
#bc1 <- data.frame(psc.boxcox$x, psc.boxcox$y)
#bc2 <- bc1[with(bc1, order(-bc1$psc.boxcox.y)),]
#bc2[1,]
#bc3 <- data.frame(pscfill.boxcox$x, pscfill.boxcox$y)
#bc4 <- bc3[with(bc3, order(-bc3$pscfill.boxcox.y)),]
#bc4[1,]
#bc5 <- data.frame(psccos.boxcox$x, psccos.boxcox$y)
#bc6 <- bc5[with(bc5, order(-bc5$psccos.boxcox.y)),]
#bc6[1,]
#bc7 <- data.frame(oxygenfiller.boxcox$x, oxygenfiller.boxcox$y)
#bc8 <- bc7[with(bc7, order(-bc7$oxygenfiller.boxcox.y)),]
#bc8[1,]
# to find optimal lambda
lambda1 <- BoxCox.lambda(bev.boxcox$PSC.Fill)
lambda2 <- BoxCox.lambda(bev.boxcox$PSC.CO2)
lambda3 <- BoxCox.lambda(bev.boxcox$Oxygen.Filler)
lambda4 <- BoxCox.lambda(bev.boxcox$PSC)
# now to transform vector
trans.vector1 = BoxCox(bev.boxcox$PSC.Fill, lambda1)
bev.boxcox$PSC.Fill <- trans.vector1
trans.vector2 = BoxCox(bev.boxcox$PSC.CO2, lambda2)
bev.boxcox$PSC.CO2 <- trans.vector2
trans.vector3 = BoxCox(bev.boxcox$Oxygen.Filler, lambda3)
bev.boxcox$Oxygen.Filler <- trans.vector3
trans.vector4 = BoxCox(bev.boxcox$PSC, lambda4)
bev.boxcox$PSC <- trans.vector4
DataExplorer::plot_histogram(bev.boxcox, nrow = 3L, ncol = 4L)
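Now that we have completed transforming our dataset, we will go ahead and split the training data that we were given. We hold out 25% of the rows as a test set and also create separate predictor and response objects, so that the same split can be used with models that take either a formula or x/y inputs.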
#set.seed(123)
#myvars <- names(bev.boxcox) %in% c("Brand.Code")
#bev.boxcox2<- bev.boxcox[, !myvars]
## 75% of the sample size
smp_size <- floor(0.75 * nrow(bev.boxcox))
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(bev.boxcox)), size = smp_size)
bev.train <- bev.boxcox[train_ind, ]
bev.test <- bev.boxcox[-train_ind, ]
bev.trainX <- bev.train[, !names(bev.train) %in% "PH"]
bev.trainY <- bev.train[, "PH"]
bev.testX <- bev.test[, !names(bev.test) %in% "PH"]
bev.testY <- bev.test[, "PH"]
library(caret)  # provides trainControl(), train(), defaultSummary(), postResample()
ctrl <- trainControl(method = "cv", number = 10)
GLM Model
GLM, or generalized linear models, formulated by John Nelder and Robert Wedderburn, are “a flexible generalization of an ordinary linear regression model” in which the linear model is related to the response variable via a link function. They were initially formulated as a way of unifying various models such as linear, logistic, and Poisson regression, and they allow for non-normal error distributions.
library(tictoc)
set.seed(456)
tic()
glm.model <- train(PH ~., data = bev.train, metric = "RMSE", method = "glm", preProcess = c("center", "scale", "BoxCox"), trControl = ctrl)
glm.predict <- predict(glm.model, newdata = bev.test)
pre.eval <- data.frame(obs = bev.testY, pred = glm.predict)
glm.results <- data.frame(defaultSummary(pre.eval))
glm.rmse <- glm.results[1, 1]
# toc() stops the timer and removes it, so call it once and keep its return value
exectime <- toc()
## 4.84 sec elapsed
exectime <- exectime$toc - exectime$tic
paste0("The RMSE value for the GLM model is ", glm.rmse)
## [1] "The RMSE value for the GLM model is 0.132169145517236"
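All models in this report are compared on RMSE (root mean squared error) over the held-out test rows. As a quick reminder of what that number measures, here is a minimal base-R sketch; the function `rmse` and the example values are made up for illustration and are not taken from the model output.

```r
# Root mean squared error: the typical size of a prediction error,
# expressed in the same units as the response (here, pH units).
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

observed  <- c(8.36, 8.26, 8.94, 8.24)  # invented pH readings
predicted <- c(8.40, 8.30, 8.70, 8.30)  # invented model predictions

rmse(observed, predicted)
```

Because the errors are squared before averaging, a single large miss (such as the third value above) affects RMSE more than several small ones.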
GLMNET MODEL
GLMNET fits elastic net regression. Unlike GLM, there is a penalty term associated with this model. Elastic net is a regularized regression method that combines the L1 and L2 penalties of lasso and ridge, respectively.
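The penalty term can be written out explicitly. The sketch below uses the parameterisation from the glmnet documentation, where `alpha` mixes the two penalties (1 is pure lasso, 0 is pure ridge) and `lambda` sets the overall strength. The function `elastic_net_penalty` and the coefficient vector are illustrative assumptions.

```r
# Elastic net penalty in glmnet's parameterisation:
#   lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
elastic_net_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -2, 0)  # invented coefficients

lasso_penalty <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)    # L1 only
ridge_penalty <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0)    # L2 only
mixed_penalty <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)  # half and half
```

When caret tunes a `glmnet` model, it searches over both `alpha` and `lambda`, so the final model may sit anywhere between ridge and lasso.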
set.seed(789)
tic()
glmnet.model <- train(PH ~., data = bev.train, metric = "RMSE", method = "glmnet", preProcess = c("center", "scale", "BoxCox"), trControl = ctrl)
glmnet.predict <- predict(glmnet.model, newdata = bev.test)
pre.eval2 <- data.frame(obs = bev.testY, pred = glmnet.predict)
glmnet.results <- data.frame(defaultSummary(pre.eval2))
glmnet.rmse <- glmnet.results[1, 1]
# toc() stops the timer and removes it, so call it once and keep its return value
exectime <- toc()
## 11.28 sec elapsed
exectime <- exectime$toc - exectime$tic
paste0("The RMSE value for the GLMNET model is ", glmnet.rmse)
## [1] "The RMSE value for the GLMNET model is 0.132072825560028"
We will next try a partial least squares (PLS) regression model. PLS is typically used when we have more predictors than observations, although that is not the case in our current situation. PLS is a dimension reduction technique similar to PCA. Our predictors are mapped to a smaller set of variables, and within that space we perform a regression against our response variable. Unlike PCA, it chooses the new mapped variables so that they maximally explain the outcome variable.
library(pls)
#model <- plsr(PH ~., data = bev.train, validation = "CV")
#cv <- RMSEP(model)
#best.dims <- which.min(cv$val[estimate = "adjCV", , ]) - 1
#model <- plsr(PH ~., data = bev.train, ncomp = best.dims)
#model
set.seed(654)
tic()
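# Note: caret spells this argument `tuneLength`; as written, `tunelength` is not treated as the tuning argument, so the default tuning grid produced the result below
pls.bev <- train(PH ~., data = bev.train, metric = "RMSE", method = "pls", tunelength = 15, preProcess = c("center", "scale", "BoxCox"), trControl = ctrl)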
pls.pred <- predict(pls.bev, bev.test)
pre.eval3 <- data.frame(obs = bev.testY, pred = pls.pred)
pls.results <- data.frame(defaultSummary(pre.eval3))
pls.rmse <- pls.results[1, 1]
# toc() stops the timer and removes it, so call it once and keep its return value
exectime <- toc()
## 4.38 sec elapsed
exectime <- exectime$toc - exectime$tic
paste0("The RMSE value for the PLS model is ", pls.rmse)
## [1] "The RMSE value for the PLS model is 0.134651698744935"
Random Forest
ctrl2 <- trainControl(method = "repeatedcv", number = 5, repeats = 2, search = "random", allowParallel = TRUE)
mtry <- sqrt(ncol(bev.train))
set.seed(321)
tic()
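# Note: caret spells this argument `tuneLength`; as written, `tunelength` is not treated as the tuning argument, so the default number of random candidates produced the result below
ranfor.bev <- train(PH ~., data = bev.train, metric = "RMSE", method = "rf", tunelength = 5, trControl = ctrl2, importance = T)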
rf.Pred <- predict(ranfor.bev, newdata = bev.test)
rf.results <- data.frame(postResample(pred = rf.Pred, obs = bev.test$PH))
rf.rmse <- rf.results[1, 1]
# toc() stops the timer and removes it, so call it once and keep its return value
exectime <- toc()
## 527.37 sec elapsed
exectime <- exectime$toc - exectime$tic
paste0("The RMSE value for the Random Forest model is ", rf.rmse)
## [1] "The RMSE value for the Random Forest model is 0.105823809400893"
## rf variable importance
##
## only 20 most important variables shown (out of 30)
##
## Overall
## Mnf.Flow 100.00
## Brand.CodeC 72.57
## Pressure.Vacuum 68.96
## Air.Pressurer 65.93
## Oxygen.Filler 63.94
## Balling.Lvl 50.44
## Usage.cont 45.89
## Carb.Pressure1 45.19
## Hyd.Pressure3 43.84
## Brand.CodeD 43.08
## Bowl.Setpoint 41.93
## Temperature 38.50
## Filler.Level 34.92
## Carb.Volume 33.26
## MFR 29.28
## Fill.Pressure 27.78
## PC.Volume 25.17
## Carb.Flow 22.76
## Hyd.Pressure4 22.29
## Pressure.Setpoint 20.05
From the random forest model, we see that the top 5 most important variables are:
- Mnf.Flow
- Brand.CodeC
- Pressure.Vacuum
- Air.Pressurer
- Oxygen.Filler
XGBoost Model
We decided to try the Extreme Gradient Boosting (XGBoost) model because of its high accuracy and its suitability for regression problems, as it allows optimization of an arbitrary differentiable loss function. XGBoost accepts only numerical predictors, so let’s convert Brand.Code to numerical.
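The conversion code is not shown in the report, so the following base-R sketch only illustrates two common ways a factor such as Brand.Code can be made numeric. The vector `brand` is invented, and which option the original analysis used is not known.

```r
# Invented brand codes standing in for the Brand.Code column.
brand <- factor(c("B", "A", "B", "C", "D"))

# Option 1: integer codes (compact, but implies an ordering A < B < C < D).
brand_codes <- as.integer(brand)

# Option 2: one-hot columns, one 0/1 indicator per brand (no implied order).
brand_onehot <- model.matrix(~ brand - 1)

brand_codes
colnames(brand_onehot)
```

Tree-based models can often cope with integer codes, but one-hot columns avoid suggesting that brand D is somehow "larger" than brand A.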
We clearly see that the most important predictors are:
1. Mnf.Flow
2. Usage.cont
3. Carb.Flow
4. Oxygen.Filler
5. Carb.Rel
MARS model
We decided to try a MARS model because it can predict the values of a continuous dependent (outcome) variable from a set of independent (predictor) variables. We chose MARSplines because it is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and independent variables, and in this case it was not clear whether the relationship was linear. It works even in situations where the relationship between the predictors and the dependent variable is non-monotone and difficult to approximate with parametric models.
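MARS achieves this flexibility by building the model from hinge functions of the form max(0, x - knot), which let the slope change at a knot. The base-R sketch below illustrates a single hinge term on invented data; `hinge` is a hypothetical helper, and the real model is fitted by the earth package rather than by `lm`.

```r
# A hinge function is zero up to the knot and linear afterwards.
hinge <- function(x, knot) pmax(0, x - knot)

x <- seq(0, 10, by = 1)

# A toy piecewise-linear response: flat until x = 4, then rising with slope 2.
y <- 1 + 2 * hinge(x, knot = 4)

# With the knot known, an ordinary linear fit on the hinge term recovers it.
fit <- lm(y ~ hinge(x, knot = 4))
coef(fit)
```

In a real MARS fit the knots are not known in advance; the algorithm searches for them and then prunes terms, which is what the `nprune` tuning parameter controls.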
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
set.seed(100)
tic()
MarsModel <- train(x = bev.trainX,
                   y = bev.train$PH,
                   method = "earth",
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv"))
MarsModel$bestTune
## nprune degree
## 69 33 2
MarsModelTunePred <- predict(MarsModel, newdata = bev.testX)
mars.results <- data.frame(postResample(pred =MarsModelTunePred, obs = bev.test$PH))
mars.rmse <- mars.results[1, 1]
# toc() stops the timer and removes it, so call it once and keep its return value
exectime <- toc()
## 180.98 sec elapsed
exectime <- exectime$toc - exectime$tic
paste0("The RMSE value for the MARS model is ", mars.rmse)
## [1] "The RMSE value for the MARS model is 0.1277669928728"
We clearly see that the most important predictors for the MARS model are:
1. Mnf.Flow
2. Brand.Code
3. Air.Pressurer
4. Alch.Rel
5. Bowl.Setpoint
glm.rmse | glmnet.rmse | pls.rmse | rf.rmse | xgboost.rmse | mars.rmse |
---|---|---|---|---|---|
0.1321691 | 0.1320728 | 0.1346517 | 0.1058238 | 0.1229514 | 0.127767 |
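The ranking implied by the table above can be checked with a few lines of base R. The RMSE values are copied from the table; the object names `rmse`, `ranked`, and `best_model` are only illustrative.

```r
# Test-set RMSE for each model, copied from the comparison table.
rmse <- c(glm = 0.1321691, glmnet = 0.1320728, pls = 0.1346517,
          rf = 0.1058238, xgboost = 0.1229514, mars = 0.127767)

# Sort from best (lowest error) to worst.
ranked <- sort(rmse)
names(ranked)

best_model <- names(ranked)[1]
```

Sorting the values rather than reading them off by eye avoids misordering models whose RMSE differs only in the third decimal place.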
We see that the random forest model has the best RMSE at 0.106. The next best model was XGBoost at 0.123, followed by the MARS model at 0.128. We also timed each of our models; among the timings reported above, PLS was the fastest at 4.38 seconds and the random forest the slowest at 527.37 seconds.
Model Testing
Preprocess test set by imputing missing values
Test_set_bev$`Brand.Code` <- factor(Test_set_bev$`Brand.Code`)
set.seed(123)
myvars <- names(Test_set_bev) %in% c("PH")
Test_set_bev.missForest <- Test_set_bev[, !myvars]
summary(Test_set_bev.missForest)
## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure
## A : 35 Min. :5.147 Min. :23.75 Min. :0.09867 Min. :60.20
## B :129 1st Qu.:5.287 1st Qu.:23.92 1st Qu.:0.23333 1st Qu.:65.30
## C : 31 Median :5.340 Median :23.97 Median :0.27533 Median :68.00
## D : 64 Mean :5.369 Mean :23.97 Mean :0.27769 Mean :68.25
## NA's: 8 3rd Qu.:5.465 3rd Qu.:24.01 3rd Qu.:0.32200 3rd Qu.:70.60
## Max. :5.667 Max. :24.20 Max. :0.46400 Max. :77.60
## NA's :1 NA's :6 NA's :4
## Carb.Temp PSC PSC.Fill PSC.CO2
## Min. :130.0 Min. :0.00400 Min. :0.0200 Min. :0.00000
## 1st Qu.:138.4 1st Qu.:0.04450 1st Qu.:0.1000 1st Qu.:0.02000
## Median :140.8 Median :0.07600 Median :0.1800 Median :0.04000
## Mean :141.2 Mean :0.08545 Mean :0.1903 Mean :0.05107
## 3rd Qu.:143.8 3rd Qu.:0.11200 3rd Qu.:0.2600 3rd Qu.:0.06000
## Max. :154.0 Max. :0.24600 Max. :0.6200 Max. :0.24000
## NA's :1 NA's :5 NA's :3 NA's :5
## Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1
## Min. :-100.20 Min. :113.0 Min. :37.80 Min. :-50.00
## 1st Qu.:-100.00 1st Qu.:120.2 1st Qu.:46.00 1st Qu.: 0.00
## Median : 0.20 Median :123.4 Median :47.80 Median : 10.40
## Mean : 21.03 Mean :123.0 Mean :48.14 Mean : 12.01
## 3rd Qu.: 141.30 3rd Qu.:125.5 3rd Qu.:50.20 3rd Qu.: 20.40
## Max. : 220.40 Max. :136.0 Max. :60.20 Max. : 50.00
## NA's :4 NA's :2
## Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level
## Min. :-50.00 Min. :-50.00 Min. : 68.00 Min. : 69.2
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 90.00 1st Qu.:100.6
## Median : 26.80 Median : 27.70 Median : 98.00 Median :118.6
## Mean : 20.11 Mean : 19.61 Mean : 97.84 Mean :110.3
## 3rd Qu.: 34.80 3rd Qu.: 33.00 3rd Qu.:104.00 3rd Qu.:120.2
## Max. : 61.40 Max. : 49.20 Max. :140.00 Max. :153.2
## NA's :1 NA's :1 NA's :4 NA's :2
## Filler.Speed Temperature Usage.cont Carb.Flow Density
## Min. :1006 Min. :63.80 Min. :12.90 Min. : 0 Min. :0.060
## 1st Qu.:3812 1st Qu.:65.40 1st Qu.:18.12 1st Qu.:1083 1st Qu.:0.920
## Median :3978 Median :65.80 Median :21.44 Median :3038 Median :0.980
## Mean :3581 Mean :66.23 Mean :20.90 Mean :2409 Mean :1.177
## 3rd Qu.:3996 3rd Qu.:66.60 3rd Qu.:23.74 3rd Qu.:3215 3rd Qu.:1.600
## Max. :4020 Max. :75.40 Max. :24.60 Max. :3858 Max. :1.840
## NA's :10 NA's :2 NA's :2 NA's :1
## MFR Balling Pressure.Vacuum Oxygen.Filler
## Min. : 15.6 Min. :0.902 Min. :-6.400 Min. :0.00240
## 1st Qu.:707.0 1st Qu.:1.498 1st Qu.:-5.600 1st Qu.:0.01960
## Median :724.6 Median :1.648 Median :-5.200 Median :0.03370
## Mean :697.8 Mean :2.203 Mean :-5.174 Mean :0.04666
## 3rd Qu.:731.5 3rd Qu.:3.242 3rd Qu.:-4.800 3rd Qu.:0.05440
## Max. :784.8 Max. :3.788 Max. :-3.600 Max. :0.39800
## NA's :31 NA's :1 NA's :1 NA's :3
## Bowl.Setpoint Pressure.Setpoint Air.Pressurer Alch.Rel
## Min. : 70.0 Min. :44.00 Min. :141.2 Min. :6.400
## 1st Qu.:100.0 1st Qu.:46.00 1st Qu.:142.2 1st Qu.:6.540
## Median :120.0 Median :46.00 Median :142.6 Median :6.580
## Mean :109.6 Mean :47.73 Mean :142.8 Mean :6.907
## 3rd Qu.:120.0 3rd Qu.:50.00 3rd Qu.:142.8 3rd Qu.:7.180
## Max. :130.0 Max. :52.00 Max. :147.2 Max. :7.820
## NA's :1 NA's :2 NA's :1 NA's :3
## Carb.Rel Balling.Lvl
## Min. :5.18 Min. :0.000
## 1st Qu.:5.34 1st Qu.:1.380
## Median :5.40 Median :1.480
## Mean :5.44 Mean :2.051
## 3rd Qu.:5.56 3rd Qu.:3.080
## Max. :5.74 Max. :3.420
## NA's :2
#make Brand code a factor
#Test_set_bev.imp <- mice(Test_set_bev, m =3, maxit =3, print = FALSE, seed = 234)
#Test_set_bev.imp.missForest <- rfImpute(PH ~ ., Test_set_bev)
#summary(Test_set_bev.imp[1]$data)
library(missForest)
# missForest() returns a list; the imputed data frame is its $ximp element
Test_set_bev.missForest2 <- missForest(Test_set_bev.missForest)
## missForest iteration 1 in progress...done!
## missForest iteration 2 in progress...done!
## missForest iteration 3 in progress...done!
## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure
## A: 35 Min. :5.147 Min. :23.75 Min. :0.09867 Min. :60.20
## B:135 1st Qu.:5.286 1st Qu.:23.92 1st Qu.:0.23433 1st Qu.:65.30
## C: 33 Median :5.340 Median :23.97 Median :0.27600 Median :68.00
## D: 64 Mean :5.368 Mean :23.97 Mean :0.27815 Mean :68.25
## 3rd Qu.:5.463 3rd Qu.:24.01 3rd Qu.:0.32233 3rd Qu.:70.60
## Max. :5.667 Max. :24.20 Max. :0.46400 Max. :77.60
## Carb.Temp PSC PSC.Fill PSC.CO2
## Min. :130.0 Min. :0.0040 Min. :0.0200 Min. :0.00000
## 1st Qu.:138.4 1st Qu.:0.0460 1st Qu.:0.1100 1st Qu.:0.02000
## Median :140.8 Median :0.0780 Median :0.1800 Median :0.04000
## Mean :141.2 Mean :0.0858 Mean :0.1906 Mean :0.05126
## 3rd Qu.:143.9 3rd Qu.:0.1120 3rd Qu.:0.2500 3rd Qu.:0.06000
## Max. :154.0 Max. :0.2460 Max. :0.6200 Max. :0.24000
## Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1
## Min. :-100.20 Min. :113.0 Min. :37.80 Min. :-50.00
## 1st Qu.:-100.00 1st Qu.:120.2 1st Qu.:46.00 1st Qu.: 0.00
## Median : 0.20 Median :123.4 Median :47.80 Median : 10.40
## Mean : 21.03 Mean :123.0 Mean :48.12 Mean : 12.01
## 3rd Qu.: 141.30 3rd Qu.:125.5 3rd Qu.:50.20 3rd Qu.: 20.40
## Max. : 220.40 Max. :136.0 Max. :60.20 Max. : 50.00
## Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level
## Min. :-50.00 Min. :-50.00 Min. : 68.00 Min. : 69.2
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 90.00 1st Qu.:100.6
## Median : 26.80 Median : 27.60 Median : 98.00 Median :118.6
## Mean : 20.04 Mean : 19.54 Mean : 98.07 Mean :110.4
## 3rd Qu.: 34.80 3rd Qu.: 33.00 3rd Qu.:104.00 3rd Qu.:120.2
## Max. : 61.40 Max. : 49.20 Max. :140.00 Max. :153.2
## Filler.Speed Temperature Usage.cont Carb.Flow Density
## Min. :1006 Min. :63.80 Min. :12.90 Min. : 0 Min. :0.060
## 1st Qu.:3795 1st Qu.:65.40 1st Qu.:18.12 1st Qu.:1083 1st Qu.:0.910
## Median :3918 Median :65.80 Median :21.40 Median :3038 Median :0.980
## Mean :3506 Mean :66.24 Mean :20.89 Mean :2409 Mean :1.175
## 3rd Qu.:3996 3rd Qu.:66.60 3rd Qu.:23.74 3rd Qu.:3215 3rd Qu.:1.600
## Max. :4020 Max. :75.40 Max. :24.60 Max. :3858 Max. :1.840
## MFR Balling Pressure.Vacuum Oxygen.Filler
## Min. : 15.6 Min. :0.902 Min. :-6.400 Min. :0.00240
## 1st Qu.:687.2 1st Qu.:1.497 1st Qu.:-5.600 1st Qu.:0.02070
## Median :720.4 Median :1.648 Median :-5.200 Median :0.03380
## Mean :670.8 Mean :2.200 Mean :-5.173 Mean :0.04724
## 3rd Qu.:730.7 3rd Qu.:3.242 3rd Qu.:-4.800 3rd Qu.:0.05710
## Max. :784.8 Max. :3.788 Max. :-3.600 Max. :0.39800
## Bowl.Setpoint Pressure.Setpoint Air.Pressurer Alch.Rel
## Min. : 70.0 Min. :44.00 Min. :141.2 Min. :6.400
## 1st Qu.:100.0 1st Qu.:46.00 1st Qu.:142.2 1st Qu.:6.540
## Median :120.0 Median :46.00 Median :142.6 Median :6.580
## Mean :109.6 Mean :47.73 Mean :142.8 Mean :6.904
## 3rd Qu.:120.0 3rd Qu.:50.00 3rd Qu.:142.8 3rd Qu.:7.180
## Max. :130.0 Max. :52.00 Max. :147.2 Max. :7.820
## Carb.Rel Balling.Lvl
## Min. :5.18 Min. :0.000
## 1st Qu.:5.34 1st Qu.:1.380
## Median :5.40 Median :1.480
## Mean :5.44 Mean :2.051
## 3rd Qu.:5.56 3rd Qu.:3.080
## Max. :5.74 Max. :3.420
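The summary above appears to describe the imputed test set. `missForest()` returns a list rather than a data frame, so the completed data has to be pulled out of its `ximp` element before it can be summarized or passed to `predict()`. A minimal sketch on synthetic data follows; `toy_df`, `toy_imp`, `imputed`, and the generated values are hypothetical and are not the ABC Beverage data.

```r
# Minimal sketch on synthetic data (hypothetical names and values).
library(missForest)

set.seed(234)
toy_df <- data.frame(
  Brand.Code  = factor(sample(c("A", "B", "C", "D"), 100, replace = TRUE)),
  Carb.Volume = rnorm(100, mean = 5.37, sd = 0.10),
  Fill.Ounces = rnorm(100, mean = 23.97, sd = 0.09)
)

# Inject a few missing values so there is something to impute
toy_df$Carb.Volume[c(3, 17, 42)] <- NA
toy_df$Brand.Code[c(5, 60)]      <- NA

toy_imp <- missForest(toy_df)

imputed <- toy_imp$ximp   # completed data frame, no NAs left
toy_imp$OOBerror          # out-of-bag imputation error (NRMSE for numeric, PFC for factor columns)
summary(imputed)
```

The out-of-bag error gives a rough check on imputation quality without needing a separate validation set.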
The random forest model is used to predict PH because it had the lowest RMSE of all the models tested.
##  Brand.Code  Carb.Volume     Fill.Ounces     PC.Volume         Carb.Pressure
##  A: 35       Min.   :5.147   Min.   :23.75   Min.   :0.09867   Min.   :60.20
##  B:135       1st Qu.:5.286   1st Qu.:23.92   1st Qu.:0.23433   1st Qu.:65.30
##  C: 33       Median :5.340   Median :23.97   Median :0.27600   Median :68.00
##  D: 64       Mean   :5.368   Mean   :23.97   Mean   :0.27815   Mean   :68.25
##              3rd Qu.:5.463   3rd Qu.:24.01   3rd Qu.:0.32233   3rd Qu.:70.60
##              Max.   :5.667   Max.   :24.20   Max.   :0.46400   Max.   :77.60
## Carb.Temp PSC PSC.Fill PSC.CO2
## Min. :130.0 Min. :0.0040 Min. :0.0200 Min. :0.00000
## 1st Qu.:138.4 1st Qu.:0.0460 1st Qu.:0.1100 1st Qu.:0.02000
## Median :140.8 Median :0.0780 Median :0.1800 Median :0.04000
## Mean :141.2 Mean :0.0858 Mean :0.1906 Mean :0.05126
## 3rd Qu.:143.9 3rd Qu.:0.1120 3rd Qu.:0.2500 3rd Qu.:0.06000
## Max. :154.0 Max. :0.2460 Max. :0.6200 Max. :0.24000
## Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1
## Min. :-100.20 Min. :113.0 Min. :37.80 Min. :-50.00
## 1st Qu.:-100.00 1st Qu.:120.2 1st Qu.:46.00 1st Qu.: 0.00
## Median : 0.20 Median :123.4 Median :47.80 Median : 10.40
## Mean : 21.03 Mean :123.0 Mean :48.12 Mean : 12.01
## 3rd Qu.: 141.30 3rd Qu.:125.5 3rd Qu.:50.20 3rd Qu.: 20.40
## Max. : 220.40 Max. :136.0 Max. :60.20 Max. : 50.00
## Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level
## Min. :-50.00 Min. :-50.00 Min. : 68.00 Min. : 69.2
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 90.00 1st Qu.:100.6
## Median : 26.80 Median : 27.60 Median : 98.00 Median :118.6
## Mean : 20.04 Mean : 19.54 Mean : 98.07 Mean :110.4
## 3rd Qu.: 34.80 3rd Qu.: 33.00 3rd Qu.:104.00 3rd Qu.:120.2
## Max. : 61.40 Max. : 49.20 Max. :140.00 Max. :153.2
## Filler.Speed Temperature Usage.cont Carb.Flow Density
## Min. :1006 Min. :63.80 Min. :12.90 Min. : 0 Min. :0.060
## 1st Qu.:3795 1st Qu.:65.40 1st Qu.:18.12 1st Qu.:1083 1st Qu.:0.910
## Median :3918 Median :65.80 Median :21.40 Median :3038 Median :0.980
## Mean :3506 Mean :66.24 Mean :20.89 Mean :2409 Mean :1.175
## 3rd Qu.:3996 3rd Qu.:66.60 3rd Qu.:23.74 3rd Qu.:3215 3rd Qu.:1.600
## Max. :4020 Max. :75.40 Max. :24.60 Max. :3858 Max. :1.840
## MFR Balling Pressure.Vacuum Oxygen.Filler
## Min. : 15.6 Min. :0.902 Min. :-6.400 Min. :0.00240
## 1st Qu.:687.2 1st Qu.:1.497 1st Qu.:-5.600 1st Qu.:0.02070
## Median :720.4 Median :1.648 Median :-5.200 Median :0.03380
## Mean :670.8 Mean :2.200 Mean :-5.173 Mean :0.04724
## 3rd Qu.:730.7 3rd Qu.:3.242 3rd Qu.:-4.800 3rd Qu.:0.05710
## Max. :784.8 Max. :3.788 Max. :-3.600 Max. :0.39800
## Bowl.Setpoint Pressure.Setpoint Air.Pressurer Alch.Rel
## Min. : 70.0 Min. :44.00 Min. :141.2 Min. :6.400
## 1st Qu.:100.0 1st Qu.:46.00 1st Qu.:142.2 1st Qu.:6.540
## Median :120.0 Median :46.00 Median :142.6 Median :6.580
## Mean :109.6 Mean :47.73 Mean :142.8 Mean :6.904
## 3rd Qu.:120.0 3rd Qu.:50.00 3rd Qu.:142.8 3rd Qu.:7.180
## Max. :130.0 Max. :52.00 Max. :147.2 Max. :7.820
## Carb.Rel Balling.Lvl PH
## Min. :5.18 Min. :0.000 Min. :8.336
## 1st Qu.:5.34 1st Qu.:1.380 1st Qu.:8.462
## Median :5.40 Median :1.480 Median :8.527
## Mean :5.44 Mean :2.051 Mean :8.513
## 3rd Qu.:5.56 3rd Qu.:3.080 3rd Qu.:8.558
## Max. :5.74 Max. :3.420 Max. :8.660
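The selection rule used here (pick the model with the lowest RMSE, then predict PH with it) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the two candidate models stand in for whatever set was actually compared, and `toy`, `toy_train`, `toy_test`, `rmse`, `PH_pred`, and the output file name are hypothetical.

```r
# Minimal sketch on synthetic data (hypothetical names and values).
library(randomForest)

set.seed(234)
n <- 300
toy <- data.frame(
  Mnf.Flow    = runif(n, -100, 220),
  Usage.cont  = runif(n, 12, 25),
  Temperature = rnorm(n, 66, 1.5)
)
# Synthetic response with a nonlinear term, only so the example runs
toy$PH <- 8.5 - 0.0005 * toy$Mnf.Flow + 0.05 * sin(toy$Usage.cont) +
  rnorm(n, sd = 0.05)

# 80/20 train / hold-out split
idx       <- sample(n, 0.8 * n)
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

fits <- list(
  lm = lm(PH ~ ., data = toy_train),
  rf = randomForest(PH ~ ., data = toy_train, ntree = 200)
)

# Hold-out RMSE for each candidate; the smallest value wins
scores <- sapply(fits, function(f) rmse(toy_test$PH, predict(f, newdata = toy_test)))
best   <- names(which.min(scores))

# Attach predictions from the selected model and save them as CSV,
# which Excel opens directly
toy_test$PH_pred <- predict(fits[[best]], newdata = toy_test)
out_file <- file.path(tempdir(), "ph_predictions.csv")
write.csv(toy_test, out_file, row.names = FALSE)
```

Scoring every candidate on the same hold-out rows keeps the RMSE values comparable; cross-validated RMSE would be a reasonable alternative.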