Technical Report - ABC Beverage Company

by Daniel DeBonis

Assignment

This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations are requiring us to understand our manufacturing process and its predictive factors, and to be able to report to them our predictive model of pH.

Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business-friendly, readable document and your predictions in an Excel-readable format. The technical report should show clearly the models you tested and how you selected your final approach.

Please submit both RPubs links and .rmd files or other readable formats for the technical and non-technical reports. Also submit the Excel file showing the predictions of your models for pH.

Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(AppliedPredictiveModeling)
library(readxl)
library(corrplot)
## corrplot 0.95 loaded
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(mice)
## 
## Attaching package: 'mice'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(elasticnet)
## Loading required package: lars
## Loaded lars 1.3
library(earth)
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
library(kernlab)
## 
## Attaching package: 'kernlab'
## 
## The following object is masked from 'package:mice':
## 
##     convergence
## 
## The following object is masked from 'package:purrr':
## 
##     cross
## 
## The following object is masked from 'package:ggplot2':
## 
##     alpha
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(openxlsx)

Loading Data

# Historical production data used to build the models
train <- read_excel("C:/Users/ddebo/Downloads/StudentData.xlsx")
# Evaluation set whose pH values we will predict at the end
test <- read_excel("C:/Users/ddebo/Downloads/StudentEvaluation.xlsx")

Exploratory Analysis

train <- train |>
  clean_names(case = "snake")
head(train)
## # A tibble: 6 × 33
##   brand_code carb_volume fill_ounces pc_volume carb_pressure carb_temp    psc
##   <chr>            <dbl>       <dbl>     <dbl>         <dbl>     <dbl>  <dbl>
## 1 B                 5.34        24.0     0.263          68.2      141.  0.104
## 2 A                 5.43        24.0     0.239          68.4      140.  0.124
## 3 B                 5.29        24.1     0.263          70.8      145.  0.09 
## 4 A                 5.44        24.0     0.293          63        133. NA    
## 5 A                 5.49        24.3     0.111          67.2      137.  0.026
## 6 A                 5.38        23.9     0.269          66.6      138.  0.09 
## # ℹ 26 more variables: psc_fill <dbl>, psc_co2 <dbl>, mnf_flow <dbl>,
## #   carb_pressure1 <dbl>, fill_pressure <dbl>, hyd_pressure1 <dbl>,
## #   hyd_pressure2 <dbl>, hyd_pressure3 <dbl>, hyd_pressure4 <dbl>,
## #   filler_level <dbl>, filler_speed <dbl>, temperature <dbl>,
## #   usage_cont <dbl>, carb_flow <dbl>, density <dbl>, mfr <dbl>, balling <dbl>,
## #   pressure_vacuum <dbl>, ph <dbl>, oxygen_filler <dbl>, bowl_setpoint <dbl>,
## #   pressure_setpoint <dbl>, air_pressurer <dbl>, alch_rel <dbl>, …

Here we can see that we are provided with 33 variables, one of which is our target variable, pH. All variables are numeric except the first, brand_code, in which brands are coded by letter. For further analysis, we will need to convert these categories into dummy variables.

str(train)
## tibble [2,571 × 33] (S3: tbl_df/tbl/data.frame)
##  $ brand_code       : chr [1:2571] "B" "A" "B" "A" ...
##  $ carb_volume      : num [1:2571] 5.34 5.43 5.29 5.44 5.49 ...
##  $ fill_ounces      : num [1:2571] 24 24 24.1 24 24.3 ...
##  $ pc_volume        : num [1:2571] 0.263 0.239 0.263 0.293 0.111 ...
##  $ carb_pressure    : num [1:2571] 68.2 68.4 70.8 63 67.2 66.6 64.2 67.6 64.2 72 ...
##  $ carb_temp        : num [1:2571] 141 140 145 133 137 ...
##  $ psc              : num [1:2571] 0.104 0.124 0.09 NA 0.026 0.09 0.128 0.154 0.132 0.014 ...
##  $ psc_fill         : num [1:2571] 0.26 0.22 0.34 0.42 0.16 ...
##  $ psc_co2          : num [1:2571] 0.04 0.04 0.16 0.04 0.12 ...
##  $ mnf_flow         : num [1:2571] -100 -100 -100 -100 -100 -100 -100 -100 -100 -100 ...
##  $ carb_pressure1   : num [1:2571] 119 122 120 115 118 ...
##  $ fill_pressure    : num [1:2571] 46 46 46 46.4 45.8 45.6 51.8 46.8 46 45.2 ...
##  $ hyd_pressure1    : num [1:2571] 0 0 0 0 0 0 0 0 0 0 ...
##  $ hyd_pressure2    : num [1:2571] NA NA NA 0 0 0 0 0 0 0 ...
##  $ hyd_pressure3    : num [1:2571] NA NA NA 0 0 0 0 0 0 0 ...
##  $ hyd_pressure4    : num [1:2571] 118 106 82 92 92 116 124 132 90 108 ...
##  $ filler_level     : num [1:2571] 121 119 120 118 119 ...
##  $ filler_speed     : num [1:2571] 4002 3986 4020 4012 4010 ...
##  $ temperature      : num [1:2571] 66 67.6 67 65.6 65.6 66.2 65.8 65.2 65.4 66.6 ...
##  $ usage_cont       : num [1:2571] 16.2 19.9 17.8 17.4 17.7 ...
##  $ carb_flow        : num [1:2571] 2932 3144 2914 3062 3054 ...
##  $ density          : num [1:2571] 0.88 0.92 1.58 1.54 1.54 1.52 0.84 0.84 0.9 0.9 ...
##  $ mfr              : num [1:2571] 725 727 735 731 723 ...
##  $ balling          : num [1:2571] 1.4 1.5 3.14 3.04 3.04 ...
##  $ pressure_vacuum  : num [1:2571] -4 -4 -3.8 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 ...
##  $ ph               : num [1:2571] 8.36 8.26 8.94 8.24 8.26 8.32 8.4 8.38 8.38 8.5 ...
##  $ oxygen_filler    : num [1:2571] 0.022 0.026 0.024 0.03 0.03 0.024 0.066 0.046 0.064 0.022 ...
##  $ bowl_setpoint    : num [1:2571] 120 120 120 120 120 120 120 120 120 120 ...
##  $ pressure_setpoint: num [1:2571] 46.4 46.8 46.6 46 46 46 46 46 46 46 ...
##  $ air_pressurer    : num [1:2571] 143 143 142 146 146 ...
##  $ alch_rel         : num [1:2571] 6.58 6.56 7.66 7.14 7.14 7.16 6.54 6.52 6.52 6.54 ...
##  $ carb_rel         : num [1:2571] 5.32 5.3 5.84 5.42 5.44 5.44 5.38 5.34 5.34 5.34 ...
##  $ balling_lvl      : num [1:2571] 1.48 1.56 3.28 3.04 3.04 3.02 1.44 1.44 1.44 1.38 ...

In the first few rows alone we see NA values, so missing data will need to be addressed in the cleaning stage. The column names were also standardized, since the original names caused difficulty later in the process. We apply the same change to the test set, which will be used at the end.

test <- test |>
  clean_names(case = "snake")
summary(test)
##   brand_code         carb_volume     fill_ounces      pc_volume      
##  Length:267         Min.   :5.147   Min.   :23.75   Min.   :0.09867  
##  Class :character   1st Qu.:5.287   1st Qu.:23.92   1st Qu.:0.23333  
##  Mode  :character   Median :5.340   Median :23.97   Median :0.27533  
##                     Mean   :5.369   Mean   :23.97   Mean   :0.27769  
##                     3rd Qu.:5.465   3rd Qu.:24.01   3rd Qu.:0.32200  
##                     Max.   :5.667   Max.   :24.20   Max.   :0.46400  
##                     NA's   :1       NA's   :6       NA's   :4        
##  carb_pressure     carb_temp          psc             psc_fill     
##  Min.   :60.20   Min.   :130.0   Min.   :0.00400   Min.   :0.0200  
##  1st Qu.:65.30   1st Qu.:138.4   1st Qu.:0.04450   1st Qu.:0.1000  
##  Median :68.00   Median :140.8   Median :0.07600   Median :0.1800  
##  Mean   :68.25   Mean   :141.2   Mean   :0.08545   Mean   :0.1903  
##  3rd Qu.:70.60   3rd Qu.:143.8   3rd Qu.:0.11200   3rd Qu.:0.2600  
##  Max.   :77.60   Max.   :154.0   Max.   :0.24600   Max.   :0.6200  
##                  NA's   :1       NA's   :5         NA's   :3       
##     psc_co2           mnf_flow       carb_pressure1  fill_pressure  
##  Min.   :0.00000   Min.   :-100.20   Min.   :113.0   Min.   :37.80  
##  1st Qu.:0.02000   1st Qu.:-100.00   1st Qu.:120.2   1st Qu.:46.00  
##  Median :0.04000   Median :   0.20   Median :123.4   Median :47.80  
##  Mean   :0.05107   Mean   :  21.03   Mean   :123.0   Mean   :48.14  
##  3rd Qu.:0.06000   3rd Qu.: 141.30   3rd Qu.:125.5   3rd Qu.:50.20  
##  Max.   :0.24000   Max.   : 220.40   Max.   :136.0   Max.   :60.20  
##  NA's   :5                           NA's   :4       NA's   :2      
##  hyd_pressure1    hyd_pressure2    hyd_pressure3    hyd_pressure4   
##  Min.   :-50.00   Min.   :-50.00   Min.   :-50.00   Min.   : 68.00  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.: 90.00  
##  Median : 10.40   Median : 26.80   Median : 27.70   Median : 98.00  
##  Mean   : 12.01   Mean   : 20.11   Mean   : 19.61   Mean   : 97.84  
##  3rd Qu.: 20.40   3rd Qu.: 34.80   3rd Qu.: 33.00   3rd Qu.:104.00  
##  Max.   : 50.00   Max.   : 61.40   Max.   : 49.20   Max.   :140.00  
##                   NA's   :1        NA's   :1        NA's   :4       
##   filler_level    filler_speed   temperature      usage_cont      carb_flow   
##  Min.   : 69.2   Min.   :1006   Min.   :63.80   Min.   :12.90   Min.   :   0  
##  1st Qu.:100.6   1st Qu.:3812   1st Qu.:65.40   1st Qu.:18.12   1st Qu.:1083  
##  Median :118.6   Median :3978   Median :65.80   Median :21.44   Median :3038  
##  Mean   :110.3   Mean   :3581   Mean   :66.23   Mean   :20.90   Mean   :2409  
##  3rd Qu.:120.2   3rd Qu.:3996   3rd Qu.:66.60   3rd Qu.:23.74   3rd Qu.:3215  
##  Max.   :153.2   Max.   :4020   Max.   :75.40   Max.   :24.60   Max.   :3858  
##  NA's   :2       NA's   :10     NA's   :2       NA's   :2                     
##     density           mfr           balling      pressure_vacuum 
##  Min.   :0.060   Min.   : 15.6   Min.   :0.902   Min.   :-6.400  
##  1st Qu.:0.920   1st Qu.:707.0   1st Qu.:1.498   1st Qu.:-5.600  
##  Median :0.980   Median :724.6   Median :1.648   Median :-5.200  
##  Mean   :1.177   Mean   :697.8   Mean   :2.203   Mean   :-5.174  
##  3rd Qu.:1.600   3rd Qu.:731.5   3rd Qu.:3.242   3rd Qu.:-4.800  
##  Max.   :1.840   Max.   :784.8   Max.   :3.788   Max.   :-3.600  
##  NA's   :1       NA's   :31      NA's   :1       NA's   :1       
##     ph          oxygen_filler     bowl_setpoint   pressure_setpoint
##  Mode:logical   Min.   :0.00240   Min.   : 70.0   Min.   :44.00    
##  NA's:267       1st Qu.:0.01960   1st Qu.:100.0   1st Qu.:46.00    
##                 Median :0.03370   Median :120.0   Median :46.00    
##                 Mean   :0.04666   Mean   :109.6   Mean   :47.73    
##                 3rd Qu.:0.05440   3rd Qu.:120.0   3rd Qu.:50.00    
##                 Max.   :0.39800   Max.   :130.0   Max.   :52.00    
##                 NA's   :3         NA's   :1       NA's   :2        
##  air_pressurer      alch_rel        carb_rel     balling_lvl   
##  Min.   :141.2   Min.   :6.400   Min.   :5.18   Min.   :0.000  
##  1st Qu.:142.2   1st Qu.:6.540   1st Qu.:5.34   1st Qu.:1.380  
##  Median :142.6   Median :6.580   Median :5.40   Median :1.480  
##  Mean   :142.8   Mean   :6.907   Mean   :5.44   Mean   :2.051  
##  3rd Qu.:142.8   3rd Qu.:7.180   3rd Qu.:5.56   3rd Qu.:3.080  
##  Max.   :147.2   Max.   :7.820   Max.   :5.74   Max.   :3.420  
##  NA's   :1       NA's   :3       NA's   :2

Using the summary function, we get more detail on the spread of each variable, as well as how many NA values each contains. It also confirms that we will need to address scale before doing any analysis, since the scales of the variables vary dramatically.
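
As a quick illustration of how widely those scales differ, the range of each numeric predictor can be tabulated directly. This is a minimal sketch (output omitted); the exact values depend on the data.

# Tabulate the spread (max - min) of every numeric column, widest first
train |>
  summarise(across(where(is.numeric), ~ diff(range(., na.rm = TRUE)))) |>
  pivot_longer(everything(), names_to = "feature", values_to = "range") |>
  arrange(desc(range))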

train |>
  select(where(is.numeric)) |>
  pivot_longer(cols = everything(), names_to = "Feature", values_to = "Value") |>
  filter(!is.na(Value)) |>
  ggplot(aes(x = Value)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  facet_wrap(~Feature, scales = "free") +
  ggtitle("Histograms of Numerical Features") 

Here, the histograms allow us to visualize the distribution of each variable. The most important, pH, appears fairly normally distributed. Across these variables, we see examples of left skew (filler speed), right skew (PSC, PSC fill, temperature), bimodality (density, balling, carb flow), and variables whose distributions are dominated by specific values (mnf flow, hyd pressure 1-3).
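
The visual impression of skew can be corroborated numerically. A sketch, assuming the e1071 package is installed (it is not loaded above):

library(e1071)  # assumed available; provides skewness()
# Rank the numeric predictors by the magnitude of their skewness
train |>
  summarise(across(where(is.numeric), ~ skewness(., na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "feature", values_to = "skewness") |>
  arrange(desc(abs(skewness)))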

train2 <- train |>
  mutate(brand_code = as.factor(brand_code))

# Plot Brand Code distribution
ggplot(train2, aes(x = brand_code)) +
  geom_bar(fill = "skyblue", color = "black") +
  ggtitle("Distribution of Brand Code") +
  theme_minimal()

train |>
  ggplot(aes(x = ph)) +
  geom_histogram(bins= 40, fill = "lightblue", color = "black") +
  labs(title = "Distribution of pH by Brand", y = "Count" ) + 
  facet_wrap(~ brand_code, scales = 'free_y')  
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

There is a surprising number of rows with an NA value for Brand Code, one of the most important variables in our analysis since our beverage company produces multiple brands of drink. The cleanest way to handle this categorical variable is one-hot encoding: with only four brands, we add few new dimensions to the data, and rows with no assigned brand simply fall outside every brand category (zero in each brand column).
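
To put a number on how many rows lack a brand, the labels can be tallied directly (a quick sketch; dplyr's count() reports the NA group alongside the four brands):

train |> count(brand_code)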

# Learn the one-hot encoding from the training set, then apply it to both sets
model <- dummyVars(~ ., data = train)
train_pp <- predict(model, newdata = train) |>
  as.data.frame()
test_pp <- predict(model, newdata = test) |>
  as.data.frame()
# Drop the dummy columns generated from the test set's empty (all-NA) ph column
test_pp <- test_pp |>
  select(-c(phFALSE, phTRUE))

Correlation Matrix

cor_matrix <- cor(train_pp, use = "pairwise.complete.obs")
corrplot::corrplot(cor_matrix, order = "hclust", tl.cex = 0.7)

Two things jump out from this plot: a set of variables including balling lvl, balling, alch rel, density, and carb rel/volume is strongly negatively correlated with brand B, and those same variables are highly correlated with one another.
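
To read those relationships as numbers rather than colors, the most strongly correlated predictor pairs can be extracted from the matrix. A sketch; the 0.8 cutoff is an arbitrary choice:

# List each unordered predictor pair once, keeping only |r| > 0.8
as.data.frame(as.table(cor_matrix)) |>
  filter(as.character(Var1) < as.character(Var2), abs(Freq) > 0.8) |>
  arrange(desc(abs(Freq)))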

Data Preprocessing

The issue of missing data has come up several times in the exploratory stage. First, let us get a sense of the problem.
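
One way to size up the problem is to tally the missing values per column before imputing. A minimal sketch (output omitted):

# Columns with the most missing values, in descending order
colSums(is.na(train_pp)) |> sort(decreasing = TRUE) |> head(10)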

pH is our target variable, and since this is our training set we do not want rows with an unknown outcome informing the model, so it is best to remove those rows from further analysis.

train_pp <- train_pp[!is.na(train_pp$ph), ]
sum(is.na(train_pp$ph))
## [1] 0
set.seed(24601)
# MICE imputation
imputed_data <- mice(train_pp, m = 1, method = 'rf', print = FALSE)
summary(imputed_data)
## Class: mids
## Number of multiple imputations:  1 
## Imputation methods:
##       brand_codeA       brand_codeB       brand_codeC       brand_codeD 
##              "rf"              "rf"              "rf"              "rf" 
##       carb_volume       fill_ounces         pc_volume     carb_pressure 
##              "rf"              "rf"              "rf"              "rf" 
##         carb_temp               psc          psc_fill           psc_co2 
##              "rf"              "rf"              "rf"              "rf" 
##          mnf_flow    carb_pressure1     fill_pressure     hyd_pressure1 
##                ""              "rf"              "rf"              "rf" 
##     hyd_pressure2     hyd_pressure3     hyd_pressure4      filler_level 
##              "rf"              "rf"              "rf"              "rf" 
##      filler_speed       temperature        usage_cont         carb_flow 
##              "rf"              "rf"              "rf"              "rf" 
##           density               mfr           balling   pressure_vacuum 
##                ""              "rf"                ""                "" 
##                ph     oxygen_filler     bowl_setpoint pressure_setpoint 
##                ""              "rf"              "rf"              "rf" 
##     air_pressurer          alch_rel          carb_rel       balling_lvl 
##                ""              "rf"              "rf"              "rf" 
## PredictorMatrix:
##             brand_codeA brand_codeB brand_codeC brand_codeD carb_volume
## brand_codeA           0           1           1           1           1
## brand_codeB           1           0           1           1           1
## brand_codeC           1           1           0           1           1
## brand_codeD           1           1           1           0           1
## carb_volume           1           1           1           1           0
## fill_ounces           1           1           1           1           1
##             fill_ounces pc_volume carb_pressure carb_temp psc psc_fill psc_co2
## brand_codeA           1         1             1         1   1        1       1
## brand_codeB           1         1             1         1   1        1       1
## brand_codeC           1         1             1         1   1        1       1
## brand_codeD           1         1             1         1   1        1       1
## carb_volume           1         1             1         1   1        1       1
## fill_ounces           0         1             1         1   1        1       1
##             mnf_flow carb_pressure1 fill_pressure hyd_pressure1 hyd_pressure2
## brand_codeA        1              1             1             1             1
## brand_codeB        1              1             1             1             1
## brand_codeC        1              1             1             1             1
## brand_codeD        1              1             1             1             1
## carb_volume        1              1             1             1             1
## fill_ounces        1              1             1             1             1
##             hyd_pressure3 hyd_pressure4 filler_level filler_speed temperature
## brand_codeA             1             1            1            1           1
## brand_codeB             1             1            1            1           1
## brand_codeC             1             1            1            1           1
## brand_codeD             1             1            1            1           1
## carb_volume             1             1            1            1           1
## fill_ounces             1             1            1            1           1
##             usage_cont carb_flow density mfr balling pressure_vacuum ph
## brand_codeA          1         1       1   1       1               1  1
## brand_codeB          1         1       1   1       1               1  1
## brand_codeC          1         1       1   1       1               1  1
## brand_codeD          1         1       1   1       1               1  1
## carb_volume          1         1       1   1       1               1  1
## fill_ounces          1         1       1   1       1               1  1
##             oxygen_filler bowl_setpoint pressure_setpoint air_pressurer
## brand_codeA             1             1                 1             1
## brand_codeB             1             1                 1             1
## brand_codeC             1             1                 1             1
## brand_codeD             1             1                 1             1
## carb_volume             1             1                 1             1
## fill_ounces             1             1                 1             1
##             alch_rel carb_rel balling_lvl
## brand_codeA        1        1           1
## brand_codeB        1        1           1
## brand_codeC        1        1           1
## brand_codeD        1        1           1
## carb_volume        1        1           1
## fill_ounces        1        1           1
# Extract the completed dataset
train_pp <- complete(imputed_data)

# Check for any remaining missing values
colSums(is.na(train_pp))
##       brand_codeA       brand_codeB       brand_codeC       brand_codeD 
##                 0                 0                 0                 0 
##       carb_volume       fill_ounces         pc_volume     carb_pressure 
##                 0                 0                 0                 0 
##         carb_temp               psc          psc_fill           psc_co2 
##                 0                 0                 0                 0 
##          mnf_flow    carb_pressure1     fill_pressure     hyd_pressure1 
##                 0                 0                 0                 0 
##     hyd_pressure2     hyd_pressure3     hyd_pressure4      filler_level 
##                 0                 0                 0                 0 
##      filler_speed       temperature        usage_cont         carb_flow 
##                 0                 0                 0                 0 
##           density               mfr           balling   pressure_vacuum 
##                 0                 0                 0                 0 
##                ph     oxygen_filler     bowl_setpoint pressure_setpoint 
##                 0                 0                 0                 0 
##     air_pressurer          alch_rel          carb_rel       balling_lvl 
##                 0                 0                 0                 0

After this use of mice to impute, and the confirmation above that no missing values remain, we no longer have to worry about missing data.
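
Before accepting the imputation, mice offers a quick diagnostic comparing observed and imputed distributions. An optional check, sketched here but not run in this report:

densityplot(imputed_data)  # lattice-based plot from the mice package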

Next, if any variables have a variance near zero, they will not be useful for any analysis, so they should be removed.

nzv_cols <- nearZeroVar(train_pp, names = TRUE)
nzv_cols
## [1] "hyd_pressure1"
if (length(nzv_cols) > 0) {
  # Remove near-zero-variance columns from both data sets
  train_pp <- train_pp[, !colnames(train_pp) %in% nzv_cols]
  test_pp  <- test_pp[, !colnames(test_pp) %in% nzv_cols]
}

The only variable identified was hyd_pressure1, and it has been removed from both data sets. Another important check is for outliers in our target variable, particularly any caused by measurement or human error, since those would distort the analysis.

# Flag rows whose pH falls more than 1.5 * IQR beyond the quartiles
outliers <- train_pp |>
  mutate(
    Q1 = quantile(ph, 0.25, na.rm = TRUE),
    Q3 = quantile(ph, 0.75, na.rm = TRUE),
    IQR = Q3 - Q1,
    lower = Q1 - 1.5 * IQR,
    upper = Q3 + 1.5 * IQR,
    is_outlier = (ph < lower | ph > upper)
  ) |>
  filter(is_outlier == TRUE)
outliers
##      brand_codeA brand_codeB brand_codeC brand_codeD carb_volume fill_ounces
## 1094           0           0           1           0    5.220000    23.92000
## 1876           1           0           0           0    5.626667    24.28000
## 2088           0           0           1           0    5.200000    24.03333
## 2094           0           1           0           0    5.373333    23.92667
## 2095           0           1           0           0    5.380000    24.09333
## 2097           0           1           0           0    5.426667    23.97333
## 2098           0           1           0           0    5.426667    24.00000
## 2099           0           1           0           0    5.360000    24.01333
## 2100           0           1           0           0    5.346667    23.97333
## 2102           0           1           0           0    5.340000    24.00000
## 2103           0           1           0           0    5.340000    24.14667
## 2104           0           1           0           0    5.346667    24.09333
## 2557           0           0           1           0    5.240000    24.07333
## 2558           0           0           1           0    5.206667    24.05333
## 2559           0           0           1           0    5.386667    24.27333
## 2560           0           0           1           0    5.373333    24.12667
## 2561           0           0           1           0    5.273333    24.01333
## 2562           0           0           1           0    5.366667    24.06667
##      pc_volume carb_pressure carb_temp   psc psc_fill psc_co2 mnf_flow
## 1094 0.3913333          63.6     139.0 0.002     0.18    0.04   -100.0
## 1876 0.1820000          70.6     136.6 0.068     0.32    0.12    143.8
## 2088 0.2413333          63.2     138.6 0.060     0.06    0.04    151.0
## 2094 0.2326667          67.6     140.0 0.128     0.24    0.06    151.0
## 2095 0.2506667          71.0     145.6 0.084     0.20    0.12    158.4
## 2097 0.2580000          63.0     134.0 0.072     0.28    0.10    149.2
## 2098 0.2273333          70.8     143.8 0.050     0.44    0.06    143.4
## 2099 0.2413333          66.6     138.4 0.032     0.06    0.06    147.2
## 2100 0.2606667          70.6     144.8 0.058     0.10    0.06    144.8
## 2102 0.2453333          66.4     139.8 0.116     0.14    0.04    144.6
## 2103 0.2940000          68.4     142.2 0.106     0.44    0.08    143.4
## 2104 0.2940000          66.8     140.2 0.086     0.18    0.02    146.2
## 2557 0.2686667          66.6     142.0 0.012     0.28    0.08    166.0
## 2558 0.3246667          64.8     140.4 0.148     0.34    0.02    164.6
## 2559 0.2866667          64.2     138.4 0.108     0.24    0.06      0.2
## 2560 0.2380000          66.0     139.4 0.124     0.30    0.12    151.8
## 2561 0.2813333          65.4     138.6 0.096     0.34    0.08    161.6
## 2562 0.2713333          69.4     142.0 0.078     0.30    0.08    160.0
##      carb_pressure1 fill_pressure hyd_pressure2 hyd_pressure3 hyd_pressure4
## 1094          138.4          42.8           0.2           0.0           130
## 1876          123.2          50.6          37.2          33.0           100
## 2088          129.0          50.0          33.8          31.0            82
## 2094          121.8          50.0          35.0          32.0            94
## 2095          129.8          50.0          39.8          34.6            96
## 2097          125.2          50.0          37.4          32.8            98
## 2098          123.2          50.2          35.0          33.2            96
## 2099          125.4          50.2          37.8          34.2           104
## 2100          126.6          50.0          36.8          31.6           100
## 2102          127.0          51.4          35.6          34.2            96
## 2103          123.2          50.2          37.2          32.6           104
## 2104          127.6          50.2          37.4          34.2            98
## 2557          123.8          53.0          33.0          38.2           106
## 2558          127.0          50.0          34.0          38.6           104
## 2559          129.0          46.4           0.2          -1.2           136
## 2560          119.6          56.0          28.6          35.4           106
## 2561          125.0          50.6          33.4          39.4           104
## 2562          124.4          49.8          33.8          37.8           102
##      filler_level filler_speed temperature usage_cont carb_flow density   mfr
## 1094        121.4         1010        72.2      17.88        42    0.24 603.4
## 1876        120.6         3996        68.2      23.80      3240    1.66 725.4
## 2088         92.6         3910        67.0      23.66      1092    0.94 702.8
## 2094         89.8         3998        65.2      23.52      1094    0.94 730.6
## 2095         90.2         3996        65.2      23.66      1088    0.92 726.6
## 2097         88.6         3998        65.4      23.60      1094    0.94 734.0
## 2098         90.0         3998        65.4      23.54      1090    0.96 728.6
## 2099         90.0         3996        66.4      23.58      1086    0.92 726.4
## 2100         89.2         3996        65.6      23.58      1094    0.94 729.6
## 2102         91.2         4000        66.0      24.32      1088    0.94 722.4
## 2103         90.4         3994        64.8      23.56      1088    0.92 726.6
## 2104         90.8         3998        64.8      23.68      1092    0.94 728.2
## 2557        109.2         3990        64.0      24.22      1180    0.58 730.2
## 2558        110.6         3998        64.4      23.68      1184    0.60 726.6
## 2559        109.8         1408        66.2      24.26        44    0.50 417.6
## 2560        110.6         3998        64.6      24.26      1212    0.58 736.0
## 2561        109.4         3996        64.8      24.16      1184    0.62 731.8
## 2562        110.6         3992        64.6      23.80      1188    0.60 733.0
##      balling pressure_vacuum   ph oxygen_filler bowl_setpoint pressure_setpoint
## 1094   0.160            -5.0 9.36        0.2380           120                46
## 1876   3.342            -5.4 8.06        0.0322           120                50
## 2088   1.548            -5.2 8.00        0.0026            90                50
## 2094   1.548            -5.2 8.06        0.0134            90                50
## 2095   1.498            -5.4 8.06        0.0026            90                50
## 2097   1.548            -5.4 8.04        0.0098            90                50
## 2098   1.596            -5.4 8.04        0.0070            90                50
## 2099   1.496            -5.4 8.06        0.0200            90                50
## 2100   1.548            -5.4 8.06        0.0204            90                50
## 2102   1.548            -5.4 8.02        0.0172            90                50
## 2103   1.498            -5.4 8.04        0.0254            90                50
## 2104   1.548            -5.4 8.00        0.0248            90                50
## 2557   1.868            -5.8 8.06        0.0260           110                50
## 2558   1.920            -5.8 8.02        0.0058           110                50
## 2559   1.666            -5.8 8.02        0.0422           110                50
## 2560   1.868            -5.8 7.98        0.0026           110                50
## 2561   1.968            -5.8 7.90        0.0286           110                50
## 2562   1.918            -5.8 7.88        0.0238           110                50
##      air_pressurer alch_rel carb_rel balling_lvl   Q1   Q3  IQR lower upper
## 1094         143.2     6.60     5.60        0.00 8.44 8.68 0.24  8.08  9.04
## 1876         142.6     7.16     5.62        3.16 8.44 8.68 0.24  8.08  9.04
## 2088         141.6     6.44     5.14        1.36 8.44 8.68 0.24  8.08  9.04
## 2094         141.8     6.52     5.36        1.36 8.44 8.68 0.24  8.08  9.04
## 2095         142.8     6.52     5.38        1.36 8.44 8.68 0.24  8.08  9.04
## 2097         142.2     6.54     5.36        1.38 8.44 8.68 0.24  8.08  9.04
## 2098         143.0     6.54     5.36        1.36 8.44 8.68 0.24  8.08  9.04
## 2099         142.4     6.52     5.36        1.34 8.44 8.68 0.24  8.08  9.04
## 2100         142.6     6.54     5.36        1.34 8.44 8.68 0.24  8.08  9.04
## 2102         142.2     6.54     5.36        1.34 8.44 8.68 0.24  8.08  9.04
## 2103         142.4     6.52     5.36        1.34 8.44 8.68 0.24  8.08  9.04
## 2104         143.2     6.52     5.36        1.34 8.44 8.68 0.24  8.08  9.04
## 2557         142.8     6.54     5.22        1.56 8.44 8.68 0.24  8.08  9.04
## 2558         142.8     6.54     5.20        1.60 8.44 8.68 0.24  8.08  9.04
## 2559         143.0     6.50     5.44        1.60 8.44 8.68 0.24  8.08  9.04
## 2560         143.2     6.52     5.24        1.58 8.44 8.68 0.24  8.08  9.04
## 2561         143.0     6.52     5.22        1.58 8.44 8.68 0.24  8.08  9.04
## 2562         142.2     6.50     5.26        1.60 8.44 8.68 0.24  8.08  9.04
##      is_outlier
## 1094       TRUE
## 1876       TRUE
## 2088       TRUE
## 2094       TRUE
## 2095       TRUE
## 2097       TRUE
## 2098       TRUE
## 2099       TRUE
## 2100       TRUE
## 2102       TRUE
## 2103       TRUE
## 2104       TRUE
## 2557       TRUE
## 2558       TRUE
## 2559       TRUE
## 2560       TRUE
## 2561       TRUE
## 2562       TRUE

Checking through the values flagged by the 1.5 * interquartile range rule, the row with the largest pH (9.36) is the most plausible candidate for an erroneous measurement, but I do not have enough confidence in that to exclude it from the analysis. Many of the flagged rows are brand B, which makes sense given the associations with that brand seen in the correlation plot.

The other important step at this stage is centering and scaling the data. We also have the option of transforming the predictors, which would usually be done with a Box-Cox transformation; since some variables include negative values, we will instead use the Yeo-Johnson transformation. We can save these steps so that they are carried out as each model is fit.
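
As a sketch of what this recipe will do, the same transformations can be estimated up front with caret's preProcess() and inspected; inside each train() call below, the preProc argument applies the identical steps:

pp <- preProcess(train_pp |> select(-ph),
                 method = c("center", "scale", "YeoJohnson"))
pp  # reports how many columns each step will touch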

Model Formation

We will split the processed training set 80/20, fit each model on the larger portion, and evaluate it on the held-out portion.

set.seed(24601)

# Index for an 80/20 training split
index <- createDataPartition(train_pp$ph, p = .8, list = FALSE)

# Training and held-out sets
train_data <- train_pp[index, ]
test_data <- train_pp[-index, ]

# Training predictors and response
trainingX <- train_data |> select(-ph)
trainingY <- train_data$ph

# Test predictors and response
testX <- test_data |> select(-ph)
testY <- test_data$ph

# 5-fold cross-validation to make reasonable estimates
ctrl <- trainControl(method = "cv", number = 5)

pre_process <- c("center", "scale", "YeoJohnson")

Linear Regression Models

Partial Least Squares

set.seed(24601)
pls_model <- train(
  x = trainingX,
  y = trainingY,
  method = "pls",
  tuneLength = 20,
  trControl = ctrl,
  preProc = pre_process
)
pls_model
## Partial Least Squares 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     0.1484247  0.2451831  0.1180160
##    2     0.1418825  0.3105439  0.1116106
##    3     0.1388735  0.3399349  0.1094506
##    4     0.1372503  0.3549506  0.1076676
##    5     0.1357378  0.3692649  0.1059499
##    6     0.1348182  0.3782302  0.1049456
##    7     0.1342272  0.3834515  0.1046529
##    8     0.1341735  0.3843707  0.1045148
##    9     0.1341209  0.3851056  0.1043027
##   10     0.1339549  0.3867800  0.1039439
##   11     0.1339488  0.3869044  0.1039674
##   12     0.1338832  0.3876199  0.1039990
##   13     0.1338562  0.3878321  0.1039513
##   14     0.1338183  0.3881224  0.1039466
##   15     0.1337767  0.3884229  0.1039696
##   16     0.1337626  0.3884881  0.1040110
##   17     0.1336807  0.3891701  0.1039493
##   18     0.1336465  0.3894511  0.1039158
##   19     0.1336533  0.3893828  0.1038781
##   20     0.1336465  0.3894481  0.1038640
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 18.

The maximum variance explained by a partial least squares model is about 39 percent, and even that likely reflects some overfitting from keeping so many components in the model.
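
The cross-validated RMSE profile across component counts can also be viewed directly from the train object (an optional sketch, not rendered in this report):

plot(pls_model, metric = "RMSE")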

plspred <- predict(pls_model, newdata = testX)
plsResample <- postResample(pred=plspred, obs=testY)
plsResample
##      RMSE  Rsquared       MAE 
## 0.1371052 0.4175447 0.1057602

The estimate for R squared rises to .4175 when the model is used to predict our held-out test data.

Ordinary Least Squares

ols_model <- train(
  x = trainingX,
  y = trainingY,
  method = "lm",
  trControl = ctrl,
  preProc = pre_process
)
ols_model
## Linear Regression 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1644, 1643, 1643, 1645, 1645 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.1332371  0.3932706  0.1037553
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

By cross-validation, this model explains slightly more of the variance than the partial least squares model (R squared 0.393 vs. 0.389). It is, however, helpful for identifying which variables are statistically significant.

summary(ols_model)
## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50017 -0.07662  0.01040  0.08653  0.42758 
## 
## Coefficients:
##                     Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)        8.545e+00  2.913e-03 2933.732  < 2e-16 ***
## brand_codeA       -1.571e-02  1.458e-02   -1.077 0.281512    
## brand_codeB        1.155e-03  2.134e-02    0.054 0.956855    
## brand_codeC       -4.326e-02  1.432e-02   -3.021 0.002549 ** 
## brand_codeD       -2.890e-03  2.113e-02   -0.137 0.891265    
## carb_volume       -1.867e-03  8.100e-03   -0.230 0.817776    
## fill_ounces       -6.977e-03  3.131e-03   -2.228 0.025984 *  
## pc_volume         -1.077e-02  3.538e-03   -3.045 0.002358 ** 
## carb_pressure     -9.162e-03  1.135e-02   -0.807 0.419712    
## carb_temp          1.283e-02  1.034e-02    1.241 0.214718    
## psc               -2.792e-03  3.111e-03   -0.898 0.369471    
## psc_fill          -6.094e-03  3.036e-03   -2.007 0.044836 *  
## psc_co2           -5.608e-03  2.987e-03   -1.877 0.060648 .  
## mnf_flow          -8.907e-02  6.235e-03  -14.285  < 2e-16 ***
## carb_pressure1     2.607e-02  3.604e-03    7.234 6.62e-13 ***
## fill_pressure      4.811e-03  4.284e-03    1.123 0.261555    
## hyd_pressure2     -4.043e-02  1.356e-02   -2.982 0.002895 ** 
## hyd_pressure3      8.243e-02  1.494e-02    5.519 3.86e-08 ***
## hyd_pressure4      7.395e-03  4.677e-03    1.581 0.113947    
## filler_level      -2.447e-02  1.012e-02   -2.419 0.015647 *  
## filler_speed       8.211e-05  7.289e-03    0.011 0.991014    
## temperature       -1.761e-02  3.482e-03   -5.059 4.59e-07 ***
## usage_cont        -1.806e-02  3.764e-03   -4.798 1.72e-06 ***
## carb_flow          1.144e-02  4.270e-03    2.680 0.007431 ** 
## density           -2.364e-02  8.486e-03   -2.785 0.005395 ** 
## mfr               -4.386e-03  6.475e-03   -0.677 0.498277    
## balling           -3.052e-02  1.142e-02   -2.672 0.007611 ** 
## pressure_vacuum   -3.204e-03  4.185e-03   -0.765 0.444087    
## oxygen_filler     -1.542e-02  3.616e-03   -4.263 2.11e-05 ***
## bowl_setpoint      5.515e-02  1.031e-02    5.349 9.84e-08 ***
## pressure_setpoint -1.750e-02  4.378e-03   -3.996 6.66e-05 ***
## air_pressurer     -1.330e-03  3.177e-03   -0.418 0.675663    
## alch_rel           4.178e-02  1.214e-02    3.440 0.000593 ***
## carb_rel          -1.528e-03  6.554e-03   -0.233 0.815667    
## balling_lvl        2.978e-02  9.391e-03    3.171 0.001541 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.132 on 2020 degrees of freedom
## Multiple R-squared:  0.412,  Adjusted R-squared:  0.4021 
## F-statistic: 41.63 on 34 and 2020 DF,  p-value: < 2.2e-16
olspred <- predict(ols_model, newdata = testX)
olsResample <- postResample(pred=olspred, obs=testY)
olsResample
##      RMSE  Rsquared       MAE 
## 0.1370562 0.4179574 0.1055787

Again we see a slight increase in R squared when predicting on the held-out data, but we are still explaining less than half of the variance.
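
For a complementary view of which predictors drive the fit, caret can rank them by scaled importance. A sketch (output omitted):

varImp(ols_model)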

Lasso Regression

lasso_model <- train(
  x = trainingX,
  y = trainingY,
  method = "lasso",
  tuneLength = 20,
  trControl = ctrl,
  preProcess = pre_process
)
lasso_model
## The lasso 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1645, 1645, 1643, 1643, 1644 
## Resampling results across tuning parameters:
## 
##   fraction   RMSE       Rsquared   MAE      
##   0.1000000  0.1501339  0.2830533  0.1193508
##   0.1421053  0.1454322  0.3083041  0.1153596
##   0.1842105  0.1420905  0.3261748  0.1124716
##   0.2263158  0.1400862  0.3378452  0.1107797
##   0.2684211  0.1386387  0.3495120  0.1094667
##   0.3105263  0.1375284  0.3581265  0.1084145
##   0.3526316  0.1367190  0.3640821  0.1076256
##   0.3947368  0.1359902  0.3697629  0.1069180
##   0.4368421  0.1353450  0.3749892  0.1062692
##   0.4789474  0.1348298  0.3792478  0.1057480
##   0.5210526  0.1344771  0.3819948  0.1053465
##   0.5631579  0.1342117  0.3840646  0.1049850
##   0.6052632  0.1339778  0.3859990  0.1046454
##   0.6473684  0.1338510  0.3869356  0.1044546
##   0.6894737  0.1337544  0.3876715  0.1043041
##   0.7315789  0.1336614  0.3884919  0.1041576
##   0.7736842  0.1336083  0.3890165  0.1040548
##   0.8157895  0.1335968  0.3891783  0.1039939
##   0.8578947  0.1336088  0.3891510  0.1039582
##   0.9000000  0.1336389  0.3889998  0.1039389
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was fraction = 0.8157895.

With the optimal fraction for our lasso model, we are still explaining only 38.9% of the variance.

lspred <- predict(lasso_model, newdata = testX)
lsResample <- postResample(pred=lspred, obs=testY)

The held-out metrics stored in lsResample (not printed above) continue the same pattern we have seen in the other linear models.
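
To keep the linear models comparable at a glance, the held-out summaries collected so far can be stacked into one table. A small sketch:

rbind(PLS = plsResample, OLS = olsResample, Lasso = lsResample)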

Ridge Regression

ridge_model <- train(
  x = trainingX,
  y = trainingY,
  method = "ridge",
  tuneLength = 20,
  trControl = ctrl,
  preProcess = pre_process
)
ridge_model
## Ridge Regression 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1644, 1645, 1643, 1642, 1646 
## Resampling results across tuning parameters:
## 
##   lambda        RMSE       Rsquared   MAE      
##   0.0000000000  0.1336952  0.3889776  0.1039762
##   0.0001000000  0.1336929  0.3889936  0.1039759
##   0.0001467799  0.1336918  0.3890009  0.1039758
##   0.0002154435  0.1336903  0.3890113  0.1039758
##   0.0003162278  0.1336882  0.3890261  0.1039760
##   0.0004641589  0.1336852  0.3890466  0.1039764
##   0.0006812921  0.1336811  0.3890745  0.1039773
##   0.0010000000  0.1336757  0.3891107  0.1039792
##   0.0014677993  0.1336687  0.3891550  0.1039827
##   0.0021544347  0.1336606  0.3892037  0.1039882
##   0.0031622777  0.1336522  0.3892464  0.1040011
##   0.0046415888  0.1336459  0.3892616  0.1040269
##   0.0068129207  0.1336454  0.3892126  0.1040664
##   0.0100000000  0.1336572  0.3890436  0.1041259
##   0.0146779927  0.1336893  0.3886800  0.1042114
##   0.0215443469  0.1337519  0.3880316  0.1043300
##   0.0316227766  0.1338570  0.3869923  0.1044874
##   0.0464158883  0.1340194  0.3854322  0.1046996
##   0.0681292069  0.1342601  0.3831857  0.1049696
##   0.1000000000  0.1346091  0.3800521  0.1053149
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.006812921.

At its minimum root mean squared error, the ridge regression model explains only about 39% of the variance, performing about as well as the other models tested thus far.

rdpred <- predict(ridge_model, newdata = testX)
rdResample <- postResample(pred=rdpred, obs=testY)
rdResample
##      RMSE  Rsquared       MAE 
## 0.1368061 0.4206312 0.1054762

And we again see the pattern continue: a slight rise in the value of R squared on the held-out data.

Elastic Net

elastic_model <- train(
  x = trainingX,
  y = trainingY,
  method = "glmnet",
  tuneLength = 20,
  trControl = ctrl,
  preProcess = pre_process
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
elastic_model
## glmnet 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1644, 1643, 1644, 1645, 1644 
## Resampling results across tuning parameters:
## 
##   alpha      lambda        RMSE       Rsquared   MAE      
##   0.1000000  2.377511e-05  0.1329226  0.3941972  0.1035321
##   0.1000000  3.686369e-05  0.1329226  0.3941972  0.1035321
##   0.1000000  5.715773e-05  0.1329226  0.3941972  0.1035321
##   0.1000000  8.862397e-05  0.1329214  0.3942064  0.1035309
##   0.1000000  1.374129e-04  0.1329122  0.3942775  0.1035270
##   0.1000000  2.130608e-04  0.1328918  0.3944390  0.1035224
##   0.1000000  3.303542e-04  0.1328663  0.3946368  0.1035200
##   0.1000000  5.122193e-04  0.1328411  0.3948275  0.1035267
##   0.1000000  7.942041e-04  0.1328257  0.3949296  0.1035586
##   0.1000000  1.231426e-03  0.1328222  0.3949390  0.1036248
##   0.1000000  1.909346e-03  0.1328537  0.3946794  0.1037536
##   0.1000000  2.960470e-03  0.1329462  0.3939587  0.1039553
##   0.1000000  4.590256e-03  0.1331379  0.3924981  0.1043034
##   0.1000000  7.117264e-03  0.1333950  0.3908053  0.1046870
##   0.1000000  1.103543e-02  0.1339379  0.3871087  0.1052883
##   0.1000000  1.711061e-02  0.1349654  0.3796828  0.1063240
##   0.1000000  2.653027e-02  0.1365736  0.3675888  0.1078927
##   0.1000000  4.113560e-02  0.1386090  0.3528850  0.1098444
##   0.1000000  6.378140e-02  0.1407585  0.3421548  0.1117815
##   0.1000000  9.889407e-02  0.1438334  0.3298172  0.1144534
##   0.1473684  2.377511e-05  0.1329106  0.3942980  0.1035332
##   0.1473684  3.686369e-05  0.1329106  0.3942980  0.1035332
##   0.1473684  5.715773e-05  0.1329106  0.3942980  0.1035332
##   0.1473684  8.862397e-05  0.1329070  0.3943251  0.1035298
##   0.1473684  1.374129e-04  0.1328992  0.3943875  0.1035246
##   0.1473684  2.130608e-04  0.1328809  0.3945314  0.1035184
##   0.1473684  3.303542e-04  0.1328531  0.3947487  0.1035165
##   0.1473684  5.122193e-04  0.1328309  0.3949128  0.1035293
##   0.1473684  7.942041e-04  0.1328164  0.3950104  0.1035671
##   0.1473684  1.231426e-03  0.1328195  0.3949727  0.1036512
##   0.1473684  1.909346e-03  0.1328662  0.3945998  0.1037993
##   0.1473684  2.960470e-03  0.1330016  0.3935314  0.1040669
##   0.1473684  4.590256e-03  0.1332092  0.3920255  0.1044341
##   0.1473684  7.117264e-03  0.1335694  0.3895826  0.1048848
##   0.1473684  1.103543e-02  0.1343183  0.3842423  0.1056857
##   0.1473684  1.711061e-02  0.1356136  0.3746468  0.1069636
##   0.1473684  2.653027e-02  0.1375612  0.3597251  0.1088629
##   0.1473684  4.113560e-02  0.1396506  0.3463031  0.1107719
##   0.1473684  6.378140e-02  0.1422566  0.3340980  0.1130471
##   0.1473684  9.889407e-02  0.1459108  0.3224567  0.1162662
##   0.1947368  2.377511e-05  0.1329208  0.3942182  0.1035314
##   0.1947368  3.686369e-05  0.1329208  0.3942182  0.1035314
##   0.1947368  5.715773e-05  0.1329194  0.3942290  0.1035302
##   0.1947368  8.862397e-05  0.1329138  0.3942722  0.1035264
##   0.1947368  1.374129e-04  0.1328968  0.3944085  0.1035218
##   0.1947368  2.130608e-04  0.1328726  0.3946006  0.1035161
##   0.1947368  3.303542e-04  0.1328454  0.3948121  0.1035174
##   0.1947368  5.122193e-04  0.1328263  0.3949495  0.1035344
##   0.1947368  7.942041e-04  0.1328139  0.3950339  0.1035805
##   0.1947368  1.231426e-03  0.1328241  0.3949441  0.1036807
##   0.1947368  1.909346e-03  0.1328907  0.3944127  0.1038558
##   0.1947368  2.960470e-03  0.1330652  0.3930337  0.1041854
##   0.1947368  4.590256e-03  0.1333017  0.3913766  0.1045646
##   0.1947368  7.117264e-03  0.1337858  0.3879828  0.1051218
##   0.1947368  1.103543e-02  0.1347202  0.3811569  0.1060883
##   0.1947368  1.711061e-02  0.1363294  0.3688228  0.1076754
##   0.1947368  2.653027e-02  0.1383871  0.3534507  0.1096249
##   0.1947368  4.113560e-02  0.1406407  0.3402230  0.1116173
##   0.1947368  6.378140e-02  0.1435727  0.3282788  0.1141829
##   0.1947368  9.889407e-02  0.1479398  0.3161755  0.1180263
##   0.2421053  2.377511e-05  0.1329265  0.3941793  0.1035415
##   0.2421053  3.686369e-05  0.1329265  0.3941793  0.1035415
##   0.2421053  5.715773e-05  0.1329258  0.3941849  0.1035408
##   0.2421053  8.862397e-05  0.1329163  0.3942590  0.1035350
##   0.2421053  1.374129e-04  0.1328936  0.3944342  0.1035233
##   0.2421053  2.130608e-04  0.1328667  0.3946490  0.1035174
##   0.2421053  3.303542e-04  0.1328405  0.3948516  0.1035195
##   0.2421053  5.122193e-04  0.1328214  0.3949892  0.1035398
##   0.2421053  7.942041e-04  0.1328117  0.3950555  0.1035955
##   0.2421053  1.231426e-03  0.1328291  0.3949125  0.1037106
##   0.2421053  1.909346e-03  0.1329206  0.3941808  0.1039211
##   0.2421053  2.960470e-03  0.1331145  0.3926819  0.1042833
##   0.2421053  4.590256e-03  0.1334057  0.3906399  0.1046939
##   0.2421053  7.117264e-03  0.1340308  0.3861079  0.1053866
##   0.2421053  1.103543e-02  0.1351413  0.3778763  0.1065443
##   0.2421053  1.711061e-02  0.1369991  0.3632953  0.1083409
##   0.2421053  2.653027e-02  0.1391516  0.3476836  0.1103172
##   0.2421053  4.113560e-02  0.1415469  0.3348336  0.1123631
##   0.2421053  6.378140e-02  0.1447974  0.3242417  0.1152351
##   0.2421053  9.889407e-02  0.1501769  0.3066728  0.1198776
##   0.2894737  2.377511e-05  0.1329117  0.3942976  0.1035306
##   0.2894737  3.686369e-05  0.1329117  0.3942976  0.1035306
##   0.2894737  5.715773e-05  0.1329110  0.3943027  0.1035300
##   0.2894737  8.862397e-05  0.1329043  0.3943559  0.1035262
##   0.2894737  1.374129e-04  0.1328853  0.3945059  0.1035203
##   0.2894737  2.130608e-04  0.1328564  0.3947371  0.1035141
##   0.2894737  3.303542e-04  0.1328334  0.3949115  0.1035201
##   0.2894737  5.122193e-04  0.1328148  0.3950478  0.1035448
##   0.2894737  7.942041e-04  0.1328076  0.3950969  0.1036139
##   0.2894737  1.231426e-03  0.1328362  0.3948665  0.1037436
##   0.2894737  1.909346e-03  0.1329557  0.3939088  0.1039921
##   0.2894737  2.960470e-03  0.1331691  0.3922919  0.1043711
##   0.2894737  4.590256e-03  0.1335319  0.3897093  0.1048448
##   0.2894737  7.117264e-03  0.1342692  0.3843007  0.1056385
##   0.2894737  1.103543e-02  0.1356131  0.3740219  0.1070243
##   0.2894737  1.711061e-02  0.1375646  0.3587730  0.1088591
##   0.2894737  2.653027e-02  0.1398100  0.3431185  0.1108821
##   0.2894737  4.113560e-02  0.1423514  0.3308836  0.1130326
##   0.2894737  6.378140e-02  0.1460967  0.3195976  0.1163612
##   0.2894737  9.889407e-02  0.1523375  0.2984984  0.1216585
##   0.3368421  2.377511e-05  0.1329105  0.3943118  0.1035326
##   0.3368421  3.686369e-05  0.1329105  0.3943118  0.1035326
##   0.3368421  5.715773e-05  0.1329070  0.3943388  0.1035295
##   0.3368421  8.862397e-05  0.1328990  0.3944017  0.1035243
##   0.3368421  1.374129e-04  0.1328791  0.3945595  0.1035173
##   0.3368421  2.130608e-04  0.1328494  0.3947974  0.1035117
##   0.3368421  3.303542e-04  0.1328279  0.3949579  0.1035200
##   0.3368421  5.122193e-04  0.1328097  0.3950921  0.1035492
##   0.3368421  7.942041e-04  0.1328051  0.3951245  0.1036283
##   0.3368421  1.231426e-03  0.1328469  0.3947894  0.1037756
##   0.3368421  1.909346e-03  0.1329992  0.3935633  0.1040745
##   0.3368421  2.960470e-03  0.1332293  0.3918542  0.1044561
##   0.3368421  4.590256e-03  0.1336716  0.3886502  0.1050042
##   0.3368421  7.117264e-03  0.1345165  0.3824149  0.1059099
##   0.3368421  1.103543e-02  0.1361160  0.3697663  0.1075221
##   0.3368421  1.711061e-02  0.1380915  0.3546612  0.1093426
##   0.3368421  2.653027e-02  0.1404365  0.3387972  0.1113777
##   0.3368421  4.113560e-02  0.1431400  0.3273701  0.1137044
##   0.3368421  6.378140e-02  0.1475363  0.3127560  0.1176074
##   0.3368421  9.889407e-02  0.1545424  0.2883486  0.1235026
##   0.3842105  2.377511e-05  0.1329121  0.3943164  0.1035433
##   0.3842105  3.686369e-05  0.1329121  0.3943164  0.1035433
##   0.3842105  5.715773e-05  0.1329082  0.3943464  0.1035398
##   0.3842105  8.862397e-05  0.1328962  0.3944377  0.1035306
##   0.3842105  1.374129e-04  0.1328721  0.3946204  0.1035168
##   0.3842105  2.130608e-04  0.1328418  0.3948582  0.1035095
##   0.3842105  3.303542e-04  0.1328206  0.3950177  0.1035198
##   0.3842105  5.122193e-04  0.1328034  0.3951462  0.1035549
##   0.3842105  7.942041e-04  0.1328023  0.3951537  0.1036420
##   0.3842105  1.231426e-03  0.1328596  0.3946950  0.1038129
##   0.3842105  1.909346e-03  0.1330369  0.3932722  0.1041511
##   0.3842105  2.960470e-03  0.1332970  0.3913554  0.1045478
##   0.3842105  4.590256e-03  0.1338121  0.3875908  0.1051660
##   0.3842105  7.117264e-03  0.1347981  0.3801756  0.1062081
##   0.3842105  1.103543e-02  0.1365762  0.3658918  0.1079543
##   0.3842105  1.711061e-02  0.1386606  0.3500202  0.1098614
##   0.3842105  2.653027e-02  0.1410033  0.3351394  0.1118266
##   0.3842105  4.113560e-02  0.1439139  0.3243595  0.1143808
##   0.3842105  6.378140e-02  0.1489851  0.3056371  0.1188015
##   0.3842105  9.889407e-02  0.1567606  0.2749564  0.1253589
##   0.4315789  2.377511e-05  0.1329088  0.3943410  0.1035401
##   [... remaining alpha/lambda combinations omitted for brevity. Across every
##    alpha, RMSE bottoms out near lambda = 3e-04 to 5e-04 (about 0.1328); for
##    lambda above roughly 0.01 the fit degrades quickly, and at the largest
##    lambda values with alpha >= 0.81 the penalty shrinks the model to a
##    constant prediction, producing NaN for Rsquared. ...]
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.0003303542.

At its optimal values, the elastic net performs about as well as the other linear models: the combination with the minimum root mean squared error explains about 39% of the cross-validated variance.
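
To confirm the selected hyperparameters and their cross-validated metrics programmatically, the caret train object can be queried directly (a minimal sketch, assuming elastic_model is the fitted object above):

# Hyperparameters chosen by the smallest-RMSE criterion
elastic_model$bestTune
# Cross-validated metrics for the winning alpha/lambda pair
merge(elastic_model$bestTune, elastic_model$results)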

enpred <- predict(elastic_model, newdata = testX)
enResample <- postResample(pred=enpred, obs=testY)
enResample
##      RMSE  Rsquared       MAE 
## 0.1366820 0.4225125 0.1054273

Having looked at the various linear models, none of which explains even half of the variance in pH, we can only hope that the nonlinear models perform better.

Nonlinear Models

K Nearest Neighbors

set.seed(24601)
knn_grid <- expand.grid(k = seq(1, 21, by = 2))

knn_model <- train(ph ~ ., data = train_data,
                   method = "knn",
                   tuneGrid = knn_grid,
                   trControl = ctrl,
                   preProcess = pre_process)
knn_model
## k-Nearest Neighbors 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE       
##    1  0.1571298  0.3376827  0.10545197
##    3  0.1283961  0.4586531  0.09421875
##    5  0.1247663  0.4748306  0.09298493
##    7  0.1233603  0.4842774  0.09290304
##    9  0.1232427  0.4846575  0.09318522
##   11  0.1245447  0.4738264  0.09463101
##   13  0.1252214  0.4686268  0.09568094
##   15  0.1259550  0.4628176  0.09666035
##   17  0.1267149  0.4567030  0.09715940
##   19  0.1268391  0.4552119  0.09764181
##   21  0.1276046  0.4484649  0.09810749
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 9.

With k = 9 nearest neighbors, we reach a maximum cross-validated R-squared of .4847 and a minimum RMSE of .1232. This is an improvement over the linear models.
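
The bias-variance tradeoff across k is easiest to see graphically; a minimal sketch using the resampling profile that caret stores with the fit:

# Cross-validated RMSE against k; the dip at k = 9 marks the selected model
plot(knn_model)
# The same profile on the R-squared scale
plot(knn_model, metric = "Rsquared")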

knn_pred <- predict(knn_model, testX)
knn_resample <- postResample(knn_pred, testY)
knn_resample
##       RMSE   Rsquared        MAE 
## 0.12712325 0.50325447 0.09409939

We also see an improvement in how well the model fits the testing data, with an R-squared of .5033 on the held-out set.

Multivariate Adaptive Regression Splines

set.seed(24601)
mars_grid <- expand.grid(degree = 1:2, nprune = seq(3, 30, by = 3))
mars_model <- train(x = trainingX, y = trainingY,
                    method = "earth",
                    tuneGrid = mars_grid,
                    trControl = ctrl,
                    preProcess = pre_process)
mars_model
## Multivariate Adaptive Regression Spline 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE       
##   1        3      0.1427850  0.3014911  0.11218231
##   1        6      0.1382800  0.3451378  0.10796721
##   1        9      0.1349277  0.3773848  0.10432556
##   1       12      0.1323308  0.4006187  0.10252380
##   1       15      0.1313089  0.4095839  0.10148329
##   1       18      0.1306325  0.4155768  0.10067455
##   1       21      0.1307606  0.4144335  0.10069639
##   1       24      0.1307606  0.4144335  0.10069639
##   1       27      0.1307606  0.4144335  0.10069639
##   1       30      0.1307606  0.4144335  0.10069639
##   2        3      0.1424299  0.3043370  0.11149167
##   2        6      0.1514080  0.3034098  0.10847153
##   2        9      0.1458623  0.3458508  0.10409818
##   2       12      0.1437758  0.3659845  0.10176819
##   2       15      0.1369686  0.4016242  0.09862093
##   2       18      0.1349284  0.4149646  0.09708129
##   2       21      0.1346464  0.4210798  0.09571633
##   2       24      0.1342221  0.4266992  0.09509272
##   2       27      0.1326639  0.4314925  0.09479406
##   2       30      0.1324807  0.4343004  0.09431343
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 18 and degree = 1.

With a degree of one and an nprune value of 18, this model accounts for about 42% of the cross-validated variance in pH. We are seeing improvement over the linear models, but are not yet predicting pH with much accuracy.
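
Which predictors survive pruning can be read off the underlying earth fit; a minimal sketch (evimp is the earth package's importance measure):

# Hinge functions retained after pruning to nprune = 18
summary(mars_model$finalModel)
# Predictors ranked by the earth package's importance criteria
earth::evimp(mars_model$finalModel)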

mars_pred <- predict(mars_model, testX)
mars_resample <- postResample(mars_pred, testY)
mars_resample
##       RMSE   Rsquared        MAE 
## 0.12996831 0.47800982 0.09958475

The test-set R-squared also improves, showing that this model generalizes to new data better than the linear ones, though with a greater error than the KNN model.

Support Vector Machines

set.seed(24601)

svm_grid <- expand.grid(sigma = c(0.01, 0.05, 0.1), C = c(0.1, 1, 10))
svm_model <- train(ph ~ ., data = train_data,
                   method = "svmRadial",
                   tuneGrid = svm_grid,
                   trControl = ctrl,
                   preProcess = pre_process)
svm_model
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   sigma  C     RMSE       Rsquared   MAE       
##   0.01    0.1  0.1346234  0.3979370  0.10364118
##   0.01    1.0  0.1232807  0.4832278  0.09107551
##   0.01   10.0  0.1190963  0.5188803  0.08740104
##   0.05    0.1  0.1312313  0.4300784  0.09936065
##   0.05    1.0  0.1176899  0.5278731  0.08706333
##   0.05   10.0  0.1176241  0.5301238  0.08802148
##   0.10    0.1  0.1378865  0.3874625  0.10550142
##   0.10    1.0  0.1209807  0.5040307  0.09035770
##   0.10   10.0  0.1222567  0.4871226  0.09200612
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05 and C = 10.

At the optimal values of sigma and C, this model explains about 53% of the variance. Again, this is stronger than the linear models, but still not strong enough to report with confidence.
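
Because the winning cost sits at the edge of the searched range (C = 10), a finer follow-up grid around the optimum may be worth trying; a hedged sketch (svm_grid_fine and svm_model_fine are illustrative names):

# Hypothetical refinement grid centered on sigma = 0.05, C = 10
svm_grid_fine <- expand.grid(sigma = c(0.03, 0.05, 0.07),
                             C = c(5, 10, 20, 40))
svm_model_fine <- train(ph ~ ., data = train_data,
                        method = "svmRadial",
                        tuneGrid = svm_grid_fine,
                        trControl = ctrl,
                        preProcess = pre_process)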

svm_pred <- predict(svm_model, testX)
svm_resample <- postResample(svm_pred, testY)
svm_resample
##       RMSE   Rsquared        MAE 
## 0.12205159 0.54124868 0.08725195

Tree Models

Single regression tree

set.seed(24601)
rpartFit <- train(ph ~ ., data = train_data,
                  method = "rpart",
                  trControl = ctrl,
                  tuneGrid = data.frame(cp = c(0.01, 0.05, 0.1)),
                  preProcess = pre_process)
rpartFit
## CART 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   cp    RMSE       Rsquared   MAE      
##   0.01  0.1303245  0.4198557  0.1002721
##   0.05  0.1437860  0.2929985  0.1138928
##   0.10  0.1510357  0.2185671  0.1188032
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.01.
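
One advantage of a single tree is that it can be drawn directly; a minimal sketch, assuming the rpart.plot package is installed (it is not loaded above):

# Visualize the pruned tree underlying the caret fit
rpart.plot::rpart.plot(rpartFit$finalModel)
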
rpartPred <- predict(rpartFit, testX)

rpartResample <- postResample(rpartPred, testY)

rpartResample
##      RMSE  Rsquared       MAE 
## 0.1397839 0.3942828 0.1049243

This tree does not perform as well as any of the other nonlinear models.

Random Forest

set.seed(24601)
rfGrid <- expand.grid(mtry=seq(2,38,by=3) )


rfModel <- train(x = trainingX, y = trainingY, 
                 method = "rf", 
                 tuneGrid = rfGrid, 
                 importance = TRUE, 
                 trControl = ctrl,
                 ntree = 50,
                 preProcess=pre_process)
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range

(The warning above is emitted ten times, once per fold for each invalid grid value: mtry = 35 and mtry = 38 exceed the 34 available predictors and are reset to 34.)
plot(rfModel)

Here we can see that RMSE is minimized at mtry = 29, meaning the majority of our variables are candidates at each split. Given the tradeoff between overfitting and minimizing RMSE, a value closer to 20 would be defensible, since the improvement beyond that point is minimal. Judging by the scale, another possibility is a model with mtry closer to 5, given how little RMSE improves after the initial drop.

rfModel
## Random Forest 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE        Rsquared   MAE       
##    2    0.11338601  0.5969181  0.08671890
##    5    0.10405811  0.6533762  0.07827453
##    8    0.10412213  0.6451380  0.07706827
##   11    0.10233563  0.6555127  0.07545518
##   14    0.10052992  0.6666400  0.07457894
##   17    0.10015646  0.6669754  0.07384513
##   20    0.10000695  0.6670764  0.07385840
##   23    0.10045650  0.6626793  0.07330933
##   26    0.09975411  0.6672592  0.07282458
##   29    0.09951271  0.6674173  0.07275401
##   32    0.10002789  0.6629529  0.07292256
##   35    0.10045497  0.6582002  0.07311189
##   38    0.10001779  0.6621191  0.07305077
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 29.

It is important to note the difference between the R-squared values at mtry = 5 and mtry = 29. The value 29 maximizes our R-squared at .6674, but mtry is the number of predictors sampled at each split, so that forest consults most of the variables at every split. A forest with mtry = 5 would be much easier to explain, and choosing it sacrifices only .6674 - .6534 = 1.4% of explained variance.
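
If the simpler forest were preferred, it could be refit by fixing mtry; a minimal sketch reusing the objects above (rfSmall is an illustrative name):

set.seed(24601)
# Same setup as rfModel, but with mtry held at 5 for interpretability
rfSmall <- train(x = trainingX, y = trainingY,
                 method = "rf",
                 tuneGrid = data.frame(mtry = 5),
                 importance = TRUE,
                 trControl = ctrl,
                 ntree = 50,
                 preProcess = pre_process)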

varImpPlot(
  rfModel$finalModel,
  type = 1,
  main = "Random Forest Variable Importance"
)

rfPred <- predict(rfModel, testX)

rfResample <- postResample(rfPred, testY)

rfResample
##       RMSE   Rsquared        MAE 
## 0.10768165 0.64534782 0.07415208

The model performs comparably well on the test data, with an R-squared of .6453.

Random Forest w/ Out of Bag Error

ctrlOOB <- trainControl(method = "oob")
set.seed(24601)
rfTuneOOB <- train(x = trainingX, y = trainingY,
                   method = "rf",
                   tuneGrid = rfGrid,
                   ntree = 50,
                   importance = TRUE,
                   trControl = ctrlOOB,
                   preProcess=pre_process)
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range

(The warning above is emitted three times: mtry = 35 and mtry = 38 exceed the 34 available predictors, and the final fit at mtry = 38 is likewise reset to 34.)
rfTuneOOB
## Random Forest 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE        Rsquared 
##    2    0.11447402  0.5504192
##    5    0.10664458  0.6098142
##    8    0.10377403  0.6305367
##   11    0.10105026  0.6496769
##   14    0.10090420  0.6506889
##   17    0.09947006  0.6605478
##   20    0.09981191  0.6582106
##   23    0.09986052  0.6578776
##   26    0.09889365  0.6644705
##   29    0.10083731  0.6511519
##   32    0.09922449  0.6622218
##   35    0.09983148  0.6580765
##   38    0.09799135  0.6705653
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 38.

It is somewhat unsurprising that the lowest error comes from mtry = 38 (reset to the 34 available predictors), which makes every variable a candidate at each split and so gives each tree the most information to work with. At its peak, this model accounts for 67% of the variance, the highest we have observed thus far, though a forest tuned this aggressively runs a greater risk of overfitting.
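
The OOB estimate can also be read straight off the final randomForest object, which stores the cumulative out-of-bag MSE per tree; a minimal sketch:

# Out-of-bag MSE after the last of the 50 trees, converted to RMSE
sqrt(tail(rfTuneOOB$finalModel$mse, 1))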

varImpPlot(
  rfTuneOOB$finalModel,
  type = 1,
  main = "Random Forest Variable (Out of Bag error) Importance"
)

As the greatest contributor in both random forests, mnf_flow is clearly a particularly important variable in the prediction of pH. brand_codeC, air_pressurer, pressure_vacuum, oxygen_filler, and hyd_pressure3 also occupy high places on both lists.

rfOOBPred <- predict(rfTuneOOB, testX)

rfOOBResample <- postResample(rfOOBPred, testY)

rfOOBResample
##      RMSE  Rsquared       MAE 
## 0.1063199 0.6516039 0.0718344

Cubist

set.seed(24601)
cubistModel <- train(x = trainingX, y = trainingY,
                 method = "cubist",
                 verbose = FALSE, 
                 preProcess=pre_process)

cubistModel
## Cubist 
## 
## 2055 samples
##   34 predictor
## 
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 2055, 2055, 2055, 2055, 2055, 2055, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE       Rsquared   MAE       
##    1          0          0.1486562  0.3918197  0.10045784
##    1          5          0.1475851  0.4148138  0.09935475
##    1          9          0.1470693  0.4125497  0.09891662
##   10          0          0.1088636  0.5976971  0.07818208
##   10          5          0.1071543  0.6131967  0.07678160
##   10          9          0.1072050  0.6117322  0.07683018
##   20          0          0.1049828  0.6256309  0.07562827
##   20          5          0.1031959  0.6384917  0.07414244
##   20          9          0.1032997  0.6372475  0.07423520
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
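
Because neither tuneGrid nor trControl was supplied here, caret fell back on a small default grid and 25 bootstrap resamples. An explicit search could be sketched as follows (cubist_grid and cubistModel2 are illustrative names):

cubist_grid <- expand.grid(committees = c(10, 20, 50, 100),
                           neighbors = c(0, 5, 9))
cubistModel2 <- train(x = trainingX, y = trainingY,
                      method = "cubist",
                      tuneGrid = cubist_grid,
                      trControl = ctrl,
                      preProcess = pre_process)
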
cubistPred <- predict(cubistModel, testX)
cubistResample <- postResample(cubistPred, testY)
cubistResample
##       RMSE   Rsquared        MAE 
## 0.10732555 0.64227681 0.07174676

This model was not as strong as the Random Forest with out-of-bag error.

Final Comparison

model_results <- rbind(
  plsResample,
  olsResample,
  lsResample,
  rdResample,
  enResample,
  knn_resample,
  mars_resample,
  svm_resample,
  rfResample,
  rfOOBResample,
  cubistResample
)

rownames(model_results) <- c("Partial Least Squares", "Ordinary Least Squares", "Lasso", "Ridge", "Elastic Net", "KNN", "MARS", "SVM", "Random Forest", "Random Forest OOB", "Cubist")


model_results[order(model_results[,2]), ]
##                             RMSE  Rsquared        MAE
## Partial Least Squares  0.1371052 0.4175447 0.10576023
## Ordinary Least Squares 0.1370562 0.4179574 0.10557865
## Ridge                  0.1368061 0.4206312 0.10547624
## Lasso                  0.1367303 0.4224391 0.10545659
## Elastic Net            0.1366820 0.4225125 0.10542735
## MARS                   0.1299683 0.4780098 0.09958475
## KNN                    0.1271233 0.5032545 0.09409939
## SVM                    0.1220516 0.5412487 0.08725195
## Cubist                 0.1073255 0.6422768 0.07174676
## Random Forest          0.1076817 0.6453478 0.07415208
## Random Forest OOB      0.1063199 0.6516039 0.07183440

With the Random Forest (OOB) model having the lowest RMSE, the highest R-squared, and an MAE essentially tied with Cubist, it is the strongest model we have, so we will communicate its implications most directly. Although our strongest model explains a disappointing 65% of the variance, it tends to be off by only about .1 on the pH scale, with an average absolute error of about .07.

varImp(rfTuneOOB)
## rf variable importance
## 
##   only 20 most important variables shown (out of 34)
## 
##                 Overall
## mnf_flow         100.00
## alch_rel          62.26
## brand_codeC       53.84
## air_pressurer     37.99
## pressure_vacuum   34.15
## balling_lvl       33.14
## hyd_pressure3     31.18
## carb_rel          30.16
## oxygen_filler     29.44
## bowl_setpoint     27.91
## usage_cont        26.30
## temperature       26.20
## filler_speed      24.26
## carb_flow         21.42
## carb_pressure1    19.18
## balling           18.97
## hyd_pressure2     18.40
## fill_pressure     17.92
## pc_volume         16.66
## carb_volume       16.51
toppre <- varImp(rfTuneOOB)$importance |>
  arrange(-Overall) |>
  head(10)
variables <- rownames(toppre)
cor_matrix <- train_data |>
  dplyr::select(all_of(c("ph", variables))) |>
  cor()
corrplot::corrplot(cor_matrix,
                   method = "color",
                   type = "upper",
                   addCoef.col = "black",
                   tl.cex = 0.8,
                   tl.col = "black")

plot(rfTuneOOB)

Predicting other data

With the best model selected, let’s use it to predict pH for the other set of data provided.

set.seed(24601)

imputed_data <- mice(test_pp, m = 1, method = 'rf', print = FALSE)
test_pp <- complete(imputed_data)
ph_pred <- predict(rfTuneOOB, test_pp)
test_pp$ph <- ph_pred

Exporting to Excel

data.frame(
  SampleID = 1:nrow(test_pp),
  Predicted_pH = predict(rfTuneOOB, newdata = test_pp)
) |>
  write.csv(
    "pH_Predictions.csv",
    row.names = FALSE
  )
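
Since the request specifically asks for an Excel file and openxlsx is already loaded, the same frame can also be written as .xlsx directly; a minimal sketch (pred_df and the file name are illustrative):

# Reuse the predictions computed above rather than re-predicting
pred_df <- data.frame(SampleID = 1:nrow(test_pp),
                      Predicted_pH = ph_pred)
openxlsx::write.xlsx(pred_df, "pH_Predictions.xlsx")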

Conclusions

Despite the wide range of models tested, the strongest model found could only explain 65% of the variance in pH levels. Among our most significant predictors, a few are highly correlated with one another: namely balling_lvl, carb_rel, and alch_rel. In spite of this, each of them was retained in our strongest-performing random forests. That still leaves 35% of the variance in pH unexplained by our model, which seems high for this type of study. Perhaps the lenience shown in keeping the outliers and imputing missing values contributed to this, or something may have been missed, either at the data collection stage or the analytical stage.
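
The collinearity among those top predictors can be quantified with caret's findCorrelation; a minimal sketch reusing the correlation matrix computed earlier (high_corr is an illustrative name):

# Drop the ph row/column (position 1), then flag predictors whose pairwise
# correlation exceeds 0.75
high_corr <- caret::findCorrelation(cor_matrix[-1, -1], cutoff = 0.75,
                                    names = TRUE)
high_corr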

Making a graph for the Report

set.seed(24601)
imputed_data <- mice(test, m = 1, method = 'rf', print = FALSE)
## Warning: Number of logged events: 2
test <- complete(imputed_data)

test$ph <- ph_pred

test |>
  ggplot(aes(x = brand_code, y = ph)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  ggtitle('Distribution of Predicted pH Values by Brand')