This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations require us to understand our manufacturing process and its predictive factors, and to report to them our predictive model of pH.
Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business-friendly, readable document and your predictions in an Excel-readable format. The technical report should clearly show the models you tested and how you selected your final approach.
Please submit both RPubs links and .rmd files (or other readable formats) for the technical and non-technical reports. Also submit the Excel file showing your models' predictions for pH.
library(tidyverse)
library(AppliedPredictiveModeling)
library(readxl)
library(corrplot)
library(caret)
library(mice)
library(elasticnet)
library(earth)
library(kernlab)
library(janitor)
library(randomForest)
library(openxlsx)
# Load the provided training and evaluation (test) data sets
train <- read_excel("C:/Users/ddebo/Downloads/StudentData.xlsx")
test <- read_excel("C:/Users/ddebo/Downloads/StudentEvaluation.xlsx")
# Standardize the column names to snake_case
train <- train |>
  clean_names(case = "snake")
head(train)
## # A tibble: 6 × 33
## brand_code carb_volume fill_ounces pc_volume carb_pressure carb_temp psc
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 B 5.34 24.0 0.263 68.2 141. 0.104
## 2 A 5.43 24.0 0.239 68.4 140. 0.124
## 3 B 5.29 24.1 0.263 70.8 145. 0.09
## 4 A 5.44 24.0 0.293 63 133. NA
## 5 A 5.49 24.3 0.111 67.2 137. 0.026
## 6 A 5.38 23.9 0.269 66.6 138. 0.09
## # ℹ 26 more variables: psc_fill <dbl>, psc_co2 <dbl>, mnf_flow <dbl>,
## # carb_pressure1 <dbl>, fill_pressure <dbl>, hyd_pressure1 <dbl>,
## # hyd_pressure2 <dbl>, hyd_pressure3 <dbl>, hyd_pressure4 <dbl>,
## # filler_level <dbl>, filler_speed <dbl>, temperature <dbl>,
## # usage_cont <dbl>, carb_flow <dbl>, density <dbl>, mfr <dbl>, balling <dbl>,
## # pressure_vacuum <dbl>, ph <dbl>, oxygen_filler <dbl>, bowl_setpoint <dbl>,
## # pressure_setpoint <dbl>, air_pressurer <dbl>, alch_rel <dbl>, …
Here we can see that we are provided with 33 variables, one of which is our target variable, pH. All variables are numeric except the first, brand_code, where brands are coded by letter. For further analysis, we will need to convert these categories into dummy variables.
str(train)
## tibble [2,571 × 33] (S3: tbl_df/tbl/data.frame)
## $ brand_code : chr [1:2571] "B" "A" "B" "A" ...
## $ carb_volume : num [1:2571] 5.34 5.43 5.29 5.44 5.49 ...
## $ fill_ounces : num [1:2571] 24 24 24.1 24 24.3 ...
## $ pc_volume : num [1:2571] 0.263 0.239 0.263 0.293 0.111 ...
## $ carb_pressure : num [1:2571] 68.2 68.4 70.8 63 67.2 66.6 64.2 67.6 64.2 72 ...
## $ carb_temp : num [1:2571] 141 140 145 133 137 ...
## $ psc : num [1:2571] 0.104 0.124 0.09 NA 0.026 0.09 0.128 0.154 0.132 0.014 ...
## $ psc_fill : num [1:2571] 0.26 0.22 0.34 0.42 0.16 ...
## $ psc_co2 : num [1:2571] 0.04 0.04 0.16 0.04 0.12 ...
## $ mnf_flow : num [1:2571] -100 -100 -100 -100 -100 -100 -100 -100 -100 -100 ...
## $ carb_pressure1 : num [1:2571] 119 122 120 115 118 ...
## $ fill_pressure : num [1:2571] 46 46 46 46.4 45.8 45.6 51.8 46.8 46 45.2 ...
## $ hyd_pressure1 : num [1:2571] 0 0 0 0 0 0 0 0 0 0 ...
## $ hyd_pressure2 : num [1:2571] NA NA NA 0 0 0 0 0 0 0 ...
## $ hyd_pressure3 : num [1:2571] NA NA NA 0 0 0 0 0 0 0 ...
## $ hyd_pressure4 : num [1:2571] 118 106 82 92 92 116 124 132 90 108 ...
## $ filler_level : num [1:2571] 121 119 120 118 119 ...
## $ filler_speed : num [1:2571] 4002 3986 4020 4012 4010 ...
## $ temperature : num [1:2571] 66 67.6 67 65.6 65.6 66.2 65.8 65.2 65.4 66.6 ...
## $ usage_cont : num [1:2571] 16.2 19.9 17.8 17.4 17.7 ...
## $ carb_flow : num [1:2571] 2932 3144 2914 3062 3054 ...
## $ density : num [1:2571] 0.88 0.92 1.58 1.54 1.54 1.52 0.84 0.84 0.9 0.9 ...
## $ mfr : num [1:2571] 725 727 735 731 723 ...
## $ balling : num [1:2571] 1.4 1.5 3.14 3.04 3.04 ...
## $ pressure_vacuum : num [1:2571] -4 -4 -3.8 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 ...
## $ ph : num [1:2571] 8.36 8.26 8.94 8.24 8.26 8.32 8.4 8.38 8.38 8.5 ...
## $ oxygen_filler : num [1:2571] 0.022 0.026 0.024 0.03 0.03 0.024 0.066 0.046 0.064 0.022 ...
## $ bowl_setpoint : num [1:2571] 120 120 120 120 120 120 120 120 120 120 ...
## $ pressure_setpoint: num [1:2571] 46.4 46.8 46.6 46 46 46 46 46 46 46 ...
## $ air_pressurer : num [1:2571] 143 143 142 146 146 ...
## $ alch_rel : num [1:2571] 6.58 6.56 7.66 7.14 7.14 7.16 6.54 6.52 6.52 6.54 ...
## $ carb_rel : num [1:2571] 5.32 5.3 5.84 5.42 5.44 5.44 5.38 5.34 5.34 5.34 ...
## $ balling_lvl : num [1:2571] 1.48 1.56 3.28 3.04 3.04 3.02 1.44 1.44 1.44 1.38 ...
In the first few rows alone we see NA values, so missing data will need to be addressed in the cleaning stage. Additionally, the column names have been cleaned to snake_case because the original names caused difficulty later in the process. We apply the same change to the test set, which will be used at the end.
test <- test |>
clean_names(case = "snake")
summary(test)
## brand_code carb_volume fill_ounces pc_volume
## Length:267 Min. :5.147 Min. :23.75 Min. :0.09867
## Class :character 1st Qu.:5.287 1st Qu.:23.92 1st Qu.:0.23333
## Mode :character Median :5.340 Median :23.97 Median :0.27533
## Mean :5.369 Mean :23.97 Mean :0.27769
## 3rd Qu.:5.465 3rd Qu.:24.01 3rd Qu.:0.32200
## Max. :5.667 Max. :24.20 Max. :0.46400
## NA's :1 NA's :6 NA's :4
## carb_pressure carb_temp psc psc_fill
## Min. :60.20 Min. :130.0 Min. :0.00400 Min. :0.0200
## 1st Qu.:65.30 1st Qu.:138.4 1st Qu.:0.04450 1st Qu.:0.1000
## Median :68.00 Median :140.8 Median :0.07600 Median :0.1800
## Mean :68.25 Mean :141.2 Mean :0.08545 Mean :0.1903
## 3rd Qu.:70.60 3rd Qu.:143.8 3rd Qu.:0.11200 3rd Qu.:0.2600
## Max. :77.60 Max. :154.0 Max. :0.24600 Max. :0.6200
## NA's :1 NA's :5 NA's :3
## psc_co2 mnf_flow carb_pressure1 fill_pressure
## Min. :0.00000 Min. :-100.20 Min. :113.0 Min. :37.80
## 1st Qu.:0.02000 1st Qu.:-100.00 1st Qu.:120.2 1st Qu.:46.00
## Median :0.04000 Median : 0.20 Median :123.4 Median :47.80
## Mean :0.05107 Mean : 21.03 Mean :123.0 Mean :48.14
## 3rd Qu.:0.06000 3rd Qu.: 141.30 3rd Qu.:125.5 3rd Qu.:50.20
## Max. :0.24000 Max. : 220.40 Max. :136.0 Max. :60.20
## NA's :5 NA's :4 NA's :2
## hyd_pressure1 hyd_pressure2 hyd_pressure3 hyd_pressure4
## Min. :-50.00 Min. :-50.00 Min. :-50.00 Min. : 68.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 90.00
## Median : 10.40 Median : 26.80 Median : 27.70 Median : 98.00
## Mean : 12.01 Mean : 20.11 Mean : 19.61 Mean : 97.84
## 3rd Qu.: 20.40 3rd Qu.: 34.80 3rd Qu.: 33.00 3rd Qu.:104.00
## Max. : 50.00 Max. : 61.40 Max. : 49.20 Max. :140.00
## NA's :1 NA's :1 NA's :4
## filler_level filler_speed temperature usage_cont carb_flow
## Min. : 69.2 Min. :1006 Min. :63.80 Min. :12.90 Min. : 0
## 1st Qu.:100.6 1st Qu.:3812 1st Qu.:65.40 1st Qu.:18.12 1st Qu.:1083
## Median :118.6 Median :3978 Median :65.80 Median :21.44 Median :3038
## Mean :110.3 Mean :3581 Mean :66.23 Mean :20.90 Mean :2409
## 3rd Qu.:120.2 3rd Qu.:3996 3rd Qu.:66.60 3rd Qu.:23.74 3rd Qu.:3215
## Max. :153.2 Max. :4020 Max. :75.40 Max. :24.60 Max. :3858
## NA's :2 NA's :10 NA's :2 NA's :2
## density mfr balling pressure_vacuum
## Min. :0.060 Min. : 15.6 Min. :0.902 Min. :-6.400
## 1st Qu.:0.920 1st Qu.:707.0 1st Qu.:1.498 1st Qu.:-5.600
## Median :0.980 Median :724.6 Median :1.648 Median :-5.200
## Mean :1.177 Mean :697.8 Mean :2.203 Mean :-5.174
## 3rd Qu.:1.600 3rd Qu.:731.5 3rd Qu.:3.242 3rd Qu.:-4.800
## Max. :1.840 Max. :784.8 Max. :3.788 Max. :-3.600
## NA's :1 NA's :31 NA's :1 NA's :1
## ph oxygen_filler bowl_setpoint pressure_setpoint
## Mode:logical Min. :0.00240 Min. : 70.0 Min. :44.00
## NA's:267 1st Qu.:0.01960 1st Qu.:100.0 1st Qu.:46.00
## Median :0.03370 Median :120.0 Median :46.00
## Mean :0.04666 Mean :109.6 Mean :47.73
## 3rd Qu.:0.05440 3rd Qu.:120.0 3rd Qu.:50.00
## Max. :0.39800 Max. :130.0 Max. :52.00
## NA's :3 NA's :1 NA's :2
## air_pressurer alch_rel carb_rel balling_lvl
## Min. :141.2 Min. :6.400 Min. :5.18 Min. :0.000
## 1st Qu.:142.2 1st Qu.:6.540 1st Qu.:5.34 1st Qu.:1.380
## Median :142.6 Median :6.580 Median :5.40 Median :1.480
## Mean :142.8 Mean :6.907 Mean :5.44 Mean :2.051
## 3rd Qu.:142.8 3rd Qu.:7.180 3rd Qu.:5.56 3rd Qu.:3.080
## Max. :147.2 Max. :7.820 Max. :5.74 Max. :3.420
## NA's :1 NA's :3 NA's :2
The summary function gives more detail on the spread of each variable, as well as how many NA values each contains. It also confirms that we will need to address scale before any analysis, since the scales of the variables vary dramatically.
train |>
select(where(is.numeric)) |>
pivot_longer(cols = everything(), names_to = "Feature", values_to = "Value") |>
filter(!is.na(Value)) |>
ggplot(aes(x = Value)) +
geom_histogram(bins = 30, fill = "skyblue", color = "black") +
facet_wrap(~Feature, scales = "free") +
ggtitle("Histograms of Numerical Features")
Here, the histograms let us visualize the distribution of each variable. The most important, pH, appears to be fairly normally distributed. Across the other variables we see examples of left skew (filler speed), right skew (PSC, PSC fill, temperature), bimodality (density, balling, carb flow), and variables dominated by a spike at a single value (mnf flow, hyd pressure 1-3).
train2 <- train |>
  mutate(brand_code = as.factor(brand_code))
# Plot Brand Code distribution (the NA bar shows rows with no assigned brand)
ggplot(train2, aes(x = brand_code)) +
  geom_bar(fill = "skyblue", color = "black") +
  ggtitle("Distribution of Brand Code") +
  theme_minimal()
train |>
ggplot(aes(x = ph)) +
geom_histogram(bins= 40, fill = "lightblue", color = "black") +
labs(title = "Distribution of pH by Brand", y = "Count" ) +
facet_wrap(~ brand_code, scales = 'free_y')
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
There are a surprising number of rows with an NA value for brand code, an important variable since our beverage company produces multiple brands of drink. The best way to handle this categorical variable is one-hot encoding: with only four brands, the conversion adds few new dimensions, and rows without an assigned brand can be treated as simply not belonging to any of the brand categories.
model <- dummyVars(~ ., data = train)
train_pp <- predict(model, newdata = train) |>
as.data.frame()
test_pp <- predict(model, newdata = test) |>
as.data.frame()
test_pp <- test_pp |>
select(-c(phFALSE, phTRUE))
cor_matrix <- cor(train_pp, use = "pairwise.complete.obs")
corrplot::corrplot(cor_matrix, order = "hclust", tl.cex = 0.7)
Two things jump out from this plot: brand B has strong negative correlations with a set of variables including balling lvl, balling, alch rel, density, and carb rel/carb volume, and those same variables are highly correlated with one another.
The issue of missing data has come up several times in the exploratory stage. First, let us get a sense of the problem.
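As a quick sketch of the scale of the problem (using only dplyr and tidyr functions already loaded with the tidyverse), we can count and rank the missing values in each column of the encoded training set:
# Sketch: missing-value counts and percentages per column, largest first
train_pp |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "feature", values_to = "n_missing") |>
  mutate(pct_missing = round(100 * n_missing / nrow(train_pp), 1)) |>
  arrange(desc(n_missing))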
pH is our target variable, and rows of the training set with a missing pH value cannot be used to fit or evaluate a model, so it is best to remove them from further analysis.
train_pp <- train_pp[!is.na(train_pp$ph), ]
sum(is.na(train_pp$ph))
## [1] 0
set.seed(24601)
# MICE imputation
imputed_data <- mice(train_pp, m = 1, method = 'rf', print = FALSE)
summary(imputed_data)
## Class: mids
## Number of multiple imputations: 1
## Imputation methods:
## brand_codeA brand_codeB brand_codeC brand_codeD
## "rf" "rf" "rf" "rf"
## carb_volume fill_ounces pc_volume carb_pressure
## "rf" "rf" "rf" "rf"
## carb_temp psc psc_fill psc_co2
## "rf" "rf" "rf" "rf"
## mnf_flow carb_pressure1 fill_pressure hyd_pressure1
## "" "rf" "rf" "rf"
## hyd_pressure2 hyd_pressure3 hyd_pressure4 filler_level
## "rf" "rf" "rf" "rf"
## filler_speed temperature usage_cont carb_flow
## "rf" "rf" "rf" "rf"
## density mfr balling pressure_vacuum
## "" "rf" "" ""
## ph oxygen_filler bowl_setpoint pressure_setpoint
## "" "rf" "rf" "rf"
## air_pressurer alch_rel carb_rel balling_lvl
## "" "rf" "rf" "rf"
## PredictorMatrix:
## brand_codeA brand_codeB brand_codeC brand_codeD carb_volume
## brand_codeA 0 1 1 1 1
## brand_codeB 1 0 1 1 1
## brand_codeC 1 1 0 1 1
## brand_codeD 1 1 1 0 1
## carb_volume 1 1 1 1 0
## fill_ounces 1 1 1 1 1
## fill_ounces pc_volume carb_pressure carb_temp psc psc_fill psc_co2
## brand_codeA 1 1 1 1 1 1 1
## brand_codeB 1 1 1 1 1 1 1
## brand_codeC 1 1 1 1 1 1 1
## brand_codeD 1 1 1 1 1 1 1
## carb_volume 1 1 1 1 1 1 1
## fill_ounces 0 1 1 1 1 1 1
## mnf_flow carb_pressure1 fill_pressure hyd_pressure1 hyd_pressure2
## brand_codeA 1 1 1 1 1
## brand_codeB 1 1 1 1 1
## brand_codeC 1 1 1 1 1
## brand_codeD 1 1 1 1 1
## carb_volume 1 1 1 1 1
## fill_ounces 1 1 1 1 1
## hyd_pressure3 hyd_pressure4 filler_level filler_speed temperature
## brand_codeA 1 1 1 1 1
## brand_codeB 1 1 1 1 1
## brand_codeC 1 1 1 1 1
## brand_codeD 1 1 1 1 1
## carb_volume 1 1 1 1 1
## fill_ounces 1 1 1 1 1
## usage_cont carb_flow density mfr balling pressure_vacuum ph
## brand_codeA 1 1 1 1 1 1 1
## brand_codeB 1 1 1 1 1 1 1
## brand_codeC 1 1 1 1 1 1 1
## brand_codeD 1 1 1 1 1 1 1
## carb_volume 1 1 1 1 1 1 1
## fill_ounces 1 1 1 1 1 1 1
## oxygen_filler bowl_setpoint pressure_setpoint air_pressurer
## brand_codeA 1 1 1 1
## brand_codeB 1 1 1 1
## brand_codeC 1 1 1 1
## brand_codeD 1 1 1 1
## carb_volume 1 1 1 1
## fill_ounces 1 1 1 1
## alch_rel carb_rel balling_lvl
## brand_codeA 1 1 1
## brand_codeB 1 1 1
## brand_codeC 1 1 1
## brand_codeD 1 1 1
## carb_volume 1 1 1
## fill_ounces 1 1 1
# Extract the completed dataset
train_pp <- complete(imputed_data)
# Check for any remaining missing values
colSums(is.na(train_pp))
## brand_codeA brand_codeB brand_codeC brand_codeD
## 0 0 0 0
## carb_volume fill_ounces pc_volume carb_pressure
## 0 0 0 0
## carb_temp psc psc_fill psc_co2
## 0 0 0 0
## mnf_flow carb_pressure1 fill_pressure hyd_pressure1
## 0 0 0 0
## hyd_pressure2 hyd_pressure3 hyd_pressure4 filler_level
## 0 0 0 0
## filler_speed temperature usage_cont carb_flow
## 0 0 0 0
## density mfr balling pressure_vacuum
## 0 0 0 0
## ph oxygen_filler bowl_setpoint pressure_setpoint
## 0 0 0 0
## air_pressurer alch_rel carb_rel balling_lvl
## 0 0 0 0
Now that we have imputed with MICE (random forest method) and confirmed that no missing values remain, we no longer have to worry about missingness.
Next, if any variables have near-zero variance, they will not be useful for the analysis, so they should be removed.
nzv_cols <- nearZeroVar(train_pp, names = TRUE)
nzv_cols
## [1] "hyd_pressure1"
if (length(nzv_cols) > 0) {
  train_pp <- train_pp[, !colnames(train_pp) %in% nzv_cols]
  test_pp <- test_pp[, !colnames(test_pp) %in% nzv_cols]
}
The only variable identified was hyd_pressure1, and it has been removed from our data set. Another important check is for outliers in our target variable, particularly any caused by measurement or human error, since they would impact the accuracy of our analysis.
outliers <- train_pp |>
mutate(
Q1 = quantile(ph, 0.25, na.rm = TRUE),
Q3 = quantile(ph, 0.75, na.rm = TRUE),
IQR = Q3 - Q1,
lower = Q1 - 1.5 * IQR,
upper = Q3 + 1.5 * IQR,
is_outlier = (ph < lower | ph > upper)
) |>
filter(is_outlier == TRUE)
outliers
## brand_codeA brand_codeB brand_codeC brand_codeD carb_volume fill_ounces
## 1094 0 0 1 0 5.220000 23.92000
## 1876 1 0 0 0 5.626667 24.28000
## 2088 0 0 1 0 5.200000 24.03333
## 2094 0 1 0 0 5.373333 23.92667
## 2095 0 1 0 0 5.380000 24.09333
## 2097 0 1 0 0 5.426667 23.97333
## 2098 0 1 0 0 5.426667 24.00000
## 2099 0 1 0 0 5.360000 24.01333
## 2100 0 1 0 0 5.346667 23.97333
## 2102 0 1 0 0 5.340000 24.00000
## 2103 0 1 0 0 5.340000 24.14667
## 2104 0 1 0 0 5.346667 24.09333
## 2557 0 0 1 0 5.240000 24.07333
## 2558 0 0 1 0 5.206667 24.05333
## 2559 0 0 1 0 5.386667 24.27333
## 2560 0 0 1 0 5.373333 24.12667
## 2561 0 0 1 0 5.273333 24.01333
## 2562 0 0 1 0 5.366667 24.06667
## pc_volume carb_pressure carb_temp psc psc_fill psc_co2 mnf_flow
## 1094 0.3913333 63.6 139.0 0.002 0.18 0.04 -100.0
## 1876 0.1820000 70.6 136.6 0.068 0.32 0.12 143.8
## 2088 0.2413333 63.2 138.6 0.060 0.06 0.04 151.0
## 2094 0.2326667 67.6 140.0 0.128 0.24 0.06 151.0
## 2095 0.2506667 71.0 145.6 0.084 0.20 0.12 158.4
## 2097 0.2580000 63.0 134.0 0.072 0.28 0.10 149.2
## 2098 0.2273333 70.8 143.8 0.050 0.44 0.06 143.4
## 2099 0.2413333 66.6 138.4 0.032 0.06 0.06 147.2
## 2100 0.2606667 70.6 144.8 0.058 0.10 0.06 144.8
## 2102 0.2453333 66.4 139.8 0.116 0.14 0.04 144.6
## 2103 0.2940000 68.4 142.2 0.106 0.44 0.08 143.4
## 2104 0.2940000 66.8 140.2 0.086 0.18 0.02 146.2
## 2557 0.2686667 66.6 142.0 0.012 0.28 0.08 166.0
## 2558 0.3246667 64.8 140.4 0.148 0.34 0.02 164.6
## 2559 0.2866667 64.2 138.4 0.108 0.24 0.06 0.2
## 2560 0.2380000 66.0 139.4 0.124 0.30 0.12 151.8
## 2561 0.2813333 65.4 138.6 0.096 0.34 0.08 161.6
## 2562 0.2713333 69.4 142.0 0.078 0.30 0.08 160.0
## carb_pressure1 fill_pressure hyd_pressure2 hyd_pressure3 hyd_pressure4
## 1094 138.4 42.8 0.2 0.0 130
## 1876 123.2 50.6 37.2 33.0 100
## 2088 129.0 50.0 33.8 31.0 82
## 2094 121.8 50.0 35.0 32.0 94
## 2095 129.8 50.0 39.8 34.6 96
## 2097 125.2 50.0 37.4 32.8 98
## 2098 123.2 50.2 35.0 33.2 96
## 2099 125.4 50.2 37.8 34.2 104
## 2100 126.6 50.0 36.8 31.6 100
## 2102 127.0 51.4 35.6 34.2 96
## 2103 123.2 50.2 37.2 32.6 104
## 2104 127.6 50.2 37.4 34.2 98
## 2557 123.8 53.0 33.0 38.2 106
## 2558 127.0 50.0 34.0 38.6 104
## 2559 129.0 46.4 0.2 -1.2 136
## 2560 119.6 56.0 28.6 35.4 106
## 2561 125.0 50.6 33.4 39.4 104
## 2562 124.4 49.8 33.8 37.8 102
## filler_level filler_speed temperature usage_cont carb_flow density mfr
## 1094 121.4 1010 72.2 17.88 42 0.24 603.4
## 1876 120.6 3996 68.2 23.80 3240 1.66 725.4
## 2088 92.6 3910 67.0 23.66 1092 0.94 702.8
## 2094 89.8 3998 65.2 23.52 1094 0.94 730.6
## 2095 90.2 3996 65.2 23.66 1088 0.92 726.6
## 2097 88.6 3998 65.4 23.60 1094 0.94 734.0
## 2098 90.0 3998 65.4 23.54 1090 0.96 728.6
## 2099 90.0 3996 66.4 23.58 1086 0.92 726.4
## 2100 89.2 3996 65.6 23.58 1094 0.94 729.6
## 2102 91.2 4000 66.0 24.32 1088 0.94 722.4
## 2103 90.4 3994 64.8 23.56 1088 0.92 726.6
## 2104 90.8 3998 64.8 23.68 1092 0.94 728.2
## 2557 109.2 3990 64.0 24.22 1180 0.58 730.2
## 2558 110.6 3998 64.4 23.68 1184 0.60 726.6
## 2559 109.8 1408 66.2 24.26 44 0.50 417.6
## 2560 110.6 3998 64.6 24.26 1212 0.58 736.0
## 2561 109.4 3996 64.8 24.16 1184 0.62 731.8
## 2562 110.6 3992 64.6 23.80 1188 0.60 733.0
## balling pressure_vacuum ph oxygen_filler bowl_setpoint pressure_setpoint
## 1094 0.160 -5.0 9.36 0.2380 120 46
## 1876 3.342 -5.4 8.06 0.0322 120 50
## 2088 1.548 -5.2 8.00 0.0026 90 50
## 2094 1.548 -5.2 8.06 0.0134 90 50
## 2095 1.498 -5.4 8.06 0.0026 90 50
## 2097 1.548 -5.4 8.04 0.0098 90 50
## 2098 1.596 -5.4 8.04 0.0070 90 50
## 2099 1.496 -5.4 8.06 0.0200 90 50
## 2100 1.548 -5.4 8.06 0.0204 90 50
## 2102 1.548 -5.4 8.02 0.0172 90 50
## 2103 1.498 -5.4 8.04 0.0254 90 50
## 2104 1.548 -5.4 8.00 0.0248 90 50
## 2557 1.868 -5.8 8.06 0.0260 110 50
## 2558 1.920 -5.8 8.02 0.0058 110 50
## 2559 1.666 -5.8 8.02 0.0422 110 50
## 2560 1.868 -5.8 7.98 0.0026 110 50
## 2561 1.968 -5.8 7.90 0.0286 110 50
## 2562 1.918 -5.8 7.88 0.0238 110 50
## air_pressurer alch_rel carb_rel balling_lvl Q1 Q3 IQR lower upper
## 1094 143.2 6.60 5.60 0.00 8.44 8.68 0.24 8.08 9.04
## 1876 142.6 7.16 5.62 3.16 8.44 8.68 0.24 8.08 9.04
## 2088 141.6 6.44 5.14 1.36 8.44 8.68 0.24 8.08 9.04
## 2094 141.8 6.52 5.36 1.36 8.44 8.68 0.24 8.08 9.04
## 2095 142.8 6.52 5.38 1.36 8.44 8.68 0.24 8.08 9.04
## 2097 142.2 6.54 5.36 1.38 8.44 8.68 0.24 8.08 9.04
## 2098 143.0 6.54 5.36 1.36 8.44 8.68 0.24 8.08 9.04
## 2099 142.4 6.52 5.36 1.34 8.44 8.68 0.24 8.08 9.04
## 2100 142.6 6.54 5.36 1.34 8.44 8.68 0.24 8.08 9.04
## 2102 142.2 6.54 5.36 1.34 8.44 8.68 0.24 8.08 9.04
## 2103 142.4 6.52 5.36 1.34 8.44 8.68 0.24 8.08 9.04
## 2104 143.2 6.52 5.36 1.34 8.44 8.68 0.24 8.08 9.04
## 2557 142.8 6.54 5.22 1.56 8.44 8.68 0.24 8.08 9.04
## 2558 142.8 6.54 5.20 1.60 8.44 8.68 0.24 8.08 9.04
## 2559 143.0 6.50 5.44 1.60 8.44 8.68 0.24 8.08 9.04
## 2560 143.2 6.52 5.24 1.58 8.44 8.68 0.24 8.08 9.04
## 2561 143.0 6.52 5.22 1.58 8.44 8.68 0.24 8.08 9.04
## 2562 142.2 6.50 5.26 1.60 8.44 8.68 0.24 8.08 9.04
## is_outlier
## 1094 TRUE
## 1876 TRUE
## 2088 TRUE
## 2094 TRUE
## 2095 TRUE
## 2097 TRUE
## 2098 TRUE
## 2099 TRUE
## 2100 TRUE
## 2102 TRUE
## 2103 TRUE
## 2104 TRUE
## 2557 TRUE
## 2558 TRUE
## 2559 TRUE
## 2560 TRUE
## 2561 TRUE
## 2562 TRUE
Checking through the rows flagged as outliers by the 1.5 × interquartile range rule, the observation with the largest pH value (9.36) could be an erroneous measurement, but we do not have enough confidence in that to exclude it from the analysis. Many of the identified outliers are brand B, which is consistent with the brand-specific associations seen in the correlation plot.
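To check that brand pattern quantitatively, the flagged rows can be tallied against the brand dummy columns; this quick sketch uses the outliers data frame built above and only dplyr functions.
# Sketch: how many flagged outliers fall under each brand dummy column
outliers |>
  summarise(brand_A = sum(brand_codeA),
            brand_B = sum(brand_codeB),
            brand_C = sum(brand_codeC),
            brand_D = sum(brand_codeD),
            total = n())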
The other important step at this stage is centering and scaling our data. We also have the option of transforming the predictors, which would usually be done with a Box-Cox transformation; however, since some variables include negative values, we will use the Yeo-Johnson transformation instead. We can save these pre-processing steps so that they are applied as each model is fit.
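As a sanity check on that choice, the sketch below (using only functions already loaded) lists the columns whose minimum value is negative, which is what rules out Box-Cox.
# Sketch: predictors containing negative values, which Box-Cox cannot handle
train_pp |>
  summarise(across(everything(), ~ min(., na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "feature", values_to = "min_value") |>
  filter(min_value < 0)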
We split the processed training set into a training partition and a hold-out test partition, and train and evaluate each model on those.
set.seed(24601)
# index for training
index <- createDataPartition(train_pp$ph, p = .8, list = FALSE)
# train
train_data <- train_pp[index, ]
test_data <- train_pp[-index, ]
# Training predictors and response
trainingX <- train_data |> select(-ph)
trainingY <- train_data$ph
# Test predictors and response
testX <- test_data |> select(-ph)
testY <- test_data$ph
# 5-fold cross-validation to make reasonable estimates
ctrl <- trainControl(method = "cv", number = 5)
pre_process <- c("center", "scale", "YeoJohnson")
set.seed(24601)
pls_model <- train(
x = trainingX,
y = trainingY,
method = "pls",
tuneLength = 20,
trControl = ctrl,
preProc = pre_process
)
pls_model
## Partial Least Squares
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 0.1484247 0.2451831 0.1180160
## 2 0.1418825 0.3105439 0.1116106
## 3 0.1388735 0.3399349 0.1094506
## 4 0.1372503 0.3549506 0.1076676
## 5 0.1357378 0.3692649 0.1059499
## 6 0.1348182 0.3782302 0.1049456
## 7 0.1342272 0.3834515 0.1046529
## 8 0.1341735 0.3843707 0.1045148
## 9 0.1341209 0.3851056 0.1043027
## 10 0.1339549 0.3867800 0.1039439
## 11 0.1339488 0.3869044 0.1039674
## 12 0.1338832 0.3876199 0.1039990
## 13 0.1338562 0.3878321 0.1039513
## 14 0.1338183 0.3881224 0.1039466
## 15 0.1337767 0.3884229 0.1039696
## 16 0.1337626 0.3884881 0.1040110
## 17 0.1336807 0.3891701 0.1039493
## 18 0.1336465 0.3894511 0.1039158
## 19 0.1336533 0.3893828 0.1038781
## 20 0.1336465 0.3894481 0.1038640
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 18.
The maximum variance explained by the Partial Least Squares model is about 39 percent, and even that likely reflects some overfitting from retaining so many components.
plspred <- predict(pls_model, newdata = testX)
plsResample <- postResample(pred=plspred, obs=testY)
plsResample
## RMSE Rsquared MAE
## 0.1371052 0.4175447 0.1057602
The estimate of R squared rises to 0.4175 when the model is evaluated on our hold-out test data.
#### Ordinary Least Squares
ols_model <- train(
x = trainingX,
y = trainingY,
method = "lm",
trControl = ctrl,
preProc = pre_process
)
ols_model
## Linear Regression
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1644, 1643, 1643, 1645, 1645
## Resampling results:
##
## RMSE Rsquared MAE
## 0.1332371 0.3932706 0.1037553
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
This model explains roughly the same share of the variance as the Partial Least Squares model (a cross-validated R squared of about 0.39). It is, however, helpful for identifying which predictors are statistically significant.
summary(ols_model)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50017 -0.07662 0.01040 0.08653 0.42758
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.545e+00 2.913e-03 2933.732 < 2e-16 ***
## brand_codeA -1.571e-02 1.458e-02 -1.077 0.281512
## brand_codeB 1.155e-03 2.134e-02 0.054 0.956855
## brand_codeC -4.326e-02 1.432e-02 -3.021 0.002549 **
## brand_codeD -2.890e-03 2.113e-02 -0.137 0.891265
## carb_volume -1.867e-03 8.100e-03 -0.230 0.817776
## fill_ounces -6.977e-03 3.131e-03 -2.228 0.025984 *
## pc_volume -1.077e-02 3.538e-03 -3.045 0.002358 **
## carb_pressure -9.162e-03 1.135e-02 -0.807 0.419712
## carb_temp 1.283e-02 1.034e-02 1.241 0.214718
## psc -2.792e-03 3.111e-03 -0.898 0.369471
## psc_fill -6.094e-03 3.036e-03 -2.007 0.044836 *
## psc_co2 -5.608e-03 2.987e-03 -1.877 0.060648 .
## mnf_flow -8.907e-02 6.235e-03 -14.285 < 2e-16 ***
## carb_pressure1 2.607e-02 3.604e-03 7.234 6.62e-13 ***
## fill_pressure 4.811e-03 4.284e-03 1.123 0.261555
## hyd_pressure2 -4.043e-02 1.356e-02 -2.982 0.002895 **
## hyd_pressure3 8.243e-02 1.494e-02 5.519 3.86e-08 ***
## hyd_pressure4 7.395e-03 4.677e-03 1.581 0.113947
## filler_level -2.447e-02 1.012e-02 -2.419 0.015647 *
## filler_speed 8.211e-05 7.289e-03 0.011 0.991014
## temperature -1.761e-02 3.482e-03 -5.059 4.59e-07 ***
## usage_cont -1.806e-02 3.764e-03 -4.798 1.72e-06 ***
## carb_flow 1.144e-02 4.270e-03 2.680 0.007431 **
## density -2.364e-02 8.486e-03 -2.785 0.005395 **
## mfr -4.386e-03 6.475e-03 -0.677 0.498277
## balling -3.052e-02 1.142e-02 -2.672 0.007611 **
## pressure_vacuum -3.204e-03 4.185e-03 -0.765 0.444087
## oxygen_filler -1.542e-02 3.616e-03 -4.263 2.11e-05 ***
## bowl_setpoint 5.515e-02 1.031e-02 5.349 9.84e-08 ***
## pressure_setpoint -1.750e-02 4.378e-03 -3.996 6.66e-05 ***
## air_pressurer -1.330e-03 3.177e-03 -0.418 0.675663
## alch_rel 4.178e-02 1.214e-02 3.440 0.000593 ***
## carb_rel -1.528e-03 6.554e-03 -0.233 0.815667
## balling_lvl 2.978e-02 9.391e-03 3.171 0.001541 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.132 on 2020 degrees of freedom
## Multiple R-squared: 0.412, Adjusted R-squared: 0.4021
## F-statistic: 41.63 on 34 and 2020 DF, p-value: < 2.2e-16
olspred <- predict(ols_model, newdata = testX)
olsResample <- postResample(pred=olspred, obs=testY)
olsResample
## RMSE Rsquared MAE
## 0.1370562 0.4179574 0.1055787
Again we see a slight increase in R squared on the hold-out test data, but we are still explaining less than half of the variance.
lasso_model <- train(
x = trainingX,
y = trainingY,
method = "lasso",
tuneLength = 20,
trControl = ctrl,
preProcess = pre_process
)
lasso_model
## The lasso
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1645, 1645, 1643, 1643, 1644
## Resampling results across tuning parameters:
##
## fraction RMSE Rsquared MAE
## 0.1000000 0.1501339 0.2830533 0.1193508
## 0.1421053 0.1454322 0.3083041 0.1153596
## 0.1842105 0.1420905 0.3261748 0.1124716
## 0.2263158 0.1400862 0.3378452 0.1107797
## 0.2684211 0.1386387 0.3495120 0.1094667
## 0.3105263 0.1375284 0.3581265 0.1084145
## 0.3526316 0.1367190 0.3640821 0.1076256
## 0.3947368 0.1359902 0.3697629 0.1069180
## 0.4368421 0.1353450 0.3749892 0.1062692
## 0.4789474 0.1348298 0.3792478 0.1057480
## 0.5210526 0.1344771 0.3819948 0.1053465
## 0.5631579 0.1342117 0.3840646 0.1049850
## 0.6052632 0.1339778 0.3859990 0.1046454
## 0.6473684 0.1338510 0.3869356 0.1044546
## 0.6894737 0.1337544 0.3876715 0.1043041
## 0.7315789 0.1336614 0.3884919 0.1041576
## 0.7736842 0.1336083 0.3890165 0.1040548
## 0.8157895 0.1335968 0.3891783 0.1039939
## 0.8578947 0.1336088 0.3891510 0.1039582
## 0.9000000 0.1336389 0.3889998 0.1039389
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was fraction = 0.8157895.
With the optimal fraction for our lasso model, we are still only explaining 38.9% of the variance.
lspred <- predict(lasso_model, newdata = testX)
lsResample <- postResample(pred=lspred, obs=testY)
The hold-out results (stored in lsResample) follow the same pattern we have seen in the other linear models.
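To see which predictors the lasso actually retains at the selected fraction, the coefficients of the final enet fit that caret stores can be extracted; this is a sketch added for reference rather than part of the original workflow.
# Sketch: non-zero lasso coefficients at the tuned fraction (elasticnet's predict method)
lasso_coefs <- predict(lasso_model$finalModel,
                       s = lasso_model$bestTune$fraction,
                       type = "coefficients",
                       mode = "fraction")$coefficients
lasso_coefs[lasso_coefs != 0]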
ridge_model <- train(
x = trainingX,
y = trainingY,
method = "ridge",
tuneLength = 20,
trControl = ctrl,
preProcess = pre_process
)
ridge_model
## Ridge Regression
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1644, 1645, 1643, 1642, 1646
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.0000000000 0.1336952 0.3889776 0.1039762
## 0.0001000000 0.1336929 0.3889936 0.1039759
## 0.0001467799 0.1336918 0.3890009 0.1039758
## 0.0002154435 0.1336903 0.3890113 0.1039758
## 0.0003162278 0.1336882 0.3890261 0.1039760
## 0.0004641589 0.1336852 0.3890466 0.1039764
## 0.0006812921 0.1336811 0.3890745 0.1039773
## 0.0010000000 0.1336757 0.3891107 0.1039792
## 0.0014677993 0.1336687 0.3891550 0.1039827
## 0.0021544347 0.1336606 0.3892037 0.1039882
## 0.0031622777 0.1336522 0.3892464 0.1040011
## 0.0046415888 0.1336459 0.3892616 0.1040269
## 0.0068129207 0.1336454 0.3892126 0.1040664
## 0.0100000000 0.1336572 0.3890436 0.1041259
## 0.0146779927 0.1336893 0.3886800 0.1042114
## 0.0215443469 0.1337519 0.3880316 0.1043300
## 0.0316227766 0.1338570 0.3869923 0.1044874
## 0.0464158883 0.1340194 0.3854322 0.1046996
## 0.0681292069 0.1342601 0.3831857 0.1049696
## 0.1000000000 0.1346091 0.3800521 0.1053149
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.006812921.
At the lambda value with the minimum root mean squared error, the ridge regression model explains only about 39% of the variance, performing about as well as the other models tested thus far.
rdpred <- predict(ridge_model, newdata = testX)
rdResample <- postResample(pred=rdpred, obs=testY)
rdResample
## RMSE Rsquared MAE
## 0.1368061 0.4206312 0.1054762
And we again see the pattern continue: a slight rise in R squared on the hold-out data.
#### Elastic Net
elastic_model <- train(
x = trainingX,
y = trainingY,
method = "glmnet",
tuneLength = 20,
trControl = ctrl,
preProcess = pre_process
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
elastic_model
## glmnet
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1644, 1643, 1644, 1645, 1644
## Resampling results across tuning parameters:
##
## alpha lambda RMSE Rsquared MAE
## 0.1000000 2.377511e-05 0.1329226 0.3941972 0.1035321
## 0.1000000 3.686369e-05 0.1329226 0.3941972 0.1035321
## 0.1000000 5.715773e-05 0.1329226 0.3941972 0.1035321
## 0.1000000 8.862397e-05 0.1329214 0.3942064 0.1035309
## 0.1000000 1.374129e-04 0.1329122 0.3942775 0.1035270
## 0.1000000 2.130608e-04 0.1328918 0.3944390 0.1035224
## 0.1000000 3.303542e-04 0.1328663 0.3946368 0.1035200
## 0.1000000 5.122193e-04 0.1328411 0.3948275 0.1035267
## 0.1000000 7.942041e-04 0.1328257 0.3949296 0.1035586
## 0.1000000 1.231426e-03 0.1328222 0.3949390 0.1036248
## 0.1000000 1.909346e-03 0.1328537 0.3946794 0.1037536
## 0.1000000 2.960470e-03 0.1329462 0.3939587 0.1039553
## 0.1000000 4.590256e-03 0.1331379 0.3924981 0.1043034
## 0.1000000 7.117264e-03 0.1333950 0.3908053 0.1046870
## 0.1000000 1.103543e-02 0.1339379 0.3871087 0.1052883
## 0.1000000 1.711061e-02 0.1349654 0.3796828 0.1063240
## 0.1000000 2.653027e-02 0.1365736 0.3675888 0.1078927
## 0.1000000 4.113560e-02 0.1386090 0.3528850 0.1098444
## 0.1000000 6.378140e-02 0.1407585 0.3421548 0.1117815
## 0.1000000 9.889407e-02 0.1438334 0.3298172 0.1144534
## 0.1473684 2.377511e-05 0.1329106 0.3942980 0.1035332
## 0.1473684 3.686369e-05 0.1329106 0.3942980 0.1035332
## 0.1473684 5.715773e-05 0.1329106 0.3942980 0.1035332
## 0.1473684 8.862397e-05 0.1329070 0.3943251 0.1035298
## 0.1473684 1.374129e-04 0.1328992 0.3943875 0.1035246
## 0.1473684 2.130608e-04 0.1328809 0.3945314 0.1035184
## 0.1473684 3.303542e-04 0.1328531 0.3947487 0.1035165
## 0.1473684 5.122193e-04 0.1328309 0.3949128 0.1035293
## 0.1473684 7.942041e-04 0.1328164 0.3950104 0.1035671
## 0.1473684 1.231426e-03 0.1328195 0.3949727 0.1036512
## 0.1473684 1.909346e-03 0.1328662 0.3945998 0.1037993
## 0.1473684 2.960470e-03 0.1330016 0.3935314 0.1040669
## 0.1473684 4.590256e-03 0.1332092 0.3920255 0.1044341
## 0.1473684 7.117264e-03 0.1335694 0.3895826 0.1048848
## 0.1473684 1.103543e-02 0.1343183 0.3842423 0.1056857
## 0.1473684 1.711061e-02 0.1356136 0.3746468 0.1069636
## 0.1473684 2.653027e-02 0.1375612 0.3597251 0.1088629
## 0.1473684 4.113560e-02 0.1396506 0.3463031 0.1107719
## 0.1473684 6.378140e-02 0.1422566 0.3340980 0.1130471
## 0.1473684 9.889407e-02 0.1459108 0.3224567 0.1162662
## 0.1947368 2.377511e-05 0.1329208 0.3942182 0.1035314
## 0.1947368 3.686369e-05 0.1329208 0.3942182 0.1035314
## 0.1947368 5.715773e-05 0.1329194 0.3942290 0.1035302
## 0.1947368 8.862397e-05 0.1329138 0.3942722 0.1035264
## 0.1947368 1.374129e-04 0.1328968 0.3944085 0.1035218
## 0.1947368 2.130608e-04 0.1328726 0.3946006 0.1035161
## 0.1947368 3.303542e-04 0.1328454 0.3948121 0.1035174
## 0.1947368 5.122193e-04 0.1328263 0.3949495 0.1035344
## 0.1947368 7.942041e-04 0.1328139 0.3950339 0.1035805
## 0.1947368 1.231426e-03 0.1328241 0.3949441 0.1036807
## 0.1947368 1.909346e-03 0.1328907 0.3944127 0.1038558
## 0.1947368 2.960470e-03 0.1330652 0.3930337 0.1041854
## 0.1947368 4.590256e-03 0.1333017 0.3913766 0.1045646
## 0.1947368 7.117264e-03 0.1337858 0.3879828 0.1051218
## 0.1947368 1.103543e-02 0.1347202 0.3811569 0.1060883
## 0.1947368 1.711061e-02 0.1363294 0.3688228 0.1076754
## 0.1947368 2.653027e-02 0.1383871 0.3534507 0.1096249
## 0.1947368 4.113560e-02 0.1406407 0.3402230 0.1116173
## 0.1947368 6.378140e-02 0.1435727 0.3282788 0.1141829
## 0.1947368 9.889407e-02 0.1479398 0.3161755 0.1180263
## 0.2421053 2.377511e-05 0.1329265 0.3941793 0.1035415
## 0.2421053 3.686369e-05 0.1329265 0.3941793 0.1035415
## 0.2421053 5.715773e-05 0.1329258 0.3941849 0.1035408
## 0.2421053 8.862397e-05 0.1329163 0.3942590 0.1035350
## 0.2421053 1.374129e-04 0.1328936 0.3944342 0.1035233
## 0.2421053 2.130608e-04 0.1328667 0.3946490 0.1035174
## 0.2421053 3.303542e-04 0.1328405 0.3948516 0.1035195
## 0.2421053 5.122193e-04 0.1328214 0.3949892 0.1035398
## 0.2421053 7.942041e-04 0.1328117 0.3950555 0.1035955
## 0.2421053 1.231426e-03 0.1328291 0.3949125 0.1037106
## 0.2421053 1.909346e-03 0.1329206 0.3941808 0.1039211
## 0.2421053 2.960470e-03 0.1331145 0.3926819 0.1042833
## 0.2421053 4.590256e-03 0.1334057 0.3906399 0.1046939
## 0.2421053 7.117264e-03 0.1340308 0.3861079 0.1053866
## 0.2421053 1.103543e-02 0.1351413 0.3778763 0.1065443
## 0.2421053 1.711061e-02 0.1369991 0.3632953 0.1083409
## 0.2421053 2.653027e-02 0.1391516 0.3476836 0.1103172
## 0.2421053 4.113560e-02 0.1415469 0.3348336 0.1123631
## 0.2421053 6.378140e-02 0.1447974 0.3242417 0.1152351
## 0.2421053 9.889407e-02 0.1501769 0.3066728 0.1198776
## 0.2894737 2.377511e-05 0.1329117 0.3942976 0.1035306
## 0.2894737 3.686369e-05 0.1329117 0.3942976 0.1035306
## 0.2894737 5.715773e-05 0.1329110 0.3943027 0.1035300
## 0.2894737 8.862397e-05 0.1329043 0.3943559 0.1035262
## 0.2894737 1.374129e-04 0.1328853 0.3945059 0.1035203
## 0.2894737 2.130608e-04 0.1328564 0.3947371 0.1035141
## 0.2894737 3.303542e-04 0.1328334 0.3949115 0.1035201
## 0.2894737 5.122193e-04 0.1328148 0.3950478 0.1035448
## 0.2894737 7.942041e-04 0.1328076 0.3950969 0.1036139
## 0.2894737 1.231426e-03 0.1328362 0.3948665 0.1037436
## 0.2894737 1.909346e-03 0.1329557 0.3939088 0.1039921
## 0.2894737 2.960470e-03 0.1331691 0.3922919 0.1043711
## 0.2894737 4.590256e-03 0.1335319 0.3897093 0.1048448
## 0.2894737 7.117264e-03 0.1342692 0.3843007 0.1056385
## 0.2894737 1.103543e-02 0.1356131 0.3740219 0.1070243
## 0.2894737 1.711061e-02 0.1375646 0.3587730 0.1088591
## 0.2894737 2.653027e-02 0.1398100 0.3431185 0.1108821
## 0.2894737 4.113560e-02 0.1423514 0.3308836 0.1130326
## 0.2894737 6.378140e-02 0.1460967 0.3195976 0.1163612
## 0.2894737 9.889407e-02 0.1523375 0.2984984 0.1216585
## 0.3368421 2.377511e-05 0.1329105 0.3943118 0.1035326
## 0.3368421 3.686369e-05 0.1329105 0.3943118 0.1035326
## 0.3368421 5.715773e-05 0.1329070 0.3943388 0.1035295
## 0.3368421 8.862397e-05 0.1328990 0.3944017 0.1035243
## 0.3368421 1.374129e-04 0.1328791 0.3945595 0.1035173
## 0.3368421 2.130608e-04 0.1328494 0.3947974 0.1035117
## 0.3368421 3.303542e-04 0.1328279 0.3949579 0.1035200
## 0.3368421 5.122193e-04 0.1328097 0.3950921 0.1035492
## 0.3368421 7.942041e-04 0.1328051 0.3951245 0.1036283
## 0.3368421 1.231426e-03 0.1328469 0.3947894 0.1037756
## 0.3368421 1.909346e-03 0.1329992 0.3935633 0.1040745
## 0.3368421 2.960470e-03 0.1332293 0.3918542 0.1044561
## 0.3368421 4.590256e-03 0.1336716 0.3886502 0.1050042
## 0.3368421 7.117264e-03 0.1345165 0.3824149 0.1059099
## 0.3368421 1.103543e-02 0.1361160 0.3697663 0.1075221
## 0.3368421 1.711061e-02 0.1380915 0.3546612 0.1093426
## 0.3368421 2.653027e-02 0.1404365 0.3387972 0.1113777
## 0.3368421 4.113560e-02 0.1431400 0.3273701 0.1137044
## 0.3368421 6.378140e-02 0.1475363 0.3127560 0.1176074
## 0.3368421 9.889407e-02 0.1545424 0.2883486 0.1235026
## 0.3842105 2.377511e-05 0.1329121 0.3943164 0.1035433
## 0.3842105 3.686369e-05 0.1329121 0.3943164 0.1035433
## 0.3842105 5.715773e-05 0.1329082 0.3943464 0.1035398
## 0.3842105 8.862397e-05 0.1328962 0.3944377 0.1035306
## 0.3842105 1.374129e-04 0.1328721 0.3946204 0.1035168
## 0.3842105 2.130608e-04 0.1328418 0.3948582 0.1035095
## 0.3842105 3.303542e-04 0.1328206 0.3950177 0.1035198
## 0.3842105 5.122193e-04 0.1328034 0.3951462 0.1035549
## 0.3842105 7.942041e-04 0.1328023 0.3951537 0.1036420
## 0.3842105 1.231426e-03 0.1328596 0.3946950 0.1038129
## 0.3842105 1.909346e-03 0.1330369 0.3932722 0.1041511
## 0.3842105 2.960470e-03 0.1332970 0.3913554 0.1045478
## 0.3842105 4.590256e-03 0.1338121 0.3875908 0.1051660
## 0.3842105 7.117264e-03 0.1347981 0.3801756 0.1062081
## 0.3842105 1.103543e-02 0.1365762 0.3658918 0.1079543
## 0.3842105 1.711061e-02 0.1386606 0.3500202 0.1098614
## 0.3842105 2.653027e-02 0.1410033 0.3351394 0.1118266
## 0.3842105 4.113560e-02 0.1439139 0.3243595 0.1143808
## 0.3842105 6.378140e-02 0.1489851 0.3056371 0.1188015
## 0.3842105 9.889407e-02 0.1567606 0.2749564 0.1253589
## 0.4315789 2.377511e-05 0.1329088 0.3943410 0.1035401
## 0.4315789 3.686369e-05 0.1329088 0.3943410 0.1035401
## 0.4315789 5.715773e-05 0.1329044 0.3943755 0.1035361
## 0.4315789 8.862397e-05 0.1328928 0.3944639 0.1035275
## 0.4315789 1.374129e-04 0.1328651 0.3946764 0.1035122
## 0.4315789 2.130608e-04 0.1328395 0.3948764 0.1035117
## 0.4315789 3.303542e-04 0.1328178 0.3950401 0.1035227
## 0.4315789 5.122193e-04 0.1328014 0.3951651 0.1035650
## 0.4315789 7.942041e-04 0.1328065 0.3951222 0.1036633
## 0.4315789 1.231426e-03 0.1328792 0.3945411 0.1038565
## 0.4315789 1.909346e-03 0.1330785 0.3929488 0.1042197
## 0.4315789 2.960470e-03 0.1333689 0.3908254 0.1046401
## 0.4315789 4.590256e-03 0.1339673 0.3863927 0.1053290
## 0.4315789 7.117264e-03 0.1351075 0.3776397 0.1065238
## 0.4315789 1.103543e-02 0.1369581 0.3627776 0.1082997
## 0.4315789 1.711061e-02 0.1392252 0.3453355 0.1103539
## 0.4315789 2.653027e-02 0.1415374 0.3319945 0.1122532
## 0.4315789 4.113560e-02 0.1447465 0.3206862 0.1151227
## 0.4315789 6.378140e-02 0.1504142 0.2986257 0.1199554
## 0.4315789 9.889407e-02 0.1590271 0.2538543 0.1273474
## 0.4789474 2.377511e-05 0.1329113 0.3943259 0.1035422
## 0.4789474 3.686369e-05 0.1329106 0.3943308 0.1035417
## 0.4789474 5.715773e-05 0.1329056 0.3943702 0.1035371
## 0.4789474 8.862397e-05 0.1328882 0.3945027 0.1035257
## 0.4789474 1.374129e-04 0.1328588 0.3947285 0.1035094
## 0.4789474 2.130608e-04 0.1328338 0.3949226 0.1035103
## 0.4789474 3.303542e-04 0.1328119 0.3950902 0.1035225
## 0.4789474 5.122193e-04 0.1327970 0.3952043 0.1035735
## 0.4789474 7.942041e-04 0.1328082 0.3951154 0.1036809
## 0.4789474 1.231426e-03 0.1329007 0.3943717 0.1038989
## 0.4789474 1.909346e-03 0.1331172 0.3926587 0.1042770
## 0.4789474 2.960470e-03 0.1334515 0.3902007 0.1047422
## 0.4789474 4.590256e-03 0.1341273 0.3851496 0.1054991
## 0.4789474 7.117264e-03 0.1354411 0.3748363 0.1068602
## 0.4789474 1.103543e-02 0.1373100 0.3599906 0.1086122
## 0.4789474 1.711061e-02 0.1396670 0.3420204 0.1107161
## 0.4789474 2.653027e-02 0.1420664 0.3290436 0.1126980
## 0.4789474 4.113560e-02 0.1456571 0.3159160 0.1159147
## 0.4789474 6.378140e-02 0.1518101 0.2916245 0.1211111
## 0.4789474 9.889407e-02 0.1612714 0.2223449 0.1293271
## 0.5263158 2.377511e-05 0.1329127 0.3943131 0.1035409
## 0.5263158 3.686369e-05 0.1329101 0.3943335 0.1035387
## 0.5263158 5.715773e-05 0.1329039 0.3943823 0.1035342
## 0.5263158 8.862397e-05 0.1328843 0.3945308 0.1035225
## 0.5263158 1.374129e-04 0.1328534 0.3947703 0.1035066
## 0.5263158 2.130608e-04 0.1328311 0.3949424 0.1035116
## 0.5263158 3.303542e-04 0.1328096 0.3951068 0.1035266
## 0.5263158 5.122193e-04 0.1327952 0.3952203 0.1035848
## 0.5263158 7.942041e-04 0.1328131 0.3950788 0.1037018
## 0.5263158 1.231426e-03 0.1329324 0.3941089 0.1039546
## 0.5263158 1.909346e-03 0.1331608 0.3923252 0.1043471
## 0.5263158 2.960470e-03 0.1335398 0.3895233 0.1048482
## 0.5263158 4.590256e-03 0.1342842 0.3839479 0.1056717
## 0.5263158 7.117264e-03 0.1357652 0.3721015 0.1071800
## 0.5263158 1.103543e-02 0.1376575 0.3572597 0.1089273
## 0.5263158 1.711061e-02 0.1400820 0.3389546 0.1110373
## 0.5263158 2.653027e-02 0.1425642 0.3265772 0.1131345
## 0.5263158 4.113560e-02 0.1466120 0.3104264 0.1167135
## 0.5263158 6.378140e-02 0.1532025 0.2842432 0.1223195
## 0.5263158 9.889407e-02 0.1628612 0.2047071 0.1307311
## 0.5736842 2.377511e-05 0.1329097 0.3943361 0.1035404
## 0.5736842 3.686369e-05 0.1329062 0.3943637 0.1035374
## 0.5736842 5.715773e-05 0.1329006 0.3944075 0.1035323
## 0.5736842 8.862397e-05 0.1328804 0.3945612 0.1035200
## 0.5736842 1.374129e-04 0.1328499 0.3947987 0.1035062
## 0.5736842 2.130608e-04 0.1328281 0.3949661 0.1035122
## 0.5736842 3.303542e-04 0.1328072 0.3951270 0.1035310
## 0.5736842 5.122193e-04 0.1327929 0.3952426 0.1035944
## 0.5736842 7.942041e-04 0.1328203 0.3950243 0.1037244
## 0.5736842 1.231426e-03 0.1329588 0.3939013 0.1040061
## 0.5736842 1.909346e-03 0.1332035 0.3920025 0.1044082
## 0.5736842 2.960470e-03 0.1336304 0.3888257 0.1049556
## 0.5736842 4.590256e-03 0.1344589 0.3825670 0.1058645
## 0.5736842 7.117264e-03 0.1360643 0.3695986 0.1074727
## 0.5736842 1.103543e-02 0.1380170 0.3543515 0.1092572
## 0.5736842 1.711061e-02 0.1404579 0.3363138 0.1113185
## 0.5736842 2.653027e-02 0.1430492 0.3244013 0.1135516
## 0.5736842 4.113560e-02 0.1475561 0.3050097 0.1174958
## 0.5736842 6.378140e-02 0.1546286 0.2750717 0.1235289
## 0.5736842 9.889407e-02 0.1640963 0.2033266 0.1318173
## 0.6210526 2.377511e-05 0.1329083 0.3943470 0.1035385
## 0.6210526 3.686369e-05 0.1329058 0.3943665 0.1035364
## 0.6210526 5.715773e-05 0.1328990 0.3944197 0.1035306
## 0.6210526 8.862397e-05 0.1328771 0.3945859 0.1035174
## 0.6210526 1.374129e-04 0.1328468 0.3948235 0.1035065
## 0.6210526 2.130608e-04 0.1328247 0.3949934 0.1035127
## 0.6210526 3.303542e-04 0.1328037 0.3951564 0.1035348
## 0.6210526 5.122193e-04 0.1327918 0.3952540 0.1036052
## 0.6210526 7.942041e-04 0.1328270 0.3949749 0.1037487
## 0.6210526 1.231426e-03 0.1329915 0.3936355 0.1040629
## 0.6210526 1.909346e-03 0.1332503 0.3916444 0.1044702
## 0.6210526 2.960470e-03 0.1337247 0.3880975 0.1050612
## 0.6210526 4.590256e-03 0.1346502 0.3810139 0.1060598
## 0.6210526 7.117264e-03 0.1363495 0.3672143 0.1077383
## 0.6210526 1.103543e-02 0.1383845 0.3513218 0.1095870
## 0.6210526 1.711061e-02 0.1408215 0.3338391 0.1115953
## 0.6210526 2.653027e-02 0.1435609 0.3219459 0.1139990
## 0.6210526 4.113560e-02 0.1484604 0.3000715 0.1182458
## 0.6210526 6.378140e-02 0.1560983 0.2631424 0.1247642
## 0.6210526 9.889407e-02 0.1654155 0.2033266 0.1329834
## 0.6684211 2.377511e-05 0.1329122 0.3943196 0.1035405
## 0.6684211 3.686369e-05 0.1329079 0.3943532 0.1035368
## 0.6684211 5.715773e-05 0.1328970 0.3944370 0.1035307
## 0.6684211 8.862397e-05 0.1328730 0.3946200 0.1035161
## 0.6684211 1.374129e-04 0.1328449 0.3948386 0.1035077
## 0.6684211 2.130608e-04 0.1328226 0.3950103 0.1035144
## 0.6684211 3.303542e-04 0.1328022 0.3951690 0.1035407
## 0.6684211 5.122193e-04 0.1327936 0.3952395 0.1036185
## 0.6684211 7.942041e-04 0.1328371 0.3948971 0.1037774
## 0.6684211 1.231426e-03 0.1330228 0.3933823 0.1041148
## 0.6684211 1.909346e-03 0.1332928 0.3913310 0.1045242
## 0.6684211 2.960470e-03 0.1338247 0.3873161 0.1051691
## 0.6684211 4.590256e-03 0.1348508 0.3793674 0.1062557
## 0.6684211 7.117264e-03 0.1366175 0.3649840 0.1079810
## 0.6684211 1.103543e-02 0.1387800 0.3479575 0.1099340
## 0.6684211 1.711061e-02 0.1411754 0.3315344 0.1118745
## 0.6684211 2.653027e-02 0.1441073 0.3190447 0.1144760
## 0.6684211 4.113560e-02 0.1493442 0.2952075 0.1189707
## 0.6684211 6.378140e-02 0.1575326 0.2497171 0.1259647
## 0.6684211 9.889407e-02 0.1668638 0.2033266 0.1342722
## 0.7157895 2.377511e-05 0.1329114 0.3943254 0.1035393
## 0.7157895 3.686369e-05 0.1329072 0.3943591 0.1035357
## 0.7157895 5.715773e-05 0.1328942 0.3944592 0.1035285
## 0.7157895 8.862397e-05 0.1328686 0.3946554 0.1035131
## 0.7157895 1.374129e-04 0.1328421 0.3948609 0.1035078
## 0.7157895 2.130608e-04 0.1328192 0.3950384 0.1035147
## 0.7157895 3.303542e-04 0.1327990 0.3951969 0.1035455
## 0.7157895 5.122193e-04 0.1327939 0.3952393 0.1036294
## 0.7157895 7.942041e-04 0.1328489 0.3948038 0.1038062
## 0.7157895 1.231426e-03 0.1330542 0.3931276 0.1041661
## 0.7157895 1.909346e-03 0.1333473 0.3909071 0.1045920
## 0.7157895 2.960470e-03 0.1339252 0.3865303 0.1052853
## 0.7157895 4.590256e-03 0.1350659 0.3775700 0.1064730
## 0.7157895 7.117264e-03 0.1368613 0.3629920 0.1081924
## 0.7157895 1.103543e-02 0.1391695 0.3446018 0.1102611
## 0.7157895 1.711061e-02 0.1415219 0.3293655 0.1121565
## 0.7157895 2.653027e-02 0.1446876 0.3156550 0.1149866
## 0.7157895 4.113560e-02 0.1502156 0.2902539 0.1196684
## 0.7157895 6.378140e-02 0.1589687 0.2336551 0.1272372
## 0.7157895 9.889407e-02 0.1684555 0.2033266 0.1356914
## 0.7631579 2.377511e-05 0.1329065 0.3943600 0.1035343
## 0.7631579 3.686369e-05 0.1329012 0.3944017 0.1035302
## 0.7631579 5.715773e-05 0.1328904 0.3944859 0.1035243
## 0.7631579 8.862397e-05 0.1328632 0.3946961 0.1035082
## 0.7631579 1.374129e-04 0.1328380 0.3948946 0.1035065
## 0.7631579 2.130608e-04 0.1328149 0.3950742 0.1035143
## 0.7631579 3.303542e-04 0.1327954 0.3952286 0.1035504
## 0.7631579 5.122193e-04 0.1327930 0.3952509 0.1036385
## 0.7631579 7.942041e-04 0.1328606 0.3947121 0.1038335
## 0.7631579 1.231426e-03 0.1330855 0.3928772 0.1042131
## 0.7631579 1.909346e-03 0.1334006 0.3904973 0.1046564
## 0.7631579 2.960470e-03 0.1340244 0.3857581 0.1053970
## 0.7631579 4.590256e-03 0.1352886 0.3756801 0.1067026
## 0.7631579 7.117264e-03 0.1370861 0.3612176 0.1083962
## 0.7631579 1.103543e-02 0.1395261 0.3415850 0.1105572
## 0.7631579 1.711061e-02 0.1418600 0.3273254 0.1124400
## 0.7631579 2.653027e-02 0.1452991 0.3117585 0.1155163
## 0.7631579 4.113560e-02 0.1510807 0.2852316 0.1203791
## 0.7631579 6.378140e-02 0.1604453 0.2134046 0.1285401
## 0.7631579 9.889407e-02 0.1701960 0.2033266 0.1373337
## 0.8105263 2.377511e-05 0.1329080 0.3943494 0.1035352
## 0.8105263 3.686369e-05 0.1329026 0.3943920 0.1035308
## 0.8105263 5.715773e-05 0.1328878 0.3945065 0.1035223
## 0.8105263 8.862397e-05 0.1328587 0.3947335 0.1035060
## 0.8105263 1.374129e-04 0.1328350 0.3949190 0.1035063
## 0.8105263 2.130608e-04 0.1328118 0.3950999 0.1035151
## 0.8105263 3.303542e-04 0.1327924 0.3952545 0.1035559
## 0.8105263 5.122193e-04 0.1327944 0.3952427 0.1036506
## 0.8105263 7.942041e-04 0.1328804 0.3945473 0.1038635
## 0.8105263 1.231426e-03 0.1331160 0.3926358 0.1042586
## 0.8105263 1.909346e-03 0.1334537 0.3900908 0.1047235
## 0.8105263 2.960470e-03 0.1341180 0.3850504 0.1055005
## 0.8105263 4.590256e-03 0.1354930 0.3739659 0.1069083
## 0.8105263 7.117264e-03 0.1373108 0.3594407 0.1085979
## 0.8105263 1.103543e-02 0.1398106 0.3393612 0.1107725
## 0.8105263 1.711061e-02 0.1421709 0.3256601 0.1127153
## 0.8105263 2.653027e-02 0.1458961 0.3080332 0.1160155
## 0.8105263 4.113560e-02 0.1519524 0.2799372 0.1211471
## 0.8105263 6.378140e-02 0.1614649 0.2057100 0.1294713
## 0.8105263 9.889407e-02 0.1706573 NaN 0.1377592
## 0.8578947 2.377511e-05 0.1329054 0.3943690 0.1035331
## 0.8578947 3.686369e-05 0.1328999 0.3944122 0.1035284
## 0.8578947 5.715773e-05 0.1328847 0.3945305 0.1035197
## 0.8578947 8.862397e-05 0.1328545 0.3947688 0.1035050
## 0.8578947 1.374129e-04 0.1328315 0.3949471 0.1035059
## 0.8578947 2.130608e-04 0.1328081 0.3951300 0.1035158
## 0.8578947 3.303542e-04 0.1327881 0.3952924 0.1035601
## 0.8578947 5.122193e-04 0.1327950 0.3952424 0.1036622
## 0.8578947 7.942041e-04 0.1328995 0.3943899 0.1038970
## 0.8578947 1.231426e-03 0.1331438 0.3924222 0.1042995
## 0.8578947 1.909346e-03 0.1335084 0.3896698 0.1047935
## 0.8578947 2.960470e-03 0.1342250 0.3842095 0.1056166
## 0.8578947 4.590256e-03 0.1356932 0.3722843 0.1071077
## 0.8578947 7.117264e-03 0.1375291 0.3577115 0.1087878
## 0.8578947 1.103543e-02 0.1400598 0.3375155 0.1109507
## 0.8578947 1.711061e-02 0.1424797 0.3240640 0.1129859
## 0.8578947 2.653027e-02 0.1464974 0.3041883 0.1165150
## 0.8578947 4.113560e-02 0.1528458 0.2739185 0.1219317
## 0.8578947 6.378140e-02 0.1623665 0.2033266 0.1302880
## 0.8578947 9.889407e-02 0.1706573 NaN 0.1377592
## 0.9052632 2.377511e-05 0.1329081 0.3943502 0.1035350
## 0.9052632 3.686369e-05 0.1329014 0.3944038 0.1035302
## 0.9052632 5.715773e-05 0.1328829 0.3945465 0.1035196
## 0.9052632 8.862397e-05 0.1328511 0.3947964 0.1035029
## 0.9052632 1.374129e-04 0.1328291 0.3949666 0.1035061
## 0.9052632 2.130608e-04 0.1328058 0.3951497 0.1035172
## 0.9052632 3.303542e-04 0.1327858 0.3953128 0.1035662
## 0.9052632 5.122193e-04 0.1327982 0.3952194 0.1036758
## 0.9052632 7.942041e-04 0.1329222 0.3942006 0.1039342
## 0.9052632 1.231426e-03 0.1331707 0.3922177 0.1043400
## 0.9052632 1.909346e-03 0.1335664 0.3892192 0.1048627
## 0.9052632 2.960470e-03 0.1343421 0.3832678 0.1057398
## 0.9052632 4.590256e-03 0.1358786 0.3707453 0.1072893
## 0.9052632 7.117264e-03 0.1377569 0.3558757 0.1089929
## 0.9052632 1.103543e-02 0.1403208 0.3355229 0.1111399
## 0.9052632 1.711061e-02 0.1427946 0.3224300 0.1132583
## 0.9052632 2.653027e-02 0.1470738 0.3006342 0.1169832
## 0.9052632 4.113560e-02 0.1537408 0.2675399 0.1227141
## 0.9052632 6.378140e-02 0.1632678 0.2033266 0.1310906
## 0.9052632 9.889407e-02 0.1706573 NaN 0.1377592
## 0.9526316 2.377511e-05 0.1329041 0.3943829 0.1035312
## 0.9526316 3.686369e-05 0.1328969 0.3944398 0.1035265
## 0.9526316 5.715773e-05 0.1328767 0.3945956 0.1035146
## 0.9526316 8.862397e-05 0.1328482 0.3948189 0.1035020
## 0.9526316 1.374129e-04 0.1328262 0.3949909 0.1035058
## 0.9526316 2.130608e-04 0.1328031 0.3951716 0.1035186
## 0.9526316 3.303542e-04 0.1327842 0.3953269 0.1035725
## 0.9526316 5.122193e-04 0.1328017 0.3951929 0.1036904
## 0.9526316 7.942041e-04 0.1329426 0.3940345 0.1039697
## 0.9526316 1.231426e-03 0.1331994 0.3919975 0.1043819
## 0.9526316 1.909346e-03 0.1336250 0.3887656 0.1049291
## 0.9526316 2.960470e-03 0.1344634 0.3822817 0.1058626
## 0.9526316 4.590256e-03 0.1360625 0.3692124 0.1074631
## 0.9526316 7.117264e-03 0.1379951 0.3539138 0.1092090
## 0.9526316 1.103543e-02 0.1405567 0.3338417 0.1113147
## 0.9526316 1.711061e-02 0.1431203 0.3206648 0.1135320
## 0.9526316 2.653027e-02 0.1476536 0.2969211 0.1174578
## 0.9526316 4.113560e-02 0.1545918 0.2619304 0.1234431
## 0.9526316 6.378140e-02 0.1642359 0.2033266 0.1319365
## 0.9526316 9.889407e-02 0.1706573 NaN 0.1377592
## 1.0000000 2.377511e-05 0.1329025 0.3943970 0.1035320
## 1.0000000 3.686369e-05 0.1328955 0.3944528 0.1035270
## 1.0000000 5.715773e-05 0.1328748 0.3946135 0.1035147
## 1.0000000 8.862397e-05 0.1328463 0.3948344 0.1035023
## 1.0000000 1.374129e-04 0.1328235 0.3950125 0.1035058
## 1.0000000 2.130608e-04 0.1328008 0.3951916 0.1035205
## 1.0000000 3.303542e-04 0.1327832 0.3953364 0.1035795
## 1.0000000 5.122193e-04 0.1328063 0.3951580 0.1037058
## 1.0000000 7.942041e-04 0.1329661 0.3938412 0.1040076
## 1.0000000 1.231426e-03 0.1332281 0.3917774 0.1044196
## 1.0000000 1.909346e-03 0.1336873 0.3882762 0.1049962
## 1.0000000 2.960470e-03 0.1345891 0.3812510 0.1059900
## 1.0000000 4.590256e-03 0.1362412 0.3677195 0.1076245
## 1.0000000 7.117264e-03 0.1382418 0.3518469 0.1094282
## 1.0000000 1.103543e-02 0.1408018 0.3320559 0.1115022
## 1.0000000 1.711061e-02 0.1434602 0.3187124 0.1138308
## 1.0000000 2.653027e-02 0.1481903 0.2937448 0.1179068
## 1.0000000 4.113560e-02 0.1554706 0.2554287 0.1241912
## 1.0000000 6.378140e-02 0.1652780 0.2033266 0.1328597
## 1.0000000 9.889407e-02 0.1706573 NaN 0.1377592
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.0003303542.
At its optimal values, the elastic net model performs about as well as the other linear models. The model with the minimum root mean squared error explains about 39% of the variance.
enpred <- predict(elastic_model, newdata = testX)
enResample <- postResample(pred=enpred, obs=testY)
enResample
## RMSE Rsquared MAE
## 0.1366820 0.4225125 0.1054273
None of the linear models explains even half of the variance in pH, so we now turn to nonlinear models in the hope of better performance.
set.seed(24601)
knn_grid <- expand.grid(k = seq(1, 21, by = 2))
knn_model <- train(ph ~ ., data = train_data,
method = "knn",
tuneGrid = knn_grid,
trControl = ctrl,
preProcess = pre_process)
knn_model
## k-Nearest Neighbors
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.1571298 0.3376827 0.10545197
## 3 0.1283961 0.4586531 0.09421875
## 5 0.1247663 0.4748306 0.09298493
## 7 0.1233603 0.4842774 0.09290304
## 9 0.1232427 0.4846575 0.09318522
## 11 0.1245447 0.4738264 0.09463101
## 13 0.1252214 0.4686268 0.09568094
## 15 0.1259550 0.4628176 0.09666035
## 17 0.1267149 0.4567030 0.09715940
## 19 0.1268391 0.4552119 0.09764181
## 21 0.1276046 0.4484649 0.09810749
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 9.
The optimal model uses k = 9 neighbors, with a cross-validated R squared of about 0.485. This is an improvement over the linear models.
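The full tuning profile can also be plotted from the fitted object, which makes the minimum near k = 9 easy to see; no new assumptions beyond the knn_model fit above.
# cross-validated RMSE across the candidate values of k
plot(knn_model)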
knn_pred <- predict(knn_model, testX)
knn_resample <- postResample(knn_pred, testY)
knn_resample
## RMSE Rsquared MAE
## 0.12712325 0.50325447 0.09409939
We also see an improvement in how well the model fits the held-out test data, with an R squared of about 0.50.
set.seed(24601)
mars_grid <- expand.grid(degree = 1:2, nprune = seq(3, 30, by = 3))
mars_model <- train(x = trainingX, y = trainingY,
method = "earth",
tuneGrid = mars_grid,
trControl = ctrl,
preProcess = pre_process)
mars_model
## Multivariate Adaptive Regression Spline
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 3 0.1427850 0.3014911 0.11218231
## 1 6 0.1382800 0.3451378 0.10796721
## 1 9 0.1349277 0.3773848 0.10432556
## 1 12 0.1323308 0.4006187 0.10252380
## 1 15 0.1313089 0.4095839 0.10148329
## 1 18 0.1306325 0.4155768 0.10067455
## 1 21 0.1307606 0.4144335 0.10069639
## 1 24 0.1307606 0.4144335 0.10069639
## 1 27 0.1307606 0.4144335 0.10069639
## 1 30 0.1307606 0.4144335 0.10069639
## 2 3 0.1424299 0.3043370 0.11149167
## 2 6 0.1514080 0.3034098 0.10847153
## 2 9 0.1458623 0.3458508 0.10409818
## 2 12 0.1437758 0.3659845 0.10176819
## 2 15 0.1369686 0.4016242 0.09862093
## 2 18 0.1349284 0.4149646 0.09708129
## 2 21 0.1346464 0.4210798 0.09571633
## 2 24 0.1342221 0.4266992 0.09509272
## 2 27 0.1326639 0.4314925 0.09479406
## 2 30 0.1324807 0.4343004 0.09431343
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 18 and degree = 1.
With degree = 1 and nprune = 18, this model accounts for about 42% of the variance in pH. We are seeing improvement over the linear models, but are not yet predicting pH with much accuracy.
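To see which predictors the pruned model actually retained, we could inspect the underlying earth fit; a sketch using earth's built-in functions on the final model.
summary(mars_model$finalModel)      # retained hinge terms and their coefficients
earth::evimp(mars_model$finalModel) # predictors ranked by nsubsets, GCV, and RSS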
mars_pred <- predict(mars_model, testX)
mars_resample <- postResample(mars_pred, testY)
mars_resample
## RMSE Rsquared MAE
## 0.12996831 0.47800982 0.09958475
The R squared on the test resample also increases, showing that this model generalizes to new data better than the linear ones, though with a larger error than the KNN model.
#### Support Vector Machines
set.seed(24601)
svm_grid <- expand.grid(sigma = c(0.01, 0.05, 0.1), C = c(0.1, 1, 10))
svm_model <- train(ph ~ ., data = train_data,
method = "svmRadial",
tuneGrid = svm_grid,
trControl = ctrl,
preProcess = pre_process)
svm_model
## Support Vector Machines with Radial Basis Function Kernel
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## sigma C RMSE Rsquared MAE
## 0.01 0.1 0.1346234 0.3979370 0.10364118
## 0.01 1.0 0.1232807 0.4832278 0.09107551
## 0.01 10.0 0.1190963 0.5188803 0.08740104
## 0.05 0.1 0.1312313 0.4300784 0.09936065
## 0.05 1.0 0.1176899 0.5278731 0.08706333
## 0.05 10.0 0.1176241 0.5301238 0.08802148
## 0.10 0.1 0.1378865 0.3874625 0.10550142
## 0.10 1.0 0.1209807 0.5040307 0.09035770
## 0.10 10.0 0.1222567 0.4871226 0.09200612
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05 and C = 10.
At the optimal values of sigma and C (sigma = 0.05, C = 10), this model explains about 53% of the variance. Again, this is stronger than the linear models, but still not strong enough to present as our final model.
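Because the selected cost sits at the edge of the grid (C = 10), a finer grid around sigma = 0.05 and larger values of C could be worth a follow-up run; the grid below is only a sketch and is not fit here.
svm_grid_fine <- expand.grid(sigma = c(0.03, 0.05, 0.07), C = c(5, 10, 20, 40))
# kernlab::sigest(ph ~ ., data = train_data) would give a data-driven range for sigma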
svm_pred <- predict(svm_model, testX)
svm_resample <- postResample(svm_pred, testY)
svm_resample
## RMSE Rsquared MAE
## 0.12205159 0.54124868 0.08725195
set.seed(24601)
rpartFit <- train(ph ~ ., data = train_data,
method = "rpart",
trControl = ctrl,
tuneGrid = data.frame(cp = c(0.01, 0.05, 0.1)),
preProcess = pre_process)
rpartFit
## CART
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (34)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## cp RMSE Rsquared MAE
## 0.01 0.1303245 0.4198557 0.1002721
## 0.05 0.1437860 0.2929985 0.1138928
## 0.10 0.1510357 0.2185671 0.1188032
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.01.
rpartPred <- predict(rpartFit, testX)
rpartResample <- postResample(rpartPred, testY)
rpartResample
## RMSE Rsquared MAE
## 0.1397839 0.3942828 0.1049243
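To see which splits drive these predictions, the tuned tree itself can be drawn; a sketch, assuming the rpart.plot package is installed (it is not loaded above).
library(rpart.plot)
rpart.plot(rpartFit$finalModel, main = "CART splits for pH")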
This model does not perform as well as the other nonlinear models.
#### Random Forest
set.seed(24601)
rfGrid <- expand.grid(mtry=seq(2,38,by=3) )
rfModel <- train(x = trainingX, y = trainingY,
method = "rf",
tuneGrid = rfGrid,
importance = TRUE,
trControl = ctrl,
ntree = 50,
preProcess=pre_process)
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
plot(rfModel)
Here we can see that RMSE is minimized when mtry, the number of predictors randomly sampled at each split, is around 29. That is the majority of our variables. The tradeoff between model complexity and RMSE suggests a value closer to 20, since the improvement beyond that point is minimal; judging by the scale of the axis, an mtry closer to 5 already captures most of the gain.
rfModel
## Random Forest
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1643, 1644, 1645, 1644, 1644
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.11338601 0.5969181 0.08671890
## 5 0.10405811 0.6533762 0.07827453
## 8 0.10412213 0.6451380 0.07706827
## 11 0.10233563 0.6555127 0.07545518
## 14 0.10052992 0.6666400 0.07457894
## 17 0.10015646 0.6669754 0.07384513
## 20 0.10000695 0.6670764 0.07385840
## 23 0.10045650 0.6626793 0.07330933
## 26 0.09975411 0.6672592 0.07282458
## 29 0.09951271 0.6674173 0.07275401
## 32 0.10002789 0.6629529 0.07292256
## 35 0.10045497 0.6582002 0.07311189
## 38 0.10001779 0.6621191 0.07305077
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 29.
It is worth noting how small the difference is between the R squared values at mtry = 5 and mtry = 29. The value 29 maximizes R squared at 0.6674, but it means nearly every variable is a candidate at every split; with mtry = 5 the forest is simpler and cheaper to fit, and we sacrifice only about 1.4 percentage points of explained variance (0.6674 - 0.6534).
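Those two settings can be pulled out of the cross-validation results for a side-by-side look; no new assumptions, rfModel is the CV-tuned forest above.
subset(rfModel$results, mtry %in% c(5, 29), select = c(mtry, RMSE, Rsquared, MAE))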
varImpPlot(
rfModel$finalModel,
type = 1,
main = "Random Forest Variable Importance"
)
rfPred <- predict(rfModel, testX)
rfResample <- postResample(rfPred, testY)
rfResample
## RMSE Rsquared MAE
## 0.10768165 0.64534782 0.07415208
Performance on the held-out test data is comparable, with an R squared of about 0.645.
ctrlOOB <- trainControl(method = "oob")
set.seed(24601)
rfTuneOOB <- train(x = trainingX, y = trainingY,
method = "rf",
tuneGrid = rfGrid,
ntree = 50,
importance = TRUE,
trControl = ctrlOOB,
preProcess=pre_process)
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
rfTuneOOB
## Random Forest
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared
## 2 0.11447402 0.5504192
## 5 0.10664458 0.6098142
## 8 0.10377403 0.6305367
## 11 0.10105026 0.6496769
## 14 0.10090420 0.6506889
## 17 0.09947006 0.6605478
## 20 0.09981191 0.6582106
## 23 0.09986052 0.6578776
## 26 0.09889365 0.6644705
## 29 0.10083731 0.6511519
## 32 0.09922449 0.6622218
## 35 0.09983148 0.6580765
## 38 0.09799135 0.6705653
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 38.
It is somewhat unsurprising that the lowest out-of-bag error comes from the largest mtry, where essentially every variable is a candidate at each split and each tree has the most information to draw on. At its best, this model accounts for about 67% of the variance, the highest we have observed so far, though allowing every predictor at every split also increases the risk of overfitting.
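With both a cross-validated and an out-of-bag estimate over the same mtry grid, the two can be lined up to check that they broadly agree; a quick sketch using the two fitted objects above.
merge(rfModel$results[, c("mtry", "RMSE", "Rsquared")],
rfTuneOOB$results[, c("mtry", "RMSE", "Rsquared")],
by = "mtry", suffixes = c("_cv", "_oob"))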
varImpPlot(
rfTuneOOB$finalModel,
type = 1,
main = "Random Forest Variable (Out of Bag error) Importance"
)
As the greatest contributor to both random forests, mnf_flow stands out as a particularly important variable in predicting pH. Brand C (brand_codeC), air_pressurer, pressure_vacuum, oxygen_filler, and hyd_pressure3 also rank highly on both lists.
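To make that comparison concrete, the top ten predictors from each forest can be listed side by side; a sketch built on varImp applied to the two fitted models.
imp_cv <- varImp(rfModel)$importance
imp_oob <- varImp(rfTuneOOB)$importance
head(imp_cv[order(-imp_cv$Overall), , drop = FALSE], 10)
head(imp_oob[order(-imp_oob$Overall), , drop = FALSE], 10)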
rfOOBPred <- predict(rfTuneOOB, testX)
rfOOBResample <- postResample(rfOOBPred, testY)
rfOOBResample
## RMSE Rsquared MAE
## 0.1063199 0.6516039 0.0718344
set.seed(24601)
cubistModel <- train(x = trainingX, y = trainingY,
method = "cubist",
verbose = FALSE,
preProcess=pre_process)
cubistModel
## Cubist
##
## 2055 samples
## 34 predictor
##
## Pre-processing: centered (34), scaled (34), Yeo-Johnson transformation (16)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 2055, 2055, 2055, 2055, 2055, 2055, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 0.1486562 0.3918197 0.10045784
## 1 5 0.1475851 0.4148138 0.09935475
## 1 9 0.1470693 0.4125497 0.09891662
## 10 0 0.1088636 0.5976971 0.07818208
## 10 5 0.1071543 0.6131967 0.07678160
## 10 9 0.1072050 0.6117322 0.07683018
## 20 0 0.1049828 0.6256309 0.07562827
## 20 5 0.1031959 0.6384917 0.07414244
## 20 9 0.1032997 0.6372475 0.07423520
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
cubistPred <- predict(cubistModel, testX)
cubistResample <- postResample(cubistPred, testY)
cubistResample
## RMSE Rsquared MAE
## 0.10732555 0.64227681 0.07174676
This model was not as strong as the Random Forest with out-of-bag error.
### Final Comparison
model_results <- rbind(
plsResample,
olsResample,
lsResample,
rdResample,
enResample,
knn_resample,
mars_resample,
svm_resample,
rfResample,
rfOOBResample,
cubistResample
)
rownames(model_results) <- c("Partial Least Squares", "Ordinary Least Squares", "Lasso", "Ridge", "Elastic Net", "KNN", "MARS", "SVM", "Random Forest", "Random Forest OOB", "Cubist")
model_results[order(model_results[,2]), ]
## RMSE Rsquared MAE
## Partial Least Squares 0.1371052 0.4175447 0.10576023
## Ordinary Least Squares 0.1370562 0.4179574 0.10557865
## Ridge 0.1368061 0.4206312 0.10547624
## Lasso 0.1367303 0.4224391 0.10545659
## Elastic Net 0.1366820 0.4225125 0.10542735
## MARS 0.1299683 0.4780098 0.09958475
## KNN 0.1271233 0.5032545 0.09409939
## SVM 0.1220516 0.5412487 0.08725195
## Cubist 0.1073255 0.6422768 0.07174676
## Random Forest 0.1076817 0.6453478 0.07415208
## Random Forest OOB 0.1063199 0.6516039 0.07183440
With the lowest RMSE and the highest R squared (and an MAE essentially tied with Cubist's), the Random Forest (OOB) model is the strongest we have, so we will focus on communicating its implications. Although even our strongest model explains only about 65% of the variance, it tends to be off by only about 0.11 on the pH scale, with an average absolute error of about 0.07.
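Since this summary table feeds the non-technical report, it can also be written to an Excel-readable file with openxlsx, which is already loaded; the file name is illustrative.
openxlsx::write.xlsx(data.frame(Model = rownames(model_results), round(model_results, 4)),
"model_comparison.xlsx") # illustrative file name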
varImp(rfTuneOOB)
## rf variable importance
##
## only 20 most important variables shown (out of 34)
##
## Overall
## mnf_flow 100.00
## alch_rel 62.26
## brand_codeC 53.84
## air_pressurer 37.99
## pressure_vacuum 34.15
## balling_lvl 33.14
## hyd_pressure3 31.18
## carb_rel 30.16
## oxygen_filler 29.44
## bowl_setpoint 27.91
## usage_cont 26.30
## temperature 26.20
## filler_speed 24.26
## carb_flow 21.42
## carb_pressure1 19.18
## balling 18.97
## hyd_pressure2 18.40
## fill_pressure 17.92
## pc_volume 16.66
## carb_volume 16.51
toppre <- varImp(rfTuneOOB)$importance |>
arrange(-Overall) |>
head(10)
variables <- rownames(toppre)
cor_matrix <- train_data |>
dplyr::select(all_of(c("ph", variables))) |>
cor()
corrplot::corrplot(cor_matrix,
method = "color",
type = "upper",
addCoef.col = "black",
tl.cex = 0.8,
tl.col = "black")
plot(rfTuneOOB)
With the best model selected, let’s use it to predict pH for the other set of data provided.
set.seed(24601)
imputed_data <- mice(test_pp, m = 1, method = 'rf', print = FALSE)
test_pp <- complete(imputed_data)
ph_pred <- predict(rfTuneOOB, test_pp)
test_pp$ph <- ph_pred
data.frame(
SampleID = 1:nrow(test_pp),
Predicted_pH = predict(rfTuneOOB, newdata = test_pp)
) |>
write.csv(
"pH_Predictions.csv",
row.names = FALSE
)
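Because the brief asks for an Excel-readable file, the same predictions can also be written to .xlsx with openxlsx; again the file name is illustrative.
openxlsx::write.xlsx(data.frame(SampleID = seq_len(nrow(test_pp)), Predicted_pH = ph_pred),
"pH_Predictions.xlsx") # illustrative file name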
Despite the wide range of models tested, the strongest model found explains only about 65% of the variance in pH. Among our most important predictors, a few are highly correlated with one another, namely balling_lvl, carb_rel, and alch_rel; even so, each was retained in our strongest performing random forests. That still leaves roughly 35% of the variance in pH unexplained by our model, more than we would like for this type of study. The leniency we showed in keeping outliers and imputing missing values may have contributed to this, and there remains the possibility that something was missed, either at the data collection stage or in the analysis.
set.seed(24601)
imputed_data <- mice(test, m = 1, method = 'rf', print = FALSE)
## Warning: Number of logged events: 2
test <- complete(imputed_data)
test$ph <- ph_pred
test |>
ggplot(aes(x = brand_code, y = ph))+
geom_boxplot(fill = "skyblue", color = "black") +
ggtitle('Distribution of Predicted pH Values by Brand')