My leadership has told me that new regulations are requiring us to understand our manufacturing process, the predictive factors and be able to report to them our predictive model of PH.
Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. Please provide your non-technical report in a business friendly readable document and your predictions in an Excel readable format. The technical report should show clearly the models you tested and how you selected your final approach.
# Read PH Training Data
studentdata <- read.csv("https://raw.githubusercontent.com/johnnydrodriguez/Data624/refs/heads/main/StudentData.csv", header = TRUE, sep = ',', na.strings="", fill = TRUE)
str(studentdata)
## 'data.frame': 2571 obs. of 33 variables:
## $ Brand.Code : chr "B" "A" "B" "A" ...
## $ Carb.Volume : num 5.34 5.43 5.29 5.44 5.49 ...
## $ Fill.Ounces : num 24 24 24.1 24 24.3 ...
## $ PC.Volume : num 0.263 0.239 0.263 0.293 0.111 ...
## $ Carb.Pressure : num 68.2 68.4 70.8 63 67.2 66.6 64.2 67.6 64.2 72 ...
## $ Carb.Temp : num 141 140 145 133 137 ...
## $ PSC : num 0.104 0.124 0.09 NA 0.026 0.09 0.128 0.154 0.132 0.014 ...
## $ PSC.Fill : num 0.26 0.22 0.34 0.42 0.16 0.24 0.4 0.34 0.12 0.24 ...
## $ PSC.CO2 : num 0.04 0.04 0.16 0.04 0.12 0.04 0.04 0.04 0.14 0.06 ...
## $ Mnf.Flow : num -100 -100 -100 -100 -100 -100 -100 -100 -100 -100 ...
## $ Carb.Pressure1 : num 119 122 120 115 118 ...
## $ Fill.Pressure : num 46 46 46 46.4 45.8 45.6 51.8 46.8 46 45.2 ...
## $ Hyd.Pressure1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Hyd.Pressure2 : num NA NA NA 0 0 0 0 0 0 0 ...
## $ Hyd.Pressure3 : num NA NA NA 0 0 0 0 0 0 0 ...
## $ Hyd.Pressure4 : int 118 106 82 92 92 116 124 132 90 108 ...
## $ Filler.Level : num 121 119 120 118 119 ...
## $ Filler.Speed : int 4002 3986 4020 4012 4010 4014 NA 1004 4014 4028 ...
## $ Temperature : num 66 67.6 67 65.6 65.6 66.2 65.8 65.2 65.4 66.6 ...
## $ Usage.cont : num 16.2 19.9 17.8 17.4 17.7 ...
## $ Carb.Flow : int 2932 3144 2914 3062 3054 2948 30 684 2902 3038 ...
## $ Density : num 0.88 0.92 1.58 1.54 1.54 1.52 0.84 0.84 0.9 0.9 ...
## $ MFR : num 725 727 735 731 723 ...
## $ Balling : num 1.4 1.5 3.14 3.04 3.04 ...
## $ Pressure.Vacuum : num -4 -4 -3.8 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 -4.4 ...
## $ PH : num 8.36 8.26 8.94 8.24 8.26 8.32 8.4 8.38 8.38 8.5 ...
## $ Oxygen.Filler : num 0.022 0.026 0.024 0.03 0.03 0.024 0.066 0.046 0.064 0.022 ...
## $ Bowl.Setpoint : int 120 120 120 120 120 120 120 120 120 120 ...
## $ Pressure.Setpoint: num 46.4 46.8 46.6 46 46 46 46 46 46 46 ...
## $ Air.Pressurer : num 143 143 142 146 146 ...
## $ Alch.Rel : num 6.58 6.56 7.66 7.14 7.14 7.16 6.54 6.52 6.52 6.54 ...
## $ Carb.Rel : num 5.32 5.3 5.84 5.42 5.44 5.44 5.38 5.34 5.34 5.34 ...
## $ Balling.Lvl : num 1.48 1.56 3.28 3.04 3.04 3.02 1.44 1.44 1.44 1.38 ...
The primary goal at this stage is to identify whether our dataset contains missing observations that could distort model training. We visualize patterns of missingness to gain understanding of the scale and structure of missing data. Detecting these gaps allows us to address them before moving to modeling steps and to reduce the risk of biased models.
The missingness heatmap and the ranked bar chart show MFR has a notable amount of missing entries – where MFR stands out with the highest number of missing values. It could significantly affect our analysis. Brand.Code and Filler.Speed, also exhibit noticeable levels of missingness. As these variables have a high percentage of missing data, they may influence model performance and predictions.
We apply Multiple Imputation by Chained Equations (MICE), to fill in missing values. MICE ensures that our final modeling dataset retains the relationships between features and yields better generalization performance.
The output for MICE Imputation specifies the imputation methods assigned to each variable. Brand.Code is imputed using “polyreg”(polytomous regression) to handle its categorical value, while most numeric variables (e.g., use “pmm” (Predictive Mean Matching). This method selection tailors the imputation process to the type and characteristics of each variable, preserving the relationships within the data.
# Load library
library(mice)
# Convert Brand.Code for imputation
studentdata$Brand.Code <- as.factor(studentdata$Brand.Code)
# Generate method vector for the dataset
imputation_methods <- make.method(studentdata)
# Specify method for Brand.Code
imputation_methods["Brand.Code"] <- "polyreg"
# Specify method for numerical columns (Predictive Mean Matching)
numeric_columns <- names(studentdata)[sapply(studentdata, is.numeric)]
imputation_methods[numeric_columns] <- "pmm"
# Verify the methods vector
print(imputation_methods)
## Brand.Code Carb.Volume Fill.Ounces PC.Volume
## "polyreg" "pmm" "pmm" "pmm"
## Carb.Pressure Carb.Temp PSC PSC.Fill
## "pmm" "pmm" "pmm" "pmm"
## PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure
## "pmm" "pmm" "pmm" "pmm"
## Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4
## "pmm" "pmm" "pmm" "pmm"
## Filler.Level Filler.Speed Temperature Usage.cont
## "pmm" "pmm" "pmm" "pmm"
## Carb.Flow Density MFR Balling
## "pmm" "pmm" "pmm" "pmm"
## Pressure.Vacuum PH Oxygen.Filler Bowl.Setpoint
## "pmm" "pmm" "pmm" "pmm"
## Pressure.Setpoint Air.Pressurer Alch.Rel Carb.Rel
## "pmm" "pmm" "pmm" "pmm"
## Balling.Lvl
## "pmm"
# Perform multiple imputations
imputed_data <- mice(studentdata, method = imputation_methods, m = 5, seed = 321)
##
## iter imp variable
## 1 1 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 1 2 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 1 3 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 1 4 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 1 5 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 2 1 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 2 2 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 2 3 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 2 4 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 2 5 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 3 1 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 3 2 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 3 3 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 3 4 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 3 5 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 4 1 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 4 2 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 4 3 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 4 4 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 4 5 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 5 1 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 5 2 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 5 3 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 5 4 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
## 5 5 Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC PSC.Fill PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4 Filler.Level Filler.Speed Temperature Usage.cont Carb.Flow Density MFR Balling PH Oxygen.Filler Bowl.Setpoint Pressure.Setpoint Alch.Rel Carb.Rel Balling.Lvl
We verify that the dataset is now fully populated by checking for any lingering NA values. With complete data, we can move forward with modeling. The output confirms that all missing values have been successfully addressed.
## Brand.Code Carb.Volume Fill.Ounces PC.Volume
## 0 0 0 0
## Carb.Pressure Carb.Temp PSC PSC.Fill
## 0 0 0 0
## PSC.CO2 Mnf.Flow Carb.Pressure1 Fill.Pressure
## 0 0 0 0
## Hyd.Pressure1 Hyd.Pressure2 Hyd.Pressure3 Hyd.Pressure4
## 0 0 0 0
## Filler.Level Filler.Speed Temperature Usage.cont
## 0 0 0 0
## Carb.Flow Density MFR Balling
## 0 0 0 0
## Pressure.Vacuum PH Oxygen.Filler Bowl.Setpoint
## 0 0 0 0
## Pressure.Setpoint Air.Pressurer Alch.Rel Carb.Rel
## 0 0 0 0
## Balling.Lvl
## 0
We evaluate the relationships between pH (our target) and all available predictors to gain a preliminary understanding of which factors might drive our outcome. Using correlation and visualization we identify the variables are most strongly associated with pH.
The table of correlations ranks predictors based on their correlation with the target variable, pH. Mnf.Flow shows the strongest relationship with pH (negative correlation of -0.445), Bowl.Setpoint (positive correlation of 0.346) is has the strongest positive correlation and may indicate a role in influencing pH – although these correlations are moderate at best.
# Load libraries
library(GGally)
library(ggplot2)
library(dplyr)
library(tidyverse)
# Prepare the data for correlation analysis
cor_data <- studentdata_mice %>%
select_if(is.numeric)
# Compute correlations between 'PH' and all other predictors
correlation_values <- cor_data %>%
summarise(across(.cols = everything(),
.fns = ~ cor(., cor_data$PH, use = "complete.obs"),
.names = "cor_{col}")) %>%
pivot_longer(cols = everything(), names_to = "Predictor", values_to = "Correlation") %>%
mutate(Predictor = gsub("cor_", "", Predictor)) %>%
filter(Predictor != "PH") %>%
arrange(desc(abs(Correlation)))
# Output sorted list of correlations
print(correlation_values)
## # A tibble: 31 × 2
## Predictor Correlation
## <chr> <dbl>
## 1 Mnf.Flow -0.445
## 2 Bowl.Setpoint 0.346
## 3 Filler.Level 0.323
## 4 Usage.cont -0.319
## 5 Pressure.Setpoint -0.304
## 6 Hyd.Pressure3 -0.238
## 7 Pressure.Vacuum 0.220
## 8 Fill.Pressure -0.211
## 9 Hyd.Pressure2 -0.200
## 10 Oxygen.Filler 0.165
## # ℹ 21 more rows
# Create scatterplot matrix for PH against other predictors
correlation_matrix <- ggpairs(
data = cor_data,
columns = which(names(cor_data) == "PH"):ncol(cor_data),
upper = list(continuous = wrap("cor", size = 3)),
title = "Correlation Scatterplot Matrix for PH") +
theme(
axis.text.x = element_text(size = 7),
axis.text.y = element_text(size = 7),
strip.text = element_text(size = 7),
plot.title = element_text(size = 12))
# Display the scatterplot matrix
print(correlation_matrix)
We expand from a single predictor to a full multiple linear regression model that incorporates all potential factors. We examine model summaries and diagnostic plots to assess fit and identify potential issues like non-linearity or heteroskedasticity.
The multiple linear regression model results an adjusted R-squared value of 0.4072; the model explains approximately 40.7% of the variance in pH. Significant predictors include Mnf.Flow (negative association), Carb.Pressure1 (positive association), Hyd.Pressure3, Temperature, Usage.cont, Balling and other as indicated by p values, Several predictors, such as Carb.Volume and Hyd.Pressure4, are not significant and may add unnecessary complexity to the model.
The Residuals vs Fitted plot shows a reasonably random spread, suggesting linearity and homoscedasticity. The Q-Q plot suggests most residuals are normally distributed, with some deviation in the tails. The Scale-Location plot supports a relatively even variance across fitted values. The Residuals vs Leverage plot highlights a few influential points (e.g., observation 1094), which may warrant further treatment.
# Load libraries
library(ggplot2)
library(car)
library(lmtest)
library(GGally)
# Multiple linear regression model
model <- lm(PH ~ ., data = studentdata_mice)
summary(model)
##
## Call:
## lm(formula = PH ~ ., data = studentdata_mice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52814 -0.07883 0.01056 0.08650 0.85018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.142e+01 1.083e+00 10.543 < 2e-16 ***
## Brand.CodeB 7.884e-02 2.093e-02 3.766 0.000169 ***
## Brand.CodeC -5.817e-02 2.075e-02 -2.803 0.005104 **
## Brand.CodeD 6.130e-02 1.638e-02 3.742 0.000186 ***
## Carb.Volume -1.316e-01 9.039e-02 -1.456 0.145491
## Fill.Ounces -8.063e-02 3.211e-02 -2.511 0.012116 *
## PC.Volume -9.943e-02 5.415e-02 -1.836 0.066454 .
## Carb.Pressure 4.089e-03 4.243e-03 0.964 0.335332
## Carb.Temp -2.310e-03 3.342e-03 -0.691 0.489442
## PSC -1.069e-01 5.725e-02 -1.868 0.061911 .
## PSC.Fill -3.392e-02 2.345e-02 -1.446 0.148178
## PSC.CO2 -1.031e-01 6.294e-02 -1.638 0.101514
## Mnf.Flow -7.249e-04 4.609e-05 -15.726 < 2e-16 ***
## Carb.Pressure1 6.968e-03 6.936e-04 10.045 < 2e-16 ***
## Fill.Pressure 2.140e-03 1.210e-03 1.768 0.077131 .
## Hyd.Pressure1 3.432e-05 3.684e-04 0.093 0.925787
## Hyd.Pressure2 -1.218e-03 5.305e-04 -2.296 0.021763 *
## Hyd.Pressure3 3.558e-03 5.823e-04 6.110 1.15e-09 ***
## Hyd.Pressure4 -1.330e-04 3.127e-04 -0.425 0.670742
## Filler.Level -1.067e-03 5.702e-04 -1.872 0.061365 .
## Filler.Speed 1.692e-05 1.123e-05 1.507 0.131912
## Temperature -1.259e-02 2.277e-03 -5.528 3.56e-08 ***
## Usage.cont -7.394e-03 1.141e-03 -6.478 1.11e-10 ***
## Carb.Flow 1.063e-05 3.737e-06 2.844 0.004491 **
## Density -1.146e-01 2.798e-02 -4.095 4.35e-05 ***
## MFR -1.069e-04 6.198e-05 -1.724 0.084835 .
## Balling -7.430e-02 2.314e-02 -3.210 0.001342 **
## Pressure.Vacuum -1.833e-02 7.628e-03 -2.402 0.016356 *
## Oxygen.Filler -2.684e-01 6.823e-02 -3.934 8.57e-05 ***
## Bowl.Setpoint 3.221e-03 5.987e-04 5.381 8.11e-08 ***
## Pressure.Setpoint -9.089e-03 1.965e-03 -4.626 3.91e-06 ***
## Air.Pressurer -2.665e-03 2.371e-03 -1.124 0.261096
## Alch.Rel 4.957e-02 2.264e-02 2.190 0.028632 *
## Carb.Rel 3.967e-03 4.741e-02 0.084 0.933321
## Balling.Lvl 1.162e-01 2.142e-02 5.425 6.34e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.133 on 2536 degrees of freedom
## Multiple R-squared: 0.415, Adjusted R-squared: 0.4072
## F-statistic: 52.92 on 34 and 2536 DF, p-value: < 2.2e-16
Due to the potential issues introduced by multiple predictors, we use the Variance Inflation Factor to detect and remove variables that cause multicollinearity. After pruning these collinear variables, we refit the model. By mitigating multicollinearity, we ensure that each predictor contributes distinct information about pH. After removing predictors with high multicollinearity, the remaining predictors have low VIF values - all less than 5.
This refined model has an adjusted R-squared of 0.3003,lower than the full MLR model. The results suggest that some variables remain non-significant and could potentially be excluded but the drop in R Square value indicate the the excluded predictors offer explanatory power of the variance.
The diagnostic plots show similar trends to the previous MLR model, with the Residuals vs Fitted plot displaying a random spread, suggesting linearity. The Q-Q plot indicates a relatively normal distribution of residuals with some deviations in the tails. The Scale-Location plot suggests variance homogeneity. Residuals vs Leverage plot identifies a few influential points (e.g., observation 1094), which may need attention.
# Load library
library(car)
# Compute VIF values
vif_df <- as.data.frame(vif(model))
names(vif_df) <- "VIF"
vif_df$Predictor <- rownames(vif_df)
# Filter predictors with VIF <= 5
predictors_to_keep <- vif_df$Predictor[vif_df$VIF <= 5]
# Create a new formula with the filtered predictors
formula <- as.formula(paste("PH ~", paste(predictors_to_keep, collapse = " + ")))
# Refit the model using only the selected predictors
final_model <- lm(formula, data = studentdata_mice)
# Compute VIF values for the final model and store them in a dataframe
final_vif_df <- as.data.frame(vif(final_model))
names(final_vif_df) <- "VIF"
final_vif_df$Predictor <- rownames(final_vif_df)
# Display VIF values for all predictors
print(vif_df)
## VIF NA NA Predictor
## Brand.Code 49.317117 3 1.914989 Brand.Code
## Carb.Volume 13.496459 1 3.673753 Carb.Volume
## Fill.Ounces 1.150115 1 1.072434 Fill.Ounces
## PC.Volume 1.589004 1 1.260557 PC.Volume
## Carb.Pressure 33.422705 1 5.781237 Carb.Pressure
## Carb.Temp 27.907420 1 5.282747 Carb.Temp
## PSC 1.161185 1 1.077583 PSC
## PSC.Fill 1.114727 1 1.055806 PSC.Fill
## PSC.CO2 1.078937 1 1.038719 PSC.CO2
## Mnf.Flow 4.408606 1 2.099668 Mnf.Flow
## Carb.Pressure1 1.579910 1 1.256945 Carb.Pressure1
## Fill.Pressure 2.178546 1 1.475990 Fill.Pressure
## Hyd.Pressure1 3.047057 1 1.745582 Hyd.Pressure1
## Hyd.Pressure2 10.979155 1 3.313481 Hyd.Pressure2
## Hyd.Pressure3 12.552731 1 3.542983 Hyd.Pressure3
## Hyd.Pressure4 2.556701 1 1.598969 Hyd.Pressure4
## Filler.Level 11.699337 1 3.420429 Filler.Level
## Filler.Speed 12.877293 1 3.588494 Filler.Speed
## Temperature 1.454516 1 1.206033 Temperature
## Usage.cont 1.678740 1 1.295662 Usage.cont
## Carb.Flow 2.339185 1 1.529440 Carb.Flow
## Density 16.204921 1 4.025534 Density
## MFR 11.070038 1 3.327167 MFR
## Balling 67.441560 1 8.212281 Balling
## Pressure.Vacuum 2.746893 1 1.657375 Pressure.Vacuum
## Oxygen.Filler 1.486183 1 1.219091 Oxygen.Filler
## Bowl.Setpoint 12.248414 1 3.499773 Bowl.Setpoint
## Pressure.Setpoint 2.335773 1 1.528324 Pressure.Setpoint
## Air.Pressurer 1.199729 1 1.095322 Air.Pressurer
## Alch.Rel 18.972738 1 4.355771 Alch.Rel
## Carb.Rel 5.410081 1 2.325958 Carb.Rel
## Balling.Lvl 50.458243 1 7.103397 Balling.Lvl
## VIF Predictor
## Fill.Ounces 1.060853 Fill.Ounces
## PC.Volume 1.479512 PC.Volume
## PSC 1.152751 PSC
## PSC.Fill 1.104370 PSC.Fill
## PSC.CO2 1.062424 PSC.CO2
## Mnf.Flow 2.792677 Mnf.Flow
## Carb.Pressure1 1.427103 Carb.Pressure1
## Fill.Pressure 2.013339 Fill.Pressure
## Hyd.Pressure1 1.455314 Hyd.Pressure1
## Hyd.Pressure4 1.304884 Hyd.Pressure4
## Temperature 1.207110 Temperature
## Usage.cont 1.568474 Usage.cont
## Carb.Flow 1.470612 Carb.Flow
## Pressure.Vacuum 1.545823 Pressure.Vacuum
## Oxygen.Filler 1.434200 Oxygen.Filler
## Pressure.Setpoint 2.169753 Pressure.Setpoint
## Air.Pressurer 1.124044 Air.Pressurer
##
## Call:
## lm(formula = formula, data = studentdata_mice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.54684 -0.08428 0.01034 0.09910 0.76961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.337e+01 8.727e-01 15.325 < 2e-16 ***
## Fill.Ounces -1.431e-01 3.351e-02 -4.269 2.03e-05 ***
## PC.Volume -7.972e-02 5.677e-02 -1.404 0.16034
## PSC -1.235e-01 6.198e-02 -1.993 0.04638 *
## PSC.Fill -4.330e-02 2.536e-02 -1.708 0.08780 .
## PSC.CO2 -1.153e-01 6.786e-02 -1.700 0.08931 .
## Mnf.Flow -7.025e-04 3.985e-05 -17.627 < 2e-16 ***
## Carb.Pressure1 6.326e-03 7.162e-04 8.833 < 2e-16 ***
## Fill.Pressure 3.477e-03 1.264e-03 2.751 0.00599 **
## Hyd.Pressure1 1.360e-03 2.766e-04 4.917 9.36e-07 ***
## Hyd.Pressure4 -1.154e-03 2.427e-04 -4.755 2.10e-06 ***
## Temperature -1.920e-02 2.254e-03 -8.522 < 2e-16 ***
## Usage.cont -9.126e-03 1.199e-03 -7.614 3.71e-14 ***
## Carb.Flow -1.428e-06 3.219e-06 -0.444 0.65738
## Pressure.Vacuum -1.066e-03 6.217e-03 -0.172 0.86384
## Oxygen.Filler -2.821e-01 7.282e-02 -3.875 0.00011 ***
## Pressure.Setpoint -1.084e-02 2.057e-03 -5.271 1.47e-07 ***
## Air.Pressurer -1.366e-03 2.493e-03 -0.548 0.58376
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1445 on 2553 degrees of freedom
## Multiple R-squared: 0.3049, Adjusted R-squared: 0.3003
## F-statistic: 65.88 on 17 and 2553 DF, p-value: < 2.2e-16
We model a Lasso regression to leverage its feature selection, model complexity reduction attributes. Lasso applies a penalty to shrink less important coefficients toward zero resulting in smaller set non-zero important features . By focusing on the most critical variables, we may be able to generate a parsimonious model that may outperform our linear regression baselines.
The Lasso regression results reveal a tuned model with optimal parameters: alpha = 0.1 and lambda = 0.0012, The variable importance rankings highlight Oxygen.Filler as the most influential predictor, followed by Carb.Rel, PC.Volume, and Density. These top features identify key predictors of pH.
The Lasso regression achieves effective dimensionality reduction but te relatively modest R-squared value suggests that further exploration may be necessary. The R-squared of 0.354 is lower than the adjusted R-squared of 0.4072 from the multiple linear regression (MLR) model and may not be a suitable model for this dataset.
# Load libraries
library(caret)
library(glmnet)
# Set the seed
set.seed(321)
# Split the data into training and test sets
set.seed(123)
train_indices <- createDataPartition(studentdata_mice$PH, p = 0.8, list = FALSE)
train_data <- studentdata_mice[train_indices, ]
test_data <- studentdata_mice[-train_indices, ]
# Prepare the data for Lasso regression
x_train <- as.matrix(train_data[, -which(names(train_data) == "PH")])
y_train <- train_data$PH
x_test <- as.matrix(test_data[, -which(names(test_data) == "PH")])
y_test <- test_data$PH
# Define cross-validation method
cross_val_10 <- trainControl(method = "cv", number = 10)
# Train the Lasso regression model
lasso_model <- train(
x = x_train,
y = y_train,
method = "glmnet",
trControl = cross_val_10,
preProcess = c("center", "scale", "nzv"),
tuneLength = 20)
# Extract optimal lambda from cross-validation
lasso_best_lambda <- lasso_model$bestTune
print(lasso_best_lambda)
## alpha lambda
## 10 0.1 0.001216886
# Compute the best cross-validated R-squared
lasso_best_r2 <- max(lasso_model$results$Rsquared, na.rm = TRUE)
print(lasso_best_r2)
## [1] 0.3383953
# Predict on the test set and compute R-squared
lasso_pred <- predict(lasso_model, newdata = x_test)
test_r2 <- postResample(pred = lasso_pred, obs = y_test)[2]
print(test_r2)
## Rsquared
## 0.3542227
# Display variable importance
lasso_var_importance <- varImp(lasso_model)
print(lasso_var_importance)
## glmnet variable importance
##
## only 20 most important variables shown (out of 32)
##
## Overall
## Oxygen.Filler 100.0000
## Carb.Rel 37.8637
## PC.Volume 32.2475
## Density 31.0408
## PSC 31.0301
## PSC.CO2 29.0048
## Alch.Rel 24.9745
## Fill.Ounces 23.7366
## PSC.Fill 12.3386
## Balling 12.1294
## Balling.Lvl 7.4211
## Temperature 6.3005
## Pressure.Setpoint 3.0438
## Carb.Volume 2.7216
## Usage.cont 2.0451
## Carb.Pressure1 1.9458
## Pressure.Vacuum 1.5986
## Hyd.Pressure3 0.8888
## Bowl.Setpoint 0.6928
## Fill.Pressure 0.5551
# Plot variable importance
plot(
lasso_var_importance,
main = "Lasso Variable Importance",
cex.main = 0.7,
cex.axis = 0.7,
cex.lab = 0.7)
# Plot tuning results
plot(
lasso_model,
main = "Lasso Tuning Results (Lambda vs R2)",
cex.main = 0.7,
cex.axis = 0.7,
cex.lab = 0.7)
To capture non-linear and potentially complex relationships, we train a neural network model.
The neural network model achieves an R-squared of 0.477, surpassing the performance of the Lasso regression (R-squared = 0.354) and coming close to the multiple linear regression (MLR) model’s R-squared of 0.407. The optimal parameters are size = 9 (number of hidden units) and decay = 0.5 (regularization parameter). Overall, the neural network model has a moderate results.
The tuning results graph shows that the combination of decay and hidden units significantly impacts model performance, with RMSE values improving as the decay and size are fine-tuned. The variable importance plot highlights Carb.Flow, Hyd.Pressure1, and Mnf.Flow as key predictors.
The residual plot suggests minimal systematic errors, with residuals centered around zero across predicted values.
# Load libraries
library(caret)
library(nnet)
library(dplyr)
# Convert 'Brand,Code' to dummy variable
studentdata_processed <- studentdata_mice %>%
mutate(across(where(is.character), as.factor)) %>%
dummyVars(" ~ .", data = .) %>%
predict(., studentdata_mice) %>%
as.data.frame()
# Split data into training and test sets
set.seed(321)
trainIndex <- createDataPartition(studentdata_processed$PH, p = 0.80, list = FALSE)
trainData <- studentdata_processed[trainIndex, ]
testData <- studentdata_processed[-trainIndex, ]
# Separate predictors and outcome for training and test sets
x_train <- trainData[, !names(trainData) %in% "PH"]
y_train <- trainData$PH
x_test <- testData[, !names(testData) %in% "PH"]
y_test <- testData$PH
# Define a tuning grid
nnetGrid <- expand.grid(
size = 1:10,
decay = c(0.001, 0.01, 0.1, 0.5))
# Train neural network
set.seed(321)
nnetTuned <- train(
x = x_train,
y = y_train,
method = "nnet",
tuneGrid = nnetGrid,
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 10),
linout = TRUE,
trace = FALSE,
maxit = 500,
MaxNWts = 10 * (ncol(x_train) + 1) + 10 + 1)
# View model results
print(nnetTuned)
## Neural Network
##
## 2058 samples
## 35 predictor
##
## Pre-processing: centered (35), scaled (35)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1852, 1852, 1852, 1852, 1852, 1852, ...
## Resampling results across tuning parameters:
##
## size decay RMSE Rsquared MAE
## 1 0.001 0.1314319 0.4138695 0.10196380
## 1 0.010 0.1313793 0.4146213 0.10144177
## 1 0.100 0.1336383 0.3968065 0.10354521
## 1 0.500 0.1344240 0.3894365 0.10443959
## 2 0.001 0.1259923 0.4630454 0.09675634
## 2 0.010 0.1311952 0.4257799 0.09945994
## 2 0.100 0.1276595 0.4497412 0.09733289
## 2 0.500 0.1289450 0.4375297 0.09834390
## 3 0.001 0.1241064 0.4797454 0.09392113
## 3 0.010 0.1264619 0.4657074 0.09376243
## 3 0.100 0.1281223 0.4494815 0.09564114
## 3 0.500 0.1273465 0.4552452 0.09517264
## 4 0.001 0.1298719 0.4508411 0.09490488
## 4 0.010 0.1281195 0.4629114 0.09440000
## 4 0.100 0.1258940 0.4711368 0.09335261
## 4 0.500 0.1271084 0.4588074 0.09436937
## 5 0.001 0.1297184 0.4494131 0.09642799
## 5 0.010 0.1263351 0.4791060 0.09406880
## 5 0.100 0.1274081 0.4672405 0.09386175
## 5 0.500 0.1256988 0.4731223 0.09343496
## 6 0.001 0.1337957 0.4354460 0.09682491
## 6 0.010 0.1322501 0.4554357 0.09461203
## 6 0.100 0.1272241 0.4705839 0.09415077
## 6 0.500 0.1281386 0.4601989 0.09438853
## 7 0.001 0.1306319 0.4627089 0.09726355
## 7 0.010 0.1348173 0.4362562 0.09727713
## 7 0.100 0.1288899 0.4708749 0.09351349
## 7 0.500 0.1238270 0.4912216 0.09226578
## 8 0.001 0.1320434 0.4563607 0.09900138
## 8 0.010 0.1330398 0.4511652 0.09647422
## 8 0.100 0.1305544 0.4581832 0.09431613
## 8 0.500 0.1251436 0.4824894 0.09355498
## 9 0.001 0.1390460 0.4219372 0.10053340
## 9 0.010 0.1337140 0.4482196 0.09754615
## 9 0.100 0.1326855 0.4554070 0.09508132
## 9 0.500 0.1234339 0.4956042 0.09221320
## 10 0.001 0.1343660 0.4504271 0.09976991
## 10 0.010 0.1339430 0.4584989 0.09626510
## 10 0.100 0.1285464 0.4767257 0.09463327
## 10 0.500 0.1235827 0.4907654 0.09188635
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 9 and decay = 0.5.
## size decay
## 36 9 0.5
# Variable importance
varImpPlot <- varImp(nnetTuned)
plot(varImpPlot, main = "Variable Importance from Neural Network")
# Predict on the test set
nnetPred <- predict(nnetTuned, newdata = x_test)
# Evaluate model performance
performance <- postResample(pred = nnetPred, obs = y_test)
print(performance)
## RMSE Rsquared MAE
## 0.12969633 0.47670903 0.09457135
# Residual diagnostic plot
residuals <- y_test - nnetPred
plot(nnetPred, residuals, xlab = "Predicted Values", ylab = "Residuals",
main = "Residual Plot", col = "blue", pch = 19)
abline(h = 0, col = "red", lwd = 2)
We train a gradient-boosted decision tree model using the xgboost package. We also extract feature importance rankings.
The XGBoost model achieves an R-squared of 0.601, significantly outperforming the previous models, including the neural network (R-squared = 0.477) and multiple linear regression (R-squared = 0.407).
The residual plot shows a good distribution of residuals around zero across the predicted values, indicating a well-fitted model with minimal systematic errors. The feature importance chart indicates Mnf.Flow as the most influential predictor, followed by Oxygen.Filler, Filler.Speed, and Alch.Rel, as predictors of PH.
# Load libraries
library(xgboost)
library(dplyr)
# Preprocess the data (convert categorical to dummy)
studentdata_processed <- studentdata_mice %>%
mutate(across(where(is.character), as.factor)) %>%
dummyVars(" ~ .", data = .) %>%
predict(., studentdata_mice) %>%
as.data.frame()
# Split into training and testing sets
set.seed(123)
trainIndex <- createDataPartition(studentdata_processed$PH, p = 0.80, list = FALSE)
trainData <- studentdata_processed[trainIndex, ]
testData <- studentdata_processed[-trainIndex, ]
# Separate predictors and outcome
x_train <- as.matrix(trainData[, !names(trainData) %in% "PH"])
y_train <- trainData$PH
x_test <- as.matrix(testData[, !names(testData) %in% "PH"])
y_test <- testData$PH
# Convert training data to DMatrix for XGBoost
dtrain <- xgb.DMatrix(data = x_train, label = y_train)
dtest <- xgb.DMatrix(data = x_test, label = y_test)
# Set parameters for XGBoost
params <- list(
booster = "gbtree",
objective = "reg:squarederror",
eta = 0.1,
max_depth = 6,
subsample = 0.8,
colsample_bytree = 0.8,
gamma = 0)
# Train XGBoost model
set.seed(123)
xgb_model <- xgb.train(
params = params,
data = dtrain,
nrounds = 200,
watchlist = list(train = dtrain, test = dtest),
early_stopping_rounds = 10,
print_every_n = 10)
## [1] train-rmse:7.243151 test-rmse:7.245872
## Multiple eval metrics are present. Will use test_rmse for early stopping.
## Will train until test_rmse hasn't improved in 10 rounds.
##
## [11] train-rmse:2.532389 test-rmse:2.535015
## [21] train-rmse:0.894530 test-rmse:0.894218
## [31] train-rmse:0.332541 test-rmse:0.336355
## [41] train-rmse:0.145981 test-rmse:0.163109
## [51] train-rmse:0.092131 test-rmse:0.123164
## [61] train-rmse:0.076973 test-rmse:0.116931
## [71] train-rmse:0.069305 test-rmse:0.115053
## [81] train-rmse:0.063289 test-rmse:0.114220
## [91] train-rmse:0.057172 test-rmse:0.113406
## [101] train-rmse:0.051773 test-rmse:0.113434
## [111] train-rmse:0.047308 test-rmse:0.112561
## [121] train-rmse:0.043808 test-rmse:0.112121
## [131] train-rmse:0.040077 test-rmse:0.111577
## [141] train-rmse:0.036986 test-rmse:0.111569
## [151] train-rmse:0.034563 test-rmse:0.111293
## [161] train-rmse:0.031591 test-rmse:0.111015
## [171] train-rmse:0.029441 test-rmse:0.110850
## [181] train-rmse:0.027203 test-rmse:0.110943
## [191] train-rmse:0.024747 test-rmse:0.110770
## Stopping. Best iteration:
## [185] train-rmse:0.026203 test-rmse:0.110632
# Predict on test data
predictions <- predict(xgb_model, newdata = dtest, iteration_range = c(1, xgb_model$best_iteration))
# Evaluate performance
rsq <- cor(y_test, predictions)^2
cat("R-squared:", rsq, "\n")
## R-squared: 0.6011627
# Residual plot
residuals <- y_test - predictions
plot(predictions, residuals, xlab = "Predicted", ylab = "Residuals",
main = "Residual Plot for XGBoost", col = "blue", pch = 19)
abline(h = 0, col = "red", lwd = 2)
# Variable importance plot
importance <- xgb.importance(feature_names = colnames(x_train), model = xgb_model)
xgb.plot.importance(importance, main = "Feature Importance for XGBoost")
Based on the final RSquared results, the Random Forest model is the best predictor with Rsq = 0.7130620. This model will be used to predict the Student Evaluation dataset. The Random Forest model provides the highest explanatory power, capturing a significant proportion of the variance in the target variable (pH). Its ensemble nature enables it to handle non-linear relationships and complex interactions between predictors effectively in this dataset.
Model | R-squared |
---|---|
OLR | 0.198 |
MLR | 0.4072 |
LASSO | 0.3542227 |
NNET | 0.47670903 |
XGB | 0.6011627 |
RIDGE | 0.3650728 |
SVM | 0.514247 |
RFOREST | 0.7130620 |
BOOSTED | 0.6468105 |