Predicting Concrete Strength: A Multivariate and Logistic Regression Approach to Classifying Compressive Strength Outcomes

Introduction

Concrete is the most important material in civil engineering. It has revolutionized the way construction is being carried out in the last century. The ability to make concrete into any shape and size has led to the many intriguing structures. The characteristics of concrete are dependent on the materials it is made up of and the proportions of the materials. The main materials that make up concrete are water, cement, and aggregates of varying proportions. The composition of these materials determine its compressive strength (Chopra et al., 2014).

The compressive strength of concrete refers to its ability to carry the load on its surface without any crack or deflection. Compressive strength of concrete is used to determine its quality and it is the most important concrete property. Therefore, the ability to predict concrete compression strength is of immense importance. The required compression strength of concrete can vary from 17 MPa for residential purposes to 70 MPa for some commercial con

In this study, multiple regression analysis was carried out for predicting compression strength of concrete using eight variables cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and age. The performance of the developed model was tested using test data.

We then wanted to understand if given the ingredients of concrete, could we accurately predict if the resultant strength of that concrete would meet industry standards (4000 PSI). So, we developed a logistic-regression model and assessed it’s accuracy with training and testing subsets.

The data used for this project was obtained from UCI repository. https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

Import libraries

library(tidyverse,warn.conflicts = F) # includes 'magrittr' for piping and 'dplyr' for munging 
library(lubridate,warn.conflicts = F) # for munging time-series data 
library(plotly,warn.conflicts = F) # for making ggplot visuals interactive via ggplotly()
library(tidytext,warn.conflicts = F) # for ordering columns by value in each faceted plot 
library(car, warn.conflicts = F) # avPlots
library(MASS, warn.conflicts = F) # boxcox plots 
library(dplyr)
knitr::opts_chunk$set(echo = TRUE)
options(scipen=999) # turn off scientific notation

Read Data

source: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

# Load data
df <- read.csv("C:\\Users\\workd\\Desktop\\concrete.csv")
# 
colnames(df)

##  [1] "rownames"             "cement"               "blast_furnace_slag"  
##  [4] "fly_ash"              "water"                "superplasticizer"    
##  [7] "coarse_aggregate"     "fine_aggregate"       "age"                 
## [10] "compressive_strength"

# Rename columns for simplicity
df <- df %>% 
  rename(
    Cement = cement,
    Slag = blast_furnace_slag,
    FlyAsh = fly_ash,
    Water = water,
    SuperPlast = superplasticizer,
    CourseAgg = coarse_aggregate,
    FineAgg = fine_aggregate,
    Age = age,
    ConcreteStr = compressive_strength
  )

# Check and remove duplicate rows
paste('Rows with duplicated data (raw):', sum(duplicated(df)))

## [1] "Rows with duplicated data (raw): 0"

df <- df[!duplicated(df), ]
paste('Rows with duplicated data after fix:', sum(duplicated(df)))

## [1] "Rows with duplicated data after fix: 0"

# Check for missing data
paste('Rows with missing data:', sum(is.na(df)))

## [1] "Rows with missing data: 0"

# Print dataframe dimensions
paste('Dimensions of dataframe:', dim(df))

## [1] "Dimensions of dataframe: 1030" "Dimensions of dataframe: 10"

# Attach dataframe for easier column access
attach(df)

# Inspect data structure
glimpse(df)

## Rows: 1,030
## Columns: 10
## $ rownames    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,~
## $ Cement      <dbl> 540.0, 540.0, 332.5, 332.5, 198.6, 266.0, 380.0, 380.0, 26~
## $ Slag        <dbl> 0.0, 0.0, 142.5, 142.5, 132.4, 114.0, 95.0, 95.0, 114.0, 0~
## $ FlyAsh      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ Water       <dbl> 162, 162, 228, 228, 192, 228, 228, 228, 228, 228, 192, 192~
## $ SuperPlast  <dbl> 2.5, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0~
## $ CourseAgg   <dbl> 1040.0, 1055.0, 932.0, 932.0, 978.4, 932.0, 932.0, 932.0, ~
## $ FineAgg     <dbl> 676.0, 676.0, 594.0, 594.0, 825.5, 670.0, 594.0, 594.0, 67~
## $ Age         <int> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 90, 28, 270, 9~
## $ ConcreteStr <dbl> 79.99, 61.89, 40.27, 41.05, 44.30, 47.03, 43.70, 36.45, 45~

EDA

Below, we plot all variables as histograms. Some preliminary observations include:

Many samples have no slag, superplasticizer, nor fly ash, suggesting that they are unnecessary additives potentially used to strengthen concrete
Concrete strength seems right-skewed. Most samples have between 20-40 MPA in strength, yet some reach a very high level of strength

df %>% 
  pivot_longer(cols = df %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_histogram(aes(value), size = .5, bins = 25) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Observations from independent variables plotted against concrete strength:

all independent variables except cement do not correlate well with concrete strength
some variables show a very wide range of concrete strength with the same value (FlyAsh = 0, for example). This may indicate interactivity between independent variables. For example, FlyAsh alone may not strengthen concrete, but when combined with Slag and Superplasticizer, it may (hypothetical example). We will dive into that further in later steps.

# scatter plots, all vars against strength
df %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c(ConcreteStr)) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value), size = .5) + 
  geom_smooth(aes(x = ConcreteStr, y = value),method='lm') +  
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

Below, we use pairs to examine multi-collinearity among independent variables. We don’t see anything worth removing.

pairs(df[1:4])

pairs(df[5:8])

Qualitatively estimate which variables may require an interactive term with other independent variables

Some of the variables like FlyAsh, SuperPlast, and Slag have a wide-range of concrete strength values when equal to 0.

Therefore, a variable like FlyAsh likely has a greater effect on other independent variables rather than on Concrete Strength directly.

We believe this emphasizes the need for interactive terms between independent variables in our model. Below, we investigate each of those three variables and how they relate to other independent variables.

# slag binary term 
df %>% 
  mutate(Slag_cat = if_else(Slag == 0, 'Zero', 'Non-Zero')) %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('ConcreteStr')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value, color = Slag_cat), size = 1) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

# Slag by all x vars
df %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('Slag','ConcreteStr')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = Slag, y = value, color = ConcreteStr), size = 2, alpha = .5) + 
  #geom_smooth(aes(x = FlyAsh, y = value),method='lm') +  
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank()) + 
  ggplot2::scale_color_continuous(type = 'viridis')

# Plot independent vars against Concrete Str and color by FlyAsh
df %>% 
  mutate(FlyAsh_cat = if_else(FlyAsh == 0, 'Zero', 'Non-Zero')) %>%
  mutate(SuperPlast_cat = if_else(SuperPlast == 0, 'Zero', 'Non-Zero')) %>% 
  mutate(Slag_cat = if_else(Slag == 0, 'Zero', 'Non-Zero')) %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('ConcreteStr','Slag')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value, color = Slag), size = 2) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

FlyAsh

Below, we create a binary variable that categorizes FlyAsh as zero or non-zero. We then plot all other independent variables against concrete strength, and color the points by our engineered variable FlyAsh_cat (binary: zero, or non-zero).

When interpretting each plot, start at the Y-axis and read across from left-to-right. If the color changes, then it may indicate an effect that FlyAsh has on Concrete Strength for a fixed quantity of the other independent variable.

Using this methodology, in later steps we will investigate potential interactive terms between FlyAsh and Age and FlyAsh and Water.

df %>% 
  mutate(FlyAsh_cat = if_else(FlyAsh == 0, 'Zero', 'Non-Zero')) %>%
  mutate(SuperPlast_cat = if_else(SuperPlast == 0, 'Zero', 'Non-Zero')) %>% 
  mutate(Slag_cat = if_else(Slag == 0, 'Zero', 'Non-Zero')) %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('ConcreteStr')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value, color = FlyAsh_cat), size = 1) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

# Plot independent vars against Concrete Str and color by FlyAsh
df %>% 
  mutate(FlyAsh_cat = if_else(FlyAsh == 0, 'Zero', 'Non-Zero')) %>%
  mutate(SuperPlast_cat = if_else(SuperPlast == 0, 'Zero', 'Non-Zero')) %>% 
  mutate(Slag_cat = if_else(Slag == 0, 'Zero', 'Non-Zero')) %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('ConcreteStr','FlyAsh')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value, color = FlyAsh), size = 2) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

SuperPlast

df %>% 
  mutate(FlyAsh_cat = if_else(FlyAsh == 0, 'Zero', 'Non-Zero')) %>%
  mutate(SuperPlast_cat = if_else(SuperPlast == 0, 'Zero', 'Non-Zero')) %>% 
  mutate(Slag_cat = if_else(Slag == 0, 'Zero', 'Non-Zero')) %>% 
  pivot_longer(cols = df %>% 
                 dplyr::select(-c('ConcreteStr')) %>% 
                 colnames()) %>% 
  ggplot() + 
  geom_point(aes(x = ConcreteStr, y = value, color = SuperPlast_cat), size = 1) + 
  facet_wrap(~name, scales = 'free') + 
  theme_minimal() + 
  theme(axis.title.y = element_blank())

Developing initial model

In our first model, we have no interactive terms. Adjusted R^2 is roughly .6.

model0 <- lm(ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + CourseAgg + FineAgg + Age)
summary(model0)

## 
## Call:
## lm(formula = ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + 
##     CourseAgg + FineAgg + Age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.654  -6.302   0.703   6.569  34.450 
## 
## Coefficients:
##               Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) -23.331214  26.585504  -0.878             0.380372    
## Cement        0.119804   0.008489  14.113 < 0.0000000000000002 ***
## Slag          0.103866   0.010136  10.247 < 0.0000000000000002 ***
## FlyAsh        0.087934   0.012583   6.988     0.00000000000502 ***
## Water        -0.149918   0.040177  -3.731             0.000201 ***
## SuperPlast    0.292225   0.093424   3.128             0.001810 ** 
## CourseAgg     0.018086   0.009392   1.926             0.054425 .  
## FineAgg       0.020190   0.010702   1.887             0.059491 .  
## Age           0.114222   0.005427  21.046 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.4 on 1021 degrees of freedom
## Multiple R-squared:  0.6155, Adjusted R-squared:  0.6125 
## F-statistic: 204.3 on 8 and 1021 DF,  p-value: < 0.00000000000000022

Added variable plots to assess each independent variable

All the variables have significant p-values except coarse aggregate and fine aggregate. Also, from the added variable plot these two variables are not significant in estimating the concrete compression strength as their slopes are almost zero.

The slope for superplasticity is also almost 0, but the visuals above suggest superplasticity has an impact on other independent variables. Therefore, we will keep it and explore adding it in interactive terms with other independent variables.

avPlots(model0)

Optimizing our model by removing terms and adding interactivity terms

Next, we’ll remove CourseAgg and FineAgg then add in some interactive terms estimated from the visual relationship between dependent variables in the above steps.

model1 <- lm(ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + Age + 
               FlyAsh:Age + Slag:Age +  SuperPlast:Age + SuperPlast:Slag + SuperPlast:CourseAgg)
summary(model1)

## 
## Call:
## lm(formula = ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + 
##     Age + FlyAsh:Age + Slag:Age + SuperPlast:Age + SuperPlast:Slag + 
##     SuperPlast:CourseAgg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.712  -5.750   0.197   6.133  35.246 
## 
## Coefficients:
##                         Estimate  Std. Error t value             Pr(>|t|)    
## (Intercept)          16.43060036  4.10721384   4.000    0.000067805386495 ***
## Cement                0.11179927  0.00382881  29.199 < 0.0000000000000002 ***
## Slag                  0.06703302  0.00593673  11.291 < 0.0000000000000002 ***
## FlyAsh                0.03711830  0.00876344   4.236    0.000024859964239 ***
## Water                -0.14785733  0.01992723  -7.420    0.000000000000247 ***
## SuperPlast           -2.05053613  0.43533241  -4.710    0.000002816561120 ***
## Age                   0.06442521  0.00634794  10.149 < 0.0000000000000002 ***
## FlyAsh:Age            0.00105559  0.00016098   6.557    0.000000000086924 ***
## Slag:Age              0.00025384  0.00005807   4.371    0.000013634834882 ***
## SuperPlast:Age        0.01386550  0.00151351   9.161 < 0.0000000000000002 ***
## Slag:SuperPlast       0.00253235  0.00061679   4.106    0.000043538098736 ***
## SuperPlast:CourseAgg  0.00181418  0.00047097   3.852             0.000124 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.07 on 1018 degrees of freedom
## Multiple R-squared:  0.7084, Adjusted R-squared:  0.7053 
## F-statistic: 224.8 on 11 and 1018 DF,  p-value: < 0.00000000000000022

Model diagnostics

Based on the diagnostic plots, the model seems reasonable.

par(mfrow = c(2,2))
plot(model1)

plot(model1$fitted.values, ConcreteStr)
par(mfrow = c(2,5))

rstand <- rstandard(model1)
plot(rstand ~ Cement)
plot(rstand ~ Slag)
plot(rstand ~ FlyAsh)
plot(rstand ~ Water)
plot(rstand ~ SuperPlast)
plot(rstand ~ Age)
plot(rstand ~ FlyAsh:Age)
plot(rstand ~ SuperPlast:Age)

plot(rstand ~ Slag:SuperPlast)
plot(rstand ~ model1$fitted.values) # fitted v standard resid 
plot(ConcreteStr~ model1$fitted.values) # fitted v actual
abline(lm(ConcreteStr ~ model1$fitted.values))

Based on the inverse response plot and boxcox plot, no transformation seems necessary for the independent variable.

inverseResponsePlot(model1)

boxcox(model1, lambda = seq(  0,1,by =0.01))

avPlots(model1)

Based on the marginal model plots, our model fits well. Specifically, the ‘data’ and ‘model’ lines trace closely for the different variables.

#marginal model plot
mmps(model1,layout=c(2,2))

To demonstrate the impact of these interactive terms, we will perform an anova test between full and partial models.

The low p-value after excluding the interactive term between SuperPlast:CourseAgg is well-below any reasonable alpha-value, indicating that it is significant to the model.

model1_partial <- lm(ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + Age +
                       FlyAsh:Age + Slag:Age +  SuperPlast:Age + SuperPlast:Slag) # Removed SuperPlast:CourseAgg

anova(model1_partial, model1)

Next, we create a logistic regression model to determine if concrete components produce concrete stronger than industry standard (4000 PSI). We then assess our model accuracy using a test sample.

Compressive strength is important as it is the main criteria used to determine whether a given concrete mixture will meet the needs of a specific job. The threshold for compressive strength of a concrete pass is 4000 psi = 27.58 Mpa.

https://cor-tuf.com/everything-you-need-to-know-about-concrete-strength/

Creating a binary variable column for concrete strength using the threshold. Samples with strength higher than the threshold are assigned 1 (pass) while the opposite is assigned 0 (fail) based on the parameters that we have.

threshold <- 27.58 #4000 psi to MPA

 
df_expanded <- df %>% 

  # create binary variable above or below the concrete strength threshold  
  mutate(Binary_ConcreteStr = ifelse(ConcreteStr>threshold,1,0)) %>% 
  
  # create a unique identifier to split the train-test sets using anti-join 
  mutate(rownumber = row_number()) 

# inspect contents
head(df_expanded)

# split 80% of df_expanded into train set 
df_train <- sample_frac(df_expanded, size = 0.8)

# create test set using 25% of df_expanded not in train set 
df_test <- df_expanded %>% anti_join(df_train, by = 'rownumber')

Modeling the new dataset using logistic regression:

# create model using the training dataset, 
# prevent information leakage by excluding the testing dataset 
model1_log <- glm(Binary_ConcreteStr ~ Cement + Slag + FlyAsh + Water + SuperPlast + 
                    
Age + FlyAsh:Age + Slag:Age +  SuperPlast:Age + SuperPlast:Slag +SuperPlast:CourseAgg, 
                  
data = df_train, family = binomial)

summary(model1_log)

## 
## Call:
## glm(formula = Binary_ConcreteStr ~ Cement + Slag + FlyAsh + Water + 
##     SuperPlast + Age + FlyAsh:Age + Slag:Age + SuperPlast:Age + 
##     SuperPlast:Slag + SuperPlast:CourseAgg, family = binomial, 
##     data = df_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6381  -0.3430   0.0300   0.3731   2.2650  
## 
## Coefficients:
##                        Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)          -6.4150656  1.8524279  -3.463             0.000534 ***
## Cement                0.0259249  0.0024807  10.451 < 0.0000000000000002 ***
## Slag                  0.0073884  0.0029681   2.489             0.012800 *  
## FlyAsh                0.0021416  0.0054069   0.396             0.692043    
## Water                -0.0193218  0.0089992  -2.147             0.031789 *  
## SuperPlast           -0.5172861  0.2338731  -2.212             0.026979 *  
## Age                   0.0291569  0.0056994   5.116          0.000000312 ***
## FlyAsh:Age            0.0004525  0.0001865   2.426             0.015279 *  
## Slag:Age              0.0004895  0.0001088   4.501          0.000006776 ***
## SuperPlast:Age        0.0061277  0.0025376   2.415             0.015746 *  
## Slag:SuperPlast       0.0002204  0.0003080   0.716             0.474281    
## SuperPlast:CourseAgg  0.0005293  0.0002481   2.134             0.032865 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1062.72  on 823  degrees of freedom
## Residual deviance:  466.85  on 812  degrees of freedom
## AIC: 490.85
## 
## Number of Fisher Scoring iterations: 8

Computing a generalized version of the R^2 for logistic regression model:

D <- 555.78
Do <- 1297.93
R_square <- 1 - (D/Do)
print(paste("The R^2 vaue is ",round(R_square,2)))

## [1] "The R^2 vaue is  0.57"

Alternative for calculating R^2 confirms similar value

library(class) # import for knn function

## Warning: package 'class' was built under R version 4.1.3

temp <- class::knn(train = df_train, test = df_test, cl = df_train$Binary_ConcreteStr)
print(paste('The alternative R^2 value is close at:',
round(sum(as.integer(temp)-1)/length(temp),2)))

## [1] "The alternative R^2 value is close at: 0.66"

In the broader dataset, computing samples with concrete strength above the threshold:

concrete_passed_data <- length(which(df_expanded$Binary_ConcreteStr == 1))

print(paste(round(concrete_passed_data), 
            "concrete samples passed the threshold of 4000 psi (27.58 MPa)"))

## [1] "680 concrete samples passed the threshold of 4000 psi (27.58 MPa)"

Computing how many samples have concrete strength that passes the threshold in the test dataset:

concrete_passed_model <- predict(model1_log, df_test, type = "response")

count_concrete_passed_model <- length(which(concrete_passed_model > 0.5))

print(paste("The model predicted that",round(count_concrete_passed_model),
            "concrete samples passed the threshold of 4000 psi (27.58 MPa)"))

## [1] "The model predicted that 144 concrete samples passed the threshold of 4000 psi (27.58 MPa)"

Model accuracy

fit_passed.pred <- rep("Didn't Pass", nrow(df_test))

fit_passed.pred[concrete_passed_model > 0.5] = "Passed"

confusion <- table(fit_passed.pred, df_test$Binary_ConcreteStr)

print(confusion)

##                
## fit_passed.pred   0   1
##     Didn't Pass  52  10
##     Passed       13 131

accurancy <- round((c(confusion)[1] + c(confusion)[4])/(sum(c(confusion)))*100,0)

print(paste("The training accuracy of the model is",accurancy,"%"))

## [1] "The training accuracy of the model is 89 %"

Summary

Problem statement and how we addressed it:

We wanted to understand if given the ingredients of concrete, could we accurately predict if the resultant compressive strength of that concrete would meet industry standards (4000 PSI).

First, we performed EDA to refine a multi-linear regression model. Then, we took our model and our engineered term of above or below 4000 PSI to train a subset of our data and assess the accuracy against a testing subset.

Our final model came out at 87% accurate. Below, you can find our interpretation of this value.

Interesting insights and limitations:

Some variables of concrete are not ‘necessary’ but are rather additives that can strengthen concrete by enhancing the effects of more primary ingredients like cement. These types of elements (like Superplasticity, Slag, and Fly Ash) have strong interactivity with other concrete ingredients to improve the overall strength. Once we added interactive terms to our multi-linear regression for these ancillary ingredients with more primary ingredients, the model R2 value improved by 15%
Some variables have a direct effect on the strength of concrete without any interactive term (or added ingredients). For example, cement content correlates closely with concrete compressive strength. This is evident in the low p-value from the model summary, and a simple scatter plot between the two variables.
Our logistic regression model had an accuracy of 87%. In other words, if someone has the ingredients to make concrete and plugs those values into our model, our model will predict whether the resultant compressive strength is above or below 4000 PSI. 87% of the time, our model will accurately predict if the concrete strength is above or below that threshold. These types of logistic regression models are likely widely used in the real-world. If concrete unexpectedly fails, the consequences can be severe.
While decent, our model could likely be improved. We estimate that more time would be needed to determine exact interactive terms between variables. In this model, we managed to capture a few obvious ones from some of our diagnostic plots and EDA. Given how those interactive terms improved our model accuracy, more refined ones may further improve this model.

References

Chopra, P., Sharma, R.K. and Kumar, M. (2014) “Predicting compressive strength of concrete for varying workability using regression models,” International Journal Of Engineering & Applied Sciences, 6(4), pp. 10–10. Available at: https://doi.org/10.24107/ijeas.251233.

Peng, X., Zhuang, Z. and Yang, Q. (2022) “Predictive modeling of compressive strength for concrete at super early age,” Materials, 15(14), p. 4914. Available at: https://doi.org/10.3390/ma15144914.