Data Description

Abstract

Concrete is one of the most important materials in civil engineering. Its compressive strength is a highly nonlinear function of age and ingredients: cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.

Data Sourcing

The dataset was sourced from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

Original Owner and Donor
Prof. I-Cheng Yeh,
Department of Information Management,
Chung-Hua University,
Hsin Chu, Taiwan 30067, R.O.C.
E-mail: icyeh@chu.edu.tw
TEL: 886-3-5186511

Data Characteristics

The actual concrete compressive strength (MPa) for a given mixture at a specific age (days) was determined in the laboratory. The data are in raw form (not scaled).

Number of instances (observations): 1030
Number of Attributes: 9
Attribute breakdown: 8 quantitative input variables, and 1 quantitative output variable
Missing Attribute Values: None

Variable Information

      • Cement – quantitative – kg in a m³ mixture
      • Blast Furnace Slag – quantitative – kg in a m³ mixture
      • Fly Ash – quantitative – kg in a m³ mixture
      • Water – quantitative – kg in a m³ mixture
      • Superplasticizer – quantitative – kg in a m³ mixture
      • Coarse Aggregate – quantitative – kg in a m³ mixture
      • Fine Aggregate – quantitative – kg in a m³ mixture
      • Age – quantitative – days (1–365)
      • Concrete compressive strength – quantitative – MPa

Data Analysis

Data Preprocessing

Libraries Used
library(corrplot)   # correlation matrix plots
library(caret)      # data partitioning, model training, resampling
library(ggplot2)    # plotting
library(knitr)      # kable() tables
library(e1071)      # skewness()
library(rattle)     # fancyRpartPlot()
library(kableExtra) # table styling and scroll boxes
Importing Data
conc <- read.csv("D:\\R\\Data Sets\\concrete.csv") # adjust the path to the local copy of the file
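As a quick sanity check against the data characteristics listed above (1030 observations, 9 attributes, no missing values), a minimal sketch:

dim(conc)        # expect 1030 rows and 9 columns
sum(is.na(conc)) # expect 0 missing values
str(conc)        # all columns should be numeric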



Data Preview
kable(head(conc), "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
cementcomp   slag  flyash  water  superplastisizer  coraseaggr  finraggr  age    CCS
     540.0    0.0       0    162               2.5      1040.0     676.0   28  79.99
     540.0    0.0       0    162               2.5      1055.0     676.0   28  61.89
     332.5  142.5       0    228               0.0       932.0     594.0  270  40.27
     332.5  142.5       0    228               0.0       932.0     594.0  365  41.05
     198.6  132.4       0    192               0.0       978.4     825.5  360  44.30
     266.0  114.0       0    228               0.0       932.0     670.0   90  47.03
Summary of variables
summary(conc)
##    cementcomp         slag           flyash           water      
##  Min.   :102.0   Min.   :  0.0   Min.   :  0.00   Min.   :121.8  
##  1st Qu.:192.4   1st Qu.:  0.0   1st Qu.:  0.00   1st Qu.:164.9  
##  Median :272.9   Median : 22.0   Median :  0.00   Median :185.0  
##  Mean   :281.2   Mean   : 73.9   Mean   : 54.19   Mean   :181.6  
##  3rd Qu.:350.0   3rd Qu.:142.9   3rd Qu.:118.30   3rd Qu.:192.0  
##  Max.   :540.0   Max.   :359.4   Max.   :200.10   Max.   :247.0  
##  superplastisizer   coraseaggr        finraggr          age        
##  Min.   : 0.000   Min.   : 801.0   Min.   :594.0   Min.   :  1.00  
##  1st Qu.: 0.000   1st Qu.: 932.0   1st Qu.:731.0   1st Qu.:  7.00  
##  Median : 6.400   Median : 968.0   Median :779.5   Median : 28.00  
##  Mean   : 6.205   Mean   : 972.9   Mean   :773.6   Mean   : 45.66  
##  3rd Qu.:10.200   3rd Qu.:1029.4   3rd Qu.:824.0   3rd Qu.: 56.00  
##  Max.   :32.200   Max.   :1145.0   Max.   :992.6   Max.   :365.00  
##       CCS       
##  Min.   : 2.33  
##  1st Qu.:23.71  
##  Median :34.45  
##  Mean   :35.82  
##  3rd Qu.:46.13  
##  Max.   :82.60
Check the skewness of variables. If the distribution is roughly symmetric, skewness will be close to zero.
apply(conc,2,skewness)
##       cementcomp             slag           flyash            water 
##       0.50799821       0.79838622       0.53578981       0.07441116 
## superplastisizer       coraseaggr         finraggr              age 
##       0.90456195      -0.04010268      -0.25227315       3.25966169 
##              CCS 
##       0.41576358

Age is strongly right-skewed (skewness ≈ 3.26). Several attributes (slag, fly ash, superplasticizer) contain zero values, so a plain log transformation cannot be applied uniformly; the variables are therefore left in raw form.
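If a transformation were wanted despite the zeros, log1p (i.e. log(1 + x)) is defined at zero; a minimal sketch checking its effect on age:

#log1p handles zero values; it should pull the skewness of age much closer to zero
skewness(conc$age)
skewness(log1p(conc$age))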

CCS Distribution Plot
ggplot(conc, aes(x = CCS)) + geom_histogram(bins = 40) + labs(x = "CCS", y = "Frequency", title = "Distribution of CCS") 

Correlation matrix to examine relationships between variables.
corrplot(cor(conc), method = "number",type = "upper")
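The same matrix can also be read numerically for the outcome column, ranking each predictor's linear association with CCS; a small sketch:

#Correlations of every variable with CCS, strongest positive first
sort(cor(conc)[, "CCS"], decreasing = TRUE)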

Scatter plot – Water vs Superplasticizer
qplot(conc$superplastisizer, conc$water) + labs(x = "Superplastisizer", y = "Water", title = "Water vs Superplastisizer")
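The plot suggests an inverse relationship, consistent with superplasticizers being used to reduce water demand; a one-line check of the (expected negative) correlation:

cor(conc$water, conc$superplastisizer)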



Partition the data into training (70%) and testing (30%) sets.
#Splitting data into train and test set.
inTrain <- createDataPartition(conc$CCS, p = 0.7, list = F)
trainSET <- conc[inTrain,]
testSET <- conc[-inTrain,]
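createDataPartition samples within quantile groups of the outcome, so the CCS distributions of the two sets should be similar; a quick sketch to verify:

#Train and test outcome distributions should roughly match
summary(trainSET$CCS)
summary(testSET$CCS)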
Create a data frame to store the test-set observations and each model's predictions.
#Dataframe to store results
results <- data.frame(test_obs = testSET$CCS)



Model Building

  1. Linear Regression Model
#10-fold cross-validation
ctrl <- trainControl(method = "cv", number = 10)

lmmodel <- train(CCS ~ ., data = trainSET, method = "lm", trControl = ctrl)
lmmodel
## Linear Regression 
## 
## 722 samples
##   8 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 649, 650, 650, 649, 650, 650, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   10.58996  0.6070471  8.375624
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Predict the results on the test set.

results$lm_predictions <- predict(lmmodel, testSET)
  2. CART (Classification and Regression Trees)
treemodel <- train(CCS ~ ., data = trainSET, method = "rpart", trControl = ctrl, tuneLength = 5)
plot(treemodel)


Fancy Decision Tree Plot

fancyRpartPlot(treemodel$finalModel)


Predict the results on the test set.

results$tree_predictions <- predict(treemodel, testSET)

  3. MARS (Multivariate Adaptive Regression Splines)

marmodel <- train(CCS ~ ., data = trainSET, method = "earth", trControl = ctrl, tuneLength = 15)
plot(marmodel)
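caret also exposes variable importance for the fitted MARS model; a quick sketch:

#Variable importance from the earth fit, via caret's varImp()
varImp(marmodel)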

Predict the results on the test set.

results$mar_predictions <- predict(marmodel, testSET)
  4. SVM Radial
svmmodel <- train(CCS ~ ., data = trainSET, method = "svmRadial", trControl = ctrl, tuneLength = 10)
svmmodel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 722 samples
##   8 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 650, 650, 650, 650, 650, 648, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE     
##     0.25  8.193874  0.7727503  6.216962
##     0.50  7.510859  0.8057671  5.588933
##     1.00  6.975163  0.8315825  5.139610
##     2.00  6.614476  0.8484595  4.842685
##     4.00  6.335832  0.8617612  4.601815
##     8.00  6.187779  0.8681764  4.447095
##    16.00  6.024921  0.8747392  4.274296
##    32.00  5.994894  0.8761061  4.134932
##    64.00  6.047779  0.8746937  4.126316
##   128.00  6.257243  0.8664588  4.220552
## 
## Tuning parameter 'sigma' was held constant at a value of 0.1117766
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.1117766 and C = 32.
plot(svmmodel)

Predict the results on the test set.

results$svm_predictions <- predict(svmmodel, testSET)
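A predicted-vs-observed plot gives a visual check of the SVM fit on the test set; a minimal sketch (points near the 45-degree line indicate accurate predictions):

#Predicted vs observed CCS for the SVM model on the test set
ggplot(results, aes(x = test_obs, y = svm_predictions)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "Observed CCS", y = "Predicted CCS", title = "SVM: Predicted vs Observed")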

Predicted Values

kable(head(results), "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
test_obs  lm_predictions  tree_predictions  mar_predictions  svm_predictions
   40.27        55.56344          40.41024         47.18905         40.89970
   44.30        59.60197          40.41024         39.63560         46.70834
   43.70        66.59378          57.15409         50.17871         43.40097
   36.45        29.91729          57.15409         38.31050         37.20360
   40.56        36.66490          57.15409         48.29527         39.59831
   42.62        47.84235          57.15409         49.37454         42.85674

Model Assessment

#Linear Regression
postResample(results$lm_predictions,results$test_obs)
##       RMSE   Rsquared        MAE 
## 10.1496003  0.6338727  8.2266351
#CART
postResample(results$tree_predictions, results$test_obs)
##       RMSE   Rsquared        MAE 
## 11.4951801  0.5315966  9.2424359
#MARS
postResample(results$mar_predictions, results$test_obs)
##      RMSE  Rsquared       MAE 
## 6.2252879 0.8622315 4.8071490
#SVMRadial
postResample(results$svm_predictions, results$test_obs)
##      RMSE  Rsquared       MAE 
## 5.4140668 0.8965451 3.8116596

The SVM radial model outperforms all the other models on the test set, with the highest R-squared (0.897) and the lowest RMSE (5.41) and MAE (3.81). It is therefore the best-fitting model for this data and gives the most accurate predictions.
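The four postResample calls above can also be collected into a single comparison table; a minimal sketch reusing the results data frame:

#Test-set metrics for all four models in one table
metrics <- rbind(
  LM   = postResample(results$lm_predictions,   results$test_obs),
  CART = postResample(results$tree_predictions, results$test_obs),
  MARS = postResample(results$mar_predictions,  results$test_obs),
  SVM  = postResample(results$svm_predictions,  results$test_obs)
)
kable(metrics, "html") %>% kable_styling("striped")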