Introduction

This project is based on the book Machine Learning with R by Brett Lantz, Chapter 7.

A link to the book: https://bit.ly/3gsf2e0

This project is for educational purposes only.

The aim is to estimate the compressive strength of concrete using artificial neural networks.

Required packages

We will use the neuralnet package.

library(neuralnet)

Step 1 - collecting data

The data was donated to the UCI Machine Learning Repository.

Step 2 - exploring and preparing the data

concrete <- read.csv("concrete.csv")

#Exploring the structure of the data frame
str(concrete)
## 'data.frame':    1030 obs. of  9 variables:
##  $ cement      : num  141 169 250 266 155 ...
##  $ slag        : num  212 42.2 0 114 183.4 ...
##  $ ash         : num  0 124.3 95.7 0 0 ...
##  $ water       : num  204 158 187 228 193 ...
##  $ superplastic: num  0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
##  $ coarseagg   : num  972 1081 957 932 1047 ...
##  $ fineagg     : num  748 796 861 670 697 ...
##  $ age         : int  28 14 28 28 28 90 7 56 28 28 ...
##  $ strength    : num  29.9 23.5 29.2 45.9 18.3 ...

We can see the eight input features and the outcome variable, strength.

The features have very different ranges, so normalization is needed to help the neural network perform at its best.

#Explore the summary of the strength column
summary(concrete$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.33   23.71   34.45   35.82   46.13   82.60

If the data follow a bell-shaped (normal) distribution, we can use the scale() function for standardization. On the other hand, if the data follow a uniform distribution or are severely non-normal, then min-max normalization to the 0-1 range may be more appropriate.
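
For comparison, here is a minimal sketch of the standardization alternative (not used in this project, since these features are not normally distributed):

#Z-score standardization with scale(): each column gets mean 0 and standard deviation 1
concrete_std <- as.data.frame(scale(concrete))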

We will create a min-max normalization function:

#Rescale each value of x to the range 0-1
normalize <- function(x) {
    return((x - min(x)) / (max(x) - min(x)))
}
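
To confirm the function behaves as expected, we can test it on a simple vector; by construction the smallest value maps to 0 and the largest to 1:

normalize(c(1, 2, 3, 4, 5))
## [1] 0.00 0.25 0.50 0.75 1.00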

Now we will apply the function to every column of the data frame. Because lapply() returns a list, we use as.data.frame() to convert the result back into a data frame. We will reverse this transformation to recover the original units of measure later in the project.

concrete_norm <- as.data.frame(lapply(concrete, normalize))

#Inspect the summary of the normalized strength column
summary(concrete_norm$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2664  0.4001  0.4172  0.5457  1.0000

Now we will partition the data: the first 773 rows (75 percent) for training and the remaining 257 rows (25 percent) for testing.

concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]
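
A sequential split like this assumes the rows are already stored in random order. If they were not, a hedged alternative would be a random split (set.seed() and the _rand names below are our own additions):

#Randomly sample 75 percent of the row indices for training
set.seed(12345)
train_idx <- sample(nrow(concrete_norm), floor(0.75 * nrow(concrete_norm)))
concrete_train_rand <- concrete_norm[train_idx, ]
concrete_test_rand <- concrete_norm[-train_idx, ]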

Step 3 - training a model on the data

We will use the neuralnet() function from the neuralnet package.

#By default, neuralnet() uses a single hidden node and the "logistic" activation function
concrete_model <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train)
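
Note that neuralnet() starts training from random initial weights, so the exact error and correlation values will vary slightly from run to run. To make the results reproducible, we could fix the random seed before each call (the seed value below is arbitrary, our own addition):

set.seed(12345)  #arbitrary seed to make the random weight initialization repeatable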

#Plotting the neural network; the parameter rep = "best" allows the plot to render in a knitr document

plot(concrete_model, rep = "best")

We can see the eight input nodes, the single hidden node, and the bias terms indicated by the nodes labeled 1.

The error at the bottom of the plot is the sum of squared errors (SSE) on the training data.
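
The same training error can also be read programmatically; neuralnet stores it in the "error" row of the fitted object's result.matrix:

#Extract the training SSE directly from the model object
concrete_model$result.matrix["error", ]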

Step 4 - evaluating model performance

#The compute() function returns a list with two components: $neurons, which stores the neurons of each layer, and $net.result, which stores the predicted values

model_results <- compute(concrete_model, concrete_test[1:8])

predicted_strength <- model_results$net.result

Because this is a numeric prediction problem, we cannot use a confusion matrix as we would for classification. Instead, we will measure the correlation between the predicted and actual strength values.

cor(predicted_strength, concrete_test$strength)
##          [,1]
## [1,] 0.806423

This correlation of about 0.806 indicates a fairly strong linear relationship, which implies that our model is doing a fairly good job.
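
Correlation alone can hide systematic over- or under-prediction, so as an additional hedged check we could compute the mean absolute error on the normalized scale (the MAE() helper below is our own, not part of neuralnet):

#Mean absolute error between actual and predicted values
MAE <- function(actual, predicted) mean(abs(actual - predicted))
MAE(concrete_test$strength, predicted_strength)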

Step 5 - improving model performance

#We will increase the number of hidden nodes to 5
concrete_model2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = 5)

#plotting the neural network

plot(concrete_model2, rep = "best")

We notice the SSE has been reduced to 1.746 from 5.08.

Now we evaluate the new model

#As before, compute() returns the predicted values in the $net.result component

model_results2 <- compute(concrete_model2, concrete_test[1:8])

predicted_strength2 <- model_results2$net.result

cor(predicted_strength2, concrete_test$strength)
##           [,1]
## [1,] 0.9281844

The correlation improved to about 0.928, which is a considerable improvement over the previous result of 0.806.

The activation function is very important in deep learning. A popular choice is the rectifier activation function; a node that uses it is known as a rectified linear unit (ReLU). The function returns x when x >= 0 and returns zero otherwise.

Because neuralnet requires a differentiable activation function, we will use a smooth approximation of ReLU known as softplus (or smooth ReLU): f(x) = log(1 + e^x).

To define the softplus function in R:

#Softplus: a smooth, differentiable approximation of ReLU
softplus <- function(x) log(1 + exp(x))
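
To see how closely softplus tracks ReLU, we can plot the two functions side by side (a quick illustrative sketch; the relu() helper is our own):

#Compare softplus with ReLU on the interval -4 to 4
relu <- function(x) pmax(0, x)
x <- seq(-4, 4, by = 0.1)
plot(x, softplus(x), type = "l", ylab = "f(x)")
lines(x, relu(x), lty = 2)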

We will now use the softplus activation function and add a second hidden layer to the network:

#stepmax = 1e7 is added because the neural network was not converging within the default maximum number of steps
concrete_model3 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = c(5, 5), act.fct = softplus, stepmax = 1e7)

#plotting the neural network

plot(concrete_model3, rep = "best")

model_results3 <- compute(concrete_model3, concrete_test[1:8])

predicted_strength3 <- model_results3$net.result

cor(predicted_strength3, concrete_test$strength)
##           [,1]
## [1,] 0.9177854
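
To summarize progress so far, here is a quick hedged comparison of the three models' test-set correlations, built from the objects computed above:

#Collect the test-set correlations of the three models side by side
data.frame(
    model = c("1 hidden node", "5 hidden nodes", "softplus, 5 + 5"),
    correlation = c(cor(predicted_strength, concrete_test$strength),
                    cor(predicted_strength2, concrete_test$strength),
                    cor(predicted_strength3, concrete_test$strength))
)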

Note that because we normalized the data before training the model, the predictions are also on a normalized scale from zero to one.

We will compare the actual values with the normalized predictions:

strengths <- data.frame(
    actual = concrete$strength[774:1030],
    pred = predicted_strength3
)

head(strengths, n = 3)
##     actual      pred
## 774  30.14 0.2865025
## 775  44.40 0.4727623
## 776  24.50 0.2796036

#Examining the correlation between the actual values and the normalized predictions
cor(strengths$actual, strengths$pred)
## [1] 0.9177854

We will create an unnormalize() function that reverses the min-max normalization and allows us to convert the normalized predictions back to the original scale.

unnormalize <- function(x) {
    #Reverse min-max normalization: x * (max - min) + min
    return(x * (max(concrete$strength) - min(concrete$strength)) + min(concrete$strength))
}
strengths$pred_new <- unnormalize(strengths$pred)
strengths$error <- strengths$pred_new - strengths$actual

head(strengths, n = 3)
##     actual      pred pred_new     error
## 774  30.14 0.2865025 25.32756 -4.812444
## 775  44.40 0.4727623 40.27863 -4.121370
## 776  24.50 0.2796036 24.77378  0.273781

Examine the correlation between the unnormalized predictions and the actual values.

cor(strengths$pred_new, strengths$actual)
## [1] 0.9177854

The correlation is unchanged: unnormalizing is a linear transformation, and correlation is unaffected by linear rescaling.
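
As a closing implementation note, a more reusable variant of unnormalize() would take the original range as arguments instead of hardcoding the strength column (unnormalize2() is a hypothetical name of our own):

#Generalized inverse of min-max normalization, parameterized by the original range
unnormalize2 <- function(x, orig_min, orig_max) {
    return(x * (orig_max - orig_min) + orig_min)
}
#For example: unnormalize2(strengths$pred, min(concrete$strength), max(concrete$strength))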

Summary

Neural networks are a powerful machine learning tool, but they operate as a black box and are more complex than many other regression methods.