This project is based on Chapter 7 of the book Machine Learning with R by Brett Lantz.
A link to the book: https://bit.ly/3gsf2e0
This project is for educational purposes only.
The aim is to estimate the strength of concrete using artificial neural networks.
We will use the neuralnet package.
library(neuralnet)
The data was donated to the UCI Machine Learning Repository.
concrete <- read.csv("concrete.csv")
#Exploring the structure of the data frame
str(concrete)
## 'data.frame': 1030 obs. of 9 variables:
## $ cement : num 141 169 250 266 155 ...
## $ slag : num 212 42.2 0 114 183.4 ...
## $ ash : num 0 124.3 95.7 0 0 ...
## $ water : num 204 158 187 228 193 ...
## $ superplastic: num 0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
## $ coarseagg : num 972 1081 957 932 1047 ...
## $ fineagg : num 748 796 861 670 697 ...
## $ age : int 28 14 28 28 28 90 7 56 28 28 ...
## $ strength : num 29.9 23.5 29.2 45.9 18.3 ...
We can see the eight input variables and the outcome variable, strength.
The features have very different ranges, which calls for normalization to help the neural network perform at its best.
#Explore the summary of the strength column
summary(concrete$strength)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.33 23.71 34.45 35.82 46.13 82.60
If the data follow a bell-shaped (normal) distribution, we can use the scale() function for z-score standardization. On the other hand, if the data follow a uniform distribution or are severely non-normal, min-max normalization to the 0-1 range may be more appropriate.
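For reference, the standardization route would look like the following sketch (concrete_std is a hypothetical name; we will not use it, since the concrete features are better served by min-max normalization):
concrete_std <- as.data.frame(scale(concrete)) # hypothetical: z-score standardization, mean 0 and sd 1 per column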
We will create a normalization function
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
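We can confirm the function behaves as expected on a simple vector:
normalize(c(1, 2, 3, 4, 5))
## [1] 0.00 0.25 0.50 0.75 1.00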
Now we apply the function to every column of the data frame. Because lapply() returns a list, we use as.data.frame() to convert the result back to a data frame. We will reverse this transformation to the original units of measure later in the project.
concrete_norm <- as.data.frame(lapply(concrete, normalize))
#Inspect summary for the strength
summary(concrete_norm$strength)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2664 0.4001 0.4172 0.5457 1.0000
Now we partition the data: the first 773 rows (75%) for training and the remaining 257 rows (25%) for testing.
concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]
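This split by row position works because, as the book notes, the concrete data is already sorted in random order; otherwise a random split would be safer. A sketch (the seed and the index name train_idx are arbitrary):
set.seed(123) # arbitrary seed for reproducibility
train_idx <- sample(nrow(concrete_norm), 773) # 773 random row indices (~75%)
# concrete_train <- concrete_norm[train_idx, ]
# concrete_test <- concrete_norm[-train_idx, ]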
We will use the neuralnet() function from the neuralnet package.
# By default, neuralnet() fits a single hidden node and uses the logistic activation function
concrete_model <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train)
# Plotting the neural network; rep = "best" plots the repetition with the lowest error and lets knitr render the graph
plot(concrete_model, rep = "best")
We can see the eight input nodes, the single hidden node, and the bias terms, indicated by the nodes labeled 1.
The error at the bottom of the plot is the sum of squared errors (SSE) on the training data.
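The same error (along with the number of training steps) can also be read from the fitted object; a quick sketch, assuming a single training repetition:
concrete_model$result.matrix["error", ] # the SSE reported in the plot
concrete_model$result.matrix["steps", ] # steps taken before convergence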
# compute() returns a list with two components: $neurons, which stores the neurons for each layer, and $net.result, which stores the predicted values
model_results <- compute(concrete_model, concrete_test[1:8])
predicted_strength <- model_results$net.result
Because this is a numeric prediction problem, we cannot use a confusion matrix as in classification; instead, we measure the correlation between the predicted and actual values.
cor(predicted_strength, concrete_test$strength)
## [,1]
## [1,] 0.806423
This value of about 0.806 indicates a fairly strong linear relationship, which implies that our model is doing a fairly good job even with a single hidden node.
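Correlation only captures the strength of the linear relationship; as a complementary check, we could also compute the mean absolute error on the normalized scale (a sketch; the exact value varies from run to run):
mean(abs(predicted_strength - concrete_test$strength)) # MAE on the 0-1 scale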
# We will increase the model's complexity to 5 hidden nodes
concrete_model2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = 5)
#plotting the neural network
plot(concrete_model2, rep = "best")
We notice the SSE dropped from 5.08 to 1.746.
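Note that neuralnet() initializes the weights randomly, so the exact SSE and correlation values will differ slightly between runs; calling set.seed() before training makes a run reproducible (the seed value here is arbitrary):
set.seed(12345) # arbitrary seed; call before neuralnet() to reproduce a run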
Now we evaluate the new model
# Generate predictions for the test set with the larger model
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$strength)
## [,1]
## [1,] 0.9281844
The correlation improved to about 0.928, a considerable improvement over the previous result of 0.806.
The activation function is very important in deep learning. A widely used choice is the rectifier activation function; a node that uses it is known as a rectified linear unit (ReLU). The function returns x when x >= 0 and returns zero otherwise.
Because backpropagation requires a differentiable activation function and the rectifier's derivative is undefined at x = 0, we will use a smooth approximation of the ReLU known as softplus (or SmoothReLU).
To define the softplus function in R:
softplus <- function(x) log(1 + exp(x))
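To see how softplus approximates the ReLU, we can plot it over a small grid of inputs:
x <- seq(-4, 4, 0.1)
plot(x, softplus(x), type = "l", ylab = "softplus(x)") # smooth curve approaching max(0, x)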
We will use the softplus activation function and add a second hidden layer in the network
# stepmax = 1e7 was added because the neural network was not converging before reaching the default maximum number of steps
concrete_model3 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = c(5, 5), act.fct = softplus, stepmax = 1e7)
#plotting the neural network
plot(concrete_model3, rep = "best")
model_results3 <- compute(concrete_model3, concrete_test[1:8])
predicted_strength3 <- model_results3$net.result
cor(predicted_strength3, concrete_test$strength)
## [,1]
## [1,] 0.9177854
Note that because we normalized the data prior to training the model, the predictions are also on a normalized scale from zero to one.
Compare the actual values with the normalized predictions:
strengths <- data.frame(
  actual = concrete$strength[774:1030],
  pred = predicted_strength3
)
head(strengths, n = 3)
## actual pred
## 774 30.14 0.2865025
## 775 44.40 0.4727623
## 776 24.50 0.2796036
# Examine the correlation between the actual values and the normalized predictions
cor(strengths$actual, strengths$pred)
## [1] 0.9177854
We will create an unnormalize() function that reverses the min-max normalization and allows us to convert the normalized predictions back to the original scale.
unnormalize <- function(x) {
  # reverse min-max scaling: multiply by the range, then add back the minimum
  return(x * (max(concrete$strength) - min(concrete$strength)) + min(concrete$strength))
}
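A quick sanity check: the endpoints of the normalized scale should map back to the observed minimum and maximum strengths:
unnormalize(0) # should equal min(concrete$strength), i.e. 2.33
unnormalize(1) # should equal max(concrete$strength), i.e. 82.60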
strengths$pred_new <- unnormalize(strengths$pred)
strengths$error <- strengths$pred_new - strengths$actual
head(strengths, n = 3)
## actual pred pred_new error
## 774 30.14 0.2865025 25.32756 -4.8124443
## 775 44.40 0.4727623 40.27863 -4.1213702
## 776 24.50 0.2796036 24.77378 0.2737810
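With the error column now in the original units, we can summarize the typical size of a prediction error (a sketch; the value depends on the trained weights):
mean(abs(strengths$error)) # mean absolute error in original strength units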
Examine the correlation between the unnormalized predictions and the actual values.
cor(strengths$pred_new, strengths$actual)
## [1] 0.9177854
The correlation is unchanged, as expected: correlation is invariant to linear transformations such as our unnormalization.
Neural networks are a powerful machine learning tool, but they operate as a black box and are more complex than many other regression methods.