ABSTRACT

To assess the performance of building materials in engineering, their properties can be estimated accurately with neural network models. Setting aside the interaction effects between the components of a process, an artificial neural network (ANN) can reliably predict the strength of a material from the input ingredients at hand. The data used to fit the ANN can be downloaded from http://archive.ics.uci.edu/ml. Details of this approach can be found in the book “Machine Learning with R” by Brett Lantz.

STEP 1: Load data

setwd("C:/bassel/MACHINE LEARNING")
getwd()
## [1] "C:/bassel/MACHINE LEARNING"
concrete <- read.csv("concrete.csv", sep = ",", header = TRUE)
str(concrete)
## 'data.frame':    1030 obs. of  9 variables:
##  $ cement      : num  141 169 250 266 155 ...
##  $ slag        : num  212 42.2 0 114 183.4 ...
##  $ ash         : num  0 124.3 95.7 0 0 ...
##  $ water       : num  204 158 187 228 193 ...
##  $ superplastic: num  0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
##  $ coarseagg   : num  972 1081 957 932 1047 ...
##  $ fineagg     : num  748 796 861 670 697 ...
##  $ age         : int  28 14 28 28 28 90 7 56 28 28 ...
##  $ strength    : num  29.9 23.5 29.2 45.9 18.3 ...

STEP 2: Explore and prepare the data

Rescale the data to the range 0 to 1 using a user-defined min-max normalization function
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
Apply the function to every column of the data frame
concrete_norm <- as.data.frame(lapply(concrete, normalize))
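To make the rescaling concrete, here is a minimal sketch that applies the same min-max formula to a handful of values and then maps them back to the original scale. The `denormalize` helper is a hypothetical addition for illustration and is not part of the original analysis; the input values are the first five strength values shown by str() above.

```r
# Min-max normalization, same formula as the normalize() function above.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical helper (not in the original): map normalized values back
# to the original scale, given the original minimum and maximum.
denormalize <- function(x_norm, x_min, x_max) {
  x_norm * (x_max - x_min) + x_min
}

# First five strength values as displayed by str(concrete).
strength <- c(29.9, 23.5, 29.2, 45.9, 18.3)

strength_norm <- normalize(strength)
print(range(strength_norm))   # smallest value maps to 0, largest to 1

# Round trip: the restored values should match the originals.
restored <- denormalize(strength_norm, min(strength), max(strength))
print(all.equal(restored, strength))
```

Note that this formula divides by max(x) - min(x), so a column in which every value is identical would produce NaN and would need separate handling.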
Display the distribution of the target variable
hist(concrete_norm$strength, col = "blue")

Verify that the normalized target variable now ranges from 0 to 1 by comparing its summary with that of the original variable
summary(concrete_norm$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2664  0.4001  0.4172  0.5457  1.0000
summary(concrete$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.33   23.71   34.44   35.82   46.14   82.60
Explore correlations between variables
library(psych)
pairs.panels(concrete_norm)

The strength of the material appears to be moderately correlated with the cement variable only.

Split the data into training (75%) and test (25%) sets

concrete_train<-concrete_norm[1:773,]
concrete_test<-concrete_norm[774:1030,]
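The split above takes the rows in their stored order, which gives representative sets only if the rows are already in random order. When that cannot be assumed, a random split is a common alternative. The sketch below illustrates the idea on a hypothetical stand-in data frame (`toy_data`) of the same size; the seed value and the column names are arbitrary assumptions.

```r
set.seed(12345)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical stand-in for concrete_norm: 1030 rows, two made-up columns.
toy_data <- data.frame(x = runif(1030), y = runif(1030))

n_train   <- 773                                # same training size as above (about 75% of 1030)
train_idx <- sample(nrow(toy_data), n_train)    # random row indices, drawn without replacement

toy_train <- toy_data[train_idx, ]              # rows selected for training
toy_test  <- toy_data[-train_idx, ]             # all remaining rows form the test set

print(c(train = nrow(toy_train), test = nrow(toy_test)))
```

With the real data, `toy_data` would be replaced by `concrete_norm`; the resulting model and scores would then differ from those reported in this document.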

STEP 3: Train the model on the data

Use a multilayer feedforward neural network (a multilayer perceptron, MLP) to model the relationship between the ingredients used in concrete and the strength of the finished product. By default, neuralnet() builds a model with a single hidden node.
# Building the model
library(neuralnet)
set.seed(12345) # to guarantee repeatable results
concrete_model<-neuralnet(formula = strength ~ cement + slag +
                              ash + water + superplastic + 
                              coarseagg + fineagg + age,
                            data = concrete_train)
Visualize the network topology (multilayer and single hidden node)
plot(concrete_model, rep="best")

The plot above shows a simple ANN with a single hidden node. The weights between the input nodes and the hidden node play a role similar to regression coefficients, which makes this model closely resemble a multiple regression model.
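The resemblance to regression can be seen by writing out the forward pass of such a network by hand. The sketch below assumes a logistic activation in the hidden node and a linear output node, which are, to our knowledge, the defaults of neuralnet(). The function name and all weight and input values are hypothetical and are not taken from the fitted model.

```r
# Logistic (sigmoid) activation.
logistic <- function(z) 1 / (1 + exp(-z))

# Forward pass of a network with several inputs, one hidden node
# and a linear output node. Hypothetical helper for illustration.
predict_single_node <- function(x, w_in, b_hidden, w_out, b_out) {
  z <- b_hidden + sum(w_in * x)   # weighted sum: same form as a linear regression predictor
  h <- logistic(z)                # activation of the single hidden node
  b_out + w_out * h               # linear output node
}

# One made-up normalized observation with 8 inputs, and made-up weights.
x    <- c(0.5, 0.2, 0.0, 0.4, 0.1, 0.6, 0.5, 0.07)
w_in <- c(4.0, 2.5, 1.0, -3.0, 0.5, 0.2, 0.1, 6.0)

pred <- predict_single_node(x, w_in, b_hidden = -2, w_out = 1.2, b_out = -0.1)
print(pred)
```

The only difference from a linear regression is the logistic transformation applied to the weighted sum before the output is computed.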

Alternative plot

library(NeuralNetTools)
par(mar = numeric(4), family = 'serif')
plotnet(concrete_model, alpha=0.6)

STEP 4: Evaluate the model performance

A correlation close to 1 between the predicted and the true strength values indicates that the model's predictions track the target variable closely.
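The evaluation code for the first model is not shown in this step. As a hedged sketch of how such a score could be obtained, the hypothetical helper below (`evaluate_predictions`, not part of the original analysis) computes the correlation and the mean absolute error between predictions and true values. It is illustrated on made-up numbers; the commented lines indicate how it might be applied to the objects of this document, assuming they exist.

```r
# Hypothetical helper (not from the original): summarize how well
# predictions match the true values.
evaluate_predictions <- function(predicted, actual) {
  predicted <- as.numeric(predicted)   # also flattens a one-column matrix
  actual    <- as.numeric(actual)
  c(correlation = cor(predicted, actual),
    mae         = mean(abs(predicted - actual)))
}

# Made-up normalized values, for illustration only.
actual    <- c(0.10, 0.35, 0.40, 0.62, 0.90)
predicted <- c(0.15, 0.30, 0.45, 0.55, 0.80)

scores <- evaluate_predictions(predicted, actual)
print(scores)

# With the objects of this document, the call might look like this
# (assumes concrete_model and concrete_test exist and neuralnet is loaded):
# model_results <- compute(concrete_model, concrete_test[1:8])
# evaluate_predictions(model_results$net.result, concrete_test$strength)
```

Correlation measures only how closely the predictions follow the true values linearly, so reporting an error measure alongside it gives a fuller picture.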

STEP 5: Improve the model performance

A more complex topology may perform better. Increase the number of hidden nodes to 5 (parameter hidden) and see whether the model improves.
concrete_model_2<-neuralnet(formula = strength ~ cement + slag +
                              ash + water + superplastic + 
                              coarseagg + fineagg + age,
                            data = concrete_train, hidden = 5)
plot(concrete_model_2, rep="best")

Compare the predicted values with the true values for model 2
model_results_2<-compute(concrete_model_2, concrete_test[1:8])
predicted_strength_2<-model_results_2$net.result
cor(predicted_strength_2, concrete_test$strength)
##              [,1]
## [1,] 0.9342537338
The correlation (0.93) is much better than in the previous model with only a single hidden node. There is also a large increase in the number of connections. The prediction error has been reduced (from 5 to 1.6). The model's performance has thus improved noticeably.
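The growth in the number of connections can be checked with simple arithmetic. The sketch below counts the weights of a fully connected network with one hidden layer, assuming one bias term per hidden node and per output node; the function name `count_weights` is hypothetical.

```r
# Number of weights (connections, including bias terms) in a fully
# connected network with one hidden layer.
count_weights <- function(n_inputs, n_hidden, n_outputs = 1) {
  input_to_hidden  <- (n_inputs + 1) * n_hidden    # +1 for the bias into each hidden node
  hidden_to_output <- (n_hidden + 1) * n_outputs   # +1 for the bias into each output node
  input_to_hidden + hidden_to_output
}

single_node <- count_weights(8, 1)   # 8 inputs, 1 hidden node: 11 weights
five_nodes  <- count_weights(8, 5)   # 8 inputs, 5 hidden nodes: 51 weights

print(c(single_node = single_node, five_nodes = five_nodes))
```

Under these assumptions, moving from one to five hidden nodes raises the number of weights to estimate from 11 to 51.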