Step 1: Collecting Data

The concrete.csv data can be found in the UCI Machine Learning Data Repository (http://archive.ics.uci.edu/ml). Here, the dataset is used for artificial neural network (ANN) modeling.

Step 2: Exploring and preparing the data

The concrete.csv dataset contains 1030 observations and 9 numerical variables. Eight of the nine columns are features describing the ingredients that make up each concrete mixture, all of which are thought to affect the strength of the concrete. The ninth column is the target: a numerical value for the strength of each concrete observation.

concrete <- read.csv("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml11/concrete.csv")
str(concrete)
## 'data.frame':    1030 obs. of  9 variables:
##  $ cement      : num  141 169 250 266 155 ...
##  $ slag        : num  212 42.2 0 114 183.4 ...
##  $ ash         : num  0 124.3 95.7 0 0 ...
##  $ water       : num  204 158 187 228 193 ...
##  $ superplastic: num  0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
##  $ coarseagg   : num  972 1081 957 932 1047 ...
##  $ fineagg     : num  748 796 861 670 697 ...
##  $ age         : int  28 14 28 28 28 90 7 56 28 28 ...
##  $ strength    : num  29.9 23.5 29.2 45.9 18.3 ...

An artificial neural network (ANN) is modeled on a biological neural network, in which one neuron sends a signal to another once a threshold is reached. Here, each of the eight features serves as an input neuron, weighted according to its importance. The weighted signals are combined and passed to an output neuron through an activation function. Because most activation functions respond usefully only to input values within a narrow range around zero, the input values must be standardized or normalized to that range. A z-score standardization can be used for features that exhibit a normal distribution; otherwise, a min-max normalization is used.
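For comparison, the z-score alternative, which would be the choice if the features were normally distributed, can be done with R's built-in scale(); this is only an illustrative sketch and is not used in the analysis below.

# Illustration only: z-score standardization of the eight features
concrete_z <- as.data.frame(scale(concrete[1:8]))
summary(concrete_z$cement)   # centered at mean 0 with sd 1, not a fixed 0-1 range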

Below are histograms for each of the eight features. None of the feature distributions is normal, which confirms that min-max normalization should be applied to the data before modeling.

# Density histogram with an overlaid density curve for each of the eight features
par(mfrow = c(2, 4))
for (feature in names(concrete)[1:8]) {
  hist(concrete[[feature]], prob = TRUE, breaks = 30,
       main = feature, xlab = feature)
  lines(density(concrete[[feature]]))
}
par(mfrow = c(1, 1))
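Visual inspection can be backed up with a formal normality check; a Shapiro-Wilk test per feature is one optional way to do this (an addition to the original workflow, not part of it).

# Optional: Shapiro-Wilk test per feature; p-values near 0 reject normality
sapply(concrete[1:8], function(x) shapiro.test(x)$p.value)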

Below, a min-max normalization function is defined and applied to every column of the concrete data, storing the result in concrete_norm.

# Rescale a numeric vector to the 0-1 range (min-max normalization)
normalize <- function(x) { 
  return((x - min(x)) / (max(x) - min(x)))
}

concrete_norm <- as.data.frame(lapply(concrete, normalize))

A summary() of the strength column confirms that the data has been rescaled to the 0 to 1 range; the original scale is shown for comparison.

summary(concrete_norm$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2664  0.4001  0.4172  0.5457  1.0000
summary(concrete$strength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.33   23.71   34.44   35.82   46.14   82.60
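A quick check across all columns (a small addition for completeness) confirms that every variable now lies between 0 and 1.

# Verify that every column of concrete_norm lies in the 0-1 range
sapply(concrete_norm, range)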

75% of the data (the first 773 rows) is split off as the training set, and the remaining 25% as the test set. This sequential split assumes the rows are already in random order.

concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]
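If the rows were not already shuffled, a randomized split would be safer; a minimal sketch, where the seed and the index variable idx are illustrative choices, and the assignments are left commented out so they do not override the split above:

# Alternative: randomized 75/25 split, in case row order is not random
set.seed(123)
idx <- sample(nrow(concrete_norm), floor(0.75 * nrow(concrete_norm)))
# concrete_train <- concrete_norm[idx, ]
# concrete_test  <- concrete_norm[-idx, ]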

Step 3: Training a model on the data

An ANN model is built from the training set with neuralnet() from the neuralnet package, using all eight features as predictors of concrete strength. In the resulting plot, the number on each line is the weight of that connection and indicates how much the feature contributes to the output neuron. The blue numbers are bias terms, which shift the values at the indicated nodes; each bias node emits a constant 1. This simple one-hidden-neuron ANN can be read much like a multiple regression: the weights act as coefficients for the features next to them, and the blue bias value plays the role of the intercept. The model converged after 4882 steps with a sum of squared errors (SSE) of 5.08.

library(neuralnet)
## Warning: package 'neuralnet' was built under R version 3.3.3
set.seed(12345) 
concrete_model <- neuralnet(formula = strength ~ cement + slag +
                              ash + water + superplastic + 
                              coarseagg + fineagg + age,
                            data = concrete_train)

plot(concrete_model)
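The SSE, step count, and all fitted weights can also be read directly from the model object; result.matrix is part of the value returned by neuralnet().

# Inspect error (SSE), number of steps, and the fitted weights numerically
concrete_model$result.matrix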

Another function, plotnet() from the NeuralNetTools package, depicts the ANN with the magnitude of each weight encoded in the thickness of the corresponding line.

library(NeuralNetTools)
## Warning: package 'NeuralNetTools' was built under R version 3.3.3
par(mar = numeric(4), family = 'serif')
plotnet(concrete_model, alpha = 0.6)

Step 4: Evaluating model performance

Applying the ANN model to the test set with compute() returns an object with two components: 'neurons' stores the neurons of each layer in the network, and 'net.result' stores the predicted values. The correlation between the predicted and actual concrete strength in the test data is then examined; it comes out to about 0.81.

model_results <- compute(concrete_model, concrete_test[1:8])
predicted_strength <- model_results$net.result

cor(predicted_strength, concrete_test$strength)   
##              [,1]
## [1,] 0.8064655576
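Correlation is one view of performance; an error metric such as the RMSE on the normalized scale is another (an optional addition, not part of the original evaluation).

# Optional: root-mean-square error of the predictions on the normalized scale
sqrt(mean((predicted_strength - concrete_test$strength)^2))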

However, because the model was fit on normalized data, its predictions fall on the 0 to 1 scale and are not directly comparable to the actual concrete strengths in the test set. They therefore need to be 'un-normalized' by inverting the min-max normalization formula mentioned earlier.

head(predicted_strength)
##             [,1]
## 774 0.3258991537
## 775 0.4677425372
## 776 0.2370268181
## 777 0.6718811029
## 778 0.4663428766
## 779 0.4685272270

The minimum and maximum of the original strength values are stored and used to construct the unnormalize() function, which is then applied to the predicted strengths produced by the ANN model. The predictions are thereby converted back to the original scale and can be compared with the actual strengths in the test set.

strength_min <- min(concrete$strength)
strength_max <- max(concrete$strength)

# Invert min-max normalization: map values in 0-1 back to the min-max scale
unnormalize <- function(x, min, max) { 
  return( (max - min)*x + min )
}

strength_pred <- unnormalize(predicted_strength, strength_min, strength_max)
head(strength_pred, n = 10)
##            [,1]
## 774 28.48992507
## 775 39.87569346
## 776 21.35614269
## 777 56.26189613
## 778 39.76334271
## 779 39.93868051
## 780 40.33043037
## 781 49.54027647
## 782 28.18084528
## 783 20.51360463
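To finish the comparison on the original scale, the actual test-set strengths (rows 774 to 1030 of the raw data) can be set alongside the predictions; a brief sketch, with actual_strength as an illustrative name. Note that because unnormalize() is a linear transformation, rescaling leaves the correlation unchanged.

# Compare un-normalized predictions with the actual strengths
actual_strength <- concrete$strength[774:1030]
head(data.frame(predicted = strength_pred, actual = actual_strength))
cor(strength_pred, actual_strength)   # identical to the normalized-scale correlation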

Step 5: Improving Model Performance

One way to improve the ANN model is to add more hidden nodes. Hidden nodes sit in the hidden layers of the network; rather than corresponding directly to a single input feature, each one connects to the input nodes or to other hidden nodes. The appropriate number depends on the number of input nodes, the amount of training data, the noisiness of the data, and the complexity of the learning task; in general, more hidden nodes give the model more capacity to learn complex problems. The improved model uses five fully connected hidden nodes, meaning every node in one layer is connected to every node in the next layer, which greatly increases the number of connections. Training took 86849 steps, compared with 4882 for the previous model, and the SSE fell from 5.08 to 1.63, roughly three times smaller than with a single hidden node.

Calling plot() on the model object again shows the network with the numeric weight on each connection, but for a model this complicated the individual values are hard to read. A better visual presentation is plotnet(), which conveys the weight from each feature node to each hidden node through line thickness: a black line indicates a positive weight, while a white/grey line indicates a negative weight.

set.seed(12345) 
concrete_model2 <- neuralnet(strength ~ cement + slag +
                               ash + water + superplastic + 
                               coarseagg + fineagg + age,
                             data = concrete_train, hidden = 5, 
                             act.fct = "logistic")

plot(concrete_model2)
par(mar = numeric(4), family = 'serif')
plotnet(concrete_model2, alpha = 0.6)
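The hidden argument also accepts a vector, one entry per hidden layer, so deeper networks can be tried in the same way; a sketch of a hypothetical two-layer variant, not trained or evaluated in this report:

# Hypothetical variant: two hidden layers with 5 and 3 nodes
set.seed(12345)
concrete_model3 <- neuralnet(strength ~ cement + slag + ash + water +
                               superplastic + coarseagg + fineagg + age,
                             data = concrete_train, hidden = c(5, 3))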

The correlation between the predicted and actual concrete strength rises to about 0.92, roughly 0.11 higher than the previous model's 0.81.

model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$strength) 
##              [,1]
## [1,] 0.9244533426
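As before, the new predictions sit on the normalized scale, so the same unnormalize() step converts them back to the original units for reporting.

# Convert the improved model's predictions back to the original strength scale
strength_pred2 <- unnormalize(predicted_strength2, strength_min, strength_max)
head(strength_pred2)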

Conclusion: An artificial neural network (ANN) is an algorithm built on the idea of one node sending a signal to the next once the signal passes a certain threshold. In a simple version, each feature value is multiplied by its weight, the weighted inputs are summed, and the sum is passed through an activation function to the next node or the output node. Most activation function types respond only to a narrow range of x values around zero, so feature values must be normalized to that range before modeling, and an un-normalization step is required to obtain useful predictions after a model has been selected and applied. One improvement to an ANN is to increase the number of hidden nodes, creating more connections for solving complex data; another is to find a more suitable activation function for the data by trial and error. A better model usually yields a higher correlation between the predicted and actual values of the target variable.
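As a sketch of that trial-and-error idea, neuralnet() also accepts act.fct = "tanh" in place of the default logistic function; this variant is illustrative only and was not evaluated above.

# Hypothetical variant: tanh activation instead of the default logistic
set.seed(12345)
concrete_model_tanh <- neuralnet(strength ~ cement + slag + ash + water +
                                   superplastic + coarseagg + fineagg + age,
                                 data = concrete_train, hidden = 5,
                                 act.fct = "tanh")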