In this analysis, we will estimate the performance of building materials. Reliable strength estimates are important for developing the safety guidelines that govern the materials used in the construction of buildings, bridges, and roadways. An ANN model that could predict concrete strength from a listing of the composition of the input materials could lead to safer construction practices.
For this ANN analysis, we will use the concrete data donated to the UCI Machine Learning Data Repository (http://archive.ics.uci.edu/ml). We will attempt to replicate Lantz's analysis using a simple neural network model in R.
The concrete dataset contains 1,030 examples of concrete with eight features describing the components used in the mixture. According to Lantz, these features are thought to be related to the final compressive strength, and they include the amount (in kilograms per cubic meter) of cement, slag, ash, water, superplasticizer, coarse aggregate, and fine aggregate used in the product, in addition to the aging time (measured in days).
concrete <- read.csv("concrete.csv")
str(concrete)
## 'data.frame': 1030 obs. of 9 variables:
## $ cement : num 141 169 250 266 155 ...
## $ slag : num 212 42.2 0 114 183.4 ...
## $ ash : num 0 124.3 95.7 0 0 ...
## $ water : num 204 158 187 228 193 ...
## $ superplastic: num 0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
## $ coarseagg : num 972 1081 957 932 1047 ...
## $ fineagg : num 748 796 861 670 697 ...
## $ age : int 28 14 28 28 28 90 7 56 28 28 ...
## $ strength : num 29.9 23.5 29.2 45.9 18.3 ...
The nine variables in the data frame correspond to the eight features and one outcome we expected, although a problem has become apparent: neural networks work best when the input data are scaled to a narrow range around zero, and here we see values ranging anywhere from zero up to over a thousand.
Typically, the solution to this problem is to rescale the data with a normalizing or standardization function. If the data follow a bell-shaped curve (a normal distribution), then it may make sense to use standardization via R’s built-in scale() function. On the other hand, if the data follow a uniform distribution or are severely nonnormal, then normalization to a 0-1 range may be more appropriate. In this case, we’ll use the latter.
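For reference, the standardization alternative mentioned above would look something like the following; this is only an illustrative sketch and is not used in the rest of this analysis:
# Illustration only: center each column to mean 0 and rescale to standard
# deviation 1 using R's built-in scale() function (not used further here)
concrete_std <- as.data.frame(scale(concrete))
summary(concrete_std$strength)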
So, we’ll go ahead and define our own normalize() function as:
normalize <- function(x) {
return((x - min(x)) / (max(x) - min(x)))
}
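As a quick sanity check (an illustrative example rather than part of the original analysis), the function should map the minimum of a vector to 0 and its maximum to 1:
# A vector of 1 to 5 should normalize to 0, 0.25, 0.5, 0.75, 1
normalize(c(1, 2, 3, 4, 5))
## [1] 0.00 0.25 0.50 0.75 1.00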
Let’s apply it to every column in the concrete data frame using the lapply() function and verify the result.
concrete_norm <- as.data.frame(lapply(concrete, normalize))
summary(concrete_norm$strength)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2664 0.4001 0.4172 0.5457 1.0000
As you can see below, the original minimum and maximum values were 2.33 and 82.60:
summary(concrete$strength)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.33 23.71 34.44 35.82 46.14 82.60
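We can also confirm that every normalized column now spans exactly the 0-1 range (a quick additional check, not part of the original write-up):
# Minimum and maximum of each normalized column should be 0 and 1
sapply(concrete_norm, range)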
We will partition the data into a training set with 75% of the examples and a testing set with 25%.
concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]
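Taking the first 773 rows as the training set is a simple positional split that assumes the rows are not ordered in any meaningful way. If that assumption were in doubt, a random partition could be used instead; a minimal sketch with a hypothetical seed, producing alternative objects rather than the ones used below:
# Hypothetical random 75/25 split as an alternative to the positional split above
set.seed(123)
train_idx <- sample(nrow(concrete_norm), size = round(0.75 * nrow(concrete_norm)))
concrete_train_alt <- concrete_norm[train_idx, ]
concrete_test_alt  <- concrete_norm[-train_idx, ]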
Let’s begin by loading the neuralnet package and training the simplest multilayer feedforward network, with only a single hidden node, on the training set:
#install.packages("neuralnet")
library(neuralnet)
library(grid)
library(MASS)
concrete_model <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train)
The following will produce a plot visualizing the network topology, using the plot() function on the resulting model object.
plot(concrete_model)
# alternative plot
#install.packages("NeuralNetTools")
library(NeuralNetTools)
# plotnet
par(mar = numeric(4), family = 'serif')
plotnet(concrete_model, alpha = 0.6)
As you can see above in this simple model, there is one input node for each of the eight features, followed by a single hidden node and a single output node that predicts the concrete strength. The weights for each of the connections are displayed. The bias terms are indicated by the nodes labeled with the number 1. These bias terms are numeric constants that allow the value at the indicated nodes to be shifted upward or downward, much like the intercept in a linear equation.
The bottom of the plot displays the number of training steps and the Sum of Squared Errors (SSE), the sum of the squared differences between the predicted and actual values. In our model, we have an SSE of 5.077771 and 4,293 training steps.
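These figures can also be read programmatically from the fitted model; a minimal sketch, assuming the default neuralnet settings used above (the exact error and step count will vary from run to run because the starting weights are random):
# The result.matrix stores the final error, the stopping threshold reached,
# the number of training steps, and the fitted weights
concrete_model$result.matrix["error", ]
concrete_model$result.matrix["steps", ]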
A lower SSE implies better predictive performance. This is helpful for estimating the model’s performance on the training data, but tells us little about how it will perform on unseen data.
The network topology diagram gives us a peek into the black box of the ANN, but it doesn’t provide much information about how well the model fits future data. To generate predictions on the test dataset, we can use the compute() function as follows:
model_results <- compute(concrete_model, concrete_test[1:8])
Note: The compute() function works a bit differently from the predict() functions we’ve used so far. It returns a list with two components: $neurons, which stores the neurons for each layer in the network, and $net.result, which stores the predicted values. Let’s use $net.result.
predicted_strength <- model_results$net.result
Since this is a numeric prediction problem rather than a classification problem, we cannot use a confusion matrix to examine model accuracy. Instead, we must measure the correlation between our predicted concrete strength and the true value. This provides insight into the strength of the linear association between the two variables. Thus, we will use the cor() function to obtain the correlation between the two numeric vectors.
cor(predicted_strength, concrete_test$strength)
## [,1]
## [1,] 0.8063798108
As a refresher, a correlation close to 1 indicates a strong linear relationship between two variables. Therefore, the correlation here of about 0.81 indicates a fairly strong relationship. This implies that our model is doing a fairly good job, even with only a single hidden node.
Given that we only used one hidden node, it is likely that we can improve the performance of our model. Let’s improve on this next.
As networks with more complex topologies are capable of learning more difficult concepts, let’s see what happens when we increase the number of hidden nodes to five. We use the neuralnet() function as before, but add the hidden = 5 parameter:
concrete_model2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = 5)
Let’s plot the network again to see if there is any drastic increase in the number of connections.
plot(concrete_model2)
#alternative plot using plotnet
par(mar = numeric(4), family = 'serif')
plotnet(concrete_model2, alpha = 0.6)
As you can see from our newly plotted model, we have a much reduced SSE of 1.692125, compared to 5.077771 from the previous model. We also have an increased number of training steps, 12,932 compared to 4,293 from the previous model, along with a few more bias terms, which are again indicated by the nodes labeled with the number 1. This should come as no surprise given how complex the model has become. A more complex network takes more iterations to find the optimal weights.
Let’s continue with our predictions and correlations.
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$strength)
## [,1]
## [1,] 0.9309175626
As you can see from above, the correlation is now close to 1, at about 0.93, which is a drastic improvement over our earlier correlation of about 0.81.
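One final note: because the model was trained on normalized strength values, the predictions above are also on the 0-1 scale. If predictions in the original strength units are needed, the min-max normalization can be inverted; a minimal sketch using a hypothetical unnormalize() helper based on the original strength column:
# Hypothetical helper to undo the min-max normalization applied earlier
unnormalize <- function(x, orig) {
  return(x * (max(orig) - min(orig)) + min(orig))
}
predicted_strength2_orig <- unnormalize(predicted_strength2, concrete$strength)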
Lantz, Brett. Machine Learning with R. 2nd ed. Birmingham: Packt Publishing Ltd, 2015. Print.