library('neuralnet')
concrete <- read.csv("https://s3.us-east-2.amazonaws.com/artificium.us/datasets/concrete.csv")
head(concrete)
summary(concrete)
cement slag ash water superplastic coarseagg fineagg age
Min. :102.0 Min. : 0.0 Min. : 0.00 Min. :121.8 Min. : 0.000 Min. : 801.0 Min. :594.0 Min. : 1.00
1st Qu.:192.4 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.:164.9 1st Qu.: 0.000 1st Qu.: 932.0 1st Qu.:731.0 1st Qu.: 7.00
Median :272.9 Median : 22.0 Median : 0.00 Median :185.0 Median : 6.400 Median : 968.0 Median :779.5 Median : 28.00
Mean :281.2 Mean : 73.9 Mean : 54.19 Mean :181.6 Mean : 6.205 Mean : 972.9 Mean :773.6 Mean : 45.66
3rd Qu.:350.0 3rd Qu.:142.9 3rd Qu.:118.30 3rd Qu.:192.0 3rd Qu.:10.200 3rd Qu.:1029.4 3rd Qu.:824.0 3rd Qu.: 56.00
Max. :540.0 Max. :359.4 Max. :200.10 Max. :247.0 Max. :32.200 Max. :1145.0 Max. :992.6 Max. :365.00
strength
Min. : 2.33
1st Qu.:23.71
Median :34.45
Mean :35.82
3rd Qu.:46.13
Max. :82.60
str(concrete)
'data.frame': 1030 obs. of 9 variables:
$ cement : num 540 540 332 332 199 ...
$ slag : num 0 0 142 142 132 ...
$ ash : num 0 0 0 0 0 0 0 0 0 0 ...
$ water : num 162 162 228 228 192 228 228 228 228 228 ...
$ superplastic: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
$ coarseagg : num 1040 1055 932 932 978 ...
$ fineagg : num 676 676 594 594 826 ...
$ age : int 28 28 270 365 360 90 365 28 28 28 ...
$ strength : num 80 61.9 40.3 41 44.3 ...
table(is.na(concrete))
FALSE
9270
normalize <- function(x) {
a <- x - min(x)
b <- max(x) - min(x)
return(a / b)
}
The goal of the code chunk below is to rescale the data, because the variables have very different ranges: ash, for example, runs from 0 to about 200, while superplastic runs from 0 to about 32. We use the function above to normalize the data, that is, to bring every variable into the range 0 to 1.
concrete_norm <- as.data.frame(lapply(concrete, normalize))
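As a small worked example of what the normalization does, the sketch below applies the same min-max formula to a short made-up vector. The input values are hypothetical and chosen only so the arithmetic is easy to follow; normalize repeats the definition given above.

```r
# Min-max normalization, same definition as above.
normalize <- function(x) {
  a <- x - min(x)
  b <- max(x) - min(x)
  return(a / b)
}

# Made-up values: the minimum maps to 0, the maximum to 1,
# and 50 sits a quarter of the way between them.
demo <- normalize(c(0, 50, 200))
print(demo)

# Caveat: a constant column has max(x) - min(x) equal to 0,
# so the division is 0 / 0 and every result is NaN.
constant <- normalize(c(5, 5, 5))
print(constant)
```

None of the concrete variables is constant, so the caveat does not arise here, but it is worth checking for on other datasets.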
Compare the age variable before and after normalization:
summary(concrete$age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 7.00 28.00 45.66 56.00 365.00
summary(concrete_norm$age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.01648 0.07418 0.12270 0.15110 1.00000
Split the data into a training set (the first 773 rows, about 75%) and a test set (the remaining 257 rows):
concrete_train <- concrete_norm[1:773,]
concrete_test <- concrete_norm[774:1030, ]
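Splitting by row position gives a fair sample only if the rows are already in random order, which is an assumption about this dataset rather than something verified here. A minimal sketch of a random 75/25 split follows. It uses a made-up data frame named toy so that it runs on its own; the seed value and the object names are illustrative.

```r
# Fix the seed so the random split is reproducible.
set.seed(12345)

# Made-up stand-in for a real dataset.
toy <- data.frame(x = 1:20, y = (1:20)^2)

# Draw 75% of the row indices at random, without replacement.
train_idx <- sample(nrow(toy), size = round(0.75 * nrow(toy)))

toy_train <- toy[train_idx, ]   # sampled rows
toy_test  <- toy[-train_idx, ]  # all remaining rows

print(nrow(toy_train))
print(nrow(toy_test))
```

The same pattern would apply to concrete_norm by replacing toy with that data frame.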
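Install the neuralnet package if it is not already installed, then build the model. With the default settings, the network has a single hidden node.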
m <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age,
data = concrete_train)
Evaluate the model performance
model_resuts <- compute(m, concrete_test[1:8])
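The compute() function returns a list with two components: neurons, which stores the neurons for each layer of the network, and net.result, which stores the predicted values.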
summary(model_resuts)
Length Class Mode
neurons 2 -none- list
net.result 257 -none- numeric
Because this is a numeric prediction problem and not a classification problem, we cannot use a confusion matrix to examine model accuracy. Instead, we measure the correlation between the predicted and the actual strength values.
predicted_strength <- model_resuts$net.result
cor(predicted_strength, concrete_test$strength)
[,1]
[1,] 0.7204596
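Correlations close to 1 indicate a strong linear relationship between two variables. A value of about 0.72 therefore indicates a fairly strong relationship between the predicted and actual strength.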
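Correlation shows how closely the predictions track the actual values, but not how large the errors are. One complementary check, sketched here under assumptions, is to map normalized values back to the original scale and compute the mean absolute error. The helper names unnormalize and mae and the sample vectors are hypothetical; the minimum (2.33) and maximum (82.60) of strength come from the summary output above.

```r
# Hypothetical helper: reverse min-max normalization.
unnormalize <- function(x, min_orig, max_orig) {
  x * (max_orig - min_orig) + min_orig
}

# Hypothetical helper: mean absolute error.
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Range of strength in the original data (from summary(concrete)).
strength_min <- 2.33
strength_max <- 82.60

# Made-up normalized values standing in for real predictions and targets.
pred_norm   <- c(0.10, 0.50, 0.90)
actual_norm <- c(0.20, 0.45, 1.00)

pred_orig   <- unnormalize(pred_norm, strength_min, strength_max)
actual_orig <- unnormalize(actual_norm, strength_min, strength_max)

# Average error, expressed in the original units of strength.
demo_mae <- mae(actual_orig, pred_orig)
print(demo_mae)
```

With the real objects, the same calls would take model_resuts$net.result and concrete_test$strength as the normalized inputs.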
Improve the model performance
m2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age,
hidden = 5,
data = concrete_train)
Compare the two models:
Model 1: error 5.6671, hidden nodes: 1, steps: 2349
Model 2: error 1.5542, hidden nodes: 5, steps: 56406
Hyperparameters like ‘hidden’ are design choices, not facts. You try a value, evaluate the result, and justify the choice with evidence.
model_resuts_2 <- compute(m2, concrete_test[1:8])
summary(model_resuts_2)
Length Class Mode
neurons 2 -none- list
net.result 257 -none- numeric
predicted_strength_2 <- model_resuts_2$net.result
cor(predicted_strength_2, concrete_test$strength)
[,1]
[1,] 0.8237139
The choice of activation function is important in deep learning. A common choice is the rectifier activation function. A rectified linear unit (ReLU) is a node in a neural network that uses the rectifier activation function.
As depicted in the following figure, the rectifier activation function is defined such that it returns x if x is at least zero, and zero otherwise. The significance of this function is that it is nonlinear yet has simple mathematical properties that make it both computationally inexpensive and highly efficient for gradient descent. Unfortunately, its derivative is undefined at x = 0, so the rectifier cannot be used with the neuralnet() function.
Instead, we can use a smooth approximation of the ReLU known as softplus or SmoothReLU, an activation function defined as log(1 + e^x). As shown in the following figure, the softplus function is nearly zero for x less than zero and approximately x when x is greater than zero:
#softplus <- function(x) {
#log(1 + exp(x))
#}
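To make the comparison between the two functions concrete, the following sketch evaluates both at a few points. The name relu and the sample inputs are illustrative assumptions; softplus repeats the definition used in this document.

```r
# Rectifier: returns x when x is at least zero, and zero otherwise.
relu <- function(x) {
  pmax(0, x)
}

# Softplus: a smooth approximation of the rectifier.
softplus <- function(x) {
  log(1 + exp(x))
}

# Evaluate both functions side by side on a few sample inputs.
x <- c(-5, -1, 0, 1, 5)
comparison <- data.frame(x = x, relu = relu(x), softplus = softplus(x))
print(round(comparison, 4))
```

At x = 0 softplus returns log(2), about 0.693, and the gap between the two functions shrinks as x moves away from zero in either direction.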
Add the softplus activation function to neuralnet() through the act.fct parameter. Let's also add a second hidden layer of 5 nodes.
softplus <- function(x) {
log(1 + exp(x))
}
set.seed(12345)
m3 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age,
hidden = c(5,5),
data = concrete_train,
act.fct = softplus,
stepmax = 1e6
)
plot(m3)
Compare the three models:
Model 1: error 5.6671, hidden nodes: 1, activation function: default, steps: 2349
Model 2: error 1.5542, hidden nodes: 5, activation function: default, steps: 56406
Model 3: error 1.2606, hidden nodes: 5 + 5, activation function: softplus, steps: 467665
model_resuts_3 <- compute(m3, concrete_test[1:8])
summary(model_resuts_3)
Length Class Mode
neurons 3 -none- list
net.result 257 -none- numeric
predicted_strength_3 <- model_resuts_3$net.result
cor(predicted_strength_3, concrete_test$strength)
[,1]
[1,] 0.773088
Model 3 has a lower training error than Model 2 (1.2606 versus 1.5542) but a lower test correlation (0.773 versus 0.824). This is a case of overfitting.
**** When training keeps improving but testing gets worse, you have crossed the overfitting line. ****
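The rule above can be checked mechanically. The sketch below tabulates the training errors and test correlations reported in this document and flags any model whose training error fell while its test correlation also fell. The column names and the overfit_signal flag are hypothetical choices for illustration, not part of the neuralnet package.

```r
# Training error and test correlation as reported above for each model.
results <- data.frame(
  model       = c("m", "m2", "m3"),
  train_error = c(5.6671, 1.5542, 1.2606),
  test_cor    = c(0.7204596, 0.8237139, 0.773088)
)

# diff() gives the change from each model to the next one.
# The first model has no predecessor, so it is never flagged.
train_improved <- c(FALSE, diff(results$train_error) < 0)
test_worsened  <- c(FALSE, diff(results$test_cor) < 0)

# Flag models where the training fit improved but the test fit got worse.
results$overfit_signal <- train_improved & test_worsened
print(results)
```

Only m3 is flagged: its training error is lower than that of m2, but its test correlation is lower as well.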