Set the working directory and load the data.
concrete <- read.csv("_Data.csv")
str(concrete)
'data.frame': 1030 obs. of 9 variables:
$ Cement..component.1..kg.in.a.m.3.mixture. : num 540 540 332 332 199 ...
$ Blast.Furnace.Slag..component.2..kg.in.a.m.3.mixture.: num 0 0 142 142 132 ...
$ Fly.Ash..component.3..kg.in.a.m.3.mixture. : num 0 0 0 0 0 0 0 0 0 0 ...
$ Water...component.4..kg.in.a.m.3.mixture. : num 162 162 228 228 192 228 228 228 228 228 ...
$ Superplasticizer..component.5..kg.in.a.m.3.mixture. : num 2.5 2.5 0 0 0 0 0 0 0 0 ...
$ Coarse.Aggregate...component.6..kg.in.a.m.3.mixture. : num 1040 1055 932 932 978 ...
$ Fine.Aggregate..component.7..kg.in.a.m.3.mixture. : num 676 676 594 594 826 ...
$ Age..day. : num 28 28 270 365 360 90 365 28 28 28 ...
$ Concrete.compressive.strength.MPa..megapascals.. : num 80 61.9 40.3 41 44.3 ...
Rename the columns so that they are easier to handle.
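The renaming code itself does not appear here. A minimal sketch of that step, assuming the short names used by the model formula later in this walkthrough (cement, slag, ash, water, superplastic, coarseagg, fineagg, age, strength) are assigned in the original column order. The small data frame `concrete_demo` is a made-up stand-in for the real file.

```r
# Stand-in data frame with 9 placeholder columns (illustrative values only,
# not the real data set).
concrete_demo <- as.data.frame(matrix(1:18, nrow = 2, ncol = 9))

# Short names in the same order as the columns listed by str() above.
# Assumption: these are the names the model formula uses later in the text.
short_names <- c("cement", "slag", "ash", "water", "superplastic",
                 "coarseagg", "fineagg", "age", "strength")

# Guard against a mismatch between the number of names and columns.
stopifnot(length(short_names) == ncol(concrete_demo))
names(concrete_demo) <- short_names

names(concrete_demo)
```

On the real data, the same assignment would presumably be `names(concrete) <- short_names`.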
summary(concrete$strength)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.33 23.71 34.45 35.82 46.13 82.60
We need to normalise the data. If the data followed a normal (bell-shaped) distribution, we could standardise it with R's scale() function. Because the data here may be roughly uniform or severely non-normal, we will instead write our own min-max function that rescales each variable to the 0-1 range.
normalise <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
After executing this code, we can apply the normalisation function to every column using lapply().
concrete_norm <- as.data.frame(lapply(concrete, normalise))
# To confirm that the normalisation worked, check that the minimum and maximum strength are now 0 and 1 respectively
summary(concrete_norm$strength)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2664 0.4001 0.4172 0.5457 1.0000
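As a quick check of what min-max normalisation does, here is a small sketch on made-up values, with scale() shown for contrast. The vector `x` is purely illustrative.

```r
# Min-max normalisation, as defined in the text.
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

x <- c(10, 20, 30, 50)   # made-up values for illustration

normalise(x)             # 0.00 0.25 0.50 1.00

# For contrast, scale() standardises to mean 0 and standard deviation 1,
# so its output is not confined to the 0-1 range.
z <- as.numeric(scale(x))
range(z)
```

Note that a constant column would make the denominator zero and produce NaN, so this function assumes every column contains at least two distinct values.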
We will divide the dataset into training and test sets.
library(caTools)
set.seed(123)
split <- sample.split(concrete_norm$strength, SplitRatio = 0.75)
concrete_train <- subset(concrete_norm, split == TRUE)
concrete_test <- subset(concrete_norm, split == FALSE)
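For readers without caTools, a random 75/25 split can also be sketched in base R. This is an alternative technique, not what the code above does: sample.split() is documented as preserving the relative ratios of labels in the outcome, whereas this version simply samples row indices uniformly. The data frame `demo` is made up.

```r
set.seed(123)  # same seed as in the text, for reproducibility

# Made-up stand-in for concrete_norm: 100 rows, two columns.
demo <- data.frame(x = runif(100), strength = runif(100))

# Draw 75% of the row indices at random for training.
train_idx <- sample(nrow(demo), size = floor(0.75 * nrow(demo)))

demo_train <- demo[train_idx, ]
demo_test  <- demo[-train_idx, ]

c(train = nrow(demo_train), test = nrow(demo_test))
```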
To model the relationship between the ingredients used in concrete and the strength of the finished product, we will fit a multilayer feedforward neural network using the neuralnet package. It offers a standard, easy-to-use implementation along with a function for plotting the network topology, which makes it a good choice for learning more about neural networks.
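To make "feedforward" concrete, the sketch below computes the output of a tiny network by hand: each hidden node takes a weighted sum of the inputs plus a bias and passes it through a logistic activation, and the output node takes a weighted sum of the hidden activations. The weights, the two-input layout, and the helper name `forward_pass` are made up for illustration and are not taken from any fitted model. The logistic hidden activation with a linear output node is assumed because it matches neuralnet's documented defaults (act.fct = "logistic", linear.output = TRUE).

```r
# Logistic (sigmoid) activation.
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: forward pass through one hidden layer.
# x     : numeric vector of inputs
# W_hid : matrix of hidden weights, one column per hidden node;
#         the first row holds the biases
# w_out : output weights; the first element is the output bias
forward_pass <- function(x, W_hid, w_out) {
  hidden <- logistic(as.numeric(c(1, x) %*% W_hid))  # hidden activations
  sum(c(1, hidden) * w_out)                          # linear output node
}

# Made-up weights: 2 inputs, 2 hidden nodes.
W_hid <- matrix(c(0,  0,    # biases
                  1, -1,    # weights from input 1
                  1,  1),   # weights from input 2
                nrow = 3, byrow = TRUE)
w_out <- c(0.1, 0.5, 0.5)

forward_pass(c(0, 0), W_hid, w_out)  # 0.6
```

Training a network amounts to choosing these weights so that the outputs match the observed strengths as closely as possible.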
Let's start by installing the package.
install.packages("neuralnet")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/neuralnet_1.33.tgz'
Content type 'application/x-gzip' length 123627 bytes (120 KB)
==================================================
downloaded 120 KB
The downloaded binary packages are in
/var/folders/bn/gjt32xks0ys55l7pddrpz6mh0000gn/T//RtmpiA94ME/downloaded_packages
# load the package
library(neuralnet)
Let's fit the neural network.
concrete_model <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train)
plot(concrete_model)
Let's now evaluate the performance of the model. The compute() function works a little differently from the usual predict() functions: it returns a list with two components, $neurons, which stores the neurons for each layer in the network, and $net.result, which stores the predicted values.
model_results <- compute(concrete_model, concrete_test[1:8])
Since this is a numeric prediction problem rather than a classification problem, we assess the model using the correlation between the predicted and the true values. This measures the strength of the linear association between the two: the closer the correlation is to 1, the more closely the predictions track the actual values.
predicted_strength <- model_results$net.result
cor(predicted_strength, concrete_test$strength)
[,1]
[1,] 0.8107229034
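A small illustration of this measure, using made-up numbers rather than model output. When the predictions are a one-column matrix, as $net.result is, cor() returns a 1 x 1 matrix, which is why the result above is printed with [,1] and [1,] labels. The second part shows that correlation captures linear association only, so predictions that are perfectly correlated with the truth can still be numerically off.

```r
actual <- c(0.2, 0.4, 0.6, 0.8)   # made-up true values

# One-column matrix, mimicking the shape of $net.result.
predicted <- matrix(c(0.25, 0.35, 0.65, 0.75), ncol = 1)

r <- cor(predicted, actual)
dim(r)          # 1 1
as.numeric(r)   # plain number, convenient for further use

# Perfect linear relationship, yet every prediction is off by 0.1.
shifted <- actual + 0.1
cor(shifted, actual)           # 1
mean(abs(shifted - actual))    # 0.1 (mean absolute error)
```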
Given that we used only a single hidden node, a correlation of about 0.81 is a reasonable result. Let's try to improve on it by using five hidden nodes.
concrete_model2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg + age, data = concrete_train, hidden = 5)
plot(concrete_model2)
Now let's apply the same compute-and-assess steps to the new model.
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$strength)
[,1]
[1,] 0.9320676091
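Because the model was trained on normalised data, its predictions are on the 0-1 scale. A minimal sketch of mapping them back to MPa by inverting the min-max formula. `unnormalise` is a hypothetical helper, and the minimum and maximum are the values reported by summary(concrete$strength) earlier.

```r
# Hypothetical helper: invert min-max normalisation.
unnormalise <- function(x_norm, min_x, max_x) {
  x_norm * (max_x - min_x) + min_x
}

# Minimum and maximum strength from the summary of the raw data above.
strength_min <- 2.33
strength_max <- 82.60

# The normalised median (0.4001) maps back to roughly the raw median.
unnormalise(0.4001, strength_min, strength_max)

# Endpoints map back to the raw minimum and maximum.
unnormalise(c(0, 1), strength_min, strength_max)  # 2.33 82.60
```

On the real results, this could be applied to predicted_strength2 with the minimum and maximum taken from the raw (un-normalised) strength column.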