Concrete Data

Set the working directory. Upload the Data

rr concrete <- read.csv(_Data.csv) str(concrete)

'data.frame':   1030 obs. of  9 variables:
 $ Cement..component.1..kg.in.a.m.3.mixture.            : num  540 540 332 332 199 ...
 $ Blast.Furnace.Slag..component.2..kg.in.a.m.3.mixture.: num  0 0 142 142 132 ...
 $ Fly.Ash..component.3..kg.in.a.m.3.mixture.           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Water...component.4..kg.in.a.m.3.mixture.            : num  162 162 228 228 192 228 228 228 228 228 ...
 $ Superplasticizer..component.5..kg.in.a.m.3.mixture.  : num  2.5 2.5 0 0 0 0 0 0 0 0 ...
 $ Coarse.Aggregate...component.6..kg.in.a.m.3.mixture. : num  1040 1055 932 932 978 ...
 $ Fine.Aggregate..component.7..kg.in.a.m.3.mixture.    : num  676 676 594 594 826 ...
 $ Age..day.                                            : num  28 28 270 365 360 90 365 28 28 28 ...
 $ Concrete.compressive.strength.MPa..megapascals..     : num  80 61.9 40.3 41 44.3 ...

Rename the names in the data so that it is easy to handle

rr summary(concrete$strength)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.33   23.71   34.45   35.82   46.13   82.60

We need to normalise the data. If the data follows a normal binomial curve or bell curve we could have used the scale() function in R. Since here the data may follow a uniform continuous distribution or may be severely non-normal hence we will write our own function for a normalisation method to normalise the data to 0-1 range as this may be appropriate.

rr normalise <- function(x) { return((x - min(x)) / (max(x) - min(x))) }

Now after executing this code we can use the normalisation function to every column using lapply()

rr concrete_norm <- as.data.frame(lapply(concrete, normalise)) # to see that the normalisation has worked, we can see that the minimum and maximum strength are now 0 to 1 respectively summary(concrete_norm$strength)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.2664  0.4001  0.4172  0.5457  1.0000

We will divide the dataset to training and test datasets

rr library(caTools) set.seed(123) split = sample.split(concrete_norm$strength, SplitRatio = 0.75) concrete_train = subset(concrete_norm, split == TRUE) concrete_test = subset(concrete_norm, split == FALSE)

To model the relationship between the ingredients used in concrete and the strength of the finished product, we will use a multilayer feedforward neural network by using the neuralnet package. It offers a standard and easy to use implementation, and offers a function to plot the network topology. This is a strong choice for learning more about neural networks.

Lets first start with installing the package.

rr install.packages()

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/neuralnet_1.33.tgz'
Content type 'application/x-gzip' length 123627 bytes (120 KB)
==================================================
downloaded 120 KB

The downloaded binary packages are in
    /var/folders/bn/gjt32xks0ys55l7pddrpz6mh0000gn/T//RtmpiA94ME/downloaded_packages

rr # load the package library(neuralnet)

Lets implement the neural net

rr concrete_model <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg+age, data = concrete_train) plot(concrete_model)

Lets now evaluate the performance of the model. The compute() function works a little differently. It returns a list with two compoenents $neurons, which stores the neurons for each layer in the network and $net.result, that stores the predicted values

rr model_results <- compute(concrete_model, concrete_test[1:8])

Since this is a numerical problem we need to use the corelation between our predicted outcome and the true value. This will provide us with the strength of the linear association between the two variables. more the cor better the association and better the prediction.

rr predicted_strength <- model_results$net.result cor(predicted_strength, concrete_test$strength)

             [,1]
[1,] 0.8107229034

Given that we have used only one hidden layer with one input layer our prediction is better. Lets try and improve on this.

rr concrete_model2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic + coarseagg + fineagg+age, data = concrete_train, hidden = 5) plot(concrete_model2)

Now lets deploy our earlier compute and assessment methodologies

rr model_results2 <- compute(concrete_model2, concrete_test[1:8]) predicted_strength2 <- model_results2$net.result cor(predicted_strength2, concrete_test$strength)

             [,1]
[1,] 0.9320676091

LS0tDQp0aXRsZTogIkNvbmNyZXRlIERhdGEiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KDQpTZXQgdGhlIHdvcmtpbmcgZGlyZWN0b3J5Lg0KVXBsb2FkIHRoZSBEYXRhDQoNCmBgYHtyfQ0KY29uY3JldGUgPC0gcmVhZC5jc3YoIkNvbmNyZXRlX0RhdGEuY3N2IikNCnN0cihjb25jcmV0ZSkNCmBgYA0KDQpSZW5hbWUgdGhlIG5hbWVzIGluIHRoZSBkYXRhIHNvIHRoYXQgaXQgaXMgZWFzeSB0byBoYW5kbGUNCmBgYHtyfQ0KbmFtZXMoY29uY3JldGUpIDwtIGMoImNlbWVudCIsICJzbGFnIiwgImFzaCIsICJ3YXRlciIsICJzdXBlcnBsYXN0aWMiLCAiY29hcnNlYWdnIiwgImZpbmVhZ2ciLCAiYWdlIiwgInN0cmVuZ3RoIikNCm5hbWVzKGNvbmNyZXRlKQ0Kc3RyKGNvbmNyZXRlKQ0Kc3VtbWFyeShjb25jcmV0ZSRzdHJlbmd0aCkNCmBgYA0KV2UgbmVlZCB0byBub3JtYWxpc2UgdGhlIGRhdGEuIElmIHRoZSBkYXRhIGZvbGxvd3MgYSBub3JtYWwgYmlub21pYWwgY3VydmUgb3IgYmVsbCBjdXJ2ZSB3ZSBjb3VsZCBoYXZlIHVzZWQgdGhlIHNjYWxlKCkgZnVuY3Rpb24gaW4gUi4gU2luY2UgaGVyZSB0aGUgZGF0YSBtYXkgZm9sbG93IGEgdW5pZm9ybSBjb250aW51b3VzIGRpc3RyaWJ1dGlvbiBvciBtYXkgYmUgc2V2ZXJlbHkgbm9uLW5vcm1hbCBoZW5jZSB3ZSB3aWxsIHdyaXRlIG91ciBvd24gZnVuY3Rpb24gZm9yIGEgbm9ybWFsaXNhdGlvbiBtZXRob2QgdG8gbm9ybWFsaXNlIHRoZSBkYXRhIHRvIDAtMSByYW5nZSBhcyB0aGlzIG1heSBiZSBhcHByb3ByaWF0ZS4NCmBgYHtyfQ0Kbm9ybWFsaXNlIDwtIGZ1bmN0aW9uKHgpIHsNCiAgcmV0dXJuKCh4IC0gbWluKHgpKSAvIChtYXgoeCkgLSBtaW4oeCkpKQ0KfQ0KYGBgDQoNCk5vdyBhZnRlciBleGVjdXRpbmcgdGhpcyBjb2RlIHdlIGNhbiB1c2UgdGhlIG5vcm1hbGlzYXRpb24gZnVuY3Rpb24gdG8gZXZlcnkgY29sdW1uIHVzaW5nIGxhcHBseSgpDQoNCmBgYHtyfQ0KY29uY3JldGVfbm9ybSA8LSBhcy5kYXRhLmZyYW1lKGxhcHBseShjb25jcmV0ZSwgbm9ybWFsaXNlKSkNCiMgdG8gc2VlIHRoYXQgdGhlIG5vcm1hbGlzYXRpb24gaGFzIHdvcmtlZCwgd2UgY2FuIHNlZSB0aGF0IHRoZSBtaW5pbXVtIGFuZCBtYXhpbXVtIHN0cmVuZ3RoIGFyZSBub3cgMCB0byAxIHJlc3BlY3RpdmVseQ0Kc3VtbWFyeShjb25jcmV0ZV9ub3JtJHN0cmVuZ3RoKQ0KYGBgDQpXZSB3aWxsIGRpdmlkZSB0aGUgZGF0YXNldCB0byB0cmFpbmluZyBhbmQgdGVzdCBkYXRhc2V0cw0KYGBge3J9DQpsaWJyYXJ5KGNhVG9vbHMpDQpzZXQuc2VlZCgxMjMpDQpzcGxpdCA9IHNhbXBsZS5zcGxpdChjb25jcmV0ZV9ub3JtJHN0cmVuZ3RoLCBTcGxpdFJhdGlvID0gMC43NSkNCmNvbmNyZXRlX3RyYWluID0gc3Vic2V0KGNvbmNyZXRlX25vcm0sIHNwbGl0ID09IFRSVUUpDQpjb25jcmV0ZV90ZXN0ID0gc3Vic2V0KGNvbmNyZXRlX25vcm0sIHNwbGl0ID09IEZBTFNFKQ0KYGBgDQoNClRvIG1vZGVsIHRoZSByZWxhdGlvbnNoaXAgYmV0d2VlbiB0aGUgaW5ncmVkaWVudHMgdXNlZCBpbiBjb25jcmV0ZSBhbmQgdGhlIHN0cmVuZ3RoIG9mIHRoZSBmaW5pc2hlZCBwcm9kdWN0LCB3ZSB3aWxsIHVzZSBhIG11bHRpbGF5ZXIgZmVlZGZvcndhcmQgbmV1cmFsIG5ldHdvcmsgYnkgdXNpbmcgdGhlIG5ldXJhbG5ldCBwYWNrYWdlLiBJdCBvZmZlcnMgYSBzdGFuZGFyZCBhbmQgZWFzeSB0byB1c2UgaW1wbGVtZW50YXRpb24sIGFuZCBvZmZlcnMgYSBmdW5jdGlvbiB0byBwbG90IHRoZSBuZXR3b3JrIHRvcG9sb2d5LiBUaGlzIGlzIGEgc3Ryb25nIGNob2ljZSBmb3IgbGVhcm5pbmcgbW9yZSBhYm91dCBuZXVyYWwgbmV0d29ya3MuDQoNCkxldHMgZmlyc3Qgc3RhcnQgd2l0aCBpbnN0YWxsaW5nIHRoZSBwYWNrYWdlLg0KYGBge3J9DQppbnN0YWxsLnBhY2thZ2VzKCJuZXVyYWxuZXQiKQ0KIyBsb2FkIHRoZSBwYWNrYWdlDQpsaWJyYXJ5KG5ldXJhbG5ldCkNCmBgYA0KTGV0cyBpbXBsZW1lbnQgdGhlIG5ldXJhbCBuZXQNCmBgYHtyfQ0KY29uY3JldGVfbW9kZWwgPC0gbmV1cmFsbmV0KHN0cmVuZ3RoIH4gY2VtZW50ICsgc2xhZyArIGFzaCArIHdhdGVyICsgc3VwZXJwbGFzdGljICsgY29hcnNlYWdnICsgZmluZWFnZythZ2UsIA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgIGRhdGEgPSBjb25jcmV0ZV90cmFpbikNCnBsb3QoY29uY3JldGVfbW9kZWwpDQpgYGANCg0KTGV0cyBub3cgZXZhbHVhdGUgdGhlIHBlcmZvcm1hbmNlIG9mIHRoZSBtb2RlbC4gVGhlIGNvbXB1dGUoKSBmdW5jdGlvbiB3b3JrcyBhIGxpdHRsZSBkaWZmZXJlbnRseS4gSXQgcmV0dXJucyBhIGxpc3Qgd2l0aCB0d28gY29tcG9lbmVudHMgJG5ldXJvbnMsIHdoaWNoIHN0b3JlcyB0aGUgbmV1cm9ucyBmb3IgZWFjaCBsYXllciBpbiB0aGUgbmV0d29yayBhbmQgJG5ldC5yZXN1bHQsIHRoYXQgc3RvcmVzIHRoZSBwcmVkaWN0ZWQgdmFsdWVzDQoNCmBgYHtyfQ0KbW9kZWxfcmVzdWx0cyA8LSBjb21wdXRlKGNvbmNyZXRlX21vZGVsLCBjb25jcmV0ZV90ZXN0WzE6OF0pDQpgYGANClNpbmNlIHRoaXMgaXMgYSBudW1lcmljYWwgcHJvYmxlbSB3ZSBuZWVkIHRvIHVzZSB0aGUgY29yZWxhdGlvbiBiZXR3ZWVuIG91ciBwcmVkaWN0ZWQgb3V0Y29tZSBhbmQgdGhlIHRydWUgdmFsdWUuIFRoaXMgd2lsbCBwcm92aWRlIHVzIHdpdGggdGhlIHN0cmVuZ3RoIG9mIHRoZSBsaW5lYXIgYXNzb2NpYXRpb24gYmV0d2VlbiB0aGUgdHdvIHZhcmlhYmxlcy4gbW9yZSB0aGUgY29yIGJldHRlciB0aGUgYXNzb2NpYXRpb24gYW5kIGJldHRlciB0aGUgcHJlZGljdGlvbi4NCmBgYHtyfQ0KcHJlZGljdGVkX3N0cmVuZ3RoIDwtIG1vZGVsX3Jlc3VsdHMkbmV0LnJlc3VsdA0KY29yKHByZWRpY3RlZF9zdHJlbmd0aCwgY29uY3JldGVfdGVzdCRzdHJlbmd0aCkNCmBgYA0KR2l2ZW4gdGhhdCB3ZSBoYXZlIHVzZWQgb25seSBvbmUgaGlkZGVuIGxheWVyIHdpdGggb25lIGlucHV0IGxheWVyIG91ciBwcmVkaWN0aW9uIGlzIGJldHRlci4gTGV0cyB0cnkgYW5kIGltcHJvdmUgb24gdGhpcy4NCg0KYGBge3J9DQpjb25jcmV0ZV9tb2RlbDIgPC0gbmV1cmFsbmV0KHN0cmVuZ3RoIH4gY2VtZW50ICsgc2xhZyArIGFzaCArIHdhdGVyICsgc3VwZXJwbGFzdGljICsgY29hcnNlYWdnICsgZmluZWFnZythZ2UsIA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgIGRhdGEgPSBjb25jcmV0ZV90cmFpbiwNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICBoaWRkZW4gPSA1KQ0KcGxvdChjb25jcmV0ZV9tb2RlbDIpDQpgYGANCg0KTm93IGxldHMgZGVwbG95IG91ciBlYXJsaWVyIGNvbXB1dGUgYW5kIGFzc2Vzc21lbnQgbWV0aG9kb2xvZ2llcw0KDQpgYGB7cn0NCm1vZGVsX3Jlc3VsdHMyIDwtIGNvbXB1dGUoY29uY3JldGVfbW9kZWwyLCBjb25jcmV0ZV90ZXN0WzE6OF0pDQpwcmVkaWN0ZWRfc3RyZW5ndGgyIDwtIG1vZGVsX3Jlc3VsdHMyJG5ldC5yZXN1bHQNCmNvcihwcmVkaWN0ZWRfc3RyZW5ndGgyLCBjb25jcmV0ZV90ZXN0JHN0cmVuZ3RoKQ0KYGBgDQoNCg==