# Loading required packages
library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.1.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.2
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
library(DoE.base)
library(conf.design)
library(gridBase)
## Warning: package 'gridBase' was built under R version 3.1.2
library(rsm)
## Warning: package 'rsm' was built under R version 3.1.2
library(rgl)
## Warning: package 'rgl' was built under R version 3.1.2
Design: This experiment is designed to enable us to estimate the interaction and quadratic effects, and thus give us an idea of the shape of the response surface we are investigating. For this reason, this experiment is termed response surface method (RSM) designs. RSM designs are used for the following objectives: 1.Find improved or optimal process settings 2.Troubleshoot process problems and weak points 3. Make a product or process more robust (insensitive) against external and non-controllable influences.
The test: Null Hypothesis can be stated as:
“There is no effect of the variability in each factor/independent variable used (Cement, Fly ash, Water and Superplasticizer) on the overall strength (response variable) of the concrete mixture”
OR
“The variability in the strength of the mixture cannot be explained by the variability in any of the 4 factors (interaction or main effect).
The data is a subset of a large dataset involving strength of materials. We consider only the composition and strength of concrete mixture. Independent variables are the various constituents (7 in this case)that are used to make the concrete mixture used in a variety of construction applications (buildings and bridges).
Data Type: multivariate
Abstract: Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.
Source:See References
Data Characteristics(Randomization scheme):
The actual concrete compressive strength (MPa) for a given mixture under a specific age (days) was determined from laboratory. Data is in raw form (not scaled).
Summary Statistics:
Number of instances (observations): 1030 Number of Attributes: 9 Attribute breakdown: 8 quantitative input variables, and 1 quantitative output variable Missing Attribute Values: None
A summmary of the given dataset and exploratory data analysis is presented here
# Selecting the data file from the local machine
dataf= (read.csv(file.choose(), header=T))
attach(dataf)
# Summary of the original data
summary(dataf)
## Cement Blast.Furnace.Slag Fly.Ash Water
## Min. :102.0 Min. : 0.0 Min. : 0.00 Min. :121.8
## 1st Qu.:192.4 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.:164.9
## Median :272.9 Median : 22.0 Median : 0.00 Median :185.0
## Mean :281.2 Mean : 73.9 Mean : 54.19 Mean :181.6
## 3rd Qu.:350.0 3rd Qu.:142.9 3rd Qu.:118.30 3rd Qu.:192.0
## Max. :540.0 Max. :359.4 Max. :200.10 Max. :247.0
## Superplasticizer Coarse.Aggregate Fine.Aggregate Age
## Min. : 0.000 Min. : 801.0 Min. :594.0 Min. : 1.00
## 1st Qu.: 0.000 1st Qu.: 932.0 1st Qu.:731.0 1st Qu.: 7.00
## Median : 6.400 Median : 968.0 Median :779.5 Median : 28.00
## Mean : 6.205 Mean : 972.9 Mean :773.6 Mean : 45.66
## 3rd Qu.:10.200 3rd Qu.:1029.4 3rd Qu.:824.0 3rd Qu.: 56.00
## Max. :32.200 Max. :1145.0 Max. :992.6 Max. :365.00
## strength
## Min. : 2.33
## 1st Qu.:23.71
## Median :34.45
## Mean :35.82
## 3rd Qu.:46.13
## Max. :82.60
str(dataf)
## 'data.frame': 1030 obs. of 9 variables:
## $ Cement : num 540 540 332 332 199 ...
## $ Blast.Furnace.Slag: num 0 0 142 142 132 ...
## $ Fly.Ash : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Water : num 162 162 228 228 192 228 228 228 228 228 ...
## $ Superplasticizer : num 2.5 2.5 0 0 0 0 0 0 0 0 ...
## $ Coarse.Aggregate : num 1040 1055 932 932 978 ...
## $ Fine.Aggregate : num 676 676 594 594 826 ...
## $ Age : int 28 28 270 365 360 90 365 28 28 28 ...
## $ strength : num 80 61.9 40.3 41 44.3 ...
We perform an exploratory data analysis on the new data frame created above for a fractional factorial design.
# Generate 3 plots.
par(mfrow=c(2,2))
qqnorm(dataf$strength)
qqline(dataf$strength, col = 2)
boxplot(dataf$strength, horizontal=TRUE, main="Box Plot", xlab="Strength")
hist(dataf$strength, main="Histogram", xlab="Strength")
par(mfrow=c(1,1))
par(mfrow=c(2,2))
plot(dataf$strength~dataf$Cement,xlab="Cement in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Water,xlab="Water in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Fly.Ash,xlab="Fly Ash in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Superplasticizer,xlab="Superplasticizer in (kg/m^3)",ylab="Strength in Mpa")
Result:There is clearly a normal distribution in the way the response variable is distributed around the mean in the histogram. The boxplot does indicate some outliers, however for the most part the data is normally distributed (as is also evident by the Normal Q-Q plot).
# Generate boxplots
boxplot(dataf$strength~Cement, data=dataf, main="Strength by Cement", xlab="Cement",ylab="Strength")
boxplot(dataf$strength~Fly.Ash, data=dataf, main="Strength by Fly Ash", xlab="Fly Ash ",ylab="Strength")
boxplot(dataf$strength~Water, data=dataf, main="Strength by Water", xlab="Water",ylab="Strength")
boxplot(dataf$strength~Superplasticizer, data=dataf, main="Strength by Superplasticizer", xlab="Superplasticizer",ylab="Strength")
Result: All the 4 factors show variability at all levels and can be assumed to have an impact on the response variable. However, we cannot derive anything conclusive from this exploratory data analysis. It only provides us with the direction of our next level analysis.
As mentioned before, Response-surface methodology consists of a collection of methods for exploring the optimum operating conditions through experimental methods. Essentially, this involves doing several experiments, and then using the results of one experiment to provide direction for the next steps.
The next action could be to perform the experiment around a different set of conditions, or to collect more data in the current experimental domain in order to fit a higher-order model or simply to confirm to what we have found.
In the next step we fit a response surface model to the given data set. This is an extension of lm, and works almost like it; however, the model formula for rsm must make use of the special functions FO, TWI, PQ, or SO (for first-order,two-way interaction, pure quadratic and second-order respectively), because the presence of these specifies the response-surface portion of the model. (Reference : Response-Surface Methods in R, Using rsm Updated to version 2.00, 5 December 2012)
model=rsm(strength ~ SO(Fly.Ash,Water,Superplasticizer,Cement), data=dataf)
summary(model)
##
## Call:
## rsm(formula = strength ~ SO(Fly.Ash, Water, Superplasticizer,
## Cement), data = dataf)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.9084e+00 4.2965e+01 -0.1608 0.8722892
## Fly.Ash 3.9174e-01 1.0522e-01 3.7231 0.0002076 ***
## Water -3.3649e-01 3.6818e-01 -0.9139 0.3609740
## Superplasticizer 4.2723e+00 1.3867e+00 3.0808 0.0021199 **
## Cement 2.7183e-01 6.6724e-02 4.0739 4.985e-05 ***
## Fly.Ash:Water -2.3753e-03 5.1324e-04 -4.6281 4.169e-06 ***
## Fly.Ash:Superplasticizer -1.1084e-02 2.1610e-03 -5.1290 3.488e-07 ***
## Fly.Ash:Cement 1.0844e-04 9.3983e-05 1.1539 0.2488305
## Water:Superplasticizer -3.8639e-03 6.7919e-03 -0.5689 0.5695532
## Water:Cement -1.1319e-03 2.9734e-04 -3.8067 0.0001493 ***
## Superplasticizer:Cement -2.7978e-03 1.1653e-03 -2.4009 0.0165344 *
## Fly.Ash^2 3.2641e-04 1.8937e-04 1.7236 0.0850757 .
## Water^2 2.1972e-03 8.3489e-04 2.6318 0.0086225 **
## Superplasticizer^2 -7.4753e-02 1.4770e-02 -5.0611 4.946e-07 ***
## Cement^2 2.9066e-05 3.6103e-05 0.8051 0.4209646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.4355, Adjusted R-squared: 0.4277
## F-statistic: 55.93 on 14 and 1015 DF, p-value: < 2.2e-16
##
## Analysis of Variance Table
##
## Response: strength
## Df Sum Sq Mean Sq F value
## FO(Fly.Ash, Water, Superplasticizer, Cement) 4 102414 25603.5 160.3110
## TWI(Fly.Ash, Water, Superplasticizer, Cement) 6 15524 2587.4 16.2004
## PQ(Fly.Ash, Water, Superplasticizer, Cement) 4 7130 1782.5 11.1607
## Residuals 1015 162107 159.7
## Lack of fit 390 59704 153.1 0.9343
## Pure error 625 102403 163.8
## Pr(>F)
## FO(Fly.Ash, Water, Superplasticizer, Cement) < 2.2e-16
## TWI(Fly.Ash, Water, Superplasticizer, Cement) < 2.2e-16
## PQ(Fly.Ash, Water, Superplasticizer, Cement) 7.293e-09
## Residuals
## Lack of fit 0.7688
## Pure error
##
## Stationary point of response surface:
## Fly.Ash Water Superplasticizer Cement
## 260.842220 195.671752 6.491437 -122.606234
##
## Eigenanalysis:
## $values
## [1] 0.0028849786 0.0002362323 0.0000000000 -0.0752385223
##
## $vectors
## [,1] [,2] [,3] [,4]
## Fly.Ash -0.43958437 0.87048383 0.2088571 -0.07349947
## Water 0.87806519 0.36811032 0.3046530 -0.02613219
## Superplasticizer 0.01292765 -0.06785135 -0.0407760 -0.99677800
## Cement -0.18867958 -0.31960739 0.9283871 -0.01866945
Results: From the p-values of the various factor effects we see that there is no effect of the ‘Water’ on the response variable. However, there are some significant main effects, interaction effects and second-order effects therefore we can reject the null hypothesis based on these results.
From the analysis of variance, it is evident that the second-order (TWI and PQ) terms contribute significantly to the model, so the canonical analysis is relevant. Also, the stationary point is quite close the experimental region, but the eigenvalues are of mixed sign (both positive and negative), indicating that it is a saddle point (neither a maximum nor a minimum). We will do further analysis based on these results in order to arrive at some conclusion.
par(mfrow=c(2,3))
contour(model, ~Cement+Fly.Ash+Water+Superplasticizer, image=TRUE, at=summary(model$canonical$xs))
# Generate plots taking two factors at a time
par(mfrow=c(1,1))
persp(model, ~ Cement+Fly.Ash, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model, ~ Fly.Ash+Water, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model, ~ Water+Superplasticizer, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model, ~ Superplasticizer+Cement, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model, ~ Cement+Water, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model, ~ Superplasticizer+Fly.Ash, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
For further investigation we can use steepest descent method in case we encounter a saddle point.In the second-order case, the steepest function still works, but it uses the ridge analysis method which is the analog of steepest ascent/descent in the sense that for a specified distance d, it finds the point at which the predicted response is a maximum/minimum among all predictor combinations at radius d.
Model adequacy checking is done to test the validity of our model given the numerous assumptions it is based on.
The shapiro test evaluates the Null hypothesis such that “the samples come from a Normal distribution” against the alternative hypothesis “the samples do not come from a Normal distribution”.
The Q-Q Norm plots also test the normality of the sample. The scatter plots of the residuals and the fitted model will check the distribution of the model over the entire dynamic range.
# Finding direction for further analysis
steepest(model)
## Path of steepest ascent from ridge analysis:
## dist Fly.Ash Water Superplasticizer Cement | yhat
## 1 0.0 0.000 0.000 0.000 0.000 | -6.908
## 2 0.5 0.046 -0.040 0.495 0.032 | -4.772
## 3 1.0 0.092 -0.082 0.990 0.065 | -2.672
## 4 1.5 0.138 -0.126 1.485 0.098 | -0.608
## 5 2.0 0.184 -0.172 1.980 0.133 | 1.420
## 6 2.5 0.232 -0.221 2.474 0.168 | 3.409
## 7 3.0 0.279 -0.272 2.968 0.205 | 5.362
## 8 3.5 0.327 -0.325 3.461 0.242 | 7.276
## 9 4.0 0.375 -0.381 3.954 0.281 | 9.154
## 10 4.5 0.424 -0.440 4.447 0.321 | 10.997
## 11 5.0 0.474 -0.503 4.939 0.362 | 12.802
canonical.path(model, dist = seq(-5, 5, by = 0.5))
## dist Fly.Ash Water Superplasticizer Cement | yhat
## 1 -5.0 263.040 191.281 6.427 -121.663 | 8.538
## 2 -4.5 262.820 191.720 6.433 -121.757 | 8.524
## 3 -4.0 262.601 192.159 6.440 -121.852 | 8.512
## 4 -3.5 262.381 192.599 6.446 -121.946 | 8.501
## 5 -3.0 262.161 193.038 6.453 -122.040 | 8.492
## 6 -2.5 261.941 193.477 6.459 -122.135 | 8.484
## 7 -2.0 261.721 193.916 6.466 -122.229 | 8.477
## 8 -1.5 261.502 194.355 6.472 -122.323 | 8.472
## 9 -1.0 261.282 194.794 6.479 -122.418 | 8.468
## 10 -0.5 261.062 195.233 6.485 -122.512 | 8.466
## 11 0.0 260.842 195.672 6.491 -122.606 | 8.466
## 12 0.5 260.622 196.111 6.498 -122.701 | 8.466
## 13 1.0 260.403 196.550 6.504 -122.795 | 8.468
## 14 1.5 260.183 196.989 6.511 -122.889 | 8.472
## 15 2.0 259.963 197.428 6.517 -122.984 | 8.477
## 16 2.5 259.743 197.867 6.524 -123.078 | 8.484
## 17 3.0 259.523 198.306 6.530 -123.172 | 8.492
## 18 3.5 259.304 198.745 6.537 -123.267 | 8.501
## 19 4.0 259.084 199.184 6.543 -123.361 | 8.512
## 20 4.5 258.864 199.623 6.550 -123.455 | 8.524
## 21 5.0 258.644 200.062 6.556 -123.550 | 8.538
# Shapiro Test
shapiro.test(dataf$strength)
##
## Shapiro-Wilk normality test
##
## data: dataf$strength
## W = 0.9798, p-value = 9.01e-11
#Plots for checking normality
qqnorm(residuals(model))
qqline(residuals(model))
plot(fitted(model),residuals(model))
From the results of the ridge analysis for points within the radius of 5 in the canonical path, we see that the best point for next experimentation is at a distance of 1.5.
The Shapiro-Wilk normality test does not return a p-value which is very small (less than 0.1) which shows that the data is not perfectly normally distributed.The Normal Q-Q plot conforms normality for the residuals.
The scatter plot also indicates no trend i.e. there is uniform scatter over the entire dynamic range.
Wikipedia
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/