# Loading required packages

library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.1.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.2
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design
library(DoE.base)
library(conf.design)
library(gridBase)
## Warning: package 'gridBase' was built under R version 3.1.2
library(rsm)
## Warning: package 'rsm' was built under R version 3.1.2
library(rgl)
## Warning: package 'rgl' was built under R version 3.1.2

Setting

Design: This experiment is designed to enable us to estimate the interaction and quadratic effects, and thus give us an idea of the shape of the response surface we are investigating. For this reason, this experiment is termed response surface method (RSM) designs. RSM designs are used for the following objectives: 1.Find improved or optimal process settings 2.Troubleshoot process problems and weak points 3. Make a product or process more robust (insensitive) against external and non-controllable influences.

The test: Null Hypothesis can be stated as:

“There is no effect of the variability in each factor/independent variable used (Cement, Fly ash, Water and Superplasticizer) on the overall strength (response variable) of the concrete mixture”

OR

“The variability in the strength of the mixture cannot be explained by the variability in any of the 4 factors (interaction or main effect).

Data Summary and Preliminary analysis

The data is a subset of a large dataset involving strength of materials. We consider only the composition and strength of concrete mixture. Independent variables are the various constituents (7 in this case)that are used to make the concrete mixture used in a variety of construction applications (buildings and bridges).

Data Type: multivariate

Abstract: Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.

Source:See References

Data Characteristics(Randomization scheme):

The actual concrete compressive strength (MPa) for a given mixture under a specific age (days) was determined from laboratory. Data is in raw form (not scaled).

Summary Statistics:

Number of instances (observations): 1030 Number of Attributes: 9 Attribute breakdown: 8 quantitative input variables, and 1 quantitative output variable Missing Attribute Values: None

A summmary of the given dataset and exploratory data analysis is presented here

# Selecting the data file from the local machine

dataf= (read.csv(file.choose(), header=T))
attach(dataf)

# Summary of the original data

summary(dataf)
##      Cement      Blast.Furnace.Slag    Fly.Ash           Water      
##  Min.   :102.0   Min.   :  0.0      Min.   :  0.00   Min.   :121.8  
##  1st Qu.:192.4   1st Qu.:  0.0      1st Qu.:  0.00   1st Qu.:164.9  
##  Median :272.9   Median : 22.0      Median :  0.00   Median :185.0  
##  Mean   :281.2   Mean   : 73.9      Mean   : 54.19   Mean   :181.6  
##  3rd Qu.:350.0   3rd Qu.:142.9      3rd Qu.:118.30   3rd Qu.:192.0  
##  Max.   :540.0   Max.   :359.4      Max.   :200.10   Max.   :247.0  
##  Superplasticizer Coarse.Aggregate Fine.Aggregate       Age        
##  Min.   : 0.000   Min.   : 801.0   Min.   :594.0   Min.   :  1.00  
##  1st Qu.: 0.000   1st Qu.: 932.0   1st Qu.:731.0   1st Qu.:  7.00  
##  Median : 6.400   Median : 968.0   Median :779.5   Median : 28.00  
##  Mean   : 6.205   Mean   : 972.9   Mean   :773.6   Mean   : 45.66  
##  3rd Qu.:10.200   3rd Qu.:1029.4   3rd Qu.:824.0   3rd Qu.: 56.00  
##  Max.   :32.200   Max.   :1145.0   Max.   :992.6   Max.   :365.00  
##     strength    
##  Min.   : 2.33  
##  1st Qu.:23.71  
##  Median :34.45  
##  Mean   :35.82  
##  3rd Qu.:46.13  
##  Max.   :82.60
str(dataf)
## 'data.frame':    1030 obs. of  9 variables:
##  $ Cement            : num  540 540 332 332 199 ...
##  $ Blast.Furnace.Slag: num  0 0 142 142 132 ...
##  $ Fly.Ash           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Water             : num  162 162 228 228 192 228 228 228 228 228 ...
##  $ Superplasticizer  : num  2.5 2.5 0 0 0 0 0 0 0 0 ...
##  $ Coarse.Aggregate  : num  1040 1055 932 932 978 ...
##  $ Fine.Aggregate    : num  676 676 594 594 826 ...
##  $ Age               : int  28 28 270 365 360 90 365 28 28 28 ...
##  $ strength          : num  80 61.9 40.3 41 44.3 ...

Exploratory Data Analysis

We perform an exploratory data analysis on the new data frame created above for a fractional factorial design.

# Generate 3 plots.
par(mfrow=c(2,2))
qqnorm(dataf$strength)
qqline(dataf$strength, col = 2)
boxplot(dataf$strength, horizontal=TRUE, main="Box Plot", xlab="Strength")
hist(dataf$strength, main="Histogram", xlab="Strength")
par(mfrow=c(1,1))

par(mfrow=c(2,2))
plot(dataf$strength~dataf$Cement,xlab="Cement in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Water,xlab="Water in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Fly.Ash,xlab="Fly Ash in (kg/m^3)",ylab="Strength in Mpa")
plot(dataf$strength~dataf$Superplasticizer,xlab="Superplasticizer in (kg/m^3)",ylab="Strength in Mpa")

Result:There is clearly a normal distribution in the way the response variable is distributed around the mean in the histogram. The boxplot does indicate some outliers, however for the most part the data is normally distributed (as is also evident by the Normal Q-Q plot).

# Generate boxplots

boxplot(dataf$strength~Cement, data=dataf, main="Strength by Cement", xlab="Cement",ylab="Strength")

boxplot(dataf$strength~Fly.Ash, data=dataf, main="Strength by Fly Ash", xlab="Fly Ash ",ylab="Strength")

boxplot(dataf$strength~Water, data=dataf, main="Strength by Water", xlab="Water",ylab="Strength")

boxplot(dataf$strength~Superplasticizer, data=dataf, main="Strength by Superplasticizer", xlab="Superplasticizer",ylab="Strength")

Result: All the 4 factors show variability at all levels and can be assumed to have an impact on the response variable. However, we cannot derive anything conclusive from this exploratory data analysis. It only provides us with the direction of our next level analysis.

Response surface designs

As mentioned before, Response-surface methodology consists of a collection of methods for exploring the optimum operating conditions through experimental methods. Essentially, this involves doing several experiments, and then using the results of one experiment to provide direction for the next steps.

The next action could be to perform the experiment around a different set of conditions, or to collect more data in the current experimental domain in order to fit a higher-order model or simply to confirm to what we have found.

In the next step we fit a response surface model to the given data set. This is an extension of lm, and works almost like it; however, the model formula for rsm must make use of the special functions FO, TWI, PQ, or SO (for first-order,two-way interaction, pure quadratic and second-order respectively), because the presence of these specifies the response-surface portion of the model. (Reference : Response-Surface Methods in R, Using rsm Updated to version 2.00, 5 December 2012)

model=rsm(strength ~ SO(Fly.Ash,Water,Superplasticizer,Cement), data=dataf) 
summary(model)
## 
## Call:
## rsm(formula = strength ~ SO(Fly.Ash, Water, Superplasticizer, 
##     Cement), data = dataf)
## 
##                             Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)              -6.9084e+00  4.2965e+01 -0.1608 0.8722892    
## Fly.Ash                   3.9174e-01  1.0522e-01  3.7231 0.0002076 ***
## Water                    -3.3649e-01  3.6818e-01 -0.9139 0.3609740    
## Superplasticizer          4.2723e+00  1.3867e+00  3.0808 0.0021199 ** 
## Cement                    2.7183e-01  6.6724e-02  4.0739 4.985e-05 ***
## Fly.Ash:Water            -2.3753e-03  5.1324e-04 -4.6281 4.169e-06 ***
## Fly.Ash:Superplasticizer -1.1084e-02  2.1610e-03 -5.1290 3.488e-07 ***
## Fly.Ash:Cement            1.0844e-04  9.3983e-05  1.1539 0.2488305    
## Water:Superplasticizer   -3.8639e-03  6.7919e-03 -0.5689 0.5695532    
## Water:Cement             -1.1319e-03  2.9734e-04 -3.8067 0.0001493 ***
## Superplasticizer:Cement  -2.7978e-03  1.1653e-03 -2.4009 0.0165344 *  
## Fly.Ash^2                 3.2641e-04  1.8937e-04  1.7236 0.0850757 .  
## Water^2                   2.1972e-03  8.3489e-04  2.6318 0.0086225 ** 
## Superplasticizer^2       -7.4753e-02  1.4770e-02 -5.0611 4.946e-07 ***
## Cement^2                  2.9066e-05  3.6103e-05  0.8051 0.4209646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.4355, Adjusted R-squared:  0.4277 
## F-statistic: 55.93 on 14 and 1015 DF,  p-value: < 2.2e-16
## 
## Analysis of Variance Table
## 
## Response: strength
##                                                 Df Sum Sq Mean Sq  F value
## FO(Fly.Ash, Water, Superplasticizer, Cement)     4 102414 25603.5 160.3110
## TWI(Fly.Ash, Water, Superplasticizer, Cement)    6  15524  2587.4  16.2004
## PQ(Fly.Ash, Water, Superplasticizer, Cement)     4   7130  1782.5  11.1607
## Residuals                                     1015 162107   159.7         
## Lack of fit                                    390  59704   153.1   0.9343
## Pure error                                     625 102403   163.8         
##                                                  Pr(>F)
## FO(Fly.Ash, Water, Superplasticizer, Cement)  < 2.2e-16
## TWI(Fly.Ash, Water, Superplasticizer, Cement) < 2.2e-16
## PQ(Fly.Ash, Water, Superplasticizer, Cement)  7.293e-09
## Residuals                                              
## Lack of fit                                      0.7688
## Pure error                                             
## 
## Stationary point of response surface:
##          Fly.Ash            Water Superplasticizer           Cement 
##       260.842220       195.671752         6.491437      -122.606234 
## 
## Eigenanalysis:
## $values
## [1]  0.0028849786  0.0002362323  0.0000000000 -0.0752385223
## 
## $vectors
##                         [,1]        [,2]       [,3]        [,4]
## Fly.Ash          -0.43958437  0.87048383  0.2088571 -0.07349947
## Water             0.87806519  0.36811032  0.3046530 -0.02613219
## Superplasticizer  0.01292765 -0.06785135 -0.0407760 -0.99677800
## Cement           -0.18867958 -0.31960739  0.9283871 -0.01866945

Results: From the p-values of the various factor effects we see that there is no effect of the ‘Water’ on the response variable. However, there are some significant main effects, interaction effects and second-order effects therefore we can reject the null hypothesis based on these results.

From the analysis of variance, it is evident that the second-order (TWI and PQ) terms contribute significantly to the model, so the canonical analysis is relevant. Also, the stationary point is quite close the experimental region, but the eigenvalues are of mixed sign (both positive and negative), indicating that it is a saddle point (neither a maximum nor a minimum). We will do further analysis based on these results in order to arrive at some conclusion.

Display the Response surface

par(mfrow=c(2,3))
contour(model, ~Cement+Fly.Ash+Water+Superplasticizer, image=TRUE, at=summary(model$canonical$xs))

# Generate plots taking two factors at a time

par(mfrow=c(1,1))

persp(model, ~ Cement+Fly.Ash, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model, ~ Fly.Ash+Water, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model, ~ Water+Superplasticizer, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model, ~ Superplasticizer+Cement, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model, ~ Cement+Water, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model, ~ Superplasticizer+Fly.Ash, image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Strength in MPa",col.lab=33,contour="colors")
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

Diagnostics/Model Adequacy Checking

For further investigation we can use steepest descent method in case we encounter a saddle point.In the second-order case, the steepest function still works, but it uses the ridge analysis method which is the analog of steepest ascent/descent in the sense that for a specified distance d, it finds the point at which the predicted response is a maximum/minimum among all predictor combinations at radius d.

Model adequacy checking is done to test the validity of our model given the numerous assumptions it is based on.

The shapiro test evaluates the Null hypothesis such that “the samples come from a Normal distribution” against the alternative hypothesis “the samples do not come from a Normal distribution”.

The Q-Q Norm plots also test the normality of the sample. The scatter plots of the residuals and the fitted model will check the distribution of the model over the entire dynamic range.

# Finding direction for further analysis

steepest(model)
## Path of steepest ascent from ridge analysis:
##    dist Fly.Ash  Water Superplasticizer Cement |   yhat
## 1   0.0   0.000  0.000            0.000  0.000 | -6.908
## 2   0.5   0.046 -0.040            0.495  0.032 | -4.772
## 3   1.0   0.092 -0.082            0.990  0.065 | -2.672
## 4   1.5   0.138 -0.126            1.485  0.098 | -0.608
## 5   2.0   0.184 -0.172            1.980  0.133 |  1.420
## 6   2.5   0.232 -0.221            2.474  0.168 |  3.409
## 7   3.0   0.279 -0.272            2.968  0.205 |  5.362
## 8   3.5   0.327 -0.325            3.461  0.242 |  7.276
## 9   4.0   0.375 -0.381            3.954  0.281 |  9.154
## 10  4.5   0.424 -0.440            4.447  0.321 | 10.997
## 11  5.0   0.474 -0.503            4.939  0.362 | 12.802
canonical.path(model, dist = seq(-5, 5, by = 0.5))
##    dist Fly.Ash   Water Superplasticizer   Cement |  yhat
## 1  -5.0 263.040 191.281            6.427 -121.663 | 8.538
## 2  -4.5 262.820 191.720            6.433 -121.757 | 8.524
## 3  -4.0 262.601 192.159            6.440 -121.852 | 8.512
## 4  -3.5 262.381 192.599            6.446 -121.946 | 8.501
## 5  -3.0 262.161 193.038            6.453 -122.040 | 8.492
## 6  -2.5 261.941 193.477            6.459 -122.135 | 8.484
## 7  -2.0 261.721 193.916            6.466 -122.229 | 8.477
## 8  -1.5 261.502 194.355            6.472 -122.323 | 8.472
## 9  -1.0 261.282 194.794            6.479 -122.418 | 8.468
## 10 -0.5 261.062 195.233            6.485 -122.512 | 8.466
## 11  0.0 260.842 195.672            6.491 -122.606 | 8.466
## 12  0.5 260.622 196.111            6.498 -122.701 | 8.466
## 13  1.0 260.403 196.550            6.504 -122.795 | 8.468
## 14  1.5 260.183 196.989            6.511 -122.889 | 8.472
## 15  2.0 259.963 197.428            6.517 -122.984 | 8.477
## 16  2.5 259.743 197.867            6.524 -123.078 | 8.484
## 17  3.0 259.523 198.306            6.530 -123.172 | 8.492
## 18  3.5 259.304 198.745            6.537 -123.267 | 8.501
## 19  4.0 259.084 199.184            6.543 -123.361 | 8.512
## 20  4.5 258.864 199.623            6.550 -123.455 | 8.524
## 21  5.0 258.644 200.062            6.556 -123.550 | 8.538
# Shapiro Test

shapiro.test(dataf$strength)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataf$strength
## W = 0.9798, p-value = 9.01e-11
#Plots for checking normality

qqnorm(residuals(model)) 
qqline(residuals(model))

plot(fitted(model),residuals(model))

From the results of the ridge analysis for points within the radius of 5 in the canonical path, we see that the best point for next experimentation is at a distance of 1.5.

The Shapiro-Wilk normality test does not return a p-value which is very small (less than 0.1) which shows that the data is not perfectly normally distributed.The Normal Q-Q plot conforms normality for the residuals.

The scatter plot also indicates no trend i.e. there is uniform scatter over the entire dynamic range.

References

Wikipedia

Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/