# Loading required packages

library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.1.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.2
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design
library(DoE.base)
library(conf.design)
library(gridBase)
## Warning: package 'gridBase' was built under R version 3.1.2

Setting

Design: This is a 2^(k-1) fractional factorial design (k = 6 in this case), i.e. a half fraction of the full 2^k factorial in which every factor has exactly 2 levels. The reason for this approach is to keep experimental 'costs' under control: measurements are taken only for a carefully chosen, limited number of factor-level combinations (2^(6-1) = 32 runs instead of the 2^6 = 64 runs of the full design). This technique ensures that the main effects and low-order interaction effects can be estimated and tested, at the expense of high-order interactions. The scientific rationale is that significant complex interactions among the factors are unlikely, so we can assume there are probably only main effects and a few low-order interactions. The first step in this analysis was therefore to obtain data that fulfils this requirement. Since each factor in the available data set takes many values, we transform the data by computing the mean of each factor and then defining 2 levels (high and low) according to whether a value lies above or below that mean.
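The two-level recoding described above can be sketched in R as follows. The helper `to_two_levels` and the cement amounts are hypothetical and for illustration only; the -1/+1 coding matches the FrF2 design used later, and assigning values exactly equal to the mean to the low level is an assumption, since the text does not say how ties are handled.

```r
# Hypothetical helper: recode a numeric factor into two levels
# (-1 = low, +1 = high) by comparing each value with the column mean.
# Values exactly equal to the mean go to the low level (an assumption).
to_two_levels <- function(x) {
  ifelse(x > mean(x), 1, -1)
}

# Hypothetical cement amounts (kg/m^3); their mean is 363
cement <- c(540, 330, 200, 270, 475)
to_two_levels(cement)
```

Applied column by column, this turns every factor into a high/low variable, which is what a 2-level factorial analysis requires.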

The test:

The null hypothesis can be stated as:

“There is no effect of the amount of each factor (independent variable) used on the overall strength of the concrete mixture.”

OR

“The variability in the strength of the mixture cannot be explained by the variability in any of the 6 factors (main effects or interactions).”
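In terms of an effects model with main effects $\beta_j$ and two-factor interaction effects $\beta_{jk}$ for coded factors $x_j$, $j, k \in \{C, B, G, W, S, CA\}$, the hypotheses can be sketched as follows (this formalization is our reading of the statements above, matching the second-order model fitted later):

$$
y = \beta_0 + \sum_{j} \beta_j x_j + \sum_{j<k} \beta_{jk} x_j x_k + \varepsilon
$$

$$
H_0:\ \beta_j = 0 \ \text{and} \ \beta_{jk} = 0 \ \text{for all } j, k
\qquad
H_1:\ \text{at least one } \beta_j \text{ or } \beta_{jk} \neq 0
$$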

Data Summary and Preliminary analysis

The data is a subset of a large dataset on the strength of materials. We consider only the composition and strength of the concrete mixture. The independent variables are the constituents (only 6 are considered here) used to make concrete mixtures for a variety of construction applications, such as buildings and bridges.

Data Type: multivariate

Abstract: Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.

Source: see References

Data Characteristics (Randomization scheme):

The actual concrete compressive strength (MPa) for a given mixture under a specific age (days) was determined from laboratory. Data is in raw form (not scaled).

Summary Statistics:

Number of instances (observations): 1030
Number of attributes: 9
Attribute breakdown: 8 quantitative input variables and 1 quantitative output variable
Missing attribute values: None

Data used for this experiment: As described above, we recoded the data set to 2 levels (high or low) for each factor, and of the 1030 observations only the first 64 records are used. We also use the following naming conventions for the factors:

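C: Cement
B: Blast Furnace Slag
G: Fly Ash
W: Water
S: Superplasticizer
CA: Coarse Aggregate

The data file also contains FA (Fine Aggregate) and Age (days), which are not used as factors.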

A summary of the data set and an exploratory data analysis are presented here. We also perform an initial ANOVA (analysis of variance) to test our null hypothesis. Only the modified data described above is used throughout this analysis.

# Selecting the data file from the local machine

dataf <- read.csv(file.choose(), header = TRUE)

# Summary of the original data

summary(dataf)
##        C               B                G           W        
##  Min.   :200.0   Min.   :0.0000   Min.   :0   Min.   :100.0  
##  1st Qu.:200.0   1st Qu.:0.0000   1st Qu.:0   1st Qu.:200.0  
##  Median :300.0   Median :1.0000   Median :0   Median :200.0  
##  Mean   :262.5   Mean   :0.6406   Mean   :0   Mean   :196.9  
##  3rd Qu.:300.0   3rd Qu.:1.0000   3rd Qu.:0   3rd Qu.:200.0  
##  Max.   :300.0   Max.   :1.0000   Max.   :0   Max.   :200.0  
##        S               CA               FA              Age       
##  Min.   :0.000   Min.   : 500.0   Min.   : 500.0   Min.   :  3.0  
##  1st Qu.:0.000   1st Qu.: 500.0   1st Qu.: 500.0   1st Qu.: 28.0  
##  Median :0.000   Median : 500.0   Median : 500.0   Median : 90.0  
##  Mean   :0.375   Mean   : 609.4   Mean   : 601.6   Mean   :154.1  
##  3rd Qu.:1.000   3rd Qu.: 500.0   3rd Qu.: 500.0   3rd Qu.:270.0  
##  Max.   :1.000   Max.   :1000.0   Max.   :1000.0   Max.   :365.0  
##     strength    
##  Min.   : 8.06  
##  1st Qu.:30.58  
##  Median :41.10  
##  Mean   :39.19  
##  3rd Qu.:47.86  
##  Max.   :79.99
str(dataf)
## 'data.frame':    64 obs. of  9 variables:
##  $ C       : int  300 300 300 300 200 200 300 300 200 300 ...
##  $ B       : int  0 0 1 1 1 1 1 1 1 0 ...
##  $ G       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ W       : int  100 100 200 200 200 200 200 200 200 200 ...
##  $ S       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CA      : int  1000 1000 500 500 1000 500 500 500 500 500 ...
##  $ FA      : int  500 500 500 500 1000 500 500 500 500 500 ...
##  $ Age     : int  28 28 270 365 360 90 365 28 28 28 ...
##  $ strength: num  80 61.9 40.3 41 42.3 ...
#ANOVA on the initial dataframe


modelinit <- aov(strength ~ C*B*G*W*S*CA, data = dataf)
anova(modelinit)
## Analysis of Variance Table
## 
## Response: strength
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## C          1 2996.5 2996.51 34.8963 2.542e-07 ***
## B          1   22.8   22.79  0.2655 0.6085390    
## W          1 1458.1 1458.07 16.9801 0.0001333 ***
## S          1  147.0  146.98  1.7116 0.1964198    
## CA         1 1437.3 1437.25 16.7378 0.0001468 ***
## C:S        1  122.3  122.35  1.4248 0.2379311    
## B:S        1   79.1   79.12  0.9214 0.3414635    
## C:CA       1   23.3   23.26  0.2709 0.6048924    
## B:CA       1  344.7  344.73  4.0146 0.0502313 .  
## S:CA       1  621.8  621.80  7.2413 0.0095099 ** 
## Residuals 53 4551.1   85.87                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

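Result: The ANOVA above shows significant main effects of Cement (C), Water (W) and Coarse Aggregate (CA) on the response variable (p < 0.001 in each case). Superplasticizer (S) has no significant main effect (p = 0.196). There is a significant interaction between S and CA (p = 0.0095 < 0.05), and a weaker, borderline interaction between B and CA (p = 0.0502). Based on these observations we reject the null hypothesis in this initial analysis. Note that Fly Ash (G) does not appear in the table at all: in these 64 records it takes a single value (0), so its effect cannot be estimated. Because these observational records are not a balanced design, many interaction terms are not estimable either, and the sums of squares reported are sequential (they depend on the order of the terms).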
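A quick way to spot factors that never vary, such as G here, is to count the distinct values in each column. The sketch below uses a small hypothetical data frame, not the real records; applied to `dataf` it would flag G, since the summary shows its minimum and maximum are both 0.

```r
# Count distinct values per column; a column with a single value
# contributes no degrees of freedom to an ANOVA.
levels_per_column <- function(df) {
  vapply(df, function(col) length(unique(col)), integer(1))
}

# Small hypothetical data frame mimicking the coded layout (illustration only)
toy <- data.frame(C = c(200, 300, 300),
                  G = c(0, 0, 0),
                  W = c(100, 200, 200))
counts <- levels_per_column(toy)
counts
names(counts)[counts < 2]   # columns with a single level
```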

Fractional factorial designs

As mentioned before, the dataset has 6 factors with two levels each, and the response variable is the strength of the concrete mixture. Here we create a completely new design matrix and then perform ANOVA on it.
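Because the design matrix and the observed records are separate objects, responses should be attached by matching factor settings rather than by row position. The sketch below shows one way to do that with base R's `aggregate()` and `merge()`. The two-factor data frame `obs` and its values are hypothetical, and averaging replicate runs within each cell is an assumption about how repeated settings would be handled.

```r
# Hypothetical observed runs, already coded to -1/+1 (two factors for brevity)
obs <- data.frame(C = c(1, -1, 1, -1, 1),
                  W = c(-1, -1, 1, 1, -1),
                  strength = c(62, 41, 45, 30, 58))

# Full 2^2 design in the same coding
design <- expand.grid(C = c(-1, 1), W = c(-1, 1))

# Average replicate observations of each factor combination
cell_means <- aggregate(strength ~ C + W, data = obs, FUN = mean)

# Attach responses by factor settings, not by row position;
# all.x = TRUE keeps design points that were never observed (as NA)
matched <- merge(design, cell_means, by = c("C", "W"), all.x = TRUE)
matched
```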

Frdesign<-FrF2(32,6,factor.names=c("C","B","G","W","S","CA"),estimable=formula("~C+B+G+W+S+CA+C:(B+G+W+S+CA)"), clear=FALSE)

Frdesign
##     C  B  G  W  S CA
## 1   1  1  1 -1 -1  1
## 2   1  1 -1 -1 -1 -1
## 3  -1  1  1 -1  1  1
## 4  -1  1 -1  1  1  1
## 5  -1 -1  1 -1 -1  1
## 6   1  1 -1 -1  1  1
## 7  -1  1  1 -1 -1 -1
## 8   1 -1 -1  1 -1 -1
## 9  -1  1 -1 -1  1 -1
## 10 -1  1 -1 -1 -1  1
## 11 -1 -1  1  1 -1 -1
## 12  1 -1  1 -1  1  1
## 13  1  1 -1  1 -1  1
## 14  1  1  1  1 -1 -1
## 15 -1  1  1  1  1 -1
## 16  1  1  1  1  1  1
## 17 -1  1  1  1 -1  1
## 18 -1  1 -1  1 -1 -1
## 19 -1 -1  1 -1  1 -1
## 20  1 -1  1 -1 -1 -1
## 21 -1 -1 -1 -1 -1 -1
## 22  1 -1 -1 -1 -1  1
## 23 -1 -1 -1  1 -1  1
## 24 -1 -1 -1  1  1 -1
## 25  1 -1 -1 -1  1 -1
## 26  1 -1  1  1 -1  1
## 27  1  1 -1  1  1 -1
## 28  1 -1 -1  1  1  1
## 29 -1 -1 -1 -1  1  1
## 30  1 -1  1  1  1 -1
## 31 -1 -1  1  1  1  1
## 32  1  1  1 -1  1 -1
## class=design, type= FrF2.estimable
aliasprint(Frdesign)
## $legend
## [1] A=C  B=B  C=G  D=W  E=S  F=CA
## 
## [[2]]
## [1] no aliasing among main effects and 2fis
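# Creating a new data frame

# Take the first 32 observations of the original data
L <- dataf[1:32, ]

# NOTE: responses are attached by row position, so they do not necessarily
# correspond to the factor settings of each design row
Frdesign$strength <- L$strength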

head(Frdesign)
##    C  B  G  W  S CA strength
## 1  1  1  1 -1 -1  1    79.99
## 2  1  1 -1 -1 -1 -1    61.89
## 3 -1  1  1 -1  1  1    40.27
## 4 -1  1 -1  1  1  1    41.05
## 5 -1 -1  1 -1 -1  1    42.30
## 6  1  1 -1 -1  1  1    47.03

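The code above produces a new data frame in which each factor has 2 levels (-1 and 1). The alias output shows that no main effect is aliased with another main effect or with a 2-factor interaction, and that no 2-factor interaction is aliased with another 2-factor interaction. By definition this is a design of resolution V or higher (in a resolution IV design, some 2-factor interactions are aliased with each other). In fact every row of the design satisfies CA = C*B*G*W*S, so the defining relation is I = C*B*G*W*S*CA and the design has resolution VI. Note, however, that the strength values are attached by row position: the first record of the original data has B = 0 (low level), yet it is paired with design row 1, where B = 1. The responses therefore do not necessarily correspond to the factor settings listed in the design, which should be kept in mind when interpreting the analyses below.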
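As a cross-check, a half fraction of this kind can be built in base R without FrF2. This is a sketch assuming the generator CA = C*B*G*W*S (the relation every row of the FrF2 output above satisfies); it is not how FrF2 constructs designs internally, and the run order differs from FrF2's randomized order. It verifies that all main-effect and 2-factor-interaction columns are mutually orthogonal, i.e. unaliased.

```r
# Full 2^5 factorial in the first five factors, coded -1/+1
factors <- c("C", "B", "G", "W", "S")
half <- expand.grid(rep(list(c(-1, 1)), length(factors)))
names(half) <- factors

# Sixth factor from the generator; defining relation I = C*B*G*W*S*CA
half$CA <- with(half, C * B * G * W * S)

# Columns for the 6 main effects and 15 two-factor interactions (no intercept)
X <- model.matrix(~ (C + B + G + W + S + CA)^2 - 1, data = half)

# Zero off-diagonal cross-products mean no two of these columns are aliased
xtx <- crossprod(X)
all(xtx[upper.tri(xtx)] == 0)
```

The check passes because any product of two such columns is a word of at most four letters, and in this half fraction only the identity and the six-letter word C*B*G*W*S*CA have non-zero column sums.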

Exploratory Data Analysis

We perform an exploratory data analysis on the new data frame created above for a fractional factorial design.

# Generate 3 plots.
par(bg=rgb(1,1,0.8), mfrow=c(2,2))
qqnorm(Frdesign$strength)
qqline(Frdesign$strength, col = 2)
boxplot(Frdesign$strength, horizontal=TRUE, main="Box Plot", xlab="Strength")
hist(Frdesign$strength, main="Histogram", xlab="Strength")
par(mfrow=c(1,1))

Result: The histogram is roughly bell-shaped, with the response concentrated around 40-50. The boxplot indicates some outliers, but for the most part the data look approximately normal, as the Normal Q-Q plot also suggests. This supports proceeding with an ANOVA on this dataset, although strictly the ANOVA assumption concerns the normality of the residuals, which is checked in the diagnostics section.

# Generate boxplots

# Since the levels have been modified therefore we do not explicitly show the units of the factors in the boxplots. However the initial unit for each factor is kg/m^3

par(bg=rgb(1,1,0.8),mfrow=c(2,3))
boxplot(strength ~ C, data = Frdesign, main = "Strength by Cement",
        xlab = "Cement", ylab = "Strength")

boxplot(strength ~ B, data = Frdesign, main = "Strength by Blast Furnace Slag",
        xlab = "Blast Furnace Slag", ylab = "Strength")

boxplot(strength ~ G, data = Frdesign, main = "Strength by Fly Ash",
        xlab = "Fly Ash", ylab = "Strength")

boxplot(strength ~ W, data = Frdesign, main = "Strength by Water",
        xlab = "Water", ylab = "Strength")

boxplot(strength ~ S, data = Frdesign, main = "Strength by Superplasticizer",
        xlab = "Superplasticizer", ylab = "Strength")

boxplot(strength ~ CA, data = Frdesign, main = "Strength by Coarse Aggregate",
        xlab = "Coarse Aggregate", ylab = "Strength")

par(mfrow=c(1,1))

Result: Several factors appear to affect the response variable. The largest differences between levels appear for Water (W), Coarse Aggregate (CA) and Fly Ash (G). This partly agrees with the initial test on the original data frame, which found C, W and CA significant. However, nothing conclusive can be derived from this exploratory analysis; it only suggests the direction of the next stage of the analysis.

Testing-Analysis of Variance (ANOVA)

To test our null hypothesis with our fractional design dataframe we perform an analysis of variance (ANOVA) to analyze the effects of the factors on the response variable.

Since this is a 2^(6-1) experiment, the full model would contain many higher-order interaction terms. We therefore assume that all interactions of order three and above are negligible: such high-order interactions are rarely significant, and they are very hard to interpret. This assumption lets us pool the sums of squares of these terms and use them to estimate the error term.
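The error degrees of freedom follow directly from this pooling: with 32 runs, after the 6 main effects and the $\binom{6}{2}$ two-factor interactions are fitted, what remains is used as error, which matches the 10 residual degrees of freedom in the ANOVA table below.

$$
\mathrm{df}_{\text{error}} = (32 - 1) - 6 - \binom{6}{2} = 31 - 6 - 15 = 10
$$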

If we find a few significant effects of any of the factors (main effect or interaction effect) we can also perform a stepwise regression to eliminate unnecessary terms. By a combination of stepwise regression and the removal of remaining terms with a p-value larger than 0.05, we can quickly arrive at a model with an intercept and just the significant effect terms.
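A minimal sketch of the stepwise step with base R's `step()`, run on a synthetic response on a base-R 2^(6-1) design. The effect sizes (5 for C, 4 for S:CA) and the noise level are invented for illustration and are not estimates from the concrete data. `step()` selects by AIC, so any remaining terms with p > 0.05 would still need to be removed by hand as described above.

```r
set.seed(1)

# 2^(6-1) design with generator CA = C*B*G*W*S, coded -1/+1
d <- expand.grid(rep(list(c(-1, 1)), 5))
names(d) <- c("C", "B", "G", "W", "S")
d$CA <- with(d, C * B * G * W * S)

# SYNTHETIC response: only C and the S:CA interaction matter, plus noise
d$strength <- 40 + 5 * d$C + 4 * d$S * d$CA + rnorm(nrow(d), sd = 2)

# Start from all main effects and two-factor interactions
full <- lm(strength ~ (C + B + G + W + S + CA)^2, data = d)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
# step() respects marginality, so S and CA stay while S:CA is in the model.
reduced <- step(full, direction = "backward", trace = 0)
summary(reduced)
```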

# Fit a model with up to second-order interactions.

Frdesign$C=as.factor(Frdesign$C)

Frdesign$B=as.factor(Frdesign$B)

Frdesign$G=as.factor(Frdesign$G)

Frdesign$W=as.factor(Frdesign$W)

Frdesign$S=as.factor(Frdesign$S)

Frdesign$CA=as.factor(Frdesign$CA)

modelfin= aov(strength~(C+B+G+W+S+CA)^2,data=Frdesign)

summary(modelfin)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## C            1  551.9   551.9   3.103 0.1087  
## B            1  210.6   210.6   1.184 0.3021  
## G            1  147.6   147.6   0.830 0.3837  
## W            1  111.1   111.1   0.625 0.4476  
## S            1    1.2     1.2   0.007 0.9371  
## CA           1  112.5   112.5   0.632 0.4450  
## C:B          1    7.4     7.4   0.042 0.8425  
## C:G          1   61.0    61.0   0.343 0.5710  
## C:W          1   16.6    16.6   0.094 0.7660  
## C:S          1   38.8    38.8   0.218 0.6504  
## C:CA         1  222.1   222.1   1.249 0.2899  
## B:G          1  140.2   140.2   0.788 0.3954  
## B:W          1    7.3     7.3   0.041 0.8435  
## B:S          1  524.6   524.6   2.949 0.1167  
## B:CA         1   12.8    12.8   0.072 0.7936  
## G:W          1    0.5     0.5   0.003 0.9570  
## G:S          1    1.4     1.4   0.008 0.9303  
## G:CA         1  681.1   681.1   3.829 0.0789 .
## W:S          1  485.7   485.7   2.731 0.1294  
## W:CA         1    3.5     3.5   0.020 0.8907  
## S:CA         1  247.4   247.4   1.391 0.2656  
## Residuals   10 1778.7   177.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: The ANOVA shows no significant effect (at alpha = 0.05) of any factor or 2-factor interaction on the response variable. The smallest p-values are for the G:CA interaction (p = 0.079, significant only at the 0.1 level) and the main effect of Cement (p = 0.109), so these may have some effect on the strength of the mixture, but the evidence is weak. We therefore fail to reject the null hypothesis: the data provide no evidence that the factors affect strength, although this does not prove that the variation in strength is due to random error alone. Note that this result is the opposite of the initial ANOVA on the original, non-fractional data. Possible reasons include the smaller sample (32 runs instead of 64), the 10 residual degrees of freedom, and the fact that the strength values were attached to the design rows by position rather than by matching factor settings.

Separate calculations of main effects and interaction effects can be done when a particular factor, or the interaction of 2 factors, has a significant effect on the response variable; for example, follow-up one-way ANOVA tests can quantify the impact of a particular variable on the response. Since no such effect was found here, we skip this step.

Diagnostics/Model Adequacy Checking

Model adequacy checking is done to test the validity of ANOVA given the numerous assumptions it is based on.

The Shapiro-Wilk test evaluates the null hypothesis “the samples come from a normal distribution” against the alternative hypothesis “the samples do not come from a normal distribution”.

Normal Q-Q plots also check the normality of the sample. A scatter plot of the residuals against the fitted values checks whether the residuals show a pattern or non-constant variance across the range of fitted values.

Tukey’s range test: In an analysis of variance (ANOVA), the null hypothesis is that there is no difference between treatment means, so once we reject it, we need to find which levels differ significantly. Tukey’s test does this with pairwise comparisons of means while controlling the family-wise error rate.
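A self-contained sketch of these checks applied to the residuals of a fitted model, using a synthetic response on a base-R 2^(6-1) design (the effect size and noise level are invented; only the workflow matters). It also shows a property worth keeping in mind when reading residual plots: least-squares residuals from a model with an intercept are uncorrelated with the fitted values by construction.

```r
set.seed(3)

# 2^(6-1) design with generator CA = C*B*G*W*S, coded -1/+1
d <- expand.grid(rep(list(c(-1, 1)), 5))
names(d) <- c("C", "B", "G", "W", "S")
d$CA <- with(d, C * B * G * W * S)
d$strength <- 40 + 5 * d$C + rnorm(nrow(d), sd = 3)  # SYNTHETIC response

fit <- aov(strength ~ (C + B + G + W + S + CA)^2, data = d)
res <- residuals(fit)

# Normality test on the residuals, not on the raw response
sw <- shapiro.test(res)
sw

op <- par(mfrow = c(1, 2))
qqnorm(res); qqline(res, col = 2)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)

# Essentially zero: no linear trend can appear in residuals vs fitted values
cor(fitted(fit), res)
```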

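# Shapiro-Wilk test
# Note: this tests the raw response; the ANOVA assumption concerns the residuals,
# which would be tested with shapiro.test(residuals(modelfin))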

shapiro.test(Frdesign$strength)
## 
##  Shapiro-Wilk normality test
## 
## data:  Frdesign$strength
## W = 0.9638, p-value = 0.3468
#Plots for checking normality

par(bg=rgb(1,1,0.8),mfrow=c(1,2))
qqnorm(residuals(modelfin))
qqline(residuals(modelfin))
plot(fitted(modelfin), residuals(modelfin), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Tukey's range test

tukey1 <- TukeyHSD(modelfin, ordered = FALSE, conf.level = 0.95)
tukey1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov.default(formula = strength ~ (C + B + G + W + S + CA)^2, data = Frdesign)
## 
## $C
##          diff       lwr      upr     p adj
## 1--1 8.305625 -2.200777 18.81203 0.1086551
## 
## $B
##          diff       lwr      upr     p adj
## 1--1 5.130625 -5.375777 15.63703 0.3020905
## 
## $G
##          diff       lwr      upr     p adj
## 1--1 4.295625 -6.210777 14.80203 0.3837337
## 
## $W
##           diff       lwr      upr     p adj
## 1--1 -3.726875 -14.23328 6.779527 0.4476468
## 
## $S
##          diff       lwr      upr     p adj
## 1--1 0.381875 -10.12453 10.88828 0.9370511
## 
## $CA
##          diff       lwr      upr     p adj
## 1--1 3.749375 -6.757027 14.25578 0.4449911
## 
## $`C:B`
##                diff        lwr      upr     p adj
## 1:-1--1:-1  9.26750 -11.133748 29.66875 0.5325350
## -1:1--1:-1  6.09250 -14.308748 26.49375 0.7983129
## 1:1--1:-1  13.43625  -6.964998 33.83750 0.2454284
## -1:1-1:-1  -3.17500 -23.576248 17.22625 0.9626703
## 1:1-1:-1    4.16875 -16.232498 24.57000 0.9216406
## 1:1--1:1    7.34375 -13.057498 27.74500 0.6968174
## 
## $`C:G`
##                diff        lwr      upr     p adj
## 1:-1--1:-1 11.06750  -9.333748 31.46875 0.3916134
## -1:1--1:-1  7.05750 -13.343748 27.45875 0.7208917
## 1:1--1:-1  12.60125  -7.799998 33.00250 0.2913486
## -1:1-1:-1  -4.01000 -24.411248 16.39125 0.9293260
## 1:1-1:-1    1.53375 -18.867498 21.93500 0.9954302
## 1:1--1:1    5.54375 -14.857498 25.94500 0.8385374
## 
## $`C:W`
##                 diff       lwr       upr     p adj
## 1:-1--1:-1   9.74750 -10.65375 30.148748 0.4929785
## -1:1--1:-1  -2.28500 -22.68625 18.116248 0.9853492
## 1:1--1:-1    4.57875 -15.82250 24.979998 0.8998654
## -1:1-1:-1  -12.03250 -32.43375  8.368748 0.3261445
## 1:1-1:-1    -5.16875 -25.57000 15.232498 0.8639313
## 1:1--1:1     6.86375 -13.53750 27.264998 0.7369584
## 
## $`C:S`
##                diff        lwr      upr     p adj
## 1:-1--1:-1 10.50875  -9.892498 30.91000 0.4330057
## -1:1--1:-1  2.58500 -17.816248 22.98625 0.9791164
## 1:1--1:-1   8.68750 -11.713748 29.08875 0.5816043
## -1:1-1:-1  -7.92375 -28.324998 12.47750 0.6472382
## 1:1-1:-1   -1.82125 -22.222498 18.58000 0.9924280
## 1:1--1:1    6.10250 -14.298748 26.50375 0.7975500
## 
## $`C:CA`
##                diff        lwr      upr     p adj
## 1:-1--1:-1  3.03625 -17.364998 23.43750 0.9670565
## -1:1--1:-1 -1.52000 -21.921248 18.88125 0.9955500
## 1:1--1:-1  12.05500  -8.346248 32.45625 0.3247138
## -1:1-1:-1  -4.55625 -24.957498 15.84500 0.9011307
## 1:1-1:-1    9.01875 -11.382498 29.42000 0.5534417
## 1:1--1:1   13.57500  -6.826248 33.97625 0.2383851
## 
## $`B:G`
##                diff       lwr      upr     p adj
## 1:-1--1:-1  9.31750 -11.08375 29.71875 0.5283629
## -1:1--1:-1  8.48250 -11.91875 28.88375 0.5991657
## 1:1--1:-1   9.42625 -10.97500 29.82750 0.5193274
## -1:1-1:-1  -0.83500 -21.23625 19.56625 0.9992492
## 1:1-1:-1    0.10875 -20.29250 20.51000 0.9999983
## 1:1--1:1    0.94375 -19.45750 21.34500 0.9989183
## 
## $`B:W`
##                diff       lwr      upr     p adj
## 1:-1--1:-1  4.17500 -16.22625 24.57625 0.9213294
## -1:1--1:-1 -4.68250 -25.08375 15.71875 0.8939276
## 1:1--1:-1   1.40375 -18.99750 21.80500 0.9964819
## -1:1-1:-1  -8.85750 -29.25875 11.54375 0.5671117
## 1:1-1:-1   -2.77125 -23.17250 17.63000 0.9745406
## 1:1--1:1    6.08625 -14.31500 26.48750 0.7987893
## 
## $`B:S`
##                diff        lwr      upr     p adj
## 1:-1--1:-1 13.22875  -7.172498 33.63000 0.2562709
## -1:1--1:-1  8.48000 -11.921248 28.88125 0.5993803
## 1:1--1:-1   5.51250 -14.888748 25.91375 0.8407230
## -1:1-1:-1  -4.74875 -25.149998 15.65250 0.8900485
## 1:1-1:-1   -7.71625 -28.117498 12.68500 0.6650581
## 1:1--1:1   -2.96750 -23.368748 17.43375 0.9691099
## 
## $`B:CA`
##                diff       lwr      upr     p adj
## 1:-1--1:-1  3.86375 -16.53750 24.26500 0.9360291
## -1:1--1:-1  2.48250 -17.91875 22.88375 0.9814007
## 1:1--1:-1   8.88000 -11.52125 29.28125 0.5651994
## -1:1-1:-1  -1.38125 -21.78250 19.02000 0.9966461
## 1:1-1:-1    5.01625 -15.38500 25.41750 0.8737115
## 1:1--1:1    6.39750 -14.00375 26.79875 0.7746246
## 
## $`G:W`
##                diff       lwr      upr     p adj
## 1:-1--1:-1  4.03500 -16.36625 24.43625 0.9281438
## -1:1--1:-1 -3.98750 -24.38875 16.41375 0.9303809
## 1:1--1:-1   0.56875 -19.83250 20.97000 0.9997618
## -1:1-1:-1  -8.02250 -28.42375 12.37875 0.6387413
## 1:1-1:-1   -3.46625 -23.86750 16.93500 0.9523982
## 1:1--1:1    4.55625 -15.84500 24.95750 0.9011307
## 
## $`G:S`
##                diff       lwr      upr     p adj
## 1:-1--1:-1  4.71875 -15.68250 25.12000 0.8918135
## -1:1--1:-1  0.80500 -19.59625 21.20625 0.9993269
## 1:1--1:-1   4.67750 -15.72375 25.07875 0.8942177
## -1:1-1:-1  -3.91375 -24.31500 16.48750 0.9337784
## 1:1-1:-1   -0.04125 -20.44250 20.36000 0.9999999
## 1:1--1:1    3.87250 -16.52875 24.27375 0.9356383
## 
## $`G:CA`
##                diff        lwr      upr     p adj
## 1:-1--1:-1 -4.93125 -25.332498 15.47000 0.8790178
## -1:1--1:-1 -5.47750 -25.878748 14.92375 0.8431564
## 1:1--1:-1   8.04500 -12.356248 28.44625 0.6368045
## -1:1-1:-1  -0.54625 -20.947498 19.85500 0.9997889
## 1:1-1:-1   12.97625  -7.424998 33.37750 0.2699699
## 1:1--1:1   13.52250  -6.878748 33.92375 0.2410308
## 
## $`W:S`
##                 diff       lwr       upr     p adj
## 1:-1--1:-1 -11.51875 -31.92000  8.882498 0.3600128
## -1:1--1:-1  -7.41000 -27.81125 12.991248 0.6911988
## 1:1--1:-1   -3.34500 -23.74625 17.056248 0.9568509
## -1:1-1:-1    4.10875 -16.29250 24.509998 0.9245951
## 1:1-1:-1     8.17375 -12.22750 28.574998 0.6257193
## 1:1--1:1     4.06500 -16.33625 24.466248 0.9267113
## 
## $`W:CA`
##                diff       lwr      upr     p adj
## 1:-1--1:-1 -4.39125 -24.79250 16.01000 0.9101630
## -1:1--1:-1  3.08500 -17.31625 23.48625 0.9655524
## 1:1--1:-1   0.02250 -20.37875 20.42375 1.0000000
## -1:1-1:-1   7.47625 -12.92500 27.87750 0.6855656
## 1:1-1:-1    4.41375 -15.98750 24.81500 0.9089571
## 1:1--1:1   -3.06250 -23.46375 17.33875 0.9662515
## 
## $`S:CA`
##                diff       lwr      upr     p adj
## 1:-1--1:-1 -5.17875 -25.58000 15.22250 0.8632785
## -1:1--1:-1 -1.81125 -22.21250 18.59000 0.9925491
## 1:1--1:-1   4.13125 -16.27000 24.53250 0.9234942
## -1:1-1:-1   3.36750 -17.03375 23.76875 0.9560437
## 1:1-1:-1    9.31000 -11.09125 29.71125 0.5289880
## 1:1--1:1    5.94250 -14.45875 26.34375 0.8096355
plot(tukey1)    

The Shapiro-Wilk test returns p = 0.3468, which is not small (well above 0.05), so there is no evidence against normality. This does not prove that the data are exactly normal, and the test was run on the raw response rather than on the residuals. The Normal Q-Q plot is consistent with normality of the residuals.

The residuals-versus-fitted scatter plot, however, appears to show some pattern: there is no trend at the lower end of the range of fitted values, but the residuals seem to decrease across the range and there are few or no points at the upper end. This suggests that the model may not fit well. Since residuals from a least-squares fit with an intercept are uncorrelated with the fitted values by construction, such a pattern does not indicate an incorrectly computed slope or intercept; more plausible causes are a misspecified model (for example, missing terms, or responses that do not match the design settings) or non-constant variance.

Tukey’s test: For all six factors, the 95% family-wise confidence interval for the difference between the two levels contains zero, and every adjusted p-value is above 0.05 (the smallest is 0.109, for C). There is therefore no significant difference between the levels of any factor, nor between any combinations of levels in the 2-factor interactions (all of their adjusted p-values are at least 0.238).
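With only two levels per factor, each Tukey comparison is a single difference of means, and an interval that contains zero corresponds exactly to an adjusted p-value above 0.05. The sketch below checks both properties on a synthetic one-way example; the data are random and purely illustrative.

```r
set.seed(2)

# SYNTHETIC two-level factor with 16 runs per level and no true effect
f <- factor(rep(c(-1, 1), each = 16))
y <- 40 + rnorm(32, sd = 5)

# Single comparison "1--1": mean at level 1 minus mean at level -1
tk <- TukeyHSD(aov(y ~ f), conf.level = 0.95)$f
tk

# Interval contains zero  <=>  adjusted p-value above 0.05
covers_zero <- tk[, "lwr"] < 0 & tk[, "upr"] > 0
covers_zero == (tk[, "p adj"] > 0.05)
```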

References

Wikipedia

Data source: UCI Machine Learning Repository, Concrete Compressive Strength data set, https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/