The following experimental analysis examines a data set of grinding parameters for a hypothetical material X. A total of 64 runs were performed to measure the strength of the material. Below, I load the CSV file containing the data and display the first 6 rows to show how the data are organized, followed by the structure of the data set. Lastly, I load the FrF2 package, which will be used to create a fractional factorial design for this analysis.
CSV<-read.csv("C:\\Users\\Anthony\\Desktop\\School\\RPI Year 1\\DoE\\DoeR8.csv",header=TRUE)
head(CSV)
## Table Feed Grit Direction Batch Concentration Strength
## 1 -1 -1 -1 -1 -1 -1 680
## 2 1 -1 -1 -1 -1 -1 722
## 3 -1 1 -1 -1 -1 -1 702
## 4 1 1 -1 -1 -1 -1 667
## 5 -1 -1 1 -1 -1 -1 704
## 6 1 -1 1 -1 -1 -1 642
str(CSV)
## 'data.frame': 64 obs. of 7 variables:
## $ Table : int -1 1 -1 1 -1 1 -1 1 -1 1 ...
## $ Feed : int -1 -1 1 1 -1 -1 1 1 -1 -1 ...
## $ Grit : int -1 -1 -1 -1 1 1 1 1 -1 -1 ...
## $ Direction : int -1 -1 -1 -1 -1 -1 -1 -1 1 1 ...
## $ Batch : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ Concentration: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ Strength : int 680 722 702 667 704 642 693 669 492 476 ...
require(FrF2)
## Loading required package: FrF2
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.1.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.2
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
The factors in this experiment are the table speed used during grinding (denoted “Table”), the down feed rate (“Feed”), the wheel grit (“Grit”), the direction of the wheel (“Direction”), the batch of material used (“Batch”), and the concentration of X in the material (“Concentration”). Each factor has two levels, coded as -1 and 1. The levels of table speed were -1 = 0.025 m/s and 1 = 0.125 m/s. The levels of down feed rate were -1 = 0.05 mm and 1 = 0.125 mm. The levels of wheel grit were -1 = 140/170 and 1 = 80/100. The levels of direction were -1 = longitudinal and 1 = transverse. The levels of batch were -1 = batch 1 and 1 = batch 2. The levels of concentration were -1 = low concentration of X and 1 = high concentration of X.
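As a sketch of the coding scheme just described, the lookup below maps each coded level back to its physical setting. The object `level_key` and the helper `decode_level()` are hypothetical names introduced only for illustration; the level values themselves are taken from the paragraph above.

```r
# Hypothetical lookup table: coded level (-1 or 1) -> actual setting,
# using the level definitions given in the text.
level_key <- list(
  Table         = c("-1" = "0.025 m/s",    "1" = "0.125 m/s"),
  Feed          = c("-1" = "0.05 mm",      "1" = "0.125 mm"),
  Grit          = c("-1" = "140/170",      "1" = "80/100"),
  Direction     = c("-1" = "longitudinal", "1" = "transverse"),
  Batch         = c("-1" = "batch 1",      "1" = "batch 2"),
  Concentration = c("-1" = "low",          "1" = "high")
)

# Hypothetical helper: translate a vector of coded levels for one factor.
decode_level <- function(factor_name, coded) {
  unname(level_key[[factor_name]][as.character(coded)])
}

# Decode the first run shown by head(CSV): every factor at level -1.
first_row <- c(Table = -1, Feed = -1, Grit = -1, Direction = -1,
               Batch = -1, Concentration = -1)
sapply(names(first_row), function(f) decode_level(f, first_row[[f]]))
```

The same helper works on whole columns, for example `decode_level("Direction", c(-1, 1))`.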
The data set contains these six factors and one response variable, the strength of the material, which is continuous.
How will the experiment be organized and conducted to test the hypothesis?
In this analysis, I will use a fractional factorial design, created with the R package “FrF2”, to select 32 of the 64 experimental runs for analysis. Before that, I will analyze the full factorial data so that its results can be compared with those of the fractional factorial design.
What is the rationale for this design?
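This design demonstrates how to implement a fractional factorial design in an experimental data analysis. A one-half fraction requires only 32 of the 64 runs, yet a fraction of sufficiently high resolution still allows every main effect and two-factor interaction to be estimated without aliasing them with one another.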
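The degrees-of-freedom budget behind choosing 32 runs can be sketched as follows. This back-of-the-envelope check is not part of the original analysis: a 32-run two-level design has 31 degrees of freedom after the grand mean, which covers all 6 main effects and all 15 two-factor interactions with 10 to spare.

```r
n_factors <- 6
full_runs <- 2^n_factors          # full factorial: 64 runs
half_runs <- 2^(n_factors - 1)    # one-half fraction: 32 runs

n_main    <- n_factors            # 6 main effects
n_2fi     <- choose(n_factors, 2) # 15 two-factor interactions
effect_df <- half_runs - 1        # 31 df available after the grand mean

# df left over for higher-order terms or error if all main effects and 2fis are fitted
leftover_df <- effect_df - n_main - n_2fi
c(full = full_runs, half = half_runs, main = n_main,
  two_fi = n_2fi, leftover = leftover_df)
```

The model fitted later uses only 6 main effects and 5 Direction interactions (11 terms), which is why its residual line has 31 - 11 = 20 degrees of freedom.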
Below, I convert all six coded factor columns from integers to factors so that R treats them as categorical variables in the plots and linear models.
CSV$Table=as.factor(CSV$Table)
CSV$Feed=as.factor(CSV$Feed)
CSV$Grit=as.factor(CSV$Grit)
CSV$Direction=as.factor(CSV$Direction)
CSV$Batch=as.factor(CSV$Batch)
CSV$Concentration=as.factor(CSV$Concentration)
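The six as.factor() calls above can be written more compactly by looping over the factor names with lapply(). In this sketch, the data frame `demo` is a hypothetical stand-in for CSV containing only the first two rows printed by head(CSV), so the example runs on its own.

```r
# Hypothetical stand-in for CSV: the first two rows shown by head(CSV).
demo <- data.frame(Table = c(-1L, 1L), Feed = c(-1L, -1L), Grit = c(-1L, -1L),
                   Direction = c(-1L, -1L), Batch = c(-1L, -1L),
                   Concentration = c(-1L, -1L), Strength = c(680L, 722L))

factor_cols <- c("Table", "Feed", "Grit", "Direction", "Batch", "Concentration")

# Convert every coded design column to a factor in one step;
# Strength is left numeric because it is the response.
demo[factor_cols] <- lapply(demo[factor_cols], as.factor)
str(demo)
```

Applied to the real data, the same pattern would be `CSV[factor_cols] <- lapply(CSV[factor_cols], as.factor)`.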
Below, I create box-and-whisker plots of strength against each factor to look for trends in the data.
par(mfrow=c(2,3))
plot(CSV$Strength~CSV$Table,xlab="Table speed",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Feed,xlab="Down Feed Rate",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Grit,xlab="Wheel Grit",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Direction,xlab="Direction",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Batch,xlab="Batch",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Concentration,xlab="Concentration of X",ylab="Strength (psi)")
Below is the initial analysis of variance (ANOVA) performed on the full factorial set of data.
model1=lm(Strength~Table+Feed+Grit+Direction+Batch+Concentration,data=CSV)
anova(model1)
## Analysis of Variance Table
##
## Response: Strength
## Df Sum Sq Mean Sq F value Pr(>F)
## Table 1 32 32 0.02 0.8993
## Feed 1 16288 16288 8.33 0.0055 **
## Grit 1 15221 15221 7.78 0.0072 **
## Direction 1 608205 608205 310.87 < 2e-16 ***
## Batch 1 88730 88730 45.35 8.8e-09 ***
## Concentration 1 1511 1511 0.77 0.3831
## Residuals 57 111517 1956
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of the ANOVA show that, in the full factorial experiment, Feed, Grit, Direction, and Batch have statistically significant effects on material strength (p < 0.01), effects that are unlikely to be due to random variation alone. We therefore reject, for these four factors, the null hypothesis that the factor has no effect on the response variable. Conversely, we fail to reject this null hypothesis for table speed (p = 0.90) and concentration of X (p = 0.38).
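To make the ANOVA table concrete: each F value is the factor's mean square divided by the residual mean square, and its p-value is the upper tail of an F distribution with (1, 57) degrees of freedom. The sketch below recomputes the Direction and Feed rows from the sums of squares printed above; tiny differences from the table can arise because the printed sums of squares are rounded.

```r
# Values copied from the full-factorial ANOVA table above.
ss_resid <- 111517
df_resid <- 57
ms_resid <- ss_resid / df_resid            # residual mean square (about 1956)

ss <- c(Direction = 608205, Feed = 16288)  # each factor has 1 df, so MS = SS
f_values <- ss / ms_resid

# Upper-tail probability of the F(1, 57) distribution.
p_values <- pf(f_values, df1 = 1, df2 = df_resid, lower.tail = FALSE)
round(f_values, 2)
signif(p_values, 2)
```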
Below, I construct the design matrix for a 2^(6-1) design, that is, a one-half (32/64) fractional factorial design. Because the box plots above show a large difference in median strength between the two levels of Direction, I am most interested in the main effect of this factor and its interactions with the other factors, so these terms are listed in the estimable argument.
NewDesign=FrF2(32,nfactors=6,estimable=formula("~Table+Feed+Grit+Direction+Batch+Concentration+Direction:(Table+Feed+Grit+Direction+Batch+Concentration)"),factor.names=c("Table","Feed","Grit","Direction","Batch","Concentration"),res5=TRUE,clear=FALSE)
NewDesign
## Table Feed Grit Direction Batch Concentration
## 1 1 1 1 1 -1 -1
## 2 1 1 -1 -1 1 1
## 3 -1 1 -1 1 1 1
## 4 -1 -1 -1 -1 -1 -1
## 5 1 -1 -1 -1 -1 1
## 6 -1 1 1 1 -1 1
## 7 1 1 -1 1 1 -1
## 8 -1 1 -1 -1 -1 1
## 9 1 -1 -1 -1 1 -1
## 10 -1 -1 -1 -1 1 1
## 11 1 -1 -1 1 1 1
## 12 -1 -1 1 -1 1 -1
## 13 1 1 1 -1 1 -1
## 14 1 1 -1 1 -1 1
## 15 1 -1 1 -1 1 1
## 16 -1 -1 1 1 -1 -1
## 17 -1 -1 1 1 1 1
## 18 -1 1 1 -1 1 1
## 19 1 -1 1 -1 -1 -1
## 20 1 1 1 1 1 1
## 21 -1 1 -1 -1 1 -1
## 22 -1 -1 -1 1 1 -1
## 23 -1 1 -1 1 -1 -1
## 24 1 -1 -1 1 -1 -1
## 25 -1 -1 -1 1 -1 1
## 26 -1 1 1 -1 -1 -1
## 27 1 -1 1 1 1 -1
## 28 1 1 1 -1 -1 1
## 29 1 -1 1 1 -1 1
## 30 -1 1 1 1 1 -1
## 31 1 1 -1 -1 -1 -1
## 32 -1 -1 1 -1 -1 1
## class=design, type= FrF2.estimable
aliasprint(NewDesign)
## $legend
## [1] A=Table B=Feed C=Grit D=Direction
## [5] E=Batch F=Concentration
##
## [[2]]
## [1] no aliasing among main effects and 2fis
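The aliasprint() output confirms that no main effect or two-factor interaction is aliased with another main effect or two-factor interaction. This follows from the design that FrF2 selected: every run above has an even number of factors at level -1, so Table × Feed × Grit × Direction × Batch × Concentration = +1 in every run, giving the defining relation I = ABCDEF. The design therefore has resolution VI, in which main effects are aliased only with five-factor interactions and two-factor interactions only with four-factor interactions; the effects are not free of aliasing altogether. Note that FrF2() has a resolution argument but, as far as I can tell, no res5 argument, so res5=TRUE in the call above is most likely ignored and the resolution comes from the design chosen to satisfy the estimable requirement.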
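Because every run in the printed design has Table × Feed × Grit × Direction × Batch × Concentration = +1, the same 32 runs can be generated by hand. Start from a full 2^5 design in the first five factors and set Concentration to the product of the other five (generator F = ABCDE). The base-R sketch below builds that fraction and checks two consequences of the defining relation I = ABCDEF: the main-effect columns are mutually orthogonal, and the two-factor interaction Table:Feed has exactly the same column as the four-factor interaction Grit:Direction:Batch:Concentration. It illustrates the alias structure and is not a replacement for FrF2().

```r
# Full 2^5 design in the first five factors (coded -1/1).
base <- expand.grid(Table = c(-1, 1), Feed = c(-1, 1), Grit = c(-1, 1),
                    Direction = c(-1, 1), Batch = c(-1, 1))

# Generator F = ABCDE: Concentration is the product of the other five factors.
half <- base
half$Concentration <- with(base, Table * Feed * Grit * Direction * Batch)

# Defining relation I = ABCDEF: the product of all six columns is +1 in every run.
word <- apply(half, 1, prod)

# Main-effect columns are mutually orthogonal (off-diagonal cross-products are 0).
X <- as.matrix(half)
cross <- crossprod(X)

# A 2fi column equals its complementary 4fi column, so the two are aliased.
tf_2fi   <- half$Table * half$Feed
gdbc_4fi <- with(half, Grit * Direction * Batch * Concentration)

list(runs = nrow(half),
     defining_relation_holds = all(word == 1),
     main_effects_orthogonal = all(cross[upper.tri(cross)] == 0),
     table_feed_aliased_with_gdbc = identical(tf_2fi, gdbc_4fi))
```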
Below, I create a new data set containing only the runs selected by the fractional factorial design created above.
NewData=merge(NewDesign,CSV,by=c("Table","Feed","Grit","Direction","Batch","Concentration"),all=FALSE)
print(NewData)
## Table Feed Grit Direction Batch Concentration Strength
## 1 -1 -1 -1 -1 -1 -1 680
## 2 -1 -1 -1 -1 1 1 615
## 3 -1 -1 -1 1 -1 1 470
## 4 -1 -1 -1 1 1 -1 443
## 5 -1 -1 1 -1 -1 1 708
## 6 -1 -1 1 -1 1 -1 585
## 7 -1 -1 1 1 -1 -1 445
## 8 -1 -1 1 1 1 1 390
## 9 -1 1 -1 -1 -1 1 715
## 10 -1 1 -1 -1 1 -1 611
## 11 -1 1 -1 1 -1 -1 479
## 12 -1 1 -1 1 1 1 412
## 13 -1 1 1 -1 -1 -1 693
## 14 -1 1 1 -1 1 1 603
## 15 -1 1 1 1 -1 1 725
## 16 -1 1 1 1 1 -1 386
## 17 1 -1 -1 -1 -1 1 730
## 18 1 -1 -1 -1 1 -1 621
## 19 1 -1 -1 1 -1 -1 476
## 20 1 -1 -1 1 1 1 435
## 21 1 -1 1 -1 -1 -1 642
## 22 1 -1 1 -1 1 1 588
## 23 1 -1 1 1 -1 1 402
## 24 1 -1 1 1 1 -1 343
## 25 1 1 -1 -1 -1 -1 667
## 26 1 1 -1 -1 1 1 642
## 27 1 1 -1 1 -1 1 575
## 28 1 1 -1 1 1 -1 511
## 29 1 1 1 -1 -1 1 680
## 30 1 1 1 -1 1 -1 608
## 31 1 1 1 1 -1 -1 491
## 32 1 1 1 1 1 1 441
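The merge() call above performs an inner join. With all = FALSE, only rows of CSV whose six factor settings match a run in NewDesign are kept. Because merge() sorts the result by the key columns by default, NewData is not in the randomized order of NewDesign. The sketch below illustrates this on a hypothetical stand-in for CSV: all 64 coded runs, with Strength holding the run index as a placeholder rather than real measurements.

```r
factor_cols <- c("Table", "Feed", "Grit", "Direction", "Batch", "Concentration")

# Hypothetical stand-in for CSV: all 64 runs of the 2^6 design.
# Strength is a placeholder (run index), not real data.
full <- expand.grid(Table = c(-1, 1), Feed = c(-1, 1), Grit = c(-1, 1),
                    Direction = c(-1, 1), Batch = c(-1, 1), Concentration = c(-1, 1))
full$Strength <- seq_len(nrow(full))

# Stand-in for NewDesign: the 32 runs whose six coded levels multiply to +1,
# listed in shuffled order as a randomized design would be.
set.seed(1)
in_half <- apply(full[factor_cols], 1, prod) == 1
design  <- full[in_half, factor_cols]
design  <- design[sample(nrow(design)), ]

# Inner join (all = FALSE): keep only runs that appear in the design.
merged <- merge(design, full, by = factor_cols, all = FALSE)

nrow(merged)
setequal(merged$Strength, full$Strength[in_half])  # same runs as the direct filter
```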
Below, I perform a second ANOVA on this new data set to test the main effect of each factor as well as the interactions of Direction with each of the other factors.
model2=lm(Strength~Direction*Table+Direction*Feed+Direction*Grit+Direction*Batch+Direction*Concentration,data=NewData)
anova(model2)
## Analysis of Variance Table
##
## Response: Strength
## Df Sum Sq Mean Sq F value Pr(>F)
## Direction 1 274540 274540 90.02 7.6e-09 ***
## Table 1 364 364 0.12 0.73317
## Feed 1 13861 13861 4.54 0.04561 *
## Grit 1 3872 3872 1.27 0.27319
## Batch 1 56448 56448 18.51 0.00035 ***
## Concentration 1 6328 6328 2.07 0.16521
## Direction:Table 1 61 61 0.02 0.88940
## Direction:Feed 1 10011 10011 3.28 0.08507 .
## Direction:Grit 1 1 1 0.00 0.98991
## Direction:Batch 1 113 113 0.04 0.84963
## Direction:Concentration 1 325 325 0.11 0.74744
## Residuals 20 60998 3050
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As seen in the ANOVA above, we can reject the null hypothesis for Direction, Feed, and Batch at α = 0.05. Unlike in the full factorial analysis, Grit is no longer significant in the half fraction (p = 0.27). Regarding the interactions I was most interested in, none of the interactions between Direction and the other factors is significant at α = 0.05; the Direction:Feed interaction (p = 0.085) is significant only at the 0.1 level.
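As a check on the Direction row of the ANOVA above: in a balanced two-level design, the sum of squares for a main effect is N * (ybar_high - ybar_low)^2 / 4, where ybar_high and ybar_low are the mean responses at the two levels. The Strength values below are copied from the printed NewData table and split by Direction level.

```r
# Strength values from NewData (printed above), grouped by Direction level.
strength_low  <- c(680, 615, 708, 585, 715, 611, 693, 603,   # Direction = -1
                   730, 621, 642, 588, 667, 642, 680, 608)
strength_high <- c(470, 443, 445, 390, 479, 412, 725, 386,   # Direction = 1
                   476, 435, 402, 343, 575, 511, 491, 441)

n_runs <- length(strength_low) + length(strength_high)   # N = 32
effect <- mean(strength_high) - mean(strength_low)       # main effect of Direction

# Sum of squares for a two-level main effect in a balanced design.
ss_direction <- n_runs * effect^2 / 4
c(effect = effect, ss = ss_direction)
```

This reproduces the printed sum of squares of 274540 (up to display rounding), and dividing by the residual mean square of about 3050 gives the F value of roughly 90.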
Here I use the Shapiro-Wilk normality test to determine if the response variable is normally distributed.
shapiro.test(NewData$Strength)
##
## Shapiro-Wilk normality test
##
## data: NewData$Strength
## W = 0.9331, p-value = 0.04772
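The Shapiro-Wilk test gives p = 0.04772. At α = 0.05 this would be just enough to reject normality, but at the stricter α = 0.005 used here we fail to reject the null hypothesis that the data are normally distributed. Failing to reject does not prove normality, but it gives no strong reason to transform the data, so we proceed with the untransformed data.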
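The decision rule behind this conclusion can be sketched in a few lines. The helper `decide()` is a hypothetical name used only for illustration, and the p-value is the one reported by shapiro.test() above.

```r
# p-value reported by shapiro.test() above.
p_shapiro <- 0.04772

# H0 for Shapiro-Wilk: the sample comes from a normal distribution.
decide <- function(p, alpha) {
  if (p < alpha) "reject normality" else "fail to reject normality"
}

alphas <- c(0.05, 0.005)
setNames(vapply(alphas, function(a) decide(p_shapiro, a), character(1)),
         paste0("alpha=", alphas))
```

Because the ANOVA assumption concerns the residuals rather than the raw response, `shapiro.test(residuals(model2))` would be a more direct check; its result is not reported here.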
To check whether ANOVA is an adequate method for analyzing this data set, I created a normal quantile-quantile (Q-Q) plot of the residuals to assess whether they follow a normal distribution.
The nearly linear pattern of the residuals in the Q-Q plot indicates that the ANOVA model may be adequate for this analysis. If the points fell exactly on the reference line, the residual quantiles would match those of a normal distribution exactly, fully satisfying the normality assumption. Given the result of the Shapiro-Wilk test above, a nearly (rather than perfectly) linear Q-Q plot was expected.
The second plot is a residuals vs. fits plot, which is used to check for non-linearity, unequal variance, and outlying values. Least-squares residuals always average to zero, so the useful check is whether they scatter randomly around zero with roughly constant spread. For this model the residuals appear to scatter around zero, which suggests that the model is adequate for estimating the effects of these factors on material strength.
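# Normal Q-Q plot of the residuals from model2 (half-fraction ANOVA)
qqnorm(residuals(model2))
qqline(residuals(model2))
# Residuals vs. fitted values, with a dashed reference line at zero
plot(fitted(model2),residuals(model2),xlab="Fitted values",ylab="Residuals")
abline(h=0,lty=2)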
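Least-squares residuals from a model that includes an intercept always sum to zero, so residuals being "centered around zero" is guaranteed by construction. The informative features of a residuals vs. fits plot are trends, funnel shapes, and isolated outliers. The sketch below demonstrates the zero-sum property on simulated data; the data frame `sim` is hypothetical and is not the grinding data.

```r
set.seed(42)

# Hypothetical simulated two-level data, not the grinding experiment.
sim <- data.frame(A = factor(rep(c(-1, 1), each = 8)),
                  B = factor(rep(c(-1, 1), times = 8)))
sim$y <- 500 + 40 * (sim$A == "1") + rnorm(nrow(sim), sd = 20)

fit <- lm(y ~ A + B, data = sim)

# Residuals sum to (numerically) zero because the model has an intercept.
sum(residuals(fit))

# Residuals are also orthogonal to the fitted values.
sum(residuals(fit) * fitted(fit))
```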
This data set is a modified version of a data set obtained from http://www.itl.nist.gov/div898/handbook/pri/section4/pri471.htm.