Anthony D’Amato

RPI

11/18/2014

1. Setting

Material Strengths

This analysis examines a data set on the grinding parameters of a hypothetical material X. The data contain 64 runs, one for each combination of six two-level factors (2^6 = 64), and each run records the resulting strength of the material. Below, I load the CSV file containing the data, display its first six rows to show how the data are organized, and then display its structure. Finally, I load the package FrF2, which will be used to create a fractional factorial design for this analysis.

CSV<-read.csv("C:\\Users\\Anthony\\Desktop\\School\\RPI Year 1\\DoE\\DoeR8.csv",header=TRUE)
head(CSV)

##   Table Feed Grit Direction Batch Concentration Strength
## 1    -1   -1   -1        -1    -1            -1      680
## 2     1   -1   -1        -1    -1            -1      722
## 3    -1    1   -1        -1    -1            -1      702
## 4     1    1   -1        -1    -1            -1      667
## 5    -1   -1    1        -1    -1            -1      704
## 6     1   -1    1        -1    -1            -1      642

str(CSV)

## 'data.frame':    64 obs. of  7 variables:
##  $ Table        : int  -1 1 -1 1 -1 1 -1 1 -1 1 ...
##  $ Feed         : int  -1 -1 1 1 -1 -1 1 1 -1 -1 ...
##  $ Grit         : int  -1 -1 -1 -1 1 1 1 1 -1 -1 ...
##  $ Direction    : int  -1 -1 -1 -1 -1 -1 -1 -1 1 1 ...
##  $ Batch        : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ Concentration: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ Strength     : int  680 722 702 667 704 642 693 669 492 476 ...

require(FrF2)

## Loading required package: FrF2

## Warning: package 'FrF2' was built under R version 3.1.2

## Loading required package: DoE.base

## Warning: package 'DoE.base' was built under R version 3.1.2

## Loading required package: grid
## Loading required package: conf.design

## Warning: package 'conf.design' was built under R version 3.1.2

## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design

Factors and Levels

The factors involved in this experimental analysis are the Table Speed used during grinding (denoted as “Table”), the Down Feed Rate (denoted as “Feed”), the Wheel Grit (denoted as “Grit”), the Direction of the wheel (denoted as “Direction”), the Batch of the material used (denoted as “Batch”), and the Concentration of X in the material (denoted as “Concentration”). Each factor has two levels, coded as -1 and 1. The levels of table speed were -1 = 0.025 m/s and 1 = 0.125 m/s. The levels of down feed rate were -1 = 0.05 mm and 1 = 0.125 mm. The levels of wheel grit were -1 = 140/170 and 1 = 80/100. The levels of direction were -1 = longitudinal and 1 = transverse. The levels of batch were -1 = batch 1 and 1 = batch 2. The levels of concentration were -1 = low concentration of X and 1 = high concentration of X.
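The coding above can be kept alongside the data as a small lookup table. The following is a minimal sketch in base R; the names `levels_key` and `decode_level` are illustrative and are not part of the original analysis.

```r
# Lookup table from coded levels to physical settings, as listed in the text above.
# The names levels_key and decode_level are hypothetical helpers for illustration.
levels_key <- data.frame(
  factor = c("Table", "Feed", "Grit", "Direction", "Batch", "Concentration"),
  low    = c("0.025 m/s", "0.05 mm", "140/170", "longitudinal", "batch 1", "low"),
  high   = c("0.125 m/s", "0.125 mm", "80/100", "transverse", "batch 2", "high"),
  stringsAsFactors = FALSE
)

# Translate a coded value (-1 or 1) for one factor into its physical setting
decode_level <- function(factor_name, coded) {
  row <- levels_key[levels_key$factor == factor_name, ]
  if (coded == -1) row$low else row$high
}

decode_level("Direction", 1)   # "transverse"
decode_level("Table", -1)      # "0.025 m/s"
```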

The Data: How is it organized and what does it look like?

The data set contains one column for each of the six factors described above and one column for the response variable, the strength of the material. The response variable is continuous.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?
In this analysis I will use a fractional factorial design to select 32 of the 64 experimental runs for analysis. I will do this using the R package “FrF2”. Before that, I will analyze the full factorial design so that its results can be compared with those of the fractional factorial design.

What is the rationale for this design?
This design will be used to demonstrate proper implementation of a fractional factorial design in an experimental data analysis.

3. Statistical Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Below I convert all six factor columns to R factors so that they are treated as categorical variables in the analysis.

CSV$Table=as.factor(CSV$Table)
CSV$Feed=as.factor(CSV$Feed)
CSV$Grit=as.factor(CSV$Grit)
CSV$Direction=as.factor(CSV$Direction)
CSV$Batch=as.factor(CSV$Batch)
CSV$Concentration=as.factor(CSV$Concentration)

Below I create box and whisker plots to represent any trends in the data.

par(mfrow=c(2,3))
plot(CSV$Strength~CSV$Table,xlab="Table speed",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Feed,xlab="Down Feed Rate",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Grit,xlab="Wheel Grit",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Direction,xlab="Direction",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Batch,xlab="Batch",ylab="Strength (psi)")
plot(CSV$Strength~CSV$Concentration,xlab="Concentration of X",ylab="Strength (psi)")

[Figure: box plots of Strength (psi) against each of the six factors, arranged in a 2 x 3 grid]

Testing

Analysis of Variance (ANOVA)

Below is the initial analysis of variance (ANOVA) performed on the full factorial set of data.

model1=lm(Strength~Table+Feed+Grit+Direction+Batch+Concentration,data=CSV)
anova(model1)

## Analysis of Variance Table
## 
## Response: Strength
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## Table          1     32      32    0.02  0.8993    
## Feed           1  16288   16288    8.33  0.0055 ** 
## Grit           1  15221   15221    7.78  0.0072 ** 
## Direction      1 608205  608205  310.87 < 2e-16 ***
## Batch          1  88730   88730   45.35 8.8e-09 ***
## Concentration  1   1511    1511    0.77  0.3831    
## Residuals     57 111517    1956                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows that, in the full factorial experiment, Feed, Grit, Direction, and Batch have statistically significant effects on material strength (p < 0.01 for each), meaning the observed differences are unlikely to be due to random variation alone. For these four factors we reject the null hypothesis that the factor has no effect on the response variable. We fail to reject this null hypothesis for table speed and concentration of X.
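To make the table easier to read, the F values and p-values can be recomputed by hand from the printed sums of squares. The sketch below uses only the rounded values shown in the ANOVA table above, so small rounding differences are possible; the variable names are illustrative.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_direction <- 608205   # Sum Sq for Direction
df_direction <- 1
ss_resid     <- 111517   # residual Sum Sq
df_resid     <- 57

# Mean squares are sums of squares divided by their degrees of freedom
ms_direction <- ss_direction / df_direction
ms_resid     <- ss_resid / df_resid

# The F value is the ratio of the factor mean square to the residual mean square
f_direction <- ms_direction / ms_resid

# The p-value is the upper-tail probability of the F distribution
p_direction <- pf(f_direction, df_direction, df_resid, lower.tail = FALSE)

# Same calculation for Feed (Sum Sq = 16288, Df = 1)
f_feed <- (16288 / 1) / ms_resid
p_feed <- pf(f_feed, 1, df_resid, lower.tail = FALSE)

round(c(F_direction = f_direction, F_feed = f_feed), 2)
```

The two rounded F values agree with the 310.87 and 8.33 printed in the table.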

Fractional Factorial Design

Below I construct a design matrix for a 2^(6-1) experimental design, which in this case represents a one-half (32/64) fractional factorial experimental design. Furthermore, due to the large differences in the median values resulting from a change in Direction as seen in the box plots above, I am most interested in observing the main and interaction effects of this factor.

NewDesign=FrF2(32,nfactors=6,
  estimable=formula("~Table+Feed+Grit+Direction+Batch+Concentration+Direction:(Table+Feed+Grit+Direction+Batch+Concentration)"),
  factor.names=c("Table","Feed","Grit","Direction","Batch","Concentration"),
  res5=TRUE,clear=FALSE)
NewDesign

##    Table Feed Grit Direction Batch Concentration
## 1      1    1    1         1    -1            -1
## 2      1    1   -1        -1     1             1
## 3     -1    1   -1         1     1             1
## 4     -1   -1   -1        -1    -1            -1
## 5      1   -1   -1        -1    -1             1
## 6     -1    1    1         1    -1             1
## 7      1    1   -1         1     1            -1
## 8     -1    1   -1        -1    -1             1
## 9      1   -1   -1        -1     1            -1
## 10    -1   -1   -1        -1     1             1
## 11     1   -1   -1         1     1             1
## 12    -1   -1    1        -1     1            -1
## 13     1    1    1        -1     1            -1
## 14     1    1   -1         1    -1             1
## 15     1   -1    1        -1     1             1
## 16    -1   -1    1         1    -1            -1
## 17    -1   -1    1         1     1             1
## 18    -1    1    1        -1     1             1
## 19     1   -1    1        -1    -1            -1
## 20     1    1    1         1     1             1
## 21    -1    1   -1        -1     1            -1
## 22    -1   -1   -1         1     1            -1
## 23    -1    1   -1         1    -1            -1
## 24     1   -1   -1         1    -1            -1
## 25    -1   -1   -1         1    -1             1
## 26    -1    1    1        -1    -1            -1
## 27     1   -1    1         1     1            -1
## 28     1    1    1        -1    -1             1
## 29     1   -1    1         1    -1             1
## 30    -1    1    1         1     1            -1
## 31     1    1   -1        -1    -1            -1
## 32    -1   -1    1        -1    -1             1
## class=design, type= FrF2.estimable

aliasprint(NewDesign)

## $legend
## [1] A=Table         B=Feed          C=Grit          D=Direction    
## [5] E=Batch         F=Concentration
## 
## [[2]]
## [1] no aliasing among main effects and 2fis

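Here res5=TRUE requests a design of resolution V. In a resolution V design, no main effect or two-factor interaction (2fi) is aliased with any other main effect or two-factor interaction, which the aliasprint output above confirms. These effects can still be aliased with higher-order interactions, which are assumed to be negligible.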
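The idea of a half fraction without aliasing among main effects and two-factor interactions can also be illustrated in base R, without FrF2. The sketch below assumes the defining relation I = ABCDEF, that is, it keeps the runs whose six coded levels multiply to +1. The runs printed above appear consistent with this relation, but it is an assumption made for illustration rather than something reported by FrF2. The object names are illustrative.

```r
# Full 2^6 factorial in coded units: 64 runs
full <- expand.grid(Table = c(-1, 1), Feed = c(-1, 1), Grit = c(-1, 1),
                    Direction = c(-1, 1), Batch = c(-1, 1),
                    Concentration = c(-1, 1))

# Half fraction: keep the runs where the product of all six coded levels is +1
# (assumed defining relation I = ABCDEF)
keep <- apply(full, 1, prod) == 1
half <- full[keep, ]
n_runs <- nrow(half)   # 32 runs

# Columns for all 6 main effects and all 15 two-factor interactions
X <- model.matrix(~ .^2, data = half)[, -1]

# Two aliased effects would have identical (or opposite) columns, i.e. an
# absolute correlation of 1. Here every pair of distinct columns is uncorrelated.
cors <- abs(cor(X))
diag(cors) <- 0
max_cor <- max(cors)
max_cor
```

A maximum off-diagonal correlation of zero means that all main effects and two-factor interactions can be estimated separately from these 32 runs.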

Below I create a new data set which only uses the selected experimental runs from the fractional factorial design that I created previously.

NewData=merge(NewDesign,CSV,by=c("Table","Feed","Grit","Direction","Batch","Concentration"),all=FALSE)
print(NewData)

##    Table Feed Grit Direction Batch Concentration Strength
## 1     -1   -1   -1        -1    -1            -1      680
## 2     -1   -1   -1        -1     1             1      615
## 3     -1   -1   -1         1    -1             1      470
## 4     -1   -1   -1         1     1            -1      443
## 5     -1   -1    1        -1    -1             1      708
## 6     -1   -1    1        -1     1            -1      585
## 7     -1   -1    1         1    -1            -1      445
## 8     -1   -1    1         1     1             1      390
## 9     -1    1   -1        -1    -1             1      715
## 10    -1    1   -1        -1     1            -1      611
## 11    -1    1   -1         1    -1            -1      479
## 12    -1    1   -1         1     1             1      412
## 13    -1    1    1        -1    -1            -1      693
## 14    -1    1    1        -1     1             1      603
## 15    -1    1    1         1    -1             1      725
## 16    -1    1    1         1     1            -1      386
## 17     1   -1   -1        -1    -1             1      730
## 18     1   -1   -1        -1     1            -1      621
## 19     1   -1   -1         1    -1            -1      476
## 20     1   -1   -1         1     1             1      435
## 21     1   -1    1        -1    -1            -1      642
## 22     1   -1    1        -1     1             1      588
## 23     1   -1    1         1    -1             1      402
## 24     1   -1    1         1     1            -1      343
## 25     1    1   -1        -1    -1            -1      667
## 26     1    1   -1        -1     1             1      642
## 27     1    1   -1         1    -1             1      575
## 28     1    1   -1         1     1            -1      511
## 29     1    1    1        -1    -1             1      680
## 30     1    1    1        -1     1            -1      608
## 31     1    1    1         1    -1            -1      491
## 32     1    1    1         1     1             1      441

Below I will perform a second ANOVA on this newly created data set which will test the main effects of each factor as well as the interaction effects of Direction with all other factors.

model2=lm(Strength~Direction*Table+Direction*Feed+Direction*Grit+Direction*Batch+Direction*Concentration,data=NewData)
anova(model2)

## Analysis of Variance Table
## 
## Response: Strength
##                         Df Sum Sq Mean Sq F value  Pr(>F)    
## Direction                1 274540  274540   90.02 7.6e-09 ***
## Table                    1    364     364    0.12 0.73317    
## Feed                     1  13861   13861    4.54 0.04561 *  
## Grit                     1   3872    3872    1.27 0.27319    
## Batch                    1  56448   56448   18.51 0.00035 ***
## Concentration            1   6328    6328    2.07 0.16521    
## Direction:Table          1     61      61    0.02 0.88940    
## Direction:Feed           1  10011   10011    3.28 0.08507 .  
## Direction:Grit           1      1       1    0.00 0.98991    
## Direction:Batch          1    113     113    0.04 0.84963    
## Direction:Concentration  1    325     325    0.11 0.74744    
## Residuals               20  60998    3050                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As seen in the ANOVA above, we can reject the previously stated null hypothesis for the factors Direction, Feed, and Batch. Unlike in the full factorial analysis, Grit is not significant in the half fraction (p = 0.273). The interaction effects between Direction and the other factors, which I had set out to examine, are not significant at the 0.05 level; the Direction:Feed interaction comes closest (p = 0.085).

Here I use the Shapiro-Wilk normality test to determine if the response variable is normally distributed.

shapiro.test(NewData$Strength)

## 
##  Shapiro-Wilk normality test
## 
## data:  NewData$Strength
## W = 0.9331, p-value = 0.04772

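The Shapiro-Wilk test gives p = 0.04772. At an alpha level of 0.005 we fail to reject the null hypothesis that the response is normally distributed, so I do not transform the data before further analysis. Note that this p-value is below 0.05, so normality would be rejected at the more conventional alpha level of 0.05, and that the normality assumption of the ANOVA concerns the model residuals, which are examined in the diagnostics section that follows.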
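Because the ANOVA assumes normally distributed residuals rather than a normally distributed response, the same test can also be applied to the residuals of the fitted model. The sketch below is self-contained: it re-enters the 32 Strength values from the table above in printed row order and rebuilds the coded columns, assuming that Concentration equals the product of the other five coded columns (which matches the printed rows). The names `runs` and `fit` are illustrative.

```r
# Strength values re-entered from the 32-run table above, in printed row order
strength <- c(680, 615, 470, 443, 708, 585, 445, 390,
              715, 611, 479, 412, 693, 603, 725, 386,
              730, 621, 476, 435, 642, 588, 402, 343,
              667, 642, 575, 511, 680, 608, 491, 441)

# Rebuild the coded columns: in the printed table Batch varies fastest
# and Table varies slowest
runs <- expand.grid(Batch = c(-1, 1), Direction = c(-1, 1), Grit = c(-1, 1),
                    Feed = c(-1, 1), Table = c(-1, 1))

# Assumption: Concentration is the product of the other five coded columns
runs$Concentration <- with(runs, Table * Feed * Grit * Direction * Batch)
runs$Strength <- strength

# Convert the coded columns to factors, as in the analysis above
factor_cols <- c("Table", "Feed", "Grit", "Direction", "Batch", "Concentration")
runs[factor_cols] <- lapply(runs[factor_cols], factor)

# Same terms as model2: all main effects plus Direction's two-factor interactions
fit <- stats::lm(Strength ~ Direction * (Table + Feed + Grit + Batch + Concentration),
                 data = runs)

# Shapiro-Wilk test on the residuals rather than on the raw response
res_test <- shapiro.test(residuals(fit))
res_test$p.value
```

A small p-value here would indicate that the residuals depart from normality, which is the condition the ANOVA actually relies on.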

Diagnostics/Model Adequacy Checking

To check the adequacy of the ANOVA model for this data set, I examined a Quantile-Quantile (Q-Q) plot of the residuals to determine whether they follow a normal distribution.

The nearly linear pattern of the residuals in the Q-Q plot indicates that the ANOVA model may be adequate for this analysis. A perfectly linear pattern would mean that the residuals match a normal distribution exactly. Based on the results of the previously performed Shapiro-Wilk test, we expected a nearly linear pattern in this Q-Q plot.

The second plot is a Residuals vs. Fits plot, which is used to check whether the residuals are scattered evenly around zero across the range of fitted values and to identify any outlying values. Because the residuals appear to be centered around zero for this model, the plot gives no clear indication that the model is inadequate for estimating the effects of these factors on material strength.

# Q-Q plot of the residuals from the fractional factorial model (model2)
qqnorm(residuals(model2))
qqline(residuals(model2))

[Figure: normal Q-Q plot of the residuals of model2]

plot(fitted(model2),residuals(model2))

[Figure: residuals of model2 plotted against fitted values]

Appendix

This data set is a modified version of a data set obtained from http://www.itl.nist.gov/div898/handbook/pri/section4/pri471.htm.