Recipes for the Design of Experiments:

Recipe Outline

Trevor Corrao

RPI

12/7/16 Version 1

1. Setting

System under test

For this recipe, a dataset of automobile design and performance metrics from the 1974 Motor Trend magazine is analyzed. MPG is the response variable, which depends on factors including number of cylinders, car weight, engine type (V or straight), and transmission type (automatic or manual). There are 32 observations. The data can be found at vincentarelbundock.github.io/Rdatasets/datasets.html.

library("Ecdat")
## Warning: package 'Ecdat' was built under R version 3.1.3
## Loading required package: Ecfun
## Warning: package 'Ecfun' was built under R version 3.1.3
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
data(mtcars)
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Factors and Levels

The two 2-level factors being considered are:

  1. vs: Engine (0 = V, 1 = Straight)

  2. am: Transmission (0 = automatic, 1 = manual)

The two 3-level factors being considered are:

  1. cyl: Number of cylinders (“4” = 2, “6” = 1, “8” = 0)

  2. wt: Weight in 1000s of lbs (“1.304 to 2.817” = 2, “2.818 to 4.120” = 1, “4.121 to 5.424” = 0)

mtcars$cyl[mtcars$cyl >= 4 & mtcars$cyl < 6] = 2
mtcars$cyl[mtcars$cyl >= 6 & mtcars$cyl < 8] = 1
mtcars$cyl[mtcars$cyl >= 8 & mtcars$cyl < 9] = 0
mtcars$wt[as.numeric(mtcars$wt) >= 1.304 & as.numeric(mtcars$wt) < 2.818] = 2
mtcars$wt[as.numeric(mtcars$wt) >= 2.818 & as.numeric(mtcars$wt) < 4.121] = 1
mtcars$wt[as.numeric(mtcars$wt) >= 4.121 & as.numeric(mtcars$wt) < 5.424] = 0

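These assignments split the data into three defined levels. Note that the last condition uses `< 5.424`, so the heaviest car (wt = 5.424, the maximum in the summary above) is not recoded. It survives as a fourth level, "5.424", in the structure output below, and it is the reason weight carries 3 degrees of freedom in the ANOVA table. Using `<= 5.424` in that condition would give exactly three levels.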

mtcars$vs=as.factor(mtcars$vs)
mtcars$am=as.factor(mtcars$am)
mtcars$cyl=as.factor(mtcars$cyl)
mtcars$wt=as.factor(mtcars$wt)
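As a hedged alternative to the range assignments above, the same three-level coding can be sketched with cut() and factor() on a fresh copy of the data. The names cars, wt_level, and cyl_level are illustrative and not part of the original analysis; the break points are taken from the ranges listed under Factors and Levels.

```r
# Sketch only: rebuild the three-level factors on a fresh copy of mtcars.
# `cars`, `wt_level`, and `cyl_level` are hypothetical names used for illustration.
cars <- datasets::mtcars

# right = FALSE gives intervals [a, b); include.lowest = TRUE closes the
# last interval, so the maximum weight (5.424) falls in the heaviest bin.
cars$wt_level <- cut(cars$wt,
                     breaks = c(1.304, 2.818, 4.121, 5.424),
                     labels = c("2", "1", "0"),
                     right = FALSE, include.lowest = TRUE)

# Map 8, 6, 4 cylinders to the codes 0, 1, 2 used in this recipe.
cars$cyl_level <- factor(cars$cyl, levels = c(8, 6, 4), labels = c("0", "1", "2"))

# Counts per level for each recoded factor.
table(cars$wt_level)
table(cars$cyl_level)
```

Because the top interval is closed, no observation is left outside the three bins.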

Continuous Variables

After the recoding above, all four factors in this experiment are discrete. The only variable that varies continuously is the response, MPG, which depends on the factors, although it is recorded only to one decimal place.

Deciding Upon a Response Variable

Deciding upon a response variable in this experiment was relatively simple and intuitive. The MPG of an automobile is affected by weight, transmission, engine, and number of cylinders, along with other factors which are not discussed (e.g. displacement, fuel type, aerodynamics, gear ratios, horsepower). These factors combined determine MPG; MPG does not determine the factors, so the direction of the dependency is clear. MPG is the response variable.

The Data: How is it organized and what does it look like?

The structure of the mtcars data is as follows:

#analyzing structure
  str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "0","1","2": 2 2 3 2 1 2 1 3 3 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : Factor w/ 4 levels "0","1","2","5.424": 3 2 3 2 2 2 2 2 2 2 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

First and last 6 observations of the dataset:

 head(mtcars)
##                    mpg cyl disp  hp drat wt  qsec vs am gear carb
## Mazda RX4         21.0   1  160 110 3.90  2 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   1  160 110 3.90  1 17.02  0  1    4    4
## Datsun 710        22.8   2  108  93 3.85  2 18.61  1  1    4    1
## Hornet 4 Drive    21.4   1  258 110 3.08  1 19.44  1  0    3    1
## Hornet Sportabout 18.7   0  360 175 3.15  1 17.02  0  0    3    2
## Valiant           18.1   1  225 105 2.76  1 20.22  1  0    3    1
 tail(mtcars)
##                 mpg cyl  disp  hp drat wt qsec vs am gear carb
## Porsche 914-2  26.0   2 120.3  91 4.43  2 16.7  0  1    5    2
## Lotus Europa   30.4   2  95.1 113 3.77  2 16.9  1  1    5    2
## Ford Pantera L 15.8   0 351.0 264 4.22  1 14.5  0  1    5    4
## Ferrari Dino   19.7   1 145.0 175 3.62  2 15.5  0  1    5    6
## Maserati Bora  15.0   0 301.0 335 3.54  1 14.6  0  1    5    8
## Volvo 142E     21.4   2 121.0 109 4.11  2 18.6  1  1    4    2

2. (Experimental) Design

Organization of the Experiment to Test the Hypothesis

The design of the experiment is a 2^(m-3) fractional factorial design with m = 6 factors, i.e. 2^(6-3) = 8 runs. This design is used to obtain the lowest resolution (resolution III). The full factorial design of the experiment would be a 2^6 design (64 runs), because each of the two three-level factors is represented by two two-level factors, giving 2 + 2 x 2 = 6 two-level factors in total.
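The factor count and run sizes can be checked with a little arithmetic, and one possible way to express a three-level factor as two two-level factors is sketched here. The encoding (cyl_a, cyl_b) is an assumption chosen for illustration; it is not the recoding this recipe applies later, and other assignments of the middle level are equally valid.

```r
# Sketch only: count the two-level factors and the resulting run sizes.
n_two_level   <- 2                      # vs, am
n_three_level <- 2                      # cyl, wt
m <- n_two_level + 2 * n_three_level    # each three-level factor needs two columns
full_runs     <- 2^m                    # full factorial
fraction_runs <- 2^(m - 3)              # 2^(m-3) fraction

# Hypothetical encoding of cyl (codes 0, 1, 2) as two 0/1 columns:
# 0 -> (0, 0), 1 -> (1, 0), 2 -> (1, 1).
cars <- datasets::mtcars
cyl_code <- unname(c("8" = 0, "6" = 1, "4" = 2)[as.character(cars$cyl)])
two_level <- data.frame(cyl_a = as.numeric(cyl_code >= 1),
                        cyl_b = as.numeric(cyl_code == 2))

# Each cylinder level maps to its own pair of column values.
unique(cbind(cyl_code = cyl_code, two_level))
```

The fourth combination, (0, 1), is unused under this encoding, which is the price of fitting three levels into two binary columns.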

Rationale for This Design

We use this design because it allows us to analyze main effects and, where they are not aliased, interaction effects. Its resolution also tells us which effects are confounded with one another, so we know which orders of effects can be estimated.

Randomization

We assume the vehicles were selected at random, so their MPG and factor values are random as well.

Replication

There are replicates: the study recorded the same set of factors on every vehicle, so each vehicle is both an observation and a replicate. The measurements would be “repeated” only if the same study were performed on the same vehicle more than once.
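To see how many vehicles share each factor combination, the observations per cell can be counted. This is a minimal sketch on a fresh copy of the data; the names cars and cells are illustrative, and only vs, am, and cyl are tabulated to keep the table small.

```r
# Sketch only: count observations in each vs x am x cyl cell.
cars <- datasets::mtcars
cells <- as.data.frame(table(vs = cars$vs, am = cars$am, cyl = cars$cyl))

# Cells with Freq > 1 contain replicates; cells with Freq == 0 were never observed.
cells[order(-cells$Freq), ]
```

An uneven spread of counts would indicate that the observational data are unbalanced across the factor combinations.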

Blocking

Blocking was applied when selecting the factors to use. The factors that were set aside include engine displacement, horsepower, rear axle ratio, 1/4 mile time, number of forward gears, and number of carburetors. These factors were judged less significant than the four which were selected.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

#checking for relationships
boxplot(mtcars$mpg~mtcars$vs, xlab="V or Straight", ylab="MPG")

boxplot(mtcars$mpg~mtcars$am, xlab="Automatic or Manual", ylab="MPG")

boxplot(mtcars$mpg~mtcars$cyl, xlab="# of Cylinders", ylab="MPG")

boxplot(mtcars$mpg~mtcars$wt, xlab="Weight", ylab="MPG")
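The boxplots can be complemented with a numeric summary. The sketch below computes the mean MPG per level with aggregate() on a fresh copy of the data, using the original (unrecoded) values of cyl, am, and vs; the object names are illustrative.

```r
# Sketch only: mean MPG per factor level, on the original coding of mtcars.
cars <- datasets::mtcars

mean_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
mean_by_am  <- aggregate(mpg ~ am,  data = cars, FUN = mean)
mean_by_vs  <- aggregate(mpg ~ vs,  data = cars, FUN = mean)

mean_by_cyl
mean_by_am
mean_by_vs
```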

Testing

#anova test
model1=aov(mtcars$mpg~mtcars$vs+mtcars$am+mtcars$cyl+mtcars$wt)
anova(model1)
## Analysis of Variance Table
## 
## Response: mtcars$mpg
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## mtcars$vs   1 496.53  496.53 53.5014 1.484e-07 ***
## mtcars$am   1 276.03  276.03 29.7429 1.324e-05 ***
## mtcars$cyl  2  94.59   47.30  5.0961    0.0143 *  
## mtcars$wt   3  36.16   12.05  1.2987    0.2977    
## Residuals  24 222.74    9.28                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA table, we can see that the type of engine, the type of transmission, and the number of cylinders all have significant effects on MPG at the 0.05 level. The only factor with a p-value above 0.05 is weight (p = 0.2977), which is plausible in the real world: the weight of a vehicle largely determines its engine, transmission, and number of cylinders, all of which are significant. Because anova() reports sequential sums of squares and weight enters the model last, its row measures only the variation left after the other three factors are accounted for. Weight therefore affects MPG indirectly, through those factors.
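Since anova() on an aov fit reports sequential sums of squares, the row for weight depends on where it appears in the formula. The sketch below refits the model with weight first and last on a fresh copy of the data; wt3 is a hypothetical three-level weight factor built with cut(), so the numbers will not match the table above exactly.

```r
# Sketch only: show that term order changes the sequential ANOVA rows.
cars <- datasets::mtcars
cars$vs  <- factor(cars$vs)
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)
# Hypothetical three-level weight factor; the top interval is closed at 5.424.
cars$wt3 <- cut(cars$wt, breaks = c(1.304, 2.818, 4.121, 5.424),
                right = FALSE, include.lowest = TRUE)

fit_wt_last  <- aov(mpg ~ vs + am + cyl + wt3, data = cars)
fit_wt_first <- aov(mpg ~ wt3 + vs + am + cyl, data = cars)

# Same fitted model, different attribution of sums of squares to each term.
anova(fit_wt_last)
anova(fit_wt_first)
```

The residual sum of squares is identical in both fits; only the split among the terms changes.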

qqnorm(residuals(model1))
qqline(residuals(model1))

From the normal Q-Q plot of the residuals, the residuals appear approximately normal, so we assume normality when proceeding.
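A numeric check can accompany the visual one. The sketch below applies the Shapiro-Wilk test, which is not part of the original analysis, to the residuals of a comparable model refit on a fresh copy of the data; wt3 is a hypothetical three-level weight factor, so the result describes this refit rather than model1.

```r
# Sketch only: Shapiro-Wilk test on residuals of a refit main-effects model.
cars <- datasets::mtcars
cars$wt3 <- cut(cars$wt, breaks = c(1.304, 2.818, 4.121, 5.424),
                right = FALSE, include.lowest = TRUE)

fit <- aov(mpg ~ factor(vs) + factor(am) + factor(cyl) + wt3, data = cars)

# A small p-value would be evidence against normally distributed residuals.
sw <- shapiro.test(residuals(fit))
sw
```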

library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.3
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.1.3
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.3
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design

We now recode each factor as a binary (0/1) indicator. For the three-level factors cyl and wt, the indicator is 1 for the middle level (“1”) and 0 otherwise.

# Recode each factor as a 0/1 indicator of its level "1" (vectorized, no loop needed)
car1 <- data.frame(
  vs  = as.numeric(mtcars$vs == "1"),
  am  = as.numeric(mtcars$am == "1"),
  cyl = as.numeric(mtcars$cyl == "1"),
  wt  = as.numeric(mtcars$wt == "1"),
  mpg = mtcars$mpg
)
head(car1)
##   vs am cyl wt  mpg
## 1  0  1   1  0 21.0
## 2  0  1   1  1 21.0
## 3  1  1   0  0 22.8
## 4  1  0   1  1 21.4
## 5  0  0   0  1 18.7
## 6  1  0   1  1 18.1
str(car1)
## 'data.frame':    32 obs. of  5 variables:
##  $ vs : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ cyl: num  1 1 0 1 0 1 0 0 0 1 ...
##  $ wt : num  0 1 0 1 1 1 1 1 1 1 ...
##  $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
summary(car1)
##        vs               am              cyl               wt        
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :0.2188   Mean   :0.5625  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       mpg       
##  Min.   :10.40  
##  1st Qu.:15.43  
##  Median :19.20  
##  Mean   :20.09  
##  3rd Qu.:22.80  
##  Max.   :33.90

Full Factorial Design of 64 runs:

library(FrF2)
u= FrF2(64,6,res3 = T)
## creating full factorial with 64 runs ...
print(u)
##     A  B  C  D  E  F
## 1   1  1 -1 -1 -1  1
## 2  -1 -1 -1 -1 -1  1
## 3   1 -1  1 -1 -1 -1
## 4   1 -1 -1  1 -1  1
## 5  -1  1 -1  1 -1  1
## 6   1  1  1  1 -1  1
## 7  -1  1  1 -1  1  1
## 8  -1  1 -1  1 -1 -1
## 9   1  1  1  1 -1 -1
## 10  1  1 -1  1  1  1
## 11  1 -1 -1 -1 -1 -1
## 12  1  1  1  1  1  1
## 13  1 -1 -1  1  1  1
## 14  1 -1 -1 -1 -1  1
## 15 -1  1 -1  1  1  1
## 16  1  1 -1 -1  1  1
## 17 -1  1 -1 -1  1  1
## 18 -1 -1 -1  1 -1 -1
## 19 -1 -1  1 -1 -1 -1
## 20 -1 -1  1 -1 -1  1
## 21 -1  1  1 -1  1 -1
## 22  1 -1  1  1  1 -1
## 23  1 -1 -1  1 -1 -1
## 24  1 -1 -1  1  1 -1
## 25  1 -1  1  1  1  1
## 26  1 -1  1 -1 -1  1
## 27 -1  1  1 -1 -1  1
## 28 -1  1 -1 -1  1 -1
## 29  1 -1 -1 -1  1  1
## 30  1 -1  1  1 -1 -1
## 31 -1 -1  1  1  1 -1
## 32 -1 -1  1  1 -1  1
## 33 -1 -1 -1  1 -1  1
## 34 -1 -1 -1  1  1  1
## 35 -1  1  1 -1 -1 -1
## 36  1  1 -1  1 -1  1
## 37  1  1 -1 -1  1 -1
## 38  1  1 -1  1 -1 -1
## 39 -1 -1  1 -1  1  1
## 40  1  1  1 -1  1  1
## 41  1 -1 -1 -1  1 -1
## 42 -1  1 -1 -1 -1  1
## 43 -1  1  1  1  1  1
## 44  1  1  1 -1  1 -1
## 45  1  1  1 -1 -1 -1
## 46 -1 -1  1  1  1  1
## 47 -1 -1  1 -1  1 -1
## 48 -1  1 -1 -1 -1 -1
## 49  1  1  1  1  1 -1
## 50  1  1  1 -1 -1  1
## 51 -1 -1 -1  1  1 -1
## 52 -1 -1 -1 -1  1 -1
## 53 -1  1  1  1 -1  1
## 54  1  1 -1 -1 -1 -1
## 55 -1  1  1  1  1 -1
## 56 -1 -1 -1 -1  1  1
## 57 -1  1  1  1 -1 -1
## 58 -1  1 -1  1  1 -1
## 59  1 -1  1  1 -1  1
## 60  1  1 -1  1  1 -1
## 61 -1 -1 -1 -1 -1 -1
## 62  1 -1  1 -1  1  1
## 63  1 -1  1 -1  1 -1
## 64 -1 -1  1  1 -1 -1
## class=design, type= full factorial

Fractional factorial design of 8 runs:

library(FrF2)
s= FrF2(8,6,res3 = T)
print(s)
##    A  B  C  D  E  F
## 1  1  1  1  1  1  1
## 2  1 -1  1 -1  1 -1
## 3 -1  1 -1 -1  1 -1
## 4 -1 -1  1  1 -1 -1
## 5 -1 -1 -1  1  1  1
## 6  1  1 -1  1 -1 -1
## 7  1 -1 -1 -1 -1  1
## 8 -1  1  1 -1 -1  1
## class=design, type= FrF2

The alias structure below shows clear confounding. Since the design is resolution III, the main effects (MEs) are confounded with two-factor interactions (2fis), and some 2fis are confounded with each other.

aliasprint(s)
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
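The alias structure printed above is consistent with the generators D = AB, E = AC, and F = BC. As a minimal base-R sketch (not the FrF2 internals), the 8-run design can be built by hand from a full factorial in A, B, and C, and the aliasing verified column by column; the object names are illustrative.

```r
# Sketch only: build a 2^(6-3) design by hand from generators D=AB, E=AC, F=BC.
base_design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- transform(base_design, D = A * B, E = A * C, F = B * C)
design

# Main effect A is indistinguishable from the BD and CE interactions.
a_equals_bd <- all(design$A == design$B * design$D)
a_equals_ce <- all(design$A == design$C * design$E)

# The two-factor interactions AF, BE, and CD share one column.
af_be_cd <- all(design$A * design$F == design$B * design$E) &&
            all(design$A * design$F == design$C * design$D)

# The six main-effect columns are still mutually orthogonal.
M <- as.matrix(design)
orthogonal <- all(crossprod(M) == diag(8, 6))

c(a_equals_bd, a_equals_ce, af_be_cd, orthogonal)
```

Orthogonal columns mean each main effect can be estimated, but only under the assumption that the interactions aliased with it are negligible.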

4. References to the Literature.

N/A

5. Appendices

Raw Data: http://vincentarelbundock.github.io/Rdatasets/datasets.html

All of the R code is included in the R Markdown file.