Recipe 8: Fractional Factorial Design

Fractional Factorial Design: An Analysis of Car Data

Max Winkelman

Rensselaer Polytechnic Institute

November 20, 2014

Version 1

1. Setting

Cars

The data analyzed in this recipe come from a CSV file that contains various measured parameters for a set of vehicles.

Read in the ‘FFD Cars 3.0.csv’ file (adjust the path below to the location where the file is saved)

cars <- read.csv("~/RPI/Classes/Design of Experiments/R/FFD Cars 3.0.csv", header=TRUE)
#reads in the data from the csv file 'FFD Cars 3.0.csv' and assigns it to the dataframe 'cars'

Factors and Levels

Factors: Engine Size (in^3), Weight (lbs), Acceleration Time (secs), Model Year, Country of Origin, and Number of Cylinders

Levels: Above or below 150 in^3, above or below 3000 lbs, above or below 14 secs, 1970 or 1971, Country A or B, and 4 or 6 cylinders

#Summary of Data 
head(cars)

##   Engine Weight Acceleration Year Country Cylinders mpg
## 1      1      1           -1    1      -1         1  18
## 2      1      1            1   -1      -1         1  15
## 3      1     -1           -1   -1       1         1  18
## 4     -1     -1           -1   -1       1         1  16
## 5      1     -1            1    1       1         1  17
## 6      1      1            1    1       1         1  15

#displays the first 6 rows of the data frame
tail(cars)

##     Engine Weight Acceleration Year Country Cylinders mpg
## 123      1      1            1    1       1        -1  16
## 124     -1      1           -1    1      -1        -1  15
## 125     -1      1           -1    1       1        -1  16
## 126     -1     -1           -1   -1      -1        -1  14
## 127     -1     -1           -1    1       1         1  17
## 128      1     -1           -1    1       1         1  16

#displays the last 6 rows of the data frame
summary(cars)

##      Engine             Weight    Acceleration      Year       Country  
##  Min.   :-1.00000   Min.   :-1   Min.   :-1    Min.   :-1   Min.   :-1  
##  1st Qu.:-1.00000   1st Qu.:-1   1st Qu.:-1    1st Qu.:-1   1st Qu.:-1  
##  Median : 1.00000   Median : 0   Median : 0    Median : 0   Median : 0  
##  Mean   : 0.03125   Mean   : 0   Mean   : 0    Mean   : 0   Mean   : 0  
##  3rd Qu.: 1.00000   3rd Qu.: 1   3rd Qu.: 1    3rd Qu.: 1   3rd Qu.: 1  
##  Max.   : 1.00000   Max.   : 1   Max.   : 1    Max.   : 1   Max.   : 1  
##    Cylinders       mpg       
##  Min.   :-1   Min.   :10.00  
##  1st Qu.:-1   1st Qu.:15.00  
##  Median : 0   Median :19.00  
##  Mean   : 0   Mean   :19.88  
##  3rd Qu.: 1   3rd Qu.:25.00  
##  Max.   : 1   Max.   :35.00

#displays a summary of the variables

Continuous variables:

None. Since each factor in the dataset “cars” has only two levels, all six factors are treated as categorical.

Response variables:

Miles per gallon will be the response variable in this recipe.

The Data: How is it organized and what does it look like?

The CSV file is intended for educational purposes, and the methods by which the data were gathered are unknown. The data are organized into columns for Engine Size (in^3), Weight (lbs), Acceleration Time (secs), Model Year, Country of Origin, Number of Cylinders, and Miles Per Gallon (named Engine, Weight, Acceleration, Year, Country, Cylinders, and mpg in the file). The data have been modified so that each factor has only two levels, coded as either 1 or -1, which correspond, respectively, to above or below 150 in^3, above or below 3000 lbs, above or below 14 secs, 1970 or 1971, Country A or B, and 4 or 6 cylinders.
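The recoding described above can be sketched in R. This is a minimal illustration only: the raw measurements below are hypothetical, and the sketch assumes that +1 marks the level above the threshold, which the recipe does not state.

```r
# Hypothetical raw measurements for five vehicles (NOT taken from 'FFD Cars 3.0.csv')
raw <- data.frame(
  displacement = c(120, 250, 150, 318, 97),       # engine size, in^3
  weight       = c(2300, 3600, 2950, 4100, 2130), # lbs
  accel        = c(15.5, 11.0, 14.0, 12.5, 19.0)  # acceleration time, secs
)

# Assumed coding: +1 = above the threshold, -1 = at or below it
code_level <- function(x, threshold) ifelse(x > threshold, 1, -1)

coded <- data.frame(
  Engine       = code_level(raw$displacement, 150),
  Weight       = code_level(raw$weight, 3000),
  Acceleration = code_level(raw$accel, 14)
)
coded
```

Under this assumption, values exactly at a threshold (150 in^3, 14 secs) fall in the -1 group; how ties were handled in the original data is not documented.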

Randomization

We assume that the original data were gathered with proper randomization methods.

2. Experimental Design

How will the experiment be organized and conducted to test the hypothesis?

For this recipe, the data from “FFD Cars 3.0” will be analyzed to determine whether the variation in a vehicle’s fuel mileage can be attributed to the variation in each of the factors in the data set. An analysis of variance (ANOVA) at a significance level of 0.05 (95% confidence) will be performed to test whether the mean fuel mileage differs between the levels of each factor. The null hypothesis is that the mean fuel mileage is the same across all levels; if it is rejected, the alternative hypothesis, that the mean fuel mileage differs between levels, is accepted. After this ANOVA, a fractional factorial design will be applied to see how it changes the statistical analysis. The data consist of 6 factors, each with two levels, and one response variable. Fractional factorial designs are commonly used to study many two-level factors with fewer runs than the full factorial design requires.

Since these data were not gathered for this experiment, the randomization scheme, if any, is unknown. There are no planned replicates or repeated measures in this data set, although some combinations of factor levels occur for more than one vehicle. There will also be no blocking in this recipe.

3. Statistical Analysis

Exploratory Data Analysis: Graphics and Descriptive Summary

#Assign the data types 
cars$engine=as.factor(cars$Engine)
#creates "engine", a factor version of the coded column "Engine"

cars$weight=as.factor(cars$Weight)
#creates "weight", a factor version of "Weight"

cars$acceleration=as.factor(cars$Acceleration)
#creates "acceleration", a factor version of "Acceleration"

cars$year=as.factor(cars$Year)
#creates "year", a factor version of "Year"

cars$country=as.factor(cars$Country)
#creates "country", a factor version of "Country"

cars$cylinders=as.factor(cars$Cylinders)
#creates "cylinders", a factor version of "Cylinders"

#Boxplot
boxplot(mpg~engine,data=cars, xlab="Engine", ylab="Fuel Mileage (mpg)")
title("Engine Miles Per Gallon")

#boxplot of the mpg data from the factor "Engine"

boxplot(mpg~weight,data=cars, xlab="Weight (lbs)", ylab="Fuel Mileage (mpg)")
title("Weight Miles Per Gallon")

#boxplot of the mpg data from the factor "Weight"

boxplot(mpg~acceleration,data=cars, xlab="Acceleration Time", ylab="Fuel Mileage (mpg)")
title("Acceleration Miles Per Gallon")

#boxplot of the mpg data from the factor "Acceleration"

boxplot(mpg~year,data=cars, xlab="Model Year", ylab="Fuel Mileage (mpg)")
title("Year Miles Per Gallon")

#boxplot of the mpg data from the factor "Year"

boxplot(mpg~country,data=cars, xlab="Country of Origin", ylab="Fuel Mileage (mpg)")
title("Country Miles Per Gallon")

#boxplot of the mpg data from the factor "Country"

boxplot(mpg~cylinders,data=cars, xlab="Number of Cylinders", ylab="Fuel Mileage (mpg)")
title("Cylinders Miles Per Gallon")

#boxplot of the mpg data from the factor "Cylinders"

The boxplots above show the distribution of fuel mileage at each level of the six factors in this design. Based on visual inspection, there do not appear to be large differences between the levels of any factor. Because each boxplot shows only one factor at a time, these plots cannot reveal interaction effects.
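Comparing the two levels of a factor is the same as estimating its main effect. For a factor coded -1/+1, the main effect is the mean response at +1 minus the mean response at -1, which equals twice the slope of a least-squares fit on the coded factor. A minimal sketch with hypothetical mpg values (not the recipe's data):

```r
# Hypothetical coded factor and response (illustration only)
x   <- c(-1, -1, -1, 1, 1, 1, 1)
mpg <- c(14, 16, 15, 19, 21, 18, 22)

level_means <- tapply(mpg, x, mean)                    # mean mpg at each level
effect <- unname(level_means["1"] - level_means["-1"]) # main effect estimate

# The same quantity from a regression on the -1/+1 coding
slope <- unname(coef(lm(mpg ~ x))["x"])
c(effect = effect, twice_slope = 2 * slope)
```

Here the level means are 15 and 20, so the main effect is 5 mpg and the fitted slope is 2.5.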

First ANOVA Test

An analysis of variance (ANOVA) will be used to test whether the mean fuel mileage differs significantly between the two levels of each factor. For each factor, the null hypothesis is that the mean fuel mileage is the same at both levels. If the p-value is below 0.05, the null hypothesis is rejected in favor of the alternative hypothesis that the means differ.
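For a single two-level factor, the F statistic reported by anova() is the between-level mean square divided by the residual mean square. The sketch below computes it by hand for hypothetical data and checks it against aov(). When several factors share one model, as in the recipe, anova() reports sequential (Type I) sums of squares, so the order of the terms can matter if the data are unbalanced.

```r
# Hypothetical two-level factor and response
g   <- factor(c(-1, -1, -1, -1, 1, 1, 1, 1))
mpg <- c(14, 17, 15, 18, 20, 23, 19, 22)

grand <- mean(mpg)
grp   <- tapply(mpg, g, mean)      # level means
n_i   <- tapply(mpg, g, length)    # observations per level

ss_between <- sum(n_i * (grp - grand)^2)            # treatment sum of squares
ss_within  <- sum((mpg - grp[as.character(g)])^2)   # residual sum of squares
df_between <- nlevels(g) - 1
df_within  <- length(mpg) - nlevels(g)

F_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(F_manual, df_between, df_within, lower.tail = FALSE)

tab <- anova(aov(mpg ~ g))
c(F_manual = F_manual, F_aov = tab[["F value"]][1],
  p_manual = p_manual, p_aov = tab[["Pr(>F)"]][1])
p_manual < 0.05   # decision at the 0.05 significance level
```

With these numbers the sums of squares are 50 and 20 on 1 and 6 degrees of freedom, giving F = 15.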

# ANOVA
model = aov(mpg~engine+weight+acceleration+year+country+cylinders,data=cars) 
anova(model)

## Analysis of Variance Table
## 
## Response: mpg
##               Df Sum Sq Mean Sq F value  Pr(>F)  
## engine         1  145.7 145.707  4.4236 0.03752 *
## weight         1    2.1   2.070  0.0628 0.80248  
## acceleration   1   23.0  23.046  0.6996 0.40455  
## year           1    0.5   0.544  0.0165 0.89798  
## country        1    2.1   2.090  0.0634 0.80156  
## cylinders      1  120.9 120.948  3.6719 0.05770 .
## Residuals    121 3985.6  32.939                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#performs an anova test

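ANOVA Results: Engine size is the only factor with a p-value below 0.05 (p = 0.038), so we reject the null hypothesis for engine size and conclude that mean fuel mileage differs between the two engine-size levels. The p-value is the probability of an F statistic at least this large if engine size had no effect; it is not the probability that the variation in fuel mileage is caused by engine size. For all other factors we fail to reject the null hypothesis, so their observed differences are consistent with random variation, although cylinders (p = 0.058) is close to the threshold. Because the model contains only main effects, interactions between engine size and the other factors were not tested.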
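The model above contains only main effects, so it does not test interactions. To test whether the effect of engine size depends on another factor, an interaction term has to be added. A minimal sketch with hypothetical balanced data (two observations per cell) shows the interaction row and the equivalent nested-model comparison:

```r
# Hypothetical balanced 2x2 data, two observations per cell
d <- expand.grid(engine = factor(c(-1, 1)), cylinders = factor(c(-1, 1)), obs = 1:2)
d$mpg <- c(15, 22, 18, 17, 16, 23, 19, 16)

main_only <- aov(mpg ~ engine + cylinders, data = d)
with_int  <- aov(mpg ~ engine * cylinders, data = d)   # adds engine:cylinders

anova(with_int)              # the engine:cylinders row tests the interaction
anova(main_only, with_int)   # same F test as a comparison of nested models
```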

Fractional Factorial Design

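A fractional factorial design uses a carefully chosen subset of the runs of the full factorial design to estimate the most important effects, typically the main effects and low-order interactions, with fewer runs. The data set in this recipe has 6 factors, each with two levels, so the full factorial requires 2^6 = 64 runs. A half-fraction design uses 32 runs and is written as a 2^(6-1) design. The resolution of a fractional factorial design describes how effects are aliased (confounded) with one another: in a resolution IV design, main effects are not aliased with other main effects or with two-factor interactions, but two-factor interactions may be aliased with each other; in a design of resolution V or higher, main effects and two-factor interactions are all separable.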
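A 2^(6-1) half fraction can be sketched in base R by writing out the full 2^5 factorial in five factors and generating the sixth factor as their product. This assumes the generator Cylinders = Engine*Weight*Acceleration*Year*Country (defining relation I = ABCDEF), the standard resolution VI choice; FrF2 chooses its own generator and run order, so this is an illustration rather than a copy of its output.

```r
# Full 2^5 factorial in the first five factors (-1/+1 coding)
base <- expand.grid(Engine = c(-1, 1), Weight = c(-1, 1),
                    Acceleration = c(-1, 1), Year = c(-1, 1),
                    Country = c(-1, 1))

# Assumed generator: Cylinders = Engine*Weight*Acceleration*Year*Country
half <- base
half$Cylinders <- apply(base, 1, prod)

nrow(half)      # 32 runs instead of 2^6 = 64
colSums(half)   # every column is balanced (sums to 0)
```

Because all six columns are mutually orthogonal, each main effect can be estimated independently from the 32 runs.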

library(FrF2)

## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design

#loads the R package "FrF2" needed to run a fractional factorial design
FFD = FrF2(32,nfactors=6,estimable=formula("~Engine+Weight+Acceleration+Year+Country+Cylinders+Engine:(Weight+Acceleration+Year+Country+Cylinders)"), factor.names=c("Engine","Weight","Acceleration","Year","Country","Cylinders"), res4=TRUE, clear=FALSE)
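#generates an array for a 2^(6-1) fractional factorial design (32 runs) in which the main effects and the listed Engine two-factor interactions are estimable
#the estimable formula refers to the factors by name, so factor.names must supply those names
#res4=TRUE requests a design of at least resolution IV; clear=FALSE only requires the listed effects not to be aliased with one another
#the design returned satisfies Cylinders = Engine*Weight*Acceleration*Year*Country (I = ABCDEF), i.e. resolution VI, so main effects and two-factor interactions are not aliased with each other, as aliasprint confirms below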
FFD

##    Engine Weight Acceleration Year Country Cylinders
## 1      -1     -1           -1    1       1        -1
## 2      -1      1           -1    1       1         1
## 3      -1      1            1    1       1        -1
## 4      -1      1           -1   -1      -1         1
## 5       1      1           -1   -1       1         1
## 6      -1     -1            1    1       1         1
## 7       1      1            1   -1       1        -1
## 8       1     -1            1    1      -1         1
## 9      -1     -1            1    1      -1        -1
## 10      1     -1           -1    1       1         1
## 11     -1      1            1   -1      -1        -1
## 12      1      1           -1    1       1        -1
## 13      1      1            1    1       1         1
## 14      1     -1            1    1       1        -1
## 15      1      1           -1   -1      -1        -1
## 16      1     -1           -1   -1      -1         1
## 17     -1     -1           -1    1      -1         1
## 18      1     -1            1   -1       1         1
## 19      1      1            1   -1      -1         1
## 20     -1     -1           -1   -1      -1        -1
## 21     -1     -1            1   -1      -1         1
## 22     -1      1            1    1      -1         1
## 23     -1      1            1   -1       1         1
## 24      1     -1           -1   -1       1        -1
## 25     -1      1           -1   -1       1        -1
## 26      1      1            1    1      -1        -1
## 27      1     -1            1   -1      -1        -1
## 28      1     -1           -1    1      -1        -1
## 29     -1      1           -1    1      -1        -1
## 30      1      1           -1    1      -1         1
## 31     -1     -1           -1   -1       1         1
## 32     -1     -1            1   -1       1        -1
## class=design, type= FrF2.estimable

#displays the fractional factorial design 
aliasprint(FFD)

## $legend
## [1] A=Engine       B=Weight       C=Acceleration D=Year        
## [5] E=Country      F=Cylinders   
## 
## [[2]]
## [1] no aliasing among main effects and 2fis

#identifies any aliasing in the design
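What aliasprint() reports can be reproduced for any -1/+1 design: two effects are fully aliased when their contrast columns are identical or exact negatives. The sketch below uses a hypothetical helper, alias_pairs (it is not part of FrF2), that checks main effects and two-factor interactions, and applies it to a half fraction built with the assumed generator F = ABCDE and, for contrast, to a resolution III fraction with F = AB.

```r
# Hypothetical helper (not part of FrF2): list aliased pairs among the
# main effects and two-factor interactions of a -1/+1 design
alias_pairs <- function(design) {
  X <- as.matrix(design)
  n <- nrow(X)
  cols <- list()
  for (j in colnames(X)) {
    cols[[j]] <- X[, j]                                      # main-effect contrasts
  }
  for (pr in combn(colnames(X), 2, simplify = FALSE)) {
    cols[[paste(pr, collapse = ":")]] <- X[, pr[1]] * X[, pr[2]]  # 2fi contrasts
  }
  aliased <- character(0)
  for (pr in combn(names(cols), 2, simplify = FALSE)) {
    # fully aliased if the two contrasts agree (or disagree) on every run
    if (abs(sum(cols[[pr[1]]] * cols[[pr[2]]])) == n) {
      aliased <- c(aliased, paste(pr, collapse = " = "))
    }
  }
  aliased
}

base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                    D = c(-1, 1), E = c(-1, 1))
res6 <- cbind(base, F = apply(base, 1, prod))  # I = ABCDEF (resolution VI)
res3 <- cbind(base, F = base$A * base$B)       # I = ABF (resolution III)

alias_pairs(res6)   # empty: no aliasing among main effects and 2fis
alias_pairs(res3)   # main effects aliased with 2fis
```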
FFDdata = merge(FFD, cars, by=c("Engine","Weight","Acceleration","Year","Country","Cylinders"), all = FALSE)
#merges FFD with cars, keeping only the vehicles whose combination of factor levels is a run of the design (inner join)
FFDdata

##    Engine Weight Acceleration Year Country Cylinders mpg engine weight
## 1      -1     -1           -1   -1      -1        -1  14     -1     -1
## 2      -1     -1           -1   -1      -1        -1  13     -1     -1
## 3      -1     -1           -1   -1       1         1  16     -1     -1
## 4      -1     -1           -1   -1       1         1  23     -1     -1
## 5      -1     -1           -1    1      -1         1  14     -1     -1
## 6      -1     -1           -1    1      -1         1  12     -1     -1
## 7      -1     -1           -1    1       1        -1  25     -1     -1
## 8      -1     -1           -1    1       1        -1  19     -1     -1
## 9      -1     -1            1   -1      -1         1  15     -1     -1
## 10     -1     -1            1   -1      -1         1  24     -1     -1
## 11     -1     -1            1   -1       1        -1  14     -1     -1
## 12     -1     -1            1   -1       1        -1  16     -1     -1
## 13     -1     -1            1    1      -1        -1  16     -1     -1
## 14     -1     -1            1    1      -1        -1  32     -1     -1
## 15     -1     -1            1    1       1         1  18     -1     -1
## 16     -1     -1            1    1       1         1  14     -1     -1
## 17     -1      1           -1   -1       1        -1  14     -1      1
## 18     -1      1           -1   -1       1        -1  12     -1      1
## 19     -1      1           -1    1      -1        -1  20     -1      1
## 20     -1      1           -1    1      -1        -1  15     -1      1
## 21     -1      1           -1    1       1         1  24     -1      1
## 22     -1      1           -1    1       1         1  30     -1      1
## 23     -1      1            1   -1      -1        -1  11     -1      1
## 24     -1      1            1   -1      -1        -1  10     -1      1
## 25     -1      1            1   -1       1         1  21     -1      1
## 26     -1      1            1   -1       1         1  15     -1      1
## 27     -1      1            1    1      -1         1  13     -1      1
## 28     -1      1            1    1      -1         1  29     -1      1
## 29     -1      1            1    1       1        -1  24     -1      1
## 30     -1      1            1    1       1        -1  21     -1      1
## 31      1     -1           -1   -1      -1         1  15      1     -1
## 32      1     -1           -1   -1      -1         1  25      1     -1
## 33      1     -1           -1   -1       1        -1  25      1     -1
## 34      1     -1           -1   -1       1        -1  31      1     -1
## 35      1     -1           -1    1      -1        -1  17      1     -1
## 36      1     -1           -1    1      -1        -1  25      1     -1
## 37      1     -1           -1    1       1         1  24      1     -1
## 38      1     -1           -1    1       1         1  16      1     -1
## 39      1     -1            1   -1      -1        -1  19      1     -1
## 40      1     -1            1   -1      -1        -1  28      1     -1
## 41      1     -1            1   -1       1         1  18      1     -1
## 42      1     -1            1   -1       1         1  26      1     -1
## 43      1     -1            1    1      -1         1  26      1     -1
## 44      1     -1            1    1      -1         1  22      1     -1
## 45      1     -1            1    1       1        -1  18      1     -1
## 46      1     -1            1    1       1        -1  14      1     -1
## 47      1      1           -1   -1      -1        -1  22      1      1
## 48      1      1           -1   -1      -1        -1  21      1      1
## 49      1      1           -1   -1       1         1  32      1      1
## 50      1      1           -1   -1       1         1  18      1      1
## 51      1      1           -1    1      -1         1  18      1      1
## 52      1      1           -1    1      -1         1  16      1      1
## 53      1      1           -1    1       1        -1  10      1      1
## 54      1      1           -1    1       1        -1  20      1      1
## 55      1      1            1   -1      -1         1  18      1      1
## 56      1      1            1   -1      -1         1  15      1      1
## 57      1      1            1   -1       1        -1  27      1      1
## 58      1      1            1   -1       1        -1  21      1      1
## 59      1      1            1    1      -1        -1  30      1      1
## 60      1      1            1    1      -1        -1  26      1      1
## 61      1      1            1    1       1         1  11      1      1
## 62      1      1            1    1       1         1  15      1      1
##    acceleration year country cylinders
## 1            -1   -1      -1        -1
## 2            -1   -1      -1        -1
## 3            -1   -1       1         1
## 4            -1   -1       1         1
## 5            -1    1      -1         1
## 6            -1    1      -1         1
## 7            -1    1       1        -1
## 8            -1    1       1        -1
## 9             1   -1      -1         1
## 10            1   -1      -1         1
## 11            1   -1       1        -1
## 12            1   -1       1        -1
## 13            1    1      -1        -1
## 14            1    1      -1        -1
## 15            1    1       1         1
## 16            1    1       1         1
## 17           -1   -1       1        -1
## 18           -1   -1       1        -1
## 19           -1    1      -1        -1
## 20           -1    1      -1        -1
## 21           -1    1       1         1
## 22           -1    1       1         1
## 23            1   -1      -1        -1
## 24            1   -1      -1        -1
## 25            1   -1       1         1
## 26            1   -1       1         1
## 27            1    1      -1         1
## 28            1    1      -1         1
## 29            1    1       1        -1
## 30            1    1       1        -1
## 31           -1   -1      -1         1
## 32           -1   -1      -1         1
## 33           -1   -1       1        -1
## 34           -1   -1       1        -1
## 35           -1    1      -1        -1
## 36           -1    1      -1        -1
## 37           -1    1       1         1
## 38           -1    1       1         1
## 39            1   -1      -1        -1
## 40            1   -1      -1        -1
## 41            1   -1       1         1
## 42            1   -1       1         1
## 43            1    1      -1         1
## 44            1    1      -1         1
## 45            1    1       1        -1
## 46            1    1       1        -1
## 47           -1   -1      -1        -1
## 48           -1   -1      -1        -1
## 49           -1   -1       1         1
## 50           -1   -1       1         1
## 51           -1    1      -1         1
## 52           -1    1      -1         1
## 53           -1    1       1        -1
## 54           -1    1       1        -1
## 55            1   -1      -1         1
## 56            1   -1      -1         1
## 57            1   -1       1        -1
## 58            1   -1       1        -1
## 59            1    1      -1        -1
## 60            1    1      -1        -1
## 61            1    1       1         1
## 62            1    1       1         1
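Because merge(..., all = FALSE) performs an inner join, a design point matched by several vehicles appears several times (each matched design point above appears twice), and a design point with no matching vehicle is silently dropped. A toy example with hypothetical data shows both behaviors:

```r
# Hypothetical design (3 runs) and observations (4 vehicles)
design <- data.frame(A = c(-1, 1, 1), B = c(1, -1, 1))
obs    <- data.frame(A = c(-1, -1, 1, -1), B = c(1, 1, -1, -1),
                     mpg = c(20, 24, 18, 30))

joined <- merge(design, obs, by = c("A", "B"), all = FALSE)
joined
# Run (A = 1, B = 1) has no matching vehicle and is dropped;
# run (A = -1, B = 1) matches two vehicles and appears twice;
# the vehicle at (A = -1, B = -1) is not a design run and is dropped.
```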

FFDdata2 = unique( FFDdata[ , 1:6])
#eliminates repeated rows, keeping one row per unique combination of the six factor levels (columns 1 through 6)
FFDdata2

##    Engine Weight Acceleration Year Country Cylinders
## 1      -1     -1           -1   -1      -1        -1
## 3      -1     -1           -1   -1       1         1
## 5      -1     -1           -1    1      -1         1
## 7      -1     -1           -1    1       1        -1
## 9      -1     -1            1   -1      -1         1
## 11     -1     -1            1   -1       1        -1
## 13     -1     -1            1    1      -1        -1
## 15     -1     -1            1    1       1         1
## 17     -1      1           -1   -1       1        -1
## 19     -1      1           -1    1      -1        -1
## 21     -1      1           -1    1       1         1
## 23     -1      1            1   -1      -1        -1
## 25     -1      1            1   -1       1         1
## 27     -1      1            1    1      -1         1
## 29     -1      1            1    1       1        -1
## 31      1     -1           -1   -1      -1         1
## 33      1     -1           -1   -1       1        -1
## 35      1     -1           -1    1      -1        -1
## 37      1     -1           -1    1       1         1
## 39      1     -1            1   -1      -1        -1
## 41      1     -1            1   -1       1         1
## 43      1     -1            1    1      -1         1
## 45      1     -1            1    1       1        -1
## 47      1      1           -1   -1      -1        -1
## 49      1      1           -1   -1       1         1
## 51      1      1           -1    1      -1         1
## 53      1      1           -1    1       1        -1
## 55      1      1            1   -1      -1         1
## 57      1      1            1   -1       1        -1
## 59      1      1            1    1      -1        -1
## 61      1      1            1    1       1         1

mpgdata = FFDdata$mpg[!duplicated(FFDdata[ , 1:6])]
#keeps the fuel mileage of the first vehicle at each unique design point (the odd-numbered rows), matching the rows of FFDdata2
FFDmpg = cbind(FFDdata2,mpgdata)
#combines FFDdata2 with the column of fuel mileage data
FFDmpg

##    Engine Weight Acceleration Year Country Cylinders mpgdata
## 1      -1     -1           -1   -1      -1        -1      14
## 3      -1     -1           -1   -1       1         1      16
## 5      -1     -1           -1    1      -1         1      14
## 7      -1     -1           -1    1       1        -1      25
## 9      -1     -1            1   -1      -1         1      15
## 11     -1     -1            1   -1       1        -1      14
## 13     -1     -1            1    1      -1        -1      16
## 15     -1     -1            1    1       1         1      18
## 17     -1      1           -1   -1       1        -1      14
## 19     -1      1           -1    1      -1        -1      20
## 21     -1      1           -1    1       1         1      24
## 23     -1      1            1   -1      -1        -1      11
## 25     -1      1            1   -1       1         1      21
## 27     -1      1            1    1      -1         1      13
## 29     -1      1            1    1       1        -1      24
## 31      1     -1           -1   -1      -1         1      15
## 33      1     -1           -1   -1       1        -1      25
## 35      1     -1           -1    1      -1        -1      17
## 37      1     -1           -1    1       1         1      24
## 39      1     -1            1   -1      -1        -1      19
## 41      1     -1            1   -1       1         1      18
## 43      1     -1            1    1      -1         1      26
## 45      1     -1            1    1       1        -1      18
## 47      1      1           -1   -1      -1        -1      22
## 49      1      1           -1   -1       1         1      32
## 51      1      1           -1    1      -1         1      18
## 53      1      1           -1    1       1        -1      10
## 55      1      1            1   -1      -1         1      18
## 57      1      1            1   -1       1        -1      27
## 59      1      1            1    1      -1        -1      30
## 61      1      1            1    1       1         1      11
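Keeping only the odd-numbered rows, as in the code above, retains the first vehicle observed at each design point and discards the second. An alternative, not used in this recipe and sketched here with hypothetical data, is to average the observations at each design point with aggregate(); there is still one row per design point, but every observation contributes.

```r
# Hypothetical merged data: two vehicles per design point
merged <- data.frame(A = c(-1, -1, 1, 1), B = c(1, 1, -1, -1),
                     mpg = c(20, 24, 18, 28))

first_obs <- merged[!duplicated(merged[, c("A", "B")]), ]      # first row per point
averaged  <- aggregate(mpg ~ A + B, data = merged, FUN = mean) # mean mpg per point

first_obs
averaged
```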

Second ANOVA Test

Perform an ANOVA on the data frame generated from the fractional factorial design.

FFDmodel = aov(mpgdata~Engine+Weight+Acceleration+Year+Country+Cylinders,data=FFDmpg) 
anova(FFDmodel)

## Analysis of Variance Table
## 
## Response: mpgdata
##              Df Sum Sq Mean Sq F value Pr(>F)
## Engine        1  87.32  87.317  2.5448 0.1237
## Weight        1  10.79  10.787  0.3144 0.5802
## Acceleration  1   2.56   2.562  0.0747 0.7870
## Year          1   2.50   2.501  0.0729 0.7895
## Country       1  40.01  40.006  1.1660 0.2910
## Cylinders     1   1.36   1.357  0.0395 0.8440
## Residuals    24 823.47  34.311

#performs an anova test

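Based on the ANOVA performed on the half-fraction factorial design, we fail to reject the null hypothesis for every factor: all p-values exceed 0.05, so the variation in fuel mileage in this data frame is consistent with random variation. The loss of significance compared with the first ANOVA is likely a consequence of the much smaller sample: only one vehicle is kept per design point, and one of the 32 design points (Engine = -1, Weight = 1, Acceleration = -1, Year = -1, Country = -1, Cylinders = 1) has no matching vehicle, leaving 31 runs and 24 residual degrees of freedom instead of 121. No main effect is significant, and two-factor interactions were not included in this model.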

Shapiro-Wilk Test

A Shapiro-Wilk test will be performed to test the fuel mileage values in the fractional factorial data set for normality.

shapiro.test(FFDmpg$mpgdata)

## 
##  Shapiro-Wilk normality test
## 
## data:  FFDmpg$mpgdata
## W = 0.9582, p-value = 0.2613

Because the p-value (0.2613) is greater than 0.05, we fail to reject the null hypothesis that the fuel mileage values are normally distributed. This does not prove normality, but it gives no evidence against it. Strictly, the ANOVA normality assumption concerns the model residuals rather than the raw response.
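Since the ANOVA assumption is about the errors, the more direct check is to apply shapiro.test() to the model residuals. The sketch below uses simulated data; the seed, group means, and standard deviation are arbitrary assumptions chosen so that a large level effect makes the raw response a mixture of two groups.

```r
set.seed(1)
g   <- factor(rep(c(-1, 1), each = 20))
mpg <- ifelse(g == "1", 28, 15) + rnorm(40, sd = 2)   # large level effect

fit <- aov(mpg ~ g)
shapiro.test(mpg)              # raw response: a mixture of two groups
shapiro.test(residuals(fit))   # residuals: the quantity ANOVA assumes is normal
```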

Diagnostics/Model Adequacy Checking (1)

Quantile-quantile (Q-Q) plots are used to check whether a set of data follows an assumed distribution. Each sorted data value is plotted against the corresponding quantile of the theoretical distribution; if the data follow that distribution, the points fall approximately on a straight line. ANOVA assumes that the errors are normally distributed (this is what makes the F statistic follow an F distribution under the null hypothesis), so a normal Q-Q plot of the model residuals is a visual check of that assumption for the second ANOVA.
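The theoretical quantiles on the horizontal axis of qqnorm() come from ppoints(): for n values, the i-th smallest is plotted against qnorm(ppoints(n))[i]. A short sketch with hypothetical residuals reproduces those coordinates without drawing the plot:

```r
res <- c(2.1, -0.4, 1.3, -2.8, 0.6, -1.1, 0.2)   # hypothetical residuals

qq <- qqnorm(res, plot.it = FALSE)               # coordinates only, no plot

theory   <- qnorm(ppoints(length(res)))          # sorted theoretical quantiles
manual_x <- theory[rank(res)]                    # matched to each residual's rank

cbind(sample = qq$y, theoretical = qq$x, manual = manual_x)
```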

#Q-Q Plots
#Residuals of the fractional factorial model (FFDmodel)
qqnorm(residuals(FFDmodel), main="Normal Q-Q Plot", ylab="Fuel Mileage (mpg) Residuals")
qqline(residuals(FFDmodel))

#produces a Q-Q normal plot with a normal fit line

The normal Q-Q plot of the FFDmodel residuals shows a roughly linear relationship between the residuals and the theoretical quantiles, indicating that the residuals approximately follow a normal distribution. The tails of the plot deviate from the line, but not enough to suggest that the normality assumption is violated.

A residuals vs. fitted plot is a common tool in residual analysis: it is a scatter plot of the residuals against the fitted values (the estimated responses). It is used to detect non-linearity, outliers, and unequal error variances.
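Each point on this plot pairs a fitted value with its residual (observed minus fitted). For a least-squares model with an intercept, the residuals sum to zero and are uncorrelated with the fitted values by construction, so what the plot can reveal is curvature, a funnel shape, or isolated outliers rather than a linear trend. A quick check with hypothetical data:

```r
d <- data.frame(x = factor(c(-1, -1, -1, 1, 1, 1)),
                y = c(14, 17, 16, 21, 19, 23))
fit <- aov(y ~ x, data = d)

r <- residuals(fit)
f <- fitted(fit)

all.equal(unname(f + r), d$y)   # residual = observed - fitted
sum(r)                          # zero up to rounding error
cor(f, r)                       # zero up to rounding error
```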

#Residuals of the fractional factorial model (FFDmodel)
plot(fitted(FFDmodel),residuals(FFDmodel), main="Residual vs Fitted Plot", xlab="Fitted Fuel Mileage (mpg)", ylab="Residuals (mpg)")
abline(h=0, lty=2)

#Produces a residual plot

The residual plot above shows a fairly even spread of residuals and no extreme outliers, which suggests that the model chosen in this recipe is adequate.

4. References to the Literature

No literature was used in this sample recipe.

5. Appendices

The raw data used in this statistical analysis are results of vehicle testing and can be read directly into R or RStudio. The data set is available for download at http://www.mathcs.org/statistics/datasets/.