Recipes for the Design of Experiments:

Recipe Outline

Trevor Corrao

RPI

12/16/16 Version 1

1. Setting

System under test

For this recipe, we use the same dataset that was analyzed in Project 3: automobile design and performance metrics from the 1974 Motor Trend magazine. MPG is the response variable, which depends on factors including the number of cylinders, car weight, engine layout (V or straight), and transmission type (automatic or manual). There are 32 observations. The data can be found at vincentarelbundock.github.io/Rdatasets/datasets.html.

library("Ecdat")

## Warning: package 'Ecdat' was built under R version 3.1.3

## Loading required package: Ecfun

## Warning: package 'Ecfun' was built under R version 3.1.3

## 
## Attaching package: 'Ecdat'

## The following object is masked from 'package:datasets':
## 
##     Orange

data(mtcars)

library(FrF2)

## Warning: package 'FrF2' was built under R version 3.1.3

## Loading required package: DoE.base

## Warning: package 'DoE.base' was built under R version 3.1.3

## Loading required package: grid

## Loading required package: conf.design

## Warning: package 'conf.design' was built under R version 3.1.3

## 
## Attaching package: 'DoE.base'

## The following objects are masked from 'package:stats':
## 
##     aov, lm

## The following object is masked from 'package:graphics':
## 
##     plot.design

library("rsm")

## Warning: package 'rsm' was built under R version 3.1.3

summary(mtcars)

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Factors and Levels

The two 2-level factors being considered are:

vs: Engine (0 = V, 1 = Straight)
am: Transmission (0 = automatic, 1 = manual)

The two 3-level factors being considered are:

cyl: Number of cylinders (“4”=2, “6”= 1, “8”=0)
wt: Weight in 1000s of lbs (“1.304 to 2.817” = 2, “2.818 to 4.120” = 1, “4.121 to 5.424” = 0)

mtcars$cyl[mtcars$cyl >= 4 & mtcars$cyl < 6] = 2
mtcars$cyl[mtcars$cyl >= 6 & mtcars$cyl < 8] = 1
mtcars$cyl[mtcars$cyl >= 8 & mtcars$cyl < 9] = 0

mtcars$wt[as.numeric(mtcars$wt) >= 1.304 & as.numeric(mtcars$wt) < 2.818] = 2
mtcars$wt[as.numeric(mtcars$wt) >= 2.818 & as.numeric(mtcars$wt) < 4.121] = 1
mtcars$wt[as.numeric(mtcars$wt) >= 4.121 & as.numeric(mtcars$wt) < 5.424] = 0

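These assignments recode cyl and bin the continuous wt values into three levels. Note that the last condition uses a strict upper bound (< 5.424), so the heaviest car (wt = 5.424) is left unrecoded; this is why wt later appears as a factor with four levels ("0", "1", "2", "5.424").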
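As a hedged alternative to the manual recoding above, the same bins can be sketched with base R's cut() and factor(). The names cars, wt_breaks, wt_level, and cyl_level are illustrative and not part of the original analysis; the break points are the ones listed above, and the last bin is closed so that the maximum weight is included.

```r
# Illustrative sketch (assumed names); works on a fresh copy of the data.
cars <- datasets::mtcars

# Bin weight into [1.304, 2.818), [2.818, 4.121), [4.121, 5.424].
# right = FALSE makes the bins left-closed; include.lowest = TRUE closes the last bin.
wt_breaks <- c(1.304, 2.818, 4.121, 5.424)
cars$wt_level <- cut(cars$wt, breaks = wt_breaks, labels = c("2", "1", "0"),
                     right = FALSE, include.lowest = TRUE)

# Map 4, 6, and 8 cylinders to levels "2", "1", and "0".
cars$cyl_level <- factor(cars$cyl, levels = c(4, 6, 8), labels = c("2", "1", "0"))

table(cars$wt_level)
table(cars$cyl_level)
```

Unlike the original recoding, this version yields a three-level wt factor, so downstream output (for example, the degrees of freedom for wt in the ANOVA) would differ from what is reported in this document.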

mtcars$vs=as.factor(mtcars$vs)
mtcars$am=as.factor(mtcars$am)
mtcars$cyl=as.factor(mtcars$cyl)
mtcars$wt=as.factor(mtcars$wt)

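Assignment of a binary (0/1) structure to the factors. For cyl and wt, a 1 marks the middle level ("1") and a 0 marks either of the other two levels.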

r=nrow(mtcars)

vsnum = data.frame(1)
amnum = data.frame(1)
cylnum = data.frame(1)
wtnum = data.frame(1)

for (i in 1:r){
  
  if (mtcars$vs[i] == "1"){
    vsnum[i,1] <- 1
  } else {
    vsnum[i,1] <- 0
  }
  
  if (mtcars$am[i] == "1"){
    amnum[i,1] <- 1
  } else {
    amnum[i,1] <- 0
  }
  
   if (mtcars$cyl[i] == "1"){
    cylnum[i,1] <- 1
  } else {
    cylnum[i,1] <- 0
  }
  
   if (mtcars$wt[i] == "1"){
    wtnum[i,1] <- 1
  } else {
    wtnum[i,1] <- 0
  }
}

car1 <- cbind(vsnum,amnum,cylnum,wtnum,mtcars$mpg)

colnames(car1) <- c("vs","am","cyl","wt","mpg")

head(car1)

##   vs am cyl wt  mpg
## 1  0  1   1  0 21.0
## 2  0  1   1  1 21.0
## 3  1  1   0  0 22.8
## 4  1  0   1  1 21.4
## 5  0  0   0  1 18.7
## 6  1  0   1  1 18.1

str(car1)

## 'data.frame':    32 obs. of  5 variables:
##  $ vs : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ cyl: num  1 1 0 1 0 1 0 0 0 1 ...
##  $ wt : num  0 1 0 1 1 1 1 1 1 1 ...
##  $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

summary(car1)

##        vs               am              cyl               wt        
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :0.2188   Mean   :0.5625  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       mpg       
##  Min.   :10.40  
##  1st Qu.:15.43  
##  Median :19.20  
##  Mean   :20.09  
##  3rd Qu.:22.80  
##  Max.   :33.90
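The loop that builds car1 can also be sketched in vectorized form. This is an illustrative equivalent, not the code that produced the output above; car1_alt is an assumed name, and the indicators are computed directly from the raw values (6 cylinders, and the middle weight bin of 2.818 up to but not including 4.121).

```r
# Illustrative sketch (assumed names); starts from the untouched dataset.
cars <- datasets::mtcars

car1_alt <- data.frame(
  vs  = cars$vs,                                          # already coded 0/1
  am  = cars$am,                                          # already coded 0/1
  cyl = as.numeric(cars$cyl == 6),                        # 1 = level "1" (6 cylinders)
  wt  = as.numeric(cars$wt >= 2.818 & cars$wt < 4.121),   # 1 = level "1" (middle bin)
  mpg = cars$mpg
)

head(car1_alt)
```

Comparing a whole column against a value avoids the row-by-row loop and the placeholder data frames.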

Continuous Variables

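All four factors in this experiment are treated as discrete. Weight is continuous in the raw data but is binned into levels above. The only continuous variable retained is MPG, the response, which depends on the factors; note that it is recorded only to one decimal place.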

Deciding Upon a Response Variable

Deciding upon a response variable in this experiment was relatively simple and intuitive. The MPG of an automobile is affected by weight, transmission, engine, and number of cylinders, along with other factors that are not discussed (e.g., displacement, fuel type, aerodynamics, gear ratios, and horsepower). These factors, combined, determine MPG. MPG does not determine the factors, so it is clear where the dependency lies: MPG is the response variable.

The Data: How is it organized and what does it look like?

The structure of the automobile data is as follows:

#analyzing structure
  str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "0","1","2": 2 2 3 2 1 2 1 3 3 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : Factor w/ 4 levels "0","1","2","5.424": 3 2 3 2 2 2 2 2 2 2 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

First and last 6 observations of the dataset:

 head(mtcars)

##                    mpg cyl disp  hp drat wt  qsec vs am gear carb
## Mazda RX4         21.0   1  160 110 3.90  2 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   1  160 110 3.90  1 17.02  0  1    4    4
## Datsun 710        22.8   2  108  93 3.85  2 18.61  1  1    4    1
## Hornet 4 Drive    21.4   1  258 110 3.08  1 19.44  1  0    3    1
## Hornet Sportabout 18.7   0  360 175 3.15  1 17.02  0  0    3    2
## Valiant           18.1   1  225 105 2.76  1 20.22  1  0    3    1

 tail(mtcars)

##                 mpg cyl  disp  hp drat wt qsec vs am gear carb
## Porsche 914-2  26.0   2 120.3  91 4.43  2 16.7  0  1    5    2
## Lotus Europa   30.4   2  95.1 113 3.77  2 16.9  1  1    5    2
## Ford Pantera L 15.8   0 351.0 264 4.22  1 14.5  0  1    5    4
## Ferrari Dino   19.7   1 145.0 175 3.62  2 15.5  0  1    5    6
## Maserati Bora  15.0   0 301.0 335 3.54  1 14.6  0  1    5    8
## Volvo 142E     21.4   2 121.0 109 4.11  2 18.6  1  1    4    2

2. (Experimental) Design

Organization of the Experiment to Test the Hypothesis

The experiment uses a 2^k fractional factorial design, which is used here to find the design with the lowest resolution. A full factorial design would be a 2⁶ design: each of the two three-level factors is represented by two two-level factors, which together with the two original two-level factors gives six two-level factors. We also use a linear ANOVA model to analyze main effects.

After that, we use a response surface method (RSM) to estimate residuals and coefficients.
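The run counts can be checked with a short base R sketch. The names k, levels_list, and full are illustrative; the factor count assumes each three-level factor is represented by two two-level factors, as described above.

```r
# Illustrative sketch (assumed names).
# Two 2-level factors, plus two 3-level factors each coded as two 2-level factors.
k <- 2 + 2 * 2

# Enumerate every combination of k two-level factors coded -1/+1.
levels_list <- rep(list(c(-1, 1)), k)
names(levels_list) <- LETTERS[1:k]
full <- expand.grid(levels_list)

nrow(full)   # 64 runs in the full 2^6 design
2^(k - 3)    # 8 runs in a 1/8 fraction
```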

Rationale for This Design

We use the linear ANOVA design because it allows us to analyze main effects of factors on response variables. We use the fractional factorial design to analyze confounding between interaction effects.

We then use RSM for goal optimization, which entails combining factor levels to maximize the response. We are ultimately in search of the combination of engine, transmission, weight, and cylinders that produces the highest MPG.
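A minimal sketch of this search is shown below. It swaps in a simpler technique than RSM: a main-effects linear model is fitted to the 0/1-coded factors, and its predictions are compared over every factor combination. The model form and the names dat, fit, grid, and pred_mpg are assumptions made for illustration.

```r
# Illustrative sketch (assumed names and a main-effects-only model).
cars <- datasets::mtcars
dat <- data.frame(
  vs  = cars$vs,
  am  = cars$am,
  cyl = as.numeric(cars$cyl == 6),                        # 1 = 6 cylinders
  wt  = as.numeric(cars$wt >= 2.818 & cars$wt < 4.121),   # 1 = middle weight bin
  mpg = cars$mpg
)

fit <- lm(mpg ~ vs + am + cyl + wt, data = dat)

# Predict MPG for all 2^4 combinations of the coded factors.
grid <- expand.grid(vs = 0:1, am = 0:1, cyl = 0:1, wt = 0:1)
grid$pred_mpg <- predict(fit, newdata = grid)

# Combination with the highest predicted MPG under this simplified model.
grid[which.max(grid$pred_mpg), ]
```

Because the model has no interaction terms, the best combination is simply the one that sets each factor to its more favorable level; a model with interactions could rank the combinations differently.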

Randomization

We assume the vehicles were selected at random, so their MPG and factor values are random as well.

Replication

There are replicates; the study observed the exact same factors on each vehicle. Each vehicle is an observation, but also a replicate. They would be “repeated” if the same study were performed on the same vehicle more than once.
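How many vehicles fall into each factor combination can be inspected with a cross-tabulation. This is a sketch on the raw data using three of the factors; cell_counts is an assumed name.

```r
# Illustrative sketch (assumed names); counts vehicles per factor combination.
cars <- datasets::mtcars
cell_counts <- with(cars, table(vs = vs, am = am, cyl = cyl))

ftable(cell_counts)   # flat view of the three-way table
sum(cell_counts)      # total number of vehicles
range(cell_counts)    # smallest and largest cell counts
```

The counts are unequal across cells, so the replication in this observational dataset is unbalanced.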

Blocking

Blocking was applied when selecting the factors to use. The factors that were set aside are engine displacement, horsepower, rear axle ratio, 1/4 mile time, number of forward gears, and number of carburetors. These factors were judged less significant than the four that were selected.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

#checking for relationships
boxplot(mtcars$mpg~mtcars$vs, xlab="V or Straight", ylab="MPG")

boxplot(mtcars$mpg~mtcars$am, xlab="Automatic or Manual", ylab="MPG")

boxplot(mtcars$mpg~mtcars$cyl, xlab="# of Cylinders", ylab="MPG")

boxplot(mtcars$mpg~mtcars$wt, xlab="Weight", ylab="MPG")

#anova test
model1=aov(mtcars$mpg~mtcars$vs+mtcars$am+mtcars$cyl+mtcars$wt)
anova(model1)

## Analysis of Variance Table
## 
## Response: mtcars$mpg
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## mtcars$vs   1 496.53  496.53 53.5014 1.484e-07 ***
## mtcars$am   1 276.03  276.03 29.7429 1.324e-05 ***
## mtcars$cyl  2  94.59   47.30  5.0961    0.0143 *  
## mtcars$wt   3  36.16   12.05  1.2987    0.2977    
## Residuals  24 222.74    9.28                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

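From the linear ANOVA model, we can see that the type of engine, the type of transmission, and the number of cylinders all have significant effects on MPG at the 0.05 level. Weight does not (p = 0.2977). Because anova() reports sequential sums of squares, each factor is tested after adjusting only for the factors listed before it, so these conclusions depend on the order of the terms.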
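The order dependence of a sequential ANOVA table can be seen by fitting the same terms in two orders. This sketch uses the raw data with vs, am, and cyl only (wt is left out to keep the example independent of the recoding); the object names are assumptions.

```r
# Illustrative sketch (assumed names); same terms, two different orders.
cars <- datasets::mtcars
cars$vs  <- factor(cars$vs)
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)

fit_vs_first <- aov(mpg ~ vs + am + cyl, data = cars)
fit_vs_last  <- aov(mpg ~ cyl + am + vs, data = cars)

tab_first <- anova(fit_vs_first)
tab_last  <- anova(fit_vs_last)

# Sum of squares attributed to vs when it enters first versus last.
tab_first["vs", "Sum Sq"]
tab_last["vs", "Sum Sq"]

# The residual sum of squares is the same either way.
all.equal(tab_first["Residuals", "Sum Sq"], tab_last["Residuals", "Sum Sq"])
```

When vs enters first it is credited with all the variation it can explain on its own; when it enters last it is credited only with what cyl and am leave unexplained.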

qqnorm(residuals(model1))
qqline(residuals(model1))

After observing the normal Q-Q plot of the residuals, we can tell that they are approximately normal, so we assume normality when proceeding.

Full Factorial Design of 64 runs:

library(FrF2)
u= FrF2(64,6,res3 = T)

## creating full factorial with 64 runs ...

print(u)

##     A  B  C  D  E  F
## 1   1 -1  1  1 -1  1
## 2  -1 -1 -1  1 -1  1
## 3   1 -1 -1  1  1 -1
## 4  -1 -1  1  1  1 -1
## 5   1  1  1  1 -1 -1
## 6   1 -1 -1 -1  1 -1
## 7  -1 -1  1 -1  1  1
## 8   1  1 -1  1  1  1
## 9   1 -1  1 -1 -1 -1
## 10  1  1 -1 -1 -1  1
## 11  1 -1  1 -1  1  1
## 12 -1 -1  1  1 -1  1
## 13 -1 -1  1  1  1  1
## 14  1 -1  1  1 -1 -1
## 15  1  1 -1  1 -1 -1
## 16  1  1  1  1  1  1
## 17 -1  1  1 -1 -1 -1
## 18  1  1  1 -1  1 -1
## 19 -1  1  1  1  1  1
## 20 -1  1  1 -1  1 -1
## 21 -1 -1  1 -1  1 -1
## 22 -1  1 -1  1  1  1
## 23  1 -1 -1 -1 -1 -1
## 24  1  1  1 -1  1  1
## 25 -1  1 -1  1 -1 -1
## 26 -1 -1  1 -1 -1 -1
## 27 -1  1 -1  1 -1  1
## 28  1 -1  1  1  1  1
## 29 -1  1 -1 -1  1 -1
## 30 -1  1 -1  1  1 -1
## 31 -1  1  1 -1 -1  1
## 32  1  1  1  1  1 -1
## 33  1  1 -1  1 -1  1
## 34  1 -1 -1  1  1  1
## 35 -1  1  1  1 -1  1
## 36  1 -1 -1  1 -1 -1
## 37 -1  1 -1 -1 -1 -1
## 38 -1  1  1  1  1 -1
## 39 -1 -1 -1  1  1 -1
## 40 -1 -1  1  1 -1 -1
## 41 -1 -1 -1  1 -1 -1
## 42  1  1 -1 -1  1 -1
## 43  1  1 -1 -1 -1 -1
## 44  1 -1  1  1  1 -1
## 45  1 -1 -1  1 -1  1
## 46 -1  1  1  1 -1 -1
## 47 -1 -1  1 -1 -1  1
## 48 -1  1 -1 -1  1  1
## 49 -1  1 -1 -1 -1  1
## 50 -1  1  1 -1  1  1
## 51 -1 -1 -1  1  1  1
## 52 -1 -1 -1 -1  1  1
## 53  1 -1  1 -1 -1  1
## 54  1 -1 -1 -1 -1  1
## 55  1  1  1 -1 -1  1
## 56  1  1 -1 -1  1  1
## 57 -1 -1 -1 -1  1 -1
## 58  1 -1 -1 -1  1  1
## 59 -1 -1 -1 -1 -1 -1
## 60 -1 -1 -1 -1 -1  1
## 61  1  1  1 -1 -1 -1
## 62  1  1 -1  1  1 -1
## 63  1 -1  1 -1  1 -1
## 64  1  1  1  1 -1  1
## class=design, type= full factorial

Fractional factorial design of 8 runs:

library(FrF2)
s= FrF2(8,6,res3 = T)
print(s)

##    A  B  C  D  E  F
## 1 -1 -1  1  1 -1 -1
## 2 -1 -1 -1  1  1  1
## 3  1  1  1  1  1  1
## 4  1 -1  1 -1  1 -1
## 5 -1  1  1 -1 -1  1
## 6 -1  1 -1 -1  1 -1
## 7  1 -1 -1 -1 -1  1
## 8  1  1 -1  1 -1 -1
## class=design, type= FrF2

The alias structure below shows the confounding in this design. Since the design is resolution III, main effects (MEs) are confounded with two-factor interactions (2fis), and 2fis are confounded with other 2fis.

aliasprint(s)

## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
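The alias pattern printed above can be reproduced by hand in base R. The generators D = AB, E = AC, and F = BC used here are inferred from the alias output rather than taken from the FrF2 documentation, and the names base and design are illustrative.

```r
# Illustrative sketch (assumed generators and names).
# Start from a full 2^3 design in A, B, C and generate D, E, F from interactions.
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- transform(base, D = A * B, E = A * C, F = B * C)

# A main-effect column is identical to two interaction columns: A = BD = CE.
with(design, all(A == B * D) && all(A == C * E))      # TRUE

# Two-factor interactions are identical to each other: AF = BE = CD.
with(design, all(A * F == B * E) && all(A * F == C * D))  # TRUE
```

Identical columns cannot be separated by any analysis of the eight runs, which is what it means for the corresponding effects to be aliased.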

Testing

library("rsm")
car.rsm=rsm(mpg ~ SO(vs,am,cyl,wt), data=car1)

## Warning in rsm(mpg ~ SO(vs, am, cyl, wt), data = car1): Some coefficients are aliased - cannot use 'rsm' methods.
##   Returning an 'lm' object.

summary(car.rsm)

## 
## Call:
## rsm(formula = mpg ~ SO(vs, am, cyl, wt), data = car1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.971 -1.037  0.000  1.391  5.529 
## 
## Coefficients: (5 not defined because of singularities)
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  11.833      1.695   6.983 5.22e-07 ***
## FO(vs, am, cyl, wt)vs         9.667      3.389   2.852 0.009266 ** 
## FO(vs, am, cyl, wt)am        14.167      3.389   4.180 0.000389 ***
## FO(vs, am, cyl, wt)cyl       -5.650      3.595  -1.572 0.130287    
## FO(vs, am, cyl, wt)wt         4.289      1.957   2.192 0.039265 *  
## TWI(vs, am, cyl, wt)vs:am    -7.295      4.619  -1.580 0.128492    
## TWI(vs, am, cyl, wt)vs:cyl  -10.075      4.403  -2.288 0.032087 *  
## TWI(vs, am, cyl, wt)vs:wt    -2.189      4.093  -0.535 0.598146    
## TWI(vs, am, cyl, wt)am:cyl       NA         NA      NA       NA    
## TWI(vs, am, cyl, wt)am:wt   -14.889      4.093  -3.638 0.001453 ** 
## TWI(vs, am, cyl, wt)cyl:wt   11.250      5.084   2.213 0.037584 *  
## PQ(vs, am, cyl, wt)vs^2          NA         NA      NA       NA    
## PQ(vs, am, cyl, wt)am^2          NA         NA      NA       NA    
## PQ(vs, am, cyl, wt)cyl^2         NA         NA      NA       NA    
## PQ(vs, am, cyl, wt)wt^2          NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.935 on 22 degrees of freedom
## Multiple R-squared:  0.8317, Adjusted R-squared:  0.7628 
## F-statistic: 12.08 on 9 and 22 DF,  p-value: 1.233e-06

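After reviewing the results of the RSM model, at the 0.01 level we can reject the null hypotheses for first-order vs, first-order am, and the two-way interaction between am and wt. At the 0.05 level, first-order wt and the vs:cyl and cyl:wt interactions are also significant. Because the factors are coded 0/1, each pure quadratic term equals its first-order term and cannot be estimated, which is why rsm reports aliased coefficients and returns an lm object; these results should be interpreted with that limitation in mind.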
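The undefined pure quadratic coefficients follow from the 0/1 coding: squaring a 0/1 variable returns the same variable, so the squared column carries no new information. A small sketch with the am column of the raw data (fit_am is an assumed name):

```r
# Illustrative sketch (assumed names).
cars <- datasets::mtcars

# For a 0/1 variable, the square is identical to the variable itself.
identical(cars$am^2, cars$am)   # TRUE

# lm() therefore cannot estimate a separate coefficient for the squared term.
fit_am <- lm(mpg ~ am + I(am^2), data = cars)
coef(fit_am)                    # the I(am^2) coefficient is NA
```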

The following contour plots demonstrate the MPG in response to two variables per plot.

par(mfrow=c(2,3))
contour(car.rsm, ~vs + am + cyl + wt, image=TRUE, at=summary(car.rsm$canonical$xs))

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

The following images are perspective plots for the interactions.

par(mfrow=c(1,1))
persp(car.rsm, ~ vs + am, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

par(mfrow=c(1,1))
persp(car.rsm, ~ vs + cyl, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

par(mfrow=c(1,1))
persp(car.rsm, ~ vs + wt, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

par(mfrow=c(1,1))
persp(car.rsm, ~ am + cyl, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

par(mfrow=c(1,1))
persp(car.rsm, ~ am + wt, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

par(mfrow=c(1,1))
persp(car.rsm, ~ cyl + wt, image=TRUE, at = c(summary(car.rsm)$canonical$xs, Block="B2"), contour="colors", zlab="MPG", theta=30)

## Warning in predict.lm(lmobj, newdata = newdata): prediction from a rank-
## deficient fit may be misleading

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

Model Adequacy Checking

Shapiro-Wilk Test

We use the Shapiro-Wilk test to check normality. Its null hypothesis is that the sample (i.e., our vehicles) came from a normally distributed population, so a small p-value is evidence against normality.

shapiro.test(car1$mpg)

## 
##  Shapiro-Wilk normality test
## 
## data:  car1$mpg
## W = 0.9476, p-value = 0.1229

shapiro.test(1/(car1$mpg))

## 
##  Shapiro-Wilk normality test
## 
## data:  1/(car1$mpg)
## W = 0.9388, p-value = 0.06922

shapiro.test((car1$mpg)^2)

## 
##  Shapiro-Wilk normality test
## 
## data:  (car1$mpg)^2
## W = 0.8752, p-value = 0.00153

shapiro.test(sqrt(car1$mpg))

## 
##  Shapiro-Wilk normality test
## 
## data:  sqrt(car1$mpg)
## W = 0.9695, p-value = 0.4849

shapiro.test(log(car1$mpg))

## 
##  Shapiro-Wilk normality test
## 
## data:  log(car1$mpg)
## W = 0.9767, p-value = 0.699

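Based on the results, at the 0.05 level we reject the null hypothesis only for the squared transformation (p = 0.00153). For this transformation there is evidence that the data are not normally distributed; for the untransformed MPG and the other transformations we fail to reject normality.

One last normality check is the normal Q-Q plot of the model residuals.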
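The five tests above can be collected into a single comparison. This is a sketch on the raw mpg column; transforms and p_values are assumed names.

```r
# Illustrative sketch (assumed names); one Shapiro-Wilk p-value per transformation.
mpg <- datasets::mtcars$mpg

transforms <- list(
  identity   = identity,
  reciprocal = function(x) 1 / x,
  square     = function(x) x^2,
  sqrt       = sqrt,
  log        = log
)

p_values <- sapply(transforms, function(f) shapiro.test(f(mpg))$p.value)
round(p_values, 4)

# Transformations for which normality is rejected at the 0.05 level.
names(p_values)[p_values < 0.05]
```

Keeping the p-values in one named vector makes it easy to see which transformations, if any, are flagged at a chosen significance level.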

qqnorm(residuals(car.rsm), ylab="MPG")
qqline(residuals(car.rsm))

It is evident that the majority of the residuals follow a normal distribution. There are some outliers, which ideally we would exclude or truncate.

Conclusion

Based on the above RSM analysis, I would suggest that the data are normally distributed. The optimal designs make sense for the problem.

4. References to the Literature.

N/A

5. Appendices

Raw Data: http://vincentarelbundock.github.io/Rdatasets/datasets.html

Project 4

Trevor Corrao

Friday, December 16, 2016
