Project #3: Fractional Factorial Designs, Traffic Safety and Drunk Driving Laws

Andreas Vought

Rensselaer Polytechnic Institute

12/08/2016, V1.0

Revised 12/09/2016, V1.1

Completed 12/10/2016 V1.2

1. Setting

Ecdat is one of the recommended data libraries in “in 100+ interesting data sets”

Fatality = read.csv("Fatality.csv", header=TRUE, sep = ",")#selects dataset, will simply call data in subsequent functions 
FFDdata = Fatality

Dataset examines the effect of 6 factors on traffic incident fatality rates per 10000 accidents.

System Under Test

Dataset includes: (Initial Factors and Levels)

state integer for state ID code

year integer for year

mrall number representing traffic fatality rate (deaths per 10000)

beertax number for tax on case of beer

mlda number representing minimum legal drinking age

jaild factor with 2 levels for mandatory jail sentence

comserd factor with 2 levels for mandatory community service

vmiles number of average miles per driver

unrate number of employment rate

Perinc number of per capital personal income

For a condensed view of the data, call head/tail for a quick look at the set. Let’s save this display for when the dataset is ready for factorial design however.

For 2^6 Fractional Factorial Design, the dataset must be restructured. As instructed, the 2^k design will consist of 2-level (or higher order factors represented in terms of two 3-level factors). When properly structured, fractional factorial design will be implemented to achieve results with a limited number of experimental runs.

Factors and Levels

(Two level factors)

jaild factor is already defined as having two levels, so they will be unchanged in that regard except converting responses to be consistent with other high low factors (0, 1, 2)

levels(FFDdata$jaild)<-c(0,1)

comserd although this factor is already two level, I think it is similar to jaild in nature so I would like to examine a different soon-to-be two level factor. mlda minimum legal drinking age will be converted from numeric to a two level factor for analysis

FFDdata$mlda[FFDdata$mlda >=18 & FFDdata$mlda < 20]=0 #this line defines 18 and 19 year minimum drinking ages as a low level 
FFDdata$mlda[FFDdata$mlda >=20]=1 #ages 20 21 are considered high level 

jaild and mlda are now defined in terms of two levels (0,1) and will represent our two two-level factors.

(Three level factors)

vmiles and perinc are the two other factors I am interested in, so I will format that according to three levels. Initial inspection of vmiles shows a min value of 4.576, max 26.150 with mean 7.891 and 1st/3rd quartiles 7.183 and 8.504. The data are quite skewed but I will draw boundaries according to the 33rd and 66th quantiles found with the quantile() function.

FFDdata$vmiles[as.numeric(FFDdata$vmiles)>= 4.50 & as.numeric(FFDdata$vmiles) < 7.38386 ] = 0
FFDdata$vmiles[as.numeric(FFDdata$vmiles)>= 7.38386 & as.numeric(FFDdata$vmiles) < 8.185094] = 1
FFDdata$vmiles[as.numeric(FFDdata$vmiles)> 8.185094 & as.numeric(FFDdata$vmiles) < 26.150] = 2

perinc has a more normal distribution, but I will use the same 3 quantile convention for converting it to a three-level factor.

FFDdata$perinc[FFDdata$perinc>= 9500 & FFDdata$perinc <12658.92] = 0
FFDdata$perinc[FFDdata$perinc>= 12658.92 & FFDdata$perinc < 14603.35] = 1
FFDdata$perinc[FFDdata$perinc> 14603.35 & FFDdata$perinc <22195] = 2

Continuous Variables & Responce Variable

In this dataset, all factors are continuous variables except community service and jailtime. However, all relevant factors have been set to two or three level factors.

The only remaining continuous variable is the response variable, mrall, which measures traffic fatality rate.

The Data: How is it organized and what does it look like?

Dataset ‘Fatality’ contains 336 observations of 10 variables. For our FFD, we will consider only the response variable mrall and the independent variables jaild, mlda, vmiles, and perinc. This is all we need, so we can condense the dataset to include just these variables of interest.

FFDdata$state = FFDdata$year = FFDdata$beertax = FFDdata$comserd = FFDdata$unrate = NULL #drops remaining variables, not part of this analysis
FFDdata$mlda = as.factor(FFDdata$mlda) #defines two level factor
FFDdata$jaild = as.factor(FFDdata$jaild) #defines two level factor
FFDdata$vmiles = as.factor(FFDdata$vmiles) #defines three level factor
FFDdata$perinc = as.factor(FFDdata$perinc) #defines three level factor
str(FFDdata) #allows examination of dataset and variables 
## 'data.frame':    336 obs. of  6 variables:
##  $ X     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ mrall : num  2.13 2.35 2.34 2.19 2.67 ...
##  $ mlda  : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 1 1 1 ...
##  $ jaild : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
##  $ vmiles: Factor w/ 3 levels "0","1","2": 1 2 3 3 3 3 3 1 1 1 ...
##  $ perinc: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 2 ...
head(FFDdata, n=8) #displays first 5 rows
##   X   mrall mlda jaild vmiles perinc
## 1 1 2.12836    0     0      0      0
## 2 2 2.34848    0     0      1      0
## 3 3 2.33643    0     0      2      0
## 4 4 2.19348    0     0      2      0
## 5 5 2.66914    1     0      2      0
## 6 6 2.71859    1     0      2      0
## 7 7 2.49391    1     0      2      0
## 8 8 2.49914    0     1      0      0
tail(FFDdata, n=8) #shows last 5 rows 
##       X   mrall mlda jaild vmiles perinc
## 329 329 1.66220    1     0      2      2
## 330 330 3.94118    0     1      2      1
## 331 331 3.35271    0     1      2      1
## 332 332 3.06043    0     1      2      1
## 333 333 2.98625    0     1      2      1
## 334 334 3.31361    0     1      2      1
## 335 335 2.63265    0     1      2      1
## 336 336 3.23591    0     1      2      1

Cool, we did it. We are ready for factorial design!

2. (Experimental) Design

The objective of this experiment is to observe the influence of numerous conditions on the rate of fatal traffic incidents (particularly alcohol related factors). While we know for a fact that alcohol consumption increases the risk of traffic accidents, but there are potentially some specific insights looming.

How will the experiment be organized and conducted?

2^m-3 fractional factorial design of the highest possible resolution is used to simplify calculations. Three level factors are broken into a combination of two 2-level factors. This is done to satisfy project requirements and reduce the overall number of calculations required to yield meaningful data for analysis. Replication is limited by the size of each array, since one unique factor combination is required. However, replicates are used to ensure accuracy. FFD will point out main and interaction effects, and using FrF2 the resolution of the design is constructed. Like our previous discussion, aliasing structure will be considered to gauge results. ANOVA and measured effects will determine the final model.

What is the rational for this design?

The rationale behind fractional factorial design is the economically determine cause-and-effect relationships between experimental variables. At this level we seek to screen out important effects from several of lesser importance. This is also an important precursor to response surface method and optimization design. Dataset is assumed to meet design requirements. Division of 3 level factors was arbitrarily chosen with some basic logical guidance. FFD is essentially being used to approximate results of a full factorial design. Full factorial design would have 64 experimental runs. This fractional design will have 8:

library("FrF2")
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
FrF2(8,6, randomize = FALSE) 
##    A  B  C  D  E  F
## 1 -1 -1 -1  1  1  1
## 2  1 -1 -1 -1 -1  1
## 3 -1  1 -1 -1  1 -1
## 4  1  1 -1  1 -1 -1
## 5 -1 -1  1  1 -1 -1
## 6  1 -1  1 -1  1 -1
## 7 -1  1  1 -1 -1  1
## 8  1  1  1  1  1  1
## class=design, type= FrF2

Our FFD will be randomized, above is simply an ordered example.

Replicate:

Replicates are used to achieve accuracy. Due to their unique nature, each factor combination will occur once so the replications may be limited. The purpose of FFD is to reduce required replications while still achieving results, so replication is limited.

Randomize:

Randomization is important in meaningful statistical experiments. It is assumed that data is randomly collected, given that it is provided in a built in library containing datasets meant to be used for this type of analysis. In addition to random selection, the order of experimental runs is randomly executed.

Block:

Blocking is not used per se, at least not directly. The experimental dataset has been modified to only include variables of interest, and the rest have been dropped. While blocking increases experimental precision, it is not used in this experiment as it has in previous projects.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and Descriptive summary

Levels of each factor are shown in the following boxplots, which visually shows the relationship between main effects of factors over response.

### Testing

Randomized 2^m-3 Fractional Factorial Design We need to decompose the 3 level factors into two 2 level factors, using this process:

x = FFDdata [1,]
vmiles = c(0,0)

L = levels(FFDdata[,'vmiles'])
for(i in seq(2,dim(FFDdata)[1])){
  if(FFDdata[i,'vmiles']==L[1]){
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(0,0))
  } 
  else if(FFDdata[i,'vmiles']==L[2]){
    x = rbind(x,FFDdata[i,])
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(1,0))
    vmiles = rbind(vmiles,c(0,1))
  } 
  else{
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(1,1))
  }
}

x = cbind(x,vmiles)
x = x[,c(1,2,3,4,5, 6, 7)]
colnames(x) = c(colnames(FFDdata[1,c(1,2,3,4,5)]),'vmilesA','vmilesB')

Lc = levels(FFDdata[,'perinc'])

perinc = c(1,0)
perinc = rbind(perinc,c(0,1))
dat = x[1,]
dat = rbind(dat,dat)


for(i in seq(2,dim(x)[1])){
  if(x[i,'perinc']==Lc[1]){
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(0,0))
  } 
  else if(x[i,'perinc']==Lc[2]){
    dat = rbind(dat,x[i,])
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(1,0))
    perinc = rbind(perinc,c(0,1))
  } 
  else{
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(1,1))
  }
}

dat = cbind(dat,perinc)

colnames(dat) = c(colnames(x),'perincA','perincB')
dat = dat[,c(1,2,3,4,5,6,7,8,9)]

Here is a look at the full factorial design, consisting of 64 runs.

expand.grid(mlda = c(0,1), jaild = c(0,1), vmilesA = c(0,1), vmilesB = c(0,1), perincA = c(0,1), perincB = c(0,1))
   mlda jaild vmilesA vmilesB perincA perincB
1     0     0       0       0       0       0
2     1     0       0       0       0       0
3     0     1       0       0       0       0
4     1     1       0       0       0       0
5     0     0       1       0       0       0
6     1     0       1       0       0       0
7     0     1       1       0       0       0
8     1     1       1       0       0       0
9     0     0       0       1       0       0
10    1     0       0       1       0       0
11    0     1       0       1       0       0
12    1     1       0       1       0       0
13    0     0       1       1       0       0
14    1     0       1       1       0       0
15    0     1       1       1       0       0
16    1     1       1       1       0       0
17    0     0       0       0       1       0
18    1     0       0       0       1       0
19    0     1       0       0       1       0
20    1     1       0       0       1       0
21    0     0       1       0       1       0
22    1     0       1       0       1       0
23    0     1       1       0       1       0
24    1     1       1       0       1       0
25    0     0       0       1       1       0
26    1     0       0       1       1       0
27    0     1       0       1       1       0
28    1     1       0       1       1       0
29    0     0       1       1       1       0
30    1     0       1       1       1       0
31    0     1       1       1       1       0
32    1     1       1       1       1       0
33    0     0       0       0       0       1
34    1     0       0       0       0       1
35    0     1       0       0       0       1
36    1     1       0       0       0       1
37    0     0       1       0       0       1
38    1     0       1       0       0       1
39    0     1       1       0       0       1
40    1     1       1       0       0       1
41    0     0       0       1       0       1
42    1     0       0       1       0       1
43    0     1       0       1       0       1
44    1     1       0       1       0       1
45    0     0       1       1       0       1
46    1     0       1       1       0       1
47    0     1       1       1       0       1
48    1     1       1       1       0       1
49    0     0       0       0       1       1
50    1     0       0       0       1       1
51    0     1       0       0       1       1
52    1     1       0       0       1       1
53    0     0       1       0       1       1
54    1     0       1       0       1       1
55    0     1       1       0       1       1
56    1     1       1       0       1       1
57    0     0       0       1       1       1
58    1     0       0       1       1       1
59    0     1       0       1       1       1
60    1     1       0       1       1       1
61    0     0       1       1       1       1
62    1     0       1       1       1       1
63    0     1       1       1       1       1
64    1     1       1       1       1       1

Next, lets examine the design and aliasing strucutre.

fatdata = FrF2(8,6,factor.names = c('mlda','jaild','vmilesA','vmilesB', 'perincA', 'perincB'))
fatdata

summary(fatdata)

aliasprint(fatdata)
> fatdata = FrF2(8,6,factor.names = c('mlda','jaild','vmilesA','vmilesB', 'perincA', 'perincB'))
> fatdata
  mlda jaild vmilesA vmilesB perincA perincB
1   -1     1       1      -1      -1       1
2   -1    -1       1       1      -1      -1
3   -1    -1      -1       1       1       1
4   -1     1      -1      -1       1      -1
5    1    -1      -1      -1      -1       1
6    1    -1       1      -1       1      -1
7    1     1      -1       1      -1      -1
8    1     1       1       1       1       1
class=design, type= FrF2 
> summary(fatdata)
Call:
FrF2(8, 6, factor.names = c("mlda", "jaild", "vmilesA", "vmilesB", 
    "perincA", "perincB"))

Experimental design of type  FrF2 
8  runs

Factor settings (scale ends):
  mlda jaild vmilesA vmilesB perincA perincB
1   -1    -1      -1      -1      -1      -1
2    1     1       1       1       1       1

Design generating information:
$legend
[1] A=mlda    B=jaild   C=vmilesA D=vmilesB E=perincA F=perincB

$generators
[1] D=AB E=AC F=BC


Alias structure:
$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE

$fi2
[1] AF=BE=CD


The design itself:
  mlda jaild vmilesA vmilesB perincA perincB
1   -1     1       1      -1      -1       1
2   -1    -1       1       1      -1      -1
3   -1    -1      -1       1       1       1
4   -1     1      -1      -1       1      -1
5    1    -1      -1      -1      -1       1
6    1    -1       1      -1       1      -1
7    1     1      -1       1      -1      -1
8    1     1       1       1       1       1
class=design, type= FrF2 
> aliasprint(fatdata)
$legend
[1] A=mlda    B=jaild   C=vmilesA D=vmilesB E=perincA F=perincB

$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE

$fi2
[1] AF=BE=CD

Above, we see the aliasing structure which shows what variables are confounded. Main effects are aliased with some two factor interactions, which show aliasing between each other.

ISYE 6020 has an additional testing requirement:

From the summary printed above, generators in this experimental design are D =AB, E=AC, and F=BC. Generator I can be determined by multiplying both sides of the interactions listed above by D, E, and F respectively. Results are: DD=ABD, EE=ACE, FF=BCF which is transformed to I=ABD=ACE=BCF

Estimation of Main and Interaction Effects:

As in previous experiments, ME and IE can be examined through a linear model and analysis of variance.

fit1 = lm(mrall~mlda+jaild+vmilesA+vmilesB+perincA+perincB, data=FFDdata)
anova(fit1) 
fit2 = lm(mrall~(mlda+jaild+vmilesA+vmilesB+perincA+perincB)^2, data=FFDdata)
anova(fit2)
> fit1 = lm(mrall~mlda+jaild+vmilesA+vmilesB+perincA+perincB, data=dat)
> anova(fit1) 
Analysis of Variance Table

Response: mrall
           Df Sum Sq Mean Sq F value    Pr(>F)    
mlda        1  5.082  5.0816  29.876 6.836e-08 ***
jaild       1 14.086 14.0856  82.813 < 2.2e-16 ***
vmilesA     1 16.490 16.4904  96.951 < 2.2e-16 ***
vmilesB     1 15.414 15.4135  90.620 < 2.2e-16 ***
perincA     1  9.026  9.0265  53.069 1.050e-12 ***
perincB     1  8.999  8.9991  52.908 1.132e-12 ***
Residuals 583 99.162  0.1701                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> fit2 = lm(mrall~(mlda+jaild+vmilesA+vmilesB+perincA+perincB)^2, data=dat)
> anova(fit2)
Analysis of Variance Table

Response: mrall
                 Df Sum Sq Mean Sq  F value    Pr(>F)    
mlda              1  5.082  5.0816  31.9095 2.556e-08 ***
jaild             1 14.086 14.0856  88.4498 < 2.2e-16 ***
vmilesA           1 16.490 16.4904 103.5502 < 2.2e-16 ***
vmilesB           1 15.414 15.4135  96.7882 < 2.2e-16 ***
perincA           1  9.026  9.0265  56.6812 2.024e-13 ***
perincB           1  8.999  8.9991  56.5094 2.192e-13 ***
mlda:jaild        1  3.549  3.5492  22.2869 2.961e-06 ***
mlda:vmilesA      1  0.235  0.2348   1.4747 0.2251074    
mlda:vmilesB      1  0.193  0.1927   1.2101 0.2717734    
mlda:perincA      1  0.138  0.1379   0.8659 0.3525010    
mlda:perincB      1  0.223  0.2232   1.4015 0.2369624    
jaild:vmilesA     1  0.038  0.0383   0.2403 0.6241877    
jaild:vmilesB     1  0.035  0.0346   0.2170 0.6415175    
jaild:perincA     1  1.021  1.0212   6.4128 0.0115980 *  
jaild:perincB     1  1.057  1.0573   6.6390 0.0102280 *  
vmilesA:vmilesB   1  0.092  0.0922   0.5791 0.4469790    
vmilesA:perincA   1  0.019  0.0186   0.1166 0.7329022    
vmilesA:perincB   1  0.019  0.0185   0.1164 0.7331424    
vmilesB:perincA   1  0.022  0.0216   0.1357 0.7127559    
vmilesB:perincB   1  0.021  0.0214   0.1344 0.7140681    
perincA:perincB   1  2.047  2.0466  12.8512 0.0003663 ***
Residuals       568 90.454  0.1593                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA table above for main effects. All factors chosen have a significant effect on traffic mortality rate. However, there is confounding of 2-factor interactions, which we determined from the aliasing structured printed by a function in FrF2.

Parameter estimation for both models are displayed below:

coef(fit1)
coef(fit2)
 coef(fit1)
(Intercept)       mlda1      jaild1     vmilesA     vmilesB 
  1.9924755  -0.1292047   0.2425419   0.3162926   0.3162926 
    perincA     perincB 
 -0.2506703  -0.2506703 
 coef(fit2)
    (Intercept)           mlda1          jaild1         vmilesA 
     1.83036916      0.21225996      0.44665904      0.40031307 
        vmilesB         perincA         perincB    mlda1:jaild1 
     0.40031307     -0.28171472     -0.28171472     -0.53832894 
  mlda1:vmilesA   mlda1:vmilesB   mlda1:perincA   mlda1:perincB 
    -0.09029857     -0.09029857     -0.15628997     -0.15628997 
 jaild1:vmilesA  jaild1:vmilesB  jaild1:perincA  jaild1:perincB 
     0.01198586      0.01198586      0.22908719      0.22908719 
vmilesA:vmilesB vmilesA:perincA vmilesA:perincB vmilesB:perincA 
    -0.03167265     -0.01243216     -0.01243216     -0.01243216 
vmilesB:perincB perincA:perincB 
    -0.01243216      0.25028616 

Diagnostics/Model Adequacy Checking

qqnorm(residuals(fit1))
qqline(residuals(fit1))

qqnorm(residuals(fit2))
qqline(residuals(fit2))

plot(fitted(fit1),residuals(fit1))
plot(fitted(fit2),residuals(fit2))

hist(FFDdata$mrall, breaks = 25)

The assumption of normality and random selection are required for analysis of variance. Histograms, QQ plots, and scatter plots suggests that this assumption is not valid. The histogram shows a slight right skew, and the QQ plot does not follow the normal line.

From these results, and given the abbreviated nature of fractional factorial design, one should conclude that further tests/analysis should be conducted for a sufficient study.

4. References to the literature

  1. Montgomery, Douglas C. 2012. Design and Analysis of Experiments, 8th Edition.
  2. http://stackoverflow.com/questions
  3. https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

5. Appendices

A summary of, or pointer to, the raw data

https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

complete and documented R code

install.packages("Ecdat")
library("Ecdat", lib.loc="~/R/win-library/3.2")
FFDdata = Fatality #selects dataset, will simply call data in subsequent functions 
levels(FFDdata$jaild)<-c(0,1)
FFDdata$mlda[FFDdata$mlda >=18 & FFDdata$mlda < 20]=0 #this line defines 18 and 19 year minimum drinking ages as a low level 
FFDdata$mlda[FFDdata$mlda>=20]=1 #ages 20 21 are considered high level 
FFDdata$vmiles[as.numeric(FFDdata$vmiles)>= 4.50 & as.numeric(FFDdata$vmiles) <7.38386 ] = 0
FFDdata$vmiles[as.numeric(FFDdata$vmiles)>= 7.38386 & as.numeric(FFDdata$vmiles) < 8.185094] = 1
FFDdata$vmiles[as.numeric(FFDdata$vmiles)> 8.185094 & as.numeric(FFDdata$vmiles) <26.150] = 2
FFDdata$perinc[FFDdata$perinc>= 9500 & FFDdata$perinc <12658.92] = 0
FFDdata$perinc[FFDdata$perinc>= 12658.92 & FFDdata$perinc < 14603.35] = 1
FFDdata$perinc[FFDdata$perinc> 14603.35 & FFDdata$perinc <22195] = 2
FFDdata$state = FFDdata$year = FFDdata$beertax = FFDdata$comserd = FFDdata$unrate = NULL #drops remaining variables, not part of this analysis
FFDdata$mlda = as.factor(FFDdata$mlda) #defines two level factor
FFDdata$jaild = as.factor(FFDdata$jaild) #defines two level factor
FFDdata$vmiles = as.factor(FFDdata$vmiles) #defines three level factor
FFDdata$perinc = as.factor(FFDdata$perinc) #defines three level factor
str(FFDdata) #allows examination of dataset and variables 
head(FFDdata, n=8) #displays first 5 rows
tail(FFDdata, n=8) #shows last 5 rows 
x = FFDdata [1,]
vmiles = c(0,0)
L = levels(FFDdata[,'vmiles'])
for(i in seq(2,dim(FFDdata)[1])){
  if(FFDdata[i,'vmiles']==L[1]){
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(0,0))
  } 
  else if(FFDdata[i,'vmiles']==L[2]){
    x = rbind(x,FFDdata[i,])
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(1,0))
    vmiles = rbind(vmiles,c(0,1))
  } 
  else{
    x = rbind(x,FFDdata[i,])
    vmiles = rbind(vmiles,c(1,1))
  }
}
x = cbind(x,vmiles)
x = x[,c(1,2,3,4,5, 6, 7)]
colnames(x) = c(colnames(FFDdata[1,c(1,2,3,4,5)]),'vmilesA','vmilesB')
Lc = levels(FFDdata[,'perinc'])
perinc = c(1,0)
perinc = rbind(perinc,c(0,1))
dat = x[1,]
dat = rbind(dat,dat)

for(i in seq(2,dim(x)[1])){
  if(x[i,'perinc']==Lc[1]){
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(0,0))
  } 
  else if(x[i,'perinc']==Lc[2]){
    dat = rbind(dat,x[i,])
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(1,0))
    perinc = rbind(perinc,c(0,1))
  } 
  else{
    dat = rbind(dat,x[i,])
    perinc = rbind(perinc,c(1,1))
  }
}

dat = cbind(dat,perinc)
colnames(dat) = c(colnames(x),'perincA','perincB')
dat = dat[,c(1,2,3,4,5,6,7,8,9)]

expand.grid(mlda = c(0,1), jaild = c(0,1), vmilesA = c(0,1), vmilesB = c(0,1), perincA = c(0,1), perincB = c(0,1))

fatdata = FrF2(8,6,factor.names = c('mlda','jaild','vmilesA','vmilesB', 'perincA', 'perincB'))
fatdata
summary(fatdata)
aliasprint(fatdata)

fit1 = lm(mrall~mlda+jaild+vmilesA+vmilesB+perincA+perincB, data=dat)
anova(fit1) 
fit2 = lm(mrall~(mlda+jaild+vmilesA+vmilesB+perincA+perincB)^2, data=dat)
anova(fit2)
coef(fit1)
coef(fit2)
qqnorm(residuals(fit1))
qqline(residuals(fit1))
qqnorm(residuals(fit2))
qqline(residuals(fit2))
plot(fitted(fit1),residuals(fit1))
plot(fitted(fit2),residuals(fit2))
hist(FFDdata$mrall, breaks = 25)