Main Effects
MEPlot(plan, abbrev = 5, cex.xax = 1.6, cex.main = 2)
Based off this plot is looks like all the main effects are significant except for the Has_Children
factor.
A national survey collected data from interviews with inmates at state and federally owned prisons from October 2003 to May 2004. Information collected included current offense and sentence, personal characteristics, and background information regarding prior experience with drugs, alcohol and abuse.
From this data set, I wanted to look at different factors that potentially influence sentence length. To do this I selected 2 2-level factors and 2 3-level factors from the original data set along with one response variable.
In this experiment 4 factors were examined.
1. Race: levels: White non hispanic (0), Black non hispanic (1), Hispanic (2)
2. Criminal History: First timer offender (0), Repeat offender w/o violent history (1), Repeat offender w/ violent history (2)
3. Veteran Status: No (0), Yes (1)
4. Has Children?: No (0), Yes (1)
The response variables considered in this experiment will be sentence length. It is a continuous variable that represents the length (in months) of the sentence each inmate received for their crime.
To get a better idea of what this data looks like, let’s examine the first six rows:
head(prison1)
## Race Criminal_History Veteran_Status Has_Children Sentence
## 1 1 0 0 1 19
## 2 1 0 0 1 46
## 3 2 0 0 1 58
## 4 2 0 0 1 22
## 5 1 1 0 1 180
## 6 2 0 0 1 33
We can also look at the structure of the data to verifiy the levels of each of the 4 factors:
str(prison1)
## 'data.frame': 3193 obs. of 5 variables:
## $ Race : Factor w/ 3 levels "0","1","2": 2 2 3 3 2 3 1 2 2 1 ...
## $ Criminal_History: Factor w/ 3 levels "0","1","2": 1 1 1 1 2 1 1 1 1 1 ...
## $ Veteran_Status : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Has_Children : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 2 1 2 ...
## $ Sentence : num 19 46 58 22 180 33 60 360 168 38 ...
With the current selection of factors we have a 22 32 factorial design. To make the design simpler, we will decompose the two 3-level factors each into two 2-level factors. Doing so will give us a 26 factorial design with 64 experimental runs. 2k factorials designs are useful as screening experiments because they require relatively few runs to estimate main and interaction effects.
We will also construct a 26-3 fractional factorial design to show how even less runs can produce estimable effects.
By utilizing a 2k factorial design and a 26-3 fractional factorial design we will be able to estimate main and interaction effects using very few experimental runs.
While we had no control over how the original data were collected, by selecting a random sample from the data, we are incorporating randomization into the model.
There is no replication in this model. Part of the purpose of using 2k and fractional factorial designs is to save money. This means only one replication will be used.
Blocking is not used in this design. This is a screening experiment and we are interested in the effects of all the factors.
Race Factor into 2 2-level factors
RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')
RaceA | RaceB | Race1 |
---|---|---|
-1 | -1 | 0 |
1 | -1 | 1 |
-1 | 1 | 1 |
1 | 1 | 2 |
Criminal History Factor into 2 2-level factors
CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')
CrimHistA | CrimHistB | CrimHist1 |
---|---|---|
-1 | -1 | 0 |
1 | -1 | 1 |
-1 | 1 | 1 |
1 | 1 | 2 |
Using the FrF2
package we can determine a full factorial 2k design.
FrF2(64,6, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
## creating full factorial with 64 runs ...
## RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1 -1 -1 -1 -1 -1 -1
## 2 1 -1 -1 -1 -1 -1
## 3 -1 1 -1 -1 -1 -1
## 4 1 1 -1 -1 -1 -1
## 5 -1 -1 1 -1 -1 -1
## 6 1 -1 1 -1 -1 -1
## 7 -1 1 1 -1 -1 -1
## 8 1 1 1 -1 -1 -1
## 9 -1 -1 -1 1 -1 -1
## 10 1 -1 -1 1 -1 -1
## 11 -1 1 -1 1 -1 -1
## 12 1 1 -1 1 -1 -1
## 13 -1 -1 1 1 -1 -1
## 14 1 -1 1 1 -1 -1
## 15 -1 1 1 1 -1 -1
## 16 1 1 1 1 -1 -1
## 17 -1 -1 -1 -1 1 -1
## 18 1 -1 -1 -1 1 -1
## 19 -1 1 -1 -1 1 -1
## 20 1 1 -1 -1 1 -1
## 21 -1 -1 1 -1 1 -1
## 22 1 -1 1 -1 1 -1
## 23 -1 1 1 -1 1 -1
## 24 1 1 1 -1 1 -1
## 25 -1 -1 -1 1 1 -1
## 26 1 -1 -1 1 1 -1
## 27 -1 1 -1 1 1 -1
## 28 1 1 -1 1 1 -1
## 29 -1 -1 1 1 1 -1
## 30 1 -1 1 1 1 -1
## 31 -1 1 1 1 1 -1
## 32 1 1 1 1 1 -1
## 33 -1 -1 -1 -1 -1 1
## 34 1 -1 -1 -1 -1 1
## 35 -1 1 -1 -1 -1 1
## 36 1 1 -1 -1 -1 1
## 37 -1 -1 1 -1 -1 1
## 38 1 -1 1 -1 -1 1
## 39 -1 1 1 -1 -1 1
## 40 1 1 1 -1 -1 1
## 41 -1 -1 -1 1 -1 1
## 42 1 -1 -1 1 -1 1
## 43 -1 1 -1 1 -1 1
## 44 1 1 -1 1 -1 1
## 45 -1 -1 1 1 -1 1
## 46 1 -1 1 1 -1 1
## 47 -1 1 1 1 -1 1
## 48 1 1 1 1 -1 1
## 49 -1 -1 -1 -1 1 1
## 50 1 -1 -1 -1 1 1
## 51 -1 1 -1 -1 1 1
## 52 1 1 -1 -1 1 1
## 53 -1 -1 1 -1 1 1
## 54 1 -1 1 -1 1 1
## 55 -1 1 1 -1 1 1
## 56 1 1 1 -1 1 1
## 57 -1 -1 -1 1 1 1
## 58 1 -1 -1 1 1 1
## 59 -1 1 -1 1 1 1
## 60 1 1 -1 1 1 1
## 61 -1 -1 1 1 1 1
## 62 1 -1 1 1 1 1
## 63 -1 1 1 1 1 1
## 64 1 1 1 1 1 1
## class=design, type= full factorial
Again using the FrF2
package we can determine a design for the fractional factorial, this time specifying 8 runs for 6 factors.
d <- FrF2(8,6, res3 = T, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
print(d)
## RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1 -1 -1 -1 1 1 1
## 2 1 -1 -1 -1 -1 1
## 3 -1 1 -1 -1 1 -1
## 4 1 1 -1 1 -1 -1
## 5 -1 -1 1 1 -1 -1
## 6 1 -1 1 -1 1 -1
## 7 -1 1 1 -1 -1 1
## 8 1 1 1 1 1 1
## class=design, type= FrF2
Our 26-3 is a resolution III design. This means that main effects are not aliased with any other main effect (me) but they are aliased with two factor interactions (2fi). To determine the aliasing in our design we use the aliasprint
function in the FrF2
package.
aliasprint(d)
## $legend
## [1] A=RaceA B=RaceB C=CrimHistA D=CrimHistB E=Veteran F=Children
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
Here we see which main effects are aliased with which 2fi. We also see which 2fi are aliased with other 2fi.
Based on this design, we can observe that the generator was I = ABD = ACE = BCF. You can check this by looking at the design above.
Now that we have our design, we will collect the data for our 8 experimental runs.
#subset data by run criteria
set1 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
set2 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 1)
set3 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 0)
set4 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set5 <- subset(prison1, Race == 0 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 0)
set6 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set7 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set8 <- subset(prison1, Race == 2 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 1)
#randomly sample one observation for each set
#set seed so results are reproducable
set.seed(1587)
run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]
#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)
#print result
response
## [1] 30 46 46 33 210 140 262 12
Now we will add the response column to our design from before.
plan <- add.response(d, response)
summary(plan)
## Call:
## FrF2(8, 6, res3 = T, factor.names = c("RaceA", "RaceB", "CrimHistA",
## "CrimHistB", "Veteran", "Children"), randomize = FALSE)
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1 -1 -1 -1 -1 -1 -1
## 2 1 1 1 1 1 1
##
## Responses:
## [1] response
##
## Design generating information:
## $legend
## [1] A=RaceA B=RaceB C=CrimHistA D=CrimHistB E=Veteran F=Children
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## RaceA RaceB CrimHistA CrimHistB Veteran Children response
## 1 -1 -1 -1 1 1 1 30
## 2 1 -1 -1 -1 -1 1 46
## 3 -1 1 -1 -1 1 -1 46
## 4 1 1 -1 1 -1 -1 33
## 5 -1 -1 1 1 -1 -1 210
## 6 1 -1 1 -1 1 -1 140
## 7 -1 1 1 -1 -1 1 262
## 8 1 1 1 1 1 1 12
## class=design, type= FrF2
MEPlot(plan, abbrev = 5, cex.xax = 1.6, cex.main = 2)
Based off this plot is looks like all the main effects are significant except for the Has_Children
factor.
IAPlot(plan, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)
summary(lm(plan))
## Number of observations used: 8
## Formula:
## response ~ (RaceA + RaceB + CrimHistA + CrimHistB + Veteran +
## Children)^2
##
## Call:
## lm.default(formula = fo, data = model.frame(fo, data = formula))
##
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (14 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.375 NA NA NA
## RaceA1 -39.625 NA NA NA
## RaceB1 -9.125 NA NA NA
## CrimHistA1 58.625 NA NA NA
## CrimHistB1 -26.125 NA NA NA
## Veteran1 -40.375 NA NA NA
## Children1 -9.875 NA NA NA
## RaceA1:Children1 -18.875 NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 7 and 0 DF, p-value: NA
From this output, we see again that the main effect size for Has_Children
is not very big. Based on our analysis of our 2^6-3 design, Has_Children should not be included in the model.
Tenative Model:
\[Sentence = 97.375 - 39.625RaceA - 9.125RaceB + 58.625CrimHistA - 26.125CrimHistB - 40.375Veteran\]
To check the validity of the results about we will test our model using a ANOVA test on the full dataset.
model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Race 1 9143 9143 0.765 0.382
## Criminal_History 1 987324 987324 82.588 < 2e-16 ***
## Veteran_Status 1 286936 286936 24.002 1.01e-06 ***
## Residuals 3493 41757919 11955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From this we can see that main effect of Criminal History and Veteran Status were significant. However, Race was not significant according to the ANOVA model. This demonstrates that while fractional factorial designs can be useful, they also have a greater potential for misleading results.
hist(prison1$Sentence)
qqnorm(residuals(model))
From the histogram and the plot it does not appear that the assumption of normality was met. Any conclusions drawn from the above analysis should be met with caution and further analyis is needed.
Literature: Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition, PDF
Ulrike Gromping (2014). R Package FrF2 for Creating and Analyzing Fractional Factorial 2-Level Designs. Journal of Statistical Software, January 2014, Volume 56, Issue 1. #5. Appendices Raw R code:
require(FrF2)
require(knitr)
load("C:/Users/Clare Dorsey/Documents/Design of Experiments/raw_prison.rda")
prison_data <- da04572.0003
prison <- subset(prison_data, select = c(2, 4, 24, 32, 35))
prison <- subset(prison, prison$CS_SENTENCEMTH < 1000)
#define length of data frame
l = nrow(prison)
#create empty data frames to rewrite factors as numbers
racenum = data.frame(l)
crimhistnum = data.frame(l)
veterannum = data.frame(l)
childrennum = data.frame(l)
#replacement loop
for (i in 1:l) {
#Race replace white with 0, black with 1, hispanic with 2, all others with 3
if (prison$RACE[i] == "(0000001) White non-hispanic") {
racenum[i,1] <- 0
}
else if (prison$RACE[i] == "(0000002) Black non-hispanic") {
racenum[i,1] <- 1
}
else if (prison$RACE[i] == "(0000003) Hispanic") {
racenum[i,1] <- 2
}
else {
racenum[i,1] <- 3
}
#crimhistory replace first timers with 0, non violent recidivist with 1, violent recidivist with 2, missing with 3
if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000001) First timers") {
crimhistnum[i,1] <- 0
}
else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000003) Recidivist, no current or prior violent offense") {
crimhistnum[i,1] <- 1
}
else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000002) Recidivist, current or past violent offense") {
crimhistnum[i,1] <- 2
}
else {
crimhistnum[i,1] <- 3
}
#veteran status replace no with 0 and yes with 1, missing with 2
if (prison$VETERAN[i] == "(2) No") {
veterannum[i,1] <- 0
}
else {
veterannum[i,1] <- 1
}
#has children replace no with 0, yes with 1, missing with 2
if (prison$SES_HASCHILDREN[i] == "(0000002) Does not have children") {
childrennum[i,1] <- 0
}
else if(prison$SES_HASCHILDREN[i] == "(0000001) Has children") {
childrennum[i,1] <- 1
}
else {
childrennum[i,1] <- 2
}
}
#combine numbered dataframe with response variable into new dataframe
prison1 <- cbind(racenum, crimhistnum, veterannum, childrennum, prison$CS_SENTENCEMTH)
#rename columns
colnames(prison1) <- c("Race", "Criminal_History", "Veteran_Status" , "Has_Children" , "Sentence")
attach(prison1)
# get rid of unwanted factors
prison1 <- subset(prison1, prison1$Race < 3)
prison1 <- subset(prison1, prison1$Criminal_History < 3)
prison1 <- subset(prison1, prison1$Veteran_Status < 2)
prison1 <- subset(prison1, prison1$Has_Children < 2)
prison1$Race <- as.factor(prison1$Race)
prison1$Criminal_History <- as.factor(prison1$Criminal_History)
prison1$Veteran_Status <- as.factor(prison1$Veteran_Status)
prison1$Has_Children <- as.factor(prison1$Has_Children)
head(prison1)
str(prison1)
#decompose Race factor
RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')
#decompose Criminal History factor
CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')
#create full factorial design
FrF2(64,6, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
#create fractional factorial design
d <- FrF2(8,6, res3 = T, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
print(d)
#check aliasing in desing
aliasprint(d)
#subset data by run criteria
set1 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
set2 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 1)
set3 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 0)
set4 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set5 <- subset(prison1, Race == 0 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 0)
set6 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set7 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set8 <- subset(prison1, Race == 2 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 1)
#randomly sample one observation for each set
#set seed so results are reproducable
set.seed(1587)
run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]
#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)
#print result
response
#add response column to design
plan <- add.response(d, response)
summary(plan)
#plot main effects
MEPlot(plan, abbrev = 5, cex.xax = 1.6, cex.main = 2)
#plot interaction effects
IAPlot(plan, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)
#compute main effect size
summary(lm(plan))
#ANOVA test
model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)
#check assumptions
hist(prison1$Sentence)
qqnorm(residuals(model))
qqline(residuals(model))