1. Setting

A national survey collected data from interviews with inmates at state and federally owned prisons from October 2003 to May 2004. Information collected included current offense and sentence, personal characteristics, and background information regarding prior experience with drugs, alcohol and abuse.

System Under Test

From this data set, I wanted to look at different factors that potentially influence sentence length. To do this I selected 2 2-level factors and 2 3-level factors from the original data set along with one response variable.

Factors and Levels

In this experiment 4 factors were examined.
1. Race: levels: White non hispanic (0), Black non hispanic (1), Hispanic (2)
2. Criminal History: First timer offender (0), Repeat offender w/o violent history (1), Repeat offender w/ violent history (2)
3. Veteran Status: No (0), Yes (1)
4. Has Children?: No (0), Yes (1)

Continuous Variables

The response variables considered in this experiment will be sentence length. It is a continuous variable that represents the length (in months) of the sentence each inmate received for their crime.

The Data

To get a better idea of what this data looks like, let’s examine the first six rows:

head(prison1)
##   Race Criminal_History Veteran_Status Has_Children Sentence
## 1    1                0              0            1       19
## 2    1                0              0            1       46
## 3    2                0              0            1       58
## 4    2                0              0            1       22
## 5    1                1              0            1      180
## 6    2                0              0            1       33

We can also look at the structure of the data to verifiy the levels of each of the 4 factors:

str(prison1)
## 'data.frame':    3193 obs. of  5 variables:
##  $ Race            : Factor w/ 3 levels "0","1","2": 2 2 3 3 2 3 1 2 2 1 ...
##  $ Criminal_History: Factor w/ 3 levels "0","1","2": 1 1 1 1 2 1 1 1 1 1 ...
##  $ Veteran_Status  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Has_Children    : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 2 1 2 ...
##  $ Sentence        : num  19 46 58 22 180 33 60 360 168 38 ...

2. Experimental Design

With the current selection of factors we have a 22 32 factorial design. To make the design simpler, we will decompose the two 3-level factors each into two 2-level factors. Doing so will give us a 26 factorial design with 64 experimental runs. 2k factorials designs are useful as screening experiments because they require relatively few runs to estimate main and interaction effects.

We will also construct a 26-3 fractional factorial design to show how even less runs can produce estimable effects.

Rationale for Design

By utilizing a 2k factorial design and a 26-3 fractional factorial design we will be able to estimate main and interaction effects using very few experimental runs.

Randomization

While we had no control over how the original data were collected, by selecting a random sample from the data, we are incorporating randomization into the model.

Replicaton

There is no replication in this model. Part of the purpose of using 2k and fractional factorial designs is to save money. This means only one replication will be used.

Blocking

Blocking is not used in this design. This is a screening experiment and we are interested in the effects of all the factors.

Decomposing 3-level factors to 2-level factors

Race Factor into 2 2-level factors

RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')
RaceA RaceB Race1
-1 -1 0
1 -1 1
-1 1 1
1 1 2

Criminal History Factor into 2 2-level factors

CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')
CrimHistA CrimHistB CrimHist1
-1 -1 0
1 -1 1
-1 1 1
1 1 2

26 full factorial design

Using the FrF2 package we can determine a full factorial 2k design.

FrF2(64,6, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
## creating full factorial with 64 runs ...
##    RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1     -1    -1        -1        -1      -1       -1
## 2      1    -1        -1        -1      -1       -1
## 3     -1     1        -1        -1      -1       -1
## 4      1     1        -1        -1      -1       -1
## 5     -1    -1         1        -1      -1       -1
## 6      1    -1         1        -1      -1       -1
## 7     -1     1         1        -1      -1       -1
## 8      1     1         1        -1      -1       -1
## 9     -1    -1        -1         1      -1       -1
## 10     1    -1        -1         1      -1       -1
## 11    -1     1        -1         1      -1       -1
## 12     1     1        -1         1      -1       -1
## 13    -1    -1         1         1      -1       -1
## 14     1    -1         1         1      -1       -1
## 15    -1     1         1         1      -1       -1
## 16     1     1         1         1      -1       -1
## 17    -1    -1        -1        -1       1       -1
## 18     1    -1        -1        -1       1       -1
## 19    -1     1        -1        -1       1       -1
## 20     1     1        -1        -1       1       -1
## 21    -1    -1         1        -1       1       -1
## 22     1    -1         1        -1       1       -1
## 23    -1     1         1        -1       1       -1
## 24     1     1         1        -1       1       -1
## 25    -1    -1        -1         1       1       -1
## 26     1    -1        -1         1       1       -1
## 27    -1     1        -1         1       1       -1
## 28     1     1        -1         1       1       -1
## 29    -1    -1         1         1       1       -1
## 30     1    -1         1         1       1       -1
## 31    -1     1         1         1       1       -1
## 32     1     1         1         1       1       -1
## 33    -1    -1        -1        -1      -1        1
## 34     1    -1        -1        -1      -1        1
## 35    -1     1        -1        -1      -1        1
## 36     1     1        -1        -1      -1        1
## 37    -1    -1         1        -1      -1        1
## 38     1    -1         1        -1      -1        1
## 39    -1     1         1        -1      -1        1
## 40     1     1         1        -1      -1        1
## 41    -1    -1        -1         1      -1        1
## 42     1    -1        -1         1      -1        1
## 43    -1     1        -1         1      -1        1
## 44     1     1        -1         1      -1        1
## 45    -1    -1         1         1      -1        1
## 46     1    -1         1         1      -1        1
## 47    -1     1         1         1      -1        1
## 48     1     1         1         1      -1        1
## 49    -1    -1        -1        -1       1        1
## 50     1    -1        -1        -1       1        1
## 51    -1     1        -1        -1       1        1
## 52     1     1        -1        -1       1        1
## 53    -1    -1         1        -1       1        1
## 54     1    -1         1        -1       1        1
## 55    -1     1         1        -1       1        1
## 56     1     1         1        -1       1        1
## 57    -1    -1        -1         1       1        1
## 58     1    -1        -1         1       1        1
## 59    -1     1        -1         1       1        1
## 60     1     1        -1         1       1        1
## 61    -1    -1         1         1       1        1
## 62     1    -1         1         1       1        1
## 63    -1     1         1         1       1        1
## 64     1     1         1         1       1        1
## class=design, type= full factorial

26-3 fractional factorial design

Again using the FrF2 package we can determine a design for the fractional factorial, this time specifying 8 runs for 6 factors.

d <- FrF2(8,6, res3 = T, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
print(d)
##   RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1    -1    -1        -1         1       1        1
## 2     1    -1        -1        -1      -1        1
## 3    -1     1        -1        -1       1       -1
## 4     1     1        -1         1      -1       -1
## 5    -1    -1         1         1      -1       -1
## 6     1    -1         1        -1       1       -1
## 7    -1     1         1        -1      -1        1
## 8     1     1         1         1       1        1
## class=design, type= FrF2

Aliasing

Our 26-3 is a resolution III design. This means that main effects are not aliased with any other main effect (me) but they are aliased with two factor interactions (2fi). To determine the aliasing in our design we use the aliasprint function in the FrF2 package.

aliasprint(d)
## $legend
## [1] A=RaceA     B=RaceB     C=CrimHistA D=CrimHistB E=Veteran   F=Children 
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD

Here we see which main effects are aliased with which 2fi. We also see which 2fi are aliased with other 2fi.

Based on this design, we can observe that the generator was I = ABD = ACE = BCF. You can check this by looking at the design above.

3. Analysis

Data Collection

Now that we have our design, we will collect the data for our 8 experimental runs.

#subset data by run criteria
set1 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
set2 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 1)
set3 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 0)
set4 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set5 <- subset(prison1, Race == 0 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 0)
set6 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set7 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set8 <- subset(prison1, Race == 2 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 1)

#randomly sample one observation for each set

#set seed so results are reproducable
set.seed(1587)

run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]

#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)

#print result
response
## [1]  30  46  46  33 210 140 262  12

Now we will add the response column to our design from before.

plan <- add.response(d, response)
summary(plan)
## Call:
## FrF2(8, 6, res3 = T, factor.names = c("RaceA", "RaceB", "CrimHistA", 
##     "CrimHistB", "Veteran", "Children"), randomize = FALSE)
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   RaceA RaceB CrimHistA CrimHistB Veteran Children
## 1    -1    -1        -1        -1      -1       -1
## 2     1     1         1         1       1        1
## 
## Responses:
## [1] response
## 
## Design generating information:
## $legend
## [1] A=RaceA     B=RaceB     C=CrimHistA D=CrimHistB E=Veteran   F=Children 
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   RaceA RaceB CrimHistA CrimHistB Veteran Children response
## 1    -1    -1        -1         1       1        1       30
## 2     1    -1        -1        -1      -1        1       46
## 3    -1     1        -1        -1       1       -1       46
## 4     1     1        -1         1      -1       -1       33
## 5    -1    -1         1         1      -1       -1      210
## 6     1    -1         1        -1       1       -1      140
## 7    -1     1         1        -1      -1        1      262
## 8     1     1         1         1       1        1       12
## class=design, type= FrF2

Main Effects

MEPlot(plan, abbrev = 5, cex.xax = 1.6, cex.main = 2)

Based off this plot is looks like all the main effects are significant except for the Has_Children factor.

Interaction Effects

IAPlot(plan, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)

Model Creation

summary(lm(plan))
## Number of observations used: 8 
## Formula:
## response ~ (RaceA + RaceB + CrimHistA + CrimHistB + Veteran + 
##     Children)^2
## 
## Call:
## lm.default(formula = fo, data = model.frame(fo, data = formula))
## 
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (14 not defined because of singularities)
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)        97.375         NA      NA       NA
## RaceA1            -39.625         NA      NA       NA
## RaceB1             -9.125         NA      NA       NA
## CrimHistA1         58.625         NA      NA       NA
## CrimHistB1        -26.125         NA      NA       NA
## Veteran1          -40.375         NA      NA       NA
## Children1          -9.875         NA      NA       NA
## RaceA1:Children1  -18.875         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 7 and 0 DF,  p-value: NA

From this output, we see again that the main effect size for Has_Children is not very big. Based on our analysis of our 2^6-3 design, Has_Children should not be included in the model.

Tenative Model:

\[Sentence = 97.375 - 39.625RaceA - 9.125RaceB + 58.625CrimHistA - 26.125CrimHistB - 40.375Veteran\]

Anova Test

To check the validity of the results about we will test our model using a ANOVA test on the full dataset.

model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)
##                    Df   Sum Sq Mean Sq F value   Pr(>F)    
## Race                1     9143    9143   0.765    0.382    
## Criminal_History    1   987324  987324  82.588  < 2e-16 ***
## Veteran_Status      1   286936  286936  24.002 1.01e-06 ***
## Residuals        3493 41757919   11955                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this we can see that main effect of Criminal History and Veteran Status were significant. However, Race was not significant according to the ANOVA model. This demonstrates that while fractional factorial designs can be useful, they also have a greater potential for misleading results.

Model Adequacey Checking

hist(prison1$Sentence)

qqnorm(residuals(model))

From the histogram and the plot it does not appear that the assumption of normality was met. Any conclusions drawn from the above analysis should be met with caution and further analyis is needed.

4. References

  1. Literature: Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition, PDF

  2. Ulrike Gromping (2014). R Package FrF2 for Creating and Analyzing Fractional Factorial 2-Level Designs. Journal of Statistical Software, January 2014, Volume 56, Issue 1. #5. Appendices Raw R code:

require(FrF2)
require(knitr)
load("C:/Users/Clare Dorsey/Documents/Design of Experiments/raw_prison.rda")
prison_data <- da04572.0003
prison <- subset(prison_data, select = c(2, 4, 24, 32, 35))
prison <- subset(prison, prison$CS_SENTENCEMTH < 1000)

#define length of data frame
l = nrow(prison)

#create empty data frames to rewrite factors as numbers
racenum = data.frame(l)
crimhistnum = data.frame(l)
veterannum = data.frame(l)
childrennum = data.frame(l)

#replacement loop
for (i in 1:l) {
  
  #Race replace white with 0, black with 1, hispanic with 2, all others with 3
  if (prison$RACE[i] == "(0000001) White non-hispanic") {
    racenum[i,1] <- 0
  } 
  else if (prison$RACE[i] == "(0000002) Black non-hispanic") {
    racenum[i,1] <- 1
  } 
  else if (prison$RACE[i] == "(0000003) Hispanic") {
    racenum[i,1] <- 2
  } 
  else {
    racenum[i,1] <- 3
  }
  #crimhistory replace first timers with 0, non violent recidivist with 1, violent recidivist with 2, missing with 3
  if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000001) First timers") {
    crimhistnum[i,1] <- 0
  } 
  else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000003) Recidivist, no current or prior violent offense") {
    crimhistnum[i,1] <- 1
  } 
  else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000002) Recidivist, current or past violent offense") {
    crimhistnum[i,1] <- 2
  } 
  else {
    crimhistnum[i,1] <- 3
  }
  #veteran status replace no with 0 and yes with 1, missing with 2
  if (prison$VETERAN[i] == "(2) No") {
    veterannum[i,1] <- 0
  } 
  else {
    veterannum[i,1] <- 1
  }
  #has children replace no with 0, yes with 1, missing with 2
  if (prison$SES_HASCHILDREN[i] == "(0000002) Does not have children") {
    childrennum[i,1] <- 0
  } 
  else if(prison$SES_HASCHILDREN[i] == "(0000001) Has children") {
    childrennum[i,1] <- 1
  } 
  else {
    childrennum[i,1] <- 2
  }
}

#combine numbered dataframe with response variable into new dataframe
prison1 <- cbind(racenum, crimhistnum, veterannum, childrennum, prison$CS_SENTENCEMTH)

#rename columns
colnames(prison1) <- c("Race", "Criminal_History", "Veteran_Status" , "Has_Children" , "Sentence")

attach(prison1)
# get rid of unwanted factors
prison1 <- subset(prison1, prison1$Race < 3)
prison1 <- subset(prison1, prison1$Criminal_History < 3)
prison1 <- subset(prison1, prison1$Veteran_Status < 2)
prison1 <- subset(prison1, prison1$Has_Children < 2)
prison1$Race <- as.factor(prison1$Race)
prison1$Criminal_History <- as.factor(prison1$Criminal_History)
prison1$Veteran_Status <- as.factor(prison1$Veteran_Status)
prison1$Has_Children <- as.factor(prison1$Has_Children)

head(prison1)

str(prison1)

#decompose Race factor
RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')

#decompose Criminal History factor
CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')

#create full factorial design
FrF2(64,6, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)

#create fractional factorial design
d <- FrF2(8,6, res3 = T, factor.names = c('RaceA', 'RaceB','CrimHistA', 'CrimHistB', 'Veteran','Children'), randomize = FALSE)
print(d)

#check aliasing in desing
aliasprint(d)

#subset data by run criteria
set1 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
set2 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 1)
set3 <- subset(prison1, Race == 1 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 0)
set4 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set5 <- subset(prison1, Race == 0 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 0)
set6 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set7 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set8 <- subset(prison1, Race == 2 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 1)

#randomly sample one observation for each set

#set seed so results are reproducable
set.seed(1587)

run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]

#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)

#print result
response

#add response column to design
plan <- add.response(d, response)
summary(plan)

#plot main effects
MEPlot(plan, abbrev = 5, cex.xax = 1.6, cex.main = 2)

#plot interaction effects
IAPlot(plan, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)

#compute main effect size
summary(lm(plan))

#ANOVA test
model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)

#check assumptions
hist(prison1$Sentence)
qqnorm(residuals(model))
qqline(residuals(model))