1. Setting
A national survey collected data from interviews with inmates at state and federally owned prisons from October 2003 to May 2004. Information collected included current offense and sentence, personal characteristics, and background information regarding prior experience with drugs, alcohol and abuse.
System Under Test
From this data set, I wanted to look at different factors that potentially influence sentence length. To do this I selected two 2-level factors and two 3-level factors from the original data set, along with one response variable.
Factors and Levels
In this experiment 4 factors were examined.
1. Race: White non-Hispanic (0), Black non-Hispanic (1), Hispanic (2)
2. Criminal History: First-time offender (0), Repeat offender w/o violent history (1), Repeat offender w/ violent history (2)
3. Veteran Status: No (0), Yes (1)
4. Has Children?: No (0), Yes (1)
Continuous Variables
The response variable considered in this experiment is sentence length. It is a continuous variable that represents the length (in months) of the sentence each inmate received for their crime.
The Data
To get a better idea of what this data looks like, let’s examine the first six rows:
head(prison1)
## Race Criminal_History Veteran_Status Has_Children Sentence
## 1 1 0 0 1 19
## 2 1 0 0 1 46
## 3 2 0 0 1 58
## 4 2 0 0 1 22
## 5 1 1 0 1 180
## 6 2 0 0 1 33
We can also look at the structure of the data to verify the levels of each of the 4 factors:
str(prison1)
## 'data.frame': 3193 obs. of 5 variables:
## $ Race : Factor w/ 3 levels "0","1","2": 2 2 3 3 2 3 1 2 2 1 ...
## $ Criminal_History: Factor w/ 3 levels "0","1","2": 1 1 1 1 2 1 1 1 1 1 ...
## $ Veteran_Status : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Has_Children : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 2 1 2 ...
## $ Sentence : num 19 46 58 22 180 33 60 360 168 38 ...
2. Experimental Design
To determine the main effects of the four factors on the response, Sentence Length, we will utilize a Taguchi design. To make the design simpler, we will decompose the two 3-level factors each into two 2-level factors, giving us a total of six 2-level factors for inclusion in our design.
Decomposing 3-level factors to 2-level factors
Decomposing the Race factor into two 2-level factors
RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')
| RaceA | RaceB | Race1 |
|:-----:|:-----:|:-----:|
|  -1   |  -1   |   0   |
|   1   |  -1   |   1   |
|  -1   |   1   |   1   |
|   1   |   1   |   2   |
Decomposing the Criminal History factor into two 2-level factors
CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')
| CrimHistA | CrimHistB | CrimHist1 |
|:---------:|:---------:|:---------:|
|    -1     |    -1     |     0     |
|     1     |    -1     |     1     |
|    -1     |     1     |     1     |
|     1     |     1     |     2     |
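The mapping in the two tables above can be written as a small helper. The following is a minimal sketch rather than part of the original analysis: the function name `collapse_levels` is hypothetical, and it assumes the two pseudo-factors are coded -1/+1 exactly as in the tables.

```r
# Hypothetical helper: collapse two -1/+1 pseudo-factors into one 3-level code.
# (-1, -1) -> 0; (1, -1) or (-1, 1) -> 1; (1, 1) -> 2
collapse_levels <- function(a, b) {
  stopifnot(all(a %in% c(-1, 1)), all(b %in% c(-1, 1)))
  (a + b) / 2 + 1
}

RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- collapse_levels(RaceA, RaceB)
Race1

# Count how often each 3-level code is produced by the four combinations
table(Race1)
```

Because two of the four combinations map to level 1, a design that is balanced over the pseudo-factors visits the middle level of Race and Criminal History twice as often as either end level.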
Selection of Appropriate Taguchi Design
Using the taguchiChoose function in the qualityTools package we can see all possible Taguchi designs based on the number of factors and the number of levels for each factor.
#show possible taguchi designs
taguchiChoose(factors1 = 6, level1 = 2)
## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
##
## Possible Designs:
##
## L8_2 L12_2 L16_2 L32_2
##
## Use taguchiDesign("L8_2") or different to create a taguchi design object
While there are multiple Taguchi designs we can use, for the purposes of this experiment we are only interested in main effects and in minimizing the number of experimental runs. For these reasons, the L8_2 design is the most appropriate.
Next we will create our design using the taguchiDesign function. We will also set the random seed so that our results are reproducible.
set.seed(1587)
design <- taguchiDesign("L8_2")
names(design) = c("Race1", "Race2", "CrimHist1", "CrimHist2", "Veteran", "Children")
design
## StandOrder RunOrder Replicate A B C D E F G y
## 1 8 1 1 2 2 1 2 1 1 2 NA
## 2 5 2 1 2 1 2 1 2 1 2 NA
## 3 3 3 1 1 2 2 1 1 2 2 NA
## 4 6 4 1 2 1 2 2 1 2 1 NA
## 5 4 5 1 1 2 2 2 2 1 1 NA
## 6 1 6 1 1 1 1 1 1 1 1 NA
## 7 7 7 1 2 2 1 1 2 2 1 NA
## 8 2 8 1 1 1 1 2 2 2 2 NA
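The reason an L8 array can estimate main effects from only 8 runs is that its columns are mutually balanced. As a rough check, the sketch below copies the seven columns from the printed design into a plain data frame (the name `l8` is only for illustration) and confirms that every pair of columns contains each of the four level combinations exactly twice. It uses base R only and does not depend on qualityTools.

```r
# Columns A-G transcribed from the printed design, in run order
l8 <- data.frame(
  A = c(2, 2, 1, 2, 1, 1, 2, 1),
  B = c(2, 1, 2, 1, 2, 1, 2, 1),
  C = c(1, 2, 2, 2, 2, 1, 1, 1),
  D = c(2, 1, 1, 2, 2, 1, 1, 2),
  E = c(1, 2, 1, 1, 2, 1, 2, 2),
  F = c(1, 1, 2, 2, 1, 1, 2, 2),
  G = c(2, 2, 2, 1, 1, 1, 1, 2)
)

# For each of the 21 column pairs, cross-tabulate the levels and
# check that all four cells of the 2 x 2 table contain exactly 2 runs
col_pairs <- combn(names(l8), 2)
balanced <- apply(col_pairs, 2, function(p) {
  counts <- table(l8[[p[1]]], l8[[p[2]]])
  all(counts == 2)
})
all(balanced)
```

The same balance is what allows each main effect to be computed as a simple difference of two four-run averages.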
Rationale for Design
Taguchi designs are traditionally used when there are between 3 and 50 factors and the designer is only interested in the main effects of the few factors that contribute significantly to the response. In this experiment we have only four factors, and we believe that all of them contribute significantly. A Taguchi design may therefore not be ideal. However, a second purpose of this experiment is to see how a Taguchi design compares against a fractional factorial design in estimating main effects, and for this reason we are using a Taguchi design.
Randomization
While we had no control over how the original data were collected, by selecting a random sample from the data, we are incorporating randomization into the model.
Replication
There is no replication in this design: each of the 8 runs is observed only once. Part of the purpose of using a Taguchi design is to save money by keeping the number of runs small.
Blocking
Blocking is not used in this design. Taguchi designs are unique in that they do not try to block nuisance factors but include them. This is also a screening experiment and we are interested in the effects of all the factors.
3. Analysis
Data Collection
Now that we have our design, we will collect the data for our 8 experimental runs.
#subset data by run criteria
set1 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set2 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set3 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set4 <- subset(prison1, Race == 1 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 1)
set5 <- subset(prison1, Race == 1 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 0)
set6 <- subset(prison1, Race == 0 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 0)
set7 <- subset(prison1, Race == 2 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 1)
set8 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
#randomly sample one observation for each set
#the seed set with set.seed(1587) before creating the design keeps these draws reproducible
run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]
#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)
#print result
response
## [1] 21 121 210 300 220 24 150 135
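The eight `subset()` and `sample()` calls above follow one pattern, so they can also be driven from a table of run criteria. The sketch below illustrates that idea on a made-up stand-in for `prison1` (the data frame `toy`, the helper `draw_one`, and the seed are all hypothetical), so the values it draws are not the responses reported above. It also stops early if a run has no matching inmates, a case in which `sample(1:nrow(x), 1)` would not fail cleanly because `1:0` evaluates to `c(1, 0)`.

```r
set.seed(42)  # arbitrary seed, for this illustration only

# Made-up stand-in for prison1: 5 rows per factor combination
toy <- expand.grid(Race = 0:2, Criminal_History = 0:2,
                   Veteran_Status = 0:1, Has_Children = 0:1, rep = 1:5)
toy$Sentence <- round(rexp(nrow(toy), rate = 1 / 100))

# Run criteria copied from the subset() calls above, one row per run
criteria <- data.frame(
  Race             = c(2, 1, 1, 1, 1, 0, 2, 0),
  Criminal_History = c(1, 1, 1, 2, 2, 0, 0, 1),
  Veteran_Status   = c(0, 1, 0, 0, 1, 0, 1, 1),
  Has_Children     = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# Hypothetical helper: draw one sentence length from the rows matching a run
draw_one <- function(crit, data) {
  idx <- which(data$Race == crit$Race &
               data$Criminal_History == crit$Criminal_History &
               data$Veteran_Status == crit$Veteran_Status &
               data$Has_Children == crit$Has_Children)
  stopifnot(length(idx) > 0)  # fail loudly if no inmate matches the run
  data$Sentence[idx[sample.int(length(idx), 1)]]
}

response_toy <- vapply(seq_len(nrow(criteria)),
                       function(i) draw_one(criteria[i, ], toy),
                       numeric(1))
response_toy
```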
Now we will add the response column to our design from before.
response(design) = response
summary(design)
## Taguchi SINGLE Design
## Information about the factors:
##
## A B C D E F G
## value 1 1 1 1 1 1 1 1
## value 2 2 2 2 2 2 2 2
## name Race1 Race2 CrimHist1 CrimHist2 Veteran Children <NA>
## unit
## type numeric numeric numeric numeric numeric numeric numeric
##
## -----------
##
## StandOrder RunOrder Replicate A B C D E F G response
## 1 8 1 1 2 2 1 2 1 1 2 21
## 2 5 2 1 2 1 2 1 2 1 2 121
## 3 3 3 1 1 2 2 1 1 2 2 210
## 4 6 4 1 2 1 2 2 1 2 1 300
## 5 4 5 1 1 2 2 2 2 1 1 220
## 6 1 6 1 1 1 1 1 1 1 1 24
## 7 7 7 1 2 2 1 1 2 2 1 150
## 8 2 8 1 1 1 1 2 2 2 2 135
##
## -----------
Main Effects
effectPlot(design)

From the main effect plots it seems that only Criminal_History and Has_Children have a sizable effect on sentence length. This is interesting, in that it differs a lot from the main effect plot that was generated using a fractional factorial design in a previous experiment using the same data. This difference will be discussed further below.
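The quantity behind each panel of a main effects plot is simply the difference between the mean response at level 2 and the mean response at level 1 of that column. As a cross-check on the plot, the sketch below recomputes those differences in base R from the design and responses printed in the summary above. The data frame `runs` and the helper `main_effect` are names introduced here for illustration, and the unnamed seventh column G is left out.

```r
# Design columns (coded 1/2) and responses transcribed from summary(design)
runs <- data.frame(
  Race1     = c(2, 2, 1, 2, 1, 1, 2, 1),
  Race2     = c(2, 1, 2, 1, 2, 1, 2, 1),
  CrimHist1 = c(1, 2, 2, 2, 2, 1, 1, 1),
  CrimHist2 = c(2, 1, 1, 2, 2, 1, 1, 2),
  Veteran   = c(1, 2, 1, 1, 2, 1, 2, 2),
  Children  = c(1, 1, 2, 2, 1, 1, 2, 2),
  y         = c(21, 121, 210, 300, 220, 24, 150, 135)
)

# Main effect of one column: mean response at level 2 minus mean at level 1
main_effect <- function(level, y) mean(y[level == 2]) - mean(y[level == 1])

effects <- sapply(runs[names(runs) != "y"], main_effect, y = runs$y)
round(effects, 2)
```

With these eight observations the two largest differences are CrimHist1 (130.25 months) and Children (102.25 months), which matches the reading of the plot. Each difference rests on only four observations per level.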
Model Creation
Based on the output from the Taguchi design, we would only include Criminal_History and Has_Children in our model.
Anova Test
To check the validity of the results above we will test our model using an ANOVA test on the full dataset.
model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Race 1 9143 9143 0.765 0.382
## Criminal_History 1 987324 987324 82.588 < 2e-16 ***
## Veteran_Status 1 286936 286936 24.002 1.01e-06 ***
## Residuals 3493 41757919 11955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
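From this we can see that the main effects of Criminal History and Veteran Status were significant, while Race was not. Has_Children was not included in this model, so the table says nothing about whether having children matters. The table also reports 1 degree of freedom each for Race and Criminal_History and 3493 residual degrees of freedom, whereas a three-level factor would use 2 degrees of freedom and prison1 has 3193 rows. Both are consistent with the model having been fit to the copy of the data that was attached before the rows were filtered and the columns converted to factors, so that the level codes were treated as numeric. With that caveat, this is a very different result from what we would expect based on the main effects determined by our Taguchi design experiment.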
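For comparison, a model that contains all four factors and reads them from a data frame passed explicitly through `data =` could be specified as in the sketch below. It runs on made-up data (the data frame `toy` and its random sentence lengths are purely illustrative), so its F statistics mean nothing. The point is only the model specification and the resulting degrees of freedom: 2 for each three-level factor and 1 for each two-level factor.

```r
set.seed(1)  # arbitrary seed, for this illustration only
n <- 200

# Made-up stand-in for prison1, with all four predictors stored as factors
toy <- data.frame(
  Race             = factor(sample(0:2, n, replace = TRUE)),
  Criminal_History = factor(sample(0:2, n, replace = TRUE)),
  Veteran_Status   = factor(sample(0:1, n, replace = TRUE)),
  Has_Children     = factor(sample(0:1, n, replace = TRUE)),
  Sentence         = rexp(n, rate = 1 / 100)
)

# Passing data = explicitly avoids picking up variables from attach()
fit <- aov(Sentence ~ Race + Criminal_History + Veteran_Status + Has_Children,
           data = toy)

# Degrees of freedom per term, followed by the residual degrees of freedom
anova_table <- summary(fit)[[1]]
anova_table$Df
```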
Comparison to Fractional Factorial Design Results
To compare the Fractional Factorial Design to the Taguchi Design we will look at the main effects plot for both.

Fractional Factorial ME plot

Taguchi ME plot
Here we see that the results of the two experiments differed greatly even though they used the same data set. Compared to the results of the ANOVA test conducted on the full dataset, the Fractional Factorial design gave a better estimate of the main effects than the Taguchi model.
Discussion
The last two experiments using first Fractional Factorial and then Taguchi designs to estimate main effects have highlighted some of the positives and negatives to using each design. Both designs were capable of estimating main effects with only 8 experimental runs for our six 2-level factor experiment. This is an enormous time and money saver and useful when it is not practical to run a full factorial design. However, the main effects estimates when compared to the ANOVA test results, computed on the full data set, show that estimates computed from only 8 observations are highly dependent on the selection of those 8 observations.
While neither design did a great job of estimating the main effects, for this dataset it appears that the fractional factorial was a more effective design. This makes sense for a couple of reasons. There were relatively few factors in this experiment and they were all chosen because they were believed to influence the response variable, sentence length. Given these circumstances, a fractional factorial design is probably more appropriate, as these designs are meant for a smaller number of factors with a few levels where all factors are thought to contribute significantly. As mentioned previously, Taguchi designs are meant for a larger number of factors where some are nuisance factors and only a few factors contribute significantly. The Taguchi design tries to improve the signal-to-noise ratio by considering all factors and selecting the combination of factor levels that makes a product or process least sensitive to noise. This means that Taguchi designs are good for designing systems and processes but may not be the best at determining which factors contribute significantly to a response variable such as prison sentence length.
Model Adequacy Checking
hist(prison1$Sentence)

qqnorm(residuals(model))

From the histogram and the normal Q-Q plot it does not appear that the assumption of normality was met. Any conclusions drawn from the above analysis should be treated with caution, and further analysis is needed.
5. Appendices
Raw R code:
require(FrF2)
require(knitr)
#qualityTools provides taguchiChoose, taguchiDesign and effectPlot
require(qualityTools)
load("C:/Users/Clare Dorsey/Documents/Design of Experiments/raw_prison.rda")
prison_data <- da04572.0003
prison <- subset(prison_data, select = c(2, 4, 24, 32, 35))
prison <- subset(prison, prison$CS_SENTENCEMTH < 1000)
#define length of data frame
l = nrow(prison)
#create empty data frames to rewrite factors as numbers
racenum = data.frame(l)
crimhistnum = data.frame(l)
veterannum = data.frame(l)
childrennum = data.frame(l)
#replacement loop
for (i in 1:l) {
#Race replace white with 0, black with 1, hispanic with 2, all others with 3
if (prison$RACE[i] == "(0000001) White non-hispanic") {
racenum[i,1] <- 0
}
else if (prison$RACE[i] == "(0000002) Black non-hispanic") {
racenum[i,1] <- 1
}
else if (prison$RACE[i] == "(0000003) Hispanic") {
racenum[i,1] <- 2
}
else {
racenum[i,1] <- 3
}
#crimhistory replace first timers with 0, non violent recidivist with 1, violent recidivist with 2, missing with 3
if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000001) First timers") {
crimhistnum[i,1] <- 0
}
else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000003) Recidivist, no current or prior violent offense") {
crimhistnum[i,1] <- 1
}
else if (prison$CH_CRIMHIST_COLLAPSED[i] == "(0000002) Recidivist, current or past violent offense") {
crimhistnum[i,1] <- 2
}
else {
crimhistnum[i,1] <- 3
}
#veteran status: replace no with 0; every other value (including any missing codes) becomes 1
if (prison$VETERAN[i] == "(2) No") {
veterannum[i,1] <- 0
}
else {
veterannum[i,1] <- 1
}
#has children replace no with 0, yes with 1, missing with 2
if (prison$SES_HASCHILDREN[i] == "(0000002) Does not have children") {
childrennum[i,1] <- 0
}
else if(prison$SES_HASCHILDREN[i] == "(0000001) Has children") {
childrennum[i,1] <- 1
}
else {
childrennum[i,1] <- 2
}
}
#combine numbered dataframe with response variable into new dataframe
prison1 <- cbind(racenum, crimhistnum, veterannum, childrennum, prison$CS_SENTENCEMTH)
#rename columns
colnames(prison1) <- c("Race", "Criminal_History", "Veteran_Status" , "Has_Children" , "Sentence")
#attach a snapshot of prison1; later changes to prison1 are not reflected in the attached copy
attach(prison1)
# get rid of unwanted factors
prison1 <- subset(prison1, prison1$Race < 3)
prison1 <- subset(prison1, prison1$Criminal_History < 3)
prison1 <- subset(prison1, prison1$Veteran_Status < 2)
prison1 <- subset(prison1, prison1$Has_Children < 2)
prison1$Race <- as.factor(prison1$Race)
prison1$Criminal_History <- as.factor(prison1$Criminal_History)
prison1$Veteran_Status <- as.factor(prison1$Veteran_Status)
prison1$Has_Children <- as.factor(prison1$Has_Children)
#view first 6 rows of data
head(prison1)
#view structure of data
str(prison1)
#decompose Race factor
RaceA <- c(-1, 1, -1, 1)
RaceB <- c(-1, -1, 1, 1)
Race1 <- c("0", "1","1", "2")
Race_Factor <- as.data.frame((cbind(RaceA,RaceB,Race1)))
kable(Race_Factor, align = 'c')
#decompose Criminal History factor
CrimHistA <- c(-1, 1, -1, 1)
CrimHistB <- c(-1, -1, 1, 1)
CrimHist1 <- c("0", "1","1", "2")
CrimHist_Factor <- as.data.frame((cbind(CrimHistA,CrimHistB,CrimHist1)))
kable(CrimHist_Factor, align = 'c')
#show possible taguchi designs
taguchiChoose(factors1 = 6, level1 = 2)
#set random seed to generate reproducible results
set.seed(1587)
#create design
design <- taguchiDesign("L8_2")
#rename columns
names(design) = c("Race1", "Race2", "CrimHist1", "CrimHist2", "Veteran", "Children")
#view design
design
#subset data by run criteria
set1 <- subset(prison1, Race == 2 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 0)
set2 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 0)
set3 <- subset(prison1, Race == 1 & Criminal_History == 1 & Veteran_Status == 0 & Has_Children == 1)
set4 <- subset(prison1, Race == 1 & Criminal_History == 2 & Veteran_Status == 0 & Has_Children == 1)
set5 <- subset(prison1, Race == 1 & Criminal_History == 2 & Veteran_Status == 1 & Has_Children == 0)
set6 <- subset(prison1, Race == 0 & Criminal_History == 0 & Veteran_Status == 0 & Has_Children == 0)
set7 <- subset(prison1, Race == 2 & Criminal_History == 0 & Veteran_Status == 1 & Has_Children == 1)
set8 <- subset(prison1, Race == 0 & Criminal_History == 1 & Veteran_Status == 1 & Has_Children == 1)
#randomly sample one observation for each set
run1 <- set1[sample(1:nrow(set1), 1), ]
run2 <- set2[sample(1:nrow(set2), 1), ]
run3 <- set3[sample(1:nrow(set3), 1), ]
run4 <- set4[sample(1:nrow(set4), 1), ]
run5 <- set5[sample(1:nrow(set5), 1), ]
run6 <- set6[sample(1:nrow(set6), 1), ]
run7 <- set7[sample(1:nrow(set7), 1), ]
run8 <- set8[sample(1:nrow(set8), 1), ]
#define vector of responses for each run
response <- c(run1$Sentence, run2$Sentence, run3$Sentence, run4$Sentence, run5$Sentence, run6$Sentence, run7$Sentence, run8$Sentence)
#print result
response
#add response vector to Taguchi design
response(design) = response
#display summary
summary(design)
#plot main effects
effectPlot(design)
#perform ANOVA on the full data set; with no data argument, the variables come from the copy attached above
model <- (aov(Sentence ~ Race + Criminal_History + Veteran_Status))
summary(model)
#check normality assumptions
hist(prison1$Sentence)
qqnorm(residuals(model))