Data from a previous project in fractional factorial design is used again to explore the method of Taguchi Design, for the purpose of comparing main effects. Data from the Ecdat package pertaining to traffic accident fatality rates and drunk driving laws is used. Data analysis supported by ANOVA suggested strong effects are present.
Dataset examines the effect of 6 factors on traffic incident fatality rates per 10000 accidents.
?fatality #Command helps understand variable names and gives background information on the dataset
Dataset includes: (Initial Factors and Levels)
Important: Please note that data setting follows the same procedure used to set up fractional factorial design, as in Project 3. This involves factor decomposition and omission from desired model.
jaild factor is already defined as having two levels, so they will be unchanged in that regard except converting responses to be consistent with other high low factors (0, 1, 2)
levels(fatality$jaild)<-c(0,1)
comserd although this factor is already two level, I think it is similar to jaild in nature so I would like to examine a different soon-to-be two level factor. mlda minimum legal drinking age will be converted from numeric to a two level factor for analysis
fatality$mlda[fatality$mlda >=18 & fatality$mlda < 20]=0 #this line defines 18 and 19 year minimum drinking ages as a low level
fatality$mlda[fatality$mlda >=20]=1 #ages 20 21 are considered high level
jaild and mlda are now defined in terms of two levels (0,1) and will represent our two two-level factors.
vmiles and perinc are the two other factors I am interested in, so I will format that according to three levels. Initial inspection of vmiles shows a min value of 4.576, max 26.150 with mean 7.891 and 1st/3rd quartiles 7.183 and 8.504. The data are quite skewed but I will draw boundaries according to the 33rd and 66th quantiles found with the quantile() function.
fatality$vmiles[as.numeric(fatality$vmiles)>= 4.50 & as.numeric(fatality$vmiles) < 7.38386 ] = 0
fatality$vmiles[as.numeric(fatality$vmiles)>= 7.38386 & as.numeric(fatality$vmiles) < 8.185094] = 1
fatality$vmiles[as.numeric(fatality$vmiles)> 8.185094 & as.numeric(fatality$vmiles) < 26.150] = 2
perinc has a more normal distribution, but I will use the same 3 quantile convention for converting it to a three-level factor.
fatality$perinc[fatality$perinc>= 9500 & fatality$perinc <12658.92] = 0
fatality$perinc[fatality$perinc>= 12658.92 & fatality$perinc < 14603.35] = 1
fatality$perinc[fatality$perinc> 14603.35 & fatality$perinc <22195] = 2
In this dataset, all factors are continuous variables except community service and jailtime. However, all relevant factors have been set to two or three level factors.
The only remaining continuous variable is the response variable, mrall, which measures traffic fatality rate.
Dataset ‘Fatality’ contains 336 observations of 10 variables. For our FFD, we will consider only the response variable mrall and the independent variables jaild, mlda, vmiles, and perinc. This is all we need, so we can condense the dataset to include just these variables of interest.
fatality$state = fatality$year = fatality$beertax = fatality$comserd = fatality$unrate = NULL #drops remaining variables, not part of this analysis
fatality$mlda = as.factor(fatality$mlda) #defines two level factor
fatality$jaild = as.factor(fatality$jaild) #defines two level factor
fatality$vmiles = as.factor(fatality$vmiles) #defines three level factor
fatality$perinc = as.factor(fatality$perinc) #defines three level factor
str(fatality) #allows examination of dataset and variables
## 'data.frame': 336 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ mrall : num 2.13 2.35 2.34 2.19 2.67 ...
## $ mlda : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 1 1 1 ...
## $ jaild : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 2 2 ...
## $ vmiles: Factor w/ 3 levels "0","1","2": 1 2 3 3 3 3 3 1 1 1 ...
## $ perinc: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 2 ...
head(fatality, n=8) #displays first 5 rows
## X mrall mlda jaild vmiles perinc
## 1 1 2.12836 0 no 0 0
## 2 2 2.34848 0 no 1 0
## 3 3 2.33643 0 no 2 0
## 4 4 2.19348 0 no 2 0
## 5 5 2.66914 1 no 2 0
## 6 6 2.71859 1 no 2 0
## 7 7 2.49391 1 no 2 0
## 8 8 2.49914 0 yes 0 0
tail(fatality, n=8) #shows last 5 rows
## X mrall mlda jaild vmiles perinc
## 329 329 1.66220 1 no 2 2
## 330 330 3.94118 0 yes 2 1
## 331 331 3.35271 0 yes 2 1
## 332 332 3.06043 0 yes 2 1
## 333 333 2.98625 0 yes 2 1
## 334 334 3.31361 0 yes 2 1
## 335 335 2.63265 0 yes 2 1
## 336 336 3.23591 0 yes 2 1
The objective of the fractional factorial design experiment is to observe the influence of numerous conditions on the rate of fatal traffic incidents (particularly alcohol related factors). While we know for a fact that alcohol consumption increases the risk of traffic accidents, but there are potentially some specific insights looming. For Project 4: Taguchi Design, the qualityTools package is installed because it contains important functions for verifying FFD results through Taguchi Design.
The following package needs to be installed because it contains important methods: taguchiChoose(); taguchiDesign(); response() and effectPlot()
install.packages("qualityTools", repos='http://cran.us.r-project.org')
require(qualityTools)
# Test designs for 2-2level and 2-3level factors
taguchiChoose(factors1 = 2, factors2 = 2, level1 = 3, level2 = 2)
taguchiDesign("L36_2_3_a")
36 seems like too many experimental runs. At that rate we might as well be doing a full factorial design. Lets try and alternative to reduce the required replications.
# test and and print a better design
b <- taguchiChoose(6, level1 = 2)
print(b)
#Show the structure of the Taguchi design matrix
b <- taguchiDesign("L8_2", randomize = F)
print(b)
> b <- taguchiChoose(6, level1 = 2)
6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
Possible Designs:
L8_2 L12_2 L16_2 L32_2
Use taguchiDesign("L8_2") or different to create a taguchi design object
> print(b)
StandOrder RunOrder Replicate A B C D E F G y
1 1 1 1 1 1 1 1 1 1 1 NA
2 2 2 1 1 1 1 2 2 2 2 NA
3 3 3 1 1 2 2 1 1 2 2 NA
4 4 4 1 1 2 2 2 2 1 1 NA
5 5 5 1 2 1 2 1 2 1 2 NA
6 6 6 1 2 1 2 2 1 2 1 NA
7 7 7 1 2 2 1 1 2 2 1 NA
8 8 8 1 2 2 1 2 1 1 2 NA
This looks ideal. Lets move forward to execution and analysis. But first:
Similar to FFD in Proj. 3, experimental rational is reducing the number of runs, while maximizing efficiency. In fact, Genichi Taguchi cited quality control and performance as his objective in experimental design. Parameter & tolerance design are key differences between FFD and Taguchi Design. L36_2_3_a is the chosen Taguchi design because it fits the given data. An alternative is decomposing it into 6 separate 2 level factors, which would produce an L8_2 design, which requires less trials.
Randomization is important in meaningful statistical experiments. It is assumed that data is randomly collected, given that it is provided in a built in library containing datasets meant to be used for this type of analysis. Randomization is also implemented automatically through the taguchiDesign() method.
Replicates are used to achieve accuracy. However, part of the experimental method is analyzing data with a limit amount of trials. For analysis, two replications from each factor-level combination are selected, so there are 16 replicates in total (although just 8 are required).
Blocking nuisance factors is not part of analysis.
Before Taguchi Design, we will do some exploratory analysis, similar to that of Project 3, so we can compare results.
boxplot(mrall~mlda, data=fatality, xlab="Min. Drinking Age", ylab="Fatality Rate per 10000")
boxplot(mrall~jaild, data=fatality, xlab="Mandatory Jail Time?", ylab="Fatality Rate per 10000")
boxplot(mrall~vmiles, data=fatality, xlab="Average Milage", ylab="Fatality Rate per 10000")
boxplot(mrall~perinc, data=fatality, xlab="Per Capita Personal Income", ylab="Fatality Rate per 10000")
boxplot(fatality$mrall ~ fatality$mlda + fatality$jaild + fatality$vmiles + fatality$perinc, xlab= "MinimumLegalDrinkingAge | MandatoryJailTime? | AverageMilage | Income",ylab="Fatality Rate per 10000", main="Side-by-side Analysis of Variables")
We can discuss these results in a bit, first lets go through our Taguchi Design and contrast. Note there are a few outliers and means differ between groups.
## [1] NA NA NA NA NA NA NA NA
## [1] NA NA NA NA NA NA NA NA
Mean RV results for the 2 sets of 8 runs can be seen. Lets convert those to matrices/tables.
# Convert to matrix
full_means <- as.matrix(means)
a <- b
c <- b
# Insert responses into Taguchi matrix
response(a) <- full_means
response(b) <- means_rep1
response(c) <- means_rep2
# Show all design matrices with responses
#print(full_means)
#print(means_rep1)
#print(means_rep2)
print(b)
print(c)
print(a)
> print(b)
StandOrder RunOrder Replicate A B C D E F G means_rep1
1 1 1 1 1 1 1 1 1 1 1 1.76997
2 2 2 1 1 1 1 2 2 2 2 1.46127
3 3 3 1 1 2 2 1 1 2 2 1.53202
4 4 4 1 1 2 2 2 2 1 1 1.45285
5 5 5 1 2 1 2 1 2 1 2 2.09016
6 6 6 1 2 1 2 2 1 2 1 2.45203
7 7 7 1 2 2 1 1 2 2 1 2.54661
8 8 8 1 2 2 1 2 1 1 2 1.47572
> print(c)
StandOrder RunOrder Replicate A B C D E F G means_rep2
1 1 1 1 1 1 1 1 1 1 1 2.12836
2 2 2 1 1 1 1 2 2 2 2 1.85384
3 3 3 1 1 2 2 1 1 2 2 1.53259
4 4 4 1 1 2 2 2 2 1 1 1.45129
5 5 5 1 2 1 2 1 2 1 2 2.13752
6 6 6 1 2 1 2 2 1 2 1 2.00692
7 7 7 1 2 2 1 1 2 2 1 1.56178
8 8 8 1 2 2 1 2 1 1 2 1.85126
> print(a)
StandOrder RunOrder Replicate A B C D E F G full_means
1 1 1 1 1 1 1 1 1 1 1 1.949165
2 2 2 1 1 1 1 2 2 2 2 1.901629
3 3 3 1 1 2 2 1 1 2 2 1.816150
4 4 4 1 1 2 2 2 2 1 1 1.490152
5 5 5 1 2 1 2 1 2 1 2 2.214516
6 6 6 1 2 1 2 2 1 2 1 2.234001
7 7 7 1 2 2 1 1 2 2 1 2.062704
8 8 8 1 2 2 1 2 1 1 2 1.772424
Great, we now have response results for two replications and a grouped mean.
Main effects will be calculated mathematically and compared to those generated through ANOVA in project 3.
With main effects determined and stored, lets organize them and show effect plots.
#Create main effect table vector
ME_tble1 <- matrix(c(mainEffect_A1,mainEffect_B1,mainEffect_C1,mainEffect_D1,mainEffect_E1,mainEffect_F1), ncol = 1)
# Convert to table
ME_tble1 <- as.table(ME_tble1)
rownames(ME_tble1) <- c("A","B","C","D","E","F")
colnames(ME_tble1) <- "Main Effects, Rep 1"
ME_tble1
effectPlot(b, factors = c("A","B","C","D","E","F"))
Next, we will generate the same table and plots for our second replication, and the grouped mean.
#make 2nd main effect table, format, and label
ME_tble2 <- matrix(c(mainEffect_A2,mainEffect_B2,mainEffect_C2,mainEffect_D2,mainEffect_E2,mainEffect_F2), ncol = 1)
ME_tble2 <- as.table(ME_tble2)
rownames(ME_tble2) <- c("A","B","C","D","E","F")
colnames(ME_tble2) <- "Main Effects, Rep 2"
ME_tble2
# Create ME plot
effectPlot(c, factors = c("A","B","C","D","E","F"))
# Create full subset vector
ME_tble3 <- matrix(c(mainEffect_A3,mainEffect_B3,mainEffect_C3,mainEffect_D3,mainEffect_E3,mainEffect_F3), ncol = 1)
# Convert to table
ME_tble3 <- as.table(ME_tble3)
# Label rows
rownames(ME_tble3) <- c("A","B","C","D","E","F")
# Label columns
colnames(ME_tble3) <- "Main Effects, Full Means"
# Print table
ME_tble3
effectPlot(a, factors = c("A","B","C","D","E","F"))
ME_tble1
Main Effects, Rep 1
A 0.507055011
B -0.191557519
C -0.011675002
D -0.274222439
E 0.000239961
F 0.300807478
> ME_tble2
Main Effects, Rep 2
A 0.14556001
B -0.43243002
C -0.06901999
D -0.04923502
E -0.13096502
F -0.15332509
> ME_tble3
Main Effects, Full Means
A 0.28163737
B -0.28947005
C 0.01722427
D -0.16108224
E -0.02568498
F 0.14705641
r2
In testing, we can compare ANOVA and effects plots from project 3 and project 4 to draw conclusions about Taguchi Design.
Levels of each factor are shown in the following boxplots, which visually shows the relationship between main effects of factors over response.
Testing is nearly complete. But first, lets put things in terms of NHST.
H(null) - There is no statistically significant difference between the responses due to differing factor levels of the independent variables
H(alternative) - There is a statistically significant difference between the responses due to the factor level of the independent variables
To draw an accurate conclusion, we must compare calculated main effects by Taguchi Design, and compare results from the ANOVA (what we used in P3).
ANOVA = aov(lm(mrall~mlda+jaild+vmiles+perinc, data=fatality))
summary(ANOVA)
## Df Sum Sq Mean Sq F value Pr(>F)
## mlda 1 2.54 2.539 14.24 0.000191 ***
## jaild 1 7.52 7.522 42.18 3.08e-10 ***
## vmiles 2 22.65 11.327 63.52 < 2e-16 ***
## perinc 2 17.53 8.764 49.14 < 2e-16 ***
## Residuals 329 58.67 0.178
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In conclusion, results of Taguchi Design are dissimilar to those of FFD on the same dataset. ANOVA suggested strong effects from each factor. Factor effect plots showed effects of varying significance. While the number of experimental runs was not reduced through Taguchi Design, it is expected that results are more accurate (especially given two replications, but again this is not a requirement of T.D.). This experimental design is not perfect, nor was it executed flawlessly, but Taguchi Design has been successfully used to estimate main effect size as an alternative to FFD in this case.
qqnorm(residuals(ANOVA))
qqline(residuals(ANOVA))
plot(fitted(ANOVA),residuals(ANOVA))
hist(fatality$mrall, breaks = 25)
The assumption of normality and random selection are required for analysis of variance. Histograms, QQ plots, and scatter plots suggests that this assumption is not valid. The histogram shows a slight right skew, and the QQ plot does not follow the normal line.
From the results of FFD in project 3, I suggested further testing and analysis should take place, especially given the invalidity of the initial assumption of normality. Taguchi Design proved to be a complimentary study, which is a good exercise in statistical analysis and other fields of research as well.
https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf
### complete and documented R code (project 4 material, for P3 code see datasetting or previous FFD project)
(projectcode)
Run1.111111 = subset(fatality,vmiles == "0" & perinc == "0" & mlda == "0" & jaild == "0")
Run2.111222 = subset(fatality,vmiles == "0" & perinc == "0" & mlda == "1" & jaild == "1")
Run3.122112 = subset(fatality,vmiles == "0" & perinc == "1" & mlda == "1" & jaild == "1")
Run4.122221 = subset(fatality,vmiles == "0" & perinc == "1" & mlda == "1" & jaild == "0")
Run5.212121 = subset(fatality,vmiles == "1" & perinc == "1" & mlda == "1" & jaild == "1")
Run6.212212 = subset(fatality,vmiles == "1" & perinc == "0" & mlda == "1" & jaild == "1")
Run7.221122 = subset(fatality,vmiles == "1" & perinc == "1" & mlda == "0" & jaild == "0")
Run8.221211 = subset(fatality,vmiles == "1" & perinc == "1" & mlda == "1" & jaild == "0")
#randomization
randfx <- function (fatality){
a <- sample(nrow(fatality)) # Randomize data frame
b <- a[1] # Take first response
c <- a[2] # Take second resposne
return(c(fatality$mrall[b],fatality$mrall[c])) # return variable
}
# Randomize all subsets
mean_111111 <- randfx(Run1.111111)
mean_111222 <- randfx(Run2.111222)
mean_122112 <- randfx(Run3.122112)
mean_122221 <- randfx(Run4.122221)
mean_212121 <- randfx(Run5.212121)
mean_212212 <- randfx(Run6.212212)
mean_221122 <- randfx(Run7.221122)
mean_221211 <- randfx(Run8.221211)
#actual mean calc
fullmean_000000 <- mean(Run1.111111$mrall)
fullmean_000111 <- mean(Run2.111222$mrall)
fullmean_011001 <- mean(Run3.122112$mrall)
fullmean_011110 <- mean(Run4.122221$mrall)
fullmean_101010 <- mean(Run5.212121$mrall)
fullmean_101101 <- mean(Run6.212212$mrall)
fullmean_110011 <- mean(Run7.221122$mrall)
fullmean_110100 <- mean(Run8.221211$mrall)
means_rep1 <- c(mean_111111[1], mean_111222[1], mean_122112[1],mean_122221[1],mean_212121[1],mean_212212[1],mean_221122[1],mean_221211[1])
means_rep1
means_rep2 <- c(mean_111111[2], mean_111222[2], mean_122112[2],mean_122221[2] ,mean_212121[2], mean_212212[2], mean_221122[2], mean_221211[2])
means_rep2
means_rep1 <- as.matrix(means_rep1)
means_rep2 <- as.matrix(means_rep2)
means <- c(fullmean_000000,fullmean_000111,fullmean_011001,fullmean_011110,fullmean_101010,fullmean_101101,fullmean_110011,fullmean_110100)
# Convert to matrix
full_means <- as.matrix(means)
a <- b
c <- b
# Insert responses into Taguchi matrix
response(a) <- full_means
response(b) <- means_rep1
response(c) <- means_rep2
# Show all design matrices with responses
#print(full_means)
#print(means_rep1)
#print(means_rep2)
print(b)
print(c)
print(a)
# ME calculation for all replicates and full subsets.
mainEffect_A1 = .25 * ((mean_111111[1] + mean_212212[1] + mean_221122[1] + mean_221211[1]) - (mean_111222[1] + mean_111111[1] + mean_122112[1] + mean_122221[1]))
mainEffect_B1 = .25 * ((mean_122112[1] + mean_122221[1] + mean_221122[1] + mean_221211[1]) - (mean_111111[1] + mean_212121[1] + mean_212212[1] + mean_111222[1]))
mainEffect_C1 = .25 * ((mean_122112[1] + mean_122221[1] + mean_111111[1] + mean_212212[1]) - (mean_111222[1] + mean_111111[1] + mean_221122[1] + mean_221211[1]))
mainEffect_D1 = .25 * ((mean_111222[1] + mean_122221[1] + mean_212212[1] + mean_221211[1]) - (mean_111111[1] + mean_122112[1] + mean_212121[1] + mean_221122[1]))
mainEffect_E1 = .25 * ((mean_111222[1] + mean_122221[1] + mean_111111[1] + mean_221122[1] - (mean_111111[1] + mean_122112[1] + mean_212212[1] + mean_221211[1])))
mainEffect_F1 = .25 * ((mean_111222[1] + mean_122112[1] + mean_212212[1] + mean_221122[1] - (mean_111111[1] + mean_122221[1] + mean_221211[1] + mean_212121[1])))
#R2
mainEffect_A2 = .25 * ((mean_111111[2] + mean_212212[2] + mean_221122[2] + mean_221211[2]) - (mean_111222[2] + mean_111111[2] + mean_122112[2] + mean_122221[2]))
mainEffect_B2 = .25 * ((mean_122112[2] + mean_122221[2] + mean_221122[2] + mean_221211[2]) - (mean_111111[2] + mean_212121[2] + mean_212212[2] + mean_111222[2]))
mainEffect_C2 = .25 * ((mean_122112[2] + mean_122221[2] + mean_111111[2] + mean_212212[2]) - (mean_111222[2] + mean_111111[2] + mean_221122[2] + mean_221211[2]))
mainEffect_D2 = .25 * ((mean_111222[2] + mean_122221[2] + mean_212212[2] + mean_221211[2]) - (mean_111111[2] + mean_122112[2] + mean_212121[2] + mean_221122[2]))
mainEffect_E2 = .25 * ((mean_111222[2] + mean_122221[2] + mean_111111[2] + mean_221122[2] - (mean_111111[2] + mean_122112[2] + mean_212212[2] + mean_221211[2])))
mainEffect_F2 = .25 * ((mean_111222[2] + mean_122112[2] + mean_212212[2] + mean_221122[2] - (mean_111111[2] + mean_122221[2] + mean_221211[2] + mean_212121[2])))
#R3- Full Mean
mainEffect_A3 <- 1/4 * ( (fullmean_101010 + fullmean_101101 + fullmean_110011 + fullmean_110100) - (fullmean_000111 + fullmean_000000 + fullmean_011001 + fullmean_011110))
mainEffect_B3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_110011 + fullmean_110100) - (fullmean_000000 + fullmean_101010 + fullmean_101101 + fullmean_000111))
mainEffect_C3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_101010 + fullmean_101101) - (fullmean_000111 + fullmean_000000 + fullmean_110011 + fullmean_110100))
mainEffect_D3 <- 1/4 * ((fullmean_000111 + fullmean_011110 + fullmean_101101 + fullmean_110100) - (fullmean_000000 + fullmean_011001 + fullmean_101010 + fullmean_110011))
mainEffect_E3 <- 1/4 * ((fullmean_000111 + fullmean_011110 + fullmean_101010 + fullmean_110011 - (fullmean_000000 + fullmean_011001 + fullmean_101101 + fullmean_110100)))
mainEffect_F3 <- 1/4 * ((fullmean_000111 + fullmean_011001 + fullmean_101101 + fullmean_110011 - (fullmean_000000+ fullmean_011110 + fullmean_110100 + fullmean_101010)))
#Create main effect table vector
ME_tble1 <- matrix(c(mainEffect_A1,mainEffect_B1,mainEffect_C1,mainEffect_D1,mainEffect_E1,mainEffect_F1), ncol = 1)
# Convert to table
ME_tble1 <- as.table(ME_tble1)
rownames(ME_tble1) <- c("A","B","C","D","E","F")
colnames(ME_tble1) <- "Main Effects, Rep 1"
ME_tble1
effectPlot(b, factors = c("A","B","C","D","E","F"))
#make 2nd main effect table, format, lable
ME_tble2 <- matrix(c(mainEffect_A2,mainEffect_B2,mainEffect_C2,mainEffect_D2,mainEffect_E2,mainEffect_F2), ncol = 1)
ME_tble2 <- as.table(ME_tble2)
rownames(ME_tble2) <- c("A","B","C","D","E","F")
colnames(ME_tble2) <- "Main Effects, Rep 2"
ME_tble2
# Create ME plot
effectPlot(c, factors = c("A","B","C","D","E","F"))
# Create full subset vector
ME_tble3 <- matrix(c(mainEffect_A3,mainEffect_B3,mainEffect_C3,mainEffect_D3,mainEffect_E3,mainEffect_F3), ncol = 1)
# Convert to table
ME_tble3 <- as.table(ME_tble3)
# Label rows
rownames(ME_tble3) <- c("A","B","C","D","E","F")
# Label columns
colnames(ME_tble3) <- "Main Effects, Full Means"
# Print table
ME_tble3
effectPlot(a, factors = c("A","B","C","D","E","F"))
ANOVA = aov(lm(mrall~mlda+jaild+vmiles+perinc, data=fatality))
summary(ANOVA)
qqnorm(residuals(ANOVA))
qqline(residuals(ANOVA))
plot(fitted(ANOVA),residuals(ANOVA))
hist(fatality$mrall, breaks = 25)