Design of Experiments

Yage Ding

Rensselaer Polytechnic Institute

December 6, 2016

1. Setting

System under test

The Ecdat::Benefits data set resides in the Ecdat package of the R programming language. It contains a information of unemployement of blue collar workers.

library(Ecdat)
B <- Ecdat::Benefits

Factors and Levels

There are 18 parameter in this data set:

colnames(B)

##  [1] "stateur"  "statemb"  "state"    "age"      "tenure"   "joblost" 
##  [7] "nwhite"   "school12" "sex"      "bluecol"  "smsa"     "married" 
## [13] "dkids"    "dykids"   "yrdispl"  "rr"       "head"     "ui"

In this experiment, we choose factors sex, nwhite (color), age, and tenure (years of tenure in job lost) as independent variables, and stateur (state unemployment rate (in %)) to be the response variable. In the original data frame, sex and nwhite are 2-level factors, whileage and tenure are continuous variables.

B <- Ecdat::Benefits [c("sex", "nwhite", "age", "tenure", "stateur")]
head(B)

##      sex nwhite age tenure stateur
## 1   male     no  49     21     4.5
## 2   male     no  26      2    10.5
## 3 female     no  40     19     7.2
## 4 female    yes  51     17     5.8
## 5   male     no  33      1     6.5
## 6   male     no  51      3     7.5

str(B)

## 'data.frame':    4877 obs. of  5 variables:
##  $ sex    : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 2 2 ...
##  $ nwhite : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ age    : int  49 26 40 51 33 51 30 26 54 31 ...
##  $ tenure : int  21 2 19 17 1 3 5 3 20 1 ...
##  $ stateur: num  4.5 10.5 7.2 5.8 6.5 7.5 5.8 5.8 7.7 6 ...

We first transform age and tenure into 2 3-level factors.

##transform age into 3-level factor
for (i in 1:4877) {
  if (B$age[i] <= 30) {
  B$age[i] <- "<=30"
  } else if (B$age[i] > 40) {
    B$age[i] <- ">40"
    } else {
       B$age[i] <- ">30 & <=40"
    }
}

We then transform age, and tenure into factors. Now, we have a data frame with 4 categorical factors, and a continuous response variable. Factor sex and nwhite both have 2 levels, while age and tenure have 3.

B1 <- B[c("sex", "nwhite", "age", "tenure","stateur")]
B1$sex <- B$sex
B1$nwhite <- B$nwhite
B1$age <- factor(B$age)
B1$tenure <- factor(B$tenure)
str(B1)

## 'data.frame':    4877 obs. of  5 variables:
##  $ sex    : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 2 2 ...
##  $ nwhite : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ age    : Factor w/ 3 levels "<=30",">30 & <=40",..: 3 1 2 3 2 3 1 1 3 2 ...
##  $ tenure : Factor w/ 3 levels "<=2.5",">2.5 & <=6",..: 3 1 1 1 1 2 2 2 2 1 ...
##  $ stateur: num  4.5 10.5 7.2 5.8 6.5 7.5 5.8 5.8 7.7 6 ...

##sex
levels(B1$sex)

## [1] "female" "male"

##nwhite
levels(B1$nwhite)

## [1] "no"  "yes"

##age
levels(B1$age)

## [1] "<=30"       ">30 & <=40" ">40"

##tenure
levels(B1$tenure)

## [1] "<=2.5"      ">2.5 & <=6" ">6"

Another option for organizing the data is to split 3-level factors each into 2 2-level factors.

##Transform 3-level factor (age) into 2 2-level factors (age1 and age2)
for (i in 1:4877) {
  if (B$age[i]=="<=30"){
    B$age1[i]<-"30"
    B$age2[i]<-"30"
    } else if (B$age[i]==">40") {
      B$age2[i]<-"40"
      B$age1[i]<-"40"
      } else {
        a <- c(1,2)
        b <- c("30", "40")
        c<- sample(a,1)
        B$age1[i]<- b[c]
        B$age2[i]<- b[3-c]
        }
}

And now, our experiment can be set up as having 6 2-level factors (setup 2) or 4 mixed level factors (setup 1):

##6 2-level factors
str(B2)

## 'data.frame':    4877 obs. of  7 variables:
##  $ sex    : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 2 2 ...
##  $ nwhite : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ age1   : Factor w/ 2 levels "30","40": 2 1 2 2 1 2 1 1 2 1 ...
##  $ age2   : Factor w/ 2 levels "30","40": 2 1 1 2 2 2 1 1 2 2 ...
##  $ tenure1: Factor w/ 2 levels "2.5","6": 2 1 1 1 1 1 1 2 2 1 ...
##  $ tenure2: Factor w/ 2 levels "2.5","6": 2 1 1 1 1 2 2 1 1 1 ...
##  $ stateur: num  4.5 10.5 7.2 5.8 6.5 7.5 5.8 5.8 7.7 6 ...

##2 2level and 2 3-level factors
str(B3)

## 'data.frame':    4877 obs. of  5 variables:
##  $ sex    : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 2 2 ...
##  $ nwhite : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ age    : Factor w/ 3 levels "<=30",">30 & <=40",..: 3 1 2 3 2 3 1 1 3 2 ...
##  $ tenure : Factor w/ 3 levels "<=2.5",">2.5 & <=6",..: 3 1 1 1 1 2 2 2 2 1 ...
##  $ stateur: num  4.5 10.5 7.2 5.8 6.5 7.5 5.8 5.8 7.7 6 ...

2. Design

Hypothesis and Method

In this experiment, we are to find out effect of blue collar workers’ sex, color, age, and years of tenure in job lost on the response variable: state unemployment rate (in %). Our hypothesis is that these factors do not affect the response variable.

Since we are only concerned about the main effects. Taguchi design is an economic method to use. We use the package “qualityTools” to construct the experiment with the method of Taguchi design.

library(qualityTools)

If you remember, we have 2 kinds of setup for this experiment. To design the most economic experiment, we need to find out which design setup requires the least number of experimental runs to determing main effects of factors.

##design setup 1 with 2 2-level factors and 2 3-level factors
taguchiChoose(factors1 = 2, factors2 = 2, level1 = 2, level2 = 3)

## 2 factors on 2 levels and 2 factors on 3 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L36_2_3_a L36_2_3_b
## 
## Use taguchiDesign("L36_2_3_a") or different to create a taguchi design object

## design setup 2 with 6 2-level factors 
taguchiChoose(factors1 = 6, level1 = 2)

## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L8_2 L12_2 L16_2 L32_2
## 
## Use taguchiDesign("L8_2") or different to create a taguchi design object

Accoding to the result of function taguchiChoose, setup 2 with 6 2-level factors requires the least number of runs, only 8 runs. Therefore, we choose to construct the design matrix with setup 1.

t <- taguchiDesign("L8_2")

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

For convenience in collecting data and a simpler and more clear view of the levels, we code the levels so that the lower levels are all represented as “1”, while the higher levels are represented as “2”.

##Code levels of factors: low to "1", and high to "2"
B$sex <- as.character(B$sex)
for (i in 1:4877){
  if (as.character(B$sex[i]) == "female") {
    B$sex[i] <- "1"
  } else {
    B$sex[i] <- "2"
  }
}

The final structure of our data frame looks like:

str(B)

## 'data.frame':    4877 obs. of  7 variables:
##  $ sex    : Factor w/ 2 levels "1","2": 2 2 1 1 2 2 2 2 2 2 ...
##  $ nwhite : Factor w/ 2 levels "1","2": 1 1 1 2 1 1 1 1 1 1 ...
##  $ age1   : Factor w/ 2 levels "1","2": 2 1 2 2 1 2 1 1 2 1 ...
##  $ age2   : Factor w/ 2 levels "1","2": 2 1 1 2 2 2 1 1 2 2 ...
##  $ tenure1: Factor w/ 2 levels "1","2": 2 1 1 1 1 1 1 2 2 1 ...
##  $ tenure2: Factor w/ 2 levels "1","2": 2 1 1 1 1 2 2 1 1 1 ...
##  $ stateur: num  4.5 10.5 7.2 5.8 6.5 7.5 5.8 5.8 7.7 6 ...

head(B)

##   sex nwhite age1 age2 tenure1 tenure2 stateur
## 1   2      1    2    2       2       2     4.5
## 2   2      1    1    1       1       1    10.5
## 3   1      1    2    1       1       1     7.2
## 4   1      2    2    2       1       1     5.8
## 5   2      1    1    2       1       1     6.5
## 6   2      1    2    2       1       2     7.5

Before actually running the experiment, we perform some exploratory analysis. Main effects (box plot), and ANOVA are shown below.

##Box plot
par(mfrow=c(1,2))
plot(B$stateur~B$sex, xlab="sex", ylab="unemployment rate")
plot(B$stateur~B$nwhite, xlab="color", ylab="unemployment rate")

par(mfrow=c(1,2))
plot(B1$stateur~B2$age1, xlab="age1", ylab="unemployment rate")
plot(B1$stateur~B2$age2, xlab="age2", ylab="unemployment rate")

par(mfrow=c(1,2))
plot(B1$stateur~B2$tenure1, xlab="tenure1", ylab="unemployment rate")
plot(B1$stateur~B2$tenure2, xlab="tenure2", ylab="unemployment rate")

##ANOVA with a linear model 
linear = lm(stateur~sex+nwhite+age1+age2+tenure1+tenure2, data=B1)
anova(linear)

## Analysis of Variance Table
## 
## Response: stateur
##             Df  Sum Sq Mean Sq F value   Pr(>F)   
## sex          1    47.9  47.906  7.6798 0.005605 **
## nwhite       1     0.7   0.734  0.1177 0.731563   
## age1         1     1.6   1.597  0.2560 0.612883   
## age2         1    14.5  14.528  2.3290 0.127050   
## tenure1      1    30.1  30.077  4.8216 0.028153 * 
## tenure2      1    15.4  15.447  2.4763 0.115640   
## Residuals 4870 30378.8   6.238                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According the ANOVA results, we notice that main effects of factors: sex, and tenure1are significant. With this in mind, we go on to run our experiment.

Collect Data

library(tidyr)

plan.design <- taguchiDesign("L8_2")

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

K <- as.data.frame(plan.design)

A <- c(as.integer(K[,4]))
B <- c(as.integer(K[,5]))
C <- c(as.integer(K[,6]))
D <- c(as.integer(K[,7]))
E <- c(as.integer(K[,8]))
F <- c(as.integer(K[,9]))
Plan <- as.data.frame(cbind(A,B,C,D,E,F))

n = nrow(K)
for (i in 1:n){
  if (Plan$A[i] == 1){Plan$sex[i] <- 1}
  if (Plan$A[i] == 2){Plan$sex[i] <- 2}
  
  if (Plan$B[i] == 1){Plan$nwhite[i] <- 1}
  if (Plan$B[i] == 2){Plan$nwhite[i] <- 2}
  
  if (Plan$C[i] == 1){Plan$age1[i] <- 1}
  if (Plan$C[i] == 2){Plan$age1[i] <- 2}
  
  if (Plan$D[i] == 1){Plan$age2[i] <- 1}
  if (Plan$D[i] == 2){Plan$age2[i] <- 2}
  
  if (Plan$E[i] == 1){Plan$tenure1[i] <- 1}
  if (Plan$E[i] == 2){Plan$tenure1[i] <- 2}
  
  if (Plan$F[i] == 1){Plan$tenure2[i] <- 1}
  if (Plan$F[i] == 2){Plan$tenure2[i] <- 2}
}
Bplan <- cbind(K,Plan$sex,Plan$nwhite,Plan$age1,Plan$age2, Plan$tenure1,Plan$tenure2)

library(knitr)
kable(Bplan, align = 'c')

StandOrder	RunOrder	Replicate	A	B	C	D	E	F	G	y	Plan$sex	Plan$nwhite	Plan$age1	Plan$age2	Plan$tenure1	Plan$tenure2
6	1	1	2	1	2	2	1	2	1	NA	2	1	2	2	1	2
2	2	1	1	1	1	2	2	2	2	NA	1	1	1	2	2	2
4	3	1	1	2	2	2	2	1	1	NA	1	2	2	2	2	1
7	4	1	2	2	1	1	2	2	1	NA	2	2	1	1	2	2
5	5	1	2	1	2	1	2	1	2	NA	2	1	2	1	2	1
3	6	1	1	2	2	1	1	2	2	NA	1	2	2	1	1	2
1	7	1	1	1	1	1	1	1	1	NA	1	1	1	1	1	1
8	8	1	2	2	1	2	1	1	2	NA	2	2	1	2	1	1

Bplan1<-unite(Bplan, refcol , c(12,13,14,15,16,17), remove=FALSE)
Plan.refcol <- c(Bplan1$refcol)

##B need to be string instead of factors
B1<-unite(Btagu, refcol , c(1,2,3,4,5,6), remove=FALSE)

RE1 <- c()
RE2 <- c()
RE3 <- c()
RE4 <- c()
RE5 <- c()
RE6 <- c()
RE7 <- c()
RE8 <- c()

##Collect response variable separately for all taguchi conditions
for (i in 1:4877){if (Bplan1$refcol[1] == B1$refcol[i]){RE1 <- append(RE1, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[2] == B1$refcol[i]){RE2 <- append(RE2, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[3] == B1$refcol[i]){RE3 <- append(RE3, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[4] == B1$refcol[i]){RE4 <- append(RE4, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[5] == B1$refcol[i]){RE5 <- append(RE5, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[6] == B1$refcol[i]){RE6 <- append(RE6, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[7] == B1$refcol[i]){RE7 <- append(RE7, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[8] == B1$refcol[i]){RE8 <- append(RE8, B1$stateur[i])}}
                
numRE1 <- length(RE1)
numRE2 <- length(RE2)
numRE3 <- length(RE3)
numRE4 <- length(RE4)
numRE5 <- length(RE5)
numRE6 <- length(RE6)
numRE7 <- length(RE7)
numRE8 <- length(RE8)

numHist <- c(numRE1,numRE2,numRE3,numRE4,numRE5,numRE6, numRE7,numRE8)
##a view of number for matching response variables for each condition
barplot(numHist, names.arg = c("RE1","RE2","RE3","RE4","RE5","RE6","RE7","RE8"),xlab = "Conditions",ylab = "Observations")

##randomly assign response variables to experimental conditions
RV <- c(sample(RE1,1), sample(RE2,1),sample(RE3,1),sample(RE4,1),sample(RE5,1),sample(RE6,1),sample(RE7,1),sample(RE8,1))

response(plan.design) = RV
summary(plan.design)

## Taguchi SINGLE Design
## Information about the factors:
## 
##               A       B       C       D       E       F       G
## value 1       1       1       1       1       1       1       1
## value 2       2       2       2       2       2       2       2
## name                                                           
## unit                                                           
## type    numeric numeric numeric numeric numeric numeric numeric
## 
## -----------
## 
##   StandOrder RunOrder Replicate A B C D E F G  RV
## 1          6        1         1 2 1 2 2 1 2 1 3.6
## 2          2        2         1 1 1 1 2 2 2 2 5.7
## 3          4        3         1 1 2 2 2 2 1 1 6.3
## 4          7        4         1 2 2 1 1 2 2 1 4.4
## 5          5        5         1 2 1 2 1 2 1 2 6.6
## 6          3        6         1 1 2 2 1 1 2 2 3.2
## 7          1        7         1 1 1 1 1 1 1 1 6.0
## 8          8        8         1 2 2 1 2 1 1 2 5.6
## 
## -----------

Randomization: we randomly selected data for all conditions, and in random execution order. Replication: there is no replications for any condition in this experiment for ecomonic purposes. Blocking: blocking is not used in this experiment.

3. Analysis

Testing

First, we look at the main effect plot.

effectPlot(plan.design)

From the effect plots, we observe some weak difference in response variable between levels of some factor, but we cannot conclude that these effects are significant or not. To reach this conclusion, we create a linear model using our new dataframe and perform an analysis of variance.

A <- as.data.frame(plan.design)
Z1 <- as.data.frame(cbind(A$A,A$B,A$C,A$D,A$E,A$F,A$RV))
names(Z1) <- c("sex", "nwhite", "age1", "age2", "tenure1", "tenure2","stateur")
Z1$sex <- factor(Z1$sex)
Z1$nwhite <- factor(Z1$nwhite)
Z1$age1 <- factor(Z1$age1)
Z1$age2 <- factor(Z1$age2)
Z1$tenure1 <- factor(Z1$tenure1)
Z1$tenure2 <- factor(Z1$tenure2)
##linear model
model=lm(stateur~sex+nwhite+age1+age2+tenure1+tenure2, data=Z1)
anova(model)

## Analysis of Variance Table
## 
## Response: stateur
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## sex        1  0.125   0.125  1.5625 0.42955  
## nwhite     1  0.720   0.720  9.0000 0.20483  
## age1       1  0.500   0.500  6.2500 0.24224  
## age2       1  0.125   0.125  1.5625 0.42955  
## tenure1    1  2.645   2.645 33.0625 0.10962  
## tenure2    1  7.220   7.220 90.2500 0.06677 .
## Residuals  1  0.080   0.080                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we can see from this new anova we get different results than our original. Here, p-values suggest that sex and tenure1have no significant effects on unemployment rate of blue collar workers. We fail to reject the null hypothesis about factor sex and tenure1 along with the null hypothesis for other factors.

Diagnostics / Model Adequacy Checking

Next we graph Q-Q plots to check our data in the model for normality. If the data is not normal the results of the analysis may not be valid due to invalidation of assumptions made during analysis of variance. We can see below that the Q-Q plot of original model deviates from the linear line, while the Taguchi design plot is not linear at all. Next we plot fitted data against the residuals. We can observe a large amount of variation among the plot of original model (residuals at the ends lare narrower than those in the middle). However, the plot of the fractional factorial design model look a lot more evenly distributed. Since these results do suggest that Taguchi design fits the linear model better than the original data, we cannot conclude that ANOVA results of the Taguchi design model is any more trustworthy.

### for original model
qqnorm(residuals(linear))
qqline(residuals(linear))

### for taguchi design
qqnorm(residuals(model))
qqline(residuals(model))

### for original model
plot(fitted(linear),residuals(linear))

### for taguchi design
plot(fitted(model),residuals(model))

4. Relevant Theory

Comparing the ANOVA result of Taguchi design with that of the fractional factorial design from project 3, we find no difference in main effects’s significant level: we failed to reject any null hypothesis. All p values are greater than 0.05. Although design matrix by Taguchi design method and fractional factorial design method look different, Taguchi design divides experimental system into outer layer and inner layer, and it uses signal to noise ratio, we need to remember that Taguchi design is still a kind of fractional factorial design. Taguchi design uses orthogonal arrays that are 2-level fractional factorial designs, which estimate the effects of factors on the response mean and variation. An orthogonal array means the design is balanced so that factor levels are weighted equally. Therefore, each factor can be assessed independently of all the other factors. Similarly, a fractional factorial design with resolution III also ensures that main effects are not confounded with each other.

5. References to Literature

Taguchi DOE vs Traditional DOE. (n.d.). Retrieved December 15, 2016, from https://www.isixsigma.com/topic/taguchi-doe-vs-traditional-doe/ 2.Taguchi Designs. (n.d.). Retrieved December 15, 2016, from http://www.jmp.com/support/help/Taguchi_Designs.shtml

6. Contingencies

Based on the model adequacy testing, it appears that the data is not normally distributed, and that the linear model may not be the best fit of the data. A Kruskal-Wallis test can be used to evaluate whether groups’ distributions are identical, without the assumption that the data is normally distributed. With the kruskal-wallis test, we could only analyze the main effect of one factor at a time. Base on results from our exploratory analysis, we chose to analyze sex and tenure1.

kruskal.test(stateur~sex, data=Z1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  stateur by sex
## Kruskal-Wallis chi-squared = 0.083333, df = 1, p-value = 0.7728

kruskal.test(stateur~tenure1, data=Z1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  stateur by tenure1
## Kruskal-Wallis chi-squared = 2.0833, df = 1, p-value = 0.1489

The p-value for the factor, sex, of this Kruskal-Wallis test is larger than an alpha of 0.05. Therefore, we fail to reject the null, that the variation in blue collar worker unemployment rate is due to error and randomization.

7. Appendices

library(Ecdat)
B <- Ecdat::Benefits 
colnames(B)

B <- Ecdat::Benefits [c("sex", "nwhite", "age", "tenure", "stateur")]
head(B)

##transform age into 3-level factor
for (i in 1:4877) {
  if (B$age[i] <= 30) {
    B$age[i] <- "<=30"
  } else if (B$age[i] > 40) {
    B$age[i] <- ">40"
  } else {
    B$age[i] <- ">30 & <=40"
  }
}

##transform tenure into 3-level factor
for (i in 1:4877) {
  if (B$tenure[i] <= 2.5) {
    B$tenure[i] <- "<=2.5"
  } else if (B$tenure[i] > 6) {
    B$tenure[i] <- ">6"
  } else {
    B$tenure[i] <- ">2.5 & <=6"
  }
}

B1 <- B[c("sex", "nwhite", "age", "tenure","stateur")]
B1$sex <- B$sex
B1$nwhite <- B$nwhite
B1$age <- factor(B$age)
B1$tenure <- factor(B$tenure)
str(B1)
##sex
levels(B1$sex)
##nwhite
levels(B1$nwhite)
##age
levels(B1$age)
##tenure
levels(B1$tenure)

##2factor with 2levels, 2factor with 3 levels
B3 <- B[c("sex", "nwhite", "age", "tenure","stateur")]
B3$sex <- B$sex
B3$nwhite <- B$nwhite
B3$age <- factor(B$age)
B3$tenure <- factor(B$tenure)

##for screening B1, 1 3-level to 2 2-level
for (i in 1:2438) {
  if (B1$age[i]=="<=30"){
    B1$age1[i]<-"30"
    B1$age2[i]<-"30"
  } else if (B1$age[i]==">40") {
    B1$age2[i]<-"40"
    B1$age1[i]<-"40"
  } else {
    B1$age1[i]<- "30"
    B1$age2[i]<- "40"
  }
}

for (i in 1:2439) {
  if (B1$age[i]=="<=30"){
    B1$age1[i]<-"30"
    B1$age2[i]<-"30"
  } else if (B1$age[i]==">40") {
    B1$age2[i]<-"40"
    B1$age1[i]<-"40"
  } else {
    B1$age1[i]<- "40"
    B1$age2[i]<- "30"
  }
}

for (i in 1:2438) {
  if (B1$tenure[i]=="<=2.5"){
    B1$tenure1[i]<-"2.5"
    B1$tenure2[i]<-"2.5"
  } else if (B1$tenure[i]==">6") {
    B1$tenure2[i]<-"6"
    B1$tenure1[i]<-"6"
  } else {
    B1$tenure1[i]<- "2.5"
    B1$tenure2[i]<- "6"
  }
}

for (i in 1:2439) {
  if (B1$tenure[i]=="<=2.5"){
    B1$tenure1[i]<-"2.5"
    B1$tenure2[i]<-"2.5"
  } else if (B1$tenure[i]==">6") {
    B1$tenure2[i]<-"6"
    B1$tenure1[i]<-"6"
  } else {
    B1$tenure1[i]<- "6"
    B1$tenure2[i]<- "2.5"
  }
}

##Transform 3-level factor (age) into 2 2-level factors (age1 and age2)
for (i in 1:4877) {
  if (B$age[i]=="<=30"){
    B$age1[i]<-"30"
    B$age2[i]<-"30"
  } else if (B$age[i]==">40") {
    B$age2[i]<-"40"
    B$age1[i]<-"40"
  } else {
    a <- c(1,2)
    b <- c("30", "40")
    c<- sample(a,1)
    B$age1[i]<- b[c]
    B$age2[i]<- b[3-c]
  }
}

for (i in 1:4877) {
  if (B$tenure[i]=="<=2.5"){
    B$tenure1[i]<-"2.5"
    B$tenure2[i]<-"2.5"
  } else if (B$tenure[i]==">6") {
    B$tenure2[i]<-"6"
    B$tenure1[i]<-"6"
  } else {
    a <- c(1,2)
    b <- c("2.5", "6")
    c<- sample(a,1)
    B$tenure1[i]<- b[c]
    B$tenure2[i]<- b[3-c]
  }
}

B2 <- B[c("sex", "nwhite", "age1", "age2", "tenure1", "tenure2","stateur")]
B2$sex <- B$sex
B2$nwhite <- B$nwhite
B2$age1 <- factor(B$age1)
B2$age2 <- factor(B$age2)
B2$tenure1 <- factor(B$tenure1)
B2$tenure2 <- factor(B$tenure2)

##6 2-level factors
str(B2)
##2 2level and 2 3-level factors
str(B3)

library(qualityTools)

##design setup 1 with 2 2-level factors and 2 3-level factors
taguchiChoose(factors1 = 2, factors2 = 2, level1 = 2, level2 = 3)
## design setup 2 with 6 2-level factors 
taguchiChoose(factors1 = 6, level1 = 2)

t <- taguchiDesign("L8_2")

##Code levels of factors: low to "1", and high to "2"
B$sex <- as.character(B$sex)
for (i in 1:4877){
  if (as.character(B$sex[i]) == "female") {
    B$sex[i] <- "1"
  } else {
    B$sex[i] <- "2"
  }
}

B$nwhite <- as.character(B$nwhite)
for (i in 1:4877){
  if (as.character(B$nwhite[i]) == "no") {
    B$nwhite[i] <- "1"
  } else {
    B$nwhite[i] <- "2"
  }
}

for (i in 1:4877) {
  if (B$age1[i] == "30") {
    B$age1[i] <- "1"
  } else {
    B$age1[i] <- "2"
  }
}

for (i in 1:4877) {
  if (B$age2[i] == "30") {
    B$age2[i] <- "1"
  } else {
    B$age2[i] <- "2"
  }
}

for (i in 1:4877) {
  if (B$tenure1[i] =="2.5") {
    B$tenure1[i] <- "1"
  } else {
    B$tenure1[i] <- "2"
  }
}

for (i in 1:4877) {
  if (B$tenure2[i] =="2.5") {
    B$tenure2[i] <- "1"
  } else {
    B$tenure2[i] <- "2"
  }
}

B <- B[c("sex", "nwhite", "age1", "age2", "tenure1", "tenure2","stateur")]
Btagu <- B[c("sex", "nwhite", "age1", "age2", "tenure1", "tenure2","stateur")]
B$sex <- factor(B$sex)
B$nwhite <- factor(B$nwhite)
B$age1 <- factor(B$age1)
B$age2 <- factor(B$age2)
B$tenure1 <- factor(B$tenure1)
B$tenure2 <- factor(B$tenure2)

str(B)
head(B)

##Box plot
par(mfrow=c(1,2))
plot(B$stateur~B$sex, xlab="sex", ylab="unemployment rate")
plot(B$stateur~B$nwhite, xlab="color", ylab="unemployment rate")
par(mfrow=c(1,2))
plot(B1$stateur~B2$age1, xlab="age1", ylab="unemployment rate")
plot(B1$stateur~B2$age2, xlab="age2", ylab="unemployment rate")
par(mfrow=c(1,2))
plot(B1$stateur~B2$tenure1, xlab="tenure1", ylab="unemployment rate")
plot(B1$stateur~B2$tenure2, xlab="tenure2", ylab="unemployment rate")
##ANOVA with a linear model 
linear = lm(stateur~sex+nwhite+age1+age2+tenure1+tenure2, data=B1)
anova(linear)

library(tidyr)

plan.design <- taguchiDesign("L8_2")

K <- as.data.frame(plan.design)

A <- c(as.integer(K[,4]))
B <- c(as.integer(K[,5]))
C <- c(as.integer(K[,6]))
D <- c(as.integer(K[,7]))
E <- c(as.integer(K[,8]))
F <- c(as.integer(K[,9]))
Plan <- as.data.frame(cbind(A,B,C,D,E,F))

n = nrow(K)
for (i in 1:n){
  if (Plan$A[i] == 1){Plan$sex[i] <- 1}
  if (Plan$A[i] == 2){Plan$sex[i] <- 2}
  
  if (Plan$B[i] == 1){Plan$nwhite[i] <- 1}
  if (Plan$B[i] == 2){Plan$nwhite[i] <- 2}
  
  if (Plan$C[i] == 1){Plan$age1[i] <- 1}
  if (Plan$C[i] == 2){Plan$age1[i] <- 2}
  
  if (Plan$D[i] == 1){Plan$age2[i] <- 1}
  if (Plan$D[i] == 2){Plan$age2[i] <- 2}
  
  if (Plan$E[i] == 1){Plan$tenure1[i] <- 1}
  if (Plan$E[i] == 2){Plan$tenure1[i] <- 2}
  
  if (Plan$F[i] == 1){Plan$tenure2[i] <- 1}
  if (Plan$F[i] == 2){Plan$tenure2[i] <- 2}
}
Bplan <- cbind(K,Plan$sex,Plan$nwhite,Plan$age1,Plan$age2, Plan$tenure1,Plan$tenure2)

library(knitr)
kable(Bplan, align = 'c')
Bplan1<-unite(Bplan, refcol , c(12,13,14,15,16,17), remove=FALSE)
Plan.refcol <- c(Bplan1$refcol)

##B need to be string instead of factors
B1<-unite(Btagu, refcol , c(1,2,3,4,5,6), remove=FALSE)

RE1 <- c()
RE2 <- c()
RE3 <- c()
RE4 <- c()
RE5 <- c()
RE6 <- c()
RE7 <- c()
RE8 <- c()

##Collect response variable separately for all taguchi conditions
for (i in 1:4877){if (Bplan1$refcol[1] == B1$refcol[i]){RE1 <- append(RE1, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[2] == B1$refcol[i]){RE2 <- append(RE2, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[3] == B1$refcol[i]){RE3 <- append(RE3, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[4] == B1$refcol[i]){RE4 <- append(RE4, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[5] == B1$refcol[i]){RE5 <- append(RE5, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[6] == B1$refcol[i]){RE6 <- append(RE6, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[7] == B1$refcol[i]){RE7 <- append(RE7, B1$stateur[i])}}
for (i in 1:4877){if (Bplan1$refcol[8] == B1$refcol[i]){RE8 <- append(RE8, B1$stateur[i])}}

numRE1 <- length(RE1)
numRE2 <- length(RE2)
numRE3 <- length(RE3)
numRE4 <- length(RE4)
numRE5 <- length(RE5)
numRE6 <- length(RE6)
numRE7 <- length(RE7)
numRE8 <- length(RE8)

numHist <- c(numRE1,numRE2,numRE3,numRE4,numRE5,numRE6, numRE7,numRE8)
##a view of number for matching response variables for each condition
barplot(numHist, names.arg = c("RE1","RE2","RE3","RE4","RE5","RE6","RE7","RE8"),xlab = "Conditions",ylab = "Observations")

##randomly assign response variables to experimental conditions
RV <- c(sample(RE1,1), sample(RE2,1),sample(RE3,1),sample(RE4,1),sample(RE5,1),sample(RE6,1),sample(RE7,1),sample(RE8,1))

RV <- c(3.6, 5.7, 6.3, 4.4, 6.6, 3.2, 6.0, 5.6)

response(plan.design) = RV
summary(plan.design)

##main effect plot
effectPlot(plan.design)
##ANOVA
Z1 <- as.data.frame(cbind(A$A,A$B,A$C,A$D,A$E,A$F,A$RV))
names(Z1) <- c("sex", "nwhite", "age1", "age2", "tenure1", "tenure2","stateur")
Z1$sex <- factor(Z1$sex)
Z1$nwhite <- factor(Z1$nwhite)
Z1$age1 <- factor(Z1$age1)
Z1$age2 <- factor(Z1$age2)
Z1$tenure1 <- factor(Z1$tenure1)
Z1$tenure2 <- factor(Z1$tenure2)
##linear model
model=lm(stateur~sex+nwhite+age1+age2+tenure1+tenure2, data=Z1)
anova(model)

###Diagnostics / Model Adequacy Checking
### for original model
qqnorm(residuals(linear))
qqline(residuals(linear))
### for taguchi design
qqnorm(residuals(model))
qqline(residuals(model))
### for original model
plot(fitted(linear),residuals(linear))
### for taguchi design
plot(fitted(model),residuals(model))

StandOrder	RunOrder	Replicate	A	B	C	D	E	F	G	y	Plan$sex	Plan$nwhite	Plan$age1	Plan$age2	Plan$tenure1	Plan$tenure2
6	1	1	2	1	2	2	1	2	1	NA	2	1	2	2	1	2
2	2	1	1	1	1	2	2	2	2	NA	1	1	1	2	2	2
4	3	1	1	2	2	2	2	1	1	NA	1	2	2	2	2	1
7	4	1	2	2	1	1	2	2	1	NA	2	2	1	1	2	2
5	5	1	2	1	2	1	2	1	2	NA	2	1	2	1	2	1
3	6	1	1	2	2	1	1	2	2	NA	1	2	2	1	1	2
1	7	1	1	1	1	1	1	1	1	NA	1	1	1	1	1	1
8	8	1	2	2	1	2	1	1	2	NA	2	2	1	2	1	1

StandOrder	RunOrder	Replicate	A	B	C	D	E	F	G	y	Plan$sex	Plan$nwhite	Plan$age1	Plan$age2	Plan$tenure1	Plan$tenure2
6	1	1	2	1	2	2	1	2	1	NA	2	1	2	2	1	2
2	2	1	1	1	1	2	2	2	2	NA	1	1	1	2	2	2
4	3	1	1	2	2	2	2	1	1	NA	1	2	2	2	2	1
7	4	1	2	2	1	1	2	2	1	NA	2	2	1	1	2	2
5	5	1	2	1	2	1	2	1	2	NA	2	1	2	1	2	1
3	6	1	1	2	2	1	1	2	2	NA	1	2	2	1	1	2
1	7	1	1	1	1	1	1	1	1	NA	1	1	1	1	1	1
8	8	1	2	2	1	2	1	1	2	NA	2	2	1	2	1	1

Design of Experiments - Project 4