Taguchi designs are experimental designs intended to ensure good performance of products or processes. They are part of the so-called “Taguchi Methods”, proposed by the Japanese engineer Genichi Taguchi, who was concerned with process variation due not only to controllable factors but also to uncontrollable, or nuisance, factors. These methods are similar to fractional factorial designs, but they extend the concept toward a more robust design that keeps the process within specified conditions [1].
The data used was part of a study entitled “Gender, Mental Illness and Crime in the United States” developed by Melissa Thompson as part of the Inter-university Consortium for Political and Social Research [2]. This database is composed of 55,602 respondents from the National Household Survey on Drug Use and Health (NSDUH) in 2004. Respondents were members of United States households from ages 12 and above. Information related to the use of illicit drugs, alcohol and tobacco, as well as the criminal activity, depression and other factors were included in the database. Other additional variables, such as the depression index for both adults and youth were included by the researcher.
The purpose of this particular study is to understand the influence of gender and other population characteristics (e.g. age, education, income) and of the addiction level (drug use, tobacco and alcohol consumption) on mental illness, particularly depression. Thompson’s study includes criminal activity, but this study is limited to the depression index calculated for youth and adults.
A Taguchi design is introduced to identify the main differences in youth depression between population groups and levels of addiction. ANOVA is used to test these differences, and a fitted model is used to describe the levels of depression in the youth population across the United States.
To start the analysis, the original dataset was imported, which included a total of 55,602 observations and 3,011 variables.
#loading data from TSV file
crimedata <- read.delim("CrimeData.tsv")
#recoding ANYCRIME to 0/1 (any other code becomes NA)
crime1 <- ifelse(crimedata$ANYCRIME == 0, 0,
          ifelse(crimedata$ANYCRIME == 1, 1, NA))
#creating new dataframe with variables selected for this study
newdata <- data.frame(IRSEX = crimedata$IRSEX, CATAG2 = crimedata$CATAG2,
                      AGEYOUNG = crimedata$AGE2, INCOME_R = crimedata$INCOME_R,
                      EDU_DUMMY = crimedata$EDU_DUMMY, EDUC2 = crimedata$IREDUC2,
                      BINGEHVY = crimedata$BINGEHVY, CDCGMO = crimedata$CDCGMO,
                      CDNOCGMO = crimedata$CDNOCGMO, CDUFLAG = crimedata$CDUFLAG,
                      SUMFLAG = crimedata$SUMFLAG, MJOFLAG = crimedata$MJOFLAG,
                      IEMFLAG = crimedata$IEMFLAG, ANYCRIME = crime1,
                      ARREST = crimedata$NUMARREST,
                      DEPRESS = crimedata$YODEPRESSIONINDEX)
As noted above, the original dataset included 55,602 observations and 3,011 variables. In accordance with the purpose of the study, only 19 variables were retained; they are contained in the newdata dataset.
load("data2.RData")
str(newdata)
## 'data.frame': 55602 obs. of 19 variables:
## $ GEND : int 1 1 1 1 2 1 2 2 2 1 ...
## $ CATAG2 : int 2 2 3 2 3 3 3 2 3 3 ...
## $ AGEYOUNG : int 11 9 17 10 15 17 15 8 16 14 ...
## $ INCOME_R : int 1 1 3 1 1 2 3 1 4 3 ...
## $ EDUC : int 1 1 1 1 0 0 1 0 1 0 ...
## $ EDUC2 : int 11 10 10 10 8 8 11 8 11 8 ...
## $ ALCOHOL : int 1 1 1 1 1 1 1 1 1 0 ...
## $ DRUG : int 1 1 0 1 1 0 0 1 0 0 ...
## $ CIGARDAILY : int 0 0 1 1 0 1 1 1 1 0 ...
## $ PSYFLAG : int 0 0 0 1 0 0 0 0 0 0 ...
## $ BINGEHVY : int 1 3 4 1 4 3 3 4 4 4 ...
## $ CDCGMO : int 0 0 0 0 0 0 0 1 0 0 ...
## $ CDNOCGMO : int 0 0 1 1 0 1 1 0 1 0 ...
## $ MJOFLAG : int 0 1 0 0 1 0 0 1 0 0 ...
## $ IEMFLAG : int 1 0 0 1 0 0 0 0 0 0 ...
## $ ANYCRIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ARREST : int 0 0 0 0 0 0 0 0 0 0 ...
## $ DEPRESSALL : int -1 -1 -1 -1 -1 5 -1 -1 -1 -1 ...
## $ DEPRESSYOUTH: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
Candidate factor variables were created to represent the demographic groups (gender, age, education) and the addiction-related subgroups (cigar, druguse and alcohol). Candidate response variables were the youth depression index, DEPRESS, and some criminal-activity variables found in the dataset.
## gender age educ inc cigar alcohol alcohol2 druguse drug psych crime
## 56 female young high low no none yes high yes yes 1
## 78 male young high low no none no high yes yes 0
## 109 male young high low no none yes high yes yes 0
## 139 male young high low no none no none no no 1
## 143 female young high low no none no none no no 0
## 169 male young high low no none no high yes yes 0
## arrests depression depressionall
## 56 0 7 -1
## 78 0 3 -1
## 109 0 1 -1
## 139 0 1 -1
## 143 0 8 -1
## 169 0 9 -1
This study could pursue several questions: analyzing the criminal outcomes of the different population groups according to their addiction levels and demographics, or analyzing the depression index. Because the study was oriented toward the youth population, the main interest was in understanding how addiction levels affect depression among these respondents. The variable depression was therefore selected as the response. The distribution of this variable is shown in the following histogram.
As observed from the dataset, some variables have two levels, such as gender, where respondents were either male or female, while others have multiple levels. For the youth population, the level of education was divided in two: elementary school only (elem) and high school (hs). The age group variable was handled similarly, differentiating a child (13 years or younger) from a young respondent (14-17 years). Among the addiction-related variables, daily use of tobacco (cigar) was also binary: yes if they use it daily, no if they do not. The alcohol variable had four levels in the initial study but was collapsed into three levels: none, low and high. The druguse variable also has three levels: none, low (marijuana use only) and high (other drugs consumed).
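The collapsing of raw survey codes into labelled levels can be sketched as follows. This is a minimal illustration on a few made-up BINGEHVY codes, assuming (as in the code listing at the end of this report) that code 4 is treated as no use and code 1 as high use; the toy vector is hypothetical and is not survey data.

```r
# Toy raw codes (hypothetical); in the report these come from newdata$BINGEHVY
bingehvy <- c(4, 1, 3, 2, 4, 1)

# Collapse four raw codes into three categories: 1 = none, 2 = low, 3 = high
alc_rel <- ifelse(bingehvy == 4, 1,
           ifelse(bingehvy == 1, 3, 2))

# Attach readable labels to the numeric categories
alc_levels <- c(none = 1, low = 2, high = 3)
alcohol <- factor(alc_rel, levels = alc_levels, labels = names(alc_levels))

print(table(alcohol))
```

Keeping the numeric-to-label mapping in one named vector (alc_levels) means the same object supplies both the levels and the labels, so the two cannot drift apart.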
After an initial evaluation of the dataset, the two 2-level factors selected were gender and cigar, and the two 3-level factors selected were druguse and alcohol.
If we were to analyze all four factors under homogeneous conditions, we would need a full factorial design. In this case, it would be a \((2^2)(3^2)\) design with 36 runs. This design can, however, be converted into a \(2^k\) design by representing each 3-level factor with two 2-level indicator factors. For our experiment, the alcohol variable was split into low_alc (1 if alcohol use is low, 0 otherwise) and high_alc (1 if alcohol use is high, 0 otherwise). The druguse variable was treated the same way, giving low_drug and high_drug.
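A minimal sketch of this indicator coding, using a four-element toy factor (hypothetical values) and the same level labels as the report, together with the run counts implied by the two formulations:

```r
# Three-level factor with the labels used in this report (toy values)
alcohol <- factor(c("none", "low", "high", "low"),
                  levels = c("none", "low", "high"))

# One 0/1 indicator per non-reference level ("none" is the reference)
low_alc  <- as.integer(alcohol == "low")
high_alc <- as.integer(alcohol == "high")
print(data.frame(alcohol, low_alc, high_alc))

# Run counts: mixed-level full factorial vs. six two-level factors
runs_mixed <- 2^2 * 3^2   # two 2-level and two 3-level factors
runs_coded <- 2^6         # six 2-level factors after recoding
print(c(mixed = runs_mixed, coded = runs_coded))
```

The recoded full factorial is larger (64 against 36 runs) because it treats the two indicators of one original factor as if they could vary independently.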
## depression gender cigardaily alcohol drugs low_alc high_alc low_drug
## 1 7 female no none high 0 0 0
## 2 3 male no none high 0 0 0
## 3 1 male no none high 0 0 0
## 4 1 male no none none 0 0 0
## 5 8 female no none none 0 0 0
## 6 9 male no none high 0 0 0
## high_drug
## 1 1
## 2 1
## 3 1
## 4 0
## 5 0
## 6 1
Thus, if we were to design the full factorial experiment, that would imply a total of 64 runs, as shown:
## gender cigardaily low_alc high_alc low_drug high_drug
## 1 0 0 0 0 0 0
## 2 1 0 0 0 0 0
## 3 0 1 0 0 0 0
## 4 1 1 0 0 0 0
## 5 0 0 1 0 0 0
## 6 1 0 1 0 0 0
## 7 0 1 1 0 0 0
## 8 1 1 1 0 0 0
## 9 0 0 0 1 0 0
## 10 1 0 0 1 0 0
## 11 0 1 0 1 0 0
## 12 1 1 0 1 0 0
## 13 0 0 1 1 0 0
## 14 1 0 1 1 0 0
## 15 0 1 1 1 0 0
## 16 1 1 1 1 0 0
## 17 0 0 0 0 1 0
## 18 1 0 0 0 1 0
## 19 0 1 0 0 1 0
## 20 1 1 0 0 1 0
## 21 0 0 1 0 1 0
## 22 1 0 1 0 1 0
## 23 0 1 1 0 1 0
## 24 1 1 1 0 1 0
## 25 0 0 0 1 1 0
## 26 1 0 0 1 1 0
## 27 0 1 0 1 1 0
## 28 1 1 0 1 1 0
## 29 0 0 1 1 1 0
## 30 1 0 1 1 1 0
## 31 0 1 1 1 1 0
## 32 1 1 1 1 1 0
## 33 0 0 0 0 0 1
## 34 1 0 0 0 0 1
## 35 0 1 0 0 0 1
## 36 1 1 0 0 0 1
## 37 0 0 1 0 0 1
## 38 1 0 1 0 0 1
## 39 0 1 1 0 0 1
## 40 1 1 1 0 0 1
## 41 0 0 0 1 0 1
## 42 1 0 0 1 0 1
## 43 0 1 0 1 0 1
## 44 1 1 0 1 0 1
## 45 0 0 1 1 0 1
## 46 1 0 1 1 0 1
## 47 0 1 1 1 0 1
## 48 1 1 1 1 0 1
## 49 0 0 0 0 1 1
## 50 1 0 0 0 1 1
## 51 0 1 0 0 1 1
## 52 1 1 0 0 1 1
## 53 0 0 1 0 1 1
## 54 1 0 1 0 1 1
## 55 0 1 1 0 1 1
## 56 1 1 1 0 1 1
## 57 0 0 0 1 1 1
## 58 1 0 0 1 1 1
## 59 0 1 0 1 1 1
## 60 1 1 0 1 1 1
## 61 0 0 1 1 1 1
## 62 1 0 1 1 1 1
## 63 0 1 1 1 1 1
## 64 1 1 1 1 1 1
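The table above can be regenerated with expand.grid, as in the code listing at the end of this report. The sketch below also counts how many of the 64 coded runs correspond to a combination a single respondent could actually have, under the assumption that low and high indicators of the same factor are mutually exclusive (which follows from how low_alc, high_alc, low_drug and high_drug are coded).

```r
# Full factorial in the six 0/1 factors; the first factor varies fastest
full <- expand.grid(gender = 0:1, cigardaily = 0:1,
                    low_alc = 0:1, high_alc = 0:1,
                    low_drug = 0:1, high_drug = 0:1)
print(nrow(full))  # 64 = 2^6

# Rows that would require a respondent to be both "low" and "high"
impossible <- (full$low_alc == 1 & full$high_alc == 1) |
              (full$low_drug == 1 & full$high_drug == 1)
print(sum(!impossible))  # 36 = 2^2 * 3^2
```

The 36 remaining rows are exactly the runs of the original mixed-level design, which is worth keeping in mind when a reduced design assigns 1 to both indicators of the same factor.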
The resulting fractional factorial design, a \(2^{6-3}\) design with 8 runs, was the following:
## Call:
## FrF2(runs, factor.names = nam2, default.levels = c("0", "1"))
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## gender cigardaily low_alc high_alc low_drug high_drug
## 1 0 0 0 0 0 0
## 2 1 1 1 1 1 1
##
## Design generating information:
## $legend
## [1] A=gender B=cigardaily C=low_alc D=high_alc E=low_drug
## [6] F=high_drug
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## gender cigardaily low_alc high_alc low_drug high_drug
## 1 1 0 1 0 1 0
## 2 1 1 0 1 0 0
## 3 0 1 0 0 1 0
## 4 1 1 1 1 1 1
## 5 0 1 1 0 0 1
## 6 0 0 0 1 1 1
## 7 0 0 1 1 0 0
## 8 1 0 0 0 0 1
## class=design, type= FrF2
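To see where these eight runs come from, the design can be rebuilt by hand from the reported generators D=AB, E=AC and F=BC. The sketch below uses base R only (it does not call FrF2) and works on the -1/+1 scale, mapping back to the 0/1 labels at the end.

```r
# Base 2^3 full factorial in A, B, C on the -1/+1 scale
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Generated columns, following the generators reported above
d$D <- d$A * d$B
d$E <- d$A * d$C
d$F <- d$B * d$C

# Map -1/+1 back to the 0/1 labels used in the report
design01 <- as.data.frame((as.matrix(d) + 1) / 2)
names(design01) <- c("gender", "cigardaily", "low_alc",
                     "high_alc", "low_drug", "high_drug")
print(design01)

# Main-effect columns are mutually orthogonal (off-diagonal entries are 0) ...
print(crossprod(as.matrix(d)))

# ... but each main effect is aliased with two-factor interactions: A = BD = CE
print(c(all(d$A == d$B * d$D), all(d$A == d$C * d$E)))
```

The rows match the eight runs printed above, in standard rather than randomized order. The alias check illustrates why this design can estimate main effects only if two-factor interactions are assumed negligible.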
For our fractional factorial design, the experimental runs were randomized into the following order, with the sampled response RV attached:
## gender cigardaily low_alc high_alc low_drug high_drug RV
## 8 1 0 0 0 0 1 6
## 3 0 1 0 0 1 0 0
## 4 1 1 1 1 1 1 7
## 6 0 0 0 1 1 1 7
## 2 1 1 0 1 0 0 6
## 5 0 1 1 0 0 1 8
## 7 0 0 1 1 0 0 6
## 1 1 0 1 0 1 0 9
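The response for each run is obtained by drawing one observation at random from the respondents who match that run. A minimal sketch of this step on a toy data frame (hypothetical values standing in for proj3, with only two factors to keep it short):

```r
set.seed(1)

# Toy stand-in for proj3 (hypothetical values, not the survey data)
toy <- expand.grid(gender = 0:1, cigardaily = 0:1, rep = 1:5)
toy$depression <- sample(0:9, nrow(toy), replace = TRUE)

# A small design: one row per run
design <- expand.grid(gender = 0:1, cigardaily = 0:1)

# For each run, pool the matching observations and draw one response
design$RV <- vapply(seq_len(nrow(design)), function(i) {
  pool <- toy$depression[toy$gender == design$gender[i] &
                         toy$cigardaily == design$cigardaily[i]]
  pool[sample.int(length(pool), 1)]  # index-based draw
}, integer(1))

# Randomize the run order
rand_design <- design[sample(nrow(design)), ]
print(rand_design)
```

Drawing an index with sample.int rather than calling sample on the values directly is a deliberate choice here: when a pool contains a single number n, sample(n, 1) draws from 1:n instead of returning n.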
Using Taguchi’s orthogonal arrays, we can select a design of the same size as the previous one (only 8 runs).
As a first step, since we have two 2-level and two 3-level factors, we use the taguchiChoose function to list the candidate designs.
taguchiChoose(factors1 = 2, factors2 = 2, level1 = 3, level2 = 2)
## 2 factors on 3 levels and 2 factors on 2 levels with 0 desired interactions to be estimated
##
## Possible Designs:
##
## L36_2_3_a L36_2_3_b
##
## Use taguchiDesign("L36_2_3_a") or different to create a taguchi design object
As recommended, we could choose the following design, which is composed of 36 runs:
By reducing the problem to six 2-level factors, we can also obtain an 8-run Taguchi design:
From this selection, we obtain the following Taguchi design:
t1_design
## StandOrder RunOrder Replicate A B C D E F G y
## 1 4 1 1 1 2 2 2 2 1 1 NA
## 2 3 2 1 1 2 2 1 1 2 2 NA
## 3 2 3 1 1 1 1 2 2 2 2 NA
## 4 6 4 1 2 1 2 2 1 2 1 NA
## 5 5 5 1 2 1 2 1 2 1 2 NA
## 6 1 6 1 1 1 1 1 1 1 1 NA
## 7 7 7 1 2 2 1 1 2 2 1 NA
## 8 8 8 1 2 2 1 2 1 1 2 NA
t2_design <- t1_design[,-c(1,2,3,10,11)]
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
colnames(t2_design) <- nam2
t2_design
## gender cigardaily low_alc high_alc low_drug high_drug
## 1 1 2 2 2 2 1
## 2 1 2 2 1 1 2
## 3 1 1 1 2 2 2
## 4 2 1 2 2 1 2
## 5 2 1 2 1 2 1
## 6 1 1 1 1 1 1
## 7 2 2 1 1 2 2
## 8 2 2 1 2 1 1
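What makes this table an orthogonal array is that, for every pair of columns, each combination of levels appears equally often. A short check of that property, with the eight runs typed in from the output above:

```r
# The six design columns, copied from the t2_design output above
t2 <- data.frame(
  gender     = c(1, 1, 1, 2, 2, 1, 2, 2),
  cigardaily = c(2, 2, 1, 1, 1, 1, 2, 2),
  low_alc    = c(2, 2, 1, 2, 2, 1, 1, 1),
  high_alc   = c(2, 1, 2, 2, 1, 1, 1, 2),
  low_drug   = c(2, 1, 2, 1, 2, 1, 2, 1),
  high_drug  = c(1, 2, 2, 2, 1, 1, 2, 1)
)

# Every pair of columns should show each of the 4 level combinations twice
col_pairs <- combn(names(t2), 2, simplify = FALSE)
balanced <- vapply(col_pairs, function(p) {
  all(table(t2[[p[1]]], t2[[p[2]]]) == 2)
}, logical(1))

print(length(col_pairs))  # 15 pairs of columns
print(all(balanced))      # TRUE
```

This pairwise balance is what allows each main effect to be estimated independently of the others from only eight runs.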
The response variable was added in the same way as for the fractional factorial design:
The result of the randomized experiment is as follows:
## gender cigardaily low_alc high_alc low_drug high_drug depression
## 1 1 2 2 2 2 1 9
## 5 2 1 2 1 2 1 8
## 4 2 1 2 2 1 2 8
## 6 1 1 1 1 1 1 7
## 7 2 2 1 1 2 2 9
## 2 1 2 2 1 1 2 7
## 8 2 2 1 2 1 1 6
## 3 1 1 1 2 2 2 4
The aim of this study was to observe how Taguchi designs can help identify optimal parameters in an experimental study. Here we were interested in designing an experiment that would help us understand the main effects and “some” interaction effects of the four factors, using the design shown previously. The Taguchi design is then compared with the fractional factorial design presented earlier.
Based on the Taguchi Design, we have our fitted model:
##
## Call:
## lm.default(formula = t2_design$depression ~ as.factor(t2_design$gender) +
## as.factor(t2_design$cigardaily) + as.factor(t2_design$low_alc) +
## as.factor(t2_design$high_alc) + as.factor(t2_design$low_drug) +
## as.factor(t2_design$high_drug))
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 1 -1 -1 1 -1 1 1 -1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.000 2.646 2.268 0.264
## as.factor(t2_design$gender)2 1.000 2.000 0.500 0.705
## as.factor(t2_design$cigardaily)2 1.000 2.000 0.500 0.705
## as.factor(t2_design$low_alc)2 1.500 2.000 0.750 0.590
## as.factor(t2_design$high_alc)2 -1.000 2.000 -0.500 0.705
## as.factor(t2_design$low_drug)2 0.500 2.000 0.250 0.844
## as.factor(t2_design$high_drug)2 -0.500 2.000 -0.250 0.844
##
## Residual standard error: 2.828 on 1 degrees of freedom
## Multiple R-squared: 0.5897, Adjusted R-squared: -1.872
## F-statistic: 0.2396 on 6 and 1 DF, p-value: 0.9129
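Because the design is orthogonal, each coefficient in this fit is simply the difference between the mean response at level 2 and the mean response at level 1 of that factor. The sketch below refits the model from the eight runs printed earlier (design and depression values typed in from the output, in design-row order) and compares the two calculations.

```r
# Taguchi runs and sampled responses, rows 1..8 of t2_design
tag <- data.frame(
  gender     = c(1, 1, 1, 2, 2, 1, 2, 2),
  cigardaily = c(2, 2, 1, 1, 1, 1, 2, 2),
  low_alc    = c(2, 2, 1, 2, 2, 1, 1, 1),
  high_alc   = c(2, 1, 2, 2, 1, 1, 1, 2),
  low_drug   = c(2, 1, 2, 1, 2, 1, 2, 1),
  high_drug  = c(1, 2, 2, 2, 1, 1, 2, 1),
  depression = c(9, 7, 4, 8, 8, 7, 9, 6)
)
factors <- names(tag)[1:6]

# Regression with each design column treated as a factor
tag_f <- tag
tag_f[factors] <- lapply(tag_f[factors], factor)
fit <- lm(depression ~ ., data = tag_f)

# Difference in mean response between level 2 and level 1, per factor
mean_diff <- sapply(factors, function(f) {
  mean(tag$depression[tag[[f]] == 2]) - mean(tag$depression[tag[[f]] == 1])
})

print(rbind(lm = unname(coef(fit)[-1]), level_means = mean_diff))
print(df.residual(fit))  # 1
```

With seven parameters estimated from eight runs, a single degree of freedom is left for the error, which is why the standard errors are large and none of the tests is informative.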
These results can be compared to the Fractional Factorial Design:
##
## Call:
## lm.default(formula = frac11$RV ~ as.factor(frac11$gender) + as.factor(frac11$cigardaily) +
## as.factor(frac11$low_alc) + as.factor(frac11$high_alc) +
## as.factor(frac11$low_drug) + as.factor(frac11$high_drug))
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 1.375 1.375 -1.375 -1.375 1.375 1.375 -1.375 -1.375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.125 1.375 4.455 0.141
## as.factor(frac11$gender)1 0.875 1.375 0.636 0.639
## as.factor(frac11$cigardaily)1 -0.875 1.375 -0.636 0.639
## as.factor(frac11$low_alc)1 1.375 1.375 1.000 0.500
## as.factor(frac11$high_alc)1 0.375 1.375 0.273 0.830
## as.factor(frac11$low_drug)1 -0.375 1.375 -0.273 0.830
## as.factor(frac11$high_drug)1 0.875 1.375 0.636 0.639
##
## Residual standard error: 3.889 on 1 degrees of freedom
## Multiple R-squared: 0.7027, Adjusted R-squared: -1.081
## F-statistic: 0.3939 on 6 and 1 DF, p-value: 0.8378
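The coefficients of this fit appear to be on a different scale from those of the Taguchi fit. The intercept, 6.125, equals the mean of the eight RV values, which is what a -1/+1 contrast coding produces; under that reading, which is an inference from the printed numbers rather than something the output states, each coefficient is half the difference between the two level means. The sketch below checks this with the runs typed in from the design and RV tables above (design-row order).

```r
# Fractional factorial runs (rows 1..8 of the design) with sampled responses
frac <- data.frame(
  gender     = c(1, 1, 0, 1, 0, 0, 0, 1),
  cigardaily = c(0, 1, 1, 1, 1, 0, 0, 0),
  low_alc    = c(1, 0, 0, 1, 1, 0, 1, 0),
  high_alc   = c(0, 1, 0, 1, 0, 1, 1, 0),
  low_drug   = c(1, 0, 1, 1, 0, 1, 0, 0),
  high_drug  = c(0, 0, 0, 1, 1, 1, 0, 1),
  RV         = c(9, 6, 0, 7, 8, 7, 6, 6)
)

# Same data with the factors recoded from 0/1 to -1/+1
pm <- frac
pm[1:6] <- 2 * frac[1:6] - 1

fit_pm <- lm(RV ~ ., data = pm)    # -1/+1 coding: half-effects
fit_01 <- lm(RV ~ ., data = frac)  # 0/1 coding: full level differences

print(round(rbind(pm_coding = coef(fit_pm), dummy_coding = coef(fit_01)), 3))
```

The -1/+1 row reproduces the coefficients printed above, and the 0/1 row is twice as large for every factor. When the two reduced designs are compared, the fractional factorial coefficients should therefore be doubled before being set against the Taguchi ones.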
To interpret both models, please note that the parameters are estimated for the high, or (+), value of each factor. The high values correspond to:
Gender: Female
CigarDaily: Yes, they consume daily
Low_Alc: Low level of Alcohol, compared to none
High_Alc: High level of Alcohol, compared to low
Low_Drug: Low level of Drug Consumption (only Marijuana), compared to none
High_Drug: High level of Drug Consumption (other illegal drugs) as compared to low drug consumption
From the diagnostic plots of both designs, it can be observed that there is some systematic effect not captured by the model, that the residuals do not follow a normal distribution, and that there seems to be an outlier in the fractional factorial model (one observation has a depression value of 0). The main reasons are that the data are ordinal (from 0 to 9) and that the number of observations is so small, leaving a single residual degree of freedom in each fit, that the results are unlikely to represent reality.
It is worth noting that the previous results are based on randomly selected observations, and that much more robust designs can be constructed with the Taguchi functions when broader knowledge of the subject is available.
Nevertheless, the Taguchi design came closer, as it reflects more closely the results of the regression model constructed from all the observations in the dataset:
##
## Call:
## lm.default(formula = proj3$depression ~ as.factor(proj3$gender) +
## as.factor(proj3$cigardaily) + as.factor(proj3$low_alc) +
## as.factor(proj3$high_alc) + as.factor(proj3$low_drug) + as.factor(proj3$high_drug))
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6688 -1.1135 0.5973 1.6318 3.7026
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.36821 0.08201 65.461 < 2e-16 ***
## as.factor(proj3$gender)female 1.03445 0.08885 11.643 < 2e-16 ***
## as.factor(proj3$cigardaily)yes 0.36497 0.13866 2.632 0.00853 **
## as.factor(proj3$low_alc)1 -0.07081 0.10749 -0.659 0.51011
## as.factor(proj3$high_alc)1 0.19029 0.22656 0.840 0.40102
## as.factor(proj3$low_drug)1 0.36052 0.14371 2.509 0.01217 *
## as.factor(proj3$high_drug)1 0.71089 0.09922 7.165 9.63e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.374 on 3177 degrees of freedom
## Multiple R-squared: 0.06763, Adjusted R-squared: 0.06587
## F-statistic: 38.41 on 6 and 3177 DF, p-value: < 2.2e-16
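One way to make the comparison between the three fits concrete is to line up their coefficients. The sketch below copies the estimates from the three summaries above. It assumes that the fractional factorial coefficients are on a -1/+1 contrast scale and doubles them to make them comparable with the other two fits (an assumption based on the printed intercept equalling the mean response); the two summary measures, Euclidean distance and sign agreement, are illustrative choices and not part of the original analysis.

```r
# Coefficients (excluding intercepts) copied from the three model summaries
coefs <- rbind(
  full    = c(1.03445, 0.36497, -0.07081, 0.19029, 0.36052, 0.71089),
  taguchi = c(1.000, 1.000, 1.500, -1.000, 0.500, -0.500),
  frac_x2 = 2 * c(0.875, -0.875, 1.375, 0.375, -0.375, 0.875)  # assumed rescaling
)
colnames(coefs) <- c("gender", "cigardaily", "low_alc",
                     "high_alc", "low_drug", "high_drug")

# Euclidean distance of each reduced design from the full-data model
dist_to_full <- apply(coefs[-1, ], 1, function(b) {
  sqrt(sum((b - coefs["full", ])^2))
})

# Number of coefficients (out of 6) whose sign agrees with the full-data model
sign_match <- apply(coefs[-1, ], 1, function(b) {
  sum(sign(b) == sign(coefs["full", ]))
})

print(round(coefs, 3))
print(round(dist_to_full, 3))
print(sign_match)
```

Both reduced designs rest on eight randomly sampled observations, so any ranking from a comparison like this can change with a different random seed.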
For future studies, the design can be modified more explicitly to yield the desired or “optimal” results if more information is available. Limited knowledge of how the original depression study was conducted makes the experimental design challenging, so further information is needed in this respect.
This report analyzed how depression relates to different segments of the population and to addictive habits. The response variable was the depression index of the youth population, and the factors were gender, daily cigar use, and alcohol and drug levels. Using Taguchi designs, a more robust design was generated, as it aims specifically at evaluating main effects and some interaction effects. Desired parameters will lead to “optimal” results, and Taguchi designs are able to capture these outcomes at a much reduced experimental cost.
The results of the Taguchi design were compared with the fractional factorial design results. Although the data were selected randomly and the experimental setup was not perfect, the Taguchi design aimed to capture some of the trends observed in the overall database. For instance, gender is found to influence depression positively in both designs (a coefficient of 1.000 in the Taguchi fit and 0.875 in the fractional factorial fit), whereas daily cigar use has a positive coefficient in the Taguchi fit (1.000), as in the full-data model (0.365), but a negative one in the fractional factorial fit (-0.875).
From these results, it seems that females are more likely to report depression than males. With respect to the addiction-related factors, the reduced designs were not able to capture any significant effects. Only the Taguchi design suggests that a high alcohol level reduces depression, which is unlikely in reality. The model fitted to the whole database allows more refined conclusions. Youths who smoke daily appear more susceptible to depression, and drug use is associated with higher depression at both low and high levels. For alcohol, neither coefficient is statistically significant in the full model (p = 0.51 and p = 0.40), although the estimate for the high level is positive.
Complete code is provided below:
#loading data from TSV file
crimedata <- read.delim("CrimeData.tsv")
str(crimedata)
#creating new dataframe with variables selected for this study
crime1 = ifelse(crimedata$ANYCRIME==0, 0,
ifelse(crimedata$ANYCRIME ==1, 1,NA))
# probcrime is not defined anywhere in this listing; it must already exist in the workspace
newdata <- data.frame(id = crimedata$CASEID, IRSEX = crimedata$IRSEX,
                      CATAG2 = crimedata$CATAG2, AGEYOUNG = crimedata$AGE2,
                      INCOME_R = crimedata$INCOME_R, EDU_DUMMY = crimedata$EDU_DUMMY,
                      EDUC2 = crimedata$IREDUC2, BINGEHVY = crimedata$BINGEHVY,
                      CDCGMO = crimedata$CDCGMO, CDNOCGMO = crimedata$CDNOCGMO,
                      CDUFLAG = crimedata$CDUFLAG, SUMFLAG = crimedata$SUMFLAG,
                      MJOFLAG = crimedata$MJOFLAG, IEMFLAG = crimedata$IEMFLAG,
                      ANYCRIME = crime1, CRIME2 = probcrime,
                      ARREST = crimedata$NUMARREST,
                      DEPRESS = crimedata$YODEPRESSIONINDEX)
save(newdata,file="data2.RData")
load("data2.RData")
head(newdata)
tail(newdata)
summary(newdata$IRSEX)
summary(newdata$CATAG2)
summary(newdata$INCOME_R)
summary(newdata$EDU_DUMMY)
summary(newdata$EDUC2)    # IREDUC2 was renamed to EDUC2 when newdata was built
summary(newdata$DEPRESS)  # the youth depression index is stored as DEPRESS
gen_levels <- c(male = 1, female = 2)
age_levels <- c(child = 1, young = 2)
edu_levels <- c(elem = 1, hs = 2)
alc_levels <- c(none = 1, low =2, high=3)
drug_levels <- c(none = 1, low =2, high=3)
cig_levels <- c(no=1, yes=2)
age_rel<- ifelse(newdata$AGEYOUNG==1, 1,
ifelse(newdata$AGEYOUNG==2, 1,
ifelse(newdata$AGEYOUNG==3, 1, 2)))
edu_rel<- ifelse(newdata$EDUC2==1, 1, 2)
cigarrel<- ifelse(newdata$CDUFLAG==0, 1, 2)
alc_rel<- ifelse(newdata$BINGEHVY==4, 1,
ifelse(newdata$BINGEHVY==1, 3, 2))
drugrel<- ifelse(newdata$SUMFLAG==0, 1,
ifelse(newdata$MJOFLAG==1, 2, 3))
studydata <- data.frame(
  gender  = factor(newdata$IRSEX, levels = gen_levels, labels = names(gen_levels)),
  age     = factor(age_rel, levels = age_levels, labels = names(age_levels)),
  educ    = factor(edu_rel, levels = edu_levels, labels = names(edu_levels)),
  cigar   = factor(cigarrel, levels = cig_levels, labels = names(cig_levels)),
  alcohol = factor(alc_rel, levels = alc_levels, labels = names(alc_levels)),
  druguse = factor(drugrel, levels = drug_levels, labels = names(drug_levels)),
  crime = newdata$ANYCRIME, prcrime = newdata$CRIME2,
  arrests = newdata$ARREST, depression = newdata$DEPRESS)
data2proj3 = subset(studydata, studydata$depression>=0)
head(data2proj3)
save(data2proj3, file="data30.RData")
head(data2proj3)
boxplot(data2proj3$depression ~ data2proj3$alcohol, main = "Depression Index based on Alcohol", xlab= "Level of Alcohol Use", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$druguse, main = "Depression Index based on Drug Use", xlab= "Level of Drug Use", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$gender, main = "Depression Index based on Gender", xlab= "Gender", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$age, main = "Depression Index based on Age", xlab= "Age of Youth", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$educ , main = "Depression Index based on Education", xlab= "Education", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$cigar, main = "Depression Index based on Cigar Use", xlab= "Cigar Daily Use?", ylab ="depression index")
proj3 <- data.frame(depression = data2proj3$depression, gender = data2proj3$gender, cigardaily = data2proj3$cigar, alcohol = data2proj3$alcohol, drugs = data2proj3$druguse, low_alc = NA, high_alc = NA, low_drug = NA, high_drug = NA)
proj3$low_alc[proj3$alcohol == "none"] <- 0
proj3$low_alc[proj3$alcohol == "low"] <- 1
proj3$low_alc[proj3$alcohol == "high"] <- 0
proj3$high_alc[proj3$alcohol == "none"] <- 0
proj3$high_alc[proj3$alcohol == "low"] <- 0
proj3$high_alc[proj3$alcohol == "high"] <- 1
proj3$low_drug[proj3$drugs == "none"] <- 0
proj3$low_drug[proj3$drugs == "low"] <- 1
proj3$low_drug[proj3$drugs == "high"] <- 0
proj3$high_drug[proj3$drugs == "none"] <- 0
proj3$high_drug[proj3$drugs == "low"] <- 0
proj3$high_drug[proj3$drugs == "high"] <- 1
head(proj3)
save(proj3,file="data31.RData")
head(proj3)
library(FrF2)  # provides FrF2()
runs <- 2^(6-3)
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
frac_design <- FrF2(runs, factor.names = nam2 , default.levels = c("0","1"))
summary(frac_design)
frac11 <-data.frame(frac_design)
run1 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==0))
run2 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==1))
run3 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==0))
run4 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==1))
run5 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==0))
run6 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==1))
run7 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==0))
run8 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==1))
set.seed(1123)
rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))
frac11["RV"]=NA
for (i in 1:8){ frac11$RV[i] = rv[i] }
rand_fd <- frac11[sample(nrow(frac11)),]
rand_fd
#Full Factorial Design
expand.grid(gender = c(0,1), cigardaily = c(0,1), low_alc = c(0,1), high_alc = c(0,1), low_drug = c(0,1), high_drug = c(0,1))
rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))
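# frac2 is never created in this listing; this block fails unless it already exists in the workspace
frac2["RV"]=NA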
for (i in 1:8){ frac2$RV[i] = rv[i] }
rand_fd <- frac2[sample(nrow(frac2)),]
rand_fd
library(qualityTools)  # provides taguchiChoose() and taguchiDesign()
taguchiChoose(factors1 = 2, factors2 = 2, level1 = 3, level2 = 2)
taguchiDesign("L36_2_3_a")
taguchiChoose(factors1 = 6, level1 = 2)
tag2_design <- taguchiDesign("L8_2")
tag2_design
taguchiChoose(factors1 = 6, level1 = 2)
t1_design <- as.data.frame(taguchiDesign("L8_2"))
t1_design
t2_design <- t1_design[,-c(1,2,3,10,11)]
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
colnames(t2_design) <- nam2
t2_design
run1 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==0))
run2 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==1))
run3 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==1))
run4 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==1))
run5 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==1))
run6 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==0))
run7 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==0))
run8 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==0))
set.seed(1234)
rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))
t2_design["depression"]=NA
for (i in 1:8){ t2_design$depression[i] = rv[i] }
set.seed(1234)
rand_fd <- t2_design[sample(nrow(t2_design)),]
rand_fd
model2 <- lm(t2_design$depression~ as.factor(t2_design$gender) + as.factor(t2_design$cigardaily) + as.factor(t2_design$low_alc) + as.factor(t2_design$high_alc) + as.factor(t2_design$low_drug) + as.factor(t2_design$high_drug))
summary(model2)
par(mfrow=c(2,2))
plot(model2)
model1 <- lm(frac11$RV~ as.factor(frac11$gender) + as.factor(frac11$cigardaily) + as.factor(frac11$low_alc) + as.factor(frac11$high_alc) + as.factor(frac11$low_drug) + as.factor(frac11$high_drug))
summary(model1)
par(mfrow=c(2,2))
plot(model1)