Introduction

Taguchi Designs are experimental designs intended to ensure good performance of products or processes. They are part of the so-called “Taguchi Methods”, proposed by the Japanese engineer Genichi Taguchi, who was concerned with process variation caused not only by controllable factors but also by uncontrollable, or nuisance, factors. These methods are similar to fractional factorial designs, but they extend the concept toward robust design, that is, designs which ensure that the process stays within specified conditions [1].

Description of Data

The data used were part of a study entitled “Gender, Mental Illness and Crime in the United States”, developed by Melissa Thompson and distributed by the Inter-university Consortium for Political and Social Research [2]. This database is composed of 55,602 respondents from the 2004 National Survey on Drug Use and Health (NSDUH). Respondents were members of United States households aged 12 and above. The database includes information on the use of illicit drugs, alcohol and tobacco, as well as on criminal activity, depression and other factors. The researcher also added derived variables, such as depression indices for both adults and youth.

Purpose of this Study

The purpose of this study is to understand the influence of gender and other population characteristics (e.g. age, education, income) and of addiction level (use of drugs, tobacco and alcohol) on mental illness, particularly depression. Thompson’s study also covers criminal activity, but this study is limited to the depression index calculated for youth and adults.

A Taguchi Design is introduced to identify how depression in the youth population differs between population groups and between levels of addiction. ANOVA is used to test these differences, and a fitted model is used to describe the level of depression in the youth population across the United States.

DataSet: Selection of IVs and Response Variables

Data Importing and Cleaning

To start the analysis, the original dataset was imported, which included a total of 55,602 observations and 3,011 variables.

#loading data from TSV file
crimedata <- read.delim("CrimeData.tsv")
# keep the 0/1 values of ANYCRIME and set every other code to NA
crime1 <- ifelse(crimedata$ANYCRIME == 0, 0, ifelse(crimedata$ANYCRIME == 1, 1, NA))
# keep only the variables selected for this study
newdata <- data.frame(IRSEX = crimedata$IRSEX, CATAG2 = crimedata$CATAG2, AGEYOUNG = crimedata$AGE2, INCOME_R = crimedata$INCOME_R, EDU_DUMMY = crimedata$EDU_DUMMY, EDUC2 = crimedata$IREDUC2, BINGEHVY = crimedata$BINGEHVY, CDCGMO = crimedata$CDCGMO, CDNOCGMO = crimedata$CDNOCGMO, CDUFLAG = crimedata$CDUFLAG, SUMFLAG = crimedata$SUMFLAG, MJOFLAG = crimedata$MJOFLAG, IEMFLAG = crimedata$IEMFLAG, ANYCRIME = crime1, ARREST = crimedata$NUMARREST, DEPRESS = crimedata$YODEPRESSIONINDEX)

In accordance with the purpose of the study, only 19 of the original 3,011 variables were retained; they are contained in the newdata data frame.

load("data2.RData")
str(newdata)
## 'data.frame':    55602 obs. of  19 variables:
##  $ GEND        : int  1 1 1 1 2 1 2 2 2 1 ...
##  $ CATAG2      : int  2 2 3 2 3 3 3 2 3 3 ...
##  $ AGEYOUNG    : int  11 9 17 10 15 17 15 8 16 14 ...
##  $ INCOME_R    : int  1 1 3 1 1 2 3 1 4 3 ...
##  $ EDUC        : int  1 1 1 1 0 0 1 0 1 0 ...
##  $ EDUC2       : int  11 10 10 10 8 8 11 8 11 8 ...
##  $ ALCOHOL     : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ DRUG        : int  1 1 0 1 1 0 0 1 0 0 ...
##  $ CIGARDAILY  : int  0 0 1 1 0 1 1 1 1 0 ...
##  $ PSYFLAG     : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ BINGEHVY    : int  1 3 4 1 4 3 3 4 4 4 ...
##  $ CDCGMO      : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ CDNOCGMO    : int  0 0 1 1 0 1 1 0 1 0 ...
##  $ MJOFLAG     : int  0 1 0 0 1 0 0 1 0 0 ...
##  $ IEMFLAG     : int  1 0 0 1 0 0 0 0 0 0 ...
##  $ ANYCRIME    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ARREST      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ DEPRESSALL  : int  -1 -1 -1 -1 -1 5 -1 -1 -1 -1 ...
##  $ DEPRESSYOUTH: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...

Candidate factor variables were created for the demographic groups (gender, age, education) and for the addiction-related subgroups (cigar, druguse and alcohol). Candidate response variables were the youth depression index, DEPRESS, and some criminal-activity variables found in the dataset.

##     gender   age educ inc cigar alcohol alcohol2 druguse drug psych crime
## 56  female young high low    no    none      yes    high  yes   yes     1
## 78    male young high low    no    none       no    high  yes   yes     0
## 109   male young high low    no    none      yes    high  yes   yes     0
## 139   male young high low    no    none       no    none   no    no     1
## 143 female young high low    no    none       no    none   no    no     0
## 169   male young high low    no    none       no    high  yes   yes     0
##     arrests depression depressionall
## 56        0          7            -1
## 78        0          3            -1
## 109       0          1            -1
## 139       0          1            -1
## 143       0          8            -1
## 169       0          9            -1

Response Variable

This study had several possible directions: analyzing the criminal outcomes of the different population groups according to their addiction levels and demographics, or analyzing the depression index. Because the study focuses on the youth population, the main interest was to understand how addiction levels affect the depression of these young respondents, so the variable depression was selected as the response. Its distribution is shown in the following histogram.

Factors and Levels

As observed from the dataset, some variables have two levels, like gender, where respondents were either male or female, while others have multiple levels. Education was divided in two for the youth population: elementary school education only (elem) and high school education (hs). Age was handled similarly: a child (13 years or younger) was distinguished from a young respondent (14-17 years). Among the addiction-related variables, daily tobacco use (cigar) was also binary: yes if the respondent smokes daily and no otherwise. The alcohol variable had four levels in the original study and was collapsed into three: none, low and high. The druguse variable also has three levels: none, low (marijuana use only) and high (other drugs consumed).
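The recoding rules for the two multi-level addiction variables can be illustrated on a few made-up codes. The toy vectors below are hypothetical; the mapping (BINGEHVY 4 to none, 1 to high, any other code to low; SUMFLAG 0 to none, MJOFLAG 1 to low, otherwise high) is copied from the appendix code, and the meaning of the raw codes is taken from the text above rather than checked against the NSDUH codebook. Note that, as written, a respondent with MJOFLAG = 1 is coded low even if other drugs were also used.

```r
# Hypothetical raw codes for six respondents (not taken from the dataset)
bingehvy <- c(4, 1, 2, 3, 4, 1)
sumflag  <- c(0, 1, 1, 0, 1, 1)
mjoflag  <- c(0, 1, 0, 0, 0, 1)

# Alcohol: BINGEHVY 4 -> none (1), 1 -> high (3), any other code -> low (2)
alc_code <- ifelse(bingehvy == 4, 1, ifelse(bingehvy == 1, 3, 2))
alcohol  <- factor(alc_code, levels = 1:3, labels = c("none", "low", "high"))

# Drugs: SUMFLAG 0 -> none (1), MJOFLAG 1 -> low (2), otherwise -> high (3)
drug_code <- ifelse(sumflag == 0, 1, ifelse(mjoflag == 1, 2, 3))
druguse   <- factor(drug_code, levels = 1:3, labels = c("none", "low", "high"))

data.frame(bingehvy, alcohol, sumflag, mjoflag, druguse)
```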

After an initial evaluation of the dataset, two 2-level factors (gender and cigar) and two 3-level factors (druguse and alcohol) were selected.

Experimental Design

Fractional Factorial Design

If we were to analyze all four factors under homogeneous conditions, a full factorial design would be required; in this case a \(2^2 \times 3^2\) design with 36 treatment combinations. This design can, however, be converted into a \(2^k\) design by replacing each 3-level factor with two 2-level factors. For our experiment, alcohol was split into low_alc (1 for low consumption, 0 otherwise) and high_alc (1 for high consumption, 0 otherwise), so that “none” is the baseline at which both indicators are 0. The same treatment was applied to the druguse variable (low_drug and high_drug). Because the two indicators share the same baseline, low_alc and high_alc can never both equal 1 for a respondent, and likewise for the drug indicators.

##   depression gender cigardaily alcohol drugs low_alc high_alc low_drug
## 1          7 female         no    none  high       0        0        0
## 2          3   male         no    none  high       0        0        0
## 3          1   male         no    none  high       0        0        0
## 4          1   male         no    none  none       0        0        0
## 5          8 female         no    none  none       0        0        0
## 6          9   male         no    none  high       0        0        0
##   high_drug
## 1         1
## 2         1
## 3         1
## 4         0
## 5         0
## 6         1
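The indicator coding behind the low_alc and high_alc columns can be sketched on a hypothetical five-respondent vector. It mirrors the assignments to proj3 in the appendix and shows that the two columns coincide with R's default treatment contrasts for a factor whose first level is “none”.

```r
# Hypothetical alcohol levels for five respondents
alcohol <- factor(c("none", "low", "high", "low", "none"),
                  levels = c("none", "low", "high"))

# Indicator columns with "none" as baseline, as in the proj3 assignments
low_alc  <- as.integer(alcohol == "low")
high_alc <- as.integer(alcohol == "high")
data.frame(alcohol, low_alc, high_alc)

# Both indicators equal 1 for nobody
any(low_alc == 1 & high_alc == 1)   # FALSE

# Identical to R's default treatment contrasts for this factor
mm <- model.matrix(~ alcohol)
all(mm[, "alcohollow"] == low_alc & mm[, "alcoholhigh"] == high_alc)   # TRUE
```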

Thus, a full factorial experiment on these six 2-level factors would require \(2^6 = 64\) runs, as shown:

##    gender cigardaily low_alc high_alc low_drug high_drug
## 1       0          0       0        0        0         0
## 2       1          0       0        0        0         0
## 3       0          1       0        0        0         0
## 4       1          1       0        0        0         0
## 5       0          0       1        0        0         0
## 6       1          0       1        0        0         0
## 7       0          1       1        0        0         0
## 8       1          1       1        0        0         0
## 9       0          0       0        1        0         0
## 10      1          0       0        1        0         0
## 11      0          1       0        1        0         0
## 12      1          1       0        1        0         0
## 13      0          0       1        1        0         0
## 14      1          0       1        1        0         0
## 15      0          1       1        1        0         0
## 16      1          1       1        1        0         0
## 17      0          0       0        0        1         0
## 18      1          0       0        0        1         0
## 19      0          1       0        0        1         0
## 20      1          1       0        0        1         0
## 21      0          0       1        0        1         0
## 22      1          0       1        0        1         0
## 23      0          1       1        0        1         0
## 24      1          1       1        0        1         0
## 25      0          0       0        1        1         0
## 26      1          0       0        1        1         0
## 27      0          1       0        1        1         0
## 28      1          1       0        1        1         0
## 29      0          0       1        1        1         0
## 30      1          0       1        1        1         0
## 31      0          1       1        1        1         0
## 32      1          1       1        1        1         0
## 33      0          0       0        0        0         1
## 34      1          0       0        0        0         1
## 35      0          1       0        0        0         1
## 36      1          1       0        0        0         1
## 37      0          0       1        0        0         1
## 38      1          0       1        0        0         1
## 39      0          1       1        0        0         1
## 40      1          1       1        0        0         1
## 41      0          0       0        1        0         1
## 42      1          0       0        1        0         1
## 43      0          1       0        1        0         1
## 44      1          1       0        1        0         1
## 45      0          0       1        1        0         1
## 46      1          0       1        1        0         1
## 47      0          1       1        1        0         1
## 48      1          1       1        1        0         1
## 49      0          0       0        0        1         1
## 50      1          0       0        0        1         1
## 51      0          1       0        0        1         1
## 52      1          1       0        0        1         1
## 53      0          0       1        0        1         1
## 54      1          0       1        0        1         1
## 55      0          1       1        0        1         1
## 56      1          1       1        0        1         1
## 57      0          0       0        1        1         1
## 58      1          0       0        1        1         1
## 59      0          1       0        1        1         1
## 60      1          1       0        1        1         1
## 61      0          0       1        1        1         1
## 62      1          0       1        1        1         1
## 63      0          1       1        1        1         1
## 64      1          1       1        1        1         1
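The 64-run grid above comes from expand.grid (see the appendix). The short sketch below also counts how many of these runs can actually be matched to respondents, assuming, as in the proj3 coding in the appendix, that the low and high indicators of the same variable cannot both be 1.

```r
# Full 2^6 factorial on the six 0/1 columns (same call as in the appendix)
full <- expand.grid(gender = 0:1, cigardaily = 0:1,
                    low_alc = 0:1, high_alc = 0:1,
                    low_drug = 0:1, high_drug = 0:1)
nrow(full)      # 64 runs

# Drop combinations that cannot occur with indicator coding
feasible <- subset(full, !(low_alc == 1 & high_alc == 1) &
                         !(low_drug == 1 & high_drug == 1))
nrow(feasible)  # 36 runs, i.e. the original 2^2 x 3^2 design
```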

The resulting \(2^{6-3}\) fractional factorial design, with 8 runs, was the following:

## Call:
## FrF2(runs, factor.names = nam2, default.levels = c("0", "1"))
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   gender cigardaily low_alc high_alc low_drug high_drug
## 1      0          0       0        0        0         0
## 2      1          1       1        1        1         1
## 
## Design generating information:
## $legend
## [1] A=gender     B=cigardaily C=low_alc    D=high_alc   E=low_drug  
## [6] F=high_drug 
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   gender cigardaily low_alc high_alc low_drug high_drug
## 1      1          0       1        0        1         0
## 2      1          1       0        1        0         0
## 3      0          1       0        0        1         0
## 4      1          1       1        1        1         1
## 5      0          1       1        0        0         1
## 6      0          0       0        1        1         1
## 7      0          0       1        1        0         0
## 8      1          0       0        0        0         1
## class=design, type= FrF2
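The design printed above can be reproduced without FrF2 from the reported generators. The base-R sketch below builds the basic \(2^3\) design in -1/+1 coding, adds D = AB, E = AC and F = BC, and checks two properties listed in the output: the columns are mutually orthogonal, and main effects are aliased with two-factor interactions (for example A = BD, and AF = BE = CD). Rows come out in standard order, not in the randomized order printed by FrF2.

```r
# Basic 2^3 full factorial in -1/+1 coding (standard order)
basic <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Add the three generated columns reported by FrF2: D = AB, E = AC, F = BC
design <- transform(basic, D = A * B, E = A * C, F = B * C)

# Orthogonality: 8 on the diagonal, 0 for every pair of distinct columns
X <- as.matrix(design)
crossprod(X)

# Resolution III aliasing: A is indistinguishable from the B:D interaction,
# and the interactions AF, BE and CD share one column
all(design$A == design$B * design$D)                  # TRUE
all(design$A * design$F == design$B * design$E &
    design$B * design$E == design$C * design$D)       # TRUE

# Same runs in the 0/1 labels used by FrF2
design01 <- (design + 1) / 2
design01
```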

After attaching one sampled depression value (RV) to each run and randomizing the run order, the fractional factorial experiment was as follows:

##   gender cigardaily low_alc high_alc low_drug high_drug RV
## 8      1          0       0        0        0         1  6
## 3      0          1       0        0        1         0  0
## 4      1          1       1        1        1         1  7
## 6      0          0       0        1        1         1  7
## 2      1          1       0        1        0         0  6
## 5      0          1       1        0        0         1  8
## 7      0          0       1        1        0         0  6
## 1      1          0       1        0        1         0  9
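Attaching one sampled response to each run and shuffling the run order, as done with rv and rand_fd in the appendix, can be sketched as follows. The pools of depression values are hypothetical, and the helper sample_one() is an addition of this sketch: it guards against a known base-R pitfall, since sample(x, 1) draws from 1:x, not from x, when x is a single number.

```r
set.seed(1123)

# Hypothetical pools of depression scores, one pool per design run
pools <- list(c(6, 2, 9), c(0, 4), 7, c(7, 5, 3),
              c(6, 8), c(8, 1, 2), c(6, 6), c(9, 3))

# Draw one element of x; sample(7, 1) would return a value from 1:7 instead
sample_one <- function(x) {
  if (length(x) == 0) return(NA_real_)   # no matching respondents
  x[sample.int(length(x), 1)]
}

runs <- data.frame(run = 1:8)
runs$RV <- vapply(pools, sample_one, numeric(1))

# Randomize the order in which the runs are listed
runs[sample(nrow(runs)), ]
```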

Taguchi Design

Using Taguchi's “Orthogonal Arrays”, we can select a design of the same size as the previous one (only 8 runs).

Choosing the Best Design

As a first step, since we have two 2-level and two 3-level factors, we use the taguchiChoose function from the qualityTools package to select the best design.

taguchiChoose(factors1 = 2, factors2 = 2, level1 = 3, level2 = 2)
## 2 factors on 3 levels and 2 factors on 2 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L36_2_3_a L36_2_3_b
## 
## Use taguchiDesign("L36_2_3_a") or different to create a taguchi design object
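A standard counting argument explains why 36 runs are suggested here. A main-effects design needs at least \(1 + \sum_i (s_i - 1)\) runs, where \(s_i\) is the number of levels of factor \(i\), and a strength-2 orthogonal array needs a run size that is a multiple of \(s_i s_j\) for every pair of factors. The sketch below computes this lower bound; it is not necessarily how taguchiChoose works internally, and a bound by itself does not guarantee that an array of that size exists (here the L36 and L8 arrays do).

```r
# Greatest common divisor and least common multiple of two integers
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
lcm <- function(a, b) a / gcd(a, b) * b

# Lower bound on the run size of a strength-2 orthogonal array
min_runs <- function(levels) {
  # degrees of freedom for the mean plus all main effects
  dof <- 1 + sum(levels - 1)
  # N must be a multiple of s_i * s_j for every pair of factors
  pairs <- combn(levels, 2)
  m <- Reduce(lcm, unique(pairs[1, ] * pairs[2, ]))
  # smallest multiple of m that still covers the degrees of freedom
  m * ceiling(dof / m)
}

min_runs(c(2, 2, 3, 3))   # 36: two 2-level and two 3-level factors
min_runs(rep(2, 6))       # 8: six 2-level indicator factors
```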

As recommended, the L36_2_3_a design, composed of 36 runs, can be created with taguchiDesign("L36_2_3_a"). By recoding the two 3-level factors into 2-level indicators, as was done for the fractional factorial design, the problem reduces to six 2-level factors, which fit in the 8-run L8_2 array created with taguchiDesign("L8_2"):

t1_design
##   StandOrder RunOrder Replicate A B C D E F G  y
## 1          4        1         1 1 2 2 2 2 1 1 NA
## 2          3        2         1 1 2 2 1 1 2 2 NA
## 3          2        3         1 1 1 1 2 2 2 2 NA
## 4          6        4         1 2 1 2 2 1 2 1 NA
## 5          5        5         1 2 1 2 1 2 1 2 NA
## 6          1        6         1 1 1 1 1 1 1 1 NA
## 7          7        7         1 2 2 1 1 2 2 1 NA
## 8          8        8         1 2 2 1 2 1 1 2 NA
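# drop StandOrder, RunOrder, Replicate, the unused column G and the empty y column
t2_design <- t1_design[,-c(1,2,3,10,11)]
# rename the remaining columns A-F after the study factors
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
colnames(t2_design) <- nam2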
t2_design
##   gender cigardaily low_alc high_alc low_drug high_drug
## 1      1          2       2        2        2         1
## 2      1          2       2        1        1         2
## 3      1          1       1        2        2         2
## 4      2          1       2        2        1         2
## 5      2          1       2        1        2         1
## 6      1          1       1        1        1         1
## 7      2          2       1        1        2         2
## 8      2          2       1        2        1         1
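The L8_2 array returned by taguchiDesign can be rebuilt in base R from three basic columns and their interactions, which makes its orthogonality explicit. The sketch assumes the standard column layout of the L8 array (A, B, AB, C, AC, BC, ABC); this layout matches the A to G columns printed above once the rows are sorted by StandOrder, and t2_design uses the first six columns.

```r
# Three basic 2-level columns in Taguchi 1/2 coding, standard order
basic <- expand.grid(C = 1:2, B = 1:2, A = 1:2)[, c("A", "B", "C")]

# Interaction column in 1/2 coding: level 1 when the number of 2s is even
xor2 <- function(...) {
  cols <- list(...)
  Reduce(`+`, lapply(cols, function(x) x - 1)) %% 2 + 1
}

L8 <- with(basic, data.frame(A, B, AB = xor2(A, B), C,
                             AC = xor2(A, C), BC = xor2(B, C),
                             ABC = xor2(A, B, C)))

# Orthogonality: each pair of columns shows every level combination twice
pairs <- combn(ncol(L8), 2)
balanced <- apply(pairs, 2, function(p) all(table(L8[[p[1]]], L8[[p[2]]]) == 2))
all(balanced)   # TRUE

# First six columns relabelled as in t2_design (rows in standard order)
t2_like <- setNames(L8[, 1:6], c("gender", "cigardaily", "low_alc",
                                 "high_alc", "low_drug", "high_drug"))
t2_like
```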

Adding Response Variable

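The response variable was added in the same way as for the Fractional Factorial Design: for each run, one depression value was sampled from the respondents matching the run's settings of gender, cigardaily, low_alc and low_drug, and the run order was then randomized. The result of the randomized experiment is as follows: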

##   gender cigardaily low_alc high_alc low_drug high_drug depression
## 1      1          2       2        2        2         1          9
## 5      2          1       2        1        2         1          8
## 4      2          1       2        2        1         2          8
## 6      1          1       1        1        1         1          7
## 7      2          2       1        1        2         2          9
## 2      1          2       2        1        1         2          7
## 8      2          2       1        2        1         1          6
## 3      1          1       1        2        2         2          4
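Because the array is orthogonal and balanced, each main effect can be read directly from the table above as the mean depression at level 2 minus the mean at level 1. The sketch re-enters the eight runs (in t2_design row order) and checks that these differences equal the lm() coefficients; they reproduce the estimates of the fitted Taguchi model reported in the next section (1.0, 1.0, 1.5, -1.0, 0.5 and -0.5, with intercept 6).

```r
# Taguchi runs (t2_design rows 1-8) and the depression value drawn for each,
# copied from the tables above
tg <- data.frame(
  gender     = c(1, 1, 1, 2, 2, 1, 2, 2),
  cigardaily = c(2, 2, 1, 1, 1, 1, 2, 2),
  low_alc    = c(2, 2, 1, 2, 2, 1, 1, 1),
  high_alc   = c(2, 1, 2, 2, 1, 1, 1, 2),
  low_drug   = c(2, 1, 2, 1, 2, 1, 2, 1),
  high_drug  = c(1, 2, 2, 2, 1, 1, 2, 1),
  depression = c(9, 7, 4, 8, 8, 7, 9, 6)
)
factors <- setdiff(names(tg), "depression")

# Main effect of each factor: mean at level 2 minus mean at level 1
effects <- sapply(factors, function(f) {
  mean(tg$depression[tg[[f]] == 2]) - mean(tg$depression[tg[[f]] == 1])
})
effects

# Same numbers from lm(), because the columns are orthogonal and balanced
tg[factors] <- lapply(tg[factors], factor)
fit <- lm(depression ~ ., data = tg)
coef(fit)
```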

Comparing Models

The aim of this study was to explore how Taguchi Designs can be used to find optimal parameters in an experimental study. Here, we were interested in an experiment that would help us understand the main effects and “some” interaction effects of the four factors described above. The Taguchi Design is compared below with the Fractional Factorial Design presented earlier.

Based on the Taguchi Design, we have our fitted model:

## 
## Call:
## lm.default(formula = t2_design$depression ~ as.factor(t2_design$gender) + 
##     as.factor(t2_design$cigardaily) + as.factor(t2_design$low_alc) + 
##     as.factor(t2_design$high_alc) + as.factor(t2_design$low_drug) + 
##     as.factor(t2_design$high_drug))
## 
## Residuals:
##  1  2  3  4  5  6  7  8 
##  1 -1 -1  1 -1  1  1 -1 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         6.000      2.646   2.268    0.264
## as.factor(t2_design$gender)2        1.000      2.000   0.500    0.705
## as.factor(t2_design$cigardaily)2    1.000      2.000   0.500    0.705
## as.factor(t2_design$low_alc)2       1.500      2.000   0.750    0.590
## as.factor(t2_design$high_alc)2     -1.000      2.000  -0.500    0.705
## as.factor(t2_design$low_drug)2      0.500      2.000   0.250    0.844
## as.factor(t2_design$high_drug)2    -0.500      2.000  -0.250    0.844
## 
## Residual standard error: 2.828 on 1 degrees of freedom
## Multiple R-squared:  0.5897, Adjusted R-squared:  -1.872 
## F-statistic: 0.2396 on 6 and 1 DF,  p-value: 0.9129

These results can be compared to the Fractional Factorial Design:

## 
## Call:
## lm.default(formula = frac11$RV ~ as.factor(frac11$gender) + as.factor(frac11$cigardaily) + 
##     as.factor(frac11$low_alc) + as.factor(frac11$high_alc) + 
##     as.factor(frac11$low_drug) + as.factor(frac11$high_drug))
## 
## Residuals:
##      1      2      3      4      5      6      7      8 
##  1.375  1.375 -1.375 -1.375  1.375  1.375 -1.375 -1.375 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      6.125      1.375   4.455    0.141
## as.factor(frac11$gender)1        0.875      1.375   0.636    0.639
## as.factor(frac11$cigardaily)1   -0.875      1.375  -0.636    0.639
## as.factor(frac11$low_alc)1       1.375      1.375   1.000    0.500
## as.factor(frac11$high_alc)1      0.375      1.375   0.273    0.830
## as.factor(frac11$low_drug)1     -0.375      1.375  -0.273    0.830
## as.factor(frac11$high_drug)1     0.875      1.375   0.636    0.639
## 
## Residual standard error: 3.889 on 1 degrees of freedom
## Multiple R-squared:  0.7027, Adjusted R-squared:  -1.081 
## F-statistic: 0.3939 on 6 and 1 DF,  p-value: 0.8378

To interpret both models, note that each parameter is estimated for the high (+) level of its factor relative to the low level. The high levels correspond to:

Gender: Female

CigarDaily: Yes, they consume daily

Low_Alc: Low level of Alcohol, compared to none

High_Alc: High level of Alcohol, compared to none

Low_Drug: Low level of Drug Consumption (only Marijuana), compared to none

High_Drug: High level of Drug Consumption (other illegal drugs), compared to none

The residual plots of both designs show some systematic effect not captured by the models: the residuals do not follow a normal distribution, and there seem to be some outliers in the Fractional Factorial model (one observation has a depression value of 0). This is mainly because the response is ordinal (from 0 to 9) and the number of observations is so small that the results are unlikely to represent reality.
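The systematic pattern in the residuals has a simple algebraic explanation for the Taguchi fit: with eight runs and seven parameters only one residual degree of freedom is left, so the residual vector must be a multiple of the one unused L8 column (G in the t1_design listing). The sketch re-enters the eight Taguchi runs from the tables above and checks this; the same argument applies to the fractional factorial fit, whose residuals are likewise all ±1.375.

```r
# Eight Taguchi runs and their sampled depression values (tables above)
tg <- data.frame(
  gender     = c(1, 1, 1, 2, 2, 1, 2, 2),
  cigardaily = c(2, 2, 1, 1, 1, 1, 2, 2),
  low_alc    = c(2, 2, 1, 2, 2, 1, 1, 1),
  high_alc   = c(2, 1, 2, 2, 1, 1, 1, 2),
  low_drug   = c(2, 1, 2, 1, 2, 1, 2, 1),
  high_drug  = c(1, 2, 2, 2, 1, 1, 2, 1),
  depression = c(9, 7, 4, 8, 8, 7, 9, 6)
)
# Seventh (unused) column G of the L8 array for the same rows
G <- c(1, 2, 2, 1, 2, 1, 1, 2)

fit <- lm(depression ~ factor(gender) + factor(cigardaily) + factor(low_alc) +
            factor(high_alc) + factor(low_drug) + factor(high_drug), data = tg)
df.residual(fit)   # 1

# Residuals are a constant multiple of G recoded to +1/-1
g   <- ifelse(G == 1, 1, -1)
res <- unname(residuals(fit))
k   <- sum(res * g) / sum(g^2)
all.equal(res, k * g)   # TRUE
```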

Model Validation

It is worth noting that the previous results are based on completely random selections of one respondent per run, and that much more robust designs can be constructed with the Taguchi functions when broader knowledge of the subject is available.

Nevertheless, for gender and daily cigarette use, the Taguchi design comes closer than the fractional factorial design to the regression model fitted to all the youth observations in the dataset (3,184 observations used in the fit):

## 
## Call:
## lm.default(formula = proj3$depression ~ as.factor(proj3$gender) + 
##     as.factor(proj3$cigardaily) + as.factor(proj3$low_alc) + 
##     as.factor(proj3$high_alc) + as.factor(proj3$low_drug) + as.factor(proj3$high_drug))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6688 -1.1135  0.5973  1.6318  3.7026 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     5.36821    0.08201  65.461  < 2e-16 ***
## as.factor(proj3$gender)female   1.03445    0.08885  11.643  < 2e-16 ***
## as.factor(proj3$cigardaily)yes  0.36497    0.13866   2.632  0.00853 ** 
## as.factor(proj3$low_alc)1      -0.07081    0.10749  -0.659  0.51011    
## as.factor(proj3$high_alc)1      0.19029    0.22656   0.840  0.40102    
## as.factor(proj3$low_drug)1      0.36052    0.14371   2.509  0.01217 *  
## as.factor(proj3$high_drug)1     0.71089    0.09922   7.165 9.63e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.374 on 3177 degrees of freedom
## Multiple R-squared:  0.06763,    Adjusted R-squared:  0.06587 
## F-statistic: 38.41 on 6 and 3177 DF,  p-value: < 2.2e-16

For future studies, the design can be tailored more explicitly to yield the desired, or “optimal”, results if more information is available. Limited knowledge of the depression data made the experimental design challenging, so further subject-matter knowledge is needed in this respect.

Final Remarks

This report analyzed how depression relates to different segments of the population and to addictive habits. The response variable was the depression index of the youth population, and the factors were gender, daily cigarette use, alcohol level and drug use level. Using a Taguchi orthogonal array, an 8-run design was generated that focuses on estimating main effects and some interaction effects, instead of the 64 runs of the full factorial. With well-chosen parameters, such designs can lead to “optimal” results, and Taguchi designs are able to capture these outcomes at a much reduced experimental cost.

The results of the Taguchi Design were compared with those of the fractional factorial design. Although the data were selected randomly and the experimental setup was not perfect, the Taguchi Design captured some of the trends observed in the overall database. For instance, gender and daily cigarette use both have positive estimated effects on depression in the Taguchi Design, as in the full-data model, whereas the fractional factorial design estimates a smaller positive gender effect and a negative effect of daily cigarette use.

From these results, it seems that females are likely to be more depressed than males. With respect to the addictive habits, the simplified designs were not able to capture any significant effects. Only the Taguchi design suggests that a high alcohol level reduces depression, which is unlikely in reality. The model fitted to the full dataset allows more refined conclusions: youths who smoke on a daily basis have a higher depression index, and so do those reporting low or high levels of drug use. For alcohol, the estimated effect of the highest level is positive (0.19) but not statistically significant (p = 0.40).

References

  1. Engineering Statistics Handbook (2016), What are Taguchi Designs? [Accessed 16 December 2016] http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm
  2. Thompson, M. (2011), Gender, Mental Illness, and Crime in the United States, 2004. ICPSR27521-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-02-10. http://doi.org/10.3886/ICPSR27521.v1

APPENDIX

Complete code is provided below:

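# packages used below: FrF2 (fractional factorial designs) and
# qualityTools (taguchiChoose, taguchiDesign)
library(FrF2)
library(qualityTools)

#loading data from TSV file
crimedata <- read.delim("CrimeData.tsv")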
str(crimedata)

#creating new dataframe with variables selected for this study
# keep the 0/1 values of ANYCRIME and set every other code to NA
crime1 = ifelse(crimedata$ANYCRIME==0, 0,
                ifelse(crimedata$ANYCRIME ==1, 1,NA))
newdata <- data.frame(id = crimedata$CASEID, IRSEX = crimedata$IRSEX, CATAG2 = crimedata$CATAG2, AGEYOUNG = crimedata$AGE2, INCOME_R = crimedata$INCOME_R, EDU_DUMMY = crimedata$EDU_DUMMY, EDUC2 = crimedata$IREDUC2, BINGEHVY = crimedata$BINGEHVY, CDCGMO = crimedata$CDCGMO, CDNOCGMO = crimedata$CDNOCGMO, CDUFLAG = crimedata$CDUFLAG, SUMFLAG = crimedata$SUMFLAG, MJOFLAG = crimedata$MJOFLAG, IEMFLAG = crimedata$IEMFLAG, ANYCRIME = crime1, ARREST = crimedata$NUMARREST, DEPRESS = crimedata$YODEPRESSIONINDEX)

save(newdata,file="data2.RData")
load("data2.RData")
head(newdata)
tail(newdata)

summary(newdata$IRSEX)
summary(newdata$CATAG2)
summary(newdata$INCOME_R)
summary(newdata$EDU_DUMMY)
summary(newdata$EDUC2)
summary(newdata$DEPRESS)

gen_levels <- c(male = 1, female = 2)
age_levels <- c(child = 1, young = 2)
edu_levels <- c(elem = 1, hs = 2)

alc_levels <- c(none = 1, low =2, high=3) 
drug_levels <- c(none = 1, low =2, high=3) 
cig_levels <- c(no=1, yes=2) 

age_rel<- ifelse(newdata$AGEYOUNG==1, 1, 
                 ifelse(newdata$AGEYOUNG==2, 1,
                        ifelse(newdata$AGEYOUNG==3, 1, 2)))
edu_rel<- ifelse(newdata$EDUC2==1, 1, 2)
cigarrel<- ifelse(newdata$CDUFLAG==0, 1, 2)
alc_rel<- ifelse(newdata$BINGEHVY==4, 1, 
                 ifelse(newdata$BINGEHVY==1, 3, 2))
drugrel<- ifelse(newdata$SUMFLAG==0, 1, 
                 ifelse(newdata$MJOFLAG==1, 2, 3))

studydata <- data.frame(gender  = factor(newdata$IRSEX, levels = gen_levels, labels = names(gen_levels)),
                        age     = factor(age_rel, levels = age_levels, labels = names(age_levels)),
                        educ    = factor(edu_rel, levels = edu_levels, labels = names(edu_levels)),
                        cigar   = factor(cigarrel, levels = cig_levels, labels = names(cig_levels)),
                        alcohol = factor(alc_rel, levels = alc_levels, labels = names(alc_levels)),
                        druguse = factor(drugrel, levels = drug_levels, labels = names(drug_levels)),
                        crime = newdata$ANYCRIME, arrests = newdata$ARREST, depression = newdata$DEPRESS)

data2proj3 = subset(studydata, studydata$depression>=0)
head(data2proj3)
save(data2proj3, file="data30.RData")
head(data2proj3)

boxplot(data2proj3$depression ~ data2proj3$alcohol, main = "Depression Index based on Alcohol", xlab= "Level of Alcohol Use", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$druguse, main = "Depression Index based on Drug Use", xlab= "Level of Drug Use", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$gender, main = "Depression Index based on Gender", xlab= "Gender", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$age, main = "Depression Index based on Age", xlab= "Age of Youth", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$educ , main = "Depression Index based on Education", xlab= "Education", ylab ="depression index")
boxplot(data2proj3$depression ~ data2proj3$cigar, main = "Depression Index based on Cigar Use", xlab= "Cigar Daily Use?", ylab ="depression index")

proj3 <- data.frame(depression = data2proj3$depression, gender = data2proj3$gender, cigardaily = data2proj3$cigar, alcohol = data2proj3$alcohol, drugs = data2proj3$druguse, low_alc = NA, high_alc = NA, low_drug = NA, high_drug = NA)  

proj3$low_alc[proj3$alcohol == "none"] <- 0
proj3$low_alc[proj3$alcohol == "low"] <- 1
proj3$low_alc[proj3$alcohol == "high"] <- 0

proj3$high_alc[proj3$alcohol == "none"] <- 0
proj3$high_alc[proj3$alcohol == "low"] <- 0
proj3$high_alc[proj3$alcohol == "high"] <- 1

proj3$low_drug[proj3$drugs == "none"] <- 0
proj3$low_drug[proj3$drugs == "low"] <- 1
proj3$low_drug[proj3$drugs == "high"] <- 0

proj3$high_drug[proj3$drugs == "none"] <- 0
proj3$high_drug[proj3$drugs == "low"] <- 0
proj3$high_drug[proj3$drugs == "high"] <- 1
head(proj3)

save(proj3,file="data31.RData")
head(proj3)

runs <- 2^(6-3)
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
frac_design <- FrF2(runs, factor.names = nam2 , default.levels = c("0","1"))
summary(frac_design)
frac11 <-data.frame(frac_design)

run1 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==0))
run2 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==1))
run3 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==0))
run4 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==1))
run5 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==0))
run6 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==1))
run7 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==0))
run8 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==1))

set.seed(1123)
rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))

frac11["RV"]=NA
for (i in 1:8){ frac11$RV[i] = rv[i] }

rand_fd <- frac11[sample(nrow(frac11)),]
rand_fd

#Full Factorial Design
expand.grid(gender = c(0,1), cigardaily = c(0,1), low_alc = c(0,1), high_alc = c(0,1), low_drug = c(0,1), high_drug = c(0,1))


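# frac2 is not created anywhere in this script, so this block only runs
# if a design object with that name has been defined beforehand
if (exists("frac2")) {
  rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))
  frac2["RV"]=NA
  for (i in 1:8){ frac2$RV[i] = rv[i] }
  rand_fd <- frac2[sample(nrow(frac2)),]
  print(rand_fd)
}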


taguchiChoose(factors1 = 2, factors2 = 2, level1 = 3, level2 = 2)

taguchiDesign("L36_2_3_a")

taguchiChoose(factors1 = 6, level1 = 2)
tag2_design <- taguchiDesign("L8_2")
tag2_design

taguchiChoose(factors1 = 6, level1 = 2)
t1_design <- as.data.frame(taguchiDesign("L8_2"))


t1_design
t2_design <- t1_design[,-c(1,2,3,10,11)]
nam2 <- c("gender","cigardaily", "low_alc", "high_alc", "low_drug", "high_drug")
colnames(t2_design) <- nam2
t2_design

run1 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==0))
run2 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==1))
run3 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'no' & proj3$low_alc==0 & proj3$low_drug==1))
run4 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==1))
run5 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==1))
run6 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'yes' & proj3$low_alc==0 & proj3$low_drug==0))
run7 <- subset(proj3, (proj3$gender== 'male' & proj3$cigardaily== 'yes' & proj3$low_alc==1 & proj3$low_drug==0))
run8 <- subset(proj3, (proj3$gender== 'female' & proj3$cigardaily== 'no' & proj3$low_alc==1 & proj3$low_drug==0))

set.seed(1234)
rv <- cbind(sample(run1$depression,1),sample(run2$depression,1),sample(run3$depression,1),sample(run4$depression,1),sample(run5$depression,1),sample(run6$depression,1),sample(run7$depression,1),sample(run8$depression,1))

t2_design["depression"]=NA
for (i in 1:8){ t2_design$depression[i] = rv[i] }

set.seed(1234)
rand_fd <- t2_design[sample(nrow(t2_design)),]
rand_fd

model2 <- lm(t2_design$depression~ as.factor(t2_design$gender) + as.factor(t2_design$cigardaily)  + as.factor(t2_design$low_alc) + as.factor(t2_design$high_alc) + as.factor(t2_design$low_drug) + as.factor(t2_design$high_drug))
summary(model2)
par(mfrow=c(2,2))
plot(model2)

model1 <- lm(frac11$RV~ as.factor(frac11$gender) + as.factor(frac11$cigardaily)  + as.factor(frac11$low_alc) + as.factor(frac11$high_alc) + as.factor(frac11$low_drug) + as.factor(frac11$high_drug))
summary(model1)
par(mfrow=c(2,2))
plot(model1)