Experimental Setting

System under test

The data being tested are from the Ecdat package in R, and the data set is named OFP. This was a survey of a variety of factors which impacted how often people went to a variety of medical appointments.

This data will be analyzed in a similar fashion to the last report. All of the visits will be combined into one variable, which will be analyzed through the use of 4 factors that may have an impact. Two of the factors will be 2-level, and two will be 3-level. A Taguchi design will be created, carried out, analyzed and then compared to the results obtained from the FFD carried out in the previous report.

Factors and Levels

The original testing independent variables of this dataset are as follows:

Sex, with levels male and female

Employed, with levels yes and no

Region, with levels noreast, west, midwest and other

Health, with levels excellent, poor, and other

All of these variables are currently assumed to have some effect on the number of visits a patient requires.

Continuous Variables

The response variable, totalvisits is a sum of the number of patient visits to a variety of medical practitioners that was originally surveyed. This represents the number of times that a subject had an issue for which they had to see a medical professional (excluding hospital visits).

Response Variables

The response variable is overall number of visits totalvisits, which is derived as explained above.

Data Overview

head(df)
##   ofp ofnp opp opnp emr hosp numchron adldiff age black    sex maried
## 1   5    0   0    0   0    1        2       0 6.9   yes   male    yes
## 2   1    0   2    0   2    0        2       0 7.4    no female    yes
## 3  13    0   0    0   3    3        4       1 6.6   yes female     no
## 4  16    0   5    0   1    1        2       1 7.6    no   male    yes
## 5   3    0   0    0   0    0        2       1 7.9    no female    yes
## 6  17    0   0    0   0    0        5       1 6.6    no female     no
##   school faminc employed privins medicaid region  hlth
## 1      6 2.8810      yes     yes       no  other other
## 2     10 2.7478       no     yes       no  other other
## 3     10 0.6532       no      no      yes  other  poor
## 4      3 0.6588       no     yes       no  other  poor
## 5      6 0.6588       no     yes       no  other other
## 6      7 0.3301       no      no      yes  other  poor
str(df)
## 'data.frame':    4406 obs. of  19 variables:
##  $ ofp     : int  5 1 13 16 3 17 9 3 1 0 ...
##  $ ofnp    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ opp     : int  0 2 0 5 0 0 0 0 0 0 ...
##  $ opnp    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ emr     : int  0 2 3 1 0 0 0 0 0 0 ...
##  $ hosp    : int  1 0 3 1 0 0 0 0 0 0 ...
##  $ numchron: int  2 2 4 2 2 5 0 0 0 0 ...
##  $ adldiff : int  0 0 1 1 1 1 0 0 0 0 ...
##  $ age     : num  6.9 7.4 6.6 7.6 7.9 6.6 7.5 8.7 7.3 7.8 ...
##  $ black   : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 1 1 1 ...
##  $ sex     : Factor w/ 2 levels "female","male": 2 1 1 2 1 1 1 1 1 1 ...
##  $ maried  : Factor w/ 2 levels "no","yes": 2 2 1 2 2 1 1 1 1 1 ...
##  $ school  : int  6 10 10 3 6 7 8 8 8 8 ...
##  $ faminc  : num  2.881 2.748 0.653 0.659 0.659 ...
##  $ employed: Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ privins : Factor w/ 2 levels "no","yes": 2 2 1 2 2 1 2 2 2 2 ...
##  $ medicaid: Factor w/ 2 levels "no","yes": 1 1 2 1 1 2 1 1 1 1 ...
##  $ region  : Factor w/ 4 levels "other","noreast",..: 1 1 1 1 1 1 3 3 3 3 ...
##  $ hlth    : Factor w/ 3 levels "other","excellent",..: 1 1 3 3 1 3 1 1 1 1 ...

The data set contains 4406 observations of 19 separate variables. The following alterations will be performed:

  1. All incomplete cases in the set will be removed.

  2. The ofp, ofnp, opp, and opnp variables are combined to show the total number of doctors’ office visits (does not include hospital visits) that a patient made over the span of the study.

  3. The data set will be reduced to the response variable totalvisits, and the independent variables sex, employed, region, and hlth.

The final data set is shown below.

head(df2)
##      sex employed region  hlth totalvisits
## 1   male      yes  other other           5
## 2 female       no  other other           3
## 3 female       no  other  poor          13
## 4   male       no  other  poor          21
## 5 female       no  other other           3
## 6 female       no  other  poor          17
str(df2)
## 'data.frame':    4406 obs. of  5 variables:
##  $ sex        : Factor w/ 2 levels "female","male": 2 1 1 2 1 1 1 1 1 1 ...
##  $ employed   : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ region     : Factor w/ 4 levels "other","noreast",..: 1 1 1 1 1 1 3 3 3 3 ...
##  $ hlth       : Factor w/ 3 levels "other","excellent",..: 1 1 3 3 1 3 1 1 1 1 ...
##  $ totalvisits: int  5 3 13 21 3 17 9 3 1 0 ...
summary(df2)
##      sex       employed       region            hlth     
##  female:2628   no :3951   other  :1614   other    :3509  
##  male  :1778   yes: 455   noreast: 837   excellent: 343  
##                           midwest:1157   poor     : 554  
##                           west   : 798                   
##                                                          
##                                                          
##   totalvisits     
##  Min.   :  0.000  
##  1st Qu.:  2.000  
##  Median :  6.000  
##  Mean   :  8.679  
##  3rd Qu.: 11.000  
##  Max.   :296.000

The factor levels are to be replaced with numbers in order to better express high and low levels. the changes are as follows:

sex: male = 1, female = 0

employed: yes = 1, no = 0

region: other and midwest = 0, west = 1, noreast = 2

hlth: poor = 0, other = 1, excellent = 2

The decisions regarding combinations and ordering of factors are based on experimenter’s judgment. other seems to describe the south and great plains region of the USA, which fits more closely with the midwest than any of the other regions in the country. other was chosen as the middle level in the health factor, as it seems that poor and excellent adequately define the bounds of self-described health.

The numeric data set is shown below.

##   sex employed region health totalvisits
## 1   1        1      0      1           5
## 2   0        0      0      1           3
## 3   0        0      0      0          13
## 4   1        0      0      0          21
## 5   0        0      0      1           3
## 6   0        0      0      0          17
## 'data.frame':    4406 obs. of  5 variables:
##  $ sex        : num  1 0 0 1 0 0 0 0 0 0 ...
##  $ employed   : num  1 0 0 0 0 0 0 0 0 0 ...
##  $ region     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ health     : num  1 1 0 0 1 0 1 1 1 1 ...
##  $ totalvisits: int  5 3 13 21 3 17 9 3 1 0 ...
##       sex            employed          region           health      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.4035   Mean   :0.1033   Mean   :0.5611   Mean   :0.9521  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :2.0000   Max.   :2.0000  
##   totalvisits     
##  Min.   :  0.000  
##  1st Qu.:  2.000  
##  Median :  6.000  
##  Mean   :  8.679  
##  3rd Qu.: 11.000  
##  Max.   :296.000

Experimental Design

This experiment seeks to observe the impact of a variety of conditions on the number of doctor’s visits required by a subject. A Taguchi design was used to conduct the analysis, allowing the reduction of the total runs, while still allowing the computation of main effects.

How will the experiment be organized and conducted?

A Taguchi design will be conducted in order to reduce the number of experimental runs. The three level factor variables will be decomposed into 2 two-level factors for the purposes of calculating the required design. For these, the sum of the two-level variables will yield the 3-level value that is chosen. (This is the reason for the selection of 0-2 as levels instead of -1,0, and 1.) Replication will be conducted in order to ensure that the data is not skewed by outliers. However, in the last experiment, it was required to average the entire data set in order to obtain accurate data. If need be, this will be done again.

What is the rationale for this design?

A Taguchi design is another means to reduce the number of runs required below that of a full factorial design. While the full design would require 36 runs without decomposition and 64 with it, the Taguchi model will require far fewer runs, and still yield accurate main effect size data.

Assumptions

This data set meets many random design assumptions. It has many replicates within the data set, although the subgroups have smaller numbers of replicates. Blocking will not be used in this design. The data set has the qualities required to continue analysis.

Statistical Analysis

Descriptive Summary

Boxplots of the response variable totalvisits will be displayed, sorted by each combination of the independent factors.

This data set has a large number of outliers. The means are similar, but the distributions are very different, with some groups having many more outliers. This will compromise tests which assume normality, such as ANOVA. Kruskal-Wallis tests will be used in its place, as normality is not assumed in this non-parametric test.

Testing

The hypotheses for the test are as follows:

Ho - There is no statistically significant difference between the responses due to differing factor levels of the independent variables

Ha - There is a statistically significant difference between the responses due to the factor level of the independent variables

The testing will be performed using main effect analysis with a Taguchi design, and then a KW test will be used to corroborate the main effect results and create the final model.

Treatment Structure

In order to adequately test every interaction of the experiment, it would be necessary to have 36 runs with a non-broken down data set, or 64 with the decomposed data. This design options are as follows. Note that the decomposed version was chosen for this analysis because the original data set parameters (two 2-level factors and two 3-level factors) yielded a 36 run system in the Taguchi design due to the library of available designs. As this is not useful in reducing the run total, the decomposed structure was used instead.

## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L8_2 L12_2 L16_2 L32_2
## 
## Use taguchiDesign("L8_2") or different to create a taguchi design object
## [1] "L8_2"  "L12_2" "L16_2" "L32_2"

The option with the least required runs is used, so the design L8_2 is used. By inputting this into the taguchiDesign function in R, a design matrix can be created, shown unrandomized below:

##   StandOrder RunOrder Replicate A B C D E F G  y
## 1          1        1         1 1 1 1 1 1 1 1 NA
## 2          2        2         1 1 1 1 2 2 2 2 NA
## 3          3        3         1 1 2 2 1 1 2 2 NA
## 4          4        4         1 1 2 2 2 2 1 1 NA
## 5          5        5         1 2 1 2 1 2 1 2 NA
## 6          6        6         1 2 1 2 2 1 2 1 NA
## 7          7        7         1 2 2 1 1 2 2 1 NA
## 8          8        8         1 2 2 1 2 1 1 2 NA

The final column of each row will be dropped, as there are only six factors to be tested. Subsets of the full data set will be created for each of these level combinations. This will be done through using the sum of decomposed factors to determine the appropriate level for each of the sets of the original data.

Main and Interaction Effect Calculation

The calculated means for each run are put into an array and attached to the Taguchi design so that main effects can be calculated.

Replicate 1 Results

##   StandOrder RunOrder Replicate A B C D E F G means_rep1
## 1          1        1         1 1 1 1 1 1 1 1          4
## 2          2        2         1 1 1 1 2 2 2 2          7
## 3          3        3         1 1 2 2 1 1 2 2          7
## 4          4        4         1 1 2 2 2 2 1 1          4
## 5          5        5         1 2 1 2 1 2 1 2          0
## 6          6        6         1 2 1 2 2 1 2 1          5
## 7          7        7         1 2 2 1 1 2 2 1          4
## 8          8        8         1 2 2 1 2 1 1 2         30

Replicate 2 Results

##   StandOrder RunOrder Replicate A B C D E F G means_rep2
## 1          1        1         1 1 1 1 1 1 1 1         11
## 2          2        2         1 1 1 1 2 2 2 2          3
## 3          3        3         1 1 2 2 1 1 2 2          1
## 4          4        4         1 1 2 2 2 2 1 1         21
## 5          5        5         1 2 1 2 1 2 1 2          9
## 6          6        6         1 2 1 2 2 1 2 1          1
## 7          7        7         1 2 2 1 1 2 2 1          0
## 8          8        8         1 2 2 1 2 1 1 2         10

It can be seen that there are significant differences in the results column between the two replicates. As a result, full subset means were also incorporated to mitigate the effects of outliers within the subsets.

Full Subset Mean Results

##   StandOrder RunOrder Replicate A B C D E F G full_means
## 1          1        1         1 1 1 1 1 1 1 1  11.787879
## 2          2        2         1 1 1 1 2 2 2 2   7.238095
## 3          3        3         1 1 2 2 1 1 2 2   9.607143
## 4          4        4         1 1 2 2 2 2 1 1   8.733333
## 5          5        5         1 2 1 2 1 2 1 2   9.425339
## 6          6        6         1 2 1 2 2 1 2 1   7.588496
## 7          7        7         1 2 2 1 1 2 2 1   5.965517
## 8          8        8         1 2 2 1 2 1 1 2  20.000000

Both numeric calculations and graphical representations of the main effect sizes will be shown. These analyses will be conducted with replicates and also with the full subset means. Single replicates are more susceptible to outliers, which the full subset mean hopes to alleviate. The following formula will be used:

\[ME_n = \frac{\sum{totalvisits_{n=1}} - \sum{totalvisits_{n=0}}}{4}\]

This equation produces the average difference between the values where the main effect variable is 1 and where it is 0. This is averaged over the four differences that can be taken from the 8 runs in the design. These will all be calculated using the two replicates and full subset means.

Effects Table
The results of the main effect calculation for each replicate and the full data set analysis is shown below.

Replicate 1

##   Main Effect
## A        4.25
## B        7.25
## C       -7.25
## D        7.75
## E       -7.75
## F       -3.75

Replicate 2

##   Main Effect
## A        -4.0
## B         2.0
## C         2.0
## D         3.5
## E         2.5
## F       -11.5

The numeric values line up with the graph, allowing the checking of the calculation process within the design. It is easy to see that there is a significant amount of variance between the main effects between the two replicates, making it difficult to determine what effects are significant and in what direction (positive or negative).

Analysis of Full Data Set
Due to significant variation between the main effect sizes for each replicate, the mean totalvisits value will be used instead of taking a random sample. All results will be discussed using these numbers.

##   Main Effect
## A    1.403225
## B    2.066546
## C   -2.409295
## D    1.693511
## E   -4.405308
## F   -4.886825


These main effects can be compared to those of project 3, shown side to side below.

##   ME Taguchi     ME FFD
## A   1.403225 -0.2269288
## B   2.066546 -0.1762097
## C  -2.409295 -1.2255017
## D   1.693511  2.8773048
## E  -4.405308 -6.6480638
## F  -4.886825 -6.5169795


The main effects of the Taguchi design are somewhat different than those yielded from the fractional factorial design. The signs of A and B are different, and the magnitudes of each of the values are very different.

Model Construction

The table shows that with a Taguchi design, all of the main factors are reasonably large. Of particular interest are the later 4 variables, which are the decomposition of the original region and health variables.

In order to corroborate the insertion of these values into the model, we will conduct a Kruskal Wallis test to observe significance. This will be conducted on the full data set, with only 4 factors, instead of the 6 that the original set was decomposed to, in order to ensure that the test reflects the differences in the factors. A KW test is used in place of the ANOVA, as the distribution of the response variable is skewed..

## 
##  Kruskal-Wallis rank sum test
## 
## data:  totalvisits by sex
## Kruskal-Wallis chi-squared = 21.666, df = 1, p-value = 3.246e-06
## 
##  Kruskal-Wallis rank sum test
## 
## data:  totalvisits by employed
## Kruskal-Wallis chi-squared = 7.1706, df = 1, p-value = 0.007411
## 
##  Kruskal-Wallis rank sum test
## 
## data:  totalvisits by region
## Kruskal-Wallis chi-squared = 31.852, df = 2, p-value = 1.212e-07
## 
##  Kruskal-Wallis rank sum test
## 
## data:  totalvisits by health
## Kruskal-Wallis chi-squared = 157.38, df = 2, p-value < 2.2e-16

The KW test indicates that all of the variables are statistically significant, indicating that we should reject the null for each variable. This indicates that all variables should be included in the model, which could look as follows:

\[totalvisits = Intercept + \beta_1*sex + \beta_2*employed + \beta_3*region + \beta_4*health\]

or alternatively as

\[totalvisits = Intercept + 1.403*A + 2.067*B - 2.409*C + 1.694*D -4.405*E - 4.887*F\]

This model indicates that men are more likely to visit the doctor than women, employed people are more likely than the unemployed to visit a doctor, that people in excellent health are less likely to visit the doctor, and that there is an effect of the region variable. However, since the decomposed variables have mixed signs, it is hard to tell for sure what the impact is.

Model Diagnostics

Based off of the linear model function in R, and the residuals that result from its usage, there are flaws within the model, as the residuals are skewed above 0. This is likely due to the predominance of outliers on the higher side of the data sets. This would skew the residuals as most ‘misses’ with these outliers would result in high residuals.

Conclusions

Taguchi designs yield different results than a FFD does when using the same data. Without committing to completing a full factorial design with the same data, it is impossible to tell which is the more accurate of the models. However, the taguchi design does not reduce the amount of experimental runs required to obtain results as compared to a FFD. However, the results may be more accurate, although this remains to be fully examined. Further factors would be required in order to fully improve the model, as the ones present do not explain all of the variance, as shown by larger residuals. As this experiment is a post-hoc assessment of Taguchi designs, it is not a perfect usage of the experiment, but it results in good approximations of main effect sizes.

Appendix 1: Raw Data

The data was drawn from the R ecdat library, and the dataset used was OFP. The structure and the head of the data set can be seen in the section Experimental Setting - Data Overview

Appendix 2: R Code

require(Ecdat) # data package
require(qualityTools) # Taguchi package
df <- OFP # Create data frame

head(df) # Show first 6 rows

str(df) # Show structure

df2 <- df[complete.cases(df),] # Eliminate incomplete cases
df2$totalvisits <- df2$ofp + df2$ofnp + df2$opp + df2$opnp # Create totalvisits variable by summing smaller variables
df2 <- df2[,c(11,15,18,19,20)] # Cut down data set

# Show head, structure, and summary of new data set
head(df2)

str(df2)

summary(df2)

l = nrow(df2) # Find length of frame

# Create four new frames for writing numbers instead of factors 
sexnum = data.frame(l)
emplnum = data.frame(l)
region = data.frame(l)
health = data.frame(l)

## Replacement loop
for (i in 1:l){
  
  # sex replace male with 1 and female with 0
  if (df2$sex[i] == "male"){
    sexnum[i,1] <- 1
    }  else {
    sexnum[i,1] <- 0
   }
  
  # employment replace yes with 1 and no with 0
  if (df2$employed[i] == "yes"){
    emplnum[i,1] <- 1
   }  else {
    emplnum[i,1] <- 0
   }
  
  # region west = 1, other + midwest = 1, noreast = 2
  if (df2$region[i] == "west"){
    
    region[i,1] <- 1
  }
  if (df2$region[i] == "other"){
    region[i,1] <- 0
  }  
  if (df2$region[i] == "midwest") {
    region[i,1] <- 0
  } 
  if (df2$region[i] == "noreast") {
    region[i,1] <- 2
  }
  
  #health poor = 0, other = 1, excellent = 2
  if (df2$hlth[i] == "poor"){
    health[i,1] <- 0
  }
  
  if (df2$hlth[i] == "other") {
    health[i,1] <- 1
  } 
  if (df2$hlth[i] == "excellent") {
    health[i,1] <- 2
  }
  
}

# Create numbered data frame out of column vectors and response variable 
df3 <- cbind(sexnum,emplnum,region,health,df2$totalvisits)

# Title columns of data frame appropriately
colnames(df3) <- c("sex","employed","region","health","totalvisits")

# Show head, structure and summary for newest data set
head(df3)

str(df3)

summary(df3)

# Boxplot of overall factors with labeled axes 
boxplot(df3$totalvisits ~ df3$sex + df3$employed + df3$region + df3$health, xlab= "sex.employed.region.health",ylab="Total doctor's visits", main="Analysis of Testing Variables")


# Find and print appropriate Taguchi design for data set
b <- taguchiChoose(6,0,2,0)
print(b)

#Show the structure of the Taguchi design matrix
b <- taguchiDesign("L8_2", randomize = F)
print(b)

#Subset creation for main effect calculations

set000000 <- subset(df3,sex == "0" & employed == "0" & region == "0" & health == "0")

set000111 <- subset(df3,sex == "0" & employed == "0" & region == "1" & health == "2")

set011001 <- subset(df3,sex == "0" & employed == "1" & region == "1" & health == "1")

set011110 <- subset(df3,sex == "0" & employed == "1" & region == "2" & health == "1")

set101010 <- subset(df3,sex == "1" & employed == "0" & region == "1" & health == "1")

set101101 <- subset(df3,sex == "1" & employed == "0" & region == "2" & health == "1")

set110011 <- subset(df3,sex == "1" & employed == "1" & region == "0" & health == "2")

set110100 <- subset(df3,sex == "1" & employed == "1" & region == "1" & health == "0")


# Randomizes the data frame so replicates are chosen randomly 
myfun <- function (df){
  a <- sample(nrow(df)) # Randomize data frame
  b <- a[1] # Take first response
  c <- a[2] # Take second resposne
  
  
  return(c(df$totalvisits[b],df$totalvisits[c])) # return variable
}

# Randomize all subsets
mean_000000 <- myfun(set000000)
mean_000111 <- myfun(set000111)
mean_011001 <- myfun(set011001)
mean_011110 <- myfun(set011110)
mean_101010 <- myfun(set101010)
mean_101101 <- myfun(set101101)
mean_110011 <- myfun(set110011)
mean_110100 <- myfun(set110100)

# Full subset mean calculations
fullmean_000000 <- mean(set000000$totalvisits)
fullmean_000111 <- mean(set000111$totalvisits)
fullmean_011001 <- mean(set011001$totalvisits)
fullmean_011110 <- mean(set011110$totalvisits)
fullmean_101010 <- mean(set101010$totalvisits)
fullmean_101101 <- mean(set101101$totalvisits)
fullmean_110011 <- mean(set110011$totalvisits)
fullmean_110100 <- mean(set110100$totalvisits)

# Create replicate 1 vector
means_rep1 <- c(mean_000000[1], mean_000111[1], mean_011001[1],mean_011110[1],mean_101010[1],mean_101101[1],mean_110011[1],mean_110100[1])

#Create replicate 2 vector
means_rep2 <- c(mean_000000[2], mean_000111[2], mean_011001[2],mean_011110[2] ,mean_101010[2], mean_101101[2], mean_110011[2], mean_110100[2])

# Convert to matrix forms to allow table display
means_rep1 <- as.matrix(means_rep1)
means_rep2 <- as.matrix(means_rep2)

# Create full subset vector
means <- c(fullmean_000000,fullmean_000111,fullmean_011001,fullmean_011110,fullmean_101010,fullmean_101101,fullmean_110011,fullmean_110100)

# Convert to matrix
full_means <- as.matrix(means)

#Create two additional copies of Taguchi design 
a <- b
c <- b

# Insert responses into Taguchi matrix
response(a) <- full_means
response(b) <- means_rep1
response(c) <- means_rep2

# Show all design matrices with responses
print(b) 
print(c) 
print(a) 

# Main effect calculation for all replicates and full subset.
ME_A1 <- 1/4 * ( (mean_101010[1] + mean_101101[1] + mean_110011[1] + mean_110100[1]) - (mean_000111[1] + mean_000000[1] + mean_011001[1] + mean_011110[1]))

ME_B1 <- 1/4 * ((mean_011001[1] + mean_011110[1] + mean_110011[1] + mean_110100[1]) - (mean_000000[1] + mean_101010[1] + mean_101101[1] + mean_000111[1]))

ME_C1 <- 1/4 * ((mean_011001[1] + mean_011110[1] + mean_101010[1] + mean_101101[1]) - (mean_000111[1] + mean_000000[1] + mean_110011[1] + mean_110100[1]))

ME_D1 <- 1/4 * ((mean_000111[1] + mean_011110[1] + mean_101101[1] + mean_110100[1]) - (mean_000000[1] + mean_011001[1] + mean_101010[1] + mean_110011[1]))

ME_E1 <- 1/4 * ((mean_000111[1]  + mean_011110[1] + mean_101010[1] + mean_110011[1] - (mean_000000[1] + mean_011001[1] + mean_101101[1] + mean_110100[1])))

ME_F1 <- 1/4 * ((mean_000111[1] + mean_011001[1] + mean_101101[1] + mean_110011[1] - (mean_000000[1] + mean_011110[1] + mean_110100[1] + mean_101010[1])))

# Second replicate

ME_A2 <- 1/4 * ( (mean_101010[2] + mean_101101[2] + mean_110011[2] + mean_110100[2]) - (mean_000111[2] + mean_000000[2] + mean_011001[2] + mean_011110[2]))

ME_B2 <- 1/4 * ((mean_011001[2] + mean_011110[2] + mean_110011[2] + mean_110100[2]) - (mean_000000[2] + mean_101010[2] + mean_101101[2] + mean_000111[2]))

ME_C2 <- 1/4 * ((mean_011001[2] + mean_011110[2] + mean_101010[2] + mean_101101[2]) - (mean_000111[2] + mean_000000[2] + mean_110011[2] + mean_110100[2]))

ME_D2 <- 1/4 * ((mean_000111[2] + mean_011110[2] + mean_101101[2] + mean_110100[2]) - (mean_000000[2] + mean_011001[2] + mean_101010[2] + mean_110011[2]))

ME_E2 <- 1/4 * ((mean_000111[2]  + mean_011110[2] + mean_101010[2] + mean_110011[2] - (mean_000000[2] + mean_011001[2] + mean_101101[2] + mean_110100[2])))

ME_F2 <- 1/4 * ((mean_000111[2] + mean_011001[2] + mean_101101[2] + mean_110011[2] - (mean_000000[2] + mean_011110[2] + mean_110100[2] + mean_101010[2])))

#Mean3

ME_A3 <- 1/4 * ( (fullmean_101010 + fullmean_101101 + fullmean_110011 + fullmean_110100) - (fullmean_000111 + fullmean_000000 + fullmean_011001 + fullmean_011110))

ME_B3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_110011 + fullmean_110100) - (fullmean_000000 + fullmean_101010 + fullmean_101101 + fullmean_000111))

ME_C3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_101010 + fullmean_101101) - (fullmean_000111 + fullmean_000000 + fullmean_110011 + fullmean_110100))

ME_D3 <- 1/4 * ((fullmean_000111 + fullmean_011110 + fullmean_101101 + fullmean_110100) - (fullmean_000000 + fullmean_011001 + fullmean_101010 + fullmean_110011))

ME_E3 <- 1/4 * ((fullmean_000111  + fullmean_011110 + fullmean_101010 + fullmean_110011 - (fullmean_000000 + fullmean_011001 + fullmean_101101 + fullmean_110100)))

ME_F3 <- 1/4 * ((fullmean_000111 + fullmean_011001 + fullmean_101101 + fullmean_110011 - (fullmean_000000+ fullmean_011110 + fullmean_110100 + fullmean_101010)))

#Create main effect table vector
main_effect_table1 <- matrix(c(ME_A1,ME_B1,ME_C1,ME_D1,ME_E1,ME_F1), ncol = 1)


# Convert to table
main_effect_table1 <- as.table(main_effect_table1)

# Label Rows
rownames(main_effect_table1) <- c("A","B","C","D","E","F")

# Label columns
colnames(main_effect_table1) <- "Main Effect"

# Display table
main_effect_table1

# Create main effect plots
effectPlot(b, factors = c("A","B","C","D","E","F"))


# Create table 2 vector
main_effect_table2 <- matrix(c(ME_A2,ME_B2,ME_C2,ME_D2,ME_E2,ME_F2), ncol = 1)

# Convert to table
main_effect_table2 <- as.table(main_effect_table2)

# Label rows
rownames(main_effect_table2) <- c("A","B","C","D","E","F")

# Label columns
colnames(main_effect_table2) <- "Main Effect"

# Show table
main_effect_table2

# Create ME plot
effectPlot(c, factors = c("A","B","C","D","E","F"))

# Create full subset vector
main_effect_table3 <- matrix(c(ME_A3,ME_B3,ME_C3,ME_D3,ME_E3,ME_F3), ncol = 1)

# Convert to table
main_effect_table3 <- as.table(main_effect_table3)

# Label rows
rownames(main_effect_table3) <- c("A","B","C","D","E","F")

# Label columns
colnames(main_effect_table3) <- "Main Effect"

# Print table
main_effect_table3

# Show effect plot
effectPlot(a, factors = c("A","B","C","D","E","F"))

# Compare Taguchi results to FFD results
main_effect_table3 <- matrix(c(ME_A3,ME_B3,ME_C3,ME_D3,ME_E3,ME_F3,-0.2269288, -0.1762097,-1.2255017,2.8773048,-6.6480638,-6.5169795), ncol = 2)

# Label columns
colnames(main_effect_table3) <- c("ME Taguchi", "ME FFD")

# Label rows
rownames(main_effect_table3) <- c("A","B","C","D","E","F")

# Show table
main_effect_table3

# Analyze significance of each variable in set and then show results using KW test
av1 <- kruskal.test(totalvisits~ sex, data = df3)

av1

av2 <- kruskal.test(totalvisits~ employed, data = df3)
av2

av3 <- kruskal.test(totalvisits~ region, data = df3)
av3

av4 <- kruskal.test(totalvisits~ health, data = df3)

av4

# Create linear model permitting plotting of residuals
model_lin <- lm(totalvisits~sex+employed+region+health, data = df3)

# Plot residuals
plot(model_lin)