Experimental Setting
System under test
The data being tested are from the Ecdat package in R, and the data set is named OFP. This was a survey of a variety of factors which impacted how often people went to a variety of medical appointments.
This data will be analyzed in a similar fashion to the last report. All of the visits will be combined into one variable, which will be analyzed through the use of 4 factors that may have an impact. Two of the factors will be 2-level, and two will be 3-level. A Taguchi design will be created, carried out, analyzed and then compared to the results obtained from the FFD carried out in the previous report.
Factors and Levels
The original testing independent variables of this dataset are as follows:
Sex, with levels male and female
Employed, with levels yes and no
Region, with levels noreast, west, midwest and other
Health, with levels excellent, poor, and other
All of these variables are currently assumed to have some effect on the number of visits a patient requires.
Continuous Variables
The response variable, totalvisits is a sum of the number of patient visits to a variety of medical practitioners that was originally surveyed. This represents the number of times that a subject had an issue for which they had to see a medical professional (excluding hospital visits).
Response Variables
The response variable is overall number of visits totalvisits, which is derived as explained above.
Data Overview
head(df)
## ofp ofnp opp opnp emr hosp numchron adldiff age black sex maried
## 1 5 0 0 0 0 1 2 0 6.9 yes male yes
## 2 1 0 2 0 2 0 2 0 7.4 no female yes
## 3 13 0 0 0 3 3 4 1 6.6 yes female no
## 4 16 0 5 0 1 1 2 1 7.6 no male yes
## 5 3 0 0 0 0 0 2 1 7.9 no female yes
## 6 17 0 0 0 0 0 5 1 6.6 no female no
## school faminc employed privins medicaid region hlth
## 1 6 2.8810 yes yes no other other
## 2 10 2.7478 no yes no other other
## 3 10 0.6532 no no yes other poor
## 4 3 0.6588 no yes no other poor
## 5 6 0.6588 no yes no other other
## 6 7 0.3301 no no yes other poor
str(df)
## 'data.frame': 4406 obs. of 19 variables:
## $ ofp : int 5 1 13 16 3 17 9 3 1 0 ...
## $ ofnp : int 0 0 0 0 0 0 0 0 0 0 ...
## $ opp : int 0 2 0 5 0 0 0 0 0 0 ...
## $ opnp : int 0 0 0 0 0 0 0 0 0 0 ...
## $ emr : int 0 2 3 1 0 0 0 0 0 0 ...
## $ hosp : int 1 0 3 1 0 0 0 0 0 0 ...
## $ numchron: int 2 2 4 2 2 5 0 0 0 0 ...
## $ adldiff : int 0 0 1 1 1 1 0 0 0 0 ...
## $ age : num 6.9 7.4 6.6 7.6 7.9 6.6 7.5 8.7 7.3 7.8 ...
## $ black : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 2 1 1 1 1 1 1 ...
## $ maried : Factor w/ 2 levels "no","yes": 2 2 1 2 2 1 1 1 1 1 ...
## $ school : int 6 10 10 3 6 7 8 8 8 8 ...
## $ faminc : num 2.881 2.748 0.653 0.659 0.659 ...
## $ employed: Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ privins : Factor w/ 2 levels "no","yes": 2 2 1 2 2 1 2 2 2 2 ...
## $ medicaid: Factor w/ 2 levels "no","yes": 1 1 2 1 1 2 1 1 1 1 ...
## $ region : Factor w/ 4 levels "other","noreast",..: 1 1 1 1 1 1 3 3 3 3 ...
## $ hlth : Factor w/ 3 levels "other","excellent",..: 1 1 3 3 1 3 1 1 1 1 ...
The data set contains 4406 observations of 19 separate variables. The following alterations will be performed:
All incomplete cases in the set will be removed.
The ofp, ofnp, opp, and opnp variables are combined to show the total number of doctors’ office visits (does not include hospital visits) that a patient made over the span of the study.
The data set will be reduced to the response variable totalvisits, and the independent variables sex, employed, region, and hlth.
The final data set is shown below.
head(df2)
## sex employed region hlth totalvisits
## 1 male yes other other 5
## 2 female no other other 3
## 3 female no other poor 13
## 4 male no other poor 21
## 5 female no other other 3
## 6 female no other poor 17
str(df2)
## 'data.frame': 4406 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 2 1 1 1 1 1 1 ...
## $ employed : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ region : Factor w/ 4 levels "other","noreast",..: 1 1 1 1 1 1 3 3 3 3 ...
## $ hlth : Factor w/ 3 levels "other","excellent",..: 1 1 3 3 1 3 1 1 1 1 ...
## $ totalvisits: int 5 3 13 21 3 17 9 3 1 0 ...
summary(df2)
## sex employed region hlth
## female:2628 no :3951 other :1614 other :3509
## male :1778 yes: 455 noreast: 837 excellent: 343
## midwest:1157 poor : 554
## west : 798
##
##
## totalvisits
## Min. : 0.000
## 1st Qu.: 2.000
## Median : 6.000
## Mean : 8.679
## 3rd Qu.: 11.000
## Max. :296.000
The factor levels are to be replaced with numbers in order to better express high and low levels. the changes are as follows:
sex: male = 1, female = 0
employed: yes = 1, no = 0
region: other and midwest = 0, west = 1, noreast = 2
hlth: poor = 0, other = 1, excellent = 2
The decisions regarding combinations and ordering of factors are based on experimenter’s judgment. other seems to describe the south and great plains region of the USA, which fits more closely with the midwest than any of the other regions in the country. other was chosen as the middle level in the health factor, as it seems that poor and excellent adequately define the bounds of self-described health.
The numeric data set is shown below.
## sex employed region health totalvisits
## 1 1 1 0 1 5
## 2 0 0 0 1 3
## 3 0 0 0 0 13
## 4 1 0 0 0 21
## 5 0 0 0 1 3
## 6 0 0 0 0 17
## 'data.frame': 4406 obs. of 5 variables:
## $ sex : num 1 0 0 1 0 0 0 0 0 0 ...
## $ employed : num 1 0 0 0 0 0 0 0 0 0 ...
## $ region : num 0 0 0 0 0 0 0 0 0 0 ...
## $ health : num 1 1 0 0 1 0 1 1 1 1 ...
## $ totalvisits: int 5 3 13 21 3 17 9 3 1 0 ...
## sex employed region health
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :1.0000
## Mean :0.4035 Mean :0.1033 Mean :0.5611 Mean :0.9521
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :2.0000 Max. :2.0000
## totalvisits
## Min. : 0.000
## 1st Qu.: 2.000
## Median : 6.000
## Mean : 8.679
## 3rd Qu.: 11.000
## Max. :296.000
Appendix 2: R Code
require(Ecdat) # data package
require(qualityTools) # Taguchi package
df <- OFP # Create data frame
head(df) # Show first 6 rows
str(df) # Show structure
df2 <- df[complete.cases(df),] # Eliminate incomplete cases
df2$totalvisits <- df2$ofp + df2$ofnp + df2$opp + df2$opnp # Create totalvisits variable by summing smaller variables
df2 <- df2[,c(11,15,18,19,20)] # Cut down data set
# Show head, structure, and summary of new data set
head(df2)
str(df2)
summary(df2)
l = nrow(df2) # Find length of frame
# Create four new frames for writing numbers instead of factors
sexnum = data.frame(l)
emplnum = data.frame(l)
region = data.frame(l)
health = data.frame(l)
## Replacement loop
for (i in 1:l){
# sex replace male with 1 and female with 0
if (df2$sex[i] == "male"){
sexnum[i,1] <- 1
} else {
sexnum[i,1] <- 0
}
# employment replace yes with 1 and no with 0
if (df2$employed[i] == "yes"){
emplnum[i,1] <- 1
} else {
emplnum[i,1] <- 0
}
# region west = 1, other + midwest = 1, noreast = 2
if (df2$region[i] == "west"){
region[i,1] <- 1
}
if (df2$region[i] == "other"){
region[i,1] <- 0
}
if (df2$region[i] == "midwest") {
region[i,1] <- 0
}
if (df2$region[i] == "noreast") {
region[i,1] <- 2
}
#health poor = 0, other = 1, excellent = 2
if (df2$hlth[i] == "poor"){
health[i,1] <- 0
}
if (df2$hlth[i] == "other") {
health[i,1] <- 1
}
if (df2$hlth[i] == "excellent") {
health[i,1] <- 2
}
}
# Create numbered data frame out of column vectors and response variable
df3 <- cbind(sexnum,emplnum,region,health,df2$totalvisits)
# Title columns of data frame appropriately
colnames(df3) <- c("sex","employed","region","health","totalvisits")
# Show head, structure and summary for newest data set
head(df3)
str(df3)
summary(df3)
# Boxplot of overall factors with labeled axes
boxplot(df3$totalvisits ~ df3$sex + df3$employed + df3$region + df3$health, xlab= "sex.employed.region.health",ylab="Total doctor's visits", main="Analysis of Testing Variables")
# Find and print appropriate Taguchi design for data set
b <- taguchiChoose(6,0,2,0)
print(b)
#Show the structure of the Taguchi design matrix
b <- taguchiDesign("L8_2", randomize = F)
print(b)
#Subset creation for main effect calculations
set000000 <- subset(df3,sex == "0" & employed == "0" & region == "0" & health == "0")
set000111 <- subset(df3,sex == "0" & employed == "0" & region == "1" & health == "2")
set011001 <- subset(df3,sex == "0" & employed == "1" & region == "1" & health == "1")
set011110 <- subset(df3,sex == "0" & employed == "1" & region == "2" & health == "1")
set101010 <- subset(df3,sex == "1" & employed == "0" & region == "1" & health == "1")
set101101 <- subset(df3,sex == "1" & employed == "0" & region == "2" & health == "1")
set110011 <- subset(df3,sex == "1" & employed == "1" & region == "0" & health == "2")
set110100 <- subset(df3,sex == "1" & employed == "1" & region == "1" & health == "0")
# Randomizes the data frame so replicates are chosen randomly
myfun <- function (df){
a <- sample(nrow(df)) # Randomize data frame
b <- a[1] # Take first response
c <- a[2] # Take second resposne
return(c(df$totalvisits[b],df$totalvisits[c])) # return variable
}
# Randomize all subsets
mean_000000 <- myfun(set000000)
mean_000111 <- myfun(set000111)
mean_011001 <- myfun(set011001)
mean_011110 <- myfun(set011110)
mean_101010 <- myfun(set101010)
mean_101101 <- myfun(set101101)
mean_110011 <- myfun(set110011)
mean_110100 <- myfun(set110100)
# Full subset mean calculations
fullmean_000000 <- mean(set000000$totalvisits)
fullmean_000111 <- mean(set000111$totalvisits)
fullmean_011001 <- mean(set011001$totalvisits)
fullmean_011110 <- mean(set011110$totalvisits)
fullmean_101010 <- mean(set101010$totalvisits)
fullmean_101101 <- mean(set101101$totalvisits)
fullmean_110011 <- mean(set110011$totalvisits)
fullmean_110100 <- mean(set110100$totalvisits)
# Create replicate 1 vector
means_rep1 <- c(mean_000000[1], mean_000111[1], mean_011001[1],mean_011110[1],mean_101010[1],mean_101101[1],mean_110011[1],mean_110100[1])
#Create replicate 2 vector
means_rep2 <- c(mean_000000[2], mean_000111[2], mean_011001[2],mean_011110[2] ,mean_101010[2], mean_101101[2], mean_110011[2], mean_110100[2])
# Convert to matrix forms to allow table display
means_rep1 <- as.matrix(means_rep1)
means_rep2 <- as.matrix(means_rep2)
# Create full subset vector
means <- c(fullmean_000000,fullmean_000111,fullmean_011001,fullmean_011110,fullmean_101010,fullmean_101101,fullmean_110011,fullmean_110100)
# Convert to matrix
full_means <- as.matrix(means)
#Create two additional copies of Taguchi design
a <- b
c <- b
# Insert responses into Taguchi matrix
response(a) <- full_means
response(b) <- means_rep1
response(c) <- means_rep2
# Show all design matrices with responses
print(b)
print(c)
print(a)
# Main effect calculation for all replicates and full subset.
ME_A1 <- 1/4 * ( (mean_101010[1] + mean_101101[1] + mean_110011[1] + mean_110100[1]) - (mean_000111[1] + mean_000000[1] + mean_011001[1] + mean_011110[1]))
ME_B1 <- 1/4 * ((mean_011001[1] + mean_011110[1] + mean_110011[1] + mean_110100[1]) - (mean_000000[1] + mean_101010[1] + mean_101101[1] + mean_000111[1]))
ME_C1 <- 1/4 * ((mean_011001[1] + mean_011110[1] + mean_101010[1] + mean_101101[1]) - (mean_000111[1] + mean_000000[1] + mean_110011[1] + mean_110100[1]))
ME_D1 <- 1/4 * ((mean_000111[1] + mean_011110[1] + mean_101101[1] + mean_110100[1]) - (mean_000000[1] + mean_011001[1] + mean_101010[1] + mean_110011[1]))
ME_E1 <- 1/4 * ((mean_000111[1] + mean_011110[1] + mean_101010[1] + mean_110011[1] - (mean_000000[1] + mean_011001[1] + mean_101101[1] + mean_110100[1])))
ME_F1 <- 1/4 * ((mean_000111[1] + mean_011001[1] + mean_101101[1] + mean_110011[1] - (mean_000000[1] + mean_011110[1] + mean_110100[1] + mean_101010[1])))
# Second replicate
ME_A2 <- 1/4 * ( (mean_101010[2] + mean_101101[2] + mean_110011[2] + mean_110100[2]) - (mean_000111[2] + mean_000000[2] + mean_011001[2] + mean_011110[2]))
ME_B2 <- 1/4 * ((mean_011001[2] + mean_011110[2] + mean_110011[2] + mean_110100[2]) - (mean_000000[2] + mean_101010[2] + mean_101101[2] + mean_000111[2]))
ME_C2 <- 1/4 * ((mean_011001[2] + mean_011110[2] + mean_101010[2] + mean_101101[2]) - (mean_000111[2] + mean_000000[2] + mean_110011[2] + mean_110100[2]))
ME_D2 <- 1/4 * ((mean_000111[2] + mean_011110[2] + mean_101101[2] + mean_110100[2]) - (mean_000000[2] + mean_011001[2] + mean_101010[2] + mean_110011[2]))
ME_E2 <- 1/4 * ((mean_000111[2] + mean_011110[2] + mean_101010[2] + mean_110011[2] - (mean_000000[2] + mean_011001[2] + mean_101101[2] + mean_110100[2])))
ME_F2 <- 1/4 * ((mean_000111[2] + mean_011001[2] + mean_101101[2] + mean_110011[2] - (mean_000000[2] + mean_011110[2] + mean_110100[2] + mean_101010[2])))
#Mean3
ME_A3 <- 1/4 * ( (fullmean_101010 + fullmean_101101 + fullmean_110011 + fullmean_110100) - (fullmean_000111 + fullmean_000000 + fullmean_011001 + fullmean_011110))
ME_B3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_110011 + fullmean_110100) - (fullmean_000000 + fullmean_101010 + fullmean_101101 + fullmean_000111))
ME_C3 <- 1/4 * ((fullmean_011001 + fullmean_011110 + fullmean_101010 + fullmean_101101) - (fullmean_000111 + fullmean_000000 + fullmean_110011 + fullmean_110100))
ME_D3 <- 1/4 * ((fullmean_000111 + fullmean_011110 + fullmean_101101 + fullmean_110100) - (fullmean_000000 + fullmean_011001 + fullmean_101010 + fullmean_110011))
ME_E3 <- 1/4 * ((fullmean_000111 + fullmean_011110 + fullmean_101010 + fullmean_110011 - (fullmean_000000 + fullmean_011001 + fullmean_101101 + fullmean_110100)))
ME_F3 <- 1/4 * ((fullmean_000111 + fullmean_011001 + fullmean_101101 + fullmean_110011 - (fullmean_000000+ fullmean_011110 + fullmean_110100 + fullmean_101010)))
#Create main effect table vector
main_effect_table1 <- matrix(c(ME_A1,ME_B1,ME_C1,ME_D1,ME_E1,ME_F1), ncol = 1)
# Convert to table
main_effect_table1 <- as.table(main_effect_table1)
# Label Rows
rownames(main_effect_table1) <- c("A","B","C","D","E","F")
# Label columns
colnames(main_effect_table1) <- "Main Effect"
# Display table
main_effect_table1
# Create main effect plots
effectPlot(b, factors = c("A","B","C","D","E","F"))
# Create table 2 vector
main_effect_table2 <- matrix(c(ME_A2,ME_B2,ME_C2,ME_D2,ME_E2,ME_F2), ncol = 1)
# Convert to table
main_effect_table2 <- as.table(main_effect_table2)
# Label rows
rownames(main_effect_table2) <- c("A","B","C","D","E","F")
# Label columns
colnames(main_effect_table2) <- "Main Effect"
# Show table
main_effect_table2
# Create ME plot
effectPlot(c, factors = c("A","B","C","D","E","F"))
# Create full subset vector
main_effect_table3 <- matrix(c(ME_A3,ME_B3,ME_C3,ME_D3,ME_E3,ME_F3), ncol = 1)
# Convert to table
main_effect_table3 <- as.table(main_effect_table3)
# Label rows
rownames(main_effect_table3) <- c("A","B","C","D","E","F")
# Label columns
colnames(main_effect_table3) <- "Main Effect"
# Print table
main_effect_table3
# Show effect plot
effectPlot(a, factors = c("A","B","C","D","E","F"))
# Compare Taguchi results to FFD results
main_effect_table3 <- matrix(c(ME_A3,ME_B3,ME_C3,ME_D3,ME_E3,ME_F3,-0.2269288, -0.1762097,-1.2255017,2.8773048,-6.6480638,-6.5169795), ncol = 2)
# Label columns
colnames(main_effect_table3) <- c("ME Taguchi", "ME FFD")
# Label rows
rownames(main_effect_table3) <- c("A","B","C","D","E","F")
# Show table
main_effect_table3
# Analyze significance of each variable in set and then show results using KW test
av1 <- kruskal.test(totalvisits~ sex, data = df3)
av1
av2 <- kruskal.test(totalvisits~ employed, data = df3)
av2
av3 <- kruskal.test(totalvisits~ region, data = df3)
av3
av4 <- kruskal.test(totalvisits~ health, data = df3)
av4
# Create linear model permitting plotting of residuals
model_lin <- lm(totalvisits~sex+employed+region+health, data = df3)
# Plot residuals
plot(model_lin)