A sample of 44 healthy male college students participated in an experiment, Each student was asked to memorize a list of 40 words (20 on a green list and 20 on a red list).The students were then randomly assigned of one of 4 treatments: Group A: received 2 alcoholic drinks, Group AC: received 2 alcoholic drinks with caffeine powder dissolved in their drinks, Group AR: received 2 alcoholic drinks and a monetary reward for correct responses on the task, Group P: were told they received 2 alcoholic drinks, but instead just received a carbonated drink.
After consuming their drinks and resting for 25 minutes, the students performed the word completion task. The response variable ”task score” represents the difference between the proportion of correct responses on the green list and incorrect on the red list.
Consider the design of the data collection: For the One and Two Factor ANOVA we are covering the analysis on completely randomized designs. We want to ensure our data meet that criteria and define the other components of the experimental design.
drinkersdata<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/DRINKERS.txt",header=T)
names(drinkersdata)
[1] "GROUP" "SCORE"
head(drinkersdata)
table(drinkersdata$GROUP)
A AC AR P
11 11 11 11
This is a completely randomized design because subjects are randomly assigned to treatments. There is one factor, group, with four levels: AR, AC, A, and P. Therefore, there are four treatments: AR, AC, A, P. It is balanced because each treatment has the same number of experimental units. The response variable is the task score.
We will look at side by side box plots of the treatment groups and compare the mean value. We can also compare the 5 number summary for a more detailed interpretation. (We could also determine the approximate distribution of each treatment group- in the next section we will see that we want the treatments to have normally distributed data.)
boxplot(SCORE~GROUP, data=drinkersdata)
tapply(drinkersdata$SCORE,drinkersdata$GROUP,summary)
$A
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.35000 -0.05000 0.16000 0.06364 0.19000 0.31000
$AC
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0200 0.1750 0.2200 0.2655 0.3750 0.5000
$AR
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.400 0.500 0.440 0.525 0.610
$P
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.12 0.33 0.43 0.40 0.47 0.62
drinkanova1<-aov(SCORE~GROUP, data=drinkersdata)
summary(drinkanova1)
Df Sum Sq Mean Sq F value Pr(>F)
GROUP 3 0.9506 0.3169 10.29 3.76e-05 ***
Residuals 40 1.2317 0.0308
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The question of interest is: Does coffee or some other form of stimulation really allow a person suffering from alcohol intoxication to ”sober up”? What is your conclusion? Include any relevant inferences and confidence intervals. Use alpha=0.05.
TukeyHSD(drinkanova1,conf.level=.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = SCORE ~ GROUP, data = drinkersdata)
$GROUP
diff lwr upr p adj
AC-A 0.2018182 0.001256169 0.4023802 0.0480809
AR-A 0.3763636 0.175801623 0.5769256 0.0000618
P-A 0.3363636 0.135801623 0.5369256 0.0003282
AR-AC 0.1745455 -0.026016558 0.3751075 0.1075832
P-AC 0.1345455 -0.066016558 0.3351075 0.2892292
P-AR -0.0400000 -0.240562013 0.1605620 0.9501001
plot(TukeyHSD(drinkanova1,conf.level=.95))
# Compute Minimum Significant Difference w for interactions
# q(p,v) Critical studentized range
q<-qtukey(.95,nmeans=4,df=40)
q
[1] 3.790685
s<-sqrt(0.0308)
n<-11
w<-q*(s/sqrt(n))
w
[1] 0.2005842
Note: If 0 is captured in any interval, it would imply the means are not significantly different. Therefore, we see (with confidence intervals, Tukey’s test, and comparing to w) the group that just had alcohol is significantly different from the rest, but the control, reward, and caffeine group are not significantly different from each other.
We will look at the confidence intervals for the difference in the mean scores of the alcohol group compared to the others. Note all of the confidence intervals are positive. This implies that the alcohol group under performs compared to every other group.
Pair | Lower | Upper |
---|---|---|
AC-A | 0.00126 | 0.4024 |
AR-A | 0.1758 | 0.5769 |
P-A | 0.1358 | 0.5369 |
We are 95% confident that the true mean difference of scores for those who also had caffeine is between 0.00126 and 0.40 higher than the group with just alcohol. There is a significant difference between the group with just alcohol and the group with caffeine, but it may not be practically significant.
#NOTE: this function is qqPlot from the car package
qqPlot(drinkersdata$SCORE, groups=drinkersdata$GROUP)
We see the normality assumption is reasonably met because the scores fo
each tratment group follow closely on the line of the QQ Plot. We will
use Bartlett’s Test of Homogeneity.
Because we determined the normality assumption was reasonably met, we will use Bartlett’s test of homogeneity.
#this function is in the stats package
bartlett.test(SCORE~GROUP, data=drinkersdata)
Bartlett test of homogeneity of variances
data: SCORE by GROUP
Bartlett's K-squared = 1.7548, df = 3, p-value = 0.6248
#Visualize Constant Variance
boxplot(SCORE~GROUP, data=drinkersdata)
The chemical element antimony is sometimes added to tin – lead solder to replace the more expensive tin and to reduce the cost of soldering. A factorial experiment was conducted to determine how antimony affects the strength of the tin–lead solder joint (Journal of Materials Science, May 1986). Tin–lead solder specimens were prepared using one of four possible cooling methods (water- quenched, WQ; oil-quenched, OQ; air-blown, AB; and furnace-cooled, FC) and with one of four possible amounts of antimony (0%, 3%, 5%, and 10%) added to the composition. Three solder joints were randomly assigned to each of the 4 × 4 = 16 treatments and the shear strength of each measured.
Consider the design of the data collection: For the One and Two Factor ANOVA we are covering the analysis on completely randomized designs. We want to ensure our data meet that criteria and define the other components of the experimental design.
tindata<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/TINLEAD.txt",header=T)
names(tindata)
[1] "ANTIMONY" "METHOD" "STRENGTH"
head(tindata)
table(tindata$ANTIMONY,tindata$METHOD)
AB FC OQ WQ
0 3 3 3 3
3 3 3 3 3
5 3 3 3 3
10 3 3 3 3
This is a completely randomized design because experimental units are randomly assigned to treatments. There two factors, cooling method, with four levels: AB, FC, OQ, and WQ and antimony with four levels 0%, 3%, 5%, and 10%. Therefore, there are 16 treatments. It is balanced because each treatment has the same number of experimental units. The response variable is the sheer strength.
For two factor designs, we will examine the interaction plot first.
# We want to begin with examining the interaction
interaction.plot(tindata$METHOD, tindata$ANTIMONY, tindata$STRENGTH,fun=mean,trace.label="Antimony Level"
, xlab="Cooling Method",ylab="Mean Sheer Strength",main="Interaction Plot Cooling Method X Antimony")
# We can also look at the main effects
#cooling method
boxplot(STRENGTH~METHOD, data=tindata)
tapply(tindata$STRENGTH,tindata$METHOD,summary)
$AB
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.80 18.05 20.70 20.18 22.30 22.90
$FC
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.40 17.60 19.50 18.95 20.00 20.90
$OQ
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.40 19.75 20.50 20.45 21.15 24.30
$WQ
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.20 17.48 18.80 18.64 19.50 22.30
#antimony level
boxplot(STRENGTH~ANTIMONY, data=tindata)
tapply(tindata$STRENGTH,tindata$ANTIMONY,summary)
$`0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
17.60 19.12 19.80 20.18 20.70 24.30
$`3`
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.60 19.38 20.20 20.41 21.10 22.90
$`5`
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.40 19.68 20.55 20.62 21.77 22.90
$`10`
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.20 16.40 17.10 17.02 17.60 19.00
NOTE: With a two factor factorial design, we will always test the significance of the interaction first.
# We include the interaction
# Because Antimony was read in as a numeric variable, we want to
# indicate that it is actually a factor
tinanova1<-aov(STRENGTH~METHOD*factor(ANTIMONY), data=tindata)
summary(tinanova1)
Df Sum Sq Mean Sq F value Pr(>F)
METHOD 3 28.63 9.54 5.527 0.00357 **
factor(ANTIMONY) 3 104.19 34.73 20.117 1.64e-07 ***
METHOD:factor(ANTIMONY) 9 25.13 2.79 1.617 0.15226
Residuals 32 55.25 1.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#We do not include the interaction, just add the terms
# Because Antimony was read in as a numeric variable, we want to
# indicate that it is actually a factor
tinanova2<-aov(STRENGTH~METHOD+factor(ANTIMONY), data=tindata)
summary(tinanova2)
Df Sum Sq Mean Sq F value Pr(>F)
METHOD 3 28.63 9.54 4.868 0.00549 **
factor(ANTIMONY) 3 104.19 34.73 17.716 1.58e-07 ***
Residuals 41 80.38 1.96
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the main effects are significant, we go to post hoc analysis (Tukey’s test in this class).
TukeyHSD(tinanova2,conf.level = 0.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = STRENGTH ~ METHOD + factor(ANTIMONY), data = tindata)
$METHOD
diff lwr upr p adj
FC-AB -1.2250000 -2.75555373 0.305553731 0.1567570
OQ-AB 0.2750000 -1.25555373 1.805553731 0.9628573
WQ-AB -1.5333333 -3.06388706 -0.002779602 0.0494355
OQ-FC 1.5000000 -0.03055373 3.030553731 0.0565864
WQ-FC -0.3083333 -1.83888706 1.222220398 0.9488521
WQ-OQ -1.8083333 -3.33888706 -0.277779602 0.0149509
$`factor(ANTIMONY)`
diff lwr upr p adj
3-0 0.2333333 -1.297220 1.763887 0.9767248
5-0 0.4416667 -1.088887 1.972220 0.8663147
10-0 -3.1583333 -4.688887 -1.627780 0.0000119
5-3 0.2083333 -1.322220 1.738887 0.9832100
10-3 -3.3916667 -4.922220 -1.861113 0.0000031
10-5 -3.6000000 -5.130554 -2.069446 0.0000010
plot(TukeyHSD(tinanova2,conf.level = 0.95))
We would like to evaluate the distribution of each treatment.
#NOTE: this function is qqPlot from the car package
qqPlot(tindata$STRENGTH, groups=tindata$METHOD)
qqPlot(tindata$STRENGTH, groups=tindata$ANTIMONY)
Because we determined the normality assumption was reasonably met, we will use Bartlett’s test of homogeneity.
#this function is in the stats package
bartlett.test(STRENGTH~ANTIMONY, data=tindata)
Bartlett test of homogeneity of variances
data: STRENGTH by ANTIMONY
Bartlett's K-squared = 5.0649, df = 3, p-value = 0.1671
bartlett.test(STRENGTH~METHOD, data=tindata)
Bartlett test of homogeneity of variances
data: STRENGTH by METHOD
Bartlett's K-squared = 2.6014, df = 3, p-value = 0.4572
Three brands of paper towels were compared to test their strength. A random sample of 26 towels from each of the three brands were tested. Weights were added to the center of the towel as it was held tight. Then drops of water were added. There were three levels of water (0 drops, 5 drops, and 15 drops). The strength was recorded as how much weight the towel held before breaking.
Consider the design of the data collection: For the One and Two Factor ANOVA we are covering the analysis on completely randomized designs. We want to ensure our data meet that criteria and define the other components of the experimental design.
toweldata<-read.csv("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/C4%20PaperTowel2.csv",header=T)
names(toweldata)
[1] "Brand" "Water" "Strength"
head(toweldata)
table(toweldata$Brand,toweldata$Water)
0 5 15
Bounty 26 26 26
Comfort 26 26 26
Decorator 26 26 26
This is a completely randomized design because experimental units are randomly chosen from each treatment. There two factors, brand, with three levels: Bounty, Comfort, and Decorator and water with three levels 0, 5, and 15 drops. Therefore, there are 9 treatments. It is balanced because each treatment has the same number of experimental units. The response variable is the strength of the towel.
For two factor designs, we will examine the interaction plot first. We can look at side by side box plots of the main effect treatment groups and compare the mean value.
# We want to begin with examining the interaction
interaction.plot(toweldata$Brand, toweldata$Water, toweldata$Strength,fun=mean,trace.label="Water Drops"
, xlab="Brand",ylab="Mean Strength",main="Interaction Plot Water Drops X Brand")
interaction.plot(toweldata$Water, toweldata$Brand, toweldata$Strength,fun=mean,trace.label="Brand"
, xlab="Water",ylab="Mean Strength",main="Interaction Plot Water Drops X Brand")
# We can also look at the main effects
#cooling method
boxplot(Strength~Brand, data=toweldata)
tapply(toweldata$Strength,toweldata$Brand,summary)
$Bounty
Min. 1st Qu. Median Mean 3rd Qu. Max.
1450 2050 2700 2599 3088 3600
$Comfort
Min. 1st Qu. Median Mean 3rd Qu. Max.
300.0 456.2 1700.0 1772.8 2900.0 3700.0
$Decorator
Min. 1st Qu. Median Mean 3rd Qu. Max.
300 500 700 1123 2000 2700
#antimony level
boxplot(Strength~Water, data=toweldata)
tapply(toweldata$Strength,toweldata$Water,summary)
$`0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
1700 2425 2900 2824 3300 3700
$`5`
Min. 1st Qu. Median Mean 3rd Qu. Max.
550 800 1700 1764 2650 3450
$`15`
Min. 1st Qu. Median Mean 3rd Qu. Max.
300.0 400.0 487.5 906.1 1687.5 2350.0
NOTE: With a two factor factorial design, we will always test the significance of the interaction first.
# We include the interaction
# Because Antimony was read in as a numeric variable, we want to
# indicate that it is actually a factor
towelanova1<-aov(Strength~factor(Water)*Brand, data=toweldata)
summary(towelanova1)
Df Sum Sq Mean Sq F value Pr(>F)
factor(Water) 2 144038884 72019442 1277.0 <2e-16 ***
Brand 2 85291704 42645852 756.2 <2e-16 ***
factor(Water):Brand 4 27105924 6776481 120.2 <2e-16 ***
Residuals 225 12689471 56398
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(towelanova1,conf.level = .95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Strength ~ factor(Water) * Brand, data = toweldata)
$`factor(Water)`
diff lwr upr p adj
5-0 -1059.9359 -1149.6554 -970.2164 0
15-0 -1918.2692 -2007.9887 -1828.5497 0
15-5 -858.3333 -948.0528 -768.6138 0
$Brand
diff lwr upr p adj
Comfort-Bounty -825.9615 -915.6810 -736.2420 0
Decorator-Bounty -1475.3205 -1565.0400 -1385.6010 0
Decorator-Comfort -649.3590 -739.0785 -559.6395 0
$`factor(Water):Brand`
diff lwr upr p adj
5:Bounty-0:Bounty -171.15385 -377.46198 35.15428 0.1930999
15:Bounty-0:Bounty -1176.92308 -1383.23121 -970.61495 0.0000000
0:Comfort-0:Bounty 157.69231 -48.61582 364.00044 0.2922960
5:Comfort-0:Bounty -1336.53846 -1542.84659 -1130.23033 0.0000000
15:Comfort-0:Bounty -2647.11538 -2853.42351 -2440.80725 0.0000000
0:Decorator-0:Bounty -828.84615 -1035.15428 -622.53802 0.0000000
5:Decorator-0:Bounty -2343.26923 -2549.57736 -2136.96110 0.0000000
15:Decorator-0:Bounty -2601.92308 -2808.23121 -2395.61495 0.0000000
15:Bounty-5:Bounty -1005.76923 -1212.07736 -799.46110 0.0000000
0:Comfort-5:Bounty 328.84615 122.53802 535.15428 0.0000416
5:Comfort-5:Bounty -1165.38462 -1371.69275 -959.07649 0.0000000
15:Comfort-5:Bounty -2475.96154 -2682.26967 -2269.65341 0.0000000
0:Decorator-5:Bounty -657.69231 -864.00044 -451.38418 0.0000000
5:Decorator-5:Bounty -2172.11538 -2378.42351 -1965.80725 0.0000000
15:Decorator-5:Bounty -2430.76923 -2637.07736 -2224.46110 0.0000000
0:Comfort-15:Bounty 1334.61538 1128.30725 1540.92351 0.0000000
5:Comfort-15:Bounty -159.61538 -365.92351 46.69275 0.2765142
15:Comfort-15:Bounty -1470.19231 -1676.50044 -1263.88418 0.0000000
0:Decorator-15:Bounty 348.07692 141.76879 554.38505 0.0000105
5:Decorator-15:Bounty -1166.34615 -1372.65428 -960.03802 0.0000000
15:Decorator-15:Bounty -1425.00000 -1631.30813 -1218.69187 0.0000000
5:Comfort-0:Comfort -1494.23077 -1700.53890 -1287.92264 0.0000000
15:Comfort-0:Comfort -2804.80769 -3011.11582 -2598.49956 0.0000000
0:Decorator-0:Comfort -986.53846 -1192.84659 -780.23033 0.0000000
5:Decorator-0:Comfort -2500.96154 -2707.26967 -2294.65341 0.0000000
15:Decorator-0:Comfort -2759.61538 -2965.92351 -2553.30725 0.0000000
15:Comfort-5:Comfort -1310.57692 -1516.88505 -1104.26879 0.0000000
0:Decorator-5:Comfort 507.69231 301.38418 714.00044 0.0000000
5:Decorator-5:Comfort -1006.73077 -1213.03890 -800.42264 0.0000000
15:Decorator-5:Comfort -1265.38462 -1471.69275 -1059.07649 0.0000000
0:Decorator-15:Comfort 1818.26923 1611.96110 2024.57736 0.0000000
5:Decorator-15:Comfort 303.84615 97.53802 510.15428 0.0002268
15:Decorator-15:Comfort 45.19231 -161.11582 251.50044 0.9989206
5:Decorator-0:Decorator -1514.42308 -1720.73121 -1308.11495 0.0000000
15:Decorator-0:Decorator -1773.07692 -1979.38505 -1566.76879 0.0000000
15:Decorator-5:Decorator -258.65385 -464.96198 -52.34572 0.0035990
plot(TukeyHSD(towelanova1,conf.level = .95))
# Compute Minimum Significant Difference w for interactions
# q(p,v) Critical studentized range
q<-qtukey(.95,nmeans=9,df=225)
q
[1] 4.429682
s<-sqrt(56398)
n<-26
w<-q*(s/sqrt(n))
w
[1] 206.3088
We would like to evaluate the distribution of each treatment.
# NOTE: this function is qqPlot from the car package
qqPlot(toweldata$Strength, groups=toweldata$Brand)
qqPlot(toweldata$Strength, groups=toweldata$Water)
## EXTRA PLOTTING NOTES
# There are a few options to create plots for each treatment
# So far we have been using base R plotting.
# We can use a loop to make histograms for each level of one factor
par(mfcol=c(2, 2))
for (i in unique(toweldata$Brand)){
hist(toweldata$Strength[toweldata$Brand==i], xlab="Strength", main=paste("Histogram of Brand", i))
}
unique(toweldata$Brand)
[1] "Decorator" "Comfort" "Bounty"
# We can use dev.off() to clear the plot formatting
# dev.off()
# Making plots by two factors is even more complicated in base R plotting
# The package ggplot2 with the function ggplot() is useful for
# more customizable graphics.
towelplot<-ggplot(toweldata, aes(x=Strength))+ geom_histogram(bins=12, center=.5)+ facet_grid(Brand~Water)
towelplot
Since Normality does not seem reasonably met, we will use Levene’s Test of equal variance.
leveneTest(Strength ~ Brand*factor(Water), data = toweldata,center=median)
boxplot(Strength ~ Brand+Water, data = toweldata)