Score <- c(3,2,1,4,3,2,3,4,4,2,2,3,1,2,3,5,4,3,5,3,4,3,3,5,5,4,1,3,
4,3,5,5,3,3,4,4,4,4,5,2,3,1)
Group <- as.factor(c(rep("ir",14),rep("dr1",14),rep("dr2",14)))
Helping <- data.frame(Group, Score)Multiple Comparison Procedures
Creating the Data
This set of data is taken from Multiple Comparison Procedures by Larry E. Toothaker SAGE, Newbury Park, 1993. It describes a psychology experiment which examines helping behaviour in children when they were explicitly told to look after a child in the next room (direct responsibility) and when they were just told that there was a child in the next room who should not climb on a chair (indirect responsibility - IR). The direct responsibility group were further divided into those who heard calls for help from the other room (DR2) and those who didn’t (DR1)
Creating the data in R:
R in its infinite wisdom arranges the factors in alphabetical order which probably isn’t what you want and so you need to specify the order.
levels(Helping$Group)[1] "dr1" "dr2" "ir"
Helping$Group <- ordered(Helping$Group,
levels = c("ir","dr1", "dr2"))
levels(Helping$Group)[1] "ir" "dr1" "dr2"
In the single factor case there are J total treatments that correspond to the J factor levels. We carry out K replications of each treatment.
\(\bar{X}_{j}\) is the mean for the treatment j
\[ \bar{X}_{j}=\frac{1}{K}\sum_{k=1}^{k=K}X_{jk} \;\;\;\;\; j=1,2,3,...,J \]
The grand mean across all of the treatments is:
\[ \bar{X}=\frac{1}{JK}\sum_{j=1}^{j=J}\sum_{k=1}^{k=K}X_{jk} \]
The total variation is.
\[ V=\sum_{j,k}(X_{jk}-\bar{X})^2 \]
This can be divided into two terms with some mathematical cleverness.
1) add and subtract the mean of the subgroups \(\bar{X}_{j}\) to each of the terms like so.
\[ V=\sum_{j,k}((X_{jk}-\bar{X}_{j})+(\bar{X}_{j}-\bar{X}))^2 \]
2) square the resulting formula and collect together the squares
\[ V=\sum_{j,k}(X_{jk}-\bar{X}_{j})^{2}+(\bar{X}_{j}-\bar{X})^2+2\sum_{j,k}(X_{jk}-\bar{X}_{j})(\bar{X}_{j}-\bar{X}) \]
3) ignoring the first two terms considering only the third term. First split out the sum over the treatments J.
\[ \sum_{j=1}^{J}(\bar{X}_{j}-\bar{X})\left[\sum_{k=1}^{K} (X_{jk}-\bar{X}_{j})\right] \]
4) looking at the terms in the square brackets \(\bar{X}_{j}\) doesn’t depend on k and so the sum of this term will be \(K\bar{X}_{j}\)
\[ \left[\sum_{k=1}^{K} (X_{jk}-\bar{X}_{j})\right]= \sum_{k=1}^KX_{jk}-K\left(\frac{1}{K}\sum_{k=1}^KX_{jk}\right) = 0 \]
Therefore the final term is 0 and:
\[V=\sum_{j,k}(X_{jk}-\bar{X}_{j})^{2}+K\sum_{j}(\bar{X}_{j}-\bar{X})^2\]
The first term is defined as \(V_{w}\). The variation within the treatments.The second term is defined as \(V_{b}\). The variation between the treatments.
The degrees of freedom for \(V_{b}=J-1\)
The degrees of freedom for \(V_{w}=J(K-1)\)
From these you can calculate the mean squares for within and between treatments, by dividing the variation by the degrees of freedom.
After that mathematical digression we can return to the analysis and calculate the means for each of the groups.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
group_by(Helping, Group) %>%
summarise(
count = n(),
mean = mean(Score, na.rm = TRUE),
sd = sd(Score, na.rm=TRUE)
)# A tibble: 3 × 4
Group count mean sd
<ord> <int> <dbl> <dbl>
1 ir 14 2.57 1.02
2 dr1 14 3.64 1.15
3 dr2 14 3.57 1.16
We can also visualise this data:
library(ggpubr)Loading required package: ggplot2
ggboxplot(Helping, x="Group", y="Score",
color = "Group", palette= c("#4B0082","#5c3317", "#176608"),
order = c("ir","dr1","dr2"),
ylab = "Helping Score", xlab = "Treatment")library(ggpubr)
ggline(Helping, x="Group", y="Score",
add = c("mean_se","jitter"),
order = c("ir","dr1","dr2"),
ylab = "Helping Score", xlab = "Treatment")Now you can carry out the one way ANOVA
results.anova <- aov(Score ~ Group, data = Helping)
summary(results.anova) Df Sum Sq Mean Sq F value Pr(>F)
Group 2 10.05 5.024 4.076 0.0247 *
Residuals 39 48.07 1.233
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that there is a significant result.
You can now carry our Tukey’s honest significant differences for the multiple pairwise comparisons.
TukeyHSD(results.anova) Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Score ~ Group, data = Helping)
$Group
diff lwr upr p adj
dr1-ir 1.07142857 0.04909097 2.093766 0.0382162
dr2-ir 1.00000000 -0.02233761 2.022338 0.0563419
dr2-dr1 -0.07142857 -1.09376618 0.950909 0.9841598
Alternatively you can carry out the corrected t-test using the Benjamini and Hochberg false discovery rate, Holm’s method or the Bonferroni correction.
pairwise.t.test(Helping$Score, Helping$Group,
p.adjust.method ="BH")
Pairwise comparisons using t tests with pooled SD
data: Helping$Score and Helping$Group
ir dr1
dr1 0.033 -
dr2 0.033 0.866
P value adjustment method: BH
pairwise.t.test(Helping$Score, Helping$Group,
p.adjust.method ="holm")
Pairwise comparisons using t tests with pooled SD
data: Helping$Score and Helping$Group
ir dr1
dr1 0.044 -
dr2 0.044 0.866
P value adjustment method: holm
pairwise.t.test(Helping$Score, Helping$Group,
p.adjust.method ="bonferroni")
Pairwise comparisons using t tests with pooled SD
data: Helping$Score and Helping$Group
ir dr1
dr1 0.044 -
dr2 0.066 1.000
P value adjustment method: bonferroni
In this case we could treat the indirect responsibility case as the control and compare the other two direct responsibility cases to it. In this case we would use Dunnet’s test.
library(DescTools)
DunnettTest(x=Helping$Score, g=Helping$Group)
Dunnett's test for comparing several treatments with a control :
95% family-wise confidence level
$ir
diff lwr.ci upr.ci pval
dr1-ir 1.071429 0.10838612 2.034471 0.0274 *
dr2-ir 1.000000 0.03695755 1.963042 0.0409 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1