This document is written using R Markdown. The source code is available in 03-anova-lab-answers.R.
Let us load the data as suggested:
d <- read.table("../data/tab13-tabagisme.dat", header = FALSE)
names(d) <- c("task", "group", "error")
d$task <- factor(d$task, labels = c("ident", "cogn", "simul"))
summary(d)
## task group error
## ident:45 FA:45 Min. : 0.0
## cogn :45 FP:45 1st Qu.: 6.0
## simul:45 NF:45 Median :11.0
## Mean :18.3
## 3rd Qu.:26.0
## Max. :75.0
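Note that since R 4.0, read.table() no longer converts character columns to factors by default (stringsAsFactors = FALSE), so depending on your R version the group column may need an explicit conversion; a minimal sketch:
d$group <- factor(d$group)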
As can be seen, there are two factors, group and task, for this experiment using independent groups (we don't really need to add an extra subject variable in this case). A very basic formula that we will use throughout this session would look like error ~ group + task, which means “describe error by a combination of group and task”. This simple formula allows us to summarize and plot data by groups. For the ANOVA, we will also need to include an interaction term, hence the formula error ~ group + task + group:task, which can also be written as error ~ group * task.
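To see how the * shorthand is expanded, we can inspect the terms of the formula; a quick sketch relying on base R's terms() function:
attr(terms(error ~ group * task), "term.labels")  # expands to "group", "task", "group:task"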
The mean number of errors in each treatment (here, a treatment means any combination of levels for the two factors) can be obtained using aggregate() and the basic formula discussed above:
aggregate(error ~ group + task, data = d, mean)
## group task error
## 1 FA ident 9.933
## 2 FP ident 9.600
## 3 NF ident 9.400
## 4 FA cogn 47.533
## 5 FP cogn 39.933
## 6 NF cogn 28.867
## 7 FA simul 2.333
## 8 FP simul 6.800
## 9 NF simul 9.933
We can use the same approach to compute the standard deviation, replacing mean by sd. It is also possible to write a little function that computes these two statistics for each treatment (note that we give names to the returned values):
f <- function(x) c(mean = mean(x), sd = sd(x))
aggregate(error ~ group + task, data = d, f)
## group task error.mean error.sd
## 1 FA ident 9.933 6.519
## 2 FP ident 9.600 4.405
## 3 NF ident 9.400 1.404
## 4 FA cogn 47.533 14.652
## 5 FP cogn 39.933 20.133
## 6 NF cogn 28.867 14.687
## 7 FA simul 2.333 2.289
## 8 FP simul 6.800 5.441
## 9 NF simul 9.933 6.006
The aggregate() command will separate the whole data frame into smaller chunks corresponding to each treatment (e.g., task=ident and group=FA), to which the f() function will be applied. It would also be possible to create a treatment indicator with the help of the interaction() command:
group.task <- interaction(d$group, d$task)
aggregate(error ~ group.task, data = d, mean)
## group.task error
## 1 FA.ident 9.933
## 2 FP.ident 9.600
## 3 NF.ident 9.400
## 4 FA.cogn 47.533
## 5 FP.cogn 39.933
## 6 NF.cogn 28.867
## 7 FA.simul 2.333
## 8 FP.simul 6.800
## 9 NF.simul 9.933
Here is a quick visual inspection of the distribution of errors:
library(lattice)  # bwplot(), dotplot() and xyplot() used below come from the lattice package
bwplot(error ~ task | group, data = d, pch = "|", layout = c(3, 1))
The error ~ task | group expression just means “draw a boxplot of errors by task, for each level of group”. The panels are arranged in a custom layout (3 panels on one row). In the literature, average data are often shown as bar charts (barchart() in R). It is also possible to rely on Cleveland's dotplots, which generally offer a good data-ink ratio:
error.means <- aggregate(error ~ group + task, data = d, mean)
dotplot(task ~ error, data = error.means, groups = group, type = "l", auto.key = list(space = "right",
points = FALSE, lines = TRUE))
Note that we stored the average results in a separate variable, and since the aggregate() command returns a data frame, it can easily be used with lattice commands.
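For comparison, the same aggregated means could be displayed as a bar chart; a minimal sketch using lattice's barchart() (not part of the original lab output) might look like this:
barchart(error ~ task, data = error.means, groups = group, auto.key = list(space = "right"))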
Describe in plain English the apparent patterns of variation.
The total number of statistical units available in each treatment can be obtained with replications(). It is also useful for checking whether a given design is balanced or not.
replications(error ~ group * task, data = d)
## group task group:task
## 45 45 15
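Since the design is balanced, replications() returns a named vector (it would return a list for an unbalanced design). A simple cross-tabulation of the two factors, sketched here with base R's xtabs(), gives the cell counts directly:
xtabs(~ group + task, data = d)  # a 3 x 3 table with 15 observations per cell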
The two-way ANOVA can be fitted using the extended formula error ~ group * task, and we get the following results:
m <- aov(error ~ group * task, data = d)
summary(m)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 355 177 1.64 0.19733
## task 2 28662 14331 132.90 < 2e-16 ***
## group:task 4 2729 682 6.33 0.00011 ***
## Residuals 126 13587 108
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA results indicate that the interaction between the two factors is significant at the 5% level, suggesting that the effect of task depends on the levels of group. Thus, the main effects cannot be interpreted directly. An interaction plot will help summarize these results, since an absence of interaction would show up as “parallel lines” for the marginal effects in the following display:
xyplot(error ~ group, data = d, groups = task, type = c("a", "g"), auto.key = list(corner = c(1,
1), points = FALSE, lines = TRUE))
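Base R offers a similar display through interaction.plot(); a minimal equivalent sketch would be:
with(d, interaction.plot(group, task, error))  # one line per task, mean error by group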
To summarize simple effects, we can fit separate one-way models for each level of, say, task. To illustrate the use of simple loops in R, we create a small iteration over task levels:
op <- options(show.signif.stars = FALSE)
for (cond in levels(d$task)) {
    cat("Task = ", cond, "\n")
    print(summary(aov(error ~ group, data = d, subset = task == cond)))
    cat("\n")
}
## Task = ident
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 2 1.09 0.05 0.95
## Residuals 42 894 21.29
##
## Task = cogn
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 2643 1322 4.74 0.014
## Residuals 42 11700 279
##
## Task = simul
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 438 218.8 9.26 0.00047
## Residuals 42 993 23.6
options(op)
Note that in this case we need to explicitly print() the results. The first and last lines are there to suppress part of the ANOVA table output, namely the “significance stars”. These comparisons are, however, less powerful than the full ANOVA model (they rely on smaller samples, and the error risk accumulates over multiple tests), so the significance tests must be treated with caution. The sums of squares remain, however, fully interpretable. The above results suggest that the group effect essentially holds in the last two tasks (cogn and simul).
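If we wanted to account for that accumulated error risk, the p-values of the three simple-effect tests (read off the tables above) could be adjusted for multiplicity; a minimal sketch using base R's p.adjust():
p.adjust(c(0.95, 0.014, 0.00047), method = "holm")  # Holm-adjusted p-values for ident, cogn, simul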
Although we computed the average error for each treatment, it is also possible to provide a measure of the effect size for each factor. In a one-way design, dividing the sum of squares (SS) for the effect by the total sum of squares (effect + error), also called \( \eta^2 \), provides such a measure: it summarizes the proportion of variance accounted for by the factor under study. In two-way designs, we can compute a partial \( \eta^2 \) following the same idea, but the denominator only includes the effect and residual SSs. In this case, the partial effect for the task factor would be:
28662/(28662 + 13587)
## [1] 0.6784
that is, about 68% of explained variance. Confidence intervals for such partial effects can be computed with the ci.pvaf() command from the MBESS package. See http://yatani.jp/HCIstats/ANOVA for more examples.
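Rather than typing the sums of squares by hand, they can also be extracted from the fitted model; a minimal sketch (base R only, assuming the model m fitted above) for the partial \( \eta^2 \) of each effect:
tab <- anova(m)  # same ANOVA table as summary(m), returned as a data frame
ss <- setNames(tab[["Sum Sq"]], rownames(tab))
effects <- c("group", "task", "group:task")
ss[effects] / (ss[effects] + ss["Residuals"])  # partial eta^2 for each effect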