ANOVA Tests
PCA connection: where PCA finds principal components of variance, ANOVA analyzes variance in the directions of the independent variables.
One Way ANOVA
One-way ANOVA tests whether there are statistically significant differences between the means of three or more groups classified by a single independent factor (Student’s t-test is used to compare two groups).
hypotheses:
- Null \(H_0\): all groups have the same mean
- Alternative \(H_A\): not all groups have the same mean
Requires/Expects:
- single continuous/quantitative dependent variable \(y\) (aka metric variable)
- single categorical (factor)/qualitative independent variable \(p\) (aka nominal variable)
\[y\sim f(p)\]
Assumptions:
- independence: observations should be independent of one another, and the independent variables should be mutually independent (strictly no confounding, e.g. intelligence ~ age, shoe_size)
- normality: data should be normally distributed within groups (testable using the Shapiro-Wilk test), or the residuals of the fitted model should be normally distributed (checkable with a qqnorm + qqline plot). The Mean Squared Error (MSE) of the residuals estimates the within-group variance; inference based on it assumes normality.
- homogeneity / homoscedasticity: each group should have the same variance (testable through the Levene test; if violated, a Welch ANOVA can be used, where degrees of freedom are adjusted as in the Welch test)
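The checks listed above could be run roughly as follows. This is a minimal sketch in base R on the built-in ToothGrowth data; the object names (df, fit, res, abs_dev) are illustrative and not part of the original notes. The Levene-type test is hand-rolled here as a one-way ANOVA on absolute deviations from the group medians (the Brown-Forsythe variant), which is an assumption made to avoid extra packages; a packaged version such as leveneTest from the car package could be used instead.

```r
# Sketch only: base R, built-in ToothGrowth data; object names are illustrative.
df <- ToothGrowth
df$dose <- factor(df$dose)

fit <- aov(len ~ dose, data = df)
res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals, plus a Q-Q plot
sw <- shapiro.test(res)
qqnorm(res)
qqline(res)

# Homogeneity of variance: Levene-type test, i.e. a one-way ANOVA on the
# absolute deviations from each group's median (Brown-Forsythe variant)
df$abs_dev <- abs(df$len - ave(df$len, df$dose, FUN = median))
levene <- anova(lm(abs_dev ~ dose, data = df))

# Welch ANOVA: does not assume equal variances across groups
welch <- oneway.test(len ~ dose, data = df, var.equal = FALSE)

print(sw)
print(levene)
print(welch)
```

Small p-values from the Shapiro-Wilk or Levene-type test would suggest that the corresponding assumption is doubtful, in which case the Welch result is the safer one to report.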
test statistic:
Uses the F distribution/F-value: \[\frac{\text{Variance Between Groups}}{\text{Variance within Groups}}\sim F\]
\[ \underbrace{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2}_{\text{Total Sum of Squares}} = \underbrace{\sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2}_{\text{SSB (Between-group)}} + \underbrace{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2}_{\text{SSW (Within-group)}} \]
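The decomposition above can be checked by hand. The sketch below (base R, built-in ToothGrowth data, illustrative variable names) computes the three sums of squares directly and forms the F-value as the between-group mean square, SSB / (k - 1), over the within-group mean square, SSW / (N - k), where k is the number of groups and N the total number of observations. The degrees-of-freedom convention is the standard one for one-way ANOVA rather than something stated in the notes.

```r
# Sketch only: manual one-way ANOVA on ToothGrowth (len by dose).
df <- ToothGrowth
df$dose <- factor(df$dose)

grand_mean <- mean(df$len)
group_mean <- ave(df$len, df$dose)   # each observation's own group mean

sst <- sum((df$len - grand_mean)^2)      # total sum of squares
ssb <- sum((group_mean - grand_mean)^2)  # between-group: sum_j n_j (ybar_j - ybar)^2
ssw <- sum((df$len - group_mean)^2)      # within-group

k <- nlevels(df$dose)   # number of groups
n <- nrow(df)           # total number of observations

f_value <- (ssb / (k - 1)) / (ssw / (n - k))
p_value <- pf(f_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# SST should equal SSB + SSW up to floating-point error
print(all.equal(sst, ssb + ssw))
print(c(SSB = ssb, SSW = ssw, F = f_value, p = p_value))
```

The values printed here should agree with the Sum Sq, F value and Pr(>F) columns reported by summary() on the aov model fitted to the same data.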
example:
library(data.table)

# Built-in data
dt <- as.data.table(ToothGrowth)
# Treat dose as a factor (grouping variable) rather than a number
dt[, dose := as.factor(dose)]

# One-way ANOVA
model <- aov(len ~ dose, data = dt)
summary(model)
            Df Sum Sq Mean Sq F value   Pr(>F)
dose         2   2426    1213   67.42 9.53e-16 ***
Residuals   57   1026      18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Diagnostic plots
par(mfrow = c(2, 2))
plot(model)
Two Way ANOVA
Two-way ANOVA tests whether there are statistically significant differences between group means when the groups are classified by two independent factors, including whether the effect of one factor depends on the level of the other (interaction).
Compound hypotheses:
There’s a symmetric matrix of things to check here:
- Main effect of first factor:
  - Null \(H_{0_{11}}\): all groups of the first factor have the same mean
  - Alternative \(H_{A_{11}}\): not all groups of the first factor have the same mean
- Main effect of second factor:
  - Null \(H_{0_{22}}\): all groups of the second factor have the same mean
  - Alternative \(H_{A_{22}}\): not all groups of the second factor have the same mean
- Interaction of first and second factors:
  - Null \(H_{0_{12}}\): no interaction between the two factors
  - Alternative \(H_{A_{12}}\): there is an interaction between the two factors
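One common way to make these hypotheses precise is the effects model. The notation below is an assumption introduced for illustration and does not appear in the notes: \(\mu\) is the grand mean, \(\alpha_i\) and \(\beta_j\) are the main effects of the first and second factor, \((\alpha\beta)_{ij}\) is their interaction, and \(\varepsilon_{ijk}\) is the error for observation \(k\) in cell \((i,j)\), with the usual sum-to-zero constraints on the effects.

```latex
\[
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
\]
\[
H_{0_{11}}:\ \alpha_i = 0 \ \text{for all } i
\qquad
H_{0_{22}}:\ \beta_j = 0 \ \text{for all } j
\qquad
H_{0_{12}}:\ (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
\]
```

Each alternative is that at least one of the corresponding effects is non-zero.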
Requires/Expects:
- single continuous/quantitative dependent variable \(y\) (aka metric variable)
- two categorical (factor)/qualitative independent variables \(a,b\) (aka nominal variables)
\[y\sim f(a,b)\]
Assumptions:
- independence: observations should be independent of one another, and the independent variables should be mutually independent (strictly no confounding, e.g. intelligence ~ age, shoe_size)
- normality: data should be normally distributed within groups (testable using the Shapiro-Wilk test), or the residuals of the fitted model should be normally distributed (checkable with a qqnorm + qqline plot). The Mean Squared Error (MSE) of the residuals estimates the within-group variance; inference based on it assumes normality.
- homogeneity / homoscedasticity: each group should have the same variance (testable through the Levene test)
test statistic:
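The total sum of squares in \(y\) splits into the sums of squares attributable to the factors \(A\) and \(B\), their interaction \(AB\), and the residual error (the split is exact for a balanced design). \[SS_{tot}=SS_A+SS_B+SS_{AB}+SS_{err}\]
Uses the F distribution/F-value, computed separately for each main effect and for the interaction, with the residual mean square as the within-group variance: \[\frac{\text{Variance Between Groups}}{\text{Variance within Groups}}\sim F\]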
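To make the split concrete, the sketch below computes each term by hand on the built-in ToothGrowth data (base R, illustrative variable names) and compares the result with what anova() reports for the same model. It relies on the design being balanced, which holds for ToothGrowth (10 observations per dose and supplement combination); with unequal cell sizes these simple formulas would no longer add up exactly.

```r
# Sketch only: manual two-way sums of squares for a balanced design.
df <- ToothGrowth
df$dose <- factor(df$dose)

grand_mean <- mean(df$len)
mean_a  <- ave(df$len, df$dose)           # per-observation mean of its dose level
mean_b  <- ave(df$len, df$supp)           # per-observation mean of its supp level
mean_ab <- ave(df$len, df$dose, df$supp)  # per-observation mean of its cell

ss_tot <- sum((df$len - grand_mean)^2)
ss_a   <- sum((mean_a - grand_mean)^2)
ss_b   <- sum((mean_b - grand_mean)^2)
ss_ab  <- sum((mean_ab - mean_a - mean_b + grand_mean)^2)
ss_err <- sum((df$len - mean_ab)^2)

# Degrees of freedom for each term
df_a   <- nlevels(df$dose) - 1
df_b   <- nlevels(df$supp) - 1
df_ab  <- df_a * df_b
df_err <- nrow(df) - nlevels(df$dose) * nlevels(df$supp)

# One F-value per effect, each against the residual mean square
ms_err <- ss_err / df_err
f_vals <- c(dose = (ss_a / df_a) / ms_err,
            supp = (ss_b / df_b) / ms_err,
            `dose:supp` = (ss_ab / df_ab) / ms_err)

reference <- anova(lm(len ~ dose * supp, data = df))[["Sum Sq"]]
print(all.equal(ss_tot, ss_a + ss_b + ss_ab + ss_err))
print(all.equal(c(ss_a, ss_b, ss_ab, ss_err), reference))
print(round(f_vals, 3))
```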
Figure: Two-way ANOVA feature space with principal components shown. The feature space has not been standardised, so the PCs don’t look mutually orthogonal because of the difference in scale.
example:
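dt <- as.data.table(ToothGrowth)
# Convert both grouping variables to factors
dt[, c("supp", "dose") := lapply(.SD, as.factor), .SDcols = c("supp", "dose")]

# Two-way ANOVA with interaction (dose * supp expands to dose + supp + dose:supp)
model_twoway <- aov(len ~ dose * supp, data = dt)
summary(model_twoway)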
            Df Sum Sq Mean Sq F value   Pr(>F)
dose         2 2426.4  1213.2  92.000  < 2e-16 ***
supp         1  205.4   205.4  15.572 0.000231 ***
dose:supp    2  108.3    54.2   4.107 0.021860 *
Residuals   54  712.1    13.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Interaction plot: mean length per dose, one line per supplement
interaction.plot(dt$dose, dt$supp, dt$len,
                 col = 1:3, lty = 1, lwd = 2,
                 ylab = "odontoblast length", xlab = "Dose")
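The interaction plot draws the mean of each dose and supplement cell, so it can help to look at those means as a table. The sketch below (base R, illustrative names cell_means and supp_gap) tabulates them and the gap between the two supplements at each dose. Roughly constant gaps would correspond to parallel lines, i.e. little evidence of interaction.

```r
# Sketch only: the cell means that the interaction plot displays.
df <- ToothGrowth
df$dose <- factor(df$dose)

# Rows are dose levels, columns are supplement types (OJ, VC)
cell_means <- tapply(df$len, list(dose = df$dose, supp = df$supp), mean)
print(round(cell_means, 2))

# Difference between the supplements at each dose level
supp_gap <- cell_means[, "OJ"] - cell_means[, "VC"]
print(round(supp_gap, 2))
```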
ANOVA with Repeated Measures
TODO
Mixed Model ANOVA
TODO