ANCOVA: Analysis of the variance in the outcome variable after adjusting for the linear covariance between the outcome and continuous covariates, while also partitioning variance due to categorical factors. By default, ANCOVA models treat factors and covariates additively (no interaction), though interaction terms can be included in extended models.
Factorial ANOVA: Analysis of the variance in the outcome \(y\) with respect to two or more categorical factors considered simultaneously, including all their main effects and (optionally) their interactions. This design allows testing whether factors act independently or whether their effects on \(y\) depend on each other.
KC 1. ANOVA as Adjustment
While ANOVA models a single continuous response variable \(y\) (strictly one dimensional), its core idea is partitioning the variation in \(y\) according to different explanatory factors (fundamentally multi-dimensional).
In this sense, ANOVA is fundamentally about adjustment, i.e. we compare group means while accounting for other sources of variation in the data.
For example:

- In a one-way ANOVA, we adjust \(y\) for differences among levels of a single factor.
- In a two-way ANOVA, we adjust for two factors simultaneously, and can also include their interaction.
- In ANCOVA, we further adjust for continuous covariates, which highlights the regression perspective: variance explained by the covariates is removed before testing factor effects.
Conceptually, each ANOVA performs a series of comparisons of means while controlling for other influences, which can be viewed as sequential adjustments.
From this perspective, ANOVA is not just a test of group differences but a framework for understanding how multiple factors jointly influence the outcome.
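As a quick illustration (a minimal sketch, not from the original text, using the built-in mtcars data), R's anova() on a fitted lm reports sequential (Type I) sums of squares, so each row is literally the variance attributed to a term after adjusting for the terms above it:

```r
# Sequential (Type I) sums of squares: each term is adjusted for the terms
# already in the model, mirroring the "series of adjustments" view.
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)  # one factor plus a continuous covariate
anova(fit)                                        # cyl, then wt adjusted for cyl, then residual
```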
KC 2. Snedecor’s F-Test Statistic
Consider the linear (alternative-hypothesis) model \(\mathbf{Y}=X_A\beta+\varepsilon\) with \(\varepsilon\sim N(\mathbf{0},\sigma^2 I_n)\), where \(X_A\) is the full design matrix of rank \(p_A\) encoding all predictors, including intercepts, main effects, interactions, or covariates. A reduced (null-hypothesis) model \(X_0\) of rank \(p_0<p_A\) contains only the predictors assumed true under the null hypothesis (e.g., intercept only for testing group differences).
The hat (projection) matrices are symmetric and idempotent, projecting onto the column spaces of the full and reduced models. \[
P_A := X_A(X_A^\top X_A)^{-1}X_A^\top \qquad
P_0 := X_0(X_0^\top X_0)^{-1}X_0^\top
\]
The sum of squares for the effect being tested is given by the quadratic form: \[\mathrm{SS}_{\text{effect}} = \mathbf{Y}^\top (P_A-P_0)\mathbf{Y}\qquad\text{with dof}\quad\nu_1 = \operatorname{rank}(P_A-P_0) = p_A-p_0\]
- \(P_A-P_0\) projects onto the part of the fitted space that the null model misses
- \(\mathrm{SS}_{\text{effect}}\) is the additional (explainable) sum of squares captured by the full model over the null model
In ANOVA, an “effect” refers to the contribution of a factor (or set of predictors) to explaining variance in the outcome.
The residual sum of squares for the full (alternative) model is the analogous quadratic form (McCabe, 2024): \[\mathrm{SS}_{\text{res}} = \mathbf{Y}^\top (I-P_A)\mathbf{Y}\qquad\text{with dof}\quad\nu_2 = \operatorname{rank}(I-P_A) = n-p_A\]
- \(I-P_A\) projects onto the part of \(\mathbf{Y}\) that no linear model in \(X_A\) can capture (noise, nonlinearity, etc.)
- \(\mathrm{SS}_{\text{res}}\) is the noise (unexplained) sum of squares
Since \(P_A-P_0\) and \(I-P_A\) are symmetric, idempotent, and project onto orthogonal subspaces, Cochran’s theorem implies that, under the null hypothesis, these scaled sums of squares are two independent chi-squared variables. \[\frac{\mathrm{SS}_{\text{effect}}}{\sigma^2} \sim \chi^2_{\nu_1}, \qquad
\frac{\mathrm{SS}_{\text{res}}}{\sigma^2} \sim \chi^2_{\nu_2}\]
Form the mean squares \[
\mathrm{MS}_{\text{effect}} = \frac{\mathrm{SS}_{\text{effect}}}{\nu_1}, \qquad
\mathrm{MS}_{\text{res}} = \frac{\mathrm{SS}_{\text{res}}}{\nu_2}.
\] Under the null hypothesis, the ratio \[
\frac{\text{Variance explained by the adjustment}}{\text{Residual (unexplained) variance}}
=\frac{\mathrm{MS}_{\text{effect}}}{\mathrm{MS}_{\text{res}}}
= \frac{\big(\mathrm{SS}_{\text{effect}}/\nu_1\big)}{\big(\mathrm{SS}_{\text{res}}/\nu_2\big)}
\sim F_{\nu_1,\nu_2}.
\]
Remarks / assumptions: exact \(F\)-distribution requires \(\varepsilon \sim N(0, \sigma^2 I)\). Homoscedasticity and independence are implied by this assumption. In practice, for large samples the \(F\)-test is approximately valid under weaker conditions via asymptotic arguments.
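A minimal numerical sketch of the construction above (not part of the original text; it assumes a one-way layout on the built-in InsectSprays data), checking the quadratic forms against R's standard ANOVA machinery:

```r
# Build P_A and P_0 explicitly and form the F statistic from the quadratic forms.
y   <- InsectSprays$count
X_A <- model.matrix(~ spray, data = InsectSprays)  # full design (intercept + spray dummies)
X_0 <- model.matrix(~ 1,     data = InsectSprays)  # null design (intercept only)
P_A <- X_A %*% solve(t(X_A) %*% X_A) %*% t(X_A)
P_0 <- X_0 %*% solve(t(X_0) %*% X_0) %*% t(X_0)
n   <- length(y); p_A <- qr(X_A)$rank; p_0 <- qr(X_0)$rank
SS_effect <- drop(t(y) %*% (P_A - P_0) %*% y)
SS_res    <- drop(t(y) %*% (diag(n) - P_A) %*% y)
(SS_effect / (p_A - p_0)) / (SS_res / (n - p_A))   # same F as anova(lm(count ~ spray, InsectSprays))
```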
KC 3. Dependent Variable Types
ANOVA is essentially a regression model in which all predictors are categorical factors; ANCOVA, which sits under the ANOVA umbrella, uses categorical factors as well as continuous covariates.
KC 3.1. Continuous Covariates
In ANOVA, the response variable \(y\) is continuous, while the explanatory variables are categorical factors (e.g. treatment groups). The method partitions total variation in \(y\) into components attributable to each factor and to residual error.
In ANCOVA, we extend this framework by including continuous explanatory variables (covariates) alongside categorical factors. These covariates are adjusted for in the model, removing part of the variability in \(y\) that is explained by linear relationships with them. This makes the regression aspect explicit: ANCOVA is essentially ANOVA embedded in a linear regression framework, where predictors can be both categorical (coded as dummies/indicators) and continuous.
We can include factor variables in our regression, and they behave like normal covariates.
For example, consider political affiliation: Republican / Democrat / Neither. You might think to assign each factor a numeric value, but this imposes an arbitrary order, which usually does not make sense. Instead we use a binary flag for each group (despite the groups being mutually exclusive).
| Political Affiliation | I_Neither | I_Democrat | I_Republican |
|---|---|---|---|
| Neither | 1 | 0 | 0 |
| Democrat | 0 | 1 | 0 |
| Republican | 0 | 0 | 1 |
Dummy Coding
Tip: see my one-way example for dropping the intercept and for custom reference variable selection.
Instead, we create dummy variables for regression. To visualise in 3D space: x-axis: Republican = 1, else 0; y-axis: Democrat = 1, else 0; z-axis: \(y\) (the outcome, e.g., policy score or voting tendency).
- Pick one level as the reference category (e.g., Neither / Independent).
- Create k-1 dummy variables that indicate binary membership in the other categories.
| Political Affiliation | \(D_1\) (Democrat) | \(D_2\) (Republican) |
|---|---|---|
| Neither | 0 | 0 |
| Democrat | 1 | 0 |
| Republican | 0 | 1 |
We ignore the reference variable because it is redundant: the x and y axes already completely encode the Neither/Independent voters (both dummies equal zero). If we added a separate dummy for Independents, we would have perfect multicollinearity, since it would be a linear combination of the other covariates. In R, lm would then return an NA coefficient because the design matrix is rank deficient (not invertible).
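For completeness, a small sketch (hypothetical affiliation vector, not from the original text) of how R's model.matrix expands a factor into k-1 treatment-coded dummies:

```r
# "Neither" is chosen as the reference level, so it is encoded as all-zero dummies.
affiliation <- factor(c("Neither", "Democrat", "Republican", "Democrat"),
                      levels = c("Neither", "Democrat", "Republican"))
model.matrix(~ affiliation)   # (Intercept) + affiliationDemocrat + affiliationRepublican
# A third dummy for "Neither" would equal the intercept column minus the other two,
# i.e. perfect multicollinearity, and lm() would report it as NA.
```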
KC 4. Interaction Terms
Interaction terms feature in some flavours of ANOVA and are absent from others. When we include an interaction term we are still performing multivariable linear regression, i.e. fitting a (possibly twisted) hyperplane to the data.
Multivariable Linear Regression (No interaction) lm(z~x+y, ...)
Think about the general model: \[\mathbb{E}[y\mid x_1,x_2]=\beta_0+\beta_1x_1+\beta_2x_2\]
for protestant states (\(x_2=0\)): \(\mathbb{E}[y\mid x_1,x_2]=\beta_0+\beta_1x_1\)
for catholic states (\(x_2=1\)): \(\mathbb{E}[y\mid x_1,x_2]=\beta_0+\beta_1x_1+\beta_2\)
Both models have the same slope \(\beta_1\): two cross-sections through a hyperplane pitched at a \(\beta_1\) angle, with an offset of \(\beta_2\) between them.
Linear Interaction Models lm(z~x*y, ...)
Think about the general model (interaction included): \[\mathbb{E}[y\mid x_1,x_2]=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_1x_2\] (we can project the space to simplify, removing the \(x_2\) dimension and comparing the two cross-sections):
for protestant states (\(x_2=0\)): \(\mathbb{E}[y\mid x_1,x_2]=\beta_0+\beta_1x_1\)
for catholic states (\(x_2=1\)): \(\begin{aligned}\mathbb{E}[y\mid x_1,x_2]&=\beta_0+\beta_1x_1+\beta_2+\beta_3x_1\\&=(\beta_0+\beta_2)+(\beta_1+\beta_3)x_1\end{aligned}\)
Here the models have different slopes, \(\beta_1\) versus \(\beta_1+\beta_3\): the hyperplane is twisted by the interaction term, giving a different linear gradient at each cross-section in the religion dimension, while the binary catholic coefficient \(\beta_2\) remains a vertical offset.
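A minimal simulated sketch (hypothetical data, not the Swiss example itself) showing the two fits side by side: the additive model forces a common slope, while the interaction model lets the slope differ between the two groups:

```r
set.seed(1)
x1 <- runif(100)                                  # continuous predictor
x2 <- rbinom(100, 1, 0.5)                         # binary indicator (e.g. catholic = 1)
y  <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(100, sd = 0.2)
coef(lm(y ~ x1 + x2))   # parallel cross-sections: one common slope beta1
coef(lm(y ~ x1 * x2))   # twisted plane: slope beta1 when x2 = 0, beta1 + beta3 when x2 = 1
```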
KC 4.b) Interaction Plots
An interaction plot is a graphical tool used in factorial experiments to illustrate how the effect of one factor depends on the level of another. It displays the mean response of the outcome variable across combinations of two categorical factors, with one factor on the x-axis and the other represented by separate lines. Non-parallel lines suggest the presence of an interaction: the effect of one factor varies according to the level of the other. The plot itself is descriptive rather than inferential; it does not fit a model or provide p-values, but it can reveal patterns that simple ANOVA techniques might overlook when only main effects are considered. Formal testing of these patterns requires a model that includes interaction terms, such as a factorial ANOVA or a linear model.
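A quick sketch with base R's interaction.plot and the built-in ToothGrowth data (my choice of example, not the original's): non-parallel lines would hint that the effect of dose depends on the delivery method:

```r
with(ToothGrowth,
     interaction.plot(x.factor     = factor(dose),  # factor shown on the x-axis
                      trace.factor = supp,          # one line per supplement type
                      response     = len,           # mean tooth length at each combination
                      xlab = "dose", ylab = "mean len", trace.label = "supp"))
```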
After fitting the model, R provides a few quick checks to assess the main aspects of model fit, including overall metrics such as the ANOVA table and VIF (aside: additional diagnostics such as residuals and influence/leverage can examine individual data points if needed; see McCabe, 2025).
An ANOVA (Analysis of Variance) table summarises how much of the variation in the response variable can be attributed to each term in a regression model. The F-value compares the fit of the model including a given term to the fit without it (R's anova() adds terms sequentially), effectively testing whether adding the term significantly improves the model.
Small p-values (highly significant terms) indicate that the predictor contributes meaningfully to explaining the response, while large p-values suggest little additional contribution once the other predictors are included.
The residual row shows the variation left unexplained by the model.
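For reference, a sketch of the fit presumably behind the VIF output below (assumed to be mpg ~ wt + disp + carb on the built-in mtcars data; the vif() call comes from the car package) together with its ANOVA table:

```r
model <- lm(mpg ~ wt + disp + carb, data = mtcars)  # assumed model; wt/disp/carb match the VIF names
anova(model)   # sequential SS, F and p per term; the final row is the residual (unexplained) variation
```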
Variance Inflation Factors
The Variance Inflation Factor (VIF) quantifies how much the variance of a regression coefficient is inflated due to multicollinearity among the terms in a model. For each term, the VIF is calculated by fitting a regression of that term on all the other terms (swapping it from independent to dependent variable) and measuring how well it can be predicted.
```r
vif(model)
##       wt     disp     carb 
## 4.890224 4.734708 1.225414
```
A VIF of 1 indicates no correlation with other terms, values between 1 and 5 suggest moderate correlation, and values above 5 (or 10) indicate high multicollinearity, which can make coefficient estimates unstable (imagine a flat plane teetering on a tightrope).
High VIF values highlight terms whose explanatory power overlaps substantially with other terms, making it difficult to isolate their individual effects.
ANOVA Workflows with special names
General Model
Suppose we have a response \(Y\) and several categorical factors \(A, B, \dots\). The general ANOVA model is:
A one-way ANOVA uses multivariable linear regression to test whether there are statistically significant differences between the means of three or more groups (a Student’s t-test / simple linear regression is used in the case of two groups) classified by a single independent factor.
\[
\boxed{Y_{ij} = \mu + \alpha_i + \epsilon_{ij}}\qquad\text{where}
\quad i = 1, \dots, k \quad j = 1, \dots, n_i
\]\[
\begin{align*}
Y_{ij} &\text{ is the response of observation } j \text{ in group } i, \\
\mu &\text{ is the overall mean,} \\
\alpha_i &\text{ is the effect of group } i, \\
\epsilon_{ij} &\sim N(0, \sigma^2) \text{ is the random error.}
\end{align*}
\]
Hypotheses:
Null \(H_0\): all groups have the same mean
Alternative \(H_A\): not all groups have the same mean
Requires/Expects:
single continuous/quantitative dependent variable \(Y\) (aka metric variable)
single factored/qualitative independent variable \(\alpha\) (aka nominal variable)
Assumptions:
independence: the independent variables should be mutually independent (strictly no confounding, e.g. intelligence ~ age, shoe_size)
normality: data should be normally distributed within groups (testable using the Shapiro-Wilk test) or the residuals from the normal model should be normally distributed (testable via qqnorm + qqline plot). The Mean Squared Error (MSE) of the residuals estimates the within-group variance and assumes normality for valid inference.
homogeneity / homoscedasticity: each group should have the same variance (testable through the Levene test; if violated, a Welch ANOVA can be used, where degrees of freedom are adjusted as in the Welch test)
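The coefficients discussed in the observation below come from a fit along these lines (a sketch; dt is assumed to be a data.table copy of the built-in InsectSprays data, which is what the spray A–F counts suggest):

```r
library(data.table)
dt <- as.data.table(InsectSprays)           # assumed data: insect counts under sprays A-F
summary(lm(count ~ spray, data = dt))$coef  # treatment coding: spray A is the reference level
```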
Observation… notice the lm command recognised the ANOVA operation and performed dummy coding automatically, taking spray A as the reference variable and omitting it from the feature space. Here:
- 14.5000000 is the mean for spray A
- 0.8333333 is the change in the mean between spray A and spray B
- -12.4166667 is the mean of spray C minus the mean of spray A, as seen in the violin plot
Dropping the intercept
```r
dt[, .(mean(count)), spray]
##     spray        V1
##    <fctr>     <num>
## 1:      A 14.500000
## 2:      B 15.333333
## 3:      C  2.083333
## 4:      D  4.916667
## 5:      E  3.500000
## 6:      F 16.666667
```
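A sketch of the intercept-free fit the heading refers to (same assumed dt): with the intercept dropped, each coefficient is the raw group mean rather than an offset from the reference level:

```r
summary(lm(count ~ spray - 1, data = dt))$coef  # the Estimate column reproduces the group means above
```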
Observation… now we see the precise means for each spray in the first column of the table, relative to 0.
Custom reference variable selection
```r
summary(lm(count ~ I(as.numeric(spray == 'A')) +  # now B is the reference
             I(as.numeric(spray == 'C')) +
             I(as.numeric(spray == 'D' | spray == 'E' | spray == 'F')),  # combine the remainder
           data = dt))$coef
```
A factorial (multi-way) ANOVA tests whether there are statistically significant differences between the means of three or more groups (a Student’s t-test is used to test two groups) classified by multiple independent factors.
\[
\begin{equation}
\boxed{Y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ij}}, \quad
i = 1, \dots, a, \quad j = 1, \dots, b
\end{equation}
\] where \[
\begin{align*}
Y_{ij} &\text{ is the response at level } i \text{ of factor A and level } j \text{ of factor B,} \\
\mu &\text{ is the overall mean,} \\
\alpha_i &\text{ is the effect of level } i \text{ of factor A,} \\
\beta_j &\text{ is the effect of level } j \text{ of factor B,} \\
(\alpha\beta)_{ij} &\text{ is the interaction effect between level } i \text{ of A and level } j \text{ of B,} \\
\epsilon_{ij} &\sim N(0, \sigma^2) \text{ is the random error.}
\end{align*}
\]
Figure: twisted hyperplane (not quite: two triangles).
Compound hypotheses:
There’s a symmetric matrix of things to check here:

- Main effect of the first factor:
  - Null \(H_{0_{11}}\): all groups of the first factor have the same mean
  - Alternative \(H_{A_{11}}\): not all groups of the first factor have the same mean
- Main effect of the second factor:
  - Null \(H_{0_{22}}\): all groups of the second factor have the same mean
  - Alternative \(H_{A_{22}}\): not all groups of the second factor have the same mean
- Interaction of the first and second factors:
  - Null \(H_{0_{12}}\): no interaction between the two factors
  - Alternative \(H_{A_{12}}\): there is an interaction between the two factors
independence: the independent variables should be mutually independent (strictly no confounding, e.g. intelligence ~ age, shoe_size)
normality: data should be normally distributed within groups (testable using the Shapiro-Wilk test) or the residuals from the normal model should be normally distributed (testable via qqnorm + qqline plot). The Mean Squared Error (MSE) of the residuals estimates the within-group variance and assumes normality for valid inference.
homogeneity / homoscedasticity: each group should have the same variance (testable through the Levene test)
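A minimal sketch of a two-way factorial fit (my example, using the built-in warpbreaks data, which has two categorical factors, wool and tension):

```r
fit2 <- aov(breaks ~ wool * tension, data = warpbreaks)
summary(fit2)   # rows for wool, tension, the wool:tension interaction, and Residuals
```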
ANOVA with repeated measures: ANOVA where the grouping of the data points is the measurement ordinal (e.g. timepoint).
Repeated measures ANOVA can be seen as a special case of ANOVA. In repeated measures designs, the same subjects are observed under multiple conditions or timepoints, and the analysis accounts for the natural pairing of these observations. Mathematically, this is equivalent to treating one measurement (for example, a baseline) as a covariate when modelling the next, so that between-subject variability is removed and only within-subject differences are tested.
I really don’t understand why this is given such prominence; it’s only ANOVA applied to a particular experiment type, see one-way.
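A minimal sketch of the repeated-measures formulation in base R (hypothetical simulated long-format data with columns subject, time and y; the Error(subject) stratum removes between-subject variability before testing the within-subject effect):

```r
set.seed(2)
rm_dt   <- expand.grid(subject = factor(1:10), time = factor(1:3))  # every subject seen at every timepoint
rm_dt$y <- rnorm(nrow(rm_dt)) + rep(rnorm(10), times = 3)           # subject-level offsets plus noise
summary(aov(y ~ time + Error(subject), data = rm_dt))               # within-subject test of the time effect
```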
MMRM ANOVA (Different groups across repeated measures)
Mixed Linear Model ANOVA with repeated measures
Why all the fancy titles?!?!? It’s only ANOVA with repeated measures (within-subject groupings) and “proper” groupings (between-subject groupings), see two-way.
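And a sketch of the mixed-model (MMRM-flavoured) version, assuming the lme4 package and hypothetical simulated data with a between-subject group, a within-subject time, and a random intercept per subject:

```r
library(lme4)
set.seed(3)
d <- expand.grid(subject = factor(1:20), time = factor(1:3))
d$group <- factor(ifelse(as.numeric(d$subject) <= 10, "ctrl", "trt"))  # between-subject factor
d$y <- rnorm(nrow(d)) + rep(rnorm(20), times = 3)                      # per-subject random intercepts + noise
fit_mm <- lmer(y ~ group * time + (1 | subject), data = d)             # fixed: group, time, interaction
anova(fit_mm)                                                          # F-style table for the fixed effects
```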