Analysis of Variance (ANOVA) is a statistical technique used to compare the means of three or more groups to determine if at least one of them is significantly different from the others.
While a t-test compares the means of two groups, ANOVA generalizes this to \(k\) groups. You might ask: Why not just run multiple t-tests?
Running multiple pairwise comparisons inflates the Type I error rate (false positives). If you test 3 groups against each other (\(A\) vs \(B\), \(B\) vs \(C\), \(A\) vs \(C\)), each at \(\alpha = 0.05\), and treat the tests as independent, the probability of at least one false significant result rises to roughly \(1 - (0.95)^3 \approx 14\%\). ANOVA instead asks a single question (do any of the group means differ?) with one test, which keeps the error rate for that overall question at 5%.
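The calculation above generalizes to any number of groups. The following is a minimal sketch in R, assuming the pairwise tests are independent (they are not exactly, since they share data, so the figures are approximations); the helper name `fwer` is introduced here for illustration only.

```r
# Family-wise error rate for m independent tests, each run at level alpha.
# `fwer` is a hypothetical helper name, not a function from any package.
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

k <- 3:6              # number of groups
m <- choose(k, 2)     # number of pairwise comparisons among k groups

fwer_table <- data.frame(groups = k, comparisons = m, fwer = round(fwer(m), 3))
print(fwer_table)
```

The number of comparisons grows as \(k(k-1)/2\), so the inflation gets worse quickly as groups are added.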
Despite its name, ANOVA analyzes variance to test differences in means. It splits the total variation in the data into two parts:

1. Signal (Between-Group Variance): Differences caused by the specific treatment/group.
2. Noise (Within-Group Variance): Random error or individual differences within a group.
If the Signal is significantly larger than the Noise, we conclude that at least one group mean differs from the others.
To quantify “Signal” vs “Noise,” we calculate the F-statistic.
We calculate Sums of Squares (SS) to measure variation.
Total Sum of Squares (\(SS_T\)): The total variation in the data. \[ SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{grand})^2 \] Where \(X_{ij}\) is the \(j\)-th observation in the \(i\)-th group, and \(\bar{X}_{grand}\) is the mean of all data combined.
Sum of Squares Between Groups (\(SS_B\)): The variation due to the group or treatment effect (Signal). \[ SS_B = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X}_{grand})^2 \] Where \(\bar{X}_i\) is the mean of group \(i\) and \(n_i\) is the number of observations in group \(i\).
Sum of Squares Within Groups (\(SS_W\)): The variation due to random error (Noise). \[ SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 \]
We convert Sums of Squares to Mean Squares (MS) by dividing by their degrees of freedom (\(df\)).
\[ MS_B = \frac{SS_B}{k-1} \] \[ MS_W = \frac{SS_W}{N-k} \] Where \(k\) is the number of groups and \(N\) is the total number of observations.
Finally, the F-ratio is:
\[ F = \frac{MS_B}{MS_W} \]
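The null hypothesis \(H_0\) is that all group means are equal (\(\mu_1 = \mu_2 = \dots = \mu_k\)); the alternative is that at least one mean differs. If \(F\) is large (and the associated p-value is \(< 0.05\)), we reject \(H_0\).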
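To make the formulas concrete, here is a sketch that computes the sums of squares, mean squares, and F-ratio by hand and then cross-checks the result against `aov()`. The `toy` data set and its values are made up for illustration and are not part of the fertilizer example.

```r
# Hypothetical toy data: three groups with four observations each
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  x     = c(4, 5, 6, 5,  7, 8, 6, 7,  10, 9, 11, 10)
)

k <- nlevels(toy$group)   # number of groups
N <- nrow(toy)            # total number of observations

grand_mean  <- mean(toy$x)
group_means <- tapply(toy$x, toy$group, mean)    # one mean per group
group_sizes <- tapply(toy$x, toy$group, length)  # n_i per group
obs_means   <- ave(toy$x, toy$group)             # group mean repeated per observation

ss_total   <- sum((toy$x - grand_mean)^2)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((toy$x - obs_means)^2)

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
f_stat     <- ms_between / ms_within
p_value    <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# The partition SS_T = SS_B + SS_W should hold up to floating-point error
print(all.equal(ss_total, ss_between + ss_within))

# Cross-check against R's built-in one-way ANOVA
print(summary(aov(x ~ group, data = toy)))
```

The p-value comes from the upper tail of the F distribution with \(k-1\) and \(N-k\) degrees of freedom, which is what `pf(..., lower.tail = FALSE)` returns.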
An agricultural research institute wants to test three different types of fertilizers to see if they impact crop yield (measured in bushels per acre).
They apply these fertilizers to 90 random plots of land (30 plots per fertilizer).
Let’s generate synthetic data representing this scenario.
library(knitr)    # kable()
library(ggplot2)  # ggplot(), used for the boxplot later
set.seed(123) # For reproducibility
# Create data
data <- data.frame(
  Fertilizer = factor(rep(c("Standard", "Organic", "SuperGrow"), each = 30)),
  Yield = c(rnorm(30, mean = 50, sd = 5),  # Standard
            rnorm(30, mean = 52, sd = 5),  # Organic (slightly better)
            rnorm(30, mean = 60, sd = 6))  # SuperGrow (much better)
)
# Display first few rows
kable(head(data), caption = "Preview of Crop Yield Data")

| Fertilizer | Yield |
|---|---|
| Standard | 47.19762 |
| Standard | 48.84911 |
| Standard | 57.79354 |
| Standard | 50.35254 |
| Standard | 50.64644 |
| Standard | 58.57532 |
Before running statistics, always visualize the data. Boxplots are ideal for comparing distributions across groups.
ggplot(data, aes(x = Fertilizer, y = Yield, fill = Fertilizer)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.5) + # Adds individual points
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white") + # Marks the mean
  theme_minimal() +
  labs(title = "Crop Yield Distribution per Fertilizer",
       y = "Yield (Bushels/Acre)",
       x = "Fertilizer Type") +
  theme(legend.position = "none")

Figure 1: Boxplot of Crop Yield by Fertilizer Type
Note: The white diamond represents the mean, while the black line represents the median.
We use the aov() function to calculate the ANOVA table.
# Run the ANOVA model
anova_model <- aov(Yield ~ Fertilizer, data = data)
# View the summary
summary(anova_model)

##             Df Sum Sq Mean Sq F value   Pr(>F)
## Fertilizer   2   1702   851.0   37.14 2.18e-12 ***
## Residuals   87   1993    22.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: Since the p-value is extremely small (much less than 0.05), we reject the Null Hypothesis. There is a statistically significant difference in crop yield between at least two of the fertilizer groups.
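It can help to see how the columns of the ANOVA table relate to the formulas from earlier. The sketch below recomputes the mean squares, F value, and p-value from the rounded figures printed in the table, so small discrepancies from the reported 37.14 are expected.

```r
# Values copied from the printed ANOVA table (already rounded by R)
ss_between <- 1702; df_between <- 2    # Fertilizer row
ss_within  <- 1993; df_within  <- 87   # Residuals row

ms_between <- ss_between / df_between  # "Mean Sq" for Fertilizer
ms_within  <- ss_within / df_within    # "Mean Sq" for Residuals
f_value    <- ms_between / ms_within   # "F value", up to rounding of the inputs

# Upper-tail probability of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```

The degrees of freedom also check out: \(k - 1 = 2\) for three fertilizers and \(N - k = 87\) for 90 plots.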
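ANOVA is reliable only if certain assumptions are met:

1. Independence: The observations are independent of each other.
2. Normality: The residuals are approximately normally distributed.
3. Homogeneity of variance: The groups have roughly equal variances.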
Figure 2: Q-Q Plot of Residuals
If the points fall roughly along the dotted diagonal line, the normality assumption is reasonable. We can also run the Shapiro-Wilk test on the residuals:
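The code behind the Q-Q plot and the test is not shown in the text. A minimal base-R sketch is given here; it rebuilds `data` and `anova_model` from the earlier chunks so that it runs on its own, and the plotting style is an assumption rather than the original figure code.

```r
# Rebuild the data and model from the earlier chunks so this block is self-contained
set.seed(123)
data <- data.frame(
  Fertilizer = factor(rep(c("Standard", "Organic", "SuperGrow"), each = 30)),
  Yield = c(rnorm(30, mean = 50, sd = 5),
            rnorm(30, mean = 52, sd = 5),
            rnorm(30, mean = 60, sd = 6))
)
anova_model <- aov(Yield ~ Fertilizer, data = data)

# Q-Q plot of the residuals with a dotted reference line (base graphics)
qqnorm(residuals(anova_model), main = "Q-Q Plot of Residuals")
qqline(residuals(anova_model), lty = 3)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normal
shapiro_result <- shapiro.test(residuals(anova_model))
print(shapiro_result)
```

The test is applied to the model residuals rather than the raw yields, because the normality assumption concerns the errors within each group.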
##
## Shapiro-Wilk normality test
##
## data: residuals(anova_model)
## W = 0.99355, p-value = 0.9434
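If p > 0.05, we fail to reject the hypothesis that the residuals are normal. Here p = 0.94, so there is no evidence against normality.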
Figure 3: Residuals vs Fitted Values
We look for a “starry night” pattern (random scatter). If we see a funnel shape, variances might be unequal. We can formally test this with Levene’s Test (requires car package) or Bartlett’s test.
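The corresponding code is not shown in the text. A sketch under the same assumptions as before (the data and model are rebuilt so the block is self-contained, and the base-graphics plot is an approximation of the original figure) might look like this:

```r
# Rebuild the data and model from the earlier chunks so this block is self-contained
set.seed(123)
data <- data.frame(
  Fertilizer = factor(rep(c("Standard", "Organic", "SuperGrow"), each = 30)),
  Yield = c(rnorm(30, mean = 50, sd = 5),
            rnorm(30, mean = 52, sd = 5),
            rnorm(30, mean = 60, sd = 6))
)
anova_model <- aov(Yield ~ Fertilizer, data = data)

# Residuals vs fitted values: look for random scatter around zero
plot(fitted(anova_model), residuals(anova_model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted Values")
abline(h = 0, lty = 2)

# Bartlett's test ships with base R (stats package)
bartlett_result <- bartlett.test(Yield ~ Fertilizer, data = data)
print(bartlett_result)

# Levene's test lives in the car package, so run it only if car is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(Yield ~ Fertilizer, data = data))
}
```

Bartlett's test is sensitive to departures from normality, which is one reason Levene's test is often preferred when the residuals look non-normal.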
##
## Bartlett test of homogeneity of variances
##
## data: Yield by Fertilizer
## Bartlett's K-squared = 1.4618, df = 2, p-value = 0.4815
If p > 0.05, we have no evidence that the group variances differ. Here p = 0.48, so the equal-variance assumption is reasonable.
The ANOVA told us that there is a difference, but not where the difference lies. Is SuperGrow better than Organic? Is Organic better than Standard?
We use the Tukey HSD (Honest Significant Difference) test.
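The call that produces the comparison table is not shown in the text. A minimal sketch using base R's `TukeyHSD()` follows; the data and model are rebuilt from the earlier chunks so the block runs on its own, and the plot call is an assumed stand-in for the original figure code.

```r
# Rebuild the data and model from the earlier chunks so this block is self-contained
set.seed(123)
data <- data.frame(
  Fertilizer = factor(rep(c("Standard", "Organic", "SuperGrow"), each = 30)),
  Yield = c(rnorm(30, mean = 50, sd = 5),
            rnorm(30, mean = 52, sd = 5),
            rnorm(30, mean = 60, sd = 6))
)
anova_model <- aov(Yield ~ Fertilizer, data = data)

# Pairwise comparisons with a 95% family-wise confidence level
tukey_result <- TukeyHSD(anova_model, conf.level = 0.95)
print(tukey_result)

# Plot the confidence intervals; intervals that exclude 0 indicate a significant difference
plot(tukey_result, las = 1)
```

Each row is named `B-A` and reports the mean of group B minus the mean of group A, so a negative `diff` means the second group has the higher mean.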
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Yield ~ Fertilizer, data = data)
##
## $Fertilizer
##                         diff       lwr        upr     p adj
## Standard-Organic   -3.127210 -6.074120 -0.1803011 0.0348896
## SuperGrow-Organic   7.254831  4.307921 10.2017401 0.0000002
## SuperGrow-Standard 10.382041  7.435132 13.3289505 0.0000000
Figure 4: Tukey HSD Confidence Intervals
Interpretation:

* If the confidence interval crosses the vertical line at 0, there is no significant difference between those two groups.
* If the interval does not touch 0, the difference is significant.
Based on our generated data:

1. SuperGrow vs Standard: Significant difference (interval 7.44 to 13.33 does not cross 0).
2. SuperGrow vs Organic: Significant difference (interval 4.31 to 10.20).
3. Organic vs Standard: Significant at the 5% level, but only marginally (adjusted p = 0.035; the interval -6.07 to -0.18 comes close to 0 without crossing it).
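In this chapter, we explored Analysis of Variance (ANOVA). We learned that:

1. ANOVA compares the means of three or more groups with a single F-test, avoiding the inflated Type I error rate of multiple t-tests.
2. The F-statistic is the ratio of between-group variation (Signal) to within-group variation (Noise).
3. The normality and equal-variance assumptions should be checked on the model residuals.
4. A post-hoc test such as Tukey HSD identifies which specific groups differ.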
ANOVA is a powerful tool widely used in clinical trials, marketing A/B/C testing, manufacturing quality control, and agricultural science.