\[F = \frac{s^2_1}{s^2_2} \sim F_{df_1,df_2}\]
Where: \(s^2_1\) and \(s^2_2\) are sample variances from independent populations. Importantly, the samples are labeled so that the larger sample variance goes in the numerator:
\[ s^2_1 > s^2_2\]
\(H_0\): \(\sigma^2_1 = \sigma^2_2\) vs \(H_1\): \(\sigma^2_1 \neq \sigma^2_2\) OR \(\sigma^2_1 > \sigma^2_2\)
\(H_0\): \(\sigma^2_1 = \sigma^2_2\) vs \(H_1\): \(\sigma^2_1 \neq \sigma^2_2\)
Critical F value = \(F^{\alpha/2}_{df_1,df_2}\)
\(H_0\): \(\sigma^2_1 = \sigma^2_2\) vs \(H_1\): \(\sigma^2_1 > \sigma^2_2\)
Critical F value = \(F^{\alpha}_{df_1,df_2}\)
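As a minimal sketch of the variance-ratio statistic above, the following Python computes \(F\) with the larger sample variance in the numerator and returns the matching degrees of freedom. The function name `variance_ratio` and the two samples are made up for illustration; they are not from the source.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    df_a, df_b = len(sample_a) - 1, len(sample_b) - 1
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


# Made-up measurements, for illustration only.
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [13.0, 14.0, 14.0, 15.0, 14.0]

f_stat, df1, df2 = variance_ratio(group_a, group_b)
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

The statistic is then compared with the critical value: \(F^{\alpha/2}_{df_1,df_2}\) for the two-sided alternative or \(F^{\alpha}_{df_1,df_2}\) for the one-sided alternative. These can be read from an F table or, assuming SciPy is installed, obtained with `scipy.stats.f.ppf(1 - alpha / 2, df1, df2)` and `scipy.stats.f.ppf(1 - alpha, df1, df2)` respectively.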
Factors: Independent variables that are actively manipulated and serve as the explanatory variables (e.g., type of fertilizer).
Factor levels: The different values of the factor being manipulated, e.g., the different types of fertilizer (Fertilizer A, B, and C).
Treatment: An individual factor level, or a combination of factor levels when more than one factor is manipulated, that is applied to an experimental unit.
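To make the distinction between factors, levels, and treatments concrete, here is a small Python sketch that enumerates treatments as combinations of factor levels. The fertilizer factor comes from the example above; the second factor, `watering`, is hypothetical and added only to show what happens with more than one factor.

```python
from itertools import product

# Factor name -> its levels.
factors = {
    "fertilizer": ["A", "B", "C"],  # from the example in the text
    "watering": ["low", "high"],    # hypothetical second factor, for illustration
}

# Each treatment is one combination of factor levels.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for treatment in treatments:
    print(treatment)
print(len(treatments), "treatments in total")
```

With a single factor, the treatments are just its levels; with two factors of 3 and 2 levels, there are 3 × 2 = 6 treatments.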
It is a hypothesis test!
Used to compare the means of three or more groups based on one factor.
\(H_0\): All k group means are equal.
\(H_A\): At least one group mean (\(\mu_i\)) is different.
Independent variable / Factor: nominal variable defining the k groups.
Dependent variable / Response: measurements taken on experimental units within each group.
Independence between groups.
Measurements within each group are normally distributed.
The variance in each group is equal.
Let’s explore this using simple visuals to understand intuitively how variability (variance) helps us assess differences in means.
The group means are the same in each plot, but which plot provides stronger evidence that the means differ significantly from each other?
ANOVA compares between-treatment (signal) to within-treatment (noise) variation as a ratio of the two:
\[ \frac{\text{Between variation (differences in group means)}}{\text{Within variation (spread within each group)}} \]
A well designed experiment allows us to partition the variability in our response.
Decompose the total sums of squares into:
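\[ \begin{align} \text{Total SS (SST)} &= \text{Between SS} + \text{Within SS} \\ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 &= \sum_{j=1}^{k} n_j (\bar{Y}_j - \bar{Y})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 \end{align} \]
Here \(Y_{ij}\) is observation \(i\) in group \(j\), \(n_j\) is the size of group \(j\), \(\bar{Y}_j\) is the mean of group \(j\), and \(\bar{Y}\) is the grand mean of all observations.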
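The decomposition can be checked numerically. The sketch below uses made-up yields for three fertilizer groups (the numbers are illustrative, not from the source) and computes each sum of squares directly from its definition.

```python
from statistics import mean

# Made-up yields for three fertilizer groups, for illustration only.
groups = {
    "A": [20.0, 22.0, 24.0],
    "B": [25.0, 27.0, 29.0],
    "C": [30.0, 32.0, 34.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# Total SS: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
# Between SS: each group mean against the grand mean, weighted by group size.
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Within SS: every observation against its own group mean.
ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

print(ss_total, ss_between, ss_within)
print("Partition holds:", abs(ss_total - (ss_between + ss_within)) < 1e-9)
```

For these data the group means are 22, 27, and 32 around a grand mean of 27, so most of the total variation sits in the between-group term.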
Variance is the average squared deviation of the data points from the mean.
The sum of squares is the numerator of the variance: the squared deviations added up, before dividing by \(n - 1\).
\[ s^2 = \frac{\sum(y-\bar{y})^2}{n-1} = \frac{\text{Sum of squares}}{n-1} \]
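A short sketch of this relationship, using a made-up sample: the sum of squares divided by \(n - 1\) should match what Python's standard-library `statistics.variance` returns.

```python
from statistics import mean, variance

# Made-up sample, for illustration only.
y = [20.0, 22.0, 24.0, 25.0, 29.0]

y_bar = mean(y)
sum_of_squares = sum((value - y_bar) ** 2 for value in y)
s_squared = sum_of_squares / (len(y) - 1)

print("Sum of squares:", sum_of_squares)
print("Variance from SS / (n - 1):", s_squared)
print("Matches statistics.variance:", abs(s_squared - variance(y)) < 1e-9)
```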
We want to compare the between-group variability to the within-group variability.
To do this, we use the partitioning of the total sum of squares into between-group and within-group sums of squares.
We convert each sum of squares to a mean square (MS) by dividing it by its degrees of freedom.
This gives us MS between and MS within, which are estimates of variability.
Then, the test statistic is the ratio of MS between to MS within!
| Source | Df | SS | MS | F |
|---|---|---|---|---|
| Between Groups | \(k - 1\) | \(SS_B\) | \(MS_B\) | \(\frac{MS_B}{MS_W}\) |
| Within Groups | \(N - k\) | \(SS_W\) | \(MS_W\) | |
| Total | \(N - 1\) | \(SS_T\) | | |
Where: \(k\) is the number of groups and \(N\) is the total number of observations.
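The table can be filled in from raw data with a few lines of Python. This is a minimal sketch rather than a full ANOVA routine: the function name `one_way_anova_table` and the data are made up for illustration, and it stops at the F statistic without computing a p-value.

```python
from statistics import mean


def one_way_anova_table(groups):
    """Return the rows of a one-way ANOVA table for a dict of group -> values."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between  # mean square = SS / df
    ms_within = ss_within / df_within

    return {
        "between": {"df": df_between, "ss": ss_between, "ms": ms_between,
                    "f": ms_between / ms_within},
        "within": {"df": df_within, "ss": ss_within, "ms": ms_within},
        "total": {"df": n_total - 1, "ss": ss_between + ss_within},
    }


# Made-up yields for three fertilizer groups, for illustration only.
data = {
    "A": [20.0, 22.0, 24.0],
    "B": [25.0, 27.0, 29.0],
    "C": [30.0, 32.0, 34.0],
}

table = one_way_anova_table(data)
for source, row in table.items():
    print(source, row)
```

If SciPy is available, `scipy.stats.f_oneway(*data.values())` should report the same F statistic along with a p-value.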
| Source | Df | SS | MS | F |
|---|---|---|---|---|
| Between Groups | 3 | 120.5 | 40.17 | 8.55 |
| Within Groups | 16 | 75.2 | 4.7 | |
| Total | 19 | 195.7 | | |
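As a worked reading of this example table, the degrees of freedom recover the design, and each mean square follows from its sum of squares. The arithmetic below uses only the values in the table together with the formulas \(df_B = k - 1\) and \(df_W = N - k\):

\[ \begin{align} k - 1 = 3 &\Rightarrow k = 4 \text{ groups} \\ N - k = 16 &\Rightarrow N = 20 \text{ observations, so } N - 1 = 19 \\ SS_B + SS_W &= 120.5 + 75.2 = 195.7 = SS_T \\ MS_B &= \frac{SS_B}{k - 1} = \frac{120.5}{3} \approx 40.17 \\ MS_W &= \frac{SS_W}{N - k} = \frac{75.2}{16} = 4.7 \end{align} \]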