F-distribution, Experimental Terminology, and one-way ANOVA

STA1008S

F-distribution

Introduction to the F-distribution

  • The F-distribution is a continuous probability distribution.
  • It arises when comparing the variances of two samples.
  • Positively skewed and depends on two sets of degrees of freedom:
    • \(df_1\): degrees of freedom for the numerator.
    • \(df_2\): degrees of freedom for the denominator.
  • Always non-negative, as it is a ratio of variances.
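
The notes do not prescribe any software, but a short simulation makes this concrete. The sketch below (Python with NumPy/SciPy, an assumption rather than a course requirement) draws repeated pairs of normal samples with equal variance and compares a simulated quantile of the variance ratio to the theoretical F quantile.

```python
# Illustrative sketch (not from the notes): the ratio of two sample variances
# from normal populations with equal variance follows an F-distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)    # hypothetical seed for reproducibility
n1, n2 = 10, 15                   # hypothetical sample sizes

ratios = []
for _ in range(10_000):
    x = rng.normal(loc=0.0, scale=2.0, size=n1)   # sigma_1 = sigma_2 = 2
    y = rng.normal(loc=0.0, scale=2.0, size=n2)
    ratios.append(np.var(x, ddof=1) / np.var(y, ddof=1))

# Under equal variances, s1^2 / s2^2 ~ F with df1 = n1 - 1 and df2 = n2 - 1,
# so the simulated and theoretical 95th percentiles should be close.
print(np.quantile(ratios, 0.95))
print(stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1))
```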

Testing for equality of variances

\[F = \frac{s^2_1}{s^2_2} \sim F_{df_1,df_2}\]

Where: \(s^2_1\) and \(s^2_2\) are sample variances from independent populations. Importantly, the samples are labelled so that the larger sample variance sits in the numerator:

\[ s^2_1 > s^2_2\]

\(H_0\): \(\sigma^2_1 = \sigma^2_2\) \(H_1\): \(\sigma^2_1 \neq \sigma^2_2\) OR \(\sigma^2_1 > \sigma^2_2\)

Two-sided test

\(H_0\): \(\sigma^2_1 = \sigma^2_2\) vs \(H_1\): \(\sigma^2_1 \neq \sigma^2_2\)

Critical F value = \(F^{\alpha/2}_{df_1,df_2}\)

Upper-tailed one-sided test

\(H_0\): \(\sigma^2_1 = \sigma^2_2\) \(H_1\): \(\sigma^2_1 > \sigma^2_2\)

Critical F value = \(F^{\alpha}_{df_1,df_2}\)
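
As a worked illustration of the two critical values above, the sketch below (Python/SciPy; the data values and \(\alpha = 0.05\) are hypothetical, not from the notes) forms the variance ratio with the larger sample variance in the numerator and looks up \(F^{\alpha/2}_{df_1,df_2}\) and \(F^{\alpha}_{df_1,df_2}\).

```python
# Illustrative sketch (hypothetical data): F-test for equality of two variances.
import numpy as np
from scipy import stats

sample1 = np.array([12.1, 14.3, 13.5, 15.2, 14.8, 13.9])        # made-up data
sample2 = np.array([11.8, 12.0, 12.4, 11.5, 12.2, 11.9, 12.1])  # made-up data

s1, s2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)

# Apply the restriction from the notes: the larger sample variance goes on top.
if s1 >= s2:
    F, df1, df2 = s1 / s2, len(sample1) - 1, len(sample2) - 1
else:
    F, df1, df2 = s2 / s1, len(sample2) - 1, len(sample1) - 1

alpha = 0.05
crit_two_sided = stats.f.ppf(1 - alpha / 2, dfn=df1, dfd=df2)  # F^{alpha/2}
crit_one_sided = stats.f.ppf(1 - alpha, dfn=df1, dfd=df2)      # F^{alpha}
print(F, crit_two_sided, crit_one_sided)
```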

Experimental Terminology

Key Terminology

  • Factors: Independent variables that are actively manipulated; they become the explanatory variables (e.g., type of fertilizer).

  • Factor levels: The different values or settings of the factor being manipulated, e.g., the different types of fertilizer (Fertilizer A, B, and C).

  • Treatment: A single factor level, or a combination of factor levels when more than one factor is manipulated; it is what is applied to the experimental unit.

  • Experimental Unit: The entity/object receiving the treatment.
  • Response Variable: The outcome or dependent variable measured in the experiment.
  • Replicates: Repeated observations at each factor level / treatment.

Types of Experimental Designs

  • Completely Randomized Design: All subjects are randomly assigned to treatments.
  • Randomized Block Design: Subjects are divided into blocks based on a blocking variable, and treatments are randomly assigned to units within each block.
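
To make the two designs concrete, here is a minimal sketch of the random assignment step (Python/NumPy; the treatment, plot, and field labels are invented for illustration and are not part of the notes).

```python
# Illustrative sketch (made-up labels): randomisation for the two designs.
import numpy as np

rng = np.random.default_rng(7)
treatments = ["Fertilizer A", "Fertilizer B", "Fertilizer C"]

# Completely randomized design: shuffle 12 plots, then split them evenly
# across the three treatments.
shuffled = rng.permutation(np.arange(12))
crd = {t: shuffled[i::3].tolist() for i, t in enumerate(treatments)}

# Randomized block design: treatments are randomised separately within each block.
blocks = {"Field 1": [0, 1, 2], "Field 2": [3, 4, 5],
          "Field 3": [6, 7, 8], "Field 4": [9, 10, 11]}
rbd = {b: dict(zip(rng.permutation(treatments), units))
       for b, units in blocks.items()}

print(crd)
print(rbd)
```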

One-way ANOVA Recap

Introduction to One-way ANOVA

  • It is a hypothesis test!

  • Used to compare the means of three or more groups based on one factor.

\(H_0\): All k group means are equal.

\(\mu_1 = \mu_2 =\ldots =\mu_k\)

\(H_A\): At least one group mean (\(\mu_i\)) is different.

Assumptions

  1. Independent variable / Factor: a nominal variable defining the k groups.

    Dependent variable / Response: measurements taken on experimental units within each group.

  2. Independence between groups.

  3. Measurements within each group are normally distributed.

  4. The variances in each group are equal.
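
Assumptions 3 and 4 are often checked informally before running the ANOVA. The sketch below (Python/SciPy, with made-up measurements; the notes themselves do not specify how to check the assumptions) uses the Shapiro-Wilk test for within-group normality and Levene's test for equality of variances.

```python
# Illustrative sketch (hypothetical data): informal checks of the ANOVA assumptions.
from scipy import stats

group_a = [20.1, 21.4, 19.8, 22.0, 20.7]   # made-up measurements per group
group_b = [18.5, 19.2, 18.9, 19.8, 18.1]
group_c = [22.3, 23.1, 21.8, 22.9, 23.4]

# Shapiro-Wilk test of normality within each group (assumption 3).
for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    stat, p = stats.shapiro(g)
    print(f"Group {name}: Shapiro-Wilk p-value = {p:.3f}")

# Levene's test for equality of group variances (assumption 4).
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene p-value = {p:.3f}")
```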

Why “Analysis of Variance” for Testing Means?

  • When we use ANOVA, we are making conclusions about means of different groups.
  • But why is it called “Analysis of Variance”?

Let’s explore this using simple visuals to understand intuitively how variability (variance) helps us assess differences in means.

An intuitive example

The means are the same between plots, but which one provides stronger evidence that the means are significantly different from each other?

  • Both experiments have the same differences among treatment means.
  • The key difference is in the within-treatment variability:
    • Experiment 1: Low within-treatment variability compared to between-treatment variability.
    • Experiment 2: Within-treatment variability is similar to between-treatment variability.
  • Conclusion: Lower within-treatment variability in Experiment 1 makes the treatment effect clearer.

The Basic Idea

ANOVA compares between-treatment (signal) to within-treatment (noise) variation as a ratio of the two:

\[ \frac{\text{Between variation (differences in group means)}}{\text{Within variation (spread within each group)}} \]

  • Large ratio: Signal (difference among means) is large relative to noise (within-group variation) → evidence of differences in means.
  • Small ratio: Signal is small relative to noise → no evidence of mean differences.

More specifically

  • A well-designed experiment allows us to partition the variability in our response.

  • We decompose the total sum of squares into:

\[ \begin{align} \text{Total SS (SST)} &= \text{Between SS} + \text{Within SS} \\ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 &= \sum_{j=1}^{k} n_j (\bar{Y}_j - \bar{Y})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 \end{align} \]
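
The decomposition can be verified numerically. The sketch below (Python/NumPy, with made-up groups that are not from the notes) computes the total, between-group, and within-group sums of squares and shows that the two sides agree.

```python
# Illustrative sketch (hypothetical data): verify SST = Between SS + Within SS.
import numpy as np

groups = [np.array([20.1, 21.4, 19.8, 22.0]),   # made-up groups
          np.array([18.5, 19.2, 18.9, 19.8]),
          np.array([22.3, 23.1, 21.8, 22.9])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

sst = np.sum((all_obs - grand_mean) ** 2)
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

print(round(sst, 6), round(ssb + ssw, 6))   # the two totals agree
```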

Quick note on sum of squares

  • Variance is the average of the squared deviations of the data points from the mean.

  • The sum of squares is the numerator of the variance, i.e., the squared deviations before averaging.

\[ s^2 = \frac{\sum(y-\bar{y})^2}{n-1} = \frac{\text{Sum of squares}}{n-1} \]

  • We want to compare the between-group variability to the within-group variability.

  • To do this, we use the partitioning of the total sum of squares into between-group and within-group sums of squares.

  • We convert each sum of squares to a mean square (MS) by dividing by its respective degrees of freedom.

  • This gives us MS between and MS within, which are estimates of variability.

  • Then, the test statistic is the ratio of MS between to MS within!
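
Continuing the hypothetical groups used above, the sketch below (Python/SciPy; not from the notes) computes the mean squares and the F statistic by hand and checks the result against scipy.stats.f_oneway.

```python
# Illustrative sketch (hypothetical data): mean squares and the F statistic.
import numpy as np
from scipy import stats

groups = [np.array([20.1, 21.4, 19.8, 22.0]),
          np.array([18.5, 19.2, 18.9, 19.8]),
          np.array([22.3, 23.1, 21.8, 22.9])]

k = len(groups)                       # number of groups
N = sum(len(g) for g in groups)       # total number of observations
grand_mean = np.concatenate(groups).mean()

ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

msb = ssb / (k - 1)   # between-group mean square
msw = ssw / (N - k)   # within-group mean square
F = msb / msw

print(F)
print(stats.f_oneway(*groups).statistic)   # matches the hand computation
```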

ANOVA Table

| Source         | Df        | SS       | MS       | F                     |
|----------------|-----------|----------|----------|-----------------------|
| Between Groups | \(k - 1\) | \(SS_B\) | \(MS_B\) | \(\frac{MS_B}{MS_W}\) |
| Within Groups  | \(N - k\) | \(SS_W\) | \(MS_W\) |                       |
| Total          | \(N - 1\) | \(SS_T\) |          |                       |

Where: \(k\) is the number of groups and \(N\) is the total number of observations.

Example

| Source         | Df | SS    | MS    | F    |
|----------------|----|-------|-------|------|
| Between Groups | 3  | 120.5 | 40.17 | 8.55 |
| Within Groups  | 16 | 75.2  | 4.7   |      |
| Total          | 19 | 195.7 |       |      |
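
To finish the example, the p-value for the observed F statistic can be read from the F-distribution with 3 and 16 degrees of freedom. A minimal sketch (Python/SciPy; the 5% significance level is an assumption, not stated in the notes):

```python
# Illustrative sketch: p-value and 5% critical value for the example ANOVA table.
from scipy import stats

msb, msw = 40.17, 4.7          # mean squares from the table
F = msb / msw                  # about 8.55
p_value = stats.f.sf(F, dfn=3, dfd=16)       # upper-tail probability P(F_{3,16} > F)
critical = stats.f.ppf(0.95, dfn=3, dfd=16)  # 5% critical value

# Since F exceeds the critical value (equivalently, p < 0.05),
# we reject H0 that all group means are equal.
print(round(F, 2), round(p_value, 4), round(critical, 2))
```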