Sampling distribution for differences of means

If the assumptions of the two-sample t-test are true

  • data come from a normal distribution (normality)
  • both standard deviations are identical (equality of variance, homoscedasticity)
  • and \(\mu_x = \mu_y\), or equivalently \(\mu_x - \mu_y = 0\) (the null hypothesis)

Then the difference in the sample means, divided by its standard error, is distributed according to a t-distribution

\[ \frac{\bar{x} - \bar{y}}{\text{se}} \sim t(n_x + n_y - 2) \]

with \(n_x + n_y - 2\) degrees of freedom.
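The claim above can be checked with a small simulation. The sketch below is illustrative only: the group sizes (10 each), the population mean and standard deviation (100 and 15), and the function names are assumptions, and the critical value 2.101 is the standard table value for the 97.5% point of a t-distribution with 18 degrees of freedom. It uses the pooled standard error defined in this section. If the t-statistic really follows \(t(18)\), about 5% of simulated values should fall beyond \(\pm 2.101\).

```python
import math
import random
import statistics

N = 10             # assumed size of each group, so df = 10 + 10 - 2 = 18
CRITICAL = 2.101   # table value: 97.5% point of t with 18 degrees of freedom


def pooled_t(x, y):
    """t-statistic for the difference in means, using the pooled standard error."""
    n_x, n_y = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_x - 1) * statistics.variance(x)
                  + (n_y - 1) * statistics.variance(y)) / (n_x + n_y - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n_x + 1 / n_y)
    return (statistics.mean(x) - statistics.mean(y)) / se


def simulate_extreme_share(reps=10_000, seed=1):
    """Share of simulated |t| values that exceed the critical value."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # Both samples come from the same normal distribution (assumed values),
        # so normality, equal variances and equal means all hold.
        x = [rng.gauss(100, 15) for _ in range(N)]
        y = [rng.gauss(100, 15) for _ in range(N)]
        if abs(pooled_t(x, y)) > CRITICAL:
            extreme += 1
    return extreme / reps


rate = simulate_extreme_share()
print(f"share of |t| > {CRITICAL}: {rate:.3f}")  # should be close to 0.05
```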

The denominator of the t-statistic is the standard error of the difference in means, which is calculated according to

\[ \text{se} = \sqrt{\frac{(n_x - 1) \times s_x^2 + (n_y - 1) \times s_y^2}{n_x + n_y - 2}} \times \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \]

Breaking down the standard error (left term)

Numerator:

\[ (n_x - 1) \times s_x^2 + (n_y - 1) \times s_y^2 \]

Denominator:

\[ n_x + n_y - 2 \]

Fraction:

\[ \text{left term} = \sqrt{\frac{\text{numerator}}{\text{denominator}}} \]

Breaking down the standard error (right term)

\[ \text{right term} = \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \]
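As a quick worked example with made-up numbers (assumed for illustration only: \(n_x = n_y = 10\), \(s_x^2 = 4\), \(s_y^2 = 6\)), the two terms come out as

\[ \text{left term} = \sqrt{\frac{9 \times 4 + 9 \times 6}{10 + 10 - 2}} = \sqrt{\frac{90}{18}} = \sqrt{5} \approx 2.236 \]

\[ \text{right term} = \sqrt{\frac{1}{10} + \frac{1}{10}} = \sqrt{0.2} \approx 0.447 \]

The left term is the pooled standard deviation of the two samples; the right term shrinks it according to the sample sizes.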

Putting the standard error together again

The standard error is the product of the left and right terms of the SE equation.

\[ \text{se} = \text{left term} \times \text{right term} \]
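The calculation can be sketched in a few lines of Python using only the standard library. The function name `pooled_standard_error` and the data are hypothetical and chosen for illustration; the variable names mirror the terms used in the breakdown.

```python
import math
import statistics


def pooled_standard_error(x, y):
    """Standard error of the difference in means, assuming equal variances."""
    n_x, n_y = len(x), len(y)
    s2_x = statistics.variance(x)  # sample variance (divides by n - 1)
    s2_y = statistics.variance(y)

    numerator = (n_x - 1) * s2_x + (n_y - 1) * s2_y
    denominator = n_x + n_y - 2
    left_term = math.sqrt(numerator / denominator)  # pooled standard deviation
    right_term = math.sqrt(1 / n_x + 1 / n_y)
    return left_term * right_term


# Made-up measurements for two groups, for illustration only.
x = [5.1, 4.9, 6.2, 5.8, 6.0]
y = [4.2, 4.8, 5.0, 4.4, 4.6]

se = pooled_standard_error(x, y)
t_stat = (statistics.mean(x) - statistics.mean(y)) / se
print(f"se = {se:.3f}, t = {t_stat:.3f}")
```

Dividing the difference in sample means by this standard error gives the t-statistic, as in the last two lines.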

Confidence interval for the difference in means

The confidence interval for the difference in means is

\[ (\bar{x} - \bar{y}) \pm \tau \times \text{se} \]

where \(\pm\tau\) are the lower and upper bounds of the t-distribution that contain 95%, 99%, etc. of the area under the curve.

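We can work out \(\tau\) from the chosen confidence level and the degrees of freedom of our \(t\)-distribution. For a 95% interval, for example, \(\tau\) is the 97.5th percentile of the t-distribution with \(n_x + n_y - 2\) degrees of freedom, so that 2.5% of the area lies in each tail.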
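One way to make this concrete is the sketch below, which finds \(\tau\) numerically and then builds the interval. It is a minimal standard-library illustration, not a recommended implementation: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `confidence_interval`) and the summary statistics are assumptions, and the quantile is found by integrating the t density with Simpson's rule and bisecting. In practice a statistics library's quantile function (for example `scipy.stats.t.ppf`) would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the area on [0, x] by Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Find tau with P(T <= tau) = p by bisection (intended for 0.5 <= p < 1)."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(mean_x, mean_y, se, n_x, n_y, level=0.95):
    """Confidence interval for the difference in means."""
    df = n_x + n_y - 2
    tau = t_quantile(1 - (1 - level) / 2, df)  # e.g. the 0.975 quantile for 95%
    diff = mean_x - mean_y
    return diff - tau * se, diff + tau * se


# Hypothetical summary statistics, for illustration only.
lower, upper = confidence_interval(mean_x=12.5, mean_y=10.0, se=1.0, n_x=10, n_y=10)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With 18 degrees of freedom the computed \(\tau\) agrees with the tabulated value of about 2.101, so the interval here is roughly the observed difference of 2.5 plus or minus 2.1 standard errors.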