The difference between independent means test uses the difference in sample means $$\hat{d} = \bar{x} - \bar{y}$$ as an estimate of the difference in population means $$d = \mu_X - \mu_Y$$ to evaluate an hypothesized difference in population means $$d_0$$. The null hypothesis is $$H_0: d = d_0$$. Alternatively, construct a $$(1 - \alpha)\%$$ confidence interval around $$\hat{d}$$ to estimate $$d$$ within a margin of error $$\epsilon$$.

In theory, you can test the difference between independent means with either a z test or a t test. Both require that the independent samples. Both require normal sampling distributions. The sampling distributions are normal if the underlying populations are normally distributed, or if the sample sizes are large $$(n >= 30)$$. However, the z-test additionally requires known sampling distribution variances $$\sigma^2_X$$ and $$\sigma^2_Y$$. These variances are never known, so always use the t-test.

In the z-test, the measured difference in means $$\hat{d} = \bar{x} - \bar{y}$$ approximates the difference in population means $$d = \mu_X - \mu_Y$$, and the sampling distribution of $$d$$ has a normal distribution centered at $$\hat{d} = d$$ with standard error $$se = \frac{\sigma_X}{\sqrt{n_X}} + \frac{\sigma_Y}{\sqrt{n_Y}}$$ where $$\sigma_X$$ and $$\sigma_Y$$ are the standard deviations of the underlying populations. Test $$H_0: d = d_0$$ with test statistic $$Z = \frac{\hat{d} - d_0}{se}$$ or define a $$(1 - \alpha)\%$$ confidence interval as $$d = \hat{d} \pm z_{1 - \alpha {/} 2} se$$.

The t-test is similar except that it estimates $$\sigma_X$$ and $$\sigma_Y$$ with the sample standard deviations $$s_X$$ and $$s_Y$$. The resulting sampling distribution has a t-distribution centered at $$\hat{d} = d$$ with standard error $$se = \frac{s_X}{\sqrt{n_X}} + \frac{s_Y}{\sqrt{n_Y}}$$. Test $$H_0: d = d_0$$ with test statistic $$T = \frac{\hat{d} - d_0}{se}$$, or define a $$(1 - \alpha)\%$$ confidence interval as $$d = \hat{d} \pm t_{1 - \alpha / 2, n_X + n_Y - 2} se$$.

#### Pooled vs Separate Variances

At this point in the t-test we have two options. If the sample sizes are small, and the standard deviations from each population are similar (roughly, $$s_X / s_Y$$ between $$0.5$$ and $$2$$), pool the sampling distribution variances $$s_p^2 = \frac{(n_X - 1) s_X^2 + (n_Y-1) s_Y^2}{n_X + n_Y-2}$$ so that $$se = s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$$. This version of the test is called the pooled variances t-test.

Otherwise let $$se = \sqrt{s_X^2 / n_X + s_Y^2 / n_Y}$$ and reduce the degrees of freedom in the t-distribution using Satterthwaiteâ€™s approximation $$df = \frac{(n_X - 1) (n_Y - 1)}{(n_Y - 1) C^2 + (1-C)^2 (n_X - 1)}$$ where $$C = \frac{s_X^2 / n_X}{se_\hat{d}^2}$$. This version of the t-test is called the separate variance t-test or Welchâ€™s t-test.

#### Paired Samples

The paired two-sample test uses the mean of sampled paired differences $$\bar{d} = \sum_{i = 1}^n (x_i - y_i) {/} n$$ as an estimate of the mean of the population paired differences $$\delta$$ to evaluate the hypothesized mean of population paired differences $$\delta_0$$. Test $$H_0: \delta = \delta_0$$ with test statistic $$T = \frac{\bar{d} - \delta_0)}{se}$$, or define a $$(1 - \alpha)\%$$ confidence interval asw $$\delta = \bar{d} \pm t_{1 - \alpha / 2, n - 1} se$$. The paired t-test is really just a single mean t-test operating on variable that is defined as the difference between two variables.

## Pooled Variances t Test Example

A packing plant tests the hypothesis that a new machine is faster than its old machine. It measures the packing time for n_old = 10 cartons using the old new machine and n_new = 10 cartons using the new machine. At an $$\alpha = 5\%$$ level of significance, does the new machine pack faster?

library(readr)

machine <- read_tsv(file = "Data/machine.txt", col_names = TRUE)
colnames(machine)[colnames(machine) == "New machine"] <- "new"
colnames(machine)[colnames(machine) == "Old machine"] <- "old"

print(machine)
## # A tibble: 10 x 2
##      new   old
##    <dbl> <dbl>
##  1  42.1  42.7
##  2  41    43.6
##  3  41.3  43.8
##  4  41.8  43.3
##  5  42.4  42.5
##  6  42.8  43.5
##  7  43.2  43.1
##  8  42.3  41.7
##  9  41.8  44
## 10  42.7  44.1

The null hypothesis is that the difference in means is zero, $$H_0: d = \mu_{new} - \mu_{old} = 0$$ with alternative hypothesis that the new machine is faster, $$H_a: d < 0$$. The threshold level of significance s $$\alpha = 0.05$$.

d_0 = 0
alpha = 0.05

Prior to conducting a t test, check the initial conditions of indepependent samples and normally-distributed sampling distributions. The samples are independent because the machines are not related. The sample sizes are small (<30), so check that each sample is normally distributed. Neither normal Q-Q plot below indicates a non-normal population.

par(mfrow=c(1,2))
qqnorm(machine$old) qqline(machine$old)
qqnorm(machine$new) qqline(machine$new)

The Anderson-Darling normality tests both have p-values > .05, so do not reject the null hypothesis of normally distributed populations.

library(nortest)
ad.test(machine$old) ## ## Anderson-Darling normality test ## ## data: machine$old
## A = 0.27475, p-value = 0.5782
ad.test(machine$new) ## ## Anderson-Darling normality test ## ## data: machine$new
## A = 0.14231, p-value = 0.9548

The pooled variance test can be used if the sample standard deviations are similar (their ratio between 0.5 and 2.0). That is the case here.

s_old = sd(machine$old) s_new = sd(machine$new)
s_old / s_new
## [1] 1.097203

Better yet, use the F test to compare two variances. The F test compares uses the ratio of sample variances $$\hat{r} = s_X^2 / s_Y^2$$ as an estimate of the ratio of population variances $$r = \sigma_X^2 / \sigma_Y^2$$ to evaluate the hypothesized ratio of population variances $$r_0$$. In this case, $$H_0: r = \sigma_{new}^2 / \sigma_{old}^2 = 1$$. The p-value is $$P(f>|F|) = .7868$$, so do not reject the null hypothesis of equal variances in packing times.

r_0 = 1
alpha = 0.05
var.test(x = machine$new, y = machine$old,
ratio = 1,
alternative = "two.sided",
conf.level = .95)
##
##  F test to compare two variances
##
## data:  machine$new and machine$old
## F = 0.83067, num df = 9, denom df = 9, p-value = 0.7868
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2063257 3.3442560
## sample estimates:
## ratio of variances
##          0.8306659

The pooled variance is $s_p^2 = \frac{(n_{new}-1) s_{new}^2 + (n_{old}-1) s_{old}^2)}{n_{new} + n_{old} - 2} = \frac{(10-1) .683^2 + (10-1) .750^2)}{10 + 10 - 2} = 0.515$.

n_old <- nrow(machine)
n_new <- nrow(machine)
(s_p2 <- ((n_new - 1) * s_new^2 + (n_old - 1) * s_old^2) /
(n_new + n_old - 2))
## [1] 0.5147222

The variance of the measured difference sampling distribution is $se^2 = s_p^2 (\frac{1}{n_{new}} + \frac{1}{n_{old}}) = .515 (\frac{1}{10} + \frac{1}{10}) = .103$.

(se_2 <- s_p2 * (1 / n_new + 1 / n_old))
## [1] 0.1029444

The test statistic for the pooled variance t-test for the difference in means is $T = \frac{\hat{d} - d_0}{\sqrt{se^2}} = \frac{(42.14 - 43.23) - 0}{\sqrt{0.103}} = -3.40$

d_hat <- mean(machine$new) - mean(machine$old)
(t <- (d_hat - d_0) / sqrt(se_2))
## [1] -3.397231

The p-value is $$P(t<T) = .0016$$, so reject the null hypothesis of no difference in packing times.

pt(q = t, df = n_new + n_old - 2, lower.tail = TRUE)
## [1] 0.001605571

The function t.test() calculates the t-test for the difference in means. Specify var.equal = TRUE for a pooled variance.

(t.test.result <- t.test(x = machine$new, y = machine$old,
alternative = "less",
mu = 0,
paired = FALSE,
var.equal = TRUE,
conf.level = (1 - alpha)))
##
##  Two Sample t-test
##
## data:  machine$new and machine$old
## t = -3.3972, df = 18, p-value = 0.001606
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.5336264
## sample estimates:
## mean of x mean of y
##     42.14     43.23

The plot of the hypothesis test shows the measured difference in means is well into the the range of rejection.

library(ggplot2)
library(dplyr)  # for %>%

t_crit = qt(p = alpha, df = n_old + n_new - 2)
lrr <- d_0 - t_crit * sqrt(se_2)
urr <- Inf
data.frame(d = -250:250 / 100) %>%
mutate(t = (d - d_0) / sqrt(se_2)) %>%
mutate(prob = dt(x = t, df = n_old + n_new - 2)) %>%
mutate(rr = ifelse(t < t_crit, prob, 0)) %>%
ggplot() +
geom_line(aes(x = d, y = prob)) +
geom_area(aes(x = d, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = d_0), color = "blue") +
geom_vline(aes(xintercept = d_hat), color = "red") +
labs(title = bquote("Hypothesis Test of H0: d = 0"),
subtitle = bquote("d_hat ="~.(round(d_hat, 1))~", p-value="~.(round(t.test.result$p.value, 4))), x = "d", y = "Probability") + theme(legend.position="none") Construct the pooled variance confidence interval for the difference in means. The confidence interval is $$d = \hat{d} \pm t_{\alpha / 2, n_X + n_Y - 2} se$$. t <- qt(p = alpha / 2, df = n_old + n_new - 2, lower.tail = FALSE) cat("95% CI: ", d_hat, " +/- ", t * sqrt(se_2), " (", d_hat - t * sqrt(se_2), ", ", d_hat + t * sqrt(se_2), ")") ## 95% CI: -1.09 +/- 0.6740799 ( -1.76408 , -0.4159201 ) The function t.test() calculates the t-test for the difference in means. Specify var.equal = TRUE for a pooled variance. (t.test.result <- t.test(x = machine$new,
y = machine$old, paired = FALSE, var.equal = TRUE, conf.level = (1 - alpha))) ## ## Two Sample t-test ## ## data: machine$new and machine$old ## t = -3.3972, df = 18, p-value = 0.003211 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## -1.7640799 -0.4159201 ## sample estimates: ## mean of x mean of y ## 42.14 43.23 The plot of the confidence interval show the measured difference in means is significantly different from 0. library(ggplot2) library(dplyr) # for %>% t_crit = qt(p = alpha / 2, df = n_old + n_new - 2, lower.tail = FALSE) lcl <- -t_crit ucl <- t_crit data.frame(d = -250:250 / 100) %>% mutate(t = (d - d_hat) / sqrt(se_2)) %>% mutate(prob = dt(x = t, df = n_old + n_new - 2)) %>% mutate(rr = ifelse(t > lcl & t < ucl, prob, 0)) %>% ggplot() + geom_line(aes(x = d, y = prob)) + geom_area(aes(x = d, y = rr), fill = "blue", alpha = 0.1) + geom_vline(aes(xintercept = d_hat), color = "blue") + labs(title = bquote("95% CI for Difference in Means"), subtitle = bquote("d_hat = "~.(round(d_hat, 1))), x = "d", y = "Probability") + theme(legend.position="none") ### Example with separate Variances (Welch) In the example above, we could have used the separate variances t-test instead. How do the results differ? The separate variances method defines the standard error of the difference in means sampling distribution as $$se = \frac{s_{new}}{\sqrt{n_{new}}} + \frac{s_{old}}{\sqrt{n_{old}}} = \frac{0.6835}{\sqrt{10}} + \frac{0.7499}{\sqrt{10}} = 0.3208$$. The degrees of freedom are $$df = \frac{(n_new - 1) (n_old - 1)}{(n_old - 1) C^2 + (1-C)^2 (n_new - 1)} = \frac{(10 - 1) (10 - 1)}{(10 - 1) (.4537)^2 + (1 - .4537)^2 (10-1) } = 17.847$$ where $$C = \frac{s_{new}^2 {/} n_{new}}{se_\hat{d}^2} = \frac{(.6835)^2 {/} 10}{.3208^2} = 0.4537$$ (known as Satterthwaiteâ€™s approximation). (se <- sqrt(s_new^2 / n_new + {s_old}^2 / n_old)) ## [1] 0.3208496 (c <- s_new^2 / n_new / se^2) ## [1] 0.4537507 (df_satter <- (n_new - 1) * (n_old - 1) / ((n_old - 1) * c^2 + (1 - c)^2 * (n_new - 1))) ## [1] 17.8473 Conduct the separate variance t-test for the difference in means. The T-statistic is $$T = \frac{\hat{d} - d_0}{\sqrt{se_\hat{d}^2}} = \frac{(42.14 - 43.23) - 0}{\sqrt{0.103}} = -3.40$$. The p-value is $$P(t<T) = .0016$$, so reject the null hypothesis of no difference in packing times. (t <- (d_hat - d_0) / sqrt(se)) ## [1] -1.924313 pt(q = t, df = df_satter, lower.tail = TRUE) ## [1] 0.03520502 The stats package function t.test calculates the t-test for the difference in means. This time specify var.equal = FALSE for a separate variances test. (t.test.result <- t.test(x = machine$new,
y = machine$old, alternative = "less", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = (1 - alpha))) ## ## Welch Two Sample t-test ## ## data: machine$new and machine\$old
## t = -3.3972, df = 17.847, p-value = 0.001621
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.5333684
## sample estimates:
## mean of x mean of y
##     42.14     43.23

The separate variance method returned results nearly identical to the pooled variance method.

## Example with Paired Values

A study of 15 pairs of twins where one twin had schizophrenia measured the volumes of regions of the brain. What is the magnitude of the difference in volume?

x <- c(1.94, 1.44, 1.56, 1.58, 2.06,
1.66, 1.75, 1.77, 1.78, 1.92,
1.25, 1.93, 2.04, 1.62, 2.08)
y <- c(1.27, 1.63, 1.47, 1.39, 1.93,
1.26, 1.71, 1.67, 1.28, 1.85,
1.02, 1.34, 2.02, 1.59, 1.97)
d <- x - y
n <- length(d)

The null hypothesis is that the mean of the paired differences is zero, $$H_0: \delta = \delta_0 = 0$$ with alternative hypothesis that the difference is non-zero, $$H_a: \delta \ne 0$$. We use an $$\alpha = 0.05$$ level of significance.

d_0 <- 0
d_bar <- mean(d)
se <- sd(d) / sqrt(n)
alpha <- 0.05

The samples are independent because the sets of twins are not related. The sample size is not large $$(n < 30)$$, so we need to check that each population is normally distributed before taking a difference in pairs mean test with the t-statistic. Neither normal Q-Q plot below indicates a non-normal population. The Anderson-Darling normality tests have p-values > .05, so do not reject the null hypothesis of normally distributed populations.

qqnorm(x)
qqline(x)

qqnorm(y)
qqline(y)

library(nortest)
ad.test(x)
##
##  Anderson-Darling normality test
##
## data:  x
## A = 0.25027, p-value = 0.6937
ad.test(y)
##
##  Anderson-Darling normality test
##
## data:  y
## A = 0.25863, p-value = 0.6639

Conduct the t-test for the mean of paired differences. The T-statistic is $$T = \frac{\bar{d} - \delta_0)}{se} = \frac{0.1987 - 0}{0.0615} = 3.2289$$. The p-value is $$P(t<T) = .0061$$, so reject the null hypothesis of a mean difference of zero. Construct a 95% CI as $$d_bar \pm t_{1 - \alpha {/} 2, n - 1} se = .1987 \pm 0.1320 = (0.0667, 0.3306)$$.

(t <- (d_bar - d_0) / se)
## [1] 3.228928
pt(q = t, df = n - 1, lower.tail = FALSE) * 2  # x2 for two-tail
## [1] 0.006061544
(t_crit <- qt(p = alpha / 2, df = n - 1, lower.tail = FALSE))
## [1] 2.144787
(epsilon <- t_crit * se)
## [1] 0.1319626
(lcl <- d_bar - epsilon)
## [1] 0.0667041
(ucl <- d_bar + epsilon)
## [1] 0.3306292
cat("95% CI: (", lcl, ", ", ucl, ")")
## 95% CI: ( 0.0667041 ,  0.3306292 )

The R function t.test does these same calculations.

t.test(x = x,
y = y,
alternative = "two.sided",
mu = d_0,
paired = TRUE,
conf.level = (1 - alpha))
##
##  Paired t-test
##
## data:  x and y
## t = 3.2289, df = 14, p-value = 0.006062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.0667041 0.3306292
## sample estimates:
## mean of the differences
##               0.1986667

Graph the t-test and CI.

# Hypothesis test graph
lrr <- d_0 - t * se
urr <- d_0 + t * se
data.frame(x = -500:500 / 1000) %>%
mutate(t = (x - d_0) / se) %>%
mutate(prob = dt(x = t, df = n - 1)) %>%
mutate(rr = ifelse(x < lrr | x > urr, prob, 0)) %>%
ggplot() +
geom_line(aes(x = x, y = prob)) +
geom_area(aes(x = x, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = d_0), color = "blue") +
geom_vline(aes(xintercept = d_bar), color = "red") +
labs(title = bquote('Hypothesis Test of H0: delta = 0'),
subtitle = bquote('Diff = '~.(round(d_bar, 4))),
x = "d",
y = "Probability") +
theme(legend.position="none")

# 95% confidence interval graph
data.frame(x = 0:500 / 1000) %>%
mutate(t = (x - d_bar) / se) %>%
mutate(prob = dt(x = t, df = n - 1)) %>%
mutate(rr = ifelse(x >= lcl & x <= ucl, prob, 0)) %>%
ggplot() +
geom_line(aes(x = x, y = prob)) +
geom_area(aes(x = x, y = rr), fill = "blue", alpha = 0.2) +
geom_vline(aes(xintercept = d_bar), color = "blue") +
labs(title = bquote('95% CI'),
subtitle = bquote('Difference = '~.(round(d_bar,4))~'+-'~.(round(epsilon, 4))~' using t-dist with'~.(n - 1)~'df.'),
x = "delta",
y = "Probability")