The **difference between independent means** test uses the difference in sample means \(\hat{d} = \bar{x} - \bar{y}\) as an estimate of the difference in population means \(d = \mu_X - \mu_Y\) to evaluate an hypothesized difference in population means \(d_0\). The null hypothesis is \(H_0: d = d_0\). Alternatively, construct a \((1 - \alpha)\%\) confidence interval around \(\hat{d}\) to estimate \(d\) within a margin of error \(\epsilon\).

In theory, you can test the difference between independent means with either a z test or a t test. Both require that the independent samples. Both require normal sampling distributions. The sampling distributions are normal if the underlying populations are normally distributed, or if the sample sizes are large \((n >= 30)\). However, the z-test additionally requires known sampling distribution variances \(\sigma^2_X\) and \(\sigma^2_Y\). These variances are never known, so always use the t-test.

In the z-test, the measured difference in means \(\hat{d} = \bar{x} - \bar{y}\) approximates the difference in population means \(d = \mu_X - \mu_Y\), and the sampling distribution of \(d\) has a normal distribution centered at \(\hat{d} = d\) with standard error \(se = \frac{\sigma_X}{\sqrt{n_X}} + \frac{\sigma_Y}{\sqrt{n_Y}}\) where \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of the underlying populations. Test \(H_0: d = d_0\) with test statistic \(Z = \frac{\hat{d} - d_0}{se}\) or define a \((1 - \alpha)\%\) confidence interval as \(d = \hat{d} \pm z_{1 - \alpha {/} 2} se\).

The t-test is similar except that it estimates \(\sigma_X\) and \(\sigma_Y\) with the sample standard deviations \(s_X\) and \(s_Y\). The resulting sampling distribution has a t-distribution centered at \(\hat{d} = d\) with standard error \(se = \frac{s_X}{\sqrt{n_X}} + \frac{s_Y}{\sqrt{n_Y}}\). Test \(H_0: d = d_0\) with test statistic \(T = \frac{\hat{d} - d_0}{se}\), or define a \((1 - \alpha)\%\) confidence interval as \(d = \hat{d} \pm t_{1 - \alpha / 2, n_X + n_Y - 2} se\).

At this point in the t-test we have two options. If the sample sizes are small, and the standard deviations from each population are similar (roughly, \(s_X / s_Y\) between \(0.5\) and \(2\)), pool the sampling distribution variances \(s_p^2 = \frac{(n_X - 1) s_X^2 + (n_Y-1) s_Y^2}{n_X + n_Y-2}\) so that \(se = s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}\). This version of the test is called the **pooled variances t-test**.

Otherwise let \(se = \sqrt{s_X^2 / n_X + s_Y^2 / n_Y}\) and reduce the degrees of freedom in the t-distribution using Satterthwaiteâ€™s approximation \(df = \frac{(n_X - 1) (n_Y - 1)}{(n_Y - 1) C^2 + (1-C)^2 (n_X - 1)}\) where \(C = \frac{s_X^2 / n_X}{se_\hat{d}^2}\). This version of the t-test is called the **separate variance t-test** or **Welchâ€™s t-test**.

The **paired two-sample test** uses the mean of sampled paired differences \(\bar{d} = \sum_{i = 1}^n (x_i - y_i) {/} n\) as an estimate of the mean of the population paired differences \(\delta\) to evaluate the hypothesized mean of population paired differences \(\delta_0\). Test \(H_0: \delta = \delta_0\) with test statistic \(T = \frac{\bar{d} - \delta_0)}{se}\), or define a \((1 - \alpha)\%\) confidence interval asw \(\delta = \bar{d} \pm t_{1 - \alpha / 2, n - 1} se\). *The paired t-test is really just a single mean t-test operating on variable that is defined as the difference between two variables*.

A packing plant tests the hypothesis that a new machine is faster than its old machine. It measures the packing time for `n_old`

= 10 cartons using the old new machine and `n_new`

= 10 cartons using the new machine. At an \(\alpha = 5\%\) level of significance, does the new machine pack faster?

```
library(readr)
machine <- read_tsv(file = "Data/machine.txt", col_names = TRUE)
colnames(machine)[colnames(machine) == "New machine"] <- "new"
colnames(machine)[colnames(machine) == "Old machine"] <- "old"
print(machine)
```

```
## # A tibble: 10 x 2
## new old
## <dbl> <dbl>
## 1 42.1 42.7
## 2 41 43.6
## 3 41.3 43.8
## 4 41.8 43.3
## 5 42.4 42.5
## 6 42.8 43.5
## 7 43.2 43.1
## 8 42.3 41.7
## 9 41.8 44
## 10 42.7 44.1
```

The null hypothesis is that the difference in means is zero, \(H_0: d = \mu_{new} - \mu_{old} = 0\) with alternative hypothesis that the new machine is faster, \(H_a: d < 0\). The threshold level of significance s \(\alpha = 0.05\).

```
d_0 = 0
alpha = 0.05
```

Prior to conducting a t test, check the initial conditions of indepependent samples and normally-distributed sampling distributions. The samples are independent because the machines are not related. The sample sizes are small (<30), so check that each sample is normally distributed. Neither normal Q-Q plot below indicates a non-normal population.

```
par(mfrow=c(1,2))
qqnorm(machine$old)
qqline(machine$old)
qqnorm(machine$new)
qqline(machine$new)
```

The Anderson-Darling normality tests both have p-values > .05, so do not reject the null hypothesis of normally distributed populations.

```
library(nortest)
ad.test(machine$old)
```

```
##
## Anderson-Darling normality test
##
## data: machine$old
## A = 0.27475, p-value = 0.5782
```

`ad.test(machine$new)`

```
##
## Anderson-Darling normality test
##
## data: machine$new
## A = 0.14231, p-value = 0.9548
```

The pooled variance test can be used if the sample standard deviations are similar (their ratio between 0.5 and 2.0). That is the case here.

```
s_old = sd(machine$old)
s_new = sd(machine$new)
s_old / s_new
```

`## [1] 1.097203`

Better yet, use the **F test** to compare two variances. The F test compares uses the ratio of sample variances \(\hat{r} = s_X^2 / s_Y^2\) as an estimate of the ratio of population variances \(r = \sigma_X^2 / \sigma_Y^2\) to evaluate the hypothesized ratio of population variances \(r_0\). In this case, \(H_0: r = \sigma_{new}^2 / \sigma_{old}^2 = 1\). The p-value is \(P(f>|F|) = .7868\), so *do not* reject the null hypothesis of equal variances in packing times.

```
r_0 = 1
alpha = 0.05
var.test(x = machine$new,
y = machine$old,
ratio = 1,
alternative = "two.sided",
conf.level = .95)
```

```
##
## F test to compare two variances
##
## data: machine$new and machine$old
## F = 0.83067, num df = 9, denom df = 9, p-value = 0.7868
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2063257 3.3442560
## sample estimates:
## ratio of variances
## 0.8306659
```

The pooled variance is \[s_p^2 = \frac{(n_{new}-1) s_{new}^2 + (n_{old}-1) s_{old}^2)}{n_{new} + n_{old} - 2} = \frac{(10-1) .683^2 + (10-1) .750^2)}{10 + 10 - 2} = 0.515\].

```
n_old <- nrow(machine)
n_new <- nrow(machine)
(s_p2 <- ((n_new - 1) * s_new^2 + (n_old - 1) * s_old^2) /
(n_new + n_old - 2))
```

`## [1] 0.5147222`

The variance of the measured difference sampling distribution is \[se^2 = s_p^2 (\frac{1}{n_{new}} + \frac{1}{n_{old}}) = .515 (\frac{1}{10} + \frac{1}{10}) = .103\].

`(se_2 <- s_p2 * (1 / n_new + 1 / n_old))`

`## [1] 0.1029444`

The test statistic for the pooled variance t-test for the difference in means is \[T = \frac{\hat{d} - d_0}{\sqrt{se^2}} = \frac{(42.14 - 43.23) - 0}{\sqrt{0.103}} = -3.40\]

```
d_hat <- mean(machine$new) - mean(machine$old)
(t <- (d_hat - d_0) / sqrt(se_2))
```

`## [1] -3.397231`

The p-value is \(P(t<T) = .0016\), so reject the null hypothesis of no difference in packing times.

`pt(q = t, df = n_new + n_old - 2, lower.tail = TRUE)`

`## [1] 0.001605571`

The function `t.test()`

calculates the t-test for the difference in means. Specify `var.equal = TRUE`

for a pooled variance.

```
(t.test.result <- t.test(x = machine$new,
y = machine$old,
alternative = "less",
mu = 0,
paired = FALSE,
var.equal = TRUE,
conf.level = (1 - alpha)))
```

```
##
## Two Sample t-test
##
## data: machine$new and machine$old
## t = -3.3972, df = 18, p-value = 0.001606
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.5336264
## sample estimates:
## mean of x mean of y
## 42.14 43.23
```

The plot of the hypothesis test shows the measured difference in means is well into the the range of rejection.

```
library(ggplot2)
library(dplyr) # for %>%
t_crit = qt(p = alpha, df = n_old + n_new - 2)
lrr <- d_0 - t_crit * sqrt(se_2)
urr <- Inf
data.frame(d = -250:250 / 100) %>%
mutate(t = (d - d_0) / sqrt(se_2)) %>%
mutate(prob = dt(x = t, df = n_old + n_new - 2)) %>%
mutate(rr = ifelse(t < t_crit, prob, 0)) %>%
ggplot() +
geom_line(aes(x = d, y = prob)) +
geom_area(aes(x = d, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = d_0), color = "blue") +
geom_vline(aes(xintercept = d_hat), color = "red") +
labs(title = bquote("Hypothesis Test of H0: d = 0"),
subtitle = bquote("d_hat ="~.(round(d_hat, 1))~", p-value="~.(round(t.test.result$p.value, 4))),
x = "d",
y = "Probability") +
theme(legend.position="none")
```

Construct the pooled variance confidence interval for the difference in means. The confidence interval is \(d = \hat{d} \pm t_{\alpha / 2, n_X + n_Y - 2} se\).

```
t <- qt(p = alpha / 2, df = n_old + n_new - 2, lower.tail = FALSE)
cat("95% CI: ", d_hat, " +/- ", t * sqrt(se_2),
" (", d_hat - t * sqrt(se_2), ", ", d_hat + t * sqrt(se_2), ")")
```

`## 95% CI: -1.09 +/- 0.6740799 ( -1.76408 , -0.4159201 )`

The function `t.test()`

calculates the t-test for the difference in means. Specify `var.equal = TRUE`

for a pooled variance.

```
(t.test.result <- t.test(x = machine$new,
y = machine$old,
paired = FALSE,
var.equal = TRUE,
conf.level = (1 - alpha)))
```

```
##
## Two Sample t-test
##
## data: machine$new and machine$old
## t = -3.3972, df = 18, p-value = 0.003211
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.7640799 -0.4159201
## sample estimates:
## mean of x mean of y
## 42.14 43.23
```

The plot of the confidence interval show the measured difference in means is significantly different from 0.

```
library(ggplot2)
library(dplyr) # for %>%
t_crit = qt(p = alpha / 2, df = n_old + n_new - 2, lower.tail = FALSE)
lcl <- -t_crit
ucl <- t_crit
data.frame(d = -250:250 / 100) %>%
mutate(t = (d - d_hat) / sqrt(se_2)) %>%
mutate(prob = dt(x = t, df = n_old + n_new - 2)) %>%
mutate(rr = ifelse(t > lcl & t < ucl, prob, 0)) %>%
ggplot() +
geom_line(aes(x = d, y = prob)) +
geom_area(aes(x = d, y = rr), fill = "blue", alpha = 0.1) +
geom_vline(aes(xintercept = d_hat), color = "blue") +
labs(title = bquote("95% CI for Difference in Means"),
subtitle = bquote("d_hat = "~.(round(d_hat, 1))),
x = "d",
y = "Probability") +
theme(legend.position="none")
```

In the example above, we could have used the separate variances t-test instead. How do the results differ?

The separate variances method defines the standard error of the difference in means sampling distribution as \(se = \frac{s_{new}}{\sqrt{n_{new}}} + \frac{s_{old}}{\sqrt{n_{old}}} = \frac{0.6835}{\sqrt{10}} + \frac{0.7499}{\sqrt{10}} = 0.3208\).

The degrees of freedom are \(df = \frac{(n_new - 1) (n_old - 1)}{(n_old - 1) C^2 + (1-C)^2 (n_new - 1)} = \frac{(10 - 1) (10 - 1)}{(10 - 1) (.4537)^2 + (1 - .4537)^2 (10-1) } = 17.847\) where \(C = \frac{s_{new}^2 {/} n_{new}}{se_\hat{d}^2} = \frac{(.6835)^2 {/} 10}{.3208^2} = 0.4537\) (known as Satterthwaiteâ€™s approximation).

`(se <- sqrt(s_new^2 / n_new + {s_old}^2 / n_old))`

`## [1] 0.3208496`

`(c <- s_new^2 / n_new / se^2)`

`## [1] 0.4537507`

```
(df_satter <- (n_new - 1) * (n_old - 1) /
((n_old - 1) * c^2 + (1 - c)^2 * (n_new - 1)))
```

`## [1] 17.8473`

Conduct the separate variance t-test for the difference in means. The T-statistic is \(T = \frac{\hat{d} - d_0}{\sqrt{se_\hat{d}^2}} = \frac{(42.14 - 43.23) - 0}{\sqrt{0.103}} = -3.40\). The p-value is \(P(t<T) = .0016\), so reject the null hypothesis of no difference in packing times.

`(t <- (d_hat - d_0) / sqrt(se))`

`## [1] -1.924313`

`pt(q = t, df = df_satter, lower.tail = TRUE)`

`## [1] 0.03520502`

The `stats`

package function `t.test`

calculates the t-test for the difference in means. This time specify `var.equal = FALSE`

for a separate variances test.

```
(t.test.result <- t.test(x = machine$new,
y = machine$old,
alternative = "less",
mu = 0,
paired = FALSE,
var.equal = FALSE,
conf.level = (1 - alpha)))
```

```
##
## Welch Two Sample t-test
##
## data: machine$new and machine$old
## t = -3.3972, df = 17.847, p-value = 0.001621
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.5333684
## sample estimates:
## mean of x mean of y
## 42.14 43.23
```

The separate variance method returned results nearly identical to the pooled variance method.

*A study of 15 pairs of twins where one twin had schizophrenia measured the volumes of regions of the brain. What is the magnitude of the difference in volume?*

```
x <- c(1.94, 1.44, 1.56, 1.58, 2.06,
1.66, 1.75, 1.77, 1.78, 1.92,
1.25, 1.93, 2.04, 1.62, 2.08)
y <- c(1.27, 1.63, 1.47, 1.39, 1.93,
1.26, 1.71, 1.67, 1.28, 1.85,
1.02, 1.34, 2.02, 1.59, 1.97)
d <- x - y
n <- length(d)
```

The null hypothesis is that the mean of the paired differences is zero, \(H_0: \delta = \delta_0 = 0\) with alternative hypothesis that the difference is non-zero, \(H_a: \delta \ne 0\). We use an \(\alpha = 0.05\) level of significance.

```
d_0 <- 0
d_bar <- mean(d)
se <- sd(d) / sqrt(n)
alpha <- 0.05
```

The samples are independent because the sets of twins are not related. The sample size is not large \((n < 30)\), so we need to check that each population is normally distributed before taking a difference in pairs mean test with the t-statistic. Neither normal Q-Q plot below indicates a non-normal population. The Anderson-Darling normality tests have p-values > .05, so do not reject the null hypothesis of normally distributed populations.

```
qqnorm(x)
qqline(x)
```

```
qqnorm(y)
qqline(y)
```

```
library(nortest)
ad.test(x)
```

```
##
## Anderson-Darling normality test
##
## data: x
## A = 0.25027, p-value = 0.6937
```

`ad.test(y)`

```
##
## Anderson-Darling normality test
##
## data: y
## A = 0.25863, p-value = 0.6639
```

Conduct the t-test for the mean of paired differences. The T-statistic is \(T = \frac{\bar{d} - \delta_0)}{se} = \frac{0.1987 - 0}{0.0615} = 3.2289\). The p-value is \(P(t<T) = .0061\), so reject the null hypothesis of a mean difference of zero. Construct a 95% CI as \(d_bar \pm t_{1 - \alpha {/} 2, n - 1} se = .1987 \pm 0.1320 = (0.0667, 0.3306)\).

`(t <- (d_bar - d_0) / se)`

`## [1] 3.228928`

`pt(q = t, df = n - 1, lower.tail = FALSE) * 2 # x2 for two-tail`

`## [1] 0.006061544`

`(t_crit <- qt(p = alpha / 2, df = n - 1, lower.tail = FALSE))`

`## [1] 2.144787`

`(epsilon <- t_crit * se)`

`## [1] 0.1319626`

`(lcl <- d_bar - epsilon)`

`## [1] 0.0667041`

`(ucl <- d_bar + epsilon)`

`## [1] 0.3306292`

`cat("95% CI: (", lcl, ", ", ucl, ")")`

`## 95% CI: ( 0.0667041 , 0.3306292 )`

The R function `t.test`

does these same calculations.

```
t.test(x = x,
y = y,
alternative = "two.sided",
mu = d_0,
paired = TRUE,
conf.level = (1 - alpha))
```

```
##
## Paired t-test
##
## data: x and y
## t = 3.2289, df = 14, p-value = 0.006062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0667041 0.3306292
## sample estimates:
## mean of the differences
## 0.1986667
```

Graph the t-test and CI.

```
# Hypothesis test graph
lrr <- d_0 - t * se
urr <- d_0 + t * se
data.frame(x = -500:500 / 1000) %>%
mutate(t = (x - d_0) / se) %>%
mutate(prob = dt(x = t, df = n - 1)) %>%
mutate(rr = ifelse(x < lrr | x > urr, prob, 0)) %>%
ggplot() +
geom_line(aes(x = x, y = prob)) +
geom_area(aes(x = x, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = d_0), color = "blue") +
geom_vline(aes(xintercept = d_bar), color = "red") +
labs(title = bquote('Hypothesis Test of H0: delta = 0'),
subtitle = bquote('Diff = '~.(round(d_bar, 4))),
x = "d",
y = "Probability") +
theme(legend.position="none")
```

```
# 95% confidence interval graph
data.frame(x = 0:500 / 1000) %>%
mutate(t = (x - d_bar) / se) %>%
mutate(prob = dt(x = t, df = n - 1)) %>%
mutate(rr = ifelse(x >= lcl & x <= ucl, prob, 0)) %>%
ggplot() +
geom_line(aes(x = x, y = prob)) +
geom_area(aes(x = x, y = rr), fill = "blue", alpha = 0.2) +
geom_vline(aes(xintercept = d_bar), color = "blue") +
labs(title = bquote('95% CI'),
subtitle = bquote('Difference = '~.(round(d_bar,4))~'+-'~.(round(epsilon, 4))~' using t-dist with'~.(n - 1)~'df.'),
x = "delta",
y = "Probability")
```