The mean test uses the sample mean \(\bar{x}\) as an estimate of the population mean \(\mu\) to evaluate a hypothesized population mean \(\mu_0\). The null hypothesis is \(H_0: \mu = \mu_0\), and the alternative hypothesis is \(H_a: \mu \ne \mu_0\), \(H_a: \mu < \mu_0\), or \(H_a: \mu > \mu_0\). One can also construct a \((1 - \alpha)100\%\) confidence interval around \(\bar{x}\) to estimate \(\mu\) within a margin of error \(\epsilon\). (See the PSU STAT 500 online material and Stat Trek.)
There are two ways to perform a hypothesis test or calculate a confidence interval: the normal approximation z test and the t test. The normal approximation method applies only when the sampling distribution of the sample mean is normally distributed with known standard error. The sampling distribution is normal when the underlying population is normally distributed, or when the sample size is large \((n \ge 30)\); the large-sample case follows from the central limit theorem. The t test returns similar results when the sampling distribution is normal, and it remains valid when the population standard deviation is unknown (which is essentially always the case!).
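The central limit theorem claim is easy to check by simulation. The following sketch is an illustration only; the exponential population and the seed are arbitrary choices. It draws 1,000 samples of size 30 from a strongly right-skewed population and shows that their means are approximately normal.

set.seed(42)  # arbitrary seed for reproducibility
# 1,000 means of size-30 samples from a right-skewed exponential population
sample_means <- replicate(1000, mean(rexp(n = 30, rate = 1)))
qqnorm(sample_means)  # near-linear Q-Q plot despite the skewed parent population
qqline(sample_means)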
Under the normal approximation method, the measured mean \(\bar{x}\) approximates the population mean \(\mu\), and the sampling distribution is normal, centered at \(\mu\) with standard error \(se_\mu = \frac{\sigma}{\sqrt{n}}\), where \(\sigma\) is the standard deviation of the underlying population. Test \(H_0: \mu = \mu_0\) with test statistic \(Z = \frac{\bar{x} - \mu_0}{se_\mu}\). Define a \((1 - \alpha)100\%\) confidence interval as \(\bar{x} \pm z_{\alpha/2} \, se_\mu\).
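Base R has no built-in z test function, but the normal approximation is simple to compute directly with pnorm and qnorm. A minimal sketch, assuming \(\sigma\) is somehow known; all values here are hypothetical, chosen only for illustration.

# hypothetical example: known sigma, right-tail test of H_0: mu = 50
sigma <- 4.0; n <- 36; x_bar <- 51.0; mu_0 <- 50.0; alpha <- 0.05
se_mu <- sigma / sqrt(n)                         # standard error with known sigma
z <- (x_bar - mu_0) / se_mu                      # test statistic
pnorm(q = z, lower.tail = FALSE)                 # right-tail p-value
x_bar + c(-1, 1) * qnorm(1 - alpha / 2) * se_mu  # two-sided (1 - alpha) CI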
In practice \(\sigma\) is unknown, but we can estimate it with the sample standard deviation \(s\). The resulting sampling distribution is a t-distribution centered at \(\mu\) with standard error \(se_{\bar{x}} = \frac{s}{\sqrt{n}}\). Test \(H_0: \mu = \mu_0\) with test statistic \(T = \frac{\bar{x} - \mu_0}{se_{\bar{x}}}\). Define a \((1 - \alpha)100\%\) confidence interval as \(\bar{x} \pm t_{\alpha/2} \, se_{\bar{x}}\). The t test only requires that either the underlying population is normally distributed, or the sample is large enough \((n \ge 30)\) for the sampling distribution to be normal. If the sample fails the normality condition, proceed with extreme caution or use a nonparametric confidence interval for the median.
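As one concrete nonparametric fallback, base R's wilcox.test can return a confidence interval (technically for the pseudomedian) when conf.int = TRUE. A minimal sketch, illustrated with the mtcars fuel economy data used below:

# one-sample Wilcoxon signed rank test with a nonparametric CI
wilcox.test(x = mtcars$mpg, conf.int = TRUE, conf.level = 0.95)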
A random sample of \(n=32\) cars has a fuel economy of \(\bar{x}=20.1\) mpg with standard deviation \(s=6.0\) mpg. The prior measured overall fuel economy for vehicles was \(\mu_0=16.0\) mpg. Has fuel economy improved?
(n <- nrow(mtcars))
## [1] 32
(x_bar <- mean(mtcars$mpg))
## [1] 20.09062
(s <- sd(mtcars$mpg))
## [1] 6.026948
mu_0 <- 16.0
The sample size is \(\ge 30\), so the sampling distribution of the sample mean is normally distributed. The population standard deviation \(\sigma\) is unknown, so estimate it with the sample standard deviation \(s\) and use the t test.
\(H_0: \mu = 16.0\), and \(H_a: \mu > 16.0\) - a right-tail test. The test statistic is \(T = \frac{\bar{x} - \mu_0}{se_{\bar{x}}}\), where \(se_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{6.03}{\sqrt{32}} \approx 1.07\) (using the unrounded \(s\)), so \(T = \frac{20.1 - 16.0}{1.07} \approx 3.84\). \(P(t > 3.84) = .0003\), so reject \(H_0\) at the \(\alpha = 0.05\) level of significance.
library(stats) # for pt
alpha <- 0.05
(se <- s / sqrt(n))
## [1] 1.065424
(t <- (x_bar - mu_0) / se)
## [1] 3.839434
(pt(q = t, df = n - 1, lower.tail = FALSE))
## [1] 0.0002847998
The stats package function t.test performs the t test directly. For this one-sided test, t.test reports a confidence interval with lower limit \(\bar{x} - t_{1-\alpha, df} \, se_{\bar{x}}\) (and upper limit \(\infty\)); under this formulation, one rejects \(H_0\) if \(\mu_0\) falls outside the interval. The other, more intuitive way is to calculate the rejection region around \(\mu_0\) and reject \(H_0\) if \(\bar{x}\) falls inside it.
(t.test.result <- t.test(x = mtcars$mpg,
                         alternative = "greater",
                         mu = mu_0,
                         conf.level = 1 - alpha))
##
## One Sample t-test
##
## data: mtcars$mpg
## t = 3.8394, df = 31, p-value = 0.0002848
## alternative hypothesis: true mean is greater than 16
## 95 percent confidence interval:
## 18.28418 Inf
## sample estimates:
## mean of x
## 20.09062
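As a check, both decision rules described above agree with this output. A quick sketch using the quantities already defined (x_bar, mu_0, se, alpha, n):

(t_crit <- qt(p = 1 - alpha, df = n - 1))  # one-sided critical value, ~1.70
x_bar - t_crit * se                        # CI lower limit, ~18.28 as reported
# rejection-region view: reject H_0 when x_bar exceeds the upper boundary
x_bar > mu_0 + t_crit * se                 # TRUE, so reject H_0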
Visualize the hypothesis test using the ggplot2 package function ggplot.
library(dplyr) # for %>%
library(ggplot2)
lrr <- -Inf  # right-tail test: no lower rejection boundary
urr <- mu_0 + qt(p = alpha, df = n - 1, lower.tail = FALSE) * se  # upper boundary
data.frame(mu = 1000:2500 / 100) %>%
  mutate(t = (mu - mu_0) / se) %>%
  mutate(prob = dt(x = t, df = n - 1)) %>%
  mutate(rr = ifelse(mu < lrr | mu > urr, prob, 0)) %>%
  ggplot() +
  geom_line(aes(x = mu, y = prob)) +
  geom_area(aes(x = mu, y = rr), fill = "red", alpha = 0.3) +
  geom_vline(aes(xintercept = mu_0), color = "black") +
  geom_vline(aes(xintercept = x_bar), color = "blue") +
  labs(title = bquote("Right-Tail Hypothesis Test"),
       subtitle = bquote("mu_0 =" ~ .(mu_0) ~
                         ", x_bar =" ~ .(round(x_bar, 1)) ~
                         ", p-value =" ~ .(round(t.test.result$p.value, 4))),
       x = "mu",
       y = "Probability") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
The \(95\%\) confidence interval for \(\mu\) is \(\bar{x} \pm t_{\alpha/2} \, se_{\bar{x}}\), where \(t_{\alpha/2}\) for \(n - 1 = 31\) degrees of freedom is \(2.04\): \(20.1 \pm (2.04)(1.07) \approx 20.1 \pm 2.17\), or \((17.9, 22.3)\).
(t_crit <- qt(p = (1 - alpha / 2),
              df = n - 1,
              lower.tail = TRUE))
## [1] 2.039513
(epsilon <- t_crit * se)
## [1] 2.172946
(lcl <- x_bar - epsilon)
## [1] 17.91768
(ucl <- x_bar + epsilon)
## [1] 22.26357
The t.test function will also calculate the confidence interval. Omit the hypothesis test parameters this time; the defaults yield the two-sided interval.
(t.test.result <- t.test(x = mtcars$mpg,
                         conf.level = 1 - alpha))
##
## One Sample t-test
##
## data: mtcars$mpg
## t = 18.857, df = 31, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 17.91768 22.26357
## sample estimates:
## mean of x
## 20.09062
Visualize the confidence interval using the ggplot2 package function ggplot. The CI is an interval around the point estimate. In the case of a right-tail test, the interval stretches from the one-sided \(1-\alpha\) lower limit to \(\infty\).
library(dplyr) # for %>%
library(ggplot2)
lcl <- t.test.result$conf.int[1]
ucl <- t.test.result$conf.int[2]
data.frame(mu = 1000:2500 / 100) %>%
  mutate(t = (mu - x_bar) / se) %>%
  mutate(prob = dt(x = t, df = n - 1)) %>%
  mutate(ci = ifelse(mu >= lcl & mu <= ucl, prob, 0)) %>%
  ggplot() +
  geom_line(aes(x = mu, y = prob)) +
  geom_area(aes(x = mu, y = ci), fill = "grey", alpha = 0.3) +
  geom_vline(aes(xintercept = x_bar), color = "blue") +
  labs(title = bquote("Confidence Interval"),
       subtitle = bquote("x_bar =" ~ .(round(x_bar, 1)) ~
                         ", alpha =" ~ .(alpha) ~
                         ", lcl =" ~ .(round(lcl, 1)) ~
                         ", ucl =" ~ .(round(ucl, 1))),
       x = "mu",
       y = "Probability") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
A random sample of \(n = 16\) Americans recorded the pounds of beef each consumed in a year. Evaluate the hypothesis that the population mean is \(\mu_0=115\) pounds.
lbs <- c(118, 115, 125, 110, 112, 130, 117, 112,
         115, 120, 113, 118, 119, 122, 123, 126)
n <- length(lbs)
(x_bar <- mean(lbs))
## [1] 118.4375
(s <- sd(lbs))
## [1] 5.656486
(se <- s / sqrt(n))
## [1] 1.414121
\(n < 30\), so only use the t test if the sample data follow a normal distribution. Test for normality by starting with the assumption that the distribution is normal, \(H_0: \text{normal}\), then falsifying the assumption if sufficient evidence exists.
Construct a normal Q-Q plot and look for substantial deviations from a straight line. This one looks good.
qqnorm(lbs)
qqline(lbs)
For a more quantitative analysis, use the Anderson-Darling normality test. The p-value is the probability of observing a test statistic at least as extreme as the one calculated if the distribution really is normal. The p-value is .88, so do not reject \(H_0\).
library(nortest)
ad.test(lbs)
##
## Anderson-Darling normality test
##
## data: lbs
## A = 0.19325, p-value = 0.8753
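Base R's shapiro.test (the Shapiro-Wilk test) provides a similar check without an extra package and should lead to the same conclusion for this sample:

shapiro.test(lbs)  # Shapiro-Wilk normality test; a large p-value fails to reject H_0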
The null hypothesis is \(H_0: \mu = 115\) pounds with alternative hypothesis \(H_a: \mu \ne 115\) pounds.
mu_0 <- 115
alpha <- 0.05
Conduct the t test. The p-value (.028) is below \(\alpha = .05\), so reject \(H_0\).
(t.test.result <- t.test(x = lbs,
                         alternative = "two.sided",
                         mu = mu_0,
                         conf.level = (1 - alpha)))
##
## One Sample t-test
##
## data: lbs
## t = 2.4308, df = 15, p-value = 0.02808
## alternative hypothesis: true mean is not equal to 115
## 95 percent confidence interval:
## 115.4234 121.4516
## sample estimates:
## mean of x
## 118.4375
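Equivalently, \(\mu_0 = 115\) falls outside (below) the two-sided \(95\%\) confidence interval, so reject \(H_0\). A quick check, and then the same decision visualized as a rejection region:

# TRUE when mu_0 lies outside the CI, i.e., reject H_0
mu_0 < t.test.result$conf.int[1] | mu_0 > t.test.result$conf.int[2]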
# two-tail test: split alpha between the lower and upper rejection regions
lrr <- mu_0 + qt(p = alpha / 2, df = n - 1, lower.tail = TRUE) * se
urr <- mu_0 + qt(p = alpha / 2, df = n - 1, lower.tail = FALSE) * se
data.frame(mu = 1000:1250 / 10) %>%
  mutate(t = (mu - mu_0) / se) %>%
  mutate(prob = dt(x = t, df = n - 1)) %>%
  mutate(rr = ifelse(mu < lrr | mu > urr, prob, 0)) %>%
  ggplot() +
  geom_line(aes(x = mu, y = prob)) +
  geom_area(aes(x = mu, y = rr), fill = "red", alpha = 0.3) +
  geom_vline(aes(xintercept = mu_0), color = "black") +
  geom_vline(aes(xintercept = x_bar), color = "blue") +
  labs(title = "Two-Tail Hypothesis Test",
       subtitle = bquote("mu_0 =" ~ .(mu_0) ~
                         ", x_bar =" ~ .(round(x_bar, 1)) ~
                         ", p-value =" ~ .(round(t.test.result$p.value, 4))),
       x = "mu",
       y = "Probability") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
Determine the sample size required for a maximum error \(\epsilon\) in the estimate by solving the confidence interval equation for \(n = \frac{t_{\alpha/2,n-1}^2 s^2}{\epsilon^2}\). Unfortunately, \(t_{\alpha/2,n-1}\) depends on \(n\), so replace it with \(z_{\alpha/2}\). What about \(s^2\)? Estimate it from the literature, a pilot study, or the empirical rule that roughly \(95\%\) of values fall within two standard deviations of the mean, so \(s \approx \frac{\text{range}}{4}\).
If the maximum tolerable error is \(\epsilon=3\) and the sample variance is \(s^2 = 10^2\), what sample size is required for a \(95\%\) \((\alpha=0.05)\) confidence level?
alpha <- 0.05
s <- 10
epsilon <- 3
z_crit <- qnorm(p = 1 - alpha / 2,
                mean = 0,
                sd = 1,
                lower.tail = TRUE)
(n <- ceiling(z_crit^2 * s^2 / epsilon^2))
## [1] 43
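Since the t critical value depends on \(n\), the z-based answer can be refined by iterating the t-based formula until \(n\) stabilizes. A minimal sketch, reusing alpha, s, epsilon, and the z-based n from above; it converges a few units higher, at \(n = 46\):

n_t <- n  # start from the z-based sample size
repeat {
  n_new <- ceiling(qt(p = 1 - alpha / 2, df = n_t - 1)^2 * s^2 / epsilon^2)
  if (n_new == n_t) break  # stop once the estimate stabilizes
  n_t <- n_new
}
n_t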
\(n\) varies with confidence level and maximum error.
library(tidyr) # for spread
alpha <- c(.10, .10, .10, .05, .05, .05, .01, .01, .01)
epsilon <- c(1, 3, 5, 1, 3, 5, 1, 3, 5)
dataframe <- data.frame(alpha, epsilon)
dataframe <- dataframe %>%
  mutate(z_crit = qnorm(p = 1 - alpha / 2, mean = 0, sd = 1,
                        lower.tail = TRUE),
         n = ceiling(z_crit^2 * s^2 / epsilon^2))
dataframe$z_crit <- NULL
spread(dataframe, key = epsilon, value = n) %>%
  knitr::kable()
| alpha | epsilon = 1 | epsilon = 3 | epsilon = 5 |
|---|---|---|---|
| 0.01 | 664 | 74 | 27 |
| 0.05 | 385 | 43 | 16 |
| 0.10 | 271 | 31 | 11 |