Introduction

The proportion test uses the sample proportion \(p\) as an estimate of the population proportion \(\pi\) to evaluate an hypothesized population proportion \(\pi_0\). The null hypothesis is \(H_0: \pi = \pi_0\) and the alternative hypothesis is \(H_a: \pi \ne \pi_0\), \(H_a: \pi < \pi_0\), or \(H_a: \pi > \pi_0\). One can also construct a \((1 - \alpha)\%\) confidence interval around \(p\) to estimate \(\pi\) within a margin of error \(\epsilon\).11 See PSU STAT 500 online material, Stat Trek, and An R Companion for the Handbook of Biological Statistics by Salvatore S. Mangiafico.

There are several ways to perform a hypothesis test or calculate a confidence interval. Two important methods are the Wald method (aka normal approximation test) and the Clopper-Pearson method (aka exact binomial test). The normal approximation method is common because it is intuitive to learn, but it only applies when the central limit theorem conditions hold:

the sample is independently drawn (random sampling without replacement from \(n < 10%\) of the population in observational studies, or random assignment in experiments),
there are at least \(n \pi >= 5\) successes and \(n(1 - \pi) >= 5\) failures,
the sample size is \(n >= 30\), and
the probability of success is not extreme, \(0.2 < \pi < 0.8\).

Under the normal approximation method, the sampling distribution of the population proportion has a normal distribution centered at the measured proportion \(p\) with standard error \(se_p = \frac{s_p}{\sqrt{n}} = \frac{\sqrt{p(1 - p)}}{\sqrt{n}}\). The measured proportion \(p\) and standard deviation \(s_p\) approximate the population proportion \(\pi\) and standard deviation \(\sigma_\pi\). Test a hypothesis of \(\pi = \pi_0\) with test statistic \(Z = \frac{p - \pi_0}{se_{\pi_0}}\) where \(se_{\pi_0} = \frac{\sqrt{\pi_0 (1 - \pi_0)}}{\sqrt{n}}\) is the standard error of the hypothesized population proportion sampling distribution. Define a \((1 - \alpha)\%\) confidence interval as \(p \pm z_{\alpha / 2} se_p\).

The Clopper-Pearson exact binomial test is precise, but is theoretically complicated. The exact confidence interval method inverts two single-tailed Binomial tests. These notes do not present the formulas and derivations because they are complex - just rely on the software. The exact binomial method also works for small sample sizes and for extreme probabilities of success. It only requires independence and at least \(n \pi >= 5\) successes or \(n(1 - \pi) >= 5\) failures.

Hypothesis Test Example

A maintenance company claims to resolve at least 70% of maintenance requests within 24 hours. In a random sample of n = 50 repair requests, 33 (66%) were resolved within 24 hours. At a 5% level of significance, is the maintenance company’s claim valid?

The null hypothesis is that the maintenance company resolves \(\pi_0 = 70\%\) of requests within 24 hours, \(H_0: \pi = 0.70\) with alternative hypothesis that it resolves \(<70\%\) of requests within 24 hours, \(H_a: \pi < \pi_0\). This is a left-tailed test with an \(\alpha = 0.05\) level of significance.

pi_0 <- .70
n <- 50
k <- 33
p <- k / n
alpha <- 0.05

The sample is independently drawn without replacement from \(<10%\) of the population (by assumption) and there were \(>=5\) successes, so we can use the Clopper-Pearson exact binomial test. The p-value is greater than .05, so do not reject the maintenance company’s claim.

library(stats)

(binom.test.result <- binom.test(x = p * n, 
                                 n = n, 
                                 p = pi_0, 
                                 alternative = "less", 
                                 conf.level = 1 - alpha))

## 
##  Exact binomial test
## 
## data:  p * n and n
## number of successes = 33, number of trials = 50, p-value = 0.3161
## alternative hypothesis: true probability of success is less than 0.7
## 95 percent confidence interval:
##  0.000000 0.770452
## sample estimates:
## probability of success 
##                   0.66

There were also \(>=5\) failures, \(>=30\) observations, and the measured probability of success was within \((.2, .80)\), so the we can also use the Wald normal approximation method.

(prop.test.result <- prop.test(x = k, 
                               n = n, 
                               p = pi_0, 
                               alternative = "less", 
                               conf.level = 1 - alpha, 
                               correct = FALSE))

## 
##  1-sample proportions test without continuity correction
## 
## data:  k out of n, null probability pi_0
## X-squared = 0.38095, df = 1, p-value = 0.2685
## alternative hypothesis: true p is less than 0.7
## 95 percent confidence interval:
##  0.0000000 0.7594279
## sample estimates:
##    p 
## 0.66

# manual calculation.
pnorm(q = p, 
      mean = pi_0, 
      sd = sqrt(pi_0 * (1 - pi_0) / n), 
      lower.tail = TRUE)

## [1] 0.268547

Here is a graph of the hypothesis test. The \(alpha = 0.05\) rejection range is shown in red.

library(dplyr)
library(ggplot2)

p_value <- binom.test.result$p.value
lrr = qnorm(p = alpha, mean = pi_0, sd = sqrt(p * (1 - p) / n), lower.tail = TRUE)
urr = Inf
data.frame(pi = 400:900 / 1000) %>%
  mutate(density = dnorm(x = pi, mean = pi_0, sd = sqrt(pi_0 * (1 - pi_0) / n))) %>%
  mutate(rr = ifelse(pi < lrr | pi > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = pi, y = density)) +
  geom_area(aes(x = pi, y = rr), fill = "red", alpha = 0.1) +
  geom_vline(aes(xintercept = pi_0), color = "black") +
  geom_vline(aes(xintercept = p), color = "red") +
  labs(title = bquote("Proportion Test"),
       subtitle = bquote("pi_0 ="~.(pi_0)~", p ="~.(p)~", p-value ="~.(round(p_value, 4))),
       x = "pi",
       y = "Density") +
  theme(legend.position="none")

Confidence Interval Example

In the example above, define a \((1 - \alpha)\%\) confidence interval for the company’s issue resolution rate.

Two good options for calucating the Clopper-Pearson CI are the binom.test function in the native stats package, and the BinomCI function in the DescTools package.

Notice in the formula below that the confidence interval uses \(p\) as the estimate of \(\pi\). (The hypothesis test in the prior example used \(\pi_0\) as the formula.)

library(DescTools)  # for BinomCI

BinomCI(x = k, n = n, conf.level = 1 - alpha, method = "clopper-pearson")

##       est    lwr.ci    upr.ci
## [1,] 0.66 0.5123475 0.7879453

(binom.test.result <- binom.test(x = k, n = n, conf.level = 1 - alpha))$conf.int

## [1] 0.5123475 0.7879453
## attr(,"conf.level")
## [1] 0.95

Do not use the prop.test function in the stats package to calculate a Wald CI - it calculates a Modified Wilson CI. To calculate the Wald CI, use the BinomCI function in the DescTools package.

BinomCI(x = k, n = n, conf.level = 1 - alpha, method = "wald")

##       est   lwr.ci   upr.ci
## [1,] 0.66 0.528697 0.791303

# manual calculation.
z_crit <- qnorm(p = 1 - alpha / 2)
se_p <- sqrt(p * (1 - p) / n)
epsilon <- z_crit * se_p
lcl <- p - epsilon
ucl <- p + epsilon
cat("95% CI: ", p, " +/- ", round(epsilon, 4), "  (", round(lcl, 4), ", ", round(ucl, 4), ")")

## 95% CI:  0.66  +/-  0.1313   ( 0.5287 ,  0.7913 )

Here is a graph of the confidence interval. The 95% confidence interval is shaded blue.

library(dplyr)  # for %>%
library(ggplot2)

data.frame(pi = 400:900 / 1000) %>%
  mutate(density = dnorm(x = pi, mean = p, sd = sqrt(p * (1 - p) / n))) %>%
  mutate(ci = ifelse(pi >= lcl & pi <= ucl, density, 0)) %>%
ggplot() +
  geom_line(aes(x = pi, y = density)) +
  geom_area(aes(x = pi, y = ci), fill = "blue", alpha = 0.1) +
  geom_vline(aes(xintercept = p), color = "black") +
  labs(title = bquote("Proportion Confidence Interval"),
       subtitle = bquote("pi ="~.(p)~" +/-"~
                           .(round(epsilon, 4))~
                           ",  ("~.(round(lcl, 4))~", "~.(round(ucl, 4))~")"),
       x = "pi",
       y = "Density") +
  theme(legend.position="none")

Example Where Wald Does not Apply

In the example above, suppose the maintenance company resolves \(p = 100\%\) of a sample of \(n = 10\) requests within 24 hours. Define a \((1 - \alpha)\%\) confidence interval for the company’s issue resolution rate.

k <- 10
n <- 10
p <- k / n

The sample is independently drawn without replacement from \(<10%\) of the population (by assumption) and there were \(>=5\) successes, so we can calculate a Clopper-Pearson CI. However, the Wald confidence interval returns the nonsense interval \((1,1)\).

BinomCI(x = k, n = n, conf.level = 1 - alpha, method = "clopper-pearson")

##      est    lwr.ci upr.ci
## [1,]   1 0.6915029      1

The normal approximation will not work because although there are not >=30 observations, >=5 failures, and the probability of success is not between .2 and .8. The Wald confidence interval returns the nonsense interval (1,1).

BinomCI(x = k, n = n, conf.level = 1 - alpha, method = "wald")

##      est lwr.ci upr.ci
## [1,]   1      1      1

Calculating Sample Size

Calculate the sample size required to achieve a specified margin of error ?? by solving the Wald CI equation for \(\epsilon = z_{\alpha / 2}^2 \sqrt{\frac{\pi_0(1 - \pi_0)}{n}}\) for \(n = \frac{z_{\alpha / 2}^2 \pi_0 (1 - \pi_0)}{\epsilon^2}\) . If \(\pi\) is not approximately known because there are no prior studies, use \(0.50\).

Example

What sample size would yield a margin of error of \(\pm .03\) with \(95\%\) confidence?

pi_0 <- 0.50
alpha <- 0.05
epsilon <- 0.03

Take the conservative assumption of \(\pi = .50\).

z <- qnorm(p = 1 - alpha / 2, mean = 0, sd = 1, lower.tail = TRUE)
(n <- ceiling(z^2 * pi_0 * (1 - pi_0) / epsilon^2))

## [1] 1068

Suppose prior studies had inidicated the expected population proportion is \(\pi = 0.80\). How would that change the required sample size?

pi_0 = 0.80

z <- qnorm(p = 1 - alpha / 2, mean = 0, sd = 1, lower.tail = TRUE)
(n <- ceiling(z^2 * pi_0 * (1 - pi_0) / epsilon^2))

## [1] 683

If the population size \(N\) is small relative to the expected sample size, the standard error estimate of \(\sigma=\sqrt{\frac{{\pi_0(1 - \pi_0)}}{{n}}}\) should be replaced with the more exact \(\sigma=\sqrt{\frac{{\pi_0(1 - \pi_0)}}{{n}} \frac{{N-n}}{{N-1}}}\). Solving for \(n\), the sample size required for an \(\alpha\) level of significance is

\(n = \frac{{m}}{{ 1 + \frac{{m - 1}}{{N}}}}\) where \(m = \frac{{z_{\alpha / 2}^2 \pi_0(1 - \pi_0)}}{{\epsilon^2}}\).

Example with Exact Equation

A researcher will estimate the proportions \(\pi\) of several yes/no questions on a survey to study a small town of \(N = 2000\) people. How many people \(n\) does she have to randomly sample (without replacement) to ensure a maximum error of \(\epsilon = 0.04\)?

pi <- 0.50
alpha <- 0.05
epsilon <- .04
N <- 2000

Because there are many different questions on the survey, use a sample proportion of \(\pi_0 = 0.50\). The population size is finite and small relative to the sample, so use the more exact calculation applies here.

z <- qnorm(p = 1 - alpha / 2, mean = 0, sd = 1, lower.tail = TRUE)
m <- z^2 * pi * (1 - pi) / epsilon^2
(n <- ceiling(m / (1 + (m - 1) / N)))

## [1] 462

Compare this to the less exact approach.

(n <- ceiling(z^2 * pi * (1 - pi) / epsilon^2))

## [1] 601

References

PSU STAT 500 online notes.

Stat Trek

An R Companion for the Handbook of Biological Statistics by Salvatore S. Mangiafico.

Proportion CI and Test

Conducting Hypothesis Tests and Calculating Confidence Intervals for a single Binomial Categorical Variable

Michael Foley

2019-02-20