The R package binom makes it extremely easy to compute interval estimators for a binomial proportion.
library(binom)
## Loading required package: lattice
binom.confint(x = 5, n = 50)
## method x n mean lower upper
## 1 agresti-coull 5 50 0.1000 0.03914 0.2179
## 2 asymptotic 5 50 0.1000 0.01685 0.1832
## 3 bayes 5 50 0.1078 0.03919 0.2054
## 4 cloglog 5 50 0.1000 0.03673 0.2010
## 5 exact 5 50 0.1000 0.03328 0.2181
## 6 logit 5 50 0.1000 0.04224 0.2187
## 7 probit 5 50 0.1000 0.03960 0.2096
## 8 profile 5 50 0.1000 0.03707 0.2027
## 9 lrt 5 50 0.1000 0.03705 0.2027
## 10 prop.test 5 50 0.1000 0.03740 0.2259
## 11 wilson 5 50 0.1000 0.04348 0.2136
Many well-known intervals (e.g., 'asymptotic', 'cloglog', 'logit', 'probit') can be easily obtained in a unified approach by fitting a Binomial generalized linear model (GLM) with different link functions.
binom.ci <- function(x, n, alpha = 0.05, link = "logit") {
z <- qnorm(1 - alpha/2)
family <- binomial(link = link)
fit <- summary(glm(cbind(x, n - x) ~ 1, family = family))
est <- fit$coef[, "Estimate"]
se <- fit$coef[, "Std. Error"]
family$linkinv(est + c(-z, z) * se)
}
binom.ci(x = 5, n = 50, link = "identity")
## [1] 0.01685 0.18315
binom.ci(x = 5, n = 50, link = "cloglog")
## [1] 0.04289 0.22371
binom.ci(x = 5, n = 50, link = "logit")
## [1] 0.04224 0.21869
binom.ci(x = 5, n = 50, link = "probit")
## [1] 0.0396 0.2096
Note that different results are obtained for 'cloglog' intervals! The following is part of the codes from the 'binom.confint' function:
if (any(method == "cloglog") || all.methods) {
if (any(method != "exact")) {
x1 <- x == 0
x2 <- x == n
}
inner <- !x1 & !x2
log.mu <- sd <- lcl <- ucl <- rep(NA, length(x))
log.mu[inner] <- log(-log(p[inner]))
sd[inner] <- sqrt(var.cloglog(p[inner], n[inner]))
lcl[inner] <- exp(-exp(log.mu[inner] + z[inner] * sd[inner]))
ucl[inner] <- exp(-exp(log.mu[inner] - z[inner] * sd[inner]))
lcl[x1] <- rep(0, sum(x1))
lcl[x2] <- alpha2[x2]^(1/n[x2])
ucl[x1] <- 1 - alpha2[x1]^(1/n[x1])
ucl[x2] <- rep(1, sum(x2))
res.cloglog <- data.frame(method = rep("cloglog", NROW(x)), xn, mean = p,
lower = lcl, upper = ucl)
res <- if (is.null(res))
res.cloglog else rbind(res, res.cloglog)
}
It appears that the 'binom.confint' function is using the 'loglog' link \( g(p) = \log\{-\log(p)\} \) instead of the 'cloglog' link \( g(p) = \log\{-\log(1 - p)\} \). Therefore, the 'cloglog' interval returned by the 'binom.confint' function is very misleading!
For many common intervals, fitting a Binomial GLM is not always necessary because analytic formulae can be derived. For a reasonably large sample size \( n \), the Central Limit Theorem (CLT) indicates that \[ \sqrt{n}(\hat{p} - p) \overset{d}{\to} N(0, p(1 - p)). \] The above CLT allows us to construct the 'asymptotic' (Wald) interval \( \hat{p} \pm z_{1 - \alpha / 2}\sqrt{\hat{p} (1 - \hat{p}) / n} \):
x <- 5
n <- 50
p <- x/n
alpha <- 0.05
z <- qnorm(1 - alpha/2)
p + c(-z, z) * sqrt(p * (1 - p)/n)
## [1] 0.01685 0.18315
However, the approximation may be very unsatisfactory when the sample size is small and/or \( p \) is near the boundaries of the parameter spaces (i.e., 0 or 1)! To resolve this issue, we can apply the Delta's method to achieve better approximation: \[ \sqrt{n}\{g(\hat{p}) - g(p)\} \overset{d}{\to} N\left(0, \sigma^{2}(p)\right), \] where \( \sigma^{2}(p) = \{g'(p)\}^{2}p(1 - p) \). Based on the above results, one can first construct an approximate interval for \( \eta = g(p) \): \[ g(\hat{p}) \pm z_{1 - \alpha / 2} \sigma(\hat{p}) / \sqrt{n}, \] and transform the interval back to the original scale to obtain an approximate interval for \( p \): \[ g^{-1}\{g(\hat{p}) \pm z_{1 - \alpha / 2} \sigma(\hat{p}) / \sqrt{n}\}. \]
For \( g(p) = \log\{-\log(p)\} \), we have \( \sigma^{2}(p) = (1 - p) / [p\{\log(p)\}^{2}] \) and \( g^{-1}(\eta) = \exp\{-\exp(\eta)\} \). Note that \( g(p) = \log\{-\log(p)\} \) is a decreasing function of \( p \)!
est.loglog <- log(-log(p))
var.loglog <- (1 - p)/(p * (log(p))^2)
eta.loglog <- est.loglog + c(z, -z) * sqrt(var.loglog)/sqrt(n)
exp(-exp(eta.loglog))
## [1] 0.03673 0.20096
For \( g(p) = \log\{-\log(1 - p)\} \), we have \( \sigma^{2}(p) = p / [(1 - p)\{\log(1 - p)\}^{2}] \) and \( g^{-1}(\eta) = 1 - \exp\{-\exp(\eta)\} \).
est.cloglog <- log(-log(1 - p))
var.cloglog <- p/((1 - p) * (log(1 - p))^2)
eta.cloglog <- est.cloglog + c(-z, z) * sqrt(var.cloglog)/sqrt(n)
1 - exp(-exp(eta.cloglog))
## [1] 0.04289 0.22371
For \( g(p) = \log\{p / (1 - p)\} \), we have \( \sigma^{2}(p) = 1 / \{p (1 - p)\} \) and \( g^{-1}(\eta) = \exp(\eta) / \{1 + \exp(\eta)\} \).
est.logit <- log(p/(1 - p))
var.logit <- 1/(p * (1 - p))
eta.logit <- est.logit + c(-z, z) * sqrt(var.logit)/sqrt(n)
exp(eta.logit)/(1 + exp(eta.logit))
## [1] 0.04224 0.21869
According to Wikipedia, the Clopper-Pearson (exact) interval can be expressed as \[ \left(B_{\alpha/2}(x, n - x + 1), B_{1 - \alpha/2}(x + 1, n - x)\right) \]
c(qbeta(alpha/2, x, n - x + 1), qbeta(1 - alpha/2, x + 1, n - x))
## [1] 0.03328 0.21814
and the Wilson (score) interval can be expressed as \[ \frac{1}{1 + z^{2} / n}\left\{\hat{p} + z^{2} / (2n) \pm z \sqrt{\hat{p}(1 - \hat{p}) / n + z^{2} / (4n^{2})}\right\} \]
(p + z^2/(2 * n) + c(-z, z) * sqrt(p * (1 - p)/n + z^2/(4 * n^2)))/(1 + z^2/n)
## [1] 0.04348 0.21360