We want to test the joint null hypothesis that binomial proportions do not differ between \(p\) samples. We specify this in terms of log-odds as \[H_0:\ \beta = \beta_0 (1, \dots, 1)^T\] where \(\beta = (\beta_1, \dots, \beta_p)^T\), and \(\hat\beta\) denotes the MLE of \(\beta\), and \(\beta_0\) is the common log-odds (implying a common proportion). Three commonly used methods are likelihood ratio, Rao’s score, and Wald test. All three test statistics are asymptotically equivalent; their small-sample performances can be widely different though.
Artificial data of a three-sample design involving \(p=3\) binomial proportions as an endpoint:
set.seed(12345)
MyData <- data.frame(Drug = rep(LETTERS[1:3], each=20),
Success = c(rbinom(20, 10, 0.4), rbinom(20, 10, 0.6), rbinom(20, 10, 0.7)),
Total = 10)
head(MyData)
## Drug Success Total
## 1 A 5 10
## 2 A 6 10
## 3 A 5 10
## 4 A 6 10
## 5 A 4 10
## 6 A 2 10
We can summarize and display the data in a \(2 \times 3\) contingency table:
succ <- xtabs(Success ~ Drug, MyData)
fail <- xtabs(Total ~ Drug, MyData) - succ
MyTable <- rbind(succ, fail)
MyTable
## A B C
## succ 82 119 143
## fail 118 81 57
Fitting a binomial GLM with a logit link function (logistic model) allows us to test odds ratios:
MyGLM <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "logit"))
Testing relative risks directly is possible with a log link function:
MyGLM2 <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "log"))
Using an identity link function leads to testing the proportions directly:
MyGLM3 <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "identity"))
The test statistic is \[-2\ \log \Lambda = -2\ \log\left(\frac{L_0}{L_1}\right) = -2\ (\ell_0 - \ell_1) \overset{d}{\longrightarrow} \chi^2_k\] where \(L_0\) is the maximized likelihood of the reduced model (subject to restrictions on the parameters according to \(H_0\)), \(L_1\) is the maximized likelihood of the full (unrestricted) model, and \(k = p - p_0\) is the difference of their numbers of parameters.
We see that the LRT is invariant to the link function (in our simple one-way ANOVA-type setting):
anova(MyGLM, test="LRT")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 3.332e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM2, test="LRT")
## Analysis of Deviance Table
##
## Model: binomial, link: log
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 3.332e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM3, test="LRT")
## Analysis of Deviance Table
##
## Model: binomial, link: identity
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 3.332e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This LRT is equivalent to the \(G^2\) test for contingency tables with test statistic \[G^2 = 2 \sum_{i=1}^p Obs_i \log \left( \frac{Obs_i}{Exp_i} \right) \overset{d}{\longrightarrow} \chi^2_k\] where \(Obs_i\) and \(Exp_i\) are the observed and expected frequencies of sample \(i\). The \(G^2\) test is not readily available in R but can be implemented with a few lines of code:
g.test <- function(x){
expected <- rowSums(x) %o% colSums(x) / sum(x)
g <- 2 * sum(x * log(x / expected))
df <- (nrow(x) - 1) * (ncol(x) - 1)
p <- 1 - pchisq(g, df)
cat("\n G-squared test\n\nG-squared =", signif(g, 6), ", df =", df, ", p-value =", p)
}
g.test(MyTable)
##
## G-squared test
##
## G-squared = 39.0395 , df = 2 , p-value = 3.331768e-09
The test statistic is \[S = \left(\frac{\partial L(\hat\beta_0)}{\partial \beta}\right)^T -E\left(\frac{\partial^2 L(\hat\beta_0)}{\partial\beta \partial\beta'}\right)^{-1} \frac{\partial L(\hat\beta_0)}{\partial \beta} \overset{d}{\longrightarrow} \chi^2_k\] where \(k\) is defined as above. Unlike in the LRT statistic, only restricted estimates are involved.
Just like the LRT, the score test is invariant to the link function (in our simple one-way ANOVA-type setting):
anova(MyGLM, test="Rao")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 38.604 4.143e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM2, test="Rao")
## Analysis of Deviance Table
##
## Model: binomial, link: log
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 38.604 4.143e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM3, test="Rao")
## Analysis of Deviance Table
##
## Model: binomial, link: identity
##
## Response: cbind(Success, Total - Success)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
## NULL 59 116.123
## Drug 2 39.04 57 77.084 38.604 4.143e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The score test is equivalent to Pearson’s \(\chi^2\) test for contingency tables with test statistic \[\chi^2 = \sum_{i=1}^p \frac{(Obs_i - Exp_i)^2}{Obs_i} \overset{d}{\longrightarrow} \chi^2_k\] where \(Obs_i\) and \(Exp_i\) are defined as for the \(G^2\) statistic.
chisq.test(MyTable)
##
## Pearson's Chi-squared test
##
## data: MyTable
## X-squared = 38.6037, df = 2, p-value = 4.143e-09
The test statistic is \[W = (\hat\beta - \beta_0)^T\ [Cov(\hat\beta)]^{-1}\ (\hat\beta - \beta_0) \overset{d}{\longrightarrow} \chi^2_p\] with degrees of freedom \(p = rank [Cov(\hat\beta)]\). As opposed to the LRT and score statistics, only unrestricted estimates are involved.
library(multcomp)
summary(glht(MyGLM, mcp(Drug="Tukey")), test=Chisqtest())
##
## General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Linear Hypotheses:
## Estimate
## B - A == 0 0.7486
## C - A == 0 1.2838
## C - B == 0 0.5351
##
## Global Test:
## Chisq DF Pr(>Chisq)
## 1 37.27 2 8.058e-09
Note that the Pearson \(\chi^2\) and the \(G^2\) statistics are special cases of power divergence statistics (Cressie & Read, 1984) that have the general form \[2nI^{\lambda} = \frac{2}{\lambda (\lambda + 1)} \sum_{i=1}^p Obs_i \left[ \left( \frac{Obs_i}{Exp_i} \right)^{\lambda} - 1 \right]\ , \quad \lambda \in R.\] Choosing \(\lambda = 1\) yields Pearson’s \(\chi^2\) statistic, and \(\lambda = 0\) brings about an expression equivalent to the \(G^2\) statistic. Since \(\lambda (\lambda + 1) = 0\) with \(\lambda = 0\), the \(G^2\) statistic is defined by continuity: \[2nI^0 = \underset{\lambda \rightarrow 0}{lim}\ 2nI^{\lambda}.\] Cressie & Read (1984) recommend using \(0 \leq \lambda \leq \frac{3}{2}\) and especially \(\lambda = \frac{2}{3}\) which leads to \[2nI^{\frac{2}{3}} = \frac{9}{5} \sum_{i=1}^p Obs_i \left[ \left( \frac{Obs_i}{Exp_i} \right)^{\frac{2}{3}} - 1 \right].\] We implement this general test in R and apply it to our fake data using \(\lambda = \frac{2}{3}\):
powdiv <- function(x, lambda=2/3){
expected <- rowSums(x) %o% colSums(x) / sum(x)
stat <- 2 / (lambda * (lambda + 1)) * sum(x * ((x / expected)^lambda - 1))
df <- (nrow(x) - 1) * (ncol(x) - 1)
p <- 1 - pchisq(stat, df)
cat("\n Test of general power divergence statistics (Cressie & Read, 1984)\n\nstatistic =", signif(stat, 6), ", df =", df, ", p-value =", p)
}
powdiv(MyTable)
##
## Test of general power divergence statistics (Cressie & Read, 1984)
##
## statistic = 38.6721 , df = 2 , p-value = 4.00373e-09
The grand-mean-type max test rejects the global null hypothesis if the “most extreme” test statistic exceeds the equicoordinate \((1 - \alpha)\) quantile from a \(p\)-dimensional t-distribution which incorporates the correlation matrix of test statistics. Thus for two-sided hypotheses the max-type statistic is \[T_{max} = \underset{p}{\max} |T_i|\] where \(T_i\) is the studentized difference of sample mean \(i\) and the grand mean. We can use parameter and covariance estimates from either of the three models (involving logit, log, and identity link):
MyTest <- glht(MyGLM, mcp(Drug="GrandMean"))
min(summary(MyTest)$test$pvalues)
## [1] 2.935968e-08
MyTest2 <- glht(MyGLM2, mcp(Drug="GrandMean"))
min(summary(MyTest2)$test$pvalues)
## [1] 1.436494e-07
MyTest3 <- glht(MyGLM3, mcp(Drug="GrandMean"))
min(summary(MyTest3)$test$pvalues)
## [1] 1.328242e-08
# p: Number of samples
# n: Number of subsamples
# ni: Number of events per subsample
# delta: Effect size parameter
Set1 <- function(p, n, ni, delta){
Suc <- list()
if(p %% 2 > 0){
for(i in 1:(p/2 + 0.5)){
Suc[[i]] <- rbinom(n, ni, 0.5 - delta)
}
for(i in (p/2 + 1.5):p){
Suc[[i]] <- rbinom(n, ni, 0.5 + delta)
}
}else{
for(i in 1:(p/2)){
Suc[[i]] <- rbinom(n, ni, 0.5 - delta)
}
for(i in (p/2 + 1):p){
Suc[[i]] <- rbinom(n, ni, 0.5 + delta)
}
}
set <- data.frame(Drug = rep(LETTERS[1:p], each=n),
Success = unlist(Suc),
Total = ni)
}
library(ggplot2)
MySet1 <- Set1(p=6, n=100, ni=100, delta=0.1)
qplot(Drug, Success/Total, data=MySet1, geom="boxplot")
Set2 <- function(p, n, ni, delta){
Suc <- list()
Suc[[1]] <- rbinom(n, ni, 0.5 - delta)
for(i in 2:p){
Suc[[i]] <- rbinom(n, ni, 0.5 + delta)
}
set <- data.frame(Drug = rep(LETTERS[1:p], each=n),
Success = unlist(Suc),
Total = ni)
}
MySet2 <- Set2(p=6, n=100, ni=100, delta=0.1)
qplot(Drug, Success/Total, data=MySet2, geom="boxplot")
Set3 <- function(p, n, ni, delta){
Suc <- list()
for(i in 1:p){
Suc[[i]] <- rbinom(n, ni, 0.5 - ((p - 1) * delta / 2) + (i - 1) * delta)
}
set <- data.frame(Drug = rep(LETTERS[1:p], each=n),
Success = unlist(Suc),
Total = ni)
}
MySet3 <- Set3(p=6, n=100, ni=100, delta=0.1)
qplot(Drug, Success/Total, data=MySet3, geom="boxplot")
p <- 3:7
n <- 10
ni <- 10
delta <- seq(0, 0.15, by=0.01)
sim <- 100
LRT1 <- Score1 <- Max1 <- matrix(NA, nrow=length(p), ncol=length(delta))
rownames(LRT1) <- rownames(Score1) <- rownames(Max1) <- p
colnames(LRT1) <- colnames(Score1) <- colnames(Max1) <- delta
for(u in 1:length(p)){
cat("Running: p =", p[u], "\n")
flush.console()
for(v in 1:length(delta)){
lrt <- sco <- max <- numeric(sim)
for(j in 1:sim){
DAT <- Set1(p=p[u], n=n, ni=ni, delta=delta[v])
GLM <- glm(cbind(Success, Total - Success) ~ Drug, DAT, family = binomial(link = "logit"))
LRT <- anova(GLM, test="LRT")
SCO <- anova(GLM, test="Rao")
MAX <- glht(GLM, mcp(Drug="GrandMean"))
lrt[j] <- LRT$Pr[2]
sco[j] <- SCO$Pr[2]
max[j] <- min(summary(MAX)$test$pvalues)
}
LRT1[u, v] <- sum(lrt < 0.05) / sim
Score1[u, v] <- sum(sco < 0.05) / sim
Max1[u, v] <- sum(max < 0.05) / sim
}
}
## Running: p = 3
## Running: p = 4
## Running: p = 5
## Running: p = 6
## Running: p = 7
library(reshape2)
PowerSet1 <- rbind(melt(LRT1), melt(Score1), melt(Max1))
colnames(PowerSet1) <- c("p", "delta", "Power")
PowerSet1$Test <- rep(c("LRT", "Score", "Max"), each=length(delta) * length(p))
ggplot(PowerSet1, aes(x=delta, y=Power, colour=Test)) +
geom_hline(yintercept=0.05, linetype="dashed") +
geom_line() +
facet_wrap(~ p, nrow=1) +
theme_bw()