Global Tests for Binomial Data (in R)

We want to test the joint null hypothesis that binomial proportions do not differ between \(p\) samples. In terms of log-odds this is \[H_0:\ \beta = \beta_0 (1, \dots, 1)^T\] where \(\beta = (\beta_1, \dots, \beta_p)^T\) is the vector of sample log-odds, \(\hat\beta\) denotes its MLE, and \(\beta_0\) is the common log-odds under \(H_0\) (implying a common proportion). Three commonly used methods are the likelihood ratio, Rao’s score, and Wald tests. All three test statistics are asymptotically equivalent, but their small-sample performance can differ widely.

Data Example

Consider artificial data from a three-sample design with a binomial endpoint, i.e., \(p = 3\) proportions:

set.seed(12345)
MyData <- data.frame(Drug = rep(LETTERS[1:3], each=20),
                     Success = c(rbinom(20, 10, 0.4), rbinom(20, 10, 0.6), rbinom(20, 10, 0.7)),
                     Total = 10)
head(MyData)
##   Drug Success Total
## 1    A       5    10
## 2    A       6    10
## 3    A       5    10
## 4    A       6    10
## 5    A       4    10
## 6    A       2    10

We can summarize and display the data in a \(2 \times 3\) contingency table:

succ <- xtabs(Success ~ Drug, MyData)
fail <- xtabs(Total ~ Drug, MyData) - succ
MyTable <- rbind(succ, fail)
MyTable
##        A   B   C
## succ  82 119 143
## fail 118  81  57

Generalized Linear Model

Fitting a binomial GLM with a logit link function (logistic model) allows us to test odds ratios:

MyGLM <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "logit"))

Testing relative risks directly is possible with a log link function:

MyGLM2 <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "log"))

Using an identity link function leads to testing the proportions directly:

MyGLM3 <- glm(cbind(Success, Total - Success) ~ Drug, MyData, family = binomial(link = "identity"))
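
With the default treatment contrasts, the identity-link coefficients are the proportion in reference group A and the differences of groups B and C to it. As a quick sketch (output not shown), this can be checked against the observed proportions from the contingency table:

# Observed success proportions per drug, from the contingency table
succ / (succ + fail)
# Identity link with treatment contrasts: intercept = proportion in A,
# remaining coefficients = differences to A
coef(MyGLM3)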

Likelihood Ratio Test

The test statistic is \[-2\ \log \Lambda = -2\ \log\left(\frac{L_0}{L_1}\right) = -2\ (\ell_0 - \ell_1) \overset{d}{\longrightarrow} \chi^2_k\] where \(L_0\) is the maximized likelihood of the reduced model (subject to the restrictions on the parameters imposed by \(H_0\)), \(L_1\) is the maximized likelihood of the full (unrestricted) model, and \(k = p - p_0\) is the difference in the number of parameters (Wilks, 1938).
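
To see the statistic in action before calling anova(), we can compute it by hand from the fitted log-likelihoods. This is a minimal sketch; the intercept-only null model MyGLM0 is our own naming, and the statistic should match the Drug deviance reported below:

# Reduced model: a single common proportion (intercept only)
MyGLM0 <- glm(cbind(Success, Total - Success) ~ 1, MyData, family = binomial)
# -2 * (l0 - l1), referred to a chi-squared distribution with k = 2 df
LRTstat <- as.numeric(2 * (logLik(MyGLM) - logLik(MyGLM0)))
LRTstat
pchisq(LRTstat, df = 2, lower.tail = FALSE)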

We see that the LRT is invariant to the link function (in our simple one-way ANOVA-type setting):

anova(MyGLM, test="LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                    59    116.123              
## Drug  2    39.04        57     77.084 3.332e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM2, test="LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: log
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                    59    116.123              
## Drug  2    39.04        57     77.084 3.332e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM3, test="LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: identity
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                    59    116.123              
## Drug  2    39.04        57     77.084 3.332e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This LRT is equivalent to the \(G^2\) test for contingency tables with test statistic \[G^2 = 2 \sum_{i} Obs_i \log \left( \frac{Obs_i}{Exp_i} \right) \overset{d}{\longrightarrow} \chi^2_k\] where the sum runs over all cells of the table, and \(Obs_i\) and \(Exp_i\) are the observed and expected frequencies of cell \(i\). The \(G^2\) test is not readily available in base R but can be implemented with a few lines of code:

g.test <- function(x){
  # Expected cell counts under the null: product of margins / grand total
  expected <- rowSums(x) %o% colSums(x) / sum(x)
  g <- 2 * sum(x * log(x / expected))
  # Degrees of freedom of the contingency table
  df <- (nrow(x) - 1) * (ncol(x) - 1)
  p <- 1 - pchisq(g, df)
  cat("\n G-squared test\n\nG-squared =", signif(g, 6), ", df =", df, ", p-value =", p)
}
g.test(MyTable)
## 
##  G-squared test
## 
## G-squared = 39.0395 , df = 2 , p-value = 3.331768e-09

Rao’s Score Test (a.k.a. Lagrange Multiplier Test)

The test statistic is \[S = \left(\frac{\partial \ell(\hat\beta_0)}{\partial \beta}\right)^T \left[-E\left(\frac{\partial^2 \ell(\hat\beta_0)}{\partial\beta\, \partial\beta^T}\right)\right]^{-1} \frac{\partial \ell(\hat\beta_0)}{\partial \beta} \overset{d}{\longrightarrow} \chi^2_k\] where \(\ell\) denotes the log-likelihood, \(\hat\beta_0\) is the restricted estimate under \(H_0\), and \(k\) is defined as above (Rao, 1948). Unlike the LRT statistic, it involves only restricted estimates.

Just like the LRT, the score test is invariant to the link function (in our simple one-way ANOVA-type setting):

anova(MyGLM, test="Rao")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev    Rao  Pr(>Chi)    
## NULL                    59    116.123                     
## Drug  2    39.04        57     77.084 38.604 4.143e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM2, test="Rao")
## Analysis of Deviance Table
## 
## Model: binomial, link: log
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev    Rao  Pr(>Chi)    
## NULL                    59    116.123                     
## Drug  2    39.04        57     77.084 38.604 4.143e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(MyGLM3, test="Rao")
## Analysis of Deviance Table
## 
## Model: binomial, link: identity
## 
## Response: cbind(Success, Total - Success)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev    Rao  Pr(>Chi)    
## NULL                    59    116.123                     
## Drug  2    39.04        57     77.084 38.604 4.143e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The score test is equivalent to Pearson’s \(\chi^2\) test for contingency tables with test statistic \[\chi^2 = \sum_{i} \frac{(Obs_i - Exp_i)^2}{Exp_i} \overset{d}{\longrightarrow} \chi^2_k\] where the sum again runs over all cells, and \(Obs_i\) and \(Exp_i\) are defined as for the \(G^2\) statistic.

chisq.test(MyTable)
## 
##  Pearson's Chi-squared test
## 
## data:  MyTable
## X-squared = 38.6037, df = 2, p-value = 4.143e-09
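
The same value can be computed by hand from the table margins, as a quick check:

# Expected cell counts under a common proportion
expected <- rowSums(MyTable) %o% colSums(MyTable) / sum(MyTable)
sum((MyTable - expected)^2 / expected)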

Wald Test

The test statistic is \[W = (\hat\beta - \beta_0)^T\ [\mathrm{Cov}(\hat\beta)]^{-1}\ (\hat\beta - \beta_0) \overset{d}{\longrightarrow} \chi^2_k\] with degrees of freedom \(k = \operatorname{rank}[\mathrm{Cov}(\hat\beta)]\), here \(k = 2\) for the two linearly independent contrasts (Wald, 1943). In contrast to the LRT and score statistics, only unrestricted estimates are involved.
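
With treatment contrasts, \(H_0\) is equivalent to the two Drug coefficients of the logit model being zero, so \(W\) can be sketched directly from the unrestricted fit; the result should match the global chi-squared statistic that glht() reports below:

# Unrestricted estimates of the contrasts B - A and C - A, and their covariance
b <- coef(MyGLM)[-1]
V <- vcov(MyGLM)[-1, -1]
W <- as.numeric(t(b) %*% solve(V) %*% b)
W
pchisq(W, df = 2, lower.tail = FALSE)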

library(multcomp)
summary(glht(MyGLM, mcp(Drug="Tukey")), test=Chisqtest())
## 
##   General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Linear Hypotheses:
##            Estimate
## B - A == 0   0.7486
## C - A == 0   1.2838
## C - B == 0   0.5351
## 
## Global Test:
##   Chisq DF Pr(>Chisq)
## 1 37.27  2  8.058e-09

Remark

Note that the Pearson \(\chi^2\) and the \(G^2\) statistics are special cases of the power divergence statistics (Cressie & Read, 1984) with general form \[2nI^{\lambda} = \frac{2}{\lambda (\lambda + 1)} \sum_{i} Obs_i \left[ \left( \frac{Obs_i}{Exp_i} \right)^{\lambda} - 1 \right], \quad \lambda \in \mathbb{R}.\] Choosing \(\lambda = 1\) yields Pearson’s \(\chi^2\) statistic, and \(\lambda = 0\) yields an expression equivalent to the \(G^2\) statistic. Since the factor \(\frac{2}{\lambda (\lambda + 1)}\) is undefined at \(\lambda = 0\), the \(G^2\) case is defined by continuity: \[2nI^0 = \lim_{\lambda \rightarrow 0}\ 2nI^{\lambda}.\] Cressie & Read (1984) recommend \(0 \leq \lambda \leq \frac{3}{2}\), and especially \(\lambda = \frac{2}{3}\), which leads to \[2nI^{\frac{2}{3}} = \frac{9}{5} \sum_{i} Obs_i \left[ \left( \frac{Obs_i}{Exp_i} \right)^{\frac{2}{3}} - 1 \right].\] We implement this general test in R and apply it to our artificial data with \(\lambda = \frac{2}{3}\):

powdiv <- function(x, lambda=2/3){
  # Expected cell counts under the null: product of margins / grand total
  expected <- rowSums(x) %o% colSums(x) / sum(x)
  stat <- 2 / (lambda * (lambda + 1)) * sum(x * ((x / expected)^lambda - 1))
  df <- (nrow(x) - 1) * (ncol(x) - 1)
  p <- 1 - pchisq(stat, df)
  cat("\n Test of general power divergence statistics (Cressie & Read, 1984)\n\nstatistic =", signif(stat, 6), ", df =", df, ", p-value =", p)
}
powdiv(MyTable)
## 
##  Test of general power divergence statistics (Cressie & Read, 1984)
## 
## statistic = 38.6721 , df = 2 , p-value = 4.00373e-09
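
As a check of the two special cases discussed above, \(\lambda = 1\) should reproduce Pearson’s \(\chi^2\) statistic, and a \(\lambda\) close to zero approximates the \(G^2\) statistic (\(\lambda = 0\) itself would divide by zero in this simple implementation):

powdiv(MyTable, lambda=1)     # should reproduce Pearson's chi-squared (38.6037)
powdiv(MyTable, lambda=1e-8)  # approximates G-squared (39.0395)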

Max Test (Grand Mean MCT)

The grand-mean-type max test (a multiple contrast test, MCT; Konietschke et al., 2013) rejects the global null hypothesis if the most extreme test statistic exceeds the equicoordinate \((1 - \alpha)\) quantile of the joint \(p\)-dimensional distribution of the test statistics, which incorporates their correlation matrix (Hothorn et al., 2008). For two-sided hypotheses the max-type statistic is \[T_{\max} = \max_{i = 1, \dots, p} |T_i|\] where \(T_i\) is the studentized difference between the estimate of sample \(i\) and the grand mean. We can use parameter and covariance estimates from any of the three models (logit, log, or identity link):

MyTest <- glht(MyGLM, mcp(Drug="GrandMean"))
min(summary(MyTest)$test$pvalues)
## [1] 2.935968e-08
MyTest2 <- glht(MyGLM2, mcp(Drug="GrandMean"))
min(summary(MyTest2)$test$pvalues)
## [1] 1.436494e-07
MyTest3 <- glht(MyGLM3, mcp(Drug="GrandMean"))
min(summary(MyTest3)$test$pvalues)
## [1] 1.328242e-08
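
The grand-mean contrast matrix that glht() applies can be inspected with contrMat() from multcomp, here for our three groups of 20 subsamples each:

# Grand-mean contrasts: each group vs. the (weighted) mean of all groups
contrMat(table(MyData$Drug), type = "GrandMean")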

Three Simulation Settings

# p: Number of samples
# n: Number of subsamples
# ni: Number of binomial trials per subsample
# delta: Effect size parameter

Set1 <- function(p, n, ni, delta){
  # The first ceiling(p/2) samples have success probability 0.5 - delta,
  # the remaining floor(p/2) samples have 0.5 + delta
  half <- ceiling(p / 2)
  Suc <- list()
  for(i in 1:half){
    Suc[[i]] <- rbinom(n, ni, 0.5 - delta)
  }
  for(i in (half + 1):p){
    Suc[[i]] <- rbinom(n, ni, 0.5 + delta)
  }
  data.frame(Drug = rep(LETTERS[1:p], each=n),
             Success = unlist(Suc),
             Total = ni)
}

library(ggplot2)

MySet1 <- Set1(p=6, n=100, ni=100, delta=0.1)
ggplot(MySet1, aes(Drug, Success/Total)) + geom_boxplot()

Set2 <- function(p, n, ni, delta){
  # One sample at 0.5 - delta, all remaining samples at 0.5 + delta
  Suc <- list()
  Suc[[1]] <- rbinom(n, ni, 0.5 - delta)
  for(i in 2:p){
    Suc[[i]] <- rbinom(n, ni, 0.5 + delta)
  }
  data.frame(Drug = rep(LETTERS[1:p], each=n),
             Success = unlist(Suc),
             Total = ni)
}

MySet2 <- Set2(p=6, n=100, ni=100, delta=0.1)
ggplot(MySet2, aes(Drug, Success/Total)) + geom_boxplot()

Set3 <- function(p, n, ni, delta){
  # Success probabilities increase linearly in steps of delta, centred at 0.5
  Suc <- list()
  for(i in 1:p){
    Suc[[i]] <- rbinom(n, ni, 0.5 - (p - 1) * delta / 2 + (i - 1) * delta)
  }
  data.frame(Drug = rep(LETTERS[1:p], each=n),
             Success = unlist(Suc),
             Total = ni)
}

MySet3 <- Set3(p=6, n=100, ni=100, delta=0.1)
ggplot(MySet3, aes(Drug, Success/Total)) + geom_boxplot()

Simulation of Setting 1 (with 100 Runs)

p <- 3:7
n <- 10
ni <- 10
delta <- seq(0, 0.15, by=0.01)
sim <- 100

LRT1 <- Score1 <- Max1 <- matrix(NA, nrow=length(p), ncol=length(delta))
rownames(LRT1) <- rownames(Score1) <- rownames(Max1) <- p
colnames(LRT1) <- colnames(Score1) <- colnames(Max1) <- delta

for(u in seq_along(p)){
  cat("Running: p =", p[u], "\n")
  flush.console()
  for(v in seq_along(delta)){
    lrt <- sco <- max <- numeric(sim)
    for(j in 1:sim){
      # Generate data, fit the logistic model, and store the three global p-values
      DAT <- Set1(p=p[u], n=n, ni=ni, delta=delta[v])
      GLM <- glm(cbind(Success, Total - Success) ~ Drug, DAT, family = binomial(link = "logit"))
      LRT <- anova(GLM, test="LRT")
      SCO <- anova(GLM, test="Rao")
      MAX <- glht(GLM, mcp(Drug="GrandMean"))
      lrt[j] <- LRT$Pr[2]
      sco[j] <- SCO$Pr[2]
      max[j] <- min(summary(MAX)$test$pvalues)
    }
    # Empirical rejection rates at the 5% level
    LRT1[u, v] <- sum(lrt < 0.05) / sim
    Score1[u, v] <- sum(sco < 0.05) / sim
    Max1[u, v] <- sum(max < 0.05) / sim
  }
}
## Running: p = 3 
## Running: p = 4 
## Running: p = 5 
## Running: p = 6 
## Running: p = 7
library(reshape2)
PowerSet1 <- rbind(melt(LRT1), melt(Score1), melt(Max1))
colnames(PowerSet1) <- c("p", "delta", "Power")
PowerSet1$Test <- rep(c("LRT", "Score", "Max"), each=length(delta) * length(p))

ggplot(PowerSet1, aes(x=delta, y=Power, colour=Test)) +
  geom_hline(yintercept=0.05, linetype="dashed") +
  geom_line() +
  facet_wrap(~ p, nrow=1) +
  theme_bw()

References

  • Agresti A (2013) Categorical Data Analysis, 3rd Edition. John Wiley & Sons, Inc., Hoboken, NJ. ISBN 978-0-470-46363-5.
  • Cressie N, Read TRC (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B (Methodological), 46(3), 440-464.
  • Hothorn T, Bretz F, Westfall P (2008) Simultaneous inference in general parametric models. Biometrical Journal, 50(3), 346-363.
  • Konietschke F, Boesiger S, Brunner E, Hothorn LA (2013) Are multiple contrast tests superior to the ANOVA? The International Journal of Biostatistics, 9(1), 1-11.
  • Rao CR (1948) Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44(1), 50-57.
  • Wald A (1943) Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3), 426-483.
  • Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1), 60-62.