The 96 subject rule for logistic regression

Simulate the 96 subject rule

First we sample 96 subjects from a binomial population in which the probability of “success” is 0.55. In other words, the TRUE probability of success is 0.55. We can do that with the rbinom function; setting size = 1 makes each of the 96 draws a single 0/1 trial. We can think of the code below as flipping, 96 times, a coin that has a 0.55 probability of landing heads. We store the result in x.

# 96 independent 0/1 draws, each equal to 1 with probability 0.55
x <- rbinom(n = 96, size = 1, prob = 0.55)
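Because rbinom draws at random, x will be different every time the code runs. A minimal sketch for making the draw reproducible, assuming an arbitrary seed of 42 (not part of the original analysis), fixes the seed with set.seed and then summarizes the sample:

```r
# Fix the random-number seed so the draw is reproducible (the value 42 is arbitrary)
set.seed(42)
x <- rbinom(n = 96, size = 1, prob = 0.55)

# Each element is a single 0/1 trial: 1 = "success" (heads), 0 = "failure" (tails)
table(x)

# Observed proportion of successes; it varies around 0.55 from sample to sample
mean(x)
```

Any seed works; the point is only that rerunning the block gives the same x.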

Now we pretend we don’t know that the true probability is 0.55 and try to estimate it from our sample, x. We can do that with an intercept-only logistic regression model. The coef function extracts the intercept, which is on the log-odds scale. The plogis function computes the inverse logit, converting that intercept to a probability.

m <- glm(x ~ 1, family = binomial)
# use inverse-logit to get probability
plogis(coef(m))

## (Intercept) 
##         0.5
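To see what plogis is doing, the sketch below computes the inverse logit by hand and compares it with plogis. It also checks a known property of the intercept-only logistic model: its maximum likelihood estimate of the probability is simply the sample proportion mean(x), so the intercept equals qlogis(mean(x)), the log-odds of that proportion. The seed and the names b0 and p_hat are illustrative choices, not from the original.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
x <- rbinom(n = 96, size = 1, prob = 0.55)
m <- glm(x ~ 1, family = binomial)

b0 <- unname(coef(m))  # intercept on the log-odds (logit) scale

# plogis is the inverse logit: exp(b) / (1 + exp(b)) = 1 / (1 + exp(-b))
p_hat <- 1 / (1 + exp(-b0))
all.equal(p_hat, plogis(b0))

# In an intercept-only model the ML estimate is the sample proportion,
# so the intercept is the logit (qlogis) of mean(x)
all.equal(b0, qlogis(mean(x)), tolerance = 1e-6)
all.equal(p_hat, mean(x), tolerance = 1e-6)
```

The small tolerance allows for the fact that glm finds the estimate iteratively rather than in closed form.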

Finally, we check whether our estimated probability is within 0.1 of the TRUE probability of 0.55. (Because x is a random sample, your estimate will likely differ from the 0.5 shown above.)

abs(plogis(coef(m)) - 0.55) < 0.1

## (Intercept) 
##        TRUE
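Another way to connect a single sample to the 0.1 target is the model's own 95% confidence interval. A sketch, assuming a Wald interval from confint.default on the log-odds scale that is then back-transformed with plogis (the seed is arbitrary): with 96 subjects the interval typically extends a little under 0.1 on either side of the estimate.

```r
set.seed(7)  # arbitrary seed, for reproducibility only
x <- rbinom(n = 96, size = 1, prob = 0.55)
m <- glm(x ~ 1, family = binomial)

# Wald 95% interval for the intercept, on the log-odds scale
ci_logit <- confint.default(m)

# Back-transform both endpoints to the probability scale
ci_prob <- plogis(ci_logit)
ci_prob

# Half the interval width (the back-transformed interval is slightly asymmetric)
half_width <- diff(as.vector(ci_prob)) / 2
half_width
```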

Now let’s do that 1000 times and see the proportion of times our estimate is within 0.1 of the TRUE probability of 0.55. The replicate function evaluates an expression, here the block of code in braces, as many times as we like. The result is stored in a vector I called “rep.out”. This is a vector of TRUE/FALSE values. Because R treats TRUE as 1 and FALSE as 0, taking the mean of that vector returns the proportion of TRUEs.

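rep.out <- replicate(n = 1000, {
  # draw a new sample of 96 subjects
  x <- rbinom(n = 96, size = 1, prob = 0.55)
  # fit the intercept-only model
  m <- glm(x ~ 1, family = binomial)
  # is the estimated probability within 0.1 of the truth?
  abs(plogis(coef(m)) - 0.55) < 0.1
})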
mean(rep.out)

## [1] 0.944

We see that, in this run, our estimate comes within 0.1 of the true probability 94.4 percent of the time, close to the 95 percent confidence level the rule is built on.
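The simulated 94.4 percent can be checked against theory. Since the intercept-only estimate is just the proportion of successes k/96, the chance of landing within 0.1 of 0.55 is an exact binomial probability. The sketch below computes that probability with dbinom, the 95% normal-approximation margin of error at the worst case p = 0.5 (the calculation that yields the figure 96), and a faster simulation that skips glm. The seed and variable names are illustrative assumptions.

```r
# Settings from the simulation above
n   <- 96    # sample size
p   <- 0.55  # true probability of success
tol <- 0.1   # allowed distance between estimate and truth

# Exact probability that |k/n - p| < tol when k ~ Binomial(n, p)
k <- 0:n
exact <- sum(dbinom(k[abs(k / n - p) < tol], size = n, prob = p))

# 95% normal-approximation margin of error at the worst case p = 0.5
moe <- qnorm(0.975) * sqrt(0.5 * 0.5 / n)        # about 0.1

# Sample size that makes that margin of error equal to tol
n_needed <- qnorm(0.975)^2 * 0.5 * 0.5 / tol^2   # about 96.04

# Faster simulation: draw k directly and use k/n as the estimate
set.seed(2021)  # arbitrary seed
sim <- mean(abs(rbinom(1000, size = n, prob = p) / n - p) < tol)

c(exact = exact, moe = moe, n_needed = n_needed, sim = sim)
```

The exact probability and the simulated proportion should both land near 0.95; differences between runs are simulation noise.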


Clay Ford

7/1/2021
