From Probabilities to Log-Odds for Simulation

Let’s say we want to simulate a series of zeroes and ones to indicate a binary response (y) according to the following probabilities for two variables: sex and treat:

female, control: p = 0.40
female, treated: p = 0.85
male, control: p = 0.30
male, treated: p = 0.50

So if you’re female and in the control group, you have a probability of 0.40 of “success”, or getting a 1.

We can express this as a model with “female” and “control” as the reference levels, like so:

\[\text{Prob}(y = 1) = \text{logit}^{-1}(\beta_0 + \beta_1 \times \text{treat} - \beta_2 \times \text{male} - \beta_3 \times \text{treat} \times \text{male})\] \(\text{logit}^{-1}\) is the inverse logit function and it transforms log-odds into probability. We need this because the beta coefficients are on the log-odds scale.

\(\beta_0\) = log-odds of a female in the control group
\(\beta_1\) = difference in log-odds between a female in the treatment group and a female in the control group
\(\beta_2\) = difference in log-odds between a male in the control group and a female in the control group
\(\beta_3\) = difference in log-odds for the male treatment effect versus the female treatment effect (ie, what we add to the other 3 beta coefficients to get the log-odds for a male in the treatment group)

Here’s how we can we express our probabilities as log-odds coefficients using the qlogis the function. The qlogis function takes a probability and returns log-odds. The plogis function takes log-odds and returns a probability. We include it below to demonstrate our log-odds can indeed be transformed back to our desired probabilities.

For \(\beta_0\), we simply transform 0.4 to log-odds. That’s the probability of female in the control group.

# female/control
qlogis(0.4)

## [1] -0.4054651

# verify
plogis(-0.4054651)

## [1] 0.4

For \(\beta_1\), we take the difference in log-odds between female in treatment group and female in the control group.

# female/treat
qlogis(0.85) - qlogis(0.4)

## [1] 2.140066

# verify
plogis(-0.4054651 + 2.140066)

## [1] 0.85

For \(\beta_2\), we take the difference in log-odds between male in control group and female in control group.

# male/control
qlogis(0.3) - qlogis(0.4)

## [1] -0.4418328

# verify
plogis(-0.4054651 + -0.4418328)

## [1] 0.3

\(\beta_3\) is what we add to the other 3 coefficients to get the log-odds for a male in the treated group.

# male/treat
qlogis(0.5) - qlogis(plogis(-0.4054651 + 2.140066 + -0.4418328))

## [1] -1.292768

# verify
plogis(-0.4054651 + 2.140066 + -0.4418328 + -1.292768)

## [1] 0.5

So our model as expressed with log-odds becomes:

p <- plogis(-0.4054651 + 
              2.140066*(treat == "treated") +
              -0.4418328*(sex == "male") +
              -1.292768*(treat == "treated")*(sex == "male"))

And then we can use the probabilities to simulate binary events using the rbinom function:

y <- rbinom(n = n, size = 1, prob = p)

For example:

treat <- sample(c("control", "treated"), size = 100, replace = TRUE)
sex <- sample(c("male", "female"), size = 100, replace = TRUE)
p <- plogis(-0.4054651 + 
              2.140066*(treat == "treated") +
              -0.4418328*(sex == "male") +
              -1.292768*(treat == "treated")*(sex == "male"))
y <- rbinom(n = length(p), size = 1, prob = p)
d <- data.frame(y, treat, sex)
head(d, n = 10)

##    y   treat    sex
## 1  1 treated   male
## 2  1 treated female
## 3  0 control female
## 4  1 treated   male
## 5  1 control   male
## 6  0 control female
## 7  1 treated female
## 8  1 treated female
## 9  0 treated   male
## 10 0 control   male

From Probabilities to Log-Odds for Simulation

Clay Ford

3/18/2021