Let’s say we want to simulate a series of zeroes and ones to indicate a binary response (y) according to the following probabilities for two variables: sex and treat:

So if you’re female and in the control group, you have a probability of 0.40 of “success”, or getting a 1.

We can express this as a model with “female” and “control” as the reference levels, like so:

\[\text{Prob}(y = 1) = \text{logit}^{-1}(\beta_0 + \beta_1 \times \text{treat} - \beta_2 \times \text{male} - \beta_3 \times \text{treat} \times \text{male})\] \(\text{logit}^{-1}\) is the inverse logit function and it transforms log-odds into probability. We need this because the beta coefficients are on the log-odds scale.

Here’s how we can we express our probabilities as log-odds coefficients using the qlogis the function. The qlogis function takes a probability and returns log-odds. The plogis function takes log-odds and returns a probability. We include it below to demonstrate our log-odds can indeed be transformed back to our desired probabilities.

For \(\beta_0\), we simply transform 0.4 to log-odds. That’s the probability of female in the control group.

# female/control
qlogis(0.4)
## [1] -0.4054651
# verify
plogis(-0.4054651)
## [1] 0.4

For \(\beta_1\), we take the difference in log-odds between female in treatment group and female in the control group.

# female/treat
qlogis(0.85) - qlogis(0.4)
## [1] 2.140066
# verify
plogis(-0.4054651 + 2.140066)
## [1] 0.85

For \(\beta_2\), we take the difference in log-odds between male in control group and female in control group.

# male/control
qlogis(0.3) - qlogis(0.4)
## [1] -0.4418328
# verify
plogis(-0.4054651 + -0.4418328)
## [1] 0.3

\(\beta_3\) is what we add to the other 3 coefficients to get the log-odds for a male in the treated group.

# male/treat
qlogis(0.5) - qlogis(plogis(-0.4054651 + 2.140066 + -0.4418328))
## [1] -1.292768
# verify
plogis(-0.4054651 + 2.140066 + -0.4418328 + -1.292768)
## [1] 0.5

So our model as expressed with log-odds becomes:

p <- plogis(-0.4054651 + 
              2.140066*(treat == "treated") +
              -0.4418328*(sex == "male") +
              -1.292768*(treat == "treated")*(sex == "male"))

And then we can use the probabilities to simulate binary events using the rbinom function:

y <- rbinom(n = n, size = 1, prob = p)

For example:

treat <- sample(c("control", "treated"), size = 100, replace = TRUE)
sex <- sample(c("male", "female"), size = 100, replace = TRUE)
p <- plogis(-0.4054651 + 
              2.140066*(treat == "treated") +
              -0.4418328*(sex == "male") +
              -1.292768*(treat == "treated")*(sex == "male"))
y <- rbinom(n = length(p), size = 1, prob = p)
d <- data.frame(y, treat, sex)
head(d, n = 10)
##    y   treat    sex
## 1  1 treated   male
## 2  1 treated female
## 3  0 control female
## 4  1 treated   male
## 5  1 control   male
## 6  0 control female
## 7  1 treated female
## 8  1 treated female
## 9  0 treated   male
## 10 0 control   male