Let’s say we want to simulate a series of zeroes and ones to indicate a binary response (y) according to the following probabilities for two variables: sex and treat:
So if you’re female and in the control group, you have a probability of 0.40 of “success”, or getting a 1.
We can express this as a model with “female” and “control” as the reference levels, like so:
\[\text{Prob}(y = 1) = \text{logit}^{-1}(\beta_0 + \beta_1 \times \text{treat} - \beta_2 \times \text{male} - \beta_3 \times \text{treat} \times \text{male})\] \(\text{logit}^{-1}\) is the inverse logit function and it transforms log-odds into probability. We need this because the beta coefficients are on the log-odds scale.
Here’s how we can we express our probabilities as log-odds coefficients using the qlogis the function. The qlogis function takes a probability and returns log-odds. The plogis function takes log-odds and returns a probability. We include it below to demonstrate our log-odds can indeed be transformed back to our desired probabilities.
For \(\beta_0\), we simply transform 0.4 to log-odds. That’s the probability of female in the control group.
# female/control
qlogis(0.4)
## [1] -0.4054651
# verify
plogis(-0.4054651)
## [1] 0.4
For \(\beta_1\), we take the difference in log-odds between female in treatment group and female in the control group.
# female/treat
qlogis(0.85) - qlogis(0.4)
## [1] 2.140066
# verify
plogis(-0.4054651 + 2.140066)
## [1] 0.85
For \(\beta_2\), we take the difference in log-odds between male in control group and female in control group.
# male/control
qlogis(0.3) - qlogis(0.4)
## [1] -0.4418328
# verify
plogis(-0.4054651 + -0.4418328)
## [1] 0.3
\(\beta_3\) is what we add to the other 3 coefficients to get the log-odds for a male in the treated group.
# male/treat
qlogis(0.5) - qlogis(plogis(-0.4054651 + 2.140066 + -0.4418328))
## [1] -1.292768
# verify
plogis(-0.4054651 + 2.140066 + -0.4418328 + -1.292768)
## [1] 0.5
So our model as expressed with log-odds becomes:
p <- plogis(-0.4054651 +
2.140066*(treat == "treated") +
-0.4418328*(sex == "male") +
-1.292768*(treat == "treated")*(sex == "male"))
And then we can use the probabilities to simulate binary events using the rbinom function:
y <- rbinom(n = n, size = 1, prob = p)
For example:
treat <- sample(c("control", "treated"), size = 100, replace = TRUE)
sex <- sample(c("male", "female"), size = 100, replace = TRUE)
p <- plogis(-0.4054651 +
2.140066*(treat == "treated") +
-0.4418328*(sex == "male") +
-1.292768*(treat == "treated")*(sex == "male"))
y <- rbinom(n = length(p), size = 1, prob = p)
d <- data.frame(y, treat, sex)
head(d, n = 10)
## y treat sex
## 1 1 treated male
## 2 1 treated female
## 3 0 control female
## 4 1 treated male
## 5 1 control male
## 6 0 control female
## 7 1 treated female
## 8 1 treated female
## 9 0 treated male
## 10 0 control male