library(magrittr)
library(reshape2)
library(ggplot2)
Scenario: The outcome is jointly determined by the randomized exposure and two mutually associated predictors. There is no effect modification of exposure effect by either of the predictors.
The continous outcome was created under the true data generating model \(Y = \beta_0 + \beta_1 E + \beta_2 P_1 + \beta_3 P_2 + e\). The dichotomous variable was created from the continous outcome by dichotomizing at the median.
The identity link was used for regressing the continous outcome on the exposure (\(\pm\) predictors). The logit link and log link were used for regressing the dichotomized outcome on the exposure (\(\pm\) predictors).
As the treatment exposure is randomized, the effect estimate of interest is the marginal effect. The question is if we additionally adjust for the predictors, does the effect estimate of the exposure change?
Simulate <- function() {
## Outcome predictor 1
p1 <- rnorm(n = 1000, mean = 45, sd = 5)
## Outcome predictor 2 associated with predictor 1
p2 <- rnorm(n = 1000, mean = 120, sd = 5) + (p1/7)*runif(n = 1000)
## Randomly assign exposure
e <- rbinom(n = 1000, size = 1, prob = 0.5)
## Set true model parameters
beta0 <- 10
beta1 <- -0.5
beta2 <- 0.7
beta3 <- 0.9
## Continuous outcome under the true model
Ycont <- beta0 + beta1*e + beta2*p1 + beta3*p2 + rnorm(n = 1000, mean = 0, sd = 2)
## Dichotomized outcome variable Ydich
Ydich <- as.numeric(Ycont > median(Ycont))
## Identity link: Conditional effect of exposure
identity_conditional_coef_e <-
glm(Ycont ~ e + p1 + p2, family = gaussian(link = "identity")) %>%
coef %>% extract("e")
## Identity link: Marginal effect of exposure (valid estimate as the exposure is randomized)
identity_marginal_coef_e <-
glm(Ycont ~ e, family = gaussian(link = "identity")) %>%
coef %>% extract("e")
## Logit link: Conditional effect of exposure
logit_conditional_coef_e <-
glm(Ydich ~ e + p1 + p2, family = binomial(link = "logit")) %>%
coef %>% extract("e")
## Logit link: Marginal effect of exposure (valid estimate as the exposure is randomized)
logit_marginal_coef_e <-
glm(Ydich ~ e, family = binomial(link = "logit")) %>%
coef %>% extract("e")
## Log link: Conditional effect of exposure
log_conditional_coef_e <-
glm(Ydich ~ e + p1 + p2, family = poisson(link = "log")) %>%
coef %>% extract("e")
## Log link: Marginal effect of exposure (valid estimate as the exposure is randomized)
log_marginal_coef_e <-
glm(Ydich ~ e, family = poisson(link = "log")) %>%
coef %>% extract("e")
##
c(identity_marginal = identity_marginal_coef_e,
identity_conditional = identity_conditional_coef_e,
logit_marginal = logit_marginal_coef_e,
logit_conditional = logit_conditional_coef_e,
log_marginal = log_marginal_coef_e,
log_conditional = log_conditional_coef_e)
}
set.seed(20150117)
resSims <- sapply(1:1000, function(x){Simulate()}) %>%
t %>% data.frame %>%
melt(data = .,
id.vars = NULL,
variable.name = "variable",
value.name = "value")
From left to right, identity link (marginal effect and conditional effect), logit link (marginal effect and conditional effect), and log link (marginal effect and conditional effect). In the logit link case the mean over multiple iterations do not match between the marginal effect and conditional effect (non-collapsibility). For the identity link and log link, the are still conceptually different effect (marginal vs conditional), but in the absence of confounding or effect modification like in this case, they match up.
ggplot(data = resSims, mapping = aes(x = variable, y = value)) +
layer(geom = "boxplot") +
theme_bw() + theme(legend.key = element_blank(),
axis.text.x = element_text(angle = 60, vjust = 0.5))