Assume the following DAG and distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = I_{U+Normal(0,1)>0.5},\] \[Y = 2U + Normal(0,1).\]
set.seed(1)
n = 1000
u = rbinom(n,1,0.5)
x = as.numeric(rnorm(n)+u > 0.5)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)For one replication, the association between \(X\) and \(U\) is 0.34.
| X=0 | X=1 | |
|---|---|---|
| U=0 | 346 | 174 |
| U=1 | 155 | 325 |
The crude association between \(X\) and \(Y\) is 0.72, which is biased dut to confounding of \(U\).
| y | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.59 | 0.47 – 0.71 | <0.001 |
| x | 0.72 | 0.55 – 0.90 | <0.001 |
| Observations | 1000 | ||
| R2 / R2 adjusted | 0.062 / 0.061 | ||
| y | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | -0.02 | -0.13 – 0.08 | 0.646 |
| x | 0.05 | -0.09 – 0.19 | 0.502 |
| u | 1.98 | 1.84 – 2.12 | <0.001 |
| Observations | 1000 | ||
| R2 / R2 adjusted | 0.473 / 0.472 | ||
First, we create weights
mod = glm(x~u,family = binomial,data = dat)
propensity = predict(mod,type = "response")
ipw = x/propensity + (1-x)/(1-propensity)Then we use those weights in a regression model
| y | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.92 | 0.80 – 1.05 | <0.001 |
| x | 0.05 | -0.13 – 0.23 | 0.601 |
| Observations | 1000 | ||
| R2 / R2 adjusted | 0.000 / -0.001 | ||
p_list = c()
for (i in 1:100) {
set.seed(i)
n = 1000
u = rbinom(n,1,0.5)
x = as.numeric(rnorm(n)+u > 0.5)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)
mod = glm(x~u,family = binomial,data = dat)
propensity = predict(mod,type = "response")
ipw = x/propensity + (1-x)/(1-propensity)
mod.ipw = lm(y~x, weights = ipw, data = dat)
p_list[i] = summary(mod.ipw)$coefficients[2,4]
}
mean(p_list <= 0.05)## [1] 0.03
Assume the following distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = 0.5U + Normal(0,1),\] \[Y = 2U + Normal(0,1).\]
set.seed(1)
n = 1000
u = rbinom(n,1,0.5)
x = .5*u + rnorm(n)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)| y | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | -0.01 | -0.10 – 0.08 | 0.872 |
| x | 0.03 | -0.03 – 0.10 | 0.324 |
| u | 1.98 | 1.85 – 2.11 | <0.001 |
| Observations | 1000 | ||
| R2 / R2 adjusted | 0.473 / 0.472 | ||
First, we create weights using \[SW(x) = \frac{\hat{f}_X(x)}{\hat{f}_{X|U}(x|u)} = \frac{\frac{1}{\sqrt{2\pi\hat{\sigma}_1}^2}exp(-\frac{(x-\hat\mu_1)^2}{2\hat{\sigma_1}^2})}{\frac{1}{\sqrt{2\pi\hat{\sigma}_{2|u}^2}}exp(-\frac{(x-\hat\mu_{2|u})^2}{2\hat{\sigma}_{2|u}^2})}.\]
mod1 = lm(x~1,data = dat)
mod2 = lm(x~u,data = dat)
sigma1 = summary(mod1)$sigma
sigma2 = summary(mod2)$sigma
sw = (exp(-residuals(mod1)^2/(2*sigma1^2))/sigma1)/(exp(-residuals(mod2)^2/(2*sigma2^2))/sigma2)Then we use those weights in a regression model
| y | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.94 | 0.85 – 1.04 | <0.001 |
| x | 0.03 | -0.05 – 0.12 | 0.450 |
| Observations | 1000 | ||
| R2 / R2 adjusted | 0.001 / -0.000 | ||
p_list = c()
for (i in 1:100) {
set.seed(i)
n = 1000
u = rbinom(n,1,0.5)
x = .5*u + rnorm(n)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)
mod1 = lm(x~1,data = dat)
mod2 = lm(x~u,data = dat)
sigma1 = summary(mod1)$sigma
sigma2 = summary(mod2)$sigma
sw = (exp(-residuals(mod1)^2/(2*sigma1^2))/sigma1)/(exp(-residuals(mod2)^2/(2*sigma2^2))/sigma2)
mod.ipw = lm(y~x, weights = sw, data = dat)
p_list[i] = summary(mod.ipw)$coefficients[2,4]
}
mean(p_list <= 0.05)## [1] 0.06