1 Simulation 1: binary \(X\), binary \(U\), and continuous \(Y\)

Assume the following DAG and distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = I_{U+Normal(0,1)>0.5},\] \[Y = 2U + Normal(0,1).\]

For one replication, the association between \(X\) and \(U\) is 0.34.

X=0 X=1
U=0 346 174
U=1 155 325


The crude association between \(X\) and \(Y\) is 0.72, which is biased dut to confounding of \(U\).

  y
Predictors Estimates CI p
(Intercept) 0.59 0.47 – 0.71 <0.001
x 0.72 0.55 – 0.90 <0.001
Observations 1000
R2 / R2 adjusted 0.062 / 0.061

1.1 Accounting for confounding - multiple regression

  y
Predictors Estimates CI p
(Intercept) -0.02 -0.13 – 0.08 0.646
x 0.05 -0.09 – 0.19 0.502
u 1.98 1.84 – 2.12 <0.001
Observations 1000
R2 / R2 adjusted 0.473 / 0.472

1.2 Accounting for confounding - IPW

First, we create weights

Then we use those weights in a regression model

  y
Predictors Estimates CI p
(Intercept) 0.92 0.80 – 1.05 <0.001
x 0.05 -0.13 – 0.23 0.601
Observations 1000
R2 / R2 adjusted 0.000 / -0.001

2 Simulation 2: continuous \(X\), binary \(U\), and continuous \(Y\)

Assume the following distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = 0.5U + Normal(0,1),\] \[Y = 2U + Normal(0,1).\]

2.1 Accounting for confounding - multiple regression

  y
Predictors Estimates CI p
(Intercept) -0.01 -0.10 – 0.08 0.872
x 0.03 -0.03 – 0.10 0.324
u 1.98 1.85 – 2.11 <0.001
Observations 1000
R2 / R2 adjusted 0.473 / 0.472

2.2 Accounting for confounding - IPW

First, we create weights using \[SW(x) = \frac{\hat{f}_X(x)}{\hat{f}_{X|U}(x|u)} = \frac{\frac{1}{\sqrt{2\pi\hat{\sigma}_1}^2}exp(-\frac{(x-\hat\mu_1)^2}{2\hat{\sigma_1}^2})}{\frac{1}{\sqrt{2\pi\hat{\sigma}_{2|u}^2}}exp(-\frac{(x-\hat\mu_{2|u})^2}{2\hat{\sigma}_{2|u}^2})}.\]

Then we use those weights in a regression model

  y
Predictors Estimates CI p
(Intercept) 0.94 0.85 – 1.04 <0.001
x 0.03 -0.05 – 0.12 0.450
Observations 1000
R2 / R2 adjusted 0.001 / -0.000