1 Simulation 1: binary \(X\), binary \(U\), and continuous \(Y\)
2 Simulation 2: continuous \(X\), binary \(U\), and continuous \(Y\)

1 Simulation 1: binary \(X\), binary \(U\), and continuous \(Y\)

Assume the following DAG and distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = I_{U+Normal(0,1)>0.5},\] \[Y = 2U + Normal(0,1).\]

set.seed(1)
n = 1000
u = rbinom(n,1,0.5)
x = as.numeric(rnorm(n)+u > 0.5)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)

For one replication, the association between \(X\) and \(U\) is 0.34.

	X=0	X=1
U=0	346	174
U=1	155	325

The crude association between \(X\) and \(Y\) is 0.72, which is biased dut to confounding of \(U\).

tab_model(lm(y~x, data = dat))

	y
Predictors	Estimates	CI	p
(Intercept)	0.59	0.47 – 0.71	<0.001
x	0.72	0.55 – 0.90	<0.001
Observations	1000
R² / R² adjusted	0.062 / 0.061

1.1 Accounting for confounding - multiple regression

	y
Predictors	Estimates	CI	p
(Intercept)	-0.02	-0.13 – 0.08	0.646
x	0.05	-0.09 – 0.19	0.502
u	1.98	1.84 – 2.12	<0.001
Observations	1000
R² / R² adjusted	0.473 / 0.472

1.2 Accounting for confounding - IPW

First, we create weights

mod = glm(x~u,family = binomial,data = dat)
propensity = predict(mod,type = "response")
ipw = x/propensity + (1-x)/(1-propensity)

Then we use those weights in a regression model

tab_model(lm(y~x, weights = ipw, data = dat))

	y
Predictors	Estimates	CI	p
(Intercept)	0.92	0.80 – 1.05	<0.001
x	0.05	-0.13 – 0.23	0.601
Observations	1000
R² / R² adjusted	0.000 / -0.001

1.3 Replication of IPW

p_list = c()
for (i in 1:100) {
  set.seed(i)
  n = 1000
  u = rbinom(n,1,0.5)
  x = as.numeric(rnorm(n)+u > 0.5)
  y = 2*u + rnorm(n)
  dat = data.frame(u,x,y)
  mod = glm(x~u,family = binomial,data = dat)
  propensity = predict(mod,type = "response")
  ipw = x/propensity + (1-x)/(1-propensity)
  mod.ipw = lm(y~x, weights = ipw, data = dat)
  p_list[i] = summary(mod.ipw)$coefficients[2,4] 
}
mean(p_list <= 0.05)

## [1] 0.03

2 Simulation 2: continuous \(X\), binary \(U\), and continuous \(Y\)

Assume the following distributional assumptions \[U \sim Bernoulli(0.5),\] \[X = 0.5U + Normal(0,1),\] \[Y = 2U + Normal(0,1).\]

set.seed(1)
n = 1000
u = rbinom(n,1,0.5)
x = .5*u + rnorm(n)
y = 2*u + rnorm(n)
dat = data.frame(u,x,y)

2.1 Accounting for confounding - multiple regression

	y
Predictors	Estimates	CI	p
(Intercept)	-0.01	-0.10 – 0.08	0.872
x	0.03	-0.03 – 0.10	0.324
u	1.98	1.85 – 2.11	<0.001
Observations	1000
R² / R² adjusted	0.473 / 0.472

2.2 Accounting for confounding - IPW

First, we create weights using \[SW(x) = \frac{\hat{f}_X(x)}{\hat{f}_{X|U}(x|u)} = \frac{\frac{1}{\sqrt{2\pi\hat{\sigma}_1}^2}exp(-\frac{(x-\hat\mu_1)^2}{2\hat{\sigma_1}^2})}{\frac{1}{\sqrt{2\pi\hat{\sigma}_{2|u}^2}}exp(-\frac{(x-\hat\mu_{2|u})^2}{2\hat{\sigma}_{2|u}^2})}.\]

mod1 = lm(x~1,data = dat)
mod2 = lm(x~u,data = dat)
sigma1 = summary(mod1)$sigma
sigma2 = summary(mod2)$sigma
sw = (exp(-residuals(mod1)^2/(2*sigma1^2))/sigma1)/(exp(-residuals(mod2)^2/(2*sigma2^2))/sigma2)