Introduction

  • Recognize what to control for in DAGs

  • Estimation methods

  • Steps to causal inference

    • Define causal effects

    • Specify models

    • Specify and defend assumptions

Causal effects and confounding

Causal effects

A: treatment

  • exposure/intervention of interest

  • Random variable

  • lowercase a refers to a realization of var A

  • Treatment status of subject i

Y: outcome

  • Could be continuous

  • Could be time to event

  • Could be multidimensional

Potential outcomes

  • Ya for a continuous treatment variable

  • Y1 and Y0 for binary

    • e.g. Y1: time until the individual would get the flu if they received the vaccine

    • Y0: time until the individual would get the flu if they did not receive the vaccine

Observed outcomes

  • Observed outcome Y is the outcome under treatment a subject received, i.e. Y = YA

  • Causal inference is a missing data problem: which potential outcome we observe is decided by the treatment received

Independence

  • Assuming that the treatment given to one subject does not affect the outcome of another subject (no interference)

Causal effects

  • A has a causal effect on Y if Y1 ≠ Y0

  • Can't determine the causal effect of ibuprofen for an individual without knowing their outcome under no treatment

  • However can make estimates on population level (average causal effects)

  • E(Y1 - Y0)

    • Average outcome if everyone had been treated versus if no one had been treated

    • Typically not equal to E(Y|A = 1) - E(Y|A = 0) (two subpopulations) unless we randomize

    • Pearl uses notation E(Y|do(A = 1)) - E(Y|do(A = 0))

    • do notation means “set” the treatment; it is equivalent to the potential-outcome notation above

Other causal effects

  • Causal risk ratio, e.g. E(Y1)/E(Y0)

  • Causal effect within a subpopulation: E(Y1 - Y0|V = v)

  • Effect of removing an exposure among the exposed, a hypothetical “what if”: E(Y1 - Y0|A = 1)

  • Quantile causal effects: \(F_{Y^1}^{-1}(p) - F_{Y^0}^{-1}(p)\), the difference between corresponding quantiles of the two potential-outcome distributions

Valid examples

  • Comparing the same treatment across two separate groups is not a valid causal contrast without strong assumptions

  • Nor is comparing separate subpopulations valid, even when their treatments differ

Confounding

  • When there are variables that affect the treatment decision and outcome

    • Also possible in randomized trials when there is noncompliance

    • Creates a problem because there is a lack of comparability between groups

Example

  • Severe injury affects the likelihood of getting surgery, but not deterministically

  • Treatment will bump up Ya

  • We will see separate distributions for a = 0 and a = 1, but each curve might get pulled in one direction by the confounder L

Types of confounding

  • Controllable confounding

  • No confounding

  • Confounding not controllable using standard methods but controllable using others (e.g. instrumental variables)

    • Could use a natural randomizer

    • Difference-in-differences tries to difference out unmeasured confounding variables

  • Uncontrollable confounding

    • Can do sensitivity analysis

Controllable confounding

  • L is set of baseline (pre-treatment) covariates

  • If we control for confounding we can assume any other differences are random

  • Controllable if there is overt (not hidden) bias

    • f(Y0, Y1 |L,A) = f(Y0, Y1 |L)

    • f(A|L, Y0, Y1) = f(A|L)

Ignorability

  • Basically means conditional randomization

  • Given L, treatment assignment does not depend on the potential outcomes

Graphical Models

  • dag terminology

    • Nodes are variables; directed edges represent direct effects, and paths are routes along which information can flow from one variable to another
library(dagitty)
library(ggdag)

#simple chain: x -> z -> y
dagified <- dagify(z ~ x,
                   y  ~ z,
                   exposure = "x",
                   outcome = "y")
dag <- tidy_dagitty(dagified)

ggdag(dag, layout = "circle") +
  theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

  • In this DAG:

    • X is a parent of Z and an ancestor of Y

    • Y is a child of Z and a descendant of X

    • This is an example of a chain

Information flow

  • Forks

    • A \(\leftarrow\) Z \(\rightarrow\) B

    • A and B are not independent, since information flows from Z into both

    • A and B are associated with each other

Paths that do not induce association

  • In cases of a collider G

    • A \(\rightarrow\) G \(\leftarrow\) B

    • A and B both affect G

    • A and B are marginally independent (the collider blocks the path)

Blocking

  • Paths can be blocked by conditioning on nodes in the path

  • This stops the flow of association along the path

  • e.g. temperature -> icy sidewalks -> slipping

  • We can block the association between temperature and slipping by conditioning on icy sidewalks

D-separation

A path is d-separated by a set of nodes C if:

  • it contains a chain (D\(\rightarrow\)E\(\rightarrow\)F) and the middle node is in C

    OR

  • it contains a fork (D\(\leftarrow\)E\(\rightarrow\)F) and the middle node is in C

    OR

  • it contains an inverted fork (collider, D\(\rightarrow\)E\(\leftarrow\)F) and neither the middle node nor any of its descendants is in C
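
A quick way to sanity-check these rules is dagitty's dseparated() (a sketch; dagitty is loaded in the DAG code above, and the node names mirror the rules):

chain    <- dagitty("dag{ D -> E -> F }")
fork     <- dagitty("dag{ D <- E -> F }")
collider <- dagitty("dag{ D -> E <- F }")

#chain and fork: conditioning on the middle node E d-separates D and F
dseparated(chain, "D", "F", "E")       #TRUE
dseparated(fork,  "D", "F", "E")       #TRUE

#collider: D and F start out d-separated; conditioning on E connects them
dseparated(collider, "D", "F", list()) #TRUE
dseparated(collider, "D", "F", "E")    #FALSE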

Backdoor paths

Backdoor paths from treatment A to outcome Y are paths from A to Y that begin with an arrow pointing into A

  • These confound the relationship between A and Y

  • Need to be blocked to make causal assumptions about effect of A on Y

Backdoor path criterion

Set of variables L is sufficient to control for confounding if:

  • It blocks all backdoor paths from treatment to the outcome

  • It does not include any descendants of treatment

This is the backdoor path criterion

  • Backdoor paths can be identified as those that start by going against the causal flow (through an arrow into treatment)
#backdoor example: w and z confound the effect of a on y
dagified <- dagify(y ~ a,
                   a  ~ w,
                   y ~ w,
                   w ~ z,
                   a ~ z,
                   exposure = "a",
                   outcome = "y")
dag <- tidy_dagitty(dagified)

ggdag(dag, layout = "circle") +
  theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

In this example we can block the backdoor paths of information from a to y by controlling for W alone, or for W and Z (verified with dagitty after the bullets below)

  • Can also control for other risk factors for y for the sake of efficiency gains; this does not bias the causal estimate

  • Conditioning on a collider opens a path between the variables leading into the collider (ex 4 in slides)
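
We can check the adjustment sets with dagitty (a sketch; adjustmentSets() uses the exposure and outcome already declared in dagify() above):

#minimal sufficient adjustment sets for the DAG above; should return { w }
adjustmentSets(dagified)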

#example 4: x3 is a collider (x1 -> x3 <- x2) and also a child of a
dagified <- dagify(y ~ a,
                   y  ~ x2,
                   x3 ~ x2,
                   x3 ~ x1,
                   a ~ x1,
                   x3 ~ a,
                   y ~ x3,
                   exposure = "a",
                   outcome = "y")
dag <- tidy_dagitty(dagified)

ggdag(dag, layout = "circle") +
  theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Here we need to condition on x3 to block the backdoor path from x1, but conditioning on x3 (a collider) opens another backdoor path, so we must also condition on x2.

Strategies

  • Adjust for all pre-treatment variables

    • controlling for instruments can inflate standard errors
  • Select minimum set using backdoor criterion

  • Use disjunctive cause criterion

    • Requires knowing which variables directly affect the treatment decision or the outcome, but not the full DAG

    • Control for every variable that is a potential cause of the treatment, the outcome, or both

  • Debate on data-driven approach

    • Could discard many weak confounders

    • inferential challenge: getting standard errors right after data-driven selection

Identifiability and estimation

Identification

Identifying assumptions

  • Ignorability

    • If we condition on L, treatment A tells us nothing about the potential outcomes

  • Positivity

    • Every subject has a positive probability of receiving each treatment level, given L

  • Consistency

    • The observed outcome equals the potential outcome under the treatment received: Y = Ya when A = a

Special case: standardization

E(Ya) = \(\int E(Y \mid A = a, L = l)\, dF_L(l)\); for a binary L this reduces to \(\sum_{l = 0}^{1} E(Y \mid A = a, L = l)\, P(L = l)\)

Parametric G-Formula

Could estimate E(Ya) by using models for E(Y|A = a,L) and f(L)

E(Ya) = \(\int E(Y \mid A = a, L; \theta)\, f(L; \eta)\, dL\)

Example

  • Can fit a logistic regression for Y|A,L

  • Get predicted values of Y for each Li with A set to a

  • Average over the empirical distribution of L

  • Do this for a = 0 and a = 1 and take the difference (see the sketch below)
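
A minimal sketch of these steps, assuming a data frame df with binary outcome y, binary treatment a, and covariates l1 and l2 (hypothetical names; the RHC example later does this on real data):

#fit the outcome model for Y|A,L
fit <- glm(y ~ a + l1 + l2, family = binomial(), data = df)

#predict for every subject with treatment set to 1, then to 0
df1 <- transform(df, a = 1)
df0 <- transform(df, a = 0)

#average over the empirical distribution of L and take the difference
mean(predict(fit, df1, type = "response")) -
  mean(predict(fit, df0, type = "response"))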

Weighting

Standardization approach

Averaging over the marginal distribution of L

  • downweights subjects whose L value is overrepresented in their treatment arm and upweights those whose L value is underrepresented

Propensity score

The propensity score is the conditional probability of treatment given the covariates: \(e(L) = P(A = 1 \mid L)\)

IPTW

  • Can weight by the inverse of the probability of treatment received

    • If P(A = 1|L) is high we expect that and we want that case to be weighted down
    • If P(A = 1|L) is low we don’t expect that and we want that case to be weighted up
  • This creates a pseudo-population in which the confounders are balanced between treated and control, removing the confounding

  • Once we have balance, weighted sample means give the estimates we need

  • Basically “erasing an arrow” from confounding, but you have to get the right Ls to weight to
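
In symbols (a standard form, matching the toy calculation in the R examples below): \(\hat{E}(Y^a) = \frac{1}{n}\sum_{i=1}^{n} \frac{I(A_i = a)\,Y_i}{P(A = a \mid L_i)}\), and the ATE estimate is \(\hat{E}(Y^1) - \hat{E}(Y^0)\).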

Stabilized weights

If the probability of a particular treatment given L is small, then the weight will be large; the use of stabilized weights can help.
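
In symbols (a standard form; the numerator is the marginal probability of the treatment actually received): \(SW_i = \frac{P(A = A_i)}{P(A = A_i \mid L_i)}\). This is how sw is computed in the RHC example below.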

What we know about IPTW

  • large weights lead to large SEs

  • weights can be very large when positivity is nearly violated

  • We can use trimming or weight truncation to deal with large weights

  • weight truncation means setting a maximum allowed weight (see the sketch below)

  • augmented IPTW (AIPTW) is doubly robust and more efficient than plain IPTW
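
A minimal sketch of weight truncation, assuming a weight vector wt like the one computed in the examples below (the 99th-percentile cap is an arbitrary choice):

#cap extreme weights at their 99th percentile
wt_trunc <- pmin(wt, quantile(wt, 0.99))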

Matching

Matching is an alternative to standardization and IPTW estimation

  • Idea is to make the data more like what you would expect from a randomized trial

  • Match treated subjects to control subjects on the confounders

  • Advantages of matching

    • controlling for confounders is achieved at the design phase, without looking at the outcome

    • Matching will reveal lack of overlap in the covariate distributions

    • Once data are matched, they are essentially treated as if from an RCT

Distance

  • Matching requires calculating a distance between Li and Lj

    • can do this with exact matching

    • Mahalanobis distance

    • Mahalanobis distance with a caliper (if the distance is too large, we force a non-match)

    • propensity score: compare estimated probabilities of treatment for matching

    • propensity score with a caliper

  • Can combine measures, such as Mahalanobis distance within a propensity score caliper, or KNN matching (see the sketch below)
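
A sketch of these distance calculations, assuming a numeric confounder matrix L with rows i and j and a vector of propensity scores e (hypothetical names; base R's mahalanobis() returns the squared distance):

S <- cov(L)                                #covariance of the confounders
d_ij <- sqrt(mahalanobis(L[i,], L[j,], S)) #Mahalanobis distance between i and j

#propensity score caliper: forbid matches whose scores differ too much
#(0.2 SD of the score is a common rule of thumb)
caliper <- 0.2*sd(e)
d_ij <- ifelse(abs(e[i]-e[j]) > caliper, Inf, d_ij)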

Propensity score matching

  • If there is poor overlap we would have to extrapolate, and we would lose a lot of data if we trimmed

  • For subjects i and j, we can then create a distance matrix from their propensity scores ei and ej

Criticisms

  • The propensity score is based on treatment, so we might weight heavily a covariate strongly related to the probability of treatment even though it is only weakly related to the outcome

Some match types

  • Nearest Neighbor matching

    • Computationally fast but not optimal
  • Optimal matching

    • The size of the population limits the feasibility of fully optimal matching
  • Sparse optimal matching

    • Somewhat stratified

    • Optimal matching within blocks

  • After matching you can analyze outcome data

R examples

Toy example

library(sandwich)
library(knitr)   #for kable()

#simulate confounded data: L affects both treatment A and outcome Y
n<-200
L<-rbinom(n,1,0.5)
A<-rbinom(n,1,(0.3+0.3*L))
Y<-rnorm(n,(35+10*L+20*A),5)
mydata<-data.frame(cbind(A,L,Y))

kable(head(mydata))
A L Y
1 0 65.93717
0 1 44.00917
1 1 67.89592
0 0 43.63981
0 1 43.23738
1 1 70.90363

Let’s look at the diff of means

#unadjusted
#mean of Y for treated subjects
mean(Y[A==1])
## [1] 63.88793
#mean of Y for untreated subjects
mean(Y[A==0])
## [1] 38.31022
#difference
mean(Y[A==1])-mean(Y[A==0])
## [1] 25.57771
We can fit a regression here.
reg<-lm(Y~A+L,data=mydata)
summary(reg)
## 
## Call:
## lm(formula = Y ~ A + L, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.6897  -3.1837   0.1299   3.6035  11.2976 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  34.5494     0.5490   62.94   <2e-16 ***
## A            20.4373     0.7932   25.77   <2e-16 ***
## L            11.4975     0.7932   14.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.004 on 197 degrees of freedom
## Multiple R-squared:  0.8846, Adjusted R-squared:  0.8834 
## F-statistic: 754.9 on 2 and 197 DF,  p-value: < 2.2e-16
Let’s compute the standardized estimate.
#can do by hand since one binary covariate
pr.l <- prop.table(table(mydata$L))
kable(pr.l)
Var1 Freq
0 0.465
1 0.535
tab.out <- aggregate(Y ~ A + L, mydata, mean)
kable(tab.out)
A L Y
0 0 34.47951
1 0 55.22618
0 1 46.19054
1 1 66.41428
Conditional means

We can compute conditional means and weight them by the distribution of the covariate. This corrects for the confounding bias in the unadjusted comparison.

((mean(mydata$Y[mydata$A==1 & mydata$L==1]) - 
    mean(mydata$Y[mydata$A==0 & mydata$L==1]))*pr.l[2]) + 
  (mean(mydata$Y[mydata$A==1 & mydata$L==0]) - 
     mean(mydata$Y[mydata$A==0 & mydata$L==0]))*pr.l[1]
##       1 
## 20.4669
We can also calculate propensity scores:
p.score <- glm(A ~ as.factor(L), data=mydata, family=binomial)
p.a <- ifelse(mydata$A == 0, 1 - predict(p.score, type = "response"),
                  predict(p.score, type = "response"))
kable(table(p.a))
p.a Freq
0.225806451613194 21
0.327102803738384 35
0.672897196261616 72
0.774193548386806 72

We get four values, one for each combination of treatment received (A = 0, 1) and covariate level (L = 0, 1).

Next we can generate IP weights
mydata$w <- 1/p.a
kable(table(mydata$w))
Var1 Freq
1.29166666666715 72
1.48611111111126 72
3.05714285714224 35
4.42857142856573 21
summary(mydata$w)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.292   1.292   1.486   2.000   3.057   4.429
sd(mydata$w)
## [1] 1.046915
Let’s estimate ATE
mean(mydata$w*as.numeric(mydata$A==1)*mydata$Y) - 
  mean(mydata$w*as.numeric(mydata$A==0)*mydata$Y)
## [1] 20.4669
Creating a marginal structural model
msm <- lm(Y  ~ A, data = mydata, weights = w)
Now with inference

Weights create a pseudo population:

SE <-sqrt(diag(vcovHC(msm, type="HC0"))) # robust standard errors
beta <- coef(msm)
lcl <- beta-1.96*SE 
ucl <- beta+1.96*SE
cbind(beta, lcl, ucl)[2,]
##     beta      lcl      ucl 
## 20.46690 18.10841 22.82540

Real data

Let’s load our data:
library(tableone)
library(Matching)
library(geepack)
library(stats)


load(url("https://urldefense.proofpoint.com/v2/url?u=http-3A__biostat.mc.vanderbilt.edu_wiki_pub_Main_DataSets_rhc.sav&d=DwIGAg&c=lEzKI_JJakPtcnbAQ6Q5xQ&r=ADBnQF_uxktNSAkWyomgLeFb-eOkP5LRzvDAXj-4ESQ&m=GJ1JdJTpKzJvLWlb0bGX87lB6Gx5rPtIV9m0KvRo6AI&s=639__Kuc3BOyiS-CFnTwBH3AVQyrBK3GMJStuyusjrk&e="))

kable(head(rhc))

(Wide table omitted: rhc has 60+ columns per patient, including cat1/cat2 (diagnosis categories), death, swang1 (the RHC treatment indicator), age, sex, meanbp1, and ptid.)
Then we create binary indicators out of cat1 and the other categorical variables, and build a new dataset:
ARF<-as.numeric(rhc$cat1=='ARF')
CHF<-as.numeric(rhc$cat1=='CHF')
Cirr<-as.numeric(rhc$cat1=='Cirrhosis')
colcan<-as.numeric(rhc$cat1=='Colon Cancer')
Coma<-as.numeric(rhc$cat1=='Coma')
COPD<-as.numeric(rhc$cat1=='COPD')
lungcan<-as.numeric(rhc$cat1=='Lung Cancer')
MOSF<-as.numeric(rhc$cat1=='MOSF w/Malignancy')
sepsis<-as.numeric(rhc$cat1=='MOSF w/Sepsis')
female<-as.numeric(rhc$sex=='Female')
died<-as.numeric(rhc$death=='Yes')
age<-rhc$age
treatment<-as.numeric(rhc$swang1=='RHC')
meanbp1<-rhc$meanbp1


mydata <-data.frame(cbind(ARF,CHF,Cirr,colcan,Coma,lungcan,MOSF,sepsis,
              age,female,meanbp1,treatment,died))

kable(head(mydata))
ARF CHF Cirr colcan Coma lungcan MOSF sepsis age female meanbp1 treatment died
0 0 0 0 0 0 0 0 70.25098 0 41 0 0
0 0 0 0 0 0 0 1 78.17896 1 63 1 1
0 0 0 0 0 0 1 0 46.09198 1 57 1 0
1 0 0 0 0 0 0 0 75.33197 1 55 0 1
0 0 0 0 0 0 0 1 67.90997 0 65 1 1
0 0 0 0 0 0 0 0 86.07794 1 115 0 0
Make a vector of covariate names and print a summary table stratified by treatment, with standardized mean differences:
xvars<-c("ARF","CHF","Cirr","colcan","Coma","lungcan","MOSF","sepsis",
         "age","female","meanbp1")
table1<- CreateTableOne(vars=xvars,strata="treatment", data=mydata, test=FALSE)

print(table1,smd=TRUE)
##                      Stratified by treatment
##                       0             1             SMD   
##   n                    3551          2184               
##   ARF (mean (SD))      0.45 (0.50)   0.42 (0.49)   0.059
##   CHF (mean (SD))      0.07 (0.25)   0.10 (0.29)   0.095
##   Cirr (mean (SD))     0.05 (0.22)   0.02 (0.15)   0.145
##   colcan (mean (SD))   0.00 (0.04)   0.00 (0.02)   0.038
##   Coma (mean (SD))     0.10 (0.29)   0.04 (0.20)   0.207
##   lungcan (mean (SD))  0.01 (0.10)   0.00 (0.05)   0.095
##   MOSF (mean (SD))     0.07 (0.25)   0.07 (0.26)   0.018
##   sepsis (mean (SD))   0.15 (0.36)   0.32 (0.47)   0.415
##   age (mean (SD))     61.76 (17.29) 60.75 (15.63)  0.061
##   female (mean (SD))   0.46 (0.50)   0.41 (0.49)   0.093
##   meanbp1 (mean (SD)) 84.87 (38.87) 68.20 (34.24)  0.455
Fit an outcome model (logistic regression of death on treatment and the covariates):
outmodel<-glm(died~treatment+ARF+CHF+Cirr+colcan+Coma+lungcan+MOSF+
               sepsis+age+female+meanbp1,
             family=binomial(),data=mydata)
We can get predicted values for each person as if treated and as if untreated:
#now get predicted values for each person if they had been treated
mydata1<-mydata
mydata1$treatment<-1
pred1<-predict(outmodel,mydata1,type='response')

#get predicted values for each person if they had not been treated
mydata0<-mydata
mydata0$treatment<-0
pred0<-predict(outmodel,mydata0,type='response')

mean(pred1)
## [1] 0.6797657
mean(pred0)
## [1] 0.6298412
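The standardized (parametric g-formula) estimate of the causal risk difference is then the difference of these two means, about 0.05:

#risk difference: standardized mean under treatment minus under no treatment
mean(pred1)-mean(pred0)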
Propensity Scores
psmodel<-glm(treatment~ARF+CHF+Cirr+colcan+Coma+lungcan+MOSF+
               sepsis+age+female+meanbp1,
             family=binomial(),data=mydata)

#show coefficients etc
summary(psmodel)
## 
## Call:
## glm(formula = treatment ~ ARF + CHF + Cirr + colcan + Coma + 
##     lungcan + MOSF + sepsis + age + female + meanbp1, family = binomial(), 
##     data = mydata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7013  -1.0097  -0.6336   1.1814   2.4791  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.7299670  0.1997692  -3.654 0.000258 ***
## ARF          1.2931956  0.1487784   8.692  < 2e-16 ***
## CHF          1.6804704  0.1715672   9.795  < 2e-16 ***
## Cirr         0.5234506  0.2181458   2.400 0.016416 *  
## colcan       0.0295468  1.0985361   0.027 0.978542    
## Coma         0.7013451  0.1854937   3.781 0.000156 ***
## lungcan     -0.0869570  0.5039331  -0.173 0.863000    
## MOSF         1.3046587  0.1772705   7.360 1.84e-13 ***
## sepsis       2.0433604  0.1545437  13.222  < 2e-16 ***
## age         -0.0031374  0.0017289  -1.815 0.069567 .  
## female      -0.1697903  0.0583574  -2.909 0.003620 ** 
## meanbp1     -0.0109824  0.0008217 -13.366  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7621.4  on 5734  degrees of freedom
## Residual deviance: 6983.6  on 5723  degrees of freedom
## AIC: 7007.6
## 
## Number of Fisher Scoring iterations: 4
#create propensity score
pscore<-psmodel$fitted.values
Create weights from pscore

We can output quick histograms to make sure there is enough overlap.

#unstabilized weights
wt<-ifelse(treatment==1,1/pscore,1/(1-pscore))
hist(wt)

summary(wt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.046   1.405   1.721   2.001   2.280  21.606
#stabilized weights
#probability of treatment (unconditional)
probt<-mean(treatment)
sw<-ifelse(treatment==1,probt/pscore,(1-probt)/(1-pscore))
hist(sw)

summary(sw)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4917  0.7579  0.8990  1.0004  1.0997  8.2279
Do a weighted regression
weighted.mean(died[treatment==1],wt[treatment==1])
## [1] 0.6820133
weighted.mean(died[treatment==0],wt[treatment==0])
## [1] 0.6304638
#now do weighted regression (get same estimates)
id<-1:length(died)
msm1 <- geeglm(as.numeric(died) ~ treatment, 
               id = id,
               weights = wt,
               family = binomial(link=logit))
summary(msm1)
## 
## Call:
## geeglm(formula = as.numeric(died) ~ treatment, family = binomial(link = logit), 
##     weights = wt, id = id)
## 
##  Coefficients:
##             Estimate Std.err   Wald Pr(>|W|)    
## (Intercept)  0.53421 0.03602 219.95  < 2e-16 ***
## treatment    0.22883 0.06486  12.45 0.000419 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Estimated Scale Parameters:
##             Estimate Std.err
## (Intercept)        1 0.01235
## 
## Correlation: Structure = independenceNumber of clusters:   5735   Maximum cluster size: 1
head(msm1$fitted.values)
##     [,1]
## 1 0.6305
## 2 0.6820
## 3 0.6820
## 4 0.6305
## 5 0.6820
## 6 0.6305
#using stabilized weights
msm2 <- geeglm(as.numeric(died) ~ treatment, 
               id = id,
               weights = sw,
               data=mydata,
               family = binomial(link=logit))
summary(msm2)
## 
## Call:
## geeglm(formula = as.numeric(died) ~ treatment, family = binomial(link = logit), 
##     data = mydata, weights = sw, id = id)
## 
##  Coefficients:
##             Estimate Std.err  Wald Pr(>|W|)    
## (Intercept)   0.5342  0.0360 219.9  < 2e-16 ***
## treatment     0.2288  0.0649  12.4  0.00042 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Estimated Scale Parameters:
##             Estimate Std.err
## (Intercept)        1  0.0106
## 
## Correlation: Structure = independenceNumber of clusters:   5735   Maximum cluster size: 1
head(msm2$fitted.values)
##    [,1]
## 1 0.630
## 2 0.682
## 3 0.682
## 4 0.630
## 5 0.682
## 6 0.630

Matching example

greedymatch<-Match(Tr=treatment,M=1,X=mydata[xvars],replace=FALSE)
matched<-mydata[unlist(greedymatch[c("index.treated","index.control")]), ]

#get table 1 for matched data with standardized differences
matchedtab1<-CreateTableOne(vars=xvars, strata ="treatment", 
                            data=matched, test = FALSE)
print(matchedtab1, smd = TRUE)
##                      Stratified by treatment
##                       0             1             SMD   
##   n                    2184          2184               
##   ARF (mean (SD))      0.42 (0.49)   0.42 (0.49)   0.006
##   CHF (mean (SD))      0.10 (0.29)   0.10 (0.29)  <0.001
##   Cirr (mean (SD))     0.02 (0.15)   0.02 (0.15)  <0.001
##   colcan (mean (SD))   0.00 (0.02)   0.00 (0.02)  <0.001
##   Coma (mean (SD))     0.04 (0.20)   0.04 (0.20)  <0.001
##   lungcan (mean (SD))  0.00 (0.05)   0.00 (0.05)  <0.001
##   MOSF (mean (SD))     0.07 (0.26)   0.07 (0.26)  <0.001
##   sepsis (mean (SD))   0.24 (0.43)   0.32 (0.47)   0.177
##   age (mean (SD))     61.53 (16.15) 60.75 (15.63)  0.049
##   female (mean (SD))   0.44 (0.50)   0.41 (0.49)   0.042
##   meanbp1 (mean (SD)) 73.12 (34.28) 68.20 (34.24)  0.144
#outcome analysis
y_trt<-matched$died[matched$treatment==1]
y_con<-matched$died[matched$treatment==0]
mean(y_trt)
## [1] 0.68
mean(y_con)
## [1] 0.626
#pairwise difference
diffy<-y_trt-y_con

#paired t-test
t.test(diffy)
## 
##  One Sample t-test
## 
## data:  diffy
## t = 4, df = 2000, p-value = 9e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.0271 0.0810
## sample estimates:
## mean of x 
##     0.054
#McNemar test
kable(table(y_trt,y_con))
y_trt \ y_con 0 1
0 303 395
1 513 973
mcnemar.test(matrix(c(973,513,395,303),2,2))
## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(973, 513, 395, 303), 2, 2)
## McNemar's chi-squared = 20, df = 1, p-value = 1e-04

Causal Mediation

This is the case where an intervening variable (a mediator) lies on the causal pathway and affects the outcome, but is not a confounder.

Direct and indirect effects

dag3 <- dagify(y ~ x,
       y ~ x + m, 
       m ~ x)
dag_med <- tidy_dagitty(dag3)

ggdag(dag_med, layout = circle) +
  theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Traditional approach is to fit two regression models (Baron and Kenny)

Controlled direct effects

  • Set the treatment to a and fix the mediator at a chosen value m for everyone, giving potential outcomes \(Y^{a,m}\); e.g. CDE(m) = \(E(Y^{1,m} - Y^{0,m})\)

Natural direct effects

  • Set the treatment to a versus 0 and compare the potential outcomes with the mediator held at the value it would naturally take under treatment 0: NDE = \(E(Y^{1,M^0} - Y^{0,M^0})\)

Decomposition
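
The total effect splits into natural direct and indirect effects (a standard identity in the notation above, since \(Y^1 = Y^{1,M^1}\) and \(Y^0 = Y^{0,M^0}\)):

\(E(Y^1 - Y^0) = \underbrace{E(Y^{1,M^0} - Y^{0,M^0})}_{\text{NDE}} + \underbrace{E(Y^{1,M^1} - Y^{1,M^0})}_{\text{NIE}}\)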

Cross-world counterfactual

  • Usually unobservable

  • Requires integrating over a different distribution

    • Average the outcome over the mediator distribution of the untreated