Causal Diagrams and Directed Acyclic Graphs (DAGs)

Research Design Lab, Week 6, 11-14 Nov. 2024

Outline

Key terms in DAG language
Why DAGs
Identify common causes & confounding
Identify common effects & selection bias
D-separation rules
Unmeasured confounders
Identify minimally sufficient set
Read complex DAGs
Measurement bias in DAGs
Limitations of DAGs
Create your own DAGs and read those by others

Section 1: Directed Acyclic Graphs (DAGs)

A DAG is a graphical representation of causal relationships, drawn from prior expert knowledge.

Nodes: variables of interest
Arrows/directed edges: direction of causality
Acyclic: non-recursive relationships
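As a minimal sketch of these three terms, a DAG can be stored as a mapping from each node to its parents, and "acyclic" can be checked with a topological sort. The node names and the `is_acyclic` helper are illustrative assumptions, not part of the slides.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+


def is_acyclic(parents):
    """Return True if the graph has no directed cycle.

    `parents` maps each node to the set of nodes with an arrow into it.
    """
    try:
        # A topological order exists only when there is no directed cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Hypothetical chain X -> Z -> Y: a valid DAG.
chain = {"X": set(), "Z": {"X"}, "Y": {"Z"}}

# Adding Y -> X creates the loop X -> Z -> Y -> X, so it is no longer a DAG.
loop = {"X": {"Y"}, "Z": {"X"}, "Y": {"Z"}}

print(is_acyclic(chain))  # True
print(is_acyclic(loop))   # False
```

Storing parents rather than children is only a convenience here, because `TopologicalSorter` expects predecessors.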

Causal DAGs

A DAG is a causal DAG if, whenever two variables on the graph share a cause, that cause is also represented on the graph (causal Markov condition).

There is no need to include every variable that comes to mind:

Causal path

In the DAGs below:

M is a child of A and A is a parent of M.
M and B are descendants of A, and A and M are ancestors of B.
M, located in the middle of the path, can be referred to as a mediator.
Paths are sequences of arrows, of any direction, connecting two variables and can be causal or non-causal.
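The kinship terms above can be sketched in a few lines of code. The example assumes the simple chain A → M → B (the original figure is not reproduced here), and the helper names are hypothetical.

```python
def parents(edges, node):
    """Nodes with an arrow pointing directly into `node`."""
    return {p for p, c in edges if c == node}


def children(edges, node):
    """Nodes that `node` points directly into."""
    return {c for p, c in edges if p == node}


def _reach(edges, node, step):
    """Collect every node reachable from `node` by repeatedly applying `step`."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for nxt in step(edges, current):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def descendants(edges, node):
    return _reach(edges, node, children)


def ancestors(edges, node):
    return _reach(edges, node, parents)


# Assumed DAG: A -> M -> B, written as (parent, child) pairs.
edges = [("A", "M"), ("M", "B")]

print(sorted(children(edges, "A")))     # ['M']
print(sorted(descendants(edges, "A")))  # ['B', 'M']
print(sorted(ancestors(edges, "B")))    # ['A', 'M']
```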

Direct, indirect & total effects

Why draw DAGs for our research?

Why not?
Clarify theoretical relationships
Build statistical models at the same time (independence/associations/… without equations!)
Conceptualize and identify potential biases
Theory-informed models

Exercise 1.1

In a causal DAG with nodes X, Z, and Y, the absence of an arrow from node Z to node Y means:

A) There is no effect of Y on Z
B) There is no effect of Z on Y
C) There is no association between Z and Y
D) There is an association between Z and Y

B is correct. On a causal DAG, an arrow from node Z to node Y indicates that Z has an effect on Y. Therefore, the absence of an arrow represents no effect of Z on Y.

Exercise 1.2

A DAG is causal if the causes shared by any pair of variables on the graph are also on the DAG.

True
False

True. All common causes, even if unmeasured, of any pair of variables on the graph must be included on a causal DAG.

Exercise 1.3

If our knowledge is insufficient to rule out a direct effect of variable A on variable B, we should still draw an arrow from A to B on the causal DAG.

True
False

True. The absence of an arrow on a DAG means that we believe A has no direct effect on B. Omitting an arrow is a stronger assumption than drawing one. So, if we cannot rule out a direct effect of A on B, we should draw the arrow.

Exercise 1.4.1

Which DAG represents the causal relationship in which X influences Z, Z influences Y, and X also directly influences Y?

B is correct.

Exercise 1.4.2

Which of them are causal DAGs?

A and B. Because Z and Y share the same cause X. In DAG (C), X and Z are two independent causes of Y.

Exercise 1.4.3

In this causal DAG, there is no association between X and Y, conditioning on Z.

True
False

True. There is no direct arrow from X to Y, and conditioning on Z blocks the path that runs through Z, so X and Y are not associated conditional on Z (even though X has a causal effect on Y!).

Exercise 1.5.1

Select the best causal DAG that represents the following scenario:

Social origin influences a student’s academic performance and, independently, their decision to attend college.

C is correct.

Exercise 1.5.2

Select the best causal DAG that represents the following scenario:

Social origin influences a student’s academic performance, and therefore, their decision to attend college.

A is correct.

Exercise 1.6

For the following questions, think: how does education level influence voting?

You believe that education can only influence voting through increased SES. Choose the corresponding causal DAG.

Suppose you find in your data that education is associated with voting, conditionally on SES. Choose the corresponding causal DAG.

Suppose you find in your data that education is not associated with voting (unconditionally). Choose the corresponding causal DAG.

None of them.

In DAG (A), education and voting are associated because the effect is mediated by SES. In other words, association flows from education to SES to voting. In DAG (B), education and voting are associated because there is a direct effect of education on voting and there is an effect that is mediated through SES. In DAG (C), education and voting are associated due to the common cause SES. In DAG (D), education and SES independently influence voting.

Section 2: Confounding / common causes

Having information on X allows us to predict Y, but X does not have an effect on Y! Their association is due to a common cause Z, which we can call a confounder.
Backdoor path criterion: a backdoor path is a noncausal path between X and Y that starts with an arrow pointing into X (here, X ← Z → Y). In the analysis, we should identify and block every backdoor path to eliminate structural bias.
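A small simulation can make the common-cause structure concrete. This is a sketch under stated assumptions: the data are generated from X ← Z → Y with no arrow from X to Y, all relationships are linear with made-up unit coefficients, and "conditioning on Z" is approximated by a partial correlation (correlating what is left of X and Y after regressing each on Z). The helper names, sample size, and seed are illustrative.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals from a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly adjusting both for zs."""
    return corr(residuals(xs, zs), residuals(ys, zs))


rng = random.Random(0)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]   # common cause
x = [zi + rng.gauss(0, 1) for zi in z]    # Z -> X
y = [zi + rng.gauss(0, 1) for zi in z]    # Z -> Y, and no X -> Y arrow

raw = corr(x, y)                 # clearly positive: the backdoor path is open
adjusted = partial_corr(x, y, z) # close to zero: the backdoor path is blocked
print(round(raw, 2), round(adjusted, 2))
```

With these coefficients the unadjusted correlation is 0.5 in expectation, although X has no effect on Y by construction.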

Confounding vs. Confounder

Confounding: absolute concept (exist/not exist in the structure)
Confounder: relative concept (relative label of variables)

Exercise 2.1

Which DAG is consistent with the following conclusion? To determine the total effect of education on voting, we should adjust for SES in our statistical analysis.

C is correct. In DAG (C), adjusting for SES will block the confounding path education ← SES → voting. In both DAGs (A) and (B), conditioning on SES would instead block the part of the effect of education on voting that is mediated by SES.

Exercise 2.2

Given the causal DAG, the association between C and D is eliminated if we:

A) Condition on A
B) Condition on B
C) Either of the above
D) Neither of the above

C. There is a path from C to B to A then to D. You can block this path by conditioning on A or B.

Exercise 2.3

When is variable A considered a confounder, and when is it not?

When estimating the effect of B on D, we should adjust for A, and it is a confounder. When estimating the effect of A on D, it is not a confounder.

Exercise 2.4

Randomization, when possible, is the preferred approach to eliminating confounding.

P.S.: Randomization is the process of assigning subjects to different groups at random, so that each subject has an equal chance of being assigned to any group in the study. (Though in many cases it can be very hard/expensive/impossible to run a randomized trial…)

True
False

True. Unlike statistical adjustment, which can only control for measured confounders, randomization also eliminates confounding by unmeasured variables.

Section 3: Selection bias / common effects

When Z and X both affect Y, Y is their common effect and can be called a collider (the arrows collide into it).
A collider by itself blocks the flow of association along the path.
Be careful: if we condition on a collider, the path through it will be opened! In this case, we may observe an association between Z and X, even though Z does not cause X (resulting in selection bias).

A random real-life example of selection bias

Confounder vs. Collider

Let’s use a square around the variable to indicate that we are conditioning on it:
What if we condition on P? Do we expect to find a conditional association between Z and X?
Yes. P is not itself a collider on the path, but it is still a common effect of Z and X, because it is an effect of the collider Y. Selection bias arises when we 1) condition on a collider, or 2) condition on a variable affected by a collider.
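The two routes to selection bias can be checked by simulation. This is a sketch under assumptions: the structure is Z → Y ← X with Y → P, Z and X are generated independently, all effects are linear with made-up unit coefficients, and conditioning is approximated by a partial correlation. Helper names, sample size, and seed are illustrative.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals from a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly adjusting both for zs."""
    return corr(residuals(xs, zs), residuals(ys, zs))


rng = random.Random(1)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]                     # independent of Z
y = [zi + xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]     # collider
p = [yi + rng.gauss(0, 1) for yi in y]                      # child of the collider

raw = corr(z, x)                 # near zero: the collider blocks the path
given_y = partial_corr(z, x, y)  # clearly negative: conditioning on Y opens it
given_p = partial_corr(z, x, p)  # also negative, but weaker
print(round(raw, 2), round(given_y, 2), round(given_p, 2))
```

Conditioning on the child P induces a weaker association than conditioning on Y itself, because P is only a noisy version of the collider.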

Exercise 3.0

Selection bias is everywhere!

True
True
True
True

Sorry I get emotional on this topic (wrong controls/self-selection/attrition/non-response/missing data/volunteer/………..)…

Exercise 3.1

Which of the following situations is expected to result in bias? (check all that apply)

A) The existence of a common cause
B) The existence of a common effect
C) Conditioning on a common effect
D) Conditioning on a common cause

A and C. A describes the structure of confounding, and C describes the structure of colliding (selection bias).

Exercise 3.2

Which of the following situations is expected to result in a biased association between disease 1 and disease 2?

D. We conditioned on the collider, which opened the backdoor path between disease 1 and disease 2.

Exercise 3.3

Selection bias can arise in randomized experiments.

True
False

True. E.g., volunteering/loss to follow-up/…

Exercise 3.4

When a treatment T has no actual causal effect on an outcome Y, selection bias occurs when conditioning on:

A) a common effect of T and Y
B) a cause shared by T and Y
C) an effect of Y that is independent of T
D) an effect of T that is independent of Y

A. Let’s draw it together.

Exercise 3.5

Which of the following situations may have selection bias for the effect of A on X? (choose all that apply)

C and D. Selection bias arises from conditioning on a collider (DAG C) or conditioning on a variable that is a child of a collider (DAG D).

Exercise 3.6.1

Given the causal DAG below, conditioning on C will introduce selection bias for the causal effect of B on D.

True
False

True.

Exercise 3.6.2

Let’s fit an example into the previous DAG, and answer the following question.

There is an association between child Arts education and bullying behavior.

True
False

True.

Adjusting for either Parent SES or peer influence would eliminate the bias when we study the causal effect of child Arts education on bullying.

True
False

False.

Section 4: Directional separation rules (D-separation)

Rule 1. If no variables are being conditioned on, a path is blocked if and only if two arrows on the path collide at some variable on the path.

E.g.,
- Y is a collider on the path 1: Z → Y ← X, but it is not a collider on the path 2: Z → Y → P. So the path 1 is blocked, not the path 2
Rule 2. Any path that contains a noncollider that has been conditioned on is blocked.

E.g.,
Rule 3. A collider that has been conditioned on opens a previously blocked path.

E.g.,
Rule 4. A collider that has a descendant (child) that has been conditioned on opens a previously blocked path.

E.g.,
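The four rules can be written as a short check on a single path. This is a sketch, not a full d-separation algorithm: it assumes the path is given as a list of adjacent nodes, and two variables are d-separated only if every path between them is blocked. The DAG (Z → Y ← X, Y → P) follows the example under Rule 1; the function names are hypothetical.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_blocked(path, edges, conditioned=frozenset()):
    """Apply Rules 1-4 to one path, given as a list of adjacent nodes."""
    edge_set = set(edges)
    conditioned = set(conditioned)
    # Inspect every interior node together with its two neighbours on the path.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if is_collider:
            # Rules 1, 3, 4: a collider blocks the path unless it, or one of
            # its descendants, has been conditioned on.
            opened = node in conditioned or bool(
                descendants(edges, node) & conditioned
            )
            if not opened:
                return True
        elif node in conditioned:
            # Rule 2: a conditioned noncollider blocks the path.
            return True
    return False


edges = [("Z", "Y"), ("X", "Y"), ("Y", "P")]  # (parent, child) pairs

print(path_is_blocked(["Z", "Y", "X"], edges))         # True  (Rule 1)
print(path_is_blocked(["Z", "Y", "P"], edges))         # False (no collider)
print(path_is_blocked(["Z", "Y", "P"], edges, {"Y"}))  # True  (Rule 2)
print(path_is_blocked(["Z", "Y", "X"], edges, {"Y"}))  # False (Rule 3)
print(path_is_blocked(["Z", "Y", "X"], edges, {"P"}))  # False (Rule 4)
```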

Exercise 4.1

Structural associations persist regardless of the increase in sample size.

True
False

True. Associations due to chance become smaller with increased sample size, but structural associations remain.

Exercise 4.2

Which of the following structures show that A and X are d-separated? Choose all that apply.

C and D.

In DAG (A), they are not d-separated because there is a path from A to X. In DAG (B), A and B are d-separated conditioning on X, but A and X are not. In DAG (C) and (D), A and X are d-separated because of a collider B, and we do not condition on it. In DAG (E), we condition on the collider B, opening the path between A and X. In DAG (F), we condition on the effect of the collider B, which is L, again opening the path between A and X.

Exercise 4.3

Consider a variable L that is associated with the treatment X and the outcome Y, and that is not on a causal pathway from X to Y. Adjustment for such a variable will always reduce bias.

True
False

False. L can also be a collider, and adjusting for such a variable will actually introduce bias, rather than reduce it. (Let’s draw it?)

Exercise 4.4 * Outcome-based selection

1. Given the causal DAG, B and C are d-separated.

True
False

False. We see a backdoor path B ← A → [D] ← C.

2. To determine the causal effect of B on C, A must be adjusted for.

True
False

True. But if D were not conditioned on, we would no longer need to condition on A.

Section 5: Unmeasured confounders

We want to study the direct effect of education on life satisfaction. In this DAG, we can simply block the indirect path by conditioning on the mediator occupation.

But if there is an unmeasured common cause of occupation and life satisfaction, occupation becomes a collider. If we condition on it, we open the path through the unmeasured cause; however, if we don’t condition on it, that path is blocked:
In this case, education can still be associated with satisfaction even if we condition on occupation.
What to do?
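The problem described in this section can be reproduced in a simulation. This is only an illustration of the bias, under assumptions: education (E) affects satisfaction (S) only through occupation (O), an unmeasured U affects both O and S, effects are linear with made-up unit coefficients, and conditioning is approximated by partial correlation. Variable names, sample size, and seed are hypothetical.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals from a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly adjusting both for zs."""
    return corr(residuals(xs, zs), residuals(ys, zs))


rng = random.Random(2)
n = 20_000
e = [rng.gauss(0, 1) for _ in range(n)]                   # education
u = [rng.gauss(0, 1) for _ in range(n)]                   # unmeasured cause
o = [ei + ui + rng.gauss(0, 1) for ei, ui in zip(e, u)]   # E -> O <- U
s = [oi + ui + rng.gauss(0, 1) for oi, ui in zip(o, u)]   # O -> S <- U, no E -> S

# Conditioning on the mediator alone: O is a collider on E -> O <- U -> S,
# so a spurious "direct" association between E and S appears.
given_o = partial_corr(e, s, o)

# If U could be measured, adjusting for both O and U would remove it.
# Adjusting for two variables is done in two steps: first remove U from
# everything, then adjust the leftovers for O.
e_u, s_u, o_u = residuals(e, u), residuals(s, u), residuals(o, u)
given_o_and_u = partial_corr(e_u, s_u, o_u)

print(round(given_o, 2), round(given_o_and_u, 2))
```

The first number is clearly nonzero even though the true direct effect is zero by construction; the second is close to zero.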

Exercise 5.1

Systematic bias is an association between the treatment X and the outcome Y that does not arise from the causal effect of X on Y.

True
False

True. This can arise from having a common cause (confounding), or conditioning on a common effect (selection bias).

Exercise 5.2

Choose the causal DAG that has an opened backdoor path between poverty and HIV infection.

B is correct, as death is a collider that is conditioned on, opening the backdoor path between HIV infection and poverty. In DAG (D) there is also an opened backdoor path, as the confounder poverty is not conditioned on. But it is a backdoor path between HIV infection and death, not the variables in question.

Exercise 5.3

Think: which variables may be common causes of vitamin intake and health condition?

Education / age / …

Exercise 5.4

Given the causal DAG below, choose the correct statement.

A) We can identify the causal effect of HIV infection on death if we correctly adjust/control for poverty in our analysis
B) We can identify the causal effect of HIV infection on death if we restrict our study population to those not in poverty
C) We cannot identify the causal effect of HIV infection on death if poverty is not measured
D) All of the above

D, all of them are correct.

Exercise 5.5

According to the causal DAG below, which of the following statements is correct? (Note: U is an unmeasured variable)

A) There is no open backdoor path between X and Y
B) We should adjust for/condition on Z in order to eliminate confounding
C) There is no way to eliminate confounding because of the unmeasured common cause U
D) All of the above

B. Although U is unmeasured, we can still block the backdoor path between X and Y by conditioning on Z.

Section 6: More on U

Identify confounding and colliding structures involved in the graph
What should we do to correctly study the effect of A on B?
Solution: no confounding, no action is needed. But if we condition on H, we open the backdoor path between A and B.

Proxy confounder

Conditioning on L (highly associated with U) will not eliminate all confounding caused by U, but L can serve as a proxy for U and partially eliminate the bias.
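A simulation sketch of partial adjustment through a proxy follows. The assumptions: U is a common cause of A and B, A has no effect on B, L is a noisy measurement of U, all effects are linear with made-up unit coefficients, and conditioning is approximated by partial correlation. Helper names, sample size, and seed are illustrative.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals from a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly adjusting both for zs."""
    return corr(residuals(xs, zs), residuals(ys, zs))


rng = random.Random(3)
n = 20_000
u = [rng.gauss(0, 1) for _ in range(n)]    # true confounder (unmeasured in practice)
a = [ui + rng.gauss(0, 1) for ui in u]     # U -> A
b = [ui + rng.gauss(0, 1) for ui in u]     # U -> B, and no A -> B arrow
l = [ui + rng.gauss(0, 1) for ui in u]     # proxy: U measured with noise

raw = corr(a, b)                # confounded association
given_l = partial_corr(a, b, l) # smaller, but not zero
given_u = partial_corr(a, b, u) # close to zero (not available in practice)
print(round(raw, 2), round(given_l, 2), round(given_u, 2))
```

How much bias the proxy removes depends on how closely L tracks U; the noise level chosen here is arbitrary.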

Exercise 6.1

Given the causal DAG:

6.1.0 What adjustment is needed for studying the effect of attitudes toward immigration on voting behavior?
6.1.1 What about this?
6.1.2 What about this?

Discuss with your peers first, and we will look at each together.

Exercise 6.2

In which of the following DAGs is bias present when estimating the effect of treatment X on outcome Y?

B, C, E.

Explain to your peers, why and how?

Section 7: Minimally sufficient set / model parsimony

\[ Y = \beta_0 + \beta_xX + \beta_1Z_1 + \beta_2Z_2 + \beta_3Z_3 \]
\[ Y = \beta_0 + \beta_xX + \beta_1Z_1 + \beta_2Z_2 \]

Exercise 7 :)

In the given causal DAG, is there any bias in estimating the effect of X on Y? If so, how can it be eliminated while considering model parsimony?

Condition on \(A_1\) and L, or on \(A_2\) and L.

Section 8: More complex DAGs

Think more clearly about what you want to study: do you really need that many variables?

More assumptions are involved, which makes the DAG depend more on the researcher's subjective judgment.
More alternative paths could be argued, and other researchers may disagree.

Exercise 8

Given the DAG, answer the questions.

To estimate the effect of E on M, which variable(s) should we condition on?

To estimate the total effect of C on M, which should we condition on?

If we mistakenly condition on L in our model, what pathways have we opened that bias the causal effect of E on M?

Everything…

Section 9: Measurement bias in DAGs

We add an asterisk (star) to a variable to indicate potential error in the measurement. E.g., the true variable \(A\) and the measured variable with error (observed variable) \(A^*\).

\(A \neq A^*\)

9.1 Nondifferential independent error

Given the DAG below:

A = true treatment; L = true outcome; A* = measured treatment; L* = measured outcome; \(U_A\) = unmeasured determinants of A; \(U_L\) = unmeasured determinants of L.

\(U_A\) and \(U_L\) are associated.

True
False

False. The path is blocked by unconditioned colliders.

A* is independent of L*.

True
False

False. The path A* ← A → L → L* is open.

The association between A* and L* is an unbiased estimate of the causal effect of A on L.

True
False

False. The association between them does not represent the causal effect due to measurement bias.
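A simulation sketch of this last answer follows. The assumptions: A has a linear effect of 0.5 on L (a made-up value), and the measurement errors \(U_A\) and \(U_L\) are independent of each other and of everything else, as in nondifferential independent error. The `slope` helper, sample size, and seed are illustrative.

```python
import random


def slope(ys, xs):
    """Slope from a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )


rng = random.Random(4)
n = 20_000
a = [rng.gauss(0, 1) for _ in range(n)]              # true treatment
l = [0.5 * ai + rng.gauss(0, 1) for ai in a]         # true outcome, effect = 0.5
a_star = [ai + rng.gauss(0, 1) for ai in a]          # A* = A + U_A
l_star = [li + rng.gauss(0, 1) for li in l]          # L* = L + U_L

true_slope = slope(l, a)                 # close to the true effect of 0.5
measured_slope = slope(l_star, a_star)   # pulled toward zero by the error in A*
print(round(true_slope, 2), round(measured_slope, 2))
```

In this linear setup the slope based on the measured variables is about half the true effect, because the error in A* has the same variance as A; error in L* alone adds noise but does not shift the slope.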

Example 1

9.2 Nondifferential dependent error

Example 2

9.3 Differential independent error

Example 3

9.4 Differential dependent error

Example 4

Exercise 9.1

Measurement error occurs when, for some individuals, a variable’s measured value is not the same as its actual value.

True
False

True

Individuals may misunderstand and misclassify themselves in answering some survey questions.

True
False

True

Exercise 9.2

Given the DAG below, when the true effect of A on L is null, there is no biased association between the measured A* and L*.

True
False

True. No paths between them.

Given the DAG below, when the true effect of A on L is null, there is no biased association between the measured A* and L*.

True
False

False. The common cause \(U_{AL}\) opens the backdoor path.

Given the DAG below, when the true effect of A on L is null, there is no biased association between the measured A* and L*.

True
False

False. Path through A* ← A → \(U_L\) → L*.

Given the DAG below, when the true effect of A on L is null, there is no biased association between the measured A* and L*.

True
False

False. Path through A* ← A → \(U_L\) → L* and through A* ← \(U_A\) ← \(U_{AL}\) → \(U_L\) → L*

Last thingy…

Remember we talked about proxy confounders in Section 6?

L is a mismeasured/biased version of U!
Although conditioning on L may help eliminate some bias in estimating the effect of A on B, it might also open up other unexpected backdoor paths and introduce additional errors:

Last exercise..s

Given the DAG below, choose the correct answers.

Measurement error for L (\(U_L\)) and Y are d-separated.

True
False

True. There is no open path.

When using the association between A* and Y* to estimate the effect of A on Y, there is both confounding and selection bias.

True
False

False. There is confounding and measurement bias.

Drawing a causal DAG can eliminate bias due to confounding.

True
False

False. It helps us to identify structural relationships between variables, but the DAG itself does not remove bias.

A causal DAG can represent different kinds of bias.

True
False

True. Confounding/selection/measurement bias.

Drawing a causal DAG can improve external validity.

True
False

False. Causal DAGs help to identify biases that threaten internal validity.

Limitations of DAGs

Our prior knowledge can be wrong: so we need to assess different potential DAGs and be honest about our uncertainty
DAGs do not represent information about magnitude or functional form of causal effects (no moderators)
Feedback loops and time-ordering must be explicitly represented on DAGs, which can quickly become overwhelmingly complicated and confusing

Bonus - Moderation: effect magnitudes

In-class assignment: Building your own causal DAG

You can use any program to draw your causal DAG (or use a pen and paper).

E.g., an online tool such as DAGitty or draw.io

Draw the first causal DAG of your research project
Think of alternative paths and draw a second DAG
Upload your DAG (screenshot/photo) to our Moodle [Research Design] LAB section, here:

Review your peers' uploaded DAGs, and leave comments (at least one)

Acknowledgement

The slides are prepared based on and inspired by

the excellent course on the edX online learning platform: Causal Diagrams: Draw Your Assumptions Before Your Conclusions
The Book of Why: The New Science of Cause and Effect

If you are interested in causal inference and DAGs, I highly recommend checking them out for more details.