Yuxin Zhang yuxin.zhang@unitn.it
Research Design Lab, Week 6,
11-14 Nov. 2024
A directed acyclic graph (DAG) is a graphical representation of causal relationships, encoding prior expert knowledge.
Nodes: variables of interest
Arrows/directed edges: direction of causality
Acyclic: no feedback loops (a variable cannot cause itself, directly or through other variables)
A DAG is a causal DAG if, whenever two variables on the graph share a cause, that cause is also represented on the graph (the causal Markov condition).
There is no need to include every variable that comes to mind:
M is a child of A and A is a parent of M.
M and B are descendants of A, and A and M are ancestors of B.
M, located in the middle of the path, can be referred to as a mediator.
Paths are sequences of arrows, of any direction, connecting two variables and can be causal or non-causal.
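These graph terms map directly onto code. Below is a minimal sketch, assuming the Python networkx library and using the A → M → B chain from above, that lists parents, children, descendants, and ancestors:

```python
# A minimal sketch of the A -> M -> B chain using networkx (assumed installed).
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([("A", "M"), ("M", "B")])  # arrows encode the direction of causality

print(list(dag.predecessors("M")))        # parents of M   -> ['A']
print(list(dag.successors("A")))          # children of A  -> ['M']
print(nx.descendants(dag, "A"))           # descendants of A -> {'M', 'B'}
print(nx.ancestors(dag, "B"))             # ancestors of B -> {'A', 'M'}
print(nx.is_directed_acyclic_graph(dag))  # True: the graph is acyclic
```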
Why not?
Clarify theoretical relationships
Simultaneously build statistical models (independence/associations/… without equations!)
Conceptualize and identify potential biases
Theory-informed models
In a causal DAG with nodes X, Z, and Y, the absence of an arrow from node Z to node Y means:
A) There is no effect of Y on Z
B) There is no effect of Z on Y
C) There is no association between Z and Y
D) There is an association between Z and Y
A DAG is causal if the causes shared by any pair of variables on the graph are also on the DAG.
True
False
If our knowledge is insufficient to rule out a direct effect of variable A on variable B, we should still draw an arrow from A to B on the causal DAG.
True
False
Which DAG represents the causal relationship in which X influences Z, Z influences Y, and X also directly influences Y?
Which of them are causal DAGs?
In this causal DAG, there is no association between X and Y, conditioning on Z.
True
False
Select the best causal DAG that represents the following scenario:
Social origin influences a student’s academic performance and, independently, their decision to attend college.
Select the best causal DAG that represents the following scenario:
Social origin influences a student’s academic performance, and therefore, their decision to attend college.
For the following questions, think: how does education level influence voting?
None of them.
In DAG (A), education and voting are associated because the effect is mediated by SES. In other words, association flows from education to SES to voting. In DAG (B), education and voting are associated because there is a direct effect of education on voting and there is an effect that is mediated through SES. In DAG (C), education and voting are associated due to the common cause SES. In DAG (D), education and SES independently influence voting.
Having information on X allows us to predict Y, but X does not have an effect on Y! Their association is due to a common cause Z, which we can call a confounder.
Backdoor path criterion: a backdoor path between the treatment and the outcome is a non-causal path that starts with an arrow pointing into the treatment. In the analysis, we should identify and block all backdoor paths to eliminate structural bias.
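To make this concrete, here is a minimal simulation sketch (hypothetical variable names and coefficients, assuming numpy): Z causes both X and Y, X has no effect on Y, yet X and Y are associated until we block the backdoor path by adjusting for Z.

```python
# Hypothetical simulation of the structure Z -> X, Z -> Y, with no arrow X -> Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Z = rng.normal(size=n)            # common cause (confounder)
X = 0.8 * Z + rng.normal(size=n)  # treatment, caused by Z only
Y = 1.2 * Z + rng.normal(size=n)  # outcome, caused by Z only

# Crude regression of Y on X picks up the backdoor path X <- Z -> Y.
slope_crude, _ = np.polyfit(X, Y, 1)
print(round(slope_crude, 2))      # clearly non-zero, although X has no effect on Y

# Adjusting for Z blocks the backdoor path: the coefficient of X goes to ~0.
design = np.column_stack([np.ones(n), X, Z])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(round(coefs[1], 2))         # coefficient of X is approximately 0
```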
Which DAG is consistent with the following conclusion? To determine the total effect of education on voting, we should adjust for SES in our statistical analysis.
Given the causal DAG, the association between C and D is eliminated if we:
A) Condition on A
B) Condition on B
C) Either of the above
D) Neither of the above
When is variable A considered a confounder, and when is it not?
Randomization, when possible, is the preferred approach to eliminating confounding.
P.s.: Randomization is the process of assigning subjects to different groups in a random way so that each subject has an equal chance of being assigned to any group in the study. (Though in many cases it can be very hard/expensive/impossible to run a randomized trial…)
True
False
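A hedged sketch of why randomization helps (hypothetical names and coefficients, assuming numpy): when treatment X is assigned at random, it cannot depend on the confounder Z, so the arrow Z → X is erased and the backdoor path disappears.

```python
# Hypothetical sketch: randomized treatment assignment breaks the Z -> X arrow.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Z = rng.normal(size=n)                      # would-be confounder
X = rng.integers(0, 2, size=n)              # randomized treatment: independent of Z
Y = 1.2 * Z + 0.0 * X + rng.normal(size=n)  # X truly has no effect on Y

# Difference in means between treated and untreated is ~0, as it should be.
print(round(Y[X == 1].mean() - Y[X == 0].mean(), 2))
```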
Let’s use a square around the variable to indicate that we are conditioning on it:
What if we condition on P? Do we expect to find a conditional association between Z and X?
Yes. P is not itself a collider, but it is affected by the common effect of Z and X; in other words, P is a descendant of the collider. Selection bias arises when we 1) condition on a collider, or 2) condition on a variable affected by a collider.
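A minimal simulation sketch of this mechanism (hypothetical names, assuming numpy; the collider is called C here): Z and X are independent causes of the collider C, and P is a descendant of C. Conditioning on P, e.g. by restricting the sample, induces an association between Z and X.

```python
# Hypothetical structure: Z -> C <- X (C is a collider), C -> P.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
Z = rng.normal(size=n)                 # independent of X by construction
X = rng.normal(size=n)
C = Z + X + rng.normal(size=n)         # collider: common effect of Z and X
P = C + rng.normal(size=n)             # descendant (effect) of the collider

print(round(np.corrcoef(Z, X)[0, 1], 2))            # ~0: marginally independent

# "Conditioning on P" here means restricting to a slice of P (a crude form of selection).
sel = P > 1
print(round(np.corrcoef(Z[sel], X[sel])[0, 1], 2))  # clearly negative: selection bias
```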
Selection bias is everywhere!
True
True
True
True
Sorry I get emotional on this topic (wrong controls/self-selection/attrition/non-response/missing data/volunteer/…)…
Which of the following situations is expected to result in bias? (check all that apply)
A) The existence of a common cause
B) The existence of a common effect
C) Conditioning on a common effect
D) Conditioning on a common cause
Which of the following situations is expected to result in a biased association between disease 1 and disease 2?
Selection bias can arise in randomized experiments.
True
False
When a treatment T has no actual causal effect on an outcome Y, selection bias occurs when conditioning on:
A) a common effect of T and Y
B) a cause shared by T and Y
C) an effect of Y that is independent of T
D) an effect of T that is independent of Y
Which of the following situations may have selection bias for the effect of A on X? (choose all that apply)
Given the causal DAG below, conditioning on C will introduce selection bias for the causal effect of B on D.
True
False
Let’s fit an example into the previous DAG, and answer the following question.
True
False
True
False
Rule 1. If there are no variables being conditioned on, a path is blocked if and only if two arrows on the path collide at some variable on the path.
E.g.,
Rule 2. Any path that contains a noncollider that has been conditioned on is blocked.
E.g.,
Rule 3. A collider that has been conditioned on opens a previously blocked path.
E.g.,
Rule 4. A collider with a descendant (e.g., a child) that has been conditioned on opens a previously blocked path.
E.g.,
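These four rules are exactly what a d-separation checker applies. A small sketch (hypothetical chain and collider structures; assuming a networkx version that provides nx.d_separated, renamed nx.is_d_separator in newer releases):

```python
# Hypothetical examples of Rules 1-3, checked with networkx's d-separation routine.
import networkx as nx

chain = nx.DiGraph([("A", "B"), ("B", "C")])      # A -> B -> C (B is a noncollider)
collider = nx.DiGraph([("A", "B"), ("C", "B")])   # A -> B <- C (B is a collider)

# Rule 1: with nothing conditioned on, the collider blocks the path, the chain does not.
print(nx.d_separated(chain, {"A"}, {"C"}, set()))     # False: path is open
print(nx.d_separated(collider, {"A"}, {"C"}, set()))  # True: collider blocks the path

# Rule 2: conditioning on the noncollider B blocks the chain.
print(nx.d_separated(chain, {"A"}, {"C"}, {"B"}))     # True

# Rule 3: conditioning on the collider B opens the previously blocked path.
print(nx.d_separated(collider, {"A"}, {"C"}, {"B"}))  # False
```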
Structural associations persist regardless of the increase in sample size.
True
False
Which of the following structures show that A and X are d-separated? Choose all that apply.
C and D.
In DAG (A), they are not d-separated because there is a path from A to X. In DAG (B), A and B are d-separated conditioning on X, but A and X are not. In DAG (C) and (D), A and X are d-separated because of a collider B, and we do not condition on it. In DAG (E), we condition on the collider B, opening the path between A and X. In DAG (F), we condition on the effect of the collider B, which is L, again opening the path between A and X.
Consider a variable L that is associated with the treatment X and with the outcome Y, and that is not on a causal pathway from X to Y. Adjusting for such a variable will always reduce bias.
True
False
True
False
2. To determine the causal effect of B on C, A must be adjusted for.
True
False
We want to study the direct effect of education on life satisfaction. In this DAG, we can simply block the indirect path by conditioning on the mediator occupation.
But if there is an unmeasured common cause of occupation and satisfaction, occupation becomes a collider. If we condition on it, we open the backdoor path through that unmeasured cause; if we do not condition on it, that path stays blocked:
In this case, education can still be associated with satisfaction even if we condition on occupation.
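A simulation sketch of this trap (hypothetical names and coefficients, assuming numpy; for illustration, the true direct effect of education on satisfaction is set to zero): conditioning on occupation opens the path through the unmeasured common cause U, so the adjusted coefficient of education is biased away from zero.

```python
# Hypothetical DAG: education -> occupation -> satisfaction, U -> occupation,
# U -> satisfaction, and (for illustration) no direct education -> satisfaction arrow.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
education = rng.normal(size=n)
U = rng.normal(size=n)                              # unmeasured common cause
occupation = education + U + rng.normal(size=n)
satisfaction = occupation + U + rng.normal(size=n)  # true direct effect of education: 0

# Adjusting for the mediator 'occupation' (now a collider) opens the path through U.
design = np.column_stack([np.ones(n), education, occupation])
coefs, *_ = np.linalg.lstsq(design, satisfaction, rcond=None)
print(round(coefs[1], 2))   # noticeably non-zero: biased estimate of the direct effect
```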
What to do?
Systematic bias is an association between the treatment X and the outcome Y that does not arise from the causal effect of X on Y.
True
False
Choose the causal DAG that has an opened backdoor path between poverty and HIV infection.
Think, which variable may commonly cause vitamin intake and health condition?
Given the causal DAG below, choose the correct statement.
A) We can identify the causal effect of HIV infection on death if we correctly adjust/control for poverty in our analysis
B) We can identify the causal effect of HIV infection on death if we restrict our study population to those not in poverty
C) We cannot identify the causal effect of HIV infection on death if poverty is not measured
D) All of the above
According to the causal DAG below, which of the following statements is correct? (Note: U is an unmeasured variable)
A) There is no open backdoor path between X and Y
B) We should adjust for/condition on Z in order to eliminate confounding
C) There is no way to eliminate confounding because of the unmeasured common cause U
D) All of the above
Given the causal DAG:
6.1.0 What adjustment is needed for studying the effect of attitudes toward immigration on voting behavior?
6.1.1 What about this?
6.1.2 What about this?
Discuss with your peers first, and we will look at each together.
In which of the following DAGs is bias present when estimating the effect of treatment X on outcome Y?
Explain to your peers, why and how?
\[ Y = \beta_0 + \beta_xX + \beta_1Z_1 + \beta_2Z_2 + \beta_3Z_3 \]
\[ Y = \beta_0 + \beta_xX + \beta_1Z_1 + \beta_2Z_2 \]
In the given causal DAG, is there any bias in estimating the effect of X on Y? If so, how can it be eliminated while considering model parsimony?
Think more clearly what you want to study, do you really need that many variables?
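As a hedged illustration of why the smaller model can be the right one (a hypothetical DAG and coefficients, assuming numpy: Z1 and Z2 are confounders, Z3 is a mediator of X → Y), fitting both specifications shows that adding Z3 removes part of the effect we want:

```python
# Hypothetical DAG: Z1 -> X, Z1 -> Y, Z2 -> X, Z2 -> Y, X -> Z3 -> Y, X -> Y.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
Z1 = rng.normal(size=n)
Z2 = rng.normal(size=n)
X = Z1 + Z2 + rng.normal(size=n)
Z3 = X + rng.normal(size=n)                      # mediator
Y = 0.5 * X + Z3 + Z1 + Z2 + rng.normal(size=n)  # total effect of X on Y = 1.5

def coef_of_x(*controls):
    design = np.column_stack([np.ones(n), X, *controls])
    coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return round(coefs[1], 2)

print(coef_of_x(Z1, Z2, Z3))  # ~0.5: adding the mediator Z3 blocks part of the effect
print(coef_of_x(Z1, Z2))      # ~1.5: the parsimonious model recovers the total effect
```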
Given the DAG, answer the questions.
Everything…
We add an asterisk (star) to a variable to indicate potential error in the measurement. E.g., the true variable \(A\) and the measured variable with error (observed variable) \(A^*\).
\(A \neq A^*\)
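A small simulation sketch of what measurement error can do (hypothetical coefficients, assuming numpy and independent, non-differential error on the treatment): the estimated effect is attenuated when we can only use the mismeasured \(A^*\).

```python
# Hypothetical example: A -> L with true effect 1.0; A_star = A + independent noise.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
A = rng.normal(size=n)             # true treatment
L = 1.0 * A + rng.normal(size=n)   # true outcome, true effect = 1.0
A_star = A + rng.normal(size=n)    # measured treatment with independent error

slope_true, _ = np.polyfit(A, L, 1)
slope_star, _ = np.polyfit(A_star, L, 1)
print(round(slope_true, 2))   # ~1.0
print(round(slope_star, 2))   # ~0.5: attenuated towards zero by measurement error
```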
Given the DAG below:
A = true treatment; L = true outcome; A* = measured treatment; L* = measured outcome; \(U_A\) = unmeasured determinants of A; \(U_L\) = unmeasured determinants of L.
True
False
True
False
True
False
True
False
True
False
True
False
True
False
True
False
True
False
Remember we talked about the proxy confounder in Section 6?
L is a mismeasured/biased version of U!
Although conditioning on L may help eliminate some bias in estimating the effect of A on B, it might also open up other unexpected backdoor paths and introduce additional errors:
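A sketch of the partial-fix idea (hypothetical names and coefficients, assuming numpy): U confounds A and B, L is a noisy measurement of U; adjusting for L removes some, but not all, of the confounding bias.

```python
# Hypothetical structure: U -> A, U -> B, no A -> B effect; L = U + noise (proxy).
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
U = rng.normal(size=n)              # unmeasured confounder
L = U + rng.normal(size=n)          # mismeasured proxy of U
A = U + rng.normal(size=n)          # treatment
B = U + rng.normal(size=n)          # outcome; true effect of A on B = 0

def coef_of_a(*controls):
    design = np.column_stack([np.ones(n), A, *controls])
    coefs, *_ = np.linalg.lstsq(design, B, rcond=None)
    return round(coefs[1], 2)

print(coef_of_a())     # ~0.5: confounded estimate
print(coef_of_a(L))    # smaller, but still not 0: the proxy removes only part of the bias
print(coef_of_a(U))    # ~0.0: adjusting for the true confounder removes the bias
```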
Given the DAG below, choose the correct answers.
True
False
True
False
True
False
True
False
True
False
Our prior knowledge can be wrong, so we need to assess different potential DAGs and be honest about our uncertainty
DAGs do not represent information about magnitude or functional form of causal effects (no moderators)
Feedback loops and time-ordering must be represented explicitly on DAGs: this can quickly become overwhelmingly complicated and confusing
You can use any program to draw your causal DAG (or use a pen and paper).
E.g., online tools such as DAGitty or draw.io
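If you prefer code, here is a minimal sketch in Python (assuming networkx and matplotlib are installed; the example uses the social origin scenario from earlier):

```python
# Minimal sketch: draw a small DAG with networkx + matplotlib and save it as an image.
import matplotlib.pyplot as plt
import networkx as nx

dag = nx.DiGraph([("SocialOrigin", "Performance"),
                  ("SocialOrigin", "College"),
                  ("Performance", "College")])

nx.draw(dag, with_labels=True, node_color="lightgrey", node_size=2500, arrowsize=20)
plt.savefig("my_first_dag.png", dpi=200)
```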
Draw the first causal DAG of your research project
Think of alternative paths and draw a second DAG
Upload your DAG (screenshot/photo) to our Moodle [Research Design] LAB section, here:
The slides are prepared based on and inspired by the excellent course on the edX online learning platform: Causal Diagrams: Draw Your Assumptions Before Your Conclusions.
If you are interested in causal inference and DAGs, I highly recommend checking it out for more details.