Research question: Does vitamin D deficiency increase risk of severe COVID-19?
Task A: Identify the Variables
For each variable below, classify it as:
C = Confounder (must adjust)
M = Mediator (do not adjust)
Co = Collider (do not adjust)
P = Precision variable (optional)
? = Unclear/depends on causal model
| Variable | Classification | Reasoning |
|---|---|---|
| Age | | |
| Sex | | |
| Race/ethnicity | | |
| BMI | | |
| Diabetes | | |
| Smoking | | |
| Season | | |
| Kidney disease | | |
Task B: Selection Strategies
Three analysts approach this differently:
Analyst 1: Uses stepwise selection (p<0.05 to enter, p>0.10 to remove)
Analyst 2: Adjusts for everything in the table above
Analyst 3: Draws a DAG and identifies confounders, adjusts only for those
Questions:
What are the pros and cons of each approach?
Which is most appropriate for a causal question?
If this were a prediction model instead, would your answer change?
Exercise 4: The Change-in-Estimate Approach
You’re studying the effect of sleep duration on type 2 diabetes risk.
Starting Model
Base model: Sleep duration → Diabetes
Crude OR = 1.45 (95% CI: 1.25-1.68)
Adding Variables One at a Time
Effect of Adding Variables (One at a Time)

| Model | OR | 95% CI | Change from crude |
|---|---|---|---|
| Crude | 1.45 | 1.25-1.68 | - |
| + Age | 1.38 | 1.18-1.61 | 4.8% |
| + Sex | 1.44 | 1.24-1.67 | 0.7% |
| + BMI | 1.22 | 1.04-1.43 | 15.9% |
| + Physical activity | 1.36 | 1.16-1.59 | 6.2% |
| + Diet quality | 1.41 | 1.21-1.64 | 2.8% |
| + Depression | 1.32 | 1.13-1.54 | 9.0% |
| + Shift work | 1.28 | 1.09-1.50 | 11.7% |
Questions:
Using a 10% change threshold, which variables would you keep?
Why might BMI cause such a large change?
Could BMI be a mediator? How would you decide?
What about shift work - confounder or mediator?
Should you adjust for all variables that cause >10% change?
Exercise 5: Well-Defined Interventions
Evaluating Research Questions
For each research question, determine:
Is the exposure well-defined?
Can you describe a hypothetical trial?
If not well-defined, what’s the problem?
How would you reframe the question?
Question 1
“What is the effect of depression on cardiovascular disease?”
Analysis:
Well-defined exposure? (Yes/No): _____
Hypothetical trial:
Problems:
Better question:
Question 2
“Does high LDL cholesterol cause myocardial infarction?”
Analysis:
Well-defined exposure? (Yes/No): _____
Hypothetical trial:
Problems:
Better question:
Question 3
“What is the effect of a Mediterranean diet intervention (as defined by specific food groups and quantities) on 5-year cardiovascular mortality?”
Analysis:
Well-defined exposure? (Yes/No): _____
Hypothetical trial:
Problems (if any):
Is this a good question?
Question 4
“Does C-reactive protein (CRP) level cause increased mortality?”
Analysis:
Well-defined exposure? (Yes/No): _____
Hypothetical trial:
Problems:
Better question:
Exercise 6: Critique Published Studies
Instructions
Find 3 papers from your field that use multivariable regression. For each, complete the following analysis:
Paper 1
Citation:
Research Question:
Model Type (Causal/Prediction/Association):
Variable Selection Method:
Strengths:
Weaknesses:
What would you do differently?
Critical Evaluation Checklist
Use this checklist when evaluating papers:
For ALL models:
For CAUSAL models specifically:
For PREDICTION models specifically:
Exercise 7: The Many Analysts Problem
Simulation Activity
In groups of 3-4, you’ll each analyze the same simulated dataset to answer:
“Does drug X reduce the risk of outcome Y?”
Available Variables
Treatment (drug X vs placebo)
Age (continuous)
Sex (M/F)
Baseline disease severity (mild/moderate/severe)
Comorbidity count (0-5)
Smoking status (current/former/never)
BMI (continuous)
Blood pressure (continuous)
Cholesterol level (continuous)
Previous hospitalizations (count)
Rules
Each person chooses their own:
Which confounders to adjust for
Whether to include interactions
How to handle continuous variables (linear, categories, splines)
Which model to use (logistic, Cox, etc.)
Record your effect estimate and 95% CI
Compare with your group
Discussion Questions
Did you get the same answer? Why or why not?
Which approach seems most defensible?
How would you resolve disagreements?
What does this teach us about analytic flexibility?
Exercise 8: Building a DAG for Your Research
Your Own Research Question
Take a research question from your own work or thesis.
Step 1: Define the Question
Exposure:
Outcome:
Is the exposure well-defined?
Can you describe a trial?
Step 2: List Variables
List all variables you think are relevant:
Confounders:
Potential Mediators:
Potential Colliders:
Other (precision variables, effect modifiers):
Step 3: Draw Your DAG
(Space for drawing or paste image)
Step 4: Identify Adjustment Set
Minimal sufficient adjustment set:
Variables to definitely NOT adjust for:
Uncertain about:
Step 5: Sensitivity Analyses
What unmeasured confounders might be important?
How would you assess robustness?
Exercise 9: Model Building Decisions
Decision Tree Activity
For the research question: “Does prenatal vitamin use reduce risk of neural tube defects?”
Work through this decision tree:
START: Research Question
  |
  v
Is exposure well-defined?
  |
  +-- No --> Reframe question
  |
  +-- Yes
        |
        v
      What's your goal?
        |
        +-- Causal effect
        |     |
        |     v
        |   Draw DAG
        |     |
        |     v
        |   Identify confounders
        |     |
        |     v
        |   Adjust for confounders only
        |     |
        |     v
        |   Check positivity
        |     |
        |     v
        |   Sensitivity analyses
        |
        +-- Prediction
        |     |
        |     v
        |   Define clinical use case
        |     |
        |     v
        |   Identify candidate predictors
        |     |
        |     v
        |   Split data (train/test)
        |     |
        |     v
        |   Build model (with CV)
        |     |
        |     v
        |   Validate externally
        |
        +-- Exploration/Association
              |
              v
            Univariate screening
              |
              v
            Adjust for multiple comparisons
              |
              v
            Avoid causal language
Questions:
Walk through each branch - what decisions do you make?
How would your approach differ for each goal?
What checks and balances are built in?
Exercise 10: Teaching Exercise
Explain to a Colleague
Practice explaining these concepts in simple terms:
Concept 1: Consistency Condition
Explain why “the effect of obesity on mortality” is poorly defined:
(Write a 3-4 sentence explanation that a non-statistician could understand)
Concept 2: Confounding vs Mediation
Use a diagram to show the difference between a confounder and a mediator:
(Draw or describe)
Concept 3: Collider Bias
Give a real-world example of how adjusting for a collider can create bias:
(Explain with a specific scenario)
Concept 4: Prediction vs Causation
Explain to a clinician why their predictive model can’t tell them what causes the outcome:
(3-4 sentences)
Additional Resources
Online Tools
DAGitty (dagitty.net)
Interactive DAG drawing
Automatic identification of adjustment sets
Testable implications
Causal Fusion (causalfusion.net)
Teaching tool for causal concepts
Interactive examples
R Packages
# Install key packages
install.packages(c(
  "dagitty",     # DAG creation and analysis
  "ggdag",       # Beautiful DAG plotting
  "gt",          # Great tables
  "broom",       # Tidy model outputs
  "performance"  # Model checking
))
Recommended Papers
On Model Building:
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.
Greenland, S. (2008). Invited commentary: variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol, 167(5), 523-529.
On Well-Defined Interventions:
Hernán, M. A. (2016). Does water kill? A call for less casual causal inferences. Ann Epidemiol, 26(10), 674-680.
Robins, J. M., & Hernán, M. A. (2009). Estimation of the causal effects of time-varying exposures. Longitudinal Data Analysis, 553-599.
On DAGs:
Textor, J., et al. (2016). Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. Int J Epidemiol, 45(6), 1887-1894.
Datasets for Practice
NHANES - National Health and Nutrition Examination Survey
Complex survey data
Many variables for practicing adjustment strategies
Framingham Heart Study (teaching dataset)
Classic cardiovascular risk factors
Good for causal inference exercises
UCI Machine Learning Repository
Many datasets for prediction modeling practice
Answer Key (All Exercises)
Tip: About the DAG Solutions
Solutions for exercises involving DAGs include actual R code using the dagitty and ggdag packages.
You can:
- Run the code yourself to see the DAGs
- Modify the DAG structures to test alternatives
- Use the adjustmentSets() function to verify minimal sufficient adjustment sets
- Compare your DAG to the solution
Color scheme: Blue = Exposure | Red = Outcome | Green = Confounder | Orange = Mediator | Purple = Collider
Exercise 1 - Model Types
Scenario A: Physical Activity and Dementia
Model type: Causal
Reasoning: The question asks “would reduce” - this is asking about the effect of an intervention
Key considerations: Well-defined intervention possible (exercise program), long follow-up needed, many potential confounders
Scenario B: Hospital Readmission
Model type: Prediction
Reasoning: Goal is to identify high-risk patients, not to understand causes
Key considerations: Need actionable timeframe, validation essential, predictors must be available at discharge
Scenario C: Diet and Screening
Model type: Association/Exploratory
Reasoning: “Exploring” suggests hypothesis generation, not causal inference
Do NOT adjust for:
- Blood pressure (B) - mediator on pathway from coffee to heart disease
- Healthcare access (P) - potential collider caused by both education and heart disease diagnosis
Why this adjustment set works: Age, Smoking, and Education block all backdoor paths between Coffee and Heart Disease without blocking the causal pathway or opening collider bias.
Code
# Verify adjustment set using dagitty
adjustmentSets(coffee_dag)
{ A, P, S }
{ A, E, S }
Exercise 2B - Obesity and Mortality
1. Four Mechanisms Leading to Obesity:
Dietary patterns (high calorie, high sugar/fat intake)
Physical inactivity (sedentary lifestyle, lack of exercise)
Genetics → Mortality: Yes (same genes may affect disease susceptibility)
Medical conditions → Mortality: Yes (diseases themselves affect mortality)
3. DAG Showing Obesity as Common Outcome:
Variables: O = Obesity (blue, exposure) | M = Mortality (red, outcome) | D = Diet (green, confounder) | E = Exercise (green, confounder) | G = Genetics (green, confounder) | I = Illness (green, confounder)
Key insight: All pathways to obesity (D, E, G, I) are confounders because they also independently affect mortality. This creates multiple backdoor paths through variables that are difficult to measure and adjust for.
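As a minimal sketch, the structure described above could be encoded with dagitty as follows. The edge list is an assumption reconstructed from the variable roles listed here (each of D, E, G, I pointing into both O and M); it is not the original solution code.

```r
library(dagitty)

# Assumed structure: every pathway to obesity also affects mortality directly
obesity_dag <- dagitty("dag {
  O [exposure]
  M [outcome]
  D -> O
  D -> M
  E -> O
  E -> M
  G -> O
  G -> M
  I -> O
  I -> M
  O -> M
}")

# Minimal sufficient adjustment set(s) for the total effect of O on M
obesity_sets <- adjustmentSets(obesity_dag)
print(obesity_sets)
```

Under these assumed edges, the only minimal sufficient set contains all four mechanisms, which illustrates why the adjustment is hard to carry out in practice.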
4. Why “Effect of Obesity on Mortality” is Poorly Defined:
Problem 1: We don’t know which mechanism(s) led each person to their current BMI
Problem 2: Different mechanisms may have different mortality effects (e.g., obesity from overeating vs from medication)
Problem 3: Cannot adjust for all mechanisms (especially genetic/physiological ones we can’t measure)
Problem 4: Even if measured, adjusting for them leaves only “residual” obesity, not a meaningful causal effect
5. Two Better-Defined Research Questions:
“What is the effect of a Mediterranean diet intervention on 10-year cardiovascular mortality?”
Well-defined: specific dietary pattern
Implementable in RCT
Clear mechanism
“What is the effect of a structured exercise program (150 min/week moderate activity) on all-cause mortality in middle-aged adults?”
Well-defined: specific exercise prescription
Implementable in RCT
Measurable adherence
Exercise 3 - Vitamin D and COVID-19
Task A: Variable Classifications
| Variable | Classification | Reasoning |
|---|---|---|
| Age | C | Common cause of both vitamin D status and COVID severity |
| Sex | C | Affects vitamin D metabolism and COVID outcomes |
| Race/ethnicity | C | Affects vitamin D levels (skin pigmentation) and COVID risk (socioeconomic factors) |
| BMI | ? | Could be confounder OR mediator - depends on causal model |
| Diabetes | ? | Could be confounder (causes low vit D) OR mediator (vit D affects diabetes) |
| Smoking | C | Affects vitamin D levels and COVID severity |
| Season | ? | Affects vitamin D (sun exposure); it is a confounder only if it also affects COVID severity (e.g., through seasonal transmission). If it affects vitamin D alone, it is not a confounder and adjustment is unnecessary |
| Kidney disease | M | Likely mediator - vitamin D may affect kidney function; kidney disease affects COVID |
Note: BMI and diabetes are ambiguous - need to draw DAG to clarify roles!
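To see how the role assigned to BMI changes the adjustment set, here is a small sketch with two candidate DAGs. The node names (VitD, Severe, Age, BMI) and the edges are illustrative assumptions, deliberately reduced to one clear confounder (Age) plus BMI.

```r
library(dagitty)

# Candidate 1: BMI is a common cause of vitamin D status and severity
dag_bmi_confounder <- dagitty("dag {
  VitD [exposure]
  Severe [outcome]
  Age -> VitD
  Age -> Severe
  BMI -> VitD
  BMI -> Severe
  VitD -> Severe
}")

# Candidate 2: BMI lies on the pathway from vitamin D status to severity
dag_bmi_mediator <- dagitty("dag {
  VitD [exposure]
  Severe [outcome]
  Age -> VitD
  Age -> Severe
  VitD -> BMI
  BMI -> Severe
  VitD -> Severe
}")

sets_confounder <- adjustmentSets(dag_bmi_confounder)
sets_mediator <- adjustmentSets(dag_bmi_mediator)
print(sets_confounder)
print(sets_mediator)
```

The same variable is required in one model and excluded in the other, so the decision has to come from subject knowledge about timing and mechanism rather than from the data.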
Task B: Comparing Three Analysts
Analyst 1: Stepwise selection
- Pros: Simple, automated
- Cons: ❌ Atheoretical; may remove confounders; inflated Type I error; NOT appropriate for causal question
- Verdict: Inappropriate for causal inference
Analyst 2: Adjust for everything
- Pros: Thorough
- Cons: ❌ Likely includes mediators (kidney disease); loses power; may include colliders
- Verdict: “Kitchen sink” - not appropriate
If this were a prediction model:
- Analyst 1 or 2 might be acceptable IF using cross-validation
- Analyst 3’s approach wouldn’t be necessary (don’t care about confounding)
- Would focus on discrimination/calibration instead
Exercise 4 - Change-in-Estimate: Sleep and Diabetes
Using 10% threshold:
Keep these variables:
- BMI (15.9% change) ✓
- Shift work (11.7% change) ✓
Could argue for:
- Depression (9.0% change - close to threshold)
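The percentages can be reproduced directly from the odds ratios in the Exercise 4 table. The sketch below assumes the change is computed relative to the crude OR, which matches the tabulated values; the object names are illustrative.

```r
# ORs copied from the Exercise 4 table
crude_or <- 1.45
adjusted_or <- c(
  "Age" = 1.38, "Sex" = 1.44, "BMI" = 1.22, "Physical activity" = 1.36,
  "Diet quality" = 1.41, "Depression" = 1.32, "Shift work" = 1.28
)

# Percent change relative to the crude OR
pct_change <- abs(crude_or - adjusted_or) / crude_or * 100

result <- data.frame(
  variable = names(adjusted_or),
  adjusted_or = unname(adjusted_or),
  pct_change = round(unname(pct_change), 1),
  exceeds_10pct = unname(pct_change) > 10
)
print(result)
```

Only BMI and shift work cross the 10% line. The calculation is purely statistical, so it cannot tell a confounder from a mediator.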
BMI is likely a mediator:
- Sleep duration → BMI → Diabetes
- Short sleep may cause weight gain
- Weight gain causes diabetes
- Adjusting for BMI blocks part of the causal pathway
Question: Should we adjust for BMI?
- If interested in total effect of sleep: NO
- If interested in direct effect (not through BMI): Maybe, but need mediation analysis
Is BMI a mediator?
Evidence it’s a mediator:
- Sleep affects weight
- Weight affects diabetes
- On the causal pathway
How to decide:
- Draw a DAG showing temporal relationships
- Consider: Does sleep affect BMI? (Yes)
- Does BMI affect diabetes? (Yes)
- Check timing: BMI measured after the sleep exposure can act as a mediator; BMI measured before it could instead be a confounder (for example, obesity causing sleep apnea and short sleep)
- Therefore: when BMI follows the sleep exposure, treat BMI as a mediator
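A small simulation can make the consequence concrete. This is a sketch under stated assumptions: the data are simulated, every coefficient is made up, and a continuous glycaemia score stands in for diabetes so that a linear model gives directly comparable coefficients.

```r
set.seed(42)
n <- 20000

# Hypothetical data-generating process (all coefficients are illustrative):
# short sleep raises BMI, and both short sleep and BMI raise the score.
short_sleep <- rbinom(n, 1, 0.3)
bmi <- 25 + 2 * short_sleep + rnorm(n, sd = 3)
glycaemia <- 0.5 * short_sleep + 0.3 * bmi + rnorm(n)

total_fit <- lm(glycaemia ~ short_sleep)         # total effect of sleep
direct_fit <- lm(glycaemia ~ short_sleep + bmi)  # BMI pathway blocked

total_effect <- unname(coef(total_fit)["short_sleep"])
direct_effect <- unname(coef(direct_fit)["short_sleep"])
print(c(total = total_effect, direct = direct_effect))
```

By construction the total effect is 0.5 + 2 × 0.3 = 1.1 and the direct effect is 0.5. Adjusting for the mediator recovers only the direct part, which is the kind of attenuation seen when BMI is added in the exercise.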
Shift work - confounder or mediator?
Could be either:
If shift work → sleep duration and shift work → diabetes:
- Shift work is a confounder (causes both exposure and outcome)
- Should adjust
If sleep duration → shift work (unlikely) → diabetes:
- Shift work is a mediator
- Should not adjust for total effect
Most likely: Shift work is a confounder - it causes people to sleep less AND independently affects diabetes risk (circadian disruption)
Should you adjust for all variables >10% change?
NO!
The 10% rule can identify associations, not necessarily confounders
BMI shows large change but is likely a mediator - shouldn’t adjust for total effect
Need to use causal reasoning (DAG), not just statistical criteria
This is why change-in-estimate is not recommended - it can mislead!
Exercise 5 - Well-Defined Interventions
Question 1: Depression
Well-defined? No
Problems: Depression is not an intervention; many ways to treat/prevent depression (CBT, SSRIs, exercise, etc.)
Better: “Does cognitive behavioral therapy reduce risk of CVD in adults with major depression?”
Question 2: LDL Cholesterol
Well-defined? No
Problems: Multiple ways to lower LDL (statins, diet, ezetimibe, PCSK9 inhibitors, each may have different effects)
Better: “Does statin therapy (vs placebo) reduce MI risk in adults with LDL >130 mg/dL?”
Question 3: Mediterranean Diet
Well-defined? Yes!
Problems: None - this is well-specified
Good because: Specific intervention described, could implement in trial, clear definition
Question 4: CRP
Well-defined? No
Problems: CRP is a biomarker, not an intervention; no way to specifically target CRP
Better: Either (a) study CRP as a predictor of outcomes, or (b) study interventions that affect CRP (e.g., “Does aspirin reduce CVD in adults with elevated CRP?”)
Exercise 6 - Critique Published Studies
Sample critique for a hypothetical paper:
Paper 1: “Association between coffee consumption and Type 2 diabetes”
Citation: [Example]
Research Question: Does coffee consumption reduce diabetes risk?
Model Type: Claims to be causal, but analysis suggests association/prediction hybrid
Variable Selection Method:
- Started with 50 variables
- Used stepwise selection (p<0.05 to enter)
- Final model: 8 variables
Strengths:
- Large sample size (n=50,000)
- Long follow-up (20 years)
- Validated coffee assessment
Weaknesses:
- ❌ Used stepwise selection for “causal” inference
- ❌ No DAG presented
- ❌ Likely adjusted for mediators (BMI, glucose)
- ❌ No sensitivity analyses
- ❌ Causal language but inappropriate methods
What I would do differently:
1. Draw a DAG identifying confounders
2. Pre-specify adjustment variables based on DAG
3. Do NOT use stepwise selection
4. Conduct sensitivity analyses for unmeasured confounding
5. Be more careful about causal language or clearly state this is exploratory
Exercise 7 - Many Analysts Problem
Expected outcomes from simulation:
Discussion Questions - Sample Answers:
1. Did you get the same answer?
Probably not! Even with the same data, different decisions lead to different results:
- Different confounders selected
- Different categorization of continuous variables (e.g., age as continuous vs categories vs splines)
- Different interaction terms
- Different model types (logistic vs Cox with different baseline hazards)
2. Which approach seems most defensible?
The one that:
- Has clear theoretical justification for variable selection
- Uses DAG to identify confounders
- Pre-specified the analysis approach
- Includes appropriate sensitivity analyses
- Acknowledges limitations
3. How would you resolve disagreements?
Examine the DAGs each person drew
Discuss which confounders are most important based on subject knowledge
Check if results are similar across reasonable specifications (robustness)
Consider presenting multiple models with different assumptions
Be transparent about the analytic choices made
4. What does this teach us?
Many reasonable decisions must be made in any analysis
Results depend on these decisions, not just on the data
Transparency is essential
Pre-specification helps limit researcher degrees of freedom
There’s rarely one “right” analysis
Uncertainty in results comes from analytic choices, not just sampling variability
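The spread across analysts can be illustrated with a short simulation. This is a sketch, not the course dataset: the data are simulated here, treatment is assumed to be non-randomized (sicker patients are treated more often), and all coefficients are hypothetical.

```r
set.seed(7)
n <- 5000

# Hypothetical data: severity drives both treatment and outcome.
# The true conditional log-odds ratio for treatment is -0.5.
age <- rnorm(n, mean = 60, sd = 10)
severity <- rnorm(n)
treated <- rbinom(n, 1, plogis(severity))
outcome <- rbinom(n, 1, plogis(-1 - 0.5 * treated + severity + 0.03 * (age - 60)))
dat <- data.frame(outcome, treated, age, severity)

# Four specifications that different analysts might each defend
specs <- list(
  crude = outcome ~ treated,
  age_only = outcome ~ treated + age,
  severity_only = outcome ~ treated + severity,
  age_severity = outcome ~ treated + age + severity
)

# Odds ratio for treatment under each specification
estimates <- sapply(specs, function(f) {
  fit <- glm(f, family = binomial, data = dat)
  exp(coef(fit)[["treated"]])
})
print(round(estimates, 2))
```

The same dataset yields noticeably different odds ratios depending on whether severity is in the model, which is the point of the exercise: the choice of adjustment set drives the answer as much as the data do.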
Exercise 8 - Your Own Research
This is individualized - no single answer. But here are evaluation criteria:
Good DAG characteristics:
✅ Exposure and outcome clearly identified
✅ All major confounders included
✅ Mediators identified and noted
✅ Colliders identified and avoided
✅ Arrows represent causal relationships, not just associations
✅ Temporal ordering makes sense
✅ Based on subject matter knowledge, not data
Red flags to watch for:
❌ Too many variables (probably missing structure)
❌ No clear confounders identified
❌ Mixing confounders and mediators
❌ Exposure is ill-defined (biomarker, physiological measure)
❌ Arrows based on statistical associations rather than causal beliefs
Getting feedback:
Share with advisor/mentor
Present to lab group
Check against published DAGs in your field
Revise based on feedback
Remember: DAGs represent beliefs, can be wrong, should be revised
Exercise 9 - Prenatal Vitamin and Neural Tube Defects
Walking Through the Decision Tree:
START: Research Question “Does prenatal vitamin use reduce risk of neural tube defects?”
Is exposure well-defined? ✅ Yes - Prenatal vitamins are a specific intervention (can specify dose, timing, formulation)
What’s your goal? → Causal effect (does vitamin use CAUSE reduction in NTDs?)
Path: Causal Effect
Step 1: Draw DAG
Variables: V = Prenatal Vitamin Use (blue, exposure) | N = Neural Tube Defects (red, outcome) | A = Maternal Age (green, confounder) | S = SES (green, confounder) | P = Planned Pregnancy (green, confounder) | D = Dietary Folate (green, confounder) | F = Folate Levels (orange, mediator) | H = Homocysteine (orange, mediator) | C = Prenatal Care Visits (purple, collider)
Step 2: Identify confounders
Must adjust for:
- Maternal age (A) - affects vitamin use AND NTD risk
- SES (S) - affects vitamin access AND healthcare/nutrition
- Planned pregnancy (P) - affects vitamin use AND prenatal care
- Baseline dietary folate intake (D)
Step 3: Adjust for confounders only
- Do NOT adjust for: folate levels, homocysteine (mediators)
- Do NOT adjust for: prenatal care visits (potential collider)
Step 4: Check positivity
- Are there women in all confounder strata who take vitamins?
- Are there women in all strata who don’t?
- May have positivity violations in planned pregnancies (almost all take vitamins)
Step 5: Sensitivity analyses
- Vary definitions of exposure (timing, dose)
- Test different adjustment sets
- Assess impact of unmeasured confounding (E-value)
- Stratify by planned vs unplanned pregnancy
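For the E-value step, a minimal sketch of the standard risk-ratio formula (VanderWeele and Ding) is shown below. The helper name e_value and the example risk ratio of 0.5 are illustrative assumptions, not results from any study.

```r
# E-value for a risk ratio: RR + sqrt(RR * (RR - 1)),
# applied to 1/RR when the estimate is protective (RR < 1).
e_value <- function(rr) {
  stopifnot(is.numeric(rr), all(rr > 0))
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# Illustrative (made-up) risk ratio for vitamin use vs no use
example_e_value <- e_value(0.5)
print(example_e_value)
```

The result is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both vitamin use and NTDs to fully explain away the observed estimate.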
Exercise 10 - Teaching Exercise
Concept 1: Consistency Condition
Explain to a non-statistician:
“Imagine we want to know if obesity causes early death. The problem is that there are many ways to become obese - overeating, not exercising, certain medications, genetic factors. Each of these might affect your health differently, even if they all lead to the same weight. So when we compare people who are obese to people who aren’t, we’re not comparing one thing - we’re comparing a complex mix of different paths to obesity. That’s why researchers say we need to study specific interventions like ‘Mediterranean diet’ or ‘exercise programs’ rather than obesity itself.”
Concept 2: Confounding vs Mediation
Diagram:
CONFOUNDER:
        Age
       ↙   ↘
Exercise → Mortality
Age causes both exercise level AND mortality risk (older people exercise less AND have higher mortality). Must adjust.
MEDIATOR:
Exercise → Weight Loss → Mortality
Exercise causes weight loss, which causes lower mortality. Weight loss is ON the pathway. Adjusting blocks the effect we want to measure.
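The two diagrams can also be checked in dagitty. This sketch uses the node names from the diagrams (WeightLoss is written without a space) and assumes no edges beyond the ones drawn.

```r
library(dagitty)

confounder_dag <- dagitty("dag {
  Exercise [exposure]
  Mortality [outcome]
  Age -> Exercise
  Age -> Mortality
  Exercise -> Mortality
}")

mediator_dag <- dagitty("dag {
  Exercise [exposure]
  Mortality [outcome]
  Exercise -> WeightLoss
  WeightLoss -> Mortality
}")

# Paths between exposure and outcome, before and after conditioning on Age
paths_unadjusted <- paths(confounder_dag, "Exercise", "Mortality")
paths_adjusted <- paths(confounder_dag, "Exercise", "Mortality", Z = "Age")
print(paths_unadjusted)
print(paths_adjusted)

# Adjustment sets for the total effect in each DAG
confounder_sets <- adjustmentSets(confounder_dag)
mediator_sets <- adjustmentSets(mediator_dag)
print(confounder_sets)
print(mediator_sets)
```

In the confounder DAG the backdoor path through Age is open until Age is conditioned on; in the mediator DAG no adjustment is needed, and WeightLoss never appears in an adjustment set for the total effect.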
Concept 3: Collider Bias
Real-world example:
“Imagine you’re studying whether exercise affects heart disease mortality. You decide to adjust for ‘being hospitalized’ thinking it’s a confounder. But hospitalization is actually caused by BOTH exercise (less exercise → more hospitalizations) AND by underlying severe disease (which also causes death).
When you condition on hospitalization (look only at hospitalized people), you create a spurious association between exercise and underlying disease severity. Among hospitalized patients, those who exercise must be sicker (otherwise why are they hospitalized despite exercising?). This makes exercise look harmful when it’s actually protective!
This is called collider bias - conditioning on a common effect opens a non-causal path between its causes that was closed before.”
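The hospitalization scenario can be reproduced with simulated data. In this sketch exercise and disease severity are generated independently, and the hospitalization rule and its threshold are arbitrary assumptions chosen for illustration.

```r
set.seed(123)
n <- 50000

# Hypothetical, independent causes of hospitalization (standardized scores)
exercise <- rnorm(n)
disease <- rnorm(n)

# Collider: less exercise and more severe disease both raise hospitalization risk
hospitalized <- (-exercise + disease + rnorm(n)) > 1

cor_all <- cor(exercise, disease)
cor_hosp <- cor(exercise[hospitalized], disease[hospitalized])
print(round(c(everyone = cor_all, hospitalized_only = cor_hosp), 3))
```

In the full sample the two variables are essentially uncorrelated, but among hospitalized patients a positive correlation appears: those who exercise more tend to have more severe disease, purely because of the selection.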
Concept 4: Prediction vs Causation
Explain to a clinician:
“Your predictive model is excellent at identifying which patients will be readmitted - it’s like a weather forecast that accurately predicts rain. But just like a weather forecast doesn’t tell you HOW to prevent rain, your model doesn’t tell you what CAUSES readmission.
For example, your model might include ‘number of prior hospitalizations’ as a strong predictor. But we can’t prevent readmissions by changing the number on someone’s medical chart! The number of prior hospitalizations is a marker of underlying illness, not a cause of future readmissions.
To know what causes readmissions, we’d need a different study design that identifies and adjusts for confounders, excludes mediators, and avoids colliders - which would likely give us a different (and possibly less accurate) prediction model. That’s okay - they serve different purposes!”
Additional Practice Problems
Problem 1: Alcohol and Liver Disease
Scenario: You’re studying the effect of alcohol consumption on liver cirrhosis.
Should you adjust for hepatitis C? Why or why not?
Should you adjust for coffee? Why or why not?
Is “alcohol consumption” well-defined enough for causal inference?
Answers:
1. Draw a DAG:
Variables: A = Alcohol (blue, exposure) | C = Cirrhosis (red, outcome) | G = Age (green, confounder) | E = Education (green, confounder) | B = BMI (green, confounder) | H = Hepatitis C (gray, NOT a confounder) | K = Coffee (orange, mediator - may protect liver)
2. Confounders: Age (G), Education (E), potentially BMI (B)
3. Hepatitis C (H):
- If hepatitis C causes alcohol use: Confounder, adjust
- If hepatitis C is unrelated to alcohol use: NOT a confounder, but may want to stratify
- Most likely: NOT a confounder (hepatitis C doesn’t cause drinking)
- Adjustment is not needed for confounding control unless you have evidence it affects alcohol consumption; because hepatitis C is a strong cause of cirrhosis, including it is optional (precision variable)
4. Coffee (K):
- Coffee is associated with alcohol (social drinking)
- Coffee → Cirrhosis pathway exists (protective effect)
- Alcohol → Coffee pathway is possible (drinking habits may change coffee intake); this is an assumption of the DAG
- Under that assumption coffee is a MEDIATOR (on pathway from alcohol to cirrhosis)
- Do NOT adjust if interested in total effect of alcohol
- Adjusting would block the pathway through coffee consumption
- If coffee and alcohol merely share causes (lifestyle), coffee would instead be a confounder, so state the assumed direction explicitly
5. Well-defined?
- Better than “obesity” but still some issues
- Type of alcohol matters (wine vs spirits)
- Pattern matters (daily vs binge)
- Better question: “Does reducing alcohol intake from 4+ drinks/day to <1 drink/day reduce cirrhosis risk?”
Problem 2: Statins and Dementia
Scenario: Observational study finds statin users have lower dementia rates.
Possible confounders:
- Age, sex, education
- Cardiovascular disease
- Cholesterol levels
- Healthcare utilization
- SES
Questions:
Should you adjust for cholesterol levels? Why or why not?
Should you adjust for cardiovascular disease?
What’s the target trial?
Answers:
DAG showing confounding by indication:
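Variables: S = Statin Use (blue, exposure) | D = Dementia (red, outcome) | A = Age (green, confounder) | E = Education/SES (green, confounder) | U = Healthcare Utilization (green, confounder) | L = Cholesterol Levels (orange-red, INDICATION) | V = CVD (orange-red, INDICATION)
1. Cholesterol levels (L):
- Pre-treatment cholesterol is an indication for treatment: people with high cholesterol get statins
- If cholesterol also affects dementia risk, it is a common cause of treatment and outcome; leaving it unaccounted for is what produces confounding by indication
- ✅ Account for cholesterol measured before statin initiation, by adjustment or by design (restriction, instrumental variables)
- ❌ Do NOT adjust for cholesterol measured after statin initiation: statins lower it, so it is a mediator
Cardiovascular disease:
- Similar to cholesterol - it’s an indication for statins
- Account for baseline CVD by adjustment or restriction (the target trial below restricts to people without CVD)
- ❌ Do not adjust for CVD events that occur after statin initiation; they may lie on the causal pathway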
Target trial:
Population: Adults 60-75 without dementia or CVD
Intervention: Statin therapy (specify dose/type)
Comparison: Placebo
Outcome: Incident dementia over 10 years
Assignment: Random
This trial would answer the causal question!
Answer Key Summary
Complete Solutions Provided For:
✅ Exercise 1: Identifying Model Types (3 scenarios)
✅ Exercise 2A: Coffee and Heart Disease DAG with dagitty visualization
✅ Exercise 2B: Obesity and Mortality (5 tasks) with DAG visualization
✅ Exercise 3: Vitamin D and COVID-19 (variable classification & analyst comparison)
✅ Exercise 4: Change-in-Estimate - Sleep and Diabetes (5 questions)
✅ Problem 1: Alcohol and Liver Disease with DAG showing confounders vs non-confounders
✅ Problem 2: Statins and Dementia with DAG showing confounding by indication
All DAGs use:
- Color-coded nodes (blue=exposure, red=outcome, green=confounders, orange=mediators, purple=colliders)
- Single-letter labels overlaid on nodes
- dagitty code that students can modify and verify with adjustmentSets()