Multivariable Model Building

Understanding Different Modeling Goals in Epidemiology

PHTH 6202: Intermediate Epidemiology

2026-01-01

Learning Objectives

By the end of this lecture, you will be able to:

  1. Distinguish between causal, prediction, and association models
  2. Recognize how analytic choices affect study conclusions
  3. Understand the importance of well-defined interventions for causal inference
  4. Apply appropriate model-building strategies for different research questions
  5. Critically evaluate model building approaches in published literature

The Fundamental Question

Before building any model, ask yourself:

What is my research question, and what type of inference do I need?

  • Causal effect estimation?
  • Outcome prediction?
  • Association/exploration?

Different goals → Different strategies → Different interpretations

The Many Analysts Problem: A Natural Experiment in Analytic Variability

The Study Design

Published Study

Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results

R. Silberzahn, E. L. Uhlmann, D. P. Martin, P. Anselmi, et al.

Advances in Methods and Practices in Psychological Science, 2018, Vol. 1(3) 337–356

Twenty-nine teams involving 61 analysts used the same data set to address the same research question: Whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players.

Design: Observational data on players from four top-tier European leagues (2012-2013 season)

  • 2,053 players
  • 3,147 referees
  • 146,028 player-referee dyads

Each team had complete analytic freedom to answer the question.

What Covariates Did Teams Include?

The Results: A Wide Range

Key Takeaways

What This Teaches Us

  1. Analytic flexibility is unavoidable
    • Researchers must make many reasonable decisions
    • Different valid choices lead to different results
  2. Results are contingent on these choices
    • Not just about statistical power
    • Not just about “questionable research practices”
  3. The solution: TRANSPARENCY
    • Expose decisions to scrutiny
    • Allow alternatives to be tested
    • Acknowledge uncertainty

Check Your Understanding 🤔

Question

In the soccer referee study, why did 29 teams get different results from the same data?

A. Some teams made errors in their analysis
B. Different statistical methods are better than others
C. Teams made different but justifiable choices about covariates and methods
D. The dataset was corrupted

Answer: C - The study demonstrates that many subjective but reasonable decisions must be made in any analysis, and these affect outcomes.

Three Types of Regression Models: Understanding Your Analytic Goal

Type 1: Causal Models

Goal

Determine whether a specific predictor is causally related to the outcome

Requirements:

  • ✓ P-values must be accurate
  • ✓ Effect estimates must be “accurate”
  • ✓ Confounders must be included
  • ✓ Collinear variables should be excluded (preserve power)
  • ✓ Other significant predictors can be included (enhance power)

Example Question:

Does diet soda consumption cause increased risk of vascular events?

Key Challenge:

Identifying and adjusting for all relevant confounders

Causal Model: Diet Soda Example

Published Study

Diet Soft Drink Consumption is Associated with an Increased Risk of Vascular Events in the Northern Manhattan Study

Hannah Gardener, ScD; Tatjana Rundek, MD PhD; Matthew Markert, MS; Clinton B. Wright, MD, MS; Mitchell S. V. Elkind, MD, MS; Ralph L. Sacco, MD, MS

J Gen Intern Med 27(9):1120–6, 2012

Background: Diet and regular soft drinks have been associated with diabetes and the metabolic syndrome, and regular soft drinks with coronary heart disease.

Objective: To determine the association between soft drinks and combined vascular events, including stroke.

Design: A population-based cohort study of stroke incidence and risk factors.

Participants: N=2,564, 36% men, mean age 69±10, 20% white, 23% black, 53% Hispanic from the Northern Manhattan Study.

Main Measures: We assessed diet and regular soft drink consumption using a food frequency questionnaire and evaluated the association with incident vascular events (ischemic stroke, MI, and vascular death).

Study: Northern Manhattan Study (Gardener et al., 2012)

Diet Soda: Sequential Adjustment Models

Model | Variables Included
Model 1 | Demographics: age, sex, race/ethnicity, education
Model 2 | Model 1 + Behavioral: smoking, physical activity, moderate alcohol, daily diet (total kcal, protein, total fat, saturated fat, carbohydrates, sodium), BMI
Model 3 | Model 2 + Vascular risk factors
Model 4 | Model 3 + Detailed metabolic: waist circumference, blood sugar, HDL cholesterol, LDL cholesterol, triglycerides, systolic BP, diastolic BP, anti-hypertensive medication use

Strategy: Progressive adjustment to test robustness of association

Diet Soda: Why This Sequence?

Model 1 (Demographics):

  • Establishes the baseline association, adjusted for age, sex, race/ethnicity, and education

Model 2 (+ Behavioral & Diet):

  • Adjusts for lifestyle confounders and dietary patterns
  • Tests whether the association is due to overall diet quality or health behaviors

Model 3 (+ Vascular Risk Factors):

  • Adjusts for known cardiovascular risk factors (diabetes, hypertension, etc.)
  • Tests whether the association is due to underlying disease predisposition

Model 4 (+ Detailed Metabolic):

  • Adds waist circumference, lipids, blood pressure, glucose
  • Tests mechanisms
  • Note: May be over-adjustment if these are mediators!

What type of model is this?

Causal model - trying to estimate causal effect of diet soda consumption

Northern Manhattan: Results Across Models

Key findings: Diet soft drinks show stronger association than regular soft drinks; adjustment for metabolic factors attenuates both associations

Northern Manhattan: Continuous Consumption

Exposure (continuous)¹ | Model 1 | Model 2 | Model 3 | Model 4
Regular soft drinks (per serving/wk) | 1.00 (0.99-1.02) | 1.00 (0.99-1.02) | 1.00 (0.99-1.02) | 1.00 (0.98-1.02)
Diet soft drinks (per serving/wk) | 1.03 (1.02-1.05) | 1.03 (1.01-1.05) | 1.03 (1.01-1.04) | 1.02 (1.00-1.05)

¹ Hazard ratios (95% CI) for each additional serving per week

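Interpretation: Each additional diet soft drink serving per week is associated with about a 3% higher hazard of vascular events in Model 1 (HR 1.03), attenuating slightly to about 2% after full adjustment in Model 4 (HR 1.02, with the lower confidence limit reaching 1.00)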
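A per-serving-per-week hazard ratio can be hard to read clinically, so it may help to rescale it to a daily serving. The sketch below assumes the log-linear dose-response that a continuous exposure term implies, and it uses the rounded hazard ratios from the table, so the results are approximate. The function name `rescale_ratio` is hypothetical.

```python
import math

def rescale_ratio(ratio_per_unit: float, units: float) -> float:
    """Rescale a per-unit ratio (HR or OR) to a contrast of `units` units.

    The model is linear on the log scale, so log(HR_k) = k * log(HR_1).
    """
    return math.exp(units * math.log(ratio_per_unit))

# Diet soft drinks, HR per serving/week (rounded values from the table).
hr_model1 = 1.03
hr_model4 = 1.02

# One serving per day corresponds to 7 servings per week.
daily_model1 = rescale_ratio(hr_model1, 7)
daily_model4 = rescale_ratio(hr_model4, 7)

print(f"Model 1: HR per daily serving = {daily_model1:.2f}")
print(f"Model 4: HR per daily serving = {daily_model4:.2f}")
```

Under these assumptions, a 3% increase per weekly serving compounds to roughly a 23% higher hazard for one serving per day, which is why per-unit estimates that look small can still matter.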

DAG Color Key & Labels

Throughout this lecture, DAGs use consistent colors and single-letter labels

Color Scheme:

  • Blue = Exposure (what we’re interested in)
  • Red = Outcome (what we’re measuring)
  • Green = Confounder (must adjust for)
  • Orange = Mediator (on causal pathway)
  • Purple = Collider (must NOT adjust for)

Single-Letter Labels:

To keep DAGs readable, each variable gets a single letter:

  • E = Exposure (or Exercise)
  • Y or M = Outcome (e.g., Mortality)
  • A = Age
  • C = Confounder
  • etc.

Full definitions appear below each DAG

Understanding the DAG for Diet Soda

Variables: D = Diet Soda (blue, exposure) | V = Vascular Events (red, outcome) | A = Demographics (green, confounder) | H = Health Behaviors (green, confounder) | Q = Diet Quality (green, confounder) | R = Vascular Risk Factors (green, confounder) | B = BMI (orange, mediator) | M = Metabolic Factors (orange, mediator)

Active Exercise: Identifying Confounders

Your Turn 📝

Consider a study of physical activity and mortality.

Which variables would you adjust for? Why?

  • Age?
  • Sex?
  • Smoking status?
  • Education?
  • Body mass index (BMI)?
  • Chronic diseases?
  • Diet quality?
  • Genetic factors?

Discussion Points

  • Which are confounders (common causes)?
  • Which might be mediators (on the causal pathway)?
  • Which might be colliders (common effects)?
  • How does this affect your adjustment strategy?

Type 2: Prediction Models

Goal

Build a model that can accurately predict outcomes for individual patients

Requirements:

  • ✓ Predictors should be easy to measure
  • ✓ Measured in advance of outcome
  • ✓ Predictions should be actionable
  • ✓ Enhance sensitivity and specificity (or R²)
  • ✓ Parsimonious (few predictors)
  • ✓ Make clinical sense

Example Question:

Can we predict which patients discharged on outpatient antibiotics will be readmitted within 30 days?

Key Difference from Causal:

We DON’T care if associations are causal!

Prediction Example: Hospital Readmissions

Published Study

Prediction Model for 30-Day Hospital Readmissions Among Patients Discharged Receiving Outpatient Parenteral Antibiotic Therapy

Genève M. Allison, Eavan G. Muldoon, David M. Kent, Jessica K. Paulus, Robin Ruthazer, Aretha Ren, and David R. Snydman

Clinical Infectious Diseases 2014;58(6):812–819

ABSTRACT

Background: Factors associated with readmission for patients prescribed outpatient parenteral antibiotic therapy (OPAT) at hospital discharge have not been definitively identified. The study aim was to develop a model of 30-day readmissions for OPAT patients.

Methods: A database comprising 782 OPAT patients treated between 2009 and 2011 at a single academic center was created. Variables collected included patient demographics, comorbidities, infections, and antibiotic classes. Final model discrimination was assessed using the c-statistic, and calibration was examined graphically.

Results: Mean patient age was 58 years (range, 18–95 years), 43% were women, and the most common diagnoses were bacteremia (24%), osteomyelitis (20%), and pyelonephritis (13%). The unplanned 30-day readmission rate was 26%. The leading indications for readmission were non–infection related (30%), worsening infection (29%), and new infection (19%).

OPAT Study: Prediction Goal

Study: 30-Day Readmissions for OPAT Patients (Allison et al., 2014)

Clinical Goal: Identify high-risk patients for enhanced follow-up

Candidate Predictors:

  • Demographics
  • Comorbidities
  • Infections
  • Antibiotic classes
  • Prior hospitalizations

Selection Strategy:

  1. Univariate screen (p < 0.2)
  2. Age forced in (clinical importance)
  3. Backward selection with AIC

Model discrimination: c-statistic = 0.61
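The c-statistic reported for this model is the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk. A minimal sketch of that definition follows. The predicted risks and outcomes are hypothetical and are not data from the OPAT study, and the function name `c_statistic` is an illustration rather than a library call.

```python
from itertools import product

def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as 1/2."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for r_event, r_non in product(events, non_events):
        if r_event > r_non:
            score += 1.0
        elif r_event == r_non:
            score += 0.5
    return score / (len(events) * len(non_events))

# Hypothetical predicted risks and observed outcomes (1 = readmitted).
predicted = [0.10, 0.20, 0.25, 0.30, 0.40, 0.55]
readmitted = [0, 0, 1, 0, 1, 1]

c = c_statistic(predicted, readmitted)
print(f"c-statistic = {c:.2f}")
```

A value of 0.5 means the model ranks pairs no better than chance and 1.0 means perfect ranking, so 0.61 indicates only modest discrimination.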

The Final Prediction Model

Predictors of 30-Day Hospital Readmission
Final multivariable model (N=782)

Predictor | Odds Ratio | 95% CI | P-value
Age (per 10 years) | 1.09 | 0.99-1.21 | 0.10
Aminoglycoside use | 2.33 | 1.17-4.57 | 0.01
Drug-resistant organisms | 1.57 | 1.03-2.36 | 0.03
Prior admissions (12 mo) | 1.20 | 1.09-1.32 | <0.001

Note: Charlson comorbidity score was forced in but removed due to non-significance
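In a logistic prediction model the coefficients act as weights that add on the log-odds scale, so the odds ratios multiply. The sketch below combines the four published odds ratios for one hypothetical patient profile. The slide does not report an intercept, so only odds relative to a reference patient can be computed, not an absolute readmission risk. The dictionary keys and the function `relative_odds` are hypothetical names.

```python
import math

# Odds ratios from the final multivariable model table.
ODDS_RATIOS = {
    "age_decades": 1.09,         # per 10 years of age
    "aminoglycoside": 2.33,      # 1 = used, 0 = not used
    "resistant_organism": 1.57,  # 1 = present, 0 = absent
    "prior_admissions": 1.20,    # per admission in the prior 12 months
}

def relative_odds(differences):
    """Odds of readmission relative to a reference patient.

    `differences` maps each predictor to (patient value - reference value).
    """
    log_odds = sum(math.log(ODDS_RATIOS[name]) * diff
                   for name, diff in differences.items())
    return math.exp(log_odds)

# Hypothetical patient: 20 years older than the reference, on an
# aminoglycoside, no resistant organism, 3 more prior admissions.
profile = {
    "age_decades": 2,
    "aminoglycoside": 1,
    "resistant_organism": 0,
    "prior_admissions": 3,
}

ratio = relative_odds(profile)
print(f"Odds relative to reference patient: {ratio:.2f}")
```

These weights are useful for ranking patients by risk. They are not estimates of what would happen if a predictor were changed.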

Causal vs Prediction: Key Differences

Aspect | Causal Model | Prediction Model
Primary Goal | Estimate causal effect | Predict individual outcomes
Confounding | Must adjust for all confounders | Don't care about confounding
Variable Selection | Theory-driven (DAG) | Performance-driven (AIC, cross-validation)
Interpretation | Coefficient = causal effect | Coefficient = prediction weight
Mediators | Usually exclude | Can include if they improve prediction
Colliders | Must avoid | Can include if they improve prediction
Model Fit | Secondary concern | Primary concern
Clinical Use | Guide interventions | Identify high-risk individuals

Check Your Understanding 🤔

Question

You’re building a model to predict which patients will develop diabetes. Should you include BMI in the model?

A. No, because BMI might be a mediator of the causal effect
B. No, because BMI is confounded with diet and exercise
C. Yes, if BMI improves prediction accuracy
D. It depends on whether BMI causes diabetes

Answer: C - In prediction models, we care about accuracy, not causal interpretation. If BMI helps predict diabetes (which it does!), include it.

Type 3: Association Models

Goal

Explore risk factors without estimating direct causal effects

Characteristics:

  • Often hypothesis-generating
  • May use univariate analyses
  • Don’t necessarily adjust for confounders
  • More exploratory than confirmatory

Example:

Screening multiple variables for association with disease

When Used:

  • Early-stage research
  • Generating hypotheses
  • Identifying potential risk factors for future study

Important:

These are NOT causal estimates!

Example: TB Risk Factors in HIV

Published Study

High prevalence of subclinical tuberculosis in HIV-1-infected persons without advanced immunodeficiency: implications for TB screening

Tolu Oni, Rachael Burke, Relebohile Tsekela, Nonzwakazi Bangani, Ronnett Seldon, Hannah P Gideon, Kathryn Wood, Katalin A Wilkinson, Tom H M Ottenhoff, Robert J Wilkinson

Thorax 2011;66:669-673

ABSTRACT

Background: The prevalence of asymptomatic tuberculosis (TB) in recently diagnosed HIV-1-infected persons attending pre-antiretroviral therapy (ART) clinics is not well described. In addition, it is unclear if the detection of Mycobacterium tuberculosis in these patients clearly represents an early asymptomatic phase leading to progressive disease or transient excretion of bacilli.

Objective: To describe the prevalence and outcome of subclinical TB disease in HIV-1-infected persons not eligible for ART.

Methods: Untreated adults with HIV presenting for outpatient care in Durban, South Africa were screened for tuberculosis-related symptoms and had sputum tested by acid-fast bacilli smear and tuberculosis culture. Active tuberculosis and subclinical tuberculosis were defined as having any tuberculosis symptom or no tuberculosis symptoms with culture-positive sputum. We evaluated the association between tuberculosis disease category and 12-month survival using Cox regression, adjusting for age, sex, and CD4 count.

Results: Among 654 participants, 96 were diagnosed with active tuberculosis disease and 28 with subclinical disease. The median CD4 count was 68 (interquartile range 39–161) cells/mm³ in patients with active tuberculosis.

TB Study: Association Analysis

Study: Subclinical TB in HIV-infected persons (Oni et al., 2011)

Risk Factors for Subclinical TB
Univariate and multivariable analyses

Risk Factor | Univariate OR | Univariate P-value | Multivariable OR | Multivariable P-value
CD4 count (per cell/mm³) | 0.996 | 0.035 | 0.996 | 0.060
Tuberculin skin test ≥5mm | 2.770 | 0.081 | 4.960 | 0.064
Days since HIV diagnosis | 1.000 | 0.095 | 1.006 | 0.056

Note: All three factors show “trends” but none reached statistical significance at α=0.05

Purpose: Hypothesis generation for future studies

Model Building Strategies: How Do We Choose Variables?

Clinical/Theory-Driven Approaches

1. Subject-Matter Expertise

Use DAGs to identify:

  • Confounders (must include)
  • Mediators (usually exclude)
  • Colliders (must exclude)
  • Precision variables (optional)

Best for: Causal inference
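The roles above can be read off a DAG mechanically once the arrows are written down. The sketch below stores a small hypothetical DAG as a dictionary of arrows and labels each variable with simple reachability rules. It is a teaching heuristic, not a full back-door analysis (DAGitty or ggdag should be used for real adjustment sets). The graph, the variable Z, and the function names are assumptions for illustration.

```python
def descendants(graph, node, avoid=None):
    """Nodes reachable from `node` along arrows, never passing through `avoid`."""
    seen = set()
    stack = [child for child in graph.get(node, []) if child != avoid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(c for c in graph.get(current, []) if c != avoid)
    return seen

def classify(graph, exposure, outcome):
    """Label each remaining variable as confounder, mediator, collider, or other."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    from_exposure = descendants(graph, exposure)
    from_outcome = descendants(graph, outcome)
    roles = {}
    for node in sorted(nodes - {exposure, outcome}):
        if node in from_exposure and outcome in descendants(graph, node):
            roles[node] = "mediator"      # on a path exposure -> ... -> outcome
        elif node in from_exposure and node in from_outcome:
            roles[node] = "collider"      # common effect of exposure and outcome
        elif (node not in from_exposure
              and exposure in descendants(graph, node)
              and outcome in descendants(graph, node, avoid=exposure)):
            roles[node] = "confounder"    # causes exposure, and outcome by another route
        else:
            roles[node] = "other"
    return roles

# Hypothetical DAG: E = Exercise, M = Mortality, A = Age, B = BMI,
# H = Hospitalization, Z = a variable that affects exercise only.
dag = {
    "A": ["E", "M"],
    "E": ["B", "M", "H"],
    "B": ["M"],
    "M": ["H"],
    "Z": ["E"],
}

roles = classify(dag, exposure="E", outcome="M")
print(roles)
```

Here only A would enter the adjustment set: B is on the causal pathway, H is a common effect, and Z affects mortality only through exercise.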

2. “Kitchen Sink”

Include all available predictors

Problems:

  • Overfitting
  • Loss of power
  • May include mediators/colliders
  • Difficult to interpret

Rarely recommended

Statistical Approaches

⚠️ Generally NOT Recommended for Causal Inference

These data-driven methods are commonly used but have serious limitations for estimating causal effects. DAG-based approaches are strongly preferred.

Variable Selection Methods

  1. Univariable screening
    • Include if p < 0.10 or 0.20
    • Problem: Ignores confounding structure; may exclude important confounders
  2. Change-in-estimate (Δβ)
    • Include if removing variable changes effect estimate >10%
    • Problem: Arbitrary threshold; can fail to detect confounding (see the next slide)
  3. Automated selection (stepwise, forward, backward)
    • Problem: Atheoretical; removes confounders based on p-values; overfitting
    • Never use for causal inference!
  4. Information criteria (AIC, BIC)
    • For causal models: Not appropriate - optimizes fit, not confounding control
    • For prediction models: Appropriate use

Why Change-in-Estimate Can Fail

Problems with the 10% Rule

The 10% rule can mislead when:

  1. Confounding is real but weak, so the estimate changes by <10%
  2. Multiple confounders have offsetting effects
  3. The variable is a collider: adjusting for it induces bias, yet the estimate changes, so it looks like a confounder that needs adjustment
  4. The threshold itself is arbitrary - why 10% and not 5% or 15%?

Better approach: Use DAG to identify confounders based on causal structure, not statistical criteria

The “Change-in-Estimate” Approach (For Reference Only)

Logic: If removing a variable substantially changes the exposure-outcome relationship, it’s likely a confounder

Typical threshold: >10% change in β coefficient

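Formula: \[\text{Change} = \frac{|\beta_{\text{reduced}} - \beta_{\text{full}}|}{|\beta_{\text{reduced}}|} \times 100\%\]

where \(\beta_{\text{reduced}}\) is the exposure coefficient from the model without the candidate variable and \(\beta_{\text{full}}\) is the coefficient from the model that includes it. Dividing by the reduced-model coefficient is the convention used in the example below; some authors divide by the full-model coefficient instead.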

Example:

Model | β (diet soda) | Change vs. previous model
Model 1: Age + Sex | 0.45 | Reference
Model 2: + Smoking | 0.38 | 16%
Model 3: + BMI | 0.36 | 5%
Model 4: + Exercise | 0.32 | 11%

Keep: Smoking, Exercise
Drop: BMI
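The keep/drop decisions in this example can be reproduced with a few lines of arithmetic. The sketch below divides each change by the coefficient from the model without the newly added variable, which is the convention that reproduces the percentages in the table. The function name `percent_change` and the 10% constant are illustrative.

```python
def percent_change(beta_without, beta_with):
    """Percent change in the exposure coefficient when a variable is added."""
    return abs(beta_without - beta_with) / abs(beta_without) * 100

# (variable added at this step, diet soda coefficient) from the example table.
models = [
    (None, 0.45),        # Model 1: Age + Sex (reference)
    ("Smoking", 0.38),   # Model 2
    ("BMI", 0.36),       # Model 3
    ("Exercise", 0.32),  # Model 4
]

THRESHOLD = 10.0  # the conventional (and arbitrary) 10% cut-off

changes = {}
for (_, beta_prev), (added, beta_new) in zip(models, models[1:]):
    changes[added] = percent_change(beta_prev, beta_new)

keep = [name for name, change in changes.items() if change > THRESHOLD]
drop = [name for name, change in changes.items() if change <= THRESHOLD]

for name, change in changes.items():
    print(f"{name}: {change:.0f}%")
print("Keep:", keep, "| Drop:", drop)
```

Because each variable is compared with the model immediately before it, the result depends on the order in which variables are added, which is one more reason to prefer a DAG over this rule.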

Active Exercise: Choose Your Strategy

Scenario

You’re studying the effect of coffee consumption on cardiovascular disease (CVD).

Available variables:

  • Demographics (age, sex, education)
  • Smoking status
  • Physical activity
  • BMI
  • Blood pressure
  • Cholesterol
  • Family history of CVD
  • Diet quality
  • Alcohol use
  • Sleep duration

Exercise Continued

Questions

  1. Draw a DAG showing the relationships between these variables, coffee, and CVD

  2. Identify:

    • Which variables are confounders?
    • Which might be mediators?
    • Any colliders?
  3. What adjustment strategy would you use?

Let’s Discuss

Work in pairs for 5 minutes, then we’ll share DAGs and strategies

The Importance of Well-Defined Interventions: A Fundamental Principle

Featured Paper

Does obesity shorten life? The importance of well-defined interventions to answer causal questions

MA Hernán and SL Taubman

International Journal of Obesity (2008) 32, S8–S14

ABSTRACT

Many observational studies have estimated a strong effect of obesity on mortality. In this paper, we explicitly define the causal question that is asked by these studies and discuss the problems associated with it. We argue that observational studies of obesity and mortality violate the condition of consistency of counterfactual (potential) outcomes, a necessary condition for meaningful causal inference, because (1) they do not explicitly specify the interventions on body mass index (BMI) that are being compared and (2) different methods to modify BMI may lead to different counterfactual mortality outcomes, even if they lead to the same BMI value in a given person.

Key insight: “Causal effects cannot be defined, much less computed, in the absence of a well-defined intervention.”

The Central Question

Before estimating any “causal effect,” ask:

“Can I describe a randomized trial that would answer this question?”

If not, the “causal effect” may be ill-defined.

A Tale of Two Policy Makers

The King’s Experiment

A despotic king funds three huge randomized trials (30 years, 1 million subjects each):

  1. Your trial: Intense exercise program (1 hr/day strenuous activity)
  2. Miguel’s trial: Comprehensive dietary intervention (calorie/carb restriction)
  3. Sarah’s trial: Combined exercise + diet program

Result: All three achieved the same BMI distribution, but different mortality rates!

Question: How many deaths are attributable to obesity?

Answer: It depends on the intervention!

The President’s Observational Study

Meanwhile, in a neighboring country…

The president analyzes 30 years of observational data:

  • Annual BMI measurements
  • Death dates
  • Lifestyle factors
  • Risk factors

Result: 150,000 annual excess deaths attributable to obesity

The Paradox:

  • Randomized trials couldn’t answer “What is the effect of obesity?”
  • Observational study seemingly could

Why? The observational study has an implicit intervention that isn’t well-defined!

The Problem: Consistency Violation

The Consistency Condition

For causal inference, we need well-defined counterfactual outcomes:

If person receives treatment \(a\), their outcome \(Y\) equals their potential outcome \(Y^a\)

For obesity: What does “receive obesity” mean?

  • Overeat?
  • Don’t exercise?
  • Genetic predisposition?
  • Some combination?

Why Consistency Matters for Obesity

Key Issue: Many paths lead to BMI≥30, each may have different effects on mortality

Variables: O = Obesity/BMI ≥30 (blue, exposure) | M = Mortality (red, outcome) | D = Diet (green, confounder) | E = Exercise (green, confounder) | G = Genetics (green, confounder) | I = Illness (green, confounder) | S = Smoking (green, confounder)

The Problem: All these variables are confounders - they affect both obesity AND mortality, creating multiple backdoor paths. But we can’t adequately adjust for them because we don’t know which specific mechanisms led each person to their current BMI!

Why This Creates Problems

The Confounding Challenge with Obesity

The issue is NOT that obesity has multiple causes - that’s true of many exposures.

The issue IS that:

  1. Different ways of becoming obese (diet vs exercise vs genetics) may have different effects on mortality

  2. An observational study comparing obese vs non-obese people implicitly compares whatever combination of mechanisms led people to their current BMI

  3. We can’t adjust for all these mechanisms (especially genetic/physiological ones)

  4. Even if we could measure them all, adjusting for them would leave us estimating the effect of the residual determinants of obesity - not a meaningful causal effect!

Better approach: Study well-defined interventions (Mediterranean diet, exercise programs) rather than obesity itself

Implications for Causal Inference

Three Key Conditions for Causal Inference

  1. Consistency: Well-defined interventions
  2. Exchangeability: No unmeasured confounding
  3. Positivity: All treatment levels possible in all strata

Violating consistency makes it harder to satisfy the other two!
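For reference, the three conditions are commonly written as follows. This is a sketch in standard potential-outcomes notation: \(A\) is the observed treatment and \(L\) the set of measured covariates (both symbols are introduced here and are not on the original slide), while \(Y^a\) is the potential outcome under treatment level \(a\), as above.

\[\text{Consistency: } A = a \implies Y = Y^a\]

\[\text{Exchangeability: } Y^a \perp\!\!\!\perp A \mid L \quad \text{for every } a\]

\[\text{Positivity: } \Pr(A = a \mid L = l) > 0 \quad \text{for every } a \text{ and every } l \text{ with } \Pr(L = l) > 0\]

Written this way, the dependence is visible: exchangeability and positivity are both statements about \(Y^a\) and \(A = a\), so they are hard to assess if \(a\) does not correspond to a well-defined intervention.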

Consistency and Exchangeability

With well-defined interventions (e.g., exercise program):

Confounders might include:

  • Age
  • Sex
  • Baseline health
  • Lifestyle factors

These can be measured!

With ill-defined interventions (e.g., “obesity”):

Confounders might include:

  • All causes of BMI AND mortality
  • Complex genetic factors
  • Physiological processes
  • Gene-environment interactions

These are very hard to measure!

Consistency and Positivity

The Positivity Problem

Scenario: Some genetic traits → BMI always >30

  • These genes also affect mortality
  • They’re confounders → must adjust
  • But no one in that genetic stratum has BMI <30!
  • Positivity violated

Solution: Restrict analysis → lose generalizability

Better solution: Study modifiable interventions (diet, exercise) instead of BMI
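Positivity can be checked empirically by tabulating exposure within each confounder stratum and looking for empty cells. A minimal sketch follows, using hypothetical (genotype, BMI category) records. The function name `positivity_violations` and the data are assumptions for illustration.

```python
from collections import defaultdict

def positivity_violations(records, levels):
    """Return {stratum: exposure levels never observed in that stratum}."""
    observed = defaultdict(set)
    for stratum, exposure in records:
        observed[stratum].add(exposure)
    return {
        stratum: sorted(set(levels) - seen)
        for stratum, seen in observed.items()
        if set(levels) - seen
    }

# Hypothetical (genotype, BMI category) pairs, one per person.
records = [
    ("common variant", "BMI<30"),
    ("common variant", "BMI>=30"),
    ("common variant", "BMI<30"),
    ("rare variant", "BMI>=30"),
    ("rare variant", "BMI>=30"),
]

violations = positivity_violations(records, levels=["BMI<30", "BMI>=30"])
print(violations)
```

An empty result does not prove positivity holds, because sparse strata can still give unstable estimates, but a non-empty result shows that the data contain no comparison group for that stratum.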

What About Mediators?

E = Exercise (blue) | M = Mortality (red) | B = BMI (orange, mediator) | O = Other Mechanisms (orange, mediator)

Key Insight:

BMI is often a mediator of other risk factors:

  • Exercise → BMI → Mortality
  • Diet → BMI → Mortality
  • Smoking → BMI → Mortality

Should we adjust for BMI?

  • If interested in total effect of exercise: NO
  • If interested in direct effect of exercise: MAYBE (but complex!)

Active Exercise: Well-Defined Interventions?

For each exposure, discuss:

Is the intervention well-defined? What would a trial look like?

  1. LDL cholesterol = 100 mg/dL
    • Well-defined? What intervention?
  2. Mediterranean diet
    • Well-defined? How would you implement?
  3. CD4 count = 500 cells/mm³ (in HIV)
    • Well-defined? What intervention?
  4. 30 minutes of moderate exercise daily
    • Well-defined? What’s the target trial?

Exercise: Discussion

Suggested Answers

  1. LDL = 100: Somewhat ill-defined
    • Statin? Diet? Exercise? Combination?
    • Different mechanisms may have different effects
  2. Mediterranean diet: Well-defined!
    • Clear intervention
    • Can be standardized
    • Good for causal inference
  3. CD4 = 500: Ill-defined
    • Antiretroviral therapy? Immune modulation?
    • Better to study ART adherence
  4. 30 min exercise: Well-defined!
    • Clear, implementable intervention
    • Though some details needed (intensity, type)

Practical Examples from Literature: Examining Real Studies

Example 1: Soda and Stroke

Published Study (Different from Northern Manhattan Study)

Soda consumption and the risk of stroke in men and women

Adam M Bernstein, Lawrence de Koning, Alan J Flint, Kathryn M Rexrode, and Walter C Willett

American Journal of Clinical Nutrition, 2012; 95(5):1190-1199

ABSTRACT

Background: Consumption of sugar-sweetened soda has been associated with an increased risk of cardiometabolic disease. The relation with cerebrovascular disease has not yet been closely examined.

Objective: Our objective was to examine patterns of soda consumption and substitution of alternative beverages for soda in relation to stroke risk.

Design: The Nurses’ Health Study, a prospective cohort study of 84,085 women followed for 28 y (1980–2008), and the Health Professionals Follow-Up Study, a prospective cohort study of 43,371 men followed for 22 y (1986–2008), provided data on soda consumption and incident stroke.

Results: We documented 1416 strokes in men during 841,770 person-years of follow-up and 2938 strokes in women during 2,188,230 person-years of follow-up. The pooled RR of total stroke for ≥1 serving of sugar-sweetened soda/d, compared with none, was 1.16 (95% CI: 1.00, 1.34). The pooled RR of total stroke for ≥1 serving of low-calorie soda/d was 1.16 (95% CI: 1.05, 1.28).

Example 1: Study Details

Bernstein et al. (2012): Nurses’ Health Study + Health Professionals Follow-Up Study

Design:

  • 127,456 participants
  • 28 years follow-up (NHS)
  • 22 years follow-up (HPFS)

Question: Effect of soda consumption on stroke risk

Approach: Comprehensive adjustment for:

  • Dietary factors
  • Physical activity
  • Smoking
  • Alcohol
  • Medical history
  • Medications

Question: Is This Causal?

Think About It

Given what we learned about well-defined interventions:

  1. What is the implicit intervention in this observational study?

  2. Is “soda consumption” well-defined?

  3. What alternatives would be better?

Discussion

The intervention is relatively well-defined!

  • “Drink ≥1 soda/day” is implementable
  • Unlike “have BMI of 30”
  • Could design an actual trial

But still challenges:

  • Type of soda matters
  • What’s replaced by soda?
  • Confounding by lifestyle

Example 2: Heart Failure Prediction

Published Study

A Multi-State Model to Predict Heart Failure Hospitalizations and All-Cause Mortality in Outpatients With Heart Failure With Reduced Ejection Fraction: Model Derivation and External Validation

Jenica N. Upshaw, MD, MS; Marvin A. Konstam, MD; David van Klaveren, MS; Farzad Noubary, PhD; Gordon S. Huggins, MD; and David M. Kent, MD, MS

Circulation: Heart Failure, 2019;12:e005625

ABSTRACT

Background: Outpatients with heart failure (HF) who are at high risk for HF hospitalization and/or death may benefit from early identification. We sought to develop and externally validate a model to predict both HF hospitalization and mortality that accounts for the semi-competing nature of the two outcomes and captures the risk associated with the transition from the stable outpatient state to the post-HF hospitalization state.

Methods and Results: A multi-state model to predict HF hospitalization and all-cause mortality was derived using data (n=3834) from the Heart Failure Endpoint evaluation of Angiotensin II Antagonist Losartan (HEAAL) study, a multinational randomized trial in patients with HF with reduced ejection fraction (HFrEF). The model predicts the hazard of death and HF hospitalization as a function of baseline patient characteristics and transitions among states.

Model Goal: Predict who will transition from stable state to hospitalization or death

Heart Failure: Multi-State Model

HEAAL Study Multi-State Model for HF Hospitalization and Mortality

Study Flow: Starting cohort n=3,834 patients with HFrEF → Excluded NYHA I (n=2) or IV (n=22) and missing data (n=24) → Model Development Cohort: n=3,786

Three Transitions Modeled:

  • Transition 1: Stable outpatient → HF hospitalization (n=937 events)
  • Transition 2: Stable outpatient → Death without hospitalization (n=754 events)
  • Transition 3: Post-HF hospitalization → Death (n=522 events)

Model Goal: Predict hazard of each transition using 13 baseline patient characteristics

Heart Failure Model: Variable Selection

Selection Criteria:

  1. Consistently associated with outcome
  2. Not rare (prevalence ≥5%)
  3. Reliably assessed (low inter-observer variability)
  4. Routinely collected

No data-driven selection!

Final Variables (13):

  • Age
  • Sex
  • NYHA class
  • LVEF (%)
  • Creatinine
  • Sodium
  • SBP
  • Weight
  • Diabetes
  • IHD
  • Atrial fibrillation
  • Prior stroke
  • ICD (added for calibration)

Heart Failure: Results

Multi-State Model Hazard Ratios
Selected predictors (5 of 13 shown)

Predictor | HR: Hospitalization | HR: Death
Age (per year) | 1.02 | 1.08
Male | 1.15 | 1.34
NYHA III vs II | 1.53 | 1.45
LVEF (per %) | 0.98 | 0.98
Creatinine (per mg/dL) | 1.15 | 1.13

C-statistic: 0.72 (good discrimination)

What Type of Model?

Quick Quiz

The heart failure model is:

A. A causal model
B. A prediction model
C. An association model
D. Could be any of these

Answer: B - Prediction model

Evidence:

  • Goal: identify high-risk patients
  • Focus on discrimination (c-statistic)
  • Variables chosen for reliability, not causal structure
  • No concern about confounding

Common Pitfalls in Model Building: What Can Go Wrong?

Pitfall 1: Kitchen Sink Confounding

The Problem

Including all measured variables in a causal model

Why it’s wrong:

  • May include mediators (blocks causal pathway)
  • May include colliders (induces bias!)
  • Loss of statistical power
  • Difficult interpretation

Example: Studying exercise → mortality, adjusting for BMI, blood pressure, cholesterol, diabetes…

Better: Use a DAG to identify only confounders

Pitfall 2: Atheoretical Variable Selection

The Problem:

Using stepwise selection for causal inference

Why it’s wrong:

  • Ignores causal structure
  • Removes “non-significant” confounders
  • Overfits to sample
  • P-values invalid after selection

When it’s OK:

Prediction models with:

  • Cross-validation
  • Large sample size
  • Acknowledgment of limitations

Never OK: Causal inference

Pitfall 3: Adjusting for Mediators

Variables: E = Exercise (blue, exposure) | M = Mortality (red, outcome) | W = Weight Loss (orange, mediator)

If interested in total effect: Don’t adjust for weight loss
If interested in direct effect: Mediation analysis needed
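A small simulation can make the total-versus-direct distinction concrete. The sketch below generates data from the DAG above with hypothetical linear effects, then compares the exercise coefficient with and without adjustment for weight loss. All effect sizes are assumptions for illustration. Adjustment recovers the direct effect here only because the simulation is linear and has no confounding of the mediator-outcome relationship, which is rarely guaranteed in real data.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope from regressing y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

# Hypothetical effect sizes (assumptions, not estimates from any study).
E_TO_W = 0.5    # exercise -> weight loss
W_TO_M = -0.6   # weight loss -> mortality risk score
DIRECT = -0.2   # exercise -> mortality through other mechanisms

random.seed(6202)
n = 20_000
exercise = [random.gauss(0, 1) for _ in range(n)]
weight_loss = [E_TO_W * e + random.gauss(0, 1) for e in exercise]
mortality = [DIRECT * e + W_TO_M * w + random.gauss(0, 1)
             for e, w in zip(exercise, weight_loss)]

# Unadjusted: total effect = DIRECT + E_TO_W * W_TO_M = -0.5
total_hat = slope(exercise, mortality)

# Adjusted for weight loss (partial regression): direct effect = -0.2
direct_hat = slope(residuals(weight_loss, exercise),
                   residuals(weight_loss, mortality))

print(f"Unadjusted (total):           {total_hat:.2f}")
print(f"Adjusted for mediator (direct): {direct_hat:.2f}")
```

Adjusting for the mediator removes the part of the benefit that works through weight loss, so reporting the adjusted coefficient as "the effect of exercise" would understate it.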

Pitfall 4: Ignoring Colliders

Variables: E = Exercise (blue, exposure) | M = Mortality (red, outcome) | G = Genetics (green, confounder) | H = Hospitalization (purple, collider)

Adjusting for hospitalization creates spurious association between exercise and genetics!
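Collider bias is easy to demonstrate by simulation. In the sketch below, exercise and genetic risk are generated independently, and hospitalization depends on both. Restricting to hospitalized people (one form of adjusting for the collider) then produces a correlation that does not exist in the full population. The independence of exercise and genetics, the threshold, and the effect sizes are simplifying assumptions made to isolate the collider effect.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(6202)
n = 20_000

# Independent by construction: no arrow between exercise and genetics.
exercise = [random.gauss(0, 1) for _ in range(n)]
genetics = [random.gauss(0, 1) for _ in range(n)]

# Hospitalization is more likely with high genetic risk and low exercise.
hospitalized = [g - e + random.gauss(0, 1) > 1.0
                for e, g in zip(exercise, genetics)]

r_everyone = pearson(exercise, genetics)
r_hospitalized = pearson(
    [e for e, h in zip(exercise, hospitalized) if h],
    [g for g, h in zip(genetics, hospitalized) if h],
)

print(f"Correlation in everyone:          {r_everyone:.2f}")
print(f"Correlation among hospitalized:   {r_hospitalized:.2f}")
```

Among hospitalized people, those who exercise a lot must have had high genetic risk to end up in hospital, which is what creates the spurious association.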

Pitfall 5: Confusing Prediction and Causation

Common Mistake

Using a prediction model to make causal claims

Why it’s wrong: - Prediction models don’t address confounding - May include mediators and colliders - Optimized for discrimination, not unbiased estimation

Example: “This model predicts mortality, therefore these variables cause mortality”

A Word of Caution: JAMA Users’ Guide

Adjusted Analyses in Studies Addressing Therapy and Harm

JAMA Users’ Guides to the Medical Literature
Thomas Agoritsas, Arnaud Merglen, Nilay D. Shah, Martin O’Donnell, Gordon H. Guyatt

Key Points:

“Observational studies almost always have bias because prognostic factors are unequally distributed between patients exposed or not exposed to an intervention. The standard approach to dealing with this problem is adjusted or stratified analysis. Its principle is to use measurement of risk factors to create prognostically homogeneous groups and to combine effect estimates across groups.”

If RCTs cannot be conducted, it will remain impossible to determine whether adjusted estimates are accurate or misleading.

The three conditions needed:

  1. Investigators identify all known prognostic factors
  2. Investigators accurately measure all these factors
  3. Investigators conduct adjusted analysis that includes all these factors
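The stratified analysis described in the quotation can be sketched with the Mantel-Haenszel odds ratio, one common way to combine effect estimates across prognostically homogeneous groups. The counts below are hypothetical and were chosen so that age fully explains the crude association. The function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed cases, non-cases; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts per age stratum:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
strata = {
    "younger": (10, 90, 20, 180),
    "older": (80, 120, 20, 30),
}

# Collapsing over age gives the crude (unadjusted) table.
crude_table = tuple(sum(t[i] for t in strata.values()) for i in range(4))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())

print(f"Crude OR:        {crude_or:.2f}")
print(f"Age-adjusted OR: {adjusted_or:.2f}")
```

In this constructed example the exposure is more common and the outcome more frequent among older people, so the crude odds ratio of 2.25 disappears once the groups are compared within age strata. The adjusted estimate is only as good as the list of measured prognostic factors, which is the caution the Users' Guide raises.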

Final Exercise: Critique These Approaches

Scenario 1

“We used stepwise selection (p<0.05 to enter, p>0.10 to remove) to identify confounders for the effect of vitamin D on cardiovascular disease.”

What’s wrong?

Scenario 2

“Our prediction model includes the following variables that cause hospital readmission…”

What’s wrong?

Scenario 3

“We adjusted for all available covariates to eliminate confounding.”

What’s wrong?

Critique: Answers

Scenario 1

Problem: Stepwise selection is atheoretical and may remove important confounders based on p-values

Better: Use DAG to identify confounders based on causal structure

Scenario 2

Problem: Prediction models don’t identify causes; they identify associations

Better: “Variables that predict readmission” not “cause readmission”

Scenario 3

Problem: “Kitchen sink” approach may include mediators and colliders

Better: Adjust only for confounders identified via DAG

Best Practices: Guidelines for Your Research

Best Practice 1: Start with the Question

Before ANY analysis:

Ask yourself:

  1. What is my research question?
  2. What type of inference do I need?
    • Causal effect?
    • Prediction?
    • Exploration?
  3. Can I describe a trial that would answer this?

Best Practice 2: Use DAGs for Causal Questions

Steps:

  1. Draw causal relationships
  2. Identify confounders
  3. Identify mediators
  4. Check for colliders
  5. Determine minimal adjustment set

Tools:

  • DAGitty (dagitty.net)
  • ggdag (R package)
  • Draw.io
  • Paper and pencil!

Get feedback from colleagues/experts
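
The steps above can be sketched with the dagitty R package. The DAG below is a hypothetical version of the exercise example (E = exercise, M = mortality, G = genetics, W = weight loss, H = hospitalization); the arrows are assumptions chosen for illustration, not a claim about the true causal structure.

```r
library(dagitty)

# Step 1: draw the assumed causal relationships
exercise_dag <- dagitty('dag {
  E [exposure]
  M [outcome]
  G -> E
  G -> M
  E -> W
  W -> M
  E -> M
  E -> H
  G -> H
}')

# Steps 2-4: list every path from exposure to outcome and whether it is open.
# Here G sits on a backdoor path (confounder), W on a causal path (mediator),
# and H is a common effect of E and G (collider).
all_paths <- paths(exercise_dag, from = "E", to = "M")
print(all_paths)

# Step 5: minimal adjustment set(s) for the total effect
adj <- adjustmentSets(exercise_dag, effect = "total")
print(adj)

# Check candidate sets: the confounder alone vs. confounder plus mediator
g_only_ok     <- isAdjustmentSet(exercise_dag, "G")
g_mediator_ok <- isAdjustmentSet(exercise_dag, c("G", "W"))
```

Changing any arrow changes the answer, which is why the DAG should be reviewed by subject-matter experts before it drives the analysis.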

Best Practice 3: Pre-Specify Your Approach

Before looking at results:

Write down:

  • Primary exposure/outcome
  • Expected confounders (with justification)
  • Analysis plan
  • Sensitivity analyses

Consider: Pre-registration or analysis plan document

Best Practice 4: Be Transparent

In your paper, report:

  1. Type of model (causal vs prediction vs association)
  2. Variable selection process
  3. DAG (for causal models)
  4. All models fit (not just final)
  5. Rationale for inclusion/exclusion
  6. Sensitivity analyses
  7. Limitations of approach

Best Practice 5: Match Method to Goal

Research Goal | Appropriate Methods | Avoid
Causal Effect | DAG-based selection, Change-in-estimate, Theory-driven | Stepwise selection, Kitchen sink
Prediction | Cross-validation, AIC/BIC, LASSO, Regularization | Including variables just because they’re confounders
Association/Exploration | Univariate screening, Multiple testing correction | Making causal claims
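
For the prediction row, a minimal base-R sketch of 5-fold cross-validation, assuming simulated data and a hypothetical helper `cv_brier()` written for this example. It compares candidate models by out-of-sample Brier score (mean squared error of predicted probabilities), which is a performance criterion and says nothing about confounding.

```r
# Simulated data: coefficients are illustrative assumptions
set.seed(6202)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-1 + 0.8 * x1 + 0.5 * x2))
dat <- data.frame(y, x1, x2)

# Randomly assign each row to one of 5 folds
folds <- sample(rep(1:5, length.out = n))

# Hypothetical helper: cross-validated Brier score for a logistic model.
# Assumes the outcome column is named y.
cv_brier <- function(formula, data, folds) {
  fold_errors <- sapply(sort(unique(folds)), function(k) {
    fit <- glm(formula, family = binomial, data = data[folds != k, ])
    p   <- predict(fit, newdata = data[folds == k, ], type = "response")
    mean((data$y[folds == k] - p)^2)
  })
  mean(fold_errors)
}

brier_null <- cv_brier(y ~ 1, dat, folds)        # intercept-only benchmark
brier_full <- cv_brier(y ~ x1 + x2, dat, folds)  # candidate prediction model

print(round(c(null = brier_null, full = brier_full), 3))
```

A lower cross-validated Brier score justifies keeping a variable in a prediction model. It does not justify keeping or dropping a confounder in a causal model.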

Summary and Key Takeaways

Key Points

  1. Different research questions require different modeling approaches
    • Causal inference ≠ Prediction ≠ Association
  2. Causal inference requires well-defined interventions
    • Can you describe a trial that would answer your question?
    • Ill-defined exposures (like BMI) create serious problems
  3. Variable selection should be theory-driven for causal models
    • Use DAGs, not stepwise selection
    • Adjust for confounders, not mediators or colliders
  4. Prediction models have different priorities
    • Focus on discrimination/calibration
    • Variable selection based on performance
  5. Transparency is essential
    • Report your reasoning
    • Show all models
    • Acknowledge limitations

The Three Conditions for Causal Inference

Remember:

  1. Consistency: Well-defined interventions
  2. Exchangeability: No unmeasured confounding
  3. Positivity: Every treatment level is possible within every confounder stratum

Violating consistency makes the other two harder to achieve!
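
Positivity is the one condition that can be partly checked in the data. A minimal sketch, assuming simulated data in which nobody in the oldest age group is treated; the age groups and treatment probabilities are made up for illustration.

```r
# Simulated data: treatment probability depends on age group, and is
# set to zero in the oldest group to create a positivity violation
set.seed(6202)
n <- 5000

age_group <- sample(c("<50", "50-69", "70+"), n, replace = TRUE)
p_treat   <- unname(c("<50" = 0.5, "50-69" = 0.3, "70+" = 0)[age_group])
treated   <- factor(rbinom(n, 1, p_treat), levels = 0:1)

# Cross-tabulate treatment by stratum and flag strata with an empty cell
tab <- table(age_group, treated)
print(tab)

strata_violating <- rownames(tab)[apply(tab == 0, 1, any)]
print(strata_violating)
```

With many confounders a simple table is not feasible; inspecting the overlap of estimated propensity scores is a common alternative.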

Questions to Ask When Reading Papers

  1. What type of model is this (causal/prediction/association)?
  2. Does the analysis match the stated goal?
  3. For causal models:
    • Is the exposure well-defined?
    • What’s the implied intervention?
    • Are confounders appropriately identified?
    • Any mediators or colliders adjusted for?
  4. For prediction models:
    • Is performance adequately assessed?
    • Is there validation?
    • Are causal claims avoided?

Additional Resources

  • DAGitty: dagitty.net (online DAG tool)
  • Causal Inference Book: Hernán & Robins (free online)
  • ggdag vignette: R documentation
  • STROBE guidelines: Reporting observational studies
  • TRIPOD guidelines: Reporting prediction models

Practice Assignment

For Next Class

Find a published paper from your field that uses multivariable regression.

Analyze:

  1. What type of model is it?
  2. What variable selection method was used?
  3. Is it appropriate for the research question?
  4. What would you do differently?

Be prepared to discuss your paper in class (about 5 minutes)

The Bigger Picture: Beyond Confounding

Using Design Thinking to Differentiate Useful From Misleading Evidence in Observational Research

EDITORIAL - Steven N. Goodman, Sebastian Schneeweiss, Michael Baiocchi

“Valid causal inference from nonrandomized studies about treatment effects depends on many factors other than confounding. These include whether the causal question motivating the study is clearly specified, whether the design matches that question and avoids design biases, whether the analysis matches the design, the appropriateness and quality of the data, the fit of adjustment models, and the potential for model searching to find spurious patterns in vast data streams…”

“The hallmark of a well-posed causal question is that one can describe an RCT that would answer it…”

“…how close observational studies can come, how this can be determined, and what else can be reliably learned from them are critical questions for continued research.”

Final Thoughts

Remember

  • Think before you model
  • Match method to question
  • Be transparent
  • Consider the intervention
  • Use DAGs for causal questions

“The hallmark of a well-posed causal question is that one can describe an RCT that would answer it.”

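Source: Goodman, Schneeweiss & Baiocchi, “Using Design Thinking to Differentiate Useful From Misleading Evidence in Observational Research”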

Questions?

Next lecture: Comparing experimental vs observational study designs

Appendix: DAG Resources

Creating DAGs in R

library(ggdag)
library(dagitty)
library(ggplot2)  # for scale_color_manual()

# Define causal structure with single-letter labels
dag <- dagitty('dag {
  E [exposure,pos="0,1"]
  Y [outcome,pos="2,1"]
  C [pos="1,0"]
  
  C -> E
  C -> Y
  E -> Y
}')

# Plot with colors and labels on nodes
ggdag(dag, node_size = 20, text_size = 6) +
  geom_dag_node(aes(color = name)) +
  geom_dag_text(color = "white", size = 6) +
  scale_color_manual(
    values = c(
      "E" = "#3498db",  # Exposure - blue
      "Y" = "#e74c3c",  # Outcome - red
      "C" = "#2ecc71"   # Confounder - green
    ),
    guide = "none"
  ) +
  theme_dag()

# Find adjustment sets
adjustmentSets(dag)

Where: E = Exposure, Y = Outcome, C = Confounder

DAG Example: Smoking and Lung Cancer

library(ggdag)
library(dagitty)  # provides dagitty()
library(ggplot2)  # for scale_color_manual() and theme()

smoking_dag <- dagitty('dag {
  S [exposure,pos="0,2"]
  L [outcome,pos="3,2"]
  A [pos="1,0"]
  E [pos="1,1"]
  D [pos="2,3"]
  P [pos="1.5,0.5"]
  
  A -> S
  A -> L
  E -> S
  E -> L
  E -> P
  P -> L
  S -> L
  S -> D
  D -> L
}')

ggdag(smoking_dag, node_size = 22, text_size = 6.5) +
  geom_dag_node(aes(color = name)) +
  geom_dag_text(color = "white", size = 6.5) +
  scale_color_manual(
    values = c(
      "S" = "#3498db",  # Exposure - blue
      "L" = "#e74c3c",  # Outcome - red
      "A" = "#2ecc71",  # Confounder - green
      "E" = "#2ecc71",  # Confounder - green
      "P" = "#2ecc71",  # Confounder - green
      "D" = "#f39c12"   # Mediator - orange
    ),
    guide = "none"
  ) +
  theme_dag() +
  theme(legend.position = "none")

Variables: S = Smoking (blue, exposure) | L = Lung Cancer (red, outcome) | A = Age (green, confounder) | E = SES (green, confounder) | P = Air Pollution (green, confounder) | D = Alcohol (orange, mediator)

Find Minimal Adjustment Set

# What should we adjust for?
adjustmentSets(smoking_dag)
#> { A, E }

# Check if specific set is sufficient
# A = Age, E = SES
isAdjustmentSet(smoking_dag, c("A", "E"))
#> [1] TRUE

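Result: {A, E} is a sufficient adjustment set (Age and SES). Air pollution (P) does not need to be added because its only backdoor path to smoking runs through SES, which is already adjusted for. Alcohol (D) is a mediator and should not be adjusted for when estimating the total effect.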

Thank You!

Questions?