1 Marginal Structural Models (MSMs)

MSMs estimate the causal effect of time-varying exposures (e.g., protein levels) on outcomes (e.g., T2D). Standard regression models can introduce bias through mechanisms such as time-varying confounding and collider bias.

2 What is a Lagged Variable?

A lagged variable is the value of a variable at an earlier time point (e.g., protein at t-1), used as a predictor at a later time point (e.g., T2D at t).
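As a minimal sketch of how such a lag can be built from long-format data, the snippet below attaches each individual's previous-visit protein value to the current visit. The helper name `add_lag` and the toy values are assumptions made for illustration; only the column names `id`, `time`, `protein`, and `protein_lag` come from the dataset described later.

```python
from collections import defaultdict

def add_lag(rows, value_key="protein", lag_key="protein_lag"):
    """Return new rows with each individual's previous-visit value attached.

    The first visit of every individual has no predecessor, so its lag is None.
    """
    # Group visits by individual so that lags never cross between people.
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["id"]].append(row)

    lagged = []
    for pid in sorted(by_id):
        previous = None
        for row in sorted(by_id[pid], key=lambda r: r["time"]):
            lagged.append({**row, lag_key: previous})
            previous = row[value_key]
    return lagged

# Toy long-format data: two individuals, three visits each (values are made up).
rows = [
    {"id": 1, "time": 1, "protein": 2.1},
    {"id": 1, "time": 2, "protein": 2.4},
    {"id": 1, "time": 3, "protein": 2.0},
    {"id": 2, "time": 1, "protein": 3.3},
    {"id": 2, "time": 2, "protein": 3.1},
    {"id": 2, "time": 3, "protein": 3.6},
]
lagged_rows = add_lag(rows)
for r in lagged_rows:
    print(r)
```

Grouping by `id` before shifting matters: a plain shift over the whole table would leak the last value of one individual into the first visit of the next.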

MSMs correct for confounding with inverse probability weighting (IPW), whereas collider bias must be prevented through study design, for example by using lagged exposures rather than conditioning on a collider.


3 MSM and Lagged Proteins: Handling Confounding & Collider Bias

We use MSM and lagged proteins together to handle both confounding and potential collider bias:

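Challenge | Solution | Why It Works
---------------------------------------|------------------------------|--------------------------------
Time-varying confounding | MSM + IPW | Balances confounders over time
Potential collider bias (baseline CRP) | Lagged proteins (CRP at t-1) | Avoids spurious associations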
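The two rows of the table can be written down in the notation commonly used for MSMs. The symbols here are assumptions that do not appear in the original: A_{it} is the protein exposure of individual i at time t, L_{it} is the vector of time-varying confounders (BMI, activity, medication), Y_{it} is the T2D indicator, and an overbar denotes history up to that time. Under those assumptions, a sketch of the stabilized weight and of an MSM with a lagged exposure is:

```latex
SW_i(t) = \prod_{k=1}^{t}
  \frac{f\left(A_{ik} \mid \bar{A}_{i,k-1}\right)}
       {f\left(A_{ik} \mid \bar{A}_{i,k-1}, \bar{L}_{ik}\right)}

\operatorname{logit} \Pr\left(Y_{it}^{\bar{a}} = 1\right) = \beta_0 + \beta_1 a_{t-1}
```

The numerator conditions only on past exposure, the denominator additionally on confounder history, so the ratio departs from 1 only to the extent that the confounders predict exposure. The outcome model is then fitted to the weighted data, with the exposure entering at t-1.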

4 How These Methods Work Together

✅ MSMs handle time-varying confounders using IPW.

✅ Lagged proteins replace problematic baseline adjustments to avoid collider bias.
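To make the weighting step concrete, here is a small sketch of how per-visit probability ratios accumulate into a stabilized weight for each individual and time point. The fields `p_num` and `p_den` are hypothetical: they stand for fitted probabilities of the exposure actually observed at that visit, given exposure history only (`p_num`) and given exposure history plus time-varying confounders (`p_den`). The numbers are made up.

```python
def stabilized_weights(visits):
    """Cumulative stabilized weight per (id, time).

    At each visit the ratio p_num / p_den is multiplied into a running
    product for that individual, so later visits carry the full history.
    """
    weights = {}
    running = {}
    for v in sorted(visits, key=lambda v: (v["id"], v["time"])):
        ratio = v["p_num"] / v["p_den"]
        running[v["id"]] = running.get(v["id"], 1.0) * ratio
        weights[(v["id"], v["time"])] = running[v["id"]]
    return weights

# Hypothetical fitted probabilities for two individuals over three visits.
visits = [
    {"id": 1, "time": 1, "p_num": 0.5, "p_den": 0.25},
    {"id": 1, "time": 2, "p_num": 0.5, "p_den": 0.5},
    {"id": 1, "time": 3, "p_num": 0.4, "p_den": 0.8},
    {"id": 2, "time": 1, "p_num": 0.6, "p_den": 0.6},
    {"id": 2, "time": 2, "p_num": 0.3, "p_den": 0.6},
    {"id": 2, "time": 3, "p_num": 0.5, "p_den": 0.5},
]
w = stabilized_weights(visits)
for key in sorted(w):
    print(key, w[key])
```

In a real analysis the two probabilities would come from fitted exposure models, and very large weights would usually be inspected before fitting the outcome model.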


5 Causal Structures


6 Simulation Example

I simulated a longitudinal dataset with 500 individuals across 4 time points to study the relationship between protein levels and T2D using an MSM with IPW and lagged variables.

6.1 Variables in the Dataset

  • id: Unique identifier for each individual.
  • time: Time point (1 to 4).
  • protein: Protein levels measured at each time point.
  • bmi: Time-varying Body Mass Index (BMI).
  • activity: Physical activity (binary: 0 = low, 1 = high).
  • medication: Medication use (binary: 0 = no, 1 = yes).
  • t2d: Type 2 Diabetes onset (binary: 0 = no, 1 = yes).
  • protein_lag: Lagged protein levels (previous time point).
  • weights: Stabilized Inverse Probability Weights (IPWs) used in the MSM.

6.2 Simulation Details

  • Protein levels change over time, influenced by BMI, activity, and medication.
  • T2D risk is simulated to give a realistic prevalence of roughly 10-20%.
  • We aim to avoid collider bias by excluding baseline protein levels and using lagged protein values instead.
  • Time-varying confounders (BMI, activity, medication) are included, and IPW adjusts for them.
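The simulation described above could be sketched as follows. This is not the original simulation code: every coefficient, the random seed, the function name `simulate`, and the choice to draw T2D independently at each visit are assumptions chosen only so that the structure (500 individuals, 4 time points, protein driven by BMI, activity, and medication, T2D driven by lagged protein) matches the description. The `weights` column is omitted because it is estimated afterwards rather than simulated.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(n_ids=500, n_times=4, seed=2024):
    """Simulate long-format data; every coefficient below is an assumption."""
    rng = random.Random(seed)
    rows = []
    for pid in range(1, n_ids + 1):
        bmi = rng.gauss(27.0, 4.0)   # baseline BMI
        protein_lag = None           # no previous visit yet
        for t in range(1, n_times + 1):
            # Time-varying confounders: BMI drifts, activity and medication depend on BMI.
            bmi += rng.gauss(0.1, 0.5)
            c_bmi = bmi - 27.0
            activity = int(rng.random() < sigmoid(0.5 - 0.08 * c_bmi))
            medication = int(rng.random() < sigmoid(-1.5 + 0.10 * c_bmi))
            # Protein is influenced by BMI, activity, and medication.
            protein = (5.0 + 0.15 * c_bmi - 0.5 * activity
                       + 0.4 * medication + rng.gauss(0.0, 1.0))
            # T2D depends on the lagged protein value, not the current one.
            lag_effect = 0.0 if protein_lag is None else 0.3 * (protein_lag - 5.0)
            p_t2d = sigmoid(-1.6 + lag_effect + 0.06 * c_bmi - 0.3 * activity)
            t2d = int(rng.random() < p_t2d)
            rows.append({
                "id": pid, "time": t, "protein": protein, "bmi": bmi,
                "activity": activity, "medication": medication,
                "t2d": t2d, "protein_lag": protein_lag,
            })
            protein_lag = protein
    return rows

data = simulate()
prevalence = sum(r["t2d"] for r in data) / len(data)
print(len(data), "rows; T2D prevalence:", round(prevalence, 3))
```

The intercept of the T2D model is the main knob for prevalence; shifting it moves the overall rate without changing the assumed effect of lagged protein.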

6.3 Why This Approach?

  • Marginal Structural Models (MSMs) allow proper estimation of causal effects in the presence of time-varying confounding.
  • Lagged protein values prevent bias from conditioning on baseline protein, which may act as a collider.
  • IPW reweights the data to create a pseudo-population in which measured confounders are balanced across exposure levels, mimicking a randomized trial.
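The balancing property of IPW can be illustrated with a deliberately simplified sketch: a single time point, a binary exposure (`high_protein`), and one binary confounder (`high_bmi`). These simplifications, the variable names, and the probabilities are assumptions made for illustration, and the weights come from empirical frequencies rather than fitted models.

```python
import random
from collections import Counter

rng = random.Random(7)
n = 5000
pairs = []
for _ in range(n):
    high_bmi = int(rng.random() < 0.4)                              # confounder L
    high_protein = int(rng.random() < (0.7 if high_bmi else 0.3))   # exposure A depends on L
    pairs.append((high_protein, high_bmi))

n_a = Counter(a for a, _ in pairs)    # counts by exposure level
n_l = Counter(l for _, l in pairs)    # counts by confounder level
n_al = Counter(pairs)                 # joint counts

def sw(a, l):
    """Stabilized weight P(A=a) / P(A=a | L=l) from empirical frequencies."""
    p_a = n_a[a] / n
    p_a_given_l = n_al[(a, l)] / n_l[l]
    return p_a / p_a_given_l

def share_high_bmi(a, weighted):
    """Share of high-BMI individuals within exposure group a."""
    num = den = 0.0
    for ai, li in pairs:
        if ai != a:
            continue
        w = sw(ai, li) if weighted else 1.0
        num += w * li
        den += w
    return num / den

for label, weighted in (("unweighted", False), ("weighted", True)):
    print(label, share_high_bmi(1, weighted), share_high_bmi(0, weighted))

mean_weight = sum(sw(a, l) for a, l in pairs) / n
print("mean stabilized weight:", mean_weight)
```

Before weighting, the exposed group contains far more high-BMI individuals than the unexposed group. After weighting, the share is the same in both groups, which is the sense in which the weighted data resemble a randomized trial with respect to the measured confounder.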

This dataset is structured for causal inference and designed to address biases inherent in traditional regression models. By using lagged variables and IPW, we aim to keep the estimated effect of protein levels on T2D from being confounded by prior exposures, provided the weight models are correctly specified and there is no unmeasured confounding.



7 Conclusion


8 Caveat