Marginal structural models (MSMs) estimate the causal effect of time-varying exposures (e.g., protein levels) on outcomes (e.g., type 2 diabetes, T2D). Standard regression models can introduce bias due to:
Time-varying confounding – Past protein levels influence future levels and T2D risk. MSMs can use Inverse Probability Weighting (IPW) to adjust for these confounders.
Collider bias – Adjusting for a variable that is influenced by both the exposure and the outcome can create spurious associations. For example, baseline protein levels may be affected by the same genetics as future protein levels and by preclinical disease (i.e., reverse causation). MSMs do not inherently correct for collider bias; instead, I think we might avoid it by using lagged variables rather than conditioning on potential colliders.
A lagged variable is a past value of the same variable, used as a predictor for its future value.
For example, instead of adjusting for baseline CRP, we might use CRP from the previous time point (t-1) to account for prior inflammation without introducing collider bias.
This helps estimate how changes in protein levels over time influence T2D risk while avoiding spurious associations.
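To make the lagged variable described above concrete, here is a minimal sketch that attaches each person's previous-visit CRP to their current row. The records, the key names (`id`, `t`, `crp`, `crp_lag1`) and the helper `add_lag` are all made up for illustration, and the first visit gets no lag because there is nothing earlier to use.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (person, visit).
# The ids, visit numbers, and CRP values are invented for illustration.
rows = [
    {"id": 1, "t": 0, "crp": 1.2},
    {"id": 1, "t": 1, "crp": 1.8},
    {"id": 1, "t": 2, "crp": 2.5},
    {"id": 2, "t": 0, "crp": 0.7},
    {"id": 2, "t": 1, "crp": 0.9},
    {"id": 2, "t": 2, "crp": 0.8},
]

def add_lag(rows, value_key="crp", lag_key="crp_lag1"):
    """Return new rows with the previous visit's value attached, per person."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["id"]].append(row)

    lagged = []
    for person_rows in by_person.values():
        previous = None  # no earlier visit exists for the first time point
        for row in sorted(person_rows, key=lambda r: r["t"]):
            lagged.append({**row, lag_key: previous})
            previous = row[value_key]
    return lagged

lagged_rows = add_lag(rows)
for row in lagged_rows:
    print(row)
```

The lag is computed within each person, so one person's last visit never leaks into the next person's first. With a pandas data frame the same idea would be something like `df.sort_values(["id", "t"]).groupby("id")["crp"].shift(1)`.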
MSMs correct for confounding with IPW, while collider bias must be prevented through study design, such as using lagged exposures instead of conditioning on colliders.
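For reference, the stabilized weights usually used to fit an MSM have the general form below. The notation is my assumption, since none is defined here: $A_{ik}$ is person $i$'s exposure (protein level) at visit $k$, $\bar{A}_{i,k-1}$ is their exposure history up to the previous visit, $\bar{L}_{ik}$ is their time-varying confounder history up to visit $k$, and $V_i$ is a set of baseline covariates.

$$
SW_i(t) = \prod_{k=0}^{t} \frac{f\left(A_{ik} \mid \bar{A}_{i,k-1},\, V_i\right)}{f\left(A_{ik} \mid \bar{A}_{i,k-1},\, \bar{L}_{ik}\right)}
$$

Because protein levels are continuous, $f$ would be a conditional density (for example, from a linear model with normal errors) rather than a probability from a logistic model, which is one reason the weights can be unstable in practice.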
We use MSM and lagged proteins together to handle both confounding and potential collider bias:
| Challenge | Solution | Why It Works |
|---|---|---|
| Time-varying confounding | MSM + IPW | Balances confounders over time |
| Potential collider bias (baseline CRP) | Lagged proteins (CRP at t-1) | Avoids spurious associations |
✅ MSMs handle time-varying confounders using IPW.
✅ Lagged proteins replace problematic baseline adjustments to avoid collider bias.
Confounding: A confounder influences both the exposure and the outcome, creating a spurious association if not adjusted for (e.g., baseline protein levels affecting both future protein levels and T2D). MSMs can use Inverse Probability Weighting (IPW) to handle confounding.
Collider: A collider is influenced by both the exposure and the outcome, and adjusting for it can induce a false association (e.g., baseline protein levels affected by genetics and preclinical T2D). Collider bias is avoided by not conditioning on the collider and using alternative strategies like lagged variables.
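As a toy illustration of how weighting handles the confounding defined above, here is a minimal sketch for a single time point with a binary confounder `L` and a binary exposure `A`. The counts are invented, the weights come from simple empirical frequencies rather than a fitted model, and this is far simpler than a real MSM with continuous protein levels.

```python
from collections import Counter

# Hypothetical toy data: (L, A) pairs, where L is a binary confounder
# and A is a binary exposure (e.g. high vs. low protein level).
# Counts are invented so that exposure is more common when L = 1.
data = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 40

def stabilized_weights(data):
    """Stabilized IPW: P(A = a) / P(A = a | L = l), from empirical frequencies."""
    n = len(data)
    n_a = Counter(a for _, a in data)
    n_l = Counter(l for l, _ in data)
    n_la = Counter(data)
    weights = []
    for l, a in data:
        p_a = n_a[a] / n
        p_a_given_l = n_la[(l, a)] / n_l[l]
        weights.append(p_a / p_a_given_l)
    return weights

def weighted_share_of_l(data, weights, exposure):
    """Weighted proportion with L = 1 among people with A = exposure."""
    num = sum(w for (l, a), w in zip(data, weights) if a == exposure and l == 1)
    den = sum(w for (l, a), w in zip(data, weights) if a == exposure)
    return num / den

weights = stabilized_weights(data)
raw_exposed = sum(l for l, a in data if a == 1) / sum(1 for _, a in data if a == 1)
raw_unexposed = sum(l for l, a in data if a == 0) / sum(1 for _, a in data if a == 0)
print("unweighted share with L=1:", raw_exposed, raw_unexposed)
print("weighted share with L=1:  ",
      weighted_share_of_l(data, weights, 1),
      weighted_share_of_l(data, weights, 0))
```

Before weighting, 60% of the exposed but only 20% of the unexposed have `L = 1`. After weighting, both groups sit at the overall 40% (up to floating-point rounding), so `L` no longer differs between exposure groups in the weighted pseudo-population. Nothing comparable happens for a collider, which is why that side has to be handled by not conditioning on it.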
I simulated a longitudinal dataset with 500 individuals across 4 time points to study the relationship between protein levels and T2D using an MSM with IPW and lagged variables.
This dataset is structured for causal inference and designed to address biases inherent in traditional regression models. The aim of using lagged variables and IPW is that the estimated effect of protein levels on T2D is not confounded by prior exposures.
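For anyone who wants to play with the idea, here is a rough sketch of how a dataset of that shape (500 people, 4 time points) could be generated. It is not the simulation I actually ran: the confounder `bmi`, every coefficient, and the function name `simulate` are arbitrary assumptions chosen only to produce time-varying confounding, where past protein affects the confounder and the confounder affects both current protein and T2D.

```python
import math
import random

N_PEOPLE, N_TIMES = 500, 4

def simulate(seed=0):
    """Long-format rows; every coefficient here is an arbitrary assumption."""
    rng = random.Random(seed)
    rows = []
    for pid in range(N_PEOPLE):
        bmi = rng.gauss(0.0, 1.0)   # hypothetical standardized confounder
        protein_lag1 = None         # no previous visit at t = 0
        t2d = 0
        for t in range(N_TIMES):
            prev = 0.0 if protein_lag1 is None else protein_lag1
            if t > 0:
                # The confounder is itself affected by past exposure.
                bmi = 0.7 * bmi + 0.3 * prev + rng.gauss(0.0, 0.5)
            # Current protein depends on the confounder and its own past value.
            protein = 0.5 * bmi + 0.6 * prev + rng.gauss(0.0, 1.0)
            # Discrete-time hazard of T2D from lagged protein and current BMI.
            logit = -3.0 + 0.4 * prev + 0.5 * bmi
            hazard = 1.0 / (1.0 + math.exp(-logit))
            if t2d == 0 and rng.random() < hazard:
                t2d = 1             # once diagnosed, stays diagnosed
            rows.append({"id": pid, "t": t, "bmi": bmi, "protein": protein,
                         "protein_lag1": protein_lag1, "t2d": t2d})
            protein_lag1 = protein
    return rows

rows = simulate()
n_cases = sum(r["t2d"] for r in rows if r["t"] == N_TIMES - 1)
print(len(rows), "rows;", n_cases, "people with T2D by the last visit")
```

The output is one row per person-visit, with the lagged protein already attached, which is the layout the weight models and the outcome model would both need.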
As I said in lab meeting, I would consult a statistician with more experience than myself.
I’ve never implemented the approach I outlined; it’s purely hypothetical, and I could be mistaken. The model didn’t converge, but that issue isn’t specific to MSMs. It’s unclear to me whether baseline proteins act as confounders or colliders; that may depend on the protein itself.
I’m aware you may have already considered these things 😊
I’m sharing this in case you find it stimulating or useful. I think inverse probability weighting (IPW) is worth considering for analyses involving repeated measures.