We will create our population and simulate two scenarios:
- Observational Study: People choose whether to drink coffee, and that choice is influenced by whether they smoke. This introduces confounding.
- Randomized Trial (RCT): We randomly assign people to drink coffee, breaking the link with smoking. This is the gold standard for causal inference.
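The two scenarios can be sketched roughly as follows. This is a minimal illustration, not the original simulation code: the function name `simulate`, the population size, the seed, and every probability (40% smokers, coffee uptake of 0.7 among smokers and 0.2 among non-smokers, disease risk of 0.2 for smokers and 0.1 for non-smokers) are assumptions chosen only so that coffee has no effect on the outcome while smoking drives both coffee drinking and disease.

```python
import random


def simulate(n=100_000, seed=42):
    """Simulate one population under both study designs (all parameters are assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.4  # assumed smoking prevalence

        # Observational study: the choice to drink coffee depends on smoking.
        coffee_obs = rng.random() < (0.7 if smoker else 0.2)

        # Randomized trial: a fair coin decides, independent of smoking.
        coffee_rct = rng.random() < 0.5

        # The outcome depends on smoking only; coffee has no causal effect,
        # so the same outcome applies under either design.
        disease = rng.random() < (0.2 if smoker else 0.1)

        rows.append(
            {
                "smoker": smoker,
                "coffee_obs": coffee_obs,
                "coffee_rct": coffee_rct,
                "disease": disease,
            }
        )
    return rows


population = simulate()
print(f"Simulated {len(population)} people.")
```

Because the outcome never looks at coffee, any association between coffee and disease in the observational arm can only come from smoking.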
Data generation complete.
Now, let’s calculate the risk ratio from both studies. The observational study gives a biased result, making coffee look harmful, while the RCT correctly finds no effect.
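For reference, the risk ratio is the risk of the outcome among the exposed divided by the risk among the unexposed; a value near 1 indicates no association. A minimal sketch of that calculation, using a hypothetical helper `risk_ratio` and a made-up eight-person toy dataset (not the study data), might look like this:

```python
def risk_ratio(exposed, outcome):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    exposed_outcomes = [y for x, y in zip(exposed, outcome) if x]
    unexposed_outcomes = [y for x, y in zip(exposed, outcome) if not x]
    risk_exposed = sum(exposed_outcomes) / len(exposed_outcomes)
    risk_unexposed = sum(unexposed_outcomes) / len(unexposed_outcomes)
    return risk_exposed / risk_unexposed


# Toy data: 2 of 4 exposed people and 1 of 4 unexposed people have the outcome.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 1, 0, 0, 0]
print(risk_ratio(exposed, outcome))  # 2.0
```

The same function applies to either study design; only the way exposure was assigned differs.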
Biased Observational Risk Ratio: 1.395 <-- Coffee appears harmful!
Unbiased Randomized Risk Ratio: 0.998 <-- Correctly finds no effect.
Here, we’ll apply inverse probability weighting (IPW) to the observational data. By weighting each person by the inverse of their probability of receiving the treatment they actually received, given their smoking status, we can balance the confounder (smoking) between the groups and get an unbiased estimate.
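The weighting step can be sketched as below. This is an illustration under stated assumptions rather than the original analysis: the helper names (`stratum_propensity`, `ipw_weights`, `group_mean`) are hypothetical, the simulated data uses assumed probabilities, and the propensity score is estimated as the observed share of coffee drinkers within each smoking stratum. With a single binary confounder that is a simple stand-in for a fitted propensity model such as logistic regression.

```python
import random
from collections import Counter


def stratum_propensity(treated, confounder):
    """Estimate P(treated | confounder) as the observed frequency in each stratum."""
    totals = Counter(confounder)
    treated_counts = Counter(c for t, c in zip(treated, confounder) if t)
    return {c: treated_counts[c] / totals[c] for c in totals}


def ipw_weights(treated, confounder):
    """Weight = 1 / P(receiving the treatment actually received | confounder).

    Assumes every stratum contains both treated and untreated people (positivity).
    """
    p = stratum_propensity(treated, confounder)
    return [1 / p[c] if t else 1 / (1 - p[c]) for t, c in zip(treated, confounder)]


def group_mean(values, treated, weights, group):
    """Weighted mean of `values` among people whose treatment status equals `group`."""
    num = sum(v * w for v, t, w in zip(values, treated, weights) if bool(t) == bool(group))
    den = sum(w for t, w in zip(treated, weights) if bool(t) == bool(group))
    return num / den


# Assumed observational data: smoking drives both coffee drinking and disease,
# while coffee itself has no effect on disease.
rng = random.Random(0)
n = 100_000
smoker = [rng.random() < 0.4 for _ in range(n)]
coffee = [rng.random() < (0.7 if s else 0.2) for s in smoker]
disease = [rng.random() < (0.2 if s else 0.1) for s in smoker]

results = {}
for label, w in [("Unweighted", [1.0] * n), ("Weighted", ipw_weights(coffee, smoker))]:
    prop_smoker_0 = group_mean(smoker, coffee, w, 0)
    prop_smoker_1 = group_mean(smoker, coffee, w, 1)
    rr = group_mean(disease, coffee, w, 1) / group_mean(disease, coffee, w, 0)
    results[label] = (prop_smoker_0, prop_smoker_1, rr)
    print(f"{label}: smokers {prop_smoker_0:.3f} vs {prop_smoker_1:.3f}, risk ratio {rr:.3f}")
```

After weighting, each coffee group stands in for the whole population, so the share of smokers is the same in both groups and the comparison is no longer distorted by smoking. This only removes confounding from variables included in the propensity estimate.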
Confounder Balance (Proportion of Smokers):
| Method | prop_smoker_group_0 | prop_smoker_group_1 |
|---|---|---|
| Unweighted | 0.198 | 0.701 |
| Weighted | 0.398 | 0.401 |
Corrected Weighted Risk Ratio: 1.014 <-- The confounding is corrected!