Causal Types and Weights

1. Setup and Simulation Parameters

2. Data Generation: Observational vs. Randomized Worlds

We will create our population and simulate two scenarios:

Observational Study: People choose whether to drink coffee, and that choice is influenced by whether they smoke. Because smoking is also related to the outcome, this introduces confounding.
Randomized Trial (RCT): We randomly assign people to drink coffee, which breaks the link between coffee and smoking. This is the gold standard for causal inference.
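As a rough illustration of the two scenarios, here is a minimal Python sketch. All names and numbers in it (population size, smoking prevalence, the probabilities of drinking coffee, the disease risks) are hypothetical choices for illustration and are not taken from the original code. The key assumption is that coffee has no causal effect on the outcome, so any association seen in the observational world comes from smoking.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # hypothetical population size

# Hypothetical parameters, chosen for illustration only
P_SMOKER = 0.4
P_COFFEE_IF_SMOKER = 0.7
P_COFFEE_IF_NONSMOKER = 0.2
RISK_IF_SMOKER = 0.2
RISK_IF_NONSMOKER = 0.1

smoker = rng.random(n) < P_SMOKER

# Observational world: the chance of drinking coffee depends on smoking
coffee_obs = rng.random(n) < np.where(
    smoker, P_COFFEE_IF_SMOKER, P_COFFEE_IF_NONSMOKER
)

# Randomized world: a fair coin decides, independent of smoking
coffee_rct = rng.random(n) < 0.5

# The outcome depends on smoking only; coffee has no causal effect,
# so the same outcome vector applies in both worlds
disease = rng.random(n) < np.where(smoker, RISK_IF_SMOKER, RISK_IF_NONSMOKER)

print("Smokers among coffee drinkers (observational):", smoker[coffee_obs].mean())
print("Smokers among coffee drinkers (randomized):   ", smoker[coffee_rct].mean())
```

In the observational world smokers are over-represented among coffee drinkers, while in the randomized world the share of smokers among drinkers stays close to the population prevalence.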

Data generation complete.

3. Estimating Effects: The Naive Approach

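Now, let’s calculate the risk ratio (the risk of the outcome among coffee drinkers divided by the risk among non-drinkers) from both studies. The observational study gives a biased result, making coffee look harmful, while the RCT correctly finds no effect, with a risk ratio close to 1.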

Biased Observational Risk Ratio: 1.395 <-- Coffee appears harmful!

Unbiased Randomized Risk Ratio: 0.998 <-- Correctly finds no effect.
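For reference, a naive (unadjusted) risk ratio can be computed along the following lines. This is a sketch only: the function name `risk_ratio` and the toy counts are hypothetical and were not taken from the original analysis.

```python
import numpy as np

def risk_ratio(treated, outcome):
    """Crude risk ratio: risk among the treated / risk among the untreated."""
    treated = np.asarray(treated, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    risk_treated = outcome[treated].mean()
    risk_untreated = outcome[~treated].mean()
    return risk_treated / risk_untreated

# Toy cohort built from hypothetical counts:
# 400 coffee drinkers with 68 cases, 600 non-drinkers with 72 cases
coffee = np.repeat([True, False], [400, 600])
disease = np.concatenate([
    np.repeat([1, 0], [68, 332]),   # outcomes among coffee drinkers
    np.repeat([1, 0], [72, 528]),   # outcomes among non-drinkers
])

rr = risk_ratio(coffee, disease)
print(f"Risk ratio: {rr:.3f}")
```

Nothing in this calculation accounts for smoking, which is why it picks up the confounded association in the observational data.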

4. The Fix: Inverse Probability Weighting (IPW)

Here, we’ll apply IPW to the observational data. By weighting each person by the inverse of their probability of receiving the treatment they actually got, given their smoking status, we can balance the confounder (smoking) between the groups and get an unbiased estimate.
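One way to sketch the weighting step in Python is shown below. It assumes a single binary confounder, so the propensity score is simply the observed proportion treated within each smoking stratum; a real analysis with more covariates would usually fit a model instead. The function name `ipw_weights` and the toy counts are hypothetical.

```python
import numpy as np

def ipw_weights(treated, confounder):
    """Weight = 1 / P(treatment actually received | confounder level)."""
    treated = np.asarray(treated, dtype=bool)
    confounder = np.asarray(confounder, dtype=bool)
    weights = np.empty(treated.shape, dtype=float)
    for level in (True, False):
        in_stratum = confounder == level
        # Propensity score: share of this stratum that was treated
        p_treated = treated[in_stratum].mean()
        p_received = np.where(treated[in_stratum], p_treated, 1.0 - p_treated)
        weights[in_stratum] = 1.0 / p_received
    return weights

# Toy cohort (hypothetical counts): 400 smokers of whom 280 drink coffee,
# 600 non-smokers of whom 120 drink coffee
smoker = np.repeat([True, True, False, False], [280, 120, 120, 480])
coffee = np.repeat([True, False, True, False], [280, 120, 120, 480])

w = ipw_weights(coffee, smoker)

for label, weights in (("Unweighted", np.ones_like(w)), ("Weighted", w)):
    prop_0 = np.average(smoker[~coffee], weights=weights[~coffee])
    prop_1 = np.average(smoker[coffee], weights=weights[coffee])
    print(f"{label}\t{prop_0:.3f}\t{prop_1:.3f}")
```

After weighting, both groups have the same share of smokers as the full toy population, which is the balance property the weights are meant to deliver.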

Confounder Balance (Proportion of Smokers):

Method	prop_smoker_group_0	prop_smoker_group_1
Unweighted	0.198	0.701
Weighted	0.398	0.401


Corrected Weighted Risk Ratio: 1.014 <-- The confounding is corrected!
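The weighted risk ratio replaces each group's plain average with a weighted average. The sketch below illustrates this on a small deterministic cohort. The function name `weighted_risk_ratio`, the counts, and the stratum-level weights are all hypothetical, and the outcome is constructed so that risk depends on smoking only.

```python
import numpy as np

def weighted_risk_ratio(treated, outcome, weights):
    """Risk ratio using weighted mean risks in each treatment group."""
    treated = np.asarray(treated, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    weights = np.asarray(weights, dtype=float)
    risk_treated = np.average(outcome[treated], weights=weights[treated])
    risk_untreated = np.average(outcome[~treated], weights=weights[~treated])
    return risk_treated / risk_untreated

# Hypothetical strata: (drinks coffee, group size, cases, IPW weight)
strata = [
    (True, 280, 56, 1 / 0.7),   # smokers who drink coffee
    (False, 120, 24, 1 / 0.3),  # smokers who do not
    (True, 120, 12, 1 / 0.2),   # non-smokers who drink coffee
    (False, 480, 48, 1 / 0.8),  # non-smokers who do not
]
coffee = np.concatenate([np.full(size, c) for c, size, _, _ in strata])
disease = np.concatenate(
    [np.repeat([1, 0], [cases, size - cases]) for _, size, cases, _ in strata]
)
weights = np.concatenate([np.full(size, w) for _, size, _, w in strata])

rr_crude = weighted_risk_ratio(coffee, disease, np.ones_like(weights))
rr_weighted = weighted_risk_ratio(coffee, disease, weights)
print(f"Crude risk ratio:    {rr_crude:.3f}")
print(f"Weighted risk ratio: {rr_weighted:.3f}")
```

In this toy cohort the crude ratio is well above 1 while the weighted ratio returns to 1, mirroring the pattern in the results above. With simulated rather than exact counts, the weighted estimate would only be close to 1, as in the 1.014 reported here.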

5. Visualizing the Source of Bias: Causal Type Imbalance