Causal Machine Learning

Author

Richmond Silvanus Baye

Published

June 20, 2025

Introduction

This tutorial introduces explainable AI through the lens of causal machine learning (Causal ML). While most machine learning models focus purely on prediction, Causal ML aims to understand the why behind an outcome by estimating causal effects. For example, it asks whether effects differ across individuals or groups (heterogeneity), or what would have happened under a different exposure or policy (the counterfactual).

Causal ML sits at the intersection of machine learning and causal inference. It not only helps us predict outcomes but also allows us to simulate “what-if” scenarios and individual effects.

If you’d like to explore Causal ML in more depth, here are some excellent resources:

Use cases

Causal ML is used in a wide range of domains, including e-commerce, digital marketing, finance, and healthcare.

  • In health economics, Causal ML can assess the effects of policies on population outcomes.

  • In drug development, it enables individualized treatment effect estimation to support personalized clinical decision-making (patient voice).

In this tutorial, we’ll focus on an e-commerce-inspired use case from the ride-hailing industry (Uber).

Case Study: Causal Effect of Surge Pricing Opt-Out

Question:

What is the causal effect of opting out of surge pricing alerts on the number of rides taken during peak hours?

Problem

We cannot run a traditional A/B test by randomly forcing some users to accept surge pricing and others to opt out. Opt-out behavior is self-selected, meaning users decide based on their own characteristics:

  • Riders who are price-insensitive or have urgent travel needs may accept surge pricing.

  • Riders who are price-sensitive or in non-urgent situations are more likely to opt out.

This introduces selection bias, making a simple comparison between the opt-out and non-opt-out groups (as if they were the two arms of an A/B test) unreliable.
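
To make the selection problem concrete, here is a minimal sketch on made-up data. The variable `price_sensitive`, the opt-out probabilities (0.8 and 0.2), the baseline of 4 rides, and the assumed true effect of -1.5 are all illustrative assumptions, not part of the case study. Because price-sensitive riders both opt out more often and ride less, the naive difference in means lands well below the true effect (about -2.7 in expectation under these assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical unobserved trait: price-sensitive riders ride less at peak
# AND are more likely to opt out (this is the self-selection).
price_sensitive = rng.random(n) < 0.5

# Assumed opt-out probabilities: 0.8 if price-sensitive, 0.2 otherwise
opted_out = rng.random(n) < np.where(price_sensitive, 0.8, 0.2)

# Assumed true causal effect of opting out on peak-hour rides
true_effect = -1.5
rides = (
    4.0
    - 2.0 * price_sensitive        # confounding: the trait lowers rides directly
    + true_effect * opted_out
    + rng.normal(0, 1, n)
)

# Naive comparison of group means, ignoring who selects into opting out
naive_diff = rides[opted_out].mean() - rides[~opted_out].mean()
print(f"true effect: {true_effect:.2f}, naive difference: {naive_diff:.2f}")
```

The gap between the two numbers is the selection bias: it comes entirely from the different mix of rider types in the two groups.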

Solution

Suppose Uber previously conducted an experiment that randomly assigned users to different versions of the opt-out prompt:

  • “Are you sure?” (confirmation step) [less easy]

  • “One-click toggle” (simplified opt-out) [easier]

Some versions made it easier to opt out than others. These prompt variants create random variation in the likelihood of opting out and can serve as an instrumental variable (IV):

  • They affect opt-out behavior (relevance assumption), but

  • They do not directly affect ride volume (exclusion restriction assumption).
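
With a binary instrument, these two assumptions lead to the classic Wald estimator: the instrument's effect on the outcome divided by its effect on the treatment. The sketch below uses made-up data with a hypothetical unobserved confounder `u` and an assumed constant effect of -1.5. It is meant only to show the mechanics, not the estimator used later in this tutorial.

```python
import numpy as np

def wald_estimate(y, t, z):
    """Effect of Z on Y divided by effect of Z on T (binary instrument)."""
    y, t, z = (np.asarray(a, dtype=float) for a in (y, t, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()   # effect of Z on Y
    first_stage = t[z == 1].mean() - t[z == 0].mean()    # effect of Z on T
    return reduced_form / first_stage

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, n)          # randomly assigned prompt variant
u = rng.normal(0, 1, n)            # hypothetical unobserved price sensitivity

# Opt-out is more likely with the easy prompt and with higher price sensitivity
t = (0.8 * z + 0.8 * u + rng.normal(0, 1, n) > 0.4).astype(int)

# Outcome with an assumed true effect of -1.5; u also lowers rides directly
y = 4.0 - 1.5 * t - 1.0 * u + rng.normal(0, 1, n)

naive = y[t == 1].mean() - y[t == 0].mean()
iv = wald_estimate(y, t, z)
print(f"naive: {naive:.2f}, Wald IV: {iv:.2f}")
```

The naive comparison is pulled away from -1.5 by the confounder, while the Wald ratio only uses variation in opt-out that comes from the randomized prompt.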

We can use the Intent-To-Treat Doubly Robust Instrumental Variable (DRIV) estimator from Microsoft’s EconML library to study this causal relationship. DRIV combines machine learning with causal inference to estimate the causal effect of opting out of surge pricing alerts on ride volume during peak hours. The intent-to-treat setting fits here because the prompt variant was randomly assigned, yet each rider still chose whether to opt out, so not every rider offered the easier option actually opted out.

Let’s begin by loading the packages. We will use Python for this exercise.

Code
import shap
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings 
warnings.filterwarnings("ignore")
import os
import sys
Code
#ML packages
import lightgbm as lgb
from sklearn.utils import resample
from sklearn.preprocessing import PolynomialFeatures

#EconMl
from econml.iv.dr import IntentToTreatDRIV
from econml.iv.dr import LinearIntentToTreatDRIV
from econml.cate_interpreter import SingleTreeCateInterpreter, SingleTreePolicyInterpreter

Note that to load the EconML package successfully, you need to have it installed first:

#pip install econml

Data Simulation

Because we do not have data readily available for this, we can create our own synthetic data of 100,000 observations using the code below.

Code
#Set a seed for reproducibility
np.random.seed(32)

# Sample size 
n = 100000

#Simulated data
data = pd.DataFrame({
    "rides_peak_pre" : np.random.poisson(3, size = n)
    , "total_rides_pre" : np.random.poisson(10, size=n)
    , "avg_fare_pre" : np.round(np.random.normal(15, 5, size=n), 2)
    , "user_city" : np.random.choice(["New York", "Chicago", "San Fransisco", "Boston", "Austin", "Boulder"], size = n)
    , "user_device" : np.random.choice(["IOS", "Andriod"], size =n)
    , "is_biz_account" : np.random.choice([0, 1], size = n, p = [0.8, 0.2])
    , "prompt_variant_easy" : np.random.choice([0, 1], size = n) #Instrument
})

#Simulate treatment assignment based on instrument with noise

data["opted_out"] = np.where(
    (data["prompt_variant_easy"] ==1) & (np.random.rand(n)< 0.7), 1
    , np.where((data["prompt_variant_easy"] == 0) & (np.random.rand(n)< 0.3), 1,0)
)

#Simulate post-treatment behavior with some treatment effect
data["rides_peak_post"] = (
    data["rides_peak_pre"]
    + np.random.normal(0, 1, size = n)
    + 1.5 * (1 - data["opted_out"])  # Users who did not opt out take 1.5 more peak rides on average
).round().astype(int)

#Ensure there is no negative rides 
data["rides_peak_post"] = data["rides_peak_post"].clip(lower= 0)

# Create directory if it doesn't exist
os.makedirs('analysis', exist_ok=True)

# Save to CSV
data.to_csv('analysis/synthetic_data.csv', index=False)

# Display the first few rows of the data
data.head()
   rides_peak_pre  total_rides_pre  avg_fare_pre  user_city  user_device  is_biz_account  prompt_variant_easy  opted_out  rides_peak_post
0               6               13         18.60     Austin          IOS               1                    1          1                7
1               3               13         21.62    Chicago          IOS               0                    0          0                4
2               4               13         19.51    Chicago      Andriod               0                    1          0                6
3               1                9         16.94     Austin      Andriod               0                    1          1                4
4               3               12         16.00   New York      Andriod               0                    1          1                3
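
Before modelling, it is worth checking that the instrument actually moves the treatment. The sketch below re-creates only the instrument and opt-out columns, using the same 0.7 and 0.3 opt-out probabilities as the simulation above but a different random generator, so the exact numbers will not match the saved CSV. The frame `df` is a stand-in introduced for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(32)
n = 100_000

# Re-create only the instrument and treatment, mirroring the simulation's probabilities
df = pd.DataFrame({"prompt_variant_easy": rng.integers(0, 2, n)})
p_opt_out = np.where(df["prompt_variant_easy"] == 1, 0.7, 0.3)
df["opted_out"] = (rng.random(n) < p_opt_out).astype(int)

# First stage: opt-out rate under each prompt variant
first_stage = df.groupby("prompt_variant_easy")["opted_out"].mean()
first_stage_diff = first_stage.loc[1] - first_stage.loc[0]

print(first_stage)
print("Difference in opt-out rates:", round(first_stage_diff, 3))
```

On the tutorial's own `data` frame, the same check is `data.groupby("prompt_variant_easy")["opted_out"].mean()`. A difference near 0.4 indicates a strong instrument.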

Exploratory Data Analysis

With our data generated, we can explore the distribution of the features and implement the model.

Code
# Set up the subplot grid
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Plot 1: Distribution of Post-Peak Rides by Opt-Out Status
sns.histplot(data, x="rides_peak_post", hue="opted_out", multiple="stack", ax=axes[0, 0], palette="CMRmap")
axes[0, 0].set_title("Post-Peak Rides by Opt-Out Status")
axes[0, 0].set_xlabel("Rides During Peak (Post)")
axes[0, 0].set_ylabel("Count")

# Plot 2: Boxplot of Average Fare by Opt-Out Status
sns.boxplot(data=data, x="opted_out", y="avg_fare_pre", ax=axes[0, 1], palette="CMRmap")
axes[0, 1].set_title("Average Fare (Pre) by Opt-Out Status")
axes[0, 1].set_xlabel("Opted Out")
axes[0, 1].set_ylabel("Average Fare (Pre)")

# Plot 3: Opt-Out Rate by City
city_opt_out = data.groupby("user_city")["opted_out"].mean().reset_index()
sns.barplot(data=city_opt_out, x="user_city", y="opted_out", ax=axes[1, 0], palette="CMRmap")
axes[1, 0].set_title("Opt-Out Rate by City")
axes[1, 0].set_xlabel("City")
axes[1, 0].set_ylabel("Opt-Out Rate")
axes[1, 0].set_xticklabels(axes[1,0].get_xticklabels(), fontsize=6)

# Plot 4: Scatter Plot of Pre vs. Post Peak Rides colored by Opt-Out
sns.scatterplot(data=data, x="rides_peak_pre", y="rides_peak_post", hue="opted_out", alpha=0.7, ax=axes[1, 1], palette="CMRmap")
axes[1, 1].set_title("Pre vs Post Peak Rides")
axes[1, 1].set_xlabel("Pre Peak Rides")
axes[1, 1].set_ylabel("Post Peak Rides")

plt.tight_layout()
plt.show()

Key takeaways from the data:

  • Peak-hour rides in the post period are higher among users who did not opt out of surge pricing alerts.

  • Average fares are similar across opt-out groups, suggesting fare levels alone may not drive opt-out behavior.

  • Opt-out rates vary slightly by city, with no city showing extreme deviation.

  • There is a positive relationship between pre- and post-period peak rides, and users who did not opt out sit at a higher level for any given pre-period count.

Causal Effect with EconML

Having explored key features of the data, we can now estimate the causal effect.

To ensure that the model runs successfully, we convert the categorical features into dummy variables using one-hot encoding.

Code
# One-hot encode categorical variables
data = pd.get_dummies(data, columns=["user_city", "user_device"])
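
As a quick illustration of what this step produces, here is the same call on a three-row toy frame. The frame `toy` and its values are made up for the example; the spelling of the device labels follows the simulated data.

```python
import pandas as pd

toy = pd.DataFrame({
    "user_city": ["Austin", "Boston", "Austin"],
    "user_device": ["IOS", "Andriod", "IOS"],   # spelling as in the simulated data
    "avg_fare_pre": [18.6, 21.6, 19.5],
})

# Each listed column is replaced by one indicator column per category
encoded = pd.get_dummies(toy, columns=["user_city", "user_device"])
print(list(encoded.columns))
```

Columns that are not listed in `columns` (here `avg_fare_pre`) are left untouched, and each category becomes its own indicator column named `<column>_<category>`.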

Next, we define our key variables for analysis. The instrumental variable (Z) is prompt_variant_easy, representing the nudge intervention. The treatment variable (T) is opted_out, indicating whether a customer chose to opt out of surge pricing alerts. Our outcome variable (Y) is rides_peak_post, which captures ride activity during peak hours after the intervention.

We identify several potential confounders that could influence both treatment uptake and the outcome. These include the customer’s city, mobile operating system, average fare, prior ride counts (peak-hour and total), and whether the account is associated with a corporate (business) profile. To account for these factors, we include them as control variables (X) in our analysis.

Code
# Define the instrument, treatment, outcome
Z = data['prompt_variant_easy']  # Instrument
T = data['opted_out']  # Treatment
Y = data['rides_peak_post']  # Outcome

# Define features excluding the instrument, treatment, and outcome
X = data.drop(columns=['prompt_variant_easy', 'opted_out', 'rides_peak_post'])

In our data generation process, opt-out status enters the outcome through the term

\[ \text{treatment\_effect} = 1.5 \times (1 - \text{opted\_out}) \]

This term adds 1.5 rides if the rider did not opt out and 0 if the rider opted out. The true causal effect of opting out is therefore -1.5 rides (before rounding and clipping at zero), and this is what we are seeking to learn from the data.
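
As a worked check under the simulation's stated parameters (opt-out probability 0.7 with the easy prompt and 0.3 otherwise), and ignoring the rounding and clipping steps, the effect of the prompt on opt-out, the effect of the prompt on rides, and their ratio are:

\[ E[T \mid Z = 1] - E[T \mid Z = 0] = 0.7 - 0.3 = 0.4 \]

\[ E[Y \mid Z = 1] - E[Y \mid Z = 0] = -1.5 \times 0.4 = -0.6 \]

\[ \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[T \mid Z = 1] - E[T \mid Z = 0]} = \frac{-0.6}{0.4} = -1.5 \]

So the easy prompt lowers peak rides by about 0.6 on average, and scaling by the 0.4 shift in opt-out rates gives an effect of opting out of -1.5 rides. That is the benchmark the estimator should recover.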

To recover this effect, we first define the nuisance models and the final-stage effect model.

Code
# Define nuisance models
lgb_T_XZ_params = {
    'objective': 'binary',
    'metric': 'auc',
    'learning_rate': 0.1,
    'num_leaves': 30,
    'max_depth': 5,
    'verbosity' : -1
}

lgb_Y_X_params = {
    'metric': 'rmse',
    'learning_rate': 0.1,
    'num_leaves': 30,
    'max_depth': 5,
    'verbosity' : -1
}

model_T_XZ = lgb.LGBMClassifier(**lgb_T_XZ_params)
model_Y_X = lgb.LGBMRegressor(**lgb_Y_X_params)
flexible_model_effect = lgb.LGBMRegressor(**lgb_Y_X_params)

Having defined the nuisance models without assuming a linear functional form, we use LightGBM to flexibly estimate the nuisance functions, namely the outcome model and the treatment model. This non-parametric approach allows us to capture complex relationships between covariates and the treatment or outcome. With these models in place, we proceed to train the causal model using the IntentToTreatDRIV estimator from the EconML library.

Code
# Train EconML model using IntentToTreatDRIV
model = IntentToTreatDRIV(
    model_y_xw=model_Y_X,
    model_t_xwz=model_T_XZ,
    flexible_model_effect=flexible_model_effect
)
Code
# Fit the model
model.fit(Y, T, Z=Z, X=X)
<econml.iv.dr._dr.IntentToTreatDRIV at 0x14a149fd0>
Code
# Get the causal effect
causal_effect = model.effect(X)
print("Causal Effect of Opting Out on Rides During Peak Hours:", causal_effect.mean())
Causal Effect of Opting Out on Rides During Peak Hours: -1.4725080945552658
Code
model.effect(X[:8])
array([-1.37491246, -1.29706647, -1.45915156, -1.48616581, -1.60759469,
       -1.2136639 , -1.36240566, -1.32523829])

Our estimated average causal effect is about -1.47, close to the -1.5 built into the simulation, indicating that opting out of surge pricing alerts results in fewer rides during peak hours. For a business aiming to scale, this finding highlights the behavioral influence of price visibility. The alerts likely nudge users to engage more during peak periods, either by making them aware of dynamic pricing or by prompting time-sensitive decisions. This suggests a strategic opportunity: for users who opt out, alternative incentives such as loyalty points or discounts on future rides could be deployed to maintain or boost peak-hour engagement.
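
For intuition about how machine-learned nuisance models and an instrument fit together, here is a simplified sketch. It is not EconML's DRIV implementation: it is a plainer relative, a cross-fitted "partialling-out" IV estimator that assumes a constant effect, with linear least squares standing in for LightGBM. The helper names, the data, and the assumed effect of -1.5 are all hypothetical.

```python
import numpy as np

def fit_predict_linear(X_train, y_train, X_test):
    """Least-squares fit with intercept; a stand-in for an ML nuisance model."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def partialled_out_iv(Y, T, Z, X, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual IV estimate of a constant effect."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    res = {name: np.empty(n) for name in ("Y", "T", "Z")}
    for k in range(n_folds):
        test, train = folds == k, folds != k
        for name, v in (("Y", Y), ("T", T), ("Z", Z)):
            # Residualise each variable on X with a model fit on the other fold(s)
            res[name][test] = v[test] - fit_predict_linear(X[train], v[train], X[test])
    # Ratio of residual covariances: (Y, Z) over (T, Z)
    return (res["Y"] * res["Z"]).mean() / (res["T"] * res["Z"]).mean()

# Hypothetical data: one observed covariate x, one unobserved confounder u
rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
Z = rng.integers(0, 2, n).astype(float)                # randomized instrument
T = (0.8 * Z + 0.5 * x + 0.8 * u + rng.normal(0, 1, n) > 0.4).astype(float)
Y = 4.0 - 1.5 * T + 1.0 * x - 1.0 * u + rng.normal(0, 1, n)   # assumed effect: -1.5

theta = partialled_out_iv(Y, T, Z, x.reshape(-1, 1))
print(f"estimated effect of opting out: {theta:.2f}")
```

Cross-fitting (predicting each fold with a model trained on the other) keeps the nuisance models from overfitting the same rows they are used to residualise. DRIV builds on similar ingredients but adds a doubly robust correction and a final model that lets the effect vary with X.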

Heterogeneous Treatment Effect & Policy

One might ask whether this effect holds uniformly across all customers. To explore this, we conducted a heterogeneous treatment effect analysis to inform targeted policy recommendations. Specifically, we used the SingleTreeCateInterpreter to fit a shallow decision tree to the estimated treatment effects. On average, the treatment effect was -1.473, indicating an overall reduction in peak-hour rides from opting out. However, the decision tree revealed clear segmentation:

  • Users shaded in dark red experienced strongly negative effects, which suggests that opting out of surge pricing alerts reduced their peak-hour ride activity the most.

  • Users in green, typically those with high prior ride frequency, showed the most favorable estimated effects of opting out, that is, the smallest reduction or an increase in ride activity.

These findings suggest that the intervention is beneficial primarily for a narrow, high-usage segment, but could be counterproductive for the broader user base. A one-size-fits-all approach may therefore reduce overall engagement, highlighting the need for more targeted communication strategies.

Code
# Use SingleTreeCateInterpreter to interpret the treatment effects

intrp = SingleTreeCateInterpreter(max_depth=2, min_samples_leaf=10)
intrp.interpret(model, X)

# Plot the decision tree
plt.figure(figsize=(15, 8))
intrp.plot(feature_names=X.columns, fontsize=11)
plt.show()
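
The interpreter fits a shallow decision tree to the per-user effect estimates. To show the idea in isolation, the sketch below finds a single best split (a one-level tree, or stump) on one feature. The effects here are made-up numbers, not the tutorial's estimates, and `best_split` is a hypothetical helper written for this example.

```python
import numpy as np

def best_split(feature, effects):
    """Threshold on one feature that minimises within-group squared error of the effects."""
    best_thr, best_sse = None, np.inf
    for thr in np.unique(feature)[:-1]:          # skip the max so both sides are non-empty
        left, right = effects[feature <= thr], effects[feature > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thr, best_sse = thr, sse
    return best_thr

rng = np.random.default_rng(3)
rides_peak_pre = rng.poisson(3, 5_000)

# Made-up per-user effects: one level for riders with 4+ prior peak rides, another for the rest
effects = np.where(rides_peak_pre >= 4, -1.8, -1.2) + rng.normal(0, 0.1, 5_000)

thr = best_split(rides_peak_pre, effects)
low_mean = effects[rides_peak_pre <= thr].mean()
high_mean = effects[rides_peak_pre > thr].mean()
print(f"split at rides_peak_pre <= {thr}")
print(f"mean effect below: {low_mean:.2f}, above: {high_mean:.2f}")
```

A depth-2 tree, as used above, repeats this search inside each of the two resulting groups, which is why its leaves can be read as user segments with different average effects.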

Conclusion

Our analysis reveals that surge pricing alerts have a modest overall effect, with substantial heterogeneity across user segments. While high-frequency riders tend to respond positively, the broader user base shows little to no benefit, and in some cases, reduced engagement. These insights suggest that a one-size-fits-all alert strategy may be sub-optimal. Instead, targeted interventions informed by user behavior and engagement patterns are likely to yield greater impact and efficiency.