Differentiating Baseline Spend from Treatment Response with grf

1. Motivation: The Persuadability Curve (Heterogeneous Treatment Effects)

Before modeling, we must understand the “Ground Truth.” We simulate a scenario where treatment response follows a non-linear path based on a feature X1.

The S-Curve (tanh): Represents customers who become more likely to spend as their persuadability score increases.
The Sleeping Dogs: Customers with very low X1 scores who actually react negatively to being contacted.

xs <- seq(-3, 3, length.out = 400)
tau_xs <- 50 * tanh(1.2 * xs) - 10 * (as.numeric(xs < -0.5))

ggplot(data.frame(xs, tau_xs), aes(x = xs, y = tau_xs)) +
  geom_line(color = "#2c3e50", size = 1.2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted", color = "red") +
  annotate("text", x = 1.5, y = 15, label = "Persuadables (Positive Uplift)", color = "darkgreen") +
  annotate("text", x = -2, y = -5, label = "Sleeping Dogs (Negative Uplift)", color = "darkred") +
  labs(title = "True Heterogeneous Treatment Effect",
       x = "X1 (Persuade Score)", y = "True Uplift (tau)") +
  theme_minimal()

2. Data Simulation

We generate a dataset of 12,000 customers. X1 drives the uplift, while X2 drives the baseline spend.

set.seed(42)
n <- 12000
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n); X4 <- rnorm(n)
X <- cbind(X1, X2, X3, X4)
colnames(X) <- c("X1_persuade", "X2_baseline", "X3", "X4")

# Baseline spend is heavily influenced by X2
mu0 <- 180 + 60 * X2 + 25 * (X2^2) + 8 * X3
# Uplift is influenced by X1
tau <- 50 * tanh(1.2 * X1) - 10 * (as.numeric(X1 < -0.5))

W <- rbinom(n, 1, 0.5) # Randomized treatment
Y <- mu0 + W * tau + rnorm(n, 0, 30)

# 60/40 Train/Test Split of the Sample
test_idx <- sample.int(n, size = floor(0.4 * n))
X_tr <- X[-test_idx, ]; Y_tr <- Y[-test_idx]; W_tr <- W[-test_idx]
X_te <- X[test_idx, ]; Y_te <- Y[test_idx]; W_te <- W[test_idx]

3. Training: Spend vs. Uplift Models

We fit two forests: one to predict levels (regression_forest) and one to predict effects (causal_forest).

# Prediction Model (Spend)
spend_model <- regression_forest(X_tr, Y_tr, num.trees = 1000)
spend_score <- predict(spend_model, X_te)$predictions

# Uplift Model (CATE)
uplift_model <- causal_forest(X_tr, Y_tr, W_tr, num.trees = 2000)
tau_hat <- predict(uplift_model, X_te)$predictions

4. Overlap Analysis: The Risk of Spend-Based Targeting

This section quantifies why targeting “High Spenders” can be wasteful or even damaging. \ Spend Score vs. Uplift Score scatter plot visualizes the core insight: High spenders are often not the best people to target..

df_scatter <- tibble(spend = as.numeric(spend_score), uplift = as.numeric(tau_hat))

ggplot(df_scatter, aes(x = spend, y = uplift)) +
  geom_point(alpha = 0.2, size = 1, color = "#34495e") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Spend Score vs. Uplift Score",
       subtitle = "The 'Cloud' shape shows that spend does not predict response",
       x = "Predicted Spend E[Y|X]", y = "Predicted Uplift tau(X)") +
  theme_minimal()

5. Variable Importance Comparison

We compare which features drive the two models.

s: one to predict levels (regression_forest) and one to predict effects (causal_forest).

vi_spend <- tibble(feature = colnames(X), importance = as.numeric(variable_importance(spend_model)), model = "Spend Model")
vi_uplift <- tibble(feature = colnames(X), importance = as.numeric(variable_importance(uplift_model)), model = "Uplift Model")

bind_rows(vi_spend, vi_uplift) %>%
  ggplot(aes(x = reorder(feature, importance), y = importance, fill = model)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~model, scales = "free_y") +
  coord_flip() +
  labs(title = "What Drives the Model?", x = NULL, y = "Importance Score") +
  theme_minimal() + scale_fill_manual(values = c("steelblue", "darkslategray"))

6. Formal Diagnostics & Production Readiness

Calibration Test Interpretation

When you run test_calibration, you are looking for two coefficients:

mean.forest: Should be ~1. This means the forest’s average prediction is accurate.
differential.forest: This is the “Targeting Signal.”
- If the coefficient is positive and the p-value is small (< 0.05), your model has found real heterogeneity. You are ready to target.
- If the p-value is high, the model is “guessing” and targeting will likely fail.

test_calibration(uplift_model)

## 
## Best linear fit using forest predictions (on held-out data)
## as well as the mean forest prediction as regressors, along
## with one-sided heteroskedasticity-robust (HC3) SEs:
## 
##                                Estimate Std. Error t value    Pr(>t)    
## mean.forest.prediction         0.976872   0.199588  4.8944 5.036e-07 ***
## differential.forest.prediction 1.010695   0.019421 52.0405 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7. Policy Value & RATE (Modern grf Diagnostics)

We evaluate the gain from targeting the top q fraction of users ranked by uplift.

# 1. Evaluation forest for unbiased scores on test set
eval_forest <- causal_forest(X_te, Y_te, W_te, num.trees = 1000)

# 2. Rank Average Treatment Effect (RATE)
rate_obj <- rank_average_treatment_effect(eval_forest, tau_hat)

# 3. Plot TOC (Targeting Operator Characteristic)
ggplot(rate_obj$TOC, aes(x = q, y = estimate)) +
  geom_line(color = "darkblue", size = 1) +
  geom_ribbon(aes(ymin = estimate - 1.96 * std.err, ymax = estimate + 1.96 * std.err), alpha = 0.2) +
  geom_abline(slope = rate_obj$estimate, intercept = 0, linetype = "dashed") +
  labs(title = paste("RATE Curve (AUTOC:", round(rate_obj$estimate, 3), ")"),
       subtitle = "Measures if our ranking captures more uplift than random targeting",
       x = "Fraction Targeted (q)", y = "Targeting Gain (TOC)") +
  theme_minimal()

8. Profit Analysis by targeting top k% of spenders or CATE

We evaluate the gain from targeting the top k fraction of users ranked by uplift.

# --- Net Profit & ROI Analysis ---

# Define economic parameters
coupon_cost <- 2    # The cost of sending the coupon (discount + delivery)
margin_rate <- 0.50  # Assume 50% margin on the revenue (Y)

# Function to calculate Net Profit based on IPS
# Net Profit = (Margin * Incremental Revenue) - (Treatment Cost * People Treated)
calc_net_profit <- function(score, fracs, cost, margin) {
  sapply(fracs, function(f) {
    n_total <- length(score)
    k <- floor(f * n_total)
    
    # 1. Calculate Policy Value (Incremental Revenue)
    pi <- rep(0, n_total)
    if(k > 0) pi[order(score, decreasing = TRUE)[1:k]] <- 1
    
    # IPS Estimator for Incremental Revenue (Revenue vs Treat-None)
    inc_rev <- mean(pi * (W_te * Y_te / 0.5) + (1 - pi) * ((1 - W_te) * Y_te / 0.5)) - 
               mean((1 - W_te) * Y_te / 0.5)
    
    # 2. Calculate Net Profit
    # Margin on the incremental revenue minus the total cost of coupons sent
    net_profit <- (margin * inc_rev) - (f * cost)
    return(net_profit)
  })
}

# Evaluate both models on Profit
fracs <- seq(0, 1, length.out = 25)
profit_spend <- calc_net_profit(spend_score, fracs, coupon_cost, margin_rate)
profit_uplift <- calc_net_profit(tau_hat, fracs, coupon_cost, margin_rate)

df_profit <- tibble(
  percent_treated = fracs * 100,
  `Strategy: High Spend` = profit_spend,
  `Strategy: High Uplift` = profit_uplift
) %>% pivot_longer(-percent_treated, names_to = "Strategy", values_to = "NetProfit")

# Plotting the Net Profit Curve
ggplot(df_profit, aes(x = percent_treated, y = NetProfit, color = Strategy)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(
    title = "Net Profit Curve: Optimizing for ROI",
    subtitle = "Accounting for Margin and Treatment Costs",
    x = "% of Population Treated",
    y = "Incremental Net Profit per Customer"
  ) +
  theme_minimal()