library(data.table)
library(grf)
library(ggplot2)
library(policytree)
cgm_data = fread('~/Downloads/contact_responsiveness_data.csv')

Estimate heteregeneous treatment effects on T1D patients of in-office visits and remote interventions

Using causal forests from grf package.

Remote interventions

filtered_data = cgm_data[
  type %in% c('x','y') &
    mean_inrange_before>0 &
    mean_inrange_after>0
  ]
W = filtered_data[,ifelse(type == 'y', 1, 0)]
table(W)
## W
##   0   1 
## 106 163

Have 269 observations, 163 of which had interventions.

Fit Causal forest to estimate heterogeneous treatment effects

features = c(
  'mean_inrange_before',
  'mean_coef_var_before'
)
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]

f = causal_forest(X, Y, W, tune.parameters="all")

average_treatment_effect(f, target.sample = "treated")
## Warning in average_treatment_effect(f, target.sample = "treated"): Estimated
## treatment propensities go as high as 0.998 which means that treatment effects
## for some treated units may not be well identified. In this case, using
## `target.sample=control` may be helpful.
##    estimate     std.err 
## -0.03265553  0.02881155

Average treatment effect on treated observations is negative and insignificant

Let’s look at distribution of estimated heterogeneous treatment effects

hist(predict(f)$predictions)

Some are negative and some are positive – interesting.

Look at grid.

surface_dt = merge(
  data.table(mean_inrange_before=seq(0,1, 0.05), i=1),
  data.table(mean_coef_var_before=seq(0,1, 0.05), i=1),
  by='i', allow.cartesian = TRUE
  )[,i:=NULL]
surface_dt[,tau := predict(f, newdata = as.matrix(surface_dt[,1:2]))$predictions]

ggplot(surface_dt, aes(x=mean_inrange_before, y=mean_coef_var_before, color=tau, label=round(tau*100,1))) + geom_text(size=2) + theme_minimal() +
  ggtitle("CATE (Percentage point change in Time in Range)")

Looks like high-variance, low-time-in-range patients might have positive treatment effects, which makes sense intuitively.

Check if significant.

average_treatment_effect(f, subset=(X[,1]<0.7 & X[,2]>0.4))
##   estimate    std.err 
## 0.01601459 0.05923015

Positive but not significant. Probably need more data.

Effect of Visits

filtered_data = cgm_data[
  type %in% c('','visit') &
    mean_inrange_before>0 &
    mean_inrange_after>0
  ]
W = filtered_data[,ifelse(type == 'visit', 1, 0)]
table(W)
## W
##   0   1 
##  70 196
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]

f = causal_forest(X, Y, W, tune.parameters="all")

average_treatment_effect(f, target.sample = "treated")
##   estimate    std.err 
## 0.03673448 0.01949819

Positive ATE on treated patients and barely significant.

Let’s look at distribution of estimated heterogeneous treatment effects

hist(predict(f)$predictions)

Very little variation in estimated treatment effects.

surface_dt[,tau := predict(f, newdata = as.matrix(surface_dt[,1:2]))$predictions]

ggplot(surface_dt, aes(x=mean_inrange_before, y=mean_coef_var_before, color=tau, label=round(tau*100,1))) + geom_text(size=2) + theme_minimal() +
  ggtitle("CATE (Percentage point change in Time in Range)")

Learning “Optimal” Assignment Policy (Visit vs Remote intervention) by patient

filtered_data = cgm_data[
  type %in% c('','visit','x','y') &
    mean_inrange_before>0 &
    mean_inrange_after>0
  ]
W = merge(
  filtered_data,
  data.table(
    type = c('', 'visit', 'x','y'),
    W = c(1,3,1,2)
  ))$W
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]

multi.forest <- multi_causal_forest(X, Y, W, tune.parameters = "all")
opt.tree = policy_tree(X, double_robust_scores(multi.forest), depth = 2)
plot(opt.tree)

Actions are {1: do nothing, 2: remote intervention, 3: visit}.

Not very interesting:

  • Do nothing for time in range < 46%
  • Remote intervention for TIR in (.46, .54] and TIR > 0.67
  • Visits for TIR in (.54, .67]

If this is a good policy, it’s a sad policy. Hopefully it’s not a good policy and we need more data.