library(data.table)
library(grf)
library(ggplot2)
library(policytree)
cgm_data = fread('~/Downloads/contact_responsiveness_data.csv')
Using causal forests from grf package.
filtered_data = cgm_data[
type %in% c('x','y') &
mean_inrange_before>0 &
mean_inrange_after>0
]
W = filtered_data[,ifelse(type == 'y', 1, 0)]
table(W)
## W
## 0 1
## 106 163
Have 269 observations, 163 of which had interventions.
Fit Causal forest to estimate heterogeneous treatment effects
features = c(
'mean_inrange_before',
'mean_coef_var_before'
)
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]
f = causal_forest(X, Y, W, tune.parameters="all")
average_treatment_effect(f, target.sample = "treated")
## Warning in average_treatment_effect(f, target.sample = "treated"): Estimated
## treatment propensities go as high as 0.998 which means that treatment effects
## for some treated units may not be well identified. In this case, using
## `target.sample=control` may be helpful.
## estimate std.err
## -0.03265553 0.02881155
Average treatment effect on treated observations is negative and insignificant
Let’s look at distribution of estimated heterogeneous treatment effects
hist(predict(f)$predictions)
Some are negative and some are positive – interesting.
Look at grid.
surface_dt = merge(
data.table(mean_inrange_before=seq(0,1, 0.05), i=1),
data.table(mean_coef_var_before=seq(0,1, 0.05), i=1),
by='i', allow.cartesian = TRUE
)[,i:=NULL]
surface_dt[,tau := predict(f, newdata = as.matrix(surface_dt[,1:2]))$predictions]
ggplot(surface_dt, aes(x=mean_inrange_before, y=mean_coef_var_before, color=tau, label=round(tau*100,1))) + geom_text(size=2) + theme_minimal() +
ggtitle("CATE (Percentage point change in Time in Range)")
Looks like high-variance, low-time-in-range patients might have positive treatment effects, which makes sense intuitively.
Check if significant.
average_treatment_effect(f, subset=(X[,1]<0.7 & X[,2]>0.4))
## estimate std.err
## 0.01601459 0.05923015
Positive but not significant. Probably need more data.
filtered_data = cgm_data[
type %in% c('','visit') &
mean_inrange_before>0 &
mean_inrange_after>0
]
W = filtered_data[,ifelse(type == 'visit', 1, 0)]
table(W)
## W
## 0 1
## 70 196
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]
f = causal_forest(X, Y, W, tune.parameters="all")
average_treatment_effect(f, target.sample = "treated")
## estimate std.err
## 0.03673448 0.01949819
Positive ATE on treated patients and barely significant.
Let’s look at distribution of estimated heterogeneous treatment effects
hist(predict(f)$predictions)
Very little variation in estimated treatment effects.
surface_dt[,tau := predict(f, newdata = as.matrix(surface_dt[,1:2]))$predictions]
ggplot(surface_dt, aes(x=mean_inrange_before, y=mean_coef_var_before, color=tau, label=round(tau*100,1))) + geom_text(size=2) + theme_minimal() +
ggtitle("CATE (Percentage point change in Time in Range)")
filtered_data = cgm_data[
type %in% c('','visit','x','y') &
mean_inrange_before>0 &
mean_inrange_after>0
]
W = merge(
filtered_data,
data.table(
type = c('', 'visit', 'x','y'),
W = c(1,3,1,2)
))$W
X = as.matrix(filtered_data[, ..features])
Y = filtered_data[, mean_inrange_diff]
multi.forest <- multi_causal_forest(X, Y, W, tune.parameters = "all")
opt.tree = policy_tree(X, double_robust_scores(multi.forest), depth = 2)
plot(opt.tree)
Actions are {1: do nothing, 2: remote intervention, 3: visit}.
Not very interesting:
If this is a good policy, it’s a sad policy. Hopefully it’s not a good policy and we need more data.