Causal Forest: Plot

This analysis ends with answers to questions such as: Who should get what? Who benefits more from the intervention? A simple rule-based policy tree can provide these answers. This kind of procedure is commonly used, for example, to decide on a medical treatment prescription. The policy tree is fitted on doubly robust reward estimates derived from the causal forest and searches for an optimal decision tree. In this example, if a student's intention to save index is 26 or lower, the rule assigns no treatment to female students and treatment to male students. If the intention to save index is greater than 26, the rule assigns treatment when the financial autonomy index is 86 or lower and no treatment when it is greater than 86. The three variables that appear in the tree are also the top three that you get when you extract variable importance from the model (see the Causal Forest: Variable Importance section): “financial.autonomy.index”, “intention.to.save.index”, “is.female”.
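The decision rule described above can be written out as a small function. This is only an illustrative sketch in Python (the report's output appears to come from R); the function name and argument names are hypothetical, and the thresholds 26 and 86 are taken from the text.

```python
def assign_treatment(intention_to_save_index, is_female, financial_autonomy_index):
    """Return True if the depth-2 rule described in the text assigns treatment.

    Hypothetical helper: the argument names mirror the report's variables,
    and the split points (26 and 86) are the ones quoted in the text.
    """
    if intention_to_save_index <= 26:
        # Low intention to save: the second split is on gender.
        # Male students are treated, female students are not.
        return not is_female
    # High intention to save: the second split is on financial autonomy.
    # Students at or below 86 are treated, those above are not.
    return financial_autonomy_index <= 86


# Made-up students, one for each leaf of the tree.
examples = [
    (20, True, 50),   # low intention to save, female
    (20, False, 50),  # low intention to save, male
    (40, True, 90),   # high intention to save, high autonomy
    (40, True, 70),   # high intention to save, lower autonomy
]
decisions = [assign_treatment(*student) for student in examples]
print(decisions)
```

Note that each branch only looks at one further variable: gender is ignored when the intention to save index is above 26, and the financial autonomy index is ignored when it is 26 or lower.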

Exploratory Data Analysis

This dataset contains student-level data on around 17,000 students. The variables are: outcome.test.score (a financial proficiency score and the outcome of interest in this study), treatment, school, is.female, father / mother.attended.secondary.school, failed.at.least.one.school.year, family.receives.cash.transfer, has.computer.with.internet.at.home, is.unemployed, has.some.form.of.income, saves.money.for.future.purchases, intention.to.save.index, makes.list.of.expenses.every.month, negotiates.prices.or.payment.methods, and financial.autonomy.index.

Summary Statistics

# A tibble: 2 × 3
  treatment `mean(outcome.test.score)`     n
      <int>                      <dbl> <int>
1         0                       56.2  8405
2         1                       60.5  8894
  mean(outcome.test.score)
1                 4.329958

On average, the program appears to have increased students' financial proficiency: the mean test score of students who received the treatment is about 4.3 points higher than that of students who did not.
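As a quick check, the difference can be recomputed from the summary table. The sketch below is in Python for illustration only and uses the rounded group means and counts printed above, so it reproduces the 4.33 figure only approximately; the variable names are hypothetical.

```python
# Group means and counts copied from the summary table above (means are rounded).
summary = {
    0: {"mean_score": 56.2, "n": 8405},  # control group
    1: {"mean_score": 60.5, "n": 8894},  # treatment group
}

# Raw difference in mean test scores, treated minus control.
difference = summary[1]["mean_score"] - summary[0]["mean_score"]

# Total sample size and share of students assigned to treatment.
total_n = summary[0]["n"] + summary[1]["n"]
treated_share = summary[1]["n"] / total_n

print(f"difference in means: {difference:.1f}")
print(f"students: {total_n}, treated share: {treated_share:.3f}")
```

The treated share is close to one half, which is in line with a randomized assignment.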

Causal Forest: Variable Importance

To estimate and summarize CATE (Conditional Average Treatment Effects), you can fix the propensity score at 0.5, since this is a randomized controlled trial (RCT) and assignment to the treatment or control group is random. Because the RCT was clustered at the school level, the model includes a clustering variable, so that units are sampled at the school level rather than at the student level. The model lets you compute a doubly robust estimate of the Average Treatment Effect (ATE). The benefit appears to be quite strong relative to its small standard error. A very simple way to see which variables appear to make a difference for treatment effects is to inspect variable importance. The top-ranked variables are those along which the estimated treatment effect varies the most. In lay terms, these are the characteristics to focus on when deciding who is likely to benefit most from the intervention.

[1] "financial.autonomy.index"           "intention.to.save.index"           
[3] "is.female"                          "family.receives.cash.transfer"     
[5] "has.computer.with.internet.at.home"
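
To make the "doubly robust" ATE estimate mentioned above more concrete, here is a minimal sketch of the standard augmented inverse-propensity-weighted (AIPW) score with the propensity fixed at 0.5. It is written in Python for illustration, runs on made-up toy data, and uses plain group means as a stand-in for the forest's outcome predictions. The function name is hypothetical, clustering by school is ignored, and the model's internal computation may differ in detail.

```python
from statistics import mean


def aipw_scores(y, w, mu0, mu1, propensity=0.5):
    """Return one doubly robust (AIPW) score per unit for a binary treatment.

    score_i = mu1_i - mu0_i + (w_i - e) / (e * (1 - e)) * (y_i - mu_{w_i})
    where e is the known propensity score (0.5 in a balanced RCT).
    """
    scores = []
    for yi, wi, m0, m1 in zip(y, w, mu0, mu1):
        fitted = m1 if wi == 1 else m0  # prediction for the arm actually received
        weight = (wi - propensity) / (propensity * (1 - propensity))
        scores.append(m1 - m0 + weight * (yi - fitted))
    return scores


# Made-up toy data: three control and three treated students.
y = [50.0, 54.0, 58.0, 62.0, 60.0, 64.0]
w = [0, 0, 0, 1, 1, 1]

# Crude outcome models (an assumption of this sketch): each arm's mean score.
control_mean = mean(yi for yi, wi in zip(y, w) if wi == 0)
treated_mean = mean(yi for yi, wi in zip(y, w) if wi == 1)
mu0 = [control_mean] * len(y)
mu1 = [treated_mean] * len(y)

scores = aipw_scores(y, w, mu0, mu1)
ate = mean(scores)  # the ATE estimate is the average of the scores
print(f"doubly robust ATE on toy data: {ate:.1f}")
```

With group means as the outcome model and a balanced design, the average score equals the simple difference in means. The doubly robust form matters more once the outcome predictions depend on covariates.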