This notebook uses simulations to assess the effect of sampling error on B2SOF payments for a single implementing partner, assuming that the true IP effect size is equal to the target effect size and that investors achieve a 0% return if the estimated effect size equals the target effect size. The simulations show that, with the current evaluation design, investors can expect returns between -30% and +30% (90% confidence interval), with an average return of roughly 0%. With the alternate evaluation design, which measures the marginal change in learning year by year, investors can expect returns between -82% and +1019% (90% confidence interval), with an average return of 440%.
In the code below, we simulate the output of the evaluation and the resulting payments from the outcome payers to the investors. To simulate the evaluation output, we first estimate the sampling distribution of the relevant estimators using the evaluation design and the hypothesized true IP impact. We then take many random draws from the sampling distribution of these estimators and, for each random draw, calculate the payment from the outcome payers to the investors based on the random draw, a hypothesized target, and a payment formula.
For example, suppose that the B2SOF was only conducted for a single year, for a single implementing partner (IP), and the evaluation used a randomized design. Further suppose that the true average treatment effect of the IP is \(\tau\). The sampling distribution for the estimate of the average treatment effect would be:
\[ \hat{\tau} \sim N(\tau, se) \]
Where se is the standard error of the RCT estimate of impact. To simulate the output from this evaluation, we would randomly draw estimates from the sampling distribution for \(\hat{\tau}\) above and then plug these estimates into the payment formula.
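As a minimal sketch of this approach for the single-year RCT example, the lines below draw from the sampling distribution of \(\hat{\tau}\) and plug each draw into a simple placeholder payment rule (the values of tau, se, and target, and the payment rule itself, are illustrative rather than the actual evaluation parameters):
# illustrative single-year example: draw impact estimates and convert each draw to a payment
tau <- .15 # hypothesized true average treatment effect (illustrative)
se <- .05 # standard error of the RCT impact estimate (illustrative)
target <- .15 # target effect size (illustrative)
draws <- rnorm(100000, mean = tau, sd = se) # draws from the sampling distribution of the estimate
payments <- pmax(0, draws/target) # placeholder rule: pays 1 when the estimate equals the target, floored at 0
mean(payments)
quantile(payments, c(.05, .95))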
This approach differs from the typical approach to determining the appropriate sample size for an evaluation. Typically, an evaluator would perform a power calculation to determine the minimum sample size required for a given hypothesized effect size such that, if the true effect were equal to the hypothesized effect, the evaluation would yield a statistically significant estimate 80% of the time. We use the simulation approach described above rather than the typical power calculation approach for two reasons. First, we don't necessarily care about achieving statistical significance; rather, we care about the effect of sampling error on total payments from the outcome payers to the investors. Second, the simulation approach allows us to fully account for the effect of various potential evaluation designs and payment formulas on the final payment, which would be very hard to do analytically. For example, if we estimate the impact of the IP interventions twice for the same set of students, there will be a negative correlation between the first and second impact estimates. Accounting for this dependence analytically would be difficult, but it is straightforward with a simulation approach.
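To illustrate the last point, a shared measurement that serves as the endline for one impact estimate and the baseline for the next induces a correlation of roughly -0.5 between the two estimates (the means and standard error below are illustrative):
# three measurement rounds for the same students; the middle round is shared by both estimates
m1 <- rnorm(100000, mean = 0, sd = .07) # first measurement
m2 <- rnorm(100000, mean = .15, sd = .07) # second measurement, used in both estimates
m3 <- rnorm(100000, mean = .20, sd = .07) # third measurement
est1 <- m2 - m1 # first impact estimate
est2 <- m3 - m2 # second impact estimate
cor(est1, est2) # approximately -0.5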
One caveat: the simulation approach assumes that the assumptions required for our evaluation hold and that our estimators are unbiased, which may not be the case. These simulations therefore likely underestimate the overall mean squared error of payment returns.
In the calculations below, we assume that the true impact of the IP is:
.15 in the first year of exposure
.09 in the second year of exposure
.05 in the third year of exposure
We assume that the targets are equal to the true impact of the IP for the set of students being measured. Since the share of students with one, two, and three years of exposure to the IP’s intervention at each evaluation measurement varies by evaluation design and year, the targets may also vary by evaluation design and year.
We assume that, in year y, the investor is paid \(p_y\) equal to
\[ p_y = \frac{\hat{\tau_y}}{4*t_y} \]
Where \(\hat{\tau_y}\) is the estimated average treatment effect in year y and \(t_y\) is the target for year y. Thus, if the estimated effect equals the target in every year, the investor is paid a total of 1 across the four annual payments.
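A small helper function (annual_payment, introduced here just for illustration) implementing this rule is sketched below; the floor at zero matches the pmax(0, ...) used in the simulation code later in the notebook:
# annual payment: estimated effect over 4 times the target, floored at zero
annual_payment <- function(tau_hat, target) {
  pmax(0, tau_hat/(4*target))
}
annual_payment(.24, .24) # .25, i.e. one quarter of the full payment is earned in that year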
The diagram below provides details of the data collection rounds for the evaluation design. Note that the data collection for grade 2 at the end of the learning year and for grade 3 at the end of year 4 are not used to determine payment. In each of years 1 through 4, the evaluation estimates the impact of 2 years of exposure to the IP intervention, so the target for each year is .15 + .09 = .24.
To reduce sample size while also ensuring that IPs don’t only target schools selected at baseline for the evaluation, we will resample 50% of schools at each endline.
[Figure: data collection rounds by grade and year for the current evaluation design]
With this evaluation design, the annual impact estimates \(\hat{\tau_y}\) are independent of one another, so we can simulate the output from the evaluation by drawing from the sampling distribution of each \(\hat{\tau_y}\) and calculating the payment in each year.
To calculate the variance of \(\hat{\tau_y}\), note that \(\hat{\tau_y}=.5*\widehat{\tau_{y,dd}}+.5*\widehat{\tau_{y,p}}\), where \(\widehat{\tau_{y,dd}}\) is the difference-in-differences estimate calculated using the 50% of the sample in which schools are resampled and \(\widehat{\tau_{y,p}}\) is the panel estimate calculated using the 50% of the sample in which schools are not resampled and the same set of students is assessed at baseline and endline.
We first derive the variance of a clustered mean and then use this formula to estimate \(Var(\widehat{\tau_{y,dd}})\) and \(Var(\widehat{\tau_{y,p}})\).
If we take a random sample of J schools with K students per school to estimate the mean learning level y, then the variance of the estimated mean (i.e., the square of its standard error) is:
\[ V(\bar{y}) = SE^2= \sigma_y^2\left(\frac{\rho}{J}+\frac{(1-\rho)}{JK}\right)=\sigma_y^2*(A+B) \]
Where \(\sigma_y^2\) is the variance of the outcome variable and \(\rho\) is the intra-class correlation (ICC). Since we are working with standardized effect sizes, we can take \(\sigma_y\) to be 1.
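As a quick numerical check of this formula, the helper below (clustered_mean_var, introduced just for illustration) evaluates it at the sample sizes used in the simulation inputs further down (J = 50 schools per arm, K = 25 students per school, ICC = 0.1):
# variance of a clustered mean on the standardized (sigma = 1) scale
clustered_mean_var <- function(J, K, rho) {
  rho/J + (1 - rho)/(J*K)
}
clustered_mean_var(J = 50, K = 25, rho = 0.1) # ~0.0027
sqrt(clustered_mean_var(J = 50, K = 25, rho = 0.1)) # standard error of ~0.052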
For the portion of our sample in which we resample schools, our estimate of impact is
\[ \widehat{\tau_{y,dd}}=(\bar{Y}_{y,post, treat}-\bar{Y}_{y,pre, treat})-(\bar{Y}_{y,post, control}-\bar{Y}_{y,pre, control}) \]
Since we resample schools, each of these four means is independent, so the variance of the impact estimate is approximately 4 times the variance of a single clustered mean computed with J/2 schools, with a slight adjustment to account for our control variables. The variance of our impact estimate is:
\[ Var(\widehat{\tau_{y,dd}}) = 8A(1-R_J^2)+8B(1-R_K^2) \]
Where \(R_J^2\) and \(R_K^2\) are the shares of the school-level and student-level variance, respectively, explained by our control variables. We adjust by these \(R^2\) terms to account for the reduced variance due to the use of control variables.
For the portion of our sample in which we don’t resample schools,
\[ Var(\widehat{\tau_{y,p}}) = 8*(A+B)(1-auto) \]
Where auto is the autocorrelation of test scores between baseline and endline.
Thus, the overall variance of our annual estimate of impact is
\[ Var(\hat{\tau_y})=.25*Var(\widehat{\tau_{y,dd}})+.25*Var(\widehat{\tau_{y,p}})= 2A(1-R_J^2)+2B(1-R_K^2)+2*(A+B)*(1-auto) \]
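To make the weighting explicit, the short check below computes the two component variances separately and verifies that the .25-weighted combination equals the expression above, using the same inputs as the simulation code that follows (the names var_dd, var_panel, and var_tau are introduced here just for the check):
# component variances for the resampled (difference-in-differences) half and the panel half
J <- 50; K <- 25; rho <- 0.1 # schools per arm, students per school, ICC
rsj <- .2; rsk <- .2 # R^2 from school- and student-level control variables
auto <- .6^2 # autocorrelation of test scores
A <- rho/J
B <- (1-rho)/(J*K)
var_dd <- 8*A*(1-rsj) + 8*B*(1-rsk) # resampled (diff-in-diffs) half
var_panel <- 8*(A+B)*(1-auto) # panel half
var_tau <- .25*var_dd + .25*var_panel
all.equal(var_tau, 2*A*(1-rsj) + 2*B*(1-rsk) + 2*(A+B)*(1-auto)) # TRUE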
library(tidyverse)
# sampling inputs
J <- 50 # Number of schools per arm
K <- 25 # Number of students per school
rho <- 0.1 # ICC
rsj <- .2 # R^2 from school-level control variables
rsk <- .2 # R^2 from student-level control variables
auto <- .6^2 # autocorrelation of test scores between baseline and endline
# variance calculations
A <- rho/J # school-level variance component
B <- (1-rho)/(J*K) # student-level variance component
var <- 2*A*(1-rsj)+2*B*(1-rsk)+2*(A+B)*(1-auto) # variance of the annual impact estimate
se <- var^.5 # standard error of the annual impact estimate
sims <- 100000 # number of simulation draws
true_impact <- .24 # true impact for 2 years of exposure (.15 + .09)
target <- .24 # target for each cohort
# draw independent annual impact estimates from their sampling distributions
est_effect_y1 <- rnorm(sims, mean = true_impact, sd = se)
est_effect_y2 <- rnorm(sims, mean = true_impact, sd = se)
est_effect_y3 <- rnorm(sims, mean = true_impact, sd = se)
est_effect_y4 <- rnorm(sims, mean = true_impact, sd = se)
# annual payments: estimated effect over 4 times the target, floored at zero
payment_y1 <- pmax(0, est_effect_y1/(4*target))
payment_y2 <- pmax(0, est_effect_y2/(4*target))
payment_y3 <- pmax(0, est_effect_y3/(4*target))
payment_y4 <- pmax(0, est_effect_y4/(4*target))
# investor return: total payments minus the normalized investment of 1
inv_return <- payment_y1+payment_y2+payment_y3+payment_y4-1
print("Mean return is:")
mean(inv_return)
print("90% confidence interval for return")
quantile(inv_return, c(.05,.95))
print("75% confidence interval for return")
quantile(inv_return, c(.125,.875))
# histogram of investor returns
ggplot(tibble(p = inv_return), aes(x = p)) +
geom_histogram(binwidth = .02) +
labs(x = "Return", y = "Frequency")
The table below shows the alternate evaluation design we considered.
Unlike under the current evaluation design, the annual impact estimates under this alternate design are not independent. To simulate the output of the evaluation, we first calculate the sampling distribution for each year + grade data collection point (i.e., each cell in the evaluation design figure above) using the hypothesized true effect and the estimated variance of the measured effect, and then simulate the evaluation output using draws from these distributions.
For example, the red cell in the figure above represents the difference in mean learning levels between treatment and control schools for year 2, grade 2 students, \(\Delta \bar{y}_{g2,y2}\). If we believe that the true effect of the intervention over 2 years is T, that the standard error of this measurement is se, and that our estimate is normally distributed, then our estimate of this quantity has the distribution
\[\widehat{\Delta \bar{y}_{g2,y2}} \sim N(T, se)\]
We can then simulate estimated learning gains and payments by drawing multiple values from the distribution of measured learning gains for each year + grade data collection round.
Note that the variance of each grade and year data collection point is
\[ V(\Delta \overline{y_{g,y}}) = 2A(1-R_J^2)+2B(1-R_K^2) \]
library(MASS) # for mvrnorm
# sampling inputs
J <- 65 # Number of schools per arm
K <- 25 # Number of students per school
rho <- 0.15 # ICC
rsj <- .2 # R^2 from school-level control variables
rsk <- .2 # R^2 from student-level control variables
rsl <- .63
corr <- .5 # correlation between the two grade-level estimates measured in the same round
# variance calculations
A <- rho/J # school-level variance component
B <- (1-rho)/(J*K) # student-level variance component
var <- 2*A*(1-rsj)+2*B*(1-rsk) # variance of each grade + year estimate
se <- var^.5 # standard error of each grade + year estimate
# covariance matrix for the pairs of grade-level estimates drawn jointly below
# (variance var on the diagonal, covariance corr*var off the diagonal)
Sigma <- var*matrix(c(1, corr, corr, 1), ncol = 2)
# Effect size and target inputs
effect_year1 <- .15 # assumed impact of the first year of exposure
effect_year2 <- .05 # assumed impact of the second year of exposure
effect_year3 <- .025 # assumed impact of the third year of exposure
target_year1 <- .0875 # payment target for year 1
target_year2 <- .1 # payment target for years 2 and 3
target_year3 <- .0375 # payment target for year 4
# Simulations
# note that g2y0e indicates that the draw is for grade 2, year 0, end of year
num_sims <- 100000
# year 0
g2y0e <- rnorm(n = num_sims, mean = .2, sd = se)
# year 1
g1y1b <- rnorm(n = num_sims, mean = 0, sd = se)
y1e <- mvrnorm(n = num_sims, mu = c(.225, .15), Sigma = Sigma)
g3y1e <- y1e[,1]
g1y1e <- y1e[,2]
# year 2
g1y2b <- rnorm(n = num_sims, mean = 0, sd = se)
y2e <- mvrnorm(n = num_sims, mu = c(.2, .15), Sigma = Sigma)
g2y2e <- y2e[,1]
g1y2e <- y2e[,2]
# year 3
g1y3b <- rnorm(n = num_sims, mean = 0, sd = se)
y3e <- mvrnorm(n = num_sims, mu = c(.2, .15), Sigma = Sigma)
g2y3e <- y3e[,1]
g1y3e <- y3e[,2]
# year 4
y4e <- mvrnorm(n = num_sims, mu = c(.225, .2), Sigma = Sigma)
g3y4e <- y4e[,1]
g2y4e <- y4e[,2]
# Payment calculation
# annual impact estimate = average of the marginal one-year gains measured for two student cohorts
est_effect_y1 <- ((g3y1e-g2y0e)+(g1y1e-g1y1b))/2
est_effect_y2 <- ((g2y2e-g1y1e)+(g1y2e-g1y2b))/2
est_effect_y3 <- ((g2y3e-g1y2e)+(g1y3e-g1y3b))/2
est_effect_y4 <- ((g3y4e-g2y4e)+(g2y4e-g1y3e))/2
# Check that mean estimated effect is similar to targets
print("Mean estimated effect y1 vs target of .0875:")
round(mean(est_effect_y1),4)
print("Mean estimated effect y2 vs target of .1:")
round(mean(est_effect_y2),4)
print("Mean estimated effect y3 vs target of .1:")
round(mean(est_effect_y3),4)
print("Mean estimated effect y4 vs target of .0375:")
round(mean(est_effect_y4),4)
# Calculate payment assuming minimum payment of 0
payment_y1 <- pmax(0, est_effect_y1/(4*.0875))
payment_y2 <- pmax(0, est_effect_y2/(4*.1))
payment_y3 <- pmax(0, est_effect_y3/(4*.1))
payment_y4 <- pmax(0, est_effect_y4/(4*.0375))
# investor return: total payments minus the normalized investment of 1
inv_return <- payment_y1+payment_y2+payment_y3+payment_y4-1
print("Mean return is:")
mean(inv_return)
print("90% confidence interval for return")
quantile(inv_return, c(.05,.95))
print("75% confidence interval for return")
quantile(inv_return, c(.125,.875))
# histogram of total payment
ggplot(tibble(p = inv_return), aes(x = p)) +
geom_histogram(binwidth = .2) +
labs(x = "Return", y = "Frequency")