# Load packages (DeclareDesign attaches fabricatr, randomizr, and estimatr)
library(DeclareDesign)
library(dplyr)
library(ggplot2)
# Define population parameters
n_villages <- 50
n_individuals <- 50 # if a village has 1000 people, assume 500 women
icc <- 0.05 # variance explained by village; assume moderate ICC of .05
effect_individual <- 0.2 # individual effect (standard deviations)
effect_village <- 0.01 # village-level spillover effect (standard deviations)
WITI Power Calculations
Overview
This document presents power calculations for an impact evaluation of an intervention designed by the Women’s Institute of Technology & Innovation (WITI) to build digital literacy among rural women in Uganda. The intervention will target young women who have dropped out of school and focus on basic skills, including graphic design, responsible use of digital payments, and effective use of AI tools. These skills are intended to make participants competitive for remote work opportunities.
Training will run for roughly four weeks. To minimize transportation costs, sessions will be held in parish trading centers.
Our design uses two-stage randomization, whereby:
- Villages are assigned to treatment or control
- Individuals in treatment villages are assigned to treatment or control
This creates three groups of experimental subjects:
- Group 1: Pure control (all individuals in control villages)
- Group 2: Control in treatment villages
- Group 3: Treatment in treatment villages
Group 1 gives us an estimate of outcome values for individuals with no exposure to the treatment. Group 3 gives an estimate of outcome values for respondents who received the treatment. Comparing Group 2 with Group 1 lets us estimate spillover effects of the treatment on geographically proximate individuals.
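The role of each group can be made explicit by writing the two quantities of interest as differences in expected outcomes. The notation below (G for group membership, and the two \tau symbols) is introduced here for illustration only and is not part of the original analysis.

```latex
\begin{aligned}
\tau_{\text{spillover}} &= E[Y \mid G = 2] - E[Y \mid G = 1] \\
\tau_{\text{direct}}    &= E[Y \mid G = 3] - E[Y \mid G = 2]
\end{aligned}
```

Under the outcome model declared later in this document, the first difference corresponds to effect_village and the second to effect_individual.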
Declare Population
In this section, we declare parameters of the population and the impact of the intervention. This includes the number of villages, the number of individuals in each village, the intra-cluster correlation (ICC) of individuals within villages, the individual-level treatment effect, and the size of spillover effects. Importantly, we define outcome variables on a z-score scale so that we can think of effect sizes in terms of standard deviations. In the final section, we will vary some of these parameters to identify minimum viable sample sizes.
Next, we use these parameters to declare a base population. Note that u_village and u_individual will be combined to form the outcome measure Y. Both variables are drawn from normal distributions with a mean of \(0\): u_individual has standard deviation \(1\), while u_village has variance icc (standard deviation sqrt(icc)).
# Declare population
population <- declare_population(
  villages = add_level(
    N = n_villages,
    u_village = rnorm(N, mean = 0, sd = sqrt(icc)) # take sqrt of icc variance to get sd
  ),
  individuals = add_level(
    N = n_individuals,
    u_individual = rnorm(N, mean = 0, sd = 1)
  )
)
pop <- population()
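As a quick arithmetic check on the declaration above, the following sketch computes the ICC implied by the two variance components. The alternative scaling at the end (sd_individual_alt) is a hypothetical variation introduced here, not part of the original design.

```r
# Parameter value taken from the document
icc <- 0.05

# Variance components implied by the population declaration
var_village <- icc      # u_village is drawn with sd = sqrt(icc)
var_individual <- 1     # u_individual is drawn with sd = 1

# ICC = between-village variance / total variance
implied_icc <- var_village / (var_village + var_individual)
implied_icc

# Hypothetical alternative (an assumption, not the original design):
# shrink the individual term so total variance is 1 and the ICC equals icc
sd_individual_alt <- sqrt(1 - icc)
alt_icc <- icc / (icc + sd_individual_alt^2)
alt_icc
```

With sd = 1 for u_individual, the implied ICC is icc / (1 + icc), slightly below the nominal value, and the total variance of the outcome is 1 + icc rather than exactly 1. For small icc the difference is minor.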
Randomize Assignment
Next we conduct randomization in two stages. First, we use cluster randomization to assign 50% of villages to the treatment group.
# Village-level randomization
assign_village <- declare_assignment(Z_village = conduct_ra(N = N, clusters = villages, prob = 0.5))
Below, we check our work by applying the assignment to the population (pop), summarizing observations by village (tab), and counting the number of respondents per village and the number of villages in treatment and control.
pop <- assign_village(pop)
tab <- pop %>% group_by(villages) %>% summarise(pop = n(), z = min(Z_village))
table(tab$pop, tab$z)
      0  1
  50 25 25
Second, we randomize 50% of individuals within treatment villages to the treatment group.
# Individual-level randomization within treated villages
assign_individual <- declare_step(
  handler = function(data) {
    # Each individual in a treated village is treated with probability 0.5
    data$Z_individual <- ifelse(data$Z_village == 1, rbinom(nrow(data), 1, 0.5), 0)
    return(data)
  }
)
Again, we check our work by counting the number of respondents in each experimental group. Note that we have a large number of individuals in Group 1 (pure control). We may wish to remedy this by assigning a larger proportion of villages to the treatment group than to the control group.
pop <- assign_individual(pop)
table(pop$Z_village, pop$Z_individual)
       0    1
  0 1250    0
  1  625  625
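The handler above draws each individual's assignment independently, so group sizes within a village can vary from draw to draw. The base-R sketch below shows one way to fix the split at exactly half within every treated village. It is an illustrative alternative, not the procedure used in this document, and the data frame dat and its column names are hypothetical.

```r
set.seed(42)
n_villages <- 50
n_individuals <- 50

# Hypothetical stand-in for the simulated population: one row per individual
dat <- data.frame(village = rep(seq_len(n_villages), each = n_individuals))

# Stage 1: treat exactly half of the villages
treated_villages <- sample(seq_len(n_villages), size = n_villages / 2)
dat$Z_village <- as.integer(dat$village %in% treated_villages)

# Stage 2: within each village, shuffle a balanced 0/1 vector and
# multiply by Z_village so control villages stay entirely untreated
dat$Z_individual <- ave(
  dat$Z_village, dat$village,
  FUN = function(z) z * sample(rep(c(0L, 1L), length.out = length(z)))
)

table(dat$Z_village, dat$Z_individual)
```

Fixing the within-village split removes one source of sampling variability in group sizes. The randomizr package also provides block_ra() for blocked assignment, which may be a more convenient route inside a DeclareDesign pipeline.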
Define Outcomes
Now we can define simulated outcome values for all experimental subjects. Here, we add u_village to u_individual to induce within-village correlation among respondents. We multiply effect_village and effect_individual by their corresponding treatment indicators, which adds the village-level spillover effect to every observation in treatment villages (Group 2 and Group 3) and the individual treatment effect to treated individuals (Group 3).
# Define outcome based on treatment effects and clustering
declare_outcomes <- declare_measurement(
  Y = effect_village * Z_village +       # spillover effect
    effect_individual * Z_individual +   # treatment effect
    u_village +                          # village-level variation in outcome (ICC)
    u_individual                         # individual variance in outcome
)
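Written out, the measurement step above corresponds to the following model for individual i in village v. The symbols \tau_V, \tau_I, and \rho are shorthand introduced here for effect_village, effect_individual, and icc; they do not appear in the original code.

```latex
\begin{aligned}
Y_{iv} &= \tau_V Z_v + \tau_I Z_{iv} + u_v + u_{iv},
\qquad u_v \sim N(0, \rho), \quad u_{iv} \sim N(0, 1) \\[4pt]
E[Y_{iv} \mid \text{Group 1}] &= 0 \\
E[Y_{iv} \mid \text{Group 2}] &= \tau_V \\
E[Y_{iv} \mid \text{Group 3}] &= \tau_V + \tau_I
\end{aligned}
```

Here Z_v is the village-level indicator (Z_village) and Z_{iv} the individual-level indicator (Z_individual), which is zero for everyone in control villages.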
We check our work by estimating a random intercept model and extracting the ICC of respondents from the same villages. We expect to see an ICC of \(\approx 0.05\).
pop <- declare_outcomes(pop)
## Confirm ICC of ~0.05
# Random intercept model: Y ~ overall intercept + random intercept by villages
model <- lme4::lmer(Y ~ 1 + (1 | villages), data = pop)
var_components <- as.data.frame(lme4::VarCorr(model))
var_villages <- var_components[1, "vcov"]  # between-village variance
var_resid <- var_components[2, "vcov"]     # residual (within-village) variance
icc_value <- var_villages / (var_villages + var_resid)
icc_value
[1] 0.0600922
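As a cross-check that does not depend on lme4, the ICC can also be estimated from one-way ANOVA mean squares. The sketch below assumes balanced villages (equal size), which holds in this design. The helper icc_anova() and the simulated vectors are hypothetical names introduced for illustration.

```r
# One-way ANOVA estimator of the ICC for balanced groups:
# (MSB - MSW) / (MSB + (n - 1) * MSW), with n observations per group
icc_anova <- function(y, g) {
  group_means <- ave(y, g)                 # each observation's group mean
  k <- length(unique(g))                   # number of groups
  n <- length(y) / k                       # observations per group (balanced)
  msb <- sum((group_means - mean(y))^2) / (k - 1)
  msw <- sum((y - group_means)^2) / (length(y) - k)
  (msb - msw) / (msb + (n - 1) * msw)
}

# Simulated data mirroring the document's variance components
set.seed(123)
village <- rep(1:50, each = 50)
y <- rnorm(50, sd = sqrt(0.05))[village] + rnorm(2500)
icc_anova(y, village)
```

With 50 villages the estimate is noisy, so values somewhat above or below the target in a single draw are expected.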
Define Base Design
Using the functions and parameters defined above, we create the base design. This includes the estimands we care about and the regression model we plan to use.
# Define estimands
estimands <- declare_inquiry(
  ATE_village = effect_village,
  ATE_individual = effect_individual
)
# Define estimator
estimator <- declare_estimator(
  # Z_individual drops because (Z_individual == Z_individual:Z_village)
  Y ~ Z_village +             # Spillover effect
    Z_individual:Z_village,   # Treatment effect
  .method = lm_robust,
  clusters = villages,
  term = c("Z_village", "Z_village:Z_individual"),
  inquiry = c("ATE_village", "ATE_individual")
)
## Create design and run over range of treatment effect sizes
design <- population + assign_village + assign_individual + declare_outcomes + estimands + estimator
Because there are no treated respondents in control villages, Z_individual is identical to the interaction Z_individual:Z_village. For this reason, we drop the Z_individual main effect from the regression model and keep only the interaction term.
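The collinearity described above can be seen directly in a small base-R simulation. This is a minimal sketch using plain lm() on hypothetical data, not the lm_robust estimator declared in the design.

```r
set.seed(7)
n <- 200
Z_village <- rep(c(0, 1), each = n / 2)
Z_individual <- Z_village * rbinom(n, 1, 0.5)  # no treated units in control villages
Y <- 0.01 * Z_village + 0.2 * Z_individual + rnorm(n)

# The main effect and the interaction are the same column
all(Z_individual == Z_individual * Z_village)

# Full interaction model: the redundant term cannot be estimated (NA)
fit_full <- lm(Y ~ Z_village * Z_individual)
coef(fit_full)

# Reduced model: coefficients are differences in group means
fit <- lm(Y ~ Z_village + Z_village:Z_individual)
g1 <- mean(Y[Z_village == 0])                      # pure control
g2 <- mean(Y[Z_village == 1 & Z_individual == 0])  # control in treatment villages
g3 <- mean(Y[Z_individual == 1])                   # treated
c(spillover = g2 - g1, direct = g3 - g2)
coef(fit)
```

In the reduced model, the coefficient on Z_village equals the Group 2 minus Group 1 difference, and the interaction coefficient equals the Group 3 minus Group 2 difference.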
Determining Minimum Viable Sample Size
Finally, we rebuild the design while varying critical parameters. Below, we vary the size of the individual-level treatment effect and the number of villages (the highest-cost parameter). Additional parameters can be varied by adding them to the redesign() function.
designs <- redesign(
  design,
  effect_individual = seq(0, 0.3, by = 0.01),
  n_villages = seq(20, 50, by = 10)
)
diagnosis <- diagnose_design(designs, sims = 500)
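A rough analytic benchmark can help sanity-check the simulated power for the individual-level effect. The sketch below uses a normal approximation for a difference in means between Group 3 and Group 2. It assumes village effects roughly cancel because both groups come from the same villages, and it ignores cluster-robust adjustments, so the simulation remains the authoritative result. The function approx_power() and its arguments are hypothetical.

```r
# Approximate two-sided power for the Group 3 vs. Group 2 comparison
approx_power <- function(effect, n_villages, n_individuals = 50,
                         share_villages = 0.5, share_individuals = 0.5,
                         sd_individual = 1, alpha = 0.05) {
  n_treated_villages <- n_villages * share_villages
  n_g3 <- n_treated_villages * n_individuals * share_individuals
  n_g2 <- n_treated_villages * n_individuals * (1 - share_individuals)
  se <- sd_individual * sqrt(1 / n_g2 + 1 / n_g3)  # SE of difference in means
  crit <- qnorm(1 - alpha / 2)
  pnorm(effect / se - crit) + pnorm(-effect / se - crit)
}

# Power at the baseline effect size across the village counts considered
approx_power(effect = 0.2, n_villages = seq(20, 50, by = 10))
```

If the simulated power curves differ substantially from this benchmark, that may point to a coding error or to clustering mattering more than the approximation assumes.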
Here, we visualize the results.
## Plotting
# Filter for ATE_individual inquiry
results_indiv <- diagnosis$diagnosands_df %>%
  filter(inquiry == "ATE_individual")
# Plot power vs. effect_individual
ggplot(results_indiv, aes(x = effect_individual, y = power, color = factor(n_villages))) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.8) +
  labs(
    x = "Individual-Level Effect (Standard Deviations)",
    y = "Power",
    title = "Power for Individual ATE across effect sizes",
    color = "Number of\nVillages"
  ) +
  scale_y_continuous(labels = function(x) paste0(x * 100, "%")) +
  theme_bw() +
  theme(legend.position = c(.1, .9))
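To read a minimum detectable effect off results like these, one option is to find, for each number of villages, the smallest effect size at which power reaches 80%. The sketch below uses a small hypothetical table in place of the real diagnosands. The power values in toy are made up for illustration and are NOT results from this design.

```r
# Hypothetical diagnosands (illustrative values only, not simulation output)
toy <- data.frame(
  n_villages = rep(c(20, 50), each = 4),
  effect_individual = rep(c(0, 0.1, 0.2, 0.3), times = 2),
  power = c(0.05, 0.30, 0.70, 0.90, 0.05, 0.50, 0.92, 0.99)
)

# Keep rows that meet the 80% threshold, then take the smallest effect
# size per village count
powered <- toy[toy$power >= 0.8, ]
mde <- aggregate(effect_individual ~ n_villages, data = powered, FUN = min)
mde
```

The same two steps could be applied to results_indiv. Village counts that never reach 80% power in the grid would simply be absent from the output.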