WITI Power Calculations

Author: Jeremy Springman
Affiliation: University of Pennsylvania
Published: February 20, 2025

Overview

This document presents power calculations for an impact evaluation of an intervention designed by the Women’s Institute of Technology & Innovation (WITI) to build digital literacy among rural women in Uganda. The intervention will target young women who have dropped out of school and focus on basic skills, including graphic design, responsible use of digital payments, and effective use of AI tools. These skills are intended to make participants competitive for remote work opportunities.

Training will run for roughly four weeks. To minimize transportation costs, sessions will be held in parish trading centers.

Our design uses two-stage randomization, whereby:

  1. Villages are assigned to treatment or control
  2. Individuals in treatment villages are assigned to treatment or control

This creates three groups of experimental subjects:

  1. Pure control (all individuals in control villages)
  2. Control in treatment villages
  3. Treatment in treatment villages

Group 1 provides an estimate of outcomes for individuals with no exposure to the treatment. Group 3 provides an estimate of outcomes for respondents who received the treatment. Group 2 can be used to estimate spillover effects of the treatment on geographically proximate individuals who were not themselves treated.

Declare population

In this section, we declare parameters of the population and the impact of the intervention. This includes the number of villages, the number of individuals in each village, the intra-cluster correlation (ICC) of individuals within villages, the individual-level treatment effect, and the size of spillover effects. Importantly, we define outcomes on a standardized (z-score) scale so that effect sizes can be interpreted in standard deviations. In the final section, we will vary some of these parameters to identify minimum viable sample sizes.

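# Load packages (DeclareDesign attaches fabricatr, randomizr, and estimatr)
library(DeclareDesign)
library(dplyr)
library(ggplot2)

# Define population parameters
n_villages <- 50         # number of villages (clusters)
n_individuals <- 50      # individuals per village; if a village has 1000 people, assume 500 women
icc <- 0.05              # variance of the village-level component; assume moderate ICC of .05
effect_individual <- 0.2 # individual effect (standard deviations)
effect_village <- 0.01   # village-level spillover effect (standard deviations)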

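Next, we use these parameters to declare a base population. Note that u_village and u_individual will be combined to form the outcome measure Y. Both variables are drawn from normal distributions with a mean of \(0\). u_individual has a standard deviation of \(1\), while u_village has a standard deviation of \(\sqrt{\text{icc}}\), so that the village-level variance equals icc.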

# Declare population
population <- declare_population(
  villages = add_level(N = n_villages, 
                       u_village = rnorm(N, 
                                         mean = 0,
                                         sd = sqrt(icc)) # take sqrt of icc variance to get sd
  ),
  individuals = add_level(N = n_individuals, 
                          u_individual = rnorm(N,
                                               mean = 0, 
                                               sd = 1)
  )
)

pop = population()
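As a quick check on the variance components declared above, the ICC implied by the two standard deviations can be computed directly. The sketch below is plain arithmetic in base R. `sd_individual_alt` is a hypothetical alternative that is not used elsewhere in this document: it would rescale the individual component so that the random part of the outcome has total variance 1 and the ICC equals `icc` exactly.

```r
icc <- 0.05

# Variance components as declared: sd(u_village) = sqrt(icc), sd(u_individual) = 1
var_village <- icc
var_individual <- 1

# ICC = between-village variance / total variance
implied_icc <- var_village / (var_village + var_individual)
implied_icc

# Hypothetical alternative (an assumption, not the declared design):
# shrink the individual sd so that the total variance is exactly 1
sd_individual_alt <- sqrt(1 - icc)
alt_icc <- icc / (icc + sd_individual_alt^2)
alt_icc
```

Under the declared values, the implied ICC is icc / (1 + icc), which sits slightly below the nominal 0.05. The difference is small at this ICC but grows as icc increases.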

Randomize Assignment

Next we conduct randomization in two stages. First, we use cluster randomization to assign 50% of villages to the treatment group.

# Village-level randomization
assign_village = declare_assignment(Z_village = conduct_ra(N = N, clusters = villages, prob = 0.5))  

Below, we check our work by applying the village-level assignment to the population (pop), summarising observations by village (tab), and counting the number of respondents per village and the number of villages in treatment and control.

pop = assign_village(pop)
tab = pop %>% group_by(villages) %>% summarise(pop = n(), z = min(Z_village))
table(tab$pop, tab$z)
    
      0  1
  50 25 25

Second, we assign individuals within treatment villages to the treatment group, each with probability 0.5. Individuals in control villages are never treated.

# Individual-level randomization within treated villages
assign_individual <- declare_step(
  handler = function(data) {
    data$Z_individual <- ifelse(data$Z_village == 1, rbinom(nrow(data), 1, 0.5), 0)
    return(data)
  }
)
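Because rbinom() assigns each individual independently, the treated share within a village can vary from draw to draw. If an exact 50/50 split inside every treatment village is preferred, one option is to permute a balanced vector of 0s and 1s within each village. The sketch below uses base R on a small made-up data frame. `toy` and `assign_within` are hypothetical names, and this is offered as an alternative to the handler above, not a description of it.

```r
set.seed(1)

# Made-up example: 4 villages of 6 people, villages 2 and 4 treated
toy <- data.frame(
  villages  = rep(1:4, each = 6),
  Z_village = rep(c(0, 1, 0, 1), each = 6)
)

assign_within <- function(data) {
  # Within each village, shuffle a vector that is half 0s and half 1s
  z <- ave(
    seq_len(nrow(data)), data$villages,
    FUN = function(i) sample(rep(0:1, length.out = length(i)))
  )
  # Only individuals in treated villages can receive the treatment
  data$Z_individual <- ifelse(data$Z_village == 1, z, 0)
  data
}

out <- assign_within(toy)
table(out$villages, out$Z_individual)
```

A function like this could be passed as the handler in declare_step() in place of the rbinom() version, at the cost of fixing the treated count per village in advance.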

Again, we check our work by counting the number of respondents in each experimental group. Note that Group 1 (pure control) is twice as large as either of the other two groups. We may wish to remedy this by assigning a larger proportion of villages to the treatment group than to the control group.

pop = assign_individual(pop)
table(pop$Z_village, pop$Z_individual)
   
       0    1
  0 1250    0
  1  625  625

Define Outcomes

Now we can define simulated outcome values for all experimental subjects. Here, we add u_village to u_individual to induce within-village correlation among respondents. We multiply effect_village and effect_individual by their corresponding treatment indicators (Z_village and Z_individual) to add the village-level spillover to every observation in treatment villages (Group 2 and Group 3) and the individual treatment effect to each observation in Group 3.

# Define outcome based on treatment effects and clustering
declare_outcomes <- declare_measurement(
  Y = effect_village * Z_village + # spillover effect
    effect_individual * Z_individual + # treatment effect
    u_village + # village-level variation in outcome (ICC) 
    u_individual # individual variance in outcome
)
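Written out, the outcome declared above implies the following model and group means. The symbols are our own shorthand for this note: \(\tau_{\text{spill}}\) stands for effect_village, \(\tau_{\text{ind}}\) for effect_individual, \(v\) indexes villages, and \(i\) indexes individuals.

```latex
\[
\begin{aligned}
Y_{iv} &= \tau_{\text{spill}} \, Z^{\text{village}}_{v}
        + \tau_{\text{ind}} \, Z^{\text{individual}}_{iv}
        + u_{v} + u_{iv} \\[6pt]
E[Y \mid \text{Group 1}] &= 0 \\
E[Y \mid \text{Group 2}] &= \tau_{\text{spill}} \\
E[Y \mid \text{Group 3}] &= \tau_{\text{spill}} + \tau_{\text{ind}}
\end{aligned}
\]
```

The spillover effect is therefore the Group 2 minus Group 1 contrast, and the individual effect is the Group 3 minus Group 2 contrast.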

We check our work by estimating a random intercept model and extracting the ICC of respondents from the same villages. We expect to see an ICC of \(\approx 0.05\).

pop = declare_outcomes(pop)
## Confirm ICC of ~0.05
# Random intercept model: Y ~ overall intercept + random intercept by villages
model <- lme4::lmer(Y ~ 1 + (1 | villages), data = pop)
var_components <- as.data.frame(lme4::VarCorr(model))
var_villages <- var_components[1, "vcov"]
var_resid    <- var_components[2, "vcov"]
icc_value <- var_villages / (var_villages + var_resid)
icc_value
[1] 0.0600922
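As a cross-check that does not rely on lme4, the ICC can also be estimated with the classical one-way ANOVA estimator for balanced clusters, (MSB - MSW) / (MSB + (n - 1) MSW), where MSB and MSW are the between- and within-village mean squares and n is the number of individuals per village. The sketch below simulates its own data in base R with the same variance components as the declared population. The object names are hypothetical and no treatment effects are included.

```r
set.seed(123)

n_villages <- 50
n_individuals <- 50
icc <- 0.05

# Simulate village and individual components as in the declared population
village <- rep(seq_len(n_villages), each = n_individuals)
u_village <- rnorm(n_villages, mean = 0, sd = sqrt(icc))[village]
y <- u_village + rnorm(n_villages * n_individuals, mean = 0, sd = 1)

# One-way ANOVA: between-village and within-village mean squares
fit <- anova(lm(y ~ factor(village)))
msb <- fit[["Mean Sq"]][1]
msw <- fit[["Mean Sq"]][2]

# ANOVA estimator of the ICC for equal cluster sizes
icc_anova <- (msb - msw) / (msb + (n_individuals - 1) * msw)
icc_anova
```

With only 50 villages the estimate is noisy, so a single draw can land a point or two away from the target, as the mixed-model estimate above also does.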

Define Base Design

Using the functions and parameters defined above, we create the base design. This includes the estimands we care about and the regression model we plan to use.

# Define estimands
estimands <- declare_inquiry(
  ATE_village = effect_village,
  ATE_individual = effect_individual
)

# Define estimator
estimator <- declare_estimator(
  # Z_individual drops because (Z_individual == Z_individual:Z_village)
  Y ~ Z_village + # Spillover effect
    Z_individual:Z_village , # Treatment effect
  .method = lm_robust, 
  clusters = villages, 
  term       = c("Z_village", "Z_village:Z_individual"),
  inquiry = c("ATE_village", "ATE_individual")
)

## Combine the declared steps into a single design

design <- population + assign_village + assign_individual + declare_outcomes + estimands + estimator

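Because there are no treated respondents in control villages, Z_individual is identical to the Z_individual:Z_village interaction, that is, \(Z_{\text{individual}} = Z_{\text{individual}} \times Z_{\text{village}}\) for every observation. For this reason, we drop the standalone Z_individual term from the regression model and keep only the interaction.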
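The redundancy can be seen on a tiny made-up data set. In the base R sketch below (`toy` and its outcome values are invented purely for illustration), the product of the two indicators reproduces Z_individual exactly, and a model that includes both the main effect and the interaction cannot estimate the interaction separately.

```r
# Made-up data: two control-village and four treatment-village respondents
toy <- data.frame(
  Z_village    = c(0, 0, 1, 1, 1, 1),
  Z_individual = c(0, 0, 0, 1, 0, 1),
  y            = c(0.1, -0.2, 0.3, 0.9, 0.0, 1.1)
)

# The interaction column equals Z_individual for every row
same_column <- all(toy$Z_individual == toy$Z_individual * toy$Z_village)
same_column

# Including both terms makes the design matrix rank deficient,
# so lm() reports NA for the redundant interaction coefficient
fit_full <- lm(y ~ Z_village * Z_individual, data = toy)
coef(fit_full)
```

Keeping only Z_village and the interaction, as in the estimator above, yields the same fitted values without the aliased term.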

Determining Minimum Viable Sample Size

Finally, we rebuild the design while varying critical parameters. Below, we vary the size of the individual-level treatment effect and the number of villages (the costliest parameter). Additional parameters can be varied by passing them to the redesign() function.

designs = redesign(
  design,
  effect_individual = seq(0, 0.3, by = 0.01),  
  n_villages = seq(20, 50, by = 10)
)

diagnosis <- diagnose_design(designs, sims = 500)
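It helps to know the size of this simulation before running it. Assuming redesign() fully crosses the supplied values (which we understand to be its default behaviour), the number of designs and simulated data sets can be counted with base R. `grid` and `sims_per_design` are hypothetical names used only for this count.

```r
# Parameter values passed to redesign()
effect_values  <- seq(0, 0.3, by = 0.01)
village_values <- seq(20, 50, by = 10)

# Every combination of effect size and number of villages
grid <- expand.grid(
  effect_individual = effect_values,
  n_villages = village_values
)
n_designs <- nrow(grid)

# Simulations per design, as set in diagnose_design()
sims_per_design <- 500
total_sims <- n_designs * sims_per_design

n_designs
total_sims
```

Coarsening the effect-size grid, for example to steps of 0.02, roughly halves the run time while the design is still being tuned.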

Here, we visualize the results.

## Plotting

# Filter for ATE_individual inquiry
results_indiv <- diagnosis$diagnosands_df %>%
  filter(inquiry == "ATE_individual")

# Plot power vs. effect_individual
ggplot(results_indiv, aes(x = effect_individual, y = power, color = factor(n_villages))) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.8) +
  labs(
    x = "Individual-Level Effect (Standard Deviations)",
    y = "Power",
    title = "Power for Individual ATE across effect sizes",
    color = "Number of\nVillages"
    ) +
  scale_y_continuous(labels = function(x) paste0(x*100, "%")) +
  theme_bw() +
  theme(legend.position = c(.1, .9))
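As a sanity check on the simulated power curves, a back-of-envelope minimum detectable effect (MDE) for the individual-level effect can be computed analytically. The sketch below is a rough approximation under stated assumptions, not a substitute for the simulation: it treats the estimate as a difference in means between treated and untreated individuals within treatment villages, assumes the village component largely cancels in that within-village comparison, uses a normal approximation, and ignores small-sample adjustments to cluster-robust standard errors. `approx_mde_individual` is a hypothetical helper.

```r
approx_mde_individual <- function(n_villages, n_individuals,
                                  prob_village = 0.5, prob_individual = 0.5,
                                  sd_individual = 1, alpha = 0.05, power = 0.8) {
  # Expected number of respondents living in treatment villages
  n_in_treated_villages <- n_villages * prob_village * n_individuals
  n_treated <- n_in_treated_villages * prob_individual
  n_control <- n_in_treated_villages * (1 - prob_individual)

  # Standard error of a difference in means with individual-level noise only
  se <- sd_individual * sqrt(1 / n_treated + 1 / n_control)

  # Two-sided test at level alpha with the requested power
  (qnorm(1 - alpha / 2) + qnorm(power)) * se
}

# Approximate MDE for each number of villages considered above
mde_by_villages <- sapply(seq(20, 50, by = 10), approx_mde_individual,
                          n_individuals = 50)
mde_by_villages
```

If the simulated curves cross the 80% line at effect sizes far from these values, that is a signal to revisit either the assumptions listed here or the design code.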