ℹ The googlesheets4 package is using a cached token for 'robert@agency.fund'.
There are 25 duplicated pre_hhid rows with differing Household Compliance (%) values. For now we keep the first row for each pre_hhid and drop the later duplicates, pending clarification from the RTV team.
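A minimal base-R sketch of that rule follows. It assumes "the latter" means the later row in the sheet's original order. The data frame raw and its values are hypothetical toy data; only the column names pre_hhid and hh_compliance come from this notebook.

```r
# Hypothetical toy data standing in for the raw sheet
raw <- data.frame(
  pre_hhid      = c("H001", "H002", "H002", "H003", "H003"),
  hh_compliance = c(0.40,   0.10,   0.25,   0.00,   0.50)
)

# Number of pre_hhid values that appear more than once with differing compliance
conflicting   <- tapply(raw$hh_compliance, raw$pre_hhid,
                        function(x) length(unique(x)) > 1)
n_conflicting <- sum(conflicting)

# duplicated() is TRUE for every row whose pre_hhid already appeared above it,
# so negating it keeps the first occurrence and drops the later ones
deduped <- raw[!duplicated(raw$pre_hhid), ]
deduped
```

With dplyr loaded, distinct(raw, pre_hhid, .keep_all = TRUE) also keeps the first row per pre_hhid.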
# A tibble: 0 × 8
# ℹ 8 variables: village_id <chr>, Region <chr>, internet_access <chr>,
# num_hhs <dbl>, hh_id <chr>, 2026 <dbl>, A <chr>, hh_compliance <dbl>
There are no remaining missing matches in the data. Reaching this point required manual edits to the spreadsheets to repair some mismatches (misspellings, shifted columns, etc.).
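The repairs above were made by hand in the spreadsheets. The sketch below shows what a programmatic mismatch check and a simple cleanup could look like. The tables, the find_unmatched() helper and the trimws()/toupper() normalisation are all hypothetical. This kind of cleanup catches stray whitespace and case differences, but it would not fix shifted columns.

```r
# Hypothetical household table with two bad keys: a trailing space and wrong case
households_raw <- data.frame(
  hh_id      = c("A1", "A2", "B1"),
  village_id = c("V01", "V01 ", "v02")
)
villages <- data.frame(village_id = c("V01", "V02"), Region = c("North", "South"))

# Rows of x whose key has no match in lookup
# (base-R equivalent of dplyr::anti_join(x, lookup, by = key))
find_unmatched <- function(x, lookup, key) {
  x[!(x[[key]] %in% lookup[[key]]), , drop = FALSE]
}

before <- find_unmatched(households_raw, villages, "village_id")  # 2 rows flagged

# Normalise keys: strip surrounding whitespace and upper-case them
households_fixed <- households_raw
households_fixed$village_id <- toupper(trimws(households_fixed$village_id))
after <- find_unmatched(households_fixed, villages, "village_id")  # 0 rows
```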
library(dplyr)        # group_by(), summarise(), n()
library(lme4)         # lmer()
library(performance)  # icc()
library(pwr)          # ES.h(), pwr.2p.test()

# --- Parameters (adjust as needed) ---
compliance_col <- "hh_compliance" # column name for household compliance (0/1 or proportion)
alpha <- 0.05        # two-sided significance level
power_target <- 0.80 # desired power
effect_rel <- 1.20   # desired relative effect size (baseline * effect_rel)

# --- Cluster-level summaries ---
cluster_stats <- joined |>
  group_by(village_id) |>
  summarise(
    n_hh = n(),
    village_compliance = mean(.data[[compliance_col]], na.rm = TRUE),
    .groups = "drop"
  )

K <- nrow(cluster_stats)                                    # total number of villages
m_bar <- mean(cluster_stats$n_hh)                           # average households per village
p0 <- mean(cluster_stats$village_compliance, na.rm = TRUE)  # baseline compliance
p1 <- p0 * effect_rel                                       # treatment-arm compliance

# --- ICC via lmer (random intercept per village) ---
# Build the formula from compliance_col so the model uses the same column as above
model <- lmer(as.formula(paste(compliance_col, "~ 1 + (1 | village_id)")), data = joined)
ICC <- icc(model)$ICC_adjusted

# --- Design effect and effective sample size (equal allocation, K/2 villages per arm) ---
DEFF <- 1 + (m_bar - 1) * ICC
K_per_arm <- K / 2
n_eff_per_arm <- K_per_arm * m_bar / DEFF

# --- Power (Cohen's h arcsine transformation for proportions) ---
h <- ES.h(p1, p0)
result <- pwr.2p.test(h = h, n = n_eff_per_arm, sig.level = alpha, alternative = "two.sided")

# --- Sensitivity: power across a range of cluster counts ---
k_seq <- seq(10, max(K * 2, 100), by = 5)
pwr_seq <- sapply(k_seq, function(k) {
  pwr.2p.test(h = h, n = (k / 2) * m_bar / DEFF, sig.level = alpha, alternative = "two.sided")$power
})
sensitivity <- data.frame(
  villages_total = k_seq,
  villages_per_arm = k_seq / 2,
  power_pct = round(pwr_seq * 100, 1)
)
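For reference, the quantities computed in the code above follow the standard formulas below. Here K is the total number of villages, \bar m the mean number of households per village, \sigma_u^2 and \sigma_e^2 the between-village and residual variances from the random-intercept model (their ratio is what icc() reports as the adjusted ICC for this model), r is effect_rel, \Phi is the standard normal CDF and z_{1-\alpha/2} its quantile. The last line is the two-sided normal approximation that pwr.2p.test evaluates, applied to the n_eff households per arm as if they were independent.

```latex
\begin{aligned}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \\
\mathrm{DEFF} &= 1 + (\bar m - 1)\,\rho \\
n_{\text{eff}} &= \frac{(K/2)\,\bar m}{\mathrm{DEFF}} \\
h &= 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_0}, \qquad p_1 = r\,p_0 \\
\text{power} &\approx \Phi\!\left(h\sqrt{n_{\text{eff}}/2} - z_{1-\alpha/2}\right)
  + \Phi\!\left(-h\sqrt{n_{\text{eff}}/2} - z_{1-\alpha/2}\right)
\end{aligned}
```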
=== Cluster RCT Power Analysis: Village Randomisation ===
Villages (clusters) : 477 total | 238 per arm
Avg households / village : 28.7
Baseline compliance : 22.9%
Treatment compliance : 27.4% (*1.2)
Estimated ICC : 0.2327
Design effect (DEFF) : 7.44
Effective n per arm : 919.2 households
Cohen's h : 0.1055
--> Estimated power : 61.8%
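As a cross-check, the headline numbers can be recomputed in base R from the rounded values printed above, without lme4 or pwr. This is a sketch: cluster_power() and its argument names are hypothetical. Small differences in the last digit relative to the printed summary come from rounding the inputs.

```r
# Hypothetical helper: power of a two-arm cluster RCT with equal allocation,
# using a design-effect-adjusted sample size and Cohen's h
cluster_power <- function(K, m_bar, p0, effect_rel, rho, alpha = 0.05) {
  p1    <- p0 * effect_rel                          # treatment-arm compliance
  deff  <- 1 + (m_bar - 1) * rho                    # design effect
  n_eff <- (K / 2) * m_bar / deff                   # effective households per arm
  h     <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))  # Cohen's h
  z     <- qnorm(1 - alpha / 2)                     # two-sided critical value
  # Same two-tailed normal approximation that pwr::pwr.2p.test evaluates
  power <- pnorm(h * sqrt(n_eff / 2) - z) + pnorm(-h * sqrt(n_eff / 2) - z)
  c(deff = deff, n_eff = n_eff, h = h, power = power)
}

# Rounded inputs copied from the printed summary
res <- cluster_power(K = 477, m_bar = 28.7, p0 = 0.229,
                     effect_rel = 1.2, rho = 0.2327)
round(res, 4)
```

With these inputs the power comes out at about 62%, in line with the reported 61.8%.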
--- Power by number of villages (holding m_bar, ICC, effect fixed) ---
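Instead of scanning a grid of cluster counts, the same normal approximation can be inverted in closed form to give the total number of villages needed for the 80% target (power_target in the code), holding m_bar, ICC and effect size fixed. The sketch below uses the rounded printed inputs. villages_needed() is a hypothetical helper. It drops the negligible second tail of the two-sided test, so it can differ by a village or two from reading the grid.

```r
# Hypothetical helper: total villages needed (equal allocation) to reach a
# target power, inverting the normal approximation used for the power calculation
villages_needed <- function(m_bar, p0, effect_rel, rho,
                            alpha = 0.05, power = 0.80) {
  p1   <- p0 * effect_rel
  h    <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))  # Cohen's h
  deff <- 1 + (m_bar - 1) * rho                    # design effect
  # Effective households needed per arm for a two-sample comparison on h
  n_eff_req <- 2 * ((qnorm(1 - alpha / 2) + qnorm(power)) / h)^2
  # n_eff = (K / 2) * m_bar / deff  =>  K = 2 * n_eff * deff / m_bar
  k_raw <- 2 * n_eff_req * deff / m_bar
  2 * ceiling(k_raw / 2)  # round up to an even total so both arms are equal
}

k_req <- villages_needed(m_bar = 28.7, p0 = 0.229, effect_rel = 1.2, rho = 0.2327)
k_req
```

Under these assumptions the required total is well above the 477 villages currently available.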