TODOs and Open Questions

Investigate negative-control outcomes — under the placeholder labelling, skin rash and toothache show large effects that should be null (§9). Once the real allocation is in place, repeat the check and look for residual structural confounding by village / geography.
Village-level participation rate (SAP §5.6 sensitivity, Treatment-on-Treated proxy) — cannot be derived from existing data without using the outcome itself as the stratifier. Either obtain the attendance records from NCRWSTG / Anhui CDC or document the §5.6 sensitivity as a data-availability deviation.
Verify the 7 unmatched EUM household IDs against the paper data collection forms: 20021018, 20021124, 20021125, 20021126, 20020714, 20020815, 20020917. The earlier merge corrections produced a treatment-arm mismatch and were reverted (§2.3); if any pair is a genuine typo to a different correct HH, a corrected merge can be re-applied.
Verify the 12 backward-date EUM rows against the paper forms: village 30 round 5 (6 HHs), village 21 round 3 (5 HHs), HH 20020418 round 5 (1 HH). All 12 are already neutralised in the analysis via lpd_negative; verification would recover the underlying kWh readings for those intervals (§2.4).
Decision: diarrhea recall window (SAP §5.1.2). The pipeline does not auto-switch the primary outcome. The side-by-side inspection material (prevalence by recall period; MH PR by recall window) is in §12a. If the PI judges that differential recall is present, change the primary diarrhea filter from recall_period == "oneweek" to "today" and rerun.
Decision: collaborator memo. A short memo back to the external replicator covers the broken-CSV bug we fixed, the SAP-separated vs composite handwashing outcome, the MH stratification axes, the paired-vs-unpaired t-test choice, etc. Sign-off needed before sending. Detail in reports/replication_status.qmd and reports/meeting_agenda_alasdair.md.

PM2.5 (SAP §5.1.2 secondary outcome) is out of scope for this analysis per PI decision and is documented as a deviation below.

Deviations from Pre-Registered SAP

Documented Deviations from SAP v1 (2018-04-29)

Methodological deviations:

Unpaired vs. paired t-tests for continuous outcomes (SAP §5.2). The SAP specifies “paired t-tests” for unadjusted between-group comparisons of EUM, TTC, and PM2.5. Since these are comparisons between independent treatment and control households (not within-household before-after), unpaired (Welch’s) two-sample t-tests are used. This is the standard approach for a parallel-arm cRCT.
Diarrhea age group boundary (SAP §5.1.2). The SAP specifies age groups “children <5; individuals age 5-18; adults age >=18” — the boundary at age 18 overlaps. We assign 18-year-olds to the 5-18 group, consistent with the SAP’s use of “individuals” for that category and “adults” (implying >18) for the older group.
Covariate screening does not account for clustering (SAP §5.3). Pre-screening of covariates via likelihood ratio tests uses simple GLM/LM models without cluster-level random effects. The final adjusted models do include cluster random effects.
EUM outlier cap at 10 L/person/day. The SAP is silent on EUM outliers. Raw data contain values up to ~150 L/p/d, which are physically impossible: a single kettle cannot realistically use more than ~5 kWh/day, which at 0.104 kWh/L is about 48 L/day, or ~10 L/p/d for a 5-person HH. lpd_outlier_cap = 10 in load_clean_eum_data() sets all values above this to NA. The external replicator effectively used the same threshold.
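The age-18 boundary rule in the list above can be illustrated with a small sketch. It is written in Python for illustration only (the pipeline itself is R); the function name diarrhea_age_group and the returned labels are hypothetical and are not taken from the cleaning code.

```python
from typing import Optional


def diarrhea_age_group(age: Optional[float]) -> Optional[str]:
    """Assign an SAP §5.1.2 age group; 18-year-olds go to the 5-18 group.

    Labels are hypothetical placeholders for illustration.
    """
    if age is None:
        return None  # missing age stays missing
    if age < 5:
        return "children <5"
    if age <= 18:
        return "individuals 5-18"  # boundary case: age 18 lands here
    return "adults >18"


if __name__ == "__main__":
    for a in (4, 5, 18, 19, None):
        print(a, diarrhea_age_group(a))
```

The only judgment call encoded here is the `<=` at 18, which mirrors the deviation as documented.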

Data availability deviations:

PM2.5 analysis excluded. PM2.5 is out of scope for this analysis per PI decision; the SAP-listed sensor outcome will not be part of this paper.
~~Electricity affordability and risk perception variables not found.~~ Resolved: fe15 → electricity_afford, fq1/fq2 → risk_untreated_water_first/_second are now in d_clean. Subgroup targets subgroup_electricity_afford, subgroup_risk_perception, subgroup_boiler_sex, subgroup_boiler_age are implemented and render in §11 of this report.
Sensitivity analysis by participation rate not yet implemented (SAP §5.6). Cannot be derived cleanly from existing data — the natural proxies (village-level EK uptake, EUM coverage) are either the outcome itself (using it as a stratifier creates selection bias) or a planned subsample unrelated to attendance. Requires the NCRWSTG attendance records of the promotion events.
~~Travel time to health center and travel time to water source not available as covariates for adjusted models (SAP §5.3).~~ Resolved: hl15 → health_centre_travel_min and w11 → water_travel_min are now in d_clean and sap_covariates.
Round 6 (endline) not included in primary analyses. The endline was a short survey only; most pre-specified outcomes are analyzed at baseline (R1) and midline (R3) per the SAP timing (§2.7 note about choosing ~6-month follow-up). An exploratory 12-month EK-use estimate is reported in §5d from the R6 enumerator-observation index.

Highest-value spot-checks

uses_electric_kettle primary outcome — concordance against BE.1 and the EK Use Likelihood Index. The SAP primary outcome is built solely from S.1 (drinking_water_method == "Boiled W Electricity"). Cross-check it against (a) kettle_type (BE.1, asked only if S.1 endorsed electric boiling) and (b) the enumerator-observation EK Use Likelihood Index (1–10). Systematic disagreement would suggest the primary outcome needs a tweak (e.g. include HHs flagged by BE.1 even if S.1 says otherwise).
Handwashing recoding rules vs field practice. The code currently maps only "Yes, after defecating" and "Yes, after defecating & before meals" to reported_hw_soap_outcome == 1. The other “Yes” categories ("Yes, but rarely" — 410 HHs at baseline; "Yes, before meals" — 8 HHs; "Yes, when guests visit") are currently coded as 0. Same call needed for OD.1’s "Yes, but not likely regularly used" (currently 0). Confirm these are the SAP-intended categorisations.
Verify the 7 unmatched EUM household IDs against paper forms. IDs 20021018, 20021124, 20021125, 20021126, 20020714, 20020815, 20020917 each have 5 rounds of real EUM kWh readings but no survey match (the prior merge corrections were reverted because they crossed treatment arms). Resolving them recovers up to 35 rounds of legitimate EUM data; leaving them unresolved means they are silently dropped from cluster-aware analyses.
Village → randomisation-block lookup tables. Every stratified analysis depends on the village ID lists at src/R/data_cleaning_functions.R lines 12–21 (mountain_villageids, plain_villageids, baseline_ek_use_0..3_villageids). Cross-check against the original SAP §8.3 randomisation allocation document. A single mis-assigned village would systematically corrupt one block.
The 0.104 kWh/L EUM conversion constant. This single multiplier scales every litres-per-person-per-day estimate in the report. The SAP §8.1 references it as the “grand mean from EUM testing protocol”. Confirm against the original kettle-calibration document, particularly whether the grand mean is still appropriate given the observed mix of kettle types (Standard EK w/auto, EK (M)/EK (L) no auto, induction, etc.).
The 21 inconsistent 1-week / 2-week diarrhea responses at baseline roster position c1 (43 across all roster positions in round 1; logged in §2.6.3). We have shown these are real respondent inconsistencies rather than a coding bug. A PI field-memory read on whether this reflects broad diarrhea-recall noise, and therefore whether it should push us toward switching the SAP §5.1.2 primary recall window from 7-day to same-day, would be more credible than a remote analyst’s call.
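As a reference point for the handwashing spot-check in the list above, the current hy4 recoding can be sketched as follows. This is a Python illustration of the rule as documented in this report, not the R cleaning code; the function name recode_reported_hw_soap is hypothetical.

```python
from typing import Optional

# Only these two responses are currently coded as handwashing with soap.
HW_SOAP_YES = {
    "Yes, after defecating",
    "Yes, after defecating & before meals",
}


def recode_reported_hw_soap(hy4: Optional[str]) -> Optional[int]:
    """Map an hy4 response to reported_hw_soap_outcome (1 / 0 / missing)."""
    if hy4 is None or hy4 == "Don't know":
        return None
    return 1 if hy4 in HW_SOAP_YES else 0


if __name__ == "__main__":
    for answer in ("Yes, after defecating", "Yes, but rarely", "Don't know"):
        print(answer, "->", recode_reported_hw_soap(answer))
```

Moving a category such as "Yes, but rarely" into the set is a one-line change, which is why the categorisation should be confirmed before the analysis is finalised.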

1. Study Overview

This analysis follows the pre-registered Statistical Analysis Plan (SAP v1, 2018-04-29) for the Rural China Electric Kettle Promotion Program cluster-randomized controlled trial (ClinicalTrials.gov: NCT03376152).

Design: Parallel arm cohort cRCT, 1:1 ratio, 30 villages (clusters), stratified randomization by geography (mountains/plains) and baseline EK use.

Population: 900 poverty households (30 per village), ITT analysis.

Primary Outcomes: Electric kettle use prevalence (reported) and intensity (EUM-measured).

Two-sided tests, 5% significance level, 95% confidence intervals.

2. Data Cleaning and QC Audit

2.1 Pipeline architecture

The authoritative pipeline is driven by _targets.R, which calls four cleaning functions in src/R/data_cleaning_functions.R:

load_clean_household_data() → d_clean (HH × round panel)
load_clean_individual_data() → d_individ (individual × round)
load_clean_diarrhea_data(d_clean) → d_diar (individual × round × recall period)
load_clean_eum_data(d_clean) → d_eum (HH × EUM measurement)

The legacy src/01_data_cleaning.R is retained only for producing shareable CSVs (see replication/EKT_replication_data_corrected/); it does not feed any analysis target.

2.2 Variable derivation audit

The following key derivations were cross-checked against the survey instrument and the raw .dta factor labels:

Output variable	Raw source	Recoding rule	Status
`uses_electric_kettle`	`s1`	`"Boiled W Electricity"` → 1, NA → NA, else 0	✓ verified
`reported_hw_soap_outcome`	`hy4`	`"Yes, after defecating"` OR `"Yes, after defecating & before meals"` → 1; `"Don't know"` → NA; else 0	✓ verified
`observed_hw_soap_outcome`	`od1`	`"Yes, & likely used regularly"` → 1; `"Not able to check"` → NA; else 0	✓ verified
`mountains`	`village`	20 mountain villages, 10 plain villages	✓ verified (SAP §8.3)
`baseline_uses_ek`	`village`	4 categories (Cat. 0–3, SAP §8.3)	✓ verified
`diarrhea_today`	`hl8`	`"Yes"`→1, `"No"`→0, `"DK"`→NA	✓ value labels confirmed
`diarrhea_oneweek`	`hl9`	same	✓ value labels confirmed
`diarrhea_twoweek`	`hl10`	same	✓ verified (survey HL.10 = “Diarrhea in the last two weeks?”)
`cough_today`	`hl2`	same	✓ confirmed (`hl1` is the skip-gate)
`cough_oneweek`	`hl3`	same	✓ confirmed
`rash_oneweek`	`hl5`	same	✓ confirmed (`hl4` is the skip-gate)
`toothache_oneweek`	`hl6`	same	✓ confirmed
`hoh_female`	`g3c1..g3c4`, `g4c1..g4c4`	first person where `g3=="Yes"` AND `g4=="Female"`	✓ verified (all observed HoHs at positions 1–4)
`hoh_age`	`g3c1..g3c4`, `g5c1..g5c4`	age of first `g3=="Yes"` person	✓ verified
`pct_adults_highschool`	`g5c`, `g6c`	adults (age≥18) with edu ∈ {`"High School"`, `"University & higher"`}	✓ fixed (see §2.3)
`nhh`	`g1c1..g1c8`	count of non-NA person codes	✓ verified
`eum_conversion`	`g5c1..g5c8`	sum of (age<12 → 0.5, else 1)	✓ verified (SAP §8.1)
`lpd`	`eum_interval`, `eum_conversion`, `interval_days`	`(eum_interval / 0.104) / eum_conversion / interval_days`	✓ verified (SAP §8.1)
`lpd_outlier_cap`	parameter (default 10 L/p/d)	`lpd > cap` → NA	documented deviation; SAP-silent decision (see §2.4)
`water_travel_min`	`w11`	numeric minutes (NA when HH uses a piped connection at home)	✓ verified (SAP §5.3 covariate trigger evaluated in §2.6.13)
`health_centre_travel_min`	`hl15`	numeric minutes to nearest health centre	✓ verified (SAP §5.3 covariate; ~99% non-NA at baseline; median 20 min)
`risk_untreated_water_first`	`fq1`	first-response category (Nothing / They will get sick / diarrhea / vomit / typhoid / amoeba / cholera / stomach ache / Don’t know / Other)	✓ verified (SAP §5.4 subgroup; ~99% non-NA)
`risk_untreated_water_second`	`fq2`	secondary response (same categories as fq1)	✓ verified (less complete — ~66% non-NA — since respondents often gave only one answer)
`electricity_afford`	`fe15`	Very expensive / Expensive / Slightly expensive / Affordable / Very affordable / Don’t know / Govt. pays	✓ verified (SAP §5.4 subgroup; 100% non-NA)
`primary_boiler_age`	`be16_age1`	numeric age (years) of the primary person who boils water (BE.16 person 1)	✓ verified (SAP §5.4 boiling subgroup; ~38% non-NA — only HHs that report boiling are asked BE.16)
`primary_boiler_sex`	`be16_sex1`	“Female” / “Male”	✓ verified (60% Female, 40% Male among boiling HHs)
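The eum_conversion, lpd, and lpd_outlier_cap rows of the table above combine into a single calculation, sketched below in Python for illustration (the pipeline itself is R). The function names and the choice to return None for non-positive intervals are assumptions made for this sketch; the 0.104 kWh/L constant, the age-12 weighting, and the cap of 10 come from the table.

```python
from typing import Optional, Sequence

KWH_PER_LITRE = 0.104  # conversion constant quoted from SAP §8.1


def eum_conversion(ages: Sequence[float]) -> float:
    """Person-equivalents: age < 12 counts 0.5, everyone else counts 1."""
    return sum(0.5 if age < 12 else 1.0 for age in ages)


def lpd(eum_interval_kwh: float, ages: Sequence[float],
        interval_days: float, cap: float = 10.0) -> Optional[float]:
    """Litres per person per day between two EUM readings.

    Returns None for non-positive intervals (assumption standing in for
    the lpd_negative flag) and for values above the outlier cap.
    """
    persons = eum_conversion(ages)
    if interval_days <= 0 or persons == 0:
        return None
    value = (eum_interval_kwh / KWH_PER_LITRE) / persons / interval_days
    return None if value > cap else value


if __name__ == "__main__":
    # Hypothetical household: two adults, two children under 12.
    print(lpd(46.8, [40, 38, 10, 7], 90))
    # Five adults at 5 kWh/day sit just under the cap.
    print(lpd(5 * 90, [30] * 5, 90))
```

The second call reproduces the rationale for the cap: 5 kWh/day for a 5-person household comes out just under 10 L/p/d.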

2.3 Issues identified and resolved

Education match for pct_adults_highschool. The previous code matched education strings "High school", "College", "College/university". The actual factor labels in both baseline and midline .dta files are "High School" (capital S) and "University & higher". The capital-S form was matched correctly; the "University & higher" label was not, so any adult with university education was excluded from the numerator of the high-school+ proportion. The effect is small in this sample (2 university-educated adults observed at g6c1 in baseline, 0 in midline), but systematic. Fixed by updating the match set to include both forms.
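The effect of the label fix can be shown with a minimal sketch. It is in Python for illustration (the pipeline is R); the function name, the roster layout as (age, education) pairs, the return value as a proportion, and the "Primary school" label are assumptions for this example, while the two matched labels are the factor labels quoted above.

```python
from typing import Optional, Sequence, Tuple

# Factor labels as they appear in the .dta files.
HIGH_SCHOOL_PLUS = {"High School", "University & higher"}


def pct_adults_highschool(
    roster: Sequence[Tuple[Optional[float], Optional[str]]]
) -> Optional[float]:
    """Proportion of adults (age >= 18) with high-school education or more."""
    adults = [edu for age, edu in roster if age is not None and age >= 18]
    if not adults:
        return None  # no adults on the roster
    return sum(edu in HIGH_SCHOOL_PLUS for edu in adults) / len(adults)


if __name__ == "__main__":
    household = [(45, "University & higher"), (43, "Primary school"),
                 (12, "High School")]
    print(pct_adults_highschool(household))
```

With the earlier match set, the university-educated adult in this example would have been left out of the numerator.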

Diarrhea two-week recall mapping (hl10 → diarrhea_twoweek). An earlier draft of this report flagged a potential variable misidentification because the -layout extraction of the survey PDF appeared to show only three question texts in the diarrhea block (HL.7 / HL.8 / HL.9), with a fourth column header (HL.10) but no matching question. A second extraction without -layout cleared this up: in the survey, HL.1 / HL.4 / HL.7 / HL.11 are the “Person/code” column labels that introduce each block of health questions, and the actual diarrhea questions are:

HL.8 = “Diarrhea today?”
HL.9 = “Diarrhea in the last week?”
HL.10 = “Diarrhea in the last two weeks?”

This is exactly what the cleaning code assumes, and matches the pattern across the other blocks (HL.2/HL.3 = cough today/1-week; HL.5/HL.6 = rash/toothache 1-week). The Stata hl1, hl4, hl7, hl11 variables are not real questions — they correspond to the Person/code column labels, have no value labels, and (as expected) show 0% “Yes” rates in the data. The hl8/hl9/hl10 → diarrhea_today/oneweek/twoweek mapping is correct.

A separate observation remains: at baseline roster position c1, 21 individuals answered “Yes” to one-week diarrhea but “No” to two-week diarrhea (logically impossible for nested recall windows), and 20 answered the consistent reverse pattern. Across all roster positions, the §2.6.3 table shows 43 and 28 such individuals in round 1. The cleaning is not at fault: this is a real data inconsistency arising from recall bias or rapid-fire question order. It is worth noting in any final write-up of the diarrhea results, but it is not a cleaning bug. The differential-recall sensitivity analysis in §12 already addresses how to handle this in the primary outcome.

Cluster-aware unadjusted analyses. The previous unadjusted MH and t-tests treated households within villages as independent, which under-states variance for a cRCT (inflated Type I error). The pipeline now uses washb::washb_mh() and washb::washb_ttest() for all unadjusted analyses (binary outcomes via stratified Mantel-Haenszel with strat = randomization_block; continuous outcomes via paired t-tests on randomization-block means). Degrees of freedom for the t-tests are now n_blocks − 1 rather than n_households − 2, which is the correct cluster-aware accounting. The tt_* and mh_* targets now carry an n_blocks column reporting how many of the 8 SAP randomization blocks (mountains × baseline_uses_ek) have both arms present in each stratum.
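The degrees-of-freedom accounting described above can be made concrete with a sketch of a paired t-test on randomization-block means. This is a simplified Python illustration under stated assumptions, not the washb::washb_ttest() implementation; the function name and the (block, arm, value) input layout are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def block_paired_ttest(rows: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Paired t-test on block-level arm means.

    rows: (block, arm, value) with arm coded 0 (control) or 1 (treatment).
    Blocks missing either arm are dropped; df = n_blocks - 1.
    """
    cells: Dict[Tuple[str, int], List[float]] = {}
    for block, arm, value in rows:
        cells.setdefault((block, arm), []).append(value)
    blocks = sorted({block for block, _ in cells})
    diffs = [mean(cells[(b, 1)]) - mean(cells[(b, 0)])
             for b in blocks if (b, 0) in cells and (b, 1) in cells]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two blocks with both arms present")
    se = stdev(diffs) / math.sqrt(n)
    if se == 0:
        raise ValueError("block differences have zero variance")
    return {"diff": mean(diffs), "t": mean(diffs) / se,
            "df": n - 1, "n_blocks": n}


if __name__ == "__main__":
    demo = [("A", 0, 1.0), ("A", 0, 3.0), ("A", 1, 3.0),
            ("B", 0, 0.0), ("B", 1, 2.0),
            ("C", 0, 1.0), ("C", 1, 4.0),
            ("D", 1, 5.0)]  # block D has no control rows and is dropped
    print(block_paired_ttest(demo))
```

The n_blocks value returned here plays the same role as the n_blocks column on the tt_* targets: it shows how many blocks actually contributed a treatment-control pair.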

2.4 Flags requiring further verification

1. Placeholder treatment assignment. Until the real (unblinded) village-level allocation is integrated, all household-level inference is null by construction. The current placeholder uses set.seed(20250117) and randomises at the village level (correct for a cRCT design), but the labels themselves are arbitrary.

2. ~~lpd_outlier_cap is a non-SAP decision~~ — RESOLVED (set to 10 L/p/d). The default is now lpd_outlier_cap = 10 in load_clean_eum_data(). Rationale: a single household kettle cannot realistically boil more than ~5 kWh/day of drinking water (boil cycles are ~0.1–0.3 kWh each); for a 5-person HH that’s ~10 L/p/d. The cap also aligns the authoritative pipeline with the external replicator (whose effective cap was ~10). Documented as a deviation in the SAP-deviations list (§6 item 5 in the Deviations callout).

3. ~~Five HHs have 10 EUM rounds~~ — RESOLVED: all 7 ID corrections reverted. Investigation showed each source ID had EUM arm = 1 (treatment) while each target ID had arm = 0 (control) and lived in a survey village. The arm mismatch is incompatible with the typo hypothesis: if the SRC were really a typo for the TGT, they would be the same household and therefore in the same arm. All 7 corrections have been disabled in load_clean_eum_data(). The 5 previously-merged HHs now correctly show 5 rounds each; the 7 SRC IDs now exist as separate, unmatched EUM records and are silently dropped from cluster-aware analyses (their joined mountains / baseline_uses_ek are NA). The 7 SRC IDs (20021018, 20021124, 20021125, 20021126, 20020714, 20020815, 20020917) need verification against the paper data collection forms before any merge can be safely re-applied.

4. ~~EUM round-1 date 2017-01-01~~ — RESOLVED (one HH, fix applied). The single affected HH is 11020623 (village 1, control arm, mountain × Cat. 1). Round 1 was recorded as 2017-01-01 with kWh = 0 (an installation reading). The HH’s other rounds are spaced ~88–103 days apart starting 2018-02-02; the cohort’s median round-1 date is 2017-11-23 (range 2017-11-06 to 2017-12-10). The recorded date is almost certainly a typo where the month digit was lost (2017-11-01 → 2017-01-01). Without this fix the interval into round 2 is 397 days instead of ~95, dividing lpd by ~4 at round 2 for this HH. A targeted correction in load_clean_eum_data() now maps this single date to 2017-11-01. Paper-form verification by the data team is still requested to confirm the exact installation date.

5. ~~17 HHs with backward EUM dates~~ — INVESTIGATED (3 distinct patterns identified; 12 rows remain after the ID-correction revert). The earlier count of 17 HHs / 32 rows included artefacts from the ID-correction merge (Flag 3); after reverting those corrections, 12 rows across 12 distinct HHs remain. They fall into three clear patterns:

Pattern A (village 30, round 5; 6 HHs): 10220504, 10220505, 10220514, 10220517, 10220528, 10220529. Each shows a round-5 date 1–4 days before round 4 (both in late Aug 2018). kWh either doesn’t change (HHs 10220504 / 10220505 show 0 kWh throughout) or increases sensibly (other 4 HHs). The most parsimonious read is that village 30 was never re-visited for the November endline and a round-5 row was entered with a near-duplicate Aug date.
Pattern B (village 21, round 3; 5 HHs): 10520126, 10520127, 10520128, 10520129, 10520130. Each has round 3 dated 2018-08-28 and round 4 dated between 2018-08-18 and 2018-08-20, putting round 4 before round 3. kWh increases monotonically across rounds, so the round order is correct. Round 3 should be a midline (~May 2018) visit; the recorded 2018-08-28 is most likely a month typo (05 → 08).
Pattern C (HH 20020418, round 5; village 19, treatment): round 5 dated 2018-01-26, 196 days before round 4 (2018-08-10). kWh increases by 148 between rounds 4 and 5, consistent with another ~3 months of normal use. Most likely a month typo (11 → 01); the correct date is probably ~2018-11-26.

The current lpd_negative flag sets lpd = NA for all 12 rows, so they do not bias the analysis. Fixing the underlying dates would recover the data for these intervals (the kWh readings are real), but the corrections need to be verified against the paper forms before being applied. Action: data team to verify the dates for the 12 (household_id, round) pairs listed in the §2.6.9 backward-date chunk; the inferred corrections above are the proposed mapping.
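The backward-date rule behind the flag can be sketched as a within-household monotonicity check. This is a Python illustration, not the R code in load_clean_eum_data(); the function name flag_backward_dates and the (household_id, round, date) tuple layout are assumptions, and the example rows are two of the pairs listed in this report.

```python
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

Reading = Tuple[int, int, date]  # (household_id, round, reading date)


def flag_backward_dates(readings: Iterable[Reading]) -> Set[Tuple[int, int]]:
    """Return (household_id, round) pairs whose date is not strictly later
    than the previous available round for the same household."""
    by_hh: Dict[int, List[Tuple[int, date]]] = {}
    for hh, rnd, when in readings:
        by_hh.setdefault(hh, []).append((rnd, when))
    flagged: Set[Tuple[int, int]] = set()
    for hh, rows in by_hh.items():
        rows.sort()  # order by round number
        for (_, prev_date), (rnd, when) in zip(rows, rows[1:]):
            if when <= prev_date:
                flagged.add((hh, rnd))
    return flagged


if __name__ == "__main__":
    rows = [
        (20020418, 4, date(2018, 8, 10)), (20020418, 5, date(2018, 1, 26)),
        (10520126, 3, date(2018, 8, 28)), (10520126, 4, date(2018, 8, 19)),
    ]
    print(sorted(flag_backward_dates(rows)))
```

Note that for Pattern B the flag lands on round 4 even though the suspected typo is in the round-3 date, which is why paper-form verification is needed before any date is corrected.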

2.5 Confirmed correct against raw data

s1 factor label is exactly "Boiled W Electricity" in both baseline and midline .dta files.
hy4 and od1 factor labels match the SAP-aligned recoding rules.
Health-outcome hl* variables have value labels No / Yes / DK (codes 1/2/3) as the cleaning code assumes.
All hl# → outcome mappings used by the cleaning code match the survey instrument question numbering: HL.2 / HL.3 = cough today / 1-week; HL.5 / HL.6 = rash / toothache 1-week; HL.8 / HL.9 / HL.10 = diarrhea today / 1-week / 2-week. See §2.3 for the resolution of the earlier hl10 flag.
The hl1, hl4, hl7, hl11 variables correspond to the “Person/code” column labels that introduce each block of health questions in the survey. They have no value labels, near-zero “Yes” rates, and are not analysed.
Every other variable rename in rename_vars() has been cross-checked against the survey instrument (extracted without -layout to avoid column misalignment): s1 = drinking water method (S.1), w1 / w6 = winter / summer drinking water source (W.1 / W.6), w23 = water quality perception, w26 = water filter use, be1 = kettle type (BE.1), be6 / be7 = winter / summer boil frequency, hy2 / hy3 = handwash before meals / after defecation, hy4 = reported soap use, hy5 = toilet type, od1 = observed soap, hd13 / hd14 = water-test / air-pollution sample flags. The 11 asset variables g15a..g15k map to refrigerator, rice cooker, washing machine, AC, old TV, flat TV, DVD, computer, e-scooter, petrol scooter, motorcycle per survey item G.15. All mappings are one-to-one and correct.
HoH appears only at roster positions 1–4 in the survey (640/125/10/1 in baseline c1–c4; 0 at c5–c8), so the current 1–4 search is complete.
Village stratification matches SAP §8.3 (20 mountain + 10 plain; 4 baseline-EK-use categories; 30 villages total).
Modern health-outcome recoding (string match on "Yes"/"No"/"DK") correctly recovers all events. The legacy 01_data_cleaning.R has a bind-then-numeric-match bug that drops every "Yes" event; it is no longer used for analysis but is the source of the broken shared CSVs in replication/EKT replication data/.

2.6 Live QC checks

These chunks rebuild every time the report renders and surface any regressions in the cleaning pipeline.

2.6.1 Treatment counts per round

Treatment assignment counts by round
Should be constant across rounds (village-level cRCT)
round	treat_0	treat_1	total
1	390	510	900
3	390	510	900
6	390	510	900

2.6.2 Drinking-water method ↔︎ EK use concordance

drinking_water_method × uses_electric_kettle
`Boiled W Electricity` should be 100% EK=1; all others should be EK=0
drinking_water_method	uses_electric_kettle	n
Bottled W	0	3
Boiled W	0	955
Boiled W Electricity	1	814
Other	0	1
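The concordance rule stated in the table caption can be expressed as a small check. This is a Python sketch for illustration (the report's chunk is R); the function name concordance_violations and the input layout as (method, EK flag) pairs are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

EK_LABEL = "Boiled W Electricity"  # exact S.1 factor label


def concordance_violations(
    pairs: Iterable[Tuple[Optional[str], Optional[int]]]
) -> List[Tuple[str, int]]:
    """Return (method, uses_electric_kettle) pairs that break the rule
    'EK = 1 if and only if the method is Boiled W Electricity'.

    Rows with a missing method or missing EK flag are skipped.
    """
    bad = []
    for method, ek in pairs:
        if method is None or ek is None:
            continue
        if (method == EK_LABEL) != (ek == 1):
            bad.append((method, ek))
    return bad


if __name__ == "__main__":
    observed = [("Bottled W", 0), ("Boiled W", 0),
                ("Boiled W Electricity", 1), ("Other", 0)]
    print(concordance_violations(observed))
```

An empty result corresponds to the table above, where every method maps to exactly one EK value.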

2.6.3 Diarrhea recall-period consistency

For each individual–round combination, compare the 1-week and 2-week diarrhea answers. A “Yes” to the 1-week question implies “Yes” to the 2-week question (nested recall windows), so the “Yes 1wk / No 2wk” cell should be zero. Anything non-zero reflects respondent inconsistency (recall bias or rapid question ordering); the variable mapping itself was verified against the survey instrument in §2.3.

1-week vs 2-week diarrhea per individual
Logically impossible cells = response inconsistency, not a coding bug
label	round_1	round_3
No both	1895	1876
No 1wk / Yes 2wk (consistent)	28	19
Yes 1wk / No 2wk (logically impossible)	43	6
Yes both	45	41
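The cross-tabulation above can be reproduced in outline with the sketch below. It is a Python illustration rather than the report's R chunk; the function name recall_crosstab and the 0/1/None coding of the inputs are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

LABELS = {
    (0, 0): "No both",
    (0, 1): "No 1wk / Yes 2wk (consistent)",
    (1, 0): "Yes 1wk / No 2wk (logically impossible)",
    (1, 1): "Yes both",
}


def recall_crosstab(
    pairs: Iterable[Tuple[Optional[int], Optional[int]]]
) -> Counter:
    """Count individuals by (one-week, two-week) diarrhea answers.

    Pairs with a missing answer in either window are excluded.
    """
    return Counter(
        LABELS[(one_week, two_week)]
        for one_week, two_week in pairs
        if one_week is not None and two_week is not None
    )


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (None, 1)]
    for label, n in recall_crosstab(demo).items():
        print(label, n)
```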

The specific individuals in the logically-impossible cell (“Yes 1wk / No 2wk”) are listed below so the data team can trace back to the survey records:

Individuals reporting diarrhea in past week but NOT in past 2 weeks
Cleaning is correct; these are respondent inconsistencies in the raw data
Round	HH ID	Person
1	10020706	1
1	10020706	2
1	10020709	1
1	10120006	2
1	10220505	1
1	10220506	2
1	10220512	1
1	10220525	1
1	10220527	1
1	10220527	2
1	10420815	1
1	10520007	1
1	10520016	2
1	10520021	5
1	10520030	2
1	10620713	3
1	10620714	3
1	10620720	4
1	10620924	1
1	10620925	1
1	10620930	4
1	10820523	1
1	10920001	5
1	10920702	3
1	10920703	1
1	10920703	2
1	10920708	2
1	10920710	1
1	10920712	5
1	10920724	2
1	10920728	1
1	10921103	1
1	11020212	2
1	11020504	2
1	11020615	1
1	20020603	1
1	20100013	2
1	20120228	1
1	20120610	1
1	20120612	1
1	20120614	2
1	20120623	1
1	20120623	2
3	10120504	3
3	10920709	1
3	10920712	4
3	10920716	3
3	10920718	1
3	10920724	2

2.6.4 Head-of-household identification rate by round

Households with an HoH identified, by round
Expected ~85–90% (some HHs do not flag a HoH in the roster)
round	n_hh	n_with_hoh	pct_with_hoh
1	900	776	86.2
3	900	769	85.4
6	900	0	0.0

2.6.5 Education distribution at baseline (household-level)

Adults with high-school+ education (baseline)
% of adults; % of HHs with at least one such adult
n	mean_pct	median_pct	pct_any_hs_adult
899	3.6	0	8.7

2.6.6 EUM outlier and negative-interval counts

EUM data quality flags by round
Negative interval = meter reset / date error; outlier = lpd > cap
round	n_total	n_lpd	n_negative_flagged	n_outlier_flagged
1	516	0	0	0
2	516	460	0	6
3	516	438	3	5
4	516	390	7	6
5	516	351	14	5

2.6.7 EUM interval-day distribution

Intervals between EUM readings should cluster around the planned ~90-day cadence. Intervals ≤ 0 days are date errors; very short (< 7) or very long (> 180) intervals are suspect.

Interval length between consecutive EUM readings, by round
Counts of intervals in each duration bin
interval_class	round_2	round_3	round_4	round_5
≤ 0 days (backward / same-day)	0	0	5	7
< 7 days (very short)	0	0	0	1
7–29 days (short)	0	0	0	6
30–180 days (expected)	486	466	437	409
> 180 days (very long)	0	5	0	0
NA (no prior reading)	30	45	74	93
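The duration bins used in this table can be written as a single classifier. The sketch below is in Python for illustration (the report's chunk is R); the function name interval_class and the ASCII bin labels are assumptions, while the cut-points follow the bins shown above.

```python
from typing import Optional


def interval_class(days: Optional[int]) -> str:
    """Classify the number of days between consecutive EUM readings."""
    if days is None:
        return "NA (no prior reading)"
    if days <= 0:
        return "<= 0 days (backward / same-day)"
    if days < 7:
        return "< 7 days (very short)"
    if days < 30:
        return "7-29 days (short)"
    if days <= 180:
        return "30-180 days (expected)"
    return "> 180 days (very long)"


if __name__ == "__main__":
    for d in (None, -196, 3, 14, 93, 397):
        print(d, "->", interval_class(d))
```

The 397-day and 93-day examples correspond to the uncorrected and corrected round-1 date for HH 11020623 discussed in §2.4.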

2.6.8 EUM households should have an electric kettle at baseline

Every household with EUM readings should have a kettle_type recorded at baseline (BE.1 is only asked when the household reports boiling with electricity in S.1).

EUM households cross-checked against baseline kettle_type
Any 'missing' rows = EUM data for HHs with no electric kettle on record
kettle_status	n
kettle_type missing in baseline survey	294
kettle_type = Standard EK w/auto shutoff	118
kettle_type = EK (M), no auto shutoff	80
kettle_type = EK (L), no auto shutoff	19
kettle_type = Other	3
kettle_type = Induction kettle (M)	1
kettle_type = Kettle (S) & hot plate	1

2.6.9 EUM round-1 (installation) date sanity

Round-1 readings are the installation/baseline measurement. Dates should cluster around the baseline-survey period (late 2017 / early 2018). Wide ranges or out-of-period values suggest data-entry errors.

EUM round-1 (installation) date distribution
Should be a narrow window aligned with baseline data collection
n	earliest_date	median_date	latest_date	range_days
510	2017-11-01	2017-11-23	2017-12-10	39

Round-1 dates that fall well outside the expected installation window (2017-11 / 2017-12) are likely typos for the data team to correct. The table is empty when no such dates remain, as is the case after the §2.4 flag 4 correction:

EUM round-1 dates that look like data-entry typos
Any HH with a round-1 date before 2017-06-01 (median is 2017-11-23)
household_id	round	date	eum

Within-HH date monotonicity (summary)
Rows where the EUM reading date is not strictly later than the previous round
n_backward_or_same_day	n_households_affected
12	12

Specific (household_id, round) pairs with backward dates — for the data team to cross-check against paper forms:

Households with backward EUM dates
prev_date should be earlier than date; rows shown are where it is not
household_id	Prev round	Prev date	Round	Date
10220504	4	2018-08-18	5	2018-08-16
10220505	4	2018-08-26	5	2018-08-22
10220514	4	2018-08-19	5	2018-08-16
10220517	4	2018-08-19	5	2018-08-16
10220528	4	2018-08-20	5	2018-08-17
10220529	4	2018-08-18	5	2018-08-17
10520126	3	2018-08-28	4	2018-08-19
10520127	3	2018-08-28	4	2018-08-18
10520128	3	2018-08-28	4	2018-08-20
10520129	3	2018-08-28	4	2018-08-20
10520130	3	2018-08-28	4	2018-08-20
20020418	4	2018-08-10	5	2018-01-26

2.6.10 EUM attrition pattern

How many households completed how many EUM rounds?

EUM follow-up completion: households by number of rounds
Of HHs with any EUM data, distribution of rounds completed (max should be 5)
n_rounds_completed	n_households	pct
1	30	5.8
2	24	4.7
3	42	8.1
4	49	9.5
5	371	71.9

Households with more than 5 rounds are physically impossible (there are only 5 EUM rounds). Before the ID corrections were reverted (§2.4 flag 3), 5 such IDs appeared here; they matched the “corrected to” targets in load_clean_eum_data() and had non-monotonic cumulative kWh, which indicated that the corrections were merging distinct households. With all 7 corrections disabled, no household should exceed 5 rounds.

Households with > 5 EUM rounds (ID-correction artefacts)
Expected to be empty after the revert in §2.4 flag 3
household_id	n_rounds_completed

2.6.11 Treatment-arm alignment: `d_eum` vs `d_clean`

d_eum’s treatment column comes from TC_scrambled in the EUM raw file. d_clean’s treatment column is the placeholder seeded village-level rbinom in load_clean_household_data(). Until the real allocation replaces the placeholder, these two assignments are independent random labels and should disagree at roughly chance rates. Once the real allocation arrives, this check must show 100% agreement.

Treatment arm: d_eum (TC_scrambled) vs d_clean (placeholder)
Expected to disagree today; must agree once the real allocation is wired in
tx_eum	tx_clean	n	label
0	0	209	Agree
0	1	240	Disagree (placeholder vs TC_scrambled)
1	0	20	Disagree (placeholder vs TC_scrambled)
1	1	40	Agree
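The agreement rate implied by the table above can be computed directly from the four cells. This is a Python sketch for illustration; the function name agreement_rate and the dictionary layout are assumptions, and the counts are copied from the table.

```python
from typing import Dict, Tuple


def agreement_rate(cells: Dict[Tuple[int, int], int]) -> float:
    """Share of households whose two treatment labels agree.

    cells maps (tx_eum, tx_clean) to a household count.
    """
    total = sum(cells.values())
    if total == 0:
        raise ValueError("no households to compare")
    agree = sum(n for (tx_eum, tx_clean), n in cells.items()
                if tx_eum == tx_clean)
    return agree / total


if __name__ == "__main__":
    table = {(0, 0): 209, (0, 1): 240, (1, 0): 20, (1, 1): 40}
    print(f"{agreement_rate(table):.3f}")
```

With the current counts, 249 of 509 households agree (about 49%), which is what two independent labellings would be expected to produce; the same function should return 1.0 once the real allocation is in place.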

2.6.12 kWh-per-day sanity

Energy analogue of the lpd_outlier check: distribution of kWh used between consecutive readings, normalised by interval days. Negative values indicate meter resets/replacements. Persistently high values (> 5 kWh/day for a kettle is implausible — a boil cycle is ~0.1–0.3 kWh) suggest measurement errors that may slip past the lpd_outlier_cap.

kWh per day across all EUM intervals
n_negative = meter resets; n_gt_5kWh = implausibly high energy use
stat	value
n	1701.000
min	-1.388
p1	0.000
p25	0.053
median	0.156
p75	0.296
p99	1.891
max	55.748
n_negative	12.000
n_gt_5kWh	8.000
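The two flag counts at the bottom of this table follow from a simple per-interval rate. The sketch below is a Python illustration, not the report's R chunk; the function name kwh_per_day_flags and the (kWh used, interval days) input layout are assumptions, and intervals with non-positive length are skipped here as a simplification.

```python
from typing import Dict, Iterable, Tuple


def kwh_per_day_flags(intervals: Iterable[Tuple[float, float]],
                      max_kwh_per_day: float = 5.0) -> Dict[str, int]:
    """Count negative and implausibly high kWh/day rates.

    intervals: (kwh_used, interval_days) per pair of consecutive readings.
    """
    rates = [kwh / days for kwh, days in intervals if days > 0]
    return {
        "n": len(rates),
        "n_negative": sum(1 for r in rates if r < 0),
        "n_gt_5kWh": sum(1 for r in rates if r > max_kwh_per_day),
    }


if __name__ == "__main__":
    demo = [(14.0, 90), (-125.0, 90), (600.0, 30), (20.0, 0)]
    print(kwh_per_day_flags(demo))
```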

2.6.13 Water-travel-time covariate (SAP §5.3 trigger)

The SAP includes travel time to the drinking water source as a covariate only if more than 5% of households have travel time greater than 10 minutes. The chunk below evaluates that trigger.

Travel time to drinking water source (W.11) at baseline
SAP §5.3: include as covariate only if pct_gt_10_min > 5%
n_total	n_with_minutes	n_gt_10_min	pct_gt_10_min	median_min	max_min
900	43	7	0.78	5	20

Most rows are NA because the question lets households fill in either a numeric minute count or a piped-connection letter code (a–e); only HHs who report fetching water externally fill in minutes. NA is therefore the expected value for HHs with piped water at home. With 7 of 900 households (0.78%) reporting more than 10 minutes, the 5% trigger is not met at baseline.
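The trigger evaluation can be sketched as follows. This is a Python illustration under stated assumptions, not the report's R chunk: the function name water_travel_trigger is hypothetical, and the denominator is taken to be all households (including those with NA minutes), which is what the 7 / 900 = 0.78% figure in the table implies.

```python
from typing import Dict, Optional, Sequence, Union


def water_travel_trigger(
    minutes: Sequence[Optional[float]],
    threshold_min: float = 10.0,
    trigger_pct: float = 5.0,
) -> Dict[str, Union[int, float, bool]]:
    """Evaluate the SAP §5.3 rule: include travel time as a covariate only
    if more than trigger_pct of households exceed threshold_min minutes."""
    n_total = len(minutes)
    if n_total == 0:
        raise ValueError("no households supplied")
    n_gt = sum(1 for m in minutes if m is not None and m > threshold_min)
    pct = 100.0 * n_gt / n_total
    return {
        "n_total": n_total,
        "n_with_minutes": sum(1 for m in minutes if m is not None),
        "n_gt_10_min": n_gt,
        "pct_gt_10_min": round(pct, 2),
        "include_covariate": pct > trigger_pct,
    }


if __name__ == "__main__":
    # Synthetic values shaped like the baseline table: 857 NA, 43 numeric.
    demo = [None] * 857 + [5.0] * 36 + [20.0] * 7
    print(water_travel_trigger(demo))
```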

2.6.14 Missing-data summary for analysis-key variables

Missing-data rates at midline for key analysis variables
variable	n_missing	pct_missing
uses_electric_kettle	27	3.0
reported_hw_soap_outcome	4	0.4
observed_hw_soap_outcome	4	0.4
log10wTTC	623	69.2
mountains	0	0.0
baseline_uses_ek	0	0.0
toilet	34	3.8
filter_use	34	3.8
water_quality_perception	37	4.1

3. CONSORT Flow

Participant Flow Summary
Stage	N
Eligible poverty households	3,545 (across 43 villages)
Randomly selected villages	30
Randomly selected households	900
Baseline (R1) — HH surveys	900
Treatment arm (R1)	510
Control arm (R1)	390
Midline (R3) — HH surveys	900
Treatment arm (R3)	510
Control arm (R3)	390
Endline (R6) — short survey	900
Water quality subsample (R1+R3)	546

4. Baseline Characteristics (SAP Section 4.5)

Table 1: Baseline characteristics by treatment arm
	level	0	1
n		390	510
surveyTime (mean (SD))		25.23 (7.59)	24.44 (7.02)
nhh (mean (SD))		2.27 (1.11)	2.42 (1.26)
n_female_adults (mean (SD))		0.96 (0.60)	1.02 (0.65)
n_female_children (mean (SD))		0.13 (0.36)	0.16 (0.45)
pct_adults_highschool (mean (SD))		0.03 (0.12)	0.04 (0.13)
uses_electric_kettle (%)	0	217 (55.6)	342 (67.1)
	1	173 (44.4)	168 (32.9)
drinking_water_method (%)	Bottled W	0 ( 0.0)	2 ( 0.4)
	Boiled W	217 (55.6)	340 (66.7)
	Boiled W Electricity	173 (44.4)	168 (32.9)
water_quality_perception (%)	Very bad	1 ( 0.3)	0 ( 0.0)
	Poor	55 (14.1)	34 ( 6.7)
	Satisfactory	76 (19.5)	128 (25.1)
	Good	160 (41.1)	248 (48.6)
	Very good	78 (20.1)	85 (16.7)
	Don't know	19 ( 4.9)	15 ( 2.9)
filter_use (%)	1	364 (97.6)	494 (97.2)
	2	5 ( 1.3)	4 ( 0.8)
	3	4 ( 1.1)	10 ( 2.0)
toilet (%)	No toilet	8 ( 2.1)	6 ( 1.2)
	Ventilation improved toilet	14 ( 3.6)	2 ( 0.4)
	Basic pit latrine	309 (79.4)	429 (84.3)
	Double pit latrine	1 ( 0.3)	1 ( 0.2)
	Septic tank latrine	2 ( 0.5)	4 ( 0.8)
	Biogas toilet	2 ( 0.5)	4 ( 0.8)
	Water flush toilet	29 ( 7.5)	26 ( 5.1)
	Water flush toilet w/sewer connection	21 ( 5.4)	36 ( 7.1)
	Other	3 ( 0.8)	1 ( 0.2)
reported_hw_soap_outcome (%)	0	360 (92.3)	478 (93.7)
	1	30 ( 7.7)	32 ( 6.3)
observed_hw_soap_outcome (%)	0	354 (90.8)	423 (82.9)
	1	36 ( 9.2)	87 (17.1)
asset_fridge (%)	0	96 (24.6)	84 (16.5)
	1	287 (73.6)	412 (80.8)
	2	7 ( 1.8)	14 ( 2.7)
asset_washingmachine (%)	0	305 (78.4)	433 (84.9)
	1	84 (21.6)	76 (14.9)
	2	0 ( 0.0)	1 ( 0.2)
asset_ricecooker (%)	0	49 (12.6)	59 (11.6)
	1	326 (83.6)	442 (86.7)
	2	15 ( 3.8)	9 ( 1.8)
asset_tv_flat (%)	0	225 (57.8)	317 (63.0)
	1	158 (40.6)	181 (36.0)
	2	6 ( 1.5)	5 ( 1.0)
asset_computer (%)	0	369 (94.6)	482 (95.4)
	1	19 ( 4.9)	22 ( 4.4)
	2	2 ( 0.5)	1 ( 0.2)

5. Unadjusted Means and SDs (SAP Section 5.2)

Per the SAP: “Unadjusted and adjusted means and SDs will be reported for treatment and control groups.”

Unadjusted Means (SD) by Treatment Arm at Midline
Outcome	Control N	Control Mean (SD)	Treatment N	Treatment Mean (SD)
EK use prevalence	378	0.537 (0.499)	495	0.545 (0.498)
Log10 TTC	58	0.285 (0.803)	219	0.139 (0.512)
Diarrhea (7-day)	836	0.042 (0.200)	1182	0.012 (0.108)
Handwashing (reported)	386	0.197 (0.398)	510	0.041 (0.199)
Handwashing (observed)	387	0.243 (0.429)	509	0.128 (0.334)
Cough (7-day)	787	0.105 (0.307)	1173	0.033 (0.179)
Skin rash (7-day)	826	0.036 (0.187)	1173	0.009 (0.092)
Toothache (7-day)	826	0.062 (0.241)	1167	0.010 (0.101)
EUM liters/person/day	387	1.119 (1.395)	51	1.438 (1.614)

Baseline vs Midline: Key Outcomes by Arm

Figure 1. Baseline (round 1) and midline (round 3) prevalence of self-reported electric kettle use and self-reported handwashing with soap, by treatment arm. Bars show the proportion in each round-arm cell.

6. Primary Outcomes (SAP Section 5.1.1)

6a. EK Use Prevalence (Reported)

Descriptive Statistics

EK Use Prevalence by Round and Treatment Arm
Round	Arm	N	N Using EK	Prevalence
1	Control	390	173	44.4%
1	Treatment	510	168	32.9%
3	Control	378	203	53.7%
3	Treatment	495	270	54.5%

Figure 2. Self-reported electric kettle (EK) use by round and arm. N and number of EK users per cell are shown above.

Figure 3. Proportion of households reporting electric kettle use at baseline (round 1) and midline (round 3), by treatment arm.

Unadjusted: Mantel-Haenszel Prevalence Ratios and Differences

EK Use: Unadjusted MH Estimates (Midline)
Stratum	Measure	Estimate (95% CI)	P-value	N
Overall (block-stratified)	PR	1.068 (0.915, 1.246)	0.4057	873
Overall (block-stratified)	RD	0.034 (-0.042, 0.11)	0.3835	873
Mountain	PR	1.134 (0.92, 1.398)	0.2398	580
Mountain	RD	0.06 (-0.034, 0.153)	0.2108	580
Plain	PR	0.972 (0.777, 1.214)	0.8005	293
Plain	RD	-0.017 (-0.148, 0.114)	0.7960	293
EK Cat. 0	PR	--	--	144
EK Cat. 0	RD	--	--	144
EK Cat. 1	PR	--	--	204
EK Cat. 1	RD	--	--	204
EK Cat. 2	PR	0.913 (0.776, 1.074)	0.2731	353
EK Cat. 2	RD	-0.057 (-0.159, 0.045)	0.2718	353
EK Cat. 3	PR	--	--	172
EK Cat. 3	RD	--	--	172
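As a rough illustration of what the PR and RD rows pool, the sketch below computes the classical Mantel-Haenszel prevalence ratio and risk difference across strata. It is a Python illustration only, not the R pipeline's function: the `Stratum` container and both function names are hypothetical, the example counts are invented, and the cluster-aware variance behind the reported CIs and p-values is not reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Stratum:
    """2x2 counts for one randomization block (hypothetical container)."""
    treated_events: int
    treated_total: int
    control_events: int
    control_total: int


def mh_prevalence_ratio(strata: Iterable[Stratum]) -> Optional[float]:
    """Pooled Mantel-Haenszel prevalence ratio; None if not estimable."""
    num = den = 0.0
    for s in strata:
        n = s.treated_total + s.control_total
        if n == 0:
            continue  # empty stratum carries no information
        num += s.treated_events * s.control_total / n
        den += s.control_events * s.treated_total / n
    return num / den if den > 0 else None


def mh_risk_difference(strata: Iterable[Stratum]) -> Optional[float]:
    """Pooled Mantel-Haenszel risk difference; None if not estimable."""
    num = den = 0.0
    for s in strata:
        n = s.treated_total + s.control_total
        if n == 0:
            continue
        num += (s.treated_events * s.control_total
                - s.control_events * s.treated_total) / n
        den += s.treated_total * s.control_total / n
    return num / den if den > 0 else None


# Invented counts for two blocks, for illustration only.
example = [Stratum(30, 100, 25, 100), Stratum(40, 80, 45, 90)]
print(mh_prevalence_ratio(example), mh_risk_difference(example))
```

In this sketch a stratum that contains only one arm adds zero to both sums, so a subgroup made up entirely of such strata returns `None`.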

Adjusted: Modified Poisson GLMM (IRR)

EK Use: Adjusted IRR (Modified Poisson GLMM)
Outcome	IRR (95% CI)	P-value	N
uses_electric_kettle	0.945 (0.522, 1.711)	0.8520	873

6b. EK Use Intensity (EUM)

Descriptive Statistics

EK Use Intensity: Liters per Person per Day by Round (Interval-based)
Round	Arm	N	Mean L/p/d	SD	Median	Min L/p/d	Max L/p/d
2	Control	407	0.86	1.19	0.52	0.00000000	9.483516
2	Treatment	53	1.46	1.72	0.78	0.00000000	7.618343
3	Control	387	1.12	1.39	0.67	0.00000000	9.640861
3	Treatment	51	1.44	1.61	0.84	0.00000000	7.003934
4	Control	343	1.02	1.19	0.70	0.00000000	9.432814
4	Treatment	47	1.58	1.75	1.12	0.03243671	8.281638
5	Control	306	1.27	1.49	0.92	0.00000000	8.999084
5	Treatment	45	1.22	1.16	0.92	0.01254181	6.278281

Unadjusted: T-tests by Round

EUM lpd: unadjusted cluster-aware t-test by round (Treatment − Control)
Paired t-test on block-level means (washb_ttest); df = n_blocks − 1. Welch unpaired fallback used when n_blocks < 2 (not cluster-aware).
Round	Control Mean	Treatment Mean	Diff (95% CI)	P-value	N	Blocks	Method
2 (~3mo)	0.86	1.46	1.223 (-0.639, 3.086)	0.1522	460	6	paired-block (washb_ttest)
3 (~6mo)	1.12	1.44	0.583 (-0.429, 1.595)	0.1986	438	6	paired-block (washb_ttest)
4 (~9mo)	1.02	1.58	0.699 (-0.074, 1.472)	0.0677	390	6	paired-block (washb_ttest)
5 (~12mo)	1.27	1.22	0.104 (-0.307, 0.514)	0.5450	351	6	paired-block (washb_ttest)
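The paired-block test named in the table note can be sketched as follows: average the outcome within each block and arm, take the per-block Treatment − Control difference, and run a one-sample t-test on those differences with df = n_blocks − 1. This Python sketch is not the `washb_ttest` implementation; the function name, the input layout, and the example block means are assumptions, and the t critical value is passed in by the caller because the standard library has no t quantile function.

```python
import math
from statistics import mean, stdev


def paired_block_ttest(block_means, t_crit):
    """Paired t-test on block-level means.

    block_means: list of (control_mean, treatment_mean), one pair per block.
    t_crit: two-sided critical value for df = n_blocks - 1 (caller-supplied).
    """
    diffs = [treated - control for control, treated in block_means]
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least 2 blocks with both arms present")
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(k)
    t_stat = d_bar / se if se > 0 else math.inf
    return {
        "diff": d_bar,
        "se": se,
        "t": t_stat,
        "df": k - 1,
        "ci": (d_bar - t_crit * se, d_bar + t_crit * se),
    }


# Invented block means for 6 blocks; 2.571 is the 97.5% t quantile for df = 5.
blocks = [(0.8, 1.4), (1.0, 0.9), (0.7, 2.5), (1.1, 1.3), (0.9, 1.0), (1.2, 3.1)]
print(paired_block_ttest(blocks, t_crit=2.571))
```

Because every block counts equally here, the mean of block-level differences need not equal the difference between the pooled arm means, which is presumably why the Diff column can differ from Treatment Mean minus Control Mean.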

Unadjusted: T-tests Stratified (Midline, Round 3)

EUM lpd: Stratified T-tests (Midline, Round 3)
Stratum	Control	Treatment	Diff (95% CI)	P-value	N
Overall (block-paired)	1.12	1.44	0.583 (-0.429, 1.595)	0.1986	438
Mountain	1.19	1.12	-0.057 (-1.354, 1.241)	0.8685	276
Plain	1.00	1.97	1.223 (-1.032, 3.478)	0.1448	162
EK Cat. 0	1.89	1.24	-0.653 (-3.085, 1.779)	0.4802	61
EK Cat. 1	1.13	3.06	--	--	91
EK Cat. 2	0.94	1.45	0.647 (-3.532, 4.827)	0.2993	191
EK Cat. 3	0.92	1.29	0.314 (-1.583, 2.212)	0.2823	95

Adjusted: Linear Mixed Model

EUM lpd: Adjusted Mean Difference (Linear Mixed Model, Midline)
Outcome	Estimate (95% CI)	P-value	N
lpd	0.436 (-0.366, 1.238)	0.2643	438

Exploratory: Longitudinal EUM Trends

Figure 4. Electric kettle use intensity over time. Mean litres of boiled drinking water per person per day (lpd) by EUM round (~3, ~6, ~9, ~12 months) and treatment arm. Error bars are 95% confidence intervals around each round–arm mean.

6c. EK Use Likelihood Index (Enumerator Observations)

EK Use Likelihood Index (1-10 scale) by Round and Arm
Round	Arm	N	Mean Index	SD
3	Control	246	8.13	1.49
3	Treatment	305	8.29	1.78
6	Control	271	8.62	1.48
6	Treatment	344	8.64	1.78

Figure 5. Distribution of the enumerator-observation EK Use Likelihood Index categories (Unlikely / Occasional / Likely regular user) at baseline (R1) and midline (R3), by treatment arm.

6d. Exploratory: EK use at 12 months (R6 endline)

Exploratory analysis — not in the pre-specified SAP

R6 (endline) was a short survey that captured only enumerator observations of the kettle (od8/od9/od10) and not the self-reported drinking_water_method that underlies the SAP-defined uses_electric_kettle outcome at baseline and midline.

To produce a 12-month EK-use estimate from R6, we derive a binary “likely EK user” from the enumerator-observation index ek_use_likelihood (1–10):

Strict threshold: ek_use_likelihood > 7 (“Likely regular user”).
Lenient threshold: ek_use_likelihood > 4 (“Occasional or Likely regular user”).

Both are reported because the choice of cut-point is an analyst judgment rather than a SAP-defined rule. The lenient threshold is the closer analog to the midline self-report binary (which captures anyone whose primary boiling method is electric, regardless of frequency).

This analysis was not pre-specified and should be interpreted as a descriptive 12-month follow-up signal, not a primary outcome.
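The two cut-points above amount to a strict inequality on the 1–10 index. A minimal Python sketch follows, assuming missing index values are carried through as missing rather than counted as non-users (the report does not state how the pipeline handles them); the function and constant names are hypothetical.

```python
STRICT_THRESHOLD = 7   # "Likely regular user"
LENIENT_THRESHOLD = 4  # "Occasional or Likely regular user"


def likely_ek_user(ek_use_likelihood, threshold):
    """Binary 'likely EK user' flag; None when the index is missing."""
    if ek_use_likelihood is None:
        return None
    return ek_use_likelihood > threshold


def percent_above(index_values, threshold):
    """Percent of non-missing index values above the threshold."""
    flags = [likely_ek_user(v, threshold) for v in index_values]
    observed = [f for f in flags if f is not None]
    if not observed:
        return None
    return 100.0 * sum(observed) / len(observed)


# Invented index values, for illustration only.
sample = [9, 8, 7, 5, 4, None, 10]
print(percent_above(sample, STRICT_THRESHOLD), percent_above(sample, LENIENT_THRESHOLD))
```

Note that a value exactly at the cut-point (7 or 4) falls below the threshold under the `>` rule.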

Descriptive: enumerator-observed EK use at 12 months by arm

Enumerator-observed EK use at 12 months
Mean likelihood index (1–10), and % above each threshold
Arm	N	Mean likelihood	% strict (> 7)	% lenient (> 4)
Control	271	8.62	89.3	97.8
Treatment	344	8.64	87.5	96.2

Cluster-aware MH at 12 months (both thresholds)

EK use at 12 months: MH PR / RD by threshold
Cluster-aware MH stratified by randomization block
Threshold	Measure	Estimate (95% CI)	P-value	N
Strict (likelihood > 7)	PR	1.043 (0.973, 1.117)	0.2363	615
Strict (likelihood > 7)	RD	0.037 (-0.025, 0.099)	0.2402	615
Lenient (likelihood > 4)	PR	1.000 (0.978, 1.023)	0.9841	615
Lenient (likelihood > 4)	RD	0.000 (-0.022, 0.023)	0.9839	615

7. Secondary Outcomes

7a. Thermotolerant Coliforms (TTC) (SAP Section 5.1.2)

Descriptive: Detection, Geometric Mean, Risk Classification

TTC Summary by Round, Arm, and Geography
Round	Arm	Geography	N	% Detected	Geo. Mean MPN/100mL	95% CI lower	95% CI upper
1	Control	mountain	55	20.0	2.0	1.3	2.9
1	Treatment	mountain	100	29.0	2.5	1.8	3.5
1	Treatment	plain	114	21.1	2.0	1.5	2.7
3	Control	mountain	58	12.1	1.9	1.2	3.1
3	Treatment	mountain	104	8.7	1.4	1.1	1.8
3	Treatment	plain	115	8.7	1.4	1.1	1.7

Figure 6. Geometric mean thermotolerant coliforms (TTC, MPN/100 mL) in household drinking water by round, geography (mountain vs plain), and treatment arm. Error bars are 95% confidence intervals around the geometric mean.

TTC Risk Classification
Round	Arm	Risk Category	N	%
1	Control	Not detected	44	80.0
1	Control	1-9 MPN/100mL	2	3.6
1	Control	10-99 MPN/100mL	8	14.5
1	Control	>=100 MPN/100mL	1	1.8
1	Treatment	Not detected	161	75.2
1	Treatment	1-9 MPN/100mL	14	6.5
1	Treatment	10-99 MPN/100mL	30	14.0
1	Treatment	>=100 MPN/100mL	9	4.2
3	Control	Not detected	51	87.9
3	Control	1-9 MPN/100mL	0	0.0
3	Control	10-99 MPN/100mL	2	3.4
3	Control	>=100 MPN/100mL	5	8.6
3	Treatment	Not detected	200	91.3
3	Treatment	1-9 MPN/100mL	6	2.7
3	Treatment	10-99 MPN/100mL	5	2.3
3	Treatment	>=100 MPN/100mL	8	3.7
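The four risk categories in the table can be expressed as a simple binning of the MPN/100 mL count. The Python sketch below assumes non-detects are stored as a value below 1 (for example 0), which the report does not confirm; the function names are hypothetical.

```python
from collections import Counter

CATEGORIES = ["Not detected", "1-9 MPN/100mL", "10-99 MPN/100mL", ">=100 MPN/100mL"]


def ttc_risk_category(mpn_per_100ml):
    """Map a TTC count (MPN/100 mL) to its risk category; None if missing."""
    if mpn_per_100ml is None:
        return None
    if mpn_per_100ml < 1:      # assumption: non-detects coded below 1
        return CATEGORIES[0]
    if mpn_per_100ml < 10:
        return CATEGORIES[1]
    if mpn_per_100ml < 100:
        return CATEGORIES[2]
    return CATEGORIES[3]


def risk_distribution(values):
    """Count and percent per category among non-missing samples."""
    labels = [ttc_risk_category(v) for v in values if v is not None]
    counts = Counter(labels)
    total = len(labels)
    return {c: (counts.get(c, 0), 100.0 * counts.get(c, 0) / total if total else 0.0)
            for c in CATEGORIES}


# Invented sample counts, for illustration only.
print(risk_distribution([0, 0, 3, 12, 150, None]))
```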

Unadjusted: T-tests (Overall)

Log10 TTC: unadjusted cluster-aware t-test (Treatment − Control)
Paired t-test on block-level means when n_blocks ≥ 2; Welch unpaired fallback otherwise (not cluster-aware).
Round	Control Mean	Treatment Mean	Diff (95% CI)	P-value	N	Blocks	Method
1 (Baseline)	0.29	0.35	0.062 (-0.130, 0.254)	0.5252	269	1	Welch (fallback, NOT cluster-aware)
3 (Midline)	0.29	0.14	-0.146 (-0.368, 0.075)	0.1917	277	1	Welch (fallback, NOT cluster-aware)

Note

The water-quality subsample (269 households at baseline, 277 at midline) often does not cover ≥ 2 randomization blocks with both arms present. When that happens, the function falls back to a Welch unpaired t-test on the raw values; the Method column makes the fallback explicit. The fallback is not cluster-aware (it treats individual households as independent), so the adjusted LMM result below remains the primary inference for TTC.
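The fallback rule in the note can be sketched as a count of blocks with both arms present, followed by a Welch test when that count is below 2. This Python sketch is illustrative only: the function names and the `(block, arm, value)` row layout are assumptions, and it returns the difference, standard error, and Welch-Satterthwaite degrees of freedom without a p-value.

```python
import math
from statistics import mean, variance


def blocks_with_both_arms(rows):
    """rows: iterable of (block, arm, value) with arm 'control' or 'treatment'."""
    arms_by_block = {}
    for block, arm, _value in rows:
        arms_by_block.setdefault(block, set()).add(arm)
    return sorted(b for b, arms in arms_by_block.items()
                  if {"control", "treatment"} <= arms)


def choose_method(rows):
    """Paired-block test needs at least 2 usable blocks; otherwise fall back."""
    if len(blocks_with_both_arms(rows)) >= 2:
        return "paired-block"
    return "Welch (fallback, NOT cluster-aware)"


def welch_t(control, treatment):
    """Welch unpaired comparison: (Treatment - Control, SE, approximate df)."""
    v_c = variance(control) / len(control)
    v_t = variance(treatment) / len(treatment)
    diff = mean(treatment) - mean(control)
    se = math.sqrt(v_c + v_t)
    df = (v_c + v_t) ** 2 / (v_c ** 2 / (len(control) - 1)
                             + v_t ** 2 / (len(treatment) - 1))
    return diff, se, df


# Invented rows: only block 1 has both arms, so the fallback is selected.
rows = [(1, "control", 0.3), (1, "treatment", 0.1), (2, "treatment", 0.2)]
print(choose_method(rows))
```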

Unadjusted: T-tests Stratified (Midline)

Log10 TTC: Stratified T-tests (Midline)
Stratum	Control	Treatment	Diff (95% CI)	P-value	N
Overall (block-paired)	0.29	0.14	-0.146 (-0.368, 0.075)	0.1917	277
Mountain	0.29	0.15	-0.139 (-0.372, 0.095)	0.2409	162
Plain	NA	0.13	--	--	115
EK Cat. 0	NA	0.27	--	--	52
EK Cat. 1	0.16	0.05	-0.113 (-0.355, 0.130)	0.3517	89
EK Cat. 2	NA	0.00	--	--	28
EK Cat. 3	0.42	0.17	-0.250 (-0.637, 0.137)	0.1975	108

Adjusted: Linear Mixed Model

Log10 TTC: Adjusted Mean Difference (Linear Mixed Model, Midline)
Outcome	Estimate (95% CI)	P-value	N
log10wTTC	-0.157 (-1.130, 0.815)	0.5595	277

7b. Diarrhea Prevalence (SAP Section 5.1.2)

Unadjusted: MH (7-day recall) — Overall + Stratified

Diarrhea (7-day recall): Unadjusted MH Estimates (Midline)
Stratum	Measure	Estimate (95% CI)	P-value	N
Overall (block-stratified)	PR	0.484 (0.214, 1.095)	0.0817	2018
Overall (block-stratified)	RD	-0.015 (-0.031, 0.001)	0.0668	2018
Mountain	PR	0.520 (0.189, 1.434)	0.2064	1291
Mountain	RD	-0.014 (-0.034, 0.006)	0.1702	1291
Plain	PR	0.413 (0.105, 1.621)	0.2051	727
Plain	RD	-0.016 (-0.041, 0.009)	0.2144	727
EK Cat. 0	PR	--	--	334
EK Cat. 0	RD	--	--	334
EK Cat. 1	PR	--	--	454
EK Cat. 1	RD	--	--	454
EK Cat. 2	PR	0.965 (0.353, 2.636)	0.9441	797
EK Cat. 2	RD	-0.001 (-0.020, 0.019)	0.9435	797
EK Cat. 3	PR	--	--	433
EK Cat. 3	RD	--	--	433

Diarrhea Prevalence by Age Group — Descriptive

Diarrhea Prevalence (%) by Age Group and Arm (Midline, 7-day recall)
Age Group	Control N	Control events	Control prevalence (%)	Treatment N	Treatment events	Treatment prevalence (%)
<5 years	11	0	0.0	17	2	11.8
5-18 years	82	3	3.7	125	1	0.8
>18 years	661	20	3.0	990	11	1.1
Unknown	82	12	14.6	50	0	0.0

By Age Group — MH Prevalence Ratios

Diarrhea PR by Age Group (Midline, 7-day recall)
Age Group	Measure	Estimate (95% CI)	P-value	N
<5 years	PR	--	--	28
5-18 years	PR	0.355 (0.039, 3.234)	0.3581	207
>18 years	PR	0.495 (0.192, 1.273)	0.1443	1651
all	PR	0.484 (0.214, 1.095)	0.0817	2018

Figure 7. Cluster-aware Mantel–Haenszel prevalence ratios for 7-day diarrhea at midline, by age group (<5 years, 5–18 years, >18 years, and all ages). Stratified by the eight randomization blocks (mountains × baseline EK use category).

Figure 8. Forest plot of the 7-day diarrhea prevalence ratios by age group (midline), on a log scale. Point estimates and 95% CIs from cluster-aware Mantel–Haenszel.

Adjusted: Modified Poisson GLMM

Diarrhea: Adjusted IRR (Modified Poisson GLMM)
Outcome	IRR (95% CI)	P-value	N
diarrhea_value	0.435 (0.032, 5.971)	0.5331	2018

7c. Handwashing with Soap (SAP Section 5.1.2)

Reported Handwashing — Overall + Stratified

Reported Handwashing with Soap: Unadjusted MH (Midline)
Stratum	Measure	Estimate (95% CI)	P-value	N
Overall (block-stratified)	PR	0.308 (0.193, 0.493)	<0.0001	896
Overall (block-stratified)	RD	-0.115 (-0.166, -0.065)	<0.0001	896
Mountain	PR	0.215 (0.122, 0.381)	<0.0001	596
Mountain	RD	-0.159 (-0.224, -0.094)	<0.0001	596
Plain	PR	0.714 (0.281, 1.818)	0.4802	300
Plain	RD	-0.027 (-0.102, 0.048)	0.4856	300
EK Cat. 0	PR	--	--	150
EK Cat. 0	RD	--	--	150
EK Cat. 1	PR	--	--	206
EK Cat. 1	RD	--	--	206
EK Cat. 2	PR	0.586 (0.309, 1.111)	0.1015	360
EK Cat. 2	RD	-0.052 (-0.116, 0.012)	0.1115	360
EK Cat. 3	PR	--	--	180
EK Cat. 3	RD	--	--	180

Observed Handwashing — Overall + Stratified

Observed Handwashing with Soap: Unadjusted MH (Midline)
Stratum	Measure	Estimate (95% CI)	P-value	N
Overall (block-stratified)	PR	0.870 (0.583, 1.298)	0.4962	896
Overall (block-stratified)	RD	-0.018 (-0.072, 0.035)	0.5073	896
Mountain	PR	1.384 (0.765, 2.506)	0.2825	596
Mountain	RD	0.029 (-0.026, 0.085)	0.2985	596
Plain	PR	0.575 (0.326, 1.013)	0.0556	300
Plain	RD	-0.113 (-0.226, -0.000)	0.0492	300
EK Cat. 0	PR	--	--	150
EK Cat. 0	RD	--	--	150
EK Cat. 1	PR	--	--	207
EK Cat. 1	RD	--	--	207
EK Cat. 2	PR	0.605 (0.378, 0.966)	0.0353	359
EK Cat. 2	RD	-0.087 (-0.167, -0.006)	0.0348	359
EK Cat. 3	PR	--	--	180
EK Cat. 3	RD	--	--	180

8. Tertiary Outcome: Cough Prevalence (SAP Section 5.1.3)

Overall + Stratified by Randomization Block

Cough (7-day): unadjusted cluster-aware MH (midline)
Stratum	Measure	Estimate (95% CI)	P-value	N
Overall (block-stratified)	PR	0.432 (0.266, 0.699)	0.0006	1960
Overall (block-stratified)	RD	-0.049 (-0.077, -0.022)	0.0004	1960
Mountain	PR	0.606 (0.311, 1.181)	0.1409	1240
Mountain	RD	-0.026 (-0.058, 0.006)	0.1121	1240
Plain	PR	0.270 (0.131, 0.557)	0.0004	720
Plain	RD	-0.089 (-0.139, -0.040)	0.0004	720
EK Cat. 0	PR	--	--	332
EK Cat. 0	RD	--	--	332
EK Cat. 1	PR	--	--	446
EK Cat. 1	RD	--	--	446
EK Cat. 2	PR	0.971 (0.520, 1.813)	0.9261	763
EK Cat. 2	RD	-0.002 (-0.035, 0.032)	0.9243	763
EK Cat. 3	PR	--	--	419
EK Cat. 3	RD	--	--	419

By Age Group (SAP §5.1.3: <5, 5-18, >18, overall)

Cough (7-day) PR by age group (midline)
Cluster-aware MH stratified by randomization block
Age Group	Measure	Estimate (95% CI)	P-value	N
<5 years	PR	0.714 (0.128, 3.974)	0.7008	28
5-18 years	PR	2.786 (0.267, 29.038)	0.3917	198
>18 years	PR	0.411 (0.243, 0.697)	0.0010	1610
all	PR	0.432 (0.266, 0.699)	0.0006	1960

9. Negative Control Outcomes (SAP Section 5.1.4)

Overall + Stratified by Randomization Block

Negative controls (7-day): unadjusted cluster-aware MH (midline)
Outcome	Stratum	Measure	Estimate (95% CI)	P-value	N
rash_value	Overall (block-stratified)	PR	0.078 (0.020, 0.315)	0.0003	1999
rash_value	Overall (block-stratified)	RD	-0.034 (-0.050, -0.018)	<0.0001	1999
rash_value	Mountain	PR	0.034 (0.001, 0.907)	0.0435	1277
rash_value	Mountain	RD	-0.022 (-0.037, -0.006)	0.0059	1277
rash_value	Plain	PR	0.107 (0.023, 0.503)	0.0047	722
rash_value	Plain	RD	-0.057 (-0.091, -0.022)	0.0014	722
rash_value	EK Cat. 0	PR	--	--	333
rash_value	EK Cat. 0	RD	--	--	333
rash_value	EK Cat. 1	PR	--	--	449
rash_value	EK Cat. 1	RD	--	--	449
rash_value	EK Cat. 2	PR	0.055 (0.007, 0.408)	0.0046	790
rash_value	EK Cat. 2	RD	-0.041 (-0.063, -0.018)	0.0004	790
rash_value	EK Cat. 3	PR	--	--	427
rash_value	EK Cat. 3	RD	--	--	427
toothache_value	Overall (block-stratified)	PR	0.137 (0.047, 0.400)	0.0003	1993
toothache_value	Overall (block-stratified)	RD	-0.039 (-0.057, -0.021)	<0.0001	1993
toothache_value	Mountain	PR	0.167 (0.046, 0.605)	0.0064	1272
toothache_value	Mountain	RD	-0.037 (-0.060, -0.014)	0.0014	1272
toothache_value	Plain	PR	0.083 (0.011, 0.642)	0.0170	721
toothache_value	Plain	RD	-0.042 (-0.071, -0.012)	0.0053	721
toothache_value	EK Cat. 0	PR	--	--	328
toothache_value	EK Cat. 0	RD	--	--	328
toothache_value	EK Cat. 1	PR	--	--	450
toothache_value	EK Cat. 1	RD	--	--	450
toothache_value	EK Cat. 2	PR	0.309 (0.105, 0.905)	0.0322	791
toothache_value	EK Cat. 2	RD	-0.024 (-0.046, -0.002)	0.0320	791
toothache_value	EK Cat. 3	PR	--	--	424
toothache_value	EK Cat. 3	RD	--	--	424

Rash by Age Group (SAP §5.1.4)

Skin rash (7-day) PR by age group (midline)
Cluster-aware MH stratified by randomization block
Age Group	Measure	Estimate (95% CI)	P-value	N
<5 years	PR	--	--	26
5-18 years	PR	--	--	203
>18 years	PR	0.094 (0.023, 0.380)	0.0009	1646
all	PR	0.078 (0.020, 0.315)	0.0003	1999

Toothache by Age Group (SAP §5.1.4)

Toothache (7-day) PR by age group (midline)
Cluster-aware MH stratified by randomization block
Age Group	Measure	Estimate (95% CI)	P-value	N
<5 years	PR	--	--	26
5-18 years	PR	--	--	201
>18 years	PR	0.162 (0.052, 0.504)	0.0017	1640
all	PR	0.137 (0.047, 0.400)	0.0003	1993

10. All Adjusted Results Summary

All Adjusted Treatment Effects (Midline)
Outcome	Model Type	Estimate (95% CI)	P-value	N
uses_electric_kettle	Binary (IRR)	0.945 (0.522, 1.711)	0.8520	873
diarrhea_value	Binary (IRR)	0.435 (0.032, 5.971)	0.5331	2018
reported_hw_soap_outcome	Binary (IRR)	0.178 (0.042, 0.755)	0.0192	896
observed_hw_soap_outcome	Binary (IRR)	0.946 (0.250, 3.585)	0.9354	896
cough_value	Binary (IRR)	0.524 (0.149, 1.845)	0.3143	1960
rash_value	Binary (IRR)	0.086 (0.011, 0.649)	0.0174	1999
toothache_value	Binary (IRR)	0.183 (0.024, 1.403)	0.1022	1993
log10wTTC	Continuous (Coef)	-0.157 (-1.130, 0.815)	0.5595	277
lpd	Continuous (Coef)	0.436 (-0.366, 1.238)	0.2643	438

Forest Plot: All Outcomes

Figure 9. Forest plot of all binary outcomes at midline, showing the unadjusted cluster-aware Mantel–Haenszel prevalence ratio and the adjusted modified Poisson GLMM IRR (with CR2 cluster-robust SE) side by side. Negative controls (skin rash, toothache) are marked with triangles; SAP-specified outcomes with circles. Estimates on log scale; dashed reference at 1.

11. Subgroup Analyses (SAP Section 5.4)

By Primary Water Type

The subgroup analysis by water type is not estimable: there is no within-group variation in EK use when households are stratified by baseline water method.

By Water Quality Perception

EK Use PR by Water Quality Perception
Perception	PR (95% CI)	P-value	N
Good	0.910 (0.764, 1.084)	0.2903	498
Very good	1.814 (0.789, 4.170)	0.1608	133
Don't know	--	--	14
Satisfactory	1.193 (0.724, 1.966)	0.4878	148
Poor	0.523 (0.219, 1.248)	0.1440	64

By Electricity Affordability (FE.15)

EK Use PR by Perceived Electricity Affordability (FE.15)
Affordability category	PR (95% CI)	P-value	N
Slightly expensive	2.396 (1.729, 3.320)	<0.0001	259
Affordable	0.773 (0.610, 0.979)	0.0326	536
Very affordable	--	--	50
Expensive	1.750 (0.576, 5.316)	0.3236	12
Very expensive	--	--	11

By Risk Perception of Untreated Water (FQ.1)

Shown both at the raw FQ.1-category level and at a binary collapse into “aware of any health consequence” vs “not aware”.

EK Use PR by Risk Perception of Untreated Water (FQ.1)
Variable	Category	PR (95% CI)	P-value	N
risk_untreated_water_first	They will get sick	1.161 (0.919, 1.467)	0.2096	493
risk_untreated_water_first	They will get diarrhea	1.519 (1.133, 2.037)	0.0053	252
risk_untreated_water_first	They will get stomach ache	--	--	68
risk_untreated_water_first	Nothing	--	--	13
risk_untreated_water_first	Other	--	--	22
risk_untreated_water_first	Don't know	--	--	14
risk_aware_binary	Aware (any health consequence)	1.072 (0.912, 1.260)	0.4007	816
risk_aware_binary	Not aware	0.735 (0.519, 1.043)	0.0845	19
risk_aware_binary	Unclear	--	--	36

By Primary Boiler Sex (SAP §5.4 boiling subgroup)

Restricted to households where BE.16 was administered (i.e., the household reported boiling water and identified a primary boiler).

EK Use PR by Primary Boiler Sex
Boiler sex	PR (95% CI)	P-value	N
Female	--	--	298
Male	1.022 (0.980, 1.066)	0.3173	176

By Primary Boiler Age Bin (SAP §5.4 boiling subgroup)

Bins (<50, 50–64, 65+) chosen to give comparable Ns from the observed distribution of primary_boiler_age (min 30, median 56, max 90).

EK Use PR by Primary Boiler Age Bin
Age bin	PR (95% CI)	P-value	N
<50	1.029 (0.973, 1.089)	0.3173	112
50-64	--	--	197
65+	--	--	165
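The age bins used for this subgroup correspond to a three-way split of `primary_boiler_age`. A small Python sketch follows, with a hypothetical function name and the assumption that missing ages stay missing rather than being assigned to a bin.

```python
def boiler_age_bin(primary_boiler_age):
    """Assign the primary boiler's age to <50, 50-64, or 65+; None if missing."""
    if primary_boiler_age is None:
        return None
    if primary_boiler_age < 50:
        return "<50"
    if primary_boiler_age < 65:
        return "50-64"
    return "65+"


# The observed minimum, median, and maximum reported above.
print([boiler_age_bin(a) for a in (30, 56, 90)])
```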

12. Sensitivity Analyses

12a. Differential Recall Check (SAP Section 5.1.2)

SAP §5.1.2 trigger and current default

The SAP-specified primary recall window for diarrhea is 7 days. If there is evidence of differential recall between treatment and control arms — i.e. the treatment-control prevalence comparison is materially different at 1-day vs 7-day recall — the SAP allows switching the primary outcome to same-day recall.

The pipeline does not switch automatically. The two tables below provide the inspection material; the decision is left to the PI.

Step 1 — prevalence by recall period, round, and arm

If recall is comparable across arms, prevalence in each row should rise monotonically with the recall window (today < 1-week < 2-week) in both arms. A gap between arms that widens as the window grows indicates differential recall.

Diarrhea prevalence (%) by recall period, round, and arm
Round	Arm	N (today)	N (1-week)	N (2-week)	Prevalence today (%)	Prevalence 1-week (%)	Prevalence 2-week (%)
1	Control	860	852	837	3.14	5.28	4.78
1	Treatment	1199	1188	1183	1.50	4.04	2.79
3	Control	835	836	793	2.63	4.19	5.67
3	Treatment	1188	1182	1149	0.51	1.18	1.31

Step 2 — treatment effect (MH PR / RD) at each recall window, midline

The clearer test for differential recall: does the treatment effect itself shift across recall windows? If the MH PR is similar at today / 1-week / 2-week, there is no differential-recall problem and the 7-day primary outcome stands. If it shifts (e.g., today shows a large protective effect that washes out at 1-week and 2-week), the SAP-allowed switch to same-day recall is supported.

Midline diarrhea MH (cluster-aware, block-stratified) by recall window
Compare the PR row across the three windows — large shifts indicate differential recall
Recall window	Measure	Estimate (95% CI)	P-value	N
Today (same day)	PR	0.297 (0.081, 1.090)	0.0672	2023
Today (same day)	RD	-0.012 (-0.023, -0.000)	0.0455	2023
1-week (SAP primary)	PR	0.484 (0.214, 1.095)	0.0817	2018
1-week (SAP primary)	RD	-0.015 (-0.031, 0.001)	0.0668	2018
2-week (sensitivity)	PR	0.357 (0.165, 0.772)	0.0089	1942
2-week (sensitivity)	RD	-0.027 (-0.045, -0.008)	0.0057	1942

How to use these two tables

Read Step 1 to confirm that prevalence increases monotonically with the recall window in each arm (today < 1-week < 2-week). If either arm violates that ordering, recall is suspect even before you check Step 2.
Read Step 2 to compare the treatment-effect PR across the three recall windows. A roughly constant PR across windows means the 7-day primary outcome is fine. A PR that moves systematically toward 1 as the recall window lengthens (e.g., 0.5 today, 0.8 at 1-week, 1.0 at 2-week) would indicate differential under-reporting in one arm.
If the PI judges that differential recall is present, the pre-specified switch to same-day (today) recall as the primary outcome is supported by SAP §5.1.2. The pipeline will then need to be updated to use recall_period == "today" as the primary filter for the diarrhea MH and adjusted models.
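The Step 1 ordering check can be run mechanically. The Python sketch below copies the prevalence values from the Step 1 table and flags any round-arm row that does not satisfy today < 1-week < 2-week; the dictionary layout and function name are assumptions made for illustration, and the check is only an aid to the PI's judgment, not a decision rule.

```python
# Prevalence (%) by (round, arm), copied from the Step 1 table.
PREVALENCE = {
    (1, "Control"):   {"today": 3.14, "oneweek": 5.28, "twoweek": 4.78},
    (1, "Treatment"): {"today": 1.50, "oneweek": 4.04, "twoweek": 2.79},
    (3, "Control"):   {"today": 2.63, "oneweek": 4.19, "twoweek": 5.67},
    (3, "Treatment"): {"today": 0.51, "oneweek": 1.18, "twoweek": 1.31},
}


def is_monotone(prev):
    """True when prevalence rises strictly with the recall window."""
    return prev["today"] < prev["oneweek"] < prev["twoweek"]


violations = [key for key, prev in PREVALENCE.items() if not is_monotone(prev)]
for rnd, arm in violations:
    print(f"round {rnd}, {arm}: ordering today < 1-week < 2-week not met")
```

On these values both round 1 rows are flagged, because the 2-week prevalence is lower than the 1-week prevalence, while both round 3 rows satisfy the ordering.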

12b. 14-Day Diarrhea Recall Sensitivity

Diarrhea (14-day recall): Sensitivity MH (Midline)
Measure	Estimate (95% CI)	P-value	N
PR	0.357 (0.165, 0.772)	0.0089	1942
RD	-0.027 (-0.045, -0.008)	0.0057	1942

12c. TTC Sensitivity (With/Without Outliers)

Log10 TTC: sensitivity analysis (with/without TTC:TC ≥ 1 outliers)
Analysis	Diff (95% CI)	P-value	N	Method
All observations	-0.146 (-0.368, 0.075)	0.1917	277	Welch (fallback, NOT cluster-aware)
Excluding TTC:TC>=1 outliers (n=252 removed)	-1.275 (-2.148, -0.402)	0.0091	25	Welch (fallback, NOT cluster-aware)
TC data needed to implement TTC:TC ≥ 1 outlier exclusion per SAP §5.1.2.

Participation Rate Sensitivity (SAP Section 5.6)

The SAP specifies a sensitivity analysis stratified by village-level participation rate to approximate the Treatment-on-the-Treated effect. It has not been run because village participation-rate data have not yet been incorporated.

13. Covariates Used in Adjusted Models

Covariates Included in Adjusted Models (Pre-screened per SAP Section 5.3)
Model Type	Covariates passing LRT screen (p <= 0.25)
Binary outcomes (Poisson)	mountains, baseline_uses_ek, toilet, asset_fridge
Continuous outcomes (Linear)	mountains, baseline_uses_ek, toilet, w19, w21
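The screening rule in the table header (keep a covariate when its likelihood-ratio test gives p <= 0.25) can be sketched for the simplest case of a single added parameter. This Python sketch assumes the log-likelihoods of the nested models have already been obtained from fitted models; the function names are hypothetical, and multi-level covariates such as `toilet` would need a chi-square with more degrees of freedom than the 1-df case handled here.

```python
import math


def lrt_p_value_df1(loglik_reduced, loglik_full):
    """LRT p-value for one extra parameter (chi-square, 1 df).

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def passes_screen(p_value, alpha=0.25):
    """Covariate is retained when the LRT p-value is at or below alpha."""
    return p_value <= alpha


# Invented log-likelihoods, for illustration only.
p = lrt_p_value_df1(loglik_reduced=-512.4, loglik_full=-511.1)
print(round(p, 4), passes_screen(p))
```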

Report generated from targets pipeline. All analyses follow the pre-registered SAP (v1, 2018-04-29).

Statistical software: R version 4.4.2 (2024-10-31 ucrt) with targets, lme4, lmerTest, metafor.
