Introduction

Bootstrapping is a statistical resampling technique used to estimate the distribution of a sample statistic (like the mean, median, variance, or regression coefficients) by repeatedly sampling, with replacement, from the original data set.

Bootstrapping is particularly helpful when we have a small sample, because it lets us assess the variability and reliability of an estimate without relying on large sample sizes or strict distributional assumptions.
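To make the resampling idea concrete, here is a minimal base R sketch. The sample `x` is made up for illustration and is not data from the study; the number of replicates (5,000) and the seed are arbitrary choices.

```r
# Hypothetical small sample (illustrative values only, not study data)
x <- c(12, 15, 9, 20, 14, 11, 17, 13)

set.seed(42)  # arbitrary seed, for reproducibility

# Each replicate: resample x with replacement (same size) and take the mean
boot_means <- replicate(5000, mean(sample(x, size = length(x), replace = TRUE)))

# Bootstrap standard error and a 95% percentile interval for the mean
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(boot_se)
print(boot_ci)
```

The spread of `boot_means` stands in for the sampling distribution of the mean, which is what allows a standard error and an interval to be read off without a distributional assumption.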

Example

Suppose we run a randomized controlled trial (RCT) to estimate the effect of an intervention (lottery-nudge) aimed at increasing the modern contraceptive prevalence rate (MCPR) among adolescent girls and young women (AGYW). In this experiment, we randomly identify 50 AGYW who are willing to participate and have never used a modern contraceptive before. We allocate 18 girls to the treatment group and 32 girls to the control group. The experiment runs for a period of one year. At the end of the year, we aim to estimate the effect size with a power of 80% or higher.

Solution

Since we have a small sample, we will perform the analysis using the bootstrap resampling method. We will draw 10,000 bootstrap replicates and keep those with a power of 80% or higher. We will then average the results across those replicates to estimate the difference in the proportion of AGYW using modern contraceptives between the treatment and control groups.

Load packages


library(tidyverse)
library(infer)
library(pwr)
library(DT)
library(knitr)
library(kableExtra)
library(haven)
library(sjPlot)

Step 1: Attach data

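# mydata is assumed to be loaded already (the loading step is not shown here).
# It needs the columns user_modern_contraceptive ("yes"/"no") and group ("treatment"/"control").
datatable(mydata, options = list(scrollY = '400px', paging = FALSE))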

Step 2: Compare the treatment and control groups on modern contraceptive use

sjPlot::tab_xtab(
  var.row = mydata$user_modern_contraceptive,
  var.col = mydata$group,
  title = "Treatment vs control: Usage of modern contraceptive methods",
  show.col.prc = TRUE
)
Treatment vs control: Usage of modern contraceptive methods

user_modern_contraceptive   group: treatment   group: control   Total
yes                         10 (55.6 %)        14 (43.8 %)      24 (48 %)
no                           8 (44.4 %)        18 (56.2 %)      26 (52 %)
Total                       18 (100 %)         32 (100 %)       50 (100 %)

χ²=0.257 · df=1 · φ=0.113 · Fisher’s p=0.557
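The statistics reported under the table can be cross-checked in base R from the four cell counts alone. This is a sketch that rebuilds the 2 × 2 table by hand rather than from `mydata`; the object names are illustrative. The reported χ² of 0.257 appears to correspond to the continuity-corrected statistic (the `chisq.test` default for 2 × 2 tables), while φ appears to be derived from the uncorrected statistic.

```r
# Cell counts taken from the cross-tabulation (columns: treatment, control)
tab <- matrix(
  c(10, 8,    # treatment: yes, no
    14, 18),  # control:   yes, no
  nrow = 2,
  dimnames = list(
    user_modern_contraceptive = c("yes", "no"),
    group = c("treatment", "control")
  )
)

# Chi-squared test with Yates' continuity correction (default for 2 x 2 tables)
chi_corrected <- chisq.test(tab)

# Uncorrected chi-squared statistic, used here to derive phi
chi_plain <- chisq.test(tab, correct = FALSE)
phi <- sqrt(unname(chi_plain$statistic) / sum(tab))

# Fisher's exact test p-value
fisher_p <- fisher.test(tab)$p.value

print(unname(chi_corrected$statistic))
print(phi)
print(fisher_p)
```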

Step 3: Run bootstrap code

# Draw 10,000 bootstrap replicates (rows resampled with replacement from the
# full data set), then summarise each replicate.
output <- mydata %>%
  specify(user_modern_contraceptive ~ group, success = "yes") %>%
  generate(reps = 10000, type = "bootstrap", size = 50) %>%
  group_by(replicate) %>%
  summarise(
    # Group sizes vary across replicates because rows are resampled from the pooled data
    treatment_sample_size = sum(group == "treatment"),
    control_sample_size = sum(group == "control"),
    treatment_yes = sum(group == "treatment" & user_modern_contraceptive == "yes"),
    control_yes = sum(group == "control" & user_modern_contraceptive == "yes"),
    treatment_proportion = treatment_yes / treatment_sample_size,
    control_proportion = control_yes / control_sample_size,
    # Standard error of each group's proportion
    treatment_std_dev = sqrt(treatment_proportion * (1 - treatment_proportion) / treatment_sample_size),
    control_std_dev = sqrt(control_proportion * (1 - control_proportion) / control_sample_size),
    diff_in_props = treatment_proportion - control_proportion,
    # Standard error of the difference in proportions (unpooled, despite the name)
    pooled_std_dev = sqrt(treatment_std_dev^2 + control_std_dev^2),
    .groups = 'drop'
  ) %>%
  mutate(
    # Normal-approximation (Wald) 95% confidence interval for the difference
    lower_ci = diff_in_props - 1.96 * pooled_std_dev,
    upper_ci = diff_in_props + 1.96 * pooled_std_dev,
    # Note: h receives the standardized difference (difference / standard error),
    # not Cohen's h, and pwr.p.test is the one-sample proportion test
    power = pwr.p.test(h = diff_in_props / pooled_std_dev, 
                       n = 50, 
                       sig.level = 0.05,
                       alternative = "two.sided")$power
  )

output_display <- output %>%
  select(replicate, treatment_sample_size, control_sample_size,
         treatment_yes, control_yes,
         treatment_proportion, control_proportion,
         diff_in_props, lower_ci, upper_ci, power) %>%
  head(100) %>%
  kable(caption = "Summary of Bootstrap Analysis Results",
        col.names = c("Replicate", "Treatment Sample Size", "Control Sample Size", 
                      "Treatment Successes", "Control Successes", 
                      "Treatment Proportion", "Control Proportion", 
                      "Difference in Proportions",
                      "Lower 95% CI", "Upper 95% CI", "Power"),
        align = 'c', format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE, 
                font_size = 12) %>%
  scroll_box(width = "100%", height = "500px")
output_display
Summary of Bootstrap Analysis Results
Replicate | Treatment Sample Size | Control Sample Size | Treatment Successes | Control Successes | Treatment Proportion | Control Proportion | Difference in Proportions | Lower 95% CI | Upper 95% CI | Power
1 21 29 15 13 0.7142857 0.4482759 0.2660099 0.0012530 0.5307667 1.0000000
2 18 32 9 11 0.5000000 0.3437500 0.1562500 -0.1273644 0.4398644 1.0000000
3 20 30 12 11 0.6000000 0.3666667 0.2333333 -0.0420501 0.5087168 1.0000000
4 10 40 6 21 0.6000000 0.5250000 0.0750000 -0.2658055 0.4158055 0.8621453
5 24 26 13 7 0.5416667 0.2692308 0.2724359 0.0101218 0.5347500 1.0000000
6 18 32 12 7 0.6666667 0.2187500 0.4479167 0.1872569 0.7085764 1.0000000
7 19 31 11 13 0.5789474 0.4193548 0.1595925 -0.1222973 0.4414824 1.0000000
8 17 33 6 11 0.3529412 0.3333333 0.0196078 -0.2587381 0.2979537 0.1643028
9 21 29 17 14 0.8095238 0.4827586 0.3267652 0.0792066 0.5743238 1.0000000
10 21 29 9 11 0.4285714 0.3793103 0.0492611 -0.2263976 0.3249197 0.6973343
11 18 32 11 14 0.6111111 0.4375000 0.1736111 -0.1096984 0.4569206 1.0000000
12 19 31 10 14 0.5263158 0.4516129 0.0747029 -0.2100740 0.3594797 0.9530934
13 19 31 13 12 0.6842105 0.3870968 0.2971138 0.0267670 0.5674605 1.0000000
14 20 30 12 14 0.6000000 0.4666667 0.1333333 -0.1458982 0.4125649 0.9999984
15 24 26 13 13 0.5416667 0.5000000 0.0416667 -0.2352399 0.3185732 0.5499507
16 18 32 9 13 0.5000000 0.4062500 0.0937500 -0.1931523 0.3806523 0.9948972
17 25 25 18 6 0.7200000 0.2400000 0.4800000 0.2370865 0.7229135 1.0000000
18 16 34 11 16 0.6875000 0.4705882 0.2169118 -0.0654591 0.4992826 1.0000000
19 23 27 14 15 0.6086957 0.5555556 0.0531401 -0.2205648 0.3268450 0.7675608
20 25 25 16 8 0.6400000 0.3200000 0.3200000 0.0576234 0.5823766 1.0000000
21 17 33 9 14 0.5294118 0.4242424 0.1051693 -0.1859209 0.3962596 0.9988456
22 17 33 8 17 0.4705882 0.5151515 -0.0445633 -0.3367532 0.2476266 0.5611327
23 16 34 8 17 0.5000000 0.5000000 0.0000000 -0.2971061 0.2971061 0.0500000
24 14 36 8 14 0.5714286 0.3888889 0.1825397 -0.1216977 0.4867770 1.0000000
25 22 28 16 13 0.7272727 0.4642857 0.2629870 0.0007657 0.5252083 1.0000000
26 23 27 12 12 0.5217391 0.4444444 0.0772947 -0.1998494 0.3544387 0.9716329
27 12 38 8 10 0.6666667 0.2631579 0.4035088 0.1022720 0.7047456 1.0000000
28 19 31 13 8 0.6842105 0.2580645 0.4261460 0.1665049 0.6857871 1.0000000
29 22 28 15 15 0.6818182 0.5357143 0.1461039 -0.1222374 0.4144452 1.0000000
30 16 34 6 14 0.3750000 0.4117647 -0.0367647 -0.3259720 0.2524425 0.4215674
31 23 27 15 12 0.6521739 0.4444444 0.2077295 -0.0624926 0.4779515 1.0000000
32 13 37 4 13 0.3076923 0.3513514 -0.0436590 -0.3379564 0.2506383 0.5382946
33 21 29 15 13 0.7142857 0.4482759 0.2660099 0.0012530 0.5307667 1.0000000
34 15 35 7 18 0.4666667 0.5142857 -0.0476190 -0.3495457 0.2543076 0.5893718
35 17 33 9 15 0.5294118 0.4545455 0.0748663 -0.2169575 0.3666901 0.9447094
36 22 28 9 13 0.4090909 0.4642857 -0.0551948 -0.3314853 0.2210957 0.7906630
37 18 32 9 14 0.5000000 0.4375000 0.0625000 -0.2254221 0.3504221 0.8527983
38 16 34 9 13 0.5625000 0.3823529 0.1801471 -0.1127186 0.4730128 1.0000000
39 19 31 9 16 0.4736842 0.5161290 -0.0424448 -0.3276741 0.2427845 0.5408211
40 19 31 14 11 0.7368421 0.3548387 0.3820034 0.1220513 0.6419555 1.0000000
41 14 36 6 20 0.4285714 0.5555556 -0.1269841 -0.4328410 0.1788727 0.9999259
42 24 26 14 12 0.5833333 0.4615385 0.1217949 -0.1532051 0.3967948 0.9999853
43 16 34 10 18 0.6250000 0.5294118 0.0955882 -0.1949677 0.3861442 0.9953323
44 16 34 10 15 0.6250000 0.4411765 0.1838235 -0.1062274 0.4738744 1.0000000
45 14 36 6 23 0.4285714 0.6388889 -0.2103175 -0.5133345 0.0926995 1.0000000
46 17 33 12 17 0.7058824 0.5151515 0.1907308 -0.0849353 0.4663970 1.0000000
47 15 35 8 14 0.5333333 0.4000000 0.1333333 -0.1668075 0.4334741 0.9999865
48 21 29 10 14 0.4761905 0.4827586 -0.0065681 -0.2871167 0.2739804 0.0621459
49 21 29 9 18 0.4285714 0.6206897 -0.1921182 -0.4677769 0.0835404 1.0000000
50 22 28 13 14 0.5909091 0.5000000 0.0909091 -0.1856979 0.3675161 0.9952705
51 18 32 13 18 0.7222222 0.5625000 0.1597222 -0.1092754 0.4287199 1.0000000
52 16 34 7 17 0.4375000 0.5000000 -0.0625000 -0.3580235 0.2330235 0.8342575
53 19 31 11 9 0.5789474 0.2903226 0.2886248 0.0150930 0.5621565 1.0000000
54 16 34 12 13 0.7500000 0.3823529 0.3676471 0.0998748 0.6354193 1.0000000
55 22 28 12 13 0.5454545 0.4642857 0.0811688 -0.1970734 0.3594111 0.9813775
56 18 32 11 18 0.6111111 0.5625000 0.0486111 -0.2346984 0.3319206 0.6620541
57 24 26 10 10 0.4166667 0.3846154 0.0320513 -0.2397509 0.3038534 0.3725045
58 17 33 11 13 0.6470588 0.3939394 0.2531194 -0.0286617 0.5349006 1.0000000
59 19 31 13 15 0.6842105 0.4838710 0.2003396 -0.0728541 0.4735332 1.0000000
60 21 29 8 11 0.3809524 0.3793103 0.0016420 -0.2709904 0.2742745 0.0507985
61 18 32 9 16 0.5000000 0.5000000 0.0000000 -0.2887353 0.2887353 0.0500000
62 16 34 8 12 0.5000000 0.3529412 0.1470588 -0.1459063 0.4400239 0.9999997
63 17 33 5 17 0.2941176 0.5151515 -0.2210339 -0.4967000 0.0546323 1.0000000
64 19 31 12 11 0.6315789 0.3548387 0.2767402 0.0021201 0.5513604 1.0000000
65 17 33 9 15 0.5294118 0.4545455 0.0748663 -0.2169575 0.3666901 0.9447094
66 14 36 11 21 0.7857143 0.5833333 0.2023810 -0.0662019 0.4709638 1.0000000
67 17 33 10 13 0.5882353 0.3939394 0.1942959 -0.0929815 0.4815733 1.0000000
68 18 32 11 13 0.6111111 0.4062500 0.2048611 -0.0774120 0.4871342 1.0000000
69 18 32 10 12 0.5555556 0.3750000 0.1805556 -0.1037569 0.4648680 1.0000000
70 21 29 12 11 0.5714286 0.3793103 0.1921182 -0.0835404 0.4677769 1.0000000
71 13 37 7 19 0.5384615 0.5135135 0.0249480 -0.2902941 0.3401901 0.1951462
72 21 29 13 16 0.6190476 0.5517241 0.0673235 -0.2081826 0.3428295 0.9231718
73 22 28 13 10 0.5909091 0.3571429 0.2337662 -0.0377322 0.5052647 1.0000000
74 19 31 11 17 0.5789474 0.5483871 0.0305603 -0.2522430 0.3133635 0.3222049
75 24 26 12 11 0.5000000 0.4230769 0.0769231 -0.1989045 0.3527506 0.9716184
76 22 28 14 14 0.6363636 0.5000000 0.1363636 -0.1369631 0.4096904 0.9999996
77 25 25 13 11 0.5200000 0.4400000 0.0800000 -0.1960749 0.3560749 0.9801151
78 14 36 6 18 0.4285714 0.5000000 -0.0714286 -0.3778234 0.2349662 0.8981351
79 21 29 9 13 0.4285714 0.4482759 -0.0197044 -0.2982055 0.2587966 0.1653307
80 20 30 11 10 0.5500000 0.3333333 0.2166667 -0.0590072 0.4923406 1.0000000
81 15 35 7 17 0.4666667 0.4857143 -0.0190476 -0.3209743 0.2828790 0.1411190
82 16 34 7 19 0.4375000 0.5588235 -0.1213235 -0.4161849 0.1735378 0.9999089
83 17 33 10 11 0.5882353 0.3333333 0.2549020 -0.0290067 0.5388107 1.0000000
84 14 36 8 19 0.5714286 0.5277778 0.0436508 -0.2626096 0.3499112 0.5061764
85 11 39 5 17 0.4545455 0.4358974 0.0186480 -0.3142310 0.3515270 0.1213996
86 17 33 9 19 0.5294118 0.5757576 -0.0463458 -0.3374361 0.2447445 0.5974210
87 19 31 12 11 0.6315789 0.3548387 0.2767402 0.0021201 0.5513604 1.0000000
88 14 36 9 18 0.6428571 0.5000000 0.1428571 -0.1566053 0.4423196 0.9999984
89 25 25 9 12 0.3600000 0.4800000 -0.1200000 -0.3915856 0.1515856 0.9999843
90 14 36 10 16 0.7142857 0.4444444 0.2698413 -0.0171229 0.5568054 1.0000000
91 16 34 8 19 0.5000000 0.5588235 -0.0588235 -0.3552710 0.2376239 0.7852696
92 21 29 16 9 0.7619048 0.3103448 0.4515599 0.2034921 0.6996277 1.0000000
93 25 25 13 9 0.5200000 0.3600000 0.1600000 -0.1115856 0.4315856 1.0000000
94 13 37 7 19 0.5384615 0.5135135 0.0249480 -0.2902941 0.3401901 0.1951462
95 17 33 9 15 0.5294118 0.4545455 0.0748663 -0.2169575 0.3666901 0.9447094
96 21 29 12 12 0.5714286 0.4137931 0.1576355 -0.1197321 0.4350031 1.0000000
97 16 34 9 16 0.5625000 0.4705882 0.0919118 -0.2034464 0.3872699 0.9906856
98 14 36 6 12 0.4285714 0.3333333 0.0952381 -0.2062807 0.3967569 0.9921894
99 20 30 10 14 0.5000000 0.4666667 0.0333333 -0.2493167 0.3159834 0.3725573
100 14 36 13 17 0.9285714 0.4722222 0.4563492 0.2446999 0.6679985 1.0000000
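The power column above comes from passing the standardized difference to `pwr.p.test` with `n = 50`. As a point of comparison, the sketch below computes Cohen's h for two proportions and the two-sided power under the normal approximation with unequal group sizes, using base R only. The helper names `cohens_h` and `power_two_props` are hypothetical and written for this illustration; the `pwr` package also provides `pwr.2p2n.test` for two proportions with unequal sample sizes, which should give a comparable figure.

```r
# Observed counts from the Step 2 cross-tabulation
n_treat <- 18; yes_treat <- 10
n_ctrl  <- 32; yes_ctrl  <- 14

p_treat <- yes_treat / n_treat
p_ctrl  <- yes_ctrl / n_ctrl

# Cohen's h: difference of arcsine-transformed proportions
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Two-sided power under the normal approximation for unequal group sizes
# (hypothetical helper for this sketch, not a function from the pwr package)
power_two_props <- function(h, n1, n2, sig_level = 0.05) {
  z_crit <- qnorm(1 - sig_level / 2)
  ncp <- h * sqrt(n1 * n2 / (n1 + n2))
  pnorm(ncp - z_crit) + pnorm(-ncp - z_crit)
}

h_obs <- cohens_h(p_treat, p_ctrl)
power_obs <- power_two_props(h_obs, n_treat, n_ctrl)

print(c(h = h_obs, power = power_obs))
```

With an effect size of zero the function returns the significance level, which is a quick sanity check on the formula.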

Step 4: Remove underpowered replicates (power less than 80%)

filtered_data <- output %>%
  filter(power >= 0.80)

Step 5: Analysis results

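The analysis suggests that our intervention increased MCPR by about 15 percentage points (mean difference in proportions: 0.154). The averaged 95% confidence interval runs from -0.127 to 0.435 and so includes zero.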

summary_diff_p <- filtered_data %>%
  summarise(mean_diff_p = mean(diff_in_props),
            mean_lower_ci = mean(lower_ci),
            mean_upper_ci = mean(upper_ci))
summary_diff_p
## # A tibble: 1 × 3
##   mean_diff_p mean_lower_ci mean_upper_ci
##         <dbl>         <dbl>         <dbl>
## 1       0.154        -0.127         0.435
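A common alternative way to summarise a bootstrap is the percentile interval, which takes the 2.5% and 97.5% quantiles of the replicate differences directly and uses every replicate. The base R sketch below illustrates it under stated assumptions: the 50 observations are rebuilt from the Step 2 counts rather than read from `mydata`, and the seed, the 2,000 replicates, and the helper `diff_in_props()` are choices made for this example.

```r
# Rebuild the 50 observations from the Step 2 counts
# (treatment: 10 yes / 8 no; control: 14 yes / 18 no)
dat <- data.frame(
  group = rep(c("treatment", "control"), times = c(18, 32)),
  user_modern_contraceptive = c(rep(c("yes", "no"), times = c(10, 8)),
                                rep(c("yes", "no"), times = c(14, 18)))
)

# Difference in the proportion of "yes" (treatment minus control)
diff_in_props <- function(d) {
  is_yes <- d$user_modern_contraceptive == "yes"
  mean(is_yes[d$group == "treatment"]) - mean(is_yes[d$group == "control"])
}

set.seed(2024)  # arbitrary seed, for reproducibility

# Resample rows with replacement and recompute the difference each time
boot_diffs <- replicate(2000, {
  idx <- sample(nrow(dat), replace = TRUE)
  diff_in_props(dat[idx, ])
})

observed_diff <- diff_in_props(dat)
percentile_ci <- quantile(boot_diffs, probs = c(0.025, 0.975), na.rm = TRUE)

print(observed_diff)
print(percentile_ci)
```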