Bootstrap standard errors and confidence intervals

Maternal mortality in 2017 by armed conflict

Summary statistics by armed conflict

data2017 |>
  group_by(armconf1) |>
  summarise(n = n(),
            median.matmor = median(MatMor, na.rm = T))

# A tibble: 2 × 3
  armconf1     n median.matmor
     <int> <int>         <dbl>
1        0   134          37.5
2        1    49         164

obs.med.diff <- median(data2017[data2017$armconf1 == 1,]$MatMor) -
  median(data2017[data2017$armconf1 == 0,]$MatMor)
obs.med.diff

[1] 126.5

How can we put a confidence interval around the difference in medians?

Use stratified bootstrap

matmor.arm1 <- finaldata |>
  dplyr::filter(year == 2017 & !is.na(MatMor) & armconf1 == 1) |>
  dplyr::select(ISO, MatMor)
matmor.arm0 <- finaldata |>
  dplyr::filter(year == 2017 & !is.na(MatMor) & armconf1 == 0) |>
  dplyr::select(ISO, MatMor)
B <- 1000
med.diff <- rep(NA, B)
for(b in 1:B){
  resamp.arm1 <- matmor.arm1[sample(nrow(matmor.arm1), size = nrow(matmor.arm1), replace = TRUE),]
  resamp.arm0 <- matmor.arm0[sample(nrow(matmor.arm0), size = nrow(matmor.arm0), replace = TRUE),]
  med.diff[b] <- median(resamp.arm1$MatMor) - median(resamp.arm0$MatMor)
}
head(resamp.arm1, 12)

     ISO MatMor
14   COG    378
17   EGY     37
4    AZE     26
4.1  AZE     26
27   LBY     72
40   SSD   1150
37   RWA    248
13   COL     83
32   NER    509
12   TCD   1140
10   CMR    529
27.1 LBY     72

Histogram of the 1000 bootstrap medians

hist(med.diff, main = "Distribution of bootstrap statistic")

Use the `boot` package

Need to write a function for computing the statistic with at least two arguments
- First argument passed will always be the original data
- Second argument is a vector of indices
With the strata option, the resampling is done within the specified strata

library(boot)

getmeddiff <- function(data, indices) {
  sample_data <- data[indices, ]
  group_meds <- tapply(sample_data$MatMor, sample_data$armconf1, FUN = median)
  meddiff <- group_meds[2] - group_meds[1]
  return(meddiff)
}

bootout <- boot(data2017, statistic = getmeddiff, strata = data2017$armconf1, R = 1000)
bootout


STRATIFIED BOOTSTRAP


Call:
boot(data = data2017, statistic = getmeddiff, R = 1000, strata = data2017$armconf1)


Bootstrap Statistics :
    original  bias    std. error
t1*    126.5   13.18    62.34808

Output from `boot` function

The observed value of the statistic applied to data.

bootout$t0

    1 
126.5

A matrix with sum(R) rows each of which is a bootstrap replicate of the result of calling statistic.

head(bootout$t)

      [,1]
[1,] 109.0
[2,] 121.0
[3,] 106.0
[4,] 110.5
[5,] 216.5
[6,]  97.0

Bootstrap standard error estimate

sd(bootout$t)

[1] 62.34808

Bootstrap confidence intervals

There are several different methods to calculate confidence intervals from the bootstrap samples:

Percentile
Basic
Bias-corrected and accelerated (BCa)

Percentile bootstrap CI

The percentile method defines the \(100(1−\alpha)\%\) confidence interval for \(\theta\) as \[ [Q_{\alpha/2}, Q_{1-\alpha/2}] \] where \(Q_{c}\) is the \(c\)th percentile of the bootstrap distribution of \(\hat{\theta}\).

quantile(bootout$t, probs = c(0.025, 0.975))

    2.5%    97.5% 
 36.9875 265.0000

Basic bootstrap CI

The basic (or reverse percentile) method defines the \(100(1−\alpha)\%\) confidence interval for \(\theta\) as \[ [2\hat{\theta} - Q_{1-\alpha/2}, 2\hat{\theta} + Q_{\alpha/2}] \]

2 * bootout$t0 - quantile(bootout$t, probs = 0.975)

  1 
-12

2 * bootout$t0 - quantile(bootout$t, probs = 0.025)

       1 
216.0125

Bias-corrected and accelerated (BCa) bootstrap CI

Percentile method is simple and intuitive but based only on bootstrap resamples and does not adjust for skewness in the bootstrap distribution.
Basic method is also simple and intuitive but is not range preserving and is skewed in the wrong direction for skewed bootstrap distributions.
BCa interval corrects for bias and skewness in the distribution of bootstrap estimates.
- The bias-correction parameter, which is related to the proportion of bootstrap estimates that are less than the observed statistic.
- The acceleration parameter, which is proportional to the skewness of the bootstrap distribution.

Options for confidence interval

boot.ci(boot.out = bootout, conf = 0.95, type = c("basic", "perc", "bca"))

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = bootout, conf = 0.95, type = c("basic", "perc", 
    "bca"))

Intervals : 
Level      Basic              Percentile            BCa          
95%   (-12.0, 216.5 )   ( 36.5, 265.0 )   ( 37.2, 266.0 )  
Calculations and Intervals on Original Scale

In-class assignment

Compute the 95% bootstrap confidence intervals for the differences in medians of

Infant mortality
Under-5 mortality
Neonatal mortality

by armed conflict.

Interpret the your findings.

Bootstrap standard errors and confidence intervals

Maternal mortality in 2017 by armed conflict

Maternal mortality in 2017 by armed conflict

Summary statistics by armed conflict

Use stratified bootstrap

Histogram of the 1000 bootstrap medians

Use the boot package

Output from boot function

Bootstrap confidence intervals

Percentile bootstrap CI

Basic bootstrap CI

Bias-corrected and accelerated (BCa) bootstrap CI

Options for confidence interval

In-class assignment

Use the `boot` package

Output from `boot` function