Bootstrap standard errors and confidence intervals

Maternal mortality in 2017 by armed conflict

Maternal mortality in 2017 by armed conflict

Summary statistics by armed conflict

data2017 |>
  group_by(armconf1) |>
  summarise(n = n(),
            median.matmor = median(MatMor, na.rm = T))
# A tibble: 2 × 3
  armconf1     n median.matmor
     <int> <int>         <dbl>
1        0   134          37.5
2        1    49         164  
obs.med.diff <- median(data2017[data2017$armconf1 == 1,]$MatMor) -
  median(data2017[data2017$armconf1 == 0,]$MatMor)
obs.med.diff
[1] 126.5

How can we put a confidence interval around the difference in medians?

Use stratified bootstrap

matmor.arm1 <- finaldata |>
  dplyr::filter(year == 2017 & !is.na(MatMor) & armconf1 == 1) |>
  dplyr::select(ISO, MatMor)
matmor.arm0 <- finaldata |>
  dplyr::filter(year == 2017 & !is.na(MatMor) & armconf1 == 0) |>
  dplyr::select(ISO, MatMor)
B <- 1000
med.diff <- rep(NA, B)
for(b in 1:B){
  resamp.arm1 <- matmor.arm1[sample(nrow(matmor.arm1), size = nrow(matmor.arm1), replace = TRUE),]
  resamp.arm0 <- matmor.arm0[sample(nrow(matmor.arm0), size = nrow(matmor.arm0), replace = TRUE),]
  med.diff[b] <- median(resamp.arm1$MatMor) - median(resamp.arm0$MatMor)
}
head(resamp.arm1, 12)
     ISO MatMor
14   COG    378
17   EGY     37
4    AZE     26
4.1  AZE     26
27   LBY     72
40   SSD   1150
37   RWA    248
13   COL     83
32   NER    509
12   TCD   1140
10   CMR    529
27.1 LBY     72

Histogram of the 1000 bootstrap medians

hist(med.diff, main = "Distribution of bootstrap statistic")

Use the boot package

  • Need to write a function for computing the statistic with at least two arguments

    • First argument passed will always be the original data
    • Second argument is a vector of indices
  • With the strata option, the resampling is done within the specified strata

library(boot)

getmeddiff <- function(data, indices) {
  sample_data <- data[indices, ]
  group_meds <- tapply(sample_data$MatMor, sample_data$armconf1, FUN = median)
  meddiff <- group_meds[2] - group_meds[1]
  return(meddiff)
}

bootout <- boot(data2017, statistic = getmeddiff, strata = data2017$armconf1, R = 1000)
bootout

STRATIFIED BOOTSTRAP


Call:
boot(data = data2017, statistic = getmeddiff, R = 1000, strata = data2017$armconf1)


Bootstrap Statistics :
    original  bias    std. error
t1*    126.5   13.18    62.34808

Output from boot function

  • The observed value of the statistic applied to data.
bootout$t0
    1 
126.5 
  • A matrix with sum(R) rows each of which is a bootstrap replicate of the result of calling statistic.
head(bootout$t)
      [,1]
[1,] 109.0
[2,] 121.0
[3,] 106.0
[4,] 110.5
[5,] 216.5
[6,]  97.0
  • Bootstrap standard error estimate
sd(bootout$t)
[1] 62.34808

Bootstrap confidence intervals

There are several different methods to calculate confidence intervals from the bootstrap samples:

  1. Percentile
  2. Basic
  3. Bias-corrected and accelerated (BCa)

Percentile bootstrap CI

  • The percentile method defines the \(100(1−\alpha)\%\) confidence interval for \(\theta\) as \[ [Q_{\alpha/2}, Q_{1-\alpha/2}] \] where \(Q_{c}\) is the \(c\)th percentile of the bootstrap distribution of \(\hat{\theta}\).
quantile(bootout$t, probs = c(0.025, 0.975))
    2.5%    97.5% 
 36.9875 265.0000 

Basic bootstrap CI

  • The basic (or reverse percentile) method defines the \(100(1−\alpha)\%\) confidence interval for \(\theta\) as \[ [2\hat{\theta} - Q_{1-\alpha/2}, 2\hat{\theta} + Q_{\alpha/2}] \]
2 * bootout$t0 - quantile(bootout$t, probs = 0.975)
  1 
-12 
2 * bootout$t0 - quantile(bootout$t, probs = 0.025)
       1 
216.0125 

Bias-corrected and accelerated (BCa) bootstrap CI

  • Percentile method is simple and intuitive but based only on bootstrap resamples and does not adjust for skewness in the bootstrap distribution.

  • Basic method is also simple and intuitive but is not range preserving and is skewed in the wrong direction for skewed bootstrap distributions.

  • BCa interval corrects for bias and skewness in the distribution of bootstrap estimates.

    • The bias-correction parameter, which is related to the proportion of bootstrap estimates that are less than the observed statistic.
    • The acceleration parameter, which is proportional to the skewness of the bootstrap distribution.

Options for confidence interval

boot.ci(boot.out = bootout, conf = 0.95, type = c("basic", "perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = bootout, conf = 0.95, type = c("basic", "perc", 
    "bca"))

Intervals : 
Level      Basic              Percentile            BCa          
95%   (-12.0, 216.5 )   ( 36.5, 265.0 )   ( 37.2, 266.0 )  
Calculations and Intervals on Original Scale

In-class assignment

Compute the 95% bootstrap confidence intervals for the differences in medians of

  • Infant mortality
  • Under-5 mortality
  • Neonatal mortality

by armed conflict.

Interpret the your findings.