Outline

- Introductory problems   
- Definition of "bootstrap"   
- Applications of bootstrap method   
- Practice session   

Unconventional problems

- Finding the 95% CI of a median
- Finding the 95% CI of a ratio X/Y
- Testing for a difference between 2 groups when the data are messy or not normally distributed
- etc.

An example: Genetics of IQ

  • 27 pairs of monozygotic twins: one twin raised by the genetic parents, the other by foster parents
  • Question: is there a difference in IQ between the two groups?
genetic = c(82, 90, 91, 115, 115, 129, 131, 78, 79, 82, 97, 100, 107, 68, 73, 81, 85, 87, 87, 93, 94, 95, 97, 97, 103, 106, 111)
foster = c(82, 80, 88, 108, 116, 117, 132, 71, 75, 93, 95, 88, 111, 63, 77, 86, 83, 93, 97, 87, 94, 96, 112, 113, 106, 107, 98)
  • Solution: t-test or bootstrap?
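
For reference, the t-test route can be sketched as follows. This is a minimal sketch that assumes the two vectors are aligned pair by pair (position i in each vector is the same twin pair), so a paired test is used; the names `d` and `tt` are introduced here for illustration only.

```r
# IQ scores copied from the slide
genetic <- c(82, 90, 91, 115, 115, 129, 131, 78, 79, 82, 97, 100, 107,
             68, 73, 81, 85, 87, 87, 93, 94, 95, 97, 97, 103, 106, 111)
foster  <- c(82, 80, 88, 108, 116, 117, 132, 71, 75, 93, 95, 88, 111,
             63, 77, 86, 83, 93, 97, 87, 94, 96, 112, 113, 106, 107, 98)

# Within-pair differences (assumes the vectors are aligned by twin pair)
d <- genetic - foster
mean(d)

# Paired t-test: relies on the differences being roughly normal
tt <- t.test(genetic, foster, paired = TRUE)
tt$conf.int   # 95% CI for the mean within-pair difference
tt$p.value
```

The t-test leans on a normality assumption for the differences; the bootstrap is an alternative when that assumption is in doubt.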

Bootstrap method

  • Bootstrap method = a method of inference
  • A resampling technique for generating confidence intervals for a parameter
  • Basic idea: generate a large number of artificial datasets and assess the variability of a statistic from its variability across those artificial datasets.

Statistical Inference

‘Traditional’ method
- Draw random sample from a population
- Calculate statistic of interest
- Use probability distribution to infer the population parameter

Bootstrap method
- Draw random sample from a population
- Repeatedly sample from the sample data (bootstrap samples)
- Calculate statistic of interest
- Examine the distribution of the statistic

Sample and Population

Parameter: \(\mu, \sigma^2, \pi\)
Statistic: \(\bar{x}, s^2, p\)

Statistical inference: using sample statistics to make inferences about population parameters
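
As a small illustration, the three sample statistics can be computed in R. This sketch uses the genetic-parent IQ scores from the twin example as the sample; the cut-off of 100 used for the proportion is an arbitrary choice made here for illustration.

```r
# Sample: IQ scores of the twins raised by their genetic parents
genetic <- c(82, 90, 91, 115, 115, 129, 131, 78, 79, 82, 97, 100, 107,
             68, 73, 81, 85, 87, 87, 93, 94, 95, 97, 97, 103, 106, 111)

x_bar <- mean(genetic)         # sample mean, estimates mu
s2    <- var(genetic)          # sample variance (n - 1 denominator), estimates sigma^2
p     <- mean(genetic > 100)   # sample proportion above 100 (arbitrary cut-off), estimates pi

c(mean = x_bar, variance = s2, proportion = p)
```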

Bootstrap Idea

- Introduced by Bradley Efron (Efron 1979)
- Considered a “revolution” in statistical science
- Bootstraps: “to improve your position and get out of a difficult situation by your own efforts, without help from other people” (Longman Dictionary)

Sampling with Replacement

  • Sampling without replacement: once a unit has been selected it is not returned to the population, so each unit can appear in the sample at most once.
  • Sampling with replacement: each selected unit is returned to the population before the next unit is drawn at random, so the same unit can appear in the sample more than once.
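
The two schemes can be compared with base R's `sample()`. This is a small sketch; the vector `x` is a made-up population of five units and the seed is arbitrary.

```r
set.seed(1)                     # arbitrary seed, only for reproducibility
x <- c(10, 20, 30, 40, 50)      # hypothetical population of 5 units

# Without replacement: the result is a permutation of x
without <- sample(x, size = 5, replace = FALSE)

# With replacement: the same unit may be drawn several times
with_repl <- sample(x, size = 5, replace = TRUE)

anyDuplicated(without)   # 0, because no unit can repeat
table(with_repl)         # counts above 1 mark units drawn more than once
```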

Bootstrap Algorithm

  • Step 1: start with the original sample: (\(x_1, x_2, x_3, ..., x_n\))
  • Step 2: draw a random sample of size \(n\) with replacement, e.g. (\(x_1, x_1, x_3, x_4, ...\)), and compute the statistic of interest (call it \(t\))
  • Step 3: repeat Step 2 B times \(\rightarrow\) obtain B values of \(t\)
    (\(x_1, x_1, x_3, x_4, ...\)) \(\rightarrow\) \(t_1\)
    (\(x_2, x_3, x_3, x_5, ...\)) \(\rightarrow\) \(t_2\)
    (\(x_1, x_2, x_4, x_4, ...\)) \(\rightarrow\) \(t_3\)
    \(\vdots\)
    (\(x_2, x_2, x_3, x_5, ...\)) \(\rightarrow\) \(t_B\)
  • Step 4: collect \(t_1, t_2, ..., t_B\)
  • Step 5: determine the distribution of the B values and the 95% CI of \(t\)
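
The steps above can be sketched in base R. This is a minimal sketch, not a definitive implementation: it takes the median of the within-pair IQ differences as the statistic, assumes the two vectors are aligned by twin pair, picks B = 2000 arbitrarily, and uses the percentile method (one common choice among several) for the interval.

```r
set.seed(123)                   # arbitrary seed, only for reproducibility

genetic <- c(82, 90, 91, 115, 115, 129, 131, 78, 79, 82, 97, 100, 107,
             68, 73, 81, 85, 87, 87, 93, 94, 95, 97, 97, 103, 106, 111)
foster  <- c(82, 80, 88, 108, 116, 117, 132, 71, 75, 93, 95, 88, 111,
             63, 77, 86, 83, 93, 97, 87, 94, 96, 112, 113, 106, 107, 98)

d <- genetic - foster           # original sample: within-pair differences
B <- 2000                       # number of bootstrap samples (assumed value)

# Resample with replacement B times; compute the statistic on each resample
t_boot <- replicate(B, {
  d_star <- sample(d, size = length(d), replace = TRUE)
  median(d_star)
})

# Summarize the B values: distribution and percentile 95% CI
summary(t_boot)
ci <- quantile(t_boot, probs = c(0.025, 0.975))
ci
```

Swapping `median` for another function (for example `mean`, or a ratio of two means) changes the statistic without changing the rest of the procedure.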

Implementation

  • Using the R packages ‘boot’ and ‘simpleboot’
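
A possible use of ‘boot’ on the twin data is sketched below. It assumes the package is installed; the data frame `twins`, the helper `mean_diff`, the seed, and R = 2000 are choices made here for illustration. `boot()` expects a statistic function that takes the data and a vector of resampled indices.

```r
library(boot)
set.seed(123)                   # arbitrary seed, only for reproducibility

twins <- data.frame(
  genetic = c(82, 90, 91, 115, 115, 129, 131, 78, 79, 82, 97, 100, 107,
              68, 73, 81, 85, 87, 87, 93, 94, 95, 97, 97, 103, 106, 111),
  foster  = c(82, 80, 88, 108, 116, 117, 132, 71, 75, 93, 95, 88, 111,
              63, 77, 86, 83, 93, 97, 87, 94, 96, 112, 113, 106, 107, 98)
)

# Statistic: mean within-pair difference on the resampled rows (hypothetical helper)
mean_diff <- function(data, idx) {
  mean(data$genetic[idx] - data$foster[idx])
}

b  <- boot(data = twins, statistic = mean_diff, R = 2000)
ci <- boot.ci(b, type = "perc")   # percentile interval, 95% by default

b$t0              # statistic on the original sample
ci$percent[4:5]   # lower and upper limits of the percentile CI
```

Resampling whole rows keeps each twin pair together, which matches the paired design of the data.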

References

Efron, B. 1979. “Bootstrap Methods: Another Look at the Jackknife.” Ann. Statist. 7 (1): 1–26. doi:10.1214/aos/1176344552.