title : Resampled inference subtitle : Statistical Inference author : Brian Caffo, Jeff Leek, Roger Peng job : Johns Hopkins Bloomberg School of Public Health logo : bloomberg_shield.png framework : io2012 # {io2012, html5slides, shower, dzslides, …} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft}


The jackknife


The jackknife


The jackknife


Continued


Example

We want to estimate the bias and standard error of the median

library(UsingR)
data(father.son)
x <- father.son$sheight
n <- length(x)
theta <- median(x)
jk <- sapply(1 : n,
             function(i) median(x[-i])
             )
thetaBar <- mean(jk)
biasEst <- (n - 1) * (thetaBar - theta) 
seEst <- sqrt((n - 1) * mean((jk - thetaBar)^2))

Example

c(biasEst, seEst)
[1] 0.0000 0.1014
library(bootstrap)
temp <- jackknife(x, median)
c(temp$jack.bias, temp$jack.se)
[1] 0.0000 0.1014

Example


Pseudo observations


The bootstrap


The bootstrap principle


The bootstrap in practice


Nonparametric bootstrap algorithm example


Example code

B <- 1000
resamples <- matrix(sample(x,
                           n * B,
                           replace = TRUE),
                    B, n)
medians <- apply(resamples, 1, median)
sd(medians)
[1] 0.08592
quantile(medians, c(.025, .975))
 2.5% 97.5% 
68.43 68.82 

Histogram of bootstrap resamples

hist(medians)
plot of chunk unnamed-chunk-4

Notes on the bootstrap


Group comparisons

data(InsectSprays)
boxplot(count ~ spray, data = InsectSprays)
plot of chunk unnamed-chunk-5

Permutation tests


Variations on permutation testing

Data type Statistic Test name
Ranks rank sum rank sum test
Binary hypergeometric prob Fisher's exact test
Raw data ordinary permutation test

Permutation test for pesticide data

subdata <- InsectSprays[InsectSprays$spray %in% c("B", "C"),]
y <- subdata$count
group <- as.character(subdata$spray)
testStat <- function(w, g) mean(w[g == "B"]) - mean(w[g == "C"])
observedStat <- testStat(y, group)
permutations <- sapply(1 : 10000, function(i) testStat(y, sample(group)))
observedStat
[1] 13.25
mean(permutations > observedStat)
[1] 0

Histogram of permutations

plot of chunk unnamed-chunk-7