Professor James Scott walk-through
A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. Permutation tests exist for any test statistic, regardless of whether or not its distribution is known.
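Before we turn to real data, here is a minimal generic sketch of this recipe in base R. The data below are made up purely for illustration, and rather than enumerating all possible rearrangements, we approximate the permutation distribution with 1000 random shuffles:
#generic permutation test: difference in means between two groups
set.seed(1)
x = c(rnorm(20, mean = 0), rnorm(20, mean = 1)) #made-up observations
g = rep(c("A", "B"), each = 20) #group labels
observed = mean(x[g == "B"]) - mean(x[g == "A"])
null_dist = replicate(1000, {
  g_shuffled = sample(g) #rearrange the labels
  mean(x[g_shuffled == "B"]) - mean(x[g_shuffled == "A"])
})
#approximate two-sided p-value
mean(abs(null_dist) >= abs(observed))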
titanic = read.csv("TitanicSurvival.csv")
head(titanic)
## X survived sex age passengerClass
## 1 Allen, Miss. Elisabeth Walton yes female 29.0000 1st
## 2 Allison, Master. Hudson Trevor yes male 0.9167 1st
## 3 Allison, Miss. Helen Loraine no female 2.0000 1st
## 4 Allison, Mr. Hudson Joshua Crei no male 30.0000 1st
## 5 Allison, Mrs. Hudson J C (Bessi no female 25.0000 1st
## 6 Anderson, Mr. Harry yes male 48.0000 1st
#contingency table: survival status stratified by sex
t1 = xtabs(~sex + survived, data = titanic)
prop.table(t1, margin=1)
## survived
## sex no yes
## female 0.2725322 0.7274678
## male 0.8090154 0.1909846
Based on this table, it seems that male passengers had a far higher chance of dying than female passengers. A natural test statistic to quantify the association between the rows and columns of this table is the relative risk of dying: p(men dying)/p(women dying).
#save table output in variable t1
t1 = xtabs(~sex + survived, data = titanic)
#convert to proportions
p1 = prop.table(t1, margin = 1)
#calculate relative risk of dying
#extract the probability of dying for each sex from the proportions table
risk_female = p1[1, 1]
risk_male = p1[2, 1]
relative_risk = risk_male/risk_female
relative_risk
## [1] 2.968513
The relative risk says that male passengers were about three times as likely to die as female passengers.
A natural follow-up question: Could this 3:1 relative risk be due to chance?
One way of addressing this is by something called a permutation test, which explicitly breaks any association between the predictor and the response by “shuffling the cards”.
First, we'll break the association between sex and survival status by shuffling the observations of the sex variable. Note: each time you run this, you'll get a different permutation of the sex variable, and therefore a data frame in which the correspondence between sex and survival in the original data set has been broken.
library(mosaic)
attach(titanic)
names(titanic)
## [1] "X" "survived" "sex" "age"
## [5] "passengerClass"
titanic_shuffle = data.frame(shuffle(sex), survived)
head(titanic_shuffle, 10)
## shuffle.sex. survived
## 1 male yes
## 2 female yes
## 3 male no
## 4 male no
## 5 male no
## 6 male yes
## 7 male yes
## 8 female no
## 9 male yes
## 10 male no
The simple trick of shuffling the cards allows us to assess the plausible range of values for the relative risk under the assumption that there is no association between the two variables (the null hypothesis).
t1_shuffle = xtabs(~shuffle(sex) + survived, data = titanic)
#quick way to calculate relative risk using built-in function
relrisk(t1_shuffle)
## [1] 0.9892993
Try executing the code block above a few times. You'll notice that each time you calculate the relative risk, you get something much closer to 1 than the value of about 3 from the actual data.
Let’s use a Monte Carlo simulation to repeat this process 1000 times. This will allow us to make a histogram showing the sampling distribution of the relative risk under the null hypothesis of no association between sex and survival status:
permtest1 = do(1000)*{
  t1_shuffle = xtabs(~shuffle(sex) + survived, data = titanic)
  relrisk(t1_shuffle)
}
head(permtest1)
## RR
## 1 1.0108670
## 2 1.0163535
## 3 1.0274428
## 4 0.9892993
## 5 1.0913492
## 6 1.0108670
hist(permtest1$RR, xlab = "Relative Risk", main = "Sampling Distribution of the Relative Risk", col = "yellow", breaks = 20, xlim = c(0.8, 1.3), cex.lab = 1.5, cex.main = 1.5)
mean(permtest1$RR)
## [1] 1.000227
min(permtest1$RR)
## [1] 0.8898098
max(permtest1$RR)
## [1] 1.160642
When the cards are shuffled, and therefore when any possible connection between sex and survival status is explicitly broken, the relative risk almost never falls outside the interval (0.85, 1.15). This is pretty convincing evidence that the actual value we observed (about 3) is extremely unlikely to have arisen by chance.
Furthermore, the average of the relative risks across all the permutations comes out almost exactly 1, which is just what we would expect if the null hypothesis is true and sex has no effect on whether or not one survived.
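To put a number on this, we can estimate a one-sided p-value: the fraction of shuffled relative risks at least as large as the value of about 2.97 that we actually observed. (This snippet assumes permtest1 and relative_risk from the code above are still in your workspace.)
#estimated one-sided p-value from the permutation distribution
mean(permtest1$RR >= relative_risk)
Given the minimum and maximum reported above, this evaluates to 0 here: none of the 1000 shuffles came anywhere close to 2.97, so the p-value is smaller than 1/1000.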
We now turn from permutation tests to a second resampling technique: the bootstrap.
#data = age, kidney function for a sample of 157 men from a single clinic
library(mosaic)
creatinine = read.csv("creatinine.csv", header = TRUE)
#creatclear: the patient’s creatinine-clearance rate, measured in ml/minute
head(creatinine)
## age creatclear
## 1 31 117.3
## 2 36 124.8
## 3 24 145.8
## 4 35 118.8
## 5 53 103.2
## 6 36 127.0
summary(creatinine)
## age creatclear
## Min. :18.00 Min. : 89.3
## 1st Qu.:25.00 1st Qu.:118.6
## Median :31.00 Median :128.0
## Mean :36.39 Mean :125.3
## 3rd Qu.:43.00 3rd Qu.:133.3
## Max. :88.00 Max. :147.6
The Question: What can we say about the average creatinine-clearance rate for the population of men who attend this clinic, on the basis of this particular sample of 157? The sample mean is easy enough to compute:
attach(creatinine)
## The following object is masked from titanic:
##
## age
mean(creatclear)
## [1] 125.2548
But we know that our sample mean of 125 won’t exactly equal the population mean. To quantify how far off our estimate is likely to be, we would like to know the standard error of the sample mean.
Moreover, we’d like to know this without taking many more samples of 157 from the population and seeing how the sample mean changes from one sample to the next.
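As a point of comparison, for the sample mean specifically there is a classical plug-in formula: the standard error is approximately the sample standard deviation divided by the square root of the sample size. We can compute it now and use it later as a benchmark for the bootstrap answer:
#formula-based (plug-in) estimate of the standard error of the sample mean
sd(creatinine$creatclear) / sqrt(nrow(creatinine))
The appeal of the bootstrap, introduced next, is that it gives this same kind of answer even for statistics where no simple formula exists.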
The idea of bootstrapping is to pretend that your sample represents the whole population. We then take repeated “bootstrapped” samples from the original sample. Each bootstrapped sample is defined by two properties:
1. It is the same size as the original sample (here, 157 observations).
2. It is drawn with replacement, so some original data points appear more than once (“ties”) while others don’t appear at all (“omissions”).
Let’s see this in action. First, let’s create a bootstrapped sample and look at the first 20 data points. We’ll do this using the sample command.
single_bootstrapped_sample = sample(creatinine, size = 157, replace=TRUE)
head(single_bootstrapped_sample, 20)
## age creatclear orig.ids
## 138 23 119.9 138
## 73 65 89.3 73
## 45 26 130.3 45
## 120 25 133.1 120
## 42 23 131.6 42
## 156 21 135.4 156
## 53 32 137.9 53
## 51 54 106.2 51
## 67 33 122.2 67
## 128 68 102.2 128
## 144 45 117.8 144
## 130 32 126.6 130
## 58 18 143.8 58
## 14 55 114.0 14
## 152 70 95.8 152
## 42.1 23 131.6 42
## 36 33 137.9 36
## 3 24 145.8 3
## 28 45 127.8 28
## 30 56 112.3 30
#You can visualize the pattern of ties and omissions with the following plot:
plot(table(single_bootstrapped_sample$orig.ids), xlab = "Original Data Point (orig.ids)", ylab = "Frequency", main = "Pattern of Ties & Omissions", cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
The height of each bar shows how many times that original data point was picked. The gaps show data points that were omitted.
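You can also count the ties and omissions directly. Here is a quick sketch using the orig.ids column that mosaic attaches to the bootstrapped sample:
#how many of the 157 original data points never got picked?
sum(!(1:nrow(creatinine) %in% single_bootstrapped_sample$orig.ids))
#how many points appear once, twice, three times, ...?
table(table(single_bootstrapped_sample$orig.ids))
On average, a bootstrapped sample omits about 37% of the original points, since the chance that any given point is never drawn is (1 - 1/157)^157, which is approximately 1/e.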
We’re now ready to estimate the sampling distribution of the sample mean by bootstrapping. Our basic procedure is:
1. Draw a bootstrapped sample from the original data set (same size, with replacement).
2. Calculate the sample mean of that bootstrapped sample.
We repeat this process a large number of times (say, 1000 or more). The key point is that, because each bootstrapped sample has a unique pattern of ties and omissions, each will have a different sample mean. The histogram of sample means across the bootstrapped samples then gives us an idea of how the sample mean changes from sample to sample.
my_boot = do(1000)*{
  bootstrapped_sample = resample(creatinine)
  mean(bootstrapped_sample$creatclear)
}
hist(my_boot$result, breaks = 20, main = "Sampling Distribution of CreatClear Sample Means", xlab = "Sample Means", ylab = "Frequency", col = "purple", cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
#add vertical lines at mean +/- one standard error (red) and at the mean (pink)
abline(v = (mean(my_boot$result) + sd(my_boot$result)), col = "red", lwd = 8)
abline(v = (mean(my_boot$result) - sd(my_boot$result)), col = "red", lwd = 8)
abline(v = mean(my_boot$result), col = "pink", lwd = 8)
Incidentally, if you repeatedly execute the above code block, you’ll get slightly different histograms and standard errors each time. We refer to this variability as “Monte Carlo error,” to distinguish it from the standard error of the estimator itself. In principle, you can drive the Monte Carlo error to virtually nothing by taking a very large number of bootstrapped samples.
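To see the distinction concretely, here is a sketch that compares two independent runs of 1000 bootstrapped samples with a single run of 10000 (the last line takes noticeably longer):
boot_small_1 = do(1000)*{ mean(resample(creatinine)$creatclear) }
boot_small_2 = do(1000)*{ mean(resample(creatinine)$creatclear) }
boot_large = do(10000)*{ mean(resample(creatinine)$creatclear) }
#the two small runs give slightly different standard errors (Monte Carlo error)
sd(boot_small_1$result)
sd(boot_small_2$result)
#the larger run gives a more stable estimate of the same quantity
sd(boot_large$result)
Two runs of 10000 samples will agree with each other much more closely than two runs of 1000 do, even though the standard error they are all estimating is the same.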