1 General problems
- 1.1 The birthday problem
- 1.2 Quality control
2 Confidence intervals
- 2.1 Confidence interval for proportions
- 2.2 Reliability of an estimate
3 Hypothesis testing
- 3.1 Hypothesis test for difference in means
- 3.2 Hypothesis testing for difference in proportions

1 General problems

1.1 The birthday problem

What is the probability that at least 2 people in a room of 25 share the same birthday?

Approach: generate an urn with 365 numbers. Pull 25 out of it, with replacement. Check for duplicates; a score of >= 1 duplicates counts as 1.

pacman::p_load(data.table, foreach, magrittr, ggplot2, bootES)

reps = 1000
urn = 1:365

reps.found = foreach(1:reps, .combine = c) %do% {
  s = sample(urn, 25, replace=T)
  score = sum(duplicated(s)) >= 1
} %>% sum

p = reps.found/reps

cat("probability = ", p)

## probability =  0.572

1.2 Quality control

In the 1988 NBA series, Larry Bird got only 20 baskets out of 57 shots. Normally he is a 48% shooter. Is he in a slump?

Appproach: generate an urn with 100 numbers. Pull 57 numbers out of it, with replacement; these are the shots. Count the numbers <= 48; this is what he would normally score. Out of this series, count the number of shots that are <= 20.

urn = 1:100
reps = 1000

l = foreach(1:reps, .combine = c) %do% {
  s = sample(urn, 57, replace = T)
  normal.count = sum(s <= 48)
}

low.count = sum(l <= 20)
p = low.count/reps

cat("probability =", p)

## probability = 0.024

If you consider a normal series of 48% success rate in series of 57 shots, you wil find less than 5% cases where the success rate is <= 20%. So this looks indeed like a slump.

2 Confidence intervals

2.1 Confidence interval for proportions

Out of 1500 voters, 840 would vote for Bush and 660 for Dukakis. What is the confidence interval for the Bush voters?

Approach: generate an urn filled with balls 1:100. Sample 1500 out of this urn with replacement. Results <= 56 are Bush voters (840/1500). Calculate percentiles.

urn = 1:100

l = foreach(1:reps, .combine=c) %do% {
  s = sample(urn, 1500, replace=T)
  bush = (840/1500) * 100
  score = sum(s < bush)/1500
}

p = quantile(l, c(0.025, 0.975)) %>% round(3)
cat("95% confidence interval =", p)

## 95% confidence interval = 0.535 0.586

2.2 Reliability of an estimate

An agricultural lab experiments with a new pig ration on 12 pigs. After 4 weeks, the pigs have on average gained 508. The individual weight gains are included below (“urn”).

urn = c(496,544,464,416,512,560,608,544,480,466,512,496)
reps = 1000

l = foreach(1:reps, .combine = c) %do% {
  s = sample(urn, 12, replace = T)
  m = mean(s)
}

p = quantile(l, c(0.025,0.975)) %>% round
cat("95% confidence interval =", p )

## 95% confidence interval = 480 535

This confidence is slightly narrower than conventional; a better estimation can be obtained using the bootES package.

bootES(urn, R=1000, ci.type="bca")

## 
## 95.00% bca Confidence Interval, 1000 replicates
## Stat         CI (Low)     CI (High)    bias        SE          
## 508.167      478.894      536.000      0.914       14.237

3 Hypothesis testing

3.1 Hypothesis test for difference in means

We have 2 brands of batteries and tested how long they lasted. Brand A is somewhat better (delta = 1.3). Is this significant?

Approach: If there is no difference between brand A and brand B, then we can consider them to be part of the same population (= null hypothesis). We can put them all together in one urn and draw repeated samples with replacement out of it.

A = c(30,32,31,28,31,29,29,24,30,31)
B = c(28,28,32,31,24,23,31,27,27,31)
m1 = mean(A)
m2 = mean(B)
delta = m1-m2

urn = c(A,B)

reps = 1000

l = foreach(1:reps, .combine = c) %do% {
 s1 = sample(urn, length(A), replace = T) 
 s2 = sample(urn, length(A), replace = T)
 
 ms1 = mean(s1)
 ms2 = mean(s2)
 deltas = ms1 - ms2
}

prob = length(l[l > delta])/reps
cat("Probability = ", prob)

## Probability =  0.139

The probability of drawing random samples out of the combined population that has a difference larger than 1.3 is larger than 5%. The difference is therefore not significant.

3.2 Hypothesis testing for difference in proportions

An airline wants to know whether the no-show rates for business routes is different from that of vacation routes. It has taken a sample of 1000 for both groups. How significant is the difference?

Approach: if the difference is not significant (null hypothesis) then we can take artificial samples and label them “business” and “vacation” respectively. We test how often no shows occur by comparing the samples against the average no-show. Then, we see how often the difference in the artificial sample is larger than the actual no-show difference between vacation and business.

n = 1000  # size of groups

noshow.business = 384 # actual no-shows from sample
noshow.vacation = 341

delta = noshow.business - noshow.vacation

null.hyp.est = (noshow.business + noshow.vacation) / (2 * n) # best estimate for difference
avg = ceiling(null.hyp.est * 1000)

reps = 1000

l = foreach(1:reps, .combine = c) %do% {
  
  business = sample(1:1000, n, replace = T)
  vacation = sample(1:1000, n, replace = T)
                    
  bnoshow = business %between% c(1,avg) %>% abs %>% sum
  vnoshow = vacation %between% c(1,avg) %>% abs %>% sum
  
  diff = bnoshow - vnoshow
  
}

prob = length(l[l >= delta]) / reps
cat("Probability = ", prob)

## Probability =  0.023

The difference is smaller than 5%, which is significant.

Resampling statistics

henk@carbonmetrics.com

March 20, 2018