library(dplyr)
library(broom)
# note: the argument is evaluated as arithmetic, so this is set.seed(1995)
set.seed(2015 - 08 - 12)
# simulate 250 replicates of 100 fair coin flips, tracking the running count of heads
flips <- expand.grid(flip = 1:100, replicate = 1:250) %>%
  mutate(result = rbinom(n(), 1, .5)) %>%
  group_by(replicate) %>%
  mutate(cumulative = cumsum(result))
# run an exact binomial test after each flip within each replicate
flips <- flips %>%
  group_by(replicate, flip) %>%
  do(tidy(binom.test(.$cumulative, .$flip)))
flips
## Source: local data frame [25,000 x 8]
## Groups: replicate, flip
##
## replicate flip estimate statistic p.value parameter conf.low
## 1 1 1 1.0000000 1 1.0000000 1 0.02500000
## 2 1 2 1.0000000 2 0.5000000 2 0.15811388
## 3 1 3 0.6666667 2 1.0000000 3 0.09429932
## 4 1 4 0.5000000 2 1.0000000 4 0.06758599
## 5 1 5 0.6000000 3 1.0000000 5 0.14663280
## 6 1 6 0.6666667 4 0.6875000 6 0.22277810
## 7 1 7 0.7142857 5 0.4531250 7 0.29042086
## 8 1 8 0.7500000 6 0.2890625 8 0.34914421
## 9 1 9 0.7777778 7 0.1796875 9 0.39990643
## 10 1 10 0.7000000 7 0.3437500 10 0.34754715
## .. ... ... ... ... ... ... ...
## Variables not shown: conf.high (dbl)
The estimates converge on .5, the true probability of heads.
library(ggplot2)
ggplot(flips, aes(flip, estimate, group = replicate)) +
  geom_line(alpha = .1)
But notice that the p-values often dip below .05. (The pattern is amusing, but it arises only because there are finitely many paths a series of coin flips can take. The same dips would, and do, happen in more continuous simulations.)
ggplot(flips, aes(flip, p.value, group = replicate)) +
  geom_line(alpha = .1)
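To make the "finitely many paths" point concrete, here is a small base R sketch that lists every p-value the exact test can produce after a fixed number of flips. The choice of 10 flips and the names `n` and `possible` are illustrative assumptions, not part of the analysis above.

```r
# After n flips the number of heads can only be 0, 1, ..., n,
# so the test can only return one p-value per possible count.
n <- 10
possible <- sapply(0:n, function(x) binom.test(x, n, p = 0.5)$p.value)

# Counts x and n - x are equally extreme under a fair coin, so they share a p-value.
# Rounding guards against floating-point noise when de-duplicating.
distinct_pvalues <- sort(unique(round(possible, 10)))
distinct_pvalues
```

Because every replicate must land on one of these few values at a given flip, the lines in the plot overlap heavily and trace out a regular pattern.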
Fraction of replicates whose p-value at flip 100 lies below .05:
flip_100 <- flips %>%
  filter(flip == 100)
mean(flip_100$p.value < .05)
## [1] 0.036
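The simulated fraction can be checked against the exact false positive rate of the test at 100 flips, which is computable directly because the outcomes are discrete. This is a minimal sketch in base R; the names `n`, `successes`, `p_values`, and `exact_size` are illustrative.

```r
# Exact size of the two-sided exact binomial test after n = 100 fair flips
n <- 100
successes <- 0:n

# p-value the test would report for every possible number of heads
p_values <- sapply(successes, function(x) binom.test(x, n, p = 0.5)$p.value)

# probability, under a fair coin, of landing on a count with p < .05
exact_size <- sum(dbinom(successes[p_values < .05], n, 0.5))
exact_size
```

Since the p-value can only take a limited set of values, the exact rate sits somewhat below the nominal .05 rather than exactly on it, which is consistent with the simulated fraction above.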
Fraction of replicates whose p-value falls below .05 at any point in the experiment:
min_pval <- flips %>%
  group_by(replicate) %>%
  summarize(min_pvalue = min(p.value))
mean(min_pval$min_pvalue < .05)
## [1] 0.224
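To see how the chance of a spurious "significant" result builds up with the number of looks, here is a self-contained base R sketch of the same experiment. It uses its own arbitrary seed and illustrative names (`heads`, `pvals`, `checkpoints`, `crossed_by`), so its numbers will not match the output above exactly.

```r
set.seed(1)  # arbitrary seed, not the one used above
n_flips <- 100
n_reps <- 250

# rows are flips, columns are replicates; each column holds a running count of heads
results <- matrix(rbinom(n_flips * n_reps, 1, 0.5), nrow = n_flips)
heads <- apply(results, 2, cumsum)

# exact binomial p-value after every flip of every replicate
p_at <- function(x, n) binom.test(x, n, p = 0.5)$p.value
pvals <- matrix(mapply(p_at, heads, row(heads)), nrow = n_flips)

# fraction of replicates that have dipped below .05 at least once by flip k
checkpoints <- c(10, 25, 50, 100)
crossed_by <- sapply(checkpoints, function(k) {
  mean(apply(pvals[1:k, , drop = FALSE] < .05, 2, any))
})
names(crossed_by) <- checkpoints
crossed_by
```

The fraction can only grow as more flips are checked, since a replicate that has crossed the threshold once stays counted. That is why testing after every flip inflates the false positive rate well beyond what a single test at flip 100 gives.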