The ThinkBayes Euro coin problem, redone with continous inference.
ThinkBayes has a Euro coin problem, estimating whether the Belgian
Euro coin is fair. The analysis done in the book was discrete, with
101 hypotheses for the weight from 0.0 to 1.0.
This code re-does the analysis, using continuous inference with a beta
prior.
The coin flip data is 250 flips, 140 of which are heads.
coins.h <- 140
coins.t <- 110
coins.n <- coins.h + coins.t
Later, we will want a tolerance for what we consider “fair”.
tolerance <- 0.01
Compute a data frame suitable for plotting a beta distribution.
@param alpha The alpha shape parameter
@param beta The beta shape parameter
@return A data frame summarizing the beta distribution at many
points. It includes the density, cumulative density, and discretized
mass (on 0.001 intervals).
series.beta <- function(alpha, beta) {
# X axis values
xs <- seq(0, 1, 0.001)
# PDF @ x
dens <- dbeta(xs, alpha, beta)
# CDF @ x
cum <- pbeta(xs, alpha, beta)
# Mass on [x-0.0005, x+0.0005]
mass <- pbeta(xs + 5e-04, alpha, beta) - pbeta(xs - 5e-04, alpha, beta)
mass <- mass/sum(mass)
data.frame(p = xs, dens = dens, cum = cum, mass = mass)
}
Compute the probability mass under a beta at a particular
value +/- tolerance.
@param p The weight (probability of success)
@param tolerance The tolerance
@param alpha Beta distribution shape parameter
@param beta Beta distribution shape parameter
@return The probability that the binomial success probability
is within tolerance
of p
.
beta.mass <- function(p, tolerance, alpha, beta) {
pbeta(p + tolerance, alpha, beta) - pbeta(p - tolerance, alpha, beta)
}
We'll first run the analysis with a uniform prior
(\( Beta(1,1) \)). Therefore, the posterior is
\( Beta(1+h, 1+t) \).
Compute the posterior summary.
post.uniform <- series.beta(1 + coins.h, 1 + coins.t)
Plot the probability density
ggplot(post.uniform) + aes(x = p, y = dens) + geom_line()
Plot the cumulative density
ggplot(post.uniform) + aes(x = p, y = cum) + geom_line()
Probability that the coin is within our tolerance of being fair
beta.mass(0.5, tolerance, 1 + coins.h, 1 + coins.t)
## [1] 0.04375
95% Central Posterior Interval for coin weight
qbeta(c(0.025, 0.975), 1 + coins.h, 1 + coins.t)
## [1] 0.4980 0.6202
Look, it includes fair! However, not by much.
Let's compare with the frequentist analysis.
First, a one-sample T-test on the coin being fair:
t.test((1:coins.n <= coins.h) - 0.5)
##
## One Sample t-test
##
## data: (1:coins.n <= coins.h) - 0.5
## t = 1.907, df = 249, p-value = 0.05763
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.001956 0.121956
## sample estimates:
## mean of x
## 0.06
A proportion test on it being fair:
prop.test(coins.h, coins.n, 0.5)
##
## 1-sample proportions test with continuity correction
##
## data: coins.h out of coins.n, null probability 0.5
## X-squared = 3.364, df = 1, p-value = 0.06664
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.4960 0.6221
## sample estimates:
## p
## 0.56
The book re-ran the analysis using an informative prior. It used
a triangular prior; those are hard to handle analytically, so we
will use an informative prior of \( Beta(6,6) \), corresponding to
5 prior heads and 5 prior tails.
Plot the prior:
ggplot(series.beta(6, 6)) + aes(x = p, y = dens) + geom_line()
So the posterior is \( Beta(6+h, 6+t) \)
post.beta5 <- series.beta(6 + coins.h, 6 + coins.t)
Probability density
ggplot(post.beta5) + aes(x = p, y = dens) + geom_line()
Cumulative density
ggplot(post.beta5) + aes(x = p, y = cum) + geom_line()
95% CPI for coin weight
qbeta(c(0.025, 0.975), 6 + coins.h, 6 + coins.t)
## [1] 0.4969 0.6168
First, let's put the data together.
post.uniform$prior <- "Beta(1,1)"
post.beta5$prior <- "Beta(5,5)"
post.merged <- rbind(post.uniform, post.beta5)
Plot both posterior distributions on the same plot.
ggplot(post.merged) +
aes(x=p, y=dens, color=prior, linetype=prior) +
geom_line()