ThinkBayes Euro Coin

The ThinkBayes Euro coin problem, redone with continous inference.

ThinkBayes has a Euro coin problem, estimating whether the Belgian
Euro coin is fair. The analysis done in the book was discrete, with
101 hypotheses for the weight from 0.0 to 1.0.

This code re-does the analysis, using continuous inference with a beta
prior.

Data

The coin flip data is 250 flips, 140 of which are heads.

coins.h <- 140
coins.t <- 110
coins.n <- coins.h + coins.t

Later, we will want a tolerance for what we consider “fair”.

tolerance <- 0.01

Helper Functions

Compute a data frame suitable for plotting a beta distribution.
@param alpha The alpha shape parameter
@param beta The beta shape parameter
@return A data frame summarizing the beta distribution at many
points. It includes the density, cumulative density, and discretized
mass (on 0.001 intervals).

series.beta <- function(alpha, beta) {
    # X axis values
    xs <- seq(0, 1, 0.001)
    # PDF @ x
    dens <- dbeta(xs, alpha, beta)
    # CDF @ x
    cum <- pbeta(xs, alpha, beta)
    # Mass on [x-0.0005, x+0.0005]
    mass <- pbeta(xs + 5e-04, alpha, beta) - pbeta(xs - 5e-04, alpha, beta)
    mass <- mass/sum(mass)
    data.frame(p = xs, dens = dens, cum = cum, mass = mass)
}

Compute the probability mass under a beta at a particular
value +/- tolerance.
@param p The weight (probability of success)
@param tolerance The tolerance
@param alpha Beta distribution shape parameter
@param beta Beta distribution shape parameter
@return The probability that the binomial success probability
is within tolerance of p.

beta.mass <- function(p, tolerance, alpha, beta) {
    pbeta(p + tolerance, alpha, beta) - pbeta(p - tolerance, alpha, beta)
}

Uniform prior

We'll first run the analysis with a uniform prior
(\( Beta(1,1) \)). Therefore, the posterior is
\( Beta(1+h, 1+t) \).

Compute the posterior summary.

post.uniform <- series.beta(1 + coins.h, 1 + coins.t)

Plot the probability density

ggplot(post.uniform) + aes(x = p, y = dens) + geom_line()

plot of chunk unnamed-chunk-5

Plot the cumulative density

ggplot(post.uniform) + aes(x = p, y = cum) + geom_line()

plot of chunk unnamed-chunk-6

Probability that the coin is within our tolerance of being fair

beta.mass(0.5, tolerance, 1 + coins.h, 1 + coins.t)

## [1] 0.04375

95% Central Posterior Interval for coin weight

qbeta(c(0.025, 0.975), 1 + coins.h, 1 + coins.t)

## [1] 0.4980 0.6202

Look, it includes fair! However, not by much.

Frequentist Results

Let's compare with the frequentist analysis.

First, a one-sample T-test on the coin being fair:

t.test((1:coins.n <= coins.h) - 0.5)

## 
##  One Sample t-test
## 
## data:  (1:coins.n <= coins.h) - 0.5 
## t = 1.907, df = 249, p-value = 0.05763
## alternative hypothesis: true mean is not equal to 0 
## 95 percent confidence interval:
##  -0.001956  0.121956 
## sample estimates:
## mean of x 
##      0.06

A proportion test on it being fair:

prop.test(coins.h, coins.n, 0.5)

## 
##  1-sample proportions test with continuity correction
## 
## data:  coins.h out of coins.n, null probability 0.5 
## X-squared = 3.364, df = 1, p-value = 0.06664
## alternative hypothesis: true p is not equal to 0.5 
## 95 percent confidence interval:
##  0.4960 0.6221 
## sample estimates:
##    p 
## 0.56

Informative Prior

The book re-ran the analysis using an informative prior. It used
a triangular prior; those are hard to handle analytically, so we
will use an informative prior of \( Beta(6,6) \), corresponding to
5 prior heads and 5 prior tails.

Plot the prior:

ggplot(series.beta(6, 6)) + aes(x = p, y = dens) + geom_line()

plot of chunk unnamed-chunk-11

So the posterior is \( Beta(6+h, 6+t) \)

post.beta5 <- series.beta(6 + coins.h, 6 + coins.t)

Probability density

ggplot(post.beta5) + aes(x = p, y = dens) + geom_line()

plot of chunk unnamed-chunk-13

Cumulative density

ggplot(post.beta5) + aes(x = p, y = cum) + geom_line()

plot of chunk unnamed-chunk-14

95% CPI for coin weight

qbeta(c(0.025, 0.975), 6 + coins.h, 6 + coins.t)

## [1] 0.4969 0.6168

Merged Analysis

First, let's put the data together.

post.uniform$prior <- "Beta(1,1)"
post.beta5$prior <- "Beta(5,5)"
post.merged <- rbind(post.uniform, post.beta5)

Plot both posterior distributions on the same plot.

ggplot(post.merged) +
  aes(x=p, y=dens, color=prior, linetype=prior) +
  geom_line()

plot of chunk unnamed-chunk-17