Notes from working through Data Analysis: A Bayesian Tutorial

A fair coin

Say we flip a coin 11 times and it comes up heads 7 times. What are the odds that it is fair?

Frequentist

Let’s start with a frequentist perspective.

library(ggplot2)

n <- 11
h <- 0:n
# Binomial probability of each possible number of heads from a fair coin
p <- 0.5^h * 0.5^(n - h) * factorial(n) / (factorial(h) * factorial(n - h))
heads <- 7

ggplot(NULL, aes(h, p)) + 
  geom_point()

# Chance of seeing this many heads or more
sum(p[(heads + 1):(n + 1)])
## [1] 0.2744141

If the coin were fair there would be a ~27% chance of seeing this many heads or more, so a frequentist test gives us no strong reason to reject fairness.
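
As a sanity check (my addition, not from the book), base R's built-in binomial functions give the same tail probability:

# One-sided tail probability via base R
pbinom(6, size=11, prob=0.5, lower.tail=FALSE)
## [1] 0.2744141
binom.test(7, 11, p=0.5, alternative="greater")$p.value
## [1] 0.2744141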

Bayesian

Next let us consider a Bayesian perspective. The difference here is that we consider every possible bias of the coin and compute how likely the observed flips are given that bias. With a uniform prior, Bayes' law says the posterior is proportional to the likelihood: pdf(p) ∝ p^heads * (1 - p)^tails, normalized so it sums to one.

n <- 100
p <- (0:n) / n  # grid of possible biases (probability of heads)

get_pdf <- function(heads, tails) {
  # Likelihood of the observed flips at each grid point, normalized
  # to sum to one: the posterior under a uniform prior
  pdf <- p^heads * (1 - p)^tails
  pdf <- pdf / sum(pdf)
  pdf
}

show <- function(pdf) {
  data <- data.frame(p, pdf)
  ggplot(data, aes(p, pdf)) + 
    geom_path()
}

# No flips
show(get_pdf(heads=0, tails=0))

# One heads
show(get_pdf(heads=1, tails=0))

# Two heads
show(get_pdf(heads=2, tails=0))

# Two heads, one tail
show(get_pdf(heads=2, tails=1))

# Two heads, two tails
show(get_pdf(heads=2, tails=2))

# Seven heads, four tails
show(get_pdf(heads=7, tails=4))

An interesting progression! How can we estimate the chance that the coin is fair? We can integrate under the pdf.

pdf <- get_pdf(heads=7, tails=4)
# Posterior mass below p = 0.5
sum(pdf[1:round(n/2)])
## [1] 0.1842764
# Posterior mass at or above p = 0.5
sum(pdf[(round(n/2) + 1):(n + 1)])
## [1] 0.8157236

There is an ~18% chance that the coin is sub-fair and an ~82% chance that it is more than fair.

Not too dissimilar from the frequentist perspective.
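
As an aside (my note, not the book's): with a uniform prior this grid posterior is just a discretized Beta(heads + 1, tails + 1) distribution, so base R's dbeta reproduces get_pdf exactly:

# get_pdf matches a Beta(heads + 1, tails + 1) density on the same grid
beta_pdf <- dbeta(p, 7 + 1, 4 + 1)
all.equal(get_pdf(heads=7, tails=4), beta_pdf / sum(beta_pdf))
## [1] TRUE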

Bayes with updates

These Bayesian results can also be achieved by updating on one flip at a time using Bayes' law: multiply the current pdf by the likelihood of the single flip, then renormalize.

# With updates
p <- (0:100)/100
update <- function(pdf, event) {
  # Multiply the current pdf by the likelihood of a single flip
  # (event is 1 for heads, 0 for tails), then renormalize
  pdf <- pdf * p^event * (1 - p)^(1 - event)
  pdf <- pdf/sum(pdf)
  pdf
}
heads <- 1  # encode a heads flip as 1
tails <- 0  # and a tails flip as 0

## No flips
pdf <- rep(1, 101)/101  # uniform prior
show(pdf)

## One heads
pdf <- update(pdf, heads)
show(pdf)

## Two heads
pdf <- update(pdf, heads)
show(pdf)

## Two heads, one tail
pdf <- update(pdf, tails)
show(pdf)

## Two heads, two tails
pdf <- update(pdf, tails)
show(pdf)

As expected, the progression is the same as in the first case.
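
We can also verify the agreement numerically rather than by eye (a quick check of my own, reusing get_pdf from above):

# Sequential updates agree with the batch computation
pdf_seq <- rep(1, 101)/101
for (event in c(heads, heads, tails, tails)) {
  pdf_seq <- update(pdf_seq, event)
}
all.equal(pdf_seq, get_pdf(heads=2, tails=2))
## [1] TRUE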

Confidence intervals

Previously we used the pdf to directly estimate the fairness of the coin. But we can also estimate confidence intervals by fitting an exponential quadratic model (the exponential of a quadratic in p) to the peak of the pdf.

It turns out that this funky operation gives us the equivalent of a normal fit to the data: fitting a quadratic to the log of the pdf is the same as fitting a Gaussian to the pdf itself.
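
To make that concrete, here is a small sketch of my own, using the same five-point window around the peak that the show function below uses. Since exp(a + b*p + c*p^2) with c < 0 is an unnormalized Gaussian, the fitted coefficients directly give its mean and standard deviation:

# Recover the Gaussian's parameters from the quadratic coefficients:
# exp(a + b*p + c*p^2) peaks at mu = -b / (2*c), with sd = sqrt(-1 / (2*c))
pdf <- get_pdf(heads=20, tails=20)
peak <- which.max(pdf)
window <- (peak - 2):(peak + 2)
coefs <- coef(lm(log(pdf) ~ p + I(p^2), data.frame(p, pdf)[window,]))
mu <- -coefs[2] / (2 * coefs[3])
sigma <- sqrt(-1 / (2 * coefs[3]))
c(mu, sigma)  # roughly 0.5 and 0.08 for this symmetric case

A 95% confidence interval is then simply mu ± 1.96 * sigma.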

show <- function(pdf) {
  # Fit a plain quadratic and an exponential quadratic (i.e. a Gaussian)
  # to the five grid points around the peak of the pdf
  peak <- which.max(pdf)
  window <- (peak - 2):(peak + 2)
  data <- data.frame(p, pdf)
  data$fit <- predict(lm(pdf ~ p + I(p^2), data[window,]), data)
  data$exp_fit <- exp(predict(lm(log(pdf) ~ p + I(p^2), data[window,]), data))
  
  # Show the pdf (black), its peak (red point), the exponential
  # quadratic fit (red) and the plain quadratic fit (grey)
  ggplot(data, aes(p, pdf)) + 
    geom_path(size=0.5) +
    annotate("point", x=p[peak], y=pdf[peak], color='red', size=4) +
    geom_path(aes(p, exp_fit), color='red') +
    geom_path(aes(p, fit), color='grey') +
    coord_cartesian(ylim=c(-max(pdf), max(pdf)*1.05))
}

show(get_pdf(heads=20, tails=20))

Here we see that the exponential quadratic function (red) does an excellent job of capturing the pdf (black) when there are 20 heads and 20 tails. It would be totally reasonable to estimate confidence intervals based on this approximation.

A regular quadratic fit (grey) emulates the data poorly: it gives probabilities below zero, which is impossible. The exponential function keeps us constrained to positive probabilities.

show(get_pdf(heads=7, tails=4))

We can see that the exponential quadratic fit (red) gets worse as the data (black) becomes more skewed.

show(get_pdf(heads=2, tails=20))

When we have highly skewed data the exponential quadratic fit (red) is OK, but not perfect. If we compute a 95% confidence interval directly from the pdf, its upper bound will land higher than that of the interval based on the symmetric approximation, because the true pdf has a longer right tail.
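
Here is a sketch of my own for checking this numerically (taking "higher value" to mean the 97.5% point of a central 95% interval):

pdf <- get_pdf(heads=2, tails=20)

# Upper bound of a central 95% interval, read directly off the pdf
p[min(which(cumsum(pdf) >= 0.975))]

# Upper bound implied by the Gaussian (exponential quadratic) fit
peak <- which.max(pdf)
window <- (peak - 2):(peak + 2)
coefs <- coef(lm(log(pdf) ~ p + I(p^2), data.frame(p, pdf)[window,]))
mu <- -coefs[2] / (2 * coefs[3])
sigma <- sqrt(-1 / (2 * coefs[3]))
mu + 1.96 * sigma

The direct bound should land above the Gaussian one, reflecting the pdf's longer right tail.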