Hockey Example

Setup

library(ggplot2)

Prior

In ThinkBayes, the author uses a Gaussian prior for the hockey score
example. The gamma prior is more traditional, conjuate, and doesn't
allow negative values. Therefore seems to be a better fit. Let's first
evaluate that.

Parameters

The parameters for the original distribution:

hockey.prior.mean <- 2.7
hockey.prior.sd <- 0.3
hockey.prior.var <- hockey.prior.sd * hockey.prior.sd

Reparameterize for $ Gamma(\alpha, \beta) $:

hockey.prior.rate <- hockey.prior.mean/hockey.prior.var
hockey.prior.shape <- hockey.prior.mean * hockey.prior.rate

Thus the original $N(2.7, 0.3)$ becomes
$Gamma(81, 30)$.

Sanity check

Let's quick make sure that this distribution gives the right mean
and variance.

gamma.rates <- rgamma(1e+05, hockey.prior.shape, rate = hockey.prior.rate)

These samples have a mean of 2.7003 and standard deviation
of 0.2999. Pretty close.

Plotting the Prior

First, we need to set up a data frame.

rates <- seq(0, 6, 0.01)
hockey.prior.frame <- rbind(data.frame(rate = rates, dist = "Normal", dens = dnorm(rates, 
    hockey.prior.mean, hockey.prior.sd), cum = pnorm(rates, hockey.prior.mean, 
    hockey.prior.sd)), data.frame(rate = rates, dist = "Gamma", dens = dgamma(rates, 
    hockey.prior.shape, rate = hockey.prior.rate), cum = pgamma(rates, hockey.prior.shape, 
    rate = hockey.prior.rate)))

With this in place, we can plot the PDF:

ggplot(hockey.prior.frame) + aes(x = rate, y = dens, color = dist, linetype = dist) + 
    ggtitle("PDF") + geom_line()

plot of chunk unnamed-chunk-6

ggplot(hockey.prior.frame) + aes(x = rate, y = cum, color = dist, linetype = dist) + 
    ggtitle("CDF") + geom_line()

plot of chunk unnamed-chunk-7

Conclusion

The gamma and normal produce very similar distributions. The gamma has the advantages
of being more correct (non-negative) and conjugate.

Analytic Posterior

There are four hockey games, each of which has Bruins goals and Canucks goals.

hockey.data <- data.frame(bruins = c(0, 2, 8, 4), canucks = c(1, 3, 1, 0))

To do a posterior inference, we want to compute the shape and rate
parameters of the posterior gamma distribution for each.

bruins.post <- list(shape = hockey.prior.shape + sum(hockey.data$bruins), rate = hockey.prior.rate + 
    nrow(hockey.data))
canucks.post <- list(shape = hockey.prior.shape + sum(hockey.data$canucks), 
    rate = hockey.prior.rate + nrow(hockey.data))

The Bruins therefore have an analytic posterior distribution of
$Gamma(95, )$, and the
Canucks have a posterior of $Gamma(86, 34)$.

Simulation

Let's simulate these distributions!

bruins.rates <- rgamma(10000, bruins.post$shape, bruins.post$rate)
canucks.rates <- rgamma(10000, canucks.post$shape, canucks.post$rate)

So the Bruins have an expected score of 2.7931, and the
Canucks have an expected score of 2.5264.

Let's simulate a game for each of these posterior draws:

bruins.scores <- sapply(bruins.rates, function(r) {
    rpois(1, r)
})
canucks.scores <- sapply(canucks.rates, function(r) {
    rpois(1, r)
})

Plotting

What do these posterior distributions look like? Like usual, we'll start
by building a data frame.

post.frame <- with(new.env(), {
    rates <- seq(0, 6, 0.01)
    rbind(data.frame(rate = rates, team = "Bruins", dens = dgamma(rates, bruins.post$shape, 
        bruins.post$rate)), data.frame(rate = rates, team = "Canucks", dens = dgamma(rates, 
        canucks.post$shape, canucks.post$rate)))
})

And plot the PDFs:

ggplot(post.frame) + aes(x = rate, y = dens, color = team, linetype = team) + 
    ggtitle("Posterior PDFs") + geom_line()

plot of chunk unnamed-chunk-13

Let's plot the simulated scores:

scores.data <- rbind(data.frame(team = "Bruins", score = bruins.scores), data.frame(team = "Canucks", 
    score = canucks.scores))
ggplot(scores.data) + aes(x = score, fill = team, color = team) + stat_bin(binwidth = 1, 
    geom = "line")

## ymax not defined: adjusting position using y instead

plot of chunk unnamed-chunk-14

There's something weird with this plot.

Probability of winning

How often to the Bruins win?

sum(bruins.scores > canucks.scores)/length(bruins.scores)

## [1] 0.4597

How often do the Canucks win?

sum(bruins.scores < canucks.scores)/length(bruins.scores)

## [1] 0.3694

So the Bruins are more likely to win.