Please answer these questions in a google document and share it with me when you have completed this lab.
A basketball player hits 8 of 10 free throws, you want to estimate how good of a free throw shooter they really are. For the first time, we’re going to approach this problem from a Bayesian perspective. We’ll need to start with a prior. Let’s start with the simplest (although not the most reasonable) prior – a uniform prior. We’ll assume that “a priori” (before we saw her hit 8 of 10 free throws) that we thought that every possible ability level was equally likely:
prob.range <- seq(0, 1, .001)
unif.prior <- rep(1,length(prob.range))
plot(prob.range, unif.prior, type="l", main="Uniform Prior")
The next thing we need to is to estime the likelihood of hitting 8 of 10 free throws given every possible level of ability.
likelihoods <- dbinom(8, 10, prob.range)
plot(prob.range, likelihoods, type="l", main="Likelihood Distribution")
Hitting 8 of 10 free throws if much more likely for a good shooter than a bad shooter and this will affect our posterior (our estimate of her free throw shooting ability after seeing her results). To calculate the posterior, we multiply our prior by our likelihood and then normalize it so that the our posterior probabilities add to 1. This normalization can be done by simply dividing the products of likelihoods and priors by the sum of the products.
posterior <- unif.prior*likelihoods/sum(unif.prior*likelihoods)
plot(prob.range, posterior, type="l", main="Posterior Distribution")
Our posterior distirbution shows which possible abilities we find most likely in light of the results we’ve seem. Notice how the posterior distibution looks just like the likelihood distribution. This is because we used a uniform prior. Before trying a different prior, let’s summarize our posterior distribution. We can do this by finding its mean and its standard deviation. We can calculate these using the same formulas we used to find the expected value and standard deviation of random variables:
\[ E[X] = \Sigma x \cdot p(x) \] \[ VAR[X] = \Sigma (x - E[x])^2 \cdot p(x) \] \[ SD[X] = \sqrt Var[X] \]
(EX = sum(prob.range*posterior))
## [1] 0.75
(SDX = sqrt(sum((prob.range-EX)^2*posterior)))
## [1] 0.1200961
How do you interpret these results?
Is it suprising that we think that she is less than an 80% shooter? Why might this be?
What would the mean and standard deviation of our posterior distribution be if she had taken only one free show and made it (Hint: try rewriting the code for the likelihoods and then rerunning the code that comes afterwards.)?
What if she had taken 100 shots and made 80 of them?
A NEW PRIOR
Statisticians spend a lot of time thinking about distributions. There’s a long list of notable distributions that a statistician has at their disposal some of which are continuous (like the normal distribution and the uniform distribution) and can take on any value within a certain range and some of them are discrete (the the binomial distribution which can only take on integer values). One useful continuous distribution is the Beta distribution. We won’t get into the particulars here but for our purposes there are two nice things about the Beta distribution: 1) It takes on values between 0 and 1 (and surely someones shooting percentage falls within this range) and 2) It’s pretty flexible - we can make our Beta distribution take on a wide variety of shapes by changing its two parameters (usually called alpha and beta but R calls them shape1 and shape2).
Take a look a the beta distribution with parameters 70 and 30:
beta.prior <- dbeta(prob.range, 70, 30)
plot(prob.range, beta.prior, type="l", main="beta(70,30) Prior ")
Notice that this distribution peaks near 70%. This is no coincidence, as an alpha of 70 and beta of 30 means that “a priori” (before we’ve seen our basketball player’s performance) we have a mean expectation of 70 successes for every 30 failures. The total of our parameters (70 and 30) has a meaning as well – it tells us how much we regress someone’s observed data towards our prior (remember regression towards the mean?). With parameters 70 and 30, we would regress a player with 100 (70+30) free throw results half the way towards the mean.
What if we think that expecting a 70% shooter is reasonable but think that’s way too much regression to the mean. Then we could try smaller parameters in the same ratio. Lets take a look at beta(7,3):
beta.prior <- dbeta(prob.range, 7, 3)
plot(prob.range, beta.prior, type="l", main="beta(7,3) Prior ")
Let’s evaluate our basketball player, who hit 8 of 10 shots, using this beta(7,3) prior rather than a uniform prior:
beta.prior <- dbeta(prob.range, 7, 3)
likelihoods <- dbinom(8, 10, prob.range)
posterior <- beta.prior*likelihoods/sum(beta.prior*likelihoods)
plot(prob.range, posterior, type="l", main="Posterior Distribution")
(EX = sum(prob.range*posterior))
## [1] 0.75
(SDX = sqrt(sum((prob.range-EX)^2*posterior)))
## [1] 0.09449112
How does the mean and standard deviation of our posterior with a beta(7,3) prior compare with the mean and standard deviation with the uniform(0,1) prior? (Note: uniform(0,1) means that every value 0 to 1 is equally likely.)
Create a beta prior that you think is reasonable for a high school basketball player and find the posterior distribution and the mean and standard deviation of the posterior using your prior.
Loaded Dice, Again
Let’s now return to our familiar scenario with dice that we placed in the over for 2 minutes with the six-pip side up. We will want to know how loaded towards sixes our dice really are in light of our results. Before rolling the dice and analyzing our results, we may want to think about an appropriate choice for our prior. Here are three possibilities
Uniform(1/6, 1) prior: We think that we’ve probably loaded our die to some extent but we have no idea how much. A priori, we think that any amount of loading is equally like so our prior is a uniform distribution stretching from 1/6th to 1.
prob.range <- seq(0, 1, .001)
modified.unif.prior <- (prob.range>=1/6)
plot(prob.range, modified.unif.prior, type="l", main="Uniform(1/6,1) Prior")
Beta(10,40) prior: Our best guess is that, after our oven treatment, we get 10 sixes for every 40 non-sixes (20 % sixes) but there’s substantial uncertainty around this estimate. We express this using a beta prior:
prob.range <- seq(0, 1, .001)
beta.prior <- dbeta(prob.range, 10, 40)
plot(prob.range, beta.prior, type="l", main="Beta(10,40) Prior")
Funky prior: We think that a small amount of loading is more likely that a large amount of loading but we’re certain that we didn’t make sixes less likely to occur. To express this we take our beta distribution but eliminate the possibility of a true six rate less than 1/6th.
prob.range <- seq(0, 1, .001)
funky.prior <- (prob.range>=1/6)*dbeta(prob.range, 10, 40)
plot(prob.range, funky.prior, type="l", main="Funky Prior")
Now, imagine that you roll our (possibly) loaded die 3600 times and get 670 sixes.