Statisticians spend a lot of time thinking about distributions. There’s a long list of notable distributions at a statistician’s disposal. Some are continuous (like the normal distribution) and can take on any value within a certain range; others are discrete (like the binomial distribution, which can take on only integer values).
One useful continuous distribution is the Beta distribution. We won’t get into the particulars here, but for our purposes there are two nice things about the Beta distribution: 1) it takes on values between 0 and 1 (just like probabilities!), and 2) it’s pretty flexible, since we can make our Beta distribution take on a wide variety of shapes by changing its two parameters (usually called alpha and beta, but R calls them shape1 and shape2).
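To see that flexibility, here is a minimal sketch that draws a few Beta densities on the same axes. The parameter pairs are arbitrary choices for illustration, not values used elsewhere in this handout.

```r
# Grid of possible probabilities between 0 and 1
x <- seq(0, 1, by = 0.001)

# A few arbitrary (shape1, shape2) pairs chosen only to show different shapes
shapes <- list(c(1, 1), c(2, 5), c(5, 2), c(10, 10))
cols <- c("black", "red", "blue", "darkgreen")

# Start with an empty plot, then add one density curve per parameter pair
plot(NA, xlim = c(0, 1), ylim = c(0, 4), xlab = "probability", ylab = "density",
     main = "Several Beta densities")
for (i in seq_along(shapes)) {
  lines(x, dbeta(x, shapes[[i]][1], shapes[[i]][2]), col = cols[i])
}
legend("top", legend = sapply(shapes, function(s) sprintf("beta(%g,%g)", s[1], s[2])),
       col = cols, lty = 1)
```

beta(1,1) is flat (every probability equally likely), beta(2,5) and beta(5,2) lean toward low and high values, and beta(10,10) piles up around 50%. Outside the interval from 0 to 1, `dbeta` returns a density of 0.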
Take a look at the Beta distribution with parameters 40 and 60:
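# Grid of candidate shooting percentages from 0 to 1
prob.range <- seq(0, 1, .001)
# Density of the beta(40,60) prior at each grid point
beta.prior <- dbeta(prob.range, 40, 60)
plot(prob.range, beta.prior, type="l", main="beta(40,60) Prior")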
Notice that this distribution peaks near 40%. This is no coincidence: an alpha of 40 and a beta of 60 mean that “a priori” (before we’ve seen our basketball player’s shooting performance, say) we expect, on average, 40 successes for every 60 failures, which is a mean of 40/(40+60) = 40%. The sum of our parameters (40 + 60) has a meaning as well: it tells us how much we regress someone’s observed data towards our prior (remember regression towards the mean?). With parameters 40 and 60, we would regress a player with 100 (40+60) shot attempts halfway towards the mean, because the prior then counts for as much as the player’s own 100 shots.
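Both claims in that paragraph can be checked with a little arithmetic. This is a minimal sketch, assuming the standard conjugate update in which a beta(alpha, beta) prior combined with s makes in n attempts gives a beta(alpha + s, beta + n - s) posterior. The player who hits 60 of 100 shots is hypothetical.

```r
# Prior parameters: a = alpha (shape1), b = beta (shape2)
a <- 40
b <- 60

# Prior mean a / (a + b) and prior mode (peak) (a - 1) / (a + b - 2)
prior.mean <- a / (a + b)            # 0.4
prior.mode <- (a - 1) / (a + b - 2)  # 39/98, about 0.398

# Hypothetical player: 60 makes in 100 attempts
makes <- 60
attempts <- 100
observed <- makes / attempts         # 0.6

# Conjugate update: posterior is beta(a + makes, b + attempts - makes)
post.mean <- (a + makes) / (a + b + attempts)   # 100 / 200 = 0.5

# The posterior mean is a weighted average of the prior mean and the observed
# rate, with weight (a + b) / (a + b + attempts) on the prior
prior.weight <- (a + b) / (a + b + attempts)    # 0.5, i.e. halfway
blend <- prior.weight * prior.mean + (1 - prior.weight) * observed

print(c(prior.mean = prior.mean, prior.mode = prior.mode,
        post.mean = post.mean, blend = blend))
```

The posterior mean of 0.5 sits exactly halfway between the prior mean (0.4) and the player’s observed 60%, because a + b equals the number of attempts.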
What if we think that expecting a 40% shooter is reasonable, but that this is way too much regression to the mean? Then we could try smaller parameters in the same ratio. Let’s take a look at beta(4,6):
# Same grid, weaker prior with the same 40% mean
beta.prior <- dbeta(prob.range, 4, 6)
plot(prob.range, beta.prior, type="l", main="beta(4,6) Prior")
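To see what the smaller parameters buy us, we can run a hypothetical player who makes 60 of 100 attempts through both priors. This is a sketch assuming the same conjugate update as before (posterior mean (alpha + makes) / (alpha + beta + attempts)); the helper `posterior.mean` is a name introduced here for illustration, not a built-in R function.

```r
# Hypothetical helper: posterior mean for a beta(a, b) prior after `makes` of `attempts`
posterior.mean <- function(a, b, makes, attempts) {
  (a + makes) / (a + b + attempts)
}

# Hypothetical player: 60 makes in 100 attempts (observed rate 0.6)
strong <- posterior.mean(40, 60, 60, 100)  # 100 / 200 = 0.5, halfway back to 0.4
weak   <- posterior.mean(4, 6, 60, 100)    # 64 / 110, about 0.582, close to 0.6
print(c(strong = strong, weak = weak))

# Overlay the two priors: same 40% mean, very different spreads
prob.range <- seq(0, 1, .001)
plot(prob.range, dbeta(prob.range, 40, 60), type = "l",
     main = "beta(40,60) vs beta(4,6) priors", xlab = "probability", ylab = "density")
lines(prob.range, dbeta(prob.range, 4, 6), lty = 2)
legend("topright", legend = c("beta(40,60)", "beta(4,6)"), lty = c(1, 2))
```

Both priors center on 40%, but beta(4,6) is much more spread out, so it acts like only 10 prior shots and lets the player’s own 100 attempts dominate.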
Assume that basketball players’ corner-three shooting percentages follow a beta(4,6) distribution, and that the basketball player we’re interested in just hit 12 of 20 shots. What is the probability that they hit their next shot?
Repeat the above calculation using a beta(40,60) prior. How does the result change? Why?
Choose a beta distribution which you think captures the probabilities with which Saint Ann’s students make it to school on time. Next, use the prior you chose to determine the probability that a student who arrived on time for 40 of the first 60 days of school will arrive on time the next day.