Applied Bayesian Lab 1

Complete all Questions and submit your final PDF or html (either works!) under Assignments in Canvas.

Goal

We have been discussing a Bayesian model for estimating a proportion \(\theta\). The model is called the Beta-Binomial model, as the sampling model for \(Y|\theta\) is a binomial and the chosen prior for \(\theta\) is a Beta distribution.

In class, we saw that the choice of this sampling model and this prior yields a Beta posterior distribution for \(\theta|Y\). However, we have not discussed how we might choose the hyperparameters in the prior to reflect our prior beliefs about \(\theta\). We have also not seen how the choice of these hyperparameters impacts the posterior distribution. We will do all of that today using R.

Note: This lab assumes that you have already installed the software and gone through the RMarkdown tutorial. If you have not done so, make sure to do these before starting the lab. This lab assumes you are working in Markdown - you may not use R script as your submission format for this lab.

Loading the plotting function

In the Beta-Binomial model, we have a Beta prior distribution for \(\theta\), \[\theta \sim Beta(\alpha, \beta).\] We know that the Beta distribution takes support on [0,1] which is appropriate because \(\theta\) is a proportion.

The two hyperparameters \(\alpha\) and \(\beta\) are key to determining the shape of the Beta prior distribution. All Beta distributions take support on [0,1], but they do not all have the same shape. The key to choosing the hyperparameters that reflect your prior beliefs is to see how the choice of \(\alpha\) and \(\beta\) impacts the shape of the prior. Because of this, we will start our lab by playing a bit with the Beta distribution.

To begin, we are going to load a function that will make visualizing the Beta distribution much easier. For those of you that are new to Markdown, you need to create a Code Chunk (chunk for short) in order to input R commands into the Markdown document. Remember that to do that, you go to Code at the very upper left of your RStudio window and then choose Insert Chunk.

Once you have created your chunk, copy and paste the following inside the chunk and press play.

plot_beta <- function (alpha, beta, titlein, mean = FALSE, mode = FALSE) 
{
  suppressMessages(library(ggplot2))
  p <- ggplot(data = data.frame(x = c(0, 1)), aes(x)) + stat_function(fun = stats::dbeta, 
                                                                      n = 101, args = list(shape1 = alpha, shape2 = beta)) + 
    labs(x = expression(theta), y = "density", title = titlein)
  if (mean == TRUE & mode == FALSE) {
    mean <- alpha/(alpha + beta)
    p <- p + geom_segment(aes(x = mean, y = 0, xend = mean, 
                              yend = dbeta(mean, alpha, beta), linetype = "mean")) + 
      scale_linetype_manual(values = c(mean = "solid")) + 
      theme(legend.title = element_blank())
  }
  if (mean == FALSE & mode == TRUE) {
    mode <- (alpha - 1)/(alpha + beta - 2)
    p <- p + geom_segment(aes(x = mode, y = 0, xend = mode, 
                              yend = dbeta(mode, alpha, beta), linetype = "mode")) + 
      scale_linetype_manual(values = c(mode = "dashed")) + 
      theme(legend.title = element_blank())
  }
  if (mean == TRUE & mode == TRUE) {
    mean <- alpha/(alpha + beta)
    mode <- (alpha - 1)/(alpha + beta - 2)
    p <- p + geom_segment(aes(x = mean, y = 0, xend = mean, 
                              yend = dbeta(mean, alpha, beta), linetype = "mean")) + 
      geom_segment(aes(x = mode, y = 0, xend = mode, yend = stats::dbeta(mode, 
                                                                         alpha, beta), linetype = "mode")) + scale_linetype_manual(values = c(mean = "solid", 
                                                                                                                                              mode = "dashed")) + theme(legend.title = element_blank())
  }
  p
}

Now, when you run this command, it will look like nothing has happened…but it has. This command teaches R a function called plot_beta that will plot the probability density function (pdf) for a Beta distribution with a given set of hyperparameters. We are going to use this to explore different shapes of the pdf of a Beta distribution.

Exploring the Beta Distribution

Now that we have a plotting function, let’s start looking at a few different Beta distributions. To plot a Beta distribution in R using the function we have just loaded, you use the following:

plot_beta(alpha, beta, "Put your Title here")

You will replace alpha and beta with the numeric values of the hyperparameters. You can also title the graph as you like using the 3rd argument in the function. Note: Make sure your chosen title is in quotation marks.

When you create this plot, the x axis will show the values of \(\theta\), ranging from 0 to 1. The y-axis will show the value of the pdf,

\[f(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta - 1}.\] Recall that the pdf, unlike a probability, can have values greater than 1.

Question 1

Create a plot of the pdf of a \(Beta(40,50)\) distribution and title it Figure 1. Describe the distribution. Note “describe the distribution” means you should comment on center, spread and modality from what you can see in the graph; you do not need any further computations to answer this question.

In addition to visualizing the pdf of the prior for \(\theta\), we can directly compute the prior mean, the prior mode, and the prior variance of \(\theta\) conditional on the values of the hyperparameters.

This is because knowing that we have chosen \(\theta \sim Beta(\alpha,\beta)\) implies the following:

Mean

\[E(\theta) = \frac{\alpha}{\alpha + \beta}\]

Mode

\[Mode(\theta) = \frac{\alpha - 1}{\alpha + \beta -2}\]

Variance

\[Variance(\theta) = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta +1)}\]

Question 2

Compute the mean, mode, and variance for a Beta(40, 50) distribution. Round to 2 significant digits.

If you like, you can also add the mode or mean onto the graph of the Beta pdf using the code adaptation below:

plot_beta(alpha, beta, "Put your Title here", mean= TRUE, mode = TRUE)

Question 3

Add the mean and mode for \(\theta\) onto your Figure 1, and title the new graph Figure 2.

Understanding these measures is important, because the values of the hyperparameters impact both the center and spread of our prior distribution. This means that both (1) the values of \(\theta\) we think are most likely and (2) how strongly we believe \(\theta\) is concentrated in a specific part of the support [0,1] can be reflected in our choice of hyperparameters.

Question 4

If we were to choose this prior distribution, Beta(40,50), for \(\theta\), what would this reflect about our prior beliefs? Specifically, where does this indicate that we think \(\theta\) is likely to be, and is this a fairly strong belief or a fairly weak belief? Explain.

Exploring the hyperparameters

Okay, so we have explored one specific Beta distribution. However, we can create different Beta distributions by changing the values of the hyperparameters. To see this, let’s try changing just one of the hyperparameter values at a time and seeing how that impacts our distribution.

Question 5

Create a plot showing the pdf of a \(Beta(1,50)\), \(Beta(10,50)\), \(Beta(50,50)\), and \(Beta(100,50)\) distribution. Make sure each plot has a title (Figure 3A, Figure 3B, etc.), and stack the four plots. Note: If you need a reminder on how to do this check here.

Question 6

Comment on how the shape of the pdf changes as you increase \(\alpha\).

Question 7

Create a plot showing a the pdf of a \(Beta(50,1)\), \(Beta(50,10)\), \(Beta(50,50)\), and \(Beta(50,100)\) distribution. Make sure each plot has a title, and stack the four plots.

Question 8

Comment on how the shape of the pdf changes as you increase \(\beta\).

So, as we can see, changing \(\alpha\) and \(\beta\) changes the center, spread, and skew of the Beta distribution. This means that there are a lot of a different prior beliefs we can reflect by changing the hyperparameter values.

As a general rule, when \(\alpha = \beta\), we have a symmetric Beta distribution. When \(\alpha > \beta\) the Beta distribution is left-skewed. When \(\beta > \alpha\), the Beta distribution is right-skewed..

Before we move on, there is one very special version of the Beta distribution we need to be aware of.

Question 9

Create a plot of the pdf of a \(Beta(1,1)\) distribution and title it Figure 5. Describe the distribution.

We will explore what it means to choose this a prior later on in the semester!

Choosing prior hyperparameters

Now that we have explored the Beta distribution a little, let’s start thinking about how the properties of the Beta distribution might influence our prior choice. Let’s go back to our example in class about March Madness. We want to estimate \(\theta\), the proportion of Wake Forest students, faculty, and staff who will participate in a March Madness bracket this year. We have already discussed why a Beta-Binomial model makes sense for this scenario, but we have not discussed our hyper-parameters.

In 2022, there were 17,357,589 brackets filled out and registered with the ESPN challenge. There were roughly 333,287,557 people in the United States at that time in 2022, which translates to roughly 5.21% of the population filling out a bracket.

This information can help inform out prior! One could assume that Wake Forest’s participation in the bracket matches that of the US population at large. If we are willing to make this assumption, it might be reasonable to assume that around 5% of Wake Forest faculty, staff, and students filled out a bracket last year. We are modeling this year, but last’s year’s information might be a good place to start.

Even better prior information might be looking at the percent of Wake Forest faculty, staff, and students who filled out a bracket last year…but right now, we don’t have that information. We typically have different prior information depending on the situation we are in. It depends on how much information has been collected on our parameter of interest in the past and what information we actually have access to.

For the moment, let’s assume that the participation by Wake Foresters roughly mirrors the population participation, so we center our Beta prior distribution for \(\theta\) at .05.

Question 10

Consider the formula for \(E(\theta)\). Let \(E(\theta) = .05\) and solve for \(\beta\) in terms of \(\alpha\). In other words, your final answer should be \(\beta = ...\)

Why did we bother doing this?? Well, there are a lot of different possible combination of \(\alpha\) and \(\beta\) that can yield an expected value of .05. However, we want to consider more than just the mean. We also want to consider the variance.

Question 11

Create a graph of the pdf of a \(Beta(1,19)\), \(Beta(2,38)\), \(Beta(4,76)\), and \(Beta(20,380)\) distribution. Label each figure and stack your graphs. Describe the similarities and differences in the 4 graphs. Specifically, what happens to the pdf as you increase the sum of \(\alpha\) and \(\beta\)?

Now that we have visualized the pdfs of these four different Beta distributions, let’s explore measures of central tendency. For each of the four distributions whose pdfs we we have just plotted, we are going to compute the mean and standard deviation. For instance,

\(\theta \sim Beta(1,19)\)

\[StDev(\theta) = \sqrt{\frac{19}{(1 + 19)^2(1 + 19 +1)}} \approx .048\] When we do this for all four distributions, we get

\(\alpha\) \(\beta\) Mean Standard Deviation
1 19 .05 .048
2 38 .05 .034
4 76 .05 .024
20 380 .05 .011

What we notice is that when the sum of the hyperparameters gets bigger, our standard deviation (and hence the variance) of \(\theta\) gets smaller.

This is important!!! This means that the larger the sum of \(\alpha\) and \(\beta\), the more concentrated our prior is on .05. The smaller the sum of \(\alpha\) and \(\beta\) we choose, the less concentrated our prior is on .05.

When we choose a prior that is very concentrated, we call this a strong prior. This means we have strong prior information that \(\theta\) should be a in a very specific part of the space and we allow for less variability in the prior distribution.

When we choose a prior that is more spread out (less concentrated), we call this a weaker prior. This means we have less strong prior information, and we allow more variability in the prior distribution.

Okay…but how does all this tie back to the posterior?

The Posterior

As we saw in class, with the Beta-Binomial model,

\[Y|\theta \sim Binomial(n, \theta),\] \[\theta \sim Beta(\alpha, \beta),\]

we get the following Beta posterior distribution for \(\theta|Y\):

\[(\theta|Y=y) \sim Beta(y + \alpha, n-y + \beta)\]

In the posterior hyperparameters,

\[\alpha^{*} = y + \alpha,\]

\[\beta^{*} = n-y + \beta,\]

we have the influence of the data (through \(n\) and \(y\)) and the influence of the prior (through \(\alpha\) and \(\beta\)).

Take a look at the first hyperparameter, \(y + \alpha\). \(y\) is the number of successes we see in \(n\) trials. For our setting, this means that \(y\) is the number of people in our random sample who are going to fill out a March Madness bracket in Spring 2023.

So, what does \(\alpha\) represent? Well, it’s the hyperparameter in our prior…and it plays the role of a prior number of successes. In essence, when we choose \(\alpha\), we are choosing a number that reflects the number of successes we expect to have in a trial with \(\alpha + \beta\) people in it!

Does that hold when we look at the second hyperparameter, \(n - y + \beta\)? Well, \(n-y\) is the number of failures we see in \(n\) trials. For our setting, this means that \(n-y\) is the number of people in our random sample who are NOT going to fill out a March Madness bracket in Spring 2023.

So, what does \(\beta\) represent? It plays the role of a prior number of failures. In essence, when we choose \(\beta\), we are choosing a number that reflects the number of failures we expect to have in a trial with \(\alpha + \beta\) people in it!!!

So, when we choose a larger sum of \(\alpha + \beta\), we are choosing to have a large prior sample size. Basically, we are reflecting our prior information as though it were in a similar form to the data (a realization of a ). The larger the sum of \(\alpha\) and \(\beta\) relative to \(n\), the more influence the prior is going to have in the posterior. The smaller the sum of \(\alpha\) and \(\beta\) relative to \(n\), the less influence the prior is going to have in the posterior.

Let’s see that.

The Data

We take a random sample of \(n = 100\) Wake Forest students, faculty, and staff. In the sample, 22 people say they are going to fill out a bracket.

Question 12

Look back at the 4 Beta distributions whose pdfs you plotted in Question 11. Which one represents the smallest prior sample size (and hence the weakest prior)? What is the prior sample size for this prior distribution?

Question 13

Which one represents the largest prior sample size (and hence the strongest prior)? What is the prior sample size for this prior distribution?

Question 14

State the posterior distribution you would get using our data and the prior from Question 12 (this means state the distribution and the value of any needed hyperparameters). Draw the pdf of the distribution and label it Figure 7. Include a vertical line at the mean on your graph.

Question 15

State the posterior distribution you would get using our data and the prior from Question 13. Draw the pdf of the distribution and label it Figure 8. Include a vertical line at the mean on your graph.

Question 16

What is different about the center of these two distributions from Questions 14 and 15? Why do you think that one is higher or lower than the other?

Question 17

What is different about the spread of these two distributions from Questions 14 and 15? Why do you think that one is more or less than the other?

There is also a function in the bayesrules package that allows you to visualize the prior density function, a scaled version of the likelihood and the posterior density function all on the same graph. I’ve adapted the function slightly to allow you to add titles to your plots.

plot_beta_binomial <- function (alpha, beta, y = NULL, n = NULL, prior = TRUE, likelihood = TRUE, posterior = TRUE, title = "titlein") 
{
  suppressMessages(library(ggplot2))
  if (is.null(y) | is.null(n)) 
    warning("To visualize the posterior,\n            specify data y and n")
  g <- ggplot(data = data.frame(x = c(0, 1)), aes(x)) + labs(x = expression(theta), 
                                                             y = "density", title = title) + scale_fill_manual("", values = c(prior = "#f0e442", 
                                                                                                               `(scaled) likelihood` = "#0071b2", posterior = "#009e74"), 
                                                                                                breaks = c("prior", "(scaled) likelihood", 
                                                                                                           "posterior"))
  if (prior == TRUE) {
    g <- g + stat_function(fun = dbeta, args = list(shape1 = alpha, 
                                                    shape2 = beta)) + stat_function(fun = dbeta, args = list(shape1 = alpha, 
                                                                                                             shape2 = beta), geom = "area", alpha = 0.5, 
                                                                                    aes(fill = "prior"))
  }
  if (!is.null(y) & !is.null(n)) {
    alpha_post <- alpha + y
    beta_post <- beta + n - y
    y_data <- y
    like_scaled <- function(x) {
      like_fun <- function(x) {
        dbinom(x = y_data, size = n, prob = x)
      }
      scale_c <- integrate(like_fun, lower = 0, upper = 1)[[1]]
      like_fun(x)/scale_c
    }
  }
  if (!is.null(y) & !is.null(n) & (likelihood != FALSE)) {
    g <- g + stat_function(fun = like_scaled) + stat_function(fun = like_scaled, 
                                                              geom = "area", alpha = 0.5, aes(fill = "(scaled) likelihood"))
  }
  if (!is.null(y) & !is.null(n) & posterior == TRUE) {
    g <- g + stat_function(fun = dbeta, args = list(shape1 = alpha_post, 
                                                    shape2 = beta_post)) + stat_function(fun = dbeta, 
                                                                                         args = list(shape1 = alpha_post, shape2 = beta_post), 
                                                                                         geom = "area", alpha = 0.5, aes(fill = "posterior"))
  }
  g
}

To use the function, use the following structure:

plot_beta_binomial( alpha = , beta = , y = , n = , title = "The Title you want")

Question 18

Using the prior from Question 12 and our data, use the function above to plot the pdf of the prior for \(\theta\), (scaled) likelihood for the data, and the pdf of the posterior for \(\theta\) given the data. Repeat with the prior from Question 13 and stack the two graphs. What is different about the posterior in the two graphs?

Wrapping it up

Today we explored the Beta distribution, with specific focus on the role of the hyperparameters. We will continue to discuss prior specification as we move through the course. Our next step, though, is to focus on what to do with the posterior once we have it. How do we use this to estimate \(\theta\), for instance? This is what we will be working on next class!

Turning in your assignment

When your Markdown document is complete, do a final knit and make sure everything compiles. You will then submit your document on Canvas. You must submit a PDF or html document to Canvas. No other formats will be accepted. Make sure you look through the final document to make sure everything has knit correctly.

Creative Commons License
This work was created by Nicole Dalzell and is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Last updated 2023 January 24.