Bayes' Theorem Lab

Bayesian analysis is flexible and intuitive because it uses the same approach to all problems. We learn about unobserved quantities from observed ones using the laws of probability as an inferential scaffolding. You derived Bayes’ Theorem in lecture from those basic laws. We learned that all Bayesian models have the same parts: the posterior distribution, the joint distribution composed of the likelihood(s) and priors, and the marginal probability of the data, which serves as a normalizing constant after the data are collected. We will exploit the proportionality between the posterior and the joint, which allows us to dispense with the marginal probability of the data using simulation methods for most of this course. However, it is vital that you understand the relationship among the components of Bayes’ Theorem before we do that. This understanding is the foundation for all of the powerful methods that follow. It is what sets Bayesian analysis apart from maximum likelihood.

This problem is a bit of a toy because you will never estimate posterior distributions this way, at least I never have. But, it is an instructive toy. Reliably using Bayesian methods requires understanding how Bayes’ theorem works. The best way to gain this understanding is to compute a posterior distribution from its component parts, the prior, the likelihood, and the marginal distribution of the data. It helps you understand how you multiply and divide by statistical distributions, something you probably don’t do every morning after coffee. You will be thrilled to know that you only need to do this once.

To start, take out your notes on Bayes’ theorem and go through its explanation, particularly the color-coded section on the likelihood, the prior, the joint, and the probability of the data. Keep the equation in front of you as you do this exercise. Think about what each piece means as you write your code.

The key to success is to create code that does the following: 1) creates vectors using the probability density functions in the numerator of Bayes’ theorem (the likelihood and the prior), 2) multiplies these vectors to obtain the joint distribution of the parameters and the data, 3) integrates the joint distribution and 4) divides the joint distribution by its integral. Voila.
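A note before the code: the chunks below assume the following packages are installed and loaded (the arm and gridExtra functions appear later in the lab).

library(ggplot2) # plotting
library(gridExtra) # grid.arrange() for the multi-panel figure
library(arm) # discrete.histogram() for discrete data
library(scales) # pretty_breaks() for axis breaks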

Problem

You are interested in estimating the posterior distribution for the mean number of individuals of an invasive plant species per square meter in a disturbed grassland. We will call that mean θ. You have prior information telling you that the average number of these plants is 10.2 m−2 with a standard deviation of the mean = .5. You have a set of fifty observations in hand obtained by sweaty labor in the field. Execute the following steps.

Objectives

Poisson Distribution

  1. Simulate 50 data points from a Poisson distribution with mean θ = 6.4 to represent the data set. (This portrays the data that you gathered from plots, but it is lots easier to obtain.) What is the variance? Be sure to put the R function set.seed(10) before the call to rpois() to assure that we all get the same results. Call the data vector y.
set.seed(10) # for reproducibility 

y <- rpois(50, 6.4) # Random values within Poisson distribution, 50 values, lambda (mean) of 6.4

y # view the 50 simulated data points from the Poisson Distribution with a mean theta of 6.4
##  [1]  6  5  6  8  3  4  5  5  7  6  7  7  3  7  5  6  3  5  6  9  9  7  8  5  6
## [26]  8  9  5  8  5  6  3  4 10  6  8  9 11  7  6  5  4  2  8  5  4  2  6  3  8
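The problem also asks for the variance. For a Poisson distribution the variance equals the mean, so in theory it is 6.4 here; the sample variance of y should be in that neighborhood. A quick check:

mean(y) # sample mean of the simulated counts
var(y) # sample variance; for a Poisson, variance = mean in theory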

Plot the Poisson Distribution as a Histogram

  1. Plot a histogram of the data with density on the y-axis. It turns out that the histogram function in R is not really appropriate for discrete data (why?). Discover how to do a proper histogram for these data. Look into the arm package or the discrete histogram in ggplot.
  • A view of using hist()
hist(y, freq = FALSE, main = "Histogram of Poisson Distribution") # freq = FALSE puts density, not counts, on the y-axis

  • A view of using the arm package, discrete.histogram()
discrete.histogram(y)

  • Both functions can produce essentially the same plot; with hist() you just need to supply breaks so that each integer value gets its own bin, as sketched below.
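One way to set up those integer-centered bins with hist(), a minimal sketch (the breaks sequence is the only piece added here):

hist(y,
     breaks = seq(min(y) - 0.5, max(y) + 0.5, by = 1), # one bin centered on each integer count
     freq = FALSE, # density on the y-axis
     main = "Histogram of Poisson Distribution",
     xlab = "y")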

Assign Values

  1. Assign values for the prior mean (mu.prior) and standard deviation (sigma.prior).

These are from the problem above. mu = 10.2, sd = .5.

mu.prior <- 10.2
sigma.prior <- .5

Set up Vector

  1. Set up a vector containing a sequence of values for θ, the mean number of invasive plants. You want this vector to approximate a continuous θ, so be sure it contains values that are not too far apart. Use code like this: theta = seq(0,15,step) where you set step = .01. Setting a value for step with global scope is important. You will use it later when you integrate.
step <- 0.01 # increment
theta <- seq(0, 15, step) # from 0, to 15, by .01

Prior Distribution

Write the mathematical expression for a gamma prior on θ. Be as specific as possible given the information at hand. Next, write an R function for the prior on θ. The function for the prior should return a vector of gamma probability densities, one for each value of θ. It should have arguments 1) the vector for θ you created in the previous step as well as 2) the prior mean and 3) the prior standard deviation. The mean and the standard deviation, of course, will need to be moment-matched to the proper parameters of the gamma distribution in your function. Recall that when a function is composed of a single statement as it is here, the statement can simply follow the function template on the same line; curly brackets are not needed. So, in this case mu.prior = 10.2 and sigma.prior = .5. You could hard-code these in the function template, but that is bad practice.
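As a sketch of the mathematical answer, moment matching gives shape = μ²/σ² and rate = μ/σ², so the prior is

[θ] = gamma(θ ∣ μ²/σ², μ/σ²) = gamma(θ ∣ 10.2²/0.5², 10.2/0.5²) = gamma(θ ∣ 416.16, 40.8)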

prior <- function(theta, mu, sigma){ # prior density of theta given the prior mean and sd
  shape <- mu^2 / sigma^2 # moment-matched shape
  rate <- mu / sigma^2 # moment-matched rate
  dgamma(theta, shape, rate) # vector of gamma densities, one for each value of theta
}
prior.data <- prior(theta, mu.prior, sigma.prior) # theta, mu = 10.2, sigma = .5

Plot Prior Distribution

Plot the prior distribution of θ, the probability density of θ as a function of the values of θ.

# start creating data frame for ggplot2

bayes_data <- cbind.data.frame(theta, prior.data) # combine theta, prior distribution data

ggprior <- ggplot(data = bayes_data) +
  geom_line(mapping = aes(x = theta, y = prior.data), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Prior") +
  ylab("["~theta~"]") +
  xlab(~theta) +
  theme_bw()
ggprior

Check Moment Matching

Check your moment matching by generating 100,000 random variates from a gamma distribution with parameters matched to the prior mean and standard deviation. Now compute the mean and standard deviation of the random variates. They should be very close to 10.2 and .5. What is this computation called?

shape <- (mu.prior^2/sigma.prior^2) # Moment matching shape
rate <- (mu.prior/sigma.prior^2) # Moment matching rate


rand_gamma <- rgamma(100000, shape, rate) # 100,000 random variates from gamma distribution with moment matching shape and rate from provided mu and sigma
  • Mean:
mean(rand_gamma)
## [1] 10.19569
  • Standard Deviation:
sd(rand_gamma)
## [1] 0.5000646

This computation is called Monte Carlo simulation, here used to perform Monte Carlo integration: we approximate properties of a distribution, such as its mean and standard deviation, by drawing a large number of random variates and computing the corresponding sample statistics. The approximation improves as the number of draws grows.

Compute the Likelihood

What is the mathematical expression for the likelihood [y∣θ], assuming that the data are conditionally independent? Be as specific as possible using the information at hand. Write an R function for the likelihood. The function must use all 50 observations to compute the total likelihood across all of the data points (not the log likelihood) for each value of the vector θ. It should have arguments for the vector θ and the data. The function should create and return a vector with elements [y∣θi]. Note that this is the total probability density of all of the data conditional on each value of θi, not the probability density of a single data point. In reality, θ is a continuous random variable, the mean of the Poisson distribution. We are discretizing it here into small intervals. The function template will be something like the code below.
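First, a sketch of the mathematical expression: conditional independence lets the total likelihood factor into a product of Poisson densities,

[y ∣ θ] = ∏ᵢ₌₁⁵⁰ Poisson(yᵢ ∣ θ)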

likelihood <- function(theta, y){
  L <- numeric(length(theta)) # vector to hold the total likelihood [y | theta_i]
  for (i in 1:length(theta)){ # for each value of theta
    L[i] <- prod(dpois(y, theta[i])) # product of the Poisson densities of all 50 observations at theta[i]
  }
  L
}
Likelihood_data <- likelihood(theta, y)


bayes_data <- cbind.data.frame(bayes_data, Likelihood_data)
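For what it is worth, the loop above can also be written as a one-liner (an alternative sketch, not required by the assignment):

Likelihood_data <- sapply(theta, function(t) prod(dpois(y, t))) # [y | theta_i] for each value of theta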

Plotting the Likelihood

Plot the likelihood [y∣θi] holding the data constant and varying θ. What is this plot called? Can you say anything about the area under the curve? What happens to the inference we can make based on likelihood alone if we multiply the curve by a constant?

gglike <- ggplot(data = bayes_data) +
  geom_line(mapping = aes(x = theta, y = Likelihood_data), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Likelihood") +
  ylab("[y|"~theta~"]") +
  xlab(~theta) +
  theme_bw()
gglike

The Joint Distribution

What is the mathematical expression for the joint distribution [θ,y]? Your answer should be as specific as possible. I am not looking for the non-specific equation [θ,y]=[y|θ][θ]. Create an R function for the joint distribution of the parameters and the data as the product of the prior and the likelihood functions. Call this function joint. The function should simply call the previous two functions and multiply them. Plot joint(theta) as a function of theta. Does this seem reasonable? Why are the values on the y axis so small? Think about what is going on here.
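A sketch of the specific expression, combining the likelihood with the moment-matched prior:

[θ, y] = [y ∣ θ][θ] = (∏ᵢ₌₁⁵⁰ Poisson(yᵢ ∣ θ)) × gamma(θ ∣ 416.16, 40.8)

The values on the y axis are so small because the joint multiplies together 50 densities that are each less than 1; the product is astronomically small, but its shape over θ is what carries the information.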

joint <- function(prior, likelihood){
  prior * likelihood # element-wise product of the prior and likelihood vectors
}

joint_data <- joint(prior.data, Likelihood_data)

bayes_data <- cbind.data.frame(bayes_data, joint_data)

ggjoint <- ggplot(data = bayes_data) +
  geom_line(mapping = aes(x = theta, y = joint_data), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Joint") +
  ylab("["~theta~", y]") +
  xlab(~theta) +
  theme_bw()
ggjoint

The marginal probability of the data

What is the mathematical expression for the marginal probability of the data [y] ? Again, be as specific as possible with the information you have been given. Approximate the marginal probability of the data, the integral of the likelihood multiplied by the prior, to obtain a normalization constant [y]. How would you accomplish this integration? (Hint–Recall the first principles definition of the definite integral.) Explain the output of this integration, a scalar. Why do we call [y] a “distribution” if it evaluates to a scalar?
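A sketch of the expression and its first-principles approximation, with Δθ equal to the global step set earlier:

[y] = ∫ [y ∣ θ][θ] dθ ≈ Σᵢ [y ∣ θᵢ][θᵢ] Δθ

Each term is the area of a rectangle of height [y ∣ θᵢ][θᵢ] and width Δθ, which is exactly why step needs global scope.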

joint_sum <- sum(joint_data) * step # Riemann sum: height of the joint at each theta times the interval width, approximating the integral [y]

The posterior distribution

What is the mathematical expression for the posterior distribution [θ∣y]? Be as specific as possible using the information you have been given. Compute the posterior distribution by dividing each element of the vector of output produced by the joint function by the integral of the joint function. Plot the posterior as a function of θ.
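A sketch of the specific expression: [θ ∣ y] = [y ∣ θ][θ] / [y] = (∏ᵢ₌₁⁵⁰ Poisson(yᵢ ∣ θ)) gamma(θ ∣ 416.16, 40.8) / [y], with [y] approximated by the Riemann sum above.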

posterior_data <- joint_data / joint_sum # normalize the joint by the marginal [y] so the posterior integrates to 1
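A quick optional check on the normalization: the discretized posterior should integrate to one.

sum(posterior_data) * step # should be very close to 1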


bayes_data <- cbind.data.frame(bayes_data, posterior_data)


ggpost <- ggplot(data = bayes_data) +
  geom_line(mapping = aes(x = theta, y = posterior_data), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Posterior") +
  ylab("["~theta~"|y]") +
  xlab(~theta) +
  theme_bw()
ggpost

Putting it all together

Plot the prior, a histogram of the data, the likelihood, the joint, and the posterior in a multi-panel layout.

grid.arrange(ggprior, gglike, ggjoint, ggpost)

par(mfrow=c(1,2))

hist(y)
discrete.histogram(y)

Plot Prior, Likelihood, and Posterior Together

The likelihood is on a much smaller scale than the prior or posterior in this case. To plot them together, we’ll need to rescale the likelihood. To do so, simply divide each value of the likelihood by its maximum, then multiply each value by the maximum of the posterior.

scaled <- bayes_data$Likelihood_data / max(bayes_data$Likelihood_data) * max(bayes_data$posterior_data) # rescale the likelihood so its peak matches the posterior's peak

bayes_data$scaled_likelihood <- scaled # add to data frame

ggall <- ggplot(data = bayes_data) +
  geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("All Together") +
  ylab("Probability density") +
  xlab(~theta) +
  theme_bw()
ggall

Some Experimenting

Now that you have these lovely functions working and plots emerging from them, do some experiments to understand the effect of prior information on the posterior distribution of θ relative to the effect of the data. Increase the variance of the prior distribution to 2.5. Reduce it to .1. Increase the number of observations from 50 to 100. Examine the overlaid plots you produced above for each case.
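The three experiments below rerun the pipeline by copying the code and changing one input at a time. A small wrapper function would make such reruns less error-prone; here is a sketch (run_bayes is a hypothetical name, not part of the assignment):

run_bayes <- function(y, mu, sigma, step = 0.01){
  theta <- seq(0, 15, step)
  prior.data <- dgamma(theta, mu^2 / sigma^2, mu / sigma^2) # moment-matched gamma prior
  Likelihood_data <- sapply(theta, function(t) prod(dpois(y, t))) # total likelihood at each theta
  joint_data <- prior.data * Likelihood_data # joint [theta, y]
  posterior_data <- joint_data / (sum(joint_data) * step) # normalize by the Riemann-sum marginal [y]
  scaled_likelihood <- Likelihood_data / max(Likelihood_data) * max(posterior_data) # for the overlay plot
  data.frame(theta, prior.data, Likelihood_data, joint_data, posterior_data, scaled_likelihood)
}

For example, set.seed(10); run_bayes(rpois(100, 6.4), 10.2, .5) would reproduce the 100-observation case in one call.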

Increase variance to 2.5

set.seed(10) # for reproducibility 

y <- rpois(50, 6.4) # Random values within Poisson distribution, 50 values, lambda (mean) of 6.4

y # view the 50 simulated data points from the Poisson Distribution with a mean theta of 6.4
##  [1]  6  5  6  8  3  4  5  5  7  6  7  7  3  7  5  6  3  5  6  9  9  7  8  5  6
## [26]  8  9  5  8  5  6  3  4 10  6  8  9 11  7  6  5  4  2  8  5  4  2  6  3  8
mu.prior <- 10.2
sigma.prior <- 2.5

step <- 0.01 # increment
theta <- seq(0, 15, step) # from 0, to 15, by .01

prior <- function(theta, mu, sigma){ # prior density of theta given the prior mean and sd
  shape <- mu^2 / sigma^2 # moment-matched shape
  rate <- mu / sigma^2 # moment-matched rate
  dgamma(theta, shape, rate) # vector of gamma densities, one for each value of theta
}
prior.data <- prior(theta, mu.prior, sigma.prior) # theta, mu = 10.2, sigma = 2.5


bayes_data_25 <- cbind.data.frame(theta, prior.data) # combine theta, prior distribution data




Likelihood_data <- numeric(length(theta)) # vector to hold the total likelihood [y | theta_i]
for (i in 1:length(Likelihood_data)){ # for each value of theta
  Likelihood_data[i] <- prod(dpois(y, theta[i])) # product of the Poisson densities of all observations at theta[i]
}


bayes_data_25 <- cbind.data.frame(bayes_data_25, Likelihood_data)



joint <- function(prior, likelihood){
  prior * likelihood # element-wise product of the prior and likelihood vectors
}

joint_data <- joint(prior.data, Likelihood_data)

bayes_data_25 <- cbind.data.frame(bayes_data_25, joint_data)


joint_sum <- sum(joint_data) * step # Riemann-sum approximation of the integral [y]


posterior_data <- joint_data / joint_sum # normalize the joint by the marginal [y]


bayes_data_25 <- cbind.data.frame(bayes_data_25, posterior_data)



scaled <- bayes_data_25$Likelihood_data / max(bayes_data_25$Likelihood_data) * max(bayes_data_25$posterior_data) # rescale the likelihood so its peak matches the posterior's peak

bayes_data_25$scaled_likelihood <- scaled # add to data frame

ggall <- ggplot(data = bayes_data_25) +
  geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Standard Deviation of 2.5") +
  ylab("Probability density") +
  xlab(~theta) +
  theme_bw()
ggall

Decrease variance to .1

set.seed(10) # for reproducibility 

y <- rpois(50, 6.4) # Random values within Poisson distribution, 50 values, lambda (mean) of 6.4

y # view the 50 simulated data points from the Poisson Distribution with a mean theta of 6.4
##  [1]  6  5  6  8  3  4  5  5  7  6  7  7  3  7  5  6  3  5  6  9  9  7  8  5  6
## [26]  8  9  5  8  5  6  3  4 10  6  8  9 11  7  6  5  4  2  8  5  4  2  6  3  8
mu.prior <- 10.2
sigma.prior <- .1

step <- 0.01 # increment
theta <- seq(0, 15, step) # from 0, to 15, by .01

prior <- function(theta, mu, sigma){ # prior density of theta given the prior mean and sd
  shape <- mu^2 / sigma^2 # moment-matched shape
  rate <- mu / sigma^2 # moment-matched rate
  dgamma(theta, shape, rate) # vector of gamma densities, one for each value of theta
}
prior.data <- prior(theta, mu.prior, sigma.prior) # theta, mu = 10.2, sigma = .1


bayes_data_01 <- cbind.data.frame(theta, prior.data) # combine theta, prior distribution data




Likelihood_data <- numeric(length(theta)) # vector to hold the total likelihood [y | theta_i]
for (i in 1:length(Likelihood_data)){ # for each value of theta
  Likelihood_data[i] <- prod(dpois(y, theta[i])) # product of the Poisson densities of all observations at theta[i]
}


bayes_data_01 <- cbind.data.frame(bayes_data_01, Likelihood_data)



joint <- function(prior, likelihood){
  prior * likelihood # element-wise product of the prior and likelihood vectors
}

joint_data <- joint(prior.data, Likelihood_data)

bayes_data_01 <- cbind.data.frame(bayes_data_01, joint_data)


joint_sum <- sum(joint_data) * step # Riemann-sum approximation of the integral [y]


posterior_data <- joint_data / joint_sum # normalize the joint by the marginal [y]


bayes_data_01 <- cbind.data.frame(bayes_data_01, posterior_data)



scaled <- bayes_data_01$Likelihood_data / max(bayes_data_01$Likelihood_data) * max(bayes_data_01$posterior_data) # rescale the likelihood so its peak matches the posterior's peak

bayes_data_01$scaled_likelihood <- scaled # add to data frame

ggall <- ggplot(data = bayes_data_01) +
  geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("Standard Deviation of 0.1") +
  ylab("Probability density") +
  xlab(~theta) +
  theme_bw()
ggall

Increase observations to 100

set.seed(10) # for reproducibility 

y <- rpois(100, 6.4) # Random values within Poisson distribution, 100 values, lambda (mean) of 6.4

y # view the 100 simulated data points from the Poisson Distribution with a mean theta of 6.4
##   [1]  6  5  6  8  3  4  5  5  7  6  7  7  3  7  5  6  3  5  6  9  9  7  8  5  6
##  [26]  8  9  5  8  5  6  3  4 10  6  8  9 11  7  6  5  4  2  8  5  4  2  6  3  8
##  [51]  5 10  5  6  4  7  6  6  6  6  2  3  6  6  9  8  7  6  3  4  3  6  7  6  2
##  [76]  7  5 11  5  4  9  6  4  7  4  2  8  5  4  4  6  8  7  6  5  9  5  6  8  2
mu.prior <- 10.2
sigma.prior <- .5

step <- 0.01 # increment
theta <- seq(0, 15, step) # from 0, to 15, by .01

prior <- function(theta, mu, sigma){ # prior density of theta given the prior mean and sd
  shape <- mu^2 / sigma^2 # moment-matched shape
  rate <- mu / sigma^2 # moment-matched rate
  dgamma(theta, shape, rate) # vector of gamma densities, one for each value of theta
}
prior.data <- prior(theta, mu.prior, sigma.prior) # theta, mu = 10.2, sigma = .5


bayes_data_100 <- cbind.data.frame(theta, prior.data) # combine theta, prior distribution data




Likelihood_data <- numeric(length(theta)) # vector to hold the total likelihood [y | theta_i]
for (i in 1:length(Likelihood_data)){ # for each value of theta
  Likelihood_data[i] <- prod(dpois(y, theta[i])) # product of the Poisson densities of all observations at theta[i]
}


bayes_data_100 <- cbind.data.frame(bayes_data_100, Likelihood_data)



joint <- function(prior, likelihood){
  prior * likelihood # element-wise product of the prior and likelihood vectors
}

joint_data <- joint(prior.data, Likelihood_data)

bayes_data_100 <- cbind.data.frame(bayes_data_100, joint_data)


joint_sum <- sum(joint_data) * step # Riemann-sum approximation of the integral [y]


posterior_data <- joint_data / joint_sum # normalize the joint by the marginal [y]


bayes_data_100 <- cbind.data.frame(bayes_data_100, posterior_data)



scaled <- bayes_data_100$Likelihood_data / max(bayes_data_100$Likelihood_data) * max(bayes_data_100$posterior_data) # rescale the likelihood so its peak matches the posterior's peak

bayes_data_100$scaled_likelihood <- scaled # add to data frame

ggall <- ggplot(data = bayes_data_100) +
  geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
  geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  ggtitle("100 Observations") +
  ylab("Probability density") +
  xlab(~theta) +
  theme_bw()
ggall