Bayesian analysis is flexible and intuitive because it applies the same approach to all problems: we learn about unobserved quantities from observed ones using the laws of probability as an inferential scaffolding. You derived Bayes’ Theorem in lecture from those basic laws. We learned that all Bayesian models have the same parts: the posterior distribution, the joint distribution composed of the likelihood(s) and priors, and the marginal probability of the data, which serves as a normalizing constant after the data are collected. We will exploit the proportionality between the posterior and the joint to allow us to dispense with the marginal probability of the data using simulation methods for most of this course. However, it is vital that you understand the relationship among the components of Bayes’ Theorem before we do that. This understanding is the foundation for all of the powerful methods that follow. It is what sets Bayesian analysis apart from maximum likelihood.
This problem is a bit of a toy because you will never estimate posterior distributions this way, at least I never have. But, it is an instructive toy. Reliably using Bayesian methods requires understanding how Bayes’ theorem works. The best way to gain this understanding is to compute a posterior distribution from its component parts, the prior, the likelihood, and the marginal distribution of the data. It helps you understand how you multiply and divide by statistical distributions, something you probably don’t do every morning after coffee. You will be thrilled to know that you only need to do this once.
To start, take out your notes on Bayes’ theorem and go through its explanation, particularly the color-coded section on the likelihood, the prior, the joint, and the probability of the data. Keep the equation in front of you as you do this exercise. Think about what each piece means as you write your code.
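For reference, Bayes’ theorem for this exercise is [θ∣y] = [y∣θ][θ] / [y], where the likelihood [y∣θ] and the prior [θ] multiply to form the joint distribution in the numerator, and the marginal probability of the data, [y] = ∫[y∣θ][θ]dθ, is the normalizing constant in the denominator.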
The key to success is to create code that does the following: 1) creates vectors using the probability density functions in the numerator of Bayes’ theorem (the likelihood and the prior), 2) multiplies these vectors to obtain the joint distribution of the parameters and the data, 3) integrates the joint distribution and 4) divides the joint distribution by its integral. Voila.
You are interested in estimating the posterior distribution for the mean number of individuals of an invasive plant species per square meter in a disturbed grassland. We will call that mean θ. You have prior information telling you that the average number of these plants is 10.2 m⁻² with a standard deviation of the mean of 0.5. You have a set of fifty observations in hand obtained by sweaty labor in the field. Execute the following steps.
set.seed(10) # for reproducibility
y <- rpois(50, 6.4) # draw 50 random values from a Poisson distribution with mean (lambda) 6.4
y # view the 50 simulated data points from the Poisson Distribution with a mean theta of 6.4
## [1] 6 5 6 8 3 4 5 5 7 6 7 7 3 7 5 6 3 5 6 9 9 7 8 5 6
## [26] 8 9 5 8 5 6 3 4 10 6 8 9 11 7 6 5 4 2 8 5 4 2 6 3 8
hist(y, main = "Histogram of Poisson Data")
discrete.hist(y)
Set the prior parameters given in the problem statement: mu = 10.2, sd = 0.5.
mu.prior <- 10.2
sigma.prior <- .5
step <- 0.01 # increment
theta <- seq(0, 15, step) # from 0, to 15, by .01
Write the mathematical expression for a gamma prior on θ. Be as specific as possible given the information at hand. Next, write an R function for the prior on θ. The function for the prior should return a vector of gamma probability densities, one for each value of θ. It should have arguments 1) the vector for θ you created in the previous step as well as 2) the prior mean and 3) the prior standard deviation. The mean and the standard deviation, of course, will need to be moment-matched to the proper parameters of the gamma distribution in your function. Recall that when a function is composed of a single statement as it is here, the statement can simply follow the function template on the same line; curly brackets are not needed. So, in this case mu.prior = 10.2 and sigma.prior = .5. You could hard-code these in the function template, but that is bad practice.
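For reference, moment matching converts the prior mean and standard deviation to gamma parameters: shape α = μ²/σ² = 10.2²/0.5² = 416.16 and rate β = μ/σ² = 10.2/0.5² = 40.8, so the prior is [θ] = gamma(θ∣416.16, 40.8).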
prior <- function(theta, mu, sigma){ # gamma prior density over the grid of theta values
shape <- mu^2 / sigma^2 # moment-matched shape
rate <- mu / sigma^2 # moment-matched rate
dgamma(theta, shape, rate) # return the gamma density at each value of theta
}
prior.data <- prior(theta, mu.prior, sigma.prior) # theta, mu = 10.2, sigma = .5
Plot the prior distribution of θ, the probability density of θ as a function of the values of θ.
# start creating data frame for ggplot2
bayes_data <- cbind.data.frame(theta, prior.data) # combine theta, prior distribution data
ggprior <- ggplot(data = bayes_data) +
geom_line(mapping = aes(x = theta, y = prior.data), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Prior") +
ylab("["~theta~"]") +
xlab(~theta) +
theme_bw()
ggprior
Check your moment matching by generating 100,000 random variates from a gamma distribution with parameters matched to the prior mean and standard deviation. Now compute the mean and standard deviation of the random variates. They should be very close to 10.2 and .5. What is this computation called?
shape <- (mu.prior^2/sigma.prior^2) # Moment matching shape
rate <- (mu.prior/sigma.prior^2) # Moment matching rate
rand_gamma <- rgamma(100000, shape, rate) # 100,000 random variates from gamma distribution with moment matching shape and rate from provided mu and sigma
mean(rand_gamma)
## [1] 10.19569
sd(rand_gamma)
## [1] 0.5000646
This computation is called Monte Carlo integration, carried out by simulation: rather than evaluating an integral analytically, we approximate properties of a distribution, here its mean and standard deviation, by computing the sample mean and sample standard deviation of a large number of random draws from it.
What is the mathematical expression for the likelihood [y∣θ], assuming that the data are conditionally independent? Be as specific as possible using the information at hand. Write an R function for the likelihood. The function must use all 50 observations to compute the total likelihood across all of the data points (not the log likelihood) for each value of the vector θ. It should have arguments for the vector θ and the data. The function should create and return a vector with elements [y∣θi]. Note that this is the total probability density of all of the data conditional on each value of θi, not the probability density of a single data point. In reality, θ is a continuous random variable, the mean of the Poisson distribution. We are discretizing it here into small intervals. The function template will be something like:
likelihood <- function(theta, y){ # Likelihood(theta, y)
sapply(theta, function(t) prod(dpois(y, t))) # total likelihood: the product of the Poisson probabilities of all observations at each value of theta
}
Likelihood_data <- likelihood(theta, y) # evaluate the total likelihood over the grid of theta values
bayes_data <- cbind.data.frame(bayes_data, Likelihood_data)
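As specifically as possible, assuming conditional independence, the likelihood is the product of Poisson probability mass functions over the 50 observations: [y∣θ] = ∏ᵢ₌₁⁵⁰ Poisson(yᵢ∣θ) = ∏ᵢ₌₁⁵⁰ θ^yᵢ e^−θ / yᵢ!, evaluated at each value of θ on the grid.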
Plot the likelihood [y∣θi] holding the data constant and varying θ. What is this plot called? Can you say anything about the area under the curve? What happens to the inference we can make based on likelihood alone if we multiply the curve by a constant?
gglike <- ggplot(data = bayes_data) +
geom_line(mapping = aes(x = theta, y = Likelihood_data), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Likelihood") +
ylab("[y |"~theta~"]") +
xlab(~theta) +
theme_bw()
gglike
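This plot is called a likelihood profile. The area under the curve is not 1, and need not be anything in particular, because the likelihood is not a probability density in θ. Multiplying the curve by a constant changes nothing about the relative support the data give to different values of θ, so inference based on likelihood alone is unaffected. A quick check of the area, not part of the exercise:
sum(Likelihood_data) * step # Riemann-sum area under the likelihood curve; a very small number, not 1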
What is the mathematical expression for the joint distribution [θ,y]? Your answer should be as specific as possible. I am not looking for the non-specific equation [θ,y]=[y|θ][θ]. Create an R function for the joint distribution of the parameters and the data as the product of the prior and the likelihood functions. Call this function joint. The function should simply call the previous two functions and multiply them. Plot joint(theta) as a function of theta. Does this seem reasonable? Why are the values on the y axis so small? Think about what is going on here.
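As specifically as possible, [θ, y] = ∏ᵢ₌₁⁵⁰ Poisson(yᵢ∣θ) × gamma(θ∣416.16, 40.8). The values on the y axis are so small because the joint multiplies together 50 Poisson probabilities, each less than 1, along with the prior density, so the product is a very small number.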
joint <- function(prior, likelihood){ # joint distribution of the parameters and the data
prior * likelihood # element-wise product over the grid of theta values
}
joint_data <- joint(prior.data, Likelihood_data)
bayes_data <- cbind.data.frame(bayes_data, joint_data)
ggjoint <- ggplot(data = bayes_data) +
geom_line(mapping = aes(x = theta, y = joint_data), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Joint") +
ylab("["~theta~", y]") +
xlab(~theta) +
theme_bw()
ggjoint
What is the mathematical expression for the marginal probability of the data [y] ? Again, be as specific as possible with the information you have been given. Approximate the marginal probability of the data, the integral of the likelihood multiplied by the prior, to obtain a normalization constant [y]. How would you accomplish this integration? (Hint–Recall the first principles definition of the definite integral.) Explain the output of this integration, a scalar. Why do we call [y] a “distribution” if it evaluates to a scalar?
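Specifically, [y] = ∫₀^∞ ∏ᵢ₌₁⁵⁰ Poisson(yᵢ∣θ) gamma(θ∣416.16, 40.8) dθ. Following the first-principles definition of the definite integral, we approximate it with a Riemann sum over the grid: multiply the joint at each grid point by the increment Δθ = 0.01 and sum, i.e., [y] ≈ Σⱼ [y∣θⱼ][θⱼ]Δθ. The output is a scalar because θ has been integrated out. We still call [y] a distribution because, viewed before the data are fixed, it assigns a probability to every possible data set; evaluated at the data we actually observed, it collapses to a single number.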
marginal_y <- sum(joint_data) * step # Riemann sum: the joint at each grid point times the grid increment approximates the integral [y]
What is the mathematical expression for the posterior distribution [θ∣y]? Be as specific as possible using the information you have been given. Compute the posterior distribution by dividing each element of the vector of output produced by the joint function by the integral of the joint function. Plot the posterior as a function of θ.
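As specifically as possible, [θ∣y] = ∏ᵢ₌₁⁵⁰ Poisson(yᵢ∣θ) gamma(θ∣416.16, 40.8) / [y]. Because the gamma prior is conjugate to the Poisson likelihood, this posterior also has the closed form gamma(θ∣416.16 + Σyᵢ, 40.8 + 50), which we can use below to check the grid approximation.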
posterior_data <- joint_data / marginal_y # divide the joint by its integral so the posterior integrates to 1
bayes_data <- cbind.data.frame(bayes_data, posterior_data)
ggpost <- ggplot(data = bayes_data) +
geom_line(mapping = aes(x = theta, y = posterior_data), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Posterior") +
ylab( "["~theta~"|y]") +
xlab(~theta) +
theme_bw()
ggpost
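As a sanity check, not required by the exercise, we can compare the grid approximation of the posterior to the exact conjugate posterior noted above, reusing the moment-matched shape and rate computed earlier:
posterior_exact <- dgamma(theta, shape + sum(y), rate + length(y)) # exact gamma-Poisson conjugate posterior
max(abs(posterior_exact - bayes_data$posterior_data)) # difference should be very small
sum(bayes_data$posterior_data) * step # the grid posterior integrates to ~1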
Plot the prior, a histogram of the data, the likelihood, the joint, and the posterior in a six panel layout.
# Build a ggplot histogram of the data so that all panels can share a single layout
gghist <- ggplot(data = data.frame(y = y), aes(x = y)) +
geom_histogram(binwidth = 1, colour = "black", fill = "grey") +
ggtitle("Data") +
theme_bw()
grid.arrange(ggprior, gghist, gglike, ggjoint, ggpost, ncol = 2) # six-panel layout; the last cell is left empty
The likelihood is on a much smaller scale than the prior or posterior in this case, because it is a product of 50 probabilities. To plot all three together, we’ll need to rescale the likelihood. To do so, simply divide each value of the likelihood by the maximum likelihood, then multiply each value by the maximum value of the posterior.
# Rescale the likelihood to the height of the posterior for plotting
bayes_data$scaled_likelihood <- bayes_data$Likelihood_data / max(bayes_data$Likelihood_data) * max(bayes_data$posterior_data)
ggall <- ggplot(data = bayes_data) +
geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("All Together") +
ylab("Probability density") +
xlab(~theta) +
theme_bw()
ggall
Now that you have these lovely functions working and plots emerging from them, do some experiments to understand the effect of prior information on the posterior distribution of θ relative to the effect of the data. Increase the standard deviation of the prior distribution to 2.5. Reduce it to 0.1. Increase the number of observations from 50 to 100. Examine the overlaid plots you produced above for each case.
# Experiment 1: increase the prior standard deviation from 0.5 to 2.5
set.seed(10) # for reproducibility
y <- rpois(50, 6.4) # the same 50 simulated observations as above
mu.prior <- 10.2
sigma.prior <- 2.5
prior.data <- prior(theta, mu.prior, sigma.prior) # reuse prior() with the wider prior
Likelihood_data <- likelihood(theta, y) # unchanged; the data are the same
joint_data <- joint(prior.data, Likelihood_data)
marginal_y <- sum(joint_data) * step # Riemann-sum approximation of [y]
posterior_data <- joint_data / marginal_y
bayes_data_sd25 <- data.frame(theta, prior.data, Likelihood_data, joint_data, posterior_data)
bayes_data_sd25$scaled_likelihood <- Likelihood_data / max(Likelihood_data) * max(posterior_data)
ggall <- ggplot(data = bayes_data_sd25) +
geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Prior Standard Deviation of 2.5") +
ylab("Probability density") +
xlab(~theta) +
theme_bw()
ggall
# Experiment 2: reduce the prior standard deviation to 0.1; the data and likelihood are unchanged
sigma.prior <- .1
prior.data <- prior(theta, mu.prior, sigma.prior) # reuse prior() with the narrower prior
joint_data <- joint(prior.data, Likelihood_data)
marginal_y <- sum(joint_data) * step
posterior_data <- joint_data / marginal_y
bayes_data_sd01 <- data.frame(theta, prior.data, Likelihood_data, joint_data, posterior_data)
bayes_data_sd01$scaled_likelihood <- Likelihood_data / max(Likelihood_data) * max(posterior_data)
ggall <- ggplot(data = bayes_data_sd01) +
geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("Prior Standard Deviation of 0.1") +
ylab("Probability density") +
xlab(~theta) +
theme_bw()
ggall
# Experiment 3: return the prior standard deviation to 0.5 and increase the sample size from 50 to 100
set.seed(10) # for reproducibility
y <- rpois(100, 6.4) # draw 100 random values from a Poisson distribution with mean (lambda) 6.4
y # view the 100 simulated data points
## [1] 6 5 6 8 3 4 5 5 7 6 7 7 3 7 5 6 3 5 6 9 9 7 8 5 6
## [26] 8 9 5 8 5 6 3 4 10 6 8 9 11 7 6 5 4 2 8 5 4 2 6 3 8
## [51] 5 10 5 6 4 7 6 6 6 6 2 3 6 6 9 8 7 6 3 4 3 6 7 6 2
## [76] 7 5 11 5 4 9 6 4 7 4 2 8 5 4 4 6 8 7 6 5 9 5 6 8 2
mu.prior <- 10.2
sigma.prior <- .5
prior.data <- prior(theta, mu.prior, sigma.prior) # back to the original prior
Likelihood_data <- likelihood(theta, y) # the total likelihood now uses all 100 observations
joint_data <- joint(prior.data, Likelihood_data)
marginal_y <- sum(joint_data) * step
posterior_data <- joint_data / marginal_y
bayes_data_n100 <- data.frame(theta, prior.data, Likelihood_data, joint_data, posterior_data)
bayes_data_n100$scaled_likelihood <- Likelihood_data / max(Likelihood_data) * max(posterior_data)
ggall <- ggplot(data = bayes_data_n100) +
geom_line(mapping = aes(x = theta, y = posterior_data, col = "Posterior"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = scaled_likelihood, col = "Likelihood"), linewidth = 2) +
geom_line(mapping = aes(x = theta, y = prior.data, col = "Prior"), linewidth = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
ggtitle("100 Observations") +
ylab("Probability density") +
xlab(~theta) +
theme_bw()
ggall