# Load ggplot2 for the density plots
library(ggplot2)
# Sample 10000 draws from the Beta(45,55) prior
prior_A <- rbeta(n = 10000, shape1 = 45, shape2 = 55)
# Store the results in a data frame
prior_sim <- data.frame(prior_A)
# Construct a density plot of the prior sample
ggplot(prior_sim, aes(x = prior_A)) +
    geom_density()
# Sample 10000 draws from the Beta(1,1) prior
prior_B <- rbeta(n = 10000, shape1 = 1, shape2 = 1)
# Sample 10000 draws from the Beta(100,100) prior
prior_C <- rbeta(n = 10000, shape1 = 100, shape2 = 100)
# Combine the results in a single data frame
prior_sim <- data.frame(samples = c(prior_A, prior_B, prior_C),
    priors = rep(c("A","B","C"), each = 10000))
# Plot the 3 priors
ggplot(prior_sim, aes(x = samples, fill = priors)) +
    geom_density(alpha = 0.5)
These density plots illustrate three potential prior models for p, the underlying proportion of voters who plan to vote for you. Prior A reflects your original Beta(45,55) prior. In what scenarios would Prior B (Beta(1,1)) or Prior C (Beta(100,100)) be more appropriate?
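One way to compare the three priors without relying on the plots is to compute each prior's mean and standard deviation from the standard Beta(a, b) formulas. The sketch below is an illustration only and is not part of the original exercise; the helper `beta_summary` and the data frame names are hypothetical.

```r
# Hypothetical helper: mean and standard deviation of a Beta(a, b) distribution
beta_summary <- function(a, b) {
  mean_p <- a / (a + b)
  sd_p <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
  c(mean = mean_p, sd = sd_p)
}

# Shape parameters of the three priors from the exercise
priors <- data.frame(prior = c("A", "B", "C"),
                     a = c(45, 1, 100),
                     b = c(55, 1, 100))

# Apply the helper to each prior and attach the results as columns
prior_summary <- cbind(priors, t(mapply(beta_summary, priors$a, priors$b)))
print(prior_summary)
```

Prior B is flat over the whole interval from 0 to 1 and has the largest standard deviation, so it suits a situation with little prior information about p. Prior C has the smallest standard deviation and is centered at 0.5, so it suits a situation with strong prior evidence that support is close to 50%.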
# Load ggridges for geom_density_ridges() (ggplot2 is assumed to be loaded already)
library(ggridges)
# Define a vector of 1000 p values
p_grid <- seq(from = 0, to = 1, length.out = 1000)
# Simulate 1 poll result for each p in p_grid
poll_result <- rbinom(n = 1000, size = 10, prob = p_grid)
# Create likelihood_sim data frame
likelihood_sim <- data.frame(p_grid, poll_result)
# Density plots of p_grid grouped by poll_result
ggplot(likelihood_sim, aes(x = p_grid, y = poll_result, group = poll_result)) +
    geom_density_ridges()
## Picking joint bandwidth of 0.0399
# Density plots of p_grid grouped by poll_result, highlighting poll_result == 6
ggplot(likelihood_sim, aes(x = p_grid, y = poll_result, group = poll_result, fill = poll_result == 6)) +
    geom_density_ridges()
## Picking joint bandwidth of 0.0399
In the previous exercise you approximated the likelihood function shown below. The likelihood highlights the relative compatibility of different possible values of p, your underlying election support, with the observed poll in which X=6 of n=10 voters supported you. Specifically, the height of the likelihood function at any given value of p reflects the relative plausibility of observing these particular polling data if your underlying support were equal to p. Thus which of the following two scenarios is more compatible with the poll data?
Scenario 1: your underlying support p is around 45%.
Scenario 2: your underlying support p is around 55%.
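The two scenarios can also be compared directly with the exact binomial likelihood instead of the simulated approximation. The following is a minimal sketch using base R's `dbinom()`; the variable names are illustrative and not taken from the original exercise.

```r
# Observed poll: X = 6 supporters out of n = 10 voters
X <- 6
n <- 10

# Exact binomial likelihood of the observed poll under each scenario
lik_45 <- dbinom(X, size = n, prob = 0.45)
lik_55 <- dbinom(X, size = n, prob = 0.55)

# A ratio above 1 means the poll is more compatible with p = 0.55
lik_ratio <- lik_55 / lik_45
print(c(p_0.45 = lik_45, p_0.55 = lik_55, ratio = lik_ratio))
```

The likelihood is roughly 0.16 at p = 0.45 and roughly 0.24 at p = 0.55, so the observed poll is more compatible with Scenario 2.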
# Load rjags (requires a local JAGS installation)
library(rjags)

# DEFINE the model
vote_model <- "model{
    # Likelihood model for X
    X ~ dbin(p, n)

    # Prior model for p
    p ~ dbeta(a, b)
}"

# COMPILE the model
vote_jags <- jags.model(textConnection(vote_model),
    data = list(a = 45, b = 55, X = 6, n = 10),
    inits = list(.RNG.name = "base::Wichmann-Hill", .RNG.seed = 100))

# SIMULATE the posterior
vote_sim <- coda.samples(model = vote_jags, variable.names = c("p"), n.iter = 10000)

# PLOT the posterior
plot(vote_sim, trace = FALSE)
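Because the Beta prior is conjugate to the binomial likelihood, this particular posterior also has a closed form, Beta(a + X, b + n - X), which offers a way to sanity-check the simulated posterior. The sketch below is not part of the original exercise; it uses only base R, and the variable names are illustrative.

```r
# Prior Beta(a, b) and observed poll, matching the data passed to the model above
a <- 45
b <- 55
X <- 6
n <- 10

# Conjugate update: the posterior is Beta(a + X, b + n - X)
post_a <- a + X
post_b <- b + (n - X)

# Exact posterior mean and a central 95% credible interval
post_mean <- post_a / (post_a + post_b)
post_ci <- qbeta(c(0.025, 0.975), shape1 = post_a, shape2 = post_b)

print(c(shape1 = post_a, shape2 = post_b, mean = post_mean))
print(post_ci)
```

Here the exact posterior is Beta(51, 59) with mean 51/110, about 0.464. If the JAGS simulation has been run, the mean reported by `summary(vote_sim)` should be close to this value.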
Michael is a hybrid thinker and doer—a byproduct of being a StrengthsFinder “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.