Exercise 4.17: Different Data, Different Posteriors

This exercise examines how three employees who share an identical prior develop different posterior beliefs about an internet ad's click probability, each based on their own dataset.

Part (a): Prior Distribution Analysis

Let me first create a function to visualize the Beta distribution and analyze the prior Beta(4,3):

# Base R graphics are used throughout, so no additional packages are required

# Custom function to create beta plots with enhanced styling
create_beta_plot <- function(alpha, beta, plot_title = "", color_scheme = "steelblue") {
  x_values <- seq(0, 1, by = 0.001)
  y_values <- dbeta(x_values, alpha, beta)
  
  plot(x_values, y_values, type = "l", lwd = 3, col = color_scheme,
       main = plot_title, xlab = "π (Click Probability)", 
       ylab = "Probability Density", cex.main = 1.2)
  grid(col = "lightgray", lty = 2)
}

# Visualize the prior distribution
create_beta_plot(4, 3, "Prior Distribution: Beta(4, 3)", "darkblue")

# Calculate prior statistics
alpha_prior <- 4
beta_prior <- 3
prior_expectation <- alpha_prior / (alpha_prior + beta_prior)
prior_variance <- (alpha_prior * beta_prior) / ((alpha_prior + beta_prior)^2 * (alpha_prior + beta_prior + 1))
prior_std_dev <- sqrt(prior_variance)

cat("Prior Distribution Analysis:\n")
## Prior Distribution Analysis:
cat("Expected value (mean):", round(prior_expectation, 4), "\n")
## Expected value (mean): 0.5714
cat("Standard deviation:", round(prior_std_dev, 4), "\n")
## Standard deviation: 0.175
cat("Variance:", round(prior_variance, 6), "\n")
## Variance: 0.030612

Interpretation of Prior: The Beta(4,3) distribution represents the employees’ shared belief that the click probability is around 0.57 on average (with a mode of 0.6), together with reasonable uncertainty. Because α > β, the distribution is mildly left-skewed, placing slightly more mass on higher values of π and reflecting moderate optimism about the ad’s performance. Informally, this prior carries the weight of about 7 pseudo-trials, 4 of which were clicks.
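
A quick computation backs up this reading, using the standard mode and skewness formulas for a Beta(α, β) distribution (a minimal sketch; the variable names prior_mode and prior_skew are mine):

# Mode of Beta(a, b) for a, b > 1: (a - 1) / (a + b - 2)
prior_mode <- (alpha_prior - 1) / (alpha_prior + beta_prior - 2)

# Skewness of Beta(a, b): 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))
prior_skew <- 2 * (beta_prior - alpha_prior) * sqrt(alpha_prior + beta_prior + 1) /
  ((alpha_prior + beta_prior + 2) * sqrt(alpha_prior * beta_prior))

round(c(mode = prior_mode, skewness = prior_skew), 3)
##     mode skewness 
##    0.600   -0.181

The negative skewness confirms the mild left skew, and the mode of 0.6 matches the peak visible in the prior plot.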

Part (b): Deriving Posterior Distributions

Using the Beta-Binomial conjugacy property, I’ll derive each employee’s posterior distribution step by step:

# Employee datasets
employee_data <- data.frame(
  Employee = c("Employee 1", "Employee 2", "Employee 3"),
  Trials = c(1, 10, 100),
  Successes = c(0, 3, 20),
  Failures = c(1, 7, 80)
)

print("Dataset Summary:")
## [1] "Dataset Summary:"
print(employee_data)
##     Employee Trials Successes Failures
## 1 Employee 1      1         0        1
## 2 Employee 2     10         3        7
## 3 Employee 3    100        20       80
# Derive posteriors using conjugacy: Beta(α + successes, β + failures)
# Employee 1: Beta(4 + 0, 3 + 1) = Beta(4, 4)
employee1_post <- c(4, 4)

# Employee 2: Beta(4 + 3, 3 + 7) = Beta(7, 10)
employee2_post <- c(7, 10)

# Employee 3: Beta(4 + 20, 3 + 80) = Beta(24, 83)
employee3_post <- c(24, 83)

cat("\nPosterior Distributions (manually derived):\n")
## 
## Posterior Distributions (manually derived):
cat("Employee 1: Beta(", employee1_post[1], ",", employee1_post[2], ")\n")
## Employee 1: Beta( 4 , 4 )
cat("Employee 2: Beta(", employee2_post[1], ",", employee2_post[2], ")\n")
## Employee 2: Beta( 7 , 10 )
cat("Employee 3: Beta(", employee3_post[1], ",", employee3_post[2], ")\n")
## Employee 3: Beta( 24 , 83 )
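
As a sanity check on the conjugacy shortcut, Bayes' rule can also be approximated directly on a grid; for Employee 2 this should recover the same Beta(7, 10) density (a minimal sketch, with an arbitrary grid resolution):

# Grid approximation of Employee 2's posterior: prior x likelihood, renormalized
pi_grid <- seq(0.001, 0.999, length.out = 999)
grid_step <- pi_grid[2] - pi_grid[1]
unnormalized <- dbeta(pi_grid, 4, 3) * dbinom(3, 10, pi_grid)
posterior_grid <- unnormalized / (sum(unnormalized) * grid_step)

# Largest discrepancy from the conjugate answer; should be very small
max(abs(posterior_grid - dbeta(pi_grid, 7, 10)))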

Part (c): Comprehensive Bayesian Visualization

I’ll create detailed plots showing the prior, likelihood, and posterior for each employee:

# Function to create comprehensive Bayesian plots
bayesian_analysis_plot <- function(prior_a, prior_b, n_trials, successes, employee_name) {
  pi_range <- seq(0, 1, length.out = 500)
  
  # Prior density
  prior_density <- dbeta(pi_range, prior_a, prior_b)
  
  # Likelihood function (Binomial)
  likelihood <- dbinom(successes, n_trials, pi_range)
  # Normalize likelihood for visualization
  likelihood_normalized <- likelihood / max(likelihood) * max(prior_density)
  
  # Posterior density
  post_a <- prior_a + successes
  post_b <- prior_b + (n_trials - successes)
  posterior_density <- dbeta(pi_range, post_a, post_b)
  
  # Create the plot
  plot(pi_range, prior_density, type = "l", lwd = 2, col = "blue",
       main = paste("Bayesian Analysis -", employee_name),
       xlab = "π", ylab = "Density", ylim = c(0, max(c(prior_density, posterior_density))))
  
  lines(pi_range, likelihood_normalized, lwd = 2, col = "red", lty = 2)
  lines(pi_range, posterior_density, lwd = 3, col = "green")
  
  legend("topright", legend = c("Prior", "Likelihood (scaled)", "Posterior"),
         col = c("blue", "red", "green"), lwd = c(2, 2, 3), lty = c(1, 2, 1))
  grid(col = "lightgray", lty = 3)
}

# Create plots for each employee
par(mfrow = c(2, 2))
bayesian_analysis_plot(4, 3, 1, 0, "Employee 1 (0/1 clicks)")
bayesian_analysis_plot(4, 3, 10, 3, "Employee 2 (3/10 clicks)")
bayesian_analysis_plot(4, 3, 100, 20, "Employee 3 (20/100 clicks)")

# Reset plotting parameters
par(mfrow = c(1, 1))

Part (d): Posterior Analysis and Comparison

# Function to compute beta distribution statistics
compute_beta_statistics <- function(alpha, beta) {
  mean_val <- alpha / (alpha + beta)
  variance_val <- (alpha * beta) / ((alpha + beta)^2 * (alpha + beta + 1))
  std_dev_val <- sqrt(variance_val)
  
  return(list(mean = mean_val, variance = variance_val, std_dev = std_dev_val))
}

# Calculate statistics for all posteriors
emp1_stats <- compute_beta_statistics(4, 4)
emp2_stats <- compute_beta_statistics(7, 10)
emp3_stats <- compute_beta_statistics(24, 83)

# Create comparison table
results_summary <- data.frame(
  Employee = c("Employee 1", "Employee 2", "Employee 3"),
  Posterior = c("Beta(4,4)", "Beta(7,10)", "Beta(24,83)"),
  Mean = c(emp1_stats$mean, emp2_stats$mean, emp3_stats$mean),
  Std_Dev = c(emp1_stats$std_dev, emp2_stats$std_dev, emp3_stats$std_dev),
  Precision = c(1/emp1_stats$variance, 1/emp2_stats$variance, 1/emp3_stats$variance)
)

print("Posterior Comparison Summary:")
## [1] "Posterior Comparison Summary:"
print(round(results_summary[,3:5], 4))
##     Mean Std_Dev Precision
## 1 0.5000  0.1667   36.0000
## 2 0.4118  0.1160   74.3143
## 3 0.2243  0.0401  620.7289
# Overlay plot comparing all posteriors
pi_vals <- seq(0, 1, length.out = 1000)

plot(pi_vals, dbeta(pi_vals, 4, 3), type = "l", lwd = 2, col = "black",
     main = "Comparison: Prior vs. All Posterior Distributions",
     xlab = "π (Click Probability)", ylab = "Density")

lines(pi_vals, dbeta(pi_vals, 4, 4), lwd = 2, col = "orange")
lines(pi_vals, dbeta(pi_vals, 7, 10), lwd = 2, col = "purple")
lines(pi_vals, dbeta(pi_vals, 24, 83), lwd = 2, col = "red")

legend("topright", 
       legend = c("Prior Beta(4,3)", "Employee 1 Beta(4,4)", "Employee 2 Beta(7,10)", "Employee 3 Beta(24,83)"),
       col = c("black", "orange", "purple", "red"), lwd = 2)
grid(col = "lightgray", lty = 2)

Analysis Summary:

  • Employee 1: The single non-click shifted the belief toward a balanced view (mean = 0.5) but with substantial uncertainty (SD = 0.167)
  • Employee 2: The 30% success rate pulled the estimate down to 0.412 with moderate precision
  • Employee 3: The large sample size (100 trials) dominated the prior, yielding a precise estimate of 0.224 with very low uncertainty (see the credible-interval check below)
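
To make the precision contrast concrete, equal-tailed 95% posterior credible intervals can be read off with qbeta (a quick sketch; Employee 3's interval is far narrower than the others, consistent with the precision column above):

# Equal-tailed 95% credible intervals for each posterior
rbind(
  Employee_1 = qbeta(c(0.025, 0.975), 4, 4),
  Employee_2 = qbeta(c(0.025, 0.975), 7, 10),
  Employee_3 = qbeta(c(0.025, 0.975), 24, 83)
)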

Exercise 4.18: Sequential Learning Employee

Part (a): Day-by-Day Sequential Updates

The fourth employee updates their beliefs sequentially as they access each dataset:

# Sequential updating process
initial_belief <- c(4, 3)  # Prior Beta(4, 3)

# Day 1: Incorporate Employee 1's data (0 successes, 1 failure)
day1_belief <- c(initial_belief[1] + 0, initial_belief[2] + 1)
cat("Day 1 Posterior: Beta(", day1_belief[1], ",", day1_belief[2], ")\n")
## Day 1 Posterior: Beta( 4 , 4 )
# Day 2: Incorporate Employee 2's data (3 successes, 7 failures)
day2_belief <- c(day1_belief[1] + 3, day1_belief[2] + 7)
cat("Day 2 Posterior: Beta(", day2_belief[1], ",", day2_belief[2], ")\n")
## Day 2 Posterior: Beta( 7 , 11 )
# Day 3: Incorporate Employee 3's data (20 successes, 80 failures)
day3_belief <- c(day2_belief[1] + 20, day2_belief[2] + 80)
cat("Day 3 Posterior: Beta(", day3_belief[1], ",", day3_belief[2], ")\n")
## Day 3 Posterior: Beta( 27 , 91 )
# Calculate evolving statistics
evolution_stats <- data.frame(
  Day = c("Prior", "Day 1", "Day 2", "Day 3"),
  Distribution = c("Beta(4,3)", "Beta(4,4)", "Beta(7,11)", "Beta(27,91)"),
  Mean = c(4/7, 4/8, 7/18, 27/118),
  Std_Dev = c(sqrt((4*3)/(7^2*8)), sqrt((4*4)/(8^2*9)), sqrt((7*11)/(18^2*19)), sqrt((27*91)/(118^2*119)))
)

print("Sequential Learning Evolution:")
## [1] "Sequential Learning Evolution:"
print(round(evolution_stats[,3:4], 4))
##     Mean Std_Dev
## 1 0.5714  0.1750
## 2 0.5000  0.1667
## 3 0.3889  0.1118
## 4 0.2288  0.0385
# Visualization of sequential learning
pi_sequence <- seq(0, 1, length.out = 800)

# Calculate densities for each stage
prior_seq <- dbeta(pi_sequence, 4, 3)
day1_seq <- dbeta(pi_sequence, 4, 4)
day2_seq <- dbeta(pi_sequence, 7, 11)
day3_seq <- dbeta(pi_sequence, 27, 91)

# Create the sequential learning plot
plot(pi_sequence, prior_seq, type = "l", lwd = 3, col = "blue",
     main = "Sequential Learning: Evolution of Beliefs Over Time",
     xlab = "π (Click Probability)", ylab = "Probability Density",
     ylim = c(0, max(c(prior_seq, day1_seq, day2_seq, day3_seq))))

lines(pi_sequence, day1_seq, lwd = 3, col = "green")
lines(pi_sequence, day2_seq, lwd = 3, col = "orange")
lines(pi_sequence, day3_seq, lwd = 3, col = "red")

legend("topright", 
       legend = c("Prior (Day 0)", "Day 1", "Day 2", "Day 3"),
       col = c("blue", "green", "orange", "red"), lwd = 3)
grid(col = "lightgray", lty = 2)

# Add mean lines
abline(v = 4/7, col = "blue", lty = 2)
abline(v = 4/8, col = "green", lty = 2)
abline(v = 7/18, col = "orange", lty = 2)
abline(v = 27/118, col = "red", lty = 2)

Sequential Learning Description: The employee’s beliefs shifted steadily away from the initial prior mean of about 0.57 as unfavorable evidence accumulated. By Day 3, the posterior mean is roughly 0.229 with greatly reduced uncertainty (SD ≈ 0.039).

Part (b): Sequential vs. Batch Comparison

# Batch approach: process all data simultaneously
total_successes_batch <- 0 + 3 + 20  # 23 total successes
total_failures_batch <- 1 + 7 + 80   # 88 total failures

batch_posterior <- c(4 + total_successes_batch, 3 + total_failures_batch)
batch_stats <- compute_beta_statistics(batch_posterior[1], batch_posterior[2])

sequential_day3_stats <- compute_beta_statistics(27, 91)

comparison_methods <- data.frame(
  Method = c("Sequential (Day 3)", "Batch Processing"),
  Distribution = c("Beta(27, 91)", paste0("Beta(", batch_posterior[1], ", ", batch_posterior[2], ")")),
  Mean = c(sequential_day3_stats$mean, batch_stats$mean),
  Std_Dev = c(sequential_day3_stats$std_dev, batch_stats$std_dev)
)

cat("Comparison of Sequential vs. Batch Learning:\n")
## Comparison of Sequential vs. Batch Learning:
print(round(comparison_methods[,3:4], 6))
##       Mean  Std_Dev
## 1 0.228814 0.038508
## 2 0.228814 0.038508
cat("\nNote: Sequential and batch updating now yield identical results,")
## 
## Note: Sequential and batch updating now yield identical results,
cat("\nconfirming the mathematical equivalence principle of Bayesian updating.\n")
## 
## confirming the mathematical equivalence principle of Bayesian updating.
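
A tiny check makes the order-invariance tangible: applying the same three updates in reverse order lands on the identical Beta(27, 91) posterior (a sketch; the variable name reversed_params is mine):

# Update with the three datasets in reverse order: (successes, failures) pairs
reversed_params <- c(4, 3)  # start again from the shared Beta(4, 3) prior
for (counts in list(c(20, 80), c(3, 7), c(0, 1))) {
  reversed_params <- reversed_params + counts
}
reversed_params
## [1] 27 91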

Exercise 5.18: Insect Density Study

Part (a): Statistical Model Selection

Chosen Model: Poisson Distribution

The Poisson model is appropriate for this insect counting study because:

  • Count Nature: We observe discrete, non-negative integer counts
  • Independence: Insect occurrences across different areas are independent
  • Rate Parameter: θ naturally represents insects per square meter
  • Rare Events: Zero counts dominate the sample, consistent with a low underlying event rate that a Poisson model captures naturally (see the quick check below)

A Normal distribution would be inappropriate because it permits negative values and non-integer counts, violating the fundamental nature of count data.
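
As a rough plausibility check on the rare-events point, under the prior mean θ = 0.5 specified in part (b), the Poisson model already places substantial mass on zero counts:

# Probability of observing zero insects in a quadrat when theta = 0.5
round(dpois(0, lambda = 0.5), 4)
## [1] 0.6065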

Part (b): Prior Configuration and Bayesian Analysis

# Configure Gamma prior based on given specifications
# Given: E[θ] = 0.5, SD[θ] = 0.25
# For Gamma(α, β): E[X] = α/β, Var[X] = α/β²

expected_density <- 0.5
standard_deviation <- 0.25
variance_density <- standard_deviation^2

# Solve for parameters: β = E[X]/Var[X], α = E[X] × β
beta_param <- expected_density / variance_density
alpha_param <- expected_density * beta_param

cat("Prior Specification:\n")
## Prior Specification:
cat("Gamma(α =", alpha_param, ", β =", beta_param, ")\n")
## Gamma(α = 4 , β = 8 )
cat("Prior mean:", alpha_param/beta_param, "\n")
## Prior mean: 0.5
cat("Prior standard deviation:", sqrt(alpha_param/beta_param^2), "\n")
## Prior standard deviation: 0.25
# Observational data
insect_observations <- c(3, 2, 5, 1, 2, rep(0, 15))
num_areas <- length(insect_observations)
total_insects <- sum(insect_observations)

cat("\nField Study Summary:\n")
## 
## Field Study Summary:
cat("Areas surveyed:", num_areas, "\n")
## Areas surveyed: 20
cat("Total insects observed:", total_insects, "\n")
## Total insects observed: 13
cat("Empirical mean density:", total_insects/num_areas, "\n")
## Empirical mean density: 0.65
cat("Areas with zero insects:", sum(insect_observations == 0), "(", 
    round(100*sum(insect_observations == 0)/num_areas, 1), "%)\n")
## Areas with zero insects: 15 ( 75 %)
# Posterior derivation using Gamma-Poisson conjugacy
# Prior: Gamma(α, β)
# Likelihood: ∏ Poisson(x_i | θ)
# Posterior: Gamma(α + Σx_i, β + n)

posterior_alpha <- alpha_param + total_insects
posterior_beta <- beta_param + num_areas

cat("Posterior Distribution:\n")
## Posterior Distribution:
cat("Gamma(α =", posterior_alpha, ", β =", posterior_beta, ")\n")
## Gamma(α = 17 , β = 28 )
# Posterior statistics
posterior_mean <- posterior_alpha / posterior_beta
posterior_variance <- posterior_alpha / (posterior_beta^2)
posterior_std <- sqrt(posterior_variance)

cat("Posterior mean:", round(posterior_mean, 4), "\n")
## Posterior mean: 0.6071
cat("Posterior standard deviation:", round(posterior_std, 4), "\n")
## Posterior standard deviation: 0.1473

Comprehensive Bayesian Visualization

# Create comprehensive analysis plot
theta_range <- seq(0, 1.5, length.out = 1000)

# Prior distribution
prior_densities <- dgamma(theta_range, alpha_param, beta_param)

# Likelihood function (unnormalized Poisson kernel; rescaled below for plotting)
likelihood_function <- function(theta) {
  if (theta <= 0) return(0)
  exp(total_insects * log(theta) - num_areas * theta)
}

likelihood_densities <- sapply(theta_range, likelihood_function)
# Scale for visualization
max_prior <- max(prior_densities)
likelihood_densities <- likelihood_densities / max(likelihood_densities) * max_prior * 0.9

# Posterior distribution
posterior_densities <- dgamma(theta_range, posterior_alpha, posterior_beta)

# Create comprehensive plot
plot(theta_range, prior_densities, type = "l", lwd = 3, col = "blue",
     main = "Bayesian Analysis of Insect Density",
     xlab = "θ (insects per m²)", ylab = "Probability Density",
     ylim = c(0, max(c(prior_densities, posterior_densities)) * 1.1))

lines(theta_range, likelihood_densities, lwd = 3, col = "red", lty = 2)
lines(theta_range, posterior_densities, lwd = 3, col = "darkgreen")

# Add vertical reference lines
abline(v = alpha_param/beta_param, col = "blue", lty = 3, lwd = 2)
abline(v = total_insects/num_areas, col = "red", lty = 3, lwd = 2)
abline(v = posterior_mean, col = "darkgreen", lty = 3, lwd = 2)

legend("topright", 
       legend = c("Prior", "Likelihood (scaled)", "Posterior", "Prior Mean", "Sample Mean", "Posterior Mean"),
       col = c("blue", "red", "darkgreen", "blue", "red", "darkgreen"),
       lwd = c(3, 3, 3, 2, 2, 2),
       lty = c(1, 2, 1, 3, 3, 3))

grid(col = "lightgray", lty = 2)

Posterior vs. Prior Analysis: The posterior is pulled toward the observed data relative to the prior. The posterior mean (0.607) lies between the prior mean (0.5) and the sample mean (0.65), closer to the latter because the 20 observations outweigh the prior’s effective sample size of β = 8, demonstrating the Bayesian compromise between prior knowledge and empirical evidence.

Part (c) & (d): Final Results and Ecological Interpretation

# Credible interval calculation
credible_interval <- qgamma(c(0.025, 0.975), posterior_alpha, posterior_beta)

cat("Final Posterior Analysis:\n")
## Final Posterior Analysis:
cat("Mean density:", round(posterior_mean, 4), "insects per m²\n")
## Mean density: 0.6071 insects per m²
cat("Standard deviation:", round(posterior_std, 4), "\n")
## Standard deviation: 0.1473
cat("95% Credible Interval: [", round(credible_interval[1], 4), ",", 
    round(credible_interval[2], 4), "] insects per m²\n")
## 95% Credible Interval: [ 0.3537 , 0.928 ] insects per m²
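
As a numerical cross-check, Monte Carlo draws from the Gamma(17, 28) posterior should reproduce these summaries (a sketch; the seed and number of draws are arbitrary):

# Simulation-based check of the analytic posterior summaries
set.seed(123)  # arbitrary seed, for reproducibility only
theta_draws <- rgamma(100000, shape = posterior_alpha, rate = posterior_beta)
c(mean = mean(theta_draws), sd = sd(theta_draws))  # expect ~0.607 and ~0.147
quantile(theta_draws, probs = c(0.025, 0.975))     # expect ~[0.354, 0.928]
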
# Influence analysis
prior_weight <- beta_param / (beta_param + num_areas)
data_weight <- num_areas / (beta_param + num_areas)

cat("\nInfluence Analysis:\n")
## 
## Influence Analysis:
cat("Prior influence:", round(prior_weight * 100, 1), "%\n")
## Prior influence: 28.6 %
cat("Data influence:", round(data_weight * 100, 1), "%\n")
## Data influence: 71.4 %
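
These same weights recover the posterior mean as a weighted average of the prior mean and the sample mean, confirming the shrinkage interpretation:

# Posterior mean = prior_weight * prior mean + data_weight * sample mean
prior_weight * (alpha_param / beta_param) + data_weight * (total_insects / num_areas)
## [1] 0.6071429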

Biological Interpretation for Field Researchers:

Based on this Bayesian analysis of 20 survey quadrats, I conclude that the regional insect density averages roughly 0.607 insects per m², with 95% posterior probability that the true density falls between 0.354 and 0.928 insects per m².

Key Ecological Findings:

  • Patchy Distribution: 75% of surveyed areas contained zero insects, indicating highly specific habitat requirements
  • Density Hotspots: The remaining 25% of areas showed variable densities (1-5 insects/m²), suggesting microhabitat preferences
  • Population Structure: This patchy pattern is typical of specialist species with narrow ecological niches

Management Recommendations:

  • Focus conservation efforts on characterizing successful habitat features
  • Investigate environmental variables distinguishing occupied from unoccupied areas
  • Treat the species as potentially vulnerable, given its apparent habitat specialization
  • Expand sampling to better understand landscape-scale distribution patterns

Conclusion

These exercises demonstrate the power and flexibility of Bayesian statistical methods. Exercise 4.17 shows how different datasets lead to divergent conclusions even with identical priors. Exercise 4.18 illustrates the mathematical equivalence of sequential and batch updating. Exercise 5.18 provides a practical example of how Bayesian methods can inform ecological research and conservation decisions by properly quantifying uncertainty in density estimates.

The Bayesian framework’s strength lies in its ability to formally incorporate prior knowledge while allowing data to update beliefs in a mathematically coherent manner.