Bayesian Model Comparison for Bernoulli Data

1. Understanding the Problem

We have a sequence of independent and identically distributed Bernoulli random variables:

\[ Y_1, Y_2, \dots, Y_6 \stackrel{i.i.d.}{\sim} \text{Bernoulli}(\theta) \]

We observe the specific sequence: \[ y_1 = y_2 = y_3 = y_4 = y_5 = 1, \quad y_6 = 0 \]

That is: 5 successes and 1 failure, in that fixed order.

Likelihood function (probability of this exact sequence given \(\theta\)): \[ L(\theta) = P(\text{data} \mid \theta) = \theta^5 (1-\theta)^1 = \theta^5(1-\theta) \]

The Two Models

Model 0 (M₀): \(\theta = \frac{1}{2}\) (a point null hypothesis)
Model 1 (M₁): \(\theta > \frac{1}{2}\) (a composite alternative)

Under Model 1, \(\theta\) is not fixed but has a prior density \(\pi_1(\theta)\) defined on \(\theta \in (1/2, 1)\).

What We Need to Compute

Posterior model probabilities:

\[ P(M_j \mid \text{data}) = \frac{P(M_j) \cdot m_j(\text{data})}{P(M_0)m_0 + P(M_1)m_1}, \quad j = 0,1 \]

where: - \(m_0 = L(1/2) = (1/2)^5 \times (1/2) = (1/2)^6 = 1/64\) - \(m_1 = \int_{1/2}^1 L(\theta) \, \pi_1(\theta) \, d\theta = \int_{1/2}^1 \theta^5(1-\theta) \, \pi_1(\theta) \, d\theta\)

We consider three different prior specifications (i), (ii), (iii).

2. General Setup in R

# Load necessary library for numerical integration
library(stats)

# Data summary
n <- 6
x <- 5

# Likelihood function (unnormalized, exact sequence)
likelihood <- function(theta) {
  theta^5 * (1 - theta)
}

# Marginal likelihood under Model 0 (point null)
m0 <- (1/2)^6
cat("m0 =", m0, "\n")

## m0 = 0.015625

3. Case (i)

Prior:

P(\(M_0\))=0.5, P(\(M_1\))=0.5, \(\pi_1(\theta)\)=0.5, for \(\theta \in\)(1/2, 1)

# Prior density for Model 1
pi1_i <- function(theta) {
  2  # constant on (0.5, 1)
}

# Marginal likelihood under Model 1
integrand_i <- function(theta) {
  likelihood(theta) * pi1_i(theta)
}

m1_i <- integrate(integrand_i, lower = 0.5, upper = 1)$value
cat("m1_i =", m1_i, "\n")

## m1_i = 0.04464286

# Prior odds
prior_odds_i <- 0.5 / 0.5  # = 1

# Posterior odds
post_odds_i <- prior_odds_i * (m0 / m1_i)
cat("Posterior odds (M0/M1) =", post_odds_i, "\n")

## Posterior odds (M0/M1) = 0.35

# Posterior probabilities
pM0_i <- post_odds_i / (1 + post_odds_i)
pM1_i <- 1 - pM0_i

cat("Case (i):\n")

## Case (i):

cat("P(M0 | data) =", pM0_i, "\n")

## P(M0 | data) = 0.2592593

cat("P(M1 | data) =", pM1_i, "\n")

## P(M1 | data) = 0.7407407

Case (ii)

Prior:

P(\(M_0\))=0.8, P(\(M_1\))=0.2, \(\pi_1(\theta)=8(1-\theta)\), for \(\theta \in\)(1/2, 1)

# Prior density for Model 1
pi1_ii <- function(theta) {
  8 * (1 - theta)
}

# Marginal likelihood under Model 1
integrand_ii <- function(theta) {
  likelihood(theta) * pi1_ii(theta)
}

m1_ii <- integrate(integrand_ii, lower = 0.5, upper = 1)$value
cat("m1_ii =", m1_ii, "\n")

## m1_ii = 0.04073661

# Prior odds
prior_odds_ii <- 0.8 / 0.2  # = 4

# Posterior odds
post_odds_ii <- prior_odds_ii * (m0 / m1_ii)

# Posterior probabilities
pM0_ii <- post_odds_ii / (1 + post_odds_ii)
pM1_ii <- 1 - pM0_ii

cat("Case (ii):\n")

## Case (ii):

cat("P(M0 | data) =", pM0_ii, "\n")

## P(M0 | data) = 0.6054054

cat("P(M1 | data) =", pM1_ii, "\n")

## P(M1 | data) = 0.3945946

5. Case (iii)

Prior:

P(\(M_0\))=0.2, P(\(M_1\))=0.8, \(\pi_1(\theta)=48(\theta-1/2)(1-\theta)\), for \(\theta \in\)(1/2, 1)

# Prior density for Model 1
pi1_iii <- function(theta) {
  48 * (theta - 0.5) * (1 - theta)
}

# Marginal likelihood under Model 1
integrand_iii <- function(theta) {
  likelihood(theta) * pi1_iii(theta)
}

m1_iii <- integrate(integrand_iii, lower = 0.5, upper = 1)$value
cat("m1_iii =", m1_iii, "\n")

## m1_iii = 0.05115327

# Prior odds
prior_odds_iii <- 0.2 / 0.8  # = 0.25

# Posterior odds
post_odds_iii <- prior_odds_iii * (m0 / m1_iii)

# Posterior probabilities
pM0_iii <- post_odds_iii / (1 + post_odds_iii)
pM1_iii <- 1 - pM0_iii

cat("Case (iii):\n")

## Case (iii):

cat("P(M0 | data) =", pM0_iii, "\n")

## P(M0 | data) = 0.07094595

cat("P(M1 | data) =", pM1_iii, "\n")

## P(M1 | data) = 0.9290541

Important Notes

1. The Core Question

In this problem, we see two different things both called “prior”:

\(P(M_0)\) and \(P(M_1)\)
\(\pi_1(\theta)\)

Why are both called “prior”, and what are their properties?

2. Two Levels of Prior Belief

Bayesian analysis has hierarchical prior specifications when comparing models.

Level 1: Prior over Models (Discrete)

\[ P(M_0), \quad P(M_1) \]

These represent your belief about which model is true before seeing any data.

Properties: - They are probabilities (between 0 and 1) - They must sum to 1 because \(M_0\) and \(M_1\) are: - Mutually exclusive (cannot both be true) - Exhaustive (cover all possibilities under consideration)

\[ P(M_0) + P(M_1) = 1 \]

Example: If you set \(P(M_0) = 0.8\), you’re saying “before looking at data, I’m 80% confident that \(\theta = 1/2\).”

Level 2: Prior over Parameters within a Model (Continuous)

Under Model 1 (\(\theta > 1/2\)), \(\theta\) is not a single value but a range. We need a prior distribution for \(\theta\):

\[ \pi_1(\theta) \quad \text{for} \quad \theta \in (1/2, 1) \]

Properties: - This is a probability density function (pdf) - It must integrate to 1 over its support:

\[ \int_{1/2}^1 \pi_1(\theta) \, d\theta = 1 \]

Example: In case (ii), \(\pi_1(\theta) = 8(1-\theta)\). Check:

\[ \int_{1/2}^1 8(1-\theta) \, d\theta = 8 \left[ \theta - \frac{\theta^2}{2} \right]_{1/2}^1 = 1 \quad \checkmark \]

This prior says: under Model 1, smaller \(\theta\) (just above 0.5) is more likely than larger \(\theta\).

3. Why Are Both Called “Prior”?

Because both exist before observing the data:

What	Symbol	Exists before data?	Type
Prior probability of Model 0	\(P(M_0)\)	Yes	Discrete probability
Prior probability of Model 1	\(P(M_1)\)	Yes	Discrete probability
Prior density of \(\theta\) under Model 1	\(\pi_1(\theta)\)	Yes	Continuous density

They operate at different levels of the hierarchy: - Model level: \(P(M_0), P(M_1)\) - Parameter level (within Model 1): \(\pi_1(\theta)\)

In Bayesian model comparison, both are “prior” information that gets updated by the data.

4. Verification: All Three Cases Satisfy Both Normalizations

Let’s check each case from the problem:

Case (i)

\(P(M_0) = 0.5,\ P(M_1) = 0.5\) → sum = 1 ✓
\(\pi_1(\theta) = 2\) on (1/2, 1) → \(\int_{1/2}^1 2 \, d\theta = 2 \times (1 - 0.5) = 1\) ✓

Case (ii)

\(P(M_0) = 0.8,\ P(M_1) = 0.2\) → sum = 1 ✓
\(\pi_1(\theta) = 8(1-\theta)\) → \(\int_{1/2}^1 8(1-\theta) \, d\theta = 1\) ✓

Case (iii)

\(P(M_0) = 0.2,\ P(M_1) = 0.8\) → sum = 1 ✓
\(\pi_1(\theta) = 48(\theta - 1/2)(1-\theta)\) → check numerically:

# Verify that pi1 integrates to 1
pi1_iii <- function(theta) {
  48 * (theta - 0.5) * (1 - theta)
}

integral <- integrate(pi1_iii, lower = 0.5, upper = 1)$value
cat("Integral of pi1_iii from 0.5 to 1 =", integral, "\n")

## Integral of pi1_iii from 0.5 to 1 = 1

Bayesian Model Comparison for Bernoulli Data

Step-by-Step Solution

2026-05-27

1. Understanding the Problem

The Two Models

What We Need to Compute

2. General Setup in R

3. Case (i)

Case (ii)

5. Case (iii)

Important Notes

1. The Core Question

2. Two Levels of Prior Belief

Level 1: Prior over Models (Discrete)

Level 2: Prior over Parameters within a Model (Continuous)

3. Why Are Both Called “Prior”?

4. Verification: All Three Cases Satisfy Both Normalizations

Case (i)

Case (ii)

Case (iii)