We have a sequence of independent and identically distributed Bernoulli random variables:
\[ Y_1, Y_2, \dots, Y_6 \stackrel{i.i.d.}{\sim} \text{Bernoulli}(\theta) \]
We observe the specific sequence: \[ y_1 = y_2 = y_3 = y_4 = y_5 = 1, \quad y_6 = 0 \]
That is: 5 successes and 1 failure, in that fixed order.
Likelihood function (probability of this exact sequence given \(\theta\)): \[ L(\theta) = P(\text{data} \mid \theta) = \theta^5 (1-\theta)^1 = \theta^5(1-\theta) \]
Under Model 1, \(\theta\) is not fixed but has a prior density \(\pi_1(\theta)\) defined on \(\theta \in (1/2, 1)\).
Posterior model probabilities:
\[ P(M_j \mid \text{data}) = \frac{P(M_j) \cdot m_j(\text{data})}{P(M_0)m_0 + P(M_1)m_1}, \quad j = 0,1 \]
where: - \(m_0 = L(1/2) = (1/2)^5 \times (1/2) = (1/2)^6 = 1/64\) - \(m_1 = \int_{1/2}^1 L(\theta) \, \pi_1(\theta) \, d\theta = \int_{1/2}^1 \theta^5(1-\theta) \, \pi_1(\theta) \, d\theta\)
We consider three different prior specifications (i), (ii), (iii).
# Load necessary library for numerical integration
library(stats)
# Data summary
n <- 6
x <- 5
# Likelihood function (unnormalized, exact sequence)
likelihood <- function(theta) {
theta^5 * (1 - theta)
}
# Marginal likelihood under Model 0 (point null)
m0 <- (1/2)^6
cat("m0 =", m0, "\n")
## m0 = 0.015625
Prior:
P(\(M_0\))=0.5, P(\(M_1\))=0.5, \(\pi_1(\theta)\)=0.5, for \(\theta \in\)(1/2, 1)
# Prior density for Model 1
pi1_i <- function(theta) {
2 # constant on (0.5, 1)
}
# Marginal likelihood under Model 1
integrand_i <- function(theta) {
likelihood(theta) * pi1_i(theta)
}
m1_i <- integrate(integrand_i, lower = 0.5, upper = 1)$value
cat("m1_i =", m1_i, "\n")
## m1_i = 0.04464286
# Prior odds
prior_odds_i <- 0.5 / 0.5 # = 1
# Posterior odds
post_odds_i <- prior_odds_i * (m0 / m1_i)
cat("Posterior odds (M0/M1) =", post_odds_i, "\n")
## Posterior odds (M0/M1) = 0.35
# Posterior probabilities
pM0_i <- post_odds_i / (1 + post_odds_i)
pM1_i <- 1 - pM0_i
cat("Case (i):\n")
## Case (i):
cat("P(M0 | data) =", pM0_i, "\n")
## P(M0 | data) = 0.2592593
cat("P(M1 | data) =", pM1_i, "\n")
## P(M1 | data) = 0.7407407
Prior:
P(\(M_0\))=0.8, P(\(M_1\))=0.2, \(\pi_1(\theta)=8(1-\theta)\), for \(\theta \in\)(1/2, 1)
# Prior density for Model 1
pi1_ii <- function(theta) {
8 * (1 - theta)
}
# Marginal likelihood under Model 1
integrand_ii <- function(theta) {
likelihood(theta) * pi1_ii(theta)
}
m1_ii <- integrate(integrand_ii, lower = 0.5, upper = 1)$value
cat("m1_ii =", m1_ii, "\n")
## m1_ii = 0.04073661
# Prior odds
prior_odds_ii <- 0.8 / 0.2 # = 4
# Posterior odds
post_odds_ii <- prior_odds_ii * (m0 / m1_ii)
# Posterior probabilities
pM0_ii <- post_odds_ii / (1 + post_odds_ii)
pM1_ii <- 1 - pM0_ii
cat("Case (ii):\n")
## Case (ii):
cat("P(M0 | data) =", pM0_ii, "\n")
## P(M0 | data) = 0.6054054
cat("P(M1 | data) =", pM1_ii, "\n")
## P(M1 | data) = 0.3945946
Prior:
P(\(M_0\))=0.2, P(\(M_1\))=0.8, \(\pi_1(\theta)=48(\theta-1/2)(1-\theta)\), for \(\theta \in\)(1/2, 1)
# Prior density for Model 1
pi1_iii <- function(theta) {
48 * (theta - 0.5) * (1 - theta)
}
# Marginal likelihood under Model 1
integrand_iii <- function(theta) {
likelihood(theta) * pi1_iii(theta)
}
m1_iii <- integrate(integrand_iii, lower = 0.5, upper = 1)$value
cat("m1_iii =", m1_iii, "\n")
## m1_iii = 0.05115327
# Prior odds
prior_odds_iii <- 0.2 / 0.8 # = 0.25
# Posterior odds
post_odds_iii <- prior_odds_iii * (m0 / m1_iii)
# Posterior probabilities
pM0_iii <- post_odds_iii / (1 + post_odds_iii)
pM1_iii <- 1 - pM0_iii
cat("Case (iii):\n")
## Case (iii):
cat("P(M0 | data) =", pM0_iii, "\n")
## P(M0 | data) = 0.07094595
cat("P(M1 | data) =", pM1_iii, "\n")
## P(M1 | data) = 0.9290541
In this problem, we see two different things both called “prior”:
Why are both called “prior”, and what are their properties?
Bayesian analysis has hierarchical prior specifications when comparing models.
\[ P(M_0), \quad P(M_1) \]
These represent your belief about which model is true before seeing any data.
Properties: - They are probabilities (between 0 and 1) - They must sum to 1 because \(M_0\) and \(M_1\) are: - Mutually exclusive (cannot both be true) - Exhaustive (cover all possibilities under consideration)
\[ P(M_0) + P(M_1) = 1 \]
Example: If you set \(P(M_0) = 0.8\), you’re saying “before looking at data, I’m 80% confident that \(\theta = 1/2\).”
Under Model 1 (\(\theta > 1/2\)), \(\theta\) is not a single value but a range. We need a prior distribution for \(\theta\):
\[ \pi_1(\theta) \quad \text{for} \quad \theta \in (1/2, 1) \]
Properties: - This is a probability density function (pdf) - It must integrate to 1 over its support:
\[ \int_{1/2}^1 \pi_1(\theta) \, d\theta = 1 \]
Example: In case (ii), \(\pi_1(\theta) = 8(1-\theta)\). Check:
\[ \int_{1/2}^1 8(1-\theta) \, d\theta = 8 \left[ \theta - \frac{\theta^2}{2} \right]_{1/2}^1 = 1 \quad \checkmark \]
This prior says: under Model 1, smaller \(\theta\) (just above 0.5) is more likely than larger \(\theta\).
Because both exist before observing the data:
| What | Symbol | Exists before data? | Type |
|---|---|---|---|
| Prior probability of Model 0 | \(P(M_0)\) | Yes | Discrete probability |
| Prior probability of Model 1 | \(P(M_1)\) | Yes | Discrete probability |
| Prior density of \(\theta\) under Model 1 | \(\pi_1(\theta)\) | Yes | Continuous density |
They operate at different levels of the hierarchy: - Model level: \(P(M_0), P(M_1)\) - Parameter level (within Model 1): \(\pi_1(\theta)\)
In Bayesian model comparison, both are “prior” information that gets updated by the data.
Let’s check each case from the problem:
# Verify that pi1 integrates to 1
pi1_iii <- function(theta) {
48 * (theta - 0.5) * (1 - theta)
}
integral <- integrate(pi1_iii, lower = 0.5, upper = 1)$value
cat("Integral of pi1_iii from 0.5 to 1 =", integral, "\n")
## Integral of pi1_iii from 0.5 to 1 = 1