Jeffreys’ Prior: Solving the Problem of Arbitrary Parameterization

The Core Problem: Arbitrariness of Parameterization

The main challenge in defining a “non-informative” prior is that the concept of uniformity (e.g., saying “all values of \(\theta\) are equally likely”) is not invariant to transformation.

Example:

Parameter 1: Probability of success, \(\theta\). If I say \(\pi(\theta)=1\) for \(\theta \in [0,1]\), this is a uniform prior.
Parameter 2: Log-odds, \(\phi = \log\left(\frac{\theta}{1-\theta}\right)\). If \(\theta\) is uniform, what is the prior for \(\phi\)? Using the change-of-variables formula, \(\pi(\phi) = \pi(\theta)\left|\frac{d\theta}{d\phi}\right| = \frac{e^{\phi}}{(1+e^{\phi})^2}\), which is not uniform. It is the standard logistic density.

So, which parameter truly gets the “non-informative” uniform prior? Saying \(\pi(\theta) \propto 1\) is arbitrary. A different researcher, working naturally with \(\phi\), would get different results if they used \(\pi(\phi) \propto 1\). A good non-informative prior should not depend on the choice of parameterization.
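The change of variables in this example can be checked numerically. The following is a minimal Python sketch, not part of the original derivation; the function names (`sigmoid`, `induced_logodds_density`, `logistic_pdf`) are illustrative choices. It starts from \(\pi(\theta) = 1\), applies \(\pi(\phi) = \pi(\theta)\left|\frac{d\theta}{d\phi}\right|\) with a finite-difference derivative, and compares the result with the standard logistic density.

```python
import math

def sigmoid(phi):
    # Inverse of the log-odds map: theta = 1 / (1 + exp(-phi))
    return 1.0 / (1.0 + math.exp(-phi))

def induced_logodds_density(phi, h=1e-6):
    # pi(phi) = pi(theta) * |d theta / d phi|, with pi(theta) = 1 on [0, 1].
    # The derivative is approximated by a central difference (an assumption
    # made for illustration; the exact derivative is theta * (1 - theta)).
    dtheta_dphi = (sigmoid(phi + h) - sigmoid(phi - h)) / (2.0 * h)
    return 1.0 * abs(dtheta_dphi)

def logistic_pdf(phi):
    # Standard logistic density: exp(-phi) / (1 + exp(-phi))^2
    e = math.exp(-phi)
    return e / (1.0 + e) ** 2

if __name__ == "__main__":
    for phi in (-3.0, 0.0, 3.0):
        print(phi, induced_logodds_density(phi), logistic_pdf(phi))
```

The two columns should agree closely, and the density is visibly not flat in \(\phi\): it peaks at \(\phi = 0\) and decays in both tails.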

The Solution: Jeffreys’ Prior (Equation 4.7)

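Jeffreys’ prior, \(\pi(\theta) \propto \sqrt{\mathcal{I}(\theta)}\), is designed to solve this exact problem. The rule is invariant under smooth one-to-one reparameterization: applying it directly in a new parameter \(\phi\) gives the same prior as applying it in \(\theta\) and then transforming by the change-of-variables formula.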

Here’s the “why” behind your explanation:

1. The Goal: Find a prior that is “uniform” for the “right” parameter.

We want a prior for \(\theta\) that, after we transform to some special parameter \(\phi\), becomes truly uniform (\(\pi(\phi) \propto 1\)). This special parameter \(\phi\) is the one for which the Fisher information is constant (i.e., \(\mathcal{I}(\phi) = \text{constant}\)).

2. The Transformation: From \(\theta\) to \(\phi\)

Let’s define a new parameter \(\phi = g(\theta)\) such that the Fisher information for \(\phi\) is 1 (or any constant). It turns out that the function \(g(\theta) = \int \sqrt{\mathcal{I}(\theta)} \, d\theta\) achieves this.

Why? Because the Fisher information transforms in a specific way under a smooth one-to-one change of parameter. If \(\phi = g(\theta)\), then:

\[ \mathcal{I}(\phi) = \mathcal{I}(\theta) \left( \frac{d\theta}{d\phi} \right)^2 \]

If we choose \(\phi = \int \sqrt{\mathcal{I}(\theta)} \, d\theta\), then \(\frac{d\phi}{d\theta} = \sqrt{\mathcal{I}(\theta)}\), so \(\frac{d\theta}{d\phi} = 1 / \sqrt{\mathcal{I}(\theta)}\). Plugging this in:

\[ \mathcal{I}(\phi) = \mathcal{I}(\theta) \times \left( \frac{1}{\sqrt{\mathcal{I}(\theta)}} \right)^2 = \mathcal{I}(\theta) \times \frac{1}{\mathcal{I}(\theta)} = 1 \]

So, in the \(\phi\) parameterization, the Fisher information is constant (1).

3. Applying Jeffreys’ Rule

Jeffreys’ prior for \(\phi\) is \(\pi(\phi) \propto \sqrt{\mathcal{I}(\phi)} = \sqrt{1} = 1\). This is a uniform (and therefore “non-informative” in the intuitive sense) prior for \(\phi\).

Now, what is the induced prior for \(\theta\)? Using the change-of-variables rule from \(\phi\) back to \(\theta\):

\[ \pi(\theta) = \pi(\phi) \cdot \left| \frac{d\phi}{d\theta} \right| \propto 1 \cdot \sqrt{\mathcal{I}(\theta)} \]

Thus, \(\pi(\theta) \propto \sqrt{\mathcal{I}(\theta)}\) is the prior for \(\theta\) that corresponds to a uniform prior for the “variance-stabilizing” parameter \(\phi\).
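Steps 2 and 3 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions rather than a general implementation: it takes the single-trial Bernoulli information \(\mathcal{I}(\theta) = \frac{1}{\theta(1-\theta)}\) as a test case, builds \(\phi = g(\theta)\) by midpoint-rule integration from an arbitrary starting point (0.5), and uses a finite-difference derivative. All function names are hypothetical.

```python
import math

def fisher_info(theta):
    # Assumed test case: one Bernoulli trial, I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def g(theta, start=0.5, steps=2000):
    # phi = g(theta): integral of sqrt(I(t)) dt from `start` to theta,
    # approximated with the midpoint rule. The lower limit only shifts
    # phi by a constant, which does not affect the results below.
    width = (theta - start) / steps
    total = 0.0
    for i in range(steps):
        t = start + (i + 0.5) * width
        total += math.sqrt(fisher_info(t))
    return total * width

def dphi_dtheta(theta, h=1e-5):
    # Central-difference approximation of g'(theta)
    return (g(theta + h) - g(theta - h)) / (2.0 * h)

def info_in_phi(theta):
    # I(phi) = I(theta) * (d theta / d phi)^2 = I(theta) / g'(theta)^2
    return fisher_info(theta) / dphi_dtheta(theta) ** 2

if __name__ == "__main__":
    for theta in (0.1, 0.3, 0.5, 0.8):
        # Columns: theta, I(phi) (expected near 1),
        # induced prior |d phi / d theta|, and sqrt(I(theta))
        print(theta, info_in_phi(theta), dphi_dtheta(theta),
              math.sqrt(fisher_info(theta)))
```

The second column should stay near 1 for every \(\theta\), and the last two columns should match, which is the statement that a flat prior on \(\phi\) induces \(\pi(\theta) \propto \sqrt{\mathcal{I}(\theta)}\).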

Applying This to the Binomial Example

For a binomial distribution, \(f(x \mid \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}\).

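Fisher Information: For \(n\) trials, \(\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)}\); a single trial (\(n = 1\)) gives \(\frac{1}{\theta(1-\theta)}\). The factor \(n\) does not depend on \(\theta\), so it drops out under proportionality.
Jeffreys’ Prior: \(\pi(\theta) \propto \sqrt{\mathcal{I}(\theta)} \propto \sqrt{\frac{1}{\theta(1-\theta)}} = [\theta(1-\theta)]^{-1/2}\).
Identify the distribution: This matches the kernel \(\theta^{\alpha-1}(1-\theta)^{\beta-1}\) of a Beta distribution with parameters \(\alpha = 1/2\) and \(\beta = 1/2\).
Find the special parameter \(\phi\): Using the per-trial information, \(\phi = \int \frac{1}{\sqrt{\theta(1-\theta)}} \, d\theta = 2\arcsin(\sqrt{\theta})\) (up to an additive constant), which maps \(\theta \in [0,1]\) onto \(\phi \in [0,\pi]\).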

Here, \(\phi\) is twice the arcsine of the square root of \(\theta\).

For \(\phi\), the prior is \(\pi(\phi) \propto 1\) on \([0, \pi]\). Under Jeffreys’ rule, saying “I know nothing about \(\theta\)” means saying “I know nothing about the transformed parameter \(\phi = 2\arcsin(\sqrt{\theta})\),” and the flat prior on \(\phi\) maps back to the Beta(1/2, 1/2) prior on \(\theta\).
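Both binomial claims can be checked with a short Python sketch. It makes a few assumptions for illustration: the function names are hypothetical, the Fisher information is computed as the exact expectation of the squared score over \(x = 0, \dots, n\), and an evenly spaced grid of \(\phi\) values on \((0, \pi)\) stands in for a flat prior. The exact sum gives \(\frac{n}{\theta(1-\theta)}\), that is, the single-trial information times the constant \(n\), which leaves the prior unchanged up to proportionality.

```python
import math

def binomial_fisher_info(n, theta):
    # Exact E[(d/dtheta log f(x | theta))^2] for x ~ Binomial(n, theta)
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)
        score = x / theta - (n - x) / (1.0 - theta)
        total += pmf * score**2
    return total

def beta_half_half_cdf(t):
    # CDF of Beta(1/2, 1/2): (2 / pi) * arcsin(sqrt(t))
    return (2.0 / math.pi) * math.asin(math.sqrt(t))

def cdf_from_flat_phi(t, grid=100_000):
    # Evenly spaced phi on (0, pi) approximates a flat prior on phi.
    # Inverting phi = 2 * arcsin(sqrt(theta)) gives theta = sin^2(phi / 2).
    count = 0
    for i in range(grid):
        phi = math.pi * (i + 0.5) / grid
        if math.sin(phi / 2.0) ** 2 <= t:
            count += 1
    return count / grid

if __name__ == "__main__":
    n, theta = 10, 0.3
    print(binomial_fisher_info(n, theta), n / (theta * (1.0 - theta)))
    for t in (0.1, 0.5, 0.9):
        print(t, cdf_from_flat_phi(t), beta_half_half_cdf(t))
```

The first line should print two matching values, and each later line should show the grid-based probability that \(\theta \le t\) agreeing with the Beta(1/2, 1/2) CDF to within the grid resolution.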