Bayesian Updating with Continuous Priors

Consider a Bernoulli trial with probability \(P(1)=\theta\) and \(P(0)=1-\theta\). We can hypothesize that \(\theta\) lies anywhere in the range \([0, 1]\), so we have a continuous range of hypotheses.
\[ f(\theta|x)\:d\theta = \frac{p(x|\theta)f(\theta)\:d\theta}{\int_{-\infty}^{\infty}p(x|\theta)f(\theta)\:d\theta}\quad =\frac{\text{prob. of data given hypotheses in the interval }d\theta}{\text{prob. of data over the complete set of hypotheses}} \]
Scheme | prior type | hypothesis | prior | likelihood | Bayes numerator | posterior
---|---|---|---|---|---|---
events | inequalities | \(\mathcal{H}\) | \(P(\mathcal{H})\) | \(P(\mathcal{D}|\mathcal{H})\) | \(P(\mathcal{D}|\mathcal{H})P(\mathcal{H})\) | \(P(\mathcal{H}|\mathcal{D})\)
values | mass | \(\theta\) | \(p(\theta)\) | \(p(x|\theta)\) | \(p(x|\theta)p(\theta)\) | \(p(\theta|x)\)
value intervals | density | \(\theta\) | \(f(\theta)\:d\theta\) | \(p(x|\theta)\) | \(p(x|\theta)f(\theta)\:d\theta\) | \(f(\theta|x)\:d\theta\)
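The density scheme in the last row translates directly into a numerical recipe: discretize \(\theta\), weight each grid hypothesis by prior times likelihood, and normalize. A minimal sketch in Python/NumPy (the function name, grid size, and \([0,1]\) support are my own choices for illustration, not from the notes):

```python
import numpy as np

def grid_posterior(prior_pdf, likelihood, n=10_000):
    """Discretized Bayesian update for theta in [0, 1].

    prior_pdf:  callable theta -> prior density f(theta)
    likelihood: callable theta -> p(data | theta)
    Returns the theta grid and the posterior density on it.
    """
    theta, dtheta = np.linspace(0.0, 1.0, n, retstep=True)
    numerator = likelihood(theta) * prior_pdf(theta) * dtheta  # Bayes numerator per slice
    posterior = numerator / numerator.sum()                    # divide by total prob. of data
    return theta, posterior / dtheta                           # probabilities -> density

# e.g. grid_posterior(lambda t: 2 * t, lambda t: t**2 * (1 - t))
# reproduces the coin example worked out below
```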
Consider a coin with probability \(\theta\) of heads. The value of \(\theta\) is random with Prior PDF \(f_\Theta(\theta)=2\theta\). Suppose we flip the coin three times and get the sequence H,H,T (\(1,1,0\)). We can compute the Posterior PDF for the random variable \(\Theta\) through a Bayesian update.
Hypothesis | PRIOR PDF | likelihood | Bayes’ Numerator | POSTERIOR PDF |
---|---|---|---|---|
General Form: | \(P(\Delta\Theta)=f_{\Theta}(\theta)\:d\theta\) | \(p(1,1,0|\theta)\) | \(p(1,1,0|\theta)f_\Theta(\theta)\:d\theta\) | \(P(\mathcal{H}_\theta|1,1,0)=\frac{p(1,1,0|\theta)f_\Theta(\theta)\:d\theta}{p(1,1,0)}\) |
\(\mathcal{H}_\theta\) | \(2\theta\:d\theta\) | \(\theta^2(1-\theta)\) | \(\theta^2(1-\theta)\cdot 2\theta\:d\theta\) | \(P(\mathcal{H}_\theta|1,1,0)=20\theta^3(1-\theta)\:d\theta\) |
Column Totals: | \(1=\int_0^1f_\Theta(\theta)\:d\theta\) | N/A, fixed \(\theta\) | \(P(1,1,0)=\int_0^1 2\theta^3(1-\theta)\:d\theta\\=\int_0^1 2\theta^3-2\theta^4\:d\theta=\left[\frac{\theta^4}{2}-\frac{2\theta^5}{5}\right]_0^1=0.1\) | \(1=\int_0^1 20\theta^3(1-\theta)\:d\theta\) |
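As a sanity check on the table, the same update can be run on a grid. This sketch (assuming NumPy) should reproduce \(p(1,1,0)\approx 0.1\) and a posterior matching \(20\theta^3(1-\theta)\):

```python
import numpy as np

theta, dtheta = np.linspace(0, 1, 10_000, retstep=True)
prior = 2 * theta                         # prior pdf f_Theta(theta) = 2*theta
likelihood = theta**2 * (1 - theta)       # p(1,1,0 | theta) for the sequence H,H,T
numerator = likelihood * prior * dtheta   # Bayes numerator in each slice d-theta
p_data = numerator.sum()                  # column total: p(1,1,0)
posterior = numerator / p_data / dtheta   # posterior density

print(round(p_data, 3))                                        # ~0.1
print(np.abs(posterior - 20 * theta**3 * (1 - theta)).max())   # ~0, up to grid error
```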
Flat priors are a common default choice: they represent a state of minimal prior knowledge, or maximum uncertainty, about a parameter before seeing any data. Flat priors (often uniform distributions) are used when we want the data, rather than subjective prior beliefs, to drive the inference. When we want results to be as unbiased as possible, flat priors help avoid injecting strong assumptions into the model. And when there is a lot of data, the likelihood dominates the posterior, making the choice of a flat prior even less influential, as the sketch below illustrates.
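A quick numerical illustration of that last point (the toss counts here are invented for the sake of the example): with many tosses, the posterior mean is nearly the same under a flat prior and under the \(f(\theta)=2\theta\) prior used above.

```python
import numpy as np

theta, dtheta = np.linspace(0, 1, 10_000, retstep=True)
heads, tails = 700, 300                          # hypothetical data: 1000 tosses
loglik = heads * np.log(theta + 1e-12) + tails * np.log(1 - theta + 1e-12)
lik = np.exp(loglik - loglik.max())              # likelihood, rescaled for stability

for prior in (np.ones_like(theta), 2 * theta):   # flat prior vs. f(theta) = 2*theta
    post = lik * prior
    post /= post.sum() * dtheta                  # normalize to a density
    print((theta * post * dtheta).sum())         # posterior means ~0.700 in both cases
```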
Consider a coin with probability \(\theta\) of heads. Suppose we toss it once and get heads. Assume a flat prior and find the posterior probability for \(\theta\).
Hypothesis | PRIOR PDF | likelihood | Bayes’ Numerator | POSTERIOR PDF |
---|---|---|---|---|
\(\mathcal{H}_\theta\) | \(1\:d\theta\) | \(P(1|\theta)=\theta\) | \(\theta\) | \(f_\Theta(\theta)=2\theta\) |
totals: | \(\int_0^1 1\:d\theta=1\) | - | \(\int_0^1\theta\: d\theta=\left[\frac{\theta^2}{2}\right]_0^1=\frac{1}{2}\) | \(\int_0^1 2\theta\:d\theta=\left[\theta^2\right]_0^1=1\) |
So a priori there is a \(50\%\) chance the coin is biased towards tails: \(P(\Theta\leq 0.5)=\int_0^{0.5}1\:d\theta=0.5\) (that is, the \(0.5\) quantile of the prior corresponds to \(\theta=0.5\)).
Under the posterior, the coin is probably biased towards heads (1): \(P(\Theta>0.5)=\int^1_{0.5}2\theta\:d\theta=1^2-\left(\frac{1}{2}\right)^2=\frac{3}{4}\)
The expected value of \(\Theta\) under the prior equals the prior predictive probability of heads: \(E[\Theta]=\int_0^1\theta f_\Theta(\theta)\:d\theta=\int_0^1\theta\:d\theta=\left[\frac{\theta^2}{2}\right]_0^1=\frac{1}{2}\)
The expected value of \(\Theta\) under the posterior is more heavily weighted towards heads, and equals the posterior predictive probability of heads on the next toss: \(E[\Theta]=\int_0^1\theta f_\Theta(\theta)\:d\theta=\int_0^1 2\theta^2\:d\theta=\left[\frac{2\theta^3}{3}\right]_0^1=\frac{2}{3}\)
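Both expectations are easy to confirm by numerical integration (a quick check, not part of the original computation):

```python
import numpy as np

theta, dtheta = np.linspace(0, 1, 10_000, retstep=True)
print((theta * 1.0 * dtheta).sum())        # E[Theta] under the flat prior f=1      -> ~1/2
print((theta * 2 * theta * dtheta).sum())  # E[Theta] under the posterior f=2*theta -> ~2/3
```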
Suppose we toss it again and get heads. Using the previous posterior \(f_\Theta(\theta)=2\theta\) as the new prior, find the posterior probability for \(\theta\).
Hypothesis | PRIOR PDF | likelihood | Bayes’ Numerator | POSTERIOR PDF |
---|---|---|---|---|
\(\mathcal{H}_\theta\) | \(2\theta\:d\theta\) | \(P(1|\theta)=\theta\) | \(2\theta^2\) | \(f_\Theta(\theta)=3\theta^2\) |
totals: | \(\int_0^12\theta\:d\theta=1\) | - | \(\int_0^12\theta^2 d\theta=\left[\frac{2\theta^3}{3}\right]_0^1=\frac{2}{3}\) | \(\int_0^13\theta^2\:d\theta=\left[\theta^3\right]_0^1=1\) |
Suppose we toss it a third time and get tails. Again using the previous posterior \(f_\Theta(\theta)=3\theta^2\) as the prior, find the posterior probability for \(\theta\).
Hypothesis | PRIOR PDF | likelihood | Bayes’ Numerator | POSTERIOR PDF |
---|---|---|---|---|
\(\mathcal{H}_\theta\) | \(3\theta^2\:d\theta\) | \(P(0|\theta)=1-\theta\) | \(3\theta^2-3\theta^3\) | \(f_\Theta(\theta)=12\theta^2-12\theta^3\) |
totals: | \(\int_0^13\theta^2\:d\theta=1\) | - | \(\int_0^1(3\theta^2-3\theta^3) d\theta=\left[\theta^3- \frac{3\theta^4}{4}\right]_0^1=\frac{1}{4}\) | \(\int_0^1\left(12\theta^2-12\theta^3\right)\:d\theta=4-3=1\) |
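Chaining the three tables together, updating one toss at a time from the flat prior gives the same posterior, \(12\theta^2(1-\theta)\), as a single batch update on the whole sequence \(1,1,0\). A sketch of the sequential loop, assuming NumPy:

```python
import numpy as np

theta, dtheta = np.linspace(0, 1, 10_000, retstep=True)
posterior = np.ones_like(theta)              # start from the flat prior f(theta) = 1

for toss in (1, 1, 0):                       # the observed sequence H, H, T
    likelihood = theta if toss == 1 else 1 - theta
    posterior = posterior * likelihood       # yesterday's posterior is today's prior
    posterior /= posterior.sum() * dtheta    # renormalize to a density

# the final posterior should match 12*theta^2*(1 - theta) from the last table
print(np.abs(posterior - 12 * theta**2 * (1 - theta)).max())
```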
Now consider the case where the data is also continuous. As before we have a continuous range of hypotheses for \(\theta\), but each observation \(x\) is now drawn from a continuous distribution with density \(\phi(x|\theta)\).
\[ f(\theta|x)\:d\theta = \frac{\phi(x|\theta)f(\theta)\:d\theta\:dx}{\left(\int_{-\infty}^{\infty}\phi(x|\theta)f(\theta)\:d\theta\right)dx}\quad =\frac{\text{prob. of data given hypotheses in the interval }d\theta}{\text{prob. of data over the complete set of hypotheses}}\\ \quad\\ \text{where the likelihood}\:\phi(x|\theta)\:dx\:\text{is the probability of observing}\:x\:\text{in the interval}\:dx\:\text{given}\:\theta,\\ \text{and the total probability of the data is}\quad\phi(x)\:dx=\left(\int_{-\infty}^{\infty}\phi(x|\theta)f(\theta)\:d\theta\right)dx \]
Suppose we have a single data point \(x = 5\) drawn from a normal distribution with unknown mean \(\theta\) and standard deviation \(1\). The prior distribution for the unknown parameter is \(\theta\sim N(2, 1)\).
\[ f(\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}} \]
\[ \phi(x=5|\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(5-\theta\right)^2}{2}} \]
\[ \begin{align} f(\theta)\phi(x=5|\theta)&=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}}\cdot\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(5-\theta\right)^2}{2}}\\ &=\frac{1}{2\pi}e^{-\frac{\left(\theta-2\right)^2+\left(5-\theta\right)^2}{2}} =\frac{1}{2\pi}e^{-\frac{(\theta^2-4\theta+4)+(25-10\theta+\theta^2)}{2}}\\ &=\frac{1}{2\pi}e^{-\frac{2\theta^2-14\theta+29}{2}}=\frac{1}{2\pi}e^{-\left(\theta^2-7\theta+\frac{29}{2}\right)}\\ &=\frac{1}{2\pi}e^{-\left((\theta-\frac{7}{2})^2-\frac{49}{4}+\frac{29}{2}\right)}=\frac{1}{2\pi}e^{-\left((\theta-\frac{7}{2})^2-\frac{49}{4}+\frac{58}{4}\right)}=\frac{1}{2\pi}e^{-(\theta-\frac{7}{2})^2-\frac{9}{4}}\\ &=\frac{e^{-\frac{9}{4}}}{2\pi}e^{-(\theta-\frac{7}{2})^2}\\ \quad\\ &\therefore f(\theta)\phi(x=5|\theta)=c_1e^{-(\theta-\frac{7}{2})^2}\quad\text{where}\quad c_1=\frac{e^{-\frac{9}{4}}}{2\pi}\\ \end{align} \]
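The completing-the-square step can be verified symbolically; a quick check using SymPy (an added verification, not part of the original derivation):

```python
import sympy as sp

theta = sp.symbols('theta')
quadratic = (theta - 2)**2 + (5 - theta)**2
completed = 2 * (theta - sp.Rational(7, 2))**2 + sp.Rational(9, 2)
print(sp.expand(quadratic - completed))  # 0, so the exponent is -(theta - 7/2)^2 - 9/4
```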
Hypothesis | PRIOR PDF | likelihood | Bayes’ Numerator | POSTERIOR PDF |
---|---|---|---|---|
\(\mathcal{H}_\theta\) | \(f(\theta)\:d\theta=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}}\:d\theta\) | \(\phi(x=5|\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(5-\theta\right)^2}{2}}\) | \(f(\theta)\phi(x=5|\theta)=c_1e^{-(\theta-\frac{7}{2})^2}\) | \(\frac{f(\theta)\phi(x=5|\theta)}{\phi(x=5)}=c_2e^{-(\theta-\frac{7}{2})^2}\) |
totals: | \(\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}}\:d\theta=1\) | - | \(\phi(x=5)=\int_{-\infty}^\infty f(\theta)\phi(x=5|\theta)\:d\theta\) | \(\int_{-\infty}^\infty c_2e^{-(\theta-\frac{7}{2})^2}\:d\theta=1\implies c_2=\frac{1}{\sqrt{\pi}}\) |
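Numerically, the same update gives a posterior centered at \(7/2\) with variance \(1/2\). A grid sketch assuming NumPy (the grid bounds are my choice, wide enough to capture both densities):

```python
import numpy as np

theta, dtheta = np.linspace(-5, 10, 100_000, retstep=True)
prior = np.exp(-(theta - 2)**2 / 2) / np.sqrt(2 * np.pi)  # N(2, 1) prior
lik = np.exp(-(5 - theta)**2 / 2) / np.sqrt(2 * np.pi)    # likelihood of x = 5, sigma = 1
post = prior * lik
post /= post.sum() * dtheta                               # normalize to a density

mean = (theta * post * dtheta).sum()
var = ((theta - mean)**2 * post * dtheta).sum()
print(mean, var)                                          # ~3.5 and ~0.5
```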
The posterior here is \(N(3.5, \tfrac{1}{2})\): the exponent \(-(\theta-\frac{7}{2})^2\) corresponds to a normal density with mean \(\frac{7}{2}\) and variance \(\frac{1}{2}\). Graphing the prior and posterior pdfs (see the sketch below) shows how the data 'pulls' the prior (the wider bell on the left, centered at \(\theta=2\)) towards the data. The posterior is the narrower bell on the right. After collecting data, we have a new opinion about the mean, and we are more sure of this new opinion.
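A minimal plotting sketch that draws the two bells, assuming Matplotlib (the plotting range is my choice):

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(-2, 8, 500)
prior = np.exp(-(theta - 2)**2 / 2) / np.sqrt(2 * np.pi)   # N(2, 1)
posterior = np.exp(-(theta - 3.5)**2) / np.sqrt(np.pi)     # N(3.5, 1/2)

plt.plot(theta, prior, label="prior $N(2, 1)$")
plt.plot(theta, posterior, label="posterior $N(3.5, 1/2)$")
plt.xlabel(r"$\theta$")
plt.legend()
plt.show()
```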