Bayesian Updating with Continuous Priors and Discrete Data

Consider a Bernoulli trial with probability \(P(1)=\theta\) of success and \(P(0)=1-\theta\) of failure. We can hypothesize that \(\theta\) lies anywhere in the range \([0, 1]\), so we have a continuous range of hypotheses.

\[ f(\theta|x)\:d\theta = \frac{p(x|\theta)f(\theta)\:d\theta}{\int_{-\infty}^{\infty}p(x|\theta)f(\theta)\:d\theta}\quad =\frac{\text{prob. of data for hypotheses in the interval }d\theta}{\text{prob. of data over the complete set of hypotheses}} \]
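To see the formula in action, here is a minimal numerical sketch, assuming Python with NumPy and a flat prior as a placeholder: it discretizes \(\theta\) on a grid, so the grid spacing plays the role of \(d\theta\).

```python
import numpy as np

# Discretize the hypothesis range [0, 1]; the spacing plays the role of d(theta).
theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]

# Prior density f(theta): flat over [0, 1] (a placeholder assumption).
prior = np.ones_like(theta)

# Likelihood of observing one success (x = 1) in a Bernoulli trial.
likelihood = theta

# Bayes numerator and the total probability of the data (the denominator).
numerator = likelihood * prior * dtheta
p_data = numerator.sum()              # approximates the integral over theta

# Posterior density: numerator over denominator, rescaled back to a PDF.
posterior = numerator / (p_data * dtheta)
print((posterior * dtheta).sum())     # ~1.0: the posterior integrates to 1
```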

Notation Summary:

| Scheme | Hypothesis | Prior | Likelihood | Bayes numerator | Posterior |
|---|---|---|---|---|---|
| events (inequalities) | \(\mathcal{H}\) | \(P(\mathcal{H})\) | \(P(\mathcal{D}\mid\mathcal{H})\) | \(P(\mathcal{D}\mid\mathcal{H})P(\mathcal{H})\) | \(P(\mathcal{H}\mid\mathcal{D})\) |
| values (mass) | \(\theta\) | \(p(\theta)\) | \(p(x\mid\theta)\) | \(p(x\mid\theta)p(\theta)\) | \(p(\theta\mid x)\) |
| value intervals (density) | \(\theta\) | \(f(\theta)\:d\theta\) | \(p(x\mid\theta)\) | \(p(x\mid\theta)f(\theta)\:d\theta\) | \(f(\theta\mid x)\:d\theta\) |

Example: Distributed Priors

Consider a coin with probability \(\theta\) of heads. The value of \(\theta\) is random with prior PDF \(f_\Theta(\theta)=2\theta\). Suppose we flip the coin three times and get the sequence H,H,T (\(1,1,0\)). We can compute the posterior PDF for the random variable \(\Theta\) through a Bayesian update.

| Hypothesis | Prior PDF | Likelihood | Bayes numerator | Posterior PDF |
|---|---|---|---|---|
| general form | \(P(\Delta\Theta)=f_{\Theta}(\theta)\:d\theta\) | \(p(1,1,0\mid\theta)\) | \(p(1,1,0\mid\theta)f_\Theta(\theta)\:d\theta\) | \(f_\Theta(\theta\mid 1,1,0)\:d\theta=\frac{p(1,1,0\mid\theta)f_\Theta(\theta)\:d\theta}{p(1,1,0)}\) |
| \(\mathcal{H}_\theta\) | \(2\theta\:d\theta\) | \(\theta^2(1-\theta)\) | \(\theta^2(1-\theta)\cdot 2\theta\:d\theta\) | \(f_\Theta(\theta\mid 1,1,0)\:d\theta=20\theta^3(1-\theta)\:d\theta\) |
| column totals | \(\int_0^1f_\Theta(\theta)\:d\theta=1\) | N/A (fixed \(\theta\)) | \(p(1,1,0)=\int_0^1 2\theta^3(1-\theta)\:d\theta=\left[\frac{\theta^4}{2}-\frac{2\theta^5}{5}\right]_0^1=0.1\) | \(\int_0^1 20\theta^3(1-\theta)\:d\theta=1\) |
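As a sanity check on the column totals, here is a short sketch (assuming Python with SciPy available) that reproduces \(p(1,1,0)=0.1\) and confirms the posterior integrates to 1:

```python
from scipy.integrate import quad

prior = lambda t: 2 * t                    # f_Theta(theta) = 2*theta
likelihood = lambda t: t**2 * (1 - t)      # p(1,1,0 | theta) = theta^2 * (1 - theta)

# Total probability of the data: integral of the Bayes numerator over [0, 1].
p_data, _ = quad(lambda t: likelihood(t) * prior(t), 0, 1)
print(p_data)                              # 0.1

# Posterior PDF and a check that it integrates to 1.
posterior = lambda t: likelihood(t) * prior(t) / p_data
print(quad(posterior, 0, 1)[0])            # 1.0, matching 20*theta^3*(1 - theta)
```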

Bayesian Updating with Flat Priors

Flat priors are a common default choice: they represent a state of minimal prior knowledge, or maximum uncertainty, about a parameter before seeing any data. Flat priors (often uniform distributions) are used when we want the data to drive the inference rather than subjective prior beliefs. In cases where we want results to be as unbiased as possible, flat priors help avoid injecting strong assumptions into the model. When there is a lot of data, the likelihood dominates the posterior distribution, making the choice of a flat prior less influential, as the sketch below illustrates.
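Here is a small simulation sketch of that last point, assuming Python with NumPy and a hypothetical true bias of 0.7: as the number of flips grows, the flat-prior posterior concentrates around the empirical frequency, so the data dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7                                 # assumed true bias for the demo
theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]

for n in (10, 100, 1000):
    flips = rng.random(n) < theta_true           # simulated Bernoulli data
    h = flips.sum()                              # number of heads observed
    # Flat prior: the posterior is proportional to the likelihood alone.
    log_post = h * np.log(theta + 1e-12) + (n - h) * np.log(1 - theta + 1e-12)
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * dtheta                  # normalize to a PDF
    mean = np.sum(theta * post) * dtheta
    print(n, round(mean, 3))                     # posterior mean approaches 0.7
```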

Example: Flat/Uniform Priors

Consider a coin with probability \(\theta\) of heads. Suppose we toss it once and get heads. Assume a flat prior and find the posterior PDF for \(\theta\).

| Hypothesis | Prior PDF | Likelihood | Bayes numerator | Posterior PDF |
|---|---|---|---|---|
| \(\mathcal{H}_\theta\) | \(1\:d\theta\) | \(P(1\mid\theta)=\theta\) | \(\theta\:d\theta\) | \(f_\Theta(\theta)\:d\theta=2\theta\:d\theta\) |
| totals | \(\int_0^1 1\:d\theta=1\) | – | \(\int_0^1\theta\:d\theta=\left[\frac{\theta^2}{2}\right]_0^1=\frac{1}{2}\) | \(\int_0^1 2\theta\:d\theta=\left[\theta^2\right]_0^1=1\) |

So a priori the coin was equally likely to be biased towards heads or tails: \(P(\Theta\leq 0.5)=\int_0^{0.5}1\:d\theta=0.5\) (that is, \(\theta=0.5\) is the median of the prior).

A posteriori, the coin is probably biased towards heads: \(P(\Theta>0.5)=\int_{0.5}^{1}2\theta\:d\theta=1^2-\left(\frac{1}{2}\right)^2=\frac{3}{4}\).

The expected value of \(\Theta\) under the prior, which equals the prior predictive probability of heads, is \(E[\Theta]=\int_0^1\theta f_\Theta(\theta)\:d\theta=\int_0^1\theta\:d\theta=\frac{1}{2}\).

The expected value of \(\Theta\) under the posterior, which equals the posterior predictive probability of heads, is weighted towards heads: \(E[\Theta]=\int_0^1\theta\cdot 2\theta\:d\theta=\int_0^1 2\theta^2\:d\theta=\frac{2}{3}\).
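The four quantities above are quick to verify numerically; a minimal sketch assuming Python with SciPy:

```python
from scipy.integrate import quad

prior = lambda t: 1.0          # flat prior on [0, 1]
posterior = lambda t: 2 * t    # posterior after observing one head

print(quad(prior, 0, 0.5)[0])                     # P(Theta <= 0.5) = 0.5 a priori
print(quad(posterior, 0.5, 1)[0])                 # P(Theta > 0.5) = 0.75 a posteriori
print(quad(lambda t: t * prior(t), 0, 1)[0])      # E[Theta] = 0.5 under the prior
print(quad(lambda t: t * posterior(t), 0, 1)[0])  # E[Theta] = 2/3 under the posterior
```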

Suppose we toss it again and get heads. The posterior \(f_\Theta(\theta)=2\theta\) from the first toss now serves as the prior; find the new posterior PDF for \(\theta\).

| Hypothesis | Prior PDF | Likelihood | Bayes numerator | Posterior PDF |
|---|---|---|---|---|
| \(\mathcal{H}_\theta\) | \(2\theta\:d\theta\) | \(P(1\mid\theta)=\theta\) | \(2\theta^2\:d\theta\) | \(f_\Theta(\theta)\:d\theta=3\theta^2\:d\theta\) |
| totals | \(\int_0^1 2\theta\:d\theta=1\) | – | \(\int_0^1 2\theta^2\:d\theta=\left[\frac{2\theta^3}{3}\right]_0^1=\frac{2}{3}\) | \(\int_0^1 3\theta^2\:d\theta=\left[\theta^3\right]_0^1=1\) |

Suppose we toss it a third time and get tails. Again the previous posterior, \(f_\Theta(\theta)=3\theta^2\), serves as the prior; find the new posterior PDF for \(\theta\).

| Hypothesis | Prior PDF | Likelihood | Bayes numerator | Posterior PDF |
|---|---|---|---|---|
| \(\mathcal{H}_\theta\) | \(3\theta^2\:d\theta\) | \(P(0\mid\theta)=1-\theta\) | \((3\theta^2-3\theta^3)\:d\theta\) | \(f_\Theta(\theta)\:d\theta=(12\theta^2-12\theta^3)\:d\theta\) |
| totals | \(\int_0^1 3\theta^2\:d\theta=1\) | – | \(\int_0^1(3\theta^2-3\theta^3)\:d\theta=\left[\theta^3-\frac{3\theta^4}{4}\right]_0^1=\frac{1}{4}\) | \(\int_0^1(12\theta^2-12\theta^3)\:d\theta=4-3=1\) |
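The three updates chain together naturally, since each posterior becomes the next prior. A minimal sketch assuming Python with NumPy, ending at the density \(12\theta^2(1-\theta)\):

```python
import numpy as np

theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]
pdf = np.ones_like(theta)                 # start from the flat prior

for flip in (1, 1, 0):                    # observed sequence H, H, T
    likelihood = theta if flip == 1 else 1 - theta
    pdf = likelihood * pdf                # Bayes numerator
    pdf /= pdf.sum() * dtheta             # normalize: divide by P(data)

# Compare with the closed form 12*theta^2*(1 - theta) at a few points.
for t in (0.25, 0.5, 0.75):
    i = int(t / dtheta)
    print(round(pdf[i], 3), round(12 * t**2 * (1 - t), 3))
```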

Bayesian Updating with Continuous Priors and Continuous Data

Now suppose the data are continuous as well: we observe a value \(x\) drawn from a density \(\phi(x|\theta)\) that depends on the parameter \(\theta\), and we still have a continuous range of hypotheses for \(\theta\).

\[ f(\theta|x)\:d\theta = \frac{\phi(x|\theta)f(\theta)\:d\theta\:dx}{\left(\int_{-\infty}^{\infty}\phi(x|\theta)f(\theta)\:d\theta\right)dx}\quad =\frac{\text{prob. of data for hypotheses in the interval }d\theta}{\text{prob. of data over the complete set of hypotheses}}\\ \quad\\ \text{where the likelihood}\:\phi(x|\theta)\:dx\:\text{is the probability of observing}\:x\:\text{in the interval}\:dx\:\text{given}\:\theta,\\ \text{and the probability of the data is}\quad\phi(x)\:dx=\left(\int_{-\infty}^{\infty}\phi(x|\theta)f(\theta)\:d\theta\right)dx.\\ \text{The factor}\:dx\:\text{cancels, leaving}\quad f(\theta|x)=\frac{\phi(x|\theta)f(\theta)}{\phi(x)} \]

Updating with Gaussian Priors and Gaussian Data Distribution & the Ball Ache of Integrals

Suppose we have data \(x = 5\) which was drawn from a normal distribution with unknown mean \(\theta\) and standard deviation \(1\). The prior distribution for the unknown parameter is \(\theta\sim N(2, 1)\).

Prior PDF:

\[ f(\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}} \]

Likelihood:

\[ \phi(x=5|\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(5-\theta\right)^2}{2}} \]

Bayes’ Numerator:

\[ \begin{align} f(\theta)\phi(x=5|\theta)&=\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(\theta-2\right)^2}{2}}\cdot\frac{1}{\sqrt{2\pi}}e^{-\frac{\left(5-\theta\right)^2}{2}}\\ &=\frac{1}{2\pi}e^{-\frac{\left(\theta-2\right)^2+\left(5-\theta\right)^2}{2}} =\frac{1}{2\pi}e^{-\frac{(\theta^2-4\theta+4)+(25-10\theta+\theta^2)}{2}}\\ &=\frac{1}{2\pi}e^{-\frac{2\theta^2-14\theta+29}{2}}=\frac{1}{2\pi}e^{-\left(\theta^2-7\theta+\frac{29}{2}\right)}\\ &=\frac{1}{2\pi}e^{-(\theta-\frac{7}{2})^2+\frac{49}{4}-\frac{29}{2}}=\frac{1}{2\pi}e^{-(\theta-\frac{7}{2})^2+\frac{49}{4}-\frac{58}{4}}=\frac{1}{2\pi}e^{-(\theta-\frac{7}{2})^2-\frac{9}{4}}\\ &=\frac{e^{-\frac{9}{4}}}{2\pi}e^{-(\theta-\frac{7}{2})^2}\\ \quad\\ &\therefore f(\theta)\phi(x=5|\theta)=c_1e^{-(\theta-\frac{7}{2})^2}\quad\text{where}\quad c_1=\frac{e^{-\frac{9}{4}}}{2\pi}\\ \end{align} \]

| Hypothesis | Prior PDF | Likelihood | Bayes numerator | Posterior PDF |
|---|---|---|---|---|
| \(\mathcal{H}_\theta\) | \(f(\theta)\:d\theta=\frac{1}{\sqrt{2\pi}}e^{-\frac{(\theta-2)^2}{2}}\:d\theta\) | \(\phi(x=5\mid\theta)=\frac{1}{\sqrt{2\pi}}e^{-\frac{(5-\theta)^2}{2}}\) | \(f(\theta)\phi(x=5\mid\theta)\:d\theta=c_1e^{-(\theta-\frac{7}{2})^2}\:d\theta\) | \(\frac{f(\theta)\phi(x=5\mid\theta)}{\phi(x=5)}\:d\theta=c_2e^{-(\theta-\frac{7}{2})^2}\:d\theta\) |
| totals | \(\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{(\theta-2)^2}{2}}\:d\theta=1\) | – | \(\phi(x=5)=\int_{-\infty}^\infty f(\theta)\phi(x=5\mid\theta)\:d\theta\) | \(\int_{-\infty}^\infty c_2e^{-(\theta-\frac{7}{2})^2}\:d\theta=1\) |

In fact there is no integral to grind through: since \(e^{-(\theta-\frac{7}{2})^2}=e^{-\frac{(\theta-7/2)^2}{2\cdot\frac{1}{2}}}\), the posterior is the \(N\left(\frac{7}{2},\frac{1}{2}\right)\) density, so \(c_2=\frac{1}{\sqrt{\pi}}\).

Here is the graph of the prior and posterior PDFs for this example. Note how the data ‘pulls’ the prior (the wider bell on the left) towards the data; the posterior is the narrower bell on the right. After collecting data we have a new opinion about the mean, and we are more confident in that opinion.
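A sketch that reproduces this picture, assuming Python with NumPy and Matplotlib; it also checks numerically that the posterior has mean \(7/2\) and variance \(1/2\):

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(-2, 8, 2001)
dtheta = theta[1] - theta[0]

# Prior N(2, 1) and the likelihood of the single observation x = 5 under N(theta, 1).
prior = np.exp(-(theta - 2) ** 2 / 2) / np.sqrt(2 * np.pi)
likelihood = np.exp(-(5 - theta) ** 2 / 2) / np.sqrt(2 * np.pi)

# Bayes numerator, normalized by the probability of the data phi(x = 5).
posterior = prior * likelihood
posterior /= posterior.sum() * dtheta

# Posterior mean and variance: should print 3.5 and 0.5.
mean = np.sum(theta * posterior) * dtheta
var = np.sum((theta - mean) ** 2 * posterior) * dtheta
print(round(mean, 3), round(var, 3))

plt.plot(theta, prior, label="prior N(2, 1)")
plt.plot(theta, posterior, label="posterior N(3.5, 0.5)")  # second arg is variance
plt.xlabel("theta"); plt.legend(); plt.show()
```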