In the Bayesian approach to parameter estimation we treat \(\theta\) as a random variable whose distribution, called the “prior distribution” or simply the “prior,” is given by \(f_\Theta(\theta)\). The prior represents what we know about the parameter before observing any data \(X\). Note that \(\Theta\) is now a random variable, unlike before, where the parameter we were trying to estimate was a fixed unknown constant.

For a given value \(\Theta = \theta\), the probability distribution of the data is \(f_{X|\Theta}(x|\theta)\), so the joint distribution of \(X\) and \(\Theta\) is \[f_{X,\Theta}(x,\theta) = f_{X|\Theta}(x|\theta)f_\Theta(\theta)\]
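As a concrete illustration (the Bernoulli model and uniform prior here are chosen purely for the example, not part of the general setup), suppose \(X\) is a single coin flip with \(P(X = 1 \mid \Theta = \theta) = \theta\), and the prior is uniform on \([0,1]\), i.e. \(f_\Theta(\theta) = 1\). Then the joint distribution is \[f_{X,\Theta}(x,\theta) = \theta^x(1-\theta)^{1-x} \cdot 1, \qquad x \in \{0,1\},\ \theta \in [0,1]\]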

We can now obtain the distribution of \(\Theta\) given the data by combining the joint distribution above with the marginal of \(X\) via Bayes’ Rule. The marginal distribution of \(X\) is \[f_X(x) = \int f_{X,\Theta}(x,\theta)\,d\theta = \int f_{X|\Theta}(x|\theta)f_\Theta(\theta)\,d\theta\] Applying Bayes’ Rule, the distribution of \(\Theta\) given \(X = x\) is \[f_{\Theta|X}(\theta|x) = \frac{f_{X|\Theta}(x|\theta)f_\Theta(\theta)}{\int f_{X|\Theta}(x|\theta)f_\Theta(\theta)\,d\theta}\]
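Continuing the coin-flip example from above, the marginal of \(X\) is \[f_X(x) = \int_0^1 \theta^x(1-\theta)^{1-x}\,d\theta = \frac{1}{2}, \qquad x \in \{0,1\}\] so the posterior is \(f_{\Theta|X}(\theta|x) = 2\,\theta^x(1-\theta)^{1-x}\). For instance, observing \(x = 1\) gives \(f_{\Theta|X}(\theta|1) = 2\theta\), a \(\mathrm{Beta}(2,1)\) density that shifts our belief toward larger values of \(\theta\).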

This last expression is called the posterior distribution; it represents what is known about \(\Theta\) after having observed the data \(X\).
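When the integral in the denominator has no closed form, the posterior can be approximated numerically. Below is a minimal sketch, assuming NumPy, a Bernoulli likelihood, and a uniform prior discretized on a grid; the variable names (`grid`, `prior`, `likelihood`) and the sample data are illustrative only.

```python
import numpy as np

# Grid approximation of the posterior f_{Theta|X}(theta|x)
# for a Bernoulli likelihood with a uniform prior on [0, 1].
grid = np.linspace(0, 1, 1001)      # discretized values of theta
prior = np.ones_like(grid)          # uniform prior: f_Theta(theta) = 1

data = np.array([1, 0, 1, 1])       # hypothetical observed coin flips

# Likelihood f_{X|Theta}(x|theta) evaluated at every grid point
likelihood = np.prod([grid**x * (1 - grid)**(1 - x) for x in data], axis=0)

# Numerator of Bayes' Rule, then normalize by the marginal of X
# (the integral is approximated by a trapezoidal sum over the grid).
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, grid)

print(grid[np.argmax(posterior)])   # posterior mode, approx. 0.75 here
```

With a uniform prior the posterior mode coincides with the maximum likelihood estimate (here 3 heads out of 4 flips), which is one way to sanity-check the grid computation.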