1. Posterior Distribution under the Normal Inverse-Gamma Prior

Assume \(y \sim N\left( X \beta, \sigma^{2} V\right)\), where \(y\) is an \(n \times 1\) response vector, \(X\) is a known \(n \times p\) design matrix, \(V\) is a known \(n \times n\) positive definite matrix, and \(\left\{\beta, \sigma^{2}\right\}\) is unknown, with \(\beta\) a \(p \times 1\) vector. We use a Normal Inverse-Gamma (NIG) prior: \[ \begin{align} P(\beta, \sigma^{2}) &= NIG \left(\beta, \sigma^{2} \mid m_{0}, M_{0}, a_{0}, b_{0}\right) \\ &= IG\left(\sigma^{2} \mid a_{0}, b_{0}\right) \cdot N\left(\beta \mid m_{0}, \sigma^{2} M_{0}\right) \end{align} \]
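The factorization in the second line suggests a direct way to simulate from the NIG prior: first draw \(\sigma^2\) from its inverse-gamma marginal, then draw \(\beta\) given \(\sigma^2\). A minimal sketch in Python (NumPy/SciPy); the hyperparameter values are illustrative assumptions, not part of the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 3
# Hypothetical hyperparameters, for illustration only
m0, M0, a0, b0 = np.zeros(p), np.eye(p), 2.0, 1.0

# sigma^2 ~ IG(a0, b0): scipy's invgamma with scale=b0 matches the density
# b0^a0 / Gamma(a0) * (sigma^2)^{-(a0+1)} * exp(-b0 / sigma^2)
sigma2 = stats.invgamma.rvs(a0, scale=b0, random_state=rng)

# beta | sigma^2 ~ N(m0, sigma^2 M0)
beta = rng.multivariate_normal(m0, sigma2 * M0)
```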

The posterior distribution is given by:

\[ \boxed{ \begin{align}\label{eq:post_dist} P\left(\beta, \sigma^{2} \mid y\right) & \propto \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+\frac{n+p}{2}+1} e^{-\frac{b_{0}}{\sigma^{2}}} e^{-\frac{1}{2 \sigma^{2}} [Q \left(\beta, m_{0}, M_{0}\right)+Q \left(y, X \beta, V\right)]}\; \end{align} } \]

where \(Q(x, m, M)=(x-m)^{\top} M^{-1} (x-m)\).

Details of the above formula: \[ \begin{align} P\left(\beta, \sigma^{2} \mid y\right) & \propto NIG\left(\beta, \sigma^{2} \mid m_{0}, M_{0}, a_{0}, b_{0}\right) \cdot N\left(y \mid X \beta, \sigma^{2} V\right) \nonumber\\ & \propto IG\left(\sigma^{2} \mid a_{0}, b_{0}\right) \cdot N\left(\beta \mid m_{0}, \sigma^{2} M_{0}\right) \cdot N\left(y \mid X \beta, \sigma^{2} V\right) \nonumber\\ & \propto \frac{b_0^{a_0}}{\Gamma\left(a_{0}\right)} \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+1} e^{-\frac{b_{0}}{\sigma^{2}}} \frac{1}{(2 \pi \sigma^{2})^{\frac{p}{2}}\left| M_{0}\right|^{\frac{1}{2}}} e^{-\frac{1}{2 \sigma^{2}} Q \left(\beta, m_{0}, M_{0}\right)} \frac{1}{(2 \pi \sigma^{2})^{\frac{n}{2}}\left| V\right|^{\frac{1}{2}}} e^{-\frac{1}{2 \sigma^{2}} Q \left(y, X \beta, V\right)} \nonumber\\ & \propto \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+\frac{n+p}{2}+1} e^{-\frac{b_{0}}{\sigma^{2}}} e^{-\frac{1}{2 \sigma^{2}} [Q \left(\beta, m_{0}, M_{0}\right)+Q \left(y, X \beta, V\right)]}\; \end{align} \]

We can further simplify that \[ \boxed{ \begin{align}\label{eq:multivariate_completion_square} Q \left(\beta, m_{0}, M_{0}\right)+Q \left(y, X \beta, V\right) &= (\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1}) +c^{\ast}\; \end{align} } \] where \(M_{1}\) is a symmetric positive definite matrix, \(m_{1}\) is a vector, and \(c\) and \(c^{\ast}\) are scalars, given by:

\[ \begin{align} M_{1}^{-1} &= M_{0}^{-1} + X^{\top}V^{-1}X\; \\ m_{1} &= M_{0}^{-1}m_{0} + X^{\top}V^{-1}y\; \\ c &= m_{0}^{\top} M_{0}^{-1}m_{0} + y^{\top}V^{-1}y\; \\ c^{\ast} &= c - m_{1}^{\top}M_{1}m_{1} = m_{0}^{\top} M_{0}^{-1}m_{0} + y^{\top}V^{-1}y - m_{1}^{\top}M_{1}m_{1}\; \end{align} \]

Note: \(M_{1}\), \(m_{1}\), \(c\), and \(c^{\ast}\) do not depend upon \(\beta\).

Details of the above formula: \[ \begin{align} Q \left(\beta, m_{0}, M_{0}\right)+Q \left(y, X \beta, V\right) &= (\beta - m_{0})^{\top}M_{0}^{-1}(\beta - m_{0}) + (y - X\beta)^{\top}V^{-1}(y - X\beta)\; \nonumber\\ &= \beta^{\top}M_{0}^{-1}\beta - 2\beta^{\top}M_{0}^{-1}m_{0} + m_{0}^{\top}M_{0}^{-1}m_{0} \nonumber\\ &\qquad + \beta^{\top}X^{\top}V^{-1}X\beta - 2\beta^{\top} X^{\top}V^{-1}y + y^{\top}V^{-1}y \nonumber\\ &= \beta^{\top} \left(M_{0}^{-1} + X^{\top}V^{-1}X\right) \beta - 2\beta^{\top}\left(M_{0}^{-1}m_{0} + X^{\top}V^{-1}y\right) \nonumber\\ &\qquad + m_{0}^{\top} M_{0}^{-1}m_{0} + y^{\top}V^{-1}y \nonumber \\ &= \beta^{\top}M_{1}^{-1}\beta - 2\beta^{\top} m_{1} + c\nonumber\\ &= (\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1}) - m_{1}^{\top}M_{1}m_{1} +c \nonumber\\ &= (\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1}) +c^{\ast}\; \end{align} \]
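Because the completion of squares is a finite-dimensional algebraic identity, it can be sanity-checked numerically. Below is a short Python/NumPy sketch (the random sizes, seed, and the helper `quad` are illustrative assumptions) verifying the boxed identity at an arbitrary \(\beta\):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5

# Random problem instance; M0 and V are made symmetric positive definite
m0 = rng.normal(size=p)
A = rng.normal(size=(p, p))
M0 = A @ A.T + np.eye(p)
B = rng.normal(size=(n, n))
V = B @ B.T + np.eye(n)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)  # arbitrary point at which to test the identity

def quad(x, m, M):
    """Q(x, m, M) = (x - m)^T M^{-1} (x - m)."""
    return (x - m) @ np.linalg.solve(M, x - m)

# Left-hand side: Q(beta, m0, M0) + Q(y, X beta, V)
lhs = quad(beta, m0, M0) + quad(y, X @ beta, V)

# Right-hand side: completed square plus the constant c*
M1 = np.linalg.inv(np.linalg.inv(M0) + X.T @ np.linalg.solve(V, X))
m1 = np.linalg.solve(M0, m0) + X.T @ np.linalg.solve(V, y)
c_star = m0 @ np.linalg.solve(M0, m0) + y @ np.linalg.solve(V, y) - m1 @ M1 @ m1
rhs = quad(beta, M1 @ m1, M1) + c_star

assert np.isclose(lhs, rhs)
```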

Then, we have:

\[ \boxed{ \begin{align} P\left(\beta, \sigma^{2} \mid y\right) & \propto NIG\left(\beta, \sigma^{2} \mid M_{1}m_{1}, M_{1}, a_{1}, b_{1}\right) \; \end{align} } \]

where

\[ \begin{align} m_{1}&=M_{0}^{-1} m_{0}+X^{\top} V^{-1} y \; \\ M_{1}^{-1} &=M_{0}^{-1}+X^{\top} V^{-1} X \; \\ a_{1}&=a_{0}+\frac{n}{2} \; \\ b_{1}&=b_{0}+\frac{c^{\ast}}{2}= b_{0}+\frac{1}{2}\left(m_{0}^{\top} M_{0}^{-1} m_{0}+y^{\top} V^{-1} y-m_{1}^{\top} M_{1} m_{1}\right)\; \end{align} \]

Details of the above formula: \[ \begin{align} P\left(\beta, \sigma^{2} \mid y\right) & \propto \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+\frac{n+p}{2}+1} e^{-\frac{b_{0}}{\sigma^{2}}} e^{-\frac{1}{2 \sigma^{2}} [(\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1}) +c^{\ast}]}\\ & \propto \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+\frac{n+p}{2}+1} e^{-\frac{b_{0}+\frac{c^{\ast}}{2}}{\sigma^{2}}} e^{-\frac{1}{2 \sigma^{2}} (\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1})}\\ & = \left(\frac{1}{\sigma^{2}}\right)^{a_{0}+\frac{n}{2}+1} e^{-\frac{b_{0}+\frac{c^{\ast}}{2}}{\sigma^{2}}} \left(\frac{1}{\sigma^2}\right)^{\frac{p}{2}} e^{-\frac{1}{2 \sigma^{2}} (\beta - M_{1}m_{1})^{\top}M_{1}^{-1}(\beta - M_{1}m_{1})}\\ & \propto IG\left(\sigma^{2} \mid a_{0}+\frac{n}{2}, b_{0}+\frac{c^{\ast}}{2} \right) \cdot N\left(\beta \mid M_{1}m_{1}, \sigma^{2} M_{1}\right) \\ & = IG\left(\sigma^{2} \mid a_{1}, b_{1} \right) \cdot N\left(\beta \mid M_{1}m_{1}, \sigma^{2} M_{1}\right) \\ & = NIG\left(\beta, \sigma^{2} \mid M_{1}m_{1}, M_{1}, a_{1}, b_{1}\right) \; \end{align} \]
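Putting the pieces together, the conjugate update reduces to a few linear-algebra operations. Here is a minimal sketch of such a function (the name `nig_posterior` and the direct \(p \times p\) inversion are my own choices for illustration):

```python
import numpy as np

def nig_posterior(m0, M0, a0, b0, X, V, y):
    """NIG posterior parameters for y ~ N(X beta, sigma^2 V).

    Returns (mean, M1, a1, b1) such that
    P(beta, sigma^2 | y) = IG(sigma^2 | a1, b1) * N(beta | mean, sigma^2 M1),
    following the formulas above; assumes M0 and V are positive definite.
    """
    n, p = X.shape
    M1 = np.linalg.inv(np.linalg.inv(M0) + X.T @ np.linalg.solve(V, X))
    m1 = np.linalg.solve(M0, m0) + X.T @ np.linalg.solve(V, y)
    a1 = a0 + n / 2
    c_star = m0 @ np.linalg.solve(M0, m0) + y @ np.linalg.solve(V, y) - m1 @ M1 @ m1
    b1 = b0 + c_star / 2
    return M1 @ m1, M1, a1, b1
```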

2. Updating Form of the Posterior Distribution

To calculate \(M_1\), we utilize the well-known Sherman-Woodbury-Morrison identity from matrix algebra: \[\begin{equation}\label{ShermanWoodburyMorrison} \left(A + BDC\right)^{-1} = A^{-1} - A^{-1}B\left(D^{-1}+CA^{-1}B\right)^{-1}CA^{-1} \end{equation}\] where \(A\) and \(D\) are invertible square matrices and \(B\) and \(C\) are rectangular (square if \(A\) and \(D\) have the same dimensions) matrices such that the multiplications are well-defined. The identity is easily verified by multiplying the right-hand side by \(A + BDC\) and simplifying to obtain the identity matrix.
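The identity is also easy to confirm numerically. A quick Python/NumPy check on random well-conditioned matrices (sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 2  # A is p x p, D is q x q, B is p x q, C is q x p

# SPD constructions keep A and D safely invertible for this check
A0 = rng.normal(size=(p, p))
A = A0 @ A0.T + np.eye(p)
D0 = rng.normal(size=(q, q))
D = D0 @ D0.T + np.eye(q)
B = rng.normal(size=(p, q))
C = rng.normal(size=(q, p))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ D @ C)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ainv @ B) @ C @ Ainv

assert np.allclose(lhs, rhs)
```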

Applying this identity with \(A = M_0^{-1}\), \(B = X^{\top}\), \(D = V^{-1}\), and \(C = X\) gives \[ \begin{aligned} M_1 & = (M_{0}^{-1} + X^{\top}V^{-1}X)^{-1} \\ & = M_0-M_0 X^{\top}\left(V+X M_0 X^{\top}\right)^{-1} X M_0 \\ & = M_0-M_0 X^{\top} Q^{-1} X M_0 \end{aligned} \]

where \(Q = V + X M_0 X^{\top}\).

We can show that \[ \boxed{ \begin{align} M_1 m_1 & =m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \;. \end{align} } \]

Details of the above formula: \[\begin{align} M_1 m_1 & = \left(M_0^{-1}+X^{\top} V^{-1} X\right)^{-1} m_1 \\ & = [M_0-M_0 X^{\top}\left(V+X M_0 X^{\top}\right)^{-1} X M_0]m_1 \\ & = (M_0-M_0 X^{\top} Q^{-1} X M_0) m_1 \\ & = (M_0-M_0 X^{\top} Q^{-1} X M_0)(M_0^{-1} m_0+X^{\top} V^{-1} y) \\ & = m_0+M_0 X^{\top} V^{-1} y-M_0 X^{\top} Q^{-1} X m_0 - M_0 X^{\top} Q^{-1} X M_0 X^{\top} V^{-1} y \\ & = m_0+M_0 X^{\top}\left(I-Q^{-1} X M_0 X^{\top}\right) V^{-1} y - M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1}\left(Q-X M_0 X^{\top}\right)V^{-1} y - M_0 X^{\top} Q^{-1} X m_0 \\ & \left(\text{since } Q=V+X M_0 X^{\top}\right) \\ & = m_0+M_0 X^{\top} Q^{-1} V V^{-1} y-M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1} y-M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ \end{align}\]

Furthermore, we can simplify \[ \boxed{ \begin{align} m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} M_1 m_1 & = \left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \;. \end{align} } \]

Details of the above formula: \[\begin{align} m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} M_1 m_1 & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} [m_0+M_0 X^{\top} Q^{-1} (y - X m_0)] \\ & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} m_0 - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y -m_0^{\top}\left(M_0^{-1} m_0+X^{\top} V^{-1} y\right) \\ & \qquad \qquad \qquad - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = y^{\top} V^{-1} y-y^{\top} V^{-1} X m_0 - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = y^{\top} V^{-1}\left(y-X m_0 \right)-\underbrace{m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right)}_{\text{simplify from left to right}} \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(M_0 m_1\right)^{\top} X^{\top} Q^{-1}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(m_0+M_0 X^{\top} V^{-1} y\right)^{\top} X^{\top} Q^{-1}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(X m_0+X M_0 X^{\top} V^{-1} y\right)^{\top} Q^{-1}\left(y-X m_0\right)\\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left[Q^{-1} X m_0+Q^{-1}\left(X M_0 X^{\top}\right)V^{-1} y\right]^{\top}\left(y-X m_0\right) \\ & \left(\text{since } Q^{\top}=Q\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left[Q^{-1} X m_0+Q^{-1}(Q-V) V^{-1} y\right]^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left(Q^{-1} X m_0+V^{-1} y- Q^{-1} y \right)^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left[V^{-1} y+Q^{-1}\left(X m_0-y\right)\right]^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-y^{\top} V^{-1}\left(y-X m_0\right) +\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \\ & =\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \\ \end{align}\]
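Both boxed identities of this section can be verified numerically in a few lines. A Python/NumPy sketch under a random setup like the earlier checks (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 6

m0 = rng.normal(size=p)
A = rng.normal(size=(p, p))
M0 = A @ A.T + np.eye(p)
B = rng.normal(size=(n, n))
V = B @ B.T + np.eye(n)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

Q = V + X @ M0 @ X.T
M1 = np.linalg.inv(np.linalg.inv(M0) + X.T @ np.linalg.solve(V, X))
m1 = np.linalg.solve(M0, m0) + X.T @ np.linalg.solve(V, y)
r = y - X @ m0  # residual of y about the prior mean X m0

# M1 m1 = m0 + M0 X^T Q^{-1} (y - X m0)
assert np.allclose(M1 @ m1, m0 + M0 @ X.T @ np.linalg.solve(Q, r))

# m0^T M0^{-1} m0 + y^T V^{-1} y - m1^T M1 m1 = (y - X m0)^T Q^{-1} (y - X m0)
c_star = m0 @ np.linalg.solve(M0, m0) + y @ np.linalg.solve(V, y) - m1 @ M1 @ m1
assert np.isclose(c_star, r @ np.linalg.solve(Q, r))
```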

So, we obtain the following updating form of the posterior distribution for Bayesian linear regression:

\[ \boxed{ \begin{aligned} P\left(\beta, \sigma^2 \mid y\right) &\propto IG\left(\sigma^2 \mid a_0, b_0\right) N\left(\beta \mid m_0, \sigma^2 M_0\right) N(y \mid X \beta, \sigma^2 V)\\ & \propto IG\left(\sigma^2 \mid a_1, b_1\right) N\left(\beta \mid \tilde{m}_1, \sigma^2 \tilde{M}_1\right)\\ \end{aligned} } \] where,

\[ \begin{aligned} \tilde{m}_1 & =M_1 m_1=m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ \tilde{M}_1 & =M_1=M_0-M_0 X^{\top} Q^{-1} X M_0 \\ Q & =V+X M_0 X^{\top} \\ a_1 & =a_0+\frac{n}{2} \\ b_1 & =b_0+\frac{1}{2}\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \end{aligned} \]
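In code, the updating form needs only \(n \times n\) solves against \(Q\), which is convenient when \(p\) is large relative to \(n\). A minimal sketch (the function name, simulated data, and hyperparameter values are illustrative assumptions):

```python
import numpy as np

def nig_posterior_update(m0, M0, a0, b0, X, V, y):
    """Updating form of the NIG posterior, per the boxed result above."""
    n, p = X.shape
    Q = V + X @ M0 @ X.T
    r = y - X @ m0                  # y minus its prior mean X m0
    Qinv_r = np.linalg.solve(Q, r)  # Q^{-1} (y - X m0)
    m1_tilde = m0 + M0 @ X.T @ Qinv_r
    M1_tilde = M0 - M0 @ X.T @ np.linalg.solve(Q, X @ M0)
    a1 = a0 + n / 2
    b1 = b0 + 0.5 * r @ Qinv_r
    return m1_tilde, M1_tilde, a1, b1

# Illustrative usage on simulated data
rng = np.random.default_rng(3)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
m1t, M1t, a1, b1 = nig_posterior_update(
    m0=np.zeros(p), M0=np.eye(p), a0=2.0, b0=1.0, X=X, V=np.eye(n), y=y)
print(m1t)            # posterior mean of beta
print(b1 / (a1 - 1))  # posterior mean of sigma^2 (valid since a1 > 1)
```

Since the two forms are algebraically identical, this output agrees with a direct computation of \(M_1 m_1\), \(M_1\), \(a_1\), and \(b_1\) from Section 1.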