(C1) Maximum a-posteriori (MAP) approach
- Recall the problem of variable selection in linear regression. The linear regression model with a response \(y\) and covariates \(\{x_{1}, \ldots, x_{p}\}\) assumes that the \(i\)-th observation of the response, \(y_i\), is related to those of the covariates, \(x_{1,i}, \ldots, x_{p,i}\), as
\[y_{i}=\beta_0+\beta_1 x_{1,i}+\beta_2 x_{2,i} + \cdots + \beta_p x_{p,i} +e_{i}, \]
where \(\beta_j\) is the \(j^{th}\) regression coefficient for \(j \in \{1, \ldots, p\}\) and \(e_{1}, \ldots, e_{n} \stackrel{iid}{\sim} N( 0, \sigma^{2})\).
Given \(p\) covariates, the space of all models, \(\mathcal{M}\), directly maps to the space of all subsets of \(\{1,\ldots,p\}\) and has cardinality \(2^{p}\).
Let \(\boldsymbol{\gamma}\) be a subset of \(\{1,\ldots,p\}\). Then, there exists a model \(M_{\boldsymbol{\gamma}}\) consisting of the covariates corresponding to \(\boldsymbol{\gamma}\), say \({\bf X}_{\boldsymbol{\gamma}}\). For example, if \(\boldsymbol{\gamma} = \{1,2\}\), then under \(M_{\boldsymbol{\gamma}}\), we have
\[y_{i}=\beta_0+\beta_1 x_{1,i}+\beta_2 x_{2,i} +e_{i}. \]
The set of parameters involved in \(M_{\boldsymbol{\gamma}}\) is denoted by \(\boldsymbol{\theta}_{\boldsymbol{\gamma}}\).
- Thus, if \(\boldsymbol{\gamma} = \{1,2\}\), then \(\boldsymbol{\theta}_{\boldsymbol{\gamma}} = \{ \beta_{0}, \beta_{1}, \beta_{2}, \sigma^{2} \}\).
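To make the model space concrete, here is a minimal Python sketch that enumerates all \(2^{p}\) subsets of \(\{1,\ldots,p\}\); the function name `all_models` is illustrative, not from any particular library:

```python
from itertools import chain, combinations

def all_models(p):
    """Enumerate all 2**p subsets of {1, ..., p}; each subset gamma
    indexes a candidate model M_gamma."""
    idx = range(1, p + 1)
    return chain.from_iterable(combinations(idx, k) for k in range(p + 1))

# With p = 3 there are 2**3 = 8 candidate models, from the
# intercept-only model () up to the full model (1, 2, 3):
print(list(all_models(3)))
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```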
Following the Bayesian paradigm, one places a prior distribution on all unknown quantities in the model.
- Let \(\pi\left(\boldsymbol{\theta}_{\boldsymbol{\gamma}}\right)\) be a prior density on \(\boldsymbol{\theta}_{\boldsymbol{\gamma}}\), and \(P(M_{\boldsymbol{\gamma}})\) be the prior on the model \(M_{\boldsymbol{\gamma}}\).
Then the integrated likelihood is calculated as
\[m_{\boldsymbol{\gamma}}({\bf y}_{n})=\int_{\boldsymbol{\theta}_{\boldsymbol{\gamma}}} f\left({\bf y}_{n}|M_{\boldsymbol{\gamma}},\boldsymbol{\theta}_{\boldsymbol{\gamma}} \right) \pi \left( \boldsymbol{\theta}_{\boldsymbol{\gamma}} \right) d{\boldsymbol{\theta}_{\boldsymbol{\gamma}}} .\]
The integrated likelihood is the expected likelihood of model \(M_{\boldsymbol{\gamma}}\) under the prior distribution, and can be interpreted as the probability (density) of the response given the model, with the effect of any particular choice of \(\boldsymbol{\theta}_{\boldsymbol{\gamma}}\) integrated out.
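Because \(m_{\boldsymbol{\gamma}}({\bf y}_{n})\) is an expectation of the likelihood under the prior, it can be approximated by averaging the likelihood over prior draws. Below is a minimal Monte Carlo sketch, assuming (for simplicity) a \(N(0,\tau^{2})\) prior on each coefficient and a fixed, known \(\sigma^{2}\); the function name and argument choices are illustrative:

```python
import numpy as np
from scipy.stats import norm

def integrated_likelihood_mc(y, X_gamma, sigma=1.0, tau=1.0, n_draws=10_000, rng=None):
    """Monte Carlo estimate of m_gamma(y): average the likelihood over
    prior draws beta ~ N(0, tau^2 I), with sigma held fixed for simplicity."""
    rng = np.random.default_rng() if rng is None else rng
    n, q = X_gamma.shape                              # X_gamma includes an intercept column
    betas = rng.normal(scale=tau, size=(n_draws, q))  # draws from the prior
    means = betas @ X_gamma.T                         # (n_draws, n) fitted means
    loglik = norm.logpdf(y, loc=means, scale=sigma).sum(axis=1)
    # Average the likelihoods on the log scale for numerical stability.
    return np.exp(np.logaddexp.reduce(loglik) - np.log(n_draws))
```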
Next, by an application of Bayes' theorem, the posterior probability of \(M_{\boldsymbol{\gamma}}\) is obtained as \[ P(M_{\boldsymbol{\gamma}}|{\bf y}_{n},{\bf X}_{\boldsymbol{\gamma}}) = \frac{P(M_{\boldsymbol{\gamma}}) m_{\boldsymbol{\gamma}}({\bf y}_{n})}{\sum_{\boldsymbol{\gamma^{\prime}}} P(M_{\boldsymbol{\gamma^{\prime}}}) m_{\boldsymbol{\gamma^{\prime}}}({\bf y}_{n})} \propto P(M_{\boldsymbol{\gamma}}) m_{\boldsymbol{\gamma}}({\bf y}_{n}). \]
The maximum a-posteriori (MAP) approach selects the model with the highest posterior probability as the best model.
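Combining the pieces, here is a sketch of MAP model selection under a uniform model prior \(P(M_{\boldsymbol{\gamma}}) = 2^{-p}\) (an assumption made purely for illustration), reusing the hypothetical helpers `all_models` and `integrated_likelihood_mc` from above:

```python
import numpy as np

def map_model(y, X, p, sigma=1.0, tau=1.0):
    """Score every M_gamma by P(M_gamma) * m_gamma(y) under a uniform
    model prior and return the subset with the highest posterior probability."""
    n = len(y)
    full_design = np.column_stack([np.ones(n), X])   # column 0 is the intercept
    scores = {}
    for gamma in all_models(p):
        X_gamma = full_design[:, [0] + list(gamma)]
        scores[gamma] = integrated_likelihood_mc(y, X_gamma, sigma, tau)
    total = sum(scores.values())
    posterior = {g: s / total for g, s in scores.items()}  # uniform prior cancels
    return max(posterior, key=posterior.get), posterior
```

Since the uniform prior is constant across models, it cancels in the normalization, and the MAP model is simply the one with the largest integrated likelihood. In practice, conjugate priors (e.g., Zellner's \(g\)-prior) yield \(m_{\boldsymbol{\gamma}}({\bf y}_{n})\) in closed form, so Monte Carlo approximation is only needed when no closed form is available.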