Approximation Methods

Introduction

\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \]

Presentation Outline

Introduction to mixed models with a probit link.

Discussion of available approximation methods.

Simulation example.

Real data example.

Next steps and alternatives.

Three Models

Why

Very similar marginal log-likelihood.

Many options for approximating the marginal log-likelihood.

Notation

\[ \begin{align*} \phi^{(K)}(\vec x;\vec\mu,\mat\Sigma) &= \frac 1{(2\pi)^{K/2}\lvert\mat\Sigma\rvert^{1/2}}\exp\left( - \frac 12 (\vec x - \vec\mu)^\top\mat\Sigma^{-1}(\vec x - \vec\mu) \right) \\ \Phi^{(K)}(\vec x;\vec\mu,\mat\Sigma) &= \int_{-\infty}^{x_1} \cdots\int_{-\infty}^{x_K} \phi^{(K)}(\vec z;\vec\mu,\mat\Sigma)\der z_1 \cdots \der z_K \\ \end{align*} \] The standard case \[ \begin{align*} \phi^{(K)}(\vec x) &= \phi^{(K)}(\vec x;\vec 0,\mat I) \\ \Phi^{(K)}(\vec x) &= \Phi^{(K)}(\vec x;\vec 0,\mat I) \\ \end{align*} \] The univariate case \[ \begin{align*} \phi(x; \mu, \sigma^2) &= \phi^{(1)}(x; \mu, \sigma^2) \\ \Phi(x; \mu, \sigma^2) &= \Phi^{(1)}(x; \mu, \sigma^2) \end{align*} \]

Generalization of Skew-normal Dist.

\[ \begin{pmatrix} \vec V_1 \\ \vec V_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \vec \xi_1 \\ \vec\xi_2 \end{pmatrix}, \begin{pmatrix} \mat\Xi_{11} & \mat\Xi_{12} \\ \mat\Xi_{21} & \mat\Xi_{22} \end{pmatrix} \right) \]

then the joint density of \(\vec V_1 = \vec v_1\) and the event \(\vec V_2 \leq \vec v_2\) is

\[ \begin{align*} \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \Prob\left(\vec V_2 < \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\nonumber \hspace{-140pt}& \\ &= \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{20pt}\cdot \Phi^{(k_2)}\left( \vec v_2 - \mat\Xi_{21}\mat\Xi_{11}^{-1}(\vec v_1 - \vec\xi_1); \vec \xi_2, \mat\Xi_{22} - \mat\Xi_{21}\mat\Xi_{11}^{-1}\mat\Xi_{12} \right) \end{align*} \]

Generalization of Skew-normal Dist.

… and the marginal is

\[ \begin{align*} \Prob(\vec V_2 \leq \vec v_2) &= \Phi^{(k_2)}(\vec v_2; \vec\xi_2, \mat\Xi_{22}) \\ &= \int \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{30pt}\cdot \Prob\left(\vec V_2 \leq \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\der v_{11} \cdots \der v_{1k_1} \end{align*} \]

I.e. we can work with either a \(k_1\)- or a \(k_2\)-dimensional intractable integral.
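The identity can be checked numerically. Below is a minimal sketch in R (all values are assumed for illustration) that compares the \(k_2\)-dimensional CDF with a Monte Carlo estimate of the \(k_1\)-dimensional Gaussian-weighted integral using the mvtnorm package.

```r
# Minimal numeric check of the marginal identity (assumed values)
library(mvtnorm)
set.seed(1)
k1 <- 2L; k2 <- 2L
Xi <- crossprod(matrix(rnorm((k1 + k2)^2), k1 + k2)) # random covariance
xi <- rnorm(k1 + k2)
v2 <- rnorm(k2)
i1 <- 1:k1; i2 <- k1 + 1:k2

# the k2-dimensional CDF
lhs <- pmvnorm(upper = v2, mean = xi[i2], sigma = Xi[i2, i2])

# Monte Carlo over V1 of the conditional CDF of V2 given V1 = v1
M <- Xi[i2, i1] %*% solve(Xi[i1, i1])
S <- Xi[i2, i2] - M %*% Xi[i1, i2]
S <- (S + t(S)) / 2                                  # symmetrize numerically
V1 <- rmvnorm(10000L, mean = xi[i1], sigma = Xi[i1, i1])
rhs <- mean(apply(V1, 1L, function(v1)
  pmvnorm(upper = drop(v2 - M %*% (v1 - xi[i1])), mean = xi[i2], sigma = S)))

c(lhs = lhs, rhs = rhs)                              # agree up to MC error
```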

Three Models

We have \(\vec Y = (Y_1,\dots,Y_n)^\top\) observed outcomes.

Mixed binomial

Outcomes \(Y_i \in \{0,\dots,m\}\) are conditionally independent and binomially distributed.

Mixed multinomial

Outcomes \(Y_i \in \{1,\dots,c\}\) are conditionally independent and multinomially distributed.

Mixed generalized survival model (GSM)

Outcomes \(Y_i\in (0,\infty)\) are conditionally independently drawn from a GSM and potentially right censored.

Mixed Binomial

Given the random effect \(\vec U = \vec u \in \mathbb{R}^K\), the outcomes \(Y_1, \dots, Y_n\) are conditionally independent and

\[ Y_i \sim \text{Bin}(\Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u), m) \]

with

\[\vec U \sim N(\vec 0, \mat\Sigma)\]

\(\vec x_i\) and \(\vec z_i\) are known covariates. \(\vec x_i^\top\vec\beta\) is the fixed effect. \(\vec\beta\) and \(\mat\Sigma\) are unknown parameters. \(\vec U\) is an unobservable random effect.
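For concreteness, a minimal sketch of simulating from this model in R; all parameter values and dimensions below are assumed for illustration.

```r
# Simulate from the mixed binomial model (assumed parameter values)
library(mvtnorm)
set.seed(1)
n <- 10L; m <- 5L; K <- 2L; p <- 3L
beta  <- rnorm(p)
Sigma <- diag(K)                        # random effect covariance matrix
X <- matrix(rnorm(n * p), n)            # rows are the x_i
Z <- matrix(rnorm(n * K), n)            # rows are the z_i
u <- drop(rmvnorm(1L, sigma = Sigma))   # U ~ N(0, Sigma)
y <- rbinom(n, size = m, prob = pnorm(drop(X %*% beta + Z %*% u)))
```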

Mixed Binomial Likelihood

The complete data likelihood is

\[ \begin{align*} p(\vec u, \vec y) &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot\prod_{i = 1}^n \Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u)^{y_i} \Phi(-\vec x_i^\top\vec\beta - \vec z_i^\top\vec u)^{m - y_i}\\ c(\vec y) &= \prod_{i = 1}^n \begin{pmatrix}m \\ y_i\end{pmatrix}\nonumber \end{align*} \]

The marginal log-likelihood is

\[ l(\vec\beta,\mat\Sigma) = \log\int p(\vec u, \vec y) \der u_1\cdots \der u_K \]

Mixed Binomial Likelihood

Let \(\mat X = (\vec x_1, \dots, \vec x_n)^\top\) and define

\[ \begin{align*} \vec j_i &= (\underbrace{1, \dots, 1}_{y_i\text{ times}}, \underbrace{-1, \dots, -1}_{m - y_i\text{ times}})^\top \\ \widetilde{\mat X} &= \begin{pmatrix} \vec j_1 & \vec 0 & \cdots & \vec 0 \\ \vec 0 & \vec j_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vec 0 \\ \vec 0 & \cdots & \vec 0 & \vec j_n \end{pmatrix}\mat X \end{align*} \]

and similarly \(\mat Z\) and \(\widetilde{\mat Z}\).

Mixed Binomial Likelihood

Then

\[ \begin{align*} p(\vec u, \vec y) &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot\prod_{i = 1}^n \Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u)^{y_i} \Phi(-\vec x_i^\top\vec\beta - \vec z_i^\top\vec u)^{m - y_i} \\ &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \Phi^{(nm)}(\widetilde{\mat X}\vec\beta + \widetilde{\mat Z}\vec u) \end{align*} \]

I.e. we have a \(K\)- or an \(nm\)-dimensional integral,

as shown by Y. Pawitan et al. (2004) in the binary case and by Ochi and Prentice (1984) in a more restricted case.
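Combining this with the marginal identity from the earlier slides, the marginal likelihood is, up to the constant \(c(\vec y)\), the \(nm\)-dimensional Gaussian CDF \(\Phi^{(nm)}(\widetilde{\mat X}\vec\beta; \vec 0, \mat I + \widetilde{\mat Z}\mat\Sigma\widetilde{\mat Z}^\top)\). A minimal numeric check in the binary case, \(m = 1\), with assumed values:

```r
# Check that the K-dimensional integral equals the nm-dimensional CDF
# in the binary case m = 1 (assumed values)
library(mvtnorm)
set.seed(1)
n <- 4L; K <- 2L; p <- 2L
beta <- rnorm(p); Sigma <- diag(K)
X <- matrix(rnorm(n * p), n); Z <- matrix(rnorm(n * K), n)
y <- rbinom(n, 1L, .5)
sgn <- 2 * y - 1                        # the signs in the j_i vectors
Xt <- sgn * X; Zt <- sgn * Z            # tilde(X) and tilde(Z)

# nm-dimensional representation
f1 <- pmvnorm(upper = drop(Xt %*% beta),
              sigma = diag(n) + tcrossprod(Zt %*% Sigma, Zt))

# K-dimensional representation by Monte Carlo over U ~ N(0, Sigma)
U  <- rmvnorm(10000L, sigma = Sigma)
lp <- sweep(tcrossprod(U, Zt), 2L, drop(Xt %*% beta), "+")
f2 <- mean(apply(pnorm(lp), 1L, prod))

c(f1 = f1, f2 = f2)                     # agree up to MC error
```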

Mixed Multinomial

We have \(c\) categories.

\(\mat Z_i = (\vec z_{i1}, \dots, \vec z_{ic})^\top \in \mathbb{R}^{c\times K}\): known random effect covariates.

\(\mat B = (\vec \beta_1, \dots, \vec\beta_c)^\top\): fixed effect coefficients.

We observe \(Y_1,\dots, Y_n \in \{1,\dots,c\}\) with

\[Y_i = k \Leftrightarrow \forall k' \neq k:\, A_{ik} > A_{ik'}, \qquad k,k'\in\{1,\dots,c\}\] where \(\vec A_i\) is a latent variable.

Mixed Multinomial Cont.

Assume \[\vec A_i \mid \vec U = \vec u \sim N(\mat B\vec x_i + \mat Z_i \vec u, \mat I)\]
then \[ \begin{align*} \mathcal{C}_{ik} &= \left\{ \vec A_i:\,\forall k' \neq k: A_{ik} > A_{ik'} \right\} \\ p(Y_i = k \mid \vec U = \vec u) &= \int_{\mathcal{C}_{ik}} \phi^{(c)} (\vec a; \mat B\vec x_i + \mat Z_i \vec u, \mat I) \der a_1\cdots\der a_c \\ &\hspace{-60pt}= \Phi^{(c - 1)}( \underbrace{(\vec 1\vec\beta_k^\top - \mat B_{(-k)})}_{ \widetilde{\mat B}_k}\vec x_i + \underbrace{(\vec 1\vec z_{ik}^\top - \mat Z_{i(-k)})}_{ \widetilde{\mat Z}_{ik}}\vec u; \vec 0, \mat I + \vec 1\vec 1^\top) \end{align*} \]
with \(\mat B_{(-k)} = (\vec\beta_1, \dots, \vec\beta_{k-1}, \vec\beta_{k + 1}, \dots, \vec\beta_c)\)

and similarly for \(\mat Z_i\). See McFadden (1984).
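A minimal numeric check of the closed-form conditional probability in R (all values are assumed): compare it with the frequency with which \(A_{ik}\) is the largest latent variable.

```r
# Check the closed-form conditional probability (assumed values)
library(mvtnorm)
set.seed(1)
nc <- 3L; K <- 2L; p <- 2L; k <- 2L
B  <- matrix(rnorm(nc * p), nc)          # fixed effect coefficients
Zi <- matrix(rnorm(nc * K), nc)          # random effect covariates
xi <- rnorm(p); u <- rnorm(K)

# simulate the latent A_i and count how often category k wins
eta <- drop(B %*% xi + Zi %*% u)
A   <- matrix(rnorm(100000L * nc), ncol = nc) + rep(eta, each = 100000L)
sim <- mean(max.col(A) == k)

# closed form with tilde(B)_k and tilde(Z)_ik as defined above
Bt <- rep(1, nc - 1) %o% B[k, ] - B[-k, , drop = FALSE]
Zt <- rep(1, nc - 1) %o% Zi[k, ] - Zi[-k, , drop = FALSE]
cf <- pmvnorm(upper = drop(Bt %*% xi + Zt %*% u),
              sigma = diag(nc - 1) + 1)  # I + 11^T

c(simulated = sim, closed_form = cf)
```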

Mixed Multinomial Likelihood

The complete data likelihood is

\[ \begin{align*} p(\vec u, \vec y) &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) \prod_{i = 1}^n \Phi^{(c - 1)}( \widetilde{\mat B}_{y_i}\vec x_i + \widetilde{\mat Z}_{iy_i}\vec u; \vec 0, \mat I + \vec 1\vec 1^\top) \\ &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot \Phi^{(n(c - 1))}( \widetilde{\mat B}\vec x + \widetilde{\mat Z}\vec u; \vec 0, \diag(\underbrace{ \mat I + \vec 1\vec 1^\top, \dots, \mat I + \vec 1\vec 1^\top}_{ n\text{ times}})) \end{align*} \]

with \(\widetilde{\mat B} = \diag(\widetilde{\mat B}_{y_1}, \dots, \widetilde{\mat B}_{y_n})\), \(\vec x = (\vec x_1^\top, \dots, \vec x_n^\top)^\top\), and \(\widetilde{\mat Z} = (\widetilde{\mat Z}_{1y_1}^\top, \dots, \widetilde{\mat Z}_{ny_n}^\top)^\top\).

I.e. we have a \(K\)- or an \(n(c-1)\)-dimensional integral.

Mixed GSM

The survival time is \(Y_i^* \in (0, \infty)\).

Only observe \(Y_i = \min (Y_i^*, C_i)\) where \(C_i\) is an independent censoring time and let \(D_i = 1_{\{Y_i^* < C_i\}}\) be an event indicator.

Censoring occurs e.g. due to drop-out.

Mixed GSM Cont.

Let \(S(y\mid \vec x, \vec z, \vec u) = \Prob(Y^* > y \mid \vec x, \vec z, \vec u)\) be the conditional survival function.

Assume that

\[S(y\mid\vec x, \vec z, \vec u) = \Phi(-\vec x^\top(y)\vec\beta - \vec z^\top\vec u)\]

See Royston and Parmar (2002), X.-R. Liu, Pawitan, and Clements (2016), and X.-R. Liu, Pawitan, and Clements (2017).

Mixed GSM Cont.

Define the sets of indices of censored and observed events \[ \begin{align*} \mathcal{C} &= \{i\in \{1,\dots,n\}:\, d_i = 0\} \\ \mathcal{O} &= \{i\in \{1,\dots,n\}:\, d_i = 1\} = \{1,\dots,n\}\setminus\mathcal{C} \end{align*} \]

and \[\mat X^o(\vec Y^o) = (\vec x_j(Y_j))_{j\in\mathcal{O}}^\top\]

Similarly define \(\mat X^{\prime o}(\vec Y^o)\), \(\mat Z^o\), \(\vec Y^o\), \(\mat X^c(\vec Y^c)\), \(\mat Z^c\), and \(\vec Y^c\). Derivatives are applied element-wise.

Mixed GSM Likelihood

The complete data likelihood is

\[ \begin{align*} p(\vec u, \vec y, \vec d) &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) c(\vec y^o, \mat X^o, \vec\beta)\\ &\hspace{20pt}\cdot \phi^{(\lvert \mathcal{O}\rvert)}(-\mat X^o(\vec y^o)\vec\beta - \mat Z^o\vec u) \\ &\hspace{20pt}\cdot \Phi^{(\lvert \mathcal{C}\rvert)}(-\mat X^c(\vec y^c)\vec\beta - \mat Z^c\vec u) \\ c(\vec y^o, \mat X^o, \vec\beta) &= \prod_{j\in\mathcal{O}}\left(-\vec x_j^{\prime\top}(y_j)\vec\beta\right) \end{align*} \]

Mixed GSM Marginal Log-likelihood

We can show that

\[ \begin{align*} l(\vec\beta, \mat\Sigma) &= \log\int p(\vec u, \vec y, \vec d) \der u_1\cdots\der u_K \\ &= \log k(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) \\ &\hspace{20pt}+ \log\int \phi^{(K)}(\vec u; \vec h, \mat H^{-1}) \\ &\hspace{60pt}\cdot \Phi^{(\lvert \mathcal{C}\rvert)}(-\mat X^c(\vec y^c)\vec\beta - \mat Z^c\vec u) \der u_1\cdots\der u_K \end{align*} \]

I.e. we have a \(K\)- or a \(\lvert \mathcal{C}\rvert\)-dimensional integral.

See one of the next slides for the definitions of \(k\), \(\vec h\), and \(\mat H\).
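In fact, applying the marginal identity from the earlier slides to the remaining integral gives the \(\lvert\mathcal{C}\rvert\)-dimensional representation

\[ l(\vec\beta, \mat\Sigma) = \log k(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) + \log\Phi^{(\lvert\mathcal{C}\rvert)}\left(-\mat X^c(\vec y^c)\vec\beta - \mat Z^c\vec h; \vec 0, \mat I + \mat Z^c\mat H^{-1}\mat Z^{c\top}\right) \]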

Mixed GSM Remarks

The mixed Tobit model is a special case.

Similar to a mixed version of the linear transformation model discussed by Hothorn, Möst, and Bühlmann (2018).

A discrete-time survival submodel is suggested by Barrett et al. (2015).

Similar to the mixed binomial model.

Mixed GSM Marginal Log-likelihood Cont.

\[ \begin{align*} \mat H(\mat Z^o, \mat\Sigma) &= \mat H = \mat Z^{o\top}\mat Z^o + \mat\Sigma^{-1} \\ \vec h(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) &= \vec h = \mat H^{-1} \mat Z^{o\top}\left(- \mat X^o(\vec y^o)\vec\beta \right)\\ k(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) &= \\ &\hspace{-40pt} c(\vec y^o, \mat X^o, \vec\beta) (2\pi)^{-\lvert\mathcal O\rvert/2}\lvert\mat\Sigma\mat H\rvert^{-1/2} \\ &\hspace{-20pt} \cdot\exp\left( -\frac 12 (- \mat X^o(\vec y^o)\vec\beta)^\top (- \mat X^o(\vec y^o)\vec\beta) +\frac 12\vec h^\top\mat H\vec h\right) \end{align*} \]
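A direct transcription of these definitions as an R function may be useful as a reference; the argument names, and that the observed-event linear predictors are passed directly, are assumptions.

```r
# Reference sketch of the terms above. eta_o is X^o(y^o) beta and detap_o
# is X'^o(y^o) beta for the observed events (argument names are assumed;
# the elements of detap_o are assumed negative so c is positive)
gsm_terms <- function(eta_o, detap_o, Zo, Sigma) {
  H <- crossprod(Zo) + solve(Sigma)             # H = Z^oT Z^o + Sigma^-1
  h <- drop(solve(H, crossprod(Zo, -eta_o)))    # h = H^-1 Z^oT (-X^o beta)
  log_k <- sum(log(-detap_o)) -                 # log c(y^o, X^o, beta)
    length(eta_o) / 2 * log(2 * pi) -
    c(determinant(Sigma %*% H)$modulus) / 2 -   # -log|Sigma H| / 2
    sum(eta_o^2) / 2 + sum(h * drop(H %*% h)) / 2
  list(H = H, h = h, log_k = log_k)
}
```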

Approximation Methods

Recall

\[ \begin{align*} \Prob(\vec V_2 \leq \vec v_2) &= \Phi^{(k_2)}(\vec v_2; \vec\xi_2, \mat\Xi_{22}) \\ &= \int \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{30pt}\cdot \Prob\left(\vec V_2 \leq \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\der v_{11} \cdots \der v_{1k_1} \end{align*} \]

Either approximate the \(k_2\)-dimensional CDF or the \(k_1\)-dimensional Gaussian weighted integral.

We focus on the mixed binary model with \(k_2 = n\) and \(k_1 = K\).

I.e. the mixed binomial model with \(m = 1\).

Will Consider

The CDF approximation suggested by Genz (1992).

(Adaptive) Gauss–Hermite Quadrature ([A]GHQ).

Monte Carlo method implemented by Genz and Monahan (1999).

Approximating the CDF

Use the approximation shown by Genz (1992)

or the similar and seemingly independently developed GHK method used by Hajivassiliou, McFadden, and Ruud (1996).

Use the mvtnorm package implementation

(Genz and Bretz 2009; Genz et al. 2020) in Fortran.
Uses randomized Korobov rules (Niederreiter 1972; Keast 1973; Cranley and Patterson 1976). We have seen a more than ten-fold reduction in computation time compared with a C++ implementation of the MC estimator suggested by Genz (1992).

\(\bigO{n^3 + n^2K + nK^2}\),

although the \(\bigO{n^3}\) term does not seem to dominate in practical examples.
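In the binary case, a minimal sketch of the CDF approach is a single pmvnorm call, with Xt and Zt as in the earlier sketch; the implementation referred to above is in C++/Fortran, not this R code.

```r
# Sketch of the CDF approach in the binary case: the marginal likelihood
# is an n-dimensional Gaussian CDF (Xt, Zt as in the earlier sketch)
library(mvtnorm)
ll_cdf <- function(beta, Sigma, Xt, Zt)
  log(pmvnorm(
    upper = drop(Xt %*% beta),
    sigma = diag(nrow(Xt)) + tcrossprod(Zt %*% Sigma, Zt),
    algorithm = GenzBretz(maxpts = 100000L, abseps = 1e-5)))
```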

Gauss–Hermite Quadrature

Approximate the \(K\)-dimensional integral.

Recursive application of a quadrature rule with \(b\) nodes in each of the \(K\) dimensions.

\(\bigO{nb^K}\).

The adaptive method may require a much smaller \(b\)

(Pinheiro and Bates 1995; Q. Liu and Pierce 1994). It requires estimation of the \(K\)-dimensional mode and inversion of a \(K\times K\) matrix.
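A minimal sketch of the non-adaptive version using statmod's gauss.quad; the integrand \(g\) and this set-up are assumptions, and the adaptive version would additionally center and scale the grid at the mode.

```r
# Non-adaptive GHQ sketch: approximate int phi^(K)(u; 0, Sigma) g(u) du
# with a b^K tensor product grid (g and this set-up are assumptions)
library(statmod)
ghq_int <- function(g, Sigma, b) {
  K  <- NROW(Sigma)
  gq <- gauss.quad(b, kind = "hermite")   # nodes/weights for exp(-x^2)
  ns <- as.matrix(expand.grid(rep(list(gq$nodes), K)))
  ws <- apply(as.matrix(expand.grid(rep(list(gq$weights), K))), 1L, prod)
  C  <- t(chol(Sigma))                    # Sigma = C C^T
  us <- sqrt(2) * tcrossprod(ns, C)       # change of variables u = sqrt(2) C x
  sum(ws * apply(us, 1L, g)) / pi^(K / 2)
}

# e.g. the binary likelihood term with Xt, Zt from the earlier sketch
# ghq_int(function(u) prod(pnorm(drop(Xt %*% beta + Zt %*% u))), Sigma, 10L)
```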

Monte Carlo Estimate

Approximate the \(K\)-dimensional integral using the method described by Genz and Monahan (1999).

Genz and Monahan (1999) provide a Fortran implementation for a generic integrand.

\(\bigO{K^3 + s(nK + K^2)}\) where \(s\) is the number of samples

and the \(\bigO{snK}\) term typically dominates.

The adaptive method may require fewer samples, \(s\),

but requires estimation of the \(K\)-dimensional mode and inversion of a \(K\times K\) matrix.
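For contrast, a plain Monte Carlo baseline; this is not the spherical-radial algorithm of Genz and Monahan (1999), only the naive estimator it improves upon.

```r
# Naive Monte Carlo baseline for int phi^(K)(u; 0, Sigma) g(u) du. This is
# *not* the spherical-radial algorithm of Genz and Monahan (1999)
mc_int <- function(g, Sigma, s = 10000L) {
  U  <- mvtnorm::rmvnorm(s, sigma = Sigma)  # draw U ~ N(0, Sigma)
  ev <- apply(U, 1L, g)
  c(estimate = mean(ev), std_error = sd(ev) / sqrt(s))
}
```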

Simulation Example

Details

Use a mixed binomial model in the binary case, \(m = 1\).

The unconditional variance of \(\vec x_i^\top\vec\beta + \vec z_i^\top\vec U\) is 2.

Let \(\eta_i = \vec x_i^\top\vec\beta \sim N(0, 1)\) and draw \(\mat\Sigma\) from a Wishart distribution.

Focus on the evaluation of the marginal log-likelihood

where we fix the relative error of each method.

Average Computation Times (n = 2)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 0.02 | 0.12 | 0.79 | 6.93 | 40.46 |
| AGHQ | 0.03 | 0.09 | 0.43 | 3.00 | 26.33 |
| CDF | 0.04 | 0.03 | 0.00 | 0.02 | 0.02 |
| Genz & Monahan | 13.88 | 49.94 | 43.96 | 74.98 | 74.73 |
| Adaptive Genz & Monahan | 13.68 | 9.77 | 6.18 | 23.51 | 22.44 |

Average computation times in milliseconds.

Average Computation Times (n = 4)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 0.07 | 0.41 | 3.95 | 24.96 | 153.06 |
| AGHQ | 0.03 | 0.14 | 0.60 | 3.59 | 30.64 |
| CDF | 0.43 | 0.45 | 0.42 | 0.42 | 0.47 |
| Genz & Monahan | 77.28 | 85.60 | 108.23 | 91.98 | 123.41 |
| Adaptive Genz & Monahan | 5.30 | 12.54 | 26.44 | 4.89 | 14.21 |

Average computation times in milliseconds.

Average Computation Times (n = 8)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 0.10 | 0.74 | 7.41 | 89.98 | 331.24 |
| AGHQ | 0.05 | 0.17 | 1.36 | 7.80 | 34.50 |
| CDF | 4.74 | 4.61 | 4.55 | 4.74 | 4.59 |
| Genz & Monahan | 103.63 | 238.29 | 244.12 | 215.58 | 193.56 |
| Adaptive Genz & Monahan | 1.91 | 0.82 | 1.20 | 4.21 | 3.34 |

Average computation times in milliseconds.

Average Computation Times (n = 16)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 0.28 | 4.42 | 110.59 | 538.85 | 2793.15 |
| AGHQ | 0.06 | 0.26 | 1.74 | 7.81 | 60.53 |
| CDF | 31.71 | 32.35 | 32.07 | 33.58 | 33.39 |
| Genz & Monahan | 141.72 | 383.82 | 439.39 | 575.03 | 561.41 |
| Adaptive Genz & Monahan | 0.24 | 0.69 | 1.26 | 0.71 | 2.99 |

Average computation times in milliseconds.

Average Computation Times (n = 32)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 0.98 | 19.06 | 172.64 | 5023.05 | 26964.95 |
| AGHQ | 0.14 | 0.31 | 0.84 | 5.27 | 47.69 |
| CDF | 75.26 | 75.18 | 74.61 | 75.18 | 74.35 |
| Genz & Monahan | 465.74 | 685.87 | 702.84 | 846.62 | 1570.17 |
| Adaptive Genz & Monahan | 0.42 | 0.45 | 0.38 | 0.54 | 0.81 |

Average computation times in milliseconds.

Average Scaled RMSE (n = 2)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 8.06 | 6.75 | 8.26 | 6.47 | 5.67 |
| AGHQ | 6.22 | 5.67 | 8.84 | 5.55 | 5.34 |
| CDF | 6.15 | 5.72 | 8.88 | 5.67 | 5.36 |
| Genz & Monahan | 37.81 | 38.73 | 67.95 | 58.60 | 57.10 |
| Adaptive Genz & Monahan | 29.09 | 47.24 | 46.43 | 64.27 | 46.52 |

The average scaled RMSEs are multiplied by \(10^{5}\).

RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).

Average Scaled RMSE (n = 4)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 16.32 | 12.51 | 17.72 | 9.58 | 12.83 |
| AGHQ | 9.43 | 9.58 | 12.43 | 8.67 | 15.18 |
| CDF | 19.69 | 18.70 | 22.91 | 24.32 | 26.48 |
| Genz & Monahan | 162.67 | 131.94 | 139.43 | 127.33 | 172.40 |
| Adaptive Genz & Monahan | 56.07 | 162.08 | 89.73 | 132.42 | 138.80 |

The average scaled RMSEs are multiplied by \(10^{5}\).

RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).

Average Scaled RMSE (n = 8)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 56.11 | 36.70 | 42.10 | 31.64 | 21.65 |
| AGHQ | 14.96 | 19.47 | 20.94 | 17.23 | 18.97 |
| CDF | 17.83 | 23.58 | 25.84 | 25.91 | 22.78 |
| Genz & Monahan | 236.28 | 185.91 | 200.49 | 272.13 | 252.81 |
| Adaptive Genz & Monahan | 119.25 | 179.82 | 175.57 | 202.02 | 212.55 |

The average scaled RMSEs are multiplied by \(10^{5}\).

RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).

Average Scaled RMSE (n = 16)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 134.64 | 160.54 | 126.85 | 93.83 | 85.55 |
| AGHQ | 19.60 | 16.63 | 25.47 | 34.65 | 37.27 |
| CDF | 25.99 | 27.36 | 29.86 | 43.33 | 44.78 |
| Genz & Monahan | 457.59 | 486.99 | 408.75 | 569.23 | 445.92 |
| Adaptive Genz & Monahan | 127.07 | 286.81 | 298.32 | 370.65 | 360.66 |

The average scaled RMSEs are multiplied by \(10^{5}\).

RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).

Average Scaled RMSE (n = 32)

| Method/K | 2 | 3 | 4 | 5 | 6 |
|:---|---:|---:|---:|---:|---:|
| GHQ | 388.58 | 407.13 | 272.85 | 331.03 | 237.07 |
| AGHQ | 31.78 | 47.17 | 69.40 | 52.50 | 54.38 |
| CDF | 76.19 | 74.42 | 105.98 | 91.77 | 101.08 |
| Genz & Monahan | 859.31 | 903.12 | 1052.99 | 941.96 | 840.60 |
| Adaptive Genz & Monahan | 243.57 | 238.78 | 362.14 | 381.14 | 398.06 |

The average scaled RMSEs are multiplied by \(10^{5}\).

RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).

Real Data Example

Salamander Data Set

Two types of salamanders: whiteside and roughbutt.

The question is how mating success depends on the types of the male and the female.

\(Y_i = 1\) if the \(i\)th mating pair is successful.

Model is \[ \begin{align*} \vec U &= (\vec U_f, \vec U_m) \sim N(\vec 0, \diag(\sigma_f^2\mat I, \sigma_m^2\mat I))\\ Y_i \mid \vec U = \vec u &\sim \text{Bin}(\Phi(z_i), 1) \\ z_i &= \beta_{ww}I_{ww}(i) + \beta_{wr}I_{wr}(i) + \beta_{rw}I_{rw}(i) \\ &\hspace{20pt} + \beta_{rr}I_{rr}(i) + u_{ff_i} + u_{mm_i} \end{align*} \]

Crossed Random Effects

\(n = 60\) and \(K = 20\) in each cluster.

(Figure omitted: a grid of the females and males with a square marking each pair that mates.)

MCMC Estimate

## Inference for Stan model: salamander.
## 6 chains, each with iter=100000; warmup=20000; thin=1; 
## post-warmup draws per chain=80000, total post-warmup draws=480000.
## 
##          mean se_mean   sd  2.5%   25%   50%   75% 97.5%  n_eff Rhat
## beta[1]  0.63       0 0.26  0.13  0.45  0.62  0.80  1.16 176522    1
## beta[2] -0.43       0 0.29 -1.02 -0.63 -0.43 -0.23  0.14 225548    1
## beta[3] -1.81       0 0.35 -2.54 -2.04 -1.80 -1.57 -1.16 130912    1
## beta[4]  2.23       0 0.38  1.52  1.97  2.22  2.48  2.99 138428    1
## sigma_f  0.77       0 0.17  0.46  0.65  0.76  0.87  1.12  66649    1
## sigma_m  0.72       0 0.16  0.42  0.61  0.71  0.82  1.06  61948    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Feb 20 10:48:52 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Laplace Approximation

Requires estimation of the \(K\)-dimensional mode and inversion of a \(K\times K\) matrix.

Very fast and scales well, but heavily biased in some cases.
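For reference, the standard Laplace approximation replaces the integrand with a Gaussian at the mode \(\hat{\vec u} = \argmax_{\vec u} \log p(\vec u, \vec y)\):

\[ \int p(\vec u, \vec y)\der u_1\cdots\der u_K \approx (2\pi)^{K/2}\left\lvert -\nabla_{\vec u}^2\log p(\hat{\vec u}, \vec y)\right\rvert^{-1/2} p(\hat{\vec u}, \vec y) \]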

Laplace Approximation Cont.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( probit )
## Formula: y ~ wsm * wsf + (1 | female) + (1 | male)
##    Data: sala
##      AIC      BIC   logLik deviance df.resid 
##    432.2    455.5   -210.1    420.2      354 
## Random effects:
##  Groups Name        Std.Dev.
##  female (Intercept) 0.626   
##  male   (Intercept) 0.576   
## Number of obs: 360, groups:  female, 60; male, 60
## Fixed Effects:
## (Intercept)          wsm          wsf      wsm:wsf  
##       0.601       -0.419       -1.731        2.140
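For reference, a call consistent with the output above; the data frame sala and its columns are taken from the Formula and Data fields.

```r
# Reconstructed from the output above; assumes the data frame `sala`
# with columns y, wsm, wsf, female, and male
library(lme4)
fit_laplace <- glmer(y ~ wsm * wsf + (1 | female) + (1 | male),
                     family = binomial("probit"), data = sala)
```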

CDF Approximation

A C++ implementation which supports parallel computation.

Run on an Intel® Core™ i7-8750H CPU; compiled with gcc 8.3.0.

Uses the mvkbrv Fortran subroutine from the mvtnorm package to approximate the gradient.

Uses the BFGS implementation in R's (R Core Team 2019) optim function.

CDF Approximation Result (Large Relative Error)

## 
## fit_CDF_cpp_fast
## ----------------
## 
## Fixed effects
## (Intercept)         wsm         wsf     wsm:wsf 
##      0.6208     -0.4290     -1.7242      2.1243 
## 
## Random effect standard deviations              
## 0.7073 0.6835 
## 
## Log-likelihood estimate -206.82
## Computation time 23.77/0.48/0.53 (seconds total/per func./per grad.)

CDF Approximation Result (Small Relative Error)

## 
## fit_CDF_cpp
## -----------
## 
## Fixed effects
## (Intercept)         wsm         wsf     wsm:wsf 
##      0.6206     -0.4305     -1.7228      2.1249 
## 
## Random effect standard deviations              
## 0.7075 0.6817 
## 
## Log-likelihood estimate -206.83
## Computation time 6.81/0.76/2.27 (seconds total/per func./per grad.)

Next Steps

Hybrid Method

Lai and Shih (2003) suggest a hybrid Laplace and Monte Carlo method.

May be very fast when combined with methods from Genz (1992) and Genz and Monahan (1999).

Variational Approximation (VA)

Girolami and Rogers (2006) and Consonni and Marin (2007) suggest a VA (Ormerod and Wand 2010).

Consonni and Marin (2007) show that it works poorly in some cases.

Conditional Density

\[ \begin{align*} p(\vec v_1\mid \vec V_2 \leq \vec v_2) &= \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \frac {\Prob(\vec V_2 \leq\vec v_2 \mid \vec V_1 = \vec v_1)} {\Prob(\vec V_2 \leq\vec v_2)}\\ &\hspace{-60pt}= \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{-20pt}\cdot \frac {\Phi^{(k_2)}\left( \vec v_2 - \mat\Xi_{21}\mat\Xi_{11}^{-1}(\vec v_1 - \vec\xi_1); \vec \xi_2, \mat\Xi_{22} - \mat\Xi_{21}\mat\Xi_{11}^{-1}\mat\Xi_{12} \right)} {\Phi^{(k_2)}\left( \vec v_2; \vec \xi_2, \mat\Xi_{22} \right)} \end{align*} \]
This suggests that a skew-normal VA will work well (Ormerod 2011).

Requires estimation of \(\bigO{K^2}\) variational parameters.

Conclusion

Summary

Showed three classes of mixed models with very similar marginal log-likelihoods.

A variety of approximations are available.

Best choice depends on the setting.

Hybrid method may be useful.

One variational approximation seems unexplored.

Thank You!

The presentation is at rpubs.com/boennecd/mix-probit-KTH.

The markdown is at github.com/boennecd/Talks.

Code and more examples at github.com/boennecd/mixprobit.

References are on the next slide.

References

Barrett, Jessica, Peter Diggle, Robin Henderson, and David Taylor-Robinson. 2015. “Joint Modelling of Repeated Measurements and Time-to-Event Outcomes: Flexible Model Specification and Exact Likelihood Inference.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77 (1): 131–48. doi:10.1111/rssb.12060.

Consonni, Guido, and Jean-Michel Marin. 2007. “Mean-Field Variational Approximate Bayesian Inference for Latent Variable Models.” Computational Statistics & Data Analysis 52 (2): 790–98. doi:10.1016/j.csda.2006.10.028.

Cranley, R., and T. N. L. Patterson. 1976. “Randomization of Number Theoretic Methods for Multiple Integration.” SIAM Journal on Numerical Analysis 13 (6). Society for Industrial and Applied Mathematics: 904–14. http://www.jstor.org/stable/2156452.

Genz, Alan. 1992. “Numerical Computation of Multivariate Normal Probabilities.” Journal of Computational and Graphical Statistics 1 (2). Taylor & Francis: 141–49. doi:10.1080/10618600.1992.10477010.

Genz, Alan, and Frank Bretz. 2009. Computation of Multivariate Normal and T Probabilities. Lecture Notes in Statistics. Heidelberg: Springer-Verlag.

Genz, Alan, and John Monahan. 1999. “A Stochastic Algorithm for High-Dimensional Integrals over Unbounded Regions with Gaussian Weight.” Journal of Computational and Applied Mathematics 112 (1): 71–81. doi:10.1016/S0377-0427(99)00214-9.

Genz, Alan, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi, Friedrich Leisch, Fabian Scheipl, and Torsten Hothorn. 2020. mvtnorm: Multivariate Normal and T Distributions. https://CRAN.R-project.org/package=mvtnorm.

Girolami, M., and S. Rogers. 2006. “Variational Bayesian Multinomial Probit Regression with Gaussian Process Priors.” Neural Computation 18 (8): 1790–1817. doi:10.1162/neco.2006.18.8.1790.

Hajivassiliou, Vassilis, Daniel McFadden, and Paul Ruud. 1996. “Simulation of Multivariate Normal Rectangle Probabilities and Their Derivatives: Theoretical and Computational Results.” Journal of Econometrics 72 (1): 85–134. doi:10.1016/0304-4076(94)01716-6.

Hothorn, Torsten, Lisa Möst, and Peter Bühlmann. 2018. “Most Likely Transformations.” Scandinavian Journal of Statistics 45 (1): 110–34. doi:10.1111/sjos.12291.

Keast, P. 1973. “Optimal Parameters for Multidimensional Integration.” SIAM Journal on Numerical Analysis 10 (5): 831–38. doi:10.1137/0710068.

Lai, Tze Leung, and Mei-Chiung Shih. 2003. “A Hybrid Estimator in Nonlinear and Generalised Linear Mixed Effects Models.” Biometrika 90 (4). [Oxford University Press, Biometrika Trust]: 859–79. http://www.jstor.org/stable/30042093.

Liu, Qing, and Donald A. Pierce. 1994. “A Note on Gauss-Hermite Quadrature.” Biometrika 81 (3). [Oxford University Press, Biometrika Trust]: 624–29. http://www.jstor.org/stable/2337136.

Liu, Xing-Rong, Yudi Pawitan, and Mark Clements. 2016. “Parametric and Penalized Generalized Survival Models.” Statistical Methods in Medical Research 27 (5): 1531–46. doi:10.1177/0962280216664760.

Liu, Xing-Rong, Yudi Pawitan, and Mark S. Clements. 2017. “Generalized Survival Models for Correlated Time-to-Event Data.” Statistics in Medicine 36 (29): 4743–62. doi:10.1002/sim.7451.

McFadden, Daniel. 1984. “Chapter 24 Econometric Analysis of Qualitative Response Models.” In Handbook of Econometrics, 2:1395–1457. Elsevier. doi:10.1016/S1573-4412(84)02016-X.

Niederreiter, H. 1972. “On a Number-Theoretical Integration Method.” Aequationes Mathematicae 8 (3): 304–11. doi:10.1007/BF01844507.

Ochi, Y., and Ross L. Prentice. 1984. “Likelihood Inference in a Correlated Probit Regression Model.” Biometrika 71 (3). [Oxford University Press, Biometrika Trust]: 531–43. http://www.jstor.org/stable/2336562.

Ormerod, J. T. 2011. “Skew-Normal Variational Approximations for Bayesian Inference.” Unpublished Article.

Ormerod, J. T., and M. P. Wand. 2010. “Explaining Variational Approximations.” The American Statistician 64 (2). Taylor & Francis: 140–53. doi:10.1198/tast.2010.09058.

Pawitan, Y., M. Reilly, E. Nilsson, S. Cnattingius, and P. Lichtenstein. 2004. “Estimation of Genetic and Environmental Factors for Binary Traits Using Family Data.” Statistics in Medicine 23 (3): 449–65. doi:10.1002/sim.1603.

Pinheiro, José C., and Douglas M. Bates. 1995. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4 (1). American Statistical Association, Taylor & Francis, Ltd., Institute of Mathematical Statistics, Interface Foundation of America: 12–35. http://www.jstor.org/stable/1390625.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Royston, Patrick, and Mahesh K. B. Parmar. 2002. “Flexible Parametric Proportional-Hazards and Proportional-Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects.” Statistics in Medicine 21 (15): 2175–97. doi:10.1002/sim.1203.