\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \]
Introduction to mixed models with a probit link.
Discussion of available approximation methods.
Simulation example.
Real data example.
Next steps and alternatives.
Very similar marginal log-likelihood.
Many options for approximating the marginal log-likelihood.
\[ \begin{align*} \phi^{(K)}(\vec x;\vec\mu,\mat\Sigma) &= \frac 1{(2\pi)^{K/2}\lvert\mat\Sigma\rvert^{1/2}}\exp\left( - \frac 12 (\vec x - \vec\mu)^\top\mat\Sigma^{-1}(\vec x - \vec\mu) \right) \\ \Phi^{(K)}(\vec x;\vec\mu,\mat\Sigma) &= \int_{-\infty}^{x_1} \cdots\int_{-\infty}^{x_K} \phi^{(K)}(\vec z;\vec\mu,\mat\Sigma)\der z_1 \cdots \der z_K \\ \end{align*} \] The standard case \[ \begin{align*} \phi^{(K)}(\vec x) &= \phi^{(K)}(\vec x;\vec 0,\mat I) \\ \Phi^{(K)}(\vec x) &= \Phi^{(K)}(\vec x;\vec 0,\mat I) \\ \end{align*} \] The univariate case \[ \begin{align*} \phi(x; \mu, \sigma^2) &= \phi^{(1)}(x; \mu, \sigma^2) \\ \Phi(x; \mu, \sigma^2) &= \Phi^{(1)}(x; \mu, \sigma^2) \end{align*} \]
\[ \begin{pmatrix} \vec V_1 \\ \vec V_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \vec \xi_1 \\ \vec\xi_2 \end{pmatrix}, \begin{pmatrix} \mat\Xi_{11} & \mat\Xi_{12} \\ \mat\Xi_{21} & \mat\Xi_{22} \end{pmatrix} \right) \]
then the density of \(\vec V_1 = \vec v_1\) and \(\vec V_2 \leq \vec v_2\) is
\[ \begin{align*} \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \Prob\left(\vec V_2 < \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\nonumber \hspace{-140pt}& \\ &= \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{20pt}\cdot \Phi^{(k_2)}\left( \vec v_2 - \mat\Xi_{21}\mat\Xi_{11}^{-1}(\vec v_1 - \vec\xi_1); \vec \xi_2, \mat\Xi_{22} - \mat\Xi_{21}\mat\Xi_{11}^{-1}\mat\Xi_{12} \right) \end{align*} \]
… and the marginal is
\[ \begin{align*} \Prob(\vec V_2 \leq \vec v_2) &= \Phi^{(k_2)}(\vec v_2; \vec\xi_2, \mat\Xi_2) \\ &= \int \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{30pt}\cdot \Prob\left(\vec V_2 < \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\der v_{11} \cdots \der v_{1k_1} \end{align*} \]
I.e. either \(k_1\) or \(k_2\)-dimensional intractable integral.
We have \(\vec Y = (Y_1,\dots,Y_n)^\top\) observed outcomes.
Outcomes \(Y_i \in \{0,\dots,m\}\) are conditionally independent and binomially distributed.
Outcomes \(Y_i \in \{1,\dots,c\}\) are conditionally independent and multinomially distributed.
Outcomes \(Y_i\in (0,\infty)\) are conditionally independently drawn from a GSM and potentially right censored.
Given random effect \(\vec U \in \mathbb{R}^K\), each \(Y_1, \dots, Y_n\) are
\[ Y_i \sim \text{Bin}(\Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u), m) \]
with
\[\vec U \sim N(\vec 0, \mat\Sigma)\]
\(\vec x_i\) and \(\vec z_i\) are known covariates. \(\vec x_i^\top\vec\beta\) is the fixed effect. \(\vec\beta\) and \(\mat\Sigma\) are unknown parameters. \(\vec U\) is an unobservable random effect.
The complete data likelihood is
\[ \begin{align*} p(\vec u, \vec y) &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot\prod_{i = 1}^n \Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u)^{y_i} \Phi(-\vec x_i^\top\vec\beta - \vec z_i^\top\vec u)^{m - y_i}\\ c(\vec y) &= \prod_{i = 1}^n \begin{pmatrix}m \\ y_i\end{pmatrix}\nonumber \end{align*} \]
The marginal log-likelihood is
\[ l(\vec\beta,\mat\Sigma) = \log\int p(\vec u, \vec y) \der u_1\cdots \der u_K \]
Let \(\mat X = (\vec x_1, \dots, \vec x_n)^\top\) and define
\[ \begin{align*} \vec j_i &= (\underbrace{1, \dots, 1}_{y_i\text{ times}}, \underbrace{-1, \cdots, -1}_{m - y_i\text{ times}})^\top \\ \widetilde{\mat X} &= \begin{pmatrix} \vec j_1 & \vec 0 & \cdots & \vec 0 \\ \vec 0 & \vec j_2 & \ddots & \vec \vdots \\ \vdots & \ddots & \ddots & \vec 0 \\ \vec 0 & \cdots & \vec 0 & \vec j_n \end{pmatrix}\mat X \end{align*} \]
and similarly \(\mat Z\) and \(\widetilde{\mat Z}\).
Then
\[ \begin{align*} p(\vec u, \vec y) &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot\prod_{i = 1}^n \Phi(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u)^{y_i} \Phi(-\vec x_i^\top\vec\beta - \vec z_i^\top\vec u)^{m - y_i} \\ &= c(\vec y) \phi^{(K)}(\vec u;\vec 0, \mat\Sigma) \Phi^{(nm)}(\widetilde{\mat X}\vec\beta + \widetilde{\mat Z}\vec u) \end{align*} \]
as shown by Y. Pawitan et al. (2004) in the binary case and Ochi and Prentice (1984) in a more restricted case.
We have \(c\) categories.
\(\mat Z_i = (\vec z_{i1}, \dots, \vec z_{ic})^\top \in \mathbb{R}^{c\times K}\): known random effect covariates.
\(\mat B = (\vec \beta_1, \dots, \vec\beta_c)^\top\): fixed effect coefficients.
We observe \(Y_1,\dots, Y_n \in \{1,\dots,c\}\) with
\[Y_i = k \Leftrightarrow \forall k \neq k':\, A_{ik} > A_{ik'}, \qquad k,k'\in\{1,\dots,c\}\] where \(\vec A_i\) is a latent variable.
and similarly for \(\mat Z_i\). See McFadden (1984).
The complete data likelihood is
\[ \begin{align*} p(\vec u, \vec y) &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) \prod_{i = 1}^n \Phi^{(c - 1)}( \widetilde{\mat B}_{y_i}\vec x_i + \widetilde{\mat Z}_{iy_i}\vec u; \vec 0, \mat I + \vec 1\vec 1^\top) \\ &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) \\ &\hspace{20pt}\cdot \Phi^{(n(c - 1))}( \widetilde{\mat B}\vec x + \widetilde{\mat Z}\vec u; \vec 0, \diag(\underbrace{ \mat I + \vec 1\vec 1^\top, \dots, \mat I + \vec 1\vec 1^\top}_{ n\text{ times}})) \end{align*} \]with \(\widetilde{\mat B} = \diag(\widetilde{\mat B}_{y_1}, \dots, \widetilde{\mat B}_{y_n})\), \(\vec x = (\vec x_1^\top, \dots, \vec x_n^\top)^\top\), and \(\widetilde{\mat Z} = (\widetilde{\mat Z}_{1y_1}^\top \dots, \widetilde{\mat Z}_{ny_n}^\top)^\top\).
I.e. we have a \(K\) or \(n(c-1)\)-dimensional integral.
The survival time is \(Y_i^* \in (0, \infty)\).
Censoring e.g. due to drop out.
Let \(S(y\mid \vec x, \vec z, \vec u) = \Prob(Y^* > y \mid \vec x, \vec z, \vec u)\) be the conditional survival function.
Assume that
\[S(y\mid\vec x, \vec z, \vec u) = \Phi(-\vec x^\top(y)\vec\beta - \vec z^\top\vec u)\]
See Royston and Parmar (2002), X.-R. Liu, Pawitan, and Clements (2016), and X.-R. Liu, Pawitan, and Clements (2017).
Define the sets of indices of censored and observed events \[ \begin{align*} \mathcal{C} &= \{i\in \{1,\dots,n\}:\, d_i = 0\} \\ \mathcal{O} &= \{i\in \{1,\dots,n\}:\, d_i = 1\} = \{1,\dots,n\}\setminus\mathcal{C} \end{align*} \]
Similarly define \(\mat X^{\prime o}(\vec Y^o)\), \(\mat Z^o\), \(\vec Y^o\), \(\mat X^c(\vec Y^c)\), \(\mat Z^c\), and \(\vec Y^c\). Derivatives are applied element wise.
The complete data likelihood is
\[ \begin{align*} p(\vec u, \vec y, \vec d) &= \phi^{(K)}(\vec u; \vec 0, \mat\Sigma) c(\vec y^o, \mat X^o, \vec\beta)\\ &\hspace{20pt}\cdot \phi^{(\lvert \mathcal{O}\rvert)}(-\mat X^o(\vec y^o)\vec\beta - \mat Z^o\vec u) \\ &\hspace{20pt}\cdot \Phi^{(\lvert \mathcal{C}\rvert)}(-\mat X^c(\vec y^c)\vec\beta - \mat Z^c\vec u) \\ c(\vec y^o, \mat X^o, \vec\beta) &= -\mat X^{\prime o}(\vec y^o)\vec\beta \end{align*} \]
We can show that
\[ \begin{align*} l(\vec\beta, \mat\Sigma) &= \log\int p(\vec u, \vec y, \vec d) \der u_1\cdots\der u_K \\ &= \log k(\vec t^o, \mat X^o, \mat Z^o, \mat\Sigma) \\ &\hspace{20pt}+ \log\int \phi^{(K)}(\vec u; \vec h, \mat H^{-1}) \\ &\hspace{60pt}\cdot \Phi^{(\lvert \mathcal{C}\rvert)}(-\mat X^c(\vec t^c)\vec\beta - \mat Z^c\vec u) \der u_1\cdots\der u_K \end{align*} \]
See one of the next slides for the definitions of \(k\), \(\vec h\), and \(\mat H\).
Mixed Tobit model is a special case.
Similar to a mixed version of the linear transformation model discussed by Hothorn, Möst, and Bühlmann (2018).
Similar to the mixed binomial model.
\[ \begin{align*} \mat H(\mat Z^o, \mat\Sigma) &= \mat H = \mat Z^{o\top}\mat Z^o + \mat\Sigma^{-1} \\ \vec h(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) &= \vec h = \mat H^{-1} \mat Z^{o\top}\left(- \mat X^o(\vec y^o)\vec\beta \right)\\ k(\vec y^o, \mat X^o, \mat Z^o, \mat\Sigma) &= \\ &\hspace{-40pt} c(\vec y^o, \mat X^o, \vec\beta) (2\pi)^{-\lvert\mathcal O\rvert/2}\lvert\mat\Sigma\mat H\rvert^{-1/2} \\ &\hspace{-20pt} \cdot\exp\left( -\frac 12 (- \mat X^o\vec\beta(\vec y^o))^\top (- \mat X^o\vec\beta(\vec y^o)) +\frac 12\vec h^\top\mat H\vec h\right) \end{align*} \]
\[ \begin{align*} \Prob(\vec V_2 \leq \vec v_2) &= \Phi^{(k_2)}(\vec v_2; \vec\xi_2, \mat\Xi_2) \\ &= \int \phi^{(k_1)}(\vec v_1; \vec \xi_1, \mat\Xi_{11}) \\ &\hspace{30pt}\cdot \Prob\left(\vec V_2 < \vec v_2 \,\middle\vert\, \vec V_1 = \vec v_1 \right)\der v_{11} \cdots \der v_{1k_1} \end{align*} \]
Either approximate the \(k_2\)-dimensional CDF or the \(k_1\)-dimensional Gaussian weighted integral.
I.e. the mixed binomial model with \(m = 1\).
The CDF approximation suggested by Genz (1992).
(Adaptive) Gauss–Hermite Quadrature ([A]GHQ).
Monte Carlo method implemented by Genz and Monahan (1999).
or similarly and seemingly independently developed GHK method used by Hajivassiliou, McFadden, and Ruud (1996).
pmvtnorm package implementation
(Genz and Bretz 2009; Genz et al. 2020) in Fortran.
Uses randomized Korobov rules (Niederreiter 1972; Keast 1973; Cranley and Patterson 1976). Have seen a more than ten-fold reduction compared to C++ implementation of the MC estimator suggested by Genz (1992).
and typically not \(\bigO{n^3}\) in practical examples it seems.
Approximate the \(K\)-dimensional integral.
Recursive application of quadrature rule using \(b\) values for each of the \(K\) dimensions.
\(\bigO{nb^K}\).
(Pinheiro and Bates 1995; Q. Liu and Pierce 1994). Requires estimation of \(K\)-dimensional mode and inversion of \(K\times K\) dimensional matrix.
Genz and Monahan (1999) provide a Fortran implementation for a generic integrand.
and the \(\bigO{snK}\)-term is typically dominating.
Requires estimation of \(K\)-dimensional mode and inversion of \(K\times K\) dimensional matrix.
Use a mixed binomial model in the binary case, \(m = 1\).
The unconditional variance of \(\vec x_i^\top\vec\beta + \vec z_i^\top\vec u\) is 2.
Let \(\eta_i = \vec x_i^\top\vec\beta \sim N(0, 1)\) and draw \(\mat\Sigma\) from a Wishart distribution.
where we fix the relative error of each method.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 0.02 | 0.12 | 0.79 | 6.93 | 40.46 |
| AGHQ | 0.03 | 0.09 | 0.43 | 3.00 | 26.33 |
| CDF | 0.04 | 0.03 | 0.00 | 0.02 | 0.02 |
| Genz & Monahan | 13.88 | 49.94 | 43.96 | 74.98 | 74.73 |
| Adaptive Genz & Monahan | 13.68 | 9.77 | 6.18 | 23.51 | 22.44 |
Average computation times in milliseconds.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 0.07 | 0.41 | 3.95 | 24.96 | 153.06 |
| AGHQ | 0.03 | 0.14 | 0.60 | 3.59 | 30.64 |
| CDF | 0.43 | 0.45 | 0.42 | 0.42 | 0.47 |
| Genz & Monahan | 77.28 | 85.60 | 108.23 | 91.98 | 123.41 |
| Adaptive Genz & Monahan | 5.30 | 12.54 | 26.44 | 4.89 | 14.21 |
Average computation times in milliseconds.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 0.10 | 0.74 | 7.41 | 89.98 | 331.24 |
| AGHQ | 0.05 | 0.17 | 1.36 | 7.80 | 34.50 |
| CDF | 4.74 | 4.61 | 4.55 | 4.74 | 4.59 |
| Genz & Monahan | 103.63 | 238.29 | 244.12 | 215.58 | 193.56 |
| Adaptive Genz & Monahan | 1.91 | 0.82 | 1.20 | 4.21 | 3.34 |
Average computation times in milliseconds.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 0.28 | 4.42 | 110.59 | 538.85 | 2793.15 |
| AGHQ | 0.06 | 0.26 | 1.74 | 7.81 | 60.53 |
| CDF | 31.71 | 32.35 | 32.07 | 33.58 | 33.39 |
| Genz & Monahan | 141.72 | 383.82 | 439.39 | 575.03 | 561.41 |
| Adaptive Genz & Monahan | 0.24 | 0.69 | 1.26 | 0.71 | 2.99 |
Average computation times in milliseconds.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 0.98 | 19.06 | 172.64 | 5023.05 | 26964.95 |
| AGHQ | 0.14 | 0.31 | 0.84 | 5.27 | 47.69 |
| CDF | 75.26 | 75.18 | 74.61 | 75.18 | 74.35 |
| Genz & Monahan | 465.74 | 685.87 | 702.84 | 846.62 | 1570.17 |
| Adaptive Genz & Monahan | 0.42 | 0.45 | 0.38 | 0.54 | 0.81 |
Average computation times in milliseconds.
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 8.06 | 6.75 | 8.26 | 6.47 | 5.67 |
| AGHQ | 6.22 | 5.67 | 8.84 | 5.55 | 5.34 |
| CDF | 6.15 | 5.72 | 8.88 | 5.67 | 5.36 |
| Genz & Monahan | 37.81 | 38.73 | 67.95 | 58.60 | 57.10 |
| Adaptive Genz & Monahan | 29.09 | 47.24 | 46.43 | 64.27 | 46.52 |
RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 16.32 | 12.51 | 17.72 | 9.58 | 12.83 |
| AGHQ | 9.43 | 9.58 | 12.43 | 8.67 | 15.18 |
| CDF | 19.69 | 18.70 | 22.91 | 24.32 | 26.48 |
| Genz & Monahan | 162.67 | 131.94 | 139.43 | 127.33 | 172.40 |
| Adaptive Genz & Monahan | 56.07 | 162.08 | 89.73 | 132.42 | 138.80 |
RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 56.11 | 36.70 | 42.10 | 31.64 | 21.65 |
| AGHQ | 14.96 | 19.47 | 20.94 | 17.23 | 18.97 |
| CDF | 17.83 | 23.58 | 25.84 | 25.91 | 22.78 |
| Genz & Monahan | 236.28 | 185.91 | 200.49 | 272.13 | 252.81 |
| Adaptive Genz & Monahan | 119.25 | 179.82 | 175.57 | 202.02 | 212.55 |
RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 134.64 | 160.54 | 126.85 | 93.83 | 85.55 |
| AGHQ | 19.60 | 16.63 | 25.47 | 34.65 | 37.27 |
| CDF | 25.99 | 27.36 | 29.86 | 43.33 | 44.78 |
| Genz & Monahan | 457.59 | 486.99 | 408.75 | 569.23 | 445.92 |
| Adaptive Genz & Monahan | 127.07 | 286.81 | 298.32 | 370.65 | 360.66 |
RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).
| Method/K | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| GHQ | 388.58 | 407.13 | 272.85 | 331.03 | 237.07 |
| AGHQ | 31.78 | 47.17 | 69.40 | 52.50 | 54.38 |
| CDF | 76.19 | 74.42 | 105.98 | 91.77 | 101.08 |
| Genz & Monahan | 859.31 | 903.12 | 1052.99 | 941.96 | 840.60 |
| Adaptive Genz & Monahan | 243.57 | 238.78 | 362.14 | 381.14 | 398.06 |
RMSE of (A)GHQ is computed with 4 consecutive values of \(b\).
Two types of salamanders: whiteside and roughbutt.
\(Y_i = 1\) if the \(i\)th mating pair is successful.
Model is \[ \begin{align*} \vec U &= (\vec U_f, \vec U_m) \sim N(\vec 0, \diag(\sigma_f^2\mat I, \sigma_m^2\mat I))\\ Y_i \mid \vec U = \vec u &\sim \text{Bin}(\Phi(z_i), 1) \\ z_i &= \beta_{ww}I_{ww}(i) + \beta_{wr}I_{wr}(i) + \beta_{rw}I_{rw}(i) \\ &\hspace{20pt} + \beta_{rr}I_{rr}(i) + u_{ff_i} + u_{mm_i} \end{align*} \]
There is a square if a male and female mates.
## Inference for Stan model: salamander.
## 6 chains, each with iter=100000; warmup=20000; thin=1;
## post-warmup draws per chain=80000, total post-warmup draws=480000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## beta[1] 0.63 0 0.26 0.13 0.45 0.62 0.80 1.16 176522 1
## beta[2] -0.43 0 0.29 -1.02 -0.63 -0.43 -0.23 0.14 225548 1
## beta[3] -1.81 0 0.35 -2.54 -2.04 -1.80 -1.57 -1.16 130912 1
## beta[4] 2.23 0 0.38 1.52 1.97 2.22 2.48 2.99 138428 1
## sigma_f 0.77 0 0.17 0.46 0.65 0.76 0.87 1.12 66649 1
## sigma_m 0.72 0 0.16 0.42 0.61 0.71 0.82 1.06 61948 1
##
## Samples were drawn using NUTS(diag_e) at Thu Feb 20 10:48:52 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Requires estimation of \(K\)-dimensional mode and inversion of a \(K\times K\) matrix.
Very fast and scale well but heavily biased in some cases.
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( probit )
## Formula: y ~ wsm * wsf + (1 | female) + (1 | male)
## Data: sala
## AIC BIC logLik deviance df.resid
## 432.2 455.5 -210.1 420.2 354
## Random effects:
## Groups Name Std.Dev.
## female (Intercept) 0.626
## male (Intercept) 0.576
## Number of obs: 360, groups: female, 60; male, 60
## Fixed Effects:
## (Intercept) wsm wsf wsm:wsf
## 0.601 -0.419 -1.731 2.140
Run on an Intel® Core™ i7-8750H CPU compiled with gcc 8.3.0.
Uses the mvkbrv Fortran subroutine from the mvtnorm package to approximate the gradient.
Uses BFGS implementation in R’s (R Core Team 2019) optim function.
##
## fit_CDF_cpp_fast
## ----------------
##
## Fixed effects
## (Intercept) wsm wsf wsm:wsf
## 0.6208 -0.4290 -1.7242 2.1243
##
## Random effect standard deviations
## 0.7073 0.6835
##
## Log-likelihood estimate -206.82
## Computation time 23.77/0.48/0.53 (seconds total/per func./per grad.)
##
## fit_CDF_cpp
## -----------
##
## Fixed effects
## (Intercept) wsm wsf wsm:wsf
## 0.6206 -0.4305 -1.7228 2.1249
##
## Random effect standard deviations
## 0.7075 0.6817
##
## Log-likelihood estimate -206.83
## Computation time 6.81/0.76/2.27 (seconds total/per func./per grad.)
Lai and Shih (2003) suggest a hybrid Laplace and Monte Carlo method.
May be very fast when combined with methods from Genz (1992) and Genz and Monahan (1999).
Girolami and Rogers (2006) and Consonni and Marin (2007) suggest a VA (Ormerod and Wand 2010).
Consonni and Marin (2007) shows that it works poorly in some cases.
Ormerod (2011).
Requires estimation of \(\bigO{K^2}\) variational parameters.
Showed three classes of mixed models with very similar marginal log-likelihoods.
A variety of approximations are available.
Hybrid method may be useful.
One variational approximation seems unexplored.
The presentation is at rpubs.com/boennecd/mix-probit-KTH.
The markdown is at github.com/boennecd/Talks.
Code and more examples at github.com/boennecd/mixprobit.
References are on the next slide.
Barrett, Jessica, Peter Diggle, Robin Henderson, and David Taylor-Robinson. 2015. “Joint Modelling of Repeated Measurements and Time-to-Event Outcomes: Flexible Model Specification and Exact Likelihood Inference.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77 (1): 131–48. doi:10.1111/rssb.12060.
Consonni, Guido, and Jean-Michel Marin. 2007. “Mean-Field Variational Approximate Bayesian Inference for Latent Variable Models.” Computational Statistics & Data Analysis 52 (2): 790–98. doi:https://doi.org/10.1016/j.csda.2006.10.028.
Cranley, R., and T. N. L. Patterson. 1976. “Randomization of Number Theoretic Methods for Multiple Integration.” SIAM Journal on Numerical Analysis 13 (6). Society for Industrial; Applied Mathematics: 904–14. http://www.jstor.org/stable/2156452.
Genz, Alan. 1992. “Numerical Computation of Multivariate Normal Probabilities.” Journal of Computational and Graphical Statistics 1 (2). Taylor & Francis: 141–49. doi:10.1080/10618600.1992.10477010.
Genz, Alan, and Frank Bretz. 2009. Computation of Multivariate Normal and T Probabilities. Lecture Notes in Statistics. Heidelberg: Springer-Verlag.
Genz, Alan, and John Monahan. 1999. “A Stochastic Algorithm for High-Dimensional Integrals over Unbounded Regions with Gaussian Weight.” Journal of Computational and Applied Mathematics 112 (1): 71–81. doi:https://doi.org/10.1016/S0377-0427(99)00214-9.
Genz, Alan, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi, Friedrich Leisch, Fabian Scheipl, and Torsten Hothorn. 2020. mvtnorm: Multivariate Normal and T Distributions. https://CRAN.R-project.org/package=mvtnorm.
Girolami, M., and S. Rogers. 2006. “Variational Bayesian Multinomial Probit Regression with Gaussian Process Priors.” Neural Computation 18 (8): 1790–1817. doi:10.1162/neco.2006.18.8.1790.
Hajivassiliou, Vassilis, Daniel McFadden, and Paul Ruud. 1996. “Simulation of Multivariate Normal Rectangle Probabilities and Their Derivatives Theoretical and Computational Results.” Journal of Econometrics 72 (1): 85–134. doi:https://doi.org/10.1016/0304-4076(94)01716-6.
Hothorn, Torsten, Lisa Möst, and Peter Bühlmann. 2018. “Most Likely Transformations.” Scandinavian Journal of Statistics 45 (1): 110–34. doi:10.1111/sjos.12291.
Keast, P. 1973. “Optimal Parameters for Multidimensional Integration.” SIAM Journal on Numerical Analysis 10 (5): 831–38. doi:10.1137/0710068.
Lai, Tze Leung, and Mei-Chiung Shih. 2003. “A Hybrid Estimator in Nonlinear and Generalised Linear Mixed Effects Models.” Biometrika 90 (4). [Oxford University Press, Biometrika Trust]: 859–79. http://www.jstor.org/stable/30042093.
Liu, Qing, and Donald A. Pierce. 1994. “A Note on Gauss-Hermite Quadrature.” Biometrika 81 (3). [Oxford University Press, Biometrika Trust]: 624–29. http://www.jstor.org/stable/2337136.
Liu, Xing-Rong, Yudi Pawitan, and Mark Clements. 2016. “Parametric and Penalized Generalized Survival Models.” Statistical Methods in Medical Research 27 (5): 1531–46. doi:10.1177/0962280216664760.
Liu, Xing-Rong, Yudi Pawitan, and Mark S. Clements. 2017. “Generalized Survival Models for Correlated Time-to-Event Data.” Statistics in Medicine 36 (29): 4743–62. doi:10.1002/sim.7451.
McFadden, Daniel. 1984. “Chapter 24 Econometric Analysis of Qualitative Response Models.” In, 2:1395–1457. Handbook of Econometrics. Elsevier. doi:https://doi.org/10.1016/S1573-4412(84)02016-X.
Niederreiter, H. 1972. “On a Number-Theoretical Integration Method.” Aequationes Mathematicae 8 (3): 304–11. doi:10.1007/BF01844507.
Ochi, Y., and Ross L. Prentice. 1984. “Likelihood Inference in a Correlated Probit Regression Model.” Biometrika 71 (3). [Oxford University Press, Biometrika Trust]: 531–43. http://www.jstor.org/stable/2336562.
Ormerod, J. T. 2011. “Skew-Normal Variational Approximations for Bayesian Inference.” Unpublished Article.
Ormerod, J. T., and M. P. Wand. 2010. “Explaining Variational Approximations.” The American Statistician 64 (2). Taylor & Francis: 140–53. doi:10.1198/tast.2010.09058.
Pawitan, Y., M. Reilly, E. Nilsson, S. Cnattingius, and P. Lichtenstein. 2004. “Estimation of Genetic and Environmental Factors for Binary Traits Using Family Data.” Statistics in Medicine 23 (3): 449–65. doi:10.1002/sim.1603.
Pinheiro, José C., and Douglas M. Bates. 1995. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4 (1). American Statistical Association, Taylor & Francis, Ltd., Institute of Mathematical Statistics, Interface Foundation of America: 12–35. http://www.jstor.org/stable/1390625.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Royston, Patrick, and Mahesh K. B. Parmar. 2002. “Flexible Parametric Proportional-Hazards and Proportional-Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects.” Statistics in Medicine 21 (15): 2175–97. doi:10.1002/sim.1203.