\[ \definecolor{gray}{RGB}{192,192,192} \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \]
Introduce survival analysis.
particularly with random effects.
in survival analysis.
Cover future work.
Model the time of a terminal event.
Time zero is the time of diagnosis, time of onset, or birth date.
Some events are (right) censored.
Full circle: observed event. Open circle: censored. Order is only for visualization purposes.
Black: treatment. Blue and dashed: placebo.
e.g., short trial period or low event rate.
Need to account for censoring.
\(T_i^*\) be the event time of individual \(i\) and let \(f\) denote the density function.
Let \(D_i = 1_{\{T_i^* < C_i\}}\) be the event indicators.
The observed likelihood is
\[p(t_i, d_i) = f(t_i)^{d_i}P(T_i^* > t_i)^{1 - d_i}\]
Let \(S\) be the survival function
\[S(t_i) = P(T_i^* > t_i) = \int_t^\infty f(s)\der s\]
Let \(\lambda\) be the hazard function
\[ \lambda(t) = \lim_{h\rightarrow 0^+} \frac {P(t\leq T_i^* < t + h \mid T_i^*\geq t)}{h} \]
which is the instantaneous event rate.
Then
\[p(t_i, d_i) = \lambda(t_i)^{d_i}S(t_i) = \lambda(t_i)^{d_i}\exp\left(-\int_0^{t_i}\lambda(s)\der s\right)\]
Important: two types of terms in maximum likelihood estimation
\[d_i\log\lambda(t_i) + \log S(t_i)\]
e.g., with observational data sets.
to improve prediction accuracy or make inference.
Typical choice is
\[ \lambda\Cond{t}{\vec x} = \lambda_0(t)\exp\left(\vec\beta^\top\vec x\right) \]\(\lambda_0\) may be parametric or nonparametric (Cox 1972).
Unit increase in \(x_l\) yields an \(\exp\beta_l\) factor increase in \(\lambda\).
Special classes of \(\lambda_0\) yields a tractable (partial) log-likelihood function.
Generalization may be intractable due to
\[S\Cond{t}{\vec x} = \exp\left(-\int_0^t\lambda\Cond{s}{\vec x}\der s\right)\]Leads to one-dimensional numerical integration.
Let
\[g\left(S\Cond{t}{\vec x}\right) = g\left(S_0(t)\right) + \vec\beta^\top\vec x\]
E.g., see Younes and Lachin (1997) and Royston and Parmar (2002).
Avoid integration.
Get a monotonicity constraint for \(S\Cond{t}{\vec x}\) for all \(\vec x\).
e.g., twins who share genetic background and environment.
\(k=1,\dots,m\) indicates the group and group \(k\) has \(i=1,\dots,n_k\) members.
In particular, we consider
\[\vec U_k \sim N(\vec 0, \mat \Sigma)\]
Let functions be applied elementwise
\[ \begin{align*} \vec x &= (x_1, x_2)^\top \\ f(\vec x) &= (f(x_1), f(x_2))^\top \end{align*} \]
Further, let
\[ \begin{align*} \vec t_k &= (t_{k1}, \dots, t_{kn_k})^\top, & \vec d_k &= (d_{k1}, \dots, d_{kn_k})^\top \\ \mat X_k &= (\vec x_{k1}, \dots, \vec x_{kn_k})^\top, & \mat Z_k &= (\vec z_{k1}, \dots, \vec z_{kn_k})^\top \end{align*} \]
The marginal log-likelihood (or model evidence) term for each group \(k\) is
\[ \begin{align*} l_k(\vec\beta, \mat\Sigma) &= \log \int \exp\left(h_k(\vec\beta, \mat\Sigma, \vec u) + \log \phi(\vec u;\mat \Sigma)\right) \der \vec u \\ h_k(\vec\beta, \mat\Sigma, \vec u) &= \vec d^\top_k \log \lambda\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u} \\ &\hspace{20pt} + \vec 1^\top\log S\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u} \end{align*} \]
where \(\phi(\cdot ;\mat \Sigma)\) is the density function of the multivariate normal distribution with a zero mean vector and covariance matrix \(\mat \Sigma\). \(\vec 1\) is a vector of ones.
\(l_k(\vec\beta, \mat\Sigma)\) is intractable in general.
Fast, scales well, but may perform poorly e.g., for small groups (\(n_k\) small).
Fast in low dimensions, scales poorly, and performs well. For examples, see Liu and Pierce (1994) and Pinheiro and Bates (1995).
Slow, scales poorly, and performs well.
\[\begin{align*} l &= \int c(\vec u)\der u \qquad\qquad\qquad\qquad\qquad \text{(intractable)} \\ &= \int \frac{c(\vec u)}{\phi(\vec u; \vec\mu, \mat\Lambda)} \phi(\vec u; \vec\mu, \mat\Lambda)\der\vec u \\ &= \int\underbrace{(2\pi)^{k/2} \lvert\Lambda\rvert^{1/2} c(\vec\mu + \Lambda^{1/2}\vec s) \exp\left(\vec s^\top\vec s / 2\right)}_{ \tilde c(\vec s; \vec\mu, \mat\Lambda)} \phi(\vec s; \vec 0, \mat I)\der\vec s \end{align*}\]
where \(\vec u\in \mathbb{R}^k\), \(\phi(\cdot; \vec\mu, \mat\Lambda)\) is the density of a multivariate normal distribution with mean \(\vec\mu\) and covariance matrix \(\mat\Lambda\), and \(\mat I\) is the identity matrix.
Apply Gauss-Hermite quadrature to each coordinate of \(\vec s\).
Each coordinate, \(s_l\), is approximated at \(b\) points.
Let \((g_i, w_i)\) be one of the node values and the corresponding weight for \(i = 1,\dots,b\) for each coordinate, \(s_l\), and
\[\vec s_{j_1,\cdots,j_k} = (g_{j_1},\cdots,g_{j_k})^\top\]
Repeatedly applying Gauss-Hermite quadrature yields
\[\begin{align*} l &= \int \tilde c(\vec s; \vec\mu; \mat\Lambda) \phi(\vec s; \vec 0, \mat I)\der\vec s \\ &\approx \sum_{j_1=1}^b\cdots\sum_{j_k=1}^b \tilde c(\vec s_{j_1,\cdots,j_k}; \vec\mu, \mat\Lambda) \prod_{q = 1}^k w_{j_q} \end{align*}\]
Fast alternatives are attractive.
A lower bound of the marginal log-likelihood is
\[ \begin{align*} \log p\left(\vec t_k, \vec d_k\right) &= \int q(\vec u;\vec\theta_k) \log \left(\frac {p\left(\vec t_k, \vec d_k, \vec u\right) / q(\vec u;\vec\theta_k)} {p\Cond{\vec u}{\vec t_k, \vec d_k} / q(\vec u;\vec\theta_k)} \right)\der\vec u \\ &\geq \int q(\vec u;\vec\theta_k) \log \left(\frac {p\left(\vec t_k, \vec d_k, \vec u\right)} {q(\vec u;\vec\theta_k)} \right)\der\vec u \\ &= \log \tilde p \left(\vec t_k, \vec d_k;\vec\theta_k\right) \end{align*} \]
for some density function \(q\). Equality is if and only
\[ q(\vec u_k;\vec\theta_k) = p\Cond{\vec u_k}{\vec t_k, \vec d_k} \]
Approximate maximum likelihood is
\[\argmax_{\vec\beta, \mat \Sigma, \vec\theta_1, \cdots, \vec\theta_m} \sum_{k=1}^m\log \tilde p \left(\vec t_k, \vec d_k;\vec\theta_k\right)\]
Recall the marginal log-likelihood
\[ \begin{align*} l_k(\vec\beta, \mat\Sigma) &= \log \int \exp\underbrace{\left(h_k(\vec\beta, \mat\Sigma, \vec u) + \log \phi(\vec u;\mat \Sigma)\right)}_{ \log p(\vec t_k, \vec d_k, \vec u)} \der \vec u \\ h_k(\vec\beta, \mat\Sigma, \vec u) &= \vec d^\top_k \log \lambda\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u} \\ &\hspace{20pt} + \vec 1^\top\log S\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u} \end{align*} \]
Suppose \(q(\vec u; \vec\mu, \mat\Lambda) = \phi(\vec u; \vec\mu, \mat\Lambda)\).
\(\left(\log \phi(\vec u;\mat \Sigma) - \log q(\vec u; \vec\mu, \mat\Lambda)\right)\phi(\cdot)\) term is tractable.
The remaining two types of terms
\[\left(\vec d^\top_k \log \lambda\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u} + \vec 1^\top\log S\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u}\right)\phi(\cdot)\]
\[ \lambda\Cond{\vec t_k}{\mat X_k, \mat Z_k, \vec u_k} = s\left(\vec t_k; \eta_1(\mat X_k) + \mat Z_k\vec u_k\right) \]
for some function \(s\). Then
\[ \begin{align*} \hspace{20pt}&\hspace{-20pt} \int \log\lambda\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u} \phi(\vec u; \vec\mu, \mat\Lambda) \der \vec u \\ &= \int \log s \left(t_{ki};\eta_1(\vec x_{ki}) + \vec z_{ki}^\top\vec\mu + \sqrt{\vec z_{ki}^\top\Sigma\vec z_{ki}}x\right) \phi(x)\der x \end{align*} \]
instead of one \(k\)-dimensional integral. As shown by Ormerod and Wand (2012) for generalized linear mixed models.
Recall
\[ g\left(S\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u_k}\right) = g\left(S_0(t_{ki}; \vec x_{ki})\right) + \vec z^\top_{ki}\vec u_k \]
and suppose that
\[g(x) = \log(-\log(x))\]
Then
\[\begin{align*} \log S\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u_k} &= \exp(\vec z^\top_{ki}\vec u_k)\log S_0(t_{ki}; \vec x_{ki}) \\ \log\lambda\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u_k} &= L_0(t_{ki}; \vec x_{ki}) + \vec z^\top_{ki}\vec u_k \end{align*}\]
with
\[L_0(t_{ki}; \vec x_{ki}) = \log\left(- \frac {S'_0(t_{ki}; \vec x_{ki})} {S_0(t_{ki}; \vec x_{ki})} \right)\]
Both yield tractable lower bound terms.
where \(\vec f:\, [0,\infty)\times \mathbb{R}^v\rightarrow\mathbb{R}^k\) is not applied elementwise. Then
\[\begin{align*} \log S\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u_k} &\\ &\hspace{-40pt}= -\int_0^{t_{ki}} \lambda_0(s ;\vec x_{ki}) \exp\left(\vec f(s, \vec z_{ki})^\top\vec u_k\right) \der s \\ \log\lambda\Cond{t_{ki}}{\vec x_{ki}, \vec z_{ki}, \vec u_k} &= \log \lambda_0(t_{ki};\vec x_{ki}) + \vec f(t_{ki}, \vec z_{ki})^\top\vec u_k \end{align*}\]
Assuming we can change order of integration. Similar to result by Yue and Kontar (2019).
The lower bounds are computationally attractive but precision is unknown.
Compare the Laplace approximation with VA in simple case.
Markers are often observed in discrete time which are supposedly related to the event rate.
Dependence is often assumed through a random mean function of the markers.
The presentation is at rpubs.com/boennecd/KTH-group-meeting-Oct19.
The markdown is at github.com/boennecd/Talks.
Notes are available at bit.ly/KTH19-VA-notes.
References are on the next slide.
Cox, D. R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society Series B 34 (2). [Royal Statistical Society, Wiley]: 187–220. http://www.jstor.org/stable/2985181.
Liu, Qing, and Donald A. Pierce. 1994. “A Note on Gauss-Hermite Quadrature.” Biometrika 81 (3). [Oxford University Press, Biometrika Trust]: 624–29. http://www.jstor.org/stable/2337136.
Ormerod, J. T., and M. P. Wand. 2012. “Gaussian Variational Approximate Inference for Generalized Linear Mixed Models.” Journal of Computational and Graphical Statistics 21 (1). Taylor & Francis: 2–17. doi:10.1198/jcgs.2011.09118.
Pinheiro, José C., and Douglas M. Bates. 1995. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4 (1). American Statistical Association, Taylor & Francis, Ltd., Institute of Mathematical Statistics, Interface Foundation of America: 12–35. http://www.jstor.org/stable/1390625.
Royston, Patrick, and Mahesh K. B. Parmar. 2002. “Flexible Parametric Proportional-Hazards and Proportional-Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects.” Statistics in Medicine 21 (15): 2175–97. doi:10.1002/sim.1203.
Younes, Naji, and John Lachin. 1997. “Link-Based Models for Survival Data with Interval and Continuous Time Censoring.” Biometrics 53 (4). [Wiley, International Biometric Society]: 1199–1211. http://www.jstor.org/stable/2533490.
Yue, Xubo, and Raed Kontar. 2019. “Variational Inference of Joint Models Using Multivariate Gaussian Convolution Processes.”