Estimating Joint Survival and Marker Models


Introduction

\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]

Example

How is higher than normal breast density related to breast cancer?

Is the risk related to

  • long periods of higher than normal density
  • the current density only, or
  • large changes over a short period of time?

We need a model for the density (the marker) and for the risk of breast cancer (the survival time).

Markers

We have a number of individuals for whom we observe markers at different points in time.

E.g. breast density.

We assume that:

  • The markers have a population mean curve across time.
  • Each individual has a mean curve that may differ from the population mean curve.
  • Observations are noisy, e.g. weight measurements.

Example: One individual

Observations for one individual.

Example: Two individuals

Observations for two individuals.

Example: Two individuals

Observations for two individuals with their mean curves.

Example: Population Curves

The population mean curve with pointwise quantiles for the individuals’ mean curves.

Example: Population Curves

The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals.

Example: Population Curves

The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals and their observed values.

Formally

\[ \begin{align*} Y_{ij} &= \mu_i(s_{ij}, \vec U_i) + \epsilon_{ij} \\ \epsilon_{ij} &\sim N(0, \sigma^2) \\ \mu_{i}(s, \vec U_i) &= \vec x_i^\top\vec\gamma + \vec g(s)^\top\vec\beta + \vec m(s)^\top\vec U_i \\ \vec U_i &\sim N^{(R)}(\vec0, \Psi) \end{align*} \]

\(\vec x_i^\top\vec\gamma\): fixed effects (offsets).

E.g. sex differences.

\(\vec g(s)^\top\vec\beta\): the population mean curve.

\(\vec m(s)^\top\vec U_i\): the deviation of individual \(i\) from the population curve.
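As a toy sketch of the marker model above (with made-up bases \(\vec g\), \(\vec m\) and parameter values, not the package's interface), one individual's observations could be simulated like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def g(s):   # made-up population-curve basis: intercept, linear, quadratic
    return np.column_stack([np.ones_like(s), s, s**2])

def m(s):   # made-up random-effect basis: random intercept and random slope
    return np.column_stack([np.ones_like(s), s])

gamma = np.array([0.5])               # fixed effect, e.g. a sex difference
beta = np.array([1.0, 0.2, -0.02])    # population mean curve coefficients
Psi = np.array([[0.30, 0.05],
                [0.05, 0.10]])        # covariance of the random effects U_i
sigma = 0.25                          # residual standard deviation

def simulate_individual(x_i, obs_times):
    """Draw U_i ~ N(0, Psi) and noisy observations Y_ij at the given times."""
    U_i = rng.multivariate_normal(np.zeros(2), Psi)
    mean = x_i @ gamma + g(obs_times) @ beta + m(obs_times) @ U_i
    return mean + rng.normal(scale=sigma, size=obs_times.shape)

print(simulate_individual(x_i=np.array([1.0]), obs_times=np.linspace(0, 10, 6)))
```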

Time to Event Outcomes

We may also have time to event outcomes.

E.g. diagnosis of disease, onset of a disease, and death from a disease.

An individual's deviation of the markers from the population may be linked to a higher or lower risk of the event than in the population.

There are different ways the two can be linked.

Relation to Markers: Current Value

A higher level at any given point may lead to higher/lower risk.

Relation to Markers: Derivative

A steeper change at any given point may lead to higher/lower risk.

Relation to Markers: Cumulative

A long period of higher values may lead to higher/lower risk at any given point.

Hazards

\(T_i\) is the event time of individual \(i\). The hazard is defined as

\[ h_i(t) = \lim_{dt\rightarrow0}\frac{P(t\leq T_i < t + dt\mid T_i \geq t)}{dt} \]

Given the hazard, \(h_i\), we can compute the density of \(T_i\) and probabilities of events like \(T_i > t\) and \(T_i\in (t, v)\).
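For instance, writing \(f_i\) for the density of \(T_i\),

\[ \begin{align*} P(T_i > t) &= \exp\left(-\int_0^t h_i(s)\, ds\right) \\ f_i(t) &= h_i(t)\exp\left(-\int_0^t h_i(s)\, ds\right) \\ P(T_i\in(t, v)) &= \exp\left(-\int_0^t h_i(s)\, ds\right) - \exp\left(-\int_0^v h_i(s)\, ds\right). \end{align*} \]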

Survival Model

\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\left( \vec z_i^\top\vec \delta + \omega^\top \vec b(t) + \alpha \vec m(t)^\top\vec U_i + \xi_i \right) \\ \xi_i &\sim N(0, \Xi) \end{align*} \]

\(\vec z_i^\top\vec \delta\): fixed effects (offsets).

E.g. sex differences.

\(\omega^\top \vec b(t)\): the population log hazard function.

\(\alpha\vec m(t)^\top\vec U_i\): individual specific deviation.

Shared with the marker!

\(\xi_i\): individual specific deviation.

Not shared with the marker.
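A toy numerical sketch of the model above (made-up bases and parameter values, not the package's code): the conditional hazard and the implied survival probability, the latter obtained by numerically integrating the cumulative hazard.

```python
import numpy as np
from scipy.integrate import quad

# made-up bases and parameters for the illustration
def b(t):  # log baseline hazard basis
    return np.array([1.0, t])

def m(t):  # the same basis as in the marker model
    return np.array([1.0, t])

delta, omega = np.array([0.1]), np.array([-3.0, 0.05])
alpha = 0.5

def hazard(t, z_i, U_i, xi_i):
    """h_i(t | U_i, xi_i) from the survival model above."""
    return np.exp(z_i @ delta + omega @ b(t) + alpha * (m(t) @ U_i) + xi_i)

def survival_prob(t, z_i, U_i, xi_i):
    """P(T_i > t | U_i, xi_i) = exp(minus the cumulative hazard)."""
    cum_haz, _ = quad(hazard, 0, t, args=(z_i, U_i, xi_i))
    return np.exp(-cum_haz)

z_i, U_i, xi_i = np.array([0.3]), np.array([0.2, -0.05]), 0.1
print(survival_prob(5.0, z_i, U_i, xi_i))
```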

Survival Model

Can replace \(\alpha\vec m(t)^\top\vec U_i\) with

\[\alpha\left(\frac{\partial\vec m(t)}{\partial t}\right)^\top\vec U_i\]

or

\[\alpha\left(\int_0^t \vec m(s) ds\right)^\top\vec U_i.\]

… we can also add them.

Informative Visiting Process

The marker values may be dependent on the likelihood of an observation being made.

People with a worse health status may tend to have worse marker values and have more measurements.

Such dependencies can be handled with joint models (Gasparini et al. 2020).

More Markers

We may have \(L\) different markers rather than one

\[ \begin{align*} \vec Y_{ij}&= \vec\mu_i(s_{ij}, \vec U_i) + \vec\epsilon_{ij} \\ \vec\epsilon_{ij} &\sim N^{(L)}(\vec 0, \Sigma) \\ \mu_{i1}(s, \vec U_i) &= \vec x_{i1}^\top\vec\gamma_1 + \vec g_1(s)^\top\vec\beta_1 + \vec m_1(s)^\top\vec U_{i1} \\ \vdots &\hphantom{=}\vdots\\ \mu_{iL}(s, \vec U_i) &= \vec x_{iL}^\top\vec\gamma_L + \vec g_L(s)^\top\vec\beta_L + \vec m_L(s)^\top\vec U_{iL} \\ \vec U_i &= \begin{pmatrix} \vec U_{i1} \\ \vdots \\ \vec U_{iL} \end{pmatrix}\sim N^{(R)}(\vec0, \Psi). \end{align*} \]

More Markers

\[G(s) = \begin{pmatrix} \vec g_1(s)^\top & \vec 0^\top & \cdots & \vec 0^\top \\ \vec 0^\top & \vec g_2(s)^\top & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vec 0^\top \\ \vec 0^\top & \cdots & \vec 0^\top & \vec g_L(s)^\top \end{pmatrix}\]

and \(M(s)\) and \(X_i\) similarly. Then

\[\vec\mu_i(s_{ij}, \vec U_i) = X_i\vec\gamma + G(s_{ij})\vec\beta + M(s_{ij})\vec U_i.\]
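A small sketch of this stacking for \(L = 2\) with toy bases; scipy.linalg.block_diag treats 1-D inputs as rows, which gives exactly the block structure of \(G(s)\).

```python
import numpy as np
from scipy.linalg import block_diag

# toy per-marker bases for L = 2 markers
g1 = lambda s: np.array([1.0, s, s**2])   # basis for the first marker
g2 = lambda s: np.array([1.0, s])         # basis for the second marker

def G(s):
    # 1-D inputs become rows, yielding a 2 x 5 block-diagonal matrix
    return block_diag(g1(s), g2(s))

print(G(2.0))
```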

More Survival Types

We may have \(H\) different survival types rather than one

\[ \begin{align*} h_{i1}(t\mid \vec U_i, \vec\xi_i) &= \exp\left( \vec z_{i1}^\top\vec \delta_1 + \omega_1^\top \vec b_1(t) + \vec\alpha_1^\top M(t)\vec U_i + \xi_{i1} \right) \\ \vdots &\hphantom{=}\vdots \\ h_{iH}(t\mid \vec U_i,\vec \xi_i) &= \exp\left( \vec z_{iH}^\top\vec \delta_H + \omega_H^\top\vec b_H(t) + \vec\alpha_H^\top M(t)\vec U_i + \xi_{iH} \right) \\ \vec\xi_i = \begin{pmatrix}\xi_{i1} \\ \vdots \\ \xi_{iH}\end{pmatrix} &\sim N^{(H)}(\vec 0, \Xi) \end{align*} \]

Likelihood and Approximations

The Likelihood

The likelihood is intractable. Let \(\vec\zeta\) be the model parameters and \(l_i(\vec\zeta)\) be the log likelihood term of individual \(i\).

\(l_i(\vec\zeta)\) is of the form

\[ \exp(l_i(\vec\zeta)) = E\left( g_i(\vec U_i)h_{ij}(t\mid \vec U_i, \vec\xi_i) \exp\left(-\int_0^t h_{ij}(s\mid \vec U_i, \vec\xi_i)ds\right)\right) \]

for some function \(g_i\)

that depends implicitly on \(\vec\zeta\).

Variational Approximations

Given a distribution \(q^{(i)}_{\vec\theta}\) with \(\vec\theta\in \Theta_i\) and outcomes \(\vec y_i\)

\[ \begin{align*} l_i(\vec\zeta) &= \log p_i(\vec y_i) = \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)\big/ q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)\big/ q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U\\ &= \int q^{(i)}_{\vec\theta}(\vec U) \left( \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) + \log \left(\frac{q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)}\right) \right)d \vec U \\ &\geq \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \end{align*} \]

Equality if \(p_i(\vec U\mid\vec y_i) = q^{(i)}_{\vec\theta}(\vec U)\).

Approximate Maximum Likelihood

We assume that

\[ \text{argmax}_{\vec\zeta} \sum_{i = 1}^n l_i(\vec\zeta) \approx \text{argmax}_{\vec\zeta} \max_{\vec\theta_1,\dots,\vec\theta_n} \sum_{i = 1}^n\tilde l_i(\vec\zeta, \vec\theta_i) \]

The latter problem has many parameters, but it is partially separable.

Can develop an efficient Newton-like method (Ormerod and Wand 2012) or use the psqn package (Christoffersen 2021)

as the problem is partially separable.
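To illustrate the structure on a toy problem (a Gaussian random-intercept model, not the joint model and not the package's implementation), the summed lower bound can be maximized jointly over the model parameters and all the per-individual variational parameters; for this conjugate example the Gaussian variational approximation is exact at the optimum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# toy data: y_ij = beta + U_i + eps_ij, U_i ~ N(0, tau^2), eps_ij ~ N(0, sigma^2)
n, n_obs = 50, 5
U = rng.normal(scale=1.0, size=n)
y = 1.0 + U[:, None] + rng.normal(scale=0.5, size=(n, n_obs))

def neg_lower_bound(par):
    """Minus the sum of the per-individual Gaussian VA lower bounds."""
    beta, log_sig2, log_tau2 = par[:3]
    sig2, tau2 = np.exp(log_sig2), np.exp(log_tau2)
    mu, lam = par[3:3 + n], np.exp(par[3 + n:])   # variational means and variances
    resid = y - beta - mu[:, None]
    # E_q[log p(y | U)] + E_q[log p(U)] + entropy of q, individual by individual
    elbo = (-0.5 * n_obs * np.log(2 * np.pi * sig2)
            - (np.sum(resid**2, axis=1) + n_obs * lam) / (2 * sig2)
            - 0.5 * np.log(2 * np.pi * tau2) - (mu**2 + lam) / (2 * tau2)
            + 0.5 * np.log(2 * np.pi * np.e * lam))
    return -np.sum(elbo)

start = np.concatenate([[0.0, 0.0, 0.0], np.zeros(n), np.zeros(n)])
fit = minimize(neg_lower_bound, start, method="BFGS")
print(fit.x[:3])  # estimates of beta, log sigma^2, log tau^2
```

The objective has \(3 + 2n\) parameters, but each term only involves the shared parameters and one individual's variational parameters, which is the partial separability that psqn exploits.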

The Variational Distribution

For a given variational distribution \(q^{(i)}_{\vec\theta}\), we need to be able to easily evaluate

  • the entropy: \(-E_{q^{(i)}_{\vec\theta}}(\log q^{(i)}_{\vec\theta}(\vec U))\).
  • the first and second moment: \(E_{q^{(i)}_{\vec\theta}}(\vec U)\) and \(E_{q^{(i)}_{\vec\theta}}(\vec U\vec U^\top)\).
  • the moment generating function: \(E_{q^{(i)}_{\vec\theta}}(\exp(\vec t^\top\vec U))\).

Not strong requirements!
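For example, with a multivariate normal \(q^{(i)}_{\vec\theta} = N^{(R)}(\vec\mu, \Lambda)\), writing \(\vec\mu\) and \(\Lambda\) for the variational mean and covariance, all three are available in closed form:

\[ \begin{align*} -E_{q^{(i)}_{\vec\theta}}(\log q^{(i)}_{\vec\theta}(\vec U)) &= \frac{R}{2}\left(1 + \log 2\pi\right) + \frac{1}{2}\log\lvert\Lambda\rvert \\ E_{q^{(i)}_{\vec\theta}}(\vec U) &= \vec\mu, \qquad E_{q^{(i)}_{\vec\theta}}(\vec U\vec U^\top) = \Lambda + \vec\mu\vec\mu^\top \\ E_{q^{(i)}_{\vec\theta}}(\exp(\vec t^\top\vec U)) &= \exp\left(\vec t^\top\vec\mu + \frac{1}{2}\vec t^\top\Lambda\vec t\right). \end{align*} \]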

Implementation

Implemented with a multivariate normal distribution.

Can be extended with e.g. a multivariate t-distribution, a skew normal distribution, or a combination thereof.

This will give a tighter lower bound, but it may be harder to evaluate or to optimize (more saddle points or local maxima).

Implementation

Written in C++ with support for computation in parallel.

Uses an extension of the reverse mode automatic differentiation implementation from Savine (2018).

Similar to the Adept library (Hogan 2014). A gradient implementation will reduce the computation time.
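A rough toy illustration of the idea behind reverse-mode automatic differentiation (a scalar sketch that sums path contributions; an actual implementation like the one extended here records a tape and sweeps it once in reverse):

```python
import math

class Var:
    """A scalar node that stores its value, parents, and local derivatives."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def exp(x):
    value = math.exp(x.value)
    return Var(value, [(x, value)])

def backward(output):
    """Accumulate adjoints by propagating each contribution back to the leaves."""
    stack = [(output, 1.0)]
    while stack:
        node, adjoint = stack.pop()
        node.grad += adjoint
        for parent, local_derivative in node.parents:
            stack.append((parent, adjoint * local_derivative))

# gradient of f(x, y) = x * exp(x * y) at (1, 2): (3 e^2, e^2)
x, y = Var(1.0), Var(2.0)
f = x * exp(x * y)
backward(f)
print(x.grad, y.grad)
```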

Supports left-truncation, partly observed markers, and much more.

Simulation Study

Overview

Make a comparison with the Monte Carlo based joineRML package (Hickey, Philipson, and Jorgensen 2018).

Simulation study to check the bias of the VA.

Comparison with joineRML

Two different types of markers observed at different points in time.

Only the diagonal of \(\Sigma\) can be estimated.

There is one (terminal) time to event outcome for each individual.

We sample 300 individuals.

Only three random effects per individual to keep it simple.

A random intercept and a random slope for the first marker and a random intercept for the second marker.

Mean Curves for the First Marker

Mean Curves for the Second Marker

Pointwise Quantiles of the Hazard

The lines are the 10%, 50%, and 90% pointwise quantiles.

Results

The estimation time was 226.434 seconds with joineRML and 4.228 seconds with the new method

using four threads.

The estimated association parameters, \(\vec\alpha_1\), are shown below.

             Marker 1   Marker 2
truth          -1.000       2.00
joineRML       -0.880       1.92
VA             -0.917       1.91

Results

joineRML:
  (Intercept)_1   0.0185
  time_1          0.0050   0.0130
  (Intercept)_2  -0.0034  -0.0011   0.0198

VA:
  (Intercept)_1   0.0188
  time_1          0.0050   0.0130
  (Intercept)_2  -0.0040  -0.0009   0.0199

The estimated covariance matrix of the random effects, \(\Psi\), is shown above. The true values are shown below.

  (Intercept)_1   0.0219
  time_1          0.0005   0.0120
  (Intercept)_2  -0.0031  -0.0015   0.02

Details

joineRML makes no assumption about the baseline hazard (the \(\exp(\omega^\top\vec b(t))\) factor in the model above).

This is computationally challenging with many individuals.

Sobol sequences are used with joineRML.

Both markers have a standard normally distributed covariate, and the time to event outcome has a \(\text{Unif}(-1, 1)\) covariate.

Observation times are drawn with increments that are exponentially distributed with rate 0.5.
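A possible way to draw such observation times (the rate is from the text; the follow-up window of 10 is only an assumption for the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# gap times are exponential with rate 0.5, i.e. mean 1 / 0.5 = 2
gaps = rng.exponential(scale=1 / 0.5, size=25)
obs_times = np.cumsum(gaps)
obs_times = obs_times[obs_times < 10]   # keep times within the follow-up window
print(obs_times)
```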

Second Simulation Study

Two different types of markers, possibly observed at the same points in time.

One (terminal) time to event outcome for each individual and a model for the observation times.

We sample 1000 individuals.

Four random effects for the markers and two for the frailties, \(\vec\xi_i\), per individual.

Mean Curves for the First Marker

Mean Curves for the Second Marker

Hazard Curves: Terminal event

The lines are the 10%, 50%, and 90% pointwise quantiles.

Hazard Curves: Observation Process

The lines are the 10%, 50%, and 90% pointwise quantiles.

Results

Average computation time was \(111.299 \pm 2.943\) seconds across the 100 sampled data sets.

            Terminal     SE   Observation     SE
Marker 1      -0.133  0.684         0.243  0.345
Marker 2       0.341  0.383        -0.133  0.212

Bias estimates for the association parameters, \(\vec\alpha_1\) and \(\vec\alpha_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.

            Terminal   Observation
Marker 1         0.6          -0.7
Marker 2        -0.3           0.3

Results

Terminal SE Observation SE
-2.295 1.082 0.807 0.469
0.314 0.643

Bias estimates for the fixed effects of the survival outcomes, \(\vec\delta_1\) and \(\vec\delta_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.

Terminal Observation
-1.00 0.2
0.25

Results

  Terminal     SE   Observation    SE
      6.46   2.65         -5.26  1.97
    -12.56   4.70         -3.06  2.02
     11.87   7.03
    -33.21  11.42

Bias estimates for the log baseline hazard of the survival outcomes, \(\vec\omega_1\) and \(\vec\omega_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.

  Terminal   Observation
      0.50         -1.00
      0.10         -0.25
     -0.20
      0.11

Results

Bias:
  -0.788
  -1.043  -1.228
  -0.051   0.222  -0.348
  -0.172  -0.524  -0.087  -0.181

SE:
   0.649
   0.932   1.708
   0.336   0.413   0.166
   0.210   0.336   0.092   0.097

Bias estimates for the covariance matrix of the random effects, \(\Psi\), are shown above multiplied by 100. The last columns show the standard errors. The true values are shown below.

0.35
0.08 1.92
-0.05 -0.24 0.32
0.01 -0.04 0.09 0.12

Results

Bias:
  -0.081
   0.004   0.019

SE:
   0.034
   0.025   0.017

Bias estimates for the covariance matrix of the markers’ error term, \(\Sigma\), are shown above multiplied by 100. The last columns show the standard errors. The true values are shown below.

0.090
0.022 0.04

Results

Bias:
  -1.704
  -0.987  -0.324

SE:
   0.377
   0.300   0.163

Bias estimates for the covariance matrix of the frailties, \(\Xi\), are shown above multiplied by 100. The last columns show the standard errors. The true values are shown below.

0.040
0.022 0.062

Details

\(\vec g_1\) and \(\vec m_1\) are natural cubic splines with boundary knots at \((0, 10)\). \(\vec g_1\) has interior knots at \(\{3.33,6.67\}\). The first marker has a standard normally distributed covariate.

\(\vec g_2\) and \(\vec m_2\) are polynomials of degree two and one, without and with an intercept, respectively. The second marker has no covariates.

The terminal event has a B-spline basis with boundary knots at \((0,10)\) and an interior knot at \(5\). There is a \(\text{Unif}(-1, 1)\) covariate.

The observation process has a natural cubic spline with boundary knots at \((0,10)\) and an interior knot at \(5\). There are no covariates.

Extensions and Conclusions

Conclusions

Introduced variational approximations.

Showed that they can yield fast estimates of joint survival and marker models.

Simulation studies suggest that the bias is low.

Models

Extension to competing events and multi-state models will be interesting.

Time-varying covariate effects and interval censoring are simple extensions mathematically but not implemented.

Interval censoring: the event happens in the interval \((t, v)\), but we do not know when.

Implementation

The software will soon be public on GitHub.

The psqn package needs to be extended to allow for constraints (linear and non-linear)

Useful for approximate likelihood ratio based confidence intervals.

Variational Approximations

Considering other variational approximations will be interesting if bias is an issue in some models.

Recall that the requirements on the variational distribution, \(q^{(i)}_{\vec\theta}\), are not strong!

Can consider clustered individuals. Estimation can be fast with restricted Gaussian variational approximations.

Particularly useful for crossed and nested random effects. E.g. Challis and Barber (2013), Miller, Foti, and Adams (2017), and Ong, Nott, and Smith (2018).

Thank You!

The presentation is at rpubs.com/boennecd/VA-Joint-Models-KTH-21.

The markdown is at github.com/boennecd/Talks.

The psqn package is on CRAN and at github.com/boennecd/psqn.

References are on the next slide.

References

Challis, Edward, and David Barber. 2013. “Gaussian Kullback-Leibler Approximate Inference.” Journal of Machine Learning Research 14 (32): 2239–86.
Christoffersen, Benjamin. 2021. Psqn: Partially Separable Quasi-Newton. https://CRAN.R-project.org/package=psqn.
Gasparini, Alessandro, Keith R. Abrams, Jessica K. Barrett, Rupert W. Major, Michael J. Sweeting, Nigel J. Brunskill, and Michael J. Crowther. 2020. “Mixed-Effects Models for Health Care Longitudinal Data with an Informative Visiting Process: A Monte Carlo Simulation Study.” Statistica Neerlandica 74 (1): 5–23. https://doi.org/10.1111/stan.12188.
Hickey, Graeme L., Pete Philipson, and Andrea Jorgensen. 2018. “joineRML: A Joint Model and Software Package for Time-to-Event and Multivariate Longitudinal Outcomes.” BMC Medical Research Methodology 18 (1). https://doi.org/10.1186/s12874-018-0502-1.
Hogan, Robin J. 2014. “Fast Reverse-Mode Automatic Differentiation Using Expression Templates in C++.” ACM Trans. Math. Softw. 40 (4). https://doi.org/10.1145/2560359.
Miller, Andrew C., Nicholas J. Foti, and Ryan P. Adams. 2017. “Variational Boosting: Iteratively Refining Posterior Approximations.” In Proceedings of the 34th International Conference on Machine Learning, edited by Doina Precup and Yee Whye Teh, 70:2420–29. Proceedings of Machine Learning Research. International Convention Centre, Sydney, Australia: PMLR.
Ong, Victor M.-H., David J. Nott, and Michael S. Smith. 2018. “Gaussian Variational Approximation with a Factor Covariance Structure.” Journal of Computational and Graphical Statistics 27 (3): 465–78. https://doi.org/10.1080/10618600.2017.1390472.
Ormerod, J. T., and M. P. Wand. 2012. “Gaussian Variational Approximate Inference for Generalized Linear Mixed Models.” Journal of Computational and Graphical Statistics 21 (1): 2–17. https://doi.org/10.1198/jcgs.2011.09118.
Savine, Antoine. 2018. Modern Computational Finance: AAD and Parallel Simulations. John Wiley & Sons.