Estimating Joint Survival and Marker Models

dummy slide

Introduction

Example

How is higher than normal breast density and other variables related to breast cancer?

Is the risk related to

long periods of higher than normal density
the current density only, or
large changes over a short period of time?

Can be estimated with a joint model for the density, the marker, and the risk of breast cancer, the survival time.

Goals

The goals is a fast estimation method for general joint marker and survival models.

Start by defining the model.

Markers

Individuals’ markers are observed at different points in time.

We assume that

Each individual has a mean curve that may differ from the population mean curve across time.
Observations are noisy.

Example: Population Curve

The population mean curve with pointwise quantiles for the individuals’ mean curves.

Example: Population Curve

The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals.

Example: Population Curve

The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals and their observed values.

Formally

\[ \begin{align*} Y_{ij} &= \mu_i(s_{ij}, \vec U_i) + \epsilon_{ij} \\ \epsilon_{ij} &\sim N(0, \sigma^2) \\ \mu_{i}(s, \vec U_i) &= \vec x_i^\top\vec\gamma + \vec g(s)^\top\vec\beta + \vec m(s)^\top\vec U_i \\ \vec U_i &\sim N^{(R)}(\vec0, \Psi) \end{align*} \]

\(\vec x_i^\top\vec\gamma\): fixed effects (offsets).

E.g. sex differences.

\(\vec g(s)^\top\vec\beta\): the population mean curve.

\(\vec m(s)^\top\vec U_i\): difference for individual \(i\) to the population curve.

Time-to-event Outcomes

We have time-to-event outcomes.

Markers’ deviation from the population may be linked to higher or lower risk of the event for each individual.

Different ways the two can be linked.

Relation to Markers: Current Value

A higher level at any given point may lead to higher/lower risk.

Relation to Markers: Derivative

A stepper change at any given point may lead to higher/lower risk.

Relation to Markers: Cumulative

A long period of higher values may lead to higher/lower risk at any given point.

Survival Model

\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\left( \vec z_i^\top\vec \delta + \omega^\top \vec b(t) + \alpha \vec m(t)^\top\vec U_i + \xi_i \right) \\ \xi_i &\sim N(0, \Xi) \end{align*} \]

\(\vec z_i^\top\vec \delta\): fixed effects (offsets).

E.g. sex differences.

\(\vec\omega^\top \vec b(t)\): the log baseline hazard function.

\(\alpha\vec m(t)^\top\vec U_i\): individual specific deviation.

Shared with the marker!

\(\xi_i\): individual specific deviation.

Not shared with the marker.

Informative Observation Process

Markers may dependent on the likelihood of an observation.

People with a worse health status may tend to have worse marker values and have more measurements.

Such dependencies can be handled with joint models (Gasparini et al. 2020).

More Markers

We may \(L\) different markers rather than one

\[ \begin{align*} \vec Y_{ij}&= \vec\mu_i(s_{ij}, \vec U_i) + \vec\epsilon_{ij} \\ \epsilon_{ij} &\sim N^{(L)}(\vec 0, \Sigma) \\ \mu_{i1}(s, \vec U_i) &= \vec x_{i1}^\top\vec\gamma_1 + \vec g_1(s)^\top\vec\beta_1 + \vec m_1(s)^\top\vec U_{i1} \\ \vdots &\hphantom{=}\vdots\\\ \mu_{iL}(s, \vec U_i) &= \vec x_{iL}^\top\vec\gamma_L + \vec g_L(s)^\top\vec\beta_L + \vec m_L(s)^\top\vec U_{iL} \\ \vec U_i &= \begin{pmatrix} \vec U_{i1} \\ \vdots \\ \vec U_{iL} \end{pmatrix}\sim N^{(R)}(\vec0, \Psi). \end{align*} \]

More Survival Types

\[\begin{align*} G(s) &= \begin{pmatrix} \vec g_1(s)^\top & 0^\top & \cdots & \vec 0^\top \\ \vec 0^\top & \vec g_2(s)^\top & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vec 0^\top \\ \vec 0^\top & \cdots & \vec 0^\top & \vec g_L(s)^\top \end{pmatrix} \\ \vec\gamma &=(\vec \gamma_1^\top,\dots,\vec\gamma_L^\top)^\top, \end{align*}\]

and define \(M(s)\), \(X_i\), and \(\vec\beta\) similarly. Then

\[\vec\mu_i(s_{ij}, \vec U_i) = X_i\vec\gamma + G(s_{ij})\vec\beta + M(s_{ij})\vec U_i.\]

More Survival Types

We may have \(H\) different survival types rather than one

\[ \begin{align*} h_{i1}(t\mid \vec U_i, \vec\xi_i) &= \exp\left( \vec z_{i1}^\top\vec \delta_1 + \omega_1^\top \vec b_1(t) + \vec\alpha_1^\top M(t)\vec U_i + \xi_{i1} \right) \\ \vdots &\hphantom{=}\vdots \\ h_{iH}(t\mid \vec U_i,\vec \xi_i) &= \exp\left( \vec z_{iL}^\top\vec \delta_H + \omega_H^\top\vec b_H(t) + \vec\alpha_H^\top M(t)\vec U_i + \xi_{iH} \right) \\ \vec\xi_i = \begin{pmatrix}\xi_{i1} \\ \vdots \\ \xi_{iH}\end{pmatrix} &\sim N^{(H)}(\vec 0, \Xi) \end{align*} \]

Alternative Parameterization

Simplify the notation to

\[ \mu_{i}(s, \vec U_i) = \vec x_i(s)^\top\vec\gamma + \vec m(s)^\top\vec U_i \]

Could let the hazard depend on \(\mu_{i}(s, \vec U_i)\) rather than \(\vec m(s)^\top\vec U_i\).

Alternative Parameterization (Cont.)

Suppose \(\vec z_i(t) = (\widetilde{\vec z}_i(t)^\top, \vec x_i(t)^\top)^\top\). Then

\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\big( \vec z_i(t)^\top\vec \delta + \alpha \vec m(t)^\top\vec U_i + \xi_i \big) \\ &= \exp\big( \widetilde{\vec z}_i(t)^\top\vec \delta_1 + \vec x_i(t)^\top\vec (\vec\delta_2 - \alpha\vec\gamma) \\ &\hspace{40pt} + \alpha \mu_{i}(t, \vec U_i) + \xi_i \big) \end{align*} \]

\(\alpha\) has the same interpretation unless \(\vec x_i(t)\) is not a linear combination of \(\vec z_i(t)\).

Alternative Parameterization (Cont.)

The latter may be more efficient but assumes that the effect of the covariates is \(\alpha\vec\gamma\)

if \(\vec x_i(t)\) is not a linear combination of \(\vec z_i(t)\).

Typically, we pick a very flexible baseline hazard so only leaving out a covariate will matter

unless we include time-varying covariate effects.

Likelihood and Approximations

The Likelihood

The likelihood is intractable.

Need an approximation.

Let \(\vec\zeta\) be the model parameters and \(l_i(\vec\zeta)\) be the log likelihood term of individual \(i\).

Variational Approximations

Given a distribution with density \(q^{(i)}_{\vec\theta}\) for \(\theta\in \Theta_i\) and outcomes \(\vec y_i\), the particular variational approximation we use is the lower bound

\[ l_i(\vec\zeta) \geq \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \]

Typically, much faster to evaluate.

Variational Approximations (Cont.)

\[ \begin{align*} l_i(\vec\zeta) &= \log p_i(\vec y_i) = \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)\big/ q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)\big/ q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U\\ &= \int q^{(i)}_{\vec\theta}(\vec U) \Bigg( \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U + \underbrace{\log \left(\frac{q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)}\right)}_{\text{KL divergence}} \Bigg)d \vec U \\ &\geq \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \end{align*} \]

Equality if \(p_i(\vec U\mid\vec y_i) = q^{(i)}_{\vec\theta}(\vec U)\).

Approximate Maximum Likelihood

\[ \text{argmax}_{\vec\zeta} \sum_{i = 1}^n l_i(\vec\zeta) \approx \text{argmax}_{\vec\zeta} \max_{\vec\theta_1,\dots,\vec\theta_n} \sum_{i = 1}^n\tilde l_i(\vec\zeta, \vec\theta_i) \]

The latter problem has many parameters but the problem is partially separable.

Can develop an efficient Newton-like method (Ormerod and Wand 2012) or use the psqn package (Christoffersen 2021)

as the problem is partially separable.

Implementation

Implemented with a multivariate normal distribution.

Written in C++ with support for computation in parallel.

Uses an extension of the reverse mode automatic differentiation implementation from Savine (2018).

Similar to the Adept library (Hogan 2014). A gradient implementation will reduce the computation time.

Supports left-truncation, partly observed markers,

cumulative, current value, and derivatives in the hazards, mixtures thereof, and more.

Simulation Study

Overview

Make a comparison with the Monte Carlo based joineRML package (Hickey, Philipson, and Jorgensen 2018).

Simulation study to check the bias of the VA.

Comparison with joineRML

Two different types of markers observed at different points in time.

Can only estimate the diagonal of \(\Sigma\).

One terminal time-to-event outcome for each individual.

We sample 300 individuals.

Only three random effects per individual to keep it simple.

A random intercept and a random slope for the first marker and a random intercept for the second marker.

Mean Curves for the First Marker

Mean Curves for the Second Marker

Pointwise Quantiles of the Hazard

The lines are the 10%, 50%, and 90% pointwise quantiles.

Results

The estimation time was 227.464 with joineRML and 2.285 with the new method

using four threads.

The estimated associations parameters, \(\vec\alpha_1\), are shown below.

	Marker 1	Marker 2
truth	-1.000	2.00
joineRML	-0.880	1.92
VA	-0.912	1.91

Results

	joineRML			VA
(Intercept)_1	0.0185			0.0188
time_1	0.0050	0.0130		0.0050	0.0130
(Intercept)_2	-0.0034	-0.0011	0.0198	-0.0040	-0.0009	0.0199

The estimated covariance matrix of the random effects, \(\Psi\), is shown above. The true values are shown below.

True values
0.0219
0.0005	0.0120
-0.0031	-0.0015	0.02

Details

joineRML makes no assumption about the baseline hazard, \(\exp(\vec g_1(s)^\top\vec\beta_1)\).

This is computationally challenging with many individuals.

Sobol sequences are used with joineRML.

Both markers have a standard normal distributed covariate and the time-to-event outcome has \(\text{Unif}(-1, 1)\) covariate.

Observation times are drawn with increments that are exponentially distributed with rate 0.5.

Second Simulation Study

Two different types of markers

observed possible on the same points in time.

A terminal time-to-event outcome.

An informative observation process.

We sample 1000 individuals.

Four random effects for the markers and two for the frailties, \(\vec\xi_i\), per individual.

Mean Curves for the First Marker

Mean Curves for the Second Marker

Hazard Curves: Terminal event

The lines are the 10%, 50%, and 90% pointwise quantiles.

Hazard Curves: Observation Process

The lines are the 10%, 50%, and 90% pointwise quantiles.

Results

Average computation time was \(48.619 \pm 1.866\) seconds. 100 data sets are sampled.

	Terminal	SE	Observation	SE
Marker 1	0.000919	0.00688	0.00418	0.00344
Marker 2	0.002691	0.00383	-0.00302	0.00209

Bias estimates for the association parameters, \(\vec\alpha_1\) and \(\vec\alpha_2\), are shown above. The SE columns are the standard errors. The true values are shown below.

	Terminal	Observation
Marker 1	0.6	-0.7
Marker 2	-0.3	0.3

Approximate Likelihood Ratio

Approximate log profile likelihood for one association parameter with one data set. Dotted lines: 95% confidence interval limits (0.445, 0.695). Dashed line: true value.

Approximate Likelihood Ratio (cont.)

Coverage of an approximate 95% likelihood ratio based confidence interval is 96%.

based on 100 data sets.

The average computation time was 127.147 seconds.

Computationally efficient approximation Wald intervals can be implemented.

Results

	Terminal	SE	Observation	SE
	-0.02571	0.01089	0.00918	0.00467
	0.00329	0.00641

Bias estimates for the fixed effects of the survival outcomes, \(\vec\delta_1\) and \(\vec\delta_2\), are shown above. The SE columns are the standard errors. The true values are shown below.

	Terminal	Observation
	-1.00	0.2
	0.25

Results

Terminal	SE	Observation	SE
0.0727	0.0269	-0.0133	0.0195
-0.1317	0.0470	-0.0133	0.0202
0.1436	0.0693
-0.3306	0.1097

Bias estimates for the log baseline hazard of the survival outcomes, \(\vec\omega_1\) and \(\vec\omega_2\), are shown above. The SE columns are the standard errors. The true values are shown below.

	Terminal	Observation
	0.50	-1.00
	0.10	-0.25
	-0.20
	0.11

Results

Bias				SE
-0.007704				0.00647
-0.009586	-0.01022			0.00928	0.01706
-0.000668	0.00220	-0.003466		0.00336	0.00414	0.001657
-0.001955	-0.00528	-0.000821	-0.00172	0.00209	0.00336	0.000924	0.000972

Bias estimates for the covariance matrix of the random effects, \(\Psi\), are shown above. The last columns show the standard errors. The true values are shown below.

True values
0.35
0.08	1.92
-0.05	-0.24	0.32
0.01	-0.04	0.09	0.12

Results

	Bias			SE
	-0.000799			0.000340
	0.000041	0.000192		0.000247	0.000167

Bias estimates for the covariance matrix of the markers’ error term, \(\Sigma\), are shown above. The last columns show the standard errors. The true values are shown below.

	True values
	0.090
	0.022	0.04

Results

	Bias			SE
	-0.01559			0.00343
	0.00348	-0.00363		0.00219	0.00162

Bias estimates for the covariance matrix of the frailties, \(\Xi\), are shown above. The last columns show the standard errors. The rue values are shown below.

	True values
	0.040
	0.022	0.062

Details

\(\vec g_1\) and \(\vec m_1\) are natural cubic splines with boundary knots at \((0, 10)\). \(\vec g_1\) has interior knots at \(\{3.33,6.67\}\). The first marker has a standard normally distrusted covariate.

\(\vec g_2\) and \(\vec m_2\) are polynomials without and with an intercept and degree 2 and 1. The second marker has no covariates.

The terminal event has a B-spline basis with boundary knots at \((0,10)\) and an interior knot at \(5\). There is a \(\text{Unif}(-1, 1)\) covariate.

The observation process has a natural cubic spline with boundary knots at \((0,10)\) and an interior knot at \(5\). There is no covariates.

Extensions and Conclusions

Conclusions

Introduced variational approximations.

Showed that they can yield fast estimates of joint survival and marker models.

Simulation studies suggest that the bias is low.

Models

Extension to competing events and multi-state models.

Time-varying covariate effects and interval censoring are simple extensions mathematically but not implemented.

Variational Approximations

The lower bound requires

the entropy: \(-E_{q^{(i)}_{\vec\theta}}(\log q^{(i)}_{\vec\theta}(\vec U))\).
the first two moments: \(E_{q^{(i)}_{\vec\theta}}(\vec U)\) and \(E_{q^{(i)}_{\vec\theta}}(\vec U\vec U^\top)\).
the moment generating function: \(E_{q^{(i)}_{\vec\theta}}(\exp(\vec t^\top\vec U))\).

Other distributions also have reasonably easy expressions of these quantities.

Extension will give tighter lower bounds but may take longer to evaluate or be harder to optimize.

Thank You!

The presentation is at rpubs.com/boennecd/MEB-VAJointSurv.

The markdown is at github.com/boennecd/Talks.

The implementation is at github.com/boennecd/VAJointSurv.

The psqn package is on CRAN and at github.com/boennecd/psqn.

References are on the next slide.

References

Christoffersen, Benjamin. 2021. Psqn: Partially Separable Quasi-Newton. https://CRAN.R-project.org/package=psqn.

Gasparini, Alessandro, Keith R. Abrams, Jessica K. Barrett, Rupert W. Major, Michael J. Sweeting, Nigel J. Brunskill, and Michael J. Crowther. 2020. “Mixed-Effects Models for Health Care Longitudinal Data with an Informative Visiting Process: A Monte Carlo Simulation Study.” Statistica Neerlandica 74 (1): 5–23. https://doi.org/https://doi.org/10.1111/stan.12188.

Hickey, Graeme L., Pete Philipson, and Andrea Jorgensen. 2018. “joineRML: A Joint Model and Software Package for Time-to-Event and Multivariate Longitudinal Outcomes.” BMC Medical Research Methodology 18 (1). https://doi.org/10.1186/s12874-018-0502-1.

Hogan, Robin J. 2014. “Fast Reverse-Mode Automatic Differentiation Using Expression Templates in c++.” ACM Trans. Math. Softw. 40 (4). https://doi.org/10.1145/2560359.

Ormerod, J. T., and M. P. Wand. 2012. “Gaussian Variational Approximate Inference for Generalized Linear Mixed Models.” Journal of Computational and Graphical Statistics 21 (1): 2–17. https://doi.org/10.1198/jcgs.2011.09118.

Savine, Antoine. 2018. Modern Computational Finance: AAD and Parallel Simulations. John Wiley & Sons.