\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]
How is higher than normal breast density and other variables related to breast cancer?
Is the risk related to
Can be estimated with a joint model for the density, the marker, and the risk of breast cancer, the survival time.
The goals is a fast estimation method for general joint marker and survival models.
Start by defining the model.
Individuals’ markers are observed at different points in time.
We assume that
The population mean curve with pointwise quantiles for the individuals’ mean curves.
The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals.
The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals and their observed values.
\[ \begin{align*} Y_{ij} &= \mu_i(s_{ij}, \vec U_i) + \epsilon_{ij} \\ \epsilon_{ij} &\sim N(0, \sigma^2) \\ \mu_{i}(s, \vec U_i) &= \vec x_i^\top\vec\gamma + \vec g(s)^\top\vec\beta + \vec m(s)^\top\vec U_i \\ \vec U_i &\sim N^{(R)}(\vec0, \Psi) \end{align*} \]
E.g. sex differences.
\(\vec g(s)^\top\vec\beta\): the population mean curve.
\(\vec m(s)^\top\vec U_i\): difference for individual \(i\) to the population curve.
We have time-to-event outcomes.
Markers’ deviation from the population may be linked to higher or lower risk of the event for each individual.
Different ways the two can be linked.
A higher level at any given point may lead to higher/lower risk.
A stepper change at any given point may lead to higher/lower risk.
A long period of higher values may lead to higher/lower risk at any given point.
\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\left( \vec z_i^\top\vec \delta + \omega^\top \vec b(t) + \alpha \vec m(t)^\top\vec U_i + \xi_i \right) \\ \xi_i &\sim N(0, \Xi) \end{align*} \]
E.g. sex differences.
\(\vec\omega^\top \vec b(t)\): the log baseline hazard function.
Shared with the marker!
Not shared with the marker.
People with a worse health status may tend to have worse marker values and have more measurements.
Such dependencies can be handled with joint models (Gasparini et al. 2020).
We may \(L\) different markers rather than one
\[ \begin{align*} \vec Y_{ij}&= \vec\mu_i(s_{ij}, \vec U_i) + \vec\epsilon_{ij} \\ \epsilon_{ij} &\sim N^{(L)}(\vec 0, \Sigma) \\ \mu_{i1}(s, \vec U_i) &= \vec x_{i1}^\top\vec\gamma_1 + \vec g_1(s)^\top\vec\beta_1 + \vec m_1(s)^\top\vec U_{i1} \\ \vdots &\hphantom{=}\vdots\\\ \mu_{iL}(s, \vec U_i) &= \vec x_{iL}^\top\vec\gamma_L + \vec g_L(s)^\top\vec\beta_L + \vec m_L(s)^\top\vec U_{iL} \\ \vec U_i &= \begin{pmatrix} \vec U_{i1} \\ \vdots \\ \vec U_{iL} \end{pmatrix}\sim N^{(R)}(\vec0, \Psi). \end{align*} \]
\[\begin{align*} G(s) &= \begin{pmatrix} \vec g_1(s)^\top & 0^\top & \cdots & \vec 0^\top \\ \vec 0^\top & \vec g_2(s)^\top & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vec 0^\top \\ \vec 0^\top & \cdots & \vec 0^\top & \vec g_L(s)^\top \end{pmatrix} \\ \vec\gamma &=(\vec \gamma_1^\top,\dots,\vec\gamma_L^\top)^\top, \end{align*}\]
and define \(M(s)\), \(X_i\), and \(\vec\beta\) similarly. Then
\[\vec\mu_i(s_{ij}, \vec U_i) = X_i\vec\gamma + G(s_{ij})\vec\beta + M(s_{ij})\vec U_i.\]
We may have \(H\) different survival types rather than one
\[ \begin{align*} h_{i1}(t\mid \vec U_i, \vec\xi_i) &= \exp\left( \vec z_{i1}^\top\vec \delta_1 + \omega_1^\top \vec b_1(t) + \vec\alpha_1^\top M(t)\vec U_i + \xi_{i1} \right) \\ \vdots &\hphantom{=}\vdots \\ h_{iH}(t\mid \vec U_i,\vec \xi_i) &= \exp\left( \vec z_{iL}^\top\vec \delta_H + \omega_H^\top\vec b_H(t) + \vec\alpha_H^\top M(t)\vec U_i + \xi_{iH} \right) \\ \vec\xi_i = \begin{pmatrix}\xi_{i1} \\ \vdots \\ \xi_{iH}\end{pmatrix} &\sim N^{(H)}(\vec 0, \Xi) \end{align*} \]
Simplify the notation to
\[ \mu_{i}(s, \vec U_i) = \vec x_i(s)^\top\vec\gamma + \vec m(s)^\top\vec U_i \]
Could let the hazard depend on \(\mu_{i}(s, \vec U_i)\) rather than \(\vec m(s)^\top\vec U_i\).
Suppose \(\vec z_i(t) = (\widetilde{\vec z}_i(t)^\top, \vec x_i(t)^\top)^\top\). Then
\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\big( \vec z_i(t)^\top\vec \delta + \alpha \vec m(t)^\top\vec U_i + \xi_i \big) \\ &= \exp\big( \widetilde{\vec z}_i(t)^\top\vec \delta_1 + \vec x_i(t)^\top\vec (\vec\delta_2 - \alpha\vec\gamma) \\ &\hspace{40pt} + \alpha \mu_{i}(t, \vec U_i) + \xi_i \big) \end{align*} \]
\(\alpha\) has the same interpretation unless \(\vec x_i(t)\) is not a linear combination of \(\vec z_i(t)\).
if \(\vec x_i(t)\) is not a linear combination of \(\vec z_i(t)\).
unless we include time-varying covariate effects.
Need an approximation.
Let \(\vec\zeta\) be the model parameters and \(l_i(\vec\zeta)\) be the log likelihood term of individual \(i\).
Given a distribution with density \(q^{(i)}_{\vec\theta}\) for \(\theta\in \Theta_i\) and outcomes \(\vec y_i\), the particular variational approximation we use is the lower bound
\[ l_i(\vec\zeta) \geq \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \]
Typically, much faster to evaluate.
\[ \begin{align*} l_i(\vec\zeta) &= \log p_i(\vec y_i) = \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)\big/ q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)\big/ q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U\\ &= \int q^{(i)}_{\vec\theta}(\vec U) \Bigg( \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U + \underbrace{\log \left(\frac{q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)}\right)}_{\text{KL divergence}} \Bigg)d \vec U \\ &\geq \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \end{align*} \]
Equality if \(p_i(\vec U\mid\vec y_i) = q^{(i)}_{\vec\theta}(\vec U)\).
\[ \text{argmax}_{\vec\zeta} \sum_{i = 1}^n l_i(\vec\zeta) \approx \text{argmax}_{\vec\zeta} \max_{\vec\theta_1,\dots,\vec\theta_n} \sum_{i = 1}^n\tilde l_i(\vec\zeta, \vec\theta_i) \]
The latter problem has many parameters but the problem is partially separable.
as the problem is partially separable.
Implemented with a multivariate normal distribution.
Written in C++ with support for computation in parallel.
Similar to the Adept library (Hogan 2014). A gradient implementation will reduce the computation time.
cumulative, current value, and derivatives in the hazards, mixtures thereof, and more.
Make a comparison with the Monte Carlo based joineRML package (Hickey, Philipson, and Jorgensen 2018).
Simulation study to check the bias of the VA.
Can only estimate the diagonal of \(\Sigma\).
One terminal time-to-event outcome for each individual.
We sample 300 individuals.
A random intercept and a random slope for the first marker and a random intercept for the second marker.
The lines are the 10%, 50%, and 90% pointwise quantiles.
using four threads.
The estimated associations parameters, \(\vec\alpha_1\), are shown below.
Marker 1 | Marker 2 | |
---|---|---|
truth | -1.000 | 2.00 |
joineRML | -0.880 | 1.92 |
VA | -0.912 | 1.91 |
joineRML | VA | ||||||
---|---|---|---|---|---|---|---|
(Intercept)_1 | 0.0185 | 0.0188 | |||||
time_1 | 0.0050 | 0.0130 | 0.0050 | 0.0130 | |||
(Intercept)_2 | -0.0034 | -0.0011 | 0.0198 | -0.0040 | -0.0009 | 0.0199 |
The estimated covariance matrix of the random effects, \(\Psi\), is shown above. The true values are shown below.
True values | |||
---|---|---|---|
0.0219 | |||
0.0005 | 0.0120 | ||
-0.0031 | -0.0015 | 0.02 |
This is computationally challenging with many individuals.
Sobol sequences are used with joineRML.
Both markers have a standard normal distributed covariate and the time-to-event outcome has \(\text{Unif}(-1, 1)\) covariate.
Observation times are drawn with increments that are exponentially distributed with rate 0.5.
observed possible on the same points in time.
A terminal time-to-event outcome.
An informative observation process.
We sample 1000 individuals.
Four random effects for the markers and two for the frailties, \(\vec\xi_i\), per individual.
The lines are the 10%, 50%, and 90% pointwise quantiles.
The lines are the 10%, 50%, and 90% pointwise quantiles.
Average computation time was \(48.619 \pm 1.866\) seconds. 100 data sets are sampled.
Terminal | SE | Observation | SE | |
---|---|---|---|---|
Marker 1 | 0.000919 | 0.00688 | 0.00418 | 0.00344 |
Marker 2 | 0.002691 | 0.00383 | -0.00302 | 0.00209 |
Bias estimates for the association parameters, \(\vec\alpha_1\) and \(\vec\alpha_2\), are shown above. The SE columns are the standard errors. The true values are shown below.
Terminal | Observation | |
---|---|---|
Marker 1 | 0.6 | -0.7 |
Marker 2 | -0.3 | 0.3 |
Approximate log profile likelihood for one association parameter with one data set. Dotted lines: 95% confidence interval limits (0.445, 0.695). Dashed line: true value.
based on 100 data sets.
The average computation time was 127.147 seconds.
Computationally efficient approximation Wald intervals can be implemented.
Terminal | SE | Observation | SE | |
---|---|---|---|---|
-0.02571 | 0.01089 | 0.00918 | 0.00467 | |
0.00329 | 0.00641 |
Bias estimates for the fixed effects of the survival outcomes, \(\vec\delta_1\) and \(\vec\delta_2\), are shown above. The SE columns are the standard errors. The true values are shown below.
Terminal | Observation | |
---|---|---|
-1.00 | 0.2 | |
0.25 |
Terminal | SE | Observation | SE | |
---|---|---|---|---|
0.0727 | 0.0269 | -0.0133 | 0.0195 | |
-0.1317 | 0.0470 | -0.0133 | 0.0202 | |
0.1436 | 0.0693 | |||
-0.3306 | 0.1097 |
Bias estimates for the log baseline hazard of the survival outcomes, \(\vec\omega_1\) and \(\vec\omega_2\), are shown above. The SE columns are the standard errors. The true values are shown below.
Terminal | Observation | |
---|---|---|
0.50 | -1.00 | |
0.10 | -0.25 | |
-0.20 | ||
0.11 |
Bias | SE | ||||||||
---|---|---|---|---|---|---|---|---|---|
-0.007704 | 0.00647 | ||||||||
-0.009586 | -0.01022 | 0.00928 | 0.01706 | ||||||
-0.000668 | 0.00220 | -0.003466 | 0.00336 | 0.00414 | 0.001657 | ||||
-0.001955 | -0.00528 | -0.000821 | -0.00172 | 0.00209 | 0.00336 | 0.000924 | 0.000972 |
Bias estimates for the covariance matrix of the random effects, \(\Psi\), are shown above. The last columns show the standard errors. The true values are shown below.
True values | ||||
---|---|---|---|---|
0.35 | ||||
0.08 | 1.92 | |||
-0.05 | -0.24 | 0.32 | ||
0.01 | -0.04 | 0.09 | 0.12 |
Bias | SE | ||||
---|---|---|---|---|---|
-0.000799 | 0.000340 | ||||
0.000041 | 0.000192 | 0.000247 | 0.000167 |
Bias estimates for the covariance matrix of the markers’ error term, \(\Sigma\), are shown above. The last columns show the standard errors. The true values are shown below.
True values | ||
---|---|---|
0.090 | ||
0.022 | 0.04 |
Bias | SE | ||||
---|---|---|---|---|---|
-0.01559 | 0.00343 | ||||
0.00348 | -0.00363 | 0.00219 | 0.00162 |
Bias estimates for the covariance matrix of the frailties, \(\Xi\), are shown above. The last columns show the standard errors. The rue values are shown below.
True values | ||
---|---|---|
0.040 | ||
0.022 | 0.062 |
\(\vec g_1\) and \(\vec m_1\) are natural cubic splines with boundary knots at \((0, 10)\). \(\vec g_1\) has interior knots at \(\{3.33,6.67\}\). The first marker has a standard normally distrusted covariate.
\(\vec g_2\) and \(\vec m_2\) are polynomials without and with an intercept and degree 2 and 1. The second marker has no covariates.
The terminal event has a B-spline basis with boundary knots at \((0,10)\) and an interior knot at \(5\). There is a \(\text{Unif}(-1, 1)\) covariate.
The observation process has a natural cubic spline with boundary knots at \((0,10)\) and an interior knot at \(5\). There is no covariates.
Introduced variational approximations.
Showed that they can yield fast estimates of joint survival and marker models.
Simulation studies suggest that the bias is low.
Extension to competing events and multi-state models.
Time-varying covariate effects and interval censoring are simple extensions mathematically but not implemented.
The lower bound requires
Extension will give tighter lower bounds but may take longer to evaluate or be harder to optimize.
The presentation is at rpubs.com/boennecd/MEB-VAJointSurv.
The markdown is at github.com/boennecd/Talks.
The implementation is at github.com/boennecd/VAJointSurv.
The psqn package is on CRAN and at github.com/boennecd/psqn.
References are on the next slide.