\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]
How is higher than normal breast density related to breast cancer?
Is the risk related to
We need a model for the density, the marker, and the risk of breast cancer, the survival time.
E.g. breast density.
We assume that markers
Observations for one individual.
Observations for two individuals.
Observations for two individuals with their mean curves.
The population mean curve with pointwise quantiles for the individuals’ mean curves.
The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals.
The population mean curve with pointwise quantiles for the individuals’ mean curves and with curves for two individuals and their observed values.
\[ \begin{align*} Y_{ij} &= \mu_i(s_{ij}, \vec U_i) + \epsilon_{ij} \\ \epsilon_{ij} &\sim N(0, \sigma^2) \\ \mu_{i}(s, \vec U_i) &= \vec x_i^\top\vec\gamma + \vec g(s)^\top\vec\beta + \vec m(s)^\top\vec U_i \\ \vec U_i &\sim N^{(R)}(\vec0, \Psi) \end{align*} \]
E.g. sex differences.
\(\vec g(s)^\top\vec\beta\): the population mean curve.
\(\vec m(s)^\top\vec U_i\): difference for individual \(i\) to the population curve.
E.g. diagnosis of disease, onset of a disease, and death from a disease.
The deviation from the markers from the population may be linked to higher or lower risk of the event for the individuals than the population.
Different ways the two can be linked.
A higher level at any given point may lead to higher/lower risk.
A stepper change at any given point may lead to higher/lower risk.
A long period of higher values may lead to higher/lower risk at any given point.
\(T_i\) is the event time of individual \(i\). The hazard is defined as
\[ h_i(t) = \lim_{dt\rightarrow0}\frac{P(t\leq T_i + dt\mid T_i \geq t)}{dt} \]
Given the hazard, \(h_i\), we can compute the density of \(T_i = t\) and probabilities for events like \(T_i > t\) and \(T_i\in (t, v)\).
\[ \begin{align*} h_i(t\mid \vec U_i, \xi_i) &= \exp\left( \vec z_i^\top\vec \delta + \omega^\top \vec b(t) + \alpha \vec m(t)^\top\vec U_i + \xi_i \right) \\ \xi_i &\sim N(0, \Xi) \end{align*} \]
E.g. sex differences.
\(\omega^\top \vec b(t)\): the population log hazard function.
Shared with the marker!
Not shared with the marker.
Can replace \(\alpha\vec m(t)^\top\vec U_i\) with
\[\alpha\left(\frac{\partial\vec m(t)}{\partial t}\right)^\top\vec U_i\]
or
\[\alpha\left(\int_0^t \vec m(s) ds\right)^\top\vec U_i.\]
… we can also add them.
People with a worse health status may tend to have worse marker values and have more measurements.
Such dependencies can be handled with joint models (Gasparini et al. 2020).
We may \(L\) different markers rather than one
\[ \begin{align*} \vec Y_{ij}&= \vec\mu_i(s_{ij}, \vec U_i) + \vec\epsilon_{ij} \\ \epsilon_{ij} &\sim N^{(L)}(\vec 0, \Sigma) \\ \mu_{i1}(s, \vec U_i) &= \vec x_{i1}^\top\vec\gamma_1 + \vec g_1(s)^\top\vec\beta_1 + \vec m_1(s)^\top\vec U_{i1} \\ \vdots &\hphantom{=}\vdots\\\ \mu_{iL}(s, \vec U_i) &= \vec x_{iL}^\top\vec\gamma_L + \vec g_L(s)^\top\vec\beta_L + \vec m_L(s)^\top\vec U_{iL} \\ \vec U_i &= \begin{pmatrix} \vec U_{i1} \\ \vdots \\ \vec U_{iL} \end{pmatrix}\sim N^{(R)}(\vec0, \Psi). \end{align*} \]
\[G(s) = \begin{pmatrix} \vec g_1(s)^\top & 0^\top & \cdots & \vec 0^\top \\ \vec 0^\top & \vec g_2(s)^\top & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vec 0^\top \\ \vec 0^\top & \cdots & \vec 0^\top & \vec g_L(s)^\top \end{pmatrix}\]
and \(M(s)\) and \(X_i\) similarly. Then
\[\vec\mu_i(s_{ij}, \vec U_i) = X_i\vec\gamma + G(s_{ij})\vec\beta + M(s_{ij})\vec U_i.\]
We may have \(H\) different survival types rather than one
\[ \begin{align*} h_{i1}(t\mid \vec U_i, \vec\xi_i) &= \exp\left( \vec z_{i1}^\top\vec \delta_1 + \omega_1^\top \vec b_1(t) + \vec\alpha_1^\top M(t)\vec U_i + \xi_{i1} \right) \\ \vdots &\hphantom{=}\vdots \\ h_{iH}(t\mid \vec U_i,\vec \xi_i) &= \exp\left( \vec z_{iL}^\top\vec \delta_H + \omega_H^\top\vec b_H(t) + \vec\alpha_H^\top M(t)\vec U_i + \xi_{iH} \right) \\ \vec\xi_i = \begin{pmatrix}\xi_{i1} \\ \vdots \\ \xi_{iH}\end{pmatrix} &\sim N^{(H)}(\vec 0, \Xi) \end{align*} \]
The likelihood is intractable. Let \(\vec\zeta\) be the model parameters and \(l_i(\vec\zeta)\) be the log likelihood term of individual \(i\).
\(l_i(\vec\zeta)\) is of the form
\[ \exp(l_i(\vec\zeta)) = E\left( g_i(\vec U_i)h_{ij}(t\mid \vec U_i, \vec\xi_i) \exp\left(-\int_0^t h_{ij}(s\mid \vec U_i, \vec\xi_i)ds\right)\right) \]
with implicit depends on \(\vec\zeta\).
Given distribution \(q^{(i)}_{\vec\theta}\) for \(\theta\in \Theta_i\) and outcomes \(\vec y_i\)
\[ \begin{align*} l_i(\vec\zeta) &= \log p_i(\vec y_i) = \log \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)\big/ q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)\big/ q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U\\ &= \log \int q^{(i)}_{\vec\theta}(\vec U) \left( \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U + \log \left(\frac{q^{(i)}_{\vec\theta}(\vec U)} {p_i(\vec U\mid\vec y_i)}\right) \right)d \vec U \\ &\geq \log \int q^{(i)}_{\vec\theta}(\vec U) \log \left(\frac{p_i(\vec y_i, \vec U)} {q^{(i)}_{\vec\theta}(\vec U)}\right) d \vec U = \tilde l_i(\vec\zeta, \vec\theta) \end{align*} \]
Equality if \(p_i(\vec U\mid\vec y_i) = q^{(i)}_{\vec\theta}(\vec U)\).
We assume that
\[ \text{argmax}_{\vec\zeta} \sum_{i = 1}^n l_i(\vec\zeta) \approx \text{argmax}_{\vec\zeta} \max_{\vec\theta_1,\dots,\vec\theta_n} \sum_{i = 1}^n\tilde l_i(\vec\zeta, \vec\theta_i) \]
The latter problem has many parameters but the problem is partially separable.
as the problem is partially separable.
For a given variational distribution \(q^{(i)}_{\vec\theta}\), we need to be able to easily evaluate
Not strong requirements!
Implemented with a multivariate normal distribution.
Will give a tighter lower bound, but may be harder to evaluate or harder to optimize (more saddle points or local maximums).
Written in C++ with support for computation in parallel.
Similar to the Adept library (Hogan 2014). A gradient implementation will reduce the computation time.
Supports left-truncation, partly observed markers, and much more.
Make a comparison with the Monte Carlo based joineRML package (Hickey, Philipson, and Jorgensen 2018).
Simulation study to check the bias of the VA.
Can only estimate the diagonal of \(\Sigma\).
There is one (terminal) time to event outcome for each individual.
We sample 300 individuals.
A random intercept and a random slope for the first marker and a random intercept for the second marker.
The lines are the 10%, 50%, and 90% pointwise quantiles.
using four threads.
The estimated associations parameters, \(\vec\alpha_1\), are shown below.
| Marker 1 | Marker 2 | |
|---|---|---|
| truth | -1.000 | 2.00 |
| joineRML | -0.880 | 1.92 |
| VA | -0.917 | 1.91 |
| joineRML | VA | ||||||
|---|---|---|---|---|---|---|---|
| (Intercept)_1 | 0.0185 | 0.0188 | |||||
| time_1 | 0.0050 | 0.0130 | 0.0050 | 0.0130 | |||
| (Intercept)_2 | -0.0034 | -0.0011 | 0.0198 | -0.0040 | -0.0009 | 0.0199 |
The estimated covariance matrix of the random effects, \(\Psi\), is shown above. The true values are shown below.
| (Intercept)_1 | 0.0219 | ||
| time_1 | 0.0005 | 0.0120 | |
| (Intercept)_2 | -0.0031 | -0.0015 | 0.02 |
This is computationally challenging with many individuals.
Sobol sequences are used with joineRML.
Both markers have a standard normal distributed covariate and the time to event outcome has \(\text{Unif}(-1, 1)\) covariate.
Observation times are drawn with increments that are exponentially distributed with rate 0.5.
Two different types of markers observed possible on the same points in time.
One (terminal) time to event outcome for each individual and a model for the observation times.
We sample 1000 individuals.
Four random effects for the markers and two for the frailties, \(\vec\xi_i\), per individual.
The lines are the 10%, 50%, and 90% pointwise quantiles.
The lines are the 10%, 50%, and 90% pointwise quantiles.
Average computation time was \(111.299 \pm 2.943\) seconds. 100 data sets are sampled.
| Terminal | SE | Observation | SE | |
|---|---|---|---|---|
| Marker 1 | -0.133 | 0.684 | 0.243 | 0.345 |
| Marker 2 | 0.341 | 0.383 | -0.133 | 0.212 |
Bias estimates for the association parameters, \(\vec\alpha_1\) and \(\vec\alpha_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.
| Terminal | Observation | |
|---|---|---|
| Marker 1 | 0.6 | -0.7 |
| Marker 2 | -0.3 | 0.3 |
| Terminal | SE | Observation | SE | |
|---|---|---|---|---|
| -2.295 | 1.082 | 0.807 | 0.469 | |
| 0.314 | 0.643 |
Bias estimates for the fixed effects of the survival outcomes, \(\vec\delta_1\) and \(\vec\delta_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.
| Terminal | Observation | |
|---|---|---|
| -1.00 | 0.2 | |
| 0.25 |
| Terminal | SE | Observation | SE | |
|---|---|---|---|---|
| 6.46 | 2.65 | -5.26 | 1.97 | |
| -12.56 | 4.70 | -3.06 | 2.02 | |
| 11.87 | 7.03 | |||
| -33.21 | 11.42 |
Bias estimates for the log baseline hazard of the survival outcomes, \(\vec\omega_1\) and \(\vec\omega_2\), are shown above multiplied by 100. The SE columns are the standard errors. The true values are shown below.
| Terminal | Observation | |
|---|---|---|
| 0.50 | -1.00 | |
| 0.10 | -0.25 | |
| -0.20 | ||
| 0.11 |
| Bias | SE | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| -0.788 | 0.649 | ||||||||
| -1.043 | -1.228 | 0.932 | 1.708 | ||||||
| -0.051 | 0.222 | -0.348 | 0.336 | 0.413 | 0.166 | ||||
| -0.172 | -0.524 | -0.087 | -0.181 | 0.210 | 0.336 | 0.092 | 0.097 |
Bias estimates for the covariance matrix of the random effects, \(\Psi\), are shown above multiplied by 100. The last columns show the standard errors. The true values are shown below.
| 0.35 | ||||
| 0.08 | 1.92 | |||
| -0.05 | -0.24 | 0.32 | ||
| 0.01 | -0.04 | 0.09 | 0.12 |
| Bias | SE | ||||
|---|---|---|---|---|---|
| -0.081 | 0.034 | ||||
| 0.004 | 0.019 | 0.025 | 0.017 |
Bias estimates for the covariance matrix of the markers’ error term, \(\Sigma\), are shown above multiplied by 100. The last columns show the standard errors. The true values are shown below.
| 0.090 | ||
| 0.022 | 0.04 |
| Bias | SE | ||||
|---|---|---|---|---|---|
| -1.704 | 0.377 | ||||
| -0.987 | -0.324 | 0.300 | 0.163 |
Bias estimates for the covariance matrix of the frailties, \(\Xi\), are shown above multiplied by 100. The last columns show the standard errors. The rue values are shown below.
| 0.040 | ||
| 0.022 | 0.062 |
\(\vec g_1\) and \(\vec m_1\) are natural cubic splines with boundary knots at \((0, 10)\). \(\vec g_1\) has interior knots at \(\{3.33,6.67\}\). The first marker has a standard normally distrusted covariate.
\(\vec g_2\) and \(\vec m_2\) are polynomials without and with an intercept and degree 2 and 1. The second marker has no covariates.
The terminal event has a B-spline basis with boundary knots at \((0,10)\) and an interior knot at \(5\). There is a \(\text{Unif}(-1, 1)\) covariate.
The observation process has a natural cubic spline with boundary knots at \((0,10)\) and an interior knot at \(5\). There is no covariates.
Introduced variational approximations.
Showed that they can yield fast estimates of joint survival and marker models.
Simulation studies suggest that the bias is low.
Extension to competing events and multi-state models will be interesting.
Interval censoring: the event happens between \((t, v)\) but we do not know when.
The software will soon be public on Github.
Useful for approximate likelihood ratio based confidence intervals.
Recall that the requirements on the variational distribution, \(q^{(i)}_{\vec\theta}\), are not large!
Particularly useful for crossed and nested random effects. E.g. Challis and Barber (2013), Miller, Foti, and Adams (2017), and Ong, Nott, and Smith (2018).
The presentation is at rpubs.com/boennecd/VA-Joint-Models-KTH-21.
The markdown is at github.com/boennecd/Talks.
The psqn package is on CRAN and at github.com/boennecd/psqn.
References are on the next slide.