The package longsurvAccuracyMeasures can be used to evaluate a partly conditional model fit using the R package partlyconditional. Given validation data consisting of a time-to-event outcome and longitudinal marker information on a set of individuals up to a fixed time window, this package includes functions to assess model calibration (PC.calibration) and prognostic accuracy (PC.evaluation) at a specified future landmark prediction time. Point estimates, along with bootstrap confidence intervals, are provided for the following accuracy measures: AUC, ROC (TPF/FPF), PPV, NPV, PCF (proportion cases followed), and PNF (proportion needed to follow).

Tutorial

Load the package

Package can be downloaded directly from Github using the devtools package.

library(devtools)
###install 
devtools::install_github("mdbrown/longSurvAccuracyMeasures")

All package code is available on Github.

#load libraries
library(survival)
library(tidyverse)
library(partlyconditional)
library(longsurvAccuracyMeasures)

Simulated data

For this tutorial, we use data on 478 simulated observations from 100 hypothetical individuals with repeated marker measurements. ‘marker_1’ was simulated to be associated with the outcome status, while ‘marker_2’ is simulated to be random noise.

data(pc_data)

head(pc_data)

##    sub.id       time status meas.time   marker_1   marker_2
## 1       1   9.661293      1         0  1.5966568  0.7168800
## 2       1   9.661293      1         6  2.8376620  0.7314807
## 11      2   4.571974      1         0  0.6415240  0.9021957
## 21      3 103.617181      1         0 -0.5003165  1.5359251
## 22      3 103.617181      1         6  1.2697985 -1.2054431
## 23      3 103.617181      1        12  0.6484258 -1.9537152

Note that pc_data is in ‘long’ format, with one row per measurement time. Each individual has a unique numeric subject id (sub.id) where event time (time) and event status (status) are repeated across marker measurement times (meas.time) given in months.

Fit a partly conditional Cox model using the `partlyconditional` R package

Below we use the function PC.Cox to fit a PC Cox model. To specify the model, we include information on patient id (id), survival time (stime), censoring status (status), measurement time (measurement.time), and markers. Below we fit a model using raw meas.time and two markers. Raw marker values are used in the model as predictors since use.BLUP is set to FALSE for both markers.

For more information on fitting partly conditional Cox or GLM models, please see the tutorial for the R package partlyconditional.

pc.cox.1 <-  PC.Cox(
        id = "sub.id",
        stime = "time",
        status = "status",
        measurement.time = "meas.time",  ##survival and measurement times must be on the same scale!!!
        markers = c("marker_1", "marker_2"),
        data = pc_data,
        use.BLUP = c(FALSE, FALSE), 
        knots.measurement.time = 3)

pc.cox.1

## ### Call:
## PC.Cox(id = "sub.id", stime = "time", status = "status", measurement.time = "meas.time", 
##     markers = c("marker_1", "marker_2"), data = pc_data, use.BLUP = c(FALSE, 
##         FALSE), knots.measurement.time = 3)
## 
## ### Partly conditional Cox model:
##                                coef exp(coef)   se(coef)  robust se          z     Pr(>|z|)
## meas.time.spline.basis1 -0.22523209 0.7983309 0.23702493 0.19289842 -1.1676202 2.429600e-01
## meas.time.spline.basis2 -0.07935234 0.9237144 0.26120392 0.18929434 -0.4192008 6.750694e-01
## meas.time.spline.basis3 -0.21490642 0.8066169 0.19258407 0.20257471 -1.0608749 2.887468e-01
## marker_1                -0.37453287 0.6876104 0.04110525 0.05261546 -7.1183042 1.092682e-12
## marker_2                -0.03347195 0.9670820 0.04539559 0.04160602 -0.8044979 4.211095e-01

Model Evaluation

Use the function PC.evaluation to estimate measures of prognostic accuracy for the model fit above. We specify that we wish to evaluate risk predictions from the model for 12 months in the future conditional on observing 18-24 months of longitudinal marker information. All marker information in pc_data after month 24 will be removed and a note will be generated from the function. Use silent = TRUE to suppress messages.

Summary measures are estimated using the set of observations with measurement times falling within the conditioning time window set by ‘conditioning.time.window’. If two observations from the same ‘individual’ fall within this time window, the observation with greater measurement time is used. These data can be accessed in the ‘data.for.measures’ element of the list returned from this function.

result <- PC.evaluation(pc.cox.1,
                     newdata = pc_data,  
                     conditioning.time.window = c(18, 24), 
                     prediction.time = 12, 
                     risk.threshold = .4, 
                     pnf.threshold = .5, 
                     pcf.threshold = .18, 
                     bootstraps = 100) #should use more bootstrap replicates in practice.

## ... removing 156 observations where meas.time is greater than conditioning.time.window = 24.
## ... calculating summary measures using 55 observations with measurement times within the conditioning time window  [18, 24].

result

## ### Call:
## PC.evaluation(pc.object = pc.cox.1, newdata = pc_data, prediction.time = 12, 
##     conditioning.time.window = c(18, 24), risk.threshold = 0.4, 
##     pnf.threshold = 0.5, pcf.threshold = 0.18, bootstraps = 100)
## 
## ### Summary Measure Estimates:
## # A tibble: 10 x 5
##                       measure estimate    se lower upper
##                         <chr>    <dbl> <dbl> <dbl> <dbl>
##  1                         PE    0.248 0.025 0.202 0.296
##  2                        AUC    0.793 0.070 0.661 0.908
##  3                   TPF(0.4)    0.356 0.096 0.154 0.547
##  4                   FPF(0.4)    0.074 0.048 0.000 0.167
##  5                   PPV(0.4)    0.837 0.110 0.603 1.000
##  6                   NPV(0.4)    0.574 0.072 0.417 0.695
##  7                  PCF(0.18)    0.322 0.046 0.227 0.407
##  8                   PNF(0.5)    0.345 0.057 0.239 0.438
##  9 Proportion with risk > 0.4    0.218 0.055 0.127 0.327
## 10                 Prevalence    0.516 0.062 0.405 0.641
## 
## ### ROC curve estimates in pc.object$roc
## ### showing top 5 rows...
## # A tibble: 5 x 9
##   risk.threshold risk.percentile        TPF   FPF   PPV       NPV landmark.time conditioning.time
##            <dbl>           <dbl>      <dbl> <dbl> <dbl>     <dbl>         <dbl>             <dbl>
## 1      1.0000000      0.01818182 0.00000000     0    NA        NA            36                24
## 2      0.6293872      1.00000000 0.00000000     0   NaN 0.4924528            36                24
## 3      0.6225250      0.98181818 0.03468900     0     1 0.5016015            36                24
## 4      0.5395187      0.96363636 0.06937799     0     1 0.5124346            36                24
## 5      0.4922109      0.94545455 0.10885167     0     1 0.5223482            36                24
## # ... with 1 more variables: prediction.time <dbl>
## ...
## ### data used to estimate measures in pc.object$data.for.measures

Plot an ROC curve

Components to plot the ROC curve can be found in result$roc. Below we show an example of how to plot the ROC curve using this output. NPV, PPV and percentiles of risk are also included in the result$roc data.frame.

library(ggplot2)
roc.data <- result$roc

ggplot(roc.data, aes(FPF, TPF)) + 
  geom_step(color = "dodgerblue", size = 1.2) + 
  geom_abline(slope = 1, intercept = 0, linetype = 2) + 
  theme_bw()

head(roc.data)

## # A tibble: 6 x 9
##   risk.threshold risk.percentile        TPF   FPF   PPV       NPV landmark.time conditioning.time
##            <dbl>           <dbl>      <dbl> <dbl> <dbl>     <dbl>         <dbl>             <dbl>
## 1      1.0000000      0.01818182 0.00000000     0    NA        NA            36                24
## 2      0.6293872      1.00000000 0.00000000     0   NaN 0.4924528            36                24
## 3      0.6225250      0.98181818 0.03468900     0     1 0.5016015            36                24
## 4      0.5395187      0.96363636 0.06937799     0     1 0.5124346            36                24
## 5      0.4922109      0.94545455 0.10885167     0     1 0.5223482            36                24
## 6      0.4843316      0.92727273 0.14354067     0     1 0.5341064            36                24
## # ... with 1 more variables: prediction.time <dbl>

Model Calibration

We evaluate the calibration for the PC risk model using the function PC.calibration. In the function call, we again specify a conditioning time window and a future prediction time at which to assess model calibration.

The figure below shows predicted risk (dashed line) and observed risk (points with whiskers) by risk percentile calculated for a 12 month prediction time conditional on observing 18-24 months of longitudinal marker information. Observed risks, along with 95% confidence intervals, are estimated using the Kaplan-Meier estimator. Predicted risks are directly estimated from the partly conditional model fit above.

calibration.result <- PC.calibration( pc.cox.1,
                                      newdata = pc_data,  
                                      conditioning.time.window = c(18, 24), 
                                      prediction.time = 12, 
                                      n.groups = 8, 
                                      plot = TRUE, 
                                      silent = TRUE)

calibration.result$observed

## # A tibble: 8 x 4
##   observed.risk      lower     upper percentile
##           <dbl>      <dbl>     <dbl>      <dbl>
## 1     0.2857143 0.00000000 0.5529091     0.0625
## 2     0.0000000 0.00000000 0.0000000     0.1875
## 3     0.1428571 0.00000000 0.3665535     0.3125
## 4     0.2857143 0.00000000 0.5529091     0.4375
## 5     0.4285714 0.00000000 0.6991564     0.5625
## 6     0.7142857 0.07823084 0.9114392     0.6875
## 7     0.7142857 0.07823084 0.9114392     0.8125
## 8     0.6666667 0.00000000 0.8924929     0.9375

calibration.result$predicted

## # A tibble: 55 x 2
##    predicted.risk percentile
##             <dbl>      <dbl>
##  1      0.3645136  0.7090909
##  2      0.2939278  0.4181818
##  3      0.4648341  0.8727273
##  4      0.4391236  0.8181818
##  5      0.4716525  0.8909091
##  6      0.2028419  0.1636364
##  7      0.3356909  0.6000000
##  8      0.3252955  0.4909091
##  9      0.4730417  0.9090909
## 10      0.5395187  0.9636364
## # ... with 45 more rows

Tutorial for R package `longsurvAccuracyMeasures`

Marshall Brown

3/9/2018

Tutorial

Load the package

Simulated data

Fit a partly conditional Cox model using the `partlyconditional` R package

Model Evaluation

Plot an ROC curve

Model Calibration

Tutorial for R package longsurvAccuracyMeasures

Marshall Brown

3/9/2018

Tutorial

Load the package

Simulated data

Fit a partly conditional Cox model using the partlyconditional R package

Model Evaluation

Plot an ROC curve

Model Calibration

Tutorial for R package `longsurvAccuracyMeasures`

Fit a partly conditional Cox model using the `partlyconditional` R package