In this tutorial we show examples on how to obtain risk predictions based on a longitudinal marker in a case-cohort (CCH) or a nested case-control (NCC) sample, using methods described in ``Evaluating longitudinal markers under two-phase study designs’’ by Marlena Maziarz, Tianxi Ci, Li Qi, Anna Lok and Yingye Zheng. The examples below illustrate the use of our methods to predict risk using PC\(_{GLM}\) based on a longitudinal biomarker with a survival outcome, as well as several evaluation measures of prediction and their standard error. The evaluation measures we considered are the prediction error (PE), true and false positive fractions (TPF and FPF), the area under the Receiver Operating Characteristics curve (AUC), the proportion of cases followed (PCF) and the proportion of cases needed to be followed (PNF). All code for predicting and evaluating risk, example datasets and examples on how to run the analysis are also available on github.com/marlenamaziarz/longitudinal-two-phase (right-click and open in new tab/window).
require(survival)
require(gam)
require(stats)
require(splines)
require(RCurl) # for 'getURL'
require(repmis) # for 'source_data'
The source code on GitHub contains all the functions needed for risk estimation and estimation of the measures to evaluate risk predictions, including standard errors estimated using perturbation. To load the source code:
script <- getURL("https://raw.githubusercontent.com/marlenamaziarz/longitudinal-two-phase/master/code-helper-functions.r", ssl.verifypeer = FALSE)
eval(parse(text = script))
The example dataset for NCC can be loaded using:
source_data("https://github.com/marlenamaziarz/longitudinal-two-phase/blob/master/dataset-ncc.rdata?raw=True")
The dataset has the following columns:
## sub.id visit meas.time marker time status t.star in.ncc ix.first
## 1 1 1 0 4.0173812 3.116595 0 3.116595 FALSE TRUE
## 11 2 1 0 2.8730772 8.025501 0 8.025501 FALSE TRUE
## 12 2 2 6 1.4007894 8.025501 0 2.025501 FALSE FALSE
## 21 3 1 0 5.3736497 20.844036 0 20.844036 FALSE TRUE
## 22 3 2 6 2.9753239 20.844036 0 14.844036 FALSE FALSE
## 23 3 3 12 2.2411058 20.844036 0 8.844036 FALSE FALSE
## 24 3 4 18 1.6639688 20.844036 0 2.844036 FALSE FALSE
## 31 4 1 0 2.7390869 29.367374 1 29.367374 TRUE TRUE
## 32 4 2 6 0.8567458 29.367374 1 23.367374 TRUE FALSE
## 33 4 3 12 0.3968545 29.367374 1 17.367374 TRUE FALSE
where sub.id = subject id, vist = visit number at time meas.time, marker = marker value, time = event or censoring time, status = event status (1 = event observed, 0 otherwise), t.star = time since measurement, i.e. time - meas.time, in.ncc = index of whether the subject was selected into the NCC study, ix.first = index of the first visits.
To predict risk and evaluation measures based on the NCC example dataset, run the following:
tau <- 6
dc.list <- NULL
results.list <- NULL
results.mx <- NULL
for(si.i in seq(6, 36, 6)){
# P = pert = 500 # ncc =T/F NCC/CCH analysis ie. ncc = T = NCC, ncc = F = CCH.
dc <- set.data.control(si = si.i, ti = si.i + tau, P = 500, seed = 1, ncc = T)
results.list <- c(results.list, main(dc, d.full = d.ncc.cohort))
dc.list <- c(dc.list, list(dc))
# save(list = ls(), file = 'results-ncc.rdata')
}
for(i in 1:length(results.list)){
res.out <- with(results.list[[i]], cbind(sim.stats, sim.stats.se, n.case, n.ctrl, n.cens))
dc.out <- with(dc.list[[i]], c(si, ti, pred.time, P))
res.dc.out <- cbind(matrix(rep(dc.out, each = dc$n.stats), nrow = dc$n.stats, byrow = F), round(res.out, 4))
results.mx <- rbind(results.mx, res.dc.out)
}
colnames(results.mx) <- c('s', 't', 'tau_0', 'Pert', 'Est', 'SE', 'n.case', 'n.ctrl', 'n.cens')
save(list = ls(), file = paste('results-ncc-table-p', dc$P, '-tau', tau, '.rdata', sep = ''))
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 6 | 12 | 6 | 500 | 0.035 | 0.0046 | 53 | 512 | 8 |
| TPF(0.4) | 6 | 12 | 6 | 500 | 0.13 | 0.053 | 53 | 512 | 8 |
| FPF(0.3) | 6 | 12 | 6 | 500 | 0.01 | 0.0049 | 53 | 512 | 8 |
| AUC | 6 | 12 | 6 | 500 | 0.79 | 0.034 | 53 | 512 | 8 |
| PCF(0.2) | 6 | 12 | 6 | 500 | 0.55 | 0.069 | 53 | 512 | 8 |
| PNF(0.8) | 6 | 12 | 6 | 500 | 0.44 | 0.07 | 53 | 512 | 8 |
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 36 | 42 | 6 | 500 | 0.11 | 0.017 | 19 | 104 | 22 |
| TPF(0.4) | 36 | 42 | 6 | 500 | 0.37 | 0.11 | 19 | 104 | 22 |
| FPF(0.3) | 36 | 42 | 6 | 500 | 0.2 | 0.041 | 19 | 104 | 22 |
| AUC | 36 | 42 | 6 | 500 | 0.72 | 0.059 | 19 | 104 | 22 |
| PCF(0.2) | 36 | 42 | 6 | 500 | 0.42 | 0.11 | 19 | 104 | 22 |
| PNF(0.8) | 36 | 42 | 6 | 500 | 0.53 | 0.071 | 19 | 104 | 22 |
To load the data file:
source_data("https://github.com/marlenamaziarz/longitudinal-two-phase/blob/master/dataset-cch.rdata?raw=True")
The dataset has the following columns:
## sub.id visit meas.time marker time status t.star in.cch ix.first
## 1 1 1 0 4.047375 8.020719 0 8.0207186 FALSE TRUE
## 2 1 2 6 2.278364 8.020719 0 2.0207186 FALSE FALSE
## 11 2 1 0 4.367408 8.070592 0 8.0705921 FALSE TRUE
## 12 2 2 6 2.036170 8.070592 0 2.0705921 FALSE FALSE
## 21 3 1 0 4.932912 18.897848 0 18.8978483 FALSE TRUE
## 22 3 2 6 2.996144 18.897848 0 12.8978483 FALSE FALSE
## 23 3 3 12 2.202109 18.897848 0 6.8978483 FALSE FALSE
## 24 3 4 18 1.847787 18.897848 0 0.8978483 FALSE FALSE
## 31 4 1 0 2.028736 2.094211 0 2.0942106 TRUE TRUE
## 41 5 1 0 2.755369 9.442380 0 9.4423798 FALSE TRUE
where sub.id = subject id, vist = visit number at time meas.time, marker = marker value, time = event or censoring time, status = event status (1 = event observed, 0 otherwise), t.star = time since measurement, i.e. time - meas.time, in.ncc = index of whether the subject was selected into the CCH study, ix.first = index of the first visits.
To predict risk and evaluation measures based on the CCH example dataset, run the following:
tau <- 6
dc.list <- NULL
results.list <- NULL
results.mx <- NULL
for(si.i in seq(6, 36, 6)){
# P = pert = 500 # ncc =T/F NCC/CCH analysis ie. ncc = T = NCC, ncc = F = CCH.
dc <- set.data.control(si = si.i, ti = si.i + tau, P = 500, seed = 1, ncc = F)
results.list <- c(results.list, main(dc, d.full = d.cch.cohort))
dc.list <- c(dc.list, list(dc))
# save(list = ls(), file = 'results-cch.rdata')
}
for(i in 1:length(results.list)){
res.out <- with(results.list[[i]], cbind(sim.stats, sim.stats.se, n.case, n.ctrl, n.cens))
dc.out <- with(dc.list[[i]], c(si, ti, pred.time, P))
res.dc.out <- cbind(matrix(rep(dc.out, each = dc$n.stats), nrow = dc$n.stats, byrow = F), round(res.out, 4))
results.mx <- rbind(results.mx, res.dc.out)
}
colnames(results.mx) <- c('s', 't', 'tau_0', 'Pert', 'Est', 'SE', 'n.case', 'n.ctrl', 'n.cens')
save(list = ls(), file = paste('results-cch-table-p', dc$P, '-tau', tau, '.rdata', sep = ''))
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 6 | 12 | 6 | 500 | 0.041 | 0.0045 | 74 | 639 | 124 |
| TPF(0.4) | 6 | 12 | 6 | 500 | 0.052 | 0.038 | 74 | 639 | 124 |
| FPF(0.3) | 6 | 12 | 6 | 500 | 0.0078 | 0.0041 | 74 | 639 | 124 |
| AUC | 6 | 12 | 6 | 500 | 0.81 | 0.025 | 74 | 639 | 124 |
| PCF(0.2) | 6 | 12 | 6 | 500 | 0.61 | 0.059 | 74 | 639 | 124 |
| PNF(0.8) | 6 | 12 | 6 | 500 | 0.41 | 0.048 | 74 | 639 | 124 |
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 36 | 42 | 6 | 500 | 0.098 | 0.017 | 26 | 84 | 18 |
| TPF(0.4) | 36 | 42 | 6 | 500 | 0.22 | 0.1 | 26 | 84 | 18 |
| FPF(0.3) | 36 | 42 | 6 | 500 | 0.09 | 0.043 | 26 | 84 | 18 |
| AUC | 36 | 42 | 6 | 500 | 0.7 | 0.067 | 26 | 84 | 18 |
| PCF(0.2) | 36 | 42 | 6 | 500 | 0.53 | 0.11 | 26 | 84 | 18 |
| PNF(0.8) | 36 | 42 | 6 | 500 | 0.69 | 0.16 | 26 | 84 | 18 |
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 6 | 12 | 6 | 500 | 0.035 | 0.0046 | 53 | 512 | 8 |
| TPF(0.4) | 6 | 12 | 6 | 500 | 0.13 | 0.053 | 53 | 512 | 8 |
| FPF(0.3) | 6 | 12 | 6 | 500 | 0.01 | 0.0049 | 53 | 512 | 8 |
| AUC | 6 | 12 | 6 | 500 | 0.79 | 0.034 | 53 | 512 | 8 |
| PCF(0.2) | 6 | 12 | 6 | 500 | 0.55 | 0.069 | 53 | 512 | 8 |
| PNF(0.8) | 6 | 12 | 6 | 500 | 0.44 | 0.07 | 53 | 512 | 8 |
| PE | 12 | 18 | 6 | 500 | 0.063 | 0.0061 | 74 | 423 | 15 |
| TPF(0.4) | 12 | 18 | 6 | 500 | 0.24 | 0.061 | 74 | 423 | 15 |
| FPF(0.3) | 12 | 18 | 6 | 500 | 0.037 | 0.01 | 74 | 423 | 15 |
| AUC | 12 | 18 | 6 | 500 | 0.83 | 0.024 | 74 | 423 | 15 |
| PCF(0.2) | 12 | 18 | 6 | 500 | 0.61 | 0.058 | 74 | 423 | 15 |
| PNF(0.8) | 12 | 18 | 6 | 500 | 0.35 | 0.057 | 74 | 423 | 15 |
| PE | 18 | 24 | 6 | 500 | 0.1 | 0.0091 | 89 | 310 | 24 |
| TPF(0.4) | 18 | 24 | 6 | 500 | 0.24 | 0.046 | 89 | 310 | 24 |
| FPF(0.3) | 18 | 24 | 6 | 500 | 0.088 | 0.015 | 89 | 310 | 24 |
| AUC | 18 | 24 | 6 | 500 | 0.8 | 0.026 | 89 | 310 | 24 |
| PCF(0.2) | 18 | 24 | 6 | 500 | 0.56 | 0.051 | 89 | 310 | 24 |
| PNF(0.8) | 18 | 24 | 6 | 500 | 0.4 | 0.063 | 89 | 310 | 24 |
| PE | 24 | 30 | 6 | 500 | 0.11 | 0.012 | 57 | 226 | 27 |
| TPF(0.4) | 24 | 30 | 6 | 500 | 0.22 | 0.064 | 57 | 226 | 27 |
| FPF(0.3) | 24 | 30 | 6 | 500 | 0.12 | 0.024 | 57 | 226 | 27 |
| AUC | 24 | 30 | 6 | 500 | 0.78 | 0.031 | 57 | 226 | 27 |
| PCF(0.2) | 24 | 30 | 6 | 500 | 0.47 | 0.07 | 57 | 226 | 27 |
| PNF(0.8) | 24 | 30 | 6 | 500 | 0.46 | 0.052 | 57 | 226 | 27 |
| PE | 30 | 36 | 6 | 500 | 0.12 | 0.014 | 50 | 145 | 31 |
| TPF(0.4) | 30 | 36 | 6 | 500 | 0.31 | 0.069 | 50 | 145 | 31 |
| FPF(0.3) | 30 | 36 | 6 | 500 | 0.12 | 0.029 | 50 | 145 | 31 |
| AUC | 30 | 36 | 6 | 500 | 0.81 | 0.031 | 50 | 145 | 31 |
| PCF(0.2) | 30 | 36 | 6 | 500 | 0.47 | 0.061 | 50 | 145 | 31 |
| PNF(0.8) | 30 | 36 | 6 | 500 | 0.41 | 0.053 | 50 | 145 | 31 |
| PE | 36 | 42 | 6 | 500 | 0.11 | 0.017 | 19 | 104 | 22 |
| TPF(0.4) | 36 | 42 | 6 | 500 | 0.37 | 0.11 | 19 | 104 | 22 |
| FPF(0.3) | 36 | 42 | 6 | 500 | 0.2 | 0.041 | 19 | 104 | 22 |
| AUC | 36 | 42 | 6 | 500 | 0.72 | 0.059 | 19 | 104 | 22 |
| PCF(0.2) | 36 | 42 | 6 | 500 | 0.42 | 0.11 | 19 | 104 | 22 |
| PNF(0.8) | 36 | 42 | 6 | 500 | 0.53 | 0.071 | 19 | 104 | 22 |
| s | t | tau_0 | Pert | Est | SE | n.case | n.ctrl | n.cens | |
|---|---|---|---|---|---|---|---|---|---|
| PE | 6 | 12 | 6 | 500 | 0.041 | 0.0045 | 74 | 639 | 124 |
| TPF(0.4) | 6 | 12 | 6 | 500 | 0.052 | 0.038 | 74 | 639 | 124 |
| FPF(0.3) | 6 | 12 | 6 | 500 | 0.0078 | 0.0041 | 74 | 639 | 124 |
| AUC | 6 | 12 | 6 | 500 | 0.81 | 0.025 | 74 | 639 | 124 |
| PCF(0.2) | 6 | 12 | 6 | 500 | 0.61 | 0.059 | 74 | 639 | 124 |
| PNF(0.8) | 6 | 12 | 6 | 500 | 0.41 | 0.048 | 74 | 639 | 124 |
| PE | 12 | 18 | 6 | 500 | 0.06 | 0.0059 | 89 | 475 | 75 |
| TPF(0.4) | 12 | 18 | 6 | 500 | 0.14 | 0.06 | 89 | 475 | 75 |
| FPF(0.3) | 12 | 18 | 6 | 500 | 0.021 | 0.0069 | 89 | 475 | 75 |
| AUC | 12 | 18 | 6 | 500 | 0.83 | 0.024 | 89 | 475 | 75 |
| PCF(0.2) | 12 | 18 | 6 | 500 | 0.65 | 0.058 | 89 | 475 | 75 |
| PNF(0.8) | 12 | 18 | 6 | 500 | 0.36 | 0.071 | 89 | 475 | 75 |
| PE | 18 | 24 | 6 | 500 | 0.095 | 0.0082 | 104 | 326 | 45 |
| TPF(0.4) | 18 | 24 | 6 | 500 | 0.2 | 0.059 | 104 | 326 | 45 |
| FPF(0.3) | 18 | 24 | 6 | 500 | 0.054 | 0.014 | 104 | 326 | 45 |
| AUC | 18 | 24 | 6 | 500 | 0.77 | 0.027 | 104 | 326 | 45 |
| PCF(0.2) | 18 | 24 | 6 | 500 | 0.45 | 0.048 | 104 | 326 | 45 |
| PNF(0.8) | 18 | 24 | 6 | 500 | 0.42 | 0.061 | 104 | 326 | 45 |
| PE | 24 | 30 | 6 | 500 | 0.1 | 0.011 | 85 | 208 | 33 |
| TPF(0.4) | 24 | 30 | 6 | 500 | 0.28 | 0.072 | 85 | 208 | 33 |
| FPF(0.3) | 24 | 30 | 6 | 500 | 0.09 | 0.022 | 85 | 208 | 33 |
| AUC | 24 | 30 | 6 | 500 | 0.83 | 0.027 | 85 | 208 | 33 |
| PCF(0.2) | 24 | 30 | 6 | 500 | 0.58 | 0.06 | 85 | 208 | 33 |
| PNF(0.8) | 24 | 30 | 6 | 500 | 0.33 | 0.071 | 85 | 208 | 33 |
| PE | 30 | 36 | 6 | 500 | 0.11 | 0.014 | 54 | 128 | 26 |
| TPF(0.4) | 30 | 36 | 6 | 500 | 0.21 | 0.074 | 54 | 128 | 26 |
| FPF(0.3) | 30 | 36 | 6 | 500 | 0.095 | 0.029 | 54 | 128 | 26 |
| AUC | 30 | 36 | 6 | 500 | 0.76 | 0.042 | 54 | 128 | 26 |
| PCF(0.2) | 30 | 36 | 6 | 500 | 0.53 | 0.073 | 54 | 128 | 26 |
| PNF(0.8) | 30 | 36 | 6 | 500 | 0.56 | 0.07 | 54 | 128 | 26 |
| PE | 36 | 42 | 6 | 500 | 0.098 | 0.017 | 26 | 84 | 18 |
| TPF(0.4) | 36 | 42 | 6 | 500 | 0.22 | 0.1 | 26 | 84 | 18 |
| FPF(0.3) | 36 | 42 | 6 | 500 | 0.09 | 0.043 | 26 | 84 | 18 |
| AUC | 36 | 42 | 6 | 500 | 0.7 | 0.067 | 26 | 84 | 18 |
| PCF(0.2) | 36 | 42 | 6 | 500 | 0.53 | 0.11 | 26 | 84 | 18 |
| PNF(0.8) | 36 | 42 | 6 | 500 | 0.69 | 0.16 | 26 | 84 | 18 |