See individual examples below for links to the original papers.
For the explanation of the difference between cumulative/dynamic AUCs vs incident/dynamic AUCs, see the paper by Heargery et al ( http://www.statmed.medicina.unimib.it/statisticalps2011/materiale/Heagerty%20and%20Zheng,%20Biometrics%202005.pdf ).
As I understand it,
Thus, the idea behind incident/dynamic AUCs is closer to the idea of hazard (dynamically changing instantaneous incidence at a given time), and it can handle time-varying predictors. But the question answered by cumulative/dynamic AUCs may be more clinically relevant, e.g., does this model discriminate if I will survive next five years .
age: in years
albumin: serum albumin (g/dl)
alk.phos: alkaline phosphotase (U/liter)
ascites: presence of ascites
ast: aspartate aminotransferase, once called SGOT (U/ml)
bili: serum bilirunbin (mg/dl)
chol: serum cholesterol (mg/dl)
copper: urine copper (ug/day)
edema: 0 no edema, 0.5 untreated or successfully treated
1 edema despite diuretic therapy
hepato: presence of hepatomegaly or enlarged liver
id: case number
platelet: platelet count
protime: standardised blood clotting time
sex: m/f
spiders: blood vessel malformations in the skin
stage: histologic stage of disease (needs biopsy)
status: status at endpoint, 0/1/2 for censored, transplant, dead
time: number of days between registration and the earlier of death,
transplantion, or study analysis in July, 1986
trt: 1/2/NA for D-penicillmain, placebo, not randomised
trig: triglycerides (mg/dl)
Load the PBC dataset and modify for later use
library(survival)
data(pbc)
pbc <- within(pbc, {
## transplant (1) and death (2) are considered events, and marked 1
event <- as.numeric(status %in% c(1,2))
## Create a survival vector
Surv <- Surv(time, event)
})
Kaplan-Meier plots
par(mar = c(2,2,2,2))
layout(matrix(1:4, byrow = T, ncol = 2))
library(rms)
survplot(survfit(Surv ~ 1, data = pbc))
survplot(survfit(Surv ~ age >= median(age), data = pbc), label.curves = list(method = "arrow", cex = 1.2))
survplot(survfit(Surv ~ sex, data = pbc), label.curves = list(method = "arrow", cex = 1.2))
survplot(survfit(Surv ~ albumin >= median(albumin), data = pbc), label.curves = list(method = "arrow", cex = 1.2))
Completely ignore the time variable
It is the simplest method. Logistic regression is used instead of Cox regression model.
Completely ignore the time variable and use the outcome variable as a binary outcome variable. It does not take into acount the variable length of follow-up.
## Load epicalc package to calcuate AUC
library(epicalc)
## Model with age and sex
logit.age.sex <- glm(event ~ age + sex, data = pbc, family = binomial)
lroc(logit.age.sex, graph = F)$auc
[1] 0.5879
## Model with age, sex, and albumin
logit.age.sex.albumin <- glm(event ~ age + sex + albumin, data = pbc, family = binomial)
lroc(logit.age.sex.albumin, graph = F)$auc
[1] 0.6751
Cut the follow up at a specifict time point
Only events that occured within two years are considered events and others are treated as non-events. This method can be valid if the specified time is short enough so that there are few censored subjects. Most people have complete follow-up in this situation.
## Create a variable indicating 2-year event
pbc <- within(pbc, {
outcome2yr <- NA
outcome2yr[(event == 1) & (time <= 2 * 365)] <- 1 # event+ within two years
outcome2yr[(event == 0) | (time > 2 * 365)] <- 0 # otherwise
})
## 2-year outcome model with age and sex
logit.age.sex <- glm(outcome2yr ~ age + sex, data = pbc, family = binomial)
lroc(logit.age.sex, graph = F)$auc
[1] 0.6484
## 2-year outcome model with age, sex, and albumin
logit.age.sex.albumin <- glm(outcome2yr ~ age + sex + albumin, data = pbc, family = binomial)
lroc(logit.age.sex.albumin, graph = F)$auc
[1] 0.7918
The linear predictors (lp), when exponetiated, will provide the predicted hazard ratios for individuals. Thus these can be used as the summary predictors calculated from multiple raw predictors.
## Null model
coxph.null <- coxph(Surv ~ 1, data = pbc)
coxph.null
Call: coxph(formula = Surv ~ 1, data = pbc)
Null model
log likelihood= -1009
n= 418
## Model with age and sex
coxph.age.sex <- coxph(Surv ~ age + sex, data = pbc)
coxph.age.sex
Call:
coxph(formula = Surv ~ age + sex, data = pbc)
coef exp(coef) se(coef) z p
age 0.0221 1.022 0.00727 3.04 0.0024
sexf -0.3000 0.741 0.20975 -1.43 0.1500
Likelihood ratio test=12 on 2 df, p=0.0025 n= 418, number of events= 186
## Model with age, sex, and albumin
coxph.age.sex.albumin <- coxph(Surv ~ age + sex + albumin, data = pbc)
coxph.age.sex.albumin
Call:
coxph(formula = Surv ~ age + sex + albumin, data = pbc)
coef exp(coef) se(coef) z p
age 0.0148 1.015 0.00749 1.97 4.9e-02
sexf -0.3467 0.707 0.21079 -1.64 1.0e-01
albumin -1.4066 0.245 0.17096 -8.23 2.2e-16
Likelihood ratio test=73.9 on 3 df, p=6.66e-16 n= 418, number of events= 186
## These models are significantly different by likelihood ratio test
anova(coxph.age.sex, coxph.age.sex.albumin, test = "LRT")
Analysis of Deviance Table
Cox model: response is Surv
Model 1: ~ age + sex
Model 2: ~ age + sex + albumin
loglik Chisq Df P(>|Chi|)
1 -1003
2 -972 61.9 1 3.7e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Put linear predictors ("lp") into pbc dataset
pbc$lp.null <- predict(coxph.null, type = "lp")
pbc$lp.age.sex <- predict(coxph.age.sex, type = "lp")
pbc$lp.age.sex.albumin <- predict(coxph.age.sex.albumin, type = "lp")
Trying to compare (test) the difference in C is not recommended.
Hmisc::rcorrcens
A larger marker value is considered to be associated with a longer survival by this function. Thus, the linear predictor (the higher, the worse) needs to be negated
library(Hmisc)
## Model with age and sex
rcorrcens(formula = Surv ~ I(-1 * lp.age.sex), data = pbc)
Somers' Rank Correlation for Censored Data Response variable:Surv
C Dxy aDxy SD Z P n
I(-1 * lp.age.sex) 0.569 0.138 0.138 0.043 3.18 0.0015 418
## Model with age, sex, and albumin
rcorrcens(formula = Surv ~ I(-1 * lp.age.sex.albumin), data = pbc)
Somers' Rank Correlation for Censored Data Response variable:Surv
C Dxy aDxy SD Z P n
I(-1 * lp.age.sex.albumin) 0.703 0.407 0.407 0.041 9.86 0 418
survival package
Actually, the summary method for coxph objects prints “Concordance” (five lines from bottom), which is the same thing as the Harrells’C, and \( R^2 \).
## Model with age and sex
summary(coxph.age.sex)
Call:
coxph(formula = Surv ~ age + sex, data = pbc)
n= 418, number of events= 186
coef exp(coef) se(coef) z Pr(>|z|)
age 0.02210 1.02234 0.00727 3.04 0.0024 **
sexf -0.29995 0.74085 0.20975 -1.43 0.1527
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.022 0.978 1.008 1.04
sexf 0.741 1.350 0.491 1.12
Concordance= 0.569 (se = 0.023 )
Rsquare= 0.028 (max possible= 0.992 )
Likelihood ratio test= 12 on 2 df, p=0.0025
Wald test = 12.2 on 2 df, p=0.0022
Score (logrank) test = 12.3 on 2 df, p=0.00214
## Model with age, sex, and albumin
summary(coxph.age.sex.albumin)
Call:
coxph(formula = Surv ~ age + sex + albumin, data = pbc)
n= 418, number of events= 186
coef exp(coef) se(coef) z Pr(>|z|)
age 0.01476 1.01487 0.00749 1.97 0.049 *
sexf -0.34672 0.70700 0.21079 -1.64 0.100 .
albumin -1.40664 0.24496 0.17096 -8.23 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.015 0.985 1.000 1.030
sexf 0.707 1.414 0.468 1.069
albumin 0.245 4.082 0.175 0.342
Concordance= 0.703 (se = 0.023 )
Rsquare= 0.162 (max possible= 0.992 )
Likelihood ratio test= 73.9 on 3 df, p=6.66e-16
Wald test = 83.1 on 3 df, p=0
Score (logrank) test = 82.2 on 3 df, p=0
CPE package
library(CPE)
## Model with age and sex
phcpe(coxph.age.sex, CPE.SE = TRUE)
$CPE
[1] 0.5724
$CPE.SE
[1] 0.02023
## Model with age, sex, and albumin
phcpe(coxph.age.sex.albumin, CPE.SE = TRUE)
$CPE
[1] 0.6615
$CPE.SE
[1] 0.01531
clinfun package
library(clinfun)
## Model with age and sex
coxphCPE(coxph.age.sex)
CPE smooth.CPE se.CPE
0.57244 0.57236 0.02023
## Model with age, sex, and albumin
coxphCPE(coxph.age.sex.albumin)
CPE smooth.CPE se.CPE
0.66150 0.66128 0.01531
This package calculates the risk-based time-dependent ROC among the study cohort.
Cases are those who died before time t (cumulative cases).
Controls are those who survived until time t (dynamic controls)
library(survivalROC)
## Define a function
fun.survivalROC <- function(lp, t) {
res <- with(pbc,
survivalROC(Stime = time,
status = event,
marker = get(lp),
predict.time = t,
method = "KM")) # KM method without smoothing
## Plot ROCs
with(res, plot(TP ~ FP, type = "l", main = sprintf("t = %.0f, AUC = %.2f", t, AUC)))
abline(a = 0, b = 1, lty = 2)
res
}
## 2 x 5 layout
layout(matrix(1:10, byrow = T, ncol = 5))
## Model with age and sex
res.survivalROC.age.sex <- lapply(1:10 * 365.25, function(t) {
fun.survivalROC(lp = "lp.age.sex", t)
})
## Model with age, sex, and albumin
res.survivalROC.age.sex.albumin <- lapply(1:10 * 365.25, function(t) {
fun.survivalROC(lp = "lp.age.sex.albumin", t)
})
risksetAUC(): This function creates risksetAUC from a survival data set.
risksetROC(): This function creates risksetROC from a survival data set.
This package calculates the incidence-based time-dependent ROC among the risk set (subpopulation) at time t.
Cases are those who died at time t (incident cases).
Controls are those who survived until time t (dynamic controls)
Use of incident cases rather than cumulative cases allows for assessment of time-dependent predictors.
1- to 10-year ROCs
library(risksetROC)
## Define a function
fun.risksetROC <- function(lp, t) {
res <- with(pbc,
risksetROC(Stime = time,
status = event,
marker = get(lp),
predict.time = t,
plot = FALSE))
## Plot ROCs
with(res, plot(TP ~ FP, type = "l", main = sprintf("t = %.0f, AUC = %.2f", t, AUC)))
abline(a = 0, b = 1, lty = 2)
res
}
## 2 x 5 layout
layout(matrix(1:10, byrow = T, ncol = 5))
## Model with age and sex
risksetROC.age.sex <- lapply(365.25 * 1:10, function(t) {
fun.risksetROC(lp = "lp.age.sex", t)
})
## Model with age, sex, and albumin
risksetROC.age.sex.albumin <- lapply(365.25 * 1:10, function(t) {
fun.risksetROC(lp = "lp.age.sex.albumin", t)
})
Up to 10-year AUCs
## 1 x 2 layout
layout(matrix(1:2, byrow = T, ncol = 2))
## Model with age and sex
risksetAUC.age.sex <- with(pbc,
risksetAUC(Stime = time,
status = event,
marker = lp.age.sex,
tmax = 10 * 365.25))
title(sprintf("t = %.0f, iAUC = %.2f", 10 * 365.25, risksetAUC.age.sex$Cindex))
## Model with age, sex, and albumin
risksetAUC.age.sex.albumin <- with(pbc,
risksetAUC(Stime = time,
status = event,
marker = lp.age.sex.albumin,
tmax = 10 * 365.25))
title(sprintf("t = %.0f, iAUC = %.2f", 10 * 365.25, risksetAUC.age.sex.albumin$Cindex))
These can calculate multiple time-dependent ROC at once, and also compute summary measures of a time-dependent AUC curve (iAUC)
These need a training dataset and a test dataset. The same data can be given to both, and it works although I am not sure if this is correct.
Functions
Time-dependent AUCs for the age sex model are calculated by various methods. (1- to 10-year AUCs)
library(survAUC)
## *Cumulative case*/dynamic control AUC by Chambless and Diao (Stat Med 2006;25:3474-3486.)
res.AUC.cd <- AUC.cd(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors,
times = 1:10 * 365.25)
res.AUC.cd$iauc
[1] 0.5895
plot(res.AUC.cd)
## *Cumulative case*/dynamic control AUC by Hung and Chiang (Can J Stat 2010;38:8-26)
res.AUC.hc <- AUC.hc(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
## lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors,
times = 1:10 * 365.25)
res.AUC.hc$iauc
[1] 0.6005
plot(res.AUC.hc)
## *Incident case* or *Cumulative case*/dynamic control AUC by Song and Zhou (Biometrics 2011;67:906-16)
res.AUC.sh <- AUC.sh(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors,
times = 1:10 * 365.25)
res.AUC.sh$iauc
[1] 0.573
plot(res.AUC.sh)
## *Cumulative case*/dynamic control AUC by Uno et al.
## (http://biostats.bepress.com/cgi/viewcontent.cgi?article=1041&context=harvardbiostat)
res.AUC.uno <- AUC.uno(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
## lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors,
times = 1:10 * 365.25)
res.AUC.uno$iauc
[1] 0.5811
plot(res.AUC.uno)
Summary measures (10 years when applicable)
## C-statistic by Uno et al. (Stat Med 2011;30:1105-1117)
res.UnoC <- UnoC(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
## lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors,
time = 10 * 365.25) # Upper limit, not time points
res.UnoC
[1] 0.5608
## C-statistic by Begg et al. (Stat Med 2000;19:1997-2014)
res.BeggC <- BeggC(Surv.rsp = pbc$Surv,
Surv.rsp.new = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors)
res.BeggC
[1] 0.5519
## Gonen and Heller’s Concordance Index for Cox PH models (Biometrika 2005;92:965-970)
res.GHCI <- GHCI(## Surv.rsp = pbc$Surv,
## Surv.rsp.new = pbc$Surv,
## lp = coxph.age.sex$linear.predictors,
lpnew = coxph.age.sex$linear.predictors)
res.GHCI
[1] 0.5717
*R2-type Coefficients for Cox models *
## O'Quigley et al. (Stat Med 2005;24:479-489)
res.OXS <- OXS(Surv.rsp = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lp0 = coxph.null$linear.predictors)
res.OXS
[1] 0.06092
## Nagelkerke (Biometrika 1991;78:691-692)
res.Nagelk <- Nagelk(Surv.rsp = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lp0 = coxph.null$linear.predictors)
res.Nagelk
[1] 0.02779
## Xu et al. (Journal of Nonparametric Statistics 1999;12:83-107)
res.XO <- XO(Surv.rsp = pbc$Surv,
lp = coxph.age.sex$linear.predictors,
lp0 = coxph.null$linear.predictors)
res.XO
[1] 0.04166
## Create numeric variable for sex
pbc$female <- as.numeric(pbc$sex == "f")
C-statistics (10-years follow up) using survC1 package
library(survC1)
## C-statistic for age sex model
unoC.age.female <- Est.Cval(mydata = pbc[,c("time","event","age","female")], tau = 10 * 365.25)
unoC.age.female$Dhat
[1] 0.5618
## C-statistic for age sex albumin model
unoC.age.female.albumin <- Est.Cval(mydata = pbc[,c("time","event","age","female","albumin")], tau = 10 * 365.25)
unoC.age.female.albumin$Dhat
[1] 0.672
Comparison of C-statistics
mydata: Input data. The 1st column should be time-to-event, and the
2nd column is event indicator (1=event, 0=censor).
covs0: A matrix that consists of a set of predictors for a base
model (Model 0)
covs1: A matrix that consists of a set of predictors for a new model
(Model 1)
tau: Truncation time. The resulting C tells how well the given
prediction model works in predicting events that occur in the
time range from 0 to ‘tau’. Note that the survival function
for the underlying censoring time distribution needs to be
positive at ‘tau’.
itr: Iteration of perturbation-resampling.
## Comaprison of C-statistics
uno.C.delta <- Inf.Cval.Delta(mydata = pbc[,c("time","event")],
covs0 = pbc[,c("age","female")], # age sex model
covs1 = pbc[,c("age","female","albumin")], # age sex albumin model
tau = 10 * 365.25, # Trucation time (max time to consider)
itr = 1000) # Iteration of perturbation-resampling
uno.C.delta
Est SE Lower95 Upper95
Model1 0.6720 0.02137 0.63009 0.7138
Model0 0.5618 0.02266 0.51736 0.6061
Delta 0.1102 0.02367 0.06384 0.1566
IDI, continous NRI, and median improvement (10-years follow up) using survIDINRI
library(survIDINRI)
res.IDI.INF <- IDI.INF(indata = pbc[,c("time","event")],
covs0 = pbc[,c("age","female")], # age sex model
covs1 = pbc[,c("age","female","albumin")], # age sex albumin model
t0 = 10 * 365.25,
npert = 300, npert.rand = NULL, seed1 = NULL, alpha = 0.05)
## M1 IDI; M2 continuous NRI; M3 median improvement
IDI.INF.OUT(res.IDI.INF)
Est. Lower Upper
M1 0.101 0.027 0.162
M2 0.254 0.015 0.453
M3 0.094 0.018 0.180
## M1 red area; M2 distance between black points; M3 distance between gray points
IDI.INF.GRAPH(res.IDI.INF)