(Nice review) Evaluating health outcomes in the presence of competing risks: a review of statistical methods and clinical applications. http://www.pubmed.org/20473207
Competing risk analysis using R: an easy guide for clinicians: http://www.nature.com/bmt/journal/v40/n4/full/1705727a.html
Regression modeling of competing risk using R: an in depth guide for clinicians: http://www.nature.com/bmt/journal/v45/n9/full/bmt2009359a.html
When the event of interest is relapse of cancer, there is a problem about how to treat non-relapse death. It can be defined as censoring, however, death prevents relapse from happening, violating the assumption those who remained and censored have the same chance of experiencing the event of interest within levels of covariates. Therefore, death is a competing risk event that eliminates the chance of having relapse, i.e., there are multiple modes of failure. Several methods play different roles in such situtations. These are the event-free survival method, the cause-specific survival method, and the cumulative incidence function method.
## Load survival
library(survival)
## Load the cmprsk package
library(cmprsk)
## Load the example data used in the BMT article
bmt <- read.csv("http://www.stat.unipg.it/luca/R/bmt.csv", sep = ";")
## Check data
head(bmt)
## dis ftime status
## 1 0 13 2
## 2 0 1 1
## 3 0 72 0
## 4 0 7 2
## 5 0 8 2
## 6 1 67 0
## dis disease 0 ALL; 1 AML
## ftime follow-up time
## status 0 censored (survival); 1 Transplant-related mortality; 2 relapse
## Label diseases
bmt$dis <- factor(bmt$dis, levels = c(0,1), labels = c("ALL", "AML"))
## Label status
bmt$status <- factor(bmt$status, levels = c(0,1,2), labels = c("Censored","Mortality","Relapse"))
## Summary
summary(bmt)
## dis ftime status
## ALL:17 Min. : 0.0 Censored :11
## AML:18 1st Qu.: 3.0 Mortality: 9
## Median : 7.0 Relapse :15
## Mean :16.3
## 3rd Qu.:18.0
## Max. :72.0
## Cross table
addmargins(xtabs(data = bmt, ~ dis + status))
## status
## dis Censored Mortality Relapse Sum
## ALL 2 3 12 17
## AML 9 6 3 18
## Sum 11 9 15 35
This method focuses on the non-occurrence of events (event-free status). It is often the outcome of interest in cancer trials, no death AND no relapse is important. In the Kaplan-Meier plot, the proportion estimates the proportion of people who have not experienced any of the endpoints (relapse or mortality). Here AML patients have a better chance of surviving without experiencing relapse or mortality. Plotted inversely (1 - EFS KM estimate), the curves are interpretable as the estimated proportions of patients experiencing one of the endpoints (relapse or mortality) over time.
## Calculate the EFS by disease
resEfsByDis <- survfit(formula = Surv(ftime, status != "Censored") ~ dis,
data = bmt,
type = "kaplan-meier",
error = "greenwood",
conf.type = "log-log")
## Numerical results
summary(resEfsByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status != "Censored") ~ dis, data = bmt,
## type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
##
## dis=ALL
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 17 2 0.882 0.0781 0.6060 0.969
## 5 11 5 0.588 0.1194 0.3254 0.778
## 10 7 3 0.412 0.1194 0.1858 0.626
## 15 4 3 0.235 0.1029 0.0731 0.449
## 20 4 0 0.235 0.1029 0.0731 0.449
## 30 2 2 0.118 0.0781 0.0196 0.312
## 40 1 0 0.118 0.0781 0.0196 0.312
## 50 1 0 0.118 0.0781 0.0196 0.312
## 60 1 0 0.118 0.0781 0.0196 0.312
##
## dis=AML
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 18 0 1.000 0.000 1.000 1.000
## 5 10 6 0.649 0.116 0.380 0.825
## 10 6 3 0.426 0.131 0.178 0.656
## 15 5 0 0.426 0.131 0.178 0.656
## 20 5 0 0.426 0.131 0.178 0.656
## 30 5 0 0.426 0.131 0.178 0.656
## 40 3 0 0.426 0.131 0.178 0.656
## 50 3 0 0.426 0.131 0.178 0.656
## 60 3 0 0.426 0.131 0.178 0.656
## Plot Kaplan-Meier
rms::survplot(resEfsByDis)
## Plot 1 - EFS KM
rms::survplot(resEfsByDis, fun = function(x) {1 - x})
## Regression
resEfsCoxByDis <- coxph(formula = Surv(ftime, status != "Censored") ~ dis,
data = bmt,
ties = c("efron","breslow","exact")[1])
summary(resEfsCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status != "Censored") ~ dis, data = bmt,
## ties = c("efron", "breslow", "exact")[1])
##
## n= 35, number of events= 24
##
## coef exp(coef) se(coef) z Pr(>|z|)
## disAML -0.565 0.569 0.423 -1.33 0.18
##
## exp(coef) exp(-coef) lower .95 upper .95
## disAML 0.569 1.76 0.248 1.3
##
## Concordance= 0.56 (se = 0.06 )
## Rsquare= 0.051 (max possible= 0.983 )
## Likelihood ratio test= 1.84 on 1 df, p=0.175
## Wald test = 1.78 on 1 df, p=0.182
## Score (logrank) test = 1.82 on 1 df, p=0.177
This method looks at each event type separately. This describes what would have happened in the complete absence of the other event type. This is appropriate if the purpose is the inference about the biological relationship between the event of interest and covariates. For description, it will overestimate the incidence because in the absence of the competing risk event, which this method hypothesizes, the event of interest has more chance to occur. The 1 - CSH KM should also be interpreted cautiously. It represents the occurrence of the event in the hypothetical situation in which the other event type is absent. Thus, the event rate is overestimated compared to the reality, in which occurrence of the other event prevents the event of interest from happening as much as it would otherwise.
\( F_{CSH k}(t) = \int_0^t \lambda_k(u) S_k(u) du \)
\( S_k(u) \) here is the cause-specific survival, i.e. the proportion free of the event type \( k \).
Relapse (Event of interest)
## Calculate the overall CSH for relapse
resCshRelByDis <- survfit(formula = Surv(ftime, status == "Relapse") ~ dis,
data = bmt,
type = "kaplan-meier",
error = "greenwood",
conf.type = "log-log")
## Numerical results
summary(resCshRelByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status == "Relapse") ~ dis, data = bmt,
## type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
##
## dis=ALL
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 17 1 0.941 0.0571 0.6502 0.991
## 5 11 4 0.676 0.1200 0.3859 0.851
## 10 7 3 0.473 0.1290 0.2168 0.693
## 15 4 2 0.315 0.1252 0.1038 0.555
## 20 4 0 0.315 0.1252 0.1038 0.555
## 30 2 2 0.158 0.1007 0.0263 0.391
## 40 1 0 0.158 0.1007 0.0263 0.391
## 50 1 0 0.158 0.1007 0.0263 0.391
## 60 1 0 0.158 0.1007 0.0263 0.391
##
## dis=AML
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 18 0 1.000 0.000 1.000 1.000
## 5 10 1 0.944 0.054 0.666 0.992
## 10 6 2 0.708 0.155 0.303 0.905
## 15 5 0 0.708 0.155 0.303 0.905
## 20 5 0 0.708 0.155 0.303 0.905
## 30 5 0 0.708 0.155 0.303 0.905
## 40 3 0 0.708 0.155 0.303 0.905
## 50 3 0 0.708 0.155 0.303 0.905
## 60 3 0 0.708 0.155 0.303 0.905
## Plot Kaplan-Meier
rms::survplot(resCshRelByDis)
## Plot 1 - CSH KM
rms::survplot(resCshRelByDis, fun = function(x) {1 - x})
## Regression
resCshRelCoxByDis <- coxph(formula = Surv(ftime, status == "Relapse") ~ dis,
data = bmt,
ties = c("efron","breslow","exact")[1])
summary(resCshRelCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status == "Relapse") ~ dis, data = bmt,
## ties = c("efron", "breslow", "exact")[1])
##
## n= 35, number of events= 15
##
## coef exp(coef) se(coef) z Pr(>|z|)
## disAML -1.449 0.235 0.648 -2.24 0.025 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## disAML 0.235 4.26 0.066 0.836
##
## Concordance= 0.659 (se = 0.075 )
## Rsquare= 0.165 (max possible= 0.915 )
## Likelihood ratio test= 6.3 on 1 df, p=0.0121
## Wald test = 5 on 1 df, p=0.0253
## Score (logrank) test = 5.91 on 1 df, p=0.015
Mortality (Competing risk event)
## Calculate the overall CSH for mortality
resCshMortByDis <- survfit(formula = Surv(ftime, status == "Mortality") ~ dis,
data = bmt,
type = "kaplan-meier",
error = "greenwood",
conf.type = "log-log")
## Numerical results
summary(resCshMortByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status == "Mortality") ~ dis, data = bmt,
## type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
##
## dis=ALL
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 17 1 0.941 0.0571 0.650 0.991
## 5 11 1 0.878 0.0807 0.595 0.968
## 10 7 0 0.878 0.0807 0.595 0.968
## 15 4 1 0.753 0.1352 0.375 0.921
## 20 4 0 0.753 0.1352 0.375 0.921
## 30 2 0 0.753 0.1352 0.375 0.921
## 40 1 0 0.753 0.1352 0.375 0.921
## 50 1 0 0.753 0.1352 0.375 0.921
## 60 1 0 0.753 0.1352 0.375 0.921
##
## dis=AML
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 18 0 1.000 0.000 1.000 1.000
## 5 10 5 0.688 0.116 0.405 0.856
## 10 6 1 0.602 0.129 0.312 0.801
## 15 5 0 0.602 0.129 0.312 0.801
## 20 5 0 0.602 0.129 0.312 0.801
## 30 5 0 0.602 0.129 0.312 0.801
## 40 3 0 0.602 0.129 0.312 0.801
## 50 3 0 0.602 0.129 0.312 0.801
## 60 3 0 0.602 0.129 0.312 0.801
## Plot Kaplan-Meier
rms::survplot(resCshMortByDis)
## Plot 1 - CSH KM
rms::survplot(resCshMortByDis, fun = function(x) {1 - x})
## Regression
resCshMortCoxByDis <- coxph(formula = Surv(ftime, status == "Mortality") ~ dis,
data = bmt,
ties = c("efron","breslow","exact")[1])
summary(resCshMortCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status == "Mortality") ~ dis, data = bmt,
## ties = c("efron", "breslow", "exact")[1])
##
## n= 35, number of events= 9
##
## coef exp(coef) se(coef) z Pr(>|z|)
## disAML 0.665 1.944 0.709 0.94 0.35
##
## exp(coef) exp(-coef) lower .95 upper .95
## disAML 1.94 0.514 0.484 7.81
##
## Concordance= 0.571 (se = 0.093 )
## Rsquare= 0.026 (max possible= 0.807 )
## Likelihood ratio test= 0.93 on 1 df, p=0.335
## Wald test = 0.88 on 1 df, p=0.349
## Score (logrank) test = 0.91 on 1 df, p=0.34
This approach describes the estimated cumulative incidence (risk over time) of each event type in the presence of the other event types. That is, the risk set attrition due to the occurrence of the competing risk is accounted for when estimating the occurrence of the event of interest. The cumulative incidence function for the event type \( k \) is mathematically represented as follows.
\( F_k(t) = \int_0^t \lambda_k(u) S(u) du = \int_0^t f_k(u) du \)
\( S_(u) \) here is the overall survival, i.e. the proportion free of any of the event types.
This is the integration of the event-specific density function \( f_k(\cdot) \). The multiplication by the event-free survival function \( S(t) \) accounts for the risk set attrition by all event types. This in a way “adjusts” the risk set size to the actual size. In the cause-specific hazard analysis, on the other hand, the risk set size is not “adjusted” for attrition to other event types.
Relapse
## Calculate the grouped cumulative incidence functions (CIF)
resCumIncByDis <- cuminc(ftime = bmt$ftime, # failure time variable
fstatus = bmt$status, # variable with distinct codes for different causes of failure
group = bmt$dis, # estimates will calculated within groups
## strata = , # Tests will be stratified on this variable.
rho = 0, # Power of the weight function used in the tests.
cencode = "Censored", # value of fstatus variable which indicates the failure time is censored.
## subset = ,
## na.action = na.omit
)
## CIF and the variance of point estimates
timepoints(resCumIncByDis, times = c(0,5,10,15,20,30,40,50,60))
## $est
## 0 5 10 15 20 30 40 50 60
## ALL Mortality 0.05882 0.11765 0.1176 0.1765 0.1765 0.1765 0.1765 0.1765 0.1765
## AML Mortality 0.00000 0.29514 0.3682 0.3682 0.3682 0.3682 0.3682 0.3682 0.3682
## ALL Relapse 0.05882 0.29412 0.4706 0.5882 0.5882 0.7059 0.7059 0.7059 0.7059
## AML Relapse 0.00000 0.05556 0.2057 0.2057 0.2057 0.2057 0.2057 0.2057 0.2057
##
## $var
## 0 5 10 15 20 30 40 50 60
## ALL Mortality 0.00346 0.006490 0.00649 0.009344 0.009344 0.009344 0.009344 0.009344 0.009344
## AML Mortality 0.00000 0.013175 0.01601 0.016011 0.016011 0.016011 0.016011 0.016011 0.016011
## ALL Relapse 0.00346 0.013061 0.01595 0.016046 0.016046 0.014585 0.014585 0.014585 0.014585
## AML Relapse 0.00000 0.003086 0.01336 0.013358 0.013358 0.013358 0.013358 0.013358 0.013358
## Plot
plot(resCumIncByDis)
## Regression (crr can only take a covariate matrix)
bmtDisMat <- matrix(as.numeric(bmt$dis == "AML"))
colnames(bmtDisMat) <- "dis"
resCrrRelByDis <- crr(ftime = bmt$ftime, # vector of failure/censoring times
fstatus = bmt$status, # vector with a unique code for each failure type and censoring
cov1 = bmtDisMat, # matrix (nobs x ncovs) of fixed covariates
## cov2 = , # matrix of covariates that will be multiplied by functions of time
## tf = , # functions of time
## cengroup = , # vector with different values for each group with a distinct censoring distribution
failcode = "Relapse", # code of fstatus that denotes the failure type of interest
cencode = "Censored" # code of fstatus that denotes censored observations
)
summary(resCrrRelByDis)
## Competing Risks Regression
##
## Call:
## crr(ftime = bmt$ftime, fstatus = bmt$status, cov1 = bmtDisMat,
## failcode = "Relapse", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## dis -1.6 0.201 0.64 -2.51 0.012
##
## exp(coef) exp(-coef) 2.5% 97.5%
## dis 0.201 4.98 0.0573 0.705
##
## Num. cases = 35
## Pseudo Log-likelihood = -43.9
## Pseudo likelihood ratio test = 7.89 on 1 df,
Mortality
## cuminc() part is the same (omitted).
## Regression (crr can only take an n*p covariate matrix)
resCrrMortByDis <- crr(ftime = bmt$ftime, # vector of failure/censoring times
fstatus = bmt$status, # vector with a unique code for each failure type and censoring
cov1 = bmtDisMat, # matrix (nobs x ncovs) of fixed covariates
## cov2 = , # matrix of covariates that will be multiplied by functions of time
## tf = , # functions of time
## cengroup = , # vector with different values for each group with a distinct censoring distribution
failcode = "Mortality", # code of fstatus that denotes the failure type of interest
cencode = "Censored" # code of fstatus that denotes censored observations
)
summary(resCrrMortByDis)
## Competing Risks Regression
##
## Call:
## crr(ftime = bmt$ftime, fstatus = bmt$status, cov1 = bmtDisMat,
## failcode = "Mortality", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## dis 0.765 2.15 0.699 1.09 0.27
##
## exp(coef) exp(-coef) 2.5% 97.5%
## dis 2.15 0.465 0.546 8.46
##
## Num. cases = 35
## Pseudo Log-likelihood = -29.9
## Pseudo likelihood ratio test = 1.23 on 1 df,
Here the CIFs and 1 - CSH KM are plotted together. The 1 - CSH KM (ticked lines) are all overestimating the occurrence over time. Both are actually useful if what they represent is understood.
CIF describes the occurrence of each event type in the presence of the other(s), i.e., it describes what happened in the data. This is “truthful” to the data. However, the estimation only applies to the data it was derived from and data with the same competing risk event structure.
1 - CSH KM describes the occurence of one type of event in the absence of the other(s), i.e., it describes what whould have happened if there had not been any competing risk events. This is hypothetical, but it can potentially be compared across data with different competing risk structures.
## Plot CIFs
plot(resCumIncByDis, col = 1:4, lwd = 3, lty = 1, xlab = "")
## Add lines from the 1 - KM (Relapse)
lines(resCshRelByDis, lwd = 1.5, col = c("green","blue"), fun = function(x) {1 - x})
## Add lines from the 1 - KM (Mortality)
lines(resCshMortByDis, lwd = 1.5, col = c("black","red"), fun = function(x) {1 - x})
The results of regression analysis need careful interpretation. There are hazard ratios for the AML compared to ALL from different models.
EFS 0.57 means that the overall event-free survival is better for the AML patients by 0.57 times than for the ALL patients.
CSH_Rel 0.23 means that in the ideal world without non-relapse mortality, the AML patients are 0.23 times less likely to have relapses and are better off than the ALL patients.
CSH_Mort 1.94 means that in the ideal world without relapse, the AML patients are 1.94 times more likely to have non-relapse mortality and are worse off than the ALL patients.
CIF_Rel 0.20 means that in the presence of non-relapse mortality, the AML patients are 0.20 times less likely to have relapses. This is lower than the CSH_Rel estimate. The AML patients have higher attrition of the risk set to non-relapse mortality, and because they are already dead, they are not able to have relapses.
CIF_Mort 2.15 means that in the presence of relapse, the AML patients are 2.15 times more likely to have non-relapse mortality. This is higher than the CSH_Rel estimate. The AML patients have lower attrition of the risk set to relapse, and because they remain relapse-free, they can have more non-relapse mortality.
In this specific case, the effect of having AML rather than ALL is in the opposite directions (fewer relapses, but more non-relapse mortality). In such a stituation, the cumulative incidence function method has an “exaggerated” effect. This is because beneficial effect on one end point (relapse) makes more people available for the other endpoint (non-relapse mortality). On the other hand, the effect on the event-free survival will be mixture of positive effect and negative effect, and will have less power. The directions of pure effects in the ideal world are most easily captured by the CSH method.
listOfRes <- list(EFS = summary(resEfsCoxByDis)$conf.int ,
CSH_Rel = summary(resCshRelCoxByDis)$conf.int,
CSH_Mort = summary(resCshMortCoxByDis)$conf.int,
CIF_Rel = summary(resCrrRelByDis)$conf.int,
CIF_Mort = summary(resCrrMortByDis)$conf.int
)
matOfRes <- do.call(rbind, listOfRes)
rownames(matOfRes) <- names(listOfRes)
round(matOfRes[,c(1,3,4)], 2)
## exp(coef) lower .95 upper .95
## EFS 0.57 0.25 1.30
## CSH_Rel 0.23 0.07 0.84
## CSH_Mort 1.94 0.48 7.81
## CIF_Rel 0.20 0.06 0.71
## CIF_Mort 2.15 0.55 8.46
## Load the example data used in the BMT article
bmtcrr <- read.csv("http://www.stat.unipg.it/luca/R/bmtcrr.csv")
## Check data
head(bmtcrr)
## Sex D Phase Age Status Source ftime
## 1 M ALL Relapse 48 2 BM+PB 0.67
## 2 F AML CR2 23 1 BM+PB 9.50
## 3 M ALL CR3 7 0 BM+PB 131.77
## 4 F ALL CR2 26 2 BM+PB 24.03
## 5 F ALL CR2 36 2 BM+PB 1.47
## 6 M ALL Relapse 17 2 BM+PB 2.23
## Sex: F or M
## D: ALL or AML
## Phase: CR1 CR2 CR3 Relapse
## Age:
## Status: 0 Censored 1 Relapse 2 Transplant-related death
## Source: BM+PB PB
## ftime:
## Label status
bmtcrr$Status <- factor(bmtcrr$Status, levels = 0:2, labels = c("Censored", "Relapse", "Mortality"))
## Make males reference
bmtcrr$Sex <- relevel(bmtcrr$Sex, ref = "M")
## Make relapse phase reference
bmtcrr$Phase <- relevel(bmtcrr$Phase, ref = "Relapse")
## Summary
summary(bmtcrr)
## Sex D Phase Age Status Source ftime
## M:100 ALL: 73 Relapse:73 Min. : 4.0 Censored :46 BM+PB: 21 Min. : 0.13
## F: 77 AML:104 CR1 :47 1st Qu.:20.0 Relapse :56 PB :156 1st Qu.: 3.23
## CR2 :45 Median :29.0 Mortality:75 Median : 6.63
## CR3 :12 Mean :30.5 Mean : 20.28
## 3rd Qu.:40.0 3rd Qu.: 19.90
## Max. :62.0 Max. :131.77
## Cross table
addmargins(xtabs(~ D + Status, data = bmtcrr))
## Status
## D Censored Relapse Mortality Sum
## ALL 17 28 28 73
## AML 29 28 47 104
## Sum 46 56 75 177
This method focuses on the non-occurrence of events (event-free status). It is often the outcome of interest in cancer trials, no death AND no relapse is important. In the Kaplan-Meier plot, the proportion estimates the proportion of people who have not experienced any of the endpoints (relapse or mortality). Here AML patients have a better chance of surviving without experiencing relapse or mortality. Plotted inversely (1 - EFS KM estimate), the curves are interpretable as the estimated proportions of patients experiencing one of the endpoints (relapse or mortality) over time.
resEfsCox <- coxph(formula = Surv(ftime, Status != "Censored") ~ D + Age + Sex + Phase + Source,
data = bmtcrr,
ties = c("efron","breslow","exact")[1])
summary(resEfsCox)
## Call:
## coxph(formula = Surv(ftime, Status != "Censored") ~ D + Age +
## Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow",
## "exact")[1])
##
## n= 177, number of events= 131
##
## coef exp(coef) se(coef) z Pr(>|z|)
## DAML -0.41131 0.66278 0.20051 -2.05 0.040 *
## Age 0.01382 1.01392 0.00757 1.83 0.068 .
## SexF 0.40520 1.49960 0.17998 2.25 0.024 *
## PhaseCR1 -1.19315 0.30326 0.24163 -4.94 0.00000079 ***
## PhaseCR2 -0.90845 0.40315 0.22584 -4.02 0.00005757 ***
## PhaseCR3 -0.89783 0.40745 0.39992 -2.25 0.025 *
## SourcePB -0.57481 0.56281 0.29616 -1.94 0.052 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## DAML 0.663 1.509 0.447 0.982
## Age 1.014 0.986 0.999 1.029
## SexF 1.500 0.667 1.054 2.134
## PhaseCR1 0.303 3.297 0.189 0.487
## PhaseCR2 0.403 2.480 0.259 0.628
## PhaseCR3 0.407 2.454 0.186 0.892
## SourcePB 0.563 1.777 0.315 1.006
##
## Concordance= 0.653 (se = 0.027 )
## Rsquare= 0.191 (max possible= 0.999 )
## Likelihood ratio test= 37.5 on 7 df, p=0.00000372
## Wald test = 36.2 on 7 df, p=0.00000662
## Score (logrank) test = 38.3 on 7 df, p=0.00000264
This method looks at each event type separately. This describes what would have happened in the complete absence of the other event type. This is appropriate if the purpose is the inference about the biological relationship between the event of interest and covariates. For description, it will overestimate the incidence because in the absence of the competing risk event, which this method hypothesizes, the event of interest has more chance to occur. The 1 - CSH KM should also be interpreted cautiously. It represents the occurrence of the event in the hypothetical situation in which the other event type is absent. Thus, the event rate is overestimated compared to the reality, in which occurrence of the other event prevents the event of interest from happening as much as it would otherwise.
\( F_{CSH k}(t) = \int_0^t \lambda_k(u) S_k(u) du \)
\( S_k(u) \) here is the cause-specific survival, i.e. the proportion free of the event type \( k \).
Relapse
## Regression
resCshRelCox <- coxph(formula = Surv(ftime, Status == "Relapse") ~ D + Age + Sex + Phase + Source,
data = bmtcrr,
ties = c("efron","breslow","exact")[1])
summary(resCshRelCox)
## Call:
## coxph(formula = Surv(ftime, Status == "Relapse") ~ D + Age +
## Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow",
## "exact")[1])
##
## n= 177, number of events= 56
##
## coef exp(coef) se(coef) z Pr(>|z|)
## DAML -0.65808 0.51784 0.29863 -2.20 0.02755 *
## Age -0.00671 0.99331 0.01214 -0.55 0.58004
## SexF 0.38126 1.46412 0.28463 1.34 0.18042
## PhaseCR1 -1.51583 0.21963 0.39646 -3.82 0.00013 ***
## PhaseCR2 -1.32709 0.26525 0.36472 -3.64 0.00027 ***
## PhaseCR3 -1.00372 0.36651 0.64515 -1.56 0.11976
## SourcePB 0.38109 1.46388 0.57695 0.66 0.50892
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## DAML 0.518 1.931 0.288 0.930
## Age 0.993 1.007 0.970 1.017
## SexF 1.464 0.683 0.838 2.558
## PhaseCR1 0.220 4.553 0.101 0.478
## PhaseCR2 0.265 3.770 0.130 0.542
## PhaseCR3 0.367 2.728 0.103 1.298
## SourcePB 1.464 0.683 0.473 4.535
##
## Concordance= 0.724 (se = 0.042 )
## Rsquare= 0.162 (max possible= 0.945 )
## Likelihood ratio test= 31.3 on 7 df, p=0.0000551
## Wald test = 30.2 on 7 df, p=0.0000888
## Score (logrank) test = 33.8 on 7 df, p=0.0000189
Mortality
## Regression
resCshMortCox <- coxph(formula = Surv(ftime, Status == "Mortality") ~ D + Age + Sex + Phase + Source,
data = bmtcrr,
ties = c("efron","breslow","exact")[1])
summary(resCshMortCox)
## Call:
## coxph(formula = Surv(ftime, Status == "Mortality") ~ D + Age +
## Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow",
## "exact")[1])
##
## n= 177, number of events= 75
##
## coef exp(coef) se(coef) z Pr(>|z|)
## DAML -0.1741 0.8402 0.2744 -0.63 0.5259
## Age 0.0291 1.0296 0.0100 2.90 0.0037 **
## SexF 0.4384 1.5502 0.2351 1.86 0.0622 .
## PhaseCR1 -0.8856 0.4125 0.3107 -2.85 0.0044 **
## PhaseCR2 -0.5934 0.5524 0.2956 -2.01 0.0447 *
## PhaseCR3 -0.6472 0.5235 0.5126 -1.26 0.2067
## SourcePB -1.1347 0.3215 0.3575 -3.17 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## DAML 0.840 1.190 0.491 1.439
## Age 1.030 0.971 1.010 1.050
## SexF 1.550 0.645 0.978 2.457
## PhaseCR1 0.412 2.424 0.224 0.758
## PhaseCR2 0.552 1.810 0.310 0.986
## PhaseCR3 0.523 1.910 0.192 1.430
## SourcePB 0.322 3.110 0.160 0.648
##
## Concordance= 0.645 (se = 0.036 )
## Rsquare= 0.124 (max possible= 0.981 )
## Likelihood ratio test= 23.4 on 7 df, p=0.00145
## Wald test = 23.6 on 7 df, p=0.00134
## Score (logrank) test = 24.5 on 7 df, p=0.000934
This approach describes the estimated cumulative incidence (risk over time) of each event type in the presence of the other event types. That is, the risk set attrition due to the occurrence of the competing risk is accounted for when estimating the occurrence of the event of interest. The cumulative incidence function for the event type \( k \) is mathematically represented as follows.
\( F_k(t) = \int_0^t \lambda_k(u) S(u) du = \int_0^t f_k(u) du \)
\( S_(u) \) here is the overall survival, i.e. the proportion free of any of the event types.
This is the integration of the event-specific density function \( f_k(\cdot) \). The multiplication by the event-free survival function \( S(t) \) accounts for the risk set attrition by all event types. This in a way “adjusts” the risk set size to the actual size. In the cause-specific hazard analysis, on the other hand, the risk set size is not “adjusted” for attrition to other event types.
Relapse
## Mode matrix creation
bmtcrrMat <- model.matrix(object = ~ D + Age + Sex + Phase + Source, data = bmtcrr)
bmtcrrMat <- bmtcrrMat[,-1]
head(bmtcrrMat)
## DAML Age SexF PhaseCR1 PhaseCR2 PhaseCR3 SourcePB
## 1 0 48 0 0 0 0 0
## 2 1 23 1 0 1 0 0
## 3 0 7 0 0 0 1 0
## 4 0 26 1 0 1 0 0
## 5 0 36 1 0 1 0 0
## 6 0 17 0 0 0 0 0
## Fit
resCrrRel <- crr(ftime = bmtcrr$ftime, # vector of failure/censoring times
fstatus = bmtcrr$Status, # vector with a unique code for each failure type and censoring
cov1 = bmtcrrMat, # matrix (nobs x ncovs) of fixed covariates
## cov2 = , # matrix of covariates that will be multiplied by functions of time
## tf = , # functions of time
## cengroup = , # vector with different values for each group with a distinct censoring distribution
failcode = "Relapse", # code of fstatus that denotes the failure type of interest
cencode = "Censored" # code of fstatus that denotes censored observations
)
summary(resCrrRel)
## Competing Risks Regression
##
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat,
## failcode = "Relapse", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## DAML -0.4723 0.624 0.3054 -1.547 0.1200
## Age -0.0185 0.982 0.0119 -1.554 0.1200
## SexF -0.0352 0.965 0.2900 -0.122 0.9000
## PhaseCR1 -1.1018 0.332 0.3764 -2.927 0.0034
## PhaseCR2 -1.0200 0.361 0.3558 -2.867 0.0041
## PhaseCR3 -0.7314 0.481 0.5766 -1.268 0.2000
## SourcePB 0.9211 2.512 0.5530 1.666 0.0960
##
## exp(coef) exp(-coef) 2.5% 97.5%
## DAML 0.624 1.604 0.343 1.134
## Age 0.982 1.019 0.959 1.005
## SexF 0.965 1.036 0.547 1.704
## PhaseCR1 0.332 3.009 0.159 0.695
## PhaseCR2 0.361 2.773 0.180 0.724
## PhaseCR3 0.481 2.078 0.155 1.490
## SourcePB 2.512 0.398 0.850 7.426
##
## Num. cases = 177
## Pseudo Log-likelihood = -267
## Pseudo likelihood ratio test = 24.4 on 7 df,
Mortality
## Regression (crr can only take an n*p covariate matrix)
resCrrMort <- crr(ftime = bmtcrr$ftime, # vector of failure/censoring times
fstatus = bmtcrr$Status, # vector with a unique code for each failure type and censoring
cov1 = bmtcrrMat, # matrix (nobs x ncovs) of fixed covariates
## cov2 = , # matrix of covariates that will be multiplied by functions of time
## tf = , # functions of time
## cengroup = , # vector with different values for each group with a distinct censoring distribution
failcode = "Mortality", # code of fstatus that denotes the failure type of interest
cencode = "Censored" # code of fstatus that denotes censored observations
)
summary(resCrrMort)
## Competing Risks Regression
##
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat,
## failcode = "Mortality", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## DAML 0.00226 1.002 0.2833 0.00799 0.9900
## Age 0.03074 1.031 0.0104 2.95247 0.0032
## SexF 0.27386 1.315 0.2376 1.15241 0.2500
## PhaseCR1 -0.37745 0.686 0.3049 -1.23779 0.2200
## PhaseCR2 -0.12735 0.880 0.2772 -0.45944 0.6500
## PhaseCR3 -0.25448 0.775 0.5997 -0.42433 0.6700
## SourcePB -1.16041 0.313 0.3799 -3.05432 0.0023
##
## exp(coef) exp(-coef) 2.5% 97.5%
## DAML 1.002 0.998 0.575 1.75
## Age 1.031 0.970 1.010 1.05
## SexF 1.315 0.760 0.825 2.10
## PhaseCR1 0.686 1.459 0.377 1.25
## PhaseCR2 0.880 1.136 0.511 1.52
## PhaseCR3 0.775 1.290 0.239 2.51
## SourcePB 0.313 3.191 0.149 0.66
##
## Num. cases = 177
## Pseudo Log-likelihood = -360
## Pseudo likelihood ratio test = 18.2 on 7 df,
The results of regression analysis need careful interpretation. There are hazard ratios for the AML compared to ALL from different models after adjusting for the age, sex, phase at which transplant was performed, and source of cells.
EFS 0.66 means that the overall event-free survival is better for the AML patients by 0.66 times than for the ALL patients within levels of baseline confounders. This is often considered the most relevant result in oncology clinical trials.
CSH_Rel 0.52 means that in the ideal world without non-relapse mortality, the AML patients are 0.52 times less likely to have relapses and are better off than the ALL patients within levels of baseline confounders.
CSH_Mort 0.84 means that in the ideal world without relapse, the AML patients are 0.84 times less likely to have non-relapse mortality and are better off than the ALL patients within levels of baseline confounders.
CIF_Rel 0.62 means that in the presence of non-relapse mortality, the AML patients are 0.62 times less likely to have relapses within levels of baseline confounders. This is higher than the CSH_Rel estimate (diminished protective effect). The AML patients have lower attrition of the risk set to non-relapse mortality, thus, they are more “available” for relapse. Although the AML patients are less likely to have relapse (CSH_Rel 0.52), this effect is diminished by this increased availability.
CIF_Mort 1.00 means that in the presence of relapse, the AML patients similarly likely to have non-relapse mortality within levels of baseline confounders. This is qualitatively different compared the CSH_Mort estimate (diminished protective effect). The AML patients have lower attrition of the risk set to relapse, and because they remain relapse-free, they can have more non-relapse mortality. This will diminish the protective effect on non-relapse mortality.
In this specific case, the effect of having AML rather than ALL is in the same directions (both protective: fewer relapses, fewer non-relapse mortality). In such a stituation, the cumulative incidence function method has a “diminished” effect. This is because beneficial effect on one end point (relapse) makes more people available for the other endpoint (non-relapse mortality). Thus, the protective effect on the other endpoint is partially overshadowed. The same is true in a “harmful for both” situation. On the other hand, the effect on the event-free survival will be mixture of positive effects, and will have more power. The directions of pure effects in the ideal world are most easily captured by the CSH method.
## Combine as a list
listOfRes <- list(EFS = summary(resEfsCox)$conf.int,
CSH_Rel = summary(resCshRelCox)$conf.int,
CSH_Mort = summary(resCshMortCox)$conf.int,
CIF_Rel = summary(resCrrRel)$conf.int,
CIF_Mort = summary(resCrrMort)$conf.int
)
## Format
listOfVecs <- sapply(listOfRes,
simplify = FALSE,
FUN = function(MAT) {
sprintf(fmt = "%.2f [%.2f,%.2f] ",
MAT[,1], MAT[,3], MAT[,4])
})
## Name and show
matOfRes <- do.call(cbind, listOfVecs)
rownames(matOfRes) <- rownames(summary(resEfsCox)$conf.int)
noquote(matOfRes)
## EFS CSH_Rel CSH_Mort CIF_Rel CIF_Mort
## DAML 0.66 [0.45,0.98] 0.52 [0.29,0.93] 0.84 [0.49,1.44] 0.62 [0.34,1.13] 1.00 [0.58,1.75]
## Age 1.01 [1.00,1.03] 0.99 [0.97,1.02] 1.03 [1.01,1.05] 0.98 [0.96,1.00] 1.03 [1.01,1.05]
## SexF 1.50 [1.05,2.13] 1.46 [0.84,2.56] 1.55 [0.98,2.46] 0.97 [0.55,1.70] 1.32 [0.83,2.10]
## PhaseCR1 0.30 [0.19,0.49] 0.22 [0.10,0.48] 0.41 [0.22,0.76] 0.33 [0.16,0.69] 0.69 [0.38,1.25]
## PhaseCR2 0.40 [0.26,0.63] 0.27 [0.13,0.54] 0.55 [0.31,0.99] 0.36 [0.18,0.72] 0.88 [0.51,1.52]
## PhaseCR3 0.41 [0.19,0.89] 0.37 [0.10,1.30] 0.52 [0.19,1.43] 0.48 [0.16,1.49] 0.78 [0.24,2.51]
## SourcePB 0.56 [0.31,1.01] 1.46 [0.47,4.54] 0.32 [0.16,0.65] 2.51 [0.85,7.43] 0.31 [0.15,0.66]
The censoring code is given within the Hist() function. The event of interest is given to the cause option within the FGR() function. The familiar formula interface is available. Missingness is handled by complete case analysis.
## Load function
library(riskRegression)
## crr for relapse
resFgrRel <- FGR(Hist(ftime, Status, cens.code = "Censored") ~ D + Age + Sex + Phase + Source, data = bmtcrr, cause = "Relapse")
list(FGR = resFgrRel,
crr = summary(resCrrRel))
## $FGR
##
## Right-censored response of a competing.risks model
##
## No.Observations: 177
##
## Pattern:
##
## Cause event right.censored
## Relapse 56 0
## Mortality 75 0
## unknown 0 46
##
##
## Fine-Gray model: analysis of cause 1
##
## Competing Risks Regression
##
## Call:
## FGR(formula = Hist(ftime, Status, cens.code = "Censored") ~ D +
## Age + Sex + Phase + Source, data = bmtcrr, cause = "Relapse")
##
## coef exp(coef) se(coef) z p-value
## D:AML -0.4723 0.624 0.3054 -1.547 0.1200
## Age -0.0185 0.982 0.0119 -1.554 0.1200
## Sex:F -0.0352 0.965 0.2900 -0.122 0.9000
## Phase:CR1 -1.1018 0.332 0.3764 -2.927 0.0034
## Phase:CR2 -1.0200 0.361 0.3558 -2.867 0.0041
## Phase:CR3 -0.7314 0.481 0.5766 -1.268 0.2000
## Source:PB 0.9211 2.512 0.5530 1.666 0.0960
##
## exp(coef) exp(-coef) 2.5% 97.5%
## D:AML 0.624 1.604 0.343 1.134
## Age 0.982 1.019 0.959 1.005
## Sex:F 0.965 1.036 0.547 1.704
## Phase:CR1 0.332 3.009 0.159 0.695
## Phase:CR2 0.361 2.773 0.180 0.724
## Phase:CR3 0.481 2.078 0.155 1.490
## Source:PB 2.512 0.398 0.850 7.426
##
## Num. cases = 177
## Pseudo Log-likelihood = -267
## Pseudo likelihood ratio test = 24.4 on 7 df,
##
## Convergence: TRUE
##
##
## $crr
## Competing Risks Regression
##
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat,
## failcode = "Relapse", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## DAML -0.4723 0.624 0.3054 -1.547 0.1200
## Age -0.0185 0.982 0.0119 -1.554 0.1200
## SexF -0.0352 0.965 0.2900 -0.122 0.9000
## PhaseCR1 -1.1018 0.332 0.3764 -2.927 0.0034
## PhaseCR2 -1.0200 0.361 0.3558 -2.867 0.0041
## PhaseCR3 -0.7314 0.481 0.5766 -1.268 0.2000
## SourcePB 0.9211 2.512 0.5530 1.666 0.0960
##
## exp(coef) exp(-coef) 2.5% 97.5%
## DAML 0.624 1.604 0.343 1.134
## Age 0.982 1.019 0.959 1.005
## SexF 0.965 1.036 0.547 1.704
## PhaseCR1 0.332 3.009 0.159 0.695
## PhaseCR2 0.361 2.773 0.180 0.724
## PhaseCR3 0.481 2.078 0.155 1.490
## SourcePB 2.512 0.398 0.850 7.426
##
## Num. cases = 177
## Pseudo Log-likelihood = -267
## Pseudo likelihood ratio test = 24.4 on 7 df,
## crr for mortality
resFgrMort <- FGR(Hist(ftime, Status, cens.code = "Censored") ~ D + Age + Sex + Phase + Source, data = bmtcrr, cause = "Mortality")
list(FGR = resFgrMort,
crr = summary(resCrrMort))
## $FGR
##
## Right-censored response of a competing.risks model
##
## No.Observations: 177
##
## Pattern:
##
## Cause event right.censored
## Relapse 56 0
## Mortality 75 0
## unknown 0 46
##
##
## Fine-Gray model: analysis of cause 2
##
## Competing Risks Regression
##
## Call:
## FGR(formula = Hist(ftime, Status, cens.code = "Censored") ~ D +
## Age + Sex + Phase + Source, data = bmtcrr, cause = "Mortality")
##
## coef exp(coef) se(coef) z p-value
## D:AML 0.00226 1.002 0.2833 0.00799 0.9900
## Age 0.03074 1.031 0.0104 2.95247 0.0032
## Sex:F 0.27386 1.315 0.2376 1.15241 0.2500
## Phase:CR1 -0.37745 0.686 0.3049 -1.23779 0.2200
## Phase:CR2 -0.12735 0.880 0.2772 -0.45944 0.6500
## Phase:CR3 -0.25448 0.775 0.5997 -0.42433 0.6700
## Source:PB -1.16041 0.313 0.3799 -3.05432 0.0023
##
## exp(coef) exp(-coef) 2.5% 97.5%
## D:AML 1.002 0.998 0.575 1.75
## Age 1.031 0.970 1.010 1.05
## Sex:F 1.315 0.760 0.825 2.10
## Phase:CR1 0.686 1.459 0.377 1.25
## Phase:CR2 0.880 1.136 0.511 1.52
## Phase:CR3 0.775 1.290 0.239 2.51
## Source:PB 0.313 3.191 0.149 0.66
##
## Num. cases = 177
## Pseudo Log-likelihood = -360
## Pseudo likelihood ratio test = 18.2 on 7 df,
##
## Convergence: TRUE
##
##
## $crr
## Competing Risks Regression
##
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat,
## failcode = "Mortality", cencode = "Censored")
##
## coef exp(coef) se(coef) z p-value
## DAML 0.00226 1.002 0.2833 0.00799 0.9900
## Age 0.03074 1.031 0.0104 2.95247 0.0032
## SexF 0.27386 1.315 0.2376 1.15241 0.2500
## PhaseCR1 -0.37745 0.686 0.3049 -1.23779 0.2200
## PhaseCR2 -0.12735 0.880 0.2772 -0.45944 0.6500
## PhaseCR3 -0.25448 0.775 0.5997 -0.42433 0.6700
## SourcePB -1.16041 0.313 0.3799 -3.05432 0.0023
##
## exp(coef) exp(-coef) 2.5% 97.5%
## DAML 1.002 0.998 0.575 1.75
## Age 1.031 0.970 1.010 1.05
## SexF 1.315 0.760 0.825 2.10
## PhaseCR1 0.686 1.459 0.377 1.25
## PhaseCR2 0.880 1.136 0.511 1.52
## PhaseCR3 0.775 1.290 0.239 2.51
## SourcePB 0.313 3.191 0.149 0.66
##
## Num. cases = 177
## Pseudo Log-likelihood = -360
## Pseudo likelihood ratio test = 18.2 on 7 df,