Event-free survival, cause-specific hazard, cumulative incidence function in survival analysis

References

Competing risk events

When the event of interest is relapse of cancer, there is a problem about how to treat non-relapse death. It can be defined as censoring, however, death prevents relapse from happening, violating the assumption those who remained and censored have the same chance of experiencing the event of interest within levels of covariates. Therefore, death is a competing risk event that eliminates the chance of having relapse, i.e., there are multiple modes of failure. Several methods play different roles in such situtations. These are the event-free survival method, the cause-specific survival method, and the cumulative incidence function method.

Load packages

## Load survival
library(survival)
## Load the cmprsk package
library(cmprsk)

Single variable

Prepare data

## Load the example data used in the BMT article
bmt <- read.csv("http://www.stat.unipg.it/luca/R/bmt.csv", sep = ";")

## Check data
head(bmt)
##   dis ftime status
## 1   0    13      2
## 2   0     1      1
## 3   0    72      0
## 4   0     7      2
## 5   0     8      2
## 6   1    67      0
## dis    disease 0 ALL; 1 AML
## ftime  follow-up time
## status 0 censored (survival); 1 Transplant-related mortality; 2 relapse

## Label diseases
bmt$dis <- factor(bmt$dis, levels = c(0,1), labels = c("ALL", "AML"))
## Label status
bmt$status <- factor(bmt$status, levels = c(0,1,2), labels = c("Censored","Mortality","Relapse"))

## Summary
summary(bmt)
##   dis         ftime            status  
##  ALL:17   Min.   : 0.0   Censored :11  
##  AML:18   1st Qu.: 3.0   Mortality: 9  
##           Median : 7.0   Relapse  :15  
##           Mean   :16.3                 
##           3rd Qu.:18.0                 
##           Max.   :72.0

## Cross table
addmargins(xtabs(data = bmt, ~ dis + status))
##      status
## dis   Censored Mortality Relapse Sum
##   ALL        2         3      12  17
##   AML        9         6       3  18
##   Sum       11         9      15  35

Event-free survival (EFS)

This method focuses on the non-occurrence of events (event-free status). It is often the outcome of interest in cancer trials, no death AND no relapse is important. In the Kaplan-Meier plot, the proportion estimates the proportion of people who have not experienced any of the endpoints (relapse or mortality). Here AML patients have a better chance of surviving without experiencing relapse or mortality. Plotted inversely (1 - EFS KM estimate), the curves are interpretable as the estimated proportions of patients experiencing one of the endpoints (relapse or mortality) over time.

## Calculate the EFS by disease
resEfsByDis <- survfit(formula   = Surv(ftime, status != "Censored") ~ dis,
                       data      = bmt,
                       type      = "kaplan-meier",
                       error     = "greenwood",
                       conf.type = "log-log")
## Numerical results
summary(resEfsByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status != "Censored") ~ dis, data = bmt, 
##     type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
## 
##                 dis=ALL 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     17       2    0.882  0.0781       0.6060        0.969
##     5     11       5    0.588  0.1194       0.3254        0.778
##    10      7       3    0.412  0.1194       0.1858        0.626
##    15      4       3    0.235  0.1029       0.0731        0.449
##    20      4       0    0.235  0.1029       0.0731        0.449
##    30      2       2    0.118  0.0781       0.0196        0.312
##    40      1       0    0.118  0.0781       0.0196        0.312
##    50      1       0    0.118  0.0781       0.0196        0.312
##    60      1       0    0.118  0.0781       0.0196        0.312
## 
##                 dis=AML 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     18       0    1.000   0.000        1.000        1.000
##     5     10       6    0.649   0.116        0.380        0.825
##    10      6       3    0.426   0.131        0.178        0.656
##    15      5       0    0.426   0.131        0.178        0.656
##    20      5       0    0.426   0.131        0.178        0.656
##    30      5       0    0.426   0.131        0.178        0.656
##    40      3       0    0.426   0.131        0.178        0.656
##    50      3       0    0.426   0.131        0.178        0.656
##    60      3       0    0.426   0.131        0.178        0.656
## Plot Kaplan-Meier
rms::survplot(resEfsByDis)

plot of chunk unnamed-chunk-4

## Plot 1 - EFS KM
rms::survplot(resEfsByDis, fun = function(x) {1 - x})

plot of chunk unnamed-chunk-4


## Regression
resEfsCoxByDis <- coxph(formula = Surv(ftime, status != "Censored") ~ dis,
                        data    = bmt,
                        ties    = c("efron","breslow","exact")[1])
summary(resEfsCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status != "Censored") ~ dis, data = bmt, 
##     ties = c("efron", "breslow", "exact")[1])
## 
##   n= 35, number of events= 24 
## 
##          coef exp(coef) se(coef)     z Pr(>|z|)
## disAML -0.565     0.569    0.423 -1.33     0.18
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## disAML     0.569       1.76     0.248       1.3
## 
## Concordance= 0.56  (se = 0.06 )
## Rsquare= 0.051   (max possible= 0.983 )
## Likelihood ratio test= 1.84  on 1 df,   p=0.175
## Wald test            = 1.78  on 1 df,   p=0.182
## Score (logrank) test = 1.82  on 1 df,   p=0.177

Cause-specific hazard (CSH)

This method looks at each event type separately. This describes what would have happened in the complete absence of the other event type. This is appropriate if the purpose is the inference about the biological relationship between the event of interest and covariates. For description, it will overestimate the incidence because in the absence of the competing risk event, which this method hypothesizes, the event of interest has more chance to occur. The 1 - CSH KM should also be interpreted cautiously. It represents the occurrence of the event in the hypothetical situation in which the other event type is absent. Thus, the event rate is overestimated compared to the reality, in which occurrence of the other event prevents the event of interest from happening as much as it would otherwise.

\( F_{CSH k}(t) = \int_0^t \lambda_k(u) S_k(u) du \)

\( S_k(u) \) here is the cause-specific survival, i.e. the proportion free of the event type \( k \).

Relapse (Event of interest)

## Calculate the overall CSH for relapse
resCshRelByDis <- survfit(formula   = Surv(ftime, status == "Relapse") ~ dis,
                          data      = bmt,
                          type      = "kaplan-meier",
                          error     = "greenwood",
                          conf.type = "log-log")
## Numerical results
summary(resCshRelByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status == "Relapse") ~ dis, data = bmt, 
##     type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
## 
##                 dis=ALL 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     17       1    0.941  0.0571       0.6502        0.991
##     5     11       4    0.676  0.1200       0.3859        0.851
##    10      7       3    0.473  0.1290       0.2168        0.693
##    15      4       2    0.315  0.1252       0.1038        0.555
##    20      4       0    0.315  0.1252       0.1038        0.555
##    30      2       2    0.158  0.1007       0.0263        0.391
##    40      1       0    0.158  0.1007       0.0263        0.391
##    50      1       0    0.158  0.1007       0.0263        0.391
##    60      1       0    0.158  0.1007       0.0263        0.391
## 
##                 dis=AML 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     18       0    1.000   0.000        1.000        1.000
##     5     10       1    0.944   0.054        0.666        0.992
##    10      6       2    0.708   0.155        0.303        0.905
##    15      5       0    0.708   0.155        0.303        0.905
##    20      5       0    0.708   0.155        0.303        0.905
##    30      5       0    0.708   0.155        0.303        0.905
##    40      3       0    0.708   0.155        0.303        0.905
##    50      3       0    0.708   0.155        0.303        0.905
##    60      3       0    0.708   0.155        0.303        0.905
## Plot Kaplan-Meier
rms::survplot(resCshRelByDis)

plot of chunk unnamed-chunk-5

## Plot 1 - CSH KM
rms::survplot(resCshRelByDis, fun = function(x) {1 - x})

plot of chunk unnamed-chunk-5


## Regression
resCshRelCoxByDis <- coxph(formula = Surv(ftime, status == "Relapse") ~ dis,
                           data    = bmt,
                           ties    = c("efron","breslow","exact")[1])
summary(resCshRelCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status == "Relapse") ~ dis, data = bmt, 
##     ties = c("efron", "breslow", "exact")[1])
## 
##   n= 35, number of events= 15 
## 
##          coef exp(coef) se(coef)     z Pr(>|z|)  
## disAML -1.449     0.235    0.648 -2.24    0.025 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## disAML     0.235       4.26     0.066     0.836
## 
## Concordance= 0.659  (se = 0.075 )
## Rsquare= 0.165   (max possible= 0.915 )
## Likelihood ratio test= 6.3  on 1 df,   p=0.0121
## Wald test            = 5  on 1 df,   p=0.0253
## Score (logrank) test = 5.91  on 1 df,   p=0.015

Mortality (Competing risk event)

## Calculate the overall CSH for mortality
resCshMortByDis <- survfit(formula   = Surv(ftime, status == "Mortality") ~ dis,
                           data      = bmt,
                           type      = "kaplan-meier",
                           error     = "greenwood",
                           conf.type = "log-log")
## Numerical results
summary(resCshMortByDis, times = c(0,5,10,15,20,30,40,50,60))
## Call: survfit(formula = Surv(ftime, status == "Mortality") ~ dis, data = bmt, 
##     type = "kaplan-meier", error = "greenwood", conf.type = "log-log")
## 
##                 dis=ALL 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     17       1    0.941  0.0571        0.650        0.991
##     5     11       1    0.878  0.0807        0.595        0.968
##    10      7       0    0.878  0.0807        0.595        0.968
##    15      4       1    0.753  0.1352        0.375        0.921
##    20      4       0    0.753  0.1352        0.375        0.921
##    30      2       0    0.753  0.1352        0.375        0.921
##    40      1       0    0.753  0.1352        0.375        0.921
##    50      1       0    0.753  0.1352        0.375        0.921
##    60      1       0    0.753  0.1352        0.375        0.921
## 
##                 dis=AML 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     18       0    1.000   0.000        1.000        1.000
##     5     10       5    0.688   0.116        0.405        0.856
##    10      6       1    0.602   0.129        0.312        0.801
##    15      5       0    0.602   0.129        0.312        0.801
##    20      5       0    0.602   0.129        0.312        0.801
##    30      5       0    0.602   0.129        0.312        0.801
##    40      3       0    0.602   0.129        0.312        0.801
##    50      3       0    0.602   0.129        0.312        0.801
##    60      3       0    0.602   0.129        0.312        0.801
## Plot Kaplan-Meier
rms::survplot(resCshMortByDis)

plot of chunk unnamed-chunk-6

## Plot 1 - CSH KM
rms::survplot(resCshMortByDis, fun = function(x) {1 - x})

plot of chunk unnamed-chunk-6


## Regression
resCshMortCoxByDis <- coxph(formula = Surv(ftime, status == "Mortality") ~ dis,
                            data    = bmt,
                            ties    = c("efron","breslow","exact")[1])
summary(resCshMortCoxByDis)
## Call:
## coxph(formula = Surv(ftime, status == "Mortality") ~ dis, data = bmt, 
##     ties = c("efron", "breslow", "exact")[1])
## 
##   n= 35, number of events= 9 
## 
##         coef exp(coef) se(coef)    z Pr(>|z|)
## disAML 0.665     1.944    0.709 0.94     0.35
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## disAML      1.94      0.514     0.484      7.81
## 
## Concordance= 0.571  (se = 0.093 )
## Rsquare= 0.026   (max possible= 0.807 )
## Likelihood ratio test= 0.93  on 1 df,   p=0.335
## Wald test            = 0.88  on 1 df,   p=0.349
## Score (logrank) test = 0.91  on 1 df,   p=0.34

Cumulative Incidence Function (CIF)

This approach describes the estimated cumulative incidence (risk over time) of each event type in the presence of the other event types. That is, the risk set attrition due to the occurrence of the competing risk is accounted for when estimating the occurrence of the event of interest. The cumulative incidence function for the event type \( k \) is mathematically represented as follows.

\( F_k(t) = \int_0^t \lambda_k(u) S(u) du = \int_0^t f_k(u) du \)

\( S_(u) \) here is the overall survival, i.e. the proportion free of any of the event types.

This is the integration of the event-specific density function \( f_k(\cdot) \). The multiplication by the event-free survival function \( S(t) \) accounts for the risk set attrition by all event types. This in a way “adjusts” the risk set size to the actual size. In the cause-specific hazard analysis, on the other hand, the risk set size is not “adjusted” for attrition to other event types.

Relapse

## Calculate the grouped cumulative incidence functions (CIF)
resCumIncByDis <- cuminc(ftime   = bmt$ftime,  # failure time variable
                         fstatus = bmt$status,  # variable with distinct codes for different causes of failure
                         group   = bmt$dis,  # estimates will calculated within groups
                         ## strata  = ,  # Tests will be stratified on this variable.
                         rho     = 0, # Power of the weight function used in the tests.
                         cencode = "Censored", # value of fstatus variable which indicates the failure time is censored.
                         ## subset = ,
                         ## na.action = na.omit
                         )
## CIF and the variance of point estimates
timepoints(resCumIncByDis, times = c(0,5,10,15,20,30,40,50,60))
## $est
##                     0       5     10     15     20     30     40     50     60
## ALL Mortality 0.05882 0.11765 0.1176 0.1765 0.1765 0.1765 0.1765 0.1765 0.1765
## AML Mortality 0.00000 0.29514 0.3682 0.3682 0.3682 0.3682 0.3682 0.3682 0.3682
## ALL Relapse   0.05882 0.29412 0.4706 0.5882 0.5882 0.7059 0.7059 0.7059 0.7059
## AML Relapse   0.00000 0.05556 0.2057 0.2057 0.2057 0.2057 0.2057 0.2057 0.2057
## 
## $var
##                     0        5      10       15       20       30       40       50       60
## ALL Mortality 0.00346 0.006490 0.00649 0.009344 0.009344 0.009344 0.009344 0.009344 0.009344
## AML Mortality 0.00000 0.013175 0.01601 0.016011 0.016011 0.016011 0.016011 0.016011 0.016011
## ALL Relapse   0.00346 0.013061 0.01595 0.016046 0.016046 0.014585 0.014585 0.014585 0.014585
## AML Relapse   0.00000 0.003086 0.01336 0.013358 0.013358 0.013358 0.013358 0.013358 0.013358
## Plot
plot(resCumIncByDis)

plot of chunk unnamed-chunk-7


## Regression (crr can only take a covariate matrix)
bmtDisMat <- matrix(as.numeric(bmt$dis == "AML"))
colnames(bmtDisMat) <- "dis"

resCrrRelByDis <- crr(ftime    = bmt$ftime, # vector of failure/censoring times
                      fstatus  = bmt$status, # vector with a unique code for each failure type and censoring
                      cov1     = bmtDisMat, #  matrix (nobs x ncovs) of fixed covariates
                      ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                      ## tf       = , # functions of time
                      ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                      failcode = "Relapse", # code of fstatus that denotes the failure type of interest
                      cencode  = "Censored" # code of fstatus that denotes censored observations
                      )
summary(resCrrRelByDis)
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmt$ftime, fstatus = bmt$status, cov1 = bmtDisMat, 
##     failcode = "Relapse", cencode = "Censored")
## 
##     coef exp(coef) se(coef)     z p-value
## dis -1.6     0.201     0.64 -2.51   0.012
## 
##     exp(coef) exp(-coef)   2.5% 97.5%
## dis     0.201       4.98 0.0573 0.705
## 
## Num. cases = 35
## Pseudo Log-likelihood = -43.9 
## Pseudo likelihood ratio test = 7.89  on 1 df,

Mortality

## cuminc() part is the same (omitted).

## Regression (crr can only take an n*p covariate matrix)
resCrrMortByDis <- crr(ftime    = bmt$ftime, # vector of failure/censoring times
                       fstatus  = bmt$status, # vector with a unique code for each failure type and censoring
                       cov1     = bmtDisMat, #  matrix (nobs x ncovs) of fixed covariates
                       ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                       ## tf       = , # functions of time
                       ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                       failcode = "Mortality", # code of fstatus that denotes the failure type of interest
                       cencode  = "Censored" # code of fstatus that denotes censored observations
                       )
summary(resCrrMortByDis)
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmt$ftime, fstatus = bmt$status, cov1 = bmtDisMat, 
##     failcode = "Mortality", cencode = "Censored")
## 
##      coef exp(coef) se(coef)    z p-value
## dis 0.765      2.15    0.699 1.09    0.27
## 
##     exp(coef) exp(-coef)  2.5% 97.5%
## dis      2.15      0.465 0.546  8.46
## 
## Num. cases = 35
## Pseudo Log-likelihood = -29.9 
## Pseudo likelihood ratio test = 1.23  on 1 df,

Graphical Comparison

Here the CIFs and 1 - CSH KM are plotted together. The 1 - CSH KM (ticked lines) are all overestimating the occurrence over time. Both are actually useful if what they represent is understood.

CIF describes the occurrence of each event type in the presence of the other(s), i.e., it describes what happened in the data. This is “truthful” to the data. However, the estimation only applies to the data it was derived from and data with the same competing risk event structure.

1 - CSH KM describes the occurence of one type of event in the absence of the other(s), i.e., it describes what whould have happened if there had not been any competing risk events. This is hypothetical, but it can potentially be compared across data with different competing risk structures.

## Plot CIFs
plot(resCumIncByDis, col = 1:4, lwd = 3, lty = 1, xlab = "")
## Add lines from the 1 - KM (Relapse)
lines(resCshRelByDis, lwd = 1.5, col = c("green","blue"), fun = function(x) {1 - x})
## Add lines from the 1 - KM (Mortality)
lines(resCshMortByDis, lwd = 1.5, col = c("black","red"), fun = function(x) {1 - x})

plot of chunk unnamed-chunk-9

Regression comparison (Protective-Harmful scenario)

The results of regression analysis need careful interpretation. There are hazard ratios for the AML compared to ALL from different models.

EFS 0.57 means that the overall event-free survival is better for the AML patients by 0.57 times than for the ALL patients.

CSH_Rel 0.23 means that in the ideal world without non-relapse mortality, the AML patients are 0.23 times less likely to have relapses and are better off than the ALL patients.

CSH_Mort 1.94 means that in the ideal world without relapse, the AML patients are 1.94 times more likely to have non-relapse mortality and are worse off than the ALL patients.

CIF_Rel 0.20 means that in the presence of non-relapse mortality, the AML patients are 0.20 times less likely to have relapses. This is lower than the CSH_Rel estimate. The AML patients have higher attrition of the risk set to non-relapse mortality, and because they are already dead, they are not able to have relapses.

CIF_Mort 2.15 means that in the presence of relapse, the AML patients are 2.15 times more likely to have non-relapse mortality. This is higher than the CSH_Rel estimate. The AML patients have lower attrition of the risk set to relapse, and because they remain relapse-free, they can have more non-relapse mortality.

In this specific case, the effect of having AML rather than ALL is in the opposite directions (fewer relapses, but more non-relapse mortality). In such a stituation, the cumulative incidence function method has an “exaggerated” effect. This is because beneficial effect on one end point (relapse) makes more people available for the other endpoint (non-relapse mortality). On the other hand, the effect on the event-free survival will be mixture of positive effect and negative effect, and will have less power. The directions of pure effects in the ideal world are most easily captured by the CSH method.

listOfRes <- list(EFS      = summary(resEfsCoxByDis)$conf.int ,
                  CSH_Rel  = summary(resCshRelCoxByDis)$conf.int,
                  CSH_Mort = summary(resCshMortCoxByDis)$conf.int,
                  CIF_Rel  = summary(resCrrRelByDis)$conf.int,
                  CIF_Mort = summary(resCrrMortByDis)$conf.int
                  )
matOfRes <- do.call(rbind, listOfRes)
rownames(matOfRes) <- names(listOfRes)
round(matOfRes[,c(1,3,4)], 2)
##          exp(coef) lower .95 upper .95
## EFS           0.57      0.25      1.30
## CSH_Rel       0.23      0.07      0.84
## CSH_Mort      1.94      0.48      7.81
## CIF_Rel       0.20      0.06      0.71
## CIF_Mort      2.15      0.55      8.46

Multivariable regression

Prepare data

## Load the example data used in the BMT article
bmtcrr <- read.csv("http://www.stat.unipg.it/luca/R/bmtcrr.csv")

## Check data
head(bmtcrr)
##   Sex   D   Phase Age Status Source  ftime
## 1   M ALL Relapse  48      2  BM+PB   0.67
## 2   F AML     CR2  23      1  BM+PB   9.50
## 3   M ALL     CR3   7      0  BM+PB 131.77
## 4   F ALL     CR2  26      2  BM+PB  24.03
## 5   F ALL     CR2  36      2  BM+PB   1.47
## 6   M ALL Relapse  17      2  BM+PB   2.23
## Sex:    F or M
## D:      ALL or AML
## Phase:  CR1 CR2 CR3 Relapse
## Age:
## Status: 0 Censored 1 Relapse 2 Transplant-related death
## Source: BM+PB PB
## ftime:

## Label status
bmtcrr$Status <- factor(bmtcrr$Status, levels = 0:2, labels = c("Censored", "Relapse", "Mortality"))
## Make males reference
bmtcrr$Sex <- relevel(bmtcrr$Sex, ref = "M")
## Make relapse phase reference
bmtcrr$Phase <- relevel(bmtcrr$Phase, ref = "Relapse")

## Summary
summary(bmtcrr)
##  Sex       D           Phase         Age             Status     Source        ftime       
##  M:100   ALL: 73   Relapse:73   Min.   : 4.0   Censored :46   BM+PB: 21   Min.   :  0.13  
##  F: 77   AML:104   CR1    :47   1st Qu.:20.0   Relapse  :56   PB   :156   1st Qu.:  3.23  
##                    CR2    :45   Median :29.0   Mortality:75               Median :  6.63  
##                    CR3    :12   Mean   :30.5                              Mean   : 20.28  
##                                 3rd Qu.:40.0                              3rd Qu.: 19.90  
##                                 Max.   :62.0                              Max.   :131.77

## Cross table
addmargins(xtabs(~ D + Status, data = bmtcrr))
##      Status
## D     Censored Relapse Mortality Sum
##   ALL       17      28        28  73
##   AML       29      28        47 104
##   Sum       46      56        75 177

Event-free survival (EFS)

This method focuses on the non-occurrence of events (event-free status). It is often the outcome of interest in cancer trials, no death AND no relapse is important. In the Kaplan-Meier plot, the proportion estimates the proportion of people who have not experienced any of the endpoints (relapse or mortality). Here AML patients have a better chance of surviving without experiencing relapse or mortality. Plotted inversely (1 - EFS KM estimate), the curves are interpretable as the estimated proportions of patients experiencing one of the endpoints (relapse or mortality) over time.

resEfsCox <- coxph(formula = Surv(ftime, Status != "Censored") ~ D + Age + Sex + Phase + Source,
                   data    = bmtcrr,
                   ties    = c("efron","breslow","exact")[1])
summary(resEfsCox)
## Call:
## coxph(formula = Surv(ftime, Status != "Censored") ~ D + Age + 
##     Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow", 
##     "exact")[1])
## 
##   n= 177, number of events= 131 
## 
##              coef exp(coef) se(coef)     z   Pr(>|z|)    
## DAML     -0.41131   0.66278  0.20051 -2.05      0.040 *  
## Age       0.01382   1.01392  0.00757  1.83      0.068 .  
## SexF      0.40520   1.49960  0.17998  2.25      0.024 *  
## PhaseCR1 -1.19315   0.30326  0.24163 -4.94 0.00000079 ***
## PhaseCR2 -0.90845   0.40315  0.22584 -4.02 0.00005757 ***
## PhaseCR3 -0.89783   0.40745  0.39992 -2.25      0.025 *  
## SourcePB -0.57481   0.56281  0.29616 -1.94      0.052 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## DAML         0.663      1.509     0.447     0.982
## Age          1.014      0.986     0.999     1.029
## SexF         1.500      0.667     1.054     2.134
## PhaseCR1     0.303      3.297     0.189     0.487
## PhaseCR2     0.403      2.480     0.259     0.628
## PhaseCR3     0.407      2.454     0.186     0.892
## SourcePB     0.563      1.777     0.315     1.006
## 
## Concordance= 0.653  (se = 0.027 )
## Rsquare= 0.191   (max possible= 0.999 )
## Likelihood ratio test= 37.5  on 7 df,   p=0.00000372
## Wald test            = 36.2  on 7 df,   p=0.00000662
## Score (logrank) test = 38.3  on 7 df,   p=0.00000264

Cause-specific hazard (CSH)

This method looks at each event type separately. This describes what would have happened in the complete absence of the other event type. This is appropriate if the purpose is the inference about the biological relationship between the event of interest and covariates. For description, it will overestimate the incidence because in the absence of the competing risk event, which this method hypothesizes, the event of interest has more chance to occur. The 1 - CSH KM should also be interpreted cautiously. It represents the occurrence of the event in the hypothetical situation in which the other event type is absent. Thus, the event rate is overestimated compared to the reality, in which occurrence of the other event prevents the event of interest from happening as much as it would otherwise.

\( F_{CSH k}(t) = \int_0^t \lambda_k(u) S_k(u) du \)

\( S_k(u) \) here is the cause-specific survival, i.e. the proportion free of the event type \( k \).

Relapse

## Regression
resCshRelCox <- coxph(formula = Surv(ftime, Status == "Relapse") ~ D + Age + Sex + Phase + Source,
                      data    = bmtcrr,
                      ties    = c("efron","breslow","exact")[1])
summary(resCshRelCox)
## Call:
## coxph(formula = Surv(ftime, Status == "Relapse") ~ D + Age + 
##     Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow", 
##     "exact")[1])
## 
##   n= 177, number of events= 56 
## 
##              coef exp(coef) se(coef)     z Pr(>|z|)    
## DAML     -0.65808   0.51784  0.29863 -2.20  0.02755 *  
## Age      -0.00671   0.99331  0.01214 -0.55  0.58004    
## SexF      0.38126   1.46412  0.28463  1.34  0.18042    
## PhaseCR1 -1.51583   0.21963  0.39646 -3.82  0.00013 ***
## PhaseCR2 -1.32709   0.26525  0.36472 -3.64  0.00027 ***
## PhaseCR3 -1.00372   0.36651  0.64515 -1.56  0.11976    
## SourcePB  0.38109   1.46388  0.57695  0.66  0.50892    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## DAML         0.518      1.931     0.288     0.930
## Age          0.993      1.007     0.970     1.017
## SexF         1.464      0.683     0.838     2.558
## PhaseCR1     0.220      4.553     0.101     0.478
## PhaseCR2     0.265      3.770     0.130     0.542
## PhaseCR3     0.367      2.728     0.103     1.298
## SourcePB     1.464      0.683     0.473     4.535
## 
## Concordance= 0.724  (se = 0.042 )
## Rsquare= 0.162   (max possible= 0.945 )
## Likelihood ratio test= 31.3  on 7 df,   p=0.0000551
## Wald test            = 30.2  on 7 df,   p=0.0000888
## Score (logrank) test = 33.8  on 7 df,   p=0.0000189

Mortality

## Regression
resCshMortCox <- coxph(formula = Surv(ftime, Status == "Mortality") ~ D + Age + Sex + Phase + Source,
                       data    = bmtcrr,
                       ties    = c("efron","breslow","exact")[1])
summary(resCshMortCox)
## Call:
## coxph(formula = Surv(ftime, Status == "Mortality") ~ D + Age + 
##     Sex + Phase + Source, data = bmtcrr, ties = c("efron", "breslow", 
##     "exact")[1])
## 
##   n= 177, number of events= 75 
## 
##             coef exp(coef) se(coef)     z Pr(>|z|)   
## DAML     -0.1741    0.8402   0.2744 -0.63   0.5259   
## Age       0.0291    1.0296   0.0100  2.90   0.0037 **
## SexF      0.4384    1.5502   0.2351  1.86   0.0622 . 
## PhaseCR1 -0.8856    0.4125   0.3107 -2.85   0.0044 **
## PhaseCR2 -0.5934    0.5524   0.2956 -2.01   0.0447 * 
## PhaseCR3 -0.6472    0.5235   0.5126 -1.26   0.2067   
## SourcePB -1.1347    0.3215   0.3575 -3.17   0.0015 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## DAML         0.840      1.190     0.491     1.439
## Age          1.030      0.971     1.010     1.050
## SexF         1.550      0.645     0.978     2.457
## PhaseCR1     0.412      2.424     0.224     0.758
## PhaseCR2     0.552      1.810     0.310     0.986
## PhaseCR3     0.523      1.910     0.192     1.430
## SourcePB     0.322      3.110     0.160     0.648
## 
## Concordance= 0.645  (se = 0.036 )
## Rsquare= 0.124   (max possible= 0.981 )
## Likelihood ratio test= 23.4  on 7 df,   p=0.00145
## Wald test            = 23.6  on 7 df,   p=0.00134
## Score (logrank) test = 24.5  on 7 df,   p=0.000934

Cumulative Incidence Function (CIF)

This approach describes the estimated cumulative incidence (risk over time) of each event type in the presence of the other event types. That is, the risk set attrition due to the occurrence of the competing risk is accounted for when estimating the occurrence of the event of interest. The cumulative incidence function for the event type \( k \) is mathematically represented as follows.

\( F_k(t) = \int_0^t \lambda_k(u) S(u) du = \int_0^t f_k(u) du \)

\( S_(u) \) here is the overall survival, i.e. the proportion free of any of the event types.

This is the integration of the event-specific density function \( f_k(\cdot) \). The multiplication by the event-free survival function \( S(t) \) accounts for the risk set attrition by all event types. This in a way “adjusts” the risk set size to the actual size. In the cause-specific hazard analysis, on the other hand, the risk set size is not “adjusted” for attrition to other event types.

Relapse

## Mode matrix creation 
bmtcrrMat <- model.matrix(object = ~ D + Age + Sex + Phase + Source, data = bmtcrr)
bmtcrrMat <- bmtcrrMat[,-1]
head(bmtcrrMat)
##   DAML Age SexF PhaseCR1 PhaseCR2 PhaseCR3 SourcePB
## 1    0  48    0        0        0        0        0
## 2    1  23    1        0        1        0        0
## 3    0   7    0        0        0        1        0
## 4    0  26    1        0        1        0        0
## 5    0  36    1        0        1        0        0
## 6    0  17    0        0        0        0        0
## Fit
resCrrRel <- crr(ftime    = bmtcrr$ftime, # vector of failure/censoring times
                 fstatus  = bmtcrr$Status, # vector with a unique code for each failure type and censoring
                 cov1     = bmtcrrMat, #  matrix (nobs x ncovs) of fixed covariates
                 ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                 ## tf       = , # functions of time
                 ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                 failcode = "Relapse", # code of fstatus that denotes the failure type of interest
                 cencode  = "Censored" # code of fstatus that denotes censored observations
                 )
summary(resCrrRel)
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat, 
##     failcode = "Relapse", cencode = "Censored")
## 
##             coef exp(coef) se(coef)      z p-value
## DAML     -0.4723     0.624   0.3054 -1.547  0.1200
## Age      -0.0185     0.982   0.0119 -1.554  0.1200
## SexF     -0.0352     0.965   0.2900 -0.122  0.9000
## PhaseCR1 -1.1018     0.332   0.3764 -2.927  0.0034
## PhaseCR2 -1.0200     0.361   0.3558 -2.867  0.0041
## PhaseCR3 -0.7314     0.481   0.5766 -1.268  0.2000
## SourcePB  0.9211     2.512   0.5530  1.666  0.0960
## 
##          exp(coef) exp(-coef)  2.5% 97.5%
## DAML         0.624      1.604 0.343 1.134
## Age          0.982      1.019 0.959 1.005
## SexF         0.965      1.036 0.547 1.704
## PhaseCR1     0.332      3.009 0.159 0.695
## PhaseCR2     0.361      2.773 0.180 0.724
## PhaseCR3     0.481      2.078 0.155 1.490
## SourcePB     2.512      0.398 0.850 7.426
## 
## Num. cases = 177
## Pseudo Log-likelihood = -267 
## Pseudo likelihood ratio test = 24.4  on 7 df,

Mortality

## Regression (crr can only take an n*p covariate matrix)
resCrrMort <- crr(ftime    = bmtcrr$ftime, # vector of failure/censoring times
                  fstatus  = bmtcrr$Status, # vector with a unique code for each failure type and censoring
                  cov1     = bmtcrrMat, #  matrix (nobs x ncovs) of fixed covariates
                  ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                  ## tf       = , # functions of time
                  ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                  failcode = "Mortality", # code of fstatus that denotes the failure type of interest
                  cencode  = "Censored" # code of fstatus that denotes censored observations
                  )
summary(resCrrMort)
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat, 
##     failcode = "Mortality", cencode = "Censored")
## 
##              coef exp(coef) se(coef)        z p-value
## DAML      0.00226     1.002   0.2833  0.00799  0.9900
## Age       0.03074     1.031   0.0104  2.95247  0.0032
## SexF      0.27386     1.315   0.2376  1.15241  0.2500
## PhaseCR1 -0.37745     0.686   0.3049 -1.23779  0.2200
## PhaseCR2 -0.12735     0.880   0.2772 -0.45944  0.6500
## PhaseCR3 -0.25448     0.775   0.5997 -0.42433  0.6700
## SourcePB -1.16041     0.313   0.3799 -3.05432  0.0023
## 
##          exp(coef) exp(-coef)  2.5% 97.5%
## DAML         1.002      0.998 0.575  1.75
## Age          1.031      0.970 1.010  1.05
## SexF         1.315      0.760 0.825  2.10
## PhaseCR1     0.686      1.459 0.377  1.25
## PhaseCR2     0.880      1.136 0.511  1.52
## PhaseCR3     0.775      1.290 0.239  2.51
## SourcePB     0.313      3.191 0.149  0.66
## 
## Num. cases = 177
## Pseudo Log-likelihood = -360 
## Pseudo likelihood ratio test = 18.2  on 7 df,

Regression comparison (Protective-Protective scenario)

The results of regression analysis need careful interpretation. There are hazard ratios for the AML compared to ALL from different models after adjusting for the age, sex, phase at which transplant was performed, and source of cells.

EFS 0.66 means that the overall event-free survival is better for the AML patients by 0.66 times than for the ALL patients within levels of baseline confounders. This is often considered the most relevant result in oncology clinical trials.

CSH_Rel 0.52 means that in the ideal world without non-relapse mortality, the AML patients are 0.52 times less likely to have relapses and are better off than the ALL patients within levels of baseline confounders.

CSH_Mort 0.84 means that in the ideal world without relapse, the AML patients are 0.84 times less likely to have non-relapse mortality and are better off than the ALL patients within levels of baseline confounders.

CIF_Rel 0.62 means that in the presence of non-relapse mortality, the AML patients are 0.62 times less likely to have relapses within levels of baseline confounders. This is higher than the CSH_Rel estimate (diminished protective effect). The AML patients have lower attrition of the risk set to non-relapse mortality, thus, they are more “available” for relapse. Although the AML patients are less likely to have relapse (CSH_Rel 0.52), this effect is diminished by this increased availability.

CIF_Mort 1.00 means that in the presence of relapse, the AML patients similarly likely to have non-relapse mortality within levels of baseline confounders. This is qualitatively different compared the CSH_Mort estimate (diminished protective effect). The AML patients have lower attrition of the risk set to relapse, and because they remain relapse-free, they can have more non-relapse mortality. This will diminish the protective effect on non-relapse mortality.

In this specific case, the effect of having AML rather than ALL is in the same directions (both protective: fewer relapses, fewer non-relapse mortality). In such a stituation, the cumulative incidence function method has a “diminished” effect. This is because beneficial effect on one end point (relapse) makes more people available for the other endpoint (non-relapse mortality). Thus, the protective effect on the other endpoint is partially overshadowed. The same is true in a “harmful for both” situation. On the other hand, the effect on the event-free survival will be mixture of positive effects, and will have more power. The directions of pure effects in the ideal world are most easily captured by the CSH method.

## Combine as a list
listOfRes <- list(EFS      = summary(resEfsCox)$conf.int,
                  CSH_Rel  = summary(resCshRelCox)$conf.int,
                  CSH_Mort = summary(resCshMortCox)$conf.int,
                  CIF_Rel  = summary(resCrrRel)$conf.int,
                  CIF_Mort = summary(resCrrMort)$conf.int
                  )
## Format
listOfVecs <- sapply(listOfRes,
               simplify = FALSE,
               FUN = function(MAT) {

                   sprintf(fmt = "%.2f [%.2f,%.2f]  ",
                           MAT[,1], MAT[,3], MAT[,4])
               })
## Name and show
matOfRes  <- do.call(cbind, listOfVecs)
rownames(matOfRes) <- rownames(summary(resEfsCox)$conf.int)
noquote(matOfRes)
##          EFS                CSH_Rel            CSH_Mort           CIF_Rel            CIF_Mort          
## DAML     0.66 [0.45,0.98]   0.52 [0.29,0.93]   0.84 [0.49,1.44]   0.62 [0.34,1.13]   1.00 [0.58,1.75]  
## Age      1.01 [1.00,1.03]   0.99 [0.97,1.02]   1.03 [1.01,1.05]   0.98 [0.96,1.00]   1.03 [1.01,1.05]  
## SexF     1.50 [1.05,2.13]   1.46 [0.84,2.56]   1.55 [0.98,2.46]   0.97 [0.55,1.70]   1.32 [0.83,2.10]  
## PhaseCR1 0.30 [0.19,0.49]   0.22 [0.10,0.48]   0.41 [0.22,0.76]   0.33 [0.16,0.69]   0.69 [0.38,1.25]  
## PhaseCR2 0.40 [0.26,0.63]   0.27 [0.13,0.54]   0.55 [0.31,0.99]   0.36 [0.18,0.72]   0.88 [0.51,1.52]  
## PhaseCR3 0.41 [0.19,0.89]   0.37 [0.10,1.30]   0.52 [0.19,1.43]   0.48 [0.16,1.49]   0.78 [0.24,2.51]  
## SourcePB 0.56 [0.31,1.01]   1.46 [0.47,4.54]   0.32 [0.16,0.65]   2.51 [0.85,7.43]   0.31 [0.15,0.66]

Appendix: riskRegression for a formula interface for crr

The censoring code is given within the Hist() function. The event of interest is given to the cause option within the FGR() function. The familiar formula interface is available. Missingness is handled by complete case analysis.

## Load function
library(riskRegression)

## crr for relapse
resFgrRel <- FGR(Hist(ftime, Status, cens.code = "Censored") ~ D + Age + Sex + Phase + Source, data = bmtcrr, cause = "Relapse")

list(FGR = resFgrRel,
     crr = summary(resCrrRel))
## $FGR
## 
## Right-censored response of a competing.risks model 
## 
## No.Observations: 177 
## 
## Pattern:
##            
## Cause       event right.censored
##   Relapse      56              0
##   Mortality    75              0
##   unknown       0             46
## 
## 
## Fine-Gray model: analysis of cause 1 
## 
## Competing Risks Regression
## 
## Call:
## FGR(formula = Hist(ftime, Status, cens.code = "Censored") ~ D + 
##     Age + Sex + Phase + Source, data = bmtcrr, cause = "Relapse")
## 
##              coef exp(coef) se(coef)      z p-value
## D:AML     -0.4723     0.624   0.3054 -1.547  0.1200
## Age       -0.0185     0.982   0.0119 -1.554  0.1200
## Sex:F     -0.0352     0.965   0.2900 -0.122  0.9000
## Phase:CR1 -1.1018     0.332   0.3764 -2.927  0.0034
## Phase:CR2 -1.0200     0.361   0.3558 -2.867  0.0041
## Phase:CR3 -0.7314     0.481   0.5766 -1.268  0.2000
## Source:PB  0.9211     2.512   0.5530  1.666  0.0960
## 
##           exp(coef) exp(-coef)  2.5% 97.5%
## D:AML         0.624      1.604 0.343 1.134
## Age           0.982      1.019 0.959 1.005
## Sex:F         0.965      1.036 0.547 1.704
## Phase:CR1     0.332      3.009 0.159 0.695
## Phase:CR2     0.361      2.773 0.180 0.724
## Phase:CR3     0.481      2.078 0.155 1.490
## Source:PB     2.512      0.398 0.850 7.426
## 
## Num. cases = 177
## Pseudo Log-likelihood = -267 
## Pseudo likelihood ratio test = 24.4  on 7 df,
## 
## Convergence: TRUE 
## 
## 
## $crr
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat, 
##     failcode = "Relapse", cencode = "Censored")
## 
##             coef exp(coef) se(coef)      z p-value
## DAML     -0.4723     0.624   0.3054 -1.547  0.1200
## Age      -0.0185     0.982   0.0119 -1.554  0.1200
## SexF     -0.0352     0.965   0.2900 -0.122  0.9000
## PhaseCR1 -1.1018     0.332   0.3764 -2.927  0.0034
## PhaseCR2 -1.0200     0.361   0.3558 -2.867  0.0041
## PhaseCR3 -0.7314     0.481   0.5766 -1.268  0.2000
## SourcePB  0.9211     2.512   0.5530  1.666  0.0960
## 
##          exp(coef) exp(-coef)  2.5% 97.5%
## DAML         0.624      1.604 0.343 1.134
## Age          0.982      1.019 0.959 1.005
## SexF         0.965      1.036 0.547 1.704
## PhaseCR1     0.332      3.009 0.159 0.695
## PhaseCR2     0.361      2.773 0.180 0.724
## PhaseCR3     0.481      2.078 0.155 1.490
## SourcePB     2.512      0.398 0.850 7.426
## 
## Num. cases = 177
## Pseudo Log-likelihood = -267 
## Pseudo likelihood ratio test = 24.4  on 7 df,

## crr for mortality
resFgrMort <- FGR(Hist(ftime, Status, cens.code = "Censored") ~ D + Age + Sex + Phase + Source, data = bmtcrr, cause = "Mortality")

list(FGR = resFgrMort,
     crr = summary(resCrrMort))
## $FGR
## 
## Right-censored response of a competing.risks model 
## 
## No.Observations: 177 
## 
## Pattern:
##            
## Cause       event right.censored
##   Relapse      56              0
##   Mortality    75              0
##   unknown       0             46
## 
## 
## Fine-Gray model: analysis of cause 2 
## 
## Competing Risks Regression
## 
## Call:
## FGR(formula = Hist(ftime, Status, cens.code = "Censored") ~ D + 
##     Age + Sex + Phase + Source, data = bmtcrr, cause = "Mortality")
## 
##               coef exp(coef) se(coef)        z p-value
## D:AML      0.00226     1.002   0.2833  0.00799  0.9900
## Age        0.03074     1.031   0.0104  2.95247  0.0032
## Sex:F      0.27386     1.315   0.2376  1.15241  0.2500
## Phase:CR1 -0.37745     0.686   0.3049 -1.23779  0.2200
## Phase:CR2 -0.12735     0.880   0.2772 -0.45944  0.6500
## Phase:CR3 -0.25448     0.775   0.5997 -0.42433  0.6700
## Source:PB -1.16041     0.313   0.3799 -3.05432  0.0023
## 
##           exp(coef) exp(-coef)  2.5% 97.5%
## D:AML         1.002      0.998 0.575  1.75
## Age           1.031      0.970 1.010  1.05
## Sex:F         1.315      0.760 0.825  2.10
## Phase:CR1     0.686      1.459 0.377  1.25
## Phase:CR2     0.880      1.136 0.511  1.52
## Phase:CR3     0.775      1.290 0.239  2.51
## Source:PB     0.313      3.191 0.149  0.66
## 
## Num. cases = 177
## Pseudo Log-likelihood = -360 
## Pseudo likelihood ratio test = 18.2  on 7 df,
## 
## Convergence: TRUE 
## 
## 
## $crr
## Competing Risks Regression
## 
## Call:
## crr(ftime = bmtcrr$ftime, fstatus = bmtcrr$Status, cov1 = bmtcrrMat, 
##     failcode = "Mortality", cencode = "Censored")
## 
##              coef exp(coef) se(coef)        z p-value
## DAML      0.00226     1.002   0.2833  0.00799  0.9900
## Age       0.03074     1.031   0.0104  2.95247  0.0032
## SexF      0.27386     1.315   0.2376  1.15241  0.2500
## PhaseCR1 -0.37745     0.686   0.3049 -1.23779  0.2200
## PhaseCR2 -0.12735     0.880   0.2772 -0.45944  0.6500
## PhaseCR3 -0.25448     0.775   0.5997 -0.42433  0.6700
## SourcePB -1.16041     0.313   0.3799 -3.05432  0.0023
## 
##          exp(coef) exp(-coef)  2.5% 97.5%
## DAML         1.002      0.998 0.575  1.75
## Age          1.031      0.970 1.010  1.05
## SexF         1.315      0.760 0.825  2.10
## PhaseCR1     0.686      1.459 0.377  1.25
## PhaseCR2     0.880      1.136 0.511  1.52
## PhaseCR3     0.775      1.290 0.239  2.51
## SourcePB     0.313      3.191 0.149  0.66
## 
## Num. cases = 177
## Pseudo Log-likelihood = -360 
## Pseudo likelihood ratio test = 18.2  on 7 df,