Advanced Methods in Epidemiology

I am Huyen Nguyen. Thank you for reading!

The assignment focuses on survival analysis including competing risk analysis.

Part I: Data application

The data we consider to demonstrate competing risks analysis is contained in the ‘riskRegression’ package in R. A total of 205 patients with melanoma were followed after a surgical operation until the end of 1977. The data can be retrieved as follows

pacman::p_load(
  rio,          # File import
  here,         # File locator
  lubridate,
  survival,
  magrittr,
  tidyverse,
  ggplot2,
  riskRegression,
  survival,
  cmprsk,
  prodlim,
  ggsurvfit,
  gtsummary,
  tidycmprsk,
  condsurv)

## Installing package into 'C:/Users/khanh/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)

## Warning: package 'condsurv' is not available for this version of R
## 
## A version of this package for your version of R might be available elsewhere,
## see the ideas at
## https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

## Warning: Perhaps you meant 'condSURV' ?

## Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2:
##   cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2/PACKAGES'

## Warning in p_install(package, character.only = TRUE, ...):

## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'condsurv'

## Warning in pacman::p_load(rio, here, lubridate, survival, magrittr, tidyverse, : Failed to install/load:
## condsurv

# library('riskRegression')
Melanoma2 <- import(here("Melanoma.txt"))

## Warning in (function (input = "", file = NULL, text = NULL, cmd = NULL, :
## Detected 11 column names but the data has 12 columns (i.e. invalid file). Added
## 1 extra default column name for the first column which is guessed to be row
## names or an index. Use setnames() afterwards if this guess is not correct, or
## fix the file write command that created the file to create a valid file.

Study the variables in the dataset time survival time in days, possibly censored. status 1 died from melanoma, 0 alive, 2 dead from other causes. sex 1 = male, 0 = female. age age in years. year of operation. thickness tumor thickness in mm. ulcer 1 = presence, 0 = absence.

Melanoma2 %>% tbl_summary()

Characteristic	N = 205¹
V1	103 (52, 154)
time	2,005 (1,525, 3,042)
status
0	134 (65%)
1	57 (28%)
2	14 (6.8%)
event
censored	134 (65%)
death.malignant.melanoma	57 (28%)
death.other.causes	14 (6.8%)
invasion
level.0	99 (48%)
level.1	77 (38%)
level.2	29 (14%)
ici
0	17 (8.3%)
1	59 (29%)
2	107 (52%)
3	22 (11%)
epicel
not present	116 (57%)
present	89 (43%)
ulcer
not present	115 (56%)
present	90 (44%)
thick	1.94 (0.97, 3.56)
sex
Female	126 (61%)
Male	79 (39%)
age	54 (42, 65)
logthick	0.66 (-0.03, 1.27)
¹ Median (IQR); n (%)

Graphically depict the event time distribution for the event of interest (i.e., death as a result of melanoma) and distinguish between censored and uncensored observations.

A = Melanoma2$time[Melanoma2$status == 0]
B = Melanoma2$time[Melanoma2$status == 1]
C = Melanoma2$time[Melanoma2$status == 2]
c1 = "lightblue"; c2 = "pink"; c3 = "orange";
b <- min(c(A,B,C)) - 0.001 # Set the minimum for the breakpoints
e <- max(c(A,B,C)) # Set the maximum for the breakpoints
ax <- pretty(b:e, n = 12) # Make a neat vector for the breakpoints
hgA <- hist(A, breaks = ax, plot = FALSE) # Save first histogram data
hgB <- hist(B, breaks = ax, plot = FALSE) # Save second histogram data
hgC <- hist(C, breaks = ax, plot = FALSE) # Save second histogram data
par(mfrow = c(1,1))
plot(hgA, col = c1, main = "Event times", xlab = "Time (in days)")
plot(hgB, col = c2, add = TRUE)
plot(hgC, col = c3, add = TRUE)

Competing risks analysis - Combined analysis

A first approach towards analysis competing risks survival data is to combine all events. This combined analysis provides insights into the overall hazard function.

Estimate the overall cumulative incidence function using the Kaplan-Meier estimator for the survival function

Step 1: Estimate Survival function using Kaplan-Meier

Step 2: Using ‘survfit’ function to estimate CIF

fit_combined <- survfit(surv_object ~ 1, data = Melanoma2)
head(summary(fit_combined))

## $n
## [1] 205
## 
## $time
##  [1]   10   30   99  185  204  210  232  279  295  355  386  426  469  493  529
## [16]  621  629  659  667  718  752  779  793  817  826  833  858  869  872  967
## [31]  977  982 1041 1055 1062 1075 1156 1228 1252 1271 1312 1427 1435 1506 1516
## [46] 1525 1548 1560 1584 1621 1667 1690 1726 1860 1933 2061 2062 2085 2103 2108
## [61] 2256 2388 2467 2565 2782 3042 3154 3182 3338 3458
## 
## $n.risk
##  [1] 205 204 202 201 200 199 198 196 195 194 193 192 191 190 189 188 187 186 185
## [20] 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166
## [39] 165 164 163 162 161 159 155 154 152 150 148 146 137 134 131 117 110  95  94
## [58]  92  90  88  80  75  69  63  57  52  46  44  35  28
## 
## $n.event
##  [1] 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 
## $n.censor
##  [1]  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  3  0  1  1  1  1
## [51]  8  2  2 13  6 14  0  1  1  1  7  4  5  5  5  4  5  1  8  6
## 
## $surv
##  [1] 0.9951220 0.9902439 0.9853417 0.9804395 0.9755373 0.9706351 0.9608307
##  [8] 0.9559285 0.9510263 0.9461241 0.9412219 0.9363197 0.9314175 0.9265153
## [15] 0.9216131 0.9167109 0.9118087 0.9069065 0.9020043 0.8971021 0.8922000
## [22] 0.8872978 0.8823956 0.8774934 0.8725912 0.8676890 0.8627868 0.8578846
## [29] 0.8529824 0.8480802 0.8431780 0.8382758 0.8333736 0.8284714 0.8235692
## [36] 0.8186670 0.8137648 0.8088626 0.8039604 0.7990582 0.7941560 0.7892538
## [43] 0.7843516 0.7794186 0.7743901 0.7693616 0.7643000 0.7592046 0.7540749
## [50] 0.7489100 0.7434435 0.7378954 0.7322626 0.7260040 0.7194039 0.7118312
## [57] 0.7042586 0.6966036 0.6888636 0.6810356 0.6725226 0.6635556 0.6539389
## [64] 0.6435589 0.6322684 0.6201094 0.6066288 0.5928417 0.5759034 0.5553354

plot(fit_combined$time, 1-fit_combined$surv, lwd = 2, col = 2, type = "l",
     main = "Kaplan-Meier estimator for the overall CIF(t)",
     xlab = "Time (in days)", ylab = "Overall CIF")

Interpret this result. What is the disadvantage of this approach?

The overall cumulative incidence function provides an estimate of the combined risk. However, the disadvantage of this approach is that the event-specific risks can not be derived. Furthermore, the effect of covariates with respect to the overall hazard function can not be used to assess the cause-specific effects.

Non-parametric estimation of the (cause-specific) hazard function

Estimate the marginal cumulative distribution function (CIF). More specifically, ignore the competing risk and consider such observations as being (right-)censored for the event of interest.

surv_object <- Surv(time = Melanoma2$time,
                    event = as.numeric(Melanoma2$status == 1))
plot(surv_object, main = "Kaplan-Meier estimator for S(t)",
     xlab = "Time (in days)", ylab = "S(t)")

fit1 <- survfit(surv_object ~ 1, data = Melanoma2)
head(summary(fit1))

## $n
## [1] 205
## 
## $time
##  [1]  185  204  210  232  279  295  386  426  469  529  621  629  659  667  718
## [16]  752  779  793  817  833  858  869  872  967  977  982 1041 1055 1062 1075
## [31] 1156 1228 1252 1271 1312 1435 1506 1516 1548 1560 1584 1621 1667 1690 1726
## [46] 1933 2061 2062 2103 2108 2256 2388 2467 2565 2782 3042 3338
## 
## $n.risk
##  [1] 201 200 199 198 196 195 193 192 191 189 188 187 186 185 184 183 182 181 180
## [20] 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 161 159 155
## [39] 152 150 148 146 137 134 131 110  95  94  90  88  80  75  69  63  57  52  35
## 
## $n.event
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 
## $n.censor
##  [1]  4  0  0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
## [26]  0  0  0  0  0  0  0  0  0  0  1  1  3  2  1  1  1  8  2  2 20 14  0  3  1
## [51]  7  4  5  5  5  4 16
## 
## $surv
##  [1] 0.9950249 0.9900498 0.9850746 0.9800995 0.9750990 0.9700985 0.9650721
##  [8] 0.9600457 0.9550192 0.9499662 0.9449132 0.9398602 0.9348072 0.9297542
## [15] 0.9247012 0.9196482 0.9145951 0.9095421 0.9044891 0.8994077 0.8943263
## [22] 0.8892449 0.8841635 0.8790821 0.8740007 0.8689193 0.8638379 0.8587565
## [29] 0.8536751 0.8485937 0.8435123 0.8384309 0.8333495 0.8282681 0.8231867
## [36] 0.8180738 0.8129286 0.8076839 0.8023702 0.7970211 0.7916358 0.7862137
## [43] 0.7804749 0.7746504 0.7687371 0.7617486 0.7537301 0.7457117 0.7374261
## [50] 0.7290462 0.7199331 0.7103340 0.7000393 0.6889276 0.6768411 0.6638250
## [57] 0.6448585

plot(fit1$time, 1-fit1$surv, lwd = 2, col = 2, type = "l",
     main = "Kaplan-Meier estimator for CIF(t)",
     xlab = "Time (in days)", ylab = "Marginal CIF")

How do you interpret the aforementioned result? The marginal CIF provides an unbiased estimate for the risk of dying from melanoma (prior to time t), at least under the assumption of independent censoring as a result of the competing risks, in the hypothetical world in which the competing risk does not exist. However, in case the occurrence of the competing risks is associated with the occurrence of the event of interest, the aforementioned assumption is violated. For example, if the (instantaneous) risk of experiencing the event of interest is different for individuals experiencing a competing event as compared to those that do not,the independence assumption is untenable. More specifically, if ‘frail’ individuals that are at high risk of dying from melanoma, are also at increased risk of experiencing the competing event (i.e., dying from another cause), the marginal CIF is unreliable and provides a biased estimate of the risk of dying from melanoma.

This method requires an assumption about non-informative censoring. But it can be violated in the real world so we need to take into account the competing risk in this case.

In order to do so, the Kaplan-Meier (KM) estimator can be used in case there are no competing risks OR when competing risks are assumed to be independent (see above). However, the KM method provides biased estimates in case the independence assumption is violated (i.e., the event time distributions for individuals that do experience or do not experience the competing event are different), hence, one should estimate cause-specific CIFs to gain additional insights into the nature of the survival data under study. The cuminc() function is the R package cmprsk can be used to estimate the cause-specific CIFs.

The cumulative incidence functions (CIFs) for different causes, i.e., cause-specific CIFs, are used for the statistical description of survival data with competing risks. Estimate the cause-specific CIF for death related to melanoma and for death related to another cause

# Estimation of the cause-specific CIF
#--------------------------------------
library("MASS")

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:gtsummary':
## 
##     select

## The following object is masked from 'package:dplyr':
## 
##     select

library("ggsurvfit")
data(Melanoma, package = "MASS")

# time survival time in days, possibly censored.
# status 1 died from melanoma, 2 alive, 3 dead from other causes.
# sex 1 = male, 0 = female.
# age age in years.
# year of operation.
# thickness tumor thickness in mm.
# ulcer 1 = presence, 0 = absence.


# status 0=alive, 1=died from melanoma, 2=dead from other causes.
Melanoma <- 
  Melanoma %>% 
  mutate(
    status = as.factor(recode(status, `2` = 0, `1` = 1, `3` = 2))
  )
# step 1. Cumulative incidence for competing risks

cuminc(Surv(time, status) ~ 1, data = Melanoma)

##

## ── cuminc() ────────────────────────────────────────────────────────────────────

## • Failure type "1"

## time    n.risk   estimate   std.error   95% CI          
## 1,000   171      0.127      0.023       0.086, 0.177    
## 2,000   103      0.230      0.030       0.174, 0.291    
## 3,000   54       0.310      0.037       0.239, 0.383    
## 4,000   13       0.339      0.041       0.260, 0.419    
## 5,000   1        0.339      0.041       0.260, 0.419

## • Failure type "2"

## time    n.risk   estimate   std.error   95% CI          
## 1,000   171      0.034      0.013       0.015, 0.066    
## 2,000   103      0.050      0.016       0.026, 0.087    
## 3,000   54       0.058      0.017       0.030, 0.099    
## 4,000   13       0.106      0.032       0.053, 0.179    
## 5,000   1        0.106      0.032       0.053, 0.179

cuminc(Surv(time, status) ~ 1, data = Melanoma) %>% 
  ggcuminc() + 
  labs(
    x = "Days"
  ) + 
  add_confidence_interval() +
  add_risktable()

## Plotting outcome "1".

cuminc(Surv(time, status) ~ 1, data = Melanoma) %>% 
  ggcuminc(outcome = c("1", "2")) +
  ylim(c(0, 1)) + 
  labs(
    x = "Days"
  )

The cuminc() function also allows for the estimation of the cause-specific CIFs for different groups of patients, for example, males and females. Look in the documentation of the function to estimate the cause-specific CIFs for females and males separately

cuminc(Surv(time, status) ~ sex, data = Melanoma) %>% 
  ggcuminc(outcome = c("1", "2")) +
  ylim(c(0, 1)) + 
  labs(
    x = "Days"
  )

The cause-specific CIFs for males and females express the risk of experiencing each of the competing events for males vs. females. Apparently, there exists a large(r) difference between the risk of dying from melanoma between males and females as compared to the difference in risk of dying from another cause. More specifically, males tend to have a higher risk than females to die after the surgical operation as a result of melanoma. Differences between the survival probabilities for the other cause are small between males and females (dashed lines).

In order to perform a formal statistical test to check whether the difference between males and females is different, a modified χ 2 -statistic can be considered (Gray, 1988). Perform this test (see documentation cuminc() function). Interpret the result.

cuminc(Surv(time, status) ~ sex, data = Melanoma)

##

## ── cuminc() ────────────────────────────────────────────────────────────────────

## • Failure type "1"

## strata   time    n.risk   estimate   std.error   95% CI          
## 0        1,000   111      0.087      0.025       0.046, 0.145    
## 0        2,000   68       0.181      0.035       0.118, 0.255    
## 0        3,000   36       0.236      0.043       0.158, 0.323    
## 0        4,000   9        0.284      0.052       0.187, 0.389    
## 0        5,000   1        0.284      0.052       0.187, 0.389    
## 1        1,000   60       0.192      0.045       0.113, 0.287    
## 1        2,000   35       0.310      0.053       0.210, 0.415    
## 1        3,000   18       0.425      0.065       0.296, 0.547    
## 1        4,000   4        0.425      0.065       0.296, 0.547    
## 1        5,000   0        NA         NA          NA, NA

## • Failure type "2"

## strata   time    n.risk   estimate   std.error   95% CI          
## 0        1,000   111      0.032      0.016       0.010, 0.074    
## 0        2,000   68       0.040      0.018       0.015, 0.085    
## 0        3,000   36       0.052      0.021       0.021, 0.105    
## 0        4,000   9        0.085      0.039       0.029, 0.181    
## 0        5,000   1        0.085      0.039       0.029, 0.181    
## 1        1,000   60       0.038      0.022       0.010, 0.098    
## 1        2,000   35       0.067      0.029       0.024, 0.140    
## 1        3,000   18       0.067      0.029       0.024, 0.140    
## 1        4,000   4        0.135      0.054       0.051, 0.259    
## 1        5,000   0        NA         NA          NA, NA

## • Tests

## outcome   statistic   df     p.value    
## 1         5.81        1.00   0.016      
## 2         0.854       1.00   0.36

The first column of the output shows the \(\chiˆ2\)-statistic for the between-group test, and the second column presents the corresponding p-values. Based on the results, there exists a significant difference in risk of dying from melanoma between males and females at a 5% significance level (p-value = 0.016). There is no significant difference in mortality risk from other causes for male versus female patients (p-value = 0.360).

Cause-specific hazard regression

A cause-specific hazard regression model is used to investigate the effect of different covariates directly on the cause-specific hazard function (i.e., the rate of occurrence of a specific cause/event). A cause-specific hazard regression model can be fitted using the standard Cox proportional hazards regression model while treating failures from the cause of interest as events and failure from other causes as censored observations. The effect of covariates on the cause-specific hazard can be estimated using the partial likelihood method as proposed by Cox. Such a model is fit using the coxph() function in the survival package in R.

Assess the impact of gender, age and invasion on the cause-specific hazard function for dying as a result of the melanoma

# Cox PH model for cause-specific hazards regression
#----------------------------------------------------
Melanoma2 <- import(here("Melanoma.txt"))

## Warning in (function (input = "", file = NULL, text = NULL, cmd = NULL, :
## Detected 11 column names but the data has 12 columns (i.e. invalid file). Added
## 1 extra default column name for the first column which is guessed to be row
## names or an index. Use setnames() afterwards if this guess is not correct, or
## fix the file write command that created the file to create a valid file.

csh <- coxph(Surv(time, status==1) ~ sex+age+invasion,data=Melanoma2)
summary(csh)

## Call:
## coxph(formula = Surv(time, status == 1) ~ sex + age + invasion, 
##     data = Melanoma2)
## 
##   n= 205, number of events= 57 
## 
##                     coef exp(coef) se(coef)     z Pr(>|z|)    
## sexMale         0.663383  1.941349 0.266320 2.491 0.012741 *  
## age             0.009823  1.009871 0.008339 1.178 0.238840    
## invasionlevel.1 1.037168  2.821217 0.328241 3.160 0.001579 ** 
## invasionlevel.2 1.403225  4.068300 0.380744 3.685 0.000228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                 exp(coef) exp(-coef) lower .95 upper .95
## sexMale             1.941     0.5151    1.1519     3.272
## age                 1.010     0.9902    0.9935     1.027
## invasionlevel.1     2.821     0.3545    1.4826     5.368
## invasionlevel.2     4.068     0.2458    1.9290     8.580
## 
## Concordance= 0.7  (se = 0.035 )
## Likelihood ratio test= 26.7  on 4 df,   p=2e-05
## Wald test            = 24.39  on 4 df,   p=7e-05
## Score (logrank) test = 26.85  on 4 df,   p=2e-05

Based on the model output and the estimated HRs, we can conclude that there exists a significant difference in instantaneous probability of dying from melanoma between males and females. More specifically, the hazard of dying from melanoma is estimated to be approximately 2 times higher for males vs.females, accounting for death due to another reason. Furthermore, there is no significant effect of age on the cause-specific hazard (p-value = 0.239). The invasion level alters the rate of melanoma-related mortality significantly with the highest rate observed for invasion level 2, with the hazard being approximately four times higher as compared to the hazard with baseline invastion level (level 0).

NOTE: Alternatively, cause-specific regression can also be performed using the CSC() function contained in the riskRegression package in R.

library("prodlim")
library("riskRegression")
csh_ext<-CSC(Hist(time,status)~sex+age+invasion,data=Melanoma2)
print(csh_ext)

## CSC(formula = Hist(time, status) ~ sex + age + invasion, data = Melanoma2)
## 
## Right-censored response of a competing.risks model
## 
## No.Observations: 205 
## 
## Pattern:
##          
## Cause     event right.censored
##   1          57              0
##   2          14              0
##   unknown     0            134
## 
## 
## ----------> Cause:  1 
## 
## Call:
## coxph(formula = survival::Surv(time, status) ~ sex + age + invasion, 
##     x = TRUE, y = TRUE)
## 
##   n= 205, number of events= 57 
## 
##                     coef exp(coef) se(coef)     z Pr(>|z|)    
## sexMale         0.663383  1.941349 0.266320 2.491 0.012741 *  
## age             0.009823  1.009871 0.008339 1.178 0.238840    
## invasionlevel.1 1.037168  2.821217 0.328241 3.160 0.001579 ** 
## invasionlevel.2 1.403225  4.068300 0.380744 3.685 0.000228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                 exp(coef) exp(-coef) lower .95 upper .95
## sexMale             1.941     0.5151    1.1519     3.272
## age                 1.010     0.9902    0.9935     1.027
## invasionlevel.1     2.821     0.3545    1.4826     5.368
## invasionlevel.2     4.068     0.2458    1.9290     8.580
## 
## Concordance= 0.7  (se = 0.035 )
## Likelihood ratio test= 26.7  on 4 df,   p=2e-05
## Wald test            = 24.39  on 4 df,   p=7e-05
## Score (logrank) test = 26.85  on 4 df,   p=2e-05
## 
## 
## 
## ----------> Cause:  2 
## 
## Call:
## coxph(formula = survival::Surv(time, status) ~ sex + age + invasion, 
##     x = TRUE, y = TRUE)
## 
##   n= 205, number of events= 14 
## 
##                     coef exp(coef) se(coef)      z Pr(>|z|)    
## sexMale          0.29893   1.34842  0.54620  0.547 0.584180    
## age              0.09271   1.09715  0.02592  3.576 0.000349 ***
## invasionlevel.1 -0.88527   0.41260  0.64069 -1.382 0.167051    
## invasionlevel.2 -1.34958   0.25935  1.12351 -1.201 0.229666    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                 exp(coef) exp(-coef) lower .95 upper .95
## sexMale            1.3484     0.7416   0.46227     3.933
## age                1.0971     0.9115   1.04279     1.154
## invasionlevel.1    0.4126     2.4236   0.11754     1.448
## invasionlevel.2    0.2593     3.8558   0.02868     2.345
## 
## Concordance= 0.839  (se = 0.042 )
## Likelihood ratio test= 18.96  on 4 df,   p=8e-04
## Wald test            = 14.21  on 4 df,   p=0.007
## Score (logrank) test = 15.53  on 4 df,   p=0.004

Subdistribution hazard regression

A subdistribution hazard regression model is also referred to as the Fine and Gray model. The motivation for the use of the Fine and Gray model is that the effect of a covariate on the cause-specific hazard function may be quite different from that on the cumulative incidence. In other words, a covariate may have strong influence on the cause-specific hazard, but no effect on the CIF. The difference between cause-specific hazard regression and sub-distribution hazard regression is that both hazard functions rely on a different treatment of the competing risk events. The former considers competing risk events as non-informative censoring, whereas the latter takes into account the informative censoring nature of the competing risk events. 1. Consider the Fine and Gray model for the example at hand. More specifically, fit a subdistribution hazards regression model with gender, age and invasion as covariates.

# Fine and Gray model for subdistribution hazards regression
#------------------------------------------------------------
shm <- FGR(Hist(time,status)~sex+age+invasion,data=Melanoma2)
shm

## 
## Right-censored response of a competing.risks model
## 
## No.Observations: 205 
## 
## Pattern:
##          
## Cause     event right.censored
##   1          57              0
##   2          14              0
##   unknown     0            134
## 
## 
## Fine-Gray model: analysis of cause 1 
## 
## Competing Risks Regression
## 
## Call:
## FGR(formula = Hist(time, status) ~ sex + age + invasion, data = Melanoma2, 
##     cause = "1")
## 
##                    coef exp(coef) se(coef)     z p-value
## sexMale         0.62762      1.87  0.27170 2.310  0.0210
## age             0.00565      1.01  0.00913 0.619  0.5400
## invasionlevel.1 1.04909      2.86  0.34040 3.082  0.0021
## invasionlevel.2 1.37802      3.97  0.40137 3.433  0.0006
## 
##                 exp(coef) exp(-coef)  2.5% 97.5%
## sexMale              1.87      0.534 1.100  3.19
## age                  1.01      0.994 0.988  1.02
## invasionlevel.1      2.86      0.350 1.465  5.56
## invasionlevel.2      3.97      0.252 1.806  8.71
## 
## Num. cases = 205
## Pseudo Log-likelihood = -274 
## Pseudo likelihood ratio test = 24.6  on 4 df,
## 
## Convergence: TRUE

The estimated coefficient for cause 1 (i.e., mortality as a result of melanoma) deviates slightly from the one obtained in the cause-specific hazard model (HR: 1.87 vs. 1.94), as a result of different assumptions for the competing risks. The numerical values derived from the Fine and Gray model have no simple interpretation. These estimates reflect the ordering of the cumulative incidence curves.The cause-specific hazard function represents the time-dependent rate of failure due to cause per unit of time for patients who are still alive. The subdistribution hazard, however, is the rate of failure due to cause one per unit of time for patients who are either alive or have already failed from cause 2. In other words, patients who fail from other causes remain in the risk set.

Discuss the differences between the cause-specific hazard regression and subdistribution hazard regression models in terms of interpretation and formulation.

Cause-specific hazard regression:

Formula: h_i(t) = h_0(t)exp(X_ib) The interpretation of the coefficients in the cause-specific hazard regression model is straightforward. For example, if the coefficient for a particular predictor variable is positive, then the hazard rate of the event of interest increases as the value of that predictor variable increases.

subdistribution hazard regression models (Fine-Gray model)

Formula: h_i(t) = (d_i(t) + h_0(t)exp(X_ib))/R_i(t) Where d_i(t) is the cumulative incidence function of the competing events

The interpretation of the coefficients in the subdistribution hazard regression model is more complex, as they reflect the effect of the predictor variable on the cumulative incidence function, which takes into account both the occurrence of the event of interest and competing events. For example, a positive coefficient for a particular predictor variable indicates that the cumulative incidence of the event of interest increases as the value of that predictor variable increases, while taking into account the occurrence of competing events and death.

In summary, the cause-specific hazard regression model and subdistribution hazard regression model differ in their focus, with the former modeling the hazard rate of a specific event of interest while treating competing events as censoring events, and the latter modeling the hazard rate of the event of interest while taking into account both competing events and the competing risk of death. The interpretation of the coefficients in the cause-specific hazard regression model is straightforward, while the interpretation in the subdistribution hazard regression model is more complex, reflecting the cumulative incidence function of the event of interest. Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Advanced Methods in Epidemiology

Huyen Nguyen - 05 March 2023

Part I: Data application

Competing risks analysis - Combined analysis

Non-parametric estimation of the (cause-specific) hazard function

Cause-specific hazard regression

Subdistribution hazard regression