I am Huyen Nguyen. Thank you for reading!
The assignment focuses on survival analysis including competing risk analysis.
The data we consider to demonstrate competing risks analysis is contained in the ‘riskRegression’ package in R. A total of 205 patients with melanoma were followed after a surgical operation until the end of 1977. The data can be retrieved as follows
pacman::p_load(
rio, # File import
here, # File locator
lubridate,
survival,
magrittr,
tidyverse,
ggplot2,
riskRegression,
survival,
cmprsk,
prodlim,
ggsurvfit,
gtsummary,
tidycmprsk,
condsurv)
## Installing package into 'C:/Users/khanh/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## Warning: package 'condsurv' is not available for this version of R
##
## A version of this package for your version of R might be available elsewhere,
## see the ideas at
## https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
## Warning: Perhaps you meant 'condSURV' ?
## Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2:
## cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2/PACKAGES'
## Warning in p_install(package, character.only = TRUE, ...):
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'condsurv'
## Warning in pacman::p_load(rio, here, lubridate, survival, magrittr, tidyverse, : Failed to install/load:
## condsurv
# library('riskRegression')
Melanoma2 <- import(here("Melanoma.txt"))
## Warning in (function (input = "", file = NULL, text = NULL, cmd = NULL, :
## Detected 11 column names but the data has 12 columns (i.e. invalid file). Added
## 1 extra default column name for the first column which is guessed to be row
## names or an index. Use setnames() afterwards if this guess is not correct, or
## fix the file write command that created the file to create a valid file.
Melanoma2 %>% tbl_summary()
| Characteristic | N = 2051 |
|---|---|
| V1 | 103 (52, 154) |
| time | 2,005 (1,525, 3,042) |
| status | |
| Â Â Â Â 0 | 134 (65%) |
| Â Â Â Â 1 | 57 (28%) |
| Â Â Â Â 2 | 14 (6.8%) |
| event | |
| Â Â Â Â censored | 134 (65%) |
| Â Â Â Â death.malignant.melanoma | 57 (28%) |
| Â Â Â Â death.other.causes | 14 (6.8%) |
| invasion | |
| Â Â Â Â level.0 | 99 (48%) |
| Â Â Â Â level.1 | 77 (38%) |
| Â Â Â Â level.2 | 29 (14%) |
| ici | |
| Â Â Â Â 0 | 17 (8.3%) |
| Â Â Â Â 1 | 59 (29%) |
| Â Â Â Â 2 | 107 (52%) |
| Â Â Â Â 3 | 22 (11%) |
| epicel | |
| Â Â Â Â not present | 116 (57%) |
| Â Â Â Â present | 89 (43%) |
| ulcer | |
| Â Â Â Â not present | 115 (56%) |
| Â Â Â Â present | 90 (44%) |
| thick | 1.94 (0.97, 3.56) |
| sex | |
| Â Â Â Â Female | 126 (61%) |
| Â Â Â Â Male | 79 (39%) |
| age | 54 (42, 65) |
| logthick | 0.66 (-0.03, 1.27) |
| 1 Median (IQR); n (%) | |
A = Melanoma2$time[Melanoma2$status == 0]
B = Melanoma2$time[Melanoma2$status == 1]
C = Melanoma2$time[Melanoma2$status == 2]
c1 = "lightblue"; c2 = "pink"; c3 = "orange";
b <- min(c(A,B,C)) - 0.001 # Set the minimum for the breakpoints
e <- max(c(A,B,C)) # Set the maximum for the breakpoints
ax <- pretty(b:e, n = 12) # Make a neat vector for the breakpoints
hgA <- hist(A, breaks = ax, plot = FALSE) # Save first histogram data
hgB <- hist(B, breaks = ax, plot = FALSE) # Save second histogram data
hgC <- hist(C, breaks = ax, plot = FALSE) # Save second histogram data
par(mfrow = c(1,1))
plot(hgA, col = c1, main = "Event times", xlab = "Time (in days)")
plot(hgB, col = c2, add = TRUE)
plot(hgC, col = c3, add = TRUE)
A first approach towards analysis competing risks survival data is to combine all events. This combined analysis provides insights into the overall hazard function.
Step 1: Estimate Survival function using Kaplan-Meier
Step 2: Using ‘survfit’ function to estimate CIF
fit_combined <- survfit(surv_object ~ 1, data = Melanoma2)
head(summary(fit_combined))
## $n
## [1] 205
##
## $time
## [1] 10 30 99 185 204 210 232 279 295 355 386 426 469 493 529
## [16] 621 629 659 667 718 752 779 793 817 826 833 858 869 872 967
## [31] 977 982 1041 1055 1062 1075 1156 1228 1252 1271 1312 1427 1435 1506 1516
## [46] 1525 1548 1560 1584 1621 1667 1690 1726 1860 1933 2061 2062 2085 2103 2108
## [61] 2256 2388 2467 2565 2782 3042 3154 3182 3338 3458
##
## $n.risk
## [1] 205 204 202 201 200 199 198 196 195 194 193 192 191 190 189 188 187 186 185
## [20] 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166
## [39] 165 164 163 162 161 159 155 154 152 150 148 146 137 134 131 117 110 95 94
## [58] 92 90 88 80 75 69 63 57 52 46 44 35 28
##
## $n.event
## [1] 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## $n.censor
## [1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 0 1 1 1 1
## [51] 8 2 2 13 6 14 0 1 1 1 7 4 5 5 5 4 5 1 8 6
##
## $surv
## [1] 0.9951220 0.9902439 0.9853417 0.9804395 0.9755373 0.9706351 0.9608307
## [8] 0.9559285 0.9510263 0.9461241 0.9412219 0.9363197 0.9314175 0.9265153
## [15] 0.9216131 0.9167109 0.9118087 0.9069065 0.9020043 0.8971021 0.8922000
## [22] 0.8872978 0.8823956 0.8774934 0.8725912 0.8676890 0.8627868 0.8578846
## [29] 0.8529824 0.8480802 0.8431780 0.8382758 0.8333736 0.8284714 0.8235692
## [36] 0.8186670 0.8137648 0.8088626 0.8039604 0.7990582 0.7941560 0.7892538
## [43] 0.7843516 0.7794186 0.7743901 0.7693616 0.7643000 0.7592046 0.7540749
## [50] 0.7489100 0.7434435 0.7378954 0.7322626 0.7260040 0.7194039 0.7118312
## [57] 0.7042586 0.6966036 0.6888636 0.6810356 0.6725226 0.6635556 0.6539389
## [64] 0.6435589 0.6322684 0.6201094 0.6066288 0.5928417 0.5759034 0.5553354
plot(fit_combined$time, 1-fit_combined$surv, lwd = 2, col = 2, type = "l",
main = "Kaplan-Meier estimator for the overall CIF(t)",
xlab = "Time (in days)", ylab = "Overall CIF")
The overall cumulative incidence function provides an estimate of the combined risk. However, the disadvantage of this approach is that the event-specific risks can not be derived. Furthermore, the effect of covariates with respect to the overall hazard function can not be used to assess the cause-specific effects.
surv_object <- Surv(time = Melanoma2$time,
event = as.numeric(Melanoma2$status == 1))
plot(surv_object, main = "Kaplan-Meier estimator for S(t)",
xlab = "Time (in days)", ylab = "S(t)")
fit1 <- survfit(surv_object ~ 1, data = Melanoma2)
head(summary(fit1))
## $n
## [1] 205
##
## $time
## [1] 185 204 210 232 279 295 386 426 469 529 621 629 659 667 718
## [16] 752 779 793 817 833 858 869 872 967 977 982 1041 1055 1062 1075
## [31] 1156 1228 1252 1271 1312 1435 1506 1516 1548 1560 1584 1621 1667 1690 1726
## [46] 1933 2061 2062 2103 2108 2256 2388 2467 2565 2782 3042 3338
##
## $n.risk
## [1] 201 200 199 198 196 195 193 192 191 189 188 187 186 185 184 183 182 181 180
## [20] 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 161 159 155
## [39] 152 150 148 146 137 134 131 110 95 94 90 88 80 75 69 63 57 52 35
##
## $n.event
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## $n.censor
## [1] 4 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [26] 0 0 0 0 0 0 0 0 0 0 1 1 3 2 1 1 1 8 2 2 20 14 0 3 1
## [51] 7 4 5 5 5 4 16
##
## $surv
## [1] 0.9950249 0.9900498 0.9850746 0.9800995 0.9750990 0.9700985 0.9650721
## [8] 0.9600457 0.9550192 0.9499662 0.9449132 0.9398602 0.9348072 0.9297542
## [15] 0.9247012 0.9196482 0.9145951 0.9095421 0.9044891 0.8994077 0.8943263
## [22] 0.8892449 0.8841635 0.8790821 0.8740007 0.8689193 0.8638379 0.8587565
## [29] 0.8536751 0.8485937 0.8435123 0.8384309 0.8333495 0.8282681 0.8231867
## [36] 0.8180738 0.8129286 0.8076839 0.8023702 0.7970211 0.7916358 0.7862137
## [43] 0.7804749 0.7746504 0.7687371 0.7617486 0.7537301 0.7457117 0.7374261
## [50] 0.7290462 0.7199331 0.7103340 0.7000393 0.6889276 0.6768411 0.6638250
## [57] 0.6448585
plot(fit1$time, 1-fit1$surv, lwd = 2, col = 2, type = "l",
main = "Kaplan-Meier estimator for CIF(t)",
xlab = "Time (in days)", ylab = "Marginal CIF")
This method requires an assumption about non-informative censoring. But it can be violated in the real world so we need to take into account the competing risk in this case.
In order to do so, the Kaplan-Meier (KM) estimator can be used in case there are no competing risks OR when competing risks are assumed to be independent (see above). However, the KM method provides biased estimates in case the independence assumption is violated (i.e., the event time distributions for individuals that do experience or do not experience the competing event are different), hence, one should estimate cause-specific CIFs to gain additional insights into the nature of the survival data under study. The cuminc() function is the R package cmprsk can be used to estimate the cause-specific CIFs.
# Estimation of the cause-specific CIF
#--------------------------------------
library("MASS")
##
## Attaching package: 'MASS'
## The following object is masked from 'package:gtsummary':
##
## select
## The following object is masked from 'package:dplyr':
##
## select
library("ggsurvfit")
data(Melanoma, package = "MASS")
# time survival time in days, possibly censored.
# status 1 died from melanoma, 2 alive, 3 dead from other causes.
# sex 1 = male, 0 = female.
# age age in years.
# year of operation.
# thickness tumor thickness in mm.
# ulcer 1 = presence, 0 = absence.
# status 0=alive, 1=died from melanoma, 2=dead from other causes.
Melanoma <-
Melanoma %>%
mutate(
status = as.factor(recode(status, `2` = 0, `1` = 1, `3` = 2))
)
# step 1. Cumulative incidence for competing risks
cuminc(Surv(time, status) ~ 1, data = Melanoma)
##
## ── cuminc() ────────────────────────────────────────────────────────────────────
## • Failure type "1"
## time n.risk estimate std.error 95% CI
## 1,000 171 0.127 0.023 0.086, 0.177
## 2,000 103 0.230 0.030 0.174, 0.291
## 3,000 54 0.310 0.037 0.239, 0.383
## 4,000 13 0.339 0.041 0.260, 0.419
## 5,000 1 0.339 0.041 0.260, 0.419
## • Failure type "2"
## time n.risk estimate std.error 95% CI
## 1,000 171 0.034 0.013 0.015, 0.066
## 2,000 103 0.050 0.016 0.026, 0.087
## 3,000 54 0.058 0.017 0.030, 0.099
## 4,000 13 0.106 0.032 0.053, 0.179
## 5,000 1 0.106 0.032 0.053, 0.179
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
ggcuminc() +
labs(
x = "Days"
) +
add_confidence_interval() +
add_risktable()
## Plotting outcome "1".
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
ggcuminc(outcome = c("1", "2")) +
ylim(c(0, 1)) +
labs(
x = "Days"
)
cuminc(Surv(time, status) ~ sex, data = Melanoma) %>%
ggcuminc(outcome = c("1", "2")) +
ylim(c(0, 1)) +
labs(
x = "Days"
)
The cause-specific CIFs for males and females express the risk of experiencing each of the competing events for males vs. females. Apparently, there exists a large(r) difference between the risk of dying from melanoma between males and females as compared to the difference in risk of dying from another cause. More specifically, males tend to have a higher risk than females to die after the surgical operation as a result of melanoma. Differences between the survival probabilities for the other cause are small between males and females (dashed lines).
cuminc(Surv(time, status) ~ sex, data = Melanoma)
##
## ── cuminc() ────────────────────────────────────────────────────────────────────
## • Failure type "1"
## strata time n.risk estimate std.error 95% CI
## 0 1,000 111 0.087 0.025 0.046, 0.145
## 0 2,000 68 0.181 0.035 0.118, 0.255
## 0 3,000 36 0.236 0.043 0.158, 0.323
## 0 4,000 9 0.284 0.052 0.187, 0.389
## 0 5,000 1 0.284 0.052 0.187, 0.389
## 1 1,000 60 0.192 0.045 0.113, 0.287
## 1 2,000 35 0.310 0.053 0.210, 0.415
## 1 3,000 18 0.425 0.065 0.296, 0.547
## 1 4,000 4 0.425 0.065 0.296, 0.547
## 1 5,000 0 NA NA NA, NA
## • Failure type "2"
## strata time n.risk estimate std.error 95% CI
## 0 1,000 111 0.032 0.016 0.010, 0.074
## 0 2,000 68 0.040 0.018 0.015, 0.085
## 0 3,000 36 0.052 0.021 0.021, 0.105
## 0 4,000 9 0.085 0.039 0.029, 0.181
## 0 5,000 1 0.085 0.039 0.029, 0.181
## 1 1,000 60 0.038 0.022 0.010, 0.098
## 1 2,000 35 0.067 0.029 0.024, 0.140
## 1 3,000 18 0.067 0.029 0.024, 0.140
## 1 4,000 4 0.135 0.054 0.051, 0.259
## 1 5,000 0 NA NA NA, NA
## • Tests
## outcome statistic df p.value
## 1 5.81 1.00 0.016
## 2 0.854 1.00 0.36
The first column of the output shows the \(\chiˆ2\)-statistic for the between-group test, and the second column presents the corresponding p-values. Based on the results, there exists a significant difference in risk of dying from melanoma between males and females at a 5% significance level (p-value = 0.016). There is no significant difference in mortality risk from other causes for male versus female patients (p-value = 0.360).
A cause-specific hazard regression model is used to investigate the effect of different covariates directly on the cause-specific hazard function (i.e., the rate of occurrence of a specific cause/event). A cause-specific hazard regression model can be fitted using the standard Cox proportional hazards regression model while treating failures from the cause of interest as events and failure from other causes as censored observations. The effect of covariates on the cause-specific hazard can be estimated using the partial likelihood method as proposed by Cox. Such a model is fit using the coxph() function in the survival package in R.
# Cox PH model for cause-specific hazards regression
#----------------------------------------------------
Melanoma2 <- import(here("Melanoma.txt"))
## Warning in (function (input = "", file = NULL, text = NULL, cmd = NULL, :
## Detected 11 column names but the data has 12 columns (i.e. invalid file). Added
## 1 extra default column name for the first column which is guessed to be row
## names or an index. Use setnames() afterwards if this guess is not correct, or
## fix the file write command that created the file to create a valid file.
csh <- coxph(Surv(time, status==1) ~ sex+age+invasion,data=Melanoma2)
summary(csh)
## Call:
## coxph(formula = Surv(time, status == 1) ~ sex + age + invasion,
## data = Melanoma2)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexMale 0.663383 1.941349 0.266320 2.491 0.012741 *
## age 0.009823 1.009871 0.008339 1.178 0.238840
## invasionlevel.1 1.037168 2.821217 0.328241 3.160 0.001579 **
## invasionlevel.2 1.403225 4.068300 0.380744 3.685 0.000228 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexMale 1.941 0.5151 1.1519 3.272
## age 1.010 0.9902 0.9935 1.027
## invasionlevel.1 2.821 0.3545 1.4826 5.368
## invasionlevel.2 4.068 0.2458 1.9290 8.580
##
## Concordance= 0.7 (se = 0.035 )
## Likelihood ratio test= 26.7 on 4 df, p=2e-05
## Wald test = 24.39 on 4 df, p=7e-05
## Score (logrank) test = 26.85 on 4 df, p=2e-05
Based on the model output and the estimated HRs, we can conclude that there exists a significant difference in instantaneous probability of dying from melanoma between males and females. More specifically, the hazard of dying from melanoma is estimated to be approximately 2 times higher for males vs.females, accounting for death due to another reason. Furthermore, there is no significant effect of age on the cause-specific hazard (p-value = 0.239). The invasion level alters the rate of melanoma-related mortality significantly with the highest rate observed for invasion level 2, with the hazard being approximately four times higher as compared to the hazard with baseline invastion level (level 0).
NOTE: Alternatively, cause-specific regression can also be performed using the CSC() function contained in the riskRegression package in R.
library("prodlim")
library("riskRegression")
csh_ext<-CSC(Hist(time,status)~sex+age+invasion,data=Melanoma2)
print(csh_ext)
## CSC(formula = Hist(time, status) ~ sex + age + invasion, data = Melanoma2)
##
## Right-censored response of a competing.risks model
##
## No.Observations: 205
##
## Pattern:
##
## Cause event right.censored
## 1 57 0
## 2 14 0
## unknown 0 134
##
##
## ----------> Cause: 1
##
## Call:
## coxph(formula = survival::Surv(time, status) ~ sex + age + invasion,
## x = TRUE, y = TRUE)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexMale 0.663383 1.941349 0.266320 2.491 0.012741 *
## age 0.009823 1.009871 0.008339 1.178 0.238840
## invasionlevel.1 1.037168 2.821217 0.328241 3.160 0.001579 **
## invasionlevel.2 1.403225 4.068300 0.380744 3.685 0.000228 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexMale 1.941 0.5151 1.1519 3.272
## age 1.010 0.9902 0.9935 1.027
## invasionlevel.1 2.821 0.3545 1.4826 5.368
## invasionlevel.2 4.068 0.2458 1.9290 8.580
##
## Concordance= 0.7 (se = 0.035 )
## Likelihood ratio test= 26.7 on 4 df, p=2e-05
## Wald test = 24.39 on 4 df, p=7e-05
## Score (logrank) test = 26.85 on 4 df, p=2e-05
##
##
##
## ----------> Cause: 2
##
## Call:
## coxph(formula = survival::Surv(time, status) ~ sex + age + invasion,
## x = TRUE, y = TRUE)
##
## n= 205, number of events= 14
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexMale 0.29893 1.34842 0.54620 0.547 0.584180
## age 0.09271 1.09715 0.02592 3.576 0.000349 ***
## invasionlevel.1 -0.88527 0.41260 0.64069 -1.382 0.167051
## invasionlevel.2 -1.34958 0.25935 1.12351 -1.201 0.229666
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexMale 1.3484 0.7416 0.46227 3.933
## age 1.0971 0.9115 1.04279 1.154
## invasionlevel.1 0.4126 2.4236 0.11754 1.448
## invasionlevel.2 0.2593 3.8558 0.02868 2.345
##
## Concordance= 0.839 (se = 0.042 )
## Likelihood ratio test= 18.96 on 4 df, p=8e-04
## Wald test = 14.21 on 4 df, p=0.007
## Score (logrank) test = 15.53 on 4 df, p=0.004
A subdistribution hazard regression model is also referred to as the Fine and Gray model. The motivation for the use of the Fine and Gray model is that the effect of a covariate on the cause-specific hazard function may be quite different from that on the cumulative incidence. In other words, a covariate may have strong influence on the cause-specific hazard, but no effect on the CIF. The difference between cause-specific hazard regression and sub-distribution hazard regression is that both hazard functions rely on a different treatment of the competing risk events. The former considers competing risk events as non-informative censoring, whereas the latter takes into account the informative censoring nature of the competing risk events. 1. Consider the Fine and Gray model for the example at hand. More specifically, fit a subdistribution hazards regression model with gender, age and invasion as covariates.
# Fine and Gray model for subdistribution hazards regression
#------------------------------------------------------------
shm <- FGR(Hist(time,status)~sex+age+invasion,data=Melanoma2)
shm
##
## Right-censored response of a competing.risks model
##
## No.Observations: 205
##
## Pattern:
##
## Cause event right.censored
## 1 57 0
## 2 14 0
## unknown 0 134
##
##
## Fine-Gray model: analysis of cause 1
##
## Competing Risks Regression
##
## Call:
## FGR(formula = Hist(time, status) ~ sex + age + invasion, data = Melanoma2,
## cause = "1")
##
## coef exp(coef) se(coef) z p-value
## sexMale 0.62762 1.87 0.27170 2.310 0.0210
## age 0.00565 1.01 0.00913 0.619 0.5400
## invasionlevel.1 1.04909 2.86 0.34040 3.082 0.0021
## invasionlevel.2 1.37802 3.97 0.40137 3.433 0.0006
##
## exp(coef) exp(-coef) 2.5% 97.5%
## sexMale 1.87 0.534 1.100 3.19
## age 1.01 0.994 0.988 1.02
## invasionlevel.1 2.86 0.350 1.465 5.56
## invasionlevel.2 3.97 0.252 1.806 8.71
##
## Num. cases = 205
## Pseudo Log-likelihood = -274
## Pseudo likelihood ratio test = 24.6 on 4 df,
##
## Convergence: TRUE
The estimated coefficient for cause 1 (i.e., mortality as a result of melanoma) deviates slightly from the one obtained in the cause-specific hazard model (HR: 1.87 vs. 1.94), as a result of different assumptions for the competing risks. The numerical values derived from the Fine and Gray model have no simple interpretation. These estimates reflect the ordering of the cumulative incidence curves.The cause-specific hazard function represents the time-dependent rate of failure due to cause per unit of time for patients who are still alive. The subdistribution hazard, however, is the rate of failure due to cause one per unit of time for patients who are either alive or have already failed from cause 2. In other words, patients who fail from other causes remain in the risk set.
Formula: h_i(t) = h_0(t)exp(X_ib) The interpretation of the coefficients in the cause-specific hazard regression model is straightforward. For example, if the coefficient for a particular predictor variable is positive, then the hazard rate of the event of interest increases as the value of that predictor variable increases.
Formula: h_i(t) = (d_i(t) + h_0(t)exp(X_ib))/R_i(t) Where d_i(t) is the cumulative incidence function of the competing events
The interpretation of the coefficients in the subdistribution hazard regression model is more complex, as they reflect the effect of the predictor variable on the cumulative incidence function, which takes into account both the occurrence of the event of interest and competing events. For example, a positive coefficient for a particular predictor variable indicates that the cumulative incidence of the event of interest increases as the value of that predictor variable increases, while taking into account the occurrence of competing events and death.
In summary, the cause-specific hazard regression model and subdistribution hazard regression model differ in their focus, with the former modeling the hazard rate of a specific event of interest while treating competing events as censoring events, and the latter modeling the hazard rate of the event of interest while taking into account both competing events and the competing risk of death. The interpretation of the coefficients in the cause-specific hazard regression model is straightforward, while the interpretation in the subdistribution hazard regression model is more complex, reflecting the cumulative incidence function of the event of interest. Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.