Competing Risk Cox Model

Traditional survival analysis generally only cares about one endpoint event. requiring that the censoring situation and the individual endpoint event are independent of each other. In actual situations, another event may affect the probability of the endpoint event, in which case it can be considered that the former has a competing risk with the latter.

Observation time not fully observed: censoring (Assumed to be non-informative); (Observed) Event that prevents the event of interest: competing risks.

The purpose is to study the relationship between the number of lymph node examinations and the survival time of patients with non-small cell lung cancer based on a competing risk model. The event of interest is the death of patients from non-small cell lung cancer, and the competing event is death from other causes (such as other tumors, etc.). The outcome variable is a three-category variable (0, 1, 2), including survival (censored), death from non-small cell lung cancer, and death from other causes.

Read data

setwd("C:\\Users\\hed2\\Downloads\\dell\\code-storage\\code")
mydata <- read.csv ('compt risk survival.csv') 
str (mydata)

## 'data.frame':    1200 obs. of  9 variables:
##  $ ID             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender         : int  2 2 1 2 2 1 1 1 2 2 ...
##  $ Ethnicity      : int  2 1 1 1 1 1 3 3 3 2 ...
##  $ Primary_site   : int  5 1 1 2 1 1 1 1 4 3 ...
##  $ T_stage        : int  1 4 1 2 2 4 1 1 1 1 ...
##  $ Lymph_number   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Outcome        : int  1 0 1 0 1 1 1 1 1 1 ...
##  $ Survival_months: int  6 52 26 89 38 8 0 35 15 4 ...

Data transform

mydata$Age <- factor (mydata$Age, levels = c(1, 2, 3), labels = c ("<60", "60-74", "≥75"))
mydata$Gender <- factor (mydata$Gender, levels = c(1, 2), labels = c ("Male", "Female"))
mydata$Ethnicity <- factor (mydata$Ethnicity, levels = c(1, 2, 3, 4), labels = c ("White", "Asian", "Black", "Others"))
mydata$Primary_site <- factor (mydata$Primary_site, levels = c(1, 2, 3, 4, 5), labels = c ("Upper lobe", "Middle lobe", "Lower lobe", "Main bronchus", "Overlapping lesion"))
mydata$T_stage <- factor (mydata$T_stage, levels = c(1, 2, 3, 4), labels = c ("T1", "T2", "T3", "T4"))
mydata$Lymph_number <- factor (mydata$Lymph_number, levels = c(1, 2), labels = c ("< 16", "≥ 16"))
mydata$Outcome <- factor (mydata$ Outcome)
View(mydata)

summary (mydata)

##        ID            Age         Gender     Ethnicity  
##  Min.   :   1.0   <60  :281   Male  :694   White :958  
##  1st Qu.: 300.8   60-74:797   Female:506   Asian : 82  
##  Median : 600.5   ≥75  :122                Black :150  
##  Mean   : 600.5                            Others: 10  
##  3rd Qu.: 900.2                                        
##  Max.   :1200.0                                        
##              Primary_site T_stage  Lymph_number Outcome Survival_months
##  Upper lobe        :777   T1:285   < 16:1029    0:422   Min.   : 0.00  
##  Middle lobe       : 74   T2:314   ≥ 16: 171    1:654   1st Qu.: 8.00  
##  Lower lobe        :308   T3:311                2:124   Median :25.00  
##  Main bronchus     : 25   T4:290                        Mean   :33.53  
##  Overlapping lesion: 16                                 3rd Qu.:55.25  
##                                                         Max.   :95.00

Cumulative incidence (Fine-Gray univariate analysis)

library (cmprsk)

## Warning: package 'cmprsk' was built under R version 4.4.3

## Loading required package: survival

library (riskRegression)

## Warning: package 'riskRegression' was built under R version 4.4.3

## riskRegression version 2023.12.21

library(pec)

## Warning: package 'pec' was built under R version 4.4.3

## Loading required package: prodlim

## Warning: package 'prodlim' was built under R version 4.4.3

## 
## Attaching package: 'pec'

## The following objects are masked from 'package:riskRegression':
## 
##     ipcw, selectCox

cum_Lymph_number <- cuminc (mydata$Survival_months, mydata$Outcome, mydata$Lymph_number)  
cum_Lymph_number

## Tests:
##        stat           pv df
## 1 26.466910 2.680913e-07  1
## 2  2.980508 8.427284e-02  1
## Estimates and Variances:
## $est
##                20         40        60        80
## < 16 1 0.40225361 0.52657627 0.5737580 0.6125047
## ≥ 16 1 0.21183283 0.33154283 0.3562282 0.4031302
## < 16 2 0.05091013 0.07275006 0.0945214 0.1142624
## ≥ 16 2 0.05869348 0.08931267 0.1424881 0.1764088
## 
## $var
##                  20           40           60           80
## < 16 1 2.353980e-04 2.485565e-04 2.576628e-04 0.0002832024
## ≥ 16 1 9.886072e-04 1.360950e-03 1.463218e-03 0.0017834570
## < 16 2 4.729336e-05 6.735227e-05 9.318157e-05 0.0001278999
## ≥ 16 2 3.263241e-04 4.883142e-04 8.855947e-04 0.0011831381

cum_Lymph_number$`< 16 1`$est[length(cum_Lymph_number $`< 16 1`$est)]

## [1] 0.6285701

cum_Lymph_number$`≥ 16 1`$est[length(cum_Lymph_number $`≥ 16 1`$est)]

## [1] 0.422242

plot (cum_Lymph_number, color = c ("pink", "red", "blue", "darkcyan"), xlab = "", xlim = c (0,100), ylim = c (0,1), ylab = "CIF", main = 'Lymph')

Univariate competing risk regression analysis

fgr_Lymph_number <- FGR(Hist(Survival_months, Outcome)~ Lymph_number, data = mydata)

## Argument cause missing. Analyse cause: 1

summary (fgr_Lymph_number)

## Competing Risks Regression
## 
## Call:
## FGR(formula = Hist(Survival_months, Outcome) ~ Lymph_number, 
##     data = mydata, cause = "1")
## 
##                    coef exp(coef) se(coef)     z p-value
## Lymph_number≥ 16 -0.647     0.523    0.128 -5.05 4.5e-07
## 
##                  exp(coef) exp(-coef)  2.5% 97.5%
## Lymph_number≥ 16     0.523       1.91 0.407 0.673
## 
## Num. cases = 1200
## Pseudo Log-likelihood = -4357 
## Pseudo likelihood ratio test = 28.4  on 1 df,

After controlling for competing risk events, the risk of death from non-small cell lung cancer in the group <16 was 0.91 times higher than that in the group ≥16 (HR=1.91), and the risk of death from non-small cell lung cancer in the group ≥16 was 0.477 times lower than that in the group <16 (HR=0.523, 95%CI: 0.407-0.673). ≥16/<16=exp(coef)=0.523; <16/≥16=exp(-coef)=1.91.

Please be careful here the highest level of the covariate is the reference group in SAS but the lowest level in R.

Multivariable competing risk regression analysis

fgr_multi <- FGR (Hist(Survival_months, Outcome)~Age + Gender + Primary_site + Lymph_number, data = mydata)

## Argument cause missing. Analyse cause: 1

summary (fgr_multi)

## Competing Risks Regression
## 
## Call:
## FGR(formula = Hist(Survival_months, Outcome) ~ Age + Gender + 
##     Primary_site + Lymph_number, data = mydata, cause = "1")
## 
##                                   coef exp(coef) se(coef)      z p-value
## Age60-74                        0.0971     1.102   0.0974  0.997 3.2e-01
## Age≥75                          0.3814     1.464   0.1506  2.532 1.1e-02
## GenderFemale                   -0.1418     0.868   0.0793 -1.788 7.4e-02
## Primary_siteMiddle lobe        -0.0993     0.905   0.1702 -0.584 5.6e-01
## Primary_siteLower lobe          0.0981     1.103   0.0920  1.066 2.9e-01
## Primary_siteMain bronchus       0.3667     1.443   0.2451  1.496 1.3e-01
## Primary_siteOverlapping lesion  0.9316     2.539   0.2964  3.143 1.7e-03
## Lymph_number≥ 16               -0.6463     0.524   0.1288 -5.017 5.2e-07
## 
##                                exp(coef) exp(-coef)  2.5% 97.5%
## Age60-74                           1.102      0.907 0.910 1.334
## Age≥75                             1.464      0.683 1.090 1.967
## GenderFemale                       0.868      1.152 0.743 1.014
## Primary_siteMiddle lobe            0.905      1.104 0.649 1.264
## Primary_siteLower lobe             1.103      0.907 0.921 1.321
## Primary_siteMain bronchus          1.443      0.693 0.893 2.333
## Primary_siteOverlapping lesion     2.539      0.394 1.420 4.539
## Lymph_number≥ 16                   0.524      1.908 0.407 0.674
## 
## Num. cases = 1200
## Pseudo Log-likelihood = -4346 
## Pseudo likelihood ratio test = 50.7  on 8 df,

Without consider competing risk events

mydata$Outcome2=ifelse (mydata$Outcome==2|mydata$Outcome==0, 0, 1)  

coxph <- coxph ( Surv (Survival_months, Outcome2)~Age + Gender + Primary_site + Lymph_number, data = mydata)   
summary (coxph)

## Call:
## coxph(formula = Surv(Survival_months, Outcome2) ~ Age + Gender + 
##     Primary_site + Lymph_number, data = mydata)
## 
##   n= 1200, number of events= 654 
## 
##                                    coef exp(coef) se(coef)      z Pr(>|z|)    
## Age60-74                        0.12111   1.12875  0.09753  1.242 0.214343    
## Age≥75                          0.48381   1.62225  0.14372  3.366 0.000762 ***
## GenderFemale                   -0.17314   0.84102  0.08026 -2.157 0.030993 *  
## Primary_siteMiddle lobe        -0.07497   0.92778  0.17426 -0.430 0.667051    
## Primary_siteLower lobe          0.10737   1.11335  0.09134  1.176 0.239787    
## Primary_siteMain bronchus       0.51099   1.66695  0.25507  2.003 0.045138 *  
## Primary_siteOverlapping lesion  0.88788   2.42998  0.27332  3.249 0.001160 ** 
## Lymph_number≥ 16               -0.65906   0.51734  0.13308 -4.952 7.33e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                                exp(coef) exp(-coef) lower .95 upper .95
## Age60-74                          1.1287     0.8859    0.9323    1.3665
## Age≥75                            1.6222     0.6164    1.2240    2.1501
## GenderFemale                      0.8410     1.1890    0.7186    0.9843
## Primary_siteMiddle lobe           0.9278     1.0778    0.6593    1.3055
## Primary_siteLower lobe            1.1133     0.8982    0.9309    1.3316
## Primary_siteMain bronchus         1.6669     0.5999    1.0111    2.7481
## Primary_siteOverlapping lesion    2.4300     0.4115    1.4222    4.1519
## Lymph_number≥ 16                  0.5173     1.9330    0.3986    0.6715
## 
## Concordance= 0.584  (se = 0.012 )
## Likelihood ratio test= 58.49  on 8 df,   p=9e-10
## Wald test            = 57  on 8 df,   p=2e-09
## Score (logrank) test = 58.98  on 8 df,   p=7e-10

The risk of death from non-small cell lung cancer in the group with ≥16 lymph node examinations was 0.483 times lower than that in the group with <16 lymph node examinations (HR=0.517, P=7.33e-07<0.001). It can be seen that when there is a competing risk, if the competing risk model is not used and the traditional proportional risk Cox regression is used, the effect of the research factor will be exaggerated.

Subdistribution PH assumption

The subdistribution satisfies the proportional hazards (PH) assumption.

library ('survival')   
library ('survminer')

## Warning: package 'survminer' was built under R version 4.4.3

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.4.3

## Loading required package: ggpubr

## 
## Attaching package: 'survminer'

## The following object is masked from 'package:survival':
## 
##     myeloma

data_1 <- mydata[mydata$Outcome!=2,]   
data_2 <- mydata[mydata$Outcome!=1,]   
data_2$Outcome <- ifelse(data_2$Outcome ==0,0,1)   

cox_1 <- coxph (formula = Surv(Survival_months, Outcome) ~ Age + Gender + Primary_site + Lymph_number, data = data_1, id = ID)   
cox_2 <- coxph (formula = Surv(Survival_months, Outcome) ~ Age + Gender + Primary_site + Lymph_number, data = data_2, id = ID)

## Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
## Loglik converged before variable 7 ; coefficient may be infinite.

ph_1 <- cox.zph(cox_1)   
ph_2 <- cox.zph(cox_2)   
ph_1

##              chisq df     p
## Age           6.81  2 0.033
## Gender        0.98  1 0.322
## Primary_site  2.04  4 0.729
## Lymph_number  2.75  1 0.097
## GLOBAL       12.76  8 0.120

ph_2

##               chisq df    p
## Age           1.923  2 0.38
## Gender        0.607  1 0.44
## Primary_site  7.141  4 0.13
## Lymph_number  0.704  1 0.40
## GLOBAL       11.242  8 0.19

ggcoxzph (ph_1)

ggcoxzph (ph_2)

The test P value of the covariate Age in model 1 was 0.033, indicating that the proportional hazard assumption was not met. However, given that the overall test P values of the two models were 0.120 and 0.188, both > 0.05, and the test P values of the target independent variable (number of lymph node examinations) were all > 0.05, it can be considered that the analysis model meets the PH assumption.

Multicollinearity diagnosis

library ('car')

## Loading required package: carData

data_lm <- mydata   
data_lm$Group <- 1:length(data_lm[,1])   
lm_model <- lm (formula = Group ~ Age + Gender + Primary_site + Lymph_number, data = data_lm)   
vif(lm_model)

##                  GVIF Df GVIF^(1/(2*Df))
## Age          1.017491  2        1.004344
## Gender       1.008541  1        1.004262
## Primary_site 1.019437  4        1.002409
## Lymph_number 1.009030  1        1.004505

Competing Risk Cox Model

2025-05-23