Traditional survival analysis generally only cares about one endpoint event. requiring that the censoring situation and the individual endpoint event are independent of each other. In actual situations, another event may affect the probability of the endpoint event, in which case it can be considered that the former has a competing risk with the latter.
Observation time not fully observed: censoring (Assumed to be non-informative); (Observed) Event that prevents the event of interest: competing risks.
The purpose is to study the relationship between the number of lymph node examinations and the survival time of patients with non-small cell lung cancer based on a competing risk model. The event of interest is the death of patients from non-small cell lung cancer, and the competing event is death from other causes (such as other tumors, etc.). The outcome variable is a three-category variable (0, 1, 2), including survival (censored), death from non-small cell lung cancer, and death from other causes.
setwd("C:\\Users\\hed2\\Downloads\\dell\\code-storage\\code")
mydata <- read.csv ('compt risk survival.csv')
str (mydata)
## 'data.frame': 1200 obs. of 9 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Age : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : int 2 2 1 2 2 1 1 1 2 2 ...
## $ Ethnicity : int 2 1 1 1 1 1 3 3 3 2 ...
## $ Primary_site : int 5 1 1 2 1 1 1 1 4 3 ...
## $ T_stage : int 1 4 1 2 2 4 1 1 1 1 ...
## $ Lymph_number : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Outcome : int 1 0 1 0 1 1 1 1 1 1 ...
## $ Survival_months: int 6 52 26 89 38 8 0 35 15 4 ...
mydata$Age <- factor (mydata$Age, levels = c(1, 2, 3), labels = c ("<60", "60-74", "≥75"))
mydata$Gender <- factor (mydata$Gender, levels = c(1, 2), labels = c ("Male", "Female"))
mydata$Ethnicity <- factor (mydata$Ethnicity, levels = c(1, 2, 3, 4), labels = c ("White", "Asian", "Black", "Others"))
mydata$Primary_site <- factor (mydata$Primary_site, levels = c(1, 2, 3, 4, 5), labels = c ("Upper lobe", "Middle lobe", "Lower lobe", "Main bronchus", "Overlapping lesion"))
mydata$T_stage <- factor (mydata$T_stage, levels = c(1, 2, 3, 4), labels = c ("T1", "T2", "T3", "T4"))
mydata$Lymph_number <- factor (mydata$Lymph_number, levels = c(1, 2), labels = c ("< 16", "≥ 16"))
mydata$Outcome <- factor (mydata$ Outcome)
View(mydata)
summary (mydata)
## ID Age Gender Ethnicity
## Min. : 1.0 <60 :281 Male :694 White :958
## 1st Qu.: 300.8 60-74:797 Female:506 Asian : 82
## Median : 600.5 ≥75 :122 Black :150
## Mean : 600.5 Others: 10
## 3rd Qu.: 900.2
## Max. :1200.0
## Primary_site T_stage Lymph_number Outcome Survival_months
## Upper lobe :777 T1:285 < 16:1029 0:422 Min. : 0.00
## Middle lobe : 74 T2:314 ≥ 16: 171 1:654 1st Qu.: 8.00
## Lower lobe :308 T3:311 2:124 Median :25.00
## Main bronchus : 25 T4:290 Mean :33.53
## Overlapping lesion: 16 3rd Qu.:55.25
## Max. :95.00
library (cmprsk)
## Warning: package 'cmprsk' was built under R version 4.4.3
## Loading required package: survival
library (riskRegression)
## Warning: package 'riskRegression' was built under R version 4.4.3
## riskRegression version 2023.12.21
library(pec)
## Warning: package 'pec' was built under R version 4.4.3
## Loading required package: prodlim
## Warning: package 'prodlim' was built under R version 4.4.3
##
## Attaching package: 'pec'
## The following objects are masked from 'package:riskRegression':
##
## ipcw, selectCox
cum_Lymph_number <- cuminc (mydata$Survival_months, mydata$Outcome, mydata$Lymph_number)
cum_Lymph_number
## Tests:
## stat pv df
## 1 26.466910 2.680913e-07 1
## 2 2.980508 8.427284e-02 1
## Estimates and Variances:
## $est
## 20 40 60 80
## < 16 1 0.40225361 0.52657627 0.5737580 0.6125047
## ≥ 16 1 0.21183283 0.33154283 0.3562282 0.4031302
## < 16 2 0.05091013 0.07275006 0.0945214 0.1142624
## ≥ 16 2 0.05869348 0.08931267 0.1424881 0.1764088
##
## $var
## 20 40 60 80
## < 16 1 2.353980e-04 2.485565e-04 2.576628e-04 0.0002832024
## ≥ 16 1 9.886072e-04 1.360950e-03 1.463218e-03 0.0017834570
## < 16 2 4.729336e-05 6.735227e-05 9.318157e-05 0.0001278999
## ≥ 16 2 3.263241e-04 4.883142e-04 8.855947e-04 0.0011831381
cum_Lymph_number$`< 16 1`$est[length(cum_Lymph_number $`< 16 1`$est)]
## [1] 0.6285701
cum_Lymph_number$`≥ 16 1`$est[length(cum_Lymph_number $`≥ 16 1`$est)]
## [1] 0.422242
plot (cum_Lymph_number, color = c ("pink", "red", "blue", "darkcyan"), xlab = "", xlim = c (0,100), ylim = c (0,1), ylab = "CIF", main = 'Lymph')
fgr_Lymph_number <- FGR(Hist(Survival_months, Outcome)~ Lymph_number, data = mydata)
## Argument cause missing. Analyse cause: 1
summary (fgr_Lymph_number)
## Competing Risks Regression
##
## Call:
## FGR(formula = Hist(Survival_months, Outcome) ~ Lymph_number,
## data = mydata, cause = "1")
##
## coef exp(coef) se(coef) z p-value
## Lymph_number≥ 16 -0.647 0.523 0.128 -5.05 4.5e-07
##
## exp(coef) exp(-coef) 2.5% 97.5%
## Lymph_number≥ 16 0.523 1.91 0.407 0.673
##
## Num. cases = 1200
## Pseudo Log-likelihood = -4357
## Pseudo likelihood ratio test = 28.4 on 1 df,
After controlling for competing risk events, the risk of death from non-small cell lung cancer in the group <16 was 0.91 times higher than that in the group ≥16 (HR=1.91), and the risk of death from non-small cell lung cancer in the group ≥16 was 0.477 times lower than that in the group <16 (HR=0.523, 95%CI: 0.407-0.673). ≥16/<16=exp(coef)=0.523; <16/≥16=exp(-coef)=1.91.
Please be careful here the highest level of the
covariate is the reference group in SAS but the lowest level in R.
fgr_multi <- FGR (Hist(Survival_months, Outcome)~Age + Gender + Primary_site + Lymph_number, data = mydata)
## Argument cause missing. Analyse cause: 1
summary (fgr_multi)
## Competing Risks Regression
##
## Call:
## FGR(formula = Hist(Survival_months, Outcome) ~ Age + Gender +
## Primary_site + Lymph_number, data = mydata, cause = "1")
##
## coef exp(coef) se(coef) z p-value
## Age60-74 0.0971 1.102 0.0974 0.997 3.2e-01
## Age≥75 0.3814 1.464 0.1506 2.532 1.1e-02
## GenderFemale -0.1418 0.868 0.0793 -1.788 7.4e-02
## Primary_siteMiddle lobe -0.0993 0.905 0.1702 -0.584 5.6e-01
## Primary_siteLower lobe 0.0981 1.103 0.0920 1.066 2.9e-01
## Primary_siteMain bronchus 0.3667 1.443 0.2451 1.496 1.3e-01
## Primary_siteOverlapping lesion 0.9316 2.539 0.2964 3.143 1.7e-03
## Lymph_number≥ 16 -0.6463 0.524 0.1288 -5.017 5.2e-07
##
## exp(coef) exp(-coef) 2.5% 97.5%
## Age60-74 1.102 0.907 0.910 1.334
## Age≥75 1.464 0.683 1.090 1.967
## GenderFemale 0.868 1.152 0.743 1.014
## Primary_siteMiddle lobe 0.905 1.104 0.649 1.264
## Primary_siteLower lobe 1.103 0.907 0.921 1.321
## Primary_siteMain bronchus 1.443 0.693 0.893 2.333
## Primary_siteOverlapping lesion 2.539 0.394 1.420 4.539
## Lymph_number≥ 16 0.524 1.908 0.407 0.674
##
## Num. cases = 1200
## Pseudo Log-likelihood = -4346
## Pseudo likelihood ratio test = 50.7 on 8 df,
mydata$Outcome2=ifelse (mydata$Outcome==2|mydata$Outcome==0, 0, 1)
coxph <- coxph ( Surv (Survival_months, Outcome2)~Age + Gender + Primary_site + Lymph_number, data = mydata)
summary (coxph)
## Call:
## coxph(formula = Surv(Survival_months, Outcome2) ~ Age + Gender +
## Primary_site + Lymph_number, data = mydata)
##
## n= 1200, number of events= 654
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age60-74 0.12111 1.12875 0.09753 1.242 0.214343
## Age≥75 0.48381 1.62225 0.14372 3.366 0.000762 ***
## GenderFemale -0.17314 0.84102 0.08026 -2.157 0.030993 *
## Primary_siteMiddle lobe -0.07497 0.92778 0.17426 -0.430 0.667051
## Primary_siteLower lobe 0.10737 1.11335 0.09134 1.176 0.239787
## Primary_siteMain bronchus 0.51099 1.66695 0.25507 2.003 0.045138 *
## Primary_siteOverlapping lesion 0.88788 2.42998 0.27332 3.249 0.001160 **
## Lymph_number≥ 16 -0.65906 0.51734 0.13308 -4.952 7.33e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age60-74 1.1287 0.8859 0.9323 1.3665
## Age≥75 1.6222 0.6164 1.2240 2.1501
## GenderFemale 0.8410 1.1890 0.7186 0.9843
## Primary_siteMiddle lobe 0.9278 1.0778 0.6593 1.3055
## Primary_siteLower lobe 1.1133 0.8982 0.9309 1.3316
## Primary_siteMain bronchus 1.6669 0.5999 1.0111 2.7481
## Primary_siteOverlapping lesion 2.4300 0.4115 1.4222 4.1519
## Lymph_number≥ 16 0.5173 1.9330 0.3986 0.6715
##
## Concordance= 0.584 (se = 0.012 )
## Likelihood ratio test= 58.49 on 8 df, p=9e-10
## Wald test = 57 on 8 df, p=2e-09
## Score (logrank) test = 58.98 on 8 df, p=7e-10
The risk of death from non-small cell lung cancer in the group with ≥16 lymph node examinations was 0.483 times lower than that in the group with <16 lymph node examinations (HR=0.517, P=7.33e-07<0.001). It can be seen that when there is a competing risk, if the competing risk model is not used and the traditional proportional risk Cox regression is used, the effect of the research factor will be exaggerated.
The subdistribution satisfies the proportional hazards (PH) assumption.
library ('survival')
library ('survminer')
## Warning: package 'survminer' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
data_1 <- mydata[mydata$Outcome!=2,]
data_2 <- mydata[mydata$Outcome!=1,]
data_2$Outcome <- ifelse(data_2$Outcome ==0,0,1)
cox_1 <- coxph (formula = Surv(Survival_months, Outcome) ~ Age + Gender + Primary_site + Lymph_number, data = data_1, id = ID)
cox_2 <- coxph (formula = Surv(Survival_months, Outcome) ~ Age + Gender + Primary_site + Lymph_number, data = data_2, id = ID)
## Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
## Loglik converged before variable 7 ; coefficient may be infinite.
ph_1 <- cox.zph(cox_1)
ph_2 <- cox.zph(cox_2)
ph_1
## chisq df p
## Age 6.81 2 0.033
## Gender 0.98 1 0.322
## Primary_site 2.04 4 0.729
## Lymph_number 2.75 1 0.097
## GLOBAL 12.76 8 0.120
ph_2
## chisq df p
## Age 1.923 2 0.38
## Gender 0.607 1 0.44
## Primary_site 7.141 4 0.13
## Lymph_number 0.704 1 0.40
## GLOBAL 11.242 8 0.19
ggcoxzph (ph_1)
ggcoxzph (ph_2)
The test P value of the covariate Age in model 1 was 0.033, indicating that the proportional hazard assumption was not met. However, given that the overall test P values of the two models were 0.120 and 0.188, both > 0.05, and the test P values of the target independent variable (number of lymph node examinations) were all > 0.05, it can be considered that the analysis model meets the PH assumption.
library ('car')
## Loading required package: carData
data_lm <- mydata
data_lm$Group <- 1:length(data_lm[,1])
lm_model <- lm (formula = Group ~ Age + Gender + Primary_site + Lymph_number, data = data_lm)
vif(lm_model)
## GVIF Df GVIF^(1/(2*Df))
## Age 1.017491 2 1.004344
## Gender 1.008541 1 1.004262
## Primary_site 1.019437 4 1.002409
## Lymph_number 1.009030 1 1.004505