Here we are going to see how we performe Stratified Cox PH Model using R.
It is a modification of Cox Proportional Hazard model. We know in our model, preditors(predictor variables) are assumed to satisfy PH Assumption.
In our model the exponential terms(X’s variables) are independent over time(t).
So what happen if PH assumption is not satisfy?. Then we use Stratified Cox PH Model.
From here we use some examples to understand this. Let’s attack ;) nessessary libraries.
library(survival)
## Warning: package 'survival' was built under R version 4.0.2
In this example we use a data set called kidney in survival package.
Let’s see our dataset.
# View(kidney)
# ?kidney
head(kidney, 10)
str(kidney)
## 'data.frame': 76 obs. of 7 variables:
## $ id : num 1 1 2 2 3 3 4 4 5 5 ...
## $ time : num 8 16 23 13 22 28 447 318 30 12 ...
## $ status : num 1 1 1 0 1 1 1 1 1 1 ...
## $ age : num 28 28 48 48 32 32 31 32 10 10 ...
## $ sex : num 1 1 2 2 1 1 2 2 1 1 ...
## $ disease: Factor w/ 4 levels "Other","GN","AN",..: 1 1 2 2 1 1 1 1 1 1 ...
## $ frail : num 2.3 2.3 1.9 1.9 1.2 1.2 0.5 0.5 1.5 1.5 ...
attach(kidney) # At the end of this example we detach this dataset ;)
Let’s findout more details about kidney data set,
table(status)
## status
## 0 1
## 18 58
table(sex)
## sex
## 1 2
## 20 56
table(disease)
## disease
## Other GN AN PKD
## 26 18 24 8
So we can see, in this experiment 58 deaths, 18 censored and total 76 observations. Also 20 males and 56 females and four types of diseases have encountered.
Now let’s go to work
# Surv object
kidney.surv <- Surv(time = time, event = status)
kidney.surv
## [1] 8 16 23 13+ 22 28 447 318 30 12 24 245 7 9 511
## [16] 30 53 196 15 154 7 333 141 8+ 96 38 149+ 70+ 536 25+
## [31] 17 4+ 185 177 292 114 22+ 159+ 15 108+ 152 562 402 24+ 13
## [46] 66 39 46+ 12 40 113+ 201 132 156 34 30 2 25 130 26
## [61] 27 58 5+ 43 152 30 190 5+ 119 8 54+ 16+ 6+ 78 63
## [76] 8+
Now we are going to look about four different types of diseases.
KM curves for four different types of diseases.
kidney.km <- survfit(kidney.surv ~ disease)
plot(kidney.km, xlab = "Time", ylab = "Survival Probability",
main = "Survival curves for diseases", col = c(1,2,3,4))
legend("topright", legend = c("GN", "AN", "PKD", "Other"),
col = c(1,2,3,4), fill = c(1,2,3,4), title = "Disease")
Now we fit the Cox PH model for this,
kidney.coxm1 <- coxph(kidney.surv ~ sex + disease + age + frail, data = kidney)
summary(kidney.coxm1)
## Call:
## coxph(formula = kidney.surv ~ sex + disease + age + frail, data = kidney)
##
## n= 76, number of events= 58
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex -2.099844 0.122475 0.392654 -5.348 8.90e-08 ***
## diseaseGN 0.130666 1.139587 0.436114 0.300 0.764471
## diseaseAN 0.640906 1.898200 0.447886 1.431 0.152442
## diseasePKD -2.168515 0.114347 0.648825 -3.342 0.000831 ***
## age 0.007714 1.007744 0.011907 0.648 0.517055
## frail 1.791873 6.000682 0.257639 6.955 3.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex 0.1225 8.1649 0.05673 0.2644
## diseaseGN 1.1396 0.8775 0.48476 2.6790
## diseaseAN 1.8982 0.5268 0.78904 4.5665
## diseasePKD 0.1143 8.7453 0.03206 0.4079
## age 1.0077 0.9923 0.98450 1.0315
## frail 6.0007 0.1666 3.62158 9.9427
##
## Concordance= 0.822 (se = 0.03 )
## Likelihood ratio test= 68.71 on 6 df, p=8e-13
## Wald test = 60.01 on 6 df, p=4e-11
## Score (logrank) test = 86.24 on 6 df, p=<2e-16
Look the three stars, which means sex, diseasePKD and frail variables are significant. You can see the Hazard ratio(\(HR\)) by exp(coef) and one over HR (\(1/HR\)) by exp(-coef). Also 95% C.I. (if C.I. does not contains 1 it is significant).
kidney.coxm2 <- coxph(kidney.surv ~ sex * disease, data = kidney)
summary(kidney.coxm2)
## Call:
## coxph(formula = kidney.surv ~ sex * disease, data = kidney)
##
## n= 76, number of events= 58
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex -2.5252779 0.0800361 0.5659622 -4.462 8.12e-06 ***
## diseaseGN -1.3415792 0.2614325 1.3037724 -1.029 0.303481
## diseaseAN -1.4180768 0.2421793 1.4840553 -0.956 0.339304
## diseasePKD -7.3819153 0.0006224 1.9493785 -3.787 0.000153 ***
## sex:diseaseGN 0.8312922 2.2962840 0.7604435 1.093 0.274320
## sex:diseaseAN 1.0754002 2.9311656 0.8156353 1.318 0.187342
## sex:diseasePKD 4.2144726 67.6584758 1.1501090 3.664 0.000248 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex 8.004e-02 1.249e+01 2.640e-02 0.2427
## diseaseGN 2.614e-01 3.825e+00 2.030e-02 3.3661
## diseaseAN 2.422e-01 4.129e+00 1.321e-02 4.4398
## diseasePKD 6.224e-04 1.607e+03 1.364e-05 0.0284
## sex:diseaseGN 2.296e+00 4.355e-01 5.173e-01 10.1933
## sex:diseaseAN 2.931e+00 3.412e-01 5.926e-01 14.4981
## sex:diseasePKD 6.766e+01 1.478e-02 7.101e+00 644.6096
##
## Concordance= 0.709 (se = 0.037 )
## Likelihood ratio test= 30.26 on 7 df, p=8e-05
## Wald test = 32.89 on 7 df, p=3e-05
## Score (logrank) test = 42.46 on 7 df, p=4e-07
Here we can see sex, diseadePKD and sex*diseasePKD are significant.
kidney.coxm3 <- coxph(kidney.surv ~ sex * frail, data = kidney)
summary(kidney.coxm3)
## Call:
## coxph(formula = kidney.surv ~ sex * frail, data = kidney)
##
## n= 76, number of events= 58
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex -0.3725 0.6890 0.5343 -0.697 0.486
## frail 2.6393 14.0041 0.6740 3.916 9.01e-05 ***
## sex:frail -0.8510 0.4270 0.3717 -2.290 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex 0.689 1.45142 0.2418 1.9635
## frail 14.004 0.07141 3.7371 52.4774
## sex:frail 0.427 2.34197 0.2061 0.8847
##
## Concordance= 0.798 (se = 0.036 )
## Likelihood ratio test= 46.18 on 3 df, p=5e-10
## Wald test = 51.23 on 3 df, p=4e-11
## Score (logrank) test = 81.61 on 3 df, p=<2e-16
Now frail and sex*frail significant but sex is not significant.
Likewise we can build models.
Just suppose we want to know sex*trail interaction effect is significant or not by using Log-Likelyhood test what we do ?. So we fit a model like this,
kidney.coxmem <- coxph(kidney.surv ~ sex * frail + age + disease, data = kidney)
summary(kidney.coxmem)
## Call:
## coxph(formula = kidney.surv ~ sex * frail + age + disease, data = kidney)
##
## n= 76, number of events= 58
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex -2.639368 0.071406 0.739592 -3.569 0.000359 ***
## frail 1.153203 3.168325 0.782822 1.473 0.140714
## age 0.008522 1.008558 0.012116 0.703 0.481838
## diseaseGN 0.118022 1.125268 0.441243 0.267 0.789103
## diseaseAN 0.592404 1.808331 0.456345 1.298 0.194236
## diseasePKD -2.506857 0.081524 0.764567 -3.279 0.001043 **
## sex:frail 0.424413 1.528693 0.494655 0.858 0.390894
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex 0.07141 14.0043 0.01676 0.3043
## frail 3.16833 0.3156 0.68311 14.6950
## age 1.00856 0.9915 0.98489 1.0328
## diseaseGN 1.12527 0.8887 0.47388 2.6720
## diseaseAN 1.80833 0.5530 0.73933 4.4230
## diseasePKD 0.08152 12.2663 0.01822 0.3648
## sex:frail 1.52869 0.6542 0.57979 4.0306
##
## Concordance= 0.823 (se = 0.03 )
## Likelihood ratio test= 69.44 on 7 df, p=2e-12
## Wald test = 59.48 on 7 df, p=2e-10
## Score (logrank) test = 94.74 on 7 df, p=<2e-16
Now we do the test using hypotheis,
So model 1(kidney.coxm1) will be our Full Model(FM) and extra model(kidney.coxmem) will be our Redius Model(RM).
And look the Log-Rank test values of FM and RM.
Then use this test statistic : \((-2*Log_{RM}) - (-2*Log_{FM})\)
Finally check the significant of this test statistic with chi-squre distribution with degree of freedom 1 (\(X_{df=1}\)).
ts <- (-2*86.24) - (-2*94.74)
ts
## [1] 17
And let’s get \(\alpha = 0.05\) , then chi-square df 1 (\(X_{df=1}(\alpha=0.05)\)) is equal to 3.841.
So, calculated test statistic is greater than table value.
Therefore we reject \(H_0\) (Reject Null Hypothesis).
Which means result is statistically significant.
Now we already have the models and let’s check the PH assumptions,
kidney.test.phm1 <- cox.zph(fit = kidney.coxm1)
kidney.test.phm1
## chisq df p
## sex 0.2907 1 0.5898
## disease 0.2763 3 0.9644
## age 0.0507 1 0.8219
## frail 6.7193 1 0.0095
## GLOBAL 8.8036 6 0.1849
Here p value is p-values of PH(\(p(PH)\)). If it is less than \(\alpha = 0.05\) we can say PH Assumption is viaolated(Not satisfy).
If \(p(PH) < \alpha\) => PH Assumption not satisfy.
So here it is frail(violate PH Assumption).
This is not enough to say that a perticular variable is viaolating PH assumptions, we use another graphycal method.,
plot(survfit(kidney.surv ~ sex), fun = "cloglog", col = c("red", "blue"))