Here we are going to see how we performe Stratified Cox PH Model using R.

1. What is this SC(Stratified Cox) ?

It is a modification of Cox Proportional Hazard model. We know in our model, preditors(predictor variables) are assumed to satisfy PH Assumption.

[#] What is this PH Assumption ?

In our model the exponential terms(X’s variables) are independent over time(t).

So what happen if PH assumption is not satisfy?. Then we use Stratified Cox PH Model.
From here we use some examples to understand this. Let’s attack ;) nessessary libraries.

library(survival)
## Warning: package 'survival' was built under R version 4.0.2

2. Example 1

In this example we use a data set called kidney in survival package.

2.1 Data

Let’s see our dataset.

# View(kidney)
# ?kidney
head(kidney, 10)
str(kidney)
## 'data.frame':    76 obs. of  7 variables:
##  $ id     : num  1 1 2 2 3 3 4 4 5 5 ...
##  $ time   : num  8 16 23 13 22 28 447 318 30 12 ...
##  $ status : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ age    : num  28 28 48 48 32 32 31 32 10 10 ...
##  $ sex    : num  1 1 2 2 1 1 2 2 1 1 ...
##  $ disease: Factor w/ 4 levels "Other","GN","AN",..: 1 1 2 2 1 1 1 1 1 1 ...
##  $ frail  : num  2.3 2.3 1.9 1.9 1.2 1.2 0.5 0.5 1.5 1.5 ...
attach(kidney) # At the end of this example we detach this dataset ;)

Let’s findout more details about kidney data set,

table(status)
## status
##  0  1 
## 18 58
table(sex)
## sex
##  1  2 
## 20 56
table(disease)
## disease
## Other    GN    AN   PKD 
##    26    18    24     8

So we can see, in this experiment 58 deaths, 18 censored and total 76 observations. Also 20 males and 56 females and four types of diseases have encountered.
Now let’s go to work

2.2 Work

2.2.1 Creating Surv object,

# Surv object
kidney.surv <- Surv(time = time, event = status)
kidney.surv
##  [1]   8   16   23   13+  22   28  447  318   30   12   24  245    7    9  511 
## [16]  30   53  196   15  154    7  333  141    8+  96   38  149+  70+ 536   25+
## [31]  17    4+ 185  177  292  114   22+ 159+  15  108+ 152  562  402   24+  13 
## [46]  66   39   46+  12   40  113+ 201  132  156   34   30    2   25  130   26 
## [61]  27   58    5+  43  152   30  190    5+ 119    8   54+  16+   6+  78   63 
## [76]   8+

Now we are going to look about four different types of diseases.

2.2.2 Plotting KM Curves

KM curves for four different types of diseases.

kidney.km <- survfit(kidney.surv ~ disease)
plot(kidney.km, xlab = "Time", ylab = "Survival Probability", 
     main = "Survival curves for diseases", col = c(1,2,3,4))
legend("topright", legend = c("GN", "AN", "PKD", "Other"),
       col = c(1,2,3,4), fill = c(1,2,3,4), title = "Disease")

2.2.3 Cox PH model

Now we fit the Cox PH model for this,

->first model

kidney.coxm1 <- coxph(kidney.surv ~ sex + disease + age + frail, data = kidney)
summary(kidney.coxm1)
## Call:
## coxph(formula = kidney.surv ~ sex + disease + age + frail, data = kidney)
## 
##   n= 76, number of events= 58 
## 
##                 coef exp(coef)  se(coef)      z Pr(>|z|)    
## sex        -2.099844  0.122475  0.392654 -5.348 8.90e-08 ***
## diseaseGN   0.130666  1.139587  0.436114  0.300 0.764471    
## diseaseAN   0.640906  1.898200  0.447886  1.431 0.152442    
## diseasePKD -2.168515  0.114347  0.648825 -3.342 0.000831 ***
## age         0.007714  1.007744  0.011907  0.648 0.517055    
## frail       1.791873  6.000682  0.257639  6.955 3.53e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##            exp(coef) exp(-coef) lower .95 upper .95
## sex           0.1225     8.1649   0.05673    0.2644
## diseaseGN     1.1396     0.8775   0.48476    2.6790
## diseaseAN     1.8982     0.5268   0.78904    4.5665
## diseasePKD    0.1143     8.7453   0.03206    0.4079
## age           1.0077     0.9923   0.98450    1.0315
## frail         6.0007     0.1666   3.62158    9.9427
## 
## Concordance= 0.822  (se = 0.03 )
## Likelihood ratio test= 68.71  on 6 df,   p=8e-13
## Wald test            = 60.01  on 6 df,   p=4e-11
## Score (logrank) test = 86.24  on 6 df,   p=<2e-16

Look the three stars, which means sex, diseasePKD and frail variables are significant. You can see the Hazard ratio(\(HR\)) by exp(coef) and one over HR (\(1/HR\)) by exp(-coef). Also 95% C.I. (if C.I. does not contains 1 it is significant).

->second model

kidney.coxm2 <- coxph(kidney.surv ~ sex * disease, data = kidney)
summary(kidney.coxm2)
## Call:
## coxph(formula = kidney.surv ~ sex * disease, data = kidney)
## 
##   n= 76, number of events= 58 
## 
##                      coef  exp(coef)   se(coef)      z Pr(>|z|)    
## sex            -2.5252779  0.0800361  0.5659622 -4.462 8.12e-06 ***
## diseaseGN      -1.3415792  0.2614325  1.3037724 -1.029 0.303481    
## diseaseAN      -1.4180768  0.2421793  1.4840553 -0.956 0.339304    
## diseasePKD     -7.3819153  0.0006224  1.9493785 -3.787 0.000153 ***
## sex:diseaseGN   0.8312922  2.2962840  0.7604435  1.093 0.274320    
## sex:diseaseAN   1.0754002  2.9311656  0.8156353  1.318 0.187342    
## sex:diseasePKD  4.2144726 67.6584758  1.1501090  3.664 0.000248 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## sex            8.004e-02  1.249e+01 2.640e-02    0.2427
## diseaseGN      2.614e-01  3.825e+00 2.030e-02    3.3661
## diseaseAN      2.422e-01  4.129e+00 1.321e-02    4.4398
## diseasePKD     6.224e-04  1.607e+03 1.364e-05    0.0284
## sex:diseaseGN  2.296e+00  4.355e-01 5.173e-01   10.1933
## sex:diseaseAN  2.931e+00  3.412e-01 5.926e-01   14.4981
## sex:diseasePKD 6.766e+01  1.478e-02 7.101e+00  644.6096
## 
## Concordance= 0.709  (se = 0.037 )
## Likelihood ratio test= 30.26  on 7 df,   p=8e-05
## Wald test            = 32.89  on 7 df,   p=3e-05
## Score (logrank) test = 42.46  on 7 df,   p=4e-07

Here we can see sex, diseadePKD and sex*diseasePKD are significant.

->third model

kidney.coxm3 <- coxph(kidney.surv ~ sex * frail, data = kidney)
summary(kidney.coxm3)
## Call:
## coxph(formula = kidney.surv ~ sex * frail, data = kidney)
## 
##   n= 76, number of events= 58 
## 
##              coef exp(coef) se(coef)      z Pr(>|z|)    
## sex       -0.3725    0.6890   0.5343 -0.697    0.486    
## frail      2.6393   14.0041   0.6740  3.916 9.01e-05 ***
## sex:frail -0.8510    0.4270   0.3717 -2.290    0.022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## sex           0.689    1.45142    0.2418    1.9635
## frail        14.004    0.07141    3.7371   52.4774
## sex:frail     0.427    2.34197    0.2061    0.8847
## 
## Concordance= 0.798  (se = 0.036 )
## Likelihood ratio test= 46.18  on 3 df,   p=5e-10
## Wald test            = 51.23  on 3 df,   p=4e-11
## Score (logrank) test = 81.61  on 3 df,   p=<2e-16

Now frail and sex*frail significant but sex is not significant.
Likewise we can build models.

->extra model

Just suppose we want to know sex*trail interaction effect is significant or not by using Log-Likelyhood test what we do ?. So we fit a model like this,

kidney.coxmem <- coxph(kidney.surv ~ sex * frail + age + disease, data = kidney)
summary(kidney.coxmem)
## Call:
## coxph(formula = kidney.surv ~ sex * frail + age + disease, data = kidney)
## 
##   n= 76, number of events= 58 
## 
##                 coef exp(coef)  se(coef)      z Pr(>|z|)    
## sex        -2.639368  0.071406  0.739592 -3.569 0.000359 ***
## frail       1.153203  3.168325  0.782822  1.473 0.140714    
## age         0.008522  1.008558  0.012116  0.703 0.481838    
## diseaseGN   0.118022  1.125268  0.441243  0.267 0.789103    
## diseaseAN   0.592404  1.808331  0.456345  1.298 0.194236    
## diseasePKD -2.506857  0.081524  0.764567 -3.279 0.001043 ** 
## sex:frail   0.424413  1.528693  0.494655  0.858 0.390894    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##            exp(coef) exp(-coef) lower .95 upper .95
## sex          0.07141    14.0043   0.01676    0.3043
## frail        3.16833     0.3156   0.68311   14.6950
## age          1.00856     0.9915   0.98489    1.0328
## diseaseGN    1.12527     0.8887   0.47388    2.6720
## diseaseAN    1.80833     0.5530   0.73933    4.4230
## diseasePKD   0.08152    12.2663   0.01822    0.3648
## sex:frail    1.52869     0.6542   0.57979    4.0306
## 
## Concordance= 0.823  (se = 0.03 )
## Likelihood ratio test= 69.44  on 7 df,   p=2e-12
## Wald test            = 59.48  on 7 df,   p=2e-10
## Score (logrank) test = 94.74  on 7 df,   p=<2e-16

Now we do the test using hypotheis,
So model 1(kidney.coxm1) will be our Full Model(FM) and extra model(kidney.coxmem) will be our Redius Model(RM).
And look the Log-Rank test values of FM and RM.
Then use this test statistic : \((-2*Log_{RM}) - (-2*Log_{FM})\)
Finally check the significant of this test statistic with chi-squre distribution with degree of freedom 1 (\(X_{df=1}\)).

ts <- (-2*86.24) - (-2*94.74)
ts
## [1] 17

And let’s get \(\alpha = 0.05\) , then chi-square df 1 (\(X_{df=1}(\alpha=0.05)\)) is equal to 3.841.
So, calculated test statistic is greater than table value.
Therefore we reject \(H_0\) (Reject Null Hypothesis).
Which means result is statistically significant.

2.2.4 Evaluate PH Assumptions

Now we already have the models and let’s check the PH assumptions,

kidney.test.phm1 <- cox.zph(fit = kidney.coxm1)
kidney.test.phm1
##          chisq df      p
## sex     0.2907  1 0.5898
## disease 0.2763  3 0.9644
## age     0.0507  1 0.8219
## frail   6.7193  1 0.0095
## GLOBAL  8.8036  6 0.1849

Here p value is p-values of PH(\(p(PH)\)). If it is less than \(\alpha = 0.05\) we can say PH Assumption is viaolated(Not satisfy).
If \(p(PH) < \alpha\) => PH Assumption not satisfy.
So here it is frail(violate PH Assumption).

This is not enough to say that a perticular variable is viaolating PH assumptions, we use another graphycal method.,

plot(survfit(kidney.surv ~ sex), fun = "cloglog", col = c("red", "blue"))