Survival Analysis II: Cox Proportional Hazards Model

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Proportional Hazard Model

Assumptions 1. the ratio of hazard keep constant for any two individual over time. 2. predictors should be time independent. 3. if time dependent variables are considered, the cox model may still be used, but no longer satisfies the PH assumption, and is called the extended Cox model .

Check assumptions: 1. log log survival curves. 2. comparison of “observed” with “expected” survival curves. 3. GOF approach. use of time-dependent variables.

er<-read.csv("D:\\Users\\EChen13\\OneDrive - JNJ\\Documents\\stat\\Survival data\\remissionData.csv", sep=";")

er$grp <- factor(er$placebo  )
library(survival)
fit1 <- with(er, coxph(Surv(time, event) ~grp))
fit1

## Call:
## coxph(formula = Surv(time, event) ~ grp)
## 
##        coef exp(coef) se(coef)     z        p
## grp1 1.5721    4.8169   0.4124 3.812 0.000138
## 
## Likelihood ratio test=16.35  on 1 df, p=5.261e-05
## n= 42, number of events= 30

fit2<-with(er, coxph(Surv(time, event) ~grp + logWBC))
fit2

## Call:
## coxph(formula = Surv(time, event) ~ grp + logWBC)
## 
##          coef exp(coef) se(coef)     z       p
## grp1   1.3861    3.9991   0.4248 3.263  0.0011
## logWBC 1.6909    5.4243   0.3359 5.034 4.8e-07
## 
## Likelihood ratio test=46.71  on 2 df, p=7.187e-11
## n= 42, number of events= 30

fit3 <- with(er, coxph(Surv(time, event) ~grp*logWBC))
fit3

## Call:
## coxph(formula = Surv(time, event) ~ grp * logWBC)
## 
##                coef exp(coef) se(coef)      z        p
## grp1         2.3749   10.7500   1.7055  1.393    0.164
## logWBC       1.8724    6.5040   0.4514  4.148 3.35e-05
## grp1:logWBC -0.3175    0.7280   0.5258 -0.604    0.546
## 
## Likelihood ratio test=47.07  on 3 df, p=3.356e-10
## n= 42, number of events= 30

a simple anova test to show interaction term can be dropped since AIC value is so close.

drop1(fit3)

to compute difference in -2lgo likelihood between two nested models and compute the null on the covariate coefficients.

chistat <- -2*( (summary(fit2))$loglik[2] - (summary(fit3))$loglik[2])
1-pchisq(chistat,1)

## [1] 0.5488232

similiar test can be performed between Model 1 and Model 2.

chistat <- -2*( (summary(fit1))$loglik[2] - (summary(fit2))$loglik[2])
1- pchisq(chistat, 1)

## [1] 3.587325e-08

In this case, the test is signicant, which says that the log WBC is a variable worth considering. The reason for showing the output of the three models is to drive home that the analysis of the model output is similar to regression output analysis or logistic regression output analysis. The advantage of prop hazard model is that you can draw adjusted survival curves, i.e survival curves adjusted for the covariates.

Assumption evaluation Log-Log Plots

er$wb_cat <- cut(er$logWBC, c(0,2.3,3,8))
fit<-with(er, survfit(Surv(time,event)~grp))
plot(fit, fun="cloglog", main="", log="x", col=c("blue","green"))

fit <- with(er, survfit(Surv(time,event)~wb_cat))
plot(fit, fun="cloglog",main = "",log="x",col= c("blue","green","red"))

fit <- with(er, survfit(Surv(time,event)~female))
plot(fit, fun="cloglog",main = "",log="x",col= c("blue","green"))

The Goodness of Fit Testing Approach: Schoenfeld residuals

fit2<-with(er,coxph(Surv(time, event)~grp+logWBC))
print(cox.zph(fit2))

##           chisq df    p
## grp    8.27e-05  1 0.99
## logWBC 4.00e-02  1 0.84
## GLOBAL 4.02e-02  2 0.98

The following results show that PH assumption holds good for grp and logWBC but not for sex

fit2 <- with(er, coxph(Surv(time,event) ~ grp + logWBC + female))
print(cox.zph(fit2))

##        chisq df    p
## grp    0.036  1 0.85
## logWBC 0.142  1 0.71
## female 5.420  1 0.02
## GLOBAL 5.879  3 0.12

In the table above, notice GLOBAL. It is the result of a test of the global null hypothesis that proportionality holds. The relatively small p-value tells us that we have a big problem with the proportionality assumption.