library(survival)
## Loading required package: splines
Lung=read.csv("Lung.csv")
coxph(Surv(Time, Status)~Sex, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ Sex, data = Lung)
##
##
## coef exp(coef) se(coef) z p
## Sex 0.531 1.7 0.167 3.18 0.0015
##
## Likelihood ratio test=10.6 on 1 df, p=0.00111 n= 228, number of events= 165
The male-to-female hazard ratio is exp(.531)=1.701 (meaning that males have longer survival times than females).
mod=coxph(Surv(Time, Status)~Sex, data=Lung)
new=data.frame(Sex=c(0,1))
plot(survfit(mod, newdata=new), mark.time=F, col=1:2)
km=survfit(Surv(Time, Status)~Sex, data=Lung) #use KM, not Cox PH
plot(km,fun='cloglog',mark.time=F,col=1:2)
The gap between the c-log-log curves is the log of the hazard ratio (b) at log(t).
The gap in the plot from (c) appears to shrink over time, which tells us that a Cox PH model is not appropriate for this data (b is not constant, indicating changing ratios).
To plot the Shoenfeld residual graphs for the model in (a):
plot(cox.zph(mod))
The line has a slightly negative slope, which matches with the shrinking gap in the plot from (c).
plot(cox.zph(mod))
abline(0.531,0, col='red')
The red line is our expected value of b at any given point, and the solid black line is the actual value.
cox.zph(mod)
## rho chisq p
## Sex -0.131 2.77 0.0962
The p-value for this model is greater than .05, indicating that the PH assumption is adequate.
coxph(Surv(Time, Status)~PhysEcog, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysEcog, data = Lung)
##
##
## coef exp(coef) se(coef) z p
## PhysEcog 0.476 1.61 0.113 4.2 2.7e-05
##
## Likelihood ratio test=17.6 on 1 df, p=2.77e-05 n= 227, number of events= 164
## (1 observation deleted due to missingness)
coxph(Surv(Time, Status)~PhysKarno, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysKarno, data = Lung)
##
##
## coef exp(coef) se(coef) z p
## PhysKarno -0.0164 0.984 0.00585 -2.81 0.005
##
## Likelihood ratio test=7.56 on 1 df, p=0.00597 n= 227, number of events= 164
## (1 observation deleted due to missingness)
coxph(Surv(Time, Status)~PatKarno, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PatKarno, data = Lung)
##
##
## coef exp(coef) se(coef) z p
## PatKarno -0.0199 0.98 0.00547 -3.63 0.00028
##
## Likelihood ratio test=12.5 on 1 df, p=0.000412 n= 225, number of events= 162
## (3 observations deleted due to missingness)
Individually, each of the variables is significantly related to survival time (p-value < .05).
I would select PhysEcog as a predictor of survival.
In order to check whether PhysEcog maintains its significance in the presence of Sex, without making the PH assumption for Sex, we could fit the model stratified by Sex.
sexcat=0+(Lung$Sex==1) #stratify the data
table(sexcat) #verify that data has been stratified
## sexcat
## 0 1
## 90 138
coxph(Surv(Time, Status)~PhysEcog+strata(sexcat), data=Lung) #fit the Cox PH model stratified by Sex
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysEcog + strata(sexcat),
## data = Lung)
##
##
## coef exp(coef) se(coef) z p
## PhysEcog 0.485 1.62 0.113 4.27 1.9e-05
##
## Likelihood ratio test=18.1 on 1 df, p=2.04e-05 n= 227, number of events= 164
## (1 observation deleted due to missingness)
PhysEcog is still significant in the presence of Sex.
\(h_{1}(t)\)=\(h_{0}(t) e^{bx}\)
The hazard function is the negative derivative of the log of the survival curve, and \(S_{1}(t)=S_{0}(t)^{e^{bx}}\).
If the log of a hazard ratio (b) is zero, there is no change in hazard between the two functions, i.e. \(S_{1}(t)\) is the baseline survival curve.