library(survival)
## Loading required package: splines
Lung=read.csv("Lung.csv")

Question 1

  1. To fit a Cox PH model for Time that uses Sex:
coxph(Surv(Time, Status)~Sex, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ Sex, data = Lung)
## 
## 
##      coef exp(coef) se(coef)    z      p
## Sex 0.531       1.7    0.167 3.18 0.0015
## 
## Likelihood ratio test=10.6  on 1 df, p=0.00111  n= 228, number of events= 165

The male-to-female hazard ratio is exp(.531)=1.701 (meaning that males have longer survival times than females).

  1. To plot the estimated survival curves from the model in (a):
mod=coxph(Surv(Time, Status)~Sex, data=Lung)
new=data.frame(Sex=c(0,1))
plot(survfit(mod, newdata=new), mark.time=F, col=1:2)

plot of chunk unnamed-chunk-3

  1. To make the c-log-log plot:
km=survfit(Surv(Time, Status)~Sex, data=Lung) #use KM, not Cox PH
plot(km,fun='cloglog',mark.time=F,col=1:2)

plot of chunk unnamed-chunk-4

The gap between the c-log-log curves is the log of the hazard ratio (b) at log(t).

  1. The gap in the plot from (c) appears to shrink over time, which tells us that a Cox PH model is not appropriate for this data (b is not constant, indicating changing ratios).

  2. To plot the Shoenfeld residual graphs for the model in (a):

plot(cox.zph(mod))

plot of chunk unnamed-chunk-5

The line has a slightly negative slope, which matches with the shrinking gap in the plot from (c).

  1. The coefficient on Sex from the Cox PH model (0.531) is the (created) ratio between males and females, as shown in the graph below.
plot(cox.zph(mod))
abline(0.531,0, col='red')

plot of chunk unnamed-chunk-6

The red line is our expected value of b at any given point, and the solid black line is the actual value.

cox.zph(mod)
##        rho chisq      p
## Sex -0.131  2.77 0.0962

The p-value for this model is greater than .05, indicating that the PH assumption is adequate.

  1. To assess which of the three measures (if any) are significantly related to survival time:
coxph(Surv(Time, Status)~PhysEcog, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysEcog, data = Lung)
## 
## 
##           coef exp(coef) se(coef)   z       p
## PhysEcog 0.476      1.61    0.113 4.2 2.7e-05
## 
## Likelihood ratio test=17.6  on 1 df, p=2.77e-05  n= 227, number of events= 164 
##    (1 observation deleted due to missingness)
coxph(Surv(Time, Status)~PhysKarno, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysKarno, data = Lung)
## 
## 
##              coef exp(coef) se(coef)     z     p
## PhysKarno -0.0164     0.984  0.00585 -2.81 0.005
## 
## Likelihood ratio test=7.56  on 1 df, p=0.00597  n= 227, number of events= 164 
##    (1 observation deleted due to missingness)
coxph(Surv(Time, Status)~PatKarno, data=Lung)
## Call:
## coxph(formula = Surv(Time, Status) ~ PatKarno, data = Lung)
## 
## 
##             coef exp(coef) se(coef)     z       p
## PatKarno -0.0199      0.98  0.00547 -3.63 0.00028
## 
## Likelihood ratio test=12.5  on 1 df, p=0.000412  n= 225, number of events= 162 
##    (3 observations deleted due to missingness)

Individually, each of the variables is significantly related to survival time (p-value < .05).

  1. If we model survival time using all three variables simultaneously, the variables will not maintain significance because the coefficients differ in direction.
  1. I would select PhysEcog as a predictor of survival.

  2. In order to check whether PhysEcog maintains its significance in the presence of Sex, without making the PH assumption for Sex, we could fit the model stratified by Sex.

sexcat=0+(Lung$Sex==1) #stratify the data
table(sexcat) #verify that data has been stratified
## sexcat
##   0   1 
##  90 138
coxph(Surv(Time, Status)~PhysEcog+strata(sexcat), data=Lung) #fit the Cox PH model stratified by Sex
## Call:
## coxph(formula = Surv(Time, Status) ~ PhysEcog + strata(sexcat), 
##     data = Lung)
## 
## 
##           coef exp(coef) se(coef)    z       p
## PhysEcog 0.485      1.62    0.113 4.27 1.9e-05
## 
## Likelihood ratio test=18.1  on 1 df, p=2.04e-05  n= 227, number of events= 164 
##    (1 observation deleted due to missingness)

PhysEcog is still significant in the presence of Sex.

Question 2

  1. \(h_{1}(t)\)=\(h_{0}(t) e^{bx}\)

  2. The hazard function is the negative derivative of the log of the survival curve, and \(S_{1}(t)=S_{0}(t)^{e^{bx}}\).

  3. If the log of a hazard ratio (b) is zero, there is no change in hazard between the two functions, i.e. \(S_{1}(t)\) is the baseline survival curve.