Part I: Descriptive and univariate analysis

a) Calculate descriptive statistics for each variable, and provide a table.

Descriptive statistics of continuous variables

Variable                 N    Mean   St. Dev.   Min   Max
Survival time (days)   137   121.6      157.8     1   999
Age                    137    58.3       10.5    34    81
Karnofsky score        137    58.6       20.0    10    99
Diagnosis time         137     8.8       10.6     1    87

Descriptive statistics of categorical variables

Treatment
  standard 69 (50%)
  test 68 (50%)
Prior therapy
  no 97 (71%)
  yes 40 (29%)
Cell type
  squamous cell 35 (26%)
  small cell 48 (35%)
  adenocarcinoma 27 (20%)
  large cell 27 (20%)
Status
  died 128 (93%)
  living 9 (7%)

b) What are the median survival times, in days, for the treatment and the control groups?

##                records n.max n.start events median 0.95LCL 0.95UCL
## treat=standard      69    69      69     64  103.0      59     132
## treat=test          68    68      68     64   52.5      44      95

The median survival time is 103 days (95% CI 59, 132) in the standard care (control) group and 52.5 days (95% CI 44, 95) in the test treatment group.
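How a Kaplan-Meier median is read off the curve can be sketched in a few lines. The Python below is only an illustration on made-up toy data, not the R call that produced the output above; the function name `km_median` and the toy values are assumptions. It returns the first time at which the estimated survival probability drops to 0.5 or below. R's `survfit` additionally reports the midpoint of the interval when the curve sits exactly at 0.5, which is presumably how a half-day value such as 52.5 arises.

```python
def km_median(times, events):
    """First time at which the Kaplan-Meier estimate is <= 0.5, else None.

    times  -- follow-up times
    events -- 1 if the subject died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t (handles ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            # Small tolerance guards against floating-point noise at exactly 0.5.
            if surv <= 0.5 + 1e-12:
                return t
        n_at_risk -= removed
    return None  # the curve never reaches 0.5, so the median is undefined


# Hypothetical toy data: five subjects, one censored at day 12.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 1]
print(km_median(toy_times, toy_events))
```

Censored subjects leave the risk set without lowering the curve, which is why the median cannot simply be taken as the middle observed time.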

c) Estimate the univariate associations of treatment with survival, using a Cox Proportional Hazards model to obtain 95% confidence intervals for the Hazard Ratio, and the log-rank test to get a p-value. From this randomized controlled trial, would you conclude that the treatment is effective overall?

## Call:
## coxph(formula = Surv(stime, status) ~ treat, data = va)
## 
##   n= 137, number of events= 128 
## 
##              coef exp(coef) se(coef)     z Pr(>|z|)
## treattest 0.01774   1.01790  0.18066 0.098    0.922
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## treattest     1.018     0.9824    0.7144      1.45
## 
## Concordance= 0.525  (se = 0.026 )
## Rsquare= 0   (max possible= 0.999 )
## Likelihood ratio test= 0.01  on 1 df,   p=0.9218
## Wald test            = 0.01  on 1 df,   p=0.9218
## Score (logrank) test = 0.01  on 1 df,   p=0.9218

The estimated hazard ratio for the test treatment versus standard care is 1.02 (95% CI 0.71, 1.45), with a log-rank p-value of 0.92. The difference is not statistically significant at the 0.05 level, so this trial provides no evidence that the test treatment improves survival overall.
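The confidence interval and Wald p-value in the output can be checked by hand from the coefficient and its standard error. The following Python sketch is an arithmetic check only, separate from the R session; the variable names are chosen for illustration, and the two input numbers are copied from the `treattest` row above.

```python
import math
from statistics import NormalDist

coef, se = 0.01774, 0.18066  # treattest row of the coxph output

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

# The interval is built on the log-hazard scale, then exponentiated.
hr = math.exp(coef)
lower = math.exp(coef - z_crit * se)
upper = math.exp(coef + z_crit * se)

# Two-sided Wald p-value from z = coef / se.
z = coef / se
p_wald = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"HR = {hr:.3f}, 95% CI ({lower:.4f}, {upper:.4f}), Wald p = {p_wald:.4f}")
```

The printed values should agree with the reported hazard ratio, interval, and Wald p-value up to rounding of the inputs.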

d) Draw Kaplan-Meier plots for each covariate, with continuous variables dichotomized at their median.

e) Using a univariate model, report the hazard ratio and 95% confidence interval for a 10-point and a 20-point decrease in Karnofsky score.

## Call:
## coxph(formula = y ~ Karn, data = va)
## 
##   n= 137, number of events= 128 
## 
##           coef exp(coef)  se(coef)      z Pr(>|z|)    
## Karn -0.033424  0.967129  0.005075 -6.586 4.51e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## Karn    0.9671      1.034    0.9576    0.9768
## 
## Concordance= 0.709  (se = 0.03 )
## Rsquare= 0.264   (max possible= 0.999 )
## Likelihood ratio test= 42.03  on 1 df,   p=8.983e-11
## Wald test            = 43.38  on 1 df,   p=4.513e-11
## Score (logrank) test = 45.32  on 1 df,   p=1.674e-11

In this study, a 20-point decrease in Karnofsky score is associated with a hazard ratio for death of 1.95 (95% CI 1.60, 2.38). A 10-point decrease is associated with a hazard ratio of 1.40 (95% CI 1.26, 1.54).
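The hazard ratios for a 10-point and a 20-point decrease follow from rescaling the per-point coefficient on the log scale. A minimal Python sketch of that arithmetic, using the `Karn` coefficient and standard error from the output above (the helper name `hr_for_decrease` is hypothetical):

```python
import math
from statistics import NormalDist

coef, se = -0.033424, 0.005075  # Karn row: log hazard ratio per 1-point increase
z_crit = NormalDist().inv_cdf(0.975)


def hr_for_decrease(points):
    """Hazard ratio and 95% CI for a `points`-point decrease in the score."""
    shift = -points  # a decrease is a negative change in the covariate
    estimate = math.exp(shift * coef)
    # Multiplying by a negative shift flips the bounds, so sort them.
    low, high = sorted(
        math.exp(shift * (coef + sign * z_crit * se)) for sign in (-1, 1)
    )
    return estimate, low, high


for points in (10, 20):
    est, low, high = hr_for_decrease(points)
    print(f"{points}-point decrease: HR {est:.2f} (95% CI {low:.2f}, {high:.2f})")
```

Scaling the coefficient and its confidence limits before exponentiating gives the same interval as refitting the model with the score divided by 10 or 20.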

f) Use a log-minus-log plot to examine the proportional hazards assumption for the treatment variable and interpret the results.

In the plot of log(-log(S(t))) against log(t), where S(t) is the estimated survivor function, the curves for the two treatment groups cross each other in places. Apart from these crossings they run roughly parallel, so the proportional hazards assumption appears reasonable for the treatment variable.
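The reason parallel curves indicate proportional hazards is that, if S1(t) = S0(t)^HR, then log(-log(S1(t))) = log(HR) + log(-log(S0(t))), so the two curves differ by the constant log(HR). The Python sketch below checks this numerically with made-up survival probabilities and a hypothetical hazard ratio of 2; none of these numbers come from the trial data.

```python
import math


def cloglog(s):
    """Complementary log-log transform of a survival probability."""
    return math.log(-math.log(s))


hr = 2.0                      # hypothetical hazard ratio
s0 = [0.9, 0.7, 0.5, 0.2]     # made-up baseline survival probabilities
s1 = [s ** hr for s in s0]    # survival in the other group under proportional hazards

# Vertical gap between the two log-minus-log curves at each time point.
gaps = [cloglog(b) - cloglog(a) for a, b in zip(s0, s1)]
print(gaps, math.log(hr))
```

With a hazard ratio close to 1, as estimated for treatment here (1.02), the expected gap is only about log(1.02) = 0.02, so nearly overlapping curves that cross by chance are not by themselves strong evidence against the assumption.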

=======

Part II: Multivariable analysis

a) Test whether the treatment might be differentially effective for any of the cancer cell types, under the proportional hazards assumption. Do this making the proportional hazards assumption for cell types, by fitting a multivariate model that includes treatment, cell type, and an interaction term.

Proportional Hazards Model: Treatment, Cell Type, and Treatment by Cell Type Interaction

Term                           exp(coef)   Lower 95% CI   Upper 95% CI   p-value
cellsmall cell                     1.480          0.760          2.860     0.246
celladenocarcinoma                 1.970          0.830          4.670     0.122
celllarge cell                     0.670          0.310          1.430     0.299
treattest                          0.470          0.220          1.010     0.054
cellsmall cell:treattest           4.220          1.550         11.490     0.005
celladenocarcinoma:treattest       2.440          0.800          7.440     0.117
celllarge cell:treattest           3.410          1.140         10.230     0.029

With squamous cell as the reference cell type, the hazard ratio for the test treatment versus the standard of care among squamous cell patients was 0.47 (95% CI 0.22, 1.01), which was not statistically significant at the 0.05 level (p=0.054). The exponentiated interaction terms are ratios of treatment hazard ratios relative to squamous cell patients: 4.22 (95% CI 1.55, 11.49, p=0.005) for small cell and 3.41 (95% CI 1.14, 10.23, p=0.029) for large cell, so the treatment hazard ratio was significantly higher in these two cell types than in squamous cell patients. The ratio for adenocarcinoma, 2.44 (95% CI 0.80, 7.44), pointed in the same direction but was not statistically significant at the 0.05 level (p=0.117). This suggests that treatment is differentially effective by cell type, and that squamous cell patients might benefit the most from the test treatment compared to any of the other cell types.
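The treatment hazard ratio within each cell type can be recovered by multiplying the treatment main effect by the corresponding interaction term. The Python sketch below does this with the rounded values from the table above; the variable names are illustrative. Confidence intervals for these products would need the covariance between coefficients, which is not shown in the table, so only point estimates are computed.

```python
# Treatment HR in the reference cell type (squamous), from the treattest row.
hr_treat_squamous = 0.47

# Exponentiated interaction terms: ratio of the treatment HR in each cell type
# to the treatment HR in squamous cell patients.
interaction = {
    "small cell": 4.22,
    "adenocarcinoma": 2.44,
    "large cell": 3.41,
}

hr_treat_by_cell = {"squamous cell": hr_treat_squamous}
for cell, ratio in interaction.items():
    hr_treat_by_cell[cell] = hr_treat_squamous * ratio

for cell, hr in hr_treat_by_cell.items():
    print(f"{cell}: treatment HR = {hr:.2f}")
```

Under these point estimates the test treatment is associated with a lower hazard only in squamous cell patients and with a higher hazard in the other three cell types.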

b) Fit a multivariate model with all available predictors, and report the hazard ratio, 95% confidence interval, and p-value from Wald’s test for treatment after correcting for all other covariates.

Proportional Hazards Multivariate Model

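Term                 exp(coef)   Lower 95% CI   Upper 95% CI   p-value
treattest                1.340          0.890          2.020     0.156
cellsmall cell           2.370          1.380          4.060     0.002
celladenocarcinoma       3.310          1.830          5.960    0.0001
celllarge cell           1.490          0.860          2.600     0.156
age                      0.990          0.970          1.010     0.349
Karn                     0.970          0.960          0.980    <0.001
diag.time                1.000          0.980          1.020     0.993
prioryes                 1.070          0.680          1.690     0.758

After adjusting for cell type, age, Karnofsky score, diagnosis time, and prior therapy, the hazard ratio for the test treatment versus standard care is 1.34 (95% CI 0.89, 2.02), with a Wald p-value of 0.156.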

c) Do a residuals analysis for your final model, and comment on the goodness of fit.

Schoenfeld residuals vs Time

Correlation between scaled Schoenfeld residuals and time

##                        rho   chisq        p
## treattest          -0.0273  0.1227 0.726104
## cellsmall cell      0.0128  0.0261 0.871621
## celladenocarcinoma  0.1424  2.9794 0.084329
## celllarge cell      0.1712  4.1093 0.042649
## age                 0.1890  5.3476 0.020750
## Karn                0.3073 13.0449 0.000304
## diag.time           0.1491  2.9436 0.086217
## prioryes           -0.1767  4.4714 0.034467
## GLOBAL                  NA 27.9972 0.000475

The plots of scaled Schoenfeld residuals against time and the accompanying tests suggest that the proportional hazards assumption is violated for large cell type (p=0.043), age (p=0.021), Karnofsky score (p=0.0003), and prior therapy (p=0.034). The global test also rejects proportionality (p=0.0005). The remaining covariates show no significant evidence of a time-varying effect at the 0.05 level.
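The p-values in the table can be reproduced from the chi-square statistics. The Python sketch below assumes 1 degree of freedom for each covariate row and 8 degrees of freedom (one per coefficient) for the GLOBAL row; that degrees-of-freedom choice is an assumption, not something printed in the output. The helper `chisq_sf` is hypothetical and only handles the two cases needed here.

```python
import math


def chisq_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if df == 1:
        # A 1-df chi-square is a squared standard normal.
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    raise ValueError("this sketch only handles df = 1 or even df")


# Chi-square statistics copied from the table above.
rows = {"celllarge cell": 4.1093, "age": 5.3476, "Karn": 13.0449, "prioryes": 4.4714}
for name, chisq in rows.items():
    print(f"{name}: p = {chisq_sf(chisq, 1):.6f}")

print(f"GLOBAL: p = {chisq_sf(27.9972, 8):.6f}")  # df = 8 is assumed
```

The global statistic is tested against a larger number of degrees of freedom than the per-covariate statistics, so it is not simply the sum of the individual tests.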

Martingale and deviance residuals were also plotted (below); they suggested a poorer model fit than the Schoenfeld residual analysis did.

Martingale Residuals

Deviance Residuals