Survival Analysis

The problems on this assignment require you to and compare survival between groups. Answer all questions specified on the problem but if you see something interesting and want to do more analysis please report it as well. Don’t forget to include discussion and to make your plots look nice and presentable. Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are still having trouble with .rmd, let us know and we will help you, but we are going to start requiring this from here on out.

#load libraries
library(survival)
library(dplyr)
library(tidyr)
library(coin)
library(ggplot2)
library(HSAUR3)
library(party)

Question 1

An investigator collected data on survival of patients with lung cancer at Mayo Clinic. The investigator would like you, the statistician, to answer the following questions and provide some graphs. Use the cancer data located in the survival package.

Part A.

What is the probability that someone will survive past 300 days? Answer - The probability that someone will surviving lung cancer past 300 days is \(\approx\) 0.5301.

#load the cancer data
data(cancer)
#build a survival model
surv.mod <- survfit(Surv(time,status==2)~1,data=cancer)
#compute the probability that someone will live past 300 days
summary(surv.mod,time = 300)[6]

## $surv
## [1] 0.5306081

Part B.

Provide a graph, including 95% confidence limits, of the Kaplan-Meier estimate of the entire study.

par(mfrow=c(1,1),mar=c(5, 4, 4, 2) + 0.1)
plot(surv.mod,main='Kaplan-Meir Estimate',
     xlab='Time in Days',
     ylab='Probability of Surviving',
     col=c(2,1,1),lty = c(2,1,1))
legend("bottomleft", c("Control", "RIT"), 
lty = c(2,1,1), col = c(2,1,1))

Part C.

Is there a difference in the survival rates between males and females? Provide formal statistical test with p-value and visual evidence.

Answer - Yes women have a higher probability of surviving lung cancer than men as show by the Kaplan-Meier estimate plots and by the probability of each surviving past 300 days. The difference in probability is substation and is \(\approx\) 0.233. The null hypothesis can also be rejected and the p-value is also significant at the 0.05 level.

Build a Model for Men and Women, output the p-values.

#set the sex variable to a factor
cancer$sex <- as.factor(cancer$sex)
#create a subset for men and women
male <- cancer %>%
  filter(sex=='1')
female <- cancer %>%
  filter(sex=='2')
#build the two models
surv.male <- survfit(Surv(time,status==2)~1,data=male)
surv.female <- survfit(Surv(time,status==2)~1,data=female)
#compute the p-values
logrank_test(Surv(time,status==2)~sex,data=cancer)

## 
##  Asymptotic Two-Sample Logrank Test
## 
## data:  Surv(time, status == 2) by sex (1, 2)
## Z = -3.2779, p-value = 0.001046
## alternative hypothesis: true theta is not equal to 1

Probability of Men Surviving Past 300 Days

The probability of men surviving past 300 days is \(\approx\) 0.441 which is lower than for women.

summary(surv.male,time = 300)[6]

## $surv
## [1] 0.4410889

Probability of Women Surviving Past 300 Days

The probability of men surviving past 300 days is \(\approx\) 0.674 which is higher than for men.

summary(surv.female,time = 300)[6]

## $surv
## [1] 0.6742026

Plot of Kaplan-Meier Estimates For Men and Women

par(mfrow=c(1,2),oma = c(4, 4, 2, 2),mar = c(2, 2, 1, 1))
plot(surv.male,main="Male Patients",col=c(2,1,1),lty = c(2,1,1))
legend("topright", c("Control", "RIT"), 
lty = c(2,1,1), col = c(2,1,1))
plot(surv.female,main='Female Patients',col=c(2,1,1),lty = c(2,1,1))
legend("topright", c("Control", "RIT"), 
lty = c(2,1,1), col = c(2,1,1))
mtext('Kaplan-Meier Estimate by Gender', outer = TRUE, cex = 1.5)
mtext('Time in Days', side = 1, outer = TRUE, line = 2)
mtext('Probability of Surviving', side = 2, outer = TRUE, line = 2)

Part D.

Is there a difference in the survival rates for the older half of the group versus the younger half? Provide a formal statistical test with p-value and visual evidence.

Answer - There is a very small difference in the probability of death in the two age groups. However, the p-value shows that the null hypotheses can be rejected, but not at the 0.05 level. The difference in probability of death is only .03.

Partition the Data into Two Groups

Judging by the histogram the split should occur at 63 years.

#calculate the median
median_age <- median(cancer$age)
#plot the histogram of age
ggplot(cancer, aes(x=age))+ 
  geom_histogram(binwidth=1,colour = "black") + 
  geom_vline(xintercept=median_age, colour="red",lwd=1.5) +
  geom_text(label = paste(median_age,'years'), y = 4, x=median_age, 
            angle = 60, hjust = 0,color='red',size=15) +
  labs(title='Histogram of Cancer Patient Age',
       x='Age of Patient',
       y='Count')

#create the agegroup variable splitting old & young by the median
cancer <- cancer %>%
  mutate(agegroup = ifelse(age >= median_age,"Old","Young"))
#set the agegroup to a factor
cancer$agegroup <- as.factor(cancer$agegroup)
#create two new partitioned datasets         
old <- cancer %>%         
  filter(agegroup=='Old')
young <- cancer %>%
  filter(agegroup=='Young')

Calculate the p-value

logrank_test(Surv(time,status==2)~agegroup,data=cancer)

## 
##  Asymptotic Two-Sample Logrank Test
## 
## data:  Surv(time, status == 2) by agegroup (Old, Young)
## Z = -0.8637, p-value = 0.3878
## alternative hypothesis: true theta is not equal to 1

Probability of Adults Above 63 Years Surviving After 300 Days

The probability of an adult age 63 and older surviving lung cancer is \(\approx\) 0.518.

surv.old <- survfit(Surv(time,status==2)~1,data=old)

surv.young <- survfit(Surv(time,status==2)~1,data=young)

summary(surv.old,time = 300)[6]

## $surv
## [1] 0.5181564

Probability of Adults Below 63 Years Surviving After 300 Days

The probability of an adult younger than 63 surviving lung cancer is \(\approx\) 0.5444.

summary(surv.young,time = 300)[6]

## $surv
## [1] 0.5444021

Plot of Kaplan-Meier Estimates For Older and Younger than 63 Years of Age

par(mfrow=c(1,2),oma = c(4, 4, 2, 2),mar = c(2, 2, 1, 1))
plot(surv.old, main='Over 63',col=c(2,1,1))
legend("topright", c("Control", "RIT"), 
lty = c(2,1,1), col = c(2,1,1))
plot(surv.young,main='Under 63',col=c(2,1,1))
legend("topright", c("Control", "RIT"), 
lty = c(2,1,1), col = c(2,1,1))
mtext('Kaplan-Meier Estimate by Age Group', outer = TRUE, cex = 1.5)
mtext('Time in Days', side = 1, outer = TRUE, line = 2)
mtext('Probability of Surviving', side = 2, outer = TRUE, line = 2)

Cox Proportional Hazards Regression Model of Cancer Data

As shown by the summary of the Cox Proportional Hazards model the ph.ecog, sex, inst, ph.karno, and wt.loss all contribute/have statistical significance with the survival rates of patients with advanced lung cancer.

cancer.cox <- coxph(Surv(time,status==2)~.,data=cancer)
summary(cancer.cox)

## Call:
## coxph(formula = Surv(time, status == 2) ~ ., data = cancer)
## 
##   n= 167, number of events= 120 
##    (61 observations deleted due to missingness)
## 
##                     coef  exp(coef)   se(coef)      z Pr(>|z|)    
## inst          -2.850e-02  9.719e-01  1.305e-02 -2.183 0.029004 *  
## age            3.678e-02  1.037e+00  1.936e-02  1.900 0.057452 .  
## sex2          -5.826e-01  5.584e-01  2.015e-01 -2.891 0.003840 ** 
## ph.ecog        8.755e-01  2.400e+00  2.375e-01  3.687 0.000227 ***
## ph.karno       2.354e-02  1.024e+00  1.157e-02  2.035 0.041819 *  
## pat.karno     -1.110e-02  9.890e-01  8.133e-03 -1.365 0.172353    
## meal.cal       1.522e-05  1.000e+00  2.630e-04  0.058 0.953858    
## wt.loss       -1.680e-02  9.833e-01  7.777e-03 -2.161 0.030715 *  
## agegroupYoung  5.283e-01  1.696e+00  3.315e-01  1.594 0.110945    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##               exp(coef) exp(-coef) lower .95 upper .95
## inst             0.9719     1.0289    0.9474    0.9971
## age              1.0375     0.9639    0.9988    1.0776
## sex2             0.5584     1.7907    0.3762    0.8289
## ph.ecog          2.4000     0.4167    1.5069    3.8226
## ph.karno         1.0238     0.9767    1.0009    1.0473
## pat.karno        0.9890     1.0112    0.9733    1.0049
## meal.cal         1.0000     1.0000    0.9995    1.0005
## wt.loss          0.9833     1.0169    0.9685    0.9984
## agegroupYoung    1.6961     0.5896    0.8858    3.2478
## 
## Concordance= 0.652  (se = 0.031 )
## Rsquare= 0.195   (max possible= 0.998 )
## Likelihood ratio test= 36.23  on 9 df,   p=3.608e-05
## Wald test            = 34.6  on 9 df,   p=7.008e-05
## Score (logrank) test = 35.53  on 9 df,   p=4.806e-05

Conditional Inference Tree of Cancer Data

The conditional inference tree clearly shows the increased probability of survival for female patients vs. male patients and also uses the highly significant ph.ecog variable for the first split of the tree.

cancer.tree <- ctree(Surv(time, status==2) ~ ., data = cancer)
plot(cancer.tree)

Question 2

The mastectomy data from the HSAUR3 package are the survival times (in months) after mastectomy of women with breast cancer. The cancers are classiffied as having metastasized or not based on a histochemical marker.

Answer - There is a substantial difference in the probability of death for metastasized and un-metastasized patients. The difference is 0.2708 at 50 months. This can also be seen with the plot. Notice the much higher upper confidence interval for non-metastasized patients. There is a much higher probability of patients surviving if the cancer has not metastasized.

Part A.

Plot the survivor functions of each group, estimated using the Kaplan-Meier estimate, on the same graph and comment on the differences.

Probability of Surviving After 50 Months for Metastasized Patients

#load the data
data(mastectomy)
#set the event variable to numeric
mastectomy <- mastectomy %>%
  mutate(event = ifelse(event==T,2,1))
#partition the data into metastasized and not metastasized
meta.yes <- mastectomy[mastectomy$metastasized == 'yes',]
meta.no <- mastectomy[mastectomy$metastasized == 'no',]
#create two models
mast.yes <- survfit(Surv(time,event==2)~1,data=meta.yes)
mast.no <- survfit(Surv(time,event==2)~1,data=meta.no)


summary(mast.yes,time = 50)[6]

## $surv
## [1] 0.5625

Probability of Surviving After 50 Months for Un-Metastasized Patients

summary(mast.no,time = 50)[6]

## $surv
## [1] 0.8333333

Plot of Kaplan-Meier Estimates For Mastasized and Un-Mastasized Patients

par(mfrow=c(1,2),oma = c(4, 4, 2, 2),mar = c(2, 2, 1, 1))
plot(mast.yes, main='Metastasized',col=c(2,1,1))
legend("topright", c("Control", "RIT"), 
       lty = c(2,1,1), col = c(2,1,1))
plot(mast.no,main='Not Metastasized',col=c(2,1,1))
legend("bottomleft", c("Control", "RIT"), 
       lty = c(2,1,1), col = c(2,1,1))
mtext('Kaplan-Meier Estimate of Mastectomy Data', outer = TRUE, cex = 1.5)
mtext('Time in Months', side = 1, outer = TRUE, line = 2)
mtext('Probability of Surviving', side = 2, outer = TRUE, line = 2)

Part B.

The null hypothesis can be rejected and the p-value is significant but not at the 0.05 level.

logrank_test(Surv(time,event==2)~metastasized,data=mastectomy)

## 
##  Asymptotic Two-Sample Logrank Test
## 
## data:  Surv(time, event == 2) by metastasized (no, yes)
## Z = 1.8667, p-value = 0.06194
## alternative hypothesis: true theta is not equal to 1

STAT 701 Homework 7

Jeremiah Lowhorn

Monday, October 16, 2017

Survival Analysis

Question 1

Part A.

Part B.

Part C.

Build a Model for Men and Women, output the p-values.

Probability of Men Surviving Past 300 Days

Probability of Women Surviving Past 300 Days

Plot of Kaplan-Meier Estimates For Men and Women

Part D.

Partition the Data into Two Groups

Calculate the p-value

Probability of Adults Above 63 Years Surviving After 300 Days

Probability of Adults Below 63 Years Surviving After 300 Days

Plot of Kaplan-Meier Estimates For Older and Younger than 63 Years of Age

Cox Proportional Hazards Regression Model of Cancer Data

Conditional Inference Tree of Cancer Data

Question 2

Part A.

Probability of Surviving After 50 Months for Metastasized Patients

Probability of Surviving After 50 Months for Un-Metastasized Patients

Plot of Kaplan-Meier Estimates For Mastasized and Un-Mastasized Patients

Part B.