Load Sample Data

The ‘survival’ R package has multiple sample health data sets. I’ll use the pbc set, which is from the Mayo Clinic Primary Biliary Cirrhosis data. The study was conducted from 1974 to 1984 on 418 patients as a randomized placebo controlled trial of the drug D-penicillamine.

pbc_data <- read_csv("../Samples/pbc_data.csv")
head(pbc_data)

# A tibble: 6 x 20
     id  time status   trt   age sex   ascites hepato spiders edema
  <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>   <dbl>  <dbl>   <dbl> <dbl>
1     1   400      2     1  58.8 f           1      1       1   1  
2     2  4500      0     1  56.4 f           0      1       1   0  
3     3  1012      2     1  70.1 m           0      0       0   0.5
4     4  1925      2     1  54.7 f           0      1       1   0.5
5     5  1504      1     2  38.1 f           0      1       1   0  
6     6  2503      2     2  66.3 f           0      1       0   0  
# ... with 10 more variables: bili <dbl>, chol <dbl>, albumin <dbl>,
#   copper <dbl>, alk.phos <dbl>, ast <dbl>, trig <dbl>,
#   platelet <dbl>, protime <dbl>, stage <dbl>
# i Use `colnames()` to see all variable names

Multivariate Cox Regression Analysis

I’ll perform a Cox regression of time to death on the covariates age, sex, and treatment.

First, I need to recode some variables and remove the non-randomized participants.

# remove NA from trt variable (non-randomized participants)
pbc_data <- pbc_data %>% na.omit(trt)

# sex 1 = female, 2 = male
pbc_data$sex[pbc_data$sex == "f"] <- "1"
pbc_data$sex[pbc_data$sex == "m"] <- "2"

# remove transplant (1) status
pbc_data <- pbc_data %>% filter(!str_detect(status, "1"))

# recode censor (0) status to 1
pbc_data$status[pbc_data$status == "0"] <- "1"
pbc_data$status <- as.numeric(pbc_data$status)

head(pbc_data, 10)

# A tibble: 10 x 20
      id  time status   trt   age sex   ascites hepato spiders edema
   <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>   <dbl>  <dbl>   <dbl> <dbl>
 1     1   400      2     1  58.8 1           1      1       1   1  
 2     2  4500      1     1  56.4 1           0      1       1   0  
 3     3  1012      2     1  70.1 2           0      0       0   0.5
 4     4  1925      2     1  54.7 1           0      1       1   0.5
 5     7  1832      1     2  55.5 1           0      1       0   0  
 6     8  2466      2     2  53.1 1           0      0       0   0  
 7     9  2400      2     1  42.5 1           0      0       1   0  
 8    10    51      2     2  70.6 1           1      0       1   1  
 9    11  3762      2     2  53.7 1           0      1       1   0  
10    12   304      2     2  59.1 1           0      0       1   0  
# ... with 10 more variables: bili <dbl>, chol <dbl>, albumin <dbl>,
#   copper <dbl>, alk.phos <dbl>, ast <dbl>, trig <dbl>,
#   platelet <dbl>, protime <dbl>, stage <dbl>
# i Use `colnames()` to see all variable names

sex:

1 = female
2 = male

trt:

1 = D-penicillmain
2 = placebo

status:

1 = censored
2 = deceased

res_cox <- coxph(Surv(time, status) ~ age + sex + trt, data =  pbc_data)
summary(res_cox)

Call:
coxph(formula = Surv(time, status) ~ age + sex + trt, data = pbc_data)

  n= 258, number of events= 111 

          coef exp(coef)  se(coef)      z Pr(>|z|)    
age   0.038871  1.039637  0.009892  3.930 8.51e-05 ***
sex2  0.330437  1.391576  0.248463  1.330    0.184    
trt  -0.014984  0.985128  0.192715 -0.078    0.938    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
age     1.0396     0.9619    1.0197     1.060
sex2    1.3916     0.7186    0.8551     2.265
trt     0.9851     1.0151    0.6752     1.437

Concordance= 0.628  (se = 0.029 )
Likelihood ratio test= 19.88  on 3 df,   p=2e-04
Wald test            = 20.58  on 3 df,   p=1e-04
Score (logrank) test = 20.71  on 3 df,   p=1e-04

The p-value for all 3 overall tests (likelihood, Wald, and score) are significant, indicating that the model is significant. The omnibus null hypothesis (that all betas (β) are 0) is rejected.

In the multivariate Cox analysis, the covariate age is significant (p < 0.01). However, the covariates sex and trt fail to be significant (p = 0.0.18 and p = 0.94, respectively).

The hazard ratio for age is HR = exp(coef) = 1.039, with a significant p-value, which indicates a strong relationship between age and increased risk of death. Holding the other covariates constant, a higher value for age is associated with poorer prognosis for survival.

Visualizing the Estimated Distribution of Survival Times

# Plot the baseline survival function
ggsurvplot(survfit(res_cox, data = pbc_data), color = "#2E9FDF",
           ggtheme = theme_minimal())

Although sex was not determined to have a significant effect on survival when controlling for age and treatment, I’ll demostrate plotting impact of sex on the estimated survival probability.

# Create the new data  
sex_df <- with(pbc_data,
            data.frame(sex = c(1, 2), 
            age = rep(mean(age, na.rm = TRUE), 2),
            trt = c(1, 1)
))

# Survival curves
fit2 <- survfit(res_cox, newdata = sex_df)
ggsurvplot(fit2, conf_int = TRUE, legend.labs=c("Sex=1", "Sex=2"),
           ggtheme = theme_minimal(), data = sex_df)

Original Example with Multivariate Cox Regression Analysis

The above was my adaptation of the following example with the ‘lung’ dataset.

We’ll include the 3 factors (sex, age and ph.ecog) into the multivariate model.

This is direct coding and explanation from r-bloggers; not my own work

A Cox regression of time to death on the time-constant covariates is specified as follow:

head(lung, 10)

   inst time status age sex ph.ecog ph.karno pat.karno meal.cal
1     3  306      2  74   1       1       90       100     1175
2     3  455      2  68   1       0       90        90     1225
3     3 1010      1  56   1       0       90        90       NA
4     5  210      2  57   1       1       90        60     1150
5     1  883      2  60   1       0      100        90       NA
6    12 1022      1  74   1       1       50        80      513
7     7  310      2  68   2       2       70        60      384
8    11  361      2  71   2       2       60        80      538
9     1  218      2  53   1       1       70        80      825
10    7  166      2  61   1       2       70        70      271
   wt.loss
1       NA
2       15
3       15
4       11
5        0
6        0
7       10
8        1
9       16
10      34

res.cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data =  lung)
summary(res.cox)

Call:
coxph(formula = Surv(time, status) ~ age + sex + ph.ecog, data = lung)

  n= 227, number of events= 164 
   (1 observation deleted due to missingness)

             coef exp(coef)  se(coef)      z Pr(>|z|)    
age      0.011067  1.011128  0.009267  1.194 0.232416    
sex     -0.552612  0.575445  0.167739 -3.294 0.000986 ***
ph.ecog  0.463728  1.589991  0.113577  4.083 4.45e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0111     0.9890    0.9929    1.0297
sex        0.5754     1.7378    0.4142    0.7994
ph.ecog    1.5900     0.6289    1.2727    1.9864

Concordance= 0.637  (se = 0.025 )
Likelihood ratio test= 30.5  on 3 df,   p=1e-06
Wald test            = 29.93  on 3 df,   p=1e-06
Score (logrank) test = 30.5  on 3 df,   p=1e-06

“The p-value for all 3 overall tests (likelihood, Wald, and score) are significant, indicating that the model is significant. These tests evaluate the omnibus null hypothesis that all of the betas (β) are 0. In the above example, the test statistics are in close agreement, and the omnibus null hypothesis is soundly rejected.”
“In the multivariate Cox analysis, the covariates sex and ph.ecog are significant (p < 0.05). However, the covariate age fails to be significant (p = 0.23, which is grater than 0.05).”
“The p-value for sex is 0.000986, with a hazard ratio HR = exp(coef) = 0.58, indicating a strong relationship between the patients’ sex and decreased risk of death. The hazard ratios of covariates are interpretable as multiplicative effects on the hazard. For example, holding the other covariates constant, being female (sex=2) reduces the hazard by a factor of 0.58, or 42%. We conclude that, being female is associated with good prognostic.”
“Similarly, the p-value for ph.ecog is 4.45e-05, with a hazard ratio HR = 1.59, indicating a strong relationship between the ph.ecog value and increased risk of death. Holding the other covariates constant, a higher value of ph.ecog is associated with a poor survival.”
“By contrast, the p-value for age is now p=0.23. The hazard ratio HR = exp(coef) = 1.01, with a 95% confidence interval of 0.99 to 1.03. Because the confidence interval for HR includes 1, these results indicate that age makes a smaller contribution to the difference in the HR after adjusting for the ph.ecog values and patient’s sex, and only trend toward significance. For example, holding the other covariates constant, an additional year of age induce daily hazard of death by a factor of exp(beta) = 1.01, or 1%, which is not a significant contribution.”

Visualizing the Estimated Distribution of Survival Times

# Plot the baseline survival function
ggsurvplot(survfit(res.cox, data = lung), color = "#2E9FDF",
           ggtheme = theme_minimal())

To assess the impact of the sex on the estimated survival probability:

# Create new dataset with 2 rows (each sex)
# other covariates are fixed to their average values
sex_df <- with(lung,
            data.frame(sex = c(1, 2), 
            age = rep(mean(age, na.rm = TRUE), 2),
            ph.ecog = c(1, 1)
))
sex_df

  sex      age ph.ecog
1   1 62.44737       1
2   2 62.44737       1

# Survival curves
fit2 <- survfit(res.cox, newdata = sex_df)
ggsurvplot(fit2, conf.int = TRUE, legend.labs=c("Sex=1", "Sex=2"),
           ggtheme = theme_minimal(), data = sex_df)

SOURCE: r-bloggers

Sample: Cox Proportional Hazard Model

Load Sample Data

Multivariate Cox Regression Analysis

Visualizing the Estimated Distribution of Survival Times

Original Example with Multivariate Cox Regression Analysis

Visualizing the Estimated Distribution of Survival Times