library(survival)
library(survminer)
## Loading required package: ggplot2
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
library(ggplot2)
library(tidyr)
data(package = "survival")
data("cancer" , package = "survival")
head(cancer)
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
## 4 5 210 2 57 1 1 90 60 1150 11
## 5 1 883 2 60 1 0 100 90 NA 0
## 6 12 1022 1 74 1 1 50 80 513 0
colnames(cancer)
## [1] "inst" "time" "status" "age" "sex" "ph.ecog"
## [7] "ph.karno" "pat.karno" "meal.cal" "wt.loss"
unique(cancer$sex)
## [1] 1 2
# Fit Kaplan-Meier model stratified by Sex
km_fit <- survfit(Surv(time, status) ~ sex, data = cancer )
# Plot Kaplan-Meier survival curve stratified by sex
ggsurvplot(km_fit, data = cancer,
main = "Kaplan-Meier Survival Curve by sex",
xlab = "Time",
ylab = "Survival Probability",
palette = c("blue", "red"),
legend.title = "sex",
legend.labs = c("Male", "Female"))
The males in the dataset seem to have better survival compared to females Females appear to have a lower survival probability over time compared to males in this dataset. This could indicate that females in this study group experience the event of interest at a higher rate than males
logrank_test <- survdiff(Surv(time, status) ~ sex, data = cancer)
# Display the result of the test
logrank_test
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = cancer)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138 112 91.6 4.55 10.3
## sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.001
The p-value of 0.001 indicates that the difference in survival between males and females is statistically significant. Therefore, we can reject the null hypothesis and conclude that there is a significant difference in survival between the two groups (males and females).
The Chi-square value of 10.3 supports this finding, showing a strong association between sex and survival time in the data.
fit_na <- survfit(Surv(time, status) ~ sex, data = cancer, type = "fleming-harrington")
ggsurvplot(fit_na, data = cancer, fun = "cumhaz",
xlab = "Time", ylab = "Cumulative Hazard",
legend.title = "sex", legend.labs = c("Male", "Female"))
Based on this Nelson-Aalen estimate of the cumulative hazard, males have a higher cumulative hazard compared to females in this dataset. This could indicate that males have a higher overall risk of the event than females over the time period observed.
fit_cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog , data = cancer)
fit_cox
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog, data = cancer)
##
## coef exp(coef) se(coef) z p
## age 0.011067 1.011128 0.009267 1.194 0.232416
## sex -0.552612 0.575445 0.167739 -3.294 0.000986
## ph.ecog 0.463728 1.589991 0.113577 4.083 4.45e-05
##
## Likelihood ratio test=30.5 on 3 df, p=1.083e-06
## n= 227, number of events= 164
## (1 observation deleted due to missingness)
# Display the results
summary(fit_cox )
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog, data = cancer)
##
## n= 227, number of events= 164
## (1 observation deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.011067 1.011128 0.009267 1.194 0.232416
## sex -0.552612 0.575445 0.167739 -3.294 0.000986 ***
## ph.ecog 0.463728 1.589991 0.113577 4.083 4.45e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0111 0.9890 0.9929 1.0297
## sex 0.5754 1.7378 0.4142 0.7994
## ph.ecog 1.5900 0.6289 1.2727 1.9864
##
## Concordance= 0.637 (se = 0.025 )
## Likelihood ratio test= 30.5 on 3 df, p=1e-06
## Wald test = 29.93 on 3 df, p=1e-06
## Score (logrank) test = 30.5 on 3 df, p=1e-06
These values represent the effect of each covariate on the log hazard. A positive value increases the hazard (i.e., increases the risk of the event), while a negative value decreases the hazard. age: (0.0111) — The coefficient for age is positive, which means that as age increases, the hazard (risk of the event) slightly increases. However, the effect is small.
sex: (−0.5526) — The negative coefficient suggests that being male (1 for male and 2 for female) decreases the hazard, meaning males are at a lower risk of the event compared to females.
ph.ecog:(0.4637) — The positive coefficient indicates that worse performance status (higher ECOG score) increases the risk of the event. Higher ECOG scores are associated with poorer health status and an increased hazard.
These are the hazard ratios (HR), which show the relative risk of the event for each unit increase in the covariate. age: 1.0111 — For each one-unit increase in age, the hazard increases by 1.1%.
sex: 0.5754 — Being male reduces the hazard by 42.46% compared to being female.
ph.ecog: 1.5899 — A one-unit increase in the ECOG performance status increases the hazard by approximately 59%
This tests whether the coefficient is significantly different from zero (null hypothesis). A p-value less than 0.05 typically indicates statistical significance.
age: The p-value is 0.2324, which is not statistically significant. This means that age does not significantly affect the survival time in this model.
sex: The p-value is 0.000986, which is very significant. This suggests that sex has a significant effect on survival.
ph.ecog: The p-value is 4.45×10^−5, which is extremely significant, indicating that the performance status (ph.ecog) significantly affects survival.
age: The 95% confidence interval for age is [0.9929, 1.0297]. Since this range includes 1, it suggests that the effect of age might not be statistically significant.
sex: The 95% confidence interval for sex is [0.4142, 0.7994], which does not include 1, suggesting that being male is significantly associated with a lower risk of the event.
ph.ecog: The 95% confidence interval for ph.ecog is [1.2727, 1.9864], which does not include 1, indicating that higher performance status is significantly associated with a higher risk.
Likelihood ratio test: The likelihood ratio test statistic is 30.5, which tests the overall significance of the model. The p-value is 1×10^−6, indicating that the model as a whole is statistically significant.
Wald test: The Wald test statistic is 29.93, which is also significant, with a p-value of 1×10^−6
Score (logrank) test: The score test statistic is 30.5, with a p-value of 1×10^−6, indicating significant differences in the hazard across the covariates.
The concordance index (C-index) is 0.637. The C-index measures how well the model discriminates between high- and low-risk individuals. The value of 0.637 suggests that the model has moderate discrimination.