VA dataset survival in R

## Survival Analysis - Veterans’ Lung Cancer Study (1980)

library(dplyr)      # data manipulation
library(gtsummary)  # summary tables
library(survival)   # Surv(), survfit(), survdiff(); also provides the veteran dataset
library(survminer)  # ggsurvplot(); also loads ggplot2 and ggpubr
library(ggplot2)
library(MASS)       # note: masks dplyr::select() and gtsummary::select(); call dplyr::select() explicitly if needed
str(veteran)
## 'data.frame':    137 obs. of  8 variables:
##  $ trt     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ celltype: Factor w/ 4 levels "squamous","smallcell",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ time    : num  72 411 228 126 118 10 82 110 314 100 ...
##  $ status  : num  1 1 1 1 1 1 1 1 1 0 ...
##  $ karno   : num  60 70 60 60 70 20 40 80 50 70 ...
##  $ diagtime: num  7 5 3 9 11 5 10 29 18 6 ...
##  $ age     : num  69 64 38 63 65 49 69 68 43 70 ...
##  $ prior   : num  0 10 0 10 10 0 10 0 0 0 ...

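The eight variables are:

- `trt`: treatment (1 = standard, 2 = test)
- `celltype`: histological cell type (squamous, small cell, adenocarcinoma, large cell)
- `time`: survival time in days (time to death or censoring)
- `status`: 1 = dead, 0 = censored
- `karno`: Karnofsky performance score (patient’s performance status; higher is better)
- `diagtime`: months from diagnosis to entry into the study
- `age`: age in years
- `prior`: prior therapy (0 = no, 10 = yes)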
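Because `trt` and `prior` are stored as numeric codes, it can help to convert them to labeled factors before building tables or plots. The following is a minimal sketch. It assumes a working copy named `vet`, which is a hypothetical name not used elsewhere in this analysis.

```r
library(survival)  # ships with R as a recommended package; provides `veteran`

# Work on a copy (hypothetical name `vet`) so the original numeric coding stays available
vet <- veteran
vet$trt   <- factor(vet$trt,   levels = c(1, 2),  labels = c("Standard", "Test"))
vet$prior <- factor(vet$prior, levels = c(0, 10), labels = c("No", "Yes"))

# Cross-tabulate treatment arm against the event indicator (1 = died, 0 = censored)
table(arm = vet$trt, died = vet$status)
```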

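# View(veteran)  # opens the RStudio data viewer; works only in interactive sessions
head(veteran)
# Survival response: follow-up time plus event indicator (1 = death, 0 = censored)
surv_object <- Surv(time = veteran$time, event = veteran$status)
# Kaplan-Meier estimates, one curve per treatment arm
fit1 <- survfit(surv_object ~ trt, data = veteran)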

Plot the Kaplan-Meier curve

# Strata follow the order of trt (1, then 2), so blue = Standard and red = Test Drug
plot(fit1, 
     col = c("blue", "red"),
     main = "Survival Curves for Treatment Groups",
     xlab = "Time in Days",
     ylab = "Survival Probability")
legend("topright", 
       legend = c("Standard", "Test Drug"), 
       col = c("blue", "red"), 
       lty = 1)

# Interpretation

The Kaplan-Meier plot shows the estimated survival probability over time for the two treatment groups: “Standard” (blue line) and “Test Drug” (red line).

- Axes: the x-axis is follow-up time in days, and the y-axis is the estimated survival probability, from 0.0 to 1.0 (0% to 100%).
- Starting point: both curves start at 1.0 (100%) at time 0, because all patients are alive at the start of the study.
- Curve shape: each downward step marks one or more events, which in this dataset are deaths. Larger steps mean that a larger fraction of the patients still at risk died at that time.
- Comparison of treatments: early on, the two curves are close. From roughly 200 days onward, the red “Test Drug” curve lies above the blue “Standard” curve, so the estimated survival probability at those later times is higher in the Test Drug group.
- Conclusion: visually, the Test Drug may offer some survival advantage later in follow-up. However, the curves are close, and their later parts rest on fewer patients at risk. A formal comparison such as the log-rank test is needed to decide whether the difference is statistically significant.
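Reading survival probabilities off a plot is imprecise. The fitted object can report them directly at chosen time points, along with the median survival in each arm. This is a minimal sketch: the time points (100 to 400 days) are arbitrary illustrative choices, and `fixed` is a hypothetical name. The model is refitted here with the equivalent formula `Surv(time, status) ~ trt`.

```r
library(survival)

fit1 <- survfit(Surv(time, status) ~ trt, data = veteran)

# Estimated S(t) per arm at a few illustrative days;
# extend = TRUE keeps time points beyond an arm's last follow-up
fixed <- summary(fit1, times = c(100, 200, 300, 400), extend = TRUE)
data.frame(arm = fixed$strata, day = fixed$time,
           surv = round(fixed$surv, 3), n.risk = fixed$n.risk)

# Median survival per arm (roughly, the first time the curve reaches 0.5 or below)
summary(fit1)$table[, "median"]
```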

# Log-rank test: do the two treatment arms have the same survival distribution?
survdiff(surv_object ~ trt, data = veteran)
## Call:
## survdiff(formula = surv_object ~ trt, data = veteran)
## 
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## trt=1 69       64     64.5   0.00388   0.00823
## trt=2 68       64     63.5   0.00394   0.00823
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.9
ggsurvplot(fit1, data = veteran,
           pval = TRUE,  # Add p-value of log-rank test
           risk.table = TRUE)

# Interpretation of the Kaplan-Meier Survival Curves

This plot is produced by `ggsurvplot()` from the survminer package. It shows Kaplan-Meier survival curves for the two treatment groups, “trt=1” (standard) and “trt=2” (test drug). Below the curves, a “Number at risk” table gives the number of patients still under observation in each group at selected time points.

Step 1: Understand the axes and curves.

- The x-axis (Time) is follow-up time in days.
- The y-axis (Survival probability) is the estimated probability of surviving beyond a given time.
- The red curve is trt=1 and the blue curve is trt=2.

Step 2: Analyze the survival curves.

- Both curves start at a survival probability of 1.0 (100%) at Time = 0, because every patient is alive at the start of follow-up.
- Both curves step down over time. Each drop corresponds to one or more deaths, and a decline of this kind is expected in any survival analysis.
- The “+” marks on the curves are censored observations: patients who were lost to follow-up, withdrew from the study, or were still alive at its end. Censoring does not lower the curve; it only reduces the number at risk at later time points.
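To see why deaths produce steps while censoring does not, the product-limit formula S(t) = ∏ (1 - d_j / n_j) can be computed by hand. The product runs over the death times t_j ≤ t. Here d_j is the number of deaths at t_j, and n_j is the number of patients whose follow-up time is at least t_j. The sketch below does this for the standard arm and checks the result against `survfit()`. The helper `km_by_hand()` is a hypothetical illustration, not part of the survival package.

```r
library(survival)

# Hypothetical helper: product-limit (Kaplan-Meier) estimate at each distinct death time
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # Censored patients stay in the risk set until their own time, but never count as deaths
  n_risk <- sapply(event_times, function(t) sum(time >= t))
  deaths <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(time = event_times, n.risk = n_risk, n.event = deaths,
             surv = cumprod(1 - deaths / n_risk))
}

arm1   <- subset(veteran, trt == 1)                 # standard treatment arm
manual <- km_by_hand(arm1$time, arm1$status)

# Reference estimate: summary() reports only the times at which deaths occurred
ref <- summary(survfit(Surv(time, status) ~ 1, data = arm1))
all.equal(manual$surv, ref$surv)
```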

# Interpret the “Number at risk” Table

The table below the plot, labeled “Number at risk”, shows how many subjects in each treatment group are still under observation (alive and not yet censored) at selected time points: 0, 250, 500, 750, and 1000 days. At Time = 0 the two groups are of similar size (69 in trt=1 and 68 in trt=2). These numbers fall over time as patients die or are censored.
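The risk-table counts can be reproduced without survminer. A patient is “at risk” at time t if their follow-up time is at least t. The following is a minimal sketch that computes the counts both from `summary()` of the fitted model and by direct counting. The names `from_survfit` and `by_hand` are hypothetical.

```r
library(survival)

fit1 <- survfit(Surv(time, status) ~ trt, data = veteran)
risk_times <- c(0, 250, 500, 750, 1000)

# From the fitted model; extend = TRUE reports 0 at risk past an arm's last follow-up
s <- summary(fit1, times = risk_times, extend = TRUE)
from_survfit <- matrix(s$n.risk, ncol = 2,
                       dimnames = list(risk_times, c("trt=1", "trt=2")))

# By direct counting: patients whose follow-up time is >= t
by_hand <- sapply(c(1, 2), function(k)
  sapply(risk_times, function(t) sum(veteran$time[veteran$trt == k] >= t)))
dimnames(by_hand) <- dimnames(from_survfit)

from_survfit
```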

# Interpret the p-value

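The p-value of 0.93 comes from the log-rank test (requested with `pval = TRUE`), which compares the two survival curves over the whole follow-up period. It agrees with the `survdiff()` output above (a chi-square of about 0.008 on 1 degree of freedom, printed there rounded as p = 0.9). The null hypothesis of the log-rank test is that survival is the same in both treatment groups. Because 0.93 is far above the conventional significance level of 0.05, we fail to reject the null hypothesis: these data provide no evidence that the test drug changes survival compared with the standard treatment. A non-significant result does not prove that the treatments are equivalent. Also, the log-rank test can miss differences when the survival curves cross.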
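The log-rank statistic can be rebuilt from its ingredients, which makes the observed and expected counts in the `survdiff()` table concrete. At each distinct death time, it compares the deaths observed in group 1 with the number expected if both groups shared the same hazard, n1·d/n. It then scales the squared total difference by the hypergeometric variance. The following is a minimal sketch: `logrank_by_hand()` is a hypothetical helper for two groups, and the first sorted group level is treated as group 1.

```r
library(survival)

# Hypothetical two-group log-rank test, for illustration only
logrank_by_hand <- function(time, status, group) {
  g1 <- group == sort(unique(group))[1]                 # first level is "group 1"
  tj <- sort(unique(time[status == 1]))                 # pooled distinct death times
  n  <- sapply(tj, function(t) sum(time >= t))          # total at risk
  d  <- sapply(tj, function(t) sum(time == t & status == 1))
  n1 <- sapply(tj, function(t) sum(time[g1] >= t))      # at risk in group 1
  d1 <- sapply(tj, function(t) sum(time[g1] == t & status[g1] == 1))
  expected1 <- sum(n1 * d / n)
  # Hypergeometric variance; a term is 0 when only one subject is at risk
  v <- sum(ifelse(n > 1, n1 * (n - n1) * d * (n - d) / (n^2 * (n - 1)), 0))
  chisq <- (sum(d1) - expected1)^2 / v
  c(observed1 = sum(d1), expected1 = expected1, chisq = chisq,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

by_hand   <- logrank_by_hand(veteran$time, veteran$status, veteran$trt)
reference <- survdiff(Surv(time, status) ~ trt, data = veteran)
round(by_hand, 4)
```

The chi-square and expected count for trt=1 should match the `survdiff()` table up to rounding.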