This guide will demonstrate how to construct Kaplan Meier curves in R using SAS generated output.
Kaplan Meier Analysis in SAS
We’ll use the BMT
dataset that is available in the SASHELP
library for demonstration purposes.
The following SAS code performs the analysis and saves the necessary output needed to construct the curve.
You then need to export the surv_data
dataset - I recommend using something simple (e.g. CSV).
R Prerequisites
Packages
You will need the following packages installed in your R environment.
Data Management
The first step is to import and verify the contents of the data set generated by SAS.
surv <- read.csv("surv_data.csv", header=T)
str(surv)
#> 'data.frame': 144 obs. of 9 variables:
#> $ Group : Factor w/ 3 levels "ALL","AML-High Risk",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ T : int 0 1 55 74 86 104 107 109 110 122 ...
#> $ X_CENSOR_ : int NA 0 0 0 0 0 0 0 0 0 ...
#> $ SURVIVAL : num 1 0.974 0.947 0.921 0.895 ...
#> $ SDF_LCL : num 1 0.828 0.806 0.775 0.743 ...
#> $ SDF_UCL : num 1 0.996 0.987 0.974 0.959 ...
#> $ STRATUM : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ NumberAtRisk: int 38 NA NA NA NA NA NA NA NA NA ...
#> $ Timelist : num 0 NA NA NA NA NA NA NA NA NA ...
By default, SAS does not carry forward survival estimates across time. Fortunately this is straight forward to modify in R.
Basic KM Curves in R
The following code will construct a basic Kaplan Meier curve using ggplot2
. You can of course apply any other ggplot-specific customizations too.
Note: The scale_x_continuous
arguement should be consistent with whatever you specified in the initial SAS code.
km <- ggplot(surv, aes(group=Group, y=SURVIVAL,x=T,colour=Group)) +
geom_step() +
geom_point(show.legend=FALSE, aes(shape=as.factor(X_CENSOR_))) +
scale_shape_manual("", values=c(32,3)) +
scale_y_continuous(breaks=seq(0,1,0.2), limits=c(0,1) ) +
scale_x_continuous(breaks=seq(0,2500,500), limits=c(0,2500)) +
xlab("\nTime (Days)") +
ylab("\nProbability of Survival") +
theme_bw() +
theme(legend.position="top",legend.spacing.x = unit(.7, 'cm'))
km
To export the image to a high-res graphic, you can use ggsave
That’s it!
Advanced KM Curves in R
Numbers at Risk
To add the numbers at risk table, we’ll need to create another plot object.
Note: The scale_x_continuous
arguement should be consistent with whatever you specified in the initial SAS code.
atrisk <- ggplot(subset(surv, !is.na(NumberAtRisk)), aes(color=Group)) +
geom_text(aes(label=NumberAtRisk, x=Timelist,y=Group), show.legend=FALSE) +
scale_x_continuous(breaks=seq(0,2500,500), limits=c(0,2500)) +
ylab("") +
xlab("\nNumber at Risk\n") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
atrisk
To assemble both plots into a single plot, we can use patchwork
. You will need to install it from github as it is currently not available through CRAN.
#install.packages("devtools")
#devtools::install_github("thomasp85/patchwork")
library(patchwork)
km2 <- km + atrisk + plot_layout(ncol=1, heights=c(1,0.20))
km2
Themes
The ggthemr
package is used here to demonstrate adding themes to the Kaplan Meier plot. It can also be installed from github.
Here are some examples. Note that I removed theme_bw()
arguements from the code below.
ggthemr('flat', type = 'outer')
km <- ggplot(surv, aes(group=Group, y=SURVIVAL,x=T,colour=Group)) +
geom_step() +
geom_point(show.legend=FALSE, aes(shape=as.factor(X_CENSOR_))) +
scale_shape_manual("", values=c(32,3)) +
scale_y_continuous(breaks=seq(0,1,0.2), limits=c(0,1) ) +
scale_x_continuous(breaks=seq(0,2500,500), limits=c(0,2500)) +
xlab("\nTime (Days)") +
ylab("\nProbability of Survival") +
theme(legend.position="top",legend.spacing.x = unit(.7, 'cm'))
atrisk <- ggplot(subset(surv, !is.na(NumberAtRisk)), aes(color=Group)) +
geom_text(aes(label=NumberAtRisk, x=Timelist,y=Group), show.legend=FALSE) +
scale_x_continuous(breaks=seq(0,2500,500), limits=c(0,2500)) +
ylab("") +
xlab("\nNumber at Risk\n") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
eg1 <- km + atrisk + plot_layout(ncol=1, heights=c(1,0.20))
eg1
Here are some others.