生存分析

2021-09-02

感谢麦子老师

survival包和内置数据集

suppressMessages(library(survival))
help(package=survival)
#通过help文件找到了cancer数据集,里面包含所有内置数据集
data("cancer")
rm(list=ls())
Lung <- lung
#看不懂,help
head(Lung)
##   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1    3  306      2  74   1       1       90       100     1175      NA
## 2    3  455      2  68   1       0       90        90     1225      15
## 3    3 1010      1  56   1       0       90        90       NA      15
## 4    5  210      2  57   1       1       90        60     1150      11
## 5    1  883      2  60   1       0      100        90       NA       0
## 6   12 1022      1  74   1       1       50        80      513       0
str(Lung)
## 'data.frame':    228 obs. of  10 variables:
##  $ inst     : num  3 3 3 5 1 12 7 11 1 7 ...
##  $ time     : num  306 455 1010 210 883 ...
##  $ status   : num  2 2 1 2 2 1 2 2 2 2 ...
##  $ age      : num  74 68 56 57 60 74 68 71 53 61 ...
##  $ sex      : num  1 1 1 1 1 1 2 2 1 1 ...
##  $ ph.ecog  : num  1 0 0 1 0 1 2 2 1 2 ...
##  $ ph.karno : num  90 90 90 90 100 50 70 60 70 70 ...
##  $ pat.karno: num  100 90 90 60 90 80 60 80 80 70 ...
##  $ meal.cal : num  1175 1225 NA 1150 NA ...
##  $ wt.loss  : num  NA 15 15 11 0 0 10 1 16 34 ...

help-lung | Format

inst: Institution

code time: Survival time in days

status: censoring status 1=censored, 2=dead censored:here means alive in the end

age: Age in years

sex: Male=1 Female=2

ph.ecog: ECOG performance score as rated by the physician. 0=asymptomatic, 1= symptomatic but completely ambulatory, 2= in bed <50% of the day, 3= in bed > 50% of the day but not bedbound, 4 = bedbound

ph.karno: Karnofsky performance score(bad=0-good=100) rated by physician pat.karno: Karnofsky performance score as rated by patient meal.cal: Calories consumed at meals wt.loss: Weight loss in last six months

Kaplain-Meier曲线

在t检验和回归分析中,我们得到的是数据的参数(均值、标准差等);在生存分析中我们得到的是生存曲线。

基本流程

1.提取生存时间Surv()

2.拟合模型survfit()

3.如果有分组,分析差异survdiff()

polt()绘图

#提取生存时间
Lusurv <- Surv(time = Lung$time, event = Lung$status)
head(Lusurv)
## [1]  306   455  1010+  210   883  1022+
#拟合模型
Lufit <- survfit(Lusurv~1)
plot(Lufit)

plot(Lufit,conf.int = "none")#去掉可信区间

#分组分析差异
Lufit <- survfit(Lusurv~Lung$sex)
#生存曲线
{
plot(Lufit)
abline(h = 0.5)#水平参考线
abline(v = 500, lty=3 )#垂直参考线,linetype虚线
}

#分析差异
Lufit
## Call: survfit(formula = Lusurv ~ Lung$sex)
## 
##              n events median 0.95LCL 0.95UCL
## Lung$sex=1 138    112    270     212     310
## Lung$sex=2  90     53    426     348     550
survdiff(Lusurv~Lung$sex)
## Call:
## survdiff(formula = Lusurv ~ Lung$sex)
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## Lung$sex=1 138      112     91.6      4.55      10.3
## Lung$sex=2  90       53     73.4      5.68      10.3
## 
##  Chisq= 10.3  on 1 degrees of freedom, p= 0.001

Surv()函数 * time、time2:观察时间(死亡时间)直接time=;起止时间:time=;time2= * event:事件,默认死亡为2 * 结果输出生存时间,截尾事件用+表示

survfit()函数 * 输入:fomula公式 ~左边为生存时间,~右边为分组依据,不分组则为1 * 输出:生存模型,包含均值和95%可信区间,用plot()直接绘图

survdiff()函数 * 输入:同样是公式 * 输出:可知p值

图形美化

plot(Lufit,conf.int = 'none',
     col = c("red","blue"),lwd = 2,
     xlab = 'Time', ylab = 'Survival Probability')
abline(h=0.5,lty=3)
abline(v=c(270,460),lty=3)
legend('bottomleft',c("Male","Female"),#注意与分组数据对应
       col = c("red","blue"),lwd = 2)
text(900,0.9, 'p = 0.001')#位置坐标与坐标轴对应
text(150,0.5,'Male,n=138',cex=.8)
text(600,0.45,'Female,n=90',cex=.8)

图形美化部分知识点:

  • 记住参数:col,conf.int,lwd,xlab

  • 添加参考线abline():h=/v=+数值,然后可以设置线形lty=

  • 添加图例legend():

    • 位置:can be ser as (x,y); or The location may also be specified by setting x to a single keyword from the list “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.

    • 图例:c("a","b")与公式里面的分组依据对应

    • 颜色和形状:plot() 函数对应

  • 添加文字text() :

    • 位置:(x,y)

    • 文字:""

    • cex=.5 原本字体的倍数

suppressMessages(library(ggplot2))
suppressMessages(library(ggpubr))
suppressMessages(library(survminer))

ggsurvplot(Lufit,data = Lung, 
           risk.table=TRUE,conf.int=TRUE,
           #palette = c("skyblue","pink"),#颜色设置
           pval=TRUE,#log-rank检验
           pval.method=TRUE)#添加检验text