2021-09-02
感谢麦子老师
suppressMessages(library(survival))
help(package=survival)
#通过help文件找到了cancer数据集,里面包含所有内置数据集
data("cancer")
rm(list=ls())
Lung <- lung
#看不懂,help
head(Lung)
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
## 4 5 210 2 57 1 1 90 60 1150 11
## 5 1 883 2 60 1 0 100 90 NA 0
## 6 12 1022 1 74 1 1 50 80 513 0
str(Lung)
## 'data.frame': 228 obs. of 10 variables:
## $ inst : num 3 3 3 5 1 12 7 11 1 7 ...
## $ time : num 306 455 1010 210 883 ...
## $ status : num 2 2 1 2 2 1 2 2 2 2 ...
## $ age : num 74 68 56 57 60 74 68 71 53 61 ...
## $ sex : num 1 1 1 1 1 1 2 2 1 1 ...
## $ ph.ecog : num 1 0 0 1 0 1 2 2 1 2 ...
## $ ph.karno : num 90 90 90 90 100 50 70 60 70 70 ...
## $ pat.karno: num 100 90 90 60 90 80 60 80 80 70 ...
## $ meal.cal : num 1175 1225 NA 1150 NA ...
## $ wt.loss : num NA 15 15 11 0 0 10 1 16 34 ...
inst: Institution
code time: Survival time in days
status: censoring status 1=censored, 2=dead censored:here means alive in the end
age: Age in years
sex: Male=1 Female=2
ph.ecog: ECOG performance score as rated by the physician. 0=asymptomatic, 1= symptomatic but completely ambulatory, 2= in bed <50% of the day, 3= in bed > 50% of the day but not bedbound, 4 = bedbound
ph.karno: Karnofsky performance score(bad=0-good=100) rated by physician pat.karno: Karnofsky performance score as rated by patient meal.cal: Calories consumed at meals wt.loss: Weight loss in last six months
在t检验和回归分析中,我们得到的是数据的参数(均值、标准差等);在生存分析中我们得到的是生存曲线。
1.提取生存时间Surv()
2.拟合模型survfit()
3.如果有分组,分析差异survdiff()
polt()绘图#提取生存时间
Lusurv <- Surv(time = Lung$time, event = Lung$status)
head(Lusurv)
## [1] 306 455 1010+ 210 883 1022+
#拟合模型
Lufit <- survfit(Lusurv~1)
plot(Lufit)
plot(Lufit,conf.int = "none")#去掉可信区间
#分组分析差异
Lufit <- survfit(Lusurv~Lung$sex)
#生存曲线
{
plot(Lufit)
abline(h = 0.5)#水平参考线
abline(v = 500, lty=3 )#垂直参考线,linetype虚线
}
#分析差异
Lufit
## Call: survfit(formula = Lusurv ~ Lung$sex)
##
## n events median 0.95LCL 0.95UCL
## Lung$sex=1 138 112 270 212 310
## Lung$sex=2 90 53 426 348 550
survdiff(Lusurv~Lung$sex)
## Call:
## survdiff(formula = Lusurv ~ Lung$sex)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Lung$sex=1 138 112 91.6 4.55 10.3
## Lung$sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.001
Surv()函数 * time、time2:观察时间(死亡时间)直接time=;起止时间:time=;time2= * event:事件,默认死亡为2 * 结果输出生存时间,截尾事件用+表示
survfit()函数 * 输入:fomula公式 ~左边为生存时间,~右边为分组依据,不分组则为1 * 输出:生存模型,包含均值和95%可信区间,用plot()直接绘图
survdiff()函数 * 输入:同样是公式 * 输出:可知p值
plot(Lufit,conf.int = 'none',
col = c("red","blue"),lwd = 2,
xlab = 'Time', ylab = 'Survival Probability')
abline(h=0.5,lty=3)
abline(v=c(270,460),lty=3)
legend('bottomleft',c("Male","Female"),#注意与分组数据对应
col = c("red","blue"),lwd = 2)
text(900,0.9, 'p = 0.001')#位置坐标与坐标轴对应
text(150,0.5,'Male,n=138',cex=.8)
text(600,0.45,'Female,n=90',cex=.8)
图形美化部分知识点:
记住参数:col,conf.int,lwd,xlab
添加参考线abline():h=/v=+数值,然后可以设置线形lty=
添加图例legend():
位置:can be ser as (x,y); or The location may also be specified by setting x to a single keyword from the list “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
图例:c("a","b")与公式里面的分组依据对应
颜色和形状: 与plot() 函数对应
添加文字text() :
位置:(x,y)
文字:""
cex=.5 原本字体的倍数
suppressMessages(library(ggplot2))
suppressMessages(library(ggpubr))
suppressMessages(library(survminer))
ggsurvplot(Lufit,data = Lung,
risk.table=TRUE,conf.int=TRUE,
#palette = c("skyblue","pink"),#颜色设置
pval=TRUE,#log-rank检验
pval.method=TRUE)#添加检验text