R是S语言的一种实现。S语言是由 AT&T贝尔实验室开发的一种用来进行数据探索、统计分析、作图的解释型语言。
最初S语言的实现版本主要是S-PLUS。S-PLUS是一个商业软件,它基于S语言,并由MathSoft公司的统计科学部进一步完善
后来Auckland大学的Robert Gentleman 和 Ross Ihaka 及其他志愿人员开发了一个R系统。
R的使用与S-PLUS有很多类似之处,两个软件有一定的兼容性。
1.有效的数据处理和保存机制。
2.拥有一整套数组和矩阵的操作运算符。
3.一系列连贯而又完整的数据分析中间工具。
4.图形统计可以对数据直接进行分析和显示,可用于多种图形设备。
5.一种相当完善、简洁和高效的程序设计语言。它包括条件语句、循环语句、用户自定义的递归函数以及输入输出接口。
6.R语言是彻底面向对象的统计编程语言。
7.R语言和其它编程语言、数据库之间有很好的接口。
8.R语言是自由软件,可以放心大胆地使用,但其功能却不比任何其它同类软件差。
9.R语言具有丰富的网上资源
显示当前的工作目录
getwd()
修改当前的工作目录
setwd()
列出当前工作空间中的对象
ls()
移除(删除)一个或多个对象
rm(objectlist)
保存工作空间到文件myfile中(默认后缀 .RData)
save.image(“myfile”)
保存指定对象到一个文件中
save(object, file=“myfile”)
读取一个工作空间到当前会话中
load(“myfile”)
标量与向量:
标量只含一个元素,向量为多个相同模式的标量组成的一维数组,使用c()进行组合
x1<-c(1,2,3,4,5,6)
x2<-x1*2
length(x1)
## [1] 6
class(x1)
## [1] "numeric"
相同模式元素的二维数组,通过matrix()创建
rbind(x1,x2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## x1 1 2 3 4 5 6
## x2 2 4 6 8 10 12
cbind(x1,x2)
## x1 x2
## [1,] 1 2
## [2,] 2 4
## [3,] 3 6
## [4,] 4 8
## [5,] 5 10
## [6,] 6 12
Dmat<-cbind(x1,x2);Dmat
## x1 x2
## [1,] 1 2
## [2,] 2 4
## [3,] 3 6
## [4,] 4 8
## [5,] 5 10
## [6,] 6 12
class(Dmat)
## [1] "matrix"
Dmat2<-rbind(x1,x2)
class(Dmat2)
## [1] "matrix"
使用c函数结合的变量组成的是矩阵
数组:
维度可以大于2,元素模式相同,通过array()创建
dim1 <- c("A1","A2")
dim2 <- c("B1","B2","B3","B4")
dim3 <- c("C1","C2","C3")
myarray <- array(1:24,c(2,4,3),dimnames = list(dim1,dim2,dim3));myarray
## , , C1
##
## B1 B2 B3 B4
## A1 1 3 5 7
## A2 2 4 6 8
##
## , , C2
##
## B1 B2 B3 B4
## A1 9 11 13 15
## A2 10 12 14 16
##
## , , C3
##
## B1 B2 B3 B4
## A1 17 19 21 23
## A2 18 20 22 24
mylist <- list(Dmat,Dmat2,myarray)
mylist[3]
## [[1]]
## , , C1
##
## B1 B2 B3 B4
## A1 1 3 5 7
## A2 2 4 6 8
##
## , , C2
##
## B1 B2 B3 B4
## A1 9 11 13 15
## A2 10 12 14 16
##
## , , C3
##
## B1 B2 B3 B4
## A1 17 19 21 23
## A2 18 20 22 24
数据框:
不同的列可以包含不同模式的数据
Dframe<-as.data.frame(cbind(x1,x2));Dframe
dframe <- data.frame("n"=x1,"m"=x2)
class(Dframe)
## [1] "data.frame"
Dmat;Dframe
## x1 x2
## [1,] 1 2
## [2,] 2 4
## [3,] 3 6
## [4,] 4 8
## [5,] 5 10
## [6,] 6 12
df <- iris
矩阵和数据框格式是不一样的,矩阵是由行列组成的,数据框是由记录和变量组成
清除环境
rm(list=ls())
setwd("J:/port")
.csv
Stata,SAS,SPSS(foreign)
.txt{read.table/read.csv,sep=“./”}
library(rJava)
library(xlsxjars)
library(xlsx)
port <- read.xlsx("port.xls",sheetIndex=2,header=T,encoding='UTF-8')
port2 <- read.csv(“port.csv”,header=T,encoding=‘UTF-8’,row.names = 1)
names(port)
## [1] "no" "hosp" "patient"
## [4] "birth_y" "birth_m" "birth_d"
## [7] "gender" "height" "weight"
## [10] "location" "diagnosis" "complication"
## [13] "device" "Picc_n" "Picc_m"
## [16] "Port_m" "drug" "irriate"
## [19] "impact" "mobility" "care"
## [22] "care_h" "wtp" "outpatient"
## [25] "hospitalization" "maintain" "eq1"
## [28] "eq2" "eq3" "eq4"
## [31] "eq5" "v51" "v52"
## [34] "v53" "v54" "v55"
## [37] "v56" "v57" "v58"
## [40] "utility"
str(port)
## 'data.frame': 433 obs. of 40 variables:
## $ no : num 1 2 3 4 5 6 7 8 9 10 ...
## $ hosp : Factor w/ 2 levels "G","S": 1 1 1 1 1 1 1 1 1 1 ...
## $ patient : Factor w/ 396 levels "AJL","AL","BCG",..: 147 60 38 372 375 373 326 156 42 225 ...
## $ birth_y : num 1978 1978 1971 1978 1988 ...
## $ birth_m : num 4 5 3 2 8 5 3 8 12 5 ...
## $ birth_d : num 11 6 8 3 12 1 2 22 5 2 ...
## $ gender : num 2 1 1 1 2 1 1 2 1 2 ...
## $ height : num 156 165 173 160 172 169 166 157 170 155 ...
## $ weight : num 46 40 85 50 63 58 62 71 60 55 ...
## $ location : Factor w/ 114 levels "安徽-安庆市-太湖县",..: 19 25 25 31 67 80 31 32 25 11 ...
## $ diagnosis : num 4 4 4 4 1 4 4 4 4 4 ...
## $ complication : num 2 2 2 2 2 2 2 2 2 2 ...
## $ device : num 2 1 1 1 2 1 1 1 1 1 ...
## $ Picc_n : num -3 1 1 1 -3 1 1 1 1 1 ...
## $ Picc_m : num -3 5 5 2 -3 3 5 3 2 2 ...
## $ Port_m : Factor w/ 30 levels "-3","1","10",..: 30 1 1 1 28 1 1 1 1 1 ...
## $ drug : Factor w/ 131 levels "5-FU","avbd",..: 69 106 97 106 69 58 127 97 106 78 ...
## $ irriate : num 1 1 1 1 1 1 1 1 1 1 ...
## $ impact : num 2 3 4 2 2 2 2 2 2 2 ...
## $ mobility : num 2 2 4 2 2 2 1 1 2 2 ...
## $ care : num 2 2 2 2 2 2 2 2 2 2 ...
## $ care_h : num -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
## $ wtp : num 5 30 15 8 20 8 9 8 8 8 ...
## $ outpatient : num 1 15 4 4 1 4 4 4 4 4 ...
## $ hospitalization: num 0 0 0 0 0 0 0 0 0 0 ...
## $ maintain : Factor w/ 10 levels "0","1","2","3",..: 2 10 8 5 2 5 5 5 5 5 ...
## $ eq1 : num 1 1 1 1 2 1 1 1 1 1 ...
## $ eq2 : num 1 1 1 1 1 1 1 1 1 1 ...
## $ eq3 : num 1 2 3 1 2 2 1 1 1 1 ...
## $ eq4 : num 1 1 1 1 2 1 1 1 1 2 ...
## $ eq5 : num 1 1 2 1 1 2 2 1 2 1 ...
## $ v51 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ v52 : num 0 1 1 1 0 1 1 1 1 1 ...
## $ v53 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ v54 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ v55 : num 1 1 0 0 1 0 1 1 1 1 ...
## $ v56 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ v57 : num 1 0 0 0 0 0 1 1 1 1 ...
## $ v58 : num 1 0 1 0 0 0 0 0 0 0 ...
## $ utility : num 1 0.955 0.844 1 0.831 0.906 0.951 1 0.951 0.942 ...
head(port)
tail(port)
port = transform(port,date=2018)
port$age <- port$date-port$birth_y
hist(port$age)
mean(port$age)
## [1] 51.99769
groupvec <- c(0,40,60,100)
labels <- c('young','midaged','senior')
port$agecat <- with(port,
cut(age,breaks = groupvec,labels = labels))
library(reshape)
names(port)[12] <- "comorbidity"
fix(port) df2 <- edit(port)
renamedf <- rename(df,c(oldname1=“newname”, oldname2=“newname2”)) #变量重命名
从字符串“abcdef”中提取出第2到4个位置上的字符
substr(“abcdef”, 2, 4)
substring(“abcdef”, 1:6, 1:6)
从字符串“abcdef”中提取出第1到1、2到2—6到6位置上的字符,即把字符串单个化
port$location <- as.character(port$location)
port$loc <- substr(port$location,1,2)
port$loc <- factor(port$loc)
levels(port$loc)
## [1] "安徽" "福建" "广东" "广西" "贵州" "海南" "河南" "湖北" "湖南" "江苏"
## [11] "江西" "辽宁" "山东"
截取字符
freq <- lapply(port[,c(7,11:13)],table) ;freq
## $gender
##
## 1 2
## 179 254
##
## $diagnosis
##
## 1 2 3 4
## 113 24 23 273
##
## $comorbidity
##
## 1 2
## 53 380
##
## $device
##
## 1 2
## 225 208
options(digits = 4)
prop <- lapply(freq[1:4],prop.table);prop
## $gender
##
## 1 2
## 0.4134 0.5866
##
## $diagnosis
##
## 1 2 3 4
## 0.26097 0.05543 0.05312 0.63048
##
## $comorbidity
##
## 1 2
## 0.1224 0.8776
##
## $device
##
## 1 2
## 0.5196 0.4804
character <- c(names(freq[1]),names(freq[[1]]))
NOC <- c(NA,paste0(freq[[1]],"(",round(prop[[1]]*100,2),")"))
characteristic <- data.frame("characteritic"=character,
"number of case"=NOC);characteristic
char <- NULL
for(i in 1:4){
character <- c(names(freq[i]),names(freq[[i]]))
NOC <- c(NA,paste0(freq[[i]],"(",round(prop[[i]]*100,2),")"))
characteristics <- data.frame("characteritics"=character,
"number of case(%)"=NOC)
char <- rbind(char,characteristics)};char
write.csv(char,"characteristics.csv",row.names = F)
library(ggpubr)
## Loading required package: ggplot2
## Loading required package: magrittr
载入包
ggdensity(port$utility)
port$device <- as.factor(port$device)
ggdensity(port,x="utility",color="device",fill="device",palette = c("blue","pink"),
rug = T,add = "mean")
密度图
gghistogram(port,x="utility",color="device",fill="device",palette = c("blue","pink"),
rug = T,add = "mean")
## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.
直方图
ggqqplot(port,x="utility",color="device")
qq图
shapiro.test(port$utility)
##
## Shapiro-Wilk normality test
##
## data: port$utility
## W = 0.69, p-value <2e-16
shapiro.test(port$utility[port$device==1])
##
## Shapiro-Wilk normality test
##
## data: port$utility[port$device == 1]
## W = 0.81, p-value = 1e-15
shapiro.test(port$utility[port$device==2])
##
## Shapiro-Wilk normality test
##
## data: port$utility[port$device == 2]
## W = 0.53, p-value <2e-16
tapply(port$utility, port$device, shapiro.test)
## $`1`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.81, p-value = 1e-15
##
##
## $`2`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.53, p-value <2e-16
正态性检验
p<0.05 不符合正态性检验,不能用t.test,用wilcox.test
wilcox.test(utility~device,data = port,excat="F",alternative="l")
##
## Wilcoxon rank sum test with continuity correction
##
## data: utility by device
## W = 17000, p-value = 4e-07
## alternative hypothesis: true location shift is less than 0
wilcox.test(utility~device,data = port,excat="F",alternative="t")
##
## Wilcoxon rank sum test with continuity correction
##
## data: utility by device
## W = 17000, p-value = 8e-07
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(port$utility,mu=0.95,excat="F",alternative="t")
##
## Wilcoxon signed rank test with continuity correction
##
## data: port$utility
## V = 45000, p-value = 0.5
## alternative hypothesis: true location is not equal to 0.95
wilcox.test
wilcox.test(utility~device,data = port,excat=“F”,alternative=“t”,paired=T)
若p>0.05,数据符合正态分布,则用t.test
方差齐性检验
var.test(utility~device,data = port)
##
## F test to compare two variances
##
## data: utility by device
## F = 0.96, num df = 220, denom df = 210, p-value = 0.8
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7339 1.2542
## sample estimates:
## ratio of variances
## 0.9601
p>0.05 方差齐,经典t检验
t.test
非配对
t.test(utility~device,data = port,var.equal=T)
##
## Two Sample t-test
##
## data: utility by device
## t = -3.7, df = 430, p-value = 2e-04
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.05344 -0.01662
## sample estimates:
## mean in group 1 mean in group 2
## 0.9144 0.9494
diff <- with(ToothGrowth,len[supp=="OJ"]-len[supp=="VC"])
shapiro.test(diff)
##
## Shapiro-Wilk normality test
##
## data: diff
## W = 0.95, p-value = 0.1
t.test(len~supp,data = ToothGrowth,paired=T)
##
## Paired t-test
##
## data: len by supp
## t = 3.3, df = 29, p-value = 0.003
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.409 5.991
## sample estimates:
## mean of the differences
## 3.7
配对
diff <- with(port,utility[device==1]-utility[device==2]) shapiro.test(diff) t.test(utility~device,data = port,excat=“F”,alternative=“t”,paired=T)
t.test(port$utility,mu=0.95,excat="F",alternative="t")
##
## One Sample t-test
##
## data: port$utility
## t = -4, df = 430, p-value = 9e-05
## alternative hypothesis: true mean is not equal to 0.95
## 95 percent confidence interval:
## 0.9219 0.9405
## sample estimates:
## mean of x
## 0.9312
单样本
ggboxplot(port,x="device",y="utility",color = "device",palette = "lancet",
xlab = "Device",ylab = "Utility value",title = "UTILITY")+
stat_compare_means(method = "wilcox.test",paired = F,method.args = list(alternative="l"),
label.x = 2,label.y = 1.2)
方差齐性检验 转为因子型变量
port$diagnosisf <- factor(port$diagnosis,ordered = F)
正态性检验
with(port,tapply(utility, diagnosisf, shapiro.test))
## $`1`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.7, p-value = 7e-14
##
##
## $`2`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.53, p-value = 1e-07
##
##
## $`3`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.67, p-value = 7e-06
##
##
## $`4`
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.7, p-value <2e-16
方差齐性检验 wilcox.test适用于两样本比较
library(car)
## Loading required package: carData
leveneTest(utility~diagnosisf,port)
p>0.05
单因素ANOVA
aov1 <- aov(utility~diagnosisf,port)
summary(aov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## diagnosisf 3 0.08 0.02658 2.75 0.042 *
## Residuals 429 4.14 0.00965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
res1 <- aov1$residuals
shapiro.test(res1)
##
## Shapiro-Wilk normality test
##
## data: res1
## W = 0.73, p-value <2e-16
ggqqplot(res1)
qqplot
诊断模型
TukeyHSD(aov1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = utility ~ diagnosisf, data = port)
##
## $diagnosisf
## diff lwr upr p adj
## 2-1 0.022047 -0.03489 0.078983 0.7502
## 3-1 -0.003607 -0.06156 0.054341 0.9985
## 4-1 -0.023366 -0.05170 0.004971 0.1463
## 3-2 -0.025654 -0.09957 0.048264 0.8074
## 4-2 -0.045413 -0.09935 0.008522 0.1328
## 4-3 -0.019759 -0.07476 0.035243 0.7907
两两比较
ggboxplot(port,x="diagnosisf",y="utility",fill = "diagnosisf",palette = "lancet")+
stat_compare_means(method = "anova",label.x =2,label.y = 0.25)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following object is masked from 'package:reshape':
##
## rename
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
port_lm <- select(port,hosp,agecat,height,weight,gender,diagnosis,comorbidity,device,irriate,utility)%>%
mutate(bmi=weight/(height/100)^2)%>%
within({bmicat <- NA
bmicat[bmi >= 24] <- "overweight"
bmicat[bmi <= 18.5] <- "underweight"
bmicat[bmi < 24 & bmi> 18.5] <- "ideal weight"})%>%
select(-height,-weight,-bmi)
library(dplyr)
port_lm <- port[,c(2,43,7:9,11:13,18,40)]%>%
mutate(bmi=weight/(height/100)^2)%>%
within({bmicat <- NA
bmicat[bmi >= 24] <- "overweight"
bmicat[bmi <= 18.5] <- "underweight"
bmicat[bmi < 24 & bmi> 18.5] <- "ideal weight"})
port_lm$hosp <- as.factor(port_lm$hosp)
port_lm$gender <- as.factor(port_lm$gender)
port_lm$diagnosis <- as.factor(port_lm$diagnosis)
port_lm$device <- as.factor(port_lm$device)
port_lm$comorbidity <- as.factor(port_lm$comorbidity)
port_lm$irriate <- as.factor(port_lm$irriate)
port_lm$bmicat <- as.factor(port_lm$bmicat)
port_lm$agecat <- as.factor(port_lm$agecat)
fit.lm <- lm(utility~.,data = port_lm)
summary(fit.lm)
##
## Call:
## lm(formula = utility ~ ., data = port_lm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8904 -0.0265 0.0219 0.0509 0.1338
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.102707 0.474180 2.33 0.02052 *
## hospS 0.040699 0.011184 3.64 0.00031 ***
## agecatmidaged 0.014396 0.012454 1.16 0.24836
## agecatsenior 0.005101 0.014872 0.34 0.73180
## gender2 -0.031687 0.014492 -2.19 0.02934 *
## height -0.001309 0.002917 -0.45 0.65383
## weight 0.000563 0.003659 0.15 0.87776
## diagnosis2 0.005875 0.023220 0.25 0.80038
## diagnosis3 -0.013680 0.024214 -0.56 0.57240
## diagnosis4 -0.013383 0.012764 -1.05 0.29503
## comorbidity2 -0.005367 0.015668 -0.34 0.73213
## device2 0.032933 0.009812 3.36 0.00086 ***
## irriate2 -0.007481 0.011120 -0.67 0.50145
## bmi 0.000169 0.009552 0.02 0.98593
## bmicatoverweight -0.016878 0.014595 -1.16 0.24818
## bmicatunderweight -0.009996 0.018160 -0.55 0.58232
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0951 on 417 degrees of freedom
## Multiple R-squared: 0.106, Adjusted R-squared: 0.0736
## F-statistic: 3.29 on 15 and 417 DF, p-value: 3.24e-05
filt.lm 置信区间
confint(fit.lm)
## 2.5 % 97.5 %
## (Intercept) 0.170626 2.034788
## hospS 0.018714 0.062684
## agecatmidaged -0.010084 0.038876
## agecatsenior -0.024133 0.034334
## gender2 -0.060173 -0.003200
## height -0.007044 0.004425
## weight -0.006630 0.007756
## diagnosis2 -0.039768 0.051518
## diagnosis3 -0.061278 0.033917
## diagnosis4 -0.038473 0.011707
## comorbidity2 -0.036165 0.025431
## device2 0.013645 0.052220
## irriate2 -0.029339 0.014376
## bmi -0.018607 0.018944
## bmicatoverweight -0.045566 0.011811
## bmicatunderweight -0.045694 0.025702
par(mfrow=c(2,2))
plot(fit.lm)
port[c(256,281,295),]
查看离群观测
fit.lm2 <- lm(utility~.,data = port_lm[-c(256),])
summary(fit.lm2)
##
## Call:
## lm(formula = utility ~ ., data = port_lm[-c(256), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5491 -0.0269 0.0191 0.0485 0.1283
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.937432 0.420424 2.23 0.026 *
## hospS 0.044799 0.009917 4.52 8.2e-06 ***
## agecatmidaged 0.012865 0.011035 1.17 0.244
## agecatsenior 0.011103 0.013189 0.84 0.400
## gender2 -0.027412 0.012847 -2.13 0.033 *
## height -0.000383 0.002586 -0.15 0.882
## weight -0.000924 0.003245 -0.28 0.776
## diagnosis2 0.008287 0.020575 0.40 0.687
## diagnosis3 -0.007231 0.021463 -0.34 0.736
## diagnosis4 -0.003972 0.011344 -0.35 0.726
## comorbidity2 0.003749 0.013908 0.27 0.788
## device2 0.039665 0.008717 4.55 7.0e-06 ***
## irriate2 -0.009633 0.009854 -0.98 0.329
## bmi 0.003732 0.008470 0.44 0.660
## bmicatoverweight -0.006351 0.012969 -0.49 0.625
## bmicatunderweight -0.011952 0.016092 -0.74 0.458
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0843 on 416 degrees of freedom
## Multiple R-squared: 0.146, Adjusted R-squared: 0.116
## F-statistic: 4.76 on 15 and 416 DF, p-value: 1.6e-08
par(mfrow=c(2,2))
plot(fit.lm2)
summary(lm(utility~diagnosis*gender,data=port_lm[-256,]))
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
stepAIC(fit.lm2,direction = "backward")
## Start: AIC=-2122
## utility ~ hosp + agecat + gender + height + weight + diagnosis +
## comorbidity + device + irriate + bmi + bmicat
##
## Df Sum of Sq RSS AIC
## - diagnosis 3 0.0041 2.96 -2127
## - bmicat 2 0.0068 2.96 -2125
## - agecat 2 0.0098 2.96 -2124
## - height 1 0.0002 2.95 -2124
## - comorbidity 1 0.0005 2.96 -2124
## - weight 1 0.0006 2.96 -2124
## - bmi 1 0.0014 2.96 -2123
## - irriate 1 0.0068 2.96 -2123
## <none> 2.95 -2122
## - gender 1 0.0323 2.99 -2119
## - hosp 1 0.1449 3.10 -2103
## - device 1 0.1471 3.10 -2103
##
## Step: AIC=-2127
## utility ~ hosp + agecat + gender + height + weight + comorbidity +
## device + irriate + bmi + bmicat
##
## Df Sum of Sq RSS AIC
## - bmicat 2 0.0077 2.97 -2130
## - agecat 2 0.0097 2.97 -2130
## - height 1 0.0000 2.96 -2129
## - comorbidity 1 0.0005 2.96 -2129
## - weight 1 0.0011 2.96 -2129
## - bmi 1 0.0022 2.96 -2129
## - irriate 1 0.0072 2.96 -2128
## <none> 2.96 -2127
## - gender 1 0.0328 2.99 -2124
## - hosp 1 0.1574 3.12 -2107
## - device 1 0.1684 3.13 -2105
##
## Step: AIC=-2130
## utility ~ hosp + agecat + gender + height + weight + comorbidity +
## device + irriate + bmi
##
## Df Sum of Sq RSS AIC
## - agecat 2 0.0101 2.98 -2132
## - height 1 0.0000 2.97 -2132
## - comorbidity 1 0.0007 2.97 -2132
## - weight 1 0.0013 2.97 -2132
## - bmi 1 0.0023 2.97 -2132
## - irriate 1 0.0080 2.97 -2131
## <none> 2.97 -2130
## - gender 1 0.0322 3.00 -2127
## - hosp 1 0.1574 3.12 -2110
## - device 1 0.1762 3.14 -2107
##
## Step: AIC=-2132
## utility ~ hosp + gender + height + weight + comorbidity + device +
## irriate + bmi
##
## Df Sum of Sq RSS AIC
## - comorbidity 1 0.0003 2.98 -2134
## - height 1 0.0003 2.98 -2134
## - weight 1 0.0006 2.98 -2134
## - bmi 1 0.0015 2.98 -2134
## - irriate 1 0.0105 2.99 -2133
## <none> 2.98 -2132
## - gender 1 0.0412 3.02 -2128
## - hosp 1 0.1693 3.15 -2111
## - device 1 0.1870 3.16 -2108
##
## Step: AIC=-2134
## utility ~ hosp + gender + height + weight + device + irriate +
## bmi
##
## Df Sum of Sq RSS AIC
## - height 1 0.0003 2.98 -2136
## - weight 1 0.0006 2.98 -2136
## - bmi 1 0.0015 2.98 -2136
## - irriate 1 0.0103 2.99 -2135
## <none> 2.98 -2134
## - gender 1 0.0410 3.02 -2130
## - hosp 1 0.1833 3.16 -2111
## - device 1 0.1886 3.17 -2110
##
## Step: AIC=-2136
## utility ~ hosp + gender + weight + device + irriate + bmi
##
## Df Sum of Sq RSS AIC
## - irriate 1 0.0105 2.99 -2137
## <none> 2.98 -2136
## - weight 1 0.0192 3.00 -2136
## - bmi 1 0.0290 3.01 -2134
## - gender 1 0.0417 3.02 -2132
## - hosp 1 0.1830 3.16 -2113
## - device 1 0.1883 3.17 -2112
##
## Step: AIC=-2137
## utility ~ hosp + gender + weight + device + bmi
##
## Df Sum of Sq RSS AIC
## <none> 2.99 -2137
## - weight 1 0.0198 3.01 -2136
## - bmi 1 0.0309 3.02 -2134
## - gender 1 0.0422 3.03 -2133
## - device 1 0.1866 3.17 -2113
## - hosp 1 0.2356 3.22 -2106
##
## Call:
## lm(formula = utility ~ hosp + gender + weight + device + bmi,
## data = port_lm[-c(256), ])
##
## Coefficients:
## (Intercept) hospS gender2 weight device2
## 0.87426 0.04859 -0.02752 -0.00161 0.04232
## bmi
## 0.00568
fit.lmstep <- lm(utility ~ hosp + gender + device, data = port_lm[-256, ])
summary(fit.lmstep)
##
## Call:
## lm(formula = utility ~ hosp + gender + device, data = port_lm[-256,
## ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5621 -0.0315 0.0243 0.0528 0.1174
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.89821 0.00771 116.56 < 2e-16 ***
## hospS 0.05049 0.00820 6.16 1.7e-09 ***
## gender2 -0.01561 0.00844 -1.85 0.065 .
## device2 0.04263 0.00823 5.18 3.4e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.084 on 428 degrees of freedom
## Multiple R-squared: 0.127, Adjusted R-squared: 0.121
## F-statistic: 20.8 on 3 and 428 DF, p-value: 1.38e-12
library(leaps) leaps <- regsubsets(utility ~ hosp + gender + device,data = port_lm,nbest = 4) par(mfrow=c(1,1)) plot(leaps,scale = “adjr2”)
install.packages(“stargazer”)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
stargazer(port,header = F,type="text")
##
## ====================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## --------------------------------------------------------------------
## no 433 217.000 125.100 1 109 325 433
## birth_y 433 1,966.000 13.260 1,934 1,956 1,975 1,999
## birth_m 433 6.418 3.438 1 3 9 12
## birth_d 433 12.260 8.858 1 5 20 31
## gender 433 1.587 0.493 1 1 2 2
## height 433 162.300 7.365 140 156 168 185
## weight 433 59.450 11.760 38 52 65 156
## diagnosis 433 3.053 1.316 1 1 4 4
## comorbidity 433 1.878 0.328 1 2 2 2
## Picc_n 433 -0.822 2.134 -3 -3 1 6
## Picc_m 433 0.411 3.576 -3 -3 3 12
## irriate 433 1.275 0.447 1 1 2 2
## impact 433 1.471 0.581 1 1 2 4
## mobility 433 1.480 0.545 1 1 2 4
## care 433 1.409 0.492 1 1 2 2
## care_h 433 3.952 7.835 -3 -3 7 24
## wtp 433 12.660 23.810 0 2 10 200
## outpatient 433 1.993 2.138 0 1 4 31
## hospitalization 433 3.222 5.072 0 0 4 31
## eq1 433 1.118 0.422 1 1 1 5
## eq2 433 1.226 0.481 1 1 1 5
## eq3 433 1.263 0.509 1 1 1 5
## eq4 433 1.256 0.458 1 1 1 3
## eq5 433 1.434 0.523 1 1 2 3
## v51 433 0.014 0.117 0 0 0 1
## v52 433 0.499 0.501 0 0 1 1
## v53 433 0.023 0.150 0 0 0 1
## v54 433 0.016 0.126 0 0 0 1
## v55 433 0.711 0.454 0 0 1 1
## v56 433 0.021 0.143 0 0 0 1
## v57 433 0.617 0.487 0 0 1 1
## v58 433 0.383 0.487 0 0 1 1
## utility 433 0.931 0.099 0 0.9 1 1
## date 433 2,018.000 0.000 2,018 2,018 2,018 2,018
## age 433 52.000 13.260 19 43 62 84
## --------------------------------------------------------------------
stargazer(fit.lm,fit.lm2,title="result",type="text",align = T)
##
## result
## ===================================================================
## Dependent variable:
## -----------------------------------------------
## utility
## (1) (2)
## -------------------------------------------------------------------
## hospS 0.041*** 0.045***
## (0.011) (0.010)
##
## agecatmidaged 0.014 0.013
## (0.012) (0.011)
##
## agecatsenior 0.005 0.011
## (0.015) (0.013)
##
## gender2 -0.032** -0.027**
## (0.014) (0.013)
##
## height -0.001 -0.0004
## (0.003) (0.003)
##
## weight 0.001 -0.001
## (0.004) (0.003)
##
## diagnosis2 0.006 0.008
## (0.023) (0.021)
##
## diagnosis3 -0.014 -0.007
## (0.024) (0.021)
##
## diagnosis4 -0.013 -0.004
## (0.013) (0.011)
##
## comorbidity2 -0.005 0.004
## (0.016) (0.014)
##
## device2 0.033*** 0.040***
## (0.010) (0.009)
##
## irriate2 -0.007 -0.010
## (0.011) (0.010)
##
## bmi 0.0002 0.004
## (0.010) (0.008)
##
## bmicatoverweight -0.017 -0.006
## (0.015) (0.013)
##
## bmicatunderweight -0.010 -0.012
## (0.018) (0.016)
##
## Constant 1.103** 0.937**
## (0.474) (0.420)
##
## -------------------------------------------------------------------
## Observations 433 432
## R2 0.106 0.146
## Adjusted R2 0.074 0.116
## Residual Std. Error 0.095 (df = 417) 0.084 (df = 416)
## F Statistic 3.289*** (df = 15; 417) 4.758*** (df = 15; 416)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
library(readxl)
CCAP <- read_excel("j:/port/CCAP.xlsx")
library(dplyr)
CCAP_qc <- CCAP%>%group_by(病案号,入院日期)%>%filter(row_number()==1)%>%ungroup()
CCAP1 <- CCAP_qc%>%filter(遴选意见=="入选"&主要诊断疾病名称=="支气管肺炎"|
遴选意见=="入选"&主要诊断疾病名称=="喘息型支气管肺炎")
names(CCAP1)
## [1] "序号" "出院科别"
## [3] "病案号" "出院日期"
## [5] "单病种上报" "主要诊断疾病名称"
## [7] "年份" "遴选意见"
## [9] "性别" "年龄"
## [11] "入院日期" "入院科别"
## [13] "天数" "ICD码"
## [15] "诊断类型" "疾病名称"
## [17] "主要诊断疾病名称__1" "主要诊断ICD码"
## [19] "其他诊断疾病名称" "总费用"
## [21] "床位费" "护理费"
## [23] "西药费" "中药费"
## [25] "化验费" "诊疗费"
## [27] "手术费" "检查费"
## [29] "麻醉费" "其他费用"
## [31] "其他费" "治疗材料费"
## [33] "西药费其中抗菌药物费用" "临床路径病例编号"
## [35] "临床路径病例" "新付款方式编号"
## [37] "新付款方式" "自费金额"
is.na(CCAP1)
sum(is.na(CCAP1))
## [1] 202
table(is.na(CCAP1))
##
## FALSE TRUE
## 11692 202
CCAP1[complete.cases(CCAP1),] #列出没有缺失值的行
nrow(CCAP1[complete.cases(CCAP1),]) #计算没有缺失值的样本量
## [1] 143
CCAP2 <- na.omit(CCAP1) #删缺失所在行
table(is.na(CCAP2))
##
## FALSE
## 5434
CCAP2$出院日期 <- as.Date(CCAP2$出院日期)
CCAP2$入院日期 <- as.Date(CCAP2$入院日期)
CCAP2$住院天数=CCAP2$出院日期-CCAP2$入院日期
CCAP3 <- CCAP2%>%mutate(
综合医疗服务费=床位费+护理费+诊疗费+其他费用,综合医疗服务费占比=round(综合医疗服务费/总费用*100,2),
药费=西药费+中药费,药费占比=round(药费/总费用*100,2),
手术治疗费=手术费+麻醉费+治疗材料费,手术治疗费占比=round(手术治疗费/总费用*100,2),
诊断费=化验费+检查费,诊断费占比=round(诊断费/总费用*100,2),
其他=其他费,其他占比=round(其他/总费用*100,2))%>%mutate(主要诊断疾病名称="儿童社区获得性肺炎")
library(parallel)
detectCores()
## [1] 8
ls() #work space
memory.size() #查看当前work place内存使用
gc()
memory.limit() #查看当前设置下最大内存(?“Memory-limits”)
memory.limit(newLimit)
memory.size(NA) #查看当前设置下最大内存(?“Memory-limits”)
install.packages(“ggplot2”)
library(ggplot2)
help(package=“ggplot2”)