非参数检验不考虑总体分布是否已知,也不针对总体参数,而是针对总体的某些一般性假设(如总体分布的位置是否相同,总体分布是否正态)进行检验。非参数检验方法简便,不依赖于总体分布的具体形式因而适用性强,但灵敏度和精确度不如参数检验。一般而言,非参数检验适用于以下三种情况:顺序类型的数据资料,这这类数据的分布形态一般是未知的;总体分布形态未知或者非正态的连续型数据,这类数据和卡方检验一样,称自由分布检验;总体分布虽正态,数据也是连续类型,但样本容量极小,如10以下(虽然T检验被称为小样本统计方法,但样本容量太小时,代表性毕竟很差,最好不要用要求较严格的参数检验法)

单样本(One-sample)

符号检验(Sign-test)

符号检验法是通过对两个相关样本的每对数据之差的符号(正号或负号)进行检验,以比较这两个样本所代表的总体的差异显著性,对应于参数检验中两相关样本差异显著性的T检验。其基本思想是:若两总体差异不显著,则两样本差值的正号与负号应大致各占一半,即中位数为0,可见符号检验是以中数作为统计量进行假设检验的。 例 某药厂生产的胶囊平均长度为13mm,现随机在生产线上抽取10个,测得长度如下:15 13 13 15 13 13 12 12 10 9,生产过程中对长度的控制是否需要调整?

leng <- c(15,13,13,15,13,13,12,12,10,9)
SignTest(leng,mu = 13)
## 
##  One-sample Sign-Test
## 
## data:  leng
## S = 2, number of differences = 6, p-value = 0.6875
## alternative hypothesis: true median is not equal to 13
## 97.9 percent confidence interval:
##  10 15
## sample estimates:
## median of the differences 
##                        13

P值大于0.05,可以认为长度不需要调整。

例 乙型脑炎病例在25个不同三甲医院的治疗费用(元)如下:15165 4257 25844 36779 4730 14687 2901 13964 16226 25004 37086 2193 31139 4057 34443 37572 6260 2754 18377 39357 29184 17398 9948 13938 32874试用符号检验分析,某三甲医院治疗费(10000元)用是在中位数之上还是之下。

样本的中位数(M)作为治疗费用的中间值,\(H_{0}\):M>10000,\(H_{1}\):M<10000。

x <- c(15165,4257,25844,36779,4730,14687,2901,13964,
       16226,25004,37086,2193,31139,4057,34443,37572,6260,2754,
       18377,39357,29184,17398,9948,13938,32874)
binom.test(sum(x>10000),n = length(x),alternative = "l")
## 
##  Exact binomial test
## 
## data:  sum(x > 10000) and length(x)
## number of successes = 17, number of trials = 25, p-value = 0.9784
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
##  0.0000000 0.8296963
## sample estimates:
## probability of success 
##                   0.68

P值大于0.05,可以认为样本的中位数大于10000元,也就是说某三甲医院治疗费用(10000元)在中位数之下。

例 为豚鼠注入肾上腺素前后的每分钟灌流滴数,给药前:30 38 48 48 60 46 26 58 46 48 44 46,给药后:46 50 52 52 58 64 56 54 54 58 36 54,试比较给药前后灌流滴数有无显著差别?

x <- c(30,38,48,48,60,46,26,58,46,48,44,46)
y <- c(46,50,52,52,58,64,56,54,54,58,36,54)
binom.test(sum(x<y),length(x))
## 
##  Exact binomial test
## 
## data:  sum(x < y) and length(x)
## number of successes = 9, number of trials = 12, p-value = 0.146
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4281415 0.9451394
## sample estimates:
## probability of success 
##                   0.75

P值大于0.05,可认为给药前后无显著差异。

Wilcoxon符号秩检验(Wilcoxon signed rank test)

“秩”又称等级、即按数据大小排定的顺序号,顺序号的和称“秩和”,秩和检验就是用秩和作为统计量进行假设检验的方法。Wilcoxon符号秩检验即符号等级检验法是通过对两个相关样本的每对数据之差的符号(正号或负号)及等级进行检验,以比较这两个样本所代表的总体的差异显著性,对应于参数检验中两相关样本差异显著性的T检验。其基本思想是:若两总体差异不显著,则两样本的正负向差值的等级之和应大致相等,即分布对称,且中位数为0,可见符号检验是以中位数和分布的对称性为统计量进行假设检验的。

例 某地10名儿童智力测验所的IQ如下:99 131 118 112 128 136 120 107 134 122,这10名儿童的智力是否小于等于当地智力中位数110?

IQ    <- c(99, 131, 118, 112, 128, 136, 120, 107, 134, 122)
medH0 <- 110
wilcox.test(IQ, alternative="greater", mu=medH0, conf.int=TRUE)
## 
##  Wilcoxon signed rank test
## 
## data:  IQ
## V = 48, p-value = 0.01855
## alternative hypothesis: true location is greater than 110
## 95 percent confidence interval:
##  113.5   Inf
## sample estimates:
## (pseudo)median 
##            121

P值小于0.05,可以认为这10名儿童的智力大于当地智力中位数110。

两独立样本(Two independent samples)

符合检验(Sign-test)

例 在某项研究中,经随机抽样获得甲乙两组病人的血尿素氮(BUN)mmol/L,甲组:4.98 3.90 4.02 0.68 4.98 5.04 1.20 2.64 6.23 3.00,乙组:4.17 4.95 3.96 3.59 4.89 3.03 3.71 5.91 5.55 6.29 4.82 3.90 6.11,试比较甲乙组病人血尿素氮(BUN)的含量有无差别?

Nj  <- c(10, 13)
x <- c(4.98,3.90,4.02,0.68,4.98,5.04,1.20,2.64,6.23,3.00)
y <- c(4.17,4.95,3.96,3.59,4.89,3.03,3.71,5.91,5.55,6.29,4.82,3.90,6.11)
wIndDf <- data.frame(DV=c(x, y),
                     IV=factor(rep(1:2, Nj), labels=LETTERS[1:2]))
median_test(DV ~ IV, distribution="exact", data=wIndDf)
## 
##  Exact Two-Sample Brown-Mood Median Test
## 
## data:  DV by IV (A, B)
## Z = -0.6445, p-value = 0.6802
## alternative hypothesis: true mu is not equal to 0

P值均大于0.05,不能拒绝原假设,可以认为两组病人血尿素氮无差异。

Wilcoxon符号秩和检验(Wilcoxon rank-sum test)

Wilcoxon符号秩和检验即Mann-Whitney U检验适用于两独立样本的差异显著性检验,用以确定两种总体的分布是否相同,对应于参数检验中两独立样本均数之差的T检验。 例 在某项研究中,经随机抽样获得甲乙两组病人的血尿素氮(BUN)mmol/L,甲组:4.98 3.90 4.02 0.68 4.98 5.04 1.20 2.64 6.23 3.00,乙组:4.17 4.95 3.96 3.59 4.89 3.03 3.71 5.91 5.55 6.29 4.82 3.90 6.11,试比较甲组病人血尿素氮(BUN)的含量是否大于乙组?

#wilcox.test(x,y,exact = FALSE,correct =F) #差异性比较 
#wilcox.test(x,y,exact = FALSE) 
wilcox.test(DV ~ IV, alternative="less", conf.int=TRUE, data=wIndDf)
## Warning in wilcox.test.default(x = c(4.98, 3.9, 4.02, 0.68, 4.98, 5.04, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = c(4.98, 3.9, 4.02, 0.68, 4.98, 5.04, :
## cannot compute exact confidence intervals with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  DV by IV
## W = 47.5, p-value = 0.1458
## alternative hypothesis: true location shift is less than 0
## 95 percent confidence interval:
##       -Inf 0.1599603
## sample estimates:
## difference in location 
##             -0.9299707

P值均大于0.05,不能拒绝原假设,可以认为甲组病人血尿素氮(BUN)的含量不大于乙组。

例 已知某地正常人尿氟含量的中位数为2.15mmol/L。今在该地某厂随机抽取12名工人,测得尿氟含量(mmol/L),结果如下2.15 2.10 2.20 2.12 2.42 2.52 2.62 2.72 2.99 3.19 3.37 4.57。问该厂工人的尿氟含量是否高于当地正常人?

\(H_{0}\):M>2.15,\(H_{1}\):M<2.15。

x <- c(2.15,2.10,2.20,2.12,2.42,2.52,2.62,2.72,2.99,3.19,3.37,4.57)
wilcox.test(x,alternative = "l",mu = 2,15)
## 
##  Wilcoxon rank sum test
## 
## data:  x and 15
## W = 0, p-value = 0.07692
## alternative hypothesis: true location shift is less than 2

P值大于0.05,接受原假设,认为该工厂的尿氟含量高于当地正常人。

配对样本比较的Wilcoxon

配对设计计量资料两处理效应的比较,一般采用配对t检验,如果差数严重偏离正态分布,可采用Wilcoxon秩检验,亦称符号秩和检验(signed rank test)。一般认为,在数据满足配对t检验要求时,Wilcoxon秩检验的功效是检验效能的95%左右。目的是推断配对样本差值的总体中位数是否和0有差别,即推断配对的两个相关样本所来自的两个总体中位数是否有差别。

例 某研究者欲研究保健食品对小鼠抗疲劳作用,将同种属的小鼠按性别和年龄相同、体重相近配成对子,共10对,并将每对中的两只小鼠随机分到保健食品两个不同的剂量组,过一定时期将小鼠杀死,测得10对小鼠肝糖原含量(mg/100g)中剂量组620.16 866.50 641.22 812.91 738.96 899.38 760.78 694.95 749.92 793.94 高剂量组958.47 838.42 788.90 815.20 783.17 910.92 758.49 870.80 862.26 805.48,问不同剂量组的小鼠肝糖原含量有无差别?

x <- c(620.16,866.50,641.22,812.91,738.96,899.38,760.78,694.95,749.92,793.94)
y <- c(958.47,838.42,788.90,815.20,783.17,910.92,758.49,870.80,862.26,805.48)
wilcox.test(x,y,paired = T)
## Warning in wilcox.test.default(x, y, paired = T): cannot compute exact p-
## value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  x and y
## V = 6, p-value = 0.0322
## alternative hypothesis: true location shift is not equal to 0

P值小于0.05,可以认为该保健食品的不同剂量对小鼠肝糖原含量的作用不同。

例 在某项研究中,经随机抽样获得甲乙两组病人的血尿素氮(BUN)mmol/L,甲组:4.98 3.90 4.02 0.68 4.98 5.04 1.20 2.64 6.23 3.00,乙组:4.17 4.95 3.96 3.59 4.89 3.03 3.71 5.91 5.55 6.29 4.82 3.90 6.11,试比较甲乙组病人血尿素氮(BUN)的含量有无差别?

x <- c(4.98,3.90,4.02,0.68,4.98,5.04,1.20,2.64,6.23,3.00)
y <- c(4.17,4.95,3.96,3.59,4.89,3.03,3.71,5.91,5.55,6.29,4.82,3.90,6.11)
wilcox.test(x,y,exact = FALSE,correct =F) 
## 
##  Wilcoxon rank sum test
## 
## data:  x and y
## W = 47.5, p-value = 0.2775
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(x,y,exact = FALSE) 
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  x and y
## W = 47.5, p-value = 0.2915
## alternative hypothesis: true location shift is not equal to 0

无论是否采用连续校正,P值均大于0.05,不能拒绝原假设,可以两组病人血尿素氮无差异。

例 某研究者欲评价新药按摩乐口服液治疗高甘油三脂血症的疗效,将高甘油三脂血症患者189例随机分为两组,分别用按摩乐口服液和山楂精降脂片治疗,数据见下表,问两种药物治疗高甘油三脂血症的疗效有无不同?

疗效 按摩乐口服液 山楂精降脂片
无效 17 70
有效 25 13
显效 37 37
a <- rep(1:3,c(17,25,27))
b <- rep(1:3,c(70,13,37))
wilcox.test(a, b, exact=FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 5248, p-value = 0.0009432
## alternative hypothesis: true location shift is not equal to 0

P值小于0.05,可以认为,两种药物对高甘油三脂血症的疗效分布不同。

多组样本(more than two samples)

非参数多组比较法其主要用于等级型数据或不满足参数检验条件的场合,故亦称等级方差分析(ANOVA by ranks)。和方差分析一样,按照实验设计不同,非参数多组比较法也包括适用于完全随机化设计的单向秩次方差分析或称Kruskal-Wallis检验和适用于随机化区组设计的双向秩次方差分析(Friedman检验)。 ###无序独立样本(Independent samples - unordered groups) Kruskal-Wallis检验用来检验多个独立样本的位置是否一样。

例 四组病例测试的智力IQ分别打分如下:A:99, 131, 118, 112, 128, 136, 120, 107, 134, 122,B:134, 103, 127, 121, 139, 114, 121, 132 C:110, 123, 100, 131, 108, 114, 101, 128, 110,D:117, 125, 140, 109, 128, 137, 110, 138, 127, 141, 119, 148 问不同组之间的IQ是否有差异?

A  <- c(99, 131, 118, 112, 128, 136, 120, 107, 134, 122)
B  <- c(134, 103, 127, 121, 139, 114, 121, 132)
C  <- c(110, 123, 100, 131, 108, 114, 101, 128, 110)
D  <- c(117, 125, 140, 109, 128, 137, 110, 138, 127, 141, 119, 148)
Nj   <- c(length(A), length(B), length(C), length(D))
KWdf <- data.frame(DV=c(A,B,C,D),
                   IV=factor(rep(1:4, Nj), labels=c("A", "B", "C", "D")))
kruskal.test(DV ~ IV, data=KWdf) #stats包
## 
##  Kruskal-Wallis rank sum test
## 
## data:  DV by IV
## Kruskal-Wallis chi-squared = 6.0595, df = 3, p-value = 0.1087
kruskal_test(DV ~ IV, distribution=approximate(B=9999), data=KWdf) #coin包
## 
##  Approximative Kruskal-Wallis Test
## 
## data:  DV by IV (A, B, C, D)
## chi-squared = 6.0595, p-value = 0.111
pairwise.wilcox.test(KWdf$DV, KWdf$IV, p.adjust.method="holm") #两两比较
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot
## compute exact p-value with ties
## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  KWdf$DV and KWdf$IV 
## 
##   A    B    C   
## B 0.97 -    -   
## C 0.84 0.51 -   
## D 0.84 0.97 0.16
## 
## P value adjustment method: holm

P值均大于0.05,不能拒绝原假设,可以认为四组的智力没有差异。两两比较结果,没有智力差异。 permutation检验即置换检验,在当样本量不够大,样本分布未知的情况下,用置换检验模拟出样本均值分布,然后再进行比较。

oneway_test(DV ~ IV, distribution=approximate(B=9999), data=KWdf)
## 
##  Approximative K-Sample Fisher-Pitman Permutation Test
## 
## data:  DV by IV (A, B, C, D)
## chi-squared = 6.8133, p-value = 0.07211

P值均大于0.05,不能拒绝原假设,可以认为四组的智力没有差异。

例 三名评分员对10名测试的分别打分如下:A:1 2 5 3 2 1 1 3 2 1,B:4 3 6 5 2 6 1 6 5 4,C:9,6,7,7,5,1,8,9,6,5,问不同的评分员之间是否有差异?

Value <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
Group <- factor(c(rep(1,10),rep(2,10),rep(3,10)))
data <- data.frame(Group, Value)

kruskal.test(Value ~ Group, data=data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Value by Group
## Kruskal-Wallis chi-squared = 13.675, df = 2, p-value = 0.001073

P值小于0.05,可以认为,三位评员的评分分布不同。

例 下面的数据是游泳、打篮球、骑自行车等三种不同的运动在30分钟内消耗的热量(单位:卡路里). 这些数据是否说明这三种运动消耗的热量全相等?游泳: 306, 385, 300, 319, 320;打篮球: 311, 364, 315, 338, 398;骑自行车: 289, 198, 201, 302, 289.

x<-list(swim=c(306, 385, 300, 319, 320),
basketball=c(311, 364, 315, 338, 398),
bicycle=c(289, 198, 201, 302, 289))
kruskal.test(x)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  x
## Kruskal-Wallis chi-squared = 9.1564, df = 2, p-value = 0.01027

有序独立样本(Independent samples - ordered groups)

Jonckheere Terpstra trend检验适用于多个总体有相似的连续分布(除了位置可能不同外),所有的观察值在样本内和样本间独立。不同于Kruskal-Wallis检验,Jonckheere Terpstra trend检验可以判断是否出现单调的趋势。

例 在一项健康试验中,有3种生活方式,他们一个月后减少的种类如下:A 3.7 3.7 3.0 3.9 2.7 B:7.3 5.2 5.3 5.7 6.5 C:9.0 4.9 7.1 8.7,能否得三种生活方式减肥效果相同?

A  <- c(3.7,3.7,3.0,3.9,2.7)
B  <- c(7.3,5.2,5.3,5.7,6.5)
C  <- c(9.0,4.9,7.1,8.7)
Nj <- c(length(A), length(B), length(C))
JTdf <- data.frame(IV=ordered(rep(LETTERS[1:3], Nj)),
                   DV=c(A,B,C))

kruskal_test(DV ~ IV, distribution=approximate(B=9999), data=JTdf)
## 
##  Approximative Linear-by-Linear Association Test
## 
## data:  DV by IV (A < B < C)
## Z = 2.9232, p-value = 0.0013
## alternative hypothesis: two.sided
JonckheereTerpstraTest(DV ~ IV, data=JTdf)
## Warning in JonckheereTerpstraTest.default(c(3.7, 3.7, 3, 3.9, 2.7, 7.3, : Sample size > 100 or data with ties 
##  p-value based on normal approximation. Specify nperm for permutation p-value
## 
##  Jonckheere-Terpstra test
## 
## data:  DV by IV
## JT = 59, p-value = 0.001913
## alternative hypothesis: two.sided

p值小于0.05,可以认为三个总体的位置有上升趋势。

无序非独立样本(Dependent samples - unordered groups)

Friedman检验即双向等级方差分析对应于参数检验中随机区组设计的方差分析 例 现有6条狗服用阿司匹林不同时间(时间)血中的药物浓度数据(r/ml)如下表,问服药后不同时间血中的药物浓度有无差别?

编号 0.5h 1h 6h 8h 24h 48h
1 51.6 135.2 169.8 137.2 31.9 0.4
2 49.6 101.6 158.4 133.0 18.7 0.0
3 40.6 88.4 142.8 126.6 18.1 2.0
4 11.2 37.2 131.8 130.3 17.5 0.2
5 17.8 48.2 118.0 124.5 18.7 1.8
6 14.4 41.6 120.8 123.5 24.8 3.0
x<-matrix(c(51.6,49.6,40.6,11.2,17.8,14.4,
           135.2,101.6,88.4,37.2,48.2,41.6,
           169.8,158.4,142.8,131.8,118,120.8,
           137.2,133,126.6,130.3,124.5,123.5,
           31.9,18.7,18.1,17.5,18.7,24.8,
           0.4,0,2,0.2,1.8,3
          ),
         nrow = 6,
         byrow = TRUE,
         dimnames = list(1 : 6,c("h1", "h2", "h3","h4","h5","h6")))

friedman.test(x)
## 
##  Friedman rank sum test
## 
## data:  x
## Friedman chi-squared = 12.799, df = 5, p-value = 0.02534

P值小于0.05,拒绝原假设,可以认为服药后不同时间血中的药物浓度有差异。

例 四种测试的方式,分别测试得到如下:A:14 13 12 11 10 ,B:11 12 13 14 15,C:16 15 14 13 12,D:13 12 11 10 9,问不同的测试方式之间是否有差异?

N   <- 5
P   <- 4
DV1 <- c(14, 13, 12, 11, 10)
DV2 <- c(11, 12, 13, 14, 15)
DV3 <- c(16, 15, 14, 13, 12)
DV4 <- c(13, 12, 11, 10,9)
Fdf <- data.frame(id=factor(rep(1:N, times=P)),
                  DV=c(DV1, DV2, DV3, DV4),
                  IV=factor(rep(1:P, each=N),
                            labels=LETTERS[1:P]))
friedman.test(DV ~ IV | id, data=Fdf)
## 
##  Friedman rank sum test
## 
## data:  DV and IV and id
## Friedman chi-squared = 8.2653, df = 3, p-value = 0.04084

P值小于0.05,拒绝原假设,可以认为四种不同的测试方式之间有差异。

由于本例中样本量较小,置换检验如下

oneway_test(DV ~ IV | id, distribution=approximate(B=9999), data=Fdf)
## 
##  Approximative K-Sample Fisher-Pitman Permutation Test
## 
## data:  DV by IV (A, B, C, D) 
##   stratified by id
## chi-squared = 6.8182, p-value = 0.06781

置换检验结果表明四种不同的测试方式之间没有差异。

例 现有6条狗服用阿司匹林不同时间(时间)血中的药物浓度数据(r/ml)如下表,问服药后不同时间血中的药物浓度有无差别?

编号 0.5h 1h 6h 8h 24h 48h
1 51.6 135.2 169.8 137.2 31.9 0.4
2 49.6 101.6 158.4 133.0 18.7 0.0
3 40.6 88.4 142.8 126.6 18.1 2.0
4 11.2 37.2 131.8 130.3 17.5 0.2
5 17.8 48.2 118.0 124.5 18.7 1.8
6 14.4 41.6 120.8 123.5 24.8 3.0
x<-matrix(c(51.6,49.6,40.6,11.2,17.8,14.4,
           135.2,101.6,88.4,37.2,48.2,41.6,
           169.8,158.4,142.8,131.8,118,120.8,
           137.2,133,126.6,130.3,124.5,123.5,
           31.9,18.7,18.1,17.5,18.7,24.8,
           0.4,0,2,0.2,1.8,3
          ),
         nrow = 6,
         byrow = TRUE,
         dimnames = list(1 : 6,c("h1", "h2", "h3","h4","h5","h6")))

friedman.test(x)
## 
##  Friedman rank sum test
## 
## data:  x
## Friedman chi-squared = 12.799, df = 5, p-value = 0.02534

P值小于0.05,拒绝原假设,可以认为服药后不同时间血中的药物浓度有差异

例 A、B两种药物治疗后,测得体重相近的三组患儿的血肌酐含量如下,每组2名,问A、B两种药物对患儿的血肌酐含量的影响是否有差异?

药物 区组 血浆蛋白
A L 44.55
B L 28.22
A M 24.00
B M 28.77
A H 24.55
B H 18.77
w <- as.factor(c("A","B","A","B","A","B"))
t <- as.factor(c("L","L","M","M","H","H"))
x <- c(44.55,28.22,24.00,28.77,24.55,18.77)
wb <- as.data.frame(cbind(x,w,t))
friedman.test(x ~ w | t, data = wb)
## 
##  Friedman rank sum test
## 
## data:  x and w and t
## Friedman chi-squared = 0.33333, df = 1, p-value = 0.5637

P值大于0.05,不能拒绝原假设,可以认为A、B两种药物对患儿的血肌酐含量影响无差异。

有序非独立样本(Dependent samples - ordered groups)

Page trend检验对应于参数检验中Spearman’s rank相关检验,适用于有序的多组之间的的非参数的趋势检验。 例 四种测试的方式,分别测试得到如下:A:1.79 -3.05 2.44 1.53 -0.43 0.96 0.17 1.55 0.19 -1.21,B:2.43 -3.20 0.73 -3.60 -4.05 -1.91 -5.44 1.97 -3.78 -3.25,C:3.33 -3.50 -0.49 0.87 2.28 -1.25 6.70 2.79 3.53 4.73,D:-3.23 2.00 4.43 3.95 2.54 3.45 0.66 0.91 0.06 -1.47,问不同的测试方式之间是否有差异?

A <- c(1.79,-3.05,2.44,1.53,-0.43,0.96,0.17,1.55,0.19,-1.21)
B <- c(2.43,-3.20,0.73,-3.60,-4.05,-1.91,-5.44,1.97,-3.78,-3.25)
C <- c(3.33,-3.50,-0.49,0.87,2.28,-1.25,6.70,2.79,3.53,4.73)
D <- c(-3.23,2.00,4.43,3.95,2.54,3.45,0.66,0.91,0.06,-1.47)

Pdf <- data.frame(id=factor(rep(1:10, times=4)),
                  DV=c(A,B,C,D),
                  IV=ordered(rep(LETTERS[1:4], each=10)))
PageTest(DV ~ IV | id, data=Pdf) #DescTools包
## 
##  Page test for ordered alternatives
## 
## data:  DV and IV and id
## L = 261, p-value = 0.1269
friedman_test(DV ~ IV | id, distribution=approximate(B=9999), data=Pdf) #coin包
## 
##  Approximative Page Test
## 
## data:  DV by
##   IV (A < B < C < D) 
##   stratified by id
## Z = 1.205, p-value = 0.2542
## alternative hypothesis: two.sided

p值大于0.05,可以认为四种不同的测试方式之间没有差异。

二项分布检验(Binomial test)

二项分布检验是对二分类变量的拟合优度检验,它考察每个类别中观察值的频数与特定二项分布下的预期频数间是否存在统计学差异。在二项分布检验中,实际上采用的和K-S检验的原理相同,只是这里主要使用的是二分变量,是一个离散分布的检验情况。 ###单侧(One-sided) 例 某研究检测7份血清标本,乙肝HBsAg的抗体结果如下:“+”, “+”, “-”, “+”, “-”, “+”, “+”,问研究的血清标本阳性率是否小于等于0.25?

DV   <- factor(c("+", "+", "-", "+", "-", "+", "+"), levels=c("+", "-"))
N    <- length(DV)
(tab <- table(DV))
## DV
## + - 
## 5 2
pH0 <- 0.25
binom.test(tab, p=pH0, alternative="greater", conf.level=0.95)
## 
##  Exact binomial test
## 
## data:  tab
## number of successes = 5, number of trials = 7, p-value = 0.01288
## alternative hypothesis: true probability of success is greater than 0.25
## 95 percent confidence interval:
##  0.3412614 1.0000000
## sample estimates:
## probability of success 
##              0.7142857

P值大于0.05,可以认为研究的血清标本阳性率时候小于等于0.25。

双侧(Two-sided)

例 某研究检测20份血清标本,乙肝HBsAg的抗体有10份是阳性,问研究的血清标本阳性率是否等于0.25?

N    <- 20
hits <- 10
binom.test(hits, N, p=pH0, alternative="two.sided")
## 
##  Exact binomial test
## 
## data:  hits and N
## number of successes = 10, number of trials = 20, p-value = 0.01704
## alternative hypothesis: true probability of success is not equal to 0.25
## 95 percent confidence interval:
##  0.2719578 0.7280422
## sample estimates:
## probability of success 
##                    0.5

P值小于0.05,可以认为研究的血清标本阳性率时候不等于0.25。

置信区间(Confidence intervals)

例 某研究检测7份血清标本,乙肝HBsAg的抗体结果如下:“+”, “+”, “-”, “+”, “-”, “+”, “+”,问研究的血清标本阳性率和其置信区间?

DV   <- factor(c("+", "+", "-", "+", "-", "+", "+"), levels=c("+", "-"))
N    <- length(DV)
tab <- table(DV)
BinomCI(tab[1], sum(tab), method="wilson")
##            est    lwr.ci    upr.ci
## [1,] 0.7142857 0.3589345 0.9177811

该血清阳性率为71.42%,95%置信区间为35.89%~91.77%.

Pearson拟合优度\(\chi ^{2}\)检验

假定某随机变量\(X\)应当有分布,对\(X\)进行\(n\)次观察,可以得到Pearson\(\chi^{2}\)统计量\(K=\sum_{i=1}^{m}\frac{(n_{i}-np{i})^{2}}{np_{i}}\),当\(n\rightarrow \infty\)时,\(K\)分布收敛于自由度为\(m-1\)\(\chi ^{2}\)分布。\(m\)为区间个数,\(n_{i}\)为随机变量落在区间内的个数,\(p_{i}\)为区间的理论概率。

例 20位病人的某项临床检验值如下9.66 35.42 39.24 32.00 6.63 27.96 20.79 17.81 19.97 13.03 35.93 13.30 32.39 23.21 31.35 16.98 20.56 16.94 35.21 23.33,请用Pearson \(\chi ^{2}\)检验判断是否服从正态分布?

 x <- c(9.66,35.42,39.24,32.00,6.63,27.96,20.79,
        17.81,19.97,13.03,35.93,13.30,32.39,23.21,
        31.35,16.98,20.56,16.94,35.21,23.33)
 g <- table(cut(x,breaks = c(5,10,15,20,25,30,35)))
 p <- pnorm(c(10,15,20,25,30,35),mean(x),sd(x))
 p <- c(p[1],p[2]-p[1],p[3]-p[2],p[4]-p[3],p[5]-p[4],1-p[5])
 chisq.test(g,p = p)
## Warning in chisq.test(g, p = p): Chi-squared approximation may be incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  g
## X-squared = 2.903, df = 5, p-value = 0.7149

P值大于0.05,可以认为服从正态分布

Kolmogorov-Smirnov检验

Kolmogorov-Smirnov是比较一个频率分布f(x)与理论分布g(x)或者两个观测值分布的检验方法。其原假设\(H_{0}\):两个数据分布一致或者数据符合理论分布。 \(D=max|f(x)-g(x)|\),当实际观测值\(D>D(n,\alpha)\)则拒绝\(H_{0}\),否则则接受\(H_{0}\)假设。检验统计量为\(Z=\sqrt{n}\max_{i}(|F_{n}(x_{i-1}-F(x_{i})|,|F_{n}(x_{i}-F(x_{i})|)\)

单样本检验

例 20位病人的某项临床检验值如下9.66 35.42 39.24 32.00 6.63 27.96 20.79 17.81 19.97 13.03 35.93 13.30 32.39 23.21 31.35 16.98 20.56 16.94 35.21 23.33,请用Kolmogorov-Smirnov检验判断是否服从正态分布?

x <- c(9.66,35.42,39.24,32.00,6.63,27.96,20.79,17.81,
       19.97,13.03,35.93,13.30,32.39,23.21,31.35,16.98,
       20.56,16.94,35.21,23.33)
ks.test(x,mean(x),sd(x))
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x and mean(x)
## D = 0.6, p-value = 0.8571
## alternative hypothesis: two-sided

P值大于0.05,可以认为服从正态分布

例 对一台设备进行寿命检验,记录十次无故障操作时间,并按从小到大的次序排列如下420 500 920 1380 1510 1650 1760 2100 2300 2350检验此设备无故障工作时间是否符合rambda=1/1500的指数分布?

X<-c(420, 500, 920, 1380, 1510, 1650, 1760, 2100, 2300, 2350)
ks.test(X, "pexp", 1/1500)
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  X
## D = 0.30148, p-value = 0.2654
## alternative hypothesis: two-sided

两样本检验

例 测定两组病人尿液中的尿胆原(URO),A组 2.13 13.91 15.65 12.34 0.74 10.49 7.22 5.86 6.84 3.67 B组 4.62 6.16 0.52 4.79 0.21 3.69 3.46 3.42 5.67 2.83,请检验A组和B组是否相同?

a <- c(2.13,13.91,15.65,12.34,0.74,10.49,7.22,5.86,6.84,3.67)
b <- c(4.62,6.16,0.52,4.79,0.21,3.69,3.46,3.42,5.67,2.83)
ks.test(a,b)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  a and b
## D = 0.6, p-value = 0.05245
## alternative hypothesis: two-sided

P值大于0.05,可以认为A、B两组的分布相同。

列联表的独立性检验

在研究问题时有时候会遇到要求判断两个分类变量之间是否存在联系的问题,通常使用\(\chi ^{2}\)检验判断两组或多组资料是否相互关联。如果不相互关联,就称为独立,这类问题处理称为独立性检验(Test of Independence)。

pearson\(\chi ^{2}\)检验

\(i\)为行,\(j\)为列的列联表中,统计量为\(K=\sum_{i=1}^{I}\sum_{j=1}^{J}\frac{[n\cdot n_{ij}-n_{i}\cdot n_{j}]^{2}}{n\cdot n_{i}\cdot n_{j}}\)

例 vcd包中Arthritis数据集是一项风湿性关节炎新疗法的双盲临床实验的结果,请分析治疗措施是否与风湿性关节炎改善情况有关?

panderOptions('table.split.table', Inf)
pander(head(Arthritis))
ID Treatment Sex Age Improved
57 Treated Male 27 Some
46 Treated Male 29 None
77 Treated Male 30 None
17 Treated Male 32 Marked
36 Treated Male 46 Marked
23 Treated Male 58 Marked
mytable <- xtabs(~Treatment+Improved,data=Arthritis)
chisq.test(mytable)
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 13.055, df = 2, p-value = 0.001463

P值小于0.05,可以认为治疗情况与改善状况有关。

例 某杂志刊登了一篇研究吸烟是否与患肺癌有关文章,对63位肺癌患者及43名非肺癌患者调查了其中的吸烟人数,得到2*2列联表,如下表所示,认为吸烟与肺癌有关,请予以验证!

患肺癌 未患肺癌
吸烟 60 32
不吸烟 3 11
x<-c(60, 3, 32, 11)
dim(x)<-c(2,2)
chisq.test(x,correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 9.6636, df = 1, p-value = 0.00188

P值小于0.05,可以认为吸烟与肺癌有关。

卡方检验(\(\chi^{2}\)检验)

卡方检验用于两个率或两个构成比之间的比较。 例 三组大白鼠分别为35 16 94只,在不同致癌剂作用下的发病数分别为15 6 39,问不同组只见发病数是否有差别?

total <- c(35, 16, 94)
hits  <- c(15,6,39)
prop.test(hits, total)
## 
##  3-sample test for equality of proportions without continuity
##  correction
## 
## data:  hits out of total
## X-squared = 0.13125, df = 2, p-value = 0.9365
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3 
## 0.4285714 0.3750000 0.4148936

P值大于0.05,不能拒绝零假设,不能认为三组之间的发病数有差异。

游程检验(Runs-test)

游程检验即单样本变量值的随机性检验,其一个或多个的代表相同属性或类型的接连出现区段称为游程。游程检验主要是用来检验数据是否为随机性取得。 例 某村发生一种地方病,其住户沿一条河排列,调查时对发病的住户标记为“t”,对非发病的住户标记为“f”,共8户,其取值如下表所示:“f”, “t”, “t”, “f”, “t”, “f”, “f”, “f”,问该地方的发病是否随机?

queue <- factor(c("f", "t", "t", "f", "t", "f", "f", "f"))
RunsTest(queue, alternative="greater")
## 
##  Runs Test for Randomness
## 
## data:  queue
## runs = 5, m = 5, n = 3, p-value = 0.5714
## alternative hypothesis: true number of runs is greater than expected

P值大于0.05,可以认为地方病的发病是随机的。

置换检验(Manual permutation test)

样本数较少,实现置换检验

Nj    <- table(queue)
(runs <- rle(levels(queue)[as.numeric(queue)]))
## Run Length Encoding
##   lengths: int [1:5] 1 2 1 1 3
##   values : chr [1:5] "f" "t" "f" "t" "f"
(rr <- length(runs$lengths))
## [1] 5
(rr1 <- table(runs$values)[1])
## f 
## 3
(rr2 <- table(runs$values)[2])
## t 
## 2
getP <- function(r1, r2, n1, n2) {
    # iterations of a symbol <= total number of this symbol?
    stopifnot(r1 <= n1, r2 <= n2)

    # probability in case r1+r2 is uneven
    p <- (choose(n1-1, r1-1) * choose(n2-1, r2-1)) / choose(n1+n2, n1)

    # probability in case r1+r2 is even: twice the uneven case
    ifelse(((r1+r2) %% 2) == 0, 2*p, p)
}

n1 <- Nj[1]
n2 <- Nj[2]
N <- sum(Nj)
rMin  <- 2
(rMax <- ifelse(n1 == n2, N, 2*min(n1, n2) + 1))
## f 
## 7
p3.2 <- getP(3, 2, n1, n2)
p2.3 <- getP(2, 3, n1, n2)
p3.3 <- getP(3, 3, n1, n2)
p4.3 <- getP(4, 3, n1, n2)
(pGrEq <- p3.2 + p2.3 + p3.3 + p4.3)
## [1] 0.5714286
p2.2 <- getP(2, 2, n1, n2)
p1.2 <- getP(1, 2, n1, n2)
p2.1 <- getP(2, 1, n1, n2)
p1.1 <- getP(1, 1, n1, n2)
(pLess <- p2.2 + p1.2 + p2.1 + p1.1)
## [1] 0.4285714
pGrEq + pLess
## [1] 1

对上述游程进行检验可用正态近似法

muR   <- 1 + ((2*n1*n2) / N)
varR  <- (2*n1*n2*(2*n1*n2 - N)) / (N^2 * (N-1))
rZ    <- (rr-muR) / sqrt(varR)
(pVal <- 1-pnorm(rZ))
##         f 
## 0.4184066

无序分类联合检验(Association tests and measures for unordered categorical variables)

(2×2)列联表

Fishe精确检验是检验两个二分类变量是否是独立的一种检验。在样本较小时(单元的期望频数小于5),需要用Fisher精确检验来做独立性检验。Fisher精确检验的原假设是:边界固定的列联表中行和列是相互独立的,其调用格式为fisher.test(mytable),其中的mytable是一个二维列联表,这里的fisher.test()函数可以在任意行列数大于等于2的二维列联表上使用,但不能用于2×2的列联表。

例 vcd包中Arthritis数据集是一项风湿性关节炎新疗法的双盲临床实验的结果,请用Fisher精确独立检验分析治疗措施是否与风湿性关节炎改善情况。

mytable <- xtabs(~Treatment+Improved,data=Arthritis)
fisher.test(mytable) #fish.test()函数不可用于2*2列联表
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.001393
## alternative hypothesis: two.sided

P值小于0.05,可以认为治疗情况与改善状况有关。

例 某疾病的影像学诊断和疾病的确诊结果如下,问影像学诊断和确诊结果是否有差别?

疾病 isHealthy isIll
no 8 2
yes 1 4
disease <- factor(rep(c("no", "yes"),   c(10, 5)))
diagN   <- rep(c("isHealthy", "isIll"), c( 8, 2))
diagY   <- rep(c("isHealthy", "isIll"), c( 1, 4))
diagT   <- factor(c(diagN, diagY))
contT1  <- table(disease, diagT)
addmargins(contT1)
##        diagT
## disease isHealthy isIll Sum
##     no          8     2  10
##     yes         1     4   5
##     Sum         9     6  15
fisher.test(contT1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contT1
## p-value = 0.08891
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    0.7470432 875.8802434
## sample estimates:
## odds ratio 
##   12.49706

P值大于0.05,可以认为影像学诊断和确诊结果没有差别。

灵敏度、特异度等(Prevalence, sensitivity, specificity, CCR, F-score)

TN <- c11 <- contT1[1, 1]       ## true negative 真阴性
TP <- c22 <- contT1[2, 2]       ## true positive / hit 真阳性
FP <- c12 <- contT1[1, 2]       ## false positive 假阳性
FN <- c21 <- contT1[2, 1]       ## false negative / miss 假阴性

(prevalence <- sum(contT1[2, ]) / sum(contT1)) #盛行率
## [1] 0.3333333
(sensitivity <- recall <- TP / (TP+FN))  #灵敏度
## [1] 0.8
(specificity <- TN / (TN+FP))  #特异度
## [1] 0.8
(relevance <- precision <- TP / (TP+FP)) #阳性预测值
## [1] 0.6666667
(CCR <- sum(diag(contT1)) / sum(contT1)) #正确分类率 accuracy = (TP + TN)/(TP + FP + FN + TN)
## [1] 0.8
(Fval <- 1 / mean(1 / c(precision, recall))) #F得分 = 2*TP /(2*TP + FP + FN) % 
## [1] 0.7272727

OR值、相对危险度等(Odds ratio, Yule’s Q and risk ratio)

(OR <- OddsRatio(contT1, conf.level=0.95)) # OR值
## odds ratio     lwr.ci     upr.ci 
##  16.000000   1.092859 234.247896
YuleQ(contT1)  ## Goodman-Kruskal γ值,属于2×2四格表的列联比例函数
## [1] 0.8823529
RelRisk(contT1) #相对危险度
## [1] 4
(risk    <- prop.table(contT1, margin=1))
##        diagT
## disease isHealthy isIll
##     no        0.8   0.2
##     yes       0.2   0.8
(relRisk <- risk[1, 1] / risk[2, 1]) #相对危险度
## [1] 4

(r×c)列联表

例 某研究得出吸烟和冠心病疾病的严重程度人数分布如下表,问吸烟和冠心病疾病的严重程度是否有关?

smokes
no 5 21 6
yes 4 13 1
table2flat <- function(mytable){
  df <- as.data.frame(mytable)
  rows <- dim(df)[1]
  cols <- dim(df)[2]
  x <- NULL
  for(i in 1:rows){
    for(j in 1:as.integer(as.character(df$Freq[i]))){
      row <- df[i,c(1:(cols-1))]
      x <- rbind(x,row)
    }
  }
  row.names(x) <- c(1:dim(x)[1])
  return(x)
}

smokes <- rep(c("no","yes"),times=1)
siblings <- rep(c("1","2","3"),each=2)
Freq <- c(5,4,21,13,6,1)
mytable <- as.data.frame(cbind(smokes,siblings,Freq))
mydata <- table2flat(mytable)
cTab <- table(mydata)
cTab
##       siblings
## smokes  1  2  3
##    no   5 21  6
##    yes  4 13  1
addmargins(cTab)
##       siblings
## smokes  1  2  3 Sum
##    no   5 21  6  32
##    yes  4 13  1  18
##    Sum  9 34  7  50
chisq.test(cTab)
## Warning in chisq.test(cTab): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  cTab
## X-squared = 1.7848, df = 2, p-value = 0.4097

p值大于0.05,可以认为吸烟和冠心病疾病的严重程度无关。

\(\phi\), Cramer’s \(V\), contingency coefficient等值

Assocs(cTab)
##                           estimate      lwr.ci     upr.ci
## Phi Coeff.              1.8890e-01           -          -
## Contingency Coeff.      1.8570e-01           -          -
## Cramer V                1.8890e-01      0.0000 4.3020e-01
## Goodman Kruskal Gamma  -3.4300e-01 -8.4970e-01 1.6380e-01
## Kendall Tau-b          -1.6070e-01 -4.0960e-01 8.8300e-02
## Stuart Tau-c           -1.5200e-01 -3.9050e-01 8.6500e-02
## Somers D C|R           -1.6490e-01 -4.2240e-01 9.2500e-02
## Somers D R|C           -1.5650e-01 -4.3600e-01 1.2290e-01
## Pearson Correlation    -1.6840e-01 -4.2670e-01 1.1540e-01
## Spearman Correlation   -1.6670e-01 -4.2530e-01 1.1710e-01
## Lambda C|R                  0.0000      0.0000     0.0000
## Lambda R|C                  0.0000      0.0000     0.0000
## Lambda sym                  0.0000      0.0000     0.0000
## Uncertainty Coeff. C|R  2.3600e-02 -3.6100e-02 8.3400e-02
## Uncertainty Coeff. R|C  3.0600e-02 -4.7300e-02 1.0860e-01
## Uncertainty Coeff. sym  2.6700e-02 -4.0900e-02 9.4200e-02
## Mutual Information      2.8900e-02           -          -

Cochran-Mantel-Haenszel 检验

MHC 检验在存在第三个类别变量的情况下有条件地检验两个二分类变量的关联度。

例 检验在A、B两组组别因素存在的情况下,性别和工作而分类变量的关联度

set.seed(123)
myDf <- data.frame(work =factor(sample(c("home", "office"), 10, replace=TRUE)),
                   sex=factor(sample(c("f", "m"),10, replace=TRUE)),
                   group=factor(sample(c("A", "B"), 10, replace=TRUE)))
tab3 <- xtabs(~ work + sex + group, data=myDf)
tab3
## , , group = A
## 
##         sex
## work     f m
##   home   0 1
##   office 1 0
## 
## , , group = B
## 
##         sex
## work     f m
##   home   0 3
##   office 4 1
cmh_test(tab3, distribution=approximate(B=9999))
## 
##  Approximative Generalized Cochran-Mantel-Haenszel Test
## 
## data:  sex by
##   work (home, office) 
##   stratified by group
## chi-squared = 5.0909, p-value = 0.07191

P值大于0.05,可以认为性别和工作无关联。

在存在第三个分类变量的情况下,有条件的检验两个分类变量的关联度,需要用Cochran-Mantel-Haenszel检验。其原假设是两个分类变量在第三个变量的每一层中都是条件独立的。

例 Arthritis数据集检验治疗情况和改善情况在性别的每一水平下是否独立,假设不存在三阶交互作用(治疗情况×改善情况×性别)。

mytable <- xtabs(~Treatment+Improved+Sex,data=Arthritis)
mantelhaen.test(mytable)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  mytable
## Cochran-Mantel-Haenszel M^2 = 14.632, df = 2, p-value = 0.0006647

P值较小,表明接受治疗与得到改善在性别的每一水平下并不独立即,分性别来看,用药治疗的患者叫接受安慰剂的患者有了更多的改善。

有序分类联合检验(Association tests and measures for ordered categorical variables)

线性间的联合检验(Linear-by-linear association test)

线性间的联合检验检验两有序变量之间升高或降低的变化趋势。

例 下表表示不同年龄组血液病患者真菌感染发生的严重分级情况,是否年龄组越高,感染越重? 年龄 I II III IV — — — — — <=29 1 5 6 9 30~59 5 7 6 7 >=60 8 9 13 10

age <- rep(c("Q","Z","L"),times=4)
severity <- rep(c("I","II","III","IV"),each=3)
Freq <- c(1,5,8,5,7,9,6,6,13,9,7,10)
mytable <- as.data.frame(cbind(age,severity,Freq))
dfOrd <- table2flat(mytable)
cTab <- xtabs(~ age + severity, data=dfOrd)
addmargins(cTab)
##      severity
## age    I II III IV Sum
##   L    8  9  13 10  40
##   Q    1  5   6  9  21
##   Z    5  7   6  7  25
##   Sum 14 21  25 26  86
lbl_test(cTab, distribution=approximate(B=9999))
## 
##  Approximative Linear-by-Linear Association Test
## 
## data:  severity (ordered) by age (L < Q < Z)
## Z = 0.11723, p-value = 0.9577
## alternative hypothesis: two.sided

P值大于0.05,不能认为年龄越高真菌感染的严重程度越重。

多序列相关(Polychoric and polyserial correlation)

Polychoric相关是估计从两个从有序分类变量所获得的两个近似连续正态分布的潜变量(latent variables)之间的相关。

polychor(dfOrd$age, dfOrd$severity, ML=TRUE)
## [1] 0.03116752
#polychor(cTab, ML=TRUE)

polyserial相关是估计一个连续性变量和一个有序分类的变量之间的相关。

x <- rnorm(86,mean = 23,sd=10) 
polyserial(x, dfOrd$severity) #两步骤估计
## [1] 0.04068652
polyserial(x, dfOrd$severity,ML=TRUE, std.err=TRUE) #ML估计 
## 
## Polyserial Correlation, ML est. = 0.04281 (0.1165)
## Test of bivariate normality: Chisquare = 25.15, df = 11, p = 0.008653
## 
##                 1       2      3
## Threshold -0.9826 -0.2348 0.5179
## Std.Err.   0.1617  0.1364 0.1419

异构相关矩阵(Heterogeneous correlation matrices)

N <- 100
Sigma <- matrix(c(4,2,-3, 2,16,-1, -3,-1,9), byrow=TRUE, ncol=3)
mu<- c(-3, 2, 4)
Xdf   <- data.frame(rmvnorm(n=N, mean=mu, sigma=Sigma))
lOrd   <- lapply(Xdf, function(x) {
  cut(x, breaks=quantile(x), include.lowest=TRUE,
      ordered=TRUE, labels=LETTERS[1:4]) })
dfOrd  <- data.frame(lOrd)

Xdf2<- rmvnorm(n=N, mean=mu, sigma=Sigma) #产生随机多元正态分布
dfBoth <- cbind(Xdf2, dfOrd)
hetcor(dfBoth, ML=TRUE)
## 
## Maximum-Likelihood Estimates
## 
## Correlations/Type of Correlation:
##          1         2        3         X1         X2         X3
## 1        1   Pearson  Pearson Polyserial Polyserial Polyserial
## 2   0.1975         1  Pearson Polyserial Polyserial Polyserial
## 3   -0.389  -0.08689        1 Polyserial Polyserial Polyserial
## X1 0.02187 -0.002543  -0.0465          1 Polychoric Polychoric
## X2 0.04476  -0.01478 -0.04411     0.1197          1 Polychoric
## X3 0.05493   0.06691 -0.03846    -0.4418   -0.08623          1
## 
## Standard Errors:
##          1       2      3     X1    X2
## 1                                     
## 2  0.09642                            
## 3  0.08529 0.09954                    
## X1  0.1071  0.1085 0.1074             
## X2  0.1077  0.1104 0.1076 0.1158      
## X3  0.1082  0.1078 0.1084 0.0979 0.115
## 
## n = 100 
## 
## P-values for Tests of Bivariate Normality:
##         1      2      3     X1     X2
## 1                                    
## 2  0.9802                            
## 3  0.8369  0.865                     
## X1 0.7713 0.7068 0.7135              
## X2 0.9466 0.6085 0.4274 0.3976       
## X3 0.8192 0.1957 0.5062 0.5814 0.5824

有序变量和连续性变量(Association measures involving categorical and continuous variables)

对有序分类变量和连续性变量的相关测量,可以使用rms包中建立Logistic模型的lmr函数,AUC等指标可从模型的stats中获得。

set.seed(123)
N   <- 100
x   <- rnorm(N)
y   <- x + rnorm(N, 0, 2)
yDi <- ifelse(y <= median(y), 0, 1)

lrm(yDi ~ x)$stats
##          Obs    Max Deriv   Model L.R.         d.f.            P 
## 1.000000e+02 1.769745e-09 1.578680e+01 1.000000e+00 7.089558e-05 
##            C          Dxy        Gamma        Tau-a           R2 
## 7.240000e-01 4.480000e-01 4.487179e-01 2.262626e-01 1.947166e-01 
##        Brier            g           gr           gp 
## 2.123535e-01 1.018865e+00 2.770049e+00 2.225106e-01
#绘制ROC曲线
(rocRes <- roc(yDi ~ x, plot=TRUE, ci=TRUE, main="ROC-curve",
               xlab="specificity (TN / (TN+FP))", ylab="sensitivity (TP / (TP+FN))"))
## 
## Call:
## roc.formula(formula = yDi ~ x, plot = TRUE, ci = TRUE, main = "ROC-curve",     xlab = "specificity (TN / (TN+FP))", ylab = "sensitivity (TP / (TP+FN))")
## 
## Data: x in 50 controls (yDi 0) < 50 cases (yDi 1).
## Area under the curve: 0.7244
## 95% CI: 0.6236-0.8252 (DeLong)
rocCI <- ci.se(rocRes)
plot(rocCI, type="shape")
## Warning in plot.ci.se(rocCI, type = "shape"): Low definition shape.

Cochran Q检验(Cochran-Q-test)

Cochran-Q检验是对多个变量的二分数据具有同一分布的检验。检验的数据是二分类数据,且数据是0或1两种编码。

例 某研究调查了20名患者,对四个医院的满意情况,满意用1表示,不满意用0表示。

data_wide <- read.csv("Cochran.csv",header = T)
data_wide
##    A. B. C. D.
## 1   0  1  0  0
## 2   1  1  1  0
## 3   1  0  1  0
## 4   0  0  1  0
## 5   0  0  1  1
## 6   1  1  0  1
## 7   1  1  0  0
## 8   1  1  0  0
## 9   1  1  0  1
## 10  1  1  1  0
## 11  1  0  0  0
## 12  1  1  0  0
## 13  1  1  0  0
## 14  1  0  1  1
## 15  1  1  1  0
## 16  1  1  0  1
## 17  0  0  1  1
## 18  1  0  0  0
## 19  1  0  1  0
## 20  1  0  0  0
data_long <- melt(data_wide, id.vars = NULL) #转为长格式
id=factor(rep(1:20,times=4))
cdf <- data.frame(id,data_long)
cdf$value <- as.numeric(cdf$value) #转为数字
symmetry_test(value~variable  | id, teststat="quad", data=cdf)
## 
##  Asymptotic General Symmetry Test
## 
## data:  value by
##   variable (A., B., C., D.) 
##   stratified by id
## chi-squared = 9.3529, df = 3, p-value = 0.02495

P值小于0.05,可以认为对四个医院的满意情况有差异。

McNemar检验(McNemar test)

McNemar检验用于配对计数资料的分析,主要分析配对资料中对照组和处理组的频数或比率是否有差异。常用于同一批观察对象用药前后或试验前后的结果是否有差异。McNemar检验用于配对计数资料的分析,主要分析配对资料中对照组和处理组的频数或比率是否有差异,也可以分析统一批观察对象用药前后或试验前后的结果是否有差异,并不是独立性检验。Kappa统计量用于测量两次调查的可重复性,\(k=\frac{p_{0}-p_{e}}{1-p_{e}}\),\(p_{0}\)两次调查中一致的概率,\(p_{e}\)两次调查的期望一致概率,\(p_{e}= \sum_{i=1}^{c}a_{i}b_{i}\)\(a_{i}b_{i}\)\(c*c\)列联表中两个调查第\(i\)个类型的边际概率,k>0.75表示极好的重复性,0.4<=k<=0.75好的重复性,0<=k<0.4边界(勉强够格)的重复性。

例 同一批病例治疗前后症状研究结果如下,试分析治疗前后是否有差异?

后有 后无
前有 3 5
前无 6 6
pre <- rep(c("yes","no"),times=2)
post <- rep(c("yes","no"),each=2)
Freq <- c(3,6,5,6)
mytable <- as.data.frame(cbind(pre,post,Freq))
mydata <- table2flat(mytable)
cTab    <- table(mydata$pre, mydata$post)
addmargins(cTab)
##      
##       no yes Sum
##   no   6   6  12
##   yes  5   3   8
##   Sum 11   9  20
mcnemar.test(cTab, correct=FALSE)
## 
##  McNemar's Chi-squared test
## 
## data:  cTab
## McNemar's chi-squared = 0.090909, df = 1, p-value = 0.763
symmetry_test(cTab, teststat="quad", distribution=approximate(B=9999)) #coin包
## 
##  Approximative General Symmetry Test
## 
## data:  response by
##   conditions (Var1, Var2) 
##   stratified by block
## chi-squared = 0.090909, p-value = 1

P值大于0.05,不但能认为治疗前后的症状有差异。 例 研究人员将患霍奇金淋巴瘤的病人和非病人按同性别及年龄在5岁以内的条件进行配对,共配对85对病人。两组人群中扁桃体切除情况如下,试问配对的两组人群扁桃体切除率是否有差别?

对照是 对照否
病人是 26 15
病人否 7 37
x <- c(26,7,15,37)
dim(x) <- c(2,2)
mcnemar.test(x,correct=F)
## 
##  McNemar's Chi-squared test
## 
## data:  x
## McNemar's chi-squared = 2.9091, df = 1, p-value = 0.08808

P值大于0.05,不能认为两组人群扁桃体切除率有差异。

例 同时使用甲乙两种方法测定265份标本中的金黄色葡萄球菌,结果如下,问甲乙两种方法可重复性(kappa)是多少?

甲+ 甲-
乙+ 107 35
乙- 24 99
kappa.test <- function(x)
{
  N=sum(x)
  Po=sum(diag(x)/N) #观察到的一致数
  mr=apply(x,1,sum) #行边际
  mr=mr/N
  mc=apply(x,2,sum) #列边际
  mc=mc/N
  Pe=sum(mc * mr) #期望一致数
  k=(Po-Pe)/(1-Pe) #kappa统计量
  se_k = sqrt((Pe+Pe^2-sum(mr*mc*(mr+mc)))/(N*(1-Pe)^2)) # kappa统计量的标准误
  z=k/se_k #检验统计量
  p.value=1-pnorm(z) # p值
  res=list(kappa=k,se_k=se_k,p.value=p.value,z=z)
}

X<-c(107, 24, 35, 99)
dim(X)<-c(2,2)
k <- kappa.test(X)
k
## $kappa
## [1] 0.5550781
## 
## $se_k
## [1] 0.0612178
## 
## $p.value
## [1] 0
## 
## $z
## [1] 9.067267

Bowker 检验(Bowker test)

对于配对设计的二分类资料,通常使用McNemar检验,而对于配对设计多分类资料,使用Bowker检验,是McNemar检验的一般化。

例 两名放射科医生对203名棉屑沉着病可疑患者的诊断结果见下表。试分析两名医生诊断的结果是否相同。

I II III
I 78 5 2
II 6 56 12
III 2 10 32
A <- rep(c("I","II","III"),times=3)
B <- rep(c("I","II","III"),each=3)
Freq <- c(78,6,2,5,56,10,2,12,32)
mytable <- as.data.frame(cbind(A,B,Freq))
mydata <- table2flat(mytable)
cTab  <- table(mydata$A, mydata$B)
addmargins(cTab)
##      
##         I  II III Sum
##   I    78   5   2  85
##   II    6  56  12  74
##   III   2  10  32  44
##   Sum  86  71  46 203
mcnemar.test(cTab) #对于大于2*2的列联表mcnemar.test自动采用Bowke检验
## 
##  McNemar's Chi-squared test
## 
## data:  cTab
## McNemar's chi-squared = 0.27273, df = 3, p-value = 0.9651
symmetry_test(cTab, teststat="quad", distribution=approximate(B=9999)) #coin包
## 
##  Approximative General Symmetry Test
## 
## data:  response by
##   conditions (Var1, Var2) 
##   stratified by block
## chi-squared = 0.27273, p-value = 0.8812

P值大于0.05,不能认为两医生的诊断有差异。

Stuart Maxwell检验(Stuart-Maxwell-test for marginal homogeneity)

Stuart Maxwell检验与Bowker 检验类似,适用于有序多分类的资料。

例 如上题。

mh_test(cTab, distribution=approximate(B=9999))
## 
##  Approximative Marginal Homogeneity Test
## 
## data:  response by
##   conditions (Var1, Var2) 
##   stratified by block
## chi-squared = 0.27273, p-value = 0.8767

结果与上题类似,P值大于0.05,不能认为两医生的诊断有差异。

基于尺度参数的检验

Kruskal-Wallis和Wilcoxon检验均为基于位置参数的检验,常用基于尺度参数的检验有Ansari-Bradley检验和Fligner-Killeen检验。

尺度参数的Ansari-Bradley检验

例 两台机器加工的片剂直径(各10个)为(单位:mm):机器A: 18.0, 17.1, 16.4, 16.9, 16.9, 16.7, 16.7, 17.2, 17.5, 16.9;机器B: 17.0, 16.9, 17.0, 16.9, 17.2, 17.1, 16.8, 17.1, 17.1, 17.2.这个结果能否说明两台机器的水平(加工精度)一致?

worker.a<-c(18.0,17.1,16.4,16.9,16.9,16.7,16.7,17.2,17.5,16.9)
worker.b<-c(17.0,16.9,17.0,16.9,17.2,17.1,16.8,17.1,17.1,17.2)
ansari.test(worker.a,worker.b)
## Warning in ansari.test.default(worker.a, worker.b): cannot compute exact p-
## value with ties
## 
##  Ansari-Bradley test
## 
## data:  worker.a and worker.b
## AB = 41.5, p-value = 0.04232
## alternative hypothesis: true ratio of scales is not equal to 1

P值小于0.05,拒绝原假设,可以认为两台机器的加工精度不同。

尺度参数的Fligner-Killeen检验

例 三名不同的病人A、B、C同时在同一条件下进行某项智力测验,各进行10次,他们评分如下: A: 8, 7, 9, 10, 9, 6, 5, 8, 10, 5; B: 8, 7, 9, 6, 8, 9, 10, 7, 8, 9; C: 10, 10, 9, 6, 8, 3, 5, 6, 7, 4. 问这三名病例的智力测验稳定性是否一样?

x<-list(A=c(8,7,9,10,9,6,5,8,10,5),
B=c(8,7,9,6,8,9,10,7,8,9),
C=c(10,10,9,6,8,3,5,6,7,4))
fligner.test(x)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  x
## Fligner-Killeen:med chi-squared = 5.1905, df = 2, p-value =
## 0.07463

P值大于0.05,接受原假设,认为三名病例的智力测验稳定性一样。

重抽样(Resampling)

当数据抽样于非正态分布时,如未知或混合分布、样本量过小、存在离群点、基于理论分布设计合适的统计检验过于复杂且数学上难以处理等情况,这时可以使用基于随机化和重抽样的统计方法。

置换检验(Permutation tests)

以原假设为起点,假定二组没有差别,由此将二组样本合并,从中以无放回方式进行抽样,分别归入两个组再计算统计量,反复进行由此得到置换分布,在此基础上进行推断。相对于传统检验,coin包提供了进行置换检验的一般性框架,函数形式:function(formula,data,distribution=),formula描述的是要检验变量间的关系,如下表所示,data是一个数据框,distribution指定经验分布在零假设条件下的形式,若distribution =“exact”,那么表示在零假设条件下,分布的计算是精确的(即依据所有可能的排列组合),仅可用于两样本问题;若distribution = “asymptotic”表示根据它的渐进分布进行计算;若distribution =“approxiamate(B = #)”表示根据蒙特卡洛重抽样来做近似计算,其中#指所需重复的次数。

检验 coin函数
两样本和K样本置换检验 oneway_test(y ~ A)
含一个分层(区组)因子的两样本和K样本置换检验 oneway_test(y ~ A | C)
Wilcoxon-Mann-Whitney秩和检验 wilcox_test(y ~ A)
Kruskal-Wallis检验 kruskal_test(y ~ A)
Person卡方检验 chisq_test(A ~ B)
Cochran-Mantel-Haenszel检验 cmh_test(A ~ B | C)
线性关联检验 lbl_test(D ~ E)
Spearman检验 spearman_test(y ~ x)
Friedman检验 friedman_test(y ~ A | C)
Wilcoxon符号秩检验 wilcoxsign_test(y1 ~ y2)

coin函数中,y和x是数值变量,A和B是分类因子,C是类别型区组变量,D和E是有序因子,y1和y2是相匹配的数值变量。类别变量和有序变量均应转化为因子和有序因子。

独立两样本和K样本检验(Two-sample t-test / one-way ANOVA for independent groups)

例 某医院欲研究A、B两种降血脂药物对家兔血清肾素血管紧张素转化酶(ACE)的影响,将家兔随机分为三组,均喂以高脂饮食,分别给予不同的降血脂药物。一定时间后测定家兔血清ACE浓度(u/ml),A组(45 44 43 47 48 44 46 44 40 45 42 40 43 46 47 45 46 45 43 44),B组(45 48 47 43 46 47 48 46 43 49 46 43 47 46 47 46 45 46 44 45 46 44 43 42 45),问两组家兔血清ACE浓度是否相同?

a <- c(45, 44, 43, 47, 48, 44, 46, 44, 40, 45, 42, 40, 43, 46, 47, 45, 
    46, 45, 43, 44)
b <- c(45, 48, 47, 43, 46, 47, 48, 46, 43, 49, 46, 43, 47, 46, 47, 46, 
    45, 46, 44, 45, 46, 44, 43, 42, 45)

dfCRp <- data.frame(length = c(a, b), site = factor(c(rep("1", 20), 
    rep("2", 25))))

set.seed(123)
ot <- oneway_test(length ~ site, data = dfCRp,distribution=approximate(B=9999))
ot
## 
##  Approximative Two-Sample Fisher-Pitman Permutation Test
## 
## data:  length by site (1, 2)
## Z = -1.8676, p-value = 0.05991
## alternative hypothesis: true mu is not equal to 0

置换密度图

supp <- support(ot)
dens <- sapply(supp, dperm, object=ot)
plot(supp, dens, xlab="Support", ylab=NA, pch=20, main="Density permutation distribution")

QQ图

qEmp <- sapply(ppoints(supp), qperm, object=ot)
qqnorm(qEmp, xlab="Normal quantiles", ylab="Permutation quantiles",
       pch=20, main="Permutation quantiles vs. normal quantiles")
abline(a=0, b=1, lwd=2, col="blue")

累积分布图

plot(qEmp, ecdf(qEmp)(qEmp), col="gray60", pch=16,
     xlab="Difference in means", ylab="Cumulative relative frequency",
     main="Cumulative relative frequency and normal CDF")

非独立两样本和K样本检验(Two-sample t-test / one-way ANOVA for dependent groups)

例 10各高血压病人在治疗前后的血压值分别为86 120 82 137 96 104 142 86 77 82和75 107 88 111 111 132 67 115 103 136,问治疗前后的血压值是否有不同?

N <- 10
pre <- c(86,120,82,137,96,104,142,86,77,82)
post <- c(75,107,88,111,111,132,67,115,103,136)
tDepDf <- data.frame(DV=c(pre, post),
                     IV=factor(rep(0:1, each=N), labels=c("pre", "post")))
id     <- factor(rep(1:N, times=2))
set.seed(123)
oneway_test(DV ~ IV | id, alternative="less", distribution=approximate(B=9999), data=tDepDf)
## 
##  Approximative Two-Sample Fisher-Pitman Permutation Test
## 
## data:  DV by IV (pre, post) 
##   stratified by id
## Z = -0.30039, p-value = 0.3859
## alternative hypothesis: true mu is less than 0

P值大于0.05,治疗前后的血压值没有不同。

独立性检验

例 vcd包中Arthritis数据集包含了关节炎的治疗情况(Treatment)和改善情况(Improved),问治疗情况和改善情况是否独立?

Arthritisc <- transform(Arthritis,Improved=as.factor(as.numeric(Improved)))
set.seed(123)
chisq_test(Treatment~Improved,distribution=approximate(B=9999),data=Arthritisc)
## 
##  Approximative Pearson Chi-Squared Test
## 
## data:  Treatment by Improved (1, 2, 3)
## chi-squared = 13.055, p-value = 0.001

p值较小,可以认为治疗情况和改善情况不独立。

Cochran-Mantel-Haenszel检验

CMH检验可以对一些分层变量进行调整,从而获得反应率的总体比较。最为最为常见的应用是在多中心试验中对研究中心进行调整而进行两组率的比较。

例 vcd包中Arthritis数据集包含了关节炎的治疗情况(Treatment)、性别(Sex)和改善情况(Improved),在性别分层的情况下治疗情况和改善情况是否独立?

set.seed(123)
cmh_test(Treatment~Improved|Sex,distribution=approximate(B=9999),data=Arthritis)
## 
##  Approximative Linear-by-Linear Association Test
## 
## data:  Treatment by
##   Improved (None < Some < Marked) 
##   stratified by Sex
## Z = -3.8252, p-value = 3e-04
## alternative hypothesis: two.sided

P值较小,分性别来看,治疗情况和改善情况并不独立。

趋势检验

例 vcd包中Arthritis数据集包含了关节炎的治疗情况(Treatment)和改善情况(Improved),问治疗情况和改善情况是否存在线性趋势?

set.seed(123)
lbl_test(Treatment~Improved,distribution=approximate(B=9999),data=Arthritis)
## 
##  Approximative Linear-by-Linear Association Test
## 
## data:  Treatment by Improved (None < Some < Marked)
## Z = -3.5859, p-value = 3e-04
## alternative hypothesis: two.sided

P值较小,治疗情况和改善情况存在线性趋势。

数值变量的独立性

某医生分别采用盐析法和结合法测定正常皮肤中胶原蛋白的含量,检测结果如下,分析两数值变量之间的独立性。

编号 盐析法 结合法
1 6.8 546
2 7.8 553
3 8.7 562
4 8.7 563
5 8.9 570
6 9.5 575
7 10.1 581
8 10.2 605
9 10.3 607
10 10.4 621
11 11.1 624
12 12.4 626
13 13.3 632
14 13.1 640
15 13.2 656
y <- c(546,553,562,563,570,575,581,605,607,621,624,626,632,640,656)
x <- c(6.8,7.8,8.7,8.7,8.9,9.5,10.1,10.2,10.3,10.4,11.1,12.4,13.3,13.1,13.2)
df <- as.data.frame(cbind(x,y))

set.seed(123)
spearman_test(x~y,data=df,distribution=approximate(B=9999))
## 
##  Approximative Spearman Correlation Test
## 
## data:  x by y
## Z = 3.6982, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0

基于9999次重复的近似置换检验, 盐析法和结合法并不独立。

相关性

Wilcoxon符号秩检验

Wilcoxon符号秩检验适用于成对观测的数据,主要用于成对比较,比传统的单独用正负号的检验更加有效。 例 某医生为了探讨缺碘地区母婴TSH水平的关系,应用免疫放射分析测定了160名孕妇(15-17周)及分娩时脐带血TSH水平(mU/L),现随机抽取10对数据,母血TSH1.21 1.30 1.39 1.42 1.47 1.56 1.68 1.72 1.98 2.1,脐血TSH3.90 4.5 4.20 4.83 4.16 4.93 4.32 4.99 4.7 5.2,试对母血TSH水平与新生儿脐带血TSH水平进行相关分析。

x <- c(1.21,3.90,1.30,4.50,1.39,4.20,1.42,4.83,1.47,4.16)
y <- c(1.56,4.93,1.68,4.32,1.72,4.99,1.98,4.70,2.10,5.20)
df <- as.data.frame(cbind(x,y))

set.seed(123)
wilcoxsign_test(x~y,data=df,distribution="exact")
## 
##  Exact Wilcoxon-Pratt Signed-Rank Test
## 
## data:  y by x (pos, neg) 
##   stratified by block
## Z = -2.4973, p-value = 0.009766
## alternative hypothesis: true mu is not equal to 0

P值较小,可以认为母血TSH水平与新生儿脐带血TSH水平相关。

Mann-Whitney-Wilcoxon检验

Mann-Whitney-Wilcoxon检验是比较没有配对的两个独立样本的非参数检验。 例 在某项研究中,经随机抽样获得甲乙两组病人的血尿素氮(BUN)mmol/L,甲组:4.98 3.90 4.02 0.68 4.98 5.04 1.20 2.64 6.23 3.00,乙组:4.17 4.95 3.96 3.59 4.89 3.03 3.71 5.91 5.55 6.29 4.82 3.90 6.11,试比较甲组病人血尿素氮(BUN)的含量是否大于乙组?

#wilcox.test(x,y,exact = FALSE,correct =F) #差异性比较 
#wilcox.test(x,y,exact = FALSE) 
wilcox_test(DV ~ IV, alternative="less", conf.int=TRUE,
            distribution="exact", data=wIndDf)
## 
##  Exact Wilcoxon-Mann-Whitney Test
## 
## data:  DV by IV (A, B)
## Z = -1.0858, p-value = 0.1454
## alternative hypothesis: true mu is less than 0
## 95 percent confidence interval:
##  -Inf 0.15
## sample estimates:
## difference in location 
##                  -0.93

P值均大于0.05,不能拒绝原假设,可以认为甲组病人血尿素氮(BUN)的含量不大于乙组。

Spearman秩相关检验

例 两位评分员对新出生的5名新生儿进行Apgar评分,甲:6 7 8 9 10,乙:5 6 7 8 10。试用Spearman秩相关检验方法检验两个评分员对等级评定有无相关关系。

x <- c(6,7,8,9,10)
y <- c(5,6,7,8,10)
set.seed(123)####Spearman秩相关检验
spearman_test(y~x)
## 
##  Asymptotic Spearman Correlation Test
## 
## data:  y by x
## Z = 2, p-value = 0.0455
## alternative hypothesis: true rho is not equal to 0

P值小于0.05,可以认为两位评分员结论有关。

Friedman检验

例 四种测试的方式,分别测试得到如下:A:14 13 12 11 10 ,B:11 12 13 14 15,C:16 15 14 13 12,D:13 12 11 10 9,问不同的测试方式之间是否有差异?

N   <- 5
P   <- 4
DV1 <- c(14, 13, 12, 11, 10)
DV2 <- c(11, 12, 13, 14, 15)
DV3 <- c(16, 15, 14, 13, 12)
DV4 <- c(13, 12, 11, 10,9)
Fdf <- data.frame(id=factor(rep(1:N, times=P)),
                  DV=c(DV1, DV2, DV3, DV4),
                  IV=factor(rep(1:P, each=N),
                            labels=LETTERS[1:P]))
set.seed(123)
friedman_test(DV ~ IV | id, distribution=approximate(B=9999), data=Fdf)
## 
##  Approximative Friedman Test
## 
## data:  DV by IV (A, B, C, D) 
##   stratified by id
## chi-squared = 8.2653, p-value = 0.0342

P值小于0.05,拒绝原假设,可以认为四种不同的测试方式之间有差异。

回归分析

例 某项“冠状动脉缓慢血流现象”的影响因素的研究,以前降支、回旋支、右冠状动脉三支血管的平均TIMI帧基数(MTFC)表示,调查的影响因素有年龄(AGE,岁)、收缩压(SBP,mmHg)、舒张压(DBP,mmHg)、白细胞(WBC,\(10^{2}\)/L),寻找影响MTFC变化的因素。

age sbp dbp wbc mtfc
43 110 50 6.19 33.67
63 105 60 6.03 26.67
59 100 60 5.28 23
78 100 60 6.52 26
67 100 60 7.31 28
65 119 61 5.67 30.33
66 120 64 5.11 27
73 130 88 6.40 47
53 113 68 4.41 27.67
76 120 70 4.20 37.33
76 136 70 5.38 35.67
76 130 70 4.94 31.33
68 126 70 4.56 32.33
61 136 70 5.42 30.67
78 124 70 5.75 37.67
80 110 70 4.68 36
74 140 70 8.67 41
75 130 70 6.62 41.67
66 130 70 6.86 22
55 114 70 7.52 23.33
71 120 70 4.94 25.67
62 130 70 4.59 25
69 130 70 4.26 27
45 110 70 10.21 29
79 120 70 6.46 30.33
58 110 70 4.70 27
65 100 70 6.06 28
44 119 70 5.55 22.33
53 110 70 14.0 29.33
62 130 72 7.29 43
62 118 72 3.97 27.33
53 122 74 3.97 18.33
71 130 75 3.78 31
54 116 75 4.35 22.33
64 120 76 6.59 30
71 140 78 5.70 35.67
50 121 78 5.27 40.33
51 138 80 5.65 34.67
73 130 80 7.45 35.33
64 138 80 6.58 33.67
40 130 80 7.51 35.33
72 120 80 4.42 34
51 100 80 7.85 21
49 120 89 5.14 20.67
63 150 90 8.18 42.67
56 130 90 5.23 30.67
69 160 90 7.10 39
78 130 90 6.03 29
78 120 90 4.52 30.67
61 150 92 7.52 40
76 142 92 4.66 38
51 140 100 5.70 28.33
51 140 100 6.71 42.67
57 160 100 6.14 41
63 190 100 5.25 46
69 150 80 6.33 22.67
records <- read.table("example1")
ex <- rename(records, c("V1"="age","V2"="sbp","V3"="dbp","V4"="wbc","V5"="mtfc"))
set.seed(123)
fit <- lmp(mtfc~age+sbp+log10(wbc),data=ex,perm = "Prob")
## [1] "Settings:  unique SS : numeric variables centered"
summary(fit)
## 
## Call:
## lmp(formula = mtfc ~ age + sbp + log10(wbc), data = ex, perm = "Prob")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -15.76343  -3.01554   0.07689   2.44306  12.64824 
## 
## Coefficients:
##            Estimate Iter Pr(Prob)    
## age          0.1574 3026   0.0321 *  
## sbp          0.2247 5000   <2e-16 ***
## log10(wbc)  15.6553 2972   0.0326 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.504 on 52 degrees of freedom
## Multiple R-Squared: 0.4174,  Adjusted R-squared: 0.3838 
## F-statistic: 12.42 on 3 and 52 DF,  p-value: 3.068e-06

结果显示三个因素均有统计学意义。perm有Exact、Prob和SPR三个选项,Exact适用于10个观测数以下的精确间检验,Prob从可能的排列中不断抽样,直至估计的标准差在0.1之下,SPR适用序贯概率比检验来判断何时停止抽样。pgirmess包中PermTest函数也可以对lm、lme和glm(binomial and Poisson)进行置换检验。

fit <- lm(mtfc~age+sbp+log10(wbc),data=ex)
PermTest(fit,B=1000)
## 
## Monte-Carlo test
## 
## Call: 
## PermTest.lm(obj = fit, B = 1000)
## 
## Based on 1000 replicates
## Simulated p-value:
##            p.value
## age          0.022
## sbp          0.000
## log10(wbc)   0.029

回归系数的置信区间

getRegr <- function(dat, idx) {
  bsFit <- lm(mtfc~age+sbp+log10(wbc), subset=idx, data=dat)
  coef(bsFit)
}

(bsRegr <- boot(ex, statistic=getRegr, R=999))
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = ex, statistic = getRegr, R = 999)
## 
## 
## Bootstrap Statistics :
##        original       bias    std. error
## t1* -18.6719523  0.170056926  9.56747896
## t2*   0.1573923  0.001221412  0.07670665
## t3*   0.2246608 -0.004772995  0.03942347
## t4*  15.6552701  0.468670463  6.20925415
boot.ci(bsRegr, conf=0.95, type="bca", index=1)$bca #需要指定索引参数
##      conf                                
## [1,] 0.95 37.79 984.54 -34.52737 1.680563

方差分析

单因素方差分析

例 某医院欲研究A、B、C三种降血脂药物对家兔血清肾素血管紧张素转化酶(ACE)的影响,将家兔随机分为三组,均喂以高脂饮食,分别给予不同的降血脂药物。一定时间后测定家兔血清ACE浓度(u/ml),A组(45 44 43 47 48 44 46 44 40 45 42 40 43 46 47 45 46 45 43 44),B组(45 48 47 43 46 47 48 46 43 49 46 43 47 46 47 46 45 46 44 45 46 44 43 42 45),c组(47 48 45 46 46 44 45 48 49 50 49 48 47 44 45 46 45 43 44 45 46 43 42),问三组家兔血清ACE浓度是否相同?

a <- c(45, 44, 43, 47, 48, 44, 46, 44, 40, 45, 42, 40, 43, 46, 47, 45, 
    46, 45, 43, 44)
b <- c(45, 48, 47, 43, 46, 47, 48, 46, 43, 49, 46, 43, 47, 46, 47, 46, 
    45, 46, 44, 45, 46, 44, 43, 42, 45)
c <- c(47, 48, 45, 46, 46, 44, 45, 48, 49, 50, 49, 48, 47, 44, 45, 46, 
    45, 43, 44, 45, 46, 43, 42)
dfCRp <- data.frame(length = c(a, b, c), site = factor(c(rep("1", 20), 
    rep("2", 25), rep("3", 23))))

set.seed(123)
ot <- aovp(length ~ site, data = dfCRp,perm="Prob")
## [1] "Settings:  unique SS "
summary(ot)
## Component 1 :
##             Df R Sum Sq R Mean Sq Iter Pr(Prob)  
## site         2   26.292   13.1462 4491  0.05055 .
## Residuals   65  263.399    4.0523                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P值为0.05,可以认为三组的家兔血清ACE浓度不同。

自主法求F值及其置信区间

anBase <- anova(lm(length ~ site, data=dfCRp))
Fbase  <- anBase["site", "F value"]
(pBase <- anBase["length", "Pr(>F)"])
## [1] NA
fit0 <- lm(length ~ 1, data=dfCRp)    ## fit 0-model
E    <- residuals(fit0)               ## residuals
Er   <- E / sqrt(1-hatvalues(fit0))   ## rescaled residuals
Yhat <- fitted(fit0)     

getAnova <- function(dat, idx) {
  Ystar <- Yhat + Er[idx]
  anBS  <- anova(lm(Ystar ~ site, data=dat))
  anBS["site", "F value"]
}

(bsAnova <- boot(dfCRp, statistic=getAnova, R=999))
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = dfCRp, statistic = getAnova, R = 999)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1* 3.244153 -2.262829    1.035141
boot.ci(bsAnova,conf=0.95, type=c("basic", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = bsAnova, conf = 0.95, type = c("basic", "bca"))
## 
## Intervals : 
## Level      Basic                BCa          
## 95%   ( 2.603,  6.468 )   ( 2.686,  7.750 )  
## Calculations and Intervals on Original Scale
## Warning : BCa Intervals used Extreme Quantiles
## Some BCa intervals may be unstable
Fstar    <- bsAnova$t
# don't use >= because of floating point arithmetic problems
tol     <- .Machine$double.eps^0.5
FsIsGEQ <- (Fstar > Fbase) | (abs(Fstar-Fbase) < tol)
(pValBS <- (sum(FsIsGEQ) + 1) / (length(Fstar) + 1))
## [1] 0.046
plot(Fstar, ecdf(Fstar)(Fstar), col="gray60", pch=1, xlab="f* bzw. f",
     ylab="P(F <= f)", main="F*: cumulative rel. freqs and F CDF")
curve(pf(x, P-1, sum(Nj) - P), lwd=2, add=TRUE)
legend(x="topleft", lty=c(NA, 1), pch=c(1, NA), lwd=c(2, 2),
       col=c("gray60", "black"), legend=c("F*", "F"))

Wild boostrap

getAnovaWild <- function(dat, idx) {
  n  <- length(idx)                     ## size of replication
  ## 1st choice for random variate U: Rademacher-variables
  Ur <- sample(c(-1, 1), size=n, replace=TRUE, prob=c(0.5, 0.5))
  
  ## 2nd option for choosing random variate U
  Uf <- sample(c(-(sqrt(5) - 1)/2, (sqrt(5) + 1)/2), size=n, replace=TRUE,
               prob=c((sqrt(5) + 1)/(2*sqrt(5)), (sqrt(5) - 1)/(2*sqrt(5))))
  
  Ystar <- Yhat + (Er*Ur)[idx]          ## for E* with Rademacher-variables
  # Ystar <- Yhat + (Er*Uf)[idx]        ## for E* with 2nd option
  anBS  <- anova(lm(Ystar ~ site, data=dat))
  anBS["site", "F value"]
}

bsAnovaW <- boot(dfCRp, statistic=getAnovaWild, R=999)
boot.ci(bsAnovaW,conf=0.95, type=c("basic", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = bsAnovaW, conf = 0.95, type = c("basic", "bca"))
## 
## Intervals : 
## Level      Basic                BCa          
## 95%   (-4.0163,  0.0822 )   ( 0.0005,  0.0933 )  
## Calculations and Intervals on Original Scale
## Warning : BCa Intervals used Extreme Quantiles
## Some BCa intervals may be unstable
FstarW   <- bsAnovaW$t

# don't use >= because of floating point arithmetic problems
tol      <- .Machine$double.eps^0.5
FsIsGEQ  <- (FstarW > Fbase) | (abs(FstarW-Fbase) < tol)
(pValBSw <- (sum(FsIsGEQ) + 1) / (length(FstarW) + 1))
## [1] 0.056
单因素协方差分析

例 multcomp包中litter数据集是怀孕小白鼠被分为四个小组,每个小组接受不同剂量(0、5、50和500)的药物处理dose为自变量,产下幼崽的体重weigth均值为因变量,怀孕时间gesttime为协变量。

data(litter,package = "multcomp")
ddply(.data = litter,.(dose),summarize,mean=mean(weight))
##   dose     mean
## 1    0 32.30850
## 2    5 29.30842
## 3   50 29.86611
## 4  500 29.64647
shapiro.test(litter$weight)
## 
##  Shapiro-Wilk normality test
## 
## data:  litter$weight
## W = 0.9674, p-value = 0.05282
bartlett.test(weight~dose,data = litter)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  weight by dose
## Bartlett's K-squared = 9.6497, df = 3, p-value = 0.02179
set.seed(123)
ancova <- aovp(weight~gesttime+dose,data = litter,perm = "Prob")
## [1] "Settings:  unique SS : numeric variables centered"
summary(ancova)
## Component 1 :
##             Df R Sum Sq R Mean Sq Iter Pr(Prob)   
## gesttime     1   161.49   161.493 5000  0.00260 **
## dose         3   137.12    45.708 4457  0.03388 * 
## Residuals   69  1151.27    16.685                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(ancova)

par(mfrow=c(1,1))

数据满足正态性的要求,但不满足方差齐性的要求。检验结果表明怀孕时间gesttime与出生体重weight相关,在控制怀孕时间后,每种药物剂量dose下出生体重weight均值不同。

双因素方差分析

例 基础安装中的ToothGrowth数据集是随机分配60只豚鼠,分别采用两种喂食方法supp(橙汁或维生素C),各喂食方法中抗坏血酸含量有三种水平dose(0.5mg、1mg或2mg),每种处理方式组合都被分配10只豚鼠,牙齿长度len为因变量。

attach(ToothGrowth)
## The following object is masked _by_ .GlobalEnv:
## 
##     supp
## The following object is masked from data (pos = 3):
## 
##     dose
#table(supp,dose)
ddply(.data = ToothGrowth,.(supp,dose),summarise,mean=mean(len))
##   supp dose  mean
## 1   OJ  0.5 13.23
## 2   OJ  1.0 22.70
## 3   OJ  2.0 26.06
## 4   VC  0.5  7.98
## 5   VC  1.0 16.77
## 6   VC  2.0 26.14
ddply(.data = ToothGrowth,.(supp,dose),summarise,sd=sd(len))
##   supp dose       sd
## 1   OJ  0.5 4.459709
## 2   OJ  1.0 3.910953
## 3   OJ  2.0 2.655058
## 4   VC  0.5 2.746634
## 5   VC  1.0 2.515309
## 6   VC  2.0 4.797731

table语句的预处理表明该设计是均衡设计(各设计单元中样本大小都相同),ddply语句处理可获得各单元的均值和标准差。

set.seed(123)
aovCRFpq <- aovp(len~ supp*dose, data=ToothGrowth,perm = "Prob")
## [1] "Settings:  unique SS : numeric variables centered"
summary(aovCRFpq)
## Component 1 :
##             Df R Sum Sq R Mean Sq Iter Pr(Prob)    
## supp         1   205.35    205.35 5000  0.00220 ** 
## dose         1  2224.30   2224.30 5000  < 2e-16 ***
## supp:dose    1    88.92     88.92 2118  0.04533 *  
## Residuals   56   933.63     16.67                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(aovCRFpq)

par(mfrow=c(1,1))

得到方差分析表,可以看到主效应(supp和dose)和交互效应都非常显著。

Logistic回归

例 AER包中包含一个Affairs数据,记录了一组婚外情数据,其中包括参与者性别、年龄、婚龄、是否有小孩、宗教信仰程度(5分制,1表示反对,5表示非常信仰)、学历、职业和婚姻的自我评分(5分制,1表示非常不幸福,5表示非常幸福),问婚外情的影响因素。

data(Affairs,package="AER")
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs==0] <- 0
Affairs$ynaffair <- factor(Affairs$ynaffair,levels=c(0,1),labels=c("No","Yes"))
fit<- glm(Affairs$ynaffair~gender+age+yearsmarried+children+religiousness+education+occupation+rating,data=Affairs,family=binomial())
summary(fit)
## 
## Call:
## glm(formula = Affairs$ynaffair ~ gender + age + yearsmarried + 
##     children + religiousness + education + occupation + rating, 
##     family = binomial(), data = Affairs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5713  -0.7499  -0.5690  -0.2539   2.5191  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.37726    0.88776   1.551 0.120807    
## gendermale     0.28029    0.23909   1.172 0.241083    
## age           -0.04426    0.01825  -2.425 0.015301 *  
## yearsmarried   0.09477    0.03221   2.942 0.003262 ** 
## childrenyes    0.39767    0.29151   1.364 0.172508    
## religiousness -0.32472    0.08975  -3.618 0.000297 ***
## education      0.02105    0.05051   0.417 0.676851    
## occupation     0.03092    0.07178   0.431 0.666630    
## rating        -0.46845    0.09091  -5.153 2.56e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 675.38  on 600  degrees of freedom
## Residual deviance: 609.51  on 592  degrees of freedom
## AIC: 627.51
## 
## Number of Fisher Scoring iterations: 4
set.seed(123)
PermTest(fit,B=250)
## 
## Monte-Carlo test
## 
## Call: 
## PermTest.glm(obj = fit, B = 250)
## 
## Based on 250 replicates
## Simulated p-value:
##               p.value
## gender          0.236
## age             0.232
## yearsmarried    0.000
## children        0.108
## religiousness   0.000
## education       0.876
## occupation      0.508
## rating          0.000

yearsmarried、religiousness和rating三个变量的P值较小,可能是婚外情的影响因素。

广义线性模型

例 robust包中Breslow癫痫数据记录了治疗初期八周内,抗癫痫药物对癫痫发病数的影响,因变量sumY为随机后8周内癫痫发病数,自变量治疗Trt,年龄Age和治疗前8周的癫痫发病数Base。

data(breslow.dat,package="robust")
fitm <- glm(breslow.dat$sumY~Base+Age+Trt,data=breslow.dat,family = poisson())
summary(fitm)
## 
## Call:
## glm(formula = breslow.dat$sumY ~ Base + Age + Trt, family = poisson(), 
##     data = breslow.dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.0569  -2.0433  -0.9397   0.7929  11.0061  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.9488259  0.1356191  14.370  < 2e-16 ***
## Base          0.0226517  0.0005093  44.476  < 2e-16 ***
## Age           0.0227401  0.0040240   5.651 1.59e-08 ***
## Trtprogabide -0.1527009  0.0478051  -3.194   0.0014 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2122.73  on 58  degrees of freedom
## Residual deviance:  559.44  on 55  degrees of freedom
## AIC: 850.71
## 
## Number of Fisher Scoring iterations: 5
set.seed(123)
PermTest(fitm,B=250)
## 
## Monte-Carlo test
## 
## Call: 
## PermTest.glm(obj = fitm, B = 250)
## 
## Based on 250 replicates
## Simulated p-value:
##      p.value
## Base   0.000
## Age    0.432
## Trt    0.736

Base变量的P值较小,可能对癫痫的发病数有影响。

自主法(Bootstrapping)

自主法是在原样本中有放回的抽样,随机抽取形成一个新的样本,重复这样的操作,形成一系列的新样本,通过这些样本就可以计算出样本的一个分布。该法无需特定的理论分布,便可计算统计量的置信区间,并能检验统计假设。

单个统计量

例 计算robust包中breslow.dat数据集年龄Age的中位数及其置信区间

mymedian <- function(data,indices){
  d <- data[indices,]
  rs <- median(d$Age)
  return (rs)
}

set.seed(123)
result <- boot(data=breslow.dat,statistic = mymedian,R=1000)
print(result)
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = breslow.dat, statistic = mymedian, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original  bias    std. error
## t1*       28  -0.095    1.540243
plot(result)

boot.ci(result,type = c("perc","bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = result, type = c("perc", "bca"))
## 
## Intervals : 
## Level     Percentile            BCa          
## 95%   (25, 31 )   (25, 30 )  
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable

多个统计量

例 计算Affairs数据中logistic回归系统的置信区间

bs <- function(formula,data,indices){
  d <- data[indices,]
  fit <- glm(formula,data=d,family=binomial())
  return (coef(fit))
}

result <- boot(data = Affairs,statistic = bs,R=1000,formula=ynaffair~gender+age+yearsmarried+children+religiousness+education+occupation+rating)
print(result)
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Affairs, statistic = bs, R = 1000, formula = ynaffair ~ 
##     gender + age + yearsmarried + children + religiousness + 
##         education + occupation + rating)
## 
## 
## Bootstrap Statistics :
##        original        bias    std. error
## t1*  1.37725816  0.0432807910  0.96142857
## t2*  0.28028665 -0.0014779510  0.25592478
## t3* -0.04425502 -0.0019613543  0.01922035
## t4*  0.09477302  0.0021235214  0.03310868
## t5*  0.39767213  0.0289370604  0.30131353
## t6* -0.32472063 -0.0062587042  0.09369153
## t7*  0.02105086  0.0008435406  0.05141061
## t8*  0.03091971  0.0013995644  0.07642079
## t9* -0.46845426 -0.0112432424  0.09466788
#print(result,index = 8) #多个统计量,添加索引
plot(result,index = 8)

boot.ci(result,type = "perc",index = 8)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = result, type = "perc", index = 8)
## 
## Intervals : 
## Level     Percentile     
## 95%   (-0.1209,  0.1699 )  
## Calculations and Intervals on Original Scale

type参数有norm(normal approximation方法)、basic(basic bootstrap方法)、stud(studentized bootstrap方法)、perc(bootstrap percentile方法)、bca(adjusted bootstrap percentile)和all(全部)。

分层自主法(Stratified bootstrapping)

例 制药厂试制某种安定神经的新药,两台仪器制造药品服从正态分布,从各自加工药品中,分别取若干个测量其直径,两组直径如下A组 20.5 19.8 19.7 20.4 20.1 20.0 19.0 19.9 B组 20.7 19.8 19.5 20.8 20.4 19.6 20.2,问两组均数之差的置信区间? 两组分层抽样非参数法

DVm<-c(20.5, 19.8, 19.7, 20.4, 20.1, 20.0, 19.0, 19.9)
DVf<-c(20.7, 19.8, 19.5, 20.8, 20.4, 19.6, 20.2)
tDf <- data.frame(DV=c(DVm, DVf),
                  IV=factor(rep(c("m", "f"), c(length(DVm), length(DVf)))))
getDM <- function(dat, idx) {
  Mfm <- aggregate(DV ~ IV, data=dat, subset=idx, FUN=mean)
  -diff(Mfm$DV)
}

bsTind <- boot(tDf, statistic=getDM, strata=tDf$IV, R=999)
boot.ci(bsTind, conf=0.95, type=c("basic", "bca")) 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = bsTind, conf = 0.95, type = c("basic", "bca"))
## 
## Intervals : 
## Level      Basic                BCa          
## 95%   (-0.2446,  0.6625 )   (-0.1881,  0.7330 )  
## Calculations and Intervals on Original Scale

参数法

tt <- t.test(DV ~ IV, alternative="two.sided", var.equal=TRUE, data=tDf)
tt$conf.int
## [1] -0.3327106  0.7684249
## attr(,"conf.level")
## [1] 0.95