Non-parametric Tests with R

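What is a non-parametric test?

A non-parametric test is a method that uses sample data to draw inferences about the population, such as the shape of its distribution, when the population variance (or, more generally, the population distribution) is unknown or poorly known.
The inference does not involve the parameters of the population distribution.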

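Advantages and disadvantages of non-parametric tests

Advantages
- Easy to compute and easy to learn
- Applicable when the population variance is unknown
Disadvantages
- Less of the information in the data is used
- Lower statistical power
- Higher probability of a Type II error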

Index

Wilcoxon Signed Rank Test
Wilcoxon Rank Sum Test
Kruskal-Wallis Test
Spearman’s Rank Correlation

Wilcoxon Signed Rank Test

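The sign test uses only the signs of the differences between the observations and the center specified by the null hypothesis, not their magnitudes. The Wilcoxon signed rank test also uses the ranks of the absolute differences.
Wilcoxon signed rank statistic: \[W^{+}=\sum_{i=1}^{n}u_iR_i\]
where \(R_i\) is the rank of the absolute difference for observation i, and \(u_i\) is 1 if that difference is positive and 0 otherwise.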

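Usage of wilcox.test()

wilcox.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...)
exact indicates whether an exact p-value is computed
correct indicates whether a continuity correction is applied in the normal approximation used for large samples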
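To make the roles of exact and correct concrete, the sketch below runs the same one-sample test three ways. The vector x and the value mu = 4 are hypothetical and chosen only for illustration; they do not come from the examples in this document.

```r
# Hypothetical data, for illustration only (no ties, no zero differences)
x <- c(1.8, 3.1, 4.4, 5.2, 6.9, 7.5, 9.3, 10.6)

# Exact p-value from the null distribution of the signed rank statistic
exact_test <- wilcox.test(x, mu = 4, exact = TRUE)

# Normal approximation, with and without the continuity correction
approx_cc   <- wilcox.test(x, mu = 4, exact = FALSE, correct = TRUE)
approx_nocc <- wilcox.test(x, mu = 4, exact = FALSE, correct = FALSE)

# The statistic V is the same in all three calls; only the p-value changes
c(exact       = exact_test$p.value,
  corrected   = approx_cc$p.value,
  uncorrected = approx_nocc$p.value)
```

In this sketch the continuity correction shrinks the standardized statistic slightly, so the corrected p-value is somewhat larger than the uncorrected one.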

Example 1

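A random sample of 2006 claim amounts (in yuan) for one type of insurance is: 4632, 4728, 5052, 5064, 5484, 6972, 7696, 9048, 14760, 15013, 18730, 21240, 22836, 52788, 67200. The median claim amount in 2005 was 6064 yuan. Has the median claim amount in 2006 changed from the previous year (α = 0.05)?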

 insure<-c(4632,4728,5052,5064,5484,6972,7696,9048,14760,15013,18730,21240,22836,52788,67200) 
 wilcox.test(insure,mu=6064,conf.int=TRUE)

## 
##  Wilcoxon signed rank test
## 
## data:  insure
## V = 101, p-value = 0.01807
## alternative hypothesis: true location is not equal to 6064
## 95 percent confidence interval:
##   6840 28926
## sample estimates:
## (pseudo)median 
##          13065

Since p = 0.01807 < 0.05, we reject the null hypothesis and conclude that the medians of the two years differ significantly.
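As a check on the formula for the signed rank statistic, the following sketch recomputes it by hand for the Example 1 data. The names m0, d, r, u and w_plus are introduced here for illustration only.

```r
insure <- c(4632, 4728, 5052, 5064, 5484, 6972, 7696, 9048, 14760, 15013,
            18730, 21240, 22836, 52788, 67200)
m0 <- 6064                 # hypothesized median (the 2005 value)

d <- insure - m0           # differences from the hypothesized median
r <- rank(abs(d))          # R_i: ranks of the absolute differences
u <- as.numeric(d > 0)     # u_i: 1 for a positive difference, 0 otherwise

w_plus <- sum(u * r)       # W+ = sum of u_i * R_i
w_plus                     # 101, the value reported as V by wilcox.test()
```

The five observations below 6064 carry ranks 6, 5, 4, 3 and 1, so the positive ranks sum to 120 - 19 = 101.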

Wilcoxon Rank Sum Test

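When the population distribution is unknown, comparing the means of two samples with a t test is risky, so the Wilcoxon rank sum test is considered instead.
It tests the relationship between the location parameters of the two populations.
H0: Mx = My (M denotes the median of the population from which each sample is drawn)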

Example 2

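The body weights (in g) of diabetic mice and normal mice are:
Diabetic mice:
42,44,38,52,48,46,34,44,38
Normal mice:
34,43,35,33,34,26,30,31,31,27,28,27,30,37,32
Test whether the body weights of the two groups differ significantly (α = 0.05).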

 normal<-c(34,43,35,33,34,26,30,31,31,27,28,27,30,37,32)
 diabetes<-c(42,44,38,52,48,46,34,44,38)
 wilcox.test(diabetes,normal,exact=FALSE,correct=FALSE)

## 
##  Wilcoxon rank sum test
## 
## data:  diabetes and normal
## W = 128, p-value = 0.0003008
## alternative hypothesis: true location shift is not equal to 0

Since p = 0.0003008 < 0.05, we reject the null hypothesis and conclude that the body weights of the two groups differ significantly.
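The statistic W in the output above can be reproduced from the pooled ranks. The sketch below assumes the convention R uses: the rank sum of the first sample minus its smallest possible value, n1(n1 + 1)/2. The names pooled_ranks, n1, rank_sum and w are illustrative.

```r
normal   <- c(34, 43, 35, 33, 34, 26, 30, 31, 31, 27, 28, 27, 30, 37, 32)
diabetes <- c(42, 44, 38, 52, 48, 46, 34, 44, 38)

# Rank all 24 weights together; tied values receive average ranks
pooled_ranks <- rank(c(diabetes, normal))
n1 <- length(diabetes)

rank_sum <- sum(pooled_ranks[seq_len(n1)])  # rank sum of the diabetic group
w <- rank_sum - n1 * (n1 + 1) / 2           # statistic in the form R reports
w                                           # 128
```

Because the data contain ties, an exact p-value is not available, which is why the call in Example 2 sets exact = FALSE.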

Kruskal-Wallis Test

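Test statistic
No ties: \[H=H^*=\frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_{i}^2}{n_i}-3(N+1)\]
Ties: \[H=\frac{H^*}{1-\frac{\sum_{j=1}^{g}(t_{j}^3-t_j)}{N^3-N}}\]
Here N is the total sample size, k the number of groups, \(n_i\) the size and \(R_i\) the rank sum of group i, g the number of sets of tied values, and \(t_j\) the number of observations in the j-th set.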

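Usage of kruskal.test()

kruskal.test(x, g, ...)
x is a numeric vector or a list, and g is a factor that assigns the elements of x to groups; when x is a list, g can be omitted
H0: θ1 = θ2 = θ3 = … = θk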
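The two calling conventions can be compared side by side. In the sketch below, values and group are hypothetical data invented for illustration; split() turns the vector and factor into the list form.

```r
# Hypothetical measurements for three groups (illustration only)
values <- c(5.1, 6.3, 7.0, 4.2, 3.9, 5.6, 8.4, 7.7, 9.1)
group  <- factor(rep(c("g1", "g2", "g3"), each = 3))

by_factor <- kruskal.test(values, group)         # vector plus grouping factor
by_list   <- kruskal.test(split(values, group))  # list form, g omitted

# Both forms describe the same grouping, so the statistics agree
c(by_factor$statistic, by_list$statistic)
```

The list form is convenient when the groups are already stored separately; the factor form suits data kept in long format.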

Example 3

Group
A 6.4 6.8 7.2 8.3 8.4 9.1 9.4 9.7
B 2.5 3.7 4.9 5.4 5.9 8.1 8.2
C 1.3 4.1 4.9 5.2 5.5 8.2
Do the qualities of these three wines differ?

 x <- list ( a = c(6.4,6.8,7.2,8.3,8.4,9.1,9.4,9.7)
            ,b = c(2.5,3.7,4.9,5.4,5.9,8.1,8.2)
            ,c = c(1.3,4.1,4.9,5.2,5.5,8.2))
 kruskal.test(x)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  x
## Kruskal-Wallis chi-squared = 9.8491, df = 2, p-value = 0.007266

Since p = 0.007266 < 0.05, we reject the null hypothesis and conclude that the three wines are not all of the same quality.
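The reported chi-squared value can be recovered from the test statistic formulas given earlier. The sketch below is a hand computation on the Example 3 data; all names other than x (values, group, r, R_i, n_i, t_j, h_star, h) are introduced for illustration.

```r
x <- list(a = c(6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7),
          b = c(2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2),
          c = c(1.3, 4.1, 4.9, 5.2, 5.5, 8.2))

values <- unlist(x)
group  <- rep(names(x), lengths(x))
N <- length(values)

r   <- rank(values)              # pooled ranks, ties get average ranks
R_i <- tapply(r, group, sum)     # rank sum of each group
n_i <- tapply(r, group, length)  # size of each group

# Statistic without the tie correction
h_star <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Tie correction: t_j is the number of observations sharing each value
# (untied values have t_j = 1 and contribute nothing)
t_j <- as.vector(table(values))
h <- h_star / (1 - sum(t_j^3 - t_j) / (N^3 - N))
h
```

The data contain two pairs of tied values (4.9 and 8.2), so the correction divides by 1 - 12/9240 and the result agrees with the Kruskal-Wallis chi-squared of 9.8491 shown above.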

Spearman’s Rank Correlation

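When each sampled unit has two measurements and you want to test whether those two measurements are related, you can use Spearman's rank correlation.
It is suited to ordinal (ranked) and semi-quantitative data.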

Usage

The function is cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
To obtain the rank correlation coefficient, set the method argument to "spearman".

Example 4

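Using the data below on the prevalence of endemic goiter (morb) in a province and the iodine content of local food and water, compute the rank correlation coefficient and describe the relationship between the two.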

Table 6-1 Prevalence (%) of endemic goiter in a region and the iodine content of its food and water

Survey site	Iodine content (I)	Prevalence (morb)	Rank of iodine content	Rank of prevalence	d	d²
1	201	0.2	1	7	-6	36
2	178	0.6	2	6	-4	16
3	155	1.1	3	4	-1	1
4	154	0.8	4	5	-1	1
5	126	2.5	5	3	2	4
6	81	4.4	6	2	4	16
7	71	16.9	7	1	6	36

 I <- c(201,178,155,154,126,81,71)
 morb <-c(0.2,0.6,1.1,0.8,2.5,4.4,16.9)
 cor(I,morb,method="spearman")

## [1] -0.9642857

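The absolute value of this coefficient, |rs| = 0.964, exceeds the critical value r0.01 = 0.929 in the table of critical values for the rank correlation coefficient rs, so P < 0.01. Goiter prevalence is therefore negatively correlated with the iodine content of local food and water: the higher the iodine content, the lower the prevalence.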
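The d and d² columns of Table 6-1 lead to the same coefficient through the usual no-ties formula rs = 1 - 6Σd²/(n(n² - 1)), which is assumed here because the data contain no tied values. The sketch also calls cor.test(), which is not used in the original example, as one way to obtain a p-value without a table of critical values.

```r
I    <- c(201, 178, 155, 154, 126, 81, 71)
morb <- c(0.2, 0.6, 1.1, 0.8, 2.5, 4.4, 16.9)

n <- length(I)

# rank() ranks in ascending order while Table 6-1 ranks in descending order;
# this flips the sign of each d but leaves d^2 unchanged
d  <- rank(I) - rank(morb)
rs <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs                                   # matches cor(I, morb, method = "spearman")

# Test of H0: no rank correlation (not part of the original example)
spearman_test <- cor.test(I, morb, method = "spearman")
spearman_test$p.value
```

The sum of squared differences is 110, as in the table, and the resulting p-value falls below 0.01, consistent with the conclusion drawn from the critical value.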

## R version 3.3.3 (2017-03-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## locale:
##  [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
##  [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
##  [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5      magrittr_1.5         rprojroot_1.2       
##  [4] tools_3.3.3          htmltools_0.3.6      yaml_2.1.14         
##  [7] Rcpp_0.12.12         stringi_1.1.5        rmarkdown_1.6.0.9001
## [10] knitr_1.16           stringr_1.2.0        digest_0.6.12       
## [13] evaluate_0.10.1

shuchengren

2017-09-04