editor_options: markdown: wrap: 72 output: html_document: df_print: paged —
组间差异比较
正态分布检验
方差齐性检验
## 准备工作
rm(list=ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 449586 24.1 962665 51.5 643731 34.4
## Vcells 803220 6.2 8388608 64.0 1648428 12.6
#加载R包
options(warn=0)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(car)
## 载入需要的程辑包:carData
##
## 载入程辑包:'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
library(gplots)
##
## 载入程辑包:'gplots'
## The following object is masked from 'package:stats':
##
## lowess
#install.packages("HH")
library(HH)
## 载入需要的程辑包:lattice
## 载入需要的程辑包:grid
## 载入需要的程辑包:latticeExtra
##
## 载入程辑包:'latticeExtra'
## The following object is masked from 'package:ggplot2':
##
## layer
## 载入需要的程辑包:multcomp
## 载入需要的程辑包:mvtnorm
## 载入需要的程辑包:survival
## 载入需要的程辑包:TH.data
## 载入需要的程辑包:MASS
##
## 载入程辑包:'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## 载入程辑包:'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
## 载入需要的程辑包:gridExtra
##
## 载入程辑包:'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
##
## 载入程辑包:'HH'
## The following object is masked from 'package:gplots':
##
## residplot
## The following objects are masked from 'package:car':
##
## logit, vif
## The following object is masked from 'package:purrr':
##
## transpose
#install.packages("rrcov")
library(rrcov)
## 载入需要的程辑包:robustbase
##
## 载入程辑包:'robustbase'
## The following object is masked from 'package:survival':
##
## heart
## Scalable Robust Estimators with High Breakdown Point (version 1.6-0)
#install.packages("multicomp")
#library(multicomp)
#install.packages("effects")
library(effects)
## Use the command
## lattice::trellis.par.set(effectsTheme())
## to customize lattice options for effects plots.
## See ?effectsTheme for details.
#install.packages("MASS")
library(MASS)
#install.packages("mvoutlier")
library(mvoutlier)
## 载入需要的程辑包:sgeostat
#install.packages("pwr")
library(pwr)
#install.packages("FSA")
library(FSA)
## Registered S3 methods overwritten by 'FSA':
## method from
## confint.boot car
## hist.boot car
## ## FSA v0.9.1. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
##
## 载入程辑包:'FSA'
## The following object is masked from 'package:car':
##
## bootCase
library(ggstatsplot)
## Registered S3 method overwritten by 'WRS2':
## method from
## print.ancova HH
## Registered S3 methods overwritten by 'parameters':
## method from
## as.double.parameters_kurtosis datawizard
## as.double.parameters_skewness datawizard
## as.double.parameters_smoothness datawizard
## as.numeric.parameters_kurtosis datawizard
## as.numeric.parameters_skewness datawizard
## as.numeric.parameters_smoothness datawizard
## print.parameters_distribution datawizard
## print.parameters_kurtosis datawizard
## print.parameters_skewness datawizard
## summary.parameters_kurtosis datawizard
## summary.parameters_skewness datawizard
## You can cite this package as:
## Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
## Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
options(warn = 1)
两组间比较还是多组比较?
参数还是非参数?
| 2组 | 多组 | 事后两两检验 | |
|---|---|---|---|
| 参数 | t-检验 | ANOVA | F检验;Gam es-Howell检验 |
| 非参数 | wilcoxon秩 和(MW检验) | KW检验 | Dunn检验 |
Description: Performs one and two sample t-tests on vectors of data.
适用条件:①under normality assumptions ②large samples(多大叫大?)
说明:比较的是均值,一般不是特别偏态分布就可以用
Description: Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as Mann-Whitney test.
适用条件:①不符合t-test的条件(不满足正态分布);②因变量为等级、分类变量时;③是t-test的替代品,效力不如t-test
## 载入数据
load("datt.Rdata")
head(datt)
## agegroup Stage Expression
## 1 >65 IV 4.523562
## 2 >65 IV 5.000000
## 3 <=65 IV 3.169925
## 4 <=65 IV 1.000000
## 5 <=65 IV 5.700440
## 6 >65 IV 6.209453
summary(datt)
## agegroup Stage Expression
## <=65:324 I : 19 Min. : 0.000
## >65 :175 II : 95 1st Qu.: 2.000
## NA's: 1 III :102 Median : 3.459
## IV :270 Mean : 3.822
## NA's: 14 3rd Qu.: 5.459
## Max. :12.783
dat <- datt %>% dplyr::select(agegroup,Expression) %>% na.omit()
# 参数检验:
ggbetweenstats(data=dat,x=agegroup,y=Expression,type = "p")
t.test(Expression~agegroup,data = dat)
##
## Welch Two Sample t-test
##
## data: Expression by agegroup
## t = -0.24934, df = 404.91, p-value = 0.8032
## alternative hypothesis: true difference in means between group <=65 and group >65 is not equal to 0
## 95 percent confidence interval:
## -0.4929888 0.3820058
## sample estimates:
## mean in group <=65 mean in group >65
## 3.802547 3.858039
ggbetweenstats()和t-test结果一致
ggbetweenstats()和t.test()默认方差不等,默认使用Welch’s t-test(即方差不齐,适合2组观测数量不等时使用的Student’s t-test的改良版)
可以通过var.equal=TRUE设置方差齐平
Welch’s t-test From Wikipedia: In statistics, Welch’s t-test, or unequal variances t-test, is a two-sample location test which is used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, is an adaptation of Student’s t-test, and is more reliable when the two samples have unequal variances and/or unequal sample sizes. These tests are often referred to as “unpaired” or “independent samples” t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch’s t-test has been less popular than Student’s t-test and may be less familiar to readers, a more informative name is “Welch’s unequal variances t-test” — or “unequal variances t-test” for brevity.
ggbetweenstats(data=dat,x=agegroup,y=Expression,type = "p",var.equal = TRUE)
t.test(Expression~agegroup,data = dat,var.equal=TRUE)
##
## Two Sample t-test
##
## data: Expression by agegroup
## t = -0.23837, df = 497, p-value = 0.8117
## alternative hypothesis: true difference in means between group <=65 and group >65 is not equal to 0
## 95 percent confidence interval:
## -0.5128847 0.4019017
## sample estimates:
## mean in group <=65 mean in group >65
## 3.802547 3.858039
两者结果再次一致,此时ggbetweenstats()进行的是经典的Student's t-test
但是这种检验合理吗?Student’s t-test 要求符合正态分布或大样本
方法: shapiro-wilk正态性检验
假设: H0:总体符合正态分布 VS 总体不符合正态分布
检验统计量:W
条件: P<0.05拒绝H0
# 正态分布检验
## QQ图检验:
x <- dat$Expression
#qqnorm(x)+qqline(x)
## shapiro-wilk test
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.96137, p-value = 3.639e-10
点距离线较远且p<0.05,拒绝H0(符合正态分布),即非正态
#非正态分布数据,用Levene's test
library(car)
leveneTest(Expression~agegroup,data=dat)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 5.6498 0.01784 *
## 497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pr(>F) < 0.01拒绝H0,方差不齐 所以不能用Student’s t-test
效力低于Welch’s t-test
ggbetweenstats(data=dat,x=agegroup,y=Expression,type = "n")
wilcox.test(Expression~agegroup,data = dat)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Expression by agegroup
## W = 27037, p-value = 0.3927
## alternative hypothesis: true location shift is not equal to 0
Description: Fit an analysis of variance model by a call to lm for each stratum.
适用条件:正态分布、方差相齐
适用条件:多组间比较差异是否显著+不满足ANOVA条件
参数:Games-Howell比较泛用,方差不齐也可以
非参数:Dunn检验
## 方差齐性检验
#非正态分布数据,用Levene's test
dat <- datt %>% dplyr::select(Stage,Expression) %>% na.omit()
leveneTest(Expression~Stage,data=dat)#方差齐平
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.0152 0.9974
## 482
## one-way ANOVA
ggbetweenstats(data = dat,x=Stage,y=Expression,type = "p",var.equal = TRUE)
## p=0.268
oneway.test(Expression~Stage,data = dat,var.equal = TRUE)
##
## One-way analysis of means
##
## data: Expression and Stage
## F = 1.3162, num df = 3, denom df = 482, p-value = 0.2684
结果一致,差异性用one-way ANOVA(Fisher 检验),事后组间两两比较用了Student’ t-test(此处应该是没办法改的?)
但是如果不设置方差齐平,则会默认进行Welch’s ANOVA检测
ggbetweenstats(data = dat,x=Stage,y=Expression,type = "p",var.equal = FALSE)
## p=0.310
oneway.test(Expression~Stage,data = dat,var.equal = FALSE)
##
## One-way analysis of means (not assuming equal variances)
##
## data: Expression and Stage
## F = 1.2165, num df = 3.00, denom df = 75.78, p-value = 0.3096
此时组间两两比较默认适用比较宽泛的Games-Howell test
## 非参数检验
ggbetweenstats(data=dat,x=Stage,y=Expression,type = "n")
p=0.234
#差异性检验
kruskal.test(Expression~Stage,data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: Expression by Stage
## Kruskal-Wallis chi-squared = 4.2719, df = 3, p-value = 0.2336
#组间两两比较
#perform Dunn's Test with Bonferroni correction for p-values
dunnTest(Expression ~ Stage,
data=dat,
method="holm")
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 I - II -1.7566221 0.07898224 0.4738934
## 2 I - III -1.4112275 0.15817753 0.6327101
## 3 II - III 0.6230429 0.53325631 0.5332563
## 4 I - IV -1.1029377 0.27005421 0.8101626
## 5 II - IV 1.5062327 0.13200746 0.6600373
## 6 III - IV 0.7816220 0.43443673 0.8688735
多组非参数检验适用KW秩和检验,事后组间两两比较使用Dunn test ### 总结: * ggbetweenstats 默认方差不齐,可以通过var.equal=TRUE设置,同时检验方法会改变
决定ggbetweenstats的统计方法的参数有:
type(参数检验还是秩和检验,还可以选robust——稳健性检验;bayes——完全是另一个学派);
var.equal方差是否相齐: 两组间比较时:Student-t | Welch’s-t; 方差分析: Fisher(经典单因素方差分析) | Welch’s ANOVA; 事后组间比较:Games-Well | Student‘s t-test
p.adjust.method:P值矫正: 默认“holm”; 多组比较推荐“fdr”
Adjustment method for p-values for multiple comparisons. Possible methods are: “holm” (default), “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”.
centrality.type中心点类型会随type变化:
“parameteric” (for mean)
“nonparametric” (for median)
“robust” (for trimmed mean)
“bayes” (for MAP estimator)
`