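CRAN (Comprehensive R Archive Network) already hosts more than 13,000 R packages. Put simply, ggstatsplot is a package for drawing plots that carry richer information: it produces high-quality figures without us having to spend much effort fine-tuning plot details. A typical exploratory data analysis has two parts, data visualization and statistical analysis, and ggstatsplot is designed to combine the two in a single plot.
Between-group comparisons: ggbetweenstats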
library(ggstatsplot)
## Warning: package 'ggstatsplot' was built under R version 3.6.1
## Registered S3 methods overwritten by 'broom.mixed':
## method from
## augment.lme broom
## augment.merMod broom
## glance.lme broom
## glance.merMod broom
## glance.stanreg broom
## tidy.brmsfit broom
## tidy.gamlss broom
## tidy.lme broom
## tidy.merMod broom
## tidy.rjags broom
## tidy.stanfit broom
## tidy.stanreg broom
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
library(ggplot2)
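In the type argument, "p" stands for parametric tests and "np" for nonparametric tests. In the calls below, x = mpaa is a categorical variable and y = rating is a numeric variable.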
head(movies_long)
## # A tibble: 6 x 8
## title year length budget rating votes mpaa genre
## <chr> <int> <int> <dbl> <dbl> <int> <fct> <fct>
## 1 Shawshank Redemption, The 1994 142 25 9.1 149494 R Drama
## 2 Lord of the Rings: The Ret~ 2003 251 94 9 103631 PG-13 Acti~
## 3 Lord of the Rings: The Fel~ 2001 208 93 8.8 157608 PG-13 Acti~
## 4 Lord of the Rings: The Two~ 2002 223 94 8.8 114797 PG-13 Acti~
## 5 Pulp Fiction 1994 168 8 8.8 132745 R Drama
## 6 Schindler's List 1993 195 25 8.8 97667 R Drama
ggbetweenstats(
data = movies_long,
x = mpaa, # > 2 groups
y = rating,
type = "p", # default
messages = FALSE
)
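To make the "p" versus "np" distinction concrete, here is a minimal base R sketch of the kind of test each route usually corresponds to when there are more than two groups. It uses the built-in PlantGrowth data rather than movies_long, and the mapping to Welch's ANOVA and the Kruskal-Wallis test is an assumption about what ggbetweenstats does internally; the exact tests can vary with the package version and arguments.

```r
# Built-in PlantGrowth data: weight (numeric) measured in 3 groups (factor)
data(PlantGrowth)

# Parametric route: one-way ANOVA that does not assume equal variances (Welch)
res_p <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# Nonparametric route: Kruskal-Wallis rank-sum test
res_np <- kruskal.test(weight ~ group, data = PlantGrowth)

# Compare the p-values from the two routes
print(res_p$p.value)
print(res_np$p.value)
```

Both calls return an htest object, so the statistic, degrees of freedom and p-value can be pulled out the same way for either route.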
> Plot with default arguments
ggbetweenstats(
data = movies_long,
x = mpaa,
y = rating
)
## Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
##
## Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
##
## Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = 0.004
##
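The notes printed above report a Shapiro-Wilk normality test and a Bartlett test for homogeneity of variances. Small p-values in both, as in this output, suggest that normality and equal variances are doubtful, which is one reason to consider the nonparametric or robust types. As a rough sketch, the same two checks can be run directly in base R; the example uses the built-in PlantGrowth data as a stand-in, not movies_long.

```r
# Built-in PlantGrowth data: weight (numeric) by group (factor with 3 levels)
data(PlantGrowth)

# Shapiro-Wilk test: the null hypothesis is that the values are normally distributed
norm_res <- shapiro.test(PlantGrowth$weight)

# Bartlett test: the null hypothesis is that all groups share the same variance
var_res <- bartlett.test(weight ~ group, data = PlantGrowth)

print(norm_res$p.value)
print(var_res$p.value)
```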
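Pairwise comparisons: the pairwise.display argument controls which comparisons are shown. "ns" shows only the non-significant ones, "all" shows every comparison, and "s" shows only the significant ones.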
ggbetweenstats(
data = movies_long,
x = mpaa,
y = rating,
type = "np",
mean.ci = TRUE,
pairwise.comparisons = TRUE,
pairwise.display = "s",
p.adjust.method = "fdr",
messages = FALSE
)
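The p.adjust.method = "fdr" setting corrects the pairwise p-values for multiple testing. The sketch below shows what that adjustment does, using base R only. It runs pairwise t-tests on the built-in PlantGrowth data purely for illustration; these are not the tests ggbetweenstats runs with type = "np", and the 0.05 cutoff used to mimic pairwise.display = "s" is an assumption.

```r
# Built-in PlantGrowth data: 3 groups, so 3 pairwise comparisons
data(PlantGrowth)

# Pairwise t-tests, once without and once with the FDR (Benjamini-Hochberg) correction
raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "none")
fdr <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "fdr")

# The same correction applied by hand to the raw p-values
p_raw <- raw$p.value[!is.na(raw$p.value)]
p_fdr <- p.adjust(p_raw, method = "fdr")

# Keep only comparisons that stay significant after adjustment
# (similar in spirit to pairwise.display = "s"; the 0.05 cutoff is an assumption)
significant <- p_fdr < 0.05
print(data.frame(p_raw = p_raw, p_fdr = p_fdr, significant = significant))
```

Adjusted p-values are never smaller than the raw ones, so a comparison can drop out of the "significant" set after correction but cannot enter it.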
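> Adjusting colors, theme, confidence level, and outlier labels
> conf.level: confidence level; ggtheme: plot theme; palette: color palette (taken from the package named in package)
> outlier.*: tagging and labeling of points that fall outside the outlier limits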
ggbetweenstats(
data = movies_long,
x = mpaa,
y = rating,
type = "r",
conf.level = 0.99,
pairwise.comparisons = TRUE,
pairwise.annotation = "p",
outlier.tagging = TRUE,
outlier.label = title,
outlier.coef = 2,
ggtheme = hrbrthemes::theme_ipsum_tw(),
palette = "Darjeeling2",
package = "wesanderson",
messages = FALSE
)
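The outlier.coef argument scales how far a point must lie from the box before it is tagged. A minimal sketch of that kind of rule, assuming it is the usual Tukey boxplot criterion based on the interquartile range, is shown below. The helper is_tukey_outlier is hypothetical and not part of ggstatsplot, and the data vector is made up for illustration.

```r
# Hypothetical helper (not part of ggstatsplot): flag values outside
# [Q1 - coef * IQR, Q3 + coef * IQR]
is_tukey_outlier <- function(x, coef = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- coef * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Made-up example values with one obviously extreme point
x <- c(10, 12, 11, 13, 12, 11, 40)

# With coef = 2 the limits are wider than with the default 1.5
x[is_tukey_outlier(x, coef = 2)]
# → 40
```

A larger coefficient tags fewer points, which keeps the labels readable when outlier.tagging = TRUE is used on a big dataset.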
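The resulting plot looks very polished. I won't go through every argument in detail; look them up when you need them, which is also what the package author intends. For repeated-measures (within-subject) designs, ggwithinstats is used in the same way: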
ggwithinstats(
data = WRS2::WineTasting,
x = Wine, # > 2 groups
y = Taste,
pairwise.comparisons = TRUE,
pairwise.annotation = "p",
ggtheme = hrbrthemes::theme_ipsum_tw(),
ggstatsplot.layer = FALSE,
messages = FALSE
)
The code is concise, yet the plot is rich in detail.
ggscatterstats(
data = movies_long,
x = budget,
y = rating,
type = "p", # default
conf.level = 0.99,
marginal = FALSE,
messages = TRUE
)
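The statistics shown on such a scatterplot can be approximated with a plain correlation test. The sketch below assumes that type = "p" corresponds to Pearson's correlation and uses the built-in mtcars data instead of movies_long; it is meant only to show where conf.level enters the calculation.

```r
# Built-in mtcars data: car weight (wt) versus fuel efficiency (mpg)
data(mtcars)

# Pearson correlation with a 99% confidence interval
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson", conf.level = 0.99)

print(res$estimate)  # correlation coefficient
print(res$conf.int)  # 99% confidence interval for the coefficient
print(res$p.value)
```

Switching method to "spearman" gives the rank-based counterpart, which is the usual nonparametric alternative.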
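The package can draw many other kinds of plots as well, all of them very good-looking, so I won't cover them here. It really does live up to the saying that a picture is worth a thousand words.
To sum up, the package has some limitations:
- Although the plots carry a lot of information, sometimes, for example in a presentation with limited time, too much information on a figure gets in the way of a concise message
- The range of statistics it computes is fairly narrow
References: official documentation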