#install.packages("pacman")
library(pacman)
p_load(tidyverse, ggstatsplot, ggpubr, easystats, vioplot)
Data Viz for Stats Students
For this project, I’ll be comparing the data visualization capabilities of Base R graphics, ggplot2, ggpubr, easystats, and ggstatsplot. The inspiration for this project came from seeing the R Graph Gallery and the ggplot2 extensions gallery.
I’ll be using the mpg dataset from the ggplot2 package. This dataset is used extensively in the book ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen, which also inspired me to further explore data viz.
glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
$ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
$ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
$ class <chr> "compact", "compact", "compact", "compact", "compact", "c…
Distributions
Histograms
Base R histograms
By default, the Base R hist function returns a frequency histogram, in which bar heights are observation counts. You can make a density histogram instead by setting the argument freq to FALSE. Here I’ve changed the number of bins using the argument breaks.
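As a minimal sketch of how freq and breaks behave, the snippet below draws the same variable three ways. The object names (h_counts, h_density, h_exact) and the 5-mpg break points are my own choices for illustration, not part of the original analysis.

```r
library(ggplot2)  # only needed here for the mpg dataset

# Default (freq = TRUE): bar heights are observation counts
h_counts <- hist(mpg$hwy, breaks = 30, main = "Counts", xlab = "hwy")

# freq = FALSE: bar heights are densities, so the bar areas sum to 1
h_density <- hist(mpg$hwy, breaks = 30, freq = FALSE,
                  main = "Density", xlab = "hwy")

# A single number passed to breaks is only a suggestion;
# a vector of break points gives exact control over the bin edges
h_exact <- hist(mpg$hwy, breaks = seq(10, 45, by = 5),
                main = "5-mpg bins", xlab = "hwy")
```

hist returns the bin counts and densities invisibly, so the stored objects can be inspected even though the function is mostly used for its plot.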
hist(mpg$hwy, breaks = 30)
ggplot2 histograms
The default number of bins for histograms made in ggplot2 is 30, but that can be changed with the bins or binwidth argument. By default, ggplot2 histograms show observation counts rather than densities; to plot densities, map y to after_stat(density).
ggplot(mpg, aes(hwy)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
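A short sketch of the binwidth argument, plus one way to put densities rather than counts on the y axis. The 2-mpg bin width and the object names are arbitrary choices of mine, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Set the bin width directly (2 mpg per bin) instead of relying on 30 bins
p_counts <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 2)

# Map the computed density to the y axis instead of the count
p_density <- ggplot(mpg, aes(hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2)

p_counts
p_density
```

Setting binwidth (or bins) explicitly also silences the `stat_bin()` message shown above.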
ggstatsplot Histograms
The ggstatsplot::gghistostats function allows you to see both the count and frequency (proportion) of observations through dual axes on your histograms. By default, gghistostats also plots central tendency measures and provides statistical test results as subtitles. Setting the results.subtitle and centrality.plotting arguments equal to FALSE removes them.
ggstatsplot::gghistostats(mpg, hwy,
                          results.subtitle = FALSE,
                          centrality.plotting = FALSE)
ggpubr Histograms
Unlike the previous package, ggpubr::gghistogram requires you to use quotation marks around the name of your x variable. You can add a density curve to your histogram using the argument add_density = TRUE. You can also include a line of central tendency (mean or median) using the add argument.
ggpubr::gghistogram(data = mpg, x = "hwy")
Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.
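The add and add_density arguments can be combined in one call, as in this sketch. The object name p_hist and the choice of the mean are mine; passing bins explicitly is only there to avoid the warning above.

```r
library(ggplot2)  # for the mpg dataset
library(ggpubr)

p_hist <- gghistogram(
  data = mpg, x = "hwy",
  bins = 30,           # set explicitly to avoid the default-bins warning
  add = "mean",        # vertical line at the mean ("median" also works)
  add_density = TRUE   # overlay a density curve
)

p_hist
```

Because gghistogram returns a ggplot object, further ggplot2 layers and themes can be added to p_hist with +.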
Density Plots
Base R density plots
The plot function ….
plot(density(mpg$hwy))
ggplot2 density plots
ggplot(mpg, aes(hwy)) +
  geom_density()
ggpubr density plots
ggpubr::ggdensity(mpg, x = "hwy")
Summary Statistics
Boxplots
Base R boxplots
boxplot(hwy ~ drv, data = mpg)
ggplot2 boxplots
ggplot(mpg, aes(drv, hwy)) +
  geom_boxplot()
ggstatsplot boxplots
ggstatsplot::ggbetweenstats(mpg, x = drv, y = hwy,
                            plot.type = "box",
                            results.subtitle = FALSE,
                            pairwise.comparisons = FALSE,
                            centrality.plotting = FALSE)
ggpubr boxplots
ggpubr::ggboxplot(data = mpg, x = "drv", y = "hwy")
Violin Plots
Base R violin plots
Using the vioplot package, we are able to create violin plots in base R. Using the logical argument horizontal, you can choose to have either horizontal or vertical plots. You can choose whether to have a one- or two-sided violin by setting the side argument to “left”, “right”, or the default “both”. The plotCentre argument lets you show the median as either a point (the default) or a line by setting it to “points” or “line”, respectively.
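A brief sketch of the horizontal, side, and plotCentre arguments, assuming a recent vioplot version that supports the formula interface. The particular combinations are illustrative picks of mine.

```r
library(ggplot2)  # for the mpg dataset
library(vioplot)

# Horizontal violins, one per drive type
vioplot(hwy ~ drv, data = mpg, horizontal = TRUE)

# Right half of each violin only, with the median drawn as a line
vioplot(hwy ~ drv, data = mpg, side = "right", plotCentre = "line")
```

The plain two-sided, vertical default is what the next call produces.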
vioplot::vioplot(hwy ~ drv, data = mpg)
ggplot2 violin plots
ggplot(mpg, aes(drv, hwy)) +
  geom_violin()
ggstatsplot violin plots
ggstatsplot::ggbetweenstats(mpg, x = drv, y = hwy,
                            plot.type = "violin",
                            results.subtitle = FALSE,
                            pairwise.comparisons = FALSE,
                            centrality.plotting = FALSE)
ggpubr violin plots
ggpubr::ggviolin(data = mpg, x = "drv", y = "hwy")
see violin plots
ggplot(mpg, aes(drv, hwy)) +
  see::geom_violindot()
ggplot(mpg, aes(drv, hwy)) +
  see::geom_violinhalf()
Jitterplots
ggplot2 jitterplots
ggplot(mpg, aes(drv, hwy)) +
  geom_jitter()
easystats jitterplots
ggplot(mpg, aes(drv, hwy)) +
  see::geom_jitter2()
Simple Linear Regression
Scatterplot
Base R scatterplot
plot(hwy ~ displ, data = mpg)
ggplot2 scatterplot
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
ggstatsplot scatterplot
ggstatsplot::ggscatterstats(data = mpg, x = displ, y = hwy,
                            results.subtitle = FALSE,
                            marginal = FALSE,
                            conf.level = NULL)
Warning in cbind(predictor, predictor + hwid %o% c(1, -1)): number of rows of
result is not a multiple of vector length (arg 1)
Warning: Computation failed in `stat_smooth()`
Caused by error in `base::data.frame()`:
! arguments imply differing number of rows: 80, 0
By default, ggstatsplot::ggscatterstats adds marginal plots to scatterplots. Setting the argument marginal equal to FALSE removes them.
easystats scatterplot
ggplot(mpg, aes(x = displ, y = hwy)) +
  see::geom_point2()
ggpubr scatterplot
ggpubr::ggscatter(mpg, x = "displ", y = "hwy")
You can also make scatter plots with marginal plots using ggpubr::ggscatterhist and setting the parameter margin.plot to either “histogram”, “density”, or “boxplot”.
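For instance, a scatterplot with marginal boxplots might look like the sketch below. The object name p_marg and the choice of “boxplot” are mine; the other two margin.plot values slot in the same way.

```r
library(ggplot2)  # for the mpg dataset
library(ggpubr)

# Scatterplot of hwy against displ with a boxplot along each margin
p_marg <- ggscatterhist(mpg, x = "displ", y = "hwy",
                        margin.plot = "boxplot")
```

ggscatterhist draws the combined figure when called, so no separate print step should be needed here.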
Regression lines
Base R regression lines
plot(hwy ~ displ, data = mpg)
abline(lm(hwy ~ displ, data = mpg))
ggplot2 regression lines
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'
ggstatsplot regression lines
ggstatsplot::ggscatterstats(data = mpg, x = displ, y = hwy,
                            results.subtitle = FALSE,
                            marginal = FALSE)
ggpubr regression lines
ggpubr::ggscatter(mpg, x = "displ", y = "hwy",
                  add = "reg.line",
                  conf.int = TRUE)
`geom_smooth()` using formula = 'y ~ x'