ggplot2, by Hadley Wickham, is an excellent and flexible package for elegant data visualization in R. However, the default plots require some formatting before they are ready for publication. Furthermore, the syntax for customizing a ggplot can be opaque, which raises the barrier for researchers without advanced R programming skills.
The {ggpubr} package provides easy-to-use functions for creating and customizing ‘ggplot2’-based, publication-ready plots.
Find out more at https://rpkgs.datanovia.com/ggpubr.
# Install required package
install.packages('ggpubr')
# Load the packages
library(tidyverse)
library(gt)
library(ggpubr)
library(ggsci)
library(gridExtra)
# Load data into R
data <- read.csv("../data/pulse_data.csv")
# Explore first few rows of the data
data %>%
  head() %>%
  gt()
| Height | Weight | Age | Gender | Smokes | Alcohol | Exercise | Ran | Pulse1 | Pulse2 | BMI | BMICat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.73 | 57 | 18 | Female | No | Yes | Moderate | No | 86 | 88 | 19.04507 | Underweight |
| 1.79 | 58 | 19 | Female | No | Yes | Moderate | Yes | 82 | 150 | 18.10181 | Underweight |
| 1.67 | 62 | 18 | Female | No | Yes | High | Yes | 96 | 176 | 22.23099 | Normal |
| 1.95 | 84 | 18 | Male | No | Yes | High | No | 71 | 73 | 22.09073 | Normal |
| 1.73 | 64 | 18 | Female | No | Yes | Low | No | 90 | 88 | 21.38394 | Normal |
| 1.84 | 74 | 22 | Male | No | Yes | Low | Yes | 78 | 141 | 21.85728 | Normal |
# Check Data Structure
glimpse(data)
Rows: 108
Columns: 12
$ Height <dbl> 1.73, 1.79, 1.67, 1.95, 1.73, 1.84, 1.62, 1.69, 1.64, 1.68, 1…
$ Weight <dbl> 57, 58, 62, 84, 64, 74, 57, 55, 56, 60, 75, 58, 68, 59, 72, 1…
$ Age <int> 18, 19, 18, 18, 18, 22, 20, 18, 19, 23, 20, 19, 22, 18, 18, 2…
$ Gender <chr> "Female", "Female", "Female", "Male", "Female", "Male", "Fema…
$ Smokes <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
$ Alcohol <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
$ Exercise <chr> "Moderate", "Moderate", "High", "High", "Low", "Low", "Modera…
$ Ran <chr> "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes…
$ Pulse1 <dbl> 86, 82, 96, 71, 90, 78, 68, 71, 68, 88, 76, 74, 70, 78, 69, 7…
$ Pulse2 <dbl> 88, 150, 176, 73, 88, 141, 72, 77, 68, 150, 88, 76, 71, 82, 6…
$ BMI <dbl> 19.04507, 18.10181, 22.23099, 22.09073, 21.38394, 21.85728, 2…
$ BMICat <chr> "Underweight", "Underweight", "Normal", "Normal", "Normal", "…
4 Main Aspects of a Distribution
Shape: Overall appearance of the histogram. It can be symmetric, bell-shaped, left-skewed, right-skewed, etc.
Center: Mean or median.
Spread: How far the data spread out, measured by the range, interquartile range (IQR), standard deviation, or variance.
Outliers: Data points that fall far from the bulk of the data.
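As a rough numeric companion to the four aspects above, the following base-R sketch computes center and spread and flags outliers for a small made-up vector. The `bmi` values are hypothetical (not the pulse data), and the 1.5 × IQR cut-off for outliers is one common convention assumed here, not the only possible choice.

```r
# Hypothetical BMI values, for illustration only (not taken from pulse_data.csv)
bmi <- c(18.5, 19.0, 21.4, 21.9, 22.1, 22.2, 23.5, 24.0, 25.3, 38.0)

# Center
center <- c(mean = mean(bmi), median = median(bmi))

# Spread
spread <- c(range = diff(range(bmi)), iqr = IQR(bmi), sd = sd(bmi), var = var(bmi))

# Outliers: points more than 1.5 * IQR beyond the quartiles (a common rule of thumb)
q <- quantile(bmi, c(0.25, 0.75))
fence <- 1.5 * IQR(bmi)
outliers <- bmi[bmi < q[1] - fence | bmi > q[2] + fence]

print(center)
print(spread)
print(outliers)
```

Shape is the one aspect that is easier to judge from a plot than from a single number, which is why the histograms that follow are still needed.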
gghistogram(data, x = "BMI")
Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.
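The warning above asks for an explicit `bins` value. One possible starting point is the bin-count rules that ship with base R, `nclass.Sturges()` and `nclass.FD()`. In this sketch the vector `x` is simulated stand-in data (an assumption for illustration), and the resulting numbers are only suggestions to refine by eye.

```r
# Simulated stand-in for a numeric column such as BMI (an assumption for illustration)
set.seed(1)
x <- rnorm(108, mean = 22, sd = 3)

# Sturges' rule: ceiling(log2(n) + 1); depends only on the sample size
bins_sturges <- nclass.Sturges(x)

# Freedman-Diaconis rule: based on the IQR, so it adapts to the spread of the data
bins_fd <- nclass.FD(x)

print(c(sturges = bins_sturges, fd = bins_fd))

# Either value could then be passed on, e.g.
# gghistogram(data, x = "BMI", bins = bins_sturges)
```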
# Change the number of bins
gghistogram(data, x = "BMI", bins = 15)
# Color outlines by Gender
gghistogram(data, x = "BMI", bins = 15, color = "Gender")
# Fill by Gender
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill = "Gender")
# Add mean lines
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill = "Gender", add = "mean")
# Add rug
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill = "Gender", add = "mean", rug = TRUE)
# Add density curve
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill = "Gender", add = "mean", rug = TRUE, add_density = TRUE)
# Add palette
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill = "Gender", add = "mean", rug = TRUE, add_density = TRUE, palette = c("#00AFBB", "#E7B800"))
Density plots are another way of getting a quick idea of the distribution of each attribute.
The plots look like a smoothed histogram, with a curve drawn through the top of each bin, much as your eye tries to do when reading a histogram.
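To make the "smoothed histogram" idea concrete, here is a minimal base-R sketch using `density()` on simulated heights. The data are hypothetical, and the default Gaussian kernel and bandwidth are used purely for illustration. The estimated curve should enclose an area of roughly 1, as any density does.

```r
# Simulated heights in metres (hypothetical data, not the pulse data set)
set.seed(42)
height <- rnorm(200, mean = 1.72, sd = 0.09)

# Kernel density estimate with the default Gaussian kernel and bandwidth
d <- density(height)

# Approximate the area under the curve with a simple Riemann sum
area <- sum(d$y) * diff(d$x[1:2])
print(area)

# Base-graphics view: histogram on the density scale with the smooth curve on top
hist(height, breaks = 15, freq = FALSE, main = "Histogram with density curve")
lines(d, lwd = 2)
```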
# Create density plot
ggdensity(data, x = "Height")
# Fill by Gender
ggdensity(data, x = "Height", fill = "Gender")
# Color outlines by Gender
ggdensity(data, x = "Height", fill = "Gender", color = "Gender")
# Add rug
ggdensity(data, x = "Height", fill = "Gender", color = "Gender", rug = TRUE)
# Add median lines
ggdensity(data, x = "Height", fill = "Gender", color = "Gender", rug = TRUE, add = "median")
# Combine density plots with histogram
gghistogram(data, x = "Height", bins = 15, color = "Gender", fill = "Gender", rug = TRUE, add = "mean", add_density = TRUE)
# Q-Q plot: compare the quantiles of Weight with those of a normal distribution
ggqqplot(data, x = "Weight")
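A Q-Q plot pairs the sorted sample values with the quantiles expected under a normal distribution; points lying close to a straight line suggest approximate normality. The sketch below shows the underlying pairs with base R's `qqnorm()` on simulated weights (hypothetical data, assumed only for illustration).

```r
# Simulated weights in kg (hypothetical data, not the pulse data set)
set.seed(123)
weight <- rnorm(100, mean = 66, sd = 12)

# plot.it = FALSE returns the theoretical (x) and sample (y) quantiles without drawing
qq <- qqnorm(weight, plot.it = FALSE)

# For roughly normal data the two sets of quantiles are almost perfectly correlated
r <- cor(qq$x, qq$y)
print(r)
```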
# Overlay a normal density curve on the BMI density plot
ggdensity(data, x = "BMI", fill = "red") +
  scale_x_continuous(limits = c(-1, 50)) +
  stat_overlay_normal_density(color = "red", linetype = "dashed")
# Color by groups
ggdensity(data, "BMI", color = "Exercise") +
  stat_overlay_normal_density(aes(color = Exercise), linetype = "dashed")
# Facet by groups
ggdensity(data, "BMI", color = "Exercise", facet.by = "Exercise") +
  stat_overlay_normal_density(aes(color = Exercise), linetype = "dashed")
{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggboxplot.html
Boxplots provide a graphical picture of the five-number summary: they show the center (median) and spread (IQR and range), and they identify potential outliers.
Boxplots can hide some aspects of shape (histograms do a better job of displaying shape).
Side-by-side boxplots are useful for comparing two or more sets of observations.
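The five-number summary behind a boxplot can be computed directly in base R. The sketch below uses a small made-up `age` vector (hypothetical, not the pulse data): `fivenum()` returns Tukey's five numbers, and `boxplot.stats()` flags points beyond 1.5 × IQR from the hinges, which is its default `coef`.

```r
# Hypothetical ages, for illustration only
age <- c(18, 18, 19, 19, 20, 20, 21, 22, 23, 45)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(age)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# Points beyond 1.5 * IQR from the hinges are reported as potential outliers
stats <- boxplot.stats(age)
print(stats$out)
```

Here the single large value is flagged as a potential outlier while the median stays near the bulk of the data, which is exactly the information the box and whiskers convey.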
ggboxplot(data, x = "BMICat", y = "Age")
# Change the plot orientation: horizontal
ggboxplot(data, x = "BMICat", y = "Age", orientation = "horiz")
# Set width
ggboxplot(data, x = "BMICat", y = "Age", width = 0.8)
# Fill color
ggboxplot(data, x = "BMICat", y = "Age", width = 0.8, fill = "red")
# Color by Gender
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender")
# Add jitter
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender",
          add = "jitter")
# Vary point shape by BMI category
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender",
          add = "jitter", shape = "BMICat")
{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggviolin.html
ggviolin(data, x = "BMICat", y = "Weight")
# Change the plot orientation: horizontal
ggviolin(data, x = "BMICat", y = "Weight", orientation = "horiz")
# Add summary statistics: draw the median (0.5 quantile)
ggviolin(data, "BMICat", "Weight", add = "none",
         draw_quantiles = 0.5)
# Add box plot
ggviolin(data, x = "BMICat", y = "Weight",
         add = "boxplot")
# Color by Gender and add jittered points
ggviolin(data, x = "BMICat", y = "Weight", color = "Gender",
         add = "jitter", error.plot = "crossbar")
{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggbarplot.html
# Data: Reading Hours
df <- data.frame(days = c("D1", "D2", "D3"),
                 hours = c(4.2, 10, 10.5))
df
days hours
1 D1 4.2
2 D2 10.0
3 D3 10.5
ggbarplot(df, x = "days", y = "hours")
# Change width
ggbarplot(df, x = "days", y = "hours", width = 0.5)
# Change the plot orientation: horizontal
ggbarplot(df, x = "days", y = "hours", width = 0.5, orientation = "horiz")
# Change the default order of items
ggbarplot(df, x = "days", y = "hours", width = 0.5, orientation = "horiz", order = c("D3", "D2", "D1"))
# Change colors
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "steelblue", fill = "steelblue")
# Add label
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "steelblue", fill = "steelblue", label = TRUE, lab.pos = "in", lab.col = "white")
# Color bar outlines by day using a custom palette
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "days", fill = "steelblue", label = TRUE, lab.pos = "in", lab.col = "white", palette = c("#00AFCB", "#E7B800", "#FC4E07"))
# Color and fill bars by day using a custom palette
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "days", fill = "days", label = TRUE, lab.pos = "in", lab.col = "white", palette = c("#00AFCB", "#E7B800", "#FC4E07"))
# Data
df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50))
df
group value
1 Male 25
2 Female 25
3 Child 50
# Basic pie charts
ggpie(df, "value", label = "group")
# Change color
# Change fill color by group
# Set line color to white
# Use custom color palette
ggpie(df, "value", label = "group",
      fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"))
# Change label
# Show group names and values as labels
labs <- paste0(df$group, " (", df$value, "%)")
ggpie(df, "value", label = labs,
      fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"))
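The labels above work because the example values already sum to 100. When the input holds raw counts instead, the percentages need to be computed first. A minimal sketch, assuming a hypothetical `counts` data frame (the column names `n` and `pct` are made up for illustration):

```r
# Hypothetical raw counts that do not sum to 100
counts <- data.frame(
  group = c("Male", "Female", "Child"),
  n     = c(12, 18, 30)
)

# Convert counts to percentages of the total
counts$pct <- round(100 * counts$n / sum(counts$n), 1)

# Build "group (xx%)" labels in the same style as the labels above
pct_labs <- paste0(counts$group, " (", counts$pct, "%)")
print(pct_labs)
```

The resulting `pct_labs` vector could then be passed as the `label` argument in the same way as `labs`.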
# Change the position and font color of labels
ggpie(df, "value", label = labs,
      lab.pos = "in", lab.font = "white",
      fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"))
# Basic line plot
ggline(data, x = "BMICat", y = "Weight")
# Vary point shape and line type by Gender
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender")
# Color by Gender
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender")
# Visualize the mean of each group
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean")
# Add error bars: mean_se
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean_se")
# Change the error plot type: pointrange
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean_se", error.plot = "pointrange")
# Add jitter points and errors (mean_se)
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = c("mean_se", "jitter"))
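Assuming the usual definition of `mean_se` (the group mean plus or minus one standard error, where the standard error is sd / sqrt(n)), the plotted values can be reproduced by hand. The sketch below does this in base R on a tiny made-up data frame; `df_w` and its columns are hypothetical names used only for illustration.

```r
# Hypothetical weights (kg) in two groups, for illustration only
df_w <- data.frame(
  group  = c("A", "A", "A", "B", "B"),
  weight = c(50, 54, 58, 60, 70)
)

# Per-group mean and standard error of the mean (sd / sqrt(n))
grp_mean <- tapply(df_w$weight, df_w$group, mean)
grp_se   <- tapply(df_w$weight, df_w$group, function(v) sd(v) / sqrt(length(v)))

# Collect the mean and the lower/upper ends of the error bars
summary_tbl <- data.frame(
  group = names(grp_mean),
  mean  = as.numeric(grp_mean),
  lower = as.numeric(grp_mean - grp_se),
  upper = as.numeric(grp_mean + grp_se)
)
print(summary_tbl)
```

Checking a summary table like this against the plot is a quick way to confirm that the error bars show what the figure caption will claim.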