Introduction

ggplot2, by Hadley Wickham, is an excellent and flexible package for elegant data visualization in R. However, the default plots require some formatting before they can be submitted for publication. Furthermore, the syntax for customizing a ggplot is opaque, which raises the level of difficulty for researchers without advanced R programming skills.

The {ggpubr} package provides easy-to-use functions for creating and customizing ggplot2-based, publication-ready plots.

Find out more at https://rpkgs.datanovia.com/ggpubr.

Why `{ggpubr}`?

The syntax is simpler than that of ggplot2.
It creates publication-ready plots with minimal code.
It can add p-values and significance levels to box plots and line plots with little extra code.
Annotating plots is straightforward.
You can easily adjust the colors and labels of a plot.

Install ggpubr in R

# Install the required package
install.packages('ggpubr')

Load the Package

# Load the package 
library(tidyverse) 
library(gt)
library(ggpubr)
library(ggsci)
library(gridExtra)

Loading and Exploring Data

# Load data into R 
data <- read.csv("../data/pulse_data.csv")

# Explore first few rows of the data
data %>% 
  head() %>% 
  gt()

Height	Weight	Age	Gender	Smokes	Alcohol	Exercise	Ran	Pulse1	Pulse2	BMI	BMICat
1.73	57	18	Female	No	Yes	Moderate	No	86	88	19.04507	Underweight
1.79	58	19	Female	No	Yes	Moderate	Yes	82	150	18.10181	Underweight
1.67	62	18	Female	No	Yes	High	Yes	96	176	22.23099	Normal
1.95	84	18	Male	No	Yes	High	No	71	73	22.09073	Normal
1.73	64	18	Female	No	Yes	Low	No	90	88	21.38394	Normal
1.84	74	22	Male	No	Yes	Low	Yes	78	141	21.85728	Normal

# Check Data Structure 
glimpse(data)

Rows: 108
Columns: 12
$ Height   <dbl> 1.73, 1.79, 1.67, 1.95, 1.73, 1.84, 1.62, 1.69, 1.64, 1.68, 1…
$ Weight   <dbl> 57, 58, 62, 84, 64, 74, 57, 55, 56, 60, 75, 58, 68, 59, 72, 1…
$ Age      <int> 18, 19, 18, 18, 18, 22, 20, 18, 19, 23, 20, 19, 22, 18, 18, 2…
$ Gender   <chr> "Female", "Female", "Female", "Male", "Female", "Male", "Fema…
$ Smokes   <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
$ Alcohol  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
$ Exercise <chr> "Moderate", "Moderate", "High", "High", "Low", "Low", "Modera…
$ Ran      <chr> "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes…
$ Pulse1   <dbl> 86, 82, 96, 71, 90, 78, 68, 71, 68, 88, 76, 74, 70, 78, 69, 7…
$ Pulse2   <dbl> 88, 150, 176, 73, 88, 141, 72, 77, 68, 150, 88, 76, 71, 82, 6…
$ BMI      <dbl> 19.04507, 18.10181, 22.23099, 22.09073, 21.38394, 21.85728, 2…
$ BMICat   <chr> "Underweight", "Underweight", "Normal", "Normal", "Normal", "…

Plot One Variable – X, Continuous

Histogram

4 Main Aspects

Shape: Overall appearance of histogram. Can be symmetric, bell-shaped, left skewed, right skewed, etc.
Center: Mean or Median
Spread: How far the data spread out. Measured by the range, interquartile range (IQR), standard deviation, or variance.
Outliers: Data points that fall far from the bulk of the data
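
The center, spread, and outlier aspects can also be checked numerically before plotting. The following is a minimal sketch in base R; the `bmi` vector is a hypothetical sample (not taken from pulse_data.csv), and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```r
# Hypothetical BMI-like sample, for illustration only
bmi <- c(18.1, 19.0, 21.4, 21.9, 22.1, 22.2, 23.5, 24.8, 26.3, 35.0)

# Center: mean and median
center <- c(mean = mean(bmi), median = median(bmi))

# Spread: range width, IQR, standard deviation, variance
spread <- c(range = diff(range(bmi)), IQR = IQR(bmi),
            sd = sd(bmi), variance = var(bmi))

# Outliers: points beyond 1.5 * IQR from the quartiles (a common convention)
q <- quantile(bmi, c(0.25, 0.75))
lower_fence <- q[[1]] - 1.5 * IQR(bmi)
upper_fence <- q[[2]] + 1.5 * IQR(bmi)
outliers <- bmi[bmi < lower_fence | bmi > upper_fence]

print(center)
print(spread)
print(outliers)
```

Shape is the one aspect that is hard to reduce to a single number, which is why the histogram itself remains useful.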

gghistogram(data, x = "BMI")

Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.

# Change the number of bins
gghistogram(data, x = "BMI", bins = 15)

# Color 
gghistogram(data, x = "BMI", bins = 15, color = "Gender")

# fill 
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill="Gender")

# Add statistics 
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill="Gender", add = "mean")

# Add rug  
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill="Gender", add = "mean", rug = TRUE)

# Add density curve
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill="Gender", add = "mean", rug = TRUE, add_density = TRUE)

# Add palette  
gghistogram(data, x = "BMI", bins = 15, color = "Gender", fill="Gender", add = "mean", rug = TRUE, add_density = TRUE, palette = c("#00AFBB", "#E7B800"))

Density Plots

Density plots are another way to get a quick idea of the distribution of each variable.
A density plot looks like a smoothed histogram: a smooth curve drawn through the tops of the bins, much as your eye does when reading a histogram.
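
To make the link between the two plot types concrete, here is a small base R sketch. It assumes a hypothetical, simulated `heights` vector rather than the tutorial data, and uses `density()` with its default Gaussian kernel, which may differ in bandwidth details from what a given plotting function chooses.

```r
set.seed(1)
# Hypothetical heights in metres, simulated for illustration
heights <- rnorm(200, mean = 1.72, sd = 0.09)

# Histogram on the density scale, computed without drawing it
h <- hist(heights, breaks = 15, plot = FALSE)

# Kernel density estimate (Gaussian kernel by default)
d <- density(heights)

# Both are estimates of the same distribution, so each has area close to 1
hist_area <- sum(h$density * diff(h$breaks))
dens_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Draw the histogram and overlay the smooth curve
plot(h, freq = FALSE, main = "Histogram with kernel density estimate")
lines(d, lwd = 2)
```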

# Create density plot 
ggdensity(data, x = "Height")

# Fill by Gender
ggdensity(data, x = "Height", fill="Gender")

# Color by a categorical variable 
ggdensity(data, x = "Height", fill="Gender", color = "Gender")

# Add rug 
ggdensity(data, x = "Height", fill="Gender", color = "Gender", rug = TRUE)

# Add statistics 
ggdensity(data, x = "Height", fill="Gender", color = "Gender", rug = TRUE, add = "median")

# Combine density plots with histogram
gghistogram(data, x = "Height", bins = 15, color = "Gender", fill="Gender", rug = TRUE, add = "mean", add_density = TRUE)

QQ Plot

A Q-Q (quantile-quantile) plot plots the quantiles of two distributions against each other. A quantile is the value below which a given fraction of the data falls; for example, the 0.25 quantile is the value below which 25% of the observations lie.
The purpose of a Q-Q plot is to check whether two sets of data come from the same distribution.
Normality is an important assumption of many statistical tests: you assume you are sampling from a normally distributed population.
The normal Q-Q plot, which compares the sample quantiles with those of a normal distribution, is one way to assess normality.

ggqqplot(data, x = "Weight")
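
What a normal Q-Q plot computes can be sketched by hand in base R. This is an illustration under assumptions, not a description of how ggqqplot() is implemented: the `weights` vector is simulated, and the plotting positions come from `ppoints()`, which is what base R's `qqnorm()` uses.

```r
set.seed(42)
# Hypothetical weights in kg, simulated for illustration
weights <- rnorm(100, mean = 66, sd = 12)

n <- length(weights)

# Sample quantiles: the sorted observations
sample_q <- sort(weights)

# Theoretical quantiles of the standard normal at matching probabilities
theoretical_q <- qnorm(ppoints(n))

# For normally distributed data the points lie close to a straight line,
# so the correlation between the two sets of quantiles is close to 1
qq_cor <- cor(theoretical_q, sample_q)

plot(theoretical_q, sample_q,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
```

Systematic curvature away from a straight line in such a plot suggests skewness or heavy tails.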

Overlay Normal Density Plot

Overlay a normal density curve (with the same mean and SD) on the density distribution of ‘x’.
This is useful for visually inspecting the degree of deviation from normality.

ggdensity(data, x = "BMI", fill = "red") +
  scale_x_continuous(limits = c(-1, 50)) +
  stat_overlay_normal_density(color = "red", linetype = "dashed")

# Color by groups
# Note: the aesthetic is mapped to the Exercise column (unquoted), not to a constant string
ggdensity(data, "BMI", color = "Exercise") +
 stat_overlay_normal_density(aes(color = Exercise), linetype = "dashed")

# Facet by groups
ggdensity(data, "BMI", color = "Exercise", facet.by = "Exercise") +
 stat_overlay_normal_density(aes(color = Exercise), linetype = "dashed")
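
The idea behind the overlay can be reproduced in base R as a rough sketch. It assumes a simulated `bmi` vector (hypothetical values) and simply evaluates a normal curve with the sample mean and SD on the same grid as the kernel density estimate; it is not meant to mirror the internals of stat_overlay_normal_density().

```r
set.seed(7)
# Hypothetical BMI values, simulated for illustration
bmi <- rnorm(300, mean = 23, sd = 3)

# Observed distribution: kernel density estimate
d <- density(bmi)

# Reference: normal density with the same mean and SD, on the same grid
normal_y <- dnorm(d$x, mean = mean(bmi), sd = sd(bmi))

# Largest vertical gap between the two curves (small when data look normal)
max_gap <- max(abs(d$y - normal_y))

plot(d, main = "Observed density vs. normal reference")
lines(d$x, normal_y, lty = 2, col = "red")
```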

Plot Two Variables - X and Y, Discrete X and Continuous Y

Boxplot

{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggboxplot.html
Boxplots give a graphical picture of the five-number summary: they show the center (median) and spread (IQR and range), and identify potential outliers.
Boxplots can hide some aspects of shape (histograms do a better job of displaying shape).
Side-by-side boxplots are useful for comparing two or more sets of observations.
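
The numbers a boxplot draws can be inspected directly in base R. The sketch below uses a hypothetical `age` vector; `fivenum()` returns Tukey's five-number summary and `boxplot.stats()` returns the whisker ends and any points flagged as outliers under the default 1.5 * IQR rule.

```r
# Hypothetical ages, for illustration only
age <- c(18, 18, 19, 19, 20, 20, 21, 22, 23, 45)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(age)

# Values used to draw the box and whiskers, plus flagged outliers
stats <- boxplot.stats(age)

print(five)
print(stats$stats)  # whisker ends stop at the most extreme non-outlying points
print(stats$out)    # points drawn individually as outliers
```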

ggboxplot(data, x = "BMICat", y = "Age")

# Change the plot orientation: horizontal
ggboxplot(data, x = "BMICat", y = "Age", orientation = "horiz")

# Set width 
ggboxplot(data, x = "BMICat", y = "Age", width = 0.8)

# Color 
ggboxplot(data, x = "BMICat", y = "Age", width = 0.8, fill="red")

# Color by Gender
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender")

# Add jitter 
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender", 
          add = "jitter")

# Add shape 
ggboxplot(data, x = "BMICat", y = "Age", color = "Gender", 
          add = "jitter", shape = "BMICat")

Violin Plots

{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggviolin.html
Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values.

ggviolin(data, x = "BMICat", y = "Weight")

# Change the plot orientation: horizontal
ggviolin(data, x = "BMICat", y = "Weight", orientation = "horiz")

# Add summary statistics
# Draw quantiles
ggviolin(data, "BMICat", "Weight", add = "none",
   draw_quantiles = 0.5)

# Add box plot
ggviolin(data, x = "BMICat", y = "Weight",
 add = "boxplot")

# Color by Gender and add jittered points
ggviolin(data, x = "BMICat", y = "Weight", color = "Gender", 
          add = "jitter", error.plot = "crossbar")

Bar Charts

{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggbarplot.html
A bar plot shows the relationship between a numeric and a categorical variable. Each level of the categorical variable is represented by a bar, and the length of the bar represents its numeric value.
Bar plots are sometimes described as a boring way to visualize information. However, they are probably the most efficient way to show this kind of data. Ordering the bars and providing good annotation are often necessary.
Use them to describe the number of observations in each category of a discrete variable.
Use them to visualize the estimated error for discrete variables.
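
When the goal is to show the number of observations per category, the counts usually have to be computed first. A minimal base R sketch follows; the `exercise` vector is hypothetical, and the resulting data frame has one row per category, which is the shape a bar plot of counts needs.

```r
# Hypothetical exercise levels, for illustration only
exercise <- c("Low", "Moderate", "High", "Moderate", "Low", "Moderate")

# Count observations per category; levels are sorted alphabetically
counts <- as.data.frame(table(exercise), responseName = "n")

print(counts)
```

A data frame like `counts` could then be passed to a bar plot function with `x = "exercise"` and `y = "n"`.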

# Data: Reading Hours 
df <- data.frame(days = c("D1", "D2", "D3"),
   hours = c(4.2, 10, 10.5))
df

  days hours
1   D1   4.2
2   D2  10.0
3   D3  10.5

ggbarplot(df, x = "days", y = "hours")

# Change width
ggbarplot(df, x = "days", y = "hours", width = 0.5)

# Change the plot orientation: horizontal
ggbarplot(df, x = "days", y = "hours", width = 0.5, orientation = "horiz")

# Change the default order of items
ggbarplot(df, x = "days", y = "hours", width = 0.5, orientation = "horiz", order = c("D3", "D2", "D1"))

# Change colors
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "steelblue",  fill = "steelblue")

# Add label 
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "steelblue",  fill = "steelblue",  label = TRUE, lab.pos = "in", lab.col = "white")

# Color bar outlines by group with a custom palette
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "days",  fill = "steelblue",  label = TRUE, lab.pos = "in", lab.col = "white",  palette = c("#00AFCB", "#E7B800", "#FC4E07"))

# Fill bars by group with a custom palette
ggbarplot(df, x = "days", y = "hours", width = 0.5, color = "days",  fill = "days",  label = TRUE, lab.pos = "in", lab.col = "white",  palette = c("#00AFCB", "#E7B800", "#FC4E07"))

Pie Charts

A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. In a pie chart, the arc length of each slice is proportional to the quantity it represents.
{ggpubr} documentation link - https://rpkgs.datanovia.com/ggpubr/reference/ggpie.html
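
The proportionality between slice size and value is simple arithmetic, sketched here in base R. The `slices` data frame and its `share`, `angle`, and `label` columns are names chosen for this illustration; the values mirror the three-group example used in this section.

```r
# Three groups with values that sum to 100
slices <- data.frame(group = c("Male", "Female", "Child"),
                     value = c(25, 25, 50))

# Share of the whole represented by each slice
slices$share <- slices$value / sum(slices$value)

# Central angle of each slice in degrees (the full circle is 360)
slices$angle <- 360 * slices$share

# Text labels combining group name and percentage
slices$label <- sprintf("%s (%.0f%%)", slices$group, 100 * slices$share)

print(slices)
```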

# Data 
df <- data.frame(
 group = c("Male", "Female", "Child"),
  value = c(25, 25, 50))
df

   group value
1   Male    25
2 Female    25
3  Child    50

# Basic pie charts
ggpie(df, "value", label = "group")

# Change color
# Change fill color by group
# Set line color to white
# Use custom color palette
ggpie(df, "value", label = "group",
      fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"))

# Change label
# Show group names and value as labels
labs <- paste0(df$group, " (", df$value, "%)")
ggpie(df, "value", label = labs,
   fill = "group", color = "white",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"))

# Change the position and font color of labels
ggpie(df, "value", label = labs,
   lab.pos = "in", lab.font = "white",
   fill = "group", color = "white",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"))

Line Plots

https://rpkgs.datanovia.com/ggpubr/reference/ggline.html

ggline(data, x = "BMICat", y = "Weight")

ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender")

ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender")

# Visualize the mean of each group
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean")

# Add error bars: mean_se
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean_se")

# Change the error plot type: pointrange
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = "mean_se", error.plot = "pointrange")

# Add jitter points and errors (mean_se)
ggline(data, x = "BMICat", y = "Weight", shape = "Gender", linetype = "Gender", color = "Gender", add = c("mean_se", "jitter"))
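
For readers wondering what "mean_se" summarizes, the sketch below computes a group mean and its standard error (SD divided by the square root of the group size) in base R. The `weight` and `bmicat` vectors are hypothetical, and the `ymin`/`ymax` column names are chosen for this illustration only.

```r
# Hypothetical weights (kg) and BMI categories, for illustration only
weight <- c(57, 58, 62, 64, 84, 74, 90, 95)
bmicat <- c("Underweight", "Underweight", "Normal", "Normal",
            "Normal", "Normal", "Overweight", "Overweight")

# Mean per group
means <- tapply(weight, bmicat, mean)

# Standard error of the mean per group: sd / sqrt(n)
ses <- tapply(weight, bmicat, function(w) sd(w) / sqrt(length(w)))

# Error bars span one standard error on either side of the mean
summary_df <- data.frame(BMICat = names(means),
                         mean = as.vector(means),
                         se = as.vector(ses))
summary_df$ymin <- summary_df$mean - summary_df$se
summary_df$ymax <- summary_df$mean + summary_df$se

print(summary_df)
```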

Brief Introduction of ggpubr - Create Easily Publication Ready Plots in R

Jubayer Hossain, Founder & Instructor, CHIRAL Bangladesh

11 May 2022
