Introduction to ggplot2

Learning Objectives

  • Understand the code structure for ggplot plots

  • Understand layers of ggplot objects

  • Add labels and captions to ggplot objects

  • Understand aesthetics of ggplot objects

  • Know the difference between mapping & setting aesthetics

Let’s start by loading the packages we need.

library(palmerpenguins)
library(tidyverse)
library(knitr)

Let’s load the Palmer Penguins data set and explore some high-level characteristics of the data set.

data(penguins, package = "palmerpenguins")

glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Creating our first ggplot

Start by creating a blank canvas from the penguins data frame.

ggplot(data = penguins)

Map bill depth to the x-axis

ggplot(data = penguins,
       aes(x = bill_depth_mm))

Next, map bill length (mm) to the y-axis.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm))

Add a scatter plot using geom_point()

Create a scatter plot by adding a geom_point() layer, which draws one point for each two-dimensional observation. Additional layers are added to a ggplot using the + operator.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) + 
  geom_point()
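
To make the layering idea concrete, here is a minimal sketch that stores the plot in a variable at each step. The object names base_plot and scatter_plot are illustrative and not part of the original walkthrough, and reading the layers element of a plot object is assumed here only as a convenient way to count layers.

```r
library(ggplot2)
library(palmerpenguins)

# Step 1: the base plot holds the data and the aesthetic mappings, but no geoms.
base_plot <- ggplot(data = penguins,
                    aes(x = bill_depth_mm,
                        y = bill_length_mm))

# Step 2: `+` returns a new plot with one more layer; base_plot itself is unchanged.
scatter_plot <- base_plot +
  geom_point()

# Compare the number of layers stored on each object.
length(base_plot$layers)
length(scatter_plot$layers)
```

Because each step is an ordinary R object, the same base plot can be reused with different geoms.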

Color the points using a variable in the data set

Let’s color the points in our scatter plot by penguin species (Adelie, Chinstrap, or Gentoo), reproducing the plot below.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) + 
  geom_point()

Add labs() to customize the title and subtitle

Next, add another layer to our ggplot to customize the title and subtitle using the labs() function to reproduce the plot below.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) + 
  geom_point() + 
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins")

Also customize the axis labels using the x and y arguments of the labs() function.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) + 
  geom_point() + 
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "species")

Legend title

We can also customize the legend title using the color argument of labs(), and add a caption to the plot using its caption argument. Change the legend title to “Species” instead of “species”, and set the plot caption to match our desired plot.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) + 
  geom_point() + 
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "ggplot by @Anangwe")

Change color to accommodate color blindness

Lastly, use a discrete color scale that is inclusive of viewers with common forms of color blindness by adding a scale_color_viridis_d() layer to the plot, creating the final plot below.

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) + 
  geom_point() + 
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "ggplot by @Anangwe") + 
  # The color scale is its own layer, added with +, not an argument to labs()
  scale_color_viridis_d()
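
As a hedged aside, the viridis scale is added with + like any other layer, and its option argument selects another palette from the same family. The choice of "plasma" below is purely illustrative and does not come from the original exercise; the object name plasma_plot is likewise hypothetical.

```r
library(ggplot2)
library(palmerpenguins)

# The color scale is added with `+`, alongside the geom.
plasma_plot <- ggplot(data = penguins,
                      aes(x = bill_depth_mm,
                          y = bill_length_mm,
                          color = species)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma")  # illustrative palette choice

# layer_data() returns the values ggplot2 computed for the first layer,
# including the hex color assigned to each point.
unique(layer_data(plasma_plot)$colour)
```

Each species receives one color from the chosen palette, so three distinct hex codes are listed.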

Mapping and setting aesthetics

ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species,       # mapped: varies with a variable
           size = body_mass_g)) + # mapped: varies with a variable
  geom_point(alpha = 0.50) +      # set: one constant value for all points
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "ggplot by @Anangwe") + 
  scale_color_viridis_d()
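
The difference between mapping and setting can be sketched side by side. A mapped aesthetic goes inside aes() and takes its values from a variable; a set aesthetic is a constant passed directly to the geom. The names mapped_plot and fixed_plot and the color "steelblue" are illustrative assumptions, not part of the original exercise.

```r
library(ggplot2)
library(palmerpenguins)

# Mapping: color is inside aes(), so it varies with the species variable.
mapped_plot <- ggplot(data = penguins,
                      aes(x = bill_depth_mm,
                          y = bill_length_mm,
                          color = species)) +
  geom_point()

# Setting: color and alpha are constants given to the geom, outside aes().
fixed_plot <- ggplot(data = penguins,
                     aes(x = bill_depth_mm,
                         y = bill_length_mm)) +
  geom_point(color = "steelblue", alpha = 0.50)

# Inspect the colors ggplot2 computed for the points in each plot.
unique(layer_data(mapped_plot)$colour)
unique(layer_data(fixed_plot)$colour)
```

The mapped plot yields one color per species and a legend, while the fixed plot uses a single color and needs no legend.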

Missing values

To see missing values, we can use the skim() function from the skimr package. Its output renders best in HTML.

library(skimr)
skim(penguins)
Data summary

Name                penguins
Number of rows      344
Number of columns   8

Column type frequency:
  factor            3
  numeric           5

Group variables     None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable      n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_length_mm             2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_depth_mm              2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_length_mm          2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass_g                2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                       0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇
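
If skimr is not available, the n_missing column can be reproduced with base R. This is a minimal sketch; the names n_missing and complete_bills are illustrative.

```r
library(palmerpenguins)

# Count the missing values in each column.
n_missing <- colSums(is.na(penguins))
n_missing

# Keep only the rows where both bill measurements are present,
# which are the rows a bill depth vs. bill length scatter plot can draw.
complete_bills <- penguins[!is.na(penguins$bill_depth_mm) &
                             !is.na(penguins$bill_length_mm), ]
nrow(complete_bills)
```

Rows with a missing x or y value are the ones geom_point() drops, with a warning, when the scatter plots above are drawn.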

Create a histogram

Reproduce the histogram of penguin body masses displayed below using ggplot2.

ggplot(data = penguins,
       aes(x = body_mass_g)) +
  geom_histogram()
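
By default geom_histogram() uses 30 bins and prints a message suggesting a better value. A hedged variation sets the bin width explicitly; the 250 g width and the name mass_hist are illustrative assumptions rather than values from the exercise.

```r
library(ggplot2)
library(palmerpenguins)

# binwidth is in the units of the x variable (grams here).
# na.rm = TRUE silently drops the rows with missing body mass.
mass_hist <- ggplot(data = penguins,
                    aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250, na.rm = TRUE)

# Inspect the computed bins: each row is one bar of the histogram.
bins <- layer_data(mass_hist)
unique(bins$xmax - bins$xmin)  # width of every bin
sum(bins$count)                # number of penguins counted
```

Choosing binwidth in the variable's own units usually makes the bars easier to interpret than an arbitrary bin count.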

Create a box plot

Reproduce the box plot of the penguin body masses displayed below using ggplot2. Hint: to suppress the y-axis text and ticks, add a theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) layer to the plot.

ggplot(data = penguins,
       aes(x = body_mass_g,
           y = species,
           color = species)) +
  geom_boxplot() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
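
The line inside each box marks the median of that group. As a rough check on the plot, the medians can be computed directly; the name mass_medians is illustrative.

```r
library(palmerpenguins)

# Median body mass per species. The formula interface of aggregate()
# drops rows with a missing body mass by default.
mass_medians <- aggregate(body_mass_g ~ species,
                          data = penguins,
                          FUN = median)
mass_medians
```

Each value should line up with the middle line of the corresponding box along the x-axis.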