Introduction to ggplot2
Learning Objectives
Understand the code structure for ggplot plots
Understand layers of ggplot objects
Add labels and captions to ggplot objects
Understand aesthetics of ggplot objects
Know the difference between mapping & setting aesthetics
Let’s load the Palmer Penguins data set and explore some high-level characteristics of the data set.
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
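The overview above is the kind of summary that glimpse() prints. A minimal sketch of the loading step, assuming the data come from the palmerpenguins package and that glimpse() is taken from dplyr (neither package is named in the text above):

```r
# Assumption: the penguins data frame is provided by the palmerpenguins package
library(palmerpenguins)
# Assumption: glimpse() is loaded from dplyr
library(dplyr)

# Print one line per column: name, type, and the first few values
glimpse(penguins)
```

glimpse() lists columns down the page, which keeps wide data frames readable in a narrow console.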
Creating our first ggplot
Start by creating a blank canvas using the penguins data frame.
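This first step can be sketched as a call to ggplot() with only the data argument. The object name p_blank is illustrative, and palmerpenguins is assumed to be the source of the penguins data frame.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# Only the data is supplied: no aesthetics and no layers yet,
# so printing the object draws an empty panel
p_blank <- ggplot(data = penguins)
p_blank
```

Nothing is drawn at this stage because ggplot2 has not been told which variables to use or how to represent them.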
Next, map bill depth (mm) to the x-axis and bill length (mm) to the y-axis.
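Variables are mapped to axes inside aes(). A sketch of this step, assuming bill depth goes on the x-axis as in the later code in this tutorial (the object name p_axes is illustrative):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# aes() maps columns of the data frame to the x and y positions
p_axes <- ggplot(data = penguins,
                 aes(x = bill_depth_mm,
                     y = bill_length_mm))
p_axes
```

The axes and their scales now appear, but the panel stays empty until a geom layer is added.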
Add a scatter plot using geom_point()
Create a scatter plot, representing each two-dimensional observation with a point, by adding a geom_point() layer. Additional layers are added to a ggplot using the + operator.
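Adding the point layer could look like the sketch below. The object name p_points is illustrative, and palmerpenguins is assumed to be the source of the data.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# The + operator appends geom_point() as the first layer of the plot
p_points <- ggplot(data = penguins,
                   aes(x = bill_depth_mm,
                       y = bill_length_mm)) +
  geom_point()
p_points
```

When the plot is printed, ggplot2 drops rows with missing bill measurements and prints a warning about it.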
Color the points using a variable in the data set
Let’s color the points in our scatter plot based on the species of the penguin being Adelie, Chinstrap, or Gentoo, reproducing the plot below.
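Coloring by a variable is itself a mapping, so it goes inside aes() alongside x and y. A sketch, with an illustrative object name and palmerpenguins assumed as the data source:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# color = species assigns one color per species and adds a legend
p_color <- ggplot(data = penguins,
                  aes(x = bill_depth_mm,
                      y = bill_length_mm,
                      color = species)) +
  geom_point()
p_color
```

Because species is a factor, ggplot2 picks a discrete color scale and builds the legend automatically.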
Add labs() to customize the title and subtitle
Next, add another layer to our ggplot to customize the title and subtitle using the labs() function to reproduce the plot below.
ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point() +
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins")
Also customize the axis labels using the x and y options in the labs() function.
ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point() +
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "species")
Legend title
We can also customize the legend title using the color option, and add a caption for the overall plot using the caption option, in the labs() function. Customize the legend title to be “Species” instead of “species”, and modify the plot caption to match our overall desired plot.
ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point() +
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",   # legend title, capitalized
       caption = "ggplot by @Anangwe")
Change color to accommodate color blindness
Lastly, use a discrete color scale that is inclusive of viewers with common forms of color blindness by adding a scale_color_viridis_d() layer to the plot, creating the final plot below.
ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point() +
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "ggplot by @Anangwe") +
  # The scale is its own layer, added with +, not an argument to labs()
  scale_color_viridis_d()
Mapping and setting aesthetics
ggplot(data = penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species,        # mapped: varies with a variable
           size = body_mass_g)) +  # mapped: varies with a variable
  geom_point(alpha = 0.50) +       # set: one fixed value, so it goes outside aes()
  labs(title = "Penguins bill lengths by bill depths",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "ggplot by @Anangwe") +
  # The scale is its own layer, added with +, not an argument to labs()
  scale_color_viridis_d()
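The difference between mapping and setting an aesthetic can be sketched side by side. Mapping, inside aes(), ties an aesthetic to a variable in the data and produces a legend. Setting, as an argument to the geom outside aes(), applies one fixed value to every point. The object names and the fixed values below are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# Mapping: color and size follow variables, so they go inside aes()
p_mapped <- ggplot(data = penguins,
                   aes(x = bill_depth_mm,
                       y = bill_length_mm,
                       color = species,
                       size = body_mass_g)) +
  geom_point()

# Setting: size and alpha are constants, so they go in geom_point(), outside aes()
p_set <- ggplot(data = penguins,
                aes(x = bill_depth_mm,
                    y = bill_length_mm)) +
  geom_point(size = 2, alpha = 0.5)

p_mapped
p_set
```

A constant placed inside aes() is treated as data, so ggplot2 builds a legend for it instead of simply applying the value to the points.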
Missing values
To see missing values, we can use the skim() function from the skimr package. Its output renders best in HTML.
Name | penguins |
Number of rows | 344 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
species | 0 | 1.00 | FALSE | 3 | Ade: 152, Gen: 124, Chi: 68 |
island | 0 | 1.00 | FALSE | 3 | Bis: 168, Dre: 124, Tor: 52 |
sex | 11 | 0.97 | FALSE | 2 | mal: 168, fem: 165 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |
bill_depth_mm | 2 | 0.99 | 17.15 | 1.97 | 13.1 | 15.60 | 17.30 | 18.7 | 21.5 | ▅▅▇▇▂ |
flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |
body_mass_g | 2 | 0.99 | 4201.75 | 801.95 | 2700.0 | 3550.00 | 4050.00 | 4750.0 | 6300.0 | ▃▇▆▃▂ |
year | 0 | 1.00 | 2008.03 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.0 | 2009.0 | ▇▁▇▁▇ |
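A summary like the one above can be produced with a single call. This sketch assumes the skimr and palmerpenguins packages are installed. The base R line at the end is an added cross-check of the n_missing column, not part of the original workflow, and the name missing_counts is illustrative.

```r
library(skimr)
library(palmerpenguins)  # assumed source of the penguins data frame

# Summarize every column, grouped by variable type, including n_missing
skim(penguins)

# Base R cross-check: number of missing values per column
missing_counts <- colSums(is.na(penguins))
missing_counts
```

The cross-check is handy in a plain console, where the skim() tables can be harder to read.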
Create a histogram
Reproduce the histogram of the penguin body masses displayed below using ggplot2.
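One possible starting point for the histogram is sketched below. The target plot is not shown here, so the bin width is an assumption and can be adjusted to match it. The object name p_hist is illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# A histogram needs only an x aesthetic; the counts are computed by the geom
p_hist <- ggplot(data = penguins,
                 aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250)  # assumed bin width, in grams
p_hist
```

Setting binwidth explicitly avoids the default of 30 bins, which rarely suits the data.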
Create a box plot
Reproduce the box plot of the penguin body masses displayed below using ggplot2. Hint: to suppress the y-axis text and ticks, add a theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) layer to the plot.
ggplot(data = penguins,
       aes(x = body_mass_g,
           y = species,
           color = species)) +
  geom_boxplot() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())