The Idea

The ggplot2 package, developed by Hadley Wickham, is widely trusted and regularly updated, which makes it an excellent addition to your data science toolbox. The ggplot2 website tells us that “ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.”

A good reference for this material is the data visualisation chapter in R for Data Science. A cheat sheet is also available, as is a very helpful Graph Gallery. A sheet that helps with the design elements outside of the data is here. Finally, the Top 50 ggplot visualizations is also a good site to visit.

ggplot2

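library(tidyverse)  # attaches ggplot2 along with the other core tidyverse packages
library(ggplot2)    # redundant after library(tidyverse), but harmless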

In most cases, we begin with a ggplot() command and supply it a dataset and aesthetic mapping. After that, we add layers such as a geom_point() or a geom_histogram(), scales such as scale_colour_brewer(), faceting specifications like facet_wrap(), and coordinate systems such as coord_flip(). Here’s an example using the mpg dataset.

dat <- mpg
head(dat)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~
ggplot(data = dat) +
    geom_point(mapping = aes(x = displ, y = hwy))

The plot quickly shows a negative relationship between engine size (displ) and fuel efficiency (hwy).

With ggplot2, you begin a plot with the ggplot() function. This creates a coordinate system to which you can add layers. The first argument of ggplot() is the dataset to use in the graph. So ggplot(data = MyDataset) creates an empty graph.

ggplot(data = dat)

To complete the graph, add some layers. The function geom_point() adds a layer of points to your plot, which creates a scatterplot. ggplot2 comes with many geom functions that each add a different type of layer to a plot.

Each geom function in ggplot2 takes a mapping argument. This defines how variables in your dataset are mapped to visual properties. The mapping argument is paired with aes(), and the x and y arguments of aes() specify which variables to map to the x and y axes. ggplot2 looks for the mapped variables in the data argument, in this case dat (our copy of mpg).

In essence, a piece of code such as this will do the trick.

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
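As a hedged illustration, the template might be filled in as follows, with the optional scale, facet and coordinate components mentioned earlier added on. The particular choices here (colouring by drv, the "Set1" palette, faceting by year) are assumptions made for the example, not recommendations.

```r
library(ggplot2)

dat <- mpg  # example dataset shipped with ggplot2

p <- ggplot(data = dat) +                                         # <DATA>
  geom_point(mapping = aes(x = displ, y = hwy, colour = drv)) +   # <GEOM_FUNCTION> and <MAPPINGS>
  scale_colour_brewer(palette = "Set1") +                         # optional: colour scale
  facet_wrap(~ year) +                                            # optional: one panel per model year
  coord_flip()                                                    # optional: swap the x and y axes

p
```

Only the first two lines are required; each further component is added with + in the same way.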

Scaling and Aesthetics

You can display additional variables by adding further aesthetics to the aes() call. Here, we map colour to the class of car. ggplot2 assigns a unique level of the aesthetic (here, a unique colour) to each value of the variable, a process known as scaling, and adds a legend that explains which levels correspond to which values. The same can be done with shape, shade (alpha) or point size. However, we warn of two things: 1. By default, ggplot2 uses only 6 shapes at a time, so if there are more than 6 classes, the remaining classes will go unplotted. 2. It is not a good idea to map size or shade to an unordered categorical variable, as those indicate a hierarchy that doesn’t actually exist.

ggplot(data = dat) +
    geom_point(mapping = aes(x = displ, y = hwy, colour = class))
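The first warning can be checked directly. The sketch below assumes the class variable of mpg, which has seven distinct values, and uses scale_shape_manual() to supply seven shapes by hand; the shape codes 1 to 7 are an arbitrary choice for illustration.

```r
library(ggplot2)

dat <- mpg

# class has seven distinct values, one more than the default shape palette handles
n_classes <- length(unique(dat$class))

# When drawn with the default shape scale, ggplot2 warns and leaves one class without a shape
p_default <- ggplot(data = dat) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

# Supplying the shapes manually gives every class a symbol
p_manual <- p_default +
  scale_shape_manual(values = seq_len(n_classes))

p_manual
```

Even with all seven shapes present, the symbols can be hard to tell apart, so colour or facets are often the more readable choice for this many classes.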

Of course, an aesthetic need not correspond to a variable. You can also set it to a fixed value by passing it as an argument of the geom function, outside aes().

ggplot(data = dat) +
    geom_point(mapping = aes(x = displ, y = hwy), colour = "blue", shape = 2)
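A common slip is to put the fixed value inside aes(). The sketch below contrasts the two placements; the object names are only for illustration.

```r
library(ggplot2)

dat <- mpg

# Outside aes(): colour is set to a constant, so every point is drawn in blue
p_set <- ggplot(data = dat) +
  geom_point(mapping = aes(x = displ, y = hwy), colour = "blue")

# Inside aes(): "blue" is treated as a one-level variable, so ggplot2 picks a
# colour from its default palette and adds a legend
p_mapped <- ggplot(data = dat) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = "blue"))

# layer_data() returns the values ggplot2 computed for a layer
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

If the points come out in an unexpected colour with a legend labelled "blue", the value has most likely ended up inside aes().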

Facets

One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets. A facet is a subplot that displays one subset of the data.

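To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name. The variable that you pass to facet_wrap() should be discrete. To facet your plot on the combination of two variables, use facet_grid() instead. Its first argument is also a formula, this time with two variable names separated by ~: the first defines the rows and the second the columns, as in the example below.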

ggplot(data = dat) +
    geom_point(mapping = aes(x = displ, y = hwy)) +
    facet_grid(drv ~ cyl)
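The example above uses facet_grid() to cross two variables. For the single-variable case, a minimal facet_wrap() sketch might look like this; faceting by class and setting nrow = 2 are choices made for the example.

```r
library(ggplot2)

dat <- mpg

p_wrap <- ggplot(data = dat) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)  # one panel per value of class, arranged in two rows

p_wrap
```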

Changing the geom()

A geom is the geometric object that a plot uses to represent data. To change the geom in your plot, change the geom function that you add to ggplot(). Here are some examples.

ggplot(data = dat) + 
  geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = dat) + 
  geom_smooth(mapping = aes(x = displ, y = hwy))

ggplot(data = dat) + 
    geom_smooth(mapping = aes(x = displ, y = hwy, colour=drv))
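Other geoms follow the same pattern. As a sketch, here are a histogram and a boxplot built from the same data; the binwidth of 2 is an arbitrary choice for illustration.

```r
library(ggplot2)

dat <- mpg

# geom_histogram() needs only an x aesthetic; it counts observations per bin
p_hist <- ggplot(data = dat) +
  geom_histogram(mapping = aes(x = hwy), binwidth = 2)

# geom_boxplot() summarises a continuous variable within each category
p_box <- ggplot(data = dat) +
  geom_boxplot(mapping = aes(x = class, y = hwy))

p_hist
p_box
```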

Indeed, one can add multiple geoms to the same graph. Here’s an example.

ggplot(data = dat) + 
    geom_smooth(mapping = aes(x = displ, y = hwy, colour=drv)) +
    geom_point(mapping = aes(x=displ, y=hwy, colour=drv))

In addition, we can simplify things by placing the aesthetic mappings shared by all of the geoms in the ggplot() command, so that we do not have to repeat them in each geom layer. Mappings given inside a geom function apply to that layer only.

ggplot(data = dat, mapping = aes(x=displ, y=hwy)) + 
    geom_point(mapping = aes(colour=class)) +
    geom_smooth()
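To see what the placement of a mapping changes, the sketch below compares a colour mapping given in one layer with the same mapping given in ggplot(). It uses drv rather than class (a choice made here so that each group has enough points for a smoother), and the object names are illustrative.

```r
library(ggplot2)

dat <- mpg

# colour mapped only in geom_point(): points are coloured, one smooth line overall
p_local <- ggplot(data = dat, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(colour = drv)) +
  geom_smooth()

# colour mapped in ggplot(): both layers inherit it, one smooth line per drv value
p_global <- ggplot(data = dat, mapping = aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  geom_smooth()

# Number of groups the smoothing layer (layer 2) was fitted to
length(unique(layer_data(p_local, 2)$group))
length(unique(layer_data(p_global, 2)$group))
```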

ggplot2 may seem complicated; however, the bones of it are quite straightforward. Please see the link to my interactive ggplot graph guide.

Want to know more?

The R Graphics Cookbook describes the theoretical underpinnings of ggplot2 and shows how all of the pieces fit together. However, it is quite complicated and is best suited to those who are already fairly proficient with ggplot2. The CRAN documentation can also be useful; however, it runs to nearly 300 pages and, unless you know what you are looking for, it is difficult to use.

Citations

“Cookbook for R.” Accessed June 30, 2021. Available here.

“Create Elegant Data Visualisations Using the Grammar of Graphics.” Accessed June 30, 2021. Available here.

“R for Data Science.” Accessed June 30, 2021. Available here.

Wickham, Hadley et al. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics (version 3.3.5), 2021. Available here.