Getting Started

1. Introduction

1.1 Welcome to ggplot2

  • ggplot2 is an R package for producing statistical, or data, graphics. Unlike most other graphics packages, ggplot2 has an underlying grammar, based on the Grammar of Graphics, that allows you to compose graphs by combining independent components.

  • ggplot2 is designed to work iteratively. You start with a layer that shows the raw data. Then you add layers of annotations and statistical summaries.
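The iterative workflow described above can be sketched as follows. This is a minimal illustration, assuming the mpg data set bundled with ggplot2; the choice of variables, the linear fit, and the annotation text are assumptions made for the example, not something the text prescribes.

```r
library(ggplot2)

# Step 1: start with a layer that shows the raw data.
p_raw <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Step 2: add a statistical summary on top (a linear fit, chosen for illustration).
p_summary <- p_raw +
  geom_smooth(method = "lm", formula = y ~ x)

# Step 3: add an annotation layer (position and label are made up).
p_final <- p_summary +
  annotate("text", x = 5.5, y = 40, label = "Larger engines, lower mileage")
```

Each intermediate object is a complete plot, so you can print p_raw, p_summary, or p_final at any stage to check the result before adding more.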

1.2 What is the grammar of graphics?

  • In brief, the grammar tells us that a graphic maps the data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).

  • All plots are composed of the data, the information you want to visualise, and a mapping, the description of how the data’s variables are mapped to aesthetic attributes. There are five mapping components:

      • A layer is a collection of geometric elements and statistical transformations. Geometric elements, geoms for short, represent what you actually see in the plot: points, lines, polygons, etc. Statistical transformations, stats for short, summarise the data: for example, binning and counting observations to create a histogram, or fitting a linear model.

      • Scales map values in the data space to values in the aesthetic space. This includes the use of colour, shape or size. Scales also draw the legend and axes, which make it possible to read the original data values from the plot (an inverse mapping).

      • A coord, or coordinate system, describes how data coordinates are mapped to the plane of the graphic. It also provides axes and gridlines to help read the graph. We normally use the Cartesian coordinate system, but a number of others are available, including polar coordinates and map projections.

      • A facet specifies how to break up and display subsets of data as small multiples. This is also known as conditioning or latticing/trellising.

      • A theme controls the finer points of display, like the font size and background colour. While the defaults in ggplot2 have been chosen with care, you may need to consult other references to create an attractive plot.
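To make the five components concrete, here is a hedged sketch of a single plot that names each of them explicitly. It assumes the mpg data set bundled with ggplot2; the specific palette, faceting variable, and theme are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +                            # layer: point geom with the identity stat
  scale_colour_brewer(palette = "Dark2") +  # scale: maps drv values to colours and draws the legend
  coord_cartesian() +                       # coord: the default Cartesian system, stated explicitly
  facet_wrap(~class) +                      # facet: one small panel per vehicle class
  theme_minimal(base_size = 11)             # theme: font size, background and other non-data display
```

In everyday use most of these are left at their defaults; only the data, the mappings, and at least one layer have to be supplied.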

1.5 Prerequisites

  • R packages: “colorBlindness”, “directlabels”, “dplyr”, “gameofthrones”, “ggforce”, “gghighlight”, “ggnewscale”, “ggplot2”, “ggraph”, “ggrepel”, “ggtext”, “ggthemes”, “hexbin”, “Hmisc”, “mapproj”, “maps”, “munsell”, “ozmaps”, “paletteer”, “patchwork”, “rmapshaper”, “scico”, “seriation”, “sf”, “stars”, “tidygraph”, “tidyr”, “wesanderson”
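One way to check which of these packages still need installing is sketched below. It assumes that every package is available from your configured repository, which may not hold for all of them at all times; the variable names pkgs and missing_pkgs are made up for the example.

```r
# Package names copied from the prerequisites list.
pkgs <- c(
  "colorBlindness", "directlabels", "dplyr", "gameofthrones", "ggforce",
  "gghighlight", "ggnewscale", "ggplot2", "ggraph", "ggrepel", "ggtext",
  "ggthemes", "hexbin", "Hmisc", "mapproj", "maps", "munsell", "ozmaps",
  "paletteer", "patchwork", "rmapshaper", "scico", "seriation", "sf",
  "stars", "tidygraph", "tidyr", "wesanderson"
)

# Packages that are not yet present in the current library paths.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing, and only when running interactively.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```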

2. First Steps

2.1 Introduction

  • You’ll learn the basics of ggplot() along with some useful “recipes” to make the most important plots. ggplot() allows you to make complex plots with just a few lines of code because it’s based on a rich underlying theory, the grammar of graphics.

2.2 Fuel economy data

In this chapter, we’ll mostly use one data set that’s bundled with ggplot2: mpg. It includes information about the fuel economy of popular car models in 1999 and 2008.

library(ggplot2)

mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows
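Before plotting, it can help to confirm the shape of the data. A small sketch using base R functions on the bundled mpg data set:

```r
library(ggplot2)

dim(mpg)                # 234 rows and 11 columns, matching the tibble header
names(mpg)              # the column names shown in the printout
sort(unique(mpg$year))  # the two model years covered: 1999 and 2008
```

Running ?mpg opens the help page that describes each variable.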

This dataset suggests many interesting questions:

  • How are engine size and fuel economy related?

  • Do certain manufacturers care more about fuel economy than others?

  • Has fuel economy improved in the last ten years?

2.3 Key components

Every ggplot2 plot has three key components:

  1. Data.
  2. A set of aesthetic mappings between variables in the data and visual properties.
  3. At least one layer, which describes how to render each observation.

Here’s a simple example:

ggplot(mpg, aes(x = displ, y = hwy)) + 
  geom_point() 

Pay attention to the structure of this function call: data and aesthetic mappings are supplied in ggplot(), then layers are added on with +.
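The same structure can be spelled out more explicitly. The sketch below names the data and mapping arguments of ggplot() and then builds the plot in two steps; the object names p_explicit, p_base and p_steps are made up for illustration.

```r
library(ggplot2)

# The same plot with the argument names written out.
p_explicit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Because + returns a plot object, the plot can also be built in steps.
p_base  <- ggplot(mpg, aes(x = displ, y = hwy))  # data and mappings, no layers yet
p_steps <- p_base + geom_point()                 # add the point layer
```

Printing p_base on its own shows axes but no points, since it has data and mappings but no layer to render them.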

2.4 Colour, size, shape and other aesthetic attributes

To add more variables to a plot, we can use other aesthetics like colour, shape, and size. These work in the same way as the x and y aesthetics, and are added into the call to aes():

aes(displ, hwy, colour = class)

aes(displ, hwy, shape = drv)

aes(displ, hwy, size = cyl)

There is one scale for each aesthetic mapping in a plot. The scale is also responsible for creating a guide, an axis or legend, that allows you to read the plot, converting aesthetic values back into data values.

ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()
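Because each mapped aesthetic has its own scale, the axis and legend titles can be adjusted through those scales. This is a sketch that states the scales explicitly rather than relying on the defaults; the titles are made up for the example.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  scale_x_continuous(name = "Engine displacement (litres)") +  # guide for x: an axis
  scale_y_continuous(name = "Highway miles per gallon") +      # guide for y: an axis
  scale_colour_discrete(name = "Vehicle class")                # guide for colour: a legend
```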

If you want to set an aesthetic to a fixed value, without scaling it, do so in the individual layer outside of aes(). Compare the following two plots:

ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = 'blue'))

ggplot(mpg, aes(displ, hwy)) + geom_point(colour = 'blue')
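The difference between the two plots can also be checked without looking at them, by inspecting the aesthetics ggplot2 computes for each layer. This is a small sketch using layer_data(); the object names p_mapped and p_set are made up for illustration.

```r
library(ggplot2)

# Mapped: "blue" is treated as a one-level data value and passed through the colour scale.
p_mapped <- ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = "blue"))

# Set: "blue" is used directly as the point colour, with no scale and no legend.
p_set <- ggplot(mpg, aes(displ, hwy)) + geom_point(colour = "blue")

# layer_data() returns the computed aesthetics for a layer (the first by default).
unique(layer_data(p_mapped)$colour)  # a colour picked by the default scale, not "blue"
unique(layer_data(p_set)$colour)     # the literal value "blue"
```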

Different types of aesthetic attributes work better with different types of variables. For example, colour and shape work well with categorical variables, while size works well for continuous variables.
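As a hedged illustration of this guideline, the sketch below maps a categorical variable (drv) to colour and shape, and a numeric variable (cty) to size. The choice of variables and the transparency value are assumptions made for the example.

```r
library(ggplot2)

# drv (drive train) is categorical, so it is mapped to colour and shape;
# cty (city miles per gallon) is numeric, so it is mapped to size.
p <- ggplot(mpg, aes(displ, hwy, colour = drv, shape = drv, size = cty)) +
  geom_point(alpha = 0.6)  # alpha is set, not mapped, to reduce overplotting
```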

2.5 Faceting