ggplot2 Intro

2026-06-22

Setup

Make the tidyverse available.

Solution

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Grammar of Graphics

A basic graph has three elements.

  1. The data, a dataframe
  2. A mapping to describe the relationships between visual attributes and variables in the dataframe.
  3. A geom to define the type of graphic.
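
As a minimal sketch of how the three elements fit together, a complete plot can be written as a single expression. It uses R's built-in iris dataframe; the choice of Petal.Length and of bins = 20 is only an illustration, not part of the exercises that follow.

```r
library(ggplot2)

# 1. data:    the built-in iris dataframe
# 2. mapping: Petal.Length goes on the x axis
# 3. geom:    draw a histogram (bins = 20 is an arbitrary illustrative choice)
p_sketch = ggplot(data = iris, mapping = aes(x = Petal.Length)) +
  geom_histogram(bins = 20)
p_sketch
```

Removing any one of the three pieces leaves ggplot2 without enough information to draw the histogram.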

We will use the built-in dataframe iris for some basic examples. Run glimpse() to look at its contents.

Solution

glimpse(iris)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

Starting

Create a very simple ggplot object, p, and try to display it. In the call to ggplot(), specify only that the data is the iris dataframe. Then display the object by typing its name.

Solution

p = ggplot(data = iris)
p

No complaint, but no results: without a mapping or a geom, there is nothing to draw.

Add a simple mapping

Create a ggplot object p2. Specify that Sepal.Length is mapped to x.

Display p2.

Solution

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))
p2

We can see that the variable Sepal.Length has been assigned to the horizontal axis, but we still don’t have a usable result. We need to add a geom to ask for some kind of graph. We’ll start with a histogram and build on p2, but we’ll keep p2 to try other geoms.

Create p2a by adding geom_histogram().

Display p2a.

Solution

p2a = p2 + geom_histogram()
p2a 
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

Density

A density plot is a good alternative to a histogram. Create p2b using geom_density().

Solution

p2b = p2 + geom_density()
p2b 

Adding a Rug

We can add on a second geom, a rug, to either of these. geom_rug() places a “bristle” where each actual value occurs.

Do this to create and display p2a_rug and p2b_rug.

Solution

p2a_rug = p2a + geom_rug()
p2a_rug
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

p2b_rug = p2b + geom_rug()
p2b_rug
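
Many flowers share the same measured Sepal.Length, so their bristles are drawn on top of each other. One possible variation, sketched here with an arbitrary illustrative alpha value, is to make the rug semi-transparent so that repeated values appear darker:

```r
library(ggplot2)

# Far fewer distinct values than rows, so many bristles coincide
length(unique(iris$Sepal.Length))
nrow(iris)

p_rug_sketch = ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_density() +
  geom_rug(alpha = 0.3)  # alpha = 0.3 is an illustrative choice
p_rug_sketch
```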

Exercise

Play with the bins argument for the histogram you just developed. Pick a value which seems good to you.

Solution

p2a = p2 + geom_histogram(bins = 20)
p2a 

p2a = p2 + geom_histogram(bins = 40)
p2a 
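
The stat_bin() message shown earlier suggests binwidth as an alternative to bins. As a sketch, the width of each bin can be set directly instead of the number of bins. The value 0.25 is an assumption chosen for illustration, not a recommendation.

```r
library(ggplot2)

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))

# binwidth is expressed in the units of the x variable;
# 0.25 is an illustrative value
p2a_width = p2 + geom_histogram(binwidth = 0.25)
p2a_width
```

Setting binwidth keeps the bins comparable if the range of the data changes, whereas bins keeps the number of bars fixed.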

Exercise

Play with the adjust parameter of the density geom with either p2b or p2b_rug to get a value you like. The default value is 1. Smaller values give you more detail while larger values give you a smoother plot.

Solution

p2b = p2 + geom_density(adjust = .5)
p2b 

p2b = p2 + geom_density(adjust = 2)
p2b 
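
To see what adjust does numerically, the sketch below uses base R's density(), on the assumption that its default bandwidth rule and kernel match the ones behind geom_density(). The adjust value multiplies the automatically chosen bandwidth, so 0.5 halves the kernel width and 2 doubles it.

```r
x = iris$Sepal.Length

bw_default = density(x)$bw                # bandwidth chosen automatically
bw_half    = density(x, adjust = 0.5)$bw  # narrower kernel, more detail
bw_double  = density(x, adjust = 2)$bw    # wider kernel, smoother curve

c(default = bw_default, half = bw_half, double = bw_double)
```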

A Relationship

We can look at the relationship between Sepal.Length and the categorical Species in a few different ways. One simple way is to include a mapping of color to Species.

Exercise

Do this for the histogram. Use an aes() in the geom. Outside of the aes(), set fill = "white".

Solution

p2a = p2 + geom_histogram(aes(color = Species),fill = "white")
p2a 
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

Exercise

Do the same for the density.

Solution

p2b = p2 + geom_density(aes(color = Species),fill = "white")
p2b 

Which of these graphs is easier to interpret?
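
One thing to keep in mind when comparing them: geom_histogram() stacks the bars of the different groups on top of each other by default, whereas geom_density() overlays the curves. As a sketch, a histogram can be made to overlay the groups as well. The fill mapping, alpha = 0.5, and bins = 20 are illustrative assumptions.

```r
library(ggplot2)

p_overlay = ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_histogram(aes(fill = Species),
                 position = "identity",  # every species' bars start at zero
                 alpha = 0.5,            # let overlapping bars show through
                 bins = 20)
p_overlay
```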

Exercise

An alternative is to create separate graphs using facet_wrap().

Do this for the histogram (p2a) or the density plot (p2b), setting ncol = 1.

Solution

p2a = p2 + geom_histogram(aes(color = Species), fill = "white") +
  facet_wrap(~Species, ncol = 1)
p2a 
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
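
A related option, sketched here as an alternative rather than as the intended solution, is facet_grid(), which also stacks one panel per species when the variable is given as rows. The panel labels then appear at the side instead of on top. The bins = 20 setting is an illustrative choice.

```r
library(ggplot2)

p_grid = ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_histogram(aes(color = Species), fill = "white", bins = 20) +
  facet_grid(rows = vars(Species))  # one row of the grid per species
p_grid
```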

Another Relationship

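Create a scatterplot of Sepal.Length against Sepal.Width. Start with a base object, p3, that holds the data and the mapping, then add geom_point() to create p3a.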

Solution

p3 = ggplot(data = iris,aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_point()
p3a

Exercise

It looks like there may be overplotting. Try geom_jitter() instead.

Solution

p3 = ggplot(data = iris,aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_jitter()
p3a
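
geom_jitter() adds a small random offset to each point, so the plot changes a little every time it is drawn. The sketch below first checks that the overplotting is real (several flowers share the same pair of measurements), then fixes both the amount of jitter and the random seed. The values 0.05 and seed = 1 are illustrative assumptions.

```r
library(ggplot2)

# Rows whose (Sepal.Width, Sepal.Length) pair already occurred in an earlier row
n_repeats = sum(duplicated(iris[, c("Sepal.Width", "Sepal.Length")]))
n_repeats

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# geom_point() with a jitter position is equivalent to geom_jitter();
# the seed makes the random offsets repeatable
p3_seeded = p3 + geom_point(
  position = position_jitter(width = 0.05, height = 0.05, seed = 1)
)
p3_seeded
```

With a seed set, the plot looks the same each time it is drawn, which helps when the figure goes into a report.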

Exercise

Let’s throw Species into the mix. Add it with a mapping to color in the aes in geom_jitter().

Solution

p3 = ggplot(data = iris,aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_jitter(aes(color = Species))
p3a

Use facet_wrap()

In addition to color, use facet_wrap().

p3a + facet_wrap(~Species)

Exercise

Do that with ncol = 1.

Solution

p3a + facet_wrap(~Species,ncol = 1)