ggplot2 1

2022-09-18

Setup

library(tidyverse)

Grammar of Graphics

A basic graph has three elements.

  1. The data, a data frame.
  2. A mapping that describes how variables in the data frame relate to visual attributes.
  3. A geom that defines the type of graphic.
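
As a rough sketch, the three elements can be combined in a single call. This uses R's built-in iris data, which the rest of this section also works with. The name p_sketch and the choice of bins = 30 are only illustrative.

```r
library(ggplot2)

p_sketch = ggplot(data = iris,                        # 1. the data
                  mapping = aes(x = Sepal.Length)) +  # 2. the mapping
  geom_histogram(bins = 30)                           # 3. the geom
p_sketch
```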

We will use the built-in data frame iris for some basic examples. Run glimpse() to look at its contents.

glimpse(iris)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

Starting

Create a very simple ggplot object and try to display it.

p = ggplot(data = iris)
p

No complaint, but no result either: with no mapping and no geom, ggplot2 draws only an empty panel.
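
One way to see why nothing is drawn is to inspect the object. This is a minimal sketch that assumes the layers and mapping components can be read with $. That works in the ggplot2 releases this was written against, but it relies on the object's internals rather than a documented interface.

```r
library(ggplot2)

p = ggplot(data = iris)
length(p$layers)    # number of geoms added so far
length(p$mapping)   # number of aesthetics mapped so far
```

Both counts should be zero at this point, which is why only an empty panel appears.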

Add a simple mapping

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))
p2

We can see that the variable Sepal.Length has been assigned to the horizontal axis, but we still don’t have a usable result. We need to add a geom to ask for some kind of graph. We’ll start with a histogram and build on p2, but we’ll keep p2 to try other geoms.

p2a = p2 + geom_histogram()
p2a 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
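
The message suggests choosing a bin width yourself. A small sketch of that, where the value 0.2 and the name p2a_bw are assumptions chosen for illustration rather than recommended settings:

```r
library(ggplot2)

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))

# binwidth is in the units of the x variable (centimetres here),
# so every bar spans 0.2 cm of Sepal.Length
p2a_bw = p2 + geom_histogram(binwidth = 0.2)
p2a_bw
```

Setting binwidth (or bins) explicitly also silences the message.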

Density

A density plot is a good alternative to a histogram.

p2b = p2 + geom_density()
p2b 

Adding a Rug

We can add on a second geom, a rug, to either of these. geom_rug() places a “bristle” where each actual value occurs.

p2a_rug = p2a + geom_rug()
p2a_rug
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p2b_rug = p2b + geom_rug()
p2b_rug
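
Because the iris measurements are rounded to one decimal place, many bristles fall on top of each other. One possible way to hint at the repeats is to make the rug semi-transparent. The alpha value and the name p2b_rug_alpha below are illustrative assumptions.

```r
library(ggplot2)

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))

# alpha is set outside aes(), so it applies to every bristle;
# values that occur several times appear darker
p2b_rug_alpha = p2 + geom_density() + geom_rug(alpha = 0.3)
p2b_rug_alpha
```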

Exercise

Play with the bins argument for the histogram you just developed. Pick a value which seems good to you.

Solution

p2a = p2 + geom_histogram(bins = 20)
p2a 

p2a = p2 + geom_histogram(bins = 40)
p2a 

Exercise

Play with the adjust parameter of the density geom to get a value you like. The default value is 1. Smaller values give you more detail while larger values give you a smoother plot.

Solution

p2b = p2 + geom_density(adjust = .5)
p2b 

p2b = p2 + geom_density(adjust = 2)
p2b 
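
To compare two values of adjust directly, one option is to draw both curves on the same panel. This is only a sketch: the dashed line type and the name p2b_compare are assumptions added for illustration.

```r
library(ggplot2)

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))

p2b_compare = p2 +
  geom_density(adjust = 0.5, linetype = "dashed") +  # more detail
  geom_density(adjust = 2)                           # smoother
p2b_compare
```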

A Relationship

We can look at the relationship between Sepal.Length and the categorical variable Species in a few different ways. One simple way is to map Species to the color aesthetic.

Exercise

Do this for the histogram. Use an aes() in the geom. Outside of the aes(), set fill = "white".

Solution

p2a = p2 + geom_histogram(aes(color = Species), fill = "white")
p2a 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
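
The solution uses both ways of giving a geom a visual attribute: mapping a variable inside aes() and setting a constant outside it. A short sketch of the difference, where the names p_set and p_map and the value bins = 30 are illustrative assumptions:

```r
library(ggplot2)

p2 = ggplot(data = iris, mapping = aes(x = Sepal.Length))

# Setting: constants outside aes() apply to every bar, and no legend is drawn
p_set = p2 + geom_histogram(fill = "white", color = "black", bins = 30)

# Mapping: a variable inside aes() gets one color per level, plus a legend
p_map = p2 + geom_histogram(aes(color = Species), fill = "white", bins = 30)

p_set
p_map
```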

Exercise

Do the same for the density.

Solution

p2b = p2 + geom_density(aes(color = Species), fill = "white")
p2b 

Which of these graphs is easier to interpret?

Exercise

An alternative is to create separate graphs using facet_wrap().

Do this for the histogram, setting ncol = 1.

Solution

p2a = p2 + geom_histogram(aes(color = Species), fill = "white") +
  facet_wrap(~Species, ncol = 1)
p2a 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Another Relationship

Create a scatterplot of Sepal.Width and Sepal.Length.

Solution

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_point()
p3a

Exercise

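It looks like there may be overplotting: the measurements are recorded to one decimal place, so several flowers can land on exactly the same point. Try geom_jitter() instead.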

Solution

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_jitter()
p3a
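
geom_jitter() adds random noise, so the plot changes slightly each time it is drawn. If a reproducible plot with a controlled amount of noise is wanted, one possible approach is geom_point() with position_jitter(). The width, height, and seed values and the name p3_jit are assumptions chosen for illustration.

```r
library(ggplot2)

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# width and height cap the noise added in each direction (in data units);
# seed makes the random offsets the same on every draw
p3_jit = p3 + geom_point(
  position = position_jitter(width = 0.03, height = 0.03, seed = 1)
)
p3_jit
```

Keeping the noise well below the 0.1 spacing of the recorded values separates stacked points without making them look like different measurements.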

Exercise

Let’s throw Species into the mix. Add it with a mapping to color in the aes in geom_jitter().

Solution

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

p3a = p3 + geom_jitter(aes(color = Species))
p3a

Use facet_wrap()

In addition to color, use facet_wrap() to give each species its own panel.

p3a + facet_wrap(~Species)

Exercise

Do that with ncol = 1.

Solution

p3a + facet_wrap(~Species, ncol = 1)