The ggplot2 library is a widely used visualization package that gives you fine-grained control over every element of a plot. A number of other popular packages rely on it for their built-in plotting functions. It implements a “grammar of graphics,” a layered system for describing plots that is well worth learning.
A note about accessibility:
The default colors that ggplot2 selects are not accessibility-friendly. For discrete variables, it picks evenly spaced hues around the color wheel while holding chroma and luminance constant. As a result, all of the colors have similar intensity. They become hard to tell apart when printed in gray-scale, and they may be difficult to distinguish for users with atypical color vision. There are many resources online for selecting color palettes. Here are just a few:
In this documentation we will use four palettes generated with the viridis library.
library(ggplot2)
library(viridis)
## Loading required package: viridisLite
?viridis
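# One color for each of the three hospital locations
locations.palette <- viridis(3)
# Two colors from the lighter half of inferno, in reverse order, for smoker status
smoking.palette <- inferno(2, begin = 0.5, direction = -1)
# Two colors from the middle-to-light range of mako, in reverse order
years.palette <- mako(2, begin = 0.4, end = 0.9, direction = -1)
# Four colors spanning the full plasma palette
genes.palette <- plasma(4)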
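To see why the viridis palettes print better than the defaults, one rough check is to convert each color to a gray level. The sketch below assumes that `hue_pal()` from the scales package reproduces ggplot2's default discrete colors (it is the generator behind `scale_colour_hue()`), and it approximates gray-scale printing with the Rec. 601 luma weights. The helper `to_gray` is hypothetical and not part of either package.

```r
library(scales)       # hue_pal(): generator behind ggplot2's default discrete colors
library(viridisLite)  # viridis(): the palette family used in this documentation

# Hypothetical helper: approximate gray level (0 = black, 255 = white)
# using Rec. 601 luma weights for red, green, and blue
to_gray <- function(colors) {
  rgb.values <- col2rgb(colors)  # 3 x n matrix with rows red, green, blue
  as.vector(c(0.299, 0.587, 0.114) %*% rgb.values)
}

default.colors <- hue_pal()(3)  # ggplot2's default colors for three groups
viridis.colors <- viridis(3)

# Spread between the lightest and darkest color once printed in gray
diff(range(to_gray(default.colors)))
diff(range(to_gray(viridis.colors)))
```

The three default hues land within roughly 40 gray levels of each other, while the three viridis colors run from dark to light across most of the range, so they stay distinguishable without color.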
As we go through this plotting section, we will pause several times to let you explore. Don’t limit yourself to the visualizations included here! Experiment with each element of the plots to create interesting and informative graphics.
birthweight <- read.csv("birthweight.csv")
experiment <- read.csv("experiment.csv")
The basic function of the ggplot2 library is ggplot().
?ggplot
It accepts many arguments and options, but a typical call needs only two: an object (usually a data frame) containing the data, and a set of “aesthetic mappings,” created with aes(), that tell ggplot2 which variables to use for the axes, colors, and other graphical elements of the plot.
ggplot(data = experiment, mapping = aes(x = birthweight))
On its own, this produces an empty plot. The ggplot() function by itself creates the blank canvas on which the plot will be drawn. The plot elements are added to this canvas in layers called “geoms.”
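The canvas-and-layers idea can be checked directly on a plot object. This is a minimal sketch that assumes a small hypothetical data frame, `toy`, in place of experiment.csv; it shows that aes() returns a named set of mappings, that ggplot() alone holds no layers, and that each added geom contributes one layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for experiment.csv
toy <- data.frame(weight = c(2.9, 3.4, 3.1, 3.7),
                  site   = c("A", "B", "A", "B"))

# aes() returns a named set of mappings from aesthetics to columns
mapping <- aes(x = weight, fill = site)
names(mapping)

# ggplot() alone stores the data and mappings but has no layers yet
canvas <- ggplot(data = toy, mapping = mapping)
length(canvas$layers)

# Each geom added with + becomes one layer on the canvas
with.histogram <- canvas + geom_histogram(binwidth = 0.5)
length(with.histogram$layers)
```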
There are over 30 geoms in the ggplot2 library, each of which accepts a particular set of aesthetic mappings. By default, geoms inherit the mapping specified in the original ggplot() call, and additional layer-specific aesthetics may be specified within the geom itself.
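Here is a small sketch of inheritance versus layer-specific aesthetics, again using hypothetical toy data rather than experiment.csv. The color mapping is given only to geom_point(), so the points are colored by group while geom_line(), which inherits just x and y, is drawn in a single default color. layer_data() returns the data ggplot2 computed for one layer, which makes the difference visible.

```r
library(ggplot2)

# Hypothetical toy data standing in for experiment.csv
toy <- data.frame(
  weeks  = c(37, 38, 39, 40, 41, 39),
  weight = c(2.8, 3.0, 3.3, 3.5, 3.6, 3.2),
  smoker = c("yes", "no", "no", "no", "yes", "yes")
)

p <- ggplot(toy, mapping = aes(x = weeks, y = weight)) +
  geom_point(aes(color = smoker)) +  # adds color for this layer only
  geom_line()                        # inherits only the global x and y

# One color per smoker group in the point layer...
unique(layer_data(p, 1)$colour)
# ...but a single default color in the line layer
unique(layer_data(p, 2)$colour)
```

A geom can also ignore the global mapping entirely by setting `inherit.aes = FALSE` and supplying its own aes().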
Let’s start with one of the simplest geoms, the histogram. The geom_histogram() function requires, at a minimum, a value for x.
ggplot(data = experiment, mapping = aes(x = birthweight)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Setting the “binwidth” parameter within the geom_histogram() call changes the width of each bar and eliminates the message.
ggplot(data = experiment, mapping = aes(x = birthweight)) +
geom_histogram(binwidth = 1)
ggplot(data = experiment, mapping = aes(x = birthweight)) +
geom_histogram(binwidth = 0.25)
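To see what the binning actually does, layer_data() exposes the bins ggplot2 computed. This sketch assumes hypothetical toy birth weights in place of experiment.csv. It contrasts `binwidth`, which fixes the width of each bin, with `bins`, which fixes how many bins there are; supplying either one also silences the `stat_bin()` message.

```r
library(ggplot2)

# Hypothetical toy birth weights (kg) standing in for experiment.csv
toy <- data.frame(weight = c(2.6, 2.9, 3.1, 3.2, 3.3, 3.4, 3.8, 4.1))

by.width <- ggplot(toy, aes(x = weight)) + geom_histogram(binwidth = 0.5)
by.count <- ggplot(toy, aes(x = weight)) + geom_histogram(bins = 5)

# layer_data() returns the computed bins: edges (xmin, xmax) and counts
width.bins <- layer_data(by.width)
width.bins[, c("xmin", "xmax", "count")]

# Every observation lands in exactly one bin
sum(width.bins$count)
# With bins = 5, ggplot2 chooses the width so that there are five bins
nrow(layer_data(by.count))
```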
Mapping a variable to the color (for lines and points) or fill (for areas, such as bars) of a geom adds another layer of information to the plot.
ggplot(data = experiment, mapping = aes(x = birthweight, fill = location)) +
geom_histogram(binwidth = 0.25) +
scale_fill_manual(values = locations.palette)
Here the total height of each bar equals the number of births in that weight bin, and the fill shows the hospital at which each birth occurred.
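The bar heights add up this way because geom_histogram() stacks groups by default (`position = "stack"`). This sketch, using hypothetical toy data in place of experiment.csv, sums the per-location counts in each bin and compares them with an unfilled histogram built from the same binwidth.

```r
library(ggplot2)

# Hypothetical toy data standing in for experiment.csv
toy <- data.frame(
  weight   = c(2.6, 2.9, 3.1, 3.2, 3.3, 3.4, 3.8, 4.1),
  location = c("A", "B", "A", "C", "B", "A", "C", "B")
)

stacked <- layer_data(
  ggplot(toy, aes(x = weight, fill = location)) + geom_histogram(binwidth = 0.5)
)
overall <- layer_data(
  ggplot(toy, aes(x = weight)) + geom_histogram(binwidth = 0.5)
)

# Sum the per-location counts within each bin (bins are keyed by their center, x)
per.bin <- tapply(stacked$count, stacked$x, sum)

# Compare with the counts from the unfilled histogram
cbind(stacked = per.bin, overall = overall$count)
```

Setting `position = "dodge"` places the groups side by side instead, and `position = "identity"` overlaps them (usually combined with a lower `alpha`), in which case bar heights no longer add up to the total.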
A single geom does not need to convey all of the information a plot must communicate. Instead, ggplot2 lets you layer geoms together: as long as they use the same axes, several geoms may share a plot.
ggplot(experiment, mapping = aes(x = weeks.gestation,
y = birthweight,
color = smoker)) +
geom_point() +
geom_smooth(alpha = 0.2) +
labs(x = "Gestational age at birth (weeks)",
y = "Birth weight (kg)",
color = "Maternal tobacco use",
caption = "Birthweight increases with gestational age for infants born to both\nsmokers and non-smokers.") +
scale_color_manual(values = smoking.palette) +
theme_bw() +
theme(plot.caption = element_text(hjust = 0))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
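The message above appears because geom_smooth() chose a smoothing method and formula automatically. A minimal sketch, assuming hypothetical toy data with the same column names as experiment.csv, names both explicitly; `method = "lm"` swaps the loess curve for a straight-line fit per group, which is a different model and not what the plot above uses.

```r
library(ggplot2)

# Hypothetical toy data using the same column names as experiment.csv
toy <- data.frame(
  weeks.gestation = c(36, 38, 39, 41, 36, 37, 39, 40),
  birthweight     = c(2.7, 3.1, 3.3, 3.7, 2.5, 2.8, 3.0, 3.3),
  smoker          = rep(c("no", "yes"), each = 4)
)

p <- ggplot(toy, aes(x = weeks.gestation, y = birthweight, color = smoker)) +
  geom_point() +
  # Naming method and formula explicitly skips the automatic choice and its message;
  # "lm" fits a straight line per group instead of a loess curve
  geom_smooth(method = "lm", formula = y ~ x, alpha = 0.2)

smooth.data <- layer_data(p, 2)
# One fitted line per smoker group, each with a confidence band (ymin, ymax)
unique(smooth.data$group)
```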