We’ll begin by making some very basic plots and doing some data processing with base R tools, and then show how to do the same things better, faster, and more attractively with tidyverse tools.
As always, we begin by loading the packages we will be using. If we do not load them, R will not be able to find the functions they contain (unless we use the “::” format). Right now we’re just using a few packages.
We also set up some defaults here. We want to see the commands we are
issuing in our output, because this is for learning purposes. If we were
writing a paper, we would not want to see the commands, so we would set
echo = FALSE.
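A setup chunk along these lines would do both jobs. This is only a sketch: the package list (ggplot2 and gapminder) is an assumption inferred from the functions and data used later in this section, and the real chunk may load more.

```r
# Load the packages used in this section (assumed list, inferred from the
# code that follows).
library(ggplot2)    # ggplot(), qplot(), ggsave()
library(gapminder)  # the gapminder dataset

# Show the code of every chunk in the knitted output.
# For a paper, echo = FALSE would hide it instead.
knitr::opts_chunk$set(echo = TRUE)

# The "::" form calls a function without attaching its package first.
first_rows <- utils::head(gapminder::gapminder, n = 3)
```

The last line works even if neither library() call has been run, as long as the packages are installed.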
ggplot2
The quickest, though not necessarily the most efficient, way to start is
by using the plot function from base R.
Notice how R Markdown manages the output when printing an entire
dataset. Printed in full at the console (as a plain data frame would
be), it would run to all 1,704 rows, which is usually undesirable. In
such cases, you might prefer using the head() function to display only
the first few rows.
head(gapminder,n=20)
## # A tibble: 20 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## 11 Afghanistan Asia       2002    42.1 25268405      727.
## 12 Afghanistan Asia       2007    43.8 31889923      975.
## 13 Albania     Europe     1952    55.2  1282697     1601.
## 14 Albania     Europe     1957    59.3  1476505     1942.
## 15 Albania     Europe     1962    64.8  1728137     2313.
## 16 Albania     Europe     1967    66.2  1984060     2760.
## 17 Albania     Europe     1972    67.7  2263554     3313.
## 18 Albania     Europe     1977    68.9  2509048     3533.
## 19 Albania     Europe     1982    70.4  2780097     3631.
## 20 Albania     Europe     1987    72    3075321     3739.
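head() is one of several base R helpers for checking a dataset without printing all of it. A small sketch, assuming the gapminder package is installed:

```r
library(gapminder)

dim(gapminder)           # number of rows and columns
names(gapminder)         # column names
head(gapminder)          # first 6 rows, the default when n is omitted
tail(gapminder, n = 3)   # last 3 rows
str(gapminder)           # compact summary of each column's type
```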
Let’s use base R plots now. We access individual variables from the main data frame with the $ operator.
plot(gapminder$gdpPercap,gapminder$lifeExp)
It works, but there are several drawbacks: 1) the default settings are often odd (like using scientific notation), 2) the default axis labels are the raw expressions passed to plot, such as gapminder$gdpPercap, 3) the appearance is quite unattractive, and 4) customizing it requires remembering a complex list of commands.
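To give a sense of the fourth point, here is a sketch of what tidying up the base R scatterplot might involve. The labels, color, and log scale are illustrative choices rather than anything prescribed by the text; each argument is a standard base graphics option.

```r
library(gapminder)

plot(gapminder$gdpPercap, gapminder$lifeExp,
     log  = "x",                        # log-scale the x axis
     xlab = "GDP per capita (log scale)",
     ylab = "Life expectancy (years)",
     main = "Life expectancy vs. GDP per capita",
     pch  = 16,                         # solid circles
     cex  = 0.6,                        # smaller points
     col  = "steelblue",
     las  = 1)                          # horizontal axis tick labels
```

Each of these options has to be looked up or memorized separately, which is the pain point described above.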
Now, let’s try creating a histogram.
hist(iris$Sepal.Length, breaks = 10)
It’s adequate, but far from perfect. The default settings are subpar,
and the appearance leaves much to be desired. The default x-axis and
title even include strange $ signs.
ggplot2
Let’s quickly switch to ggplot2. We’ll begin with qplot, which allows
us to create a much more visually appealing plot compared to base R
graphics.
qplot(data=gapminder,x=gdpPercap,y=lifeExp)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
qplot(data=iris,x=Sepal.Length,geom="histogram",bins=10)
That’s the last time we’ll use qplot. It’s much prettier, but 1) it is
now officially deprecated, and 2) it doesn’t let us behold the power of
this fully operational battlestation, I mean ggplot2.
Let’s try to make 2 simple plots.
p.scatter <- ggplot(data = gapminder,
                    mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
p.hist <- ggplot(data = iris,
                 mapping = aes(x = Sepal.Length)) +
  geom_histogram(bins = 10)
p.scatter
p.hist
You’ve just created your first plot with the full capabilities of ggplot2!
You might wonder why this is preferable to qplot, which seemed to do
everything more straightforwardly.
In essence, plot and qplot are single-line imperative commands.
Modifying a plot means altering complex options within that line, and
creating a different plot type requires an entirely new command. Even
then, certain customizations are nearly impossible.
ggplot provides a comprehensive solution by enabling you to construct
visualizations logically, layer by layer. These layers can be combined
like Lego pieces, allowing you to create and customize plots with ease.
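As a rough illustration of the Lego idea, a plot object can be stored once and then combined with different layers. The object names below are made up for this sketch.

```r
library(ggplot2)

# Data and aesthetic mapping only, with no geom yet.
base <- ggplot(data = iris, mapping = aes(x = Sepal.Length))

# The same base yields different plots depending on the layer added.
p.hist10  <- base + geom_histogram(bins = 10)
p.hist30  <- base + geom_histogram(bins = 30)
p.density <- base + geom_density()

p.density
```

Switching from a histogram to a density curve changes one piece of the expression; the data and mapping are untouched.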
In this simple example, the ggplot call does the following:
1) Specifies the dataset to be plotted.
2) Defines the aesthetic mapping, linking variables to visual elements. Here, GDP is mapped to the X axis and life expectancy to the Y axis.
3) Adds a single layer or geom, which in this case is a scatterplot.
4) Relies on sensible and appropriate defaults for all unspecified settings.
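These four points can be matched to the code directly. The sketch below repeats the scatterplot from above with comments; the final line inspects the object only to confirm that a single layer was added.

```r
library(ggplot2)
library(gapminder)

p.scatter <- ggplot(
  data    = gapminder,                        # the dataset
  mapping = aes(x = gdpPercap, y = lifeExp)   # the aesthetic mapping
) +
  geom_point()                                # one layer: a scatterplot
# Scales, coordinate system, and theme are all left at their defaults.

length(p.scatter$layers)   # number of layers in the plot
```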
ggplot2 has a set of defaults that are usually sensible. The creators of this package and many of its complementary packages have very good taste!
But while these are often a good starting point, they can (and should) be tweaked to fit specific needs and preferences.
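As one possible example of such tweaking, the sketch below overrides a few defaults of the scatterplot. The particular choices (transparency, log scale, labels, theme) are illustrative assumptions, not recommendations from the text.

```r
library(ggplot2)
library(gapminder)

p.tweaked <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.4) +   # semi-transparent points reduce overplotting
  scale_x_log10() +           # log scale instead of the default linear one
  labs(x = "GDP per capita (log scale)",
       y = "Life expectancy (years)",
       title = "Life expectancy vs. GDP per capita") +
  theme_minimal()             # replaces the default grey theme

p.tweaked
```

Each tweak is its own component added with +, so any of them can be dropped without touching the rest.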
ggplot2 is built on the principles of the “grammar of graphics,” which
defines a structured approach to creating visualizations. In this
system, data are mapped to geometric objects that have aesthetic
attributes such as color, position, and size. Data can be transformed
for more convenient visualization, scales can be adjusted, and the
results are projected onto a coordinate system (typically Cartesian),
forming a cohesive “graphics sentence.”
In ggplot2, mappings are set up using ggplot() and aes(). These
functions establish the relationships between your data and the visual
aspects of your plot, known as aesthetic mappings.
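One way to see what aes() does is to compare an aesthetic mapped to a variable with the same aesthetic set to a constant. A small sketch, with the color chosen arbitrarily:

```r
library(ggplot2)
library(gapminder)

# Mapped: color varies with the continent column, and a legend is drawn.
p.mapped <- ggplot(gapminder,
                   aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

# Set: color is a constant given outside aes(), so every point looks alike.
p.fixed <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "steelblue")
```

Only arguments inside aes() are treated as links between data columns and visual properties.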
Geometric objects, or geoms, are specified in code with geom_xxx(),
where xxx represents the type of plot, such as point, line, or
histogram. Each geom represents a layer of the plot.
Layers in a ggplot2 plot are combined using the + operator. This allows
for adding multiple layers in the order you specify. The simplest plot
includes two components: the ggplot() function and a geom_xxx()
function. For example, a scatterplot with var1 on the x-axis and var2
on the y-axis:
#ggplot(data = data, aes(x=var1, y=var2)) + geom_point()
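To make the template concrete, here is a sketch with a small made-up data frame; toy, var1, and var2 are placeholders invented for illustration. A second layer is added with + to show how layers stack in the order given.

```r
library(ggplot2)

# Hypothetical data, invented for this example.
toy <- data.frame(var1 = 1:10,
                  var2 = c(2.1, 3.9, 6.2, 7.8, 10.1,
                           12.2, 13.8, 16.1, 18.0, 19.9))

p.toy <- ggplot(data = toy, aes(x = var1, y = var2)) +
  geom_point() +   # first layer: the points
  geom_line()      # second layer: a line drawn on top of the points

p.toy
```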
Let’s save this plot in the Plots subfolder, with a set width and height. Notice we have to issue a save command for each file type we want.
We’ll want to save both PNG and PDF files. The PNG file is useful for display and sharing. The PDF file is a vector format that can be resized without losing quality, which makes it the better choice for presentations and papers.
ggsave(filename = "Plots/ch2_1.png", width = 8, height = 5)
ggsave(filename = "Plots/ch2_1.pdf", width = 8, height = 5)
ggsave(filename = "Plots/ch2_scatter.pdf", plot = p.scatter, width = 8, height = 5)
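By default, ggsave() expects the target folder to exist already. The sketch below creates the Plots subfolder if needed and loops over the two file types; the loop is just one way to avoid repeating the call, not necessarily the workflow intended here.

```r
library(ggplot2)
library(gapminder)

p.scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Create the output folder if it does not exist yet.
dir.create("Plots", showWarnings = FALSE)

# Save the same plot once per file type; ggsave() picks the graphics
# device from the file extension.
for (ext in c("png", "pdf")) {
  ggsave(filename = file.path("Plots", paste0("ch2_scatter.", ext)),
         plot = p.scatter, width = 8, height = 5)
}
```

Passing plot explicitly avoids depending on which plot happened to be displayed last, which is what ggsave() falls back to when plot is omitted.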
But do we really need to save if we commit to R Markdown? No. But it’s useful if you want to use these plots elsewhere: in Word, PowerPoint, or LaTeX, on social media, etc.
Click the Knit button to create a knitted document fusing code, output, and notes together.