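In this tutorial, you will learn how to embed existing images in an R Markdown document, tidy data digitized from a published figure, recreate that figure step by step with ggplot2, and save the result to a file.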
The data I’ll be stealing and the figure I will be reproducing are taken from the 10th edition of the textbook How Humans Evolved, by Rob Boyd, Joan Silk, and Kevin Langergraber (2023). The figure is a line graph of age-specific fertility rates (ASFR) by sex as observed in the !Kung San people of the Kalahari Desert. ASFR represents the probability of an individual giving birth per year. In studies of smaller populations, such data can be quite noisy, as you might only have a few observations of individuals at certain ages. For this reason it is customary to clump people together into five-year age groups, and to calculate an ASFR for each five-year age group. This was the case in the study of the !Kung. The text uses these data to illustrate an example of “natural fertility” in human populations that subsist on hunting and gathering and do not use modern contraceptives.
R Markdown has great options for both generating your images on the fly (at knit-time) and embedding existing images. I’ll describe here the options for displaying existing pre-rendered images, like charts, graphs, and illustrations. R can deal with many image types; I find myself most frequently using .PNG, .JPG, and .TIF images. It is good to save them at a resolution around 300 dpi, but the code can handle any resolution you choose.
The working directory for locating images is the very same directory where your .Rmd document is saved. Place your images in the same folder and life will be easy. If you are inclined to organize, you can also save them in a folder that is within the folder where your .Rmd file is saved, say, in a folder named ‘figures’ or ‘images’. If you do that, you just have to add the directory to each of your references to the image file name. But the base, working directory where images will be first searched for is the folder where the .Rmd file is saved. In this tutorial, that is where the images are located.
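The easiest way to include an image is to insert a line of Markdown of the form ![caption](path/to/image.png) into the body of your document, with the path replaced by the location of your own image file.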
That is the simplest way to include an image. The ![]()
syntax is Markdown syntax for including an image. The () is
where you put the path to the image file. The path is relative to the
location of the .Rmd file. The ! is what tells the Markdown
processor that this is an image. The [] is where you can
put a caption for the image. The caption is optional. The
![]() syntax is not R code, it is Markdown syntax.
The way I typically include images is by using the knitr
package. This package has a function called
include_graphics that is invoked inside an R code chunk.
The function is called with the path to the image file as an argument.
The function can also be used to scale the image, which is very useful
for controlling the size of the image in the document.
Using the function include_graphics in the
knitr package, we can set the out.width
chunk option in the chunk header to control the size of the
image. The out.width option is usually given as a percentage of the
available output width. For example, out.width="50%" displays the image
at half the width of the page.
You can also control where on the page the figure is placed. If you
would like the image to be centered horizontally on the page, you can
add the chunk option fig.align="center" to the chunk header.
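As a minimal sketch of the two chunk options described above, the code below first draws a small placeholder image so that it runs on its own, then passes the file to include_graphics. The file name placeholder_figure.png is hypothetical, and the chunk header is shown only as a comment because it belongs to the .Rmd file rather than to the R code itself.

```r
# In the .Rmd file, this code would sit in a chunk whose header reads:
#   {r, out.width="50%", fig.align="center"}

# Hypothetical file name; saved next to the .Rmd file, as in this tutorial
img_path <- "placeholder_figure.png"

# Draw a throwaway image so the example has something to display
png(img_path, width = 3, height = 2, units = "in", res = 100)
plot(1:10, main = "Placeholder")
dev.off()

# When the document is knit, this call embeds the image at that spot
img <- knitr::include_graphics(img_path)
img
```

If your images live in a subfolder such as figures, the path passed to include_graphics would include that folder name.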
The data I extracted from the figure using WebPlotDigitizer are shown below, as they appear when opened in Excel. Including an image like this is not something you need to do for your homework, but I wanted to show you how the data arrived in my case, just so the data processing steps below make sense.
There are some issues with the data that arrived from WebPlotDigitizer. As you can see, the data are “jagged” in the sense that the age groups are not consistent between the sexes. There is also a weird first row that I will need to remove. The data are also in a wide format, which is not ideal for plotting with ggplot2. My plan is to first clean the columns, then pull out the male and female series, and then combine them in a way that is suited for plotting using ggplot2.
# Load ggplot2, which is used for all of the plots below
library(ggplot2)
# Load the data
d <- read.csv("kung_asfr.csv")
#rename the columns
names(d) <- c("male_age", "male_asfr", "female_age", "female_asfr")
#remove the first row
d <- d[-1,]
#convert both age columns to numeric and then round to integer
d$male_age <- round(as.numeric(d$male_age),0)
d$female_age <- round(as.numeric(d$female_age),0)
#round the asfr to 2 decimal places
d$male_asfr <- round(as.numeric(d$male_asfr),2)
d$female_asfr <- round(as.numeric(d$female_asfr),2)
# make the data tidy and narrow, which is what ggplot2 will need.
# This means there should not be two columns for age, but only 1.
# The same goes for ASFR.
# We will also need to add a column for sex.
male_ages <- d$male_age
#remove NA values that arise from the jagged data frame
male_ages <- male_ages[!is.na(male_ages)]
female_ages <- d$female_age
male_asfr <- d$male_asfr
#remove NA values
male_asfr <- male_asfr[!is.na(male_asfr)]
female_asfr <- d$female_asfr
#combine the data
age <- c(male_ages, female_ages)
asfr <- c(male_asfr, female_asfr)
sex <- c(rep("Men",length(male_ages)), rep("Women",length(female_ages)))
tidy_d <- data.frame(age, asfr, sex)
#wrangling finished
Here is what the data look like in tidy format:
tidy_d[1:17,]
## age asfr sex
## 1 20 0.06 Men
## 2 25 0.11 Men
## 3 30 0.15 Men
## 4 35 0.14 Men
## 5 40 0.14 Men
## 6 45 0.09 Men
## 7 50 0.04 Men
## 8 55 0.00 Men
## 9 15 0.06 Women
## 10 20 0.21 Women
## 11 25 0.24 Women
## 12 30 0.18 Women
## 13 35 0.11 Women
## 14 40 0.04 Women
## 15 45 0.01 Women
## 16 50 0.00 Women
## 17 55 0.00 Women
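The same reshaping can also be sketched by building one small data frame per sex and stacking them. The values below are copied from the printout above; the wide layout, with the shorter male series padded by NA, is an assumed reconstruction of the digitized file, and the names wide, men, women, and tidy_sketch are made up for this illustration.

```r
# Assumed reconstruction of the wide, jagged data (values from the printout above)
wide <- data.frame(
  male_age    = c(20, 25, 30, 35, 40, 45, 50, 55, NA),
  male_asfr   = c(0.06, 0.11, 0.15, 0.14, 0.14, 0.09, 0.04, 0.00, NA),
  female_age  = c(15, 20, 25, 30, 35, 40, 45, 50, 55),
  female_asfr = c(0.06, 0.21, 0.24, 0.18, 0.11, 0.04, 0.01, 0.00, 0.00)
)

# One narrow data frame per sex, each with the same three columns
men   <- data.frame(age = wide$male_age,   asfr = wide$male_asfr,   sex = "Men")
women <- data.frame(age = wide$female_age, asfr = wide$female_asfr, sex = "Women")

# Stack the two series, then drop the NA padding row
tidy_sketch <- na.omit(rbind(men, women))
rownames(tidy_sketch) <- NULL
tidy_sketch
```

Keeping age and ASFR together in one row until the NA values are dropped means the two columns cannot fall out of step with each other.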
Now that the data are in a tidy format, we can plot them using ggplot2. We will first make a “quick and dirty” plot, and then systematically work through the parts of the plotting code that need to be adjusted so that the result matches the figure in the book. This is the quick and dirty plot:
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line() +
geom_point() +
labs(x="Age (years)", y="Age specific fertility rate")
Theme elements are the way to control elements of the plot that are
not linked to data, but represent artistic choices about how the plot
should look. Most of the work we are doing here will be inside calls to
the theme function.
I will remove the gridlines, change the axes so tick marks and labels appear correct, add axis lines to the plot, make the lines thicker and the points bigger, change the legend title and symbols, change some font sizes, and change the colors. All this work might feel tedious. But trust me! This is a valuable lesson. When the time comes for you to make a figure of your own creation just right, you will also need to delve into this level of detail and learn the finer points about how ggplot2 works. You can use this tutorial as a guide for your homework and study of these issues.
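The steps that follow repeat the full plotting code each time so that every change is visible in context. As an optional sketch of a more compact workflow, a ggplot can be stored in an object and theme adjustments added to it one at a time. The data frame toy holds a few values copied from the printout above, and the object names are invented for this example.

```r
library(ggplot2)

# A few values copied from the printout above, for illustration only
toy <- data.frame(
  age  = c(20, 30, 20, 30),
  asfr = c(0.06, 0.15, 0.21, 0.18),
  sex  = c("Men", "Men", "Women", "Women")
)

# Layers that are linked to the data go into a reusable base plot
base_plot <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_line() +
  geom_point() +
  theme_minimal()

# Theme tweaks are then added without retyping the base plot
no_grid   <- base_plot + theme(panel.grid = element_blank())
with_axes <- no_grid + theme(axis.line = element_line(colour = "black"))
with_axes
```

Because theme elements are not linked to data, the points and lines drawn by base_plot and with_axes sit at exactly the same positions; only the decoration around them differs.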
Here we need to add labels at every multiple of 5, not just multiples of 10.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line() +
geom_point() +
labs(x="Age (years)", y="Age specific fertility rate") +
theme_minimal() +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56))
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line() +
geom_point() +
labs(x="Age (years)", y="Age specific fertility rate") +
theme_minimal() +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56)) +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
Because the thickness of the lines and the width of the points are
related to the geom_line and geom_point
elements, we make this adjustment by referencing arguments supplied to
those geoms.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
We can draw black axis lines by adding the following line to the theme
function:
axis.line = element_line(colour = "black", linewidth = 1).
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1)
)
The tick marks in the published figure are a bit unusual because they
extend into the plot area (by default in ggplot2 they extend away from
the plot area, towards the margins). We can achieve this by specifying
negative values for the axis.ticks.length.x and axis.ticks.length.y
theme elements. We also need to remove any padding beyond the axis
limits (e.g. any extra space on the x axis to the left of 10 or to the
right of 56). This is achieved by setting the expand argument to
c(0, 0) in the scale_x_continuous and scale_y_continuous functions.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .26), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm") # Negative length so ticks point right, into the plot
)
The legend title is removed by adding the following line to the
theme function:
legend.title = element_blank().
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal()+
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank()
)
As you see in the legend, there is both a line and a point mapped to the variable “Sex”. We only want the point to be in the legend. To make this change, we need to add the argument show.legend = FALSE to the geom_line() function.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank()
)
The axis tick mark labels are a bit too close to the axis lines. We
can increase the margins between the axis lines and the axis tick mark
labels by adjusting arguments to the axis.text theme
elements. The arguments accept values for the margin of interest (t for
top, r for right, b for bottom, and l for left), and the unit that we
are using for the adjustments, which in this case is “points”, or
pt.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank(),
axis.text.x = element_text(margin = margin(t = 5, r = 0, b = 0, l = 0, unit = "pt")),
axis.text.y = element_text(margin = margin(t = 0, r = 5, b = 0, l = 0, unit = "pt"))
)
The legend is currently in the middle right of the plot. We can move
it to the top right corner of the plotting area by adjusting the
legend.position and legend.position.inside arguments in the
theme() function. Note that legend.position.inside requires ggplot2
version 3.5.0 or later.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank(),
axis.text.x = element_text(margin = margin(t = 5, r = 0, b = 0, l = 0, unit = "pt")),
axis.text.y = element_text(margin = margin(t = 0, r = 5, b = 0, l = 0, unit = "pt")),
legend.position = "inside",
legend.position.inside=c(0.9, 0.9)
)
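The legend.position.inside theme element used above was introduced in ggplot2 3.5.0. As a hedged sketch for readers on an older installation, the code below checks the installed version and falls back to the earlier convention of passing the coordinates directly to legend.position. The data frame toy holds a few values copied from the printout above.

```r
library(ggplot2)

# A few values copied from the printout above, for illustration only
toy <- data.frame(
  age  = c(20, 30, 20, 30),
  asfr = c(0.06, 0.15, 0.21, 0.18),
  sex  = c("Men", "Men", "Women", "Women")
)

p <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_point(size = 3)

# Coordinates run from 0 to 1 across the plotting area; c(0.9, 0.9) is near the top right
if (packageVersion("ggplot2") >= "3.5.0") {
  p <- p + theme(legend.position = "inside",
                 legend.position.inside = c(0.9, 0.9))
} else {
  # Older convention: a numeric vector passed straight to legend.position
  p <- p + theme(legend.position = c(0.9, 0.9))
}
p
```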
The font size of the tick mark labels is a bit too small. We can
increase the font size of the tick mark labels by adjusting the
size argument in the element_text function that
defines the axis.text.x and axis.text.y
elements.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank(),
axis.text.x = element_text(size=12, margin = margin(t = 5, r = 0, b = 0, l = 0, unit = "pt")),
axis.text.y = element_text(size=12, margin = margin(t = 0, r = 5, b = 0, l = 0, unit = "pt")),
legend.position = "inside",
legend.position.inside=c(0.9, 0.9)
)
We would like men to be associated with a greenish color, and women
associated with a reddish color. To change these mappings between the
groups in the data and their displayed colors, we can use the
scale_color_manual() function. This function takes a named
vector as an argument, where the names are the groups, and the values
are the colors that we want to associate with each group.
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_color_manual(values=c(Men="#06aca4", Women="#f04143"))+
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank(),
axis.text.x = element_text(size=12, margin = margin(t = 5, r = 0, b = 0, l = 0, unit = "pt")),
axis.text.y = element_text(size=12, margin = margin(t = 0, r = 5, b = 0, l = 0, unit = "pt")),
legend.position = "inside",
legend.position.inside=c(0.9, 0.9)
)
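One consequence of using a named vector is that the order of the entries does not matter: each color is matched to its group by name. The sketch below illustrates this with a few values copied from the printout above, deliberately listing Women first, and then uses layer_data() to inspect the colors that ggplot2 actually assigned.

```r
library(ggplot2)

# A few values copied from the printout above, for illustration only
toy <- data.frame(
  age  = c(20, 30, 20, 30),
  asfr = c(0.06, 0.15, 0.21, 0.18),
  sex  = c("Men", "Men", "Women", "Women")
)

# Women is listed first on purpose; matching is done by name, not by position
p <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(Women = "#f04143", Men = "#06aca4"))

# layer_data() returns the data as drawn, including the assigned colour;
# groups are numbered alphabetically, so group 1 is Men and group 2 is Women
assigned <- layer_data(p, 1)[, c("x", "y", "colour", "group")]
assigned
```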
The plot is now very similar to the published figure. I am happy with the result, and I am ready to declare victory. A few final notes now, before we finish, about working with colors.
In ggplot you can supply colors in RGB, hexadecimal, or color names.
The website imagecolorpicker.com is a useful tool for finding the RGB or hex code of a color in an image.
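As a small illustration of how these formats relate, base R can convert between them. The sketch below uses the greenish color assigned to men in this tutorial; the RGB values 6, 172, and 164 are simply that hex code written in decimal, and "firebrick" is an arbitrary example of a named color.

```r
# The hex code used for men in this tutorial
hex_code <- "#06aca4"

# rgb() builds a hex string from red, green, and blue values on a 0-255 scale
from_rgb <- rgb(6, 172, 164, maxColorValue = 255)
from_rgb

# col2rgb() goes the other way, and also accepts R's named colors
col2rgb(hex_code)
col2rgb("firebrick")
```

Any of these forms can be passed as a color value to scale_color_manual().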
You can find pleasing combinations of colors in the following places:
If you are looking for inspiration for color combinations, I recommend the website “ColorBrewer” [http://colorbrewer2.org/]. This website is a great resource for finding color schemes which are colorblind-friendly, print-friendly, and photocopy-friendly.
If you are looking for color combinations that simply look beautiful, I also recommend the website “coolors” [https://coolors.co/palettes/popular]. This website demonstrates beautiful color schemes that you can use in your plots.
The RColorBrewer package provides access to the
ColorBrewer palettes. These are pre-defined combinations of colors that
are easy to plug into your ggplot code.
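A minimal sketch of both routes into these palettes follows, assuming the RColorBrewer package is installed. The palette name "Dark2" is just one of the available choices, and the data frame toy holds a few values copied from the printout above.

```r
library(ggplot2)
library(RColorBrewer)

# Pull three colors from the "Dark2" palette as hex codes
pal <- brewer.pal(n = 3, name = "Dark2")
pal

# A few values copied from the printout above, for illustration only
toy <- data.frame(
  age  = c(20, 30, 20, 30),
  asfr = c(0.06, 0.15, 0.21, 0.18),
  sex  = c("Men", "Men", "Women", "Women")
)

# Route 1: hand the palette colors to scale_color_manual() yourself
p_manual <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(Men = pal[1], Women = pal[2]))

# Route 2: let ggplot2 pick colors from the palette
p_brewer <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Dark2")
```

Calling display.brewer.all() from the same package draws every available palette, which can help when choosing one.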
The figure below shows the named colors in R. You can use these names directly in your ggplot code.
One of the great things about R is that it allows you to save figures with exact dimensions, resolutions, and file formats, making it easy to reproduce your figures or customize them for different uses, whether that be a PowerPoint presentation, a figure to be included in a report, or a figure that must precisely match a journal’s artwork guidelines. A high resolution png file will usually work across all of these different scenarios.
Here we use R’s png() function to save our figure.
You place a call to this function immediately before your ggplot code,
specifying the name of the file, its dimensions, the units of its
dimensions, and its resolution. After you call png(), you
then include your ggplot code to draw the figure, and when that is
finished, you finish your work with a call to dev.off().
This function tells R: “I am finished plotting now, so please write the
file”.
png("recreated_figure.png", width=6, height=4, units="in", res=300)
ggplot(tidy_d, aes(x=age, y=asfr, color=sex)) +
geom_line(linewidth=1.5, show.legend = FALSE) +
geom_point(size=3) +
labs(x="Age (years)", y="Age specific fertility rate") +
scale_color_manual(values=c(Men="#06aca4", Women="#f04143"))+
scale_x_continuous(breaks = seq(10, 55, 5), limits = c(10, 56), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(0, 0.25, .05), limits = c(0, .27), expand = c(0, 0)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black", linewidth = 1),
axis.ticks.x = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.x = unit(-3, "mm"), # Negative length to extend upward
axis.ticks.y = element_line(color = "black", linewidth = 0.5, linetype = "solid"),
axis.ticks.length.y = unit(-3, "mm"), # Negative length so ticks point right, into the plot
legend.title = element_blank(),
axis.text.x = element_text(size=12, margin = margin(t = 5, r = 0, b = 0, l = 0, unit = "pt")),
axis.text.y = element_text(size=12, margin = margin(t = 0, r = 5, b = 0, l = 0, unit = "pt")),
legend.position = "inside",
legend.position.inside=c(0.9, 0.9)
)
dev.off()
## quartz_off_screen
## 2
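An alternative to the png() and dev.off() pair is ggplot2's own ggsave() function, which takes the plot object, file name, dimensions, and resolution in a single call. The sketch below is one possible way to use it, not the approach taken in this tutorial; it draws a simplified plot from a few values copied from the printout above and writes it to a temporary folder so that it does not overwrite anything.

```r
library(ggplot2)

# A few values copied from the printout above, for illustration only
toy <- data.frame(
  age  = c(20, 30, 20, 30),
  asfr = c(0.06, 0.15, 0.21, 0.18),
  sex  = c("Men", "Men", "Women", "Women")
)

# Store the plot in an object so it can be handed to ggsave()
p <- ggplot(toy, aes(x = age, y = asfr, color = sex)) +
  geom_line(linewidth = 1.5) +
  geom_point(size = 3)

# Hypothetical output location inside R's temporary folder
out_file <- file.path(tempdir(), "ggsave_sketch.png")

# Same dimensions and resolution as the png() call above
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```

One practical difference is that ggsave() always renders the plot it is given, whereas with png() a ggplot created inside a function or a loop is only drawn if it is wrapped in print().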
Boyd, R., Silk, J., & Langergraber, K. (2023). How Humans Evolved (10th ed.). W.W. Norton & Company.