This code-through explores how to visualize time series data using data from the US Federal Reserve (the Fed). The Fed publishes federal, state, and even county-level economic data, all of which can be accessed via an online portal, FRED.
For this exercise, I will be importing two data sets. The first I have already downloaded from FRED. The second is accessed using the tidyquant package, which includes a function for downloading the data directly in R from FRED.
Specifically, I’ll explain and demonstrate how to make visualizations of time series data that are truthful, functional, beautiful, insightful, and enlightening: the five qualities of great visualizations discussed in Alberto Cairo’s The Truthful Art: Data, Charts, and Maps for Communication. I’ll do this using R and ggplot, which allows us to implement the grammar of graphics directly in our plot creation.
The grammar of graphics requires just three things: data, mapping (which variables in your data go on which axes and other visual properties of your plot), and geoms (the visual marks that indicate data points). Once you’ve included these three things, you’ve created a plot! Next, we can map even more aesthetics, like size and color, and there’s even more customization made possible with ggplot by using the theme() function.
We can use additional packages like scales or hrbrthemes for even more fine-tuning of our visualizations.
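As a minimal sketch of those three ingredients, the snippet below uses the economics data set that ships with ggplot2 as a stand-in (an assumption made for illustration; it is not the FRED data used in this code-through):

```r
library(ggplot2)

# `economics` is a built-in ggplot2 data set of monthly US indicators,
# used here only as a stand-in for the FRED data
p <- ggplot(data = economics,                          # 1. data
            mapping = aes(x = date, y = unemploy)) +   # 2. mapping
  geom_line()                                          # 3. geom

p
```

Everything else we add later (labels, scales, themes, annotations) is layered on top of a plot like this one.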
This topic is valuable because our data means very little without a visualization to help us tell its story. And if our visualization is not truthful, functional, beautiful, insightful, or enlightening, we have missed our opportunity to tell our story - or possibly worse, we’ve misinformed our audience through poor visualizations.
Specifically, you’ll learn how to:
Plot data using ggplot() and the grammar of graphics
Use theme() to make customizations to your plot
Use annotate() to add annotations and other fancy things to your plot
Use pre-built themes that can save you a lot of time
We begin, of course, by loading our data and wrangling it so that our visualizations plot what we need.
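# load the packages used throughout this code-through
library(tidyverse)   # read_csv(), pivot_longer(), ggplot(), %>%
library(lubridate)   # today()
library(tidyquant)   # tq_get()
library(hrbrthemes)  # theme_ipsum_rc()
# load your data
wealth_raw <- read_csv("data/share_wealth_raw.csv")
# preview the first six rows of your data
head(wealth_raw)
# pivot long for tidy data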
wealth <- wealth_raw %>%
  pivot_longer(!date, names_to = "category", values_to = "per_share")
head(wealth) # preview the first 6 rows
# for better visualizations later on, fix the labels
wealth.labs <- c("Total Share of Wealth of the Bottom 50%",
                 "Total Share of Wealth of the Top 1%")
# reuse wealth.labs so the labels are defined in one place
wealth$category <- factor(wealth$category,
                          levels = c("share_wealth_bottom50",
                                     "share_wealth_top1"),
                          labels = wealth.labs)
# let's make a basic plot we can build on
# remember, to start, all we need is data, mapping, and the geom
ggplot(data = wealth,
       mapping = aes(x = date, y = per_share, color = category)) +
  geom_line()
Cool! Just like that, with a few lines of code, we’ve taken a data set and created a line graph! But this plot is pretty boring, so let’s spruce it up using labs, themes, geoms, and annotations.
While we have a good starting point - a plot that is truthful, functional, and insightful - there’s so much more we can do to turn this plot from meh to WOW.
In this example, we add:
Labs: this function allows for us to label our axes (or in this case, remove the labels since they’re intuitively self-explanatory) and add a title, subtitle, and even caption!
Color: I use a pre-built color palette from the RColorBrewer package. We can specify what colors the lines should be, which is based on “category” from our mapping.
Scales: I use the scale_y_continuous function to label the y-axis data as percentages.
Theme: this is the biggie! Using the theme() functions in ggplot, we can change font family, font style, font color, text size, text alignment, plot margins, plot borders, and so much more! And even within each of those things, we can make even more customizations.
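To see what the scale argument does in the percent labeller, here is a small sketch with made-up numbers. It assumes, as the use of scale = 1 in this code-through suggests, that per_share is already stored in percentage units; the accuracy argument is added here only to make the formatting explicit.

```r
library(scales)

# made-up shares stored as percentages (31.7 means 31.7%)
shares <- c(2.5, 30, 31.7)

# scale = 1: the values are left as they are and a "%" sign is appended
as_percent <- percent_format(scale = 1, accuracy = 0.1)(shares)
as_percent

# the same values stored as proportions need the default scale of 100
proportions <- shares / 100
from_proportions <- percent_format(accuracy = 0.1)(proportions)
from_proportions
```

If the data were proportions (0.317 rather than 31.7), dropping scale = 1 would be the right choice.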
ggplot(data = wealth,
       mapping = aes(x = date, y = per_share, color = category)) +
  geom_line(size = 1) + # help distinguish our line better
  # remove labels from our axes since they're self-explanatory
  # add a title, subtitle, and caption
  labs(x = NULL, y = NULL,
       color = NULL,
       title = "Americans' Share of the Nation's Wealth",
       subtitle = "From 1987-2021, comparing the top 1% and bottom 50%",
       caption = "Source: Federal Reserve Economic Data") +
  # use the RColorBrewer palette
  scale_color_brewer(palette = "Paired") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  # specify custom font
  theme_minimal(base_family = "Roboto Condensed", base_size = 12) +
  # make modifications to the theme, adjusting fonts and formatting
  theme(panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold", size = rel(1.7)),
        plot.subtitle = element_text(face = "plain", size = rel(1.3),
                                     color = "grey70"),
        plot.caption = element_text(face = "italic", size = rel(0.7),
                                    color = "grey70", hjust = 0),
        strip.text = element_text(face = "bold", size = rel(1.1), hjust = 0),
        axis.title = element_text(face = "bold"),
        axis.title.x = element_text(margin = margin(t = 10), hjust = 0),
        axis.title.y = element_text(margin = margin(r = 10), hjust = 1),
        strip.background = element_rect(fill = "grey90", color = NA),
        panel.border = element_rect(color = "grey90", fill = NA),
        legend.position = "bottom")
BUT WAIT! THERE’S MORE!
Some creative people out there have written code for pre-built themes that have all that customization BUILT-IN, so we don’t have to make edits to our plot, element by element. You can install a ton of different themes and make visualizations in the same style as the BBC, the Economist, the Wall Street Journal, FiveThirtyEight, and more. See Further Resources below for links to more ideas.
In this example, it takes only 12 lines of code to create a visualization that took 24 lines in the earlier example!
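A pre-built theme is, roughly speaking, a function that bundles theme() settings so they can be reused. The sketch below shows the idea with a hypothetical helper called theme_wealth() (a name made up for illustration, not part of any package), again using ggplot2’s built-in economics data as a stand-in:

```r
library(ggplot2)

# hypothetical helper (not from any package): bundle a few of the
# customizations from the earlier example into one reusable function
theme_wealth <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(panel.grid.minor = element_blank(),
          plot.title = element_text(face = "bold", size = rel(1.7)),
          legend.position = "bottom")
}

# stand-in plot on built-in data, styled with a single call
p <- ggplot(economics, aes(x = date, y = psavert)) +
  geom_line() +
  labs(title = "US personal savings rate") +
  theme_wealth()

p
```

Packages such as hrbrthemes do the same thing on a larger scale, which is why one call can replace a long theme() block.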
ggplot(data = wealth,
       mapping = aes(x = date, y = per_share, color = category)) +
  geom_line(size = 1) +
  labs(x = NULL, y = NULL,
       color = NULL,
       title = "Americans' Share of the Nation's Wealth",
       subtitle = "From 1987-2021, comparing the top 1% and bottom 50%",
       caption = "Source: Federal Reserve Economic Data") +
  # using this pre-built theme, we have hardly any modifications to make!
  # it's already built in for us!
  theme_ipsum_rc() +
  scale_color_brewer(palette = "Paired") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  theme(legend.position = "bottom")
In addition to using themes, either built-in ones or your own customizations, we can add geom elements and annotations to our plots as well!
For this example, let’s add shaded areas to indicate when the US was in a recession, as recorded in the USREC recession indicator series on FRED (which is based on NBER recession dates).
# to include recessions on our visualizations, include:
fred_raw <- tq_get("USREC", get = "economic.data", from = "1989-07-01")
# create a new variable to indicate if there's a recession change
recessions_tidy <- fred_raw %>%
  filter(date >= as.Date("1987-07-01") & date <= as.Date("2020-05-01")) %>%
  mutate(recession_change = price - lag(price))
head(recessions_tidy)
# keep only the months where a recession started (+1) or ended (-1)
recessions_start_end <- fred_raw %>%
  mutate(recession_change = price - lag(price)) %>%
  filter(recession_change != 0)
head(recessions_start_end)
# add today's date as a placeholder ("fake") recession end
recessions_fake_end <- recessions_start_end %>%
  bind_rows(tibble(date = today(),
                   recession_change = -1))
head(recessions_fake_end)
# omit 2021: keep only the first 8 rows, dropping the placeholder end
recessions_fake_end <- recessions_fake_end[1:8, ]
# combine to create our new data set
recessions <- tibble(start = filter(recessions_fake_end, recession_change == 1)$date,
                     end = filter(recessions_fake_end, recession_change == -1)$date)
head(recessions)
Now that we have this new data set of when recessions began and ended, we can add that information to our original plot following the same grammar of graphics by adding geom_rect(), which draws a rectangle based on the data we map in the code.
To help our audience understand what these rectangles are, we can include more information in the subtitle or elsewhere, to help guide our viewers’ interpretation of our visualization. I’ve added a small comment to my subtitle, explaining the shaded areas.
And lastly, I wanted to add additional information that may provide context to the enlightening story being told by this data. So, I added segments to the plot with text labeling what each segment represents.
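The start/end logic and the shading can be tried out on a tiny made-up series before running it on the real data. Everything in this sketch (the toy indicator, the toy values, and the object names) is invented for illustration:

```r
library(dplyr)
library(ggplot2)

# made-up monthly indicator: 1 = recession, 0 = no recession
toy <- tibble(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 12),
  in_recession = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0),
  value = c(5, 6, 5, 4, 3, 4, 5, 6, 5, 4, 5, 6)
)

# +1 marks the first month of a recession, -1 the first month after it
changes <- toy %>%
  mutate(change = in_recession - lag(in_recession)) %>%
  filter(change != 0)

# one row per recession, with its start and end date
toy_recessions <- tibble(
  start = changes$date[changes$change == 1],
  end   = changes$date[changes$change == -1]
)

# shade each recession; -Inf/Inf stretch the rectangle over the full y range
p <- ggplot(toy, aes(x = date, y = value)) +
  geom_rect(data = toy_recessions,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "gray70", alpha = 0.3) +
  geom_line()

p
```

Using -Inf and Inf for ymin and ymax is an alternative to hard-coding the rectangle height, since the shading then follows whatever range the y-axis ends up with.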
ggplot(data = wealth,
       mapping = aes(x = date, y = per_share, color = category)) +
  geom_line(size = 1) +
  # geom_rect allows for us to draw a rectangle right on the plot!
  # the dimensions of the rectangle are based on our recessions data set
  geom_rect(data = recessions,
            aes(xmin = start, xmax = end, ymin = 0, ymax = 35),
            inherit.aes = FALSE, fill = "gray70", alpha = 0.3) +
  labs(x = NULL, y = NULL,
       color = NULL,
       title = "Americans' Share of the Nation's Wealth",
       subtitle = "From 1987-2021, comparing the top 1% and bottom 50% \n(National recessions shaded)",
       caption = "Source: Federal Reserve Economic Data") +
  theme_ipsum_rc() +
  scale_color_brewer(palette = "Paired") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  theme(legend.position = "bottom") +
  # annotate allows us to draw on the plot
  # create a segment to denote significant economic events corresponding to
  # the recession
  annotate("segment", x = as.Date("2001-06-01"),
           xend = as.Date("2001-06-01"),
           y = 0, yend = 35, colour = "#666666", size = 1, alpha = 0.6) +
  annotate(geom = "text", x = as.Date("2001-06-01"), y = 32,
           label = "Bush Tax Cut \n Jun 2001 & May 2003 ",
           fontface = "italic", hjust = 1, color = "#666666", size = 3) +
  annotate("segment", x = as.Date("2009-02-01"),
           xend = as.Date("2009-02-01"),
           y = 0, yend = 35, colour = "#666666", size = 1, alpha = 0.6) +
  annotate(geom = "text", x = as.Date("2009-02-01"), y = 32,
           label = "Obama Stimulus \n Feb 2009 ",
           fontface = "italic", hjust = 1, color = "#666666", size = 3) +
  annotate("segment", x = as.Date("2013-01-01"),
           xend = as.Date("2013-01-01"),
           y = 0, yend = 35, colour = "#666666", size = 1, alpha = 0.6) +
  annotate(geom = "text", x = as.Date("2013-01-01"), y = 32.5,
           label = "Obama \n Taxpayer Relief \n Dec 2010 ",
           fontface = "italic", hjust = 1, color = "#666666", size = 3) +
  annotate("segment", x = as.Date("2020-04-01"),
           xend = as.Date("2020-04-01"),
           y = 0, yend = 35, colour = "#666666", size = 1, alpha = 0.6) +
  annotate(geom = "text", x = as.Date("2020-04-01"), y = 33,
           label = "Covid-19 ",
           fontface = "italic", hjust = 1, color = "#666666", size = 3)
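The four pairs of annotate() calls above repeat the same settings, so one possible alternative is to keep the events in a small data frame and draw them with geom_segment() and geom_text(). This is only a sketch: the events and event_layers names are made up here, the base plot uses ggplot2’s built-in economics data as a stand-in, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for line width).

```r
library(ggplot2)

# dates, labels, and label heights copied from the annotate() calls above
events <- data.frame(
  date  = as.Date(c("2001-06-01", "2009-02-01", "2013-01-01", "2020-04-01")),
  label = c("Bush Tax Cut \n Jun 2001 & May 2003 ",
            "Obama Stimulus \n Feb 2009 ",
            "Obama \n Taxpayer Relief \n Dec 2010 ",
            "Covid-19 "),
  y     = c(32, 32, 32.5, 33)
)

# a list of layers can be added to a plot with a single `+`
event_layers <- list(
  geom_segment(data = events,
               aes(x = date, xend = date, y = 0, yend = 35),
               inherit.aes = FALSE,
               colour = "#666666", linewidth = 1, alpha = 0.6),
  geom_text(data = events,
            aes(x = date, y = y, label = label),
            inherit.aes = FALSE,
            fontface = "italic", hjust = 1, colour = "#666666", size = 3)
)

# stand-in base plot; with the real data, add event_layers to the wealth plot
p <- ggplot(economics, aes(x = date, y = psavert)) +
  geom_line() +
  event_layers

p
```

Adding or removing an event then means editing one row of the data frame rather than copying two annotate() calls.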
To learn more, check out these resources: