Making Pretty Charts: Part 1 - Sorted Bar Charts & Labels
Introduction
Pretty charts travel far and wide. Ugly charts don’t.
Making pretty charts is an easily learnable skill. Most people are terrible at data visualization. This includes the other people you are competing with for jobs. Learn how to make pretty charts, and you will stand out from the pack.
Today we’ll learn how to make one of the most helpful data visualizations: the sorted bar chart. And we’re going to add labels and make it look pretty.
Setup
Load our packages
library(tidyverse) # because, always
library(gapminder) # for the Gapminder dataset
Let’s look at the Gapminder dataset
gapminder
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# … with 1,694 more rows
The Gapminder dataset has six columns:
Country
Continent
Year
Life Expectancy
Population
GDP Per Capita
For our chart, we’re going to use our dplyr toolkit to get the top 10 countries by GDP per capita for the latest year available.
top_10_gdp_per_cap <- gapminder |>
  # for each country
  group_by(country) |>
  # get the latest (maximum) year's data
  slice_max(order_by = year) |>
  # ungroup (so it is not grouped by country, from above)
  ungroup() |>
  # get the top 10 by GDP per Capita
  slice_max(order_by = gdpPercap, n = 10)
top_10_gdp_per_cap
# A tibble: 10 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Norway Europe 2007 80.2 4627926 49357.
2 Kuwait Asia 2007 77.6 2505559 47307.
3 Singapore Asia 2007 80.0 4553009 47143.
4 United States Americas 2007 78.2 301139947 42952.
5 Ireland Europe 2007 78.9 4109086 40676.
6 Hong Kong, China Asia 2007 82.2 6980412 39725.
7 Switzerland Europe 2007 81.7 7554661 37506.
8 Netherlands Europe 2007 79.8 16570613 36798.
9 Canada Americas 2007 80.7 33390141 36319.
10 Iceland Europe 2007 81.8 301931 36181.
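One detail worth knowing about slice_max(): by default it keeps ties (with_ties = TRUE), so it can return more rows than n when values are tied. The sketch below repeats the group-then-slice pattern on a small made-up tibble. The toy object and its numbers are assumptions for illustration, not Gapminder data.

```r
library(dplyr)

# Hypothetical toy data: two countries, two survey years each
toy <- tibble(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  gdpPercap = c(100, 120, 300, 280)
)

latest <- toy |>
  group_by(country) |>
  # keep exactly one row per country: the one with the largest year
  slice_max(order_by = year, n = 1, with_ties = FALSE) |>
  ungroup()

latest
```

Setting with_ties = FALSE is a choice made for this sketch; leave it at the default if you would rather see every tied row.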
Making Our Chart
Step 1: Make A Basic Bar Chart
First, we make a basic horizontal bar chart in ggplot2. The advantage of a horizontal bar chart is that the category labels are easier to read than on a vertical bar chart (where the labels end up sideways or overlapping).
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = country)) +
  geom_bar(stat = "identity")
Well, this has the data, but there’s a lot we can do to improve it.
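As a side note, ggplot2 also provides geom_col(), which is shorthand for geom_bar(stat = "identity"). Here is a minimal sketch on a made-up data frame (the toy values are assumptions, not the Gapminder numbers):

```r
library(ggplot2)

# Hypothetical toy data, not the Gapminder values
toy <- data.frame(country = c("A", "B", "C"), gdpPercap = c(300, 120, 210))

# geom_col() takes bar lengths straight from the data,
# which is what geom_bar(stat = "identity") does too
p_col <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col()

p_col
```

The rest of this post sticks with geom_bar(stat = "identity"); either spelling should give you the same chart.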
Step 2: Sort The Bar Chart By Values
What is the 3rd richest country? What rank is Iceland? Right now, we have no sense of ordering. Let’s fix that using fct_reorder() from the forcats package, which is loaded automatically as part of the tidyverse.
top_10_gdp_per_cap |>
  # .f = the variable you want to order
  # .x = the numeric variable you want it ordered by
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap))) +
  geom_bar(stat = "identity")
Great, now we have our chart sorted by GDP per capita. It looks a lot better already.
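To see what fct_reorder() is doing under the hood, it helps to look at the factor levels directly. This is a small sketch with made-up values (the toy vectors are assumptions for illustration):

```r
library(forcats)

# Hypothetical toy vectors
country   <- factor(c("A", "B", "C"))
gdpPercap <- c(300, 120, 210)

# default levels are alphabetical
levels(country)

# reordered levels run from the smallest gdpPercap to the largest
reordered <- fct_reorder(.f = country, .x = gdpPercap)
levels(reordered)
```

ggplot2 draws the first level at the bottom of the y-axis, which is why the largest value ends up at the top of the chart. If you want the opposite order, fct_reorder() has a .desc argument you can set to TRUE.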
Take a look at the documentation for fct_reorder():
?fct_reorder
Step 3: Color Bars By Group
We can color the bars by group by using the fill aesthetic in ggplot2.
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity")
Cool, now we can see if there are geographic patterns. But these colors are super ugly. What is this, the circus?
Step 3.1: Customize our Colors
We can customize our colors. There are a lot of different ways to do this, but we’re going to use color names. If you’re making multiple charts with the same groups, it’s a good idea to assign a fixed color to each group so that the color stays the same across charts and doesn’t confuse your viewer.
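When you reuse a named color vector across charts, a quick sanity check can save you from a surprise: a group with no matching name gets no color from your list. The sketch below uses only base R. The groups_in_data vector is a made-up example (Oceania is added on purpose to trigger the check).

```r
# Named vector: group name = color name
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

# Hypothetical groups, as if pulled from unique(your_data$continent)
groups_in_data <- c("Europe", "Asia", "Americas", "Oceania")

# which groups have no color assigned?
missing_colors <- setdiff(groups_in_data, names(color_list))
missing_colors

# are all the color names ones that R recognizes?
all(color_list %in% colors())
```

If missing_colors is not empty, add an entry for each of those groups before plotting.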
# define our color list as a named vector: c("Group Name" = "color name")
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity") +
  # use scale_fill_manual to assign that color list
  # if you were using a ggplot geom that uses color instead of fill, use scale_color_manual
  scale_fill_manual(values = color_list)
Step 4: Add Labels
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  # add all the labels here.
  labs(title = "Top 10 Countries by GDP per Capita",
       # str_wrap() breaks the text onto multiple lines once it is wider than `width` characters.
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       # we don't need a label next to country names -- that's obvious
       y = "",
       # where is your data from? what is the latest datapoint?
       # give yourself credit too. You made it -- you deserve it.
       caption = "Source: Gapminder | Latest Data: 2007 | Calculations by Me"
  )
That looks a lot better. Now we know what we’re looking at.
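If you’re curious what str_wrap() actually returns, here is a quick sketch you can run on its own. It uses the same placeholder subtitle; the width of 40 is just an assumption to make the wrapping obvious.

```r
library(stringr)

subtitle_text <- "Use the subtitle to give context or say something you want your readers to take away from the chart."

# str_wrap() inserts line breaks so that lines stay within `width` characters
wrapped <- str_wrap(subtitle_text, width = 40)

# cat() prints the line breaks instead of showing them as \n
cat(wrapped)
```

Try a few widths until the subtitle lines up nicely with the width of your chart.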
Step 5: Format the Axis
You want to make your chart as easy to understand as possible for your viewer. You now have an x-axis label, which gives them some guidance. But there’s an easy way to make it even better by formatting the axis using scale_x_continuous() and some helper functions from the scales package.
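One thing that trips people up: scales::label_dollar() does not format numbers itself. It returns a function, and ggplot2 calls that function on the axis breaks. A minimal sketch (the name fmt and the input values are assumptions for illustration):

```r
library(scales)

# label_dollar() returns a formatting function, not a string
fmt <- label_dollar()

# calling that function on numbers gives the formatted labels
fmt(c(10000, 49357))
#> [1] "$10,000" "$49,357"
```

That is why you write labels = scales::label_dollar() with the parentheses: you are handing scale_x_continuous() the function that label_dollar() builds.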
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "",
       caption = "Source: Gapminder | Latest Data: 2007 | Calculations by Me",
       # fill = sets the title of the legend
       fill = "Region:"
  ) +
  # first argument: formats the axis into dollars,
  # second argument: makes x-axis fit better -- try it with and without.
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0, 0))
Step 6: Add Data Labels
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "",
       caption = "Source: Gapminder | Latest Data: 2007 | Calculations by Me",
       fill = "Region:"
  ) +
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0, 0)) +
  # use scales::dollar() to format the labels.
  # scale = 1/1000 makes 49400 into 49.4
  # accuracy = .1 rounds it to 1 decimal place
  # suffix = "K" adds K to the end to signify magnitude is thousands.
  # hjust = 1.2 pushes the label left of the bar's end, so it sits inside the bar
  geom_text(aes(label = scales::dollar(gdpPercap, scale = 1/1000,
                                       accuracy = .1, suffix = "K")),
            hjust = 1.2, size = 3, color = "white") +
  theme_minimal()
Try playing around with this to get a sense of how it works
scales::dollar(50302.22, scale = 1/1000, accuracy = .1, suffix = "K")
[1] "$50.3K"
It’s really helpful for big numbers
scales::dollar(5372848093)
[1] "$5,372,848,093"
For big numbers, change the scale to something meaningful for the viewer. Don’t make the viewer count the commas in “$5,372,848,093” to figure out that it is about 5.37 billion.
Use the scale argument to change the magnitude. To work out the scale, divide by 10 raised to the number of digits you are trying to chop off.
1 million = 1,000,000 = 10^6
1 billion = 1,000,000,000 = 10^9
1 trillion = 1,000,000,000,000 = 10^12
scales::dollar(5372848093, scale = 1/10^9, accuracy = .01, suffix = " bn")
[1] "$5.37 bn"
scales is also helpful for figures in percent:
scales::percent(.45624, accuracy = .01)
[1] "45.62%"
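If you format a mix of small and big numbers a lot, you could wrap the scale logic in a small helper. This is only a sketch: format_usd_short() is a made-up name (not part of scales), and the cutoffs, accuracies, and suffixes are assumptions you should adjust for your audience.

```r
library(scales)

# Hypothetical helper: pick a scale based on the size of each number
format_usd_short <- function(x) {
  ifelse(abs(x) >= 1e9,
         dollar(x, scale = 1e-9, accuracy = 0.01, suffix = " bn"),
  ifelse(abs(x) >= 1e6,
         dollar(x, scale = 1e-6, accuracy = 0.1, suffix = "M"),
  ifelse(abs(x) >= 1e3,
         dollar(x, scale = 1e-3, accuracy = 0.1, suffix = "K"),
         dollar(x, accuracy = 1))))
}

format_usd_short(c(750, 49357, 5372848093))
```

Each value is formatted at all four scales and ifelse() keeps the one that matches its size, which is wasteful but fine for chart labels.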
Step 7: Add a Theme
ggplot2 has built-in themes that can style your plots. I personally recommend theme_minimal() because it removes unnecessary clutter.
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")
top_10_gdp_per_cap |>
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap),
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "",
       caption = "Source: Gapminder | Latest Data: 2007 | Calculations by Me",
       fill = "Region:"
  ) +
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0, 0)) +
  geom_text(aes(label = scales::dollar(gdpPercap, scale = 1/1000,
                                       accuracy = .1, suffix = "K")),
            hjust = 1.2, size = 3, color = "white") +
  # add a theme
  theme_minimal()
Try out other themes, and see which ones you like.
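You can also tweak a built-in theme by adding theme() after it. Here is a small sketch on made-up data. The specific tweaks (legend on top, no horizontal grid lines) and the name my_theme are assumptions for illustration, not recommendations from the steps above.

```r
library(ggplot2)

# Start from theme_minimal() and override a couple of settings
my_theme <- theme_minimal() +
  theme(
    legend.position = "top",              # move the legend above the plot
    panel.grid.major.y = element_blank()  # drop the horizontal grid lines
  )

# Hypothetical toy data, not the Gapminder values
toy <- data.frame(country = c("A", "B", "C"), gdpPercap = c(300, 120, 210))

ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col() +
  my_theme
```

Order matters here: theme() has to come after theme_minimal(), otherwise the complete theme overwrites your tweaks.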
Conclusion
There is more we could do to make this chart look even better. But this is a great start.
Want to make nice charts and be lazy? Here’s what you’ll be able to do in the future:
Easy: Browse through the list of ggplot2 extensions. These packages can help you make great looking charts easily.
Medium: If you find yourself making the same kind of chart over and over again, you can make it into a custom function.
Advanced: Make your own custom ggplot2 theme.
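To give a taste of the “Medium” option, here is a rough sketch of what a custom chart function might look like. sorted_bar_chart() is a made-up name, the toy data is invented, and the styling choices inside it are assumptions; treat it as a starting point, not a finished tool.

```r
library(ggplot2)
library(forcats)

# Hypothetical helper: a sorted horizontal bar chart for any data frame
# {{ }} passes the unquoted column names through to aes()
sorted_bar_chart <- function(data, value, category) {
  ggplot(data, aes(x = {{ value }},
                   y = fct_reorder(.f = {{ category }}, .x = {{ value }}))) +
    geom_col() +
    labs(y = "") +
    theme_minimal()
}

# Hypothetical toy data, not the Gapminder values
toy <- data.frame(country = c("A", "B", "C"), gdpPercap = c(300, 120, 210))

p <- sorted_bar_chart(toy, value = gdpPercap, category = country)
p
```

Because the function returns a regular ggplot object, you can keep adding layers to it, for example p + labs(title = "My Title").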
Do you want to make professional quality data visualizations using ggplot2? If so, check out the materials (free online) from Cedric Scherer’s Graphic Design with ggplot2 workshop. It’s excellent.