Making Pretty Charts: Part 1 - Sorted Bar Charts & Labels

Author

Teal Emery

Introduction

Pretty charts travel far and wide. Ugly charts don’t.

Making pretty charts is an easily learnable skill. Most people are terrible at data visualization. This includes the other people you are competing with for jobs. Learn how to make pretty charts, and you will stand out from the pack.

Today we’ll learn how to make one of the most helpful data visualizations: the sorted bar chart. And we’re going to add labels and make it look pretty.

Setup

Load our packages

library(tidyverse) # because, always
library(gapminder) # for the Gapminder dataset

Let’s look at the Gapminder dataset

gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows

The Gapminder dataset has six columns:

  • Country

  • Continent

  • Year

  • Life Expectancy

  • Population

  • GDP Per Capita

For our chart, we’re going to use our dplyr toolkit to get the top 10 countries by GDP per capita for the latest year available.

top_10_gdp_per_cap <- gapminder |> 
  # for each country
  group_by(country) |> 
  # get the latest (maximum) year's data
  slice_max(order_by = year) |> 
  # ungroup (so it is not grouped by country, from above)
  ungroup() |> 
  # get the top 10 by GDP per Capita
  slice_max(order_by = gdpPercap, n = 10)

top_10_gdp_per_cap
# A tibble: 10 × 6
   country          continent  year lifeExp       pop gdpPercap
   <fct>            <fct>     <int>   <dbl>     <int>     <dbl>
 1 Norway           Europe     2007    80.2   4627926    49357.
 2 Kuwait           Asia       2007    77.6   2505559    47307.
 3 Singapore        Asia       2007    80.0   4553009    47143.
 4 United States    Americas   2007    78.2 301139947    42952.
 5 Ireland          Europe     2007    78.9   4109086    40676.
 6 Hong Kong, China Asia       2007    82.2   6980412    39725.
 7 Switzerland      Europe     2007    81.7   7554661    37506.
 8 Netherlands      Europe     2007    79.8  16570613    36798.
 9 Canada           Americas   2007    80.7  33390141    36319.
10 Iceland          Europe     2007    81.8    301931    36181.
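One thing worth knowing about slice_max(): by default it keeps ties (with_ties = TRUE), so asking for the top 10 can return more than 10 rows when values tie. Here is a minimal sketch of that behavior. The toy tibble is made up for illustration and is not part of the Gapminder data.

```r
library(dplyr)

# hypothetical toy data: two countries tie for second place
toy <- tibble(
  country = c("A", "B", "C", "D"),
  value   = c(10, 8, 8, 5)
)

# default (with_ties = TRUE): the tie is kept, so n = 2 returns 3 rows
with_ties_kept <- toy |>
  slice_max(order_by = value, n = 2)

# with_ties = FALSE returns exactly n rows
exactly_two <- toy |>
  slice_max(order_by = value, n = 2, with_ties = FALSE)

nrow(with_ties_kept)
nrow(exactly_two)
```

The Gapminder values here have no ties, so the default works fine for our chart.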

Making Our Chart

Step 1: Make A Basic Bar Chart

First, we make a basic horizontal bar chart in ggplot2. The advantage of horizontal bar charts is that the labels are easier to read than in a vertical bar chart (where the labels are sideways or overlapping).

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = country)) +
  geom_bar(stat = "identity")

Well, this has the data, but there’s a lot we can do to improve it.
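A side note: ggplot2 also has geom_col(), which is shorthand for geom_bar(stat = "identity"). The sketch below uses a small made-up data frame (a stand-in for top_10_gdp_per_cap) to show that the two draw the same bars.

```r
library(ggplot2)

# hypothetical toy data standing in for top_10_gdp_per_cap
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(30000, 45000, 38000)
)

# the version used in this tutorial
p_bar <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_bar(stat = "identity")

# the shorthand: same bars, less typing
p_col <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col()
```

Either spelling works. This tutorial sticks with geom_bar(stat = "identity").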

Step 2: Sort The Bar Chart By Values

What is the 3rd richest country? What rank is Iceland? Right now, we have no sense of ordering. Let’s fix that using fct_reorder() from the forcats package, which is loaded automatically as part of the tidyverse.

top_10_gdp_per_cap |> 
  # .f = the variable you want to order
  # .x = the numeric variable you want it ordered by
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap))) +
  geom_bar(stat = "identity")

Great, now we have our chart sorted by GDP per capita. It looks a lot better already.

Take a look at the documentation for fct_reorder()

?fct_reorder()
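To see what fct_reorder() actually does, it helps to try it outside of a chart. This small sketch uses three countries and their GDP per capita values from the table above. The variable names country, gdp, and reordered are just for illustration.

```r
library(forcats)

country <- factor(c("Canada", "Norway", "Iceland"))
gdp     <- c(36319, 49357, 36181)

# by default, factor levels are alphabetical
levels(country)
# [1] "Canada"  "Iceland" "Norway"

# fct_reorder() re-sorts the levels by the numeric variable, lowest first
reordered <- fct_reorder(.f = country, .x = gdp)
levels(reordered)
# [1] "Iceland" "Canada"  "Norway"
```

ggplot2 draws the first level at the bottom of the y-axis, which is why the biggest bar ends up on top. If you want the opposite order, fct_reorder() has a .desc = TRUE argument.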

Step 3: Color Bars By Group

We can color the bars by group by using the fill aesthetic in ggplot2.

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity")

Cool, now we can see if there are geographic patterns. But these colors are super ugly. What is this, the circus?

Step 3.1: Customize our Colors

We can customize our colors. There are many ways to do this, but we’re going to use color names. If you’re making multiple charts with the same groups, it’s a good idea to assign a color to each group so that each group keeps the same color in every chart. That way you won’t confuse your viewer.

# define our color list as a named vector: c("Group Name" = "color name")
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity") +
  # use scale_fill_manual to assign that color list
  # if you were using a ggplot geom that uses color instead of fill use scale_color_manual 
  scale_fill_manual(values = color_list)
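scale_fill_manual() matches the names in the vector to the groups in your data, so a typo in a name or a group you forgot will not get the color you intended. Here is a quick sanity check you could run first. It is only a sketch: groups_in_data is typed by hand here, and the variable names are made up for illustration.

```r
color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

# the groups that appear in the data (typed by hand for this sketch; in practice
# you could use unique(as.character(top_10_gdp_per_cap$continent)))
groups_in_data <- c("Europe", "Asia", "Americas")

# any group without an assigned color? character(0) means every group is covered
missing_colors <- setdiff(groups_in_data, names(color_list))
missing_colors

# are all of the color names ones that R recognizes?
all_valid <- all(color_list %in% colors())
all_valid
```

Run colors() on its own to browse the full list of built-in color names.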

Step 4: Add Labels

color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  # add all the labels here.
  labs(title = "Top 10 Countries by GDP per Capita",
       # str_wrap() breaks the text into multiple lines if it is longer than `width` characters.
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       # we don't need a label next to country names -- that's obvious
       y = "", 
       # where is your data from?  what is the latest datapoint?  
       # give yourself credit too. You made it -- you deserve it. 
       caption = "Source: GapMinder | Latest Data: 2007 | Calculations by Me"
       )

That looks a lot better. Now we know what we’re looking at.
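If you are curious what str_wrap() does to the subtitle, you can run it on its own. This is a small sketch using the same placeholder subtitle. The names long_subtitle and wrapped are just for illustration.

```r
library(stringr)

long_subtitle <- "Use the subtitle to give context or say something you want your readers to take away from the chart."

# str_wrap() inserts line breaks ("\n") so the text fits the target width
wrapped <- str_wrap(long_subtitle, width = 70)

# cat() prints the text with the line breaks applied
cat(wrapped)

# length of the longest line after wrapping
max(nchar(strsplit(wrapped, "\n")[[1]]))
```

Try a few different widths and pick the one that fits your chart.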

Step 5: Format the Axis

You want to make your chart as easy to understand as possible for your viewer. You now have an x-axis label, which gives them some guidance. But there’s an easy way to make it even better by formatting the axis using scale_x_continuous() and some helper functions from the scales package.

color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "", 
       caption = "Source: GapMinder | Latest Data: 2007 | Calculations by Me",
       # fill = sets the title of the legend (the fill aesthetic)
       fill = "Region:"
       ) +
  # first argument:  formats the axis into dollars,
  # second argument: makes x-axis fit better -- try it with and without.
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0,0)) 
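Two details from this step are easier to see in isolation. First, label_dollar() does not format numbers itself; it returns a function that does. Second, if expand = c(0,0) feels too tight on the right side, ggplot2's expansion() helper lets you set the padding for each side separately. The sketch below uses a small made-up data frame, and the 5% padding is just an example value.

```r
library(ggplot2)
library(scales)

# label_dollar() returns a formatting function
dollar_formatter <- label_dollar()
dollar_formatter(c(10000, 25000, 50000))
# [1] "$10,000" "$25,000" "$50,000"

# hypothetical toy data standing in for top_10_gdp_per_cap
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(30000, 45000, 38000)
)

# expansion(mult = c(left, right)): no padding on the left, 5% on the right
p_padded <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col() +
  scale_x_continuous(labels = label_dollar(),
                     expand = expansion(mult = c(0, 0.05)))

# type p_padded in the console to draw the chart
```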

Step 6: Add Data Labels

color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "", 
       caption = "Source: GapMinder | Latest Data: 2007 | Calculations by Me",
       fill = "Region:"
       ) +
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0,0)) +
  # use scales::dollar() to format the labels.
  # scale = 1/1000 makes 49400 into 49.4
  # accuracy = .1 rounds it to 1 decimal place
  # suffix = "K" adds K to the end to signify the magnitude is thousands
  # hjust = 1.2 right-aligns the label and nudges it left, inside the end of the bar
  geom_text(aes(label = scales::dollar(gdpPercap, scale = 1/1000, 
                                       accuracy = .1, suffix = "K")), 
            hjust = 1.2, size = 3, color = "white")
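The hjust value controls which side of the bar's end the label sits on. Values above 1 pull the text back inside the bar, and values below 0 push it past the end. Here is a sketch of the "outside" variation, assuming a small made-up data frame. The -0.2 and the 15% padding are example values to tweak, not recommendations.

```r
library(ggplot2)
library(scales)

# hypothetical toy data standing in for top_10_gdp_per_cap
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(30000, 45000, 38000)
)

p_outside <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col() +
  # hjust below 0 starts the text just past the end of the bar,
  # so use a dark text color instead of white
  geom_text(aes(label = dollar(gdpPercap, scale = 1/1000,
                               accuracy = .1, suffix = "K")),
            hjust = -0.2, size = 3, color = "black") +
  # leave extra room on the right so the labels are not cut off
  scale_x_continuous(labels = label_dollar(),
                     expand = expansion(mult = c(0, 0.15)))

# type p_outside in the console to draw the chart
```

Labels inside the bar keep the chart compact. Labels outside are easier to read when some bars are very short.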

Try playing around with scales::dollar() to get a sense of how it works:

scales::dollar(50302.22, scale = 1/1000, accuracy = .1, suffix = "K")
[1] "$50.3K"

scales::dollar() is really helpful for big numbers:

scales::dollar(5372848093)
[1] "$5,372,848,093"

For big numbers, change the scale to something meaningful for the viewer. Don’t make the viewer count the commas in “$5,372,848,093” to figure out that it is about 5.4 billion.

Use the scale argument to change the magnitude. To figure out the right value, divide by 10 raised to the number of zeros you are trying to chop off.

1 million = 1,000,000 = 10^6

1 billion = 1,000,000,000 = 10^9

1 trillion = 1,000,000,000,000 = 10^12

scales::dollar(5372848093, scale = 1/10^9, accuracy = .01, suffix = " bn")
[1] "$5.37 bn"

scales is also helpful for figures in percent:

scales::percent(.45624, accuracy = .01)
[1] "45.62%"

Step 7: Add a Theme

ggplot2 has built-in themes that can style your plots. I personally recommend theme_minimal() because it removes unnecessary clutter.

color_list <- c("Americas" = "lightblue", "Asia" = "skyblue4", "Europe" = "ivory4")

top_10_gdp_per_cap |> 
  ggplot(aes(x = gdpPercap, y = fct_reorder(.f = country, .x = gdpPercap), 
             fill = continent)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_list) +
  labs(title = "Top 10 Countries by GDP per Capita",
       subtitle = str_wrap("Use the subtitle to give context or say something you want your readers to take away from the chart.", width = 70),
       x = "GDP Per Capita (USD)",
       y = "", 
       caption = "Source: GapMinder | Latest Data: 2007 | Calculations by Me",
       fill = "Region:"
       ) +
  scale_x_continuous(labels = scales::label_dollar(), expand = c(0,0)) +
  geom_text(aes(label = scales::dollar(gdpPercap, scale = 1/1000, 
                                       accuracy = .1, suffix = "K")), 
            hjust = 1.2, size = 3, color = "white") +
  # add a theme
  theme_minimal()

Try out other themes, and see which ones you like.
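An easy way to compare themes is to save the chart to an object and add a different theme each time. This sketch uses a small made-up data frame so it runs on its own. The object names are just for illustration, and the themes shown are a few of the ones built into ggplot2.

```r
library(ggplot2)

# hypothetical toy data standing in for top_10_gdp_per_cap
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(30000, 45000, 38000)
)

# save the chart without a theme
base_plot <- ggplot(toy, aes(x = gdpPercap, y = country)) +
  geom_col()

# then add one theme at a time and compare
p_minimal <- base_plot + theme_minimal()
p_classic <- base_plot + theme_classic()
p_light   <- base_plot + theme_light()

# base_size changes the size of all the text in the theme
p_big_text <- base_plot + theme_minimal(base_size = 14)
```

Type any of the object names in the console to draw that version.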

Conclusion

There is more we could do to make this chart look even better. But this is a great start.

Want to make nice charts and be lazy? Here’s what you’ll be able to do in the future:

  • Easy: Browse through the list of ggplot2 extensions. These packages can help you make great looking charts easily.

  • Medium: If you find yourself making the same kind of chart over and over again, you can make it into a custom function.

  • Advanced: Make your own custom ggplot2 theme.
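To give a flavor of the Medium and Advanced options, here is a rough sketch of each. Both sorted_bar_chart() and theme_pretty() are hypothetical names made up for this example, and the styling choices are placeholders to adapt, not a finished design.

```r
library(ggplot2)
library(forcats)

# hypothetical helper: a sorted horizontal bar chart for any data frame
# {{ }} lets you pass column names without quotes
sorted_bar_chart <- function(data, value, category) {
  ggplot(data, aes(x = {{ value }},
                   y = fct_reorder(.f = {{ category }}, .x = {{ value }}))) +
    geom_col() +
    labs(y = "")
}

# hypothetical custom theme: start from theme_minimal() and tweak a few things
theme_pretty <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title         = element_text(face = "bold"),
      panel.grid.major.y = element_blank(),
      legend.position    = "bottom"
    )
}

# hypothetical toy data to try it out
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(30000, 45000, 38000)
)

p <- sorted_bar_chart(toy, value = gdpPercap, category = country) +
  theme_pretty()
```

Once these live in a script or package, every new chart starts sorted and styled with one line each.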

Do you want to make professional quality data visualizations using ggplot2? If so, check out the materials (free online) from Cedric Scherer’s Graphic Design with ggplot2 workshop. It’s excellent.