Using ggplot2 and gganimate to recreate Hans Rosling’s famous bubble chart in R

21 November 2019

Setup and aims

Why are we learning this

  1. Animated graphics are a highly engaging and effective way to illustrate how things change, usually over time. Any data scientist, or aspiring data scientist, should be able to use an animation when they believe it is the best way to communicate or illustrate a phenomenon to their client.

  2. Since learning how to create animated graphics earlier this year, I have found numerous situations where I have been able to use them to more effectively support an argument.

  3. It’s surprisingly easy to create animated graphics in R, particularly if you are already familiar with ggplot2.

What we will need

library(tidyverse) # includes ggplot2
library(viridis) # optional - for nice colours
library(gganimate) # core animation package in R
library(wbstats) # connects to the World Bank API and pulls statistical indicators

I also have the Oswald font family installed; it is used for the year label in the animation. If you want to use it, you can download it from Google Fonts.

Questions

  1. What is an animation and how is one created?

  2. Why did Hans Rosling create his famous bubble chart?

  3. What data did he use?

Getting our data

The wbstats package

We are going to need some macroeconomic data to create this chart. A good source is the World Bank Open Data site.

The wbstats package uses the World Bank API to pull the data you need directly from this site into an R dataframe. Two functions from this package are useful to us:

## pull specific indicator between specific dates based on indicator ID

wb(country = "all", indicator, startdate, enddate, mrv, return_wide = FALSE,
  gapfill, freq, cache, lang = c("en", "es", "fr", "ar", "zh"),
  removeNA = TRUE, POSIXct = FALSE, include_dec = FALSE,
  include_unit = FALSE, include_obsStatus = FALSE,
  include_lastUpdated = FALSE)
  
## get latest information about country properties.

wbcountries(lang = c("en", "es", "fr", "ar", "zh"))
  

What data do we need?

  1. GDP per capita, current US$ - indicator ID NY.GDP.PCAP.CD
  2. Life expectancy at birth, total, years - indicator ID SP.DYN.LE00.IN
  3. Population, total (number of people) - indicator ID SP.POP.TOTL

Since this is a time-series animation, we need a start year and an end year. Let’s start at 1960 and end at 2017; some data will be missing, but that’s not a problem.

We also need the data for all countries. The World Bank also publishes figures for regional and economic groupings of countries; we are not interested in those, only in individual countries, which is what country = "countries_only" requests.

Let’s get the data

rosling_data <- wbstats::wb(indicator = c("SP.DYN.LE00.IN", "NY.GDP.PCAP.CD", "SP.POP.TOTL"), 
                       country = "countries_only", startdate = 1960, enddate = 2017) 

head(rosling_data %>% 
       dplyr::arrange(date, country), n = 3)
##   iso3c date        value    indicatorID
## 1   AFG 1960 3.244600e+01 SP.DYN.LE00.IN
## 2   AFG 1960 5.977319e+01 NY.GDP.PCAP.CD
## 3   AFG 1960 8.996973e+06    SP.POP.TOTL
##                                 indicator iso2c     country
## 1 Life expectancy at birth, total (years)    AF Afghanistan
## 2            GDP per capita (current US$)    AF Afghanistan
## 3                       Population, total    AF Afghanistan

This looks like what we need, but we also want to assign each country to a region so that we can colour the bubbles by region.

Joining to regions

The wbcountries() function returns a lot of information about each country; we only need the region. So we can select that column and join it to our data on the iso3c country code.

rosling_data <- rosling_data %>% 
  dplyr::left_join(wbstats::wbcountries() %>% 
                     dplyr::select(iso3c, region),
                   by = "iso3c") # name the join key explicitly
head(rosling_data %>% 
       dplyr::arrange(date, country), n = 3)
##   iso3c date        value    indicatorID
## 1   AFG 1960 3.244600e+01 SP.DYN.LE00.IN
## 2   AFG 1960 5.977319e+01 NY.GDP.PCAP.CD
## 3   AFG 1960 8.996973e+06    SP.POP.TOTL
##                                 indicator iso2c     country     region
## 1 Life expectancy at birth, total (years)    AF Afghanistan South Asia
## 2            GDP per capita (current US$)    AF Afghanistan South Asia
## 3                       Population, total    AF Afghanistan South Asia
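To see what the join does without calling the World Bank API, here is a minimal sketch on hypothetical toy tables (`toy_long` and `toy_countries` are made up for illustration, not wbstats output). Each row of the long table picks up its region, and lookup rows with no match, such as `DZA` here, are simply not added.

```r
library(dplyr)

# Hypothetical toy stand-ins for rosling_data and wbcountries()
toy_long <- tibble::tibble(
  iso3c = c("AFG", "AFG", "ALB"),
  indicatorID = c("SP.DYN.LE00.IN", "SP.POP.TOTL", "SP.DYN.LE00.IN"),
  value = c(32.4, 8996973, 62.3)
)
toy_countries <- tibble::tibble(
  iso3c = c("AFG", "ALB", "DZA"),
  region = c("South Asia", "Europe & Central Asia", "Middle East & North Africa")
)

# left_join keeps every row of toy_long; by = "iso3c" names the key
# explicitly, which also avoids the "Joining, by = ..." message
toy_joined <- toy_long %>%
  dplyr::left_join(toy_countries, by = "iso3c")

toy_joined
```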

Now we need to make the data wide

For ggplot2 we need all the indicator values on a single row for each date and country.

rosling_data <- rosling_data %>% 
  tidyr::pivot_wider(id_cols = c("date", "country", "region"), names_from = indicator, values_from = value)

head(rosling_data %>% 
       dplyr::arrange(date, country), n = 3)
## # A tibble: 3 x 6
##   date  country  region  `Life expectanc… `GDP per capita… `Population, to…
##   <chr> <chr>    <chr>              <dbl>            <dbl>            <dbl>
## 1 1960  Afghani… South …             32.4             59.8          8996973
## 2 1960  Albania  Europe…             62.3             NA            1608800
## 3 1960  Algeria  Middle…             46.1            246.          11057863
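The reshaping step can be sketched on a hypothetical three-row toy table (values borrowed from the output above, indicator names shortened for illustration). Albania has no population row in the toy input, so `pivot_wider()` fills that cell with `NA`, just as Albania’s 1960 GDP shows up as `NA` in the real data.

```r
# Hypothetical long-format toy data: one row per country, year and indicator
toy_long <- tibble::tibble(
  date = c("1960", "1960", "1960"),
  country = c("Afghanistan", "Afghanistan", "Albania"),
  region = c("South Asia", "South Asia", "Europe & Central Asia"),
  indicator = c("Life expectancy", "Population", "Life expectancy"),
  value = c(32.4, 8996973, 62.3)
)

# One row per date/country/region; each indicator becomes its own column
toy_wide <- tidyr::pivot_wider(toy_long,
                               id_cols = c("date", "country", "region"),
                               names_from = indicator,
                               values_from = value)

toy_wide
```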

Static charting

What are our aesthetics?

Before we animate, we can work on the data for a single year to make sure that we get the static chart design the way we want it.

rosling_data_2010 <- rosling_data %>% 
  dplyr::filter(date == 2010)

The first thing we should always do in ggplot2 is set our aesthetics, that is, declare which columns of the data correspond to which properties of the chart.

  1. The x-axis will be GDP per Capita
  2. The y-axis will be Life expectancy
  3. The marker sizes will be population
  4. The marker colours will represent the regions

rosling_chart <- ggplot2::ggplot(rosling_data_2010, aes(x = `GDP per capita (current US$)`, 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`))
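The long World Bank column names have to be wrapped in backticks inside `aes()`. An optional alternative, sketched here on a hypothetical one-row toy tibble, is to rename them once with `dplyr::rename()`; the short names `gdp_pc`, `life_exp` and `pop` are illustrative choices, not part of the original code.

```r
library(dplyr)

# Hypothetical toy row using the same column names as rosling_data_2010
toy <- tibble::tibble(
  `GDP per capita (current US$)` = 59.8,
  `Life expectancy at birth, total (years)` = 32.4,
  `Population, total` = 8996973
)

# Short, syntactic names mean no backticks are needed later in aes()
toy_short <- toy %>%
  dplyr::rename(gdp_pc = `GDP per capita (current US$)`,
                life_exp = `Life expectancy at birth, total (years)`,
                pop = `Population, total`)

names(toy_short)
```

With renamed data the mapping would read `aes(x = gdp_pc, y = life_exp, size = pop)`; the rest of the chart code stays the same.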

Give our chart some basic properties

We need to tell ggplot2 what kind of chart we want. We want a scatter chart with the points colour-coded by region, which is geom_point(). Then we can render the chart for the first time to see what it looks like:

rosling_chart <- ggplot2::ggplot(rosling_data_2010, aes(x = `GDP per capita (current US$)`, 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`)) +
  ggplot2::geom_point(aes(color = region))

Let’s render it

rosling_chart

A few things it would be nice to do

  1. Get rid of the population size legend - not necessary
  2. Make the bubbles somewhat transparent, especially if they end up moving around.
  3. Move to a log scale on the x axis to avoid massive crowding to the left.
  4. Control the scaling of the bubbles a bit more.
  5. Add a theme to get rid of the gridlines.
rosling_chart <- ggplot2::ggplot(rosling_data_2010, aes(x = log(`GDP per capita (current US$)`), 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`)) +
  ggplot2::geom_point(alpha = 0.5, aes(color = region)) +
  ggplot2::scale_size(range = c(.1, 16), guide = "none") +
  ggplot2::theme_classic()
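Because `log()` in R is the natural logarithm, the x-axis ticks are in log units rather than dollars. This small calculation translates some tick values back into GDP per capita; each step of 2 on the axis multiplies GDP per capita by about e^2 ≈ 7.4.

```r
# Natural-log tick values and the GDP per capita (US$) they correspond to
log_ticks <- c(4, 6, 8, 10, 12)
dollars <- round(exp(log_ticks))

data.frame(log_gdp = log_ticks, gdp_per_capita_usd = dollars)
```

If you would rather keep dollar values on the axis labels, ggplot2’s `scale_x_log10()` applied to the untransformed column is a common alternative to wrapping the variable in `log()`.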

What does it look like now?

rosling_chart

Much nicer, just a few more tweaks…

Some finishing touches

  1. Optionally, use a nicer colour scheme.
  2. Label the axes better.

rosling_chart <- ggplot2::ggplot(rosling_data_2010, aes(x = log(`GDP per capita (current US$)`), 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`)) +
  ggplot2::geom_point(alpha = 0.5, aes(color = region)) +
  ggplot2::scale_size(range = c(.1, 16), guide = "none") +
  ggplot2::theme_classic() +
  viridis::scale_color_viridis(discrete = TRUE, name = "Region", option = "viridis") +
  ggplot2::labs(x = "Log GDP per capita",
                y = "Life expectancy at birth") 
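If you would rather not load the viridis package, ggplot2 (version 3.0.0 and later) ships `scale_color_viridis_d()` for discrete data. Here is a sketch on a hypothetical three-country toy data frame standing in for `rosling_data_2010`:

```r
library(ggplot2)

# Hypothetical toy data standing in for rosling_data_2010
toy <- data.frame(
  gdp = c(600, 4000, 40000),
  life = c(60, 72, 81),
  pop = c(3e7, 1e8, 6e7),
  region = c("South Asia", "Latin America & Caribbean", "North America")
)

p <- ggplot(toy, aes(x = log(gdp), y = life, size = pop)) +
  geom_point(alpha = 0.5, aes(color = region)) +
  scale_size(range = c(.1, 16), guide = "none") +
  theme_classic() +
  # ggplot2's built-in discrete viridis scale; no extra package needed
  scale_color_viridis_d(name = "Region", option = "viridis") +
  labs(x = "Log GDP per capita",
       y = "Life expectancy at birth")

# Each region gets its own colour from the viridis palette
built <- ggplot_build(p)
unique(built$data[[1]]$colour)
```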

Our finished static chart

rosling_chart

Nice! Now we need to animate…

Animating from a static chart

transition_states and ease_aes

When we use transition_states() from gganimate, it expects a state variable: the variable whose values define the static frames that the animation moves between. In this case our state variable is clearly date. gganimate renders the design for each value of date and then moves smoothly from one to the next.
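A rough timing calculation helps when choosing the animation settings. The sketch below assumes one state per year from 1960 to 2017, the nframes = 200 used in the final render, and animate()’s default of 10 frames per second. It is only approximate, because gganimate rounds to whole frames and by default (wrap = TRUE) also animates a transition from the last state back to the first.

```r
# Back-of-the-envelope timing for the animation (base R only)
n_states <- length(1960:2017)       # one state per year: 58 states
nframes  <- 200                     # value passed to animate() in the final render
fps      <- 10                      # animate()'s default frame rate
duration_s <- nframes / fps         # total running time in seconds
secs_per_year <- duration_s / n_states

# transition_length = 1 and state_length = 1 carry equal weight, so each
# year's slot is split roughly half pause, half movement to the next year
pause_s <- secs_per_year * 1 / (1 + 1)

round(c(duration_s = duration_s, secs_per_year = secs_per_year, pause_s = pause_s), 3)
```

Raising nframes (or lowering fps) slows the whole animation down; changing the ratio of transition_length to state_length shifts time between pausing on a year and moving to the next.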

There are various options for how gganimate moves between the states. It can move at a steady pace (linear easing, the default), or it can vary its speed within each transition; for example, an "in" easing starts slowly and finishes fast. The function ease_aes() sets this easing. Many options are available; I will use 'cubic-in-out', which starts and ends each transition slowly and moves fastest in the middle.
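To make "cubic-in-out" concrete, here is the standard cubic in-out easing curve written as a plain R function. It maps linear progress through a transition (0 to 1) to eased progress. gganimate takes its easing functions from the tweenr package, whose implementation may differ in detail; this is an illustration of the shape, not tweenr’s code.

```r
# Standard cubic in-out easing: slow near 0 and 1, fastest around 0.5
cubic_in_out <- function(progress) {
  ifelse(progress < 0.5,
         4 * progress^3,                  # accelerate during the first half
         1 - (-2 * progress + 2)^3 / 2)   # decelerate during the second half
}

progress <- seq(0, 1, by = 0.25)
cubic_in_out(progress)
```

A quarter of the way through a transition the bubbles have covered only about 6% of the distance, while the middle half of the time covers about 88% of it, which is what gives the movement its smooth start and finish.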

All we need to do to have an animated graphic is to add these two functions to our existing ggplot2 code. Remember we have to return to our original multi-year data set.

rosling_chart_anim <- ggplot2::ggplot(rosling_data, aes(x = log(`GDP per capita (current US$)`), 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`)) +
  ggplot2::geom_point(alpha = 0.5, aes(color = region)) +
  ggplot2::scale_size(range = c(.1, 16), guide = "none") +
  ggplot2::theme_classic() +
  viridis::scale_color_viridis(discrete = TRUE, name = "Region", option = "viridis") +
  ggplot2::labs(x = "Log GDP per capita",
                y = "Life expectancy at birth") +
  gganimate::transition_states(date, transition_length = 1, state_length = 1) +
  gganimate::ease_aes('cubic-in-out')

What does it look like?

rosling_chart_anim

Looks promising, but we need to make some tweaks.

Tweaking and controlling the animation

We want to tweak some of the appearance.

  1. We might want to adjust the x and y ranges to our liking
  2. We want to put the year unobtrusively in the center background
  3. Then we want to render the animation with options that suit our needs and save it as a .gif or, if we prefer, an .mp4.

rosling_chart_anim <- ggplot2::ggplot(rosling_data, aes(x = log(`GDP per capita (current US$)`), 
                                                        y = `Life expectancy at birth, total (years)`, 
                                                        size = `Population, total`)) +
  # year label drawn before the points so it sits in the background
  # (family = 'Oswald' requires that font to be installed)
  ggplot2::geom_text(aes(x = 7.5, y = 60, label = date), size = 14, color = 'lightgrey', family = 'Oswald') +
  ggplot2::geom_point(alpha = 0.5, aes(color = region)) +
  ggplot2::scale_size(range = c(.1, 16), guide = "none") +
  ggplot2::theme_classic() +
  viridis::scale_color_viridis(discrete = TRUE, name = "Region", option = "viridis") +
  ggplot2::labs(x = "Log GDP per capita",
                y = "Life expectancy at birth") +
  ggplot2::scale_x_continuous(limits = c(2.5, 12.5)) +
  ggplot2::scale_y_continuous(limits = c(30, 90)) +
  gganimate::transition_states(date, transition_length = 1, state_length = 1) +
  gganimate::ease_aes('cubic-in-out')

# animate and save as a .gif (gganimate uses its gifski renderer when no renderer is specified)

rosling_chart_gif <- gganimate::animate(rosling_chart_anim, nframes = 200, width = 800, height = 600)

gganimate::save_animation(rosling_chart_gif, "rosling.gif")

# animate and save as an .mp4 (ffmpeg must be installed on your system)

rosling_chart_mp4 <- gganimate::animate(rosling_chart_anim, nframes = 200, width = 800, height = 600,
                                        renderer = gganimate::ffmpeg_renderer())

gganimate::save_animation(rosling_chart_mp4, "rosling.mp4")
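An alternative way to show the year, sketched here on a hypothetical two-country, two-year toy data frame, is gganimate’s `{closest_state}` placeholder: transition_states() fills it in for each frame, so the year can go in the title instead of being drawn with geom_text() once per row of data.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: two countries observed in two years
toy <- data.frame(
  date = rep(c("2000", "2001"), each = 2),
  country = rep(c("A", "B"), times = 2),
  gdp = c(500, 5000, 550, 5200),
  life = c(55, 70, 56, 71),
  pop = c(1e7, 5e7, 1.1e7, 5.1e7)
)

p <- ggplot(toy, aes(x = log(gdp), y = life, size = pop)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(.1, 16), guide = "none") +
  theme_classic() +
  # {closest_state} is replaced by the current value of date in each frame
  labs(title = "Year: {closest_state}",
       x = "Log GDP per capita",
       y = "Life expectancy at birth") +
  transition_states(date, transition_length = 1, state_length = 1) +
  ease_aes("cubic-in-out")

# Rendering works exactly as above, e.g. animate(p, nframes = 50)
```

The trade-off is placement: a title sits above the plot, whereas the geom_text() approach puts the year large and faint behind the bubbles, closer to Rosling’s original design.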

Let’s look at the .gif

Let’s look at the .mp4

Other resources

‘Race’ bar charts - Eurovision Song Contests

Map/geographic viz - Travel patterns


(Audio added using ffmpeg)