I’ve been creating static international population pyramids with ggplot2 for a project, and I wanted to briefly share what I’ve learned. I’ve written in greater detail about how to create interactive population pyramids in R here: http://walkerke.github.io/2014/06/rcharts-pyramids/

You can’t make a population pyramid without data; the following function, get_data, grabs the required population data from the US Census Bureau’s International Data Base and returns a data frame in the long format that ggplot2 expects. All you need is the FIPS 10-4 code for the country you want and the year you want to visualize.

library(XML)
library(reshape2)
library(ggplot2)
library(plyr)


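get_data <- function(country, year) {
  # Build the International Data Base query URL for the given country and year
  c1 <- "http://www.census.gov/population/international/data/idb/region.php?N=%20Results%20&T=10&A=separate&RT=0&Y="  
  c2 <- "&R=-1&C="
  url <- paste0(c1, year, c2, country)
  # Read the HTML table on the results page into a data frame
  df <- data.frame(readHTMLTable(url))
  # Keep only the age-group, male, and female columns
  keep <- c(2, 4, 5)
  df <- df[, keep]
  names(df) <- c("Age", "Male", "Female")
  # Strip thousands separators and convert the counts to numeric
  cols <- 2:3
  df[, cols] <- apply(df[, cols], 2, function(x) as.numeric(gsub(",", "", x)))
  # Drop the summary row
  df <- df[df$Age != 'Total', ]
  # Negate the male counts so their bars extend to the left of zero
  df$Male <- -1 * df$Male
  # Preserve the table's age-group order instead of sorting alphabetically
  df$Age <- factor(df$Age, levels = df$Age, labels = df$Age)
  
  # Reshape to long format: one row per age group and gender
  df.melt <- melt(df, 
                  value.name = 'Population', 
                  variable.name = 'Gender', 
                  id.vars = 'Age')
  
  return(df.melt)
}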
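To see what the cleaning and reshaping steps inside get_data do without hitting the Census Bureau’s site, here is a minimal sketch that runs the same steps on a tiny made-up table. The numbers and the names toy and toy_long are invented purely for illustration; they are not real census figures and are not part of the original function.

```r
library(reshape2)

# Toy numbers, invented purely for illustration (not real census figures).
toy <- data.frame(
  Age    = c("Total", "0-4", "5-9", "10-14"),
  Male   = c("3,600", "1,300", "1,200", "1,100"),
  Female = c("3,450", "1,250", "1,150", "1,050"),
  stringsAsFactors = FALSE
)

# Strip thousands separators and convert the count columns to numeric.
cols <- c("Male", "Female")
toy[cols] <- lapply(toy[cols], function(x) as.numeric(gsub(",", "", x)))

# Drop the summary row, then negate the male counts so their bars extend left.
toy <- toy[toy$Age != "Total", ]
toy$Male <- -1 * toy$Male

# Fix the age-group order so ggplot2 does not sort the labels alphabetically.
toy$Age <- factor(toy$Age, levels = toy$Age)

# Reshape to long format: one row per age group and gender.
toy_long <- melt(toy, id.vars = "Age", variable.name = "Gender",
                 value.name = "Population")
print(toy_long)
```

The result has one row per age group and gender, with negative Population values for males, which is the shape the plotting code expects.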

Now I can call the get_data function and create a population pyramid. My code is adapted from this StackOverflow post, so credit is due to Didzis Elferts for his excellent answer. Here’s a pyramid for a fast-growing country, Nigeria, in 2014:

nigeria <- get_data("NI", 2014)

n1 <- ggplot(nigeria, aes(x = Age, y = Population, fill = Gender)) + 
  # Draw each gender as its own layer, filtering the data per layer
  geom_bar(data = subset(nigeria, Gender == "Female"), stat = "identity") + 
  geom_bar(data = subset(nigeria, Gender == "Male"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000), 
                     labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) + 
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") + 
  theme_bw()

n1


You may have to adjust the breaks and labels passed to scale_y_continuous to get the population axis (drawn horizontally after coord_flip) looking the way you want, depending on the country you’ve chosen. For contrast, here’s what an aging country, Germany, looks like:
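One way to cut down on that trial and error is to compute the breaks and labels from the data. The sketch below is only an illustration: pyramid_scale is a hypothetical helper of my own naming, not a ggplot2 function, and it assumes male counts are stored as negative numbers and that labels should be in millions.

```r
# Hypothetical helper (not part of ggplot2): build symmetric breaks and
# matching labels for a pyramid whose male counts are stored as negatives.
pyramid_scale <- function(values, step) {
  # Round the largest absolute value up to the next multiple of `step`.
  limit <- ceiling(max(abs(values)) / step) * step
  breaks <- seq(-limit, limit, by = step)
  # Label in millions, dropping the sign so both sides read as positive.
  labels <- paste0(abs(breaks) / 1e6, "m")
  list(breaks = breaks, labels = labels)
}

# Example with two made-up extremes: 3.2 million males, 2.9 million females.
s <- pyramid_scale(c(-3200000, 2900000), step = 1e6)
print(s$breaks)
print(s$labels)
```

The two list elements can then be passed along as scale_y_continuous(breaks = s$breaks, labels = s$labels), with step chosen to suit the size of the country.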

germany <- get_data("GM", 2014)

g1 <- ggplot(germany, aes(x = Age, y = Population, fill = Gender)) + 
  # Draw each gender as its own layer, filtering the data per layer
  geom_bar(data = subset(germany, Gender == "Female"), stat = "identity") + 
  geom_bar(data = subset(germany, Gender == "Male"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-4000000, 4000000, 1000000), 
                     labels = paste0(as.character(c(4:0, 1:4)), "m")) + 
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") + 
  theme_bw()

g1
