1 What Are We Gonna Do?

We will try to replicate one of The Economist plot, which tells us about the decrease of South Korean women fertility rates. You can look the articles here here .

2 Why we need to use R anyway?

Well, R is just a tool, just like other data visualization tools like Tableau or even Excel. But what makes R powerful is that it is FREE and it is REPRODUCIBLE. We can build a simple graph with Excel, but you can’t reproduce it in an efficient way. You have to go through the same steps, same clicks, everytime you want to reproduce a graph. In R, you just have to build the plot once, and you can recreate it as many as you want.

Another think to consider is that R has this great package, named ggplot2, which has many varieties of plot and supporting elements that you can use create a beautiful and charming visualization. To show you the power of R, I will recreate one of The Economist plot, which is acknowledged due to the simple yet elegant plot.

3 Preparation

Before we start our journey, first we must load the required libraries.

library(tidyverse)
library(scales)
library(ggrepel)
library(ggthemes)
library(grid)
options(scipen = 100)

The options() function means to set the scientific notation.

4 Data PreProcessing

4.1 Import Data

The data is acquired from World Bank
fertility rate data: https://data.worldbank.org/indicator/SP.DYN.TFRT.IN
population data: https://data.worldbank.org/indicator/SP.POP.TOTL?end=2018&start=1996
gdp per capita data: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

data_tfr <- read_csv("data/world_bank_fertility.csv",skip = 4)
## Warning: Missing column names filled in: 'X64' [64]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Country Name` = col_character(),
##   `Country Code` = col_character(),
##   `Indicator Name` = col_character(),
##   `Indicator Code` = col_character(),
##   `2018` = col_logical(),
##   X64 = col_logical()
## )
## See spec(...) for full column specifications.
data_tfr
data_pop <- read_csv("data/world_bank_pop.csv",skip = 4)
## Warning: Missing column names filled in: 'X64' [64]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Country Name` = col_character(),
##   `Country Code` = col_character(),
##   `Indicator Name` = col_character(),
##   `Indicator Code` = col_character(),
##   X64 = col_logical()
## )
## See spec(...) for full column specifications.
data_gdp <- read_csv("data/world_bank_gdp.csv",skip = 4)
## Warning: Missing column names filled in: 'X64' [64]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Country Name` = col_character(),
##   `Country Code` = col_character(),
##   `Indicator Name` = col_character(),
##   `Indicator Code` = col_character(),
##   X64 = col_logical()
## )
## See spec(...) for full column specifications.

Look like we have an empty column on 2018 and X64. Don’t worry, we will not use them anyway.

4.2 Data PreProcessing

Since the article only use the population data from 2017, we will clean the data first.

data_tfr <- data_tfr %>% 
  select(`Country Name`,'2017') %>% 
  rename(tfr = "2017")

data_pop <- data_pop %>% 
  select(`Country Name`,'2017') %>% 
  rename(pop = "2017")

data_gdp <- data_gdp %>% 
  select(`Country Name`,'2017') %>% 
  rename(gdp = "2017")

head(data_tfr,10)

Next, we will combine the data

df <- data_tfr %>% 
  left_join(data_pop) %>% 
  left_join(data_gdp) %>% 
  na.omit() %>% 
  rename(country = `Country Name`) %>% 
  arrange(desc(pop))
## Joining, by = "Country Name"
## Joining, by = "Country Name"
head(df,40)

Look like we have some region/cluster of countries in the data, so we need to remove them.

df <- df[-c(1:15,18:43),]

head(df,10)

Now we are ready to make the plot.

5 Plot

5.1 Scatter Plot

First, we plot the data into scatter plot, with the size of the dot indicate the population size.

p <- df %>% 
  ggplot()+
  geom_point(aes(gdp,tfr,size=pop),color="#2fc1d3",alpha=.5,show.legend = F)
p

We will smooth the data using the log10 scale

p <- p+
  scale_x_continuous(trans = "log10", 
                     expand = c(0.05,0),
                     labels = number_format(big.mark = ","),limits = c(100,110000))
p

Now, we rescale the size of the point.

p <- p+
  scale_size_continuous(range = c(3,16),guide = F)
p

Next, we scale the y axis

p <- p+
  scale_y_continuous(breaks = seq(0,8,2),limits = c(0,8.3),position = "right",expand = c(0,0))

p

5.2 Plot Theme

We clean the background and the grid.

p <- p+
  theme_economist()+
  theme(panel.grid.major.y = element_line(colour = "gray80"),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.background = element_blank(),
        plot.background = element_blank())

p

Now we adjust the tick mark and the position of the y-axis text

p <- p+
  theme(axis.ticks.length.x = unit(2,"mm"),
        axis.text.y = element_text(vjust = 0))
p

5.3 Add Title and Annotation

We will remove the y-axis title and rename the x-axis title. Also, we will add the title and caption.

p <- p+
  theme(axis.title.y = element_blank(),
        axis.title.x = element_text(colour = "black",size = 12),
        plot.title = element_text(size = 14,face = "bold",color="black"),
        plot.subtitle = element_text(size=12,color="black"),
        plot.caption = element_text(size=10,color="gray30",hjust = 0))+
  labs(title = "Gone baby gone",
       subtitle = "GDP and fertility, 2017                                                                                                                                      Fertility rate",
       caption = "Source: World Bank",
       x = "GDP per capita, $, log scale")

p

We will add the annotation of population size and the replacement fertility level

p <- p+
  geom_hline(aes(yintercept=2.1),color="#edb0ad",linetype="dashed",size=1)+
  geom_text(aes(100,2.4,label = "Replacement fertility level"),
            color="#e07b78",hjust="left",size=4.5)+
  geom_text(aes(100,.5,label = "Circle size = Population, 2017"),
            color="#8e9093",hjust="left",size=4)
p

5.4 Highligt Selected Countries

Now we will highlights and add annotation for selected countries. Since the annotation line is in elbow shape, we need to be creative. To my best knowledge, there is no packages that can directly make an elbow annotation. I’ll show you how to make one.

First, we add the Niger.

df_niger <- df %>% filter(country == "Niger")

p <- p+
  geom_text_repel(aes(gdp,tfr,label=country),data = df_niger,nudge_x = -.15,direction = "x")+
  geom_point(aes(gdp,tfr,size=pop),data = df_niger,shape=21,fill="#2fc1d3",color="black")

p

Second, we add the India and China

df_inch <- df %>% filter(country %in% c("India","China"))

p <- p+
  geom_segment(aes(x=gdp,xend=gdp,y=tfr,yend=tfr-.7),data=df_inch)+
  geom_text_repel(aes(gdp,tfr-.7,label=country),data=df_inch,nudge_x = -.3,direction = "x")+
  geom_point(aes(gdp,tfr,size=pop),data=df_inch,shape=21,fill="#2fc1d3",color="black")
p

Next, we add Japan and The United States

df_japus <- df %>% filter(country %in% c("Japan","United States"))

p <- p+
  geom_text_repel(aes(gdp,tfr,label=country),data=df_japus,nudge_y = 1.5,direction = "y")+
  geom_point(aes(gdp,tfr,size=pop),data=df_japus,shape=21,fill="#2fc1d3",color="black")
p

We add Hongkong

df_hongkong <- df %>% filter(country == "Hong Kong SAR, China")
df_hongkong$country <- "Hong kong"

p <- p+
  geom_text_repel(aes(gdp,tfr,label=country),data=df_hongkong,nudge_y = -.7,direction = "y")+
  geom_point(aes(gdp,tfr,size=pop),data=df_hongkong,shape=21,fill="#2fc1d3",color="black")
p

We add Singapore

df_sing <- df %>% filter(country == "Singapore")

p <- p+
  geom_segment(aes(x=gdp,xend=gdp,y=tfr,yend=tfr-.5),data=df_sing,color="black")+
  geom_text_repel(aes(gdp,tfr-.5,label=country),data=df_sing,color="black",nudge_x = 0.2,direction = "x")+
  geom_point(aes(gdp,tfr,size=pop),data=df_sing,shape=21,fill="#2fc1d3",color="black")
p

Finally, we add South Korea

df_kor <- df %>% filter(country == "Korea, Rep.")
df_kor$country <- "South Korea"

p <- p+
  geom_segment(aes(x=gdp,xend=gdp,y=tfr,yend=tfr-.5),data=df_kor)+
  geom_text_repel(aes(gdp,tfr-.5,label=country),data=df_kor,nudge_x = -.3,direction = "x",fontface="bold")+
  geom_point(aes(gdp,tfr,size=pop),data=df_kor,shape=21,fill="#2fc1d3",color="black")
p

5.5 Add Red Header

To put cerry on the top, we will add the unique red header of The Economist plot.

png("economist_plot.png", width = 9, height = 6, units = "in", res = 300)
p
#Add red rectangle on top left side
grid.rect(x=1, y=.995,hjust = 1,vjust=0,gp=gpar(fill='#e5001c',lwd=0))
grid.rect(x=0.05, y=.98,hjust = 1,vjust=0,gp=gpar(fill='#e5001c',lwd=0))
dev.off()
## png 
##   2

5.6 Final Result

Our plot is ready.