Original


The Population of Every Country is Represented on this Bubble Chart
https://www.visualcapitalist.com/population-every-country-bubble/


Objective

The objective of the original data visualization is to give a simple and concise overview of the world's most populous countries. Its audience could be the general public, especially people whose study or work relates to world demographics, for instance staff members of international organizations and scholars of global development and health.

The visualization chosen had the following three main issues:

  • The visualization is visually striking but conveys its information with little accuracy. For example, it offers a deceptively simple perspective on world population, yet the exact population of each country is missing. Moreover, the visualization has no descriptive text or subtitle, and the scale relating bubble size to population is unclear. Finally, many bubbles are unlabeled and, without additional explanation, convey nothing to the reader.
  • The creator appears to have tried to answer many questions in a single visualization without fully addressing any of them. The arrangement and plot type could therefore be optimized for better focus. For example, a bar plot could show both the ranking and the size of each country's population, and the number of countries in the plot could be narrowed down by significance.
  • The visualization is not color-blind friendly. As shown in the following simulations, the colors for Asian and African countries are difficult to distinguish for viewers with red-blindness (protanopia), and the same is true of the colors for Asian and American countries for viewers with green-blindness (deuteranopia).

Reference

Code

The following code was used to fix the issues identified in the original.

library(tidyverse) # data wrangling and plotting
library(magrittr) # pipe operator (%>%)

# Load the data from the download path
data <- read_csv("E:/Download/SYB63_1_202105_Population, Surface Area and Density.csv")
# Keep only the latest (2019) mid-year population estimates
data <- filter(data, Year == 2019, Series == "Population mid-year estimates (millions)")
# Sort by the variable "Value" in descending order
data <- arrange(data, desc(Value))
# Keep the 15 most populous countries; columns 2 and 5 hold the country name and the value
data <- data[1:15, c(2, 5)]
# Continent of each country, listed in the same order as the sorted rows
continent <- factor(c("Asia", "Asia", "Americas", "Asia", "Asia", "Americas", "Africa", "Asia",
                      "Europe", "Americas", "Asia", "Africa", "Asia", "Africa", "Asia"))
# Rename the columns and add the new column "Continent".
# "X2" is the name older readr versions give an unnamed second column (newer versions use "...2").
data <- data %>%
  rename("Country" = "X2", "Population" = "Value") %>%
  mutate(Continent = continent)
# Horizontal bar plot of the population of each country, colored by continent
p <- ggplot(data, aes(y = reorder(Country, Population), x = round(Population), fill = Continent)) +
  ylab("Country") +
  xlab("Population") +
  geom_col()
# Add the title, subtitle and caption
p <- p + labs(title = "Top 15 Countries in Population",
              subtitle = "As of 2019, by United Nations Statistics (Rounded in Millions)",
              caption = "Work by Tianzhuo Zheng")
# Customize the axis titles, the y-axis text, the caption, the title and the subtitle
p <- p + theme(
  axis.title.x  = element_text(size = 20, color = "#0570B0", face = "bold", angle = 0),
  axis.title.y  = element_text(size = 20, color = "#0570B0", face = "bold", angle = 90),
  axis.text.y   = element_text(size = 12, color = "black", face = "bold"),
  plot.caption  = element_text(size = 10, color = "purple", face = "bold"),
  plot.title    = element_text(size = 20, color = "blue", face = "bold"),
  plot.subtitle = element_text(size = 10, color = "red", face = "bold")
)
# Set the x-axis scale
p <- p + scale_x_continuous(limits = c(0, 1750), breaks = seq(0, 1750, 250), expand = c(0, 0))
# Label each bar with its rounded population
p <- p + geom_text(aes(label = round(Population)), size = 4, hjust = 0, vjust = 0.2, color = "steelblue")
# Use a fill palette chosen for color-blind friendliness, one color per continent
p <- p + scale_fill_manual(values = c("#9ECAE1", "#6A51A3", "#C7E9B4", "#FF7F00"))
# Customize the x-axis ticks
p <- p + theme(axis.ticks.x = element_line(color = "#CB181D", size = 2))
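
The `continent` factor in the code above is matched to the countries by row position, so it silently becomes wrong if the sort order changes. A minimal sketch of one alternative, a named lookup vector indexed by country name, is shown here. The lookup table `continent_lookup`, the vector `countries`, and the country spellings are hypothetical and would need to match the names actually present in the data; no population figures are used.

```r
# Hypothetical lookup table: country name -> continent.
# The spellings are assumptions and must match the names used in the data.
continent_lookup <- c(
  "China"                    = "Asia",
  "India"                    = "Asia",
  "United States of America" = "Americas",
  "Nigeria"                  = "Africa",
  "Russian Federation"       = "Europe"
)

# Illustrative country names only, deliberately listed in a different
# order from the lookup table to show that order no longer matters.
countries <- c("Nigeria", "China", "Russian Federation", "India",
               "United States of America")

# Index the named vector by country name; unname() drops the element names.
continent <- factor(unname(continent_lookup[countries]))

# Stop early if any country is missing from the lookup table.
stopifnot(!anyNA(continent))

result <- data.frame(Country = countries, Continent = continent)
print(result)
```

With the report's data frame, the same idea would amount to something like `mutate(Continent = factor(unname(continent_lookup[Country])))`, again assuming the lookup covers every country that appears in the top 15.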

Data Reference

Reconstruction

The following plot fixes the main issues in the original.


Color-blind test for red-blindness and green-blindness
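
As a rough numerical complement to the simulated images, the sketch below computes the WCAG relative luminance of the four fill colors passed to `scale_fill_manual()` in the code section. Colors that differ clearly in lightness tend to remain distinguishable even when hue is hard to perceive, but this is only a proxy and does not simulate any type of color blindness. The helper `relative_luminance()` is a hypothetical name introduced for illustration.

```r
# Fill colors taken from the scale_fill_manual() call in the report's code.
palette_hex <- c("#9ECAE1", "#6A51A3", "#C7E9B4", "#FF7F00")

# WCAG 2.x relative luminance of sRGB colors (0 = black, 1 = white).
# Hypothetical helper written for this sketch, using only base R.
relative_luminance <- function(hex) {
  rgb <- grDevices::col2rgb(hex) / 255            # 3 x n matrix, values in [0, 1]
  lin <- ifelse(rgb <= 0.03928,
                rgb / 12.92,
                ((rgb + 0.055) / 1.055)^2.4)      # undo the sRGB transfer curve
  as.vector(c(0.2126, 0.7152, 0.0722) %*% lin)    # weighted sum per color
}

lum <- relative_luminance(palette_hex)
names(lum) <- palette_hex

# Luminance of each color, sorted from darkest to lightest.
print(round(sort(lum), 3))

# Smallest luminance gap between neighboring colors; a very small gap
# suggests two colors that rely mainly on hue to be told apart.
min_gap <- min(diff(sort(lum)))
print(round(min_gap, 3))
```

A small `min_gap` would not by itself prove that the palette fails, and a large one would not prove that it passes; the image-based simulation remains the more direct test.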


Reference and Acknowledgements

Reference

  • Pathak, Manas A. “Data Visualization.” Beginning Data Science with R. Cham: Springer International Publishing, 2014. 31–60. Web.
  • Srinivasa, K. G., G. M. Siddesh, and H. Srinidhi. “Introduction to Data Visualization.” Network Data Analytics. Cham: Springer International Publishing, 2018. 321–331. Web.
  • Muenchen, Robert A. “Graphics with Ggplot2.” R for SAS and SPSS Users. New York, NY: Springer New York, 2011. 521–598. Web.
  • Chen, Min, et al. Foundations of Data Visualization. 1st ed. Cham: Springer International Publishing, 2020. Web.

Acknowledgement

The reconstruction reflects the insight of the illustration in the article published by Mr. Jeff Desjardins, and the data used for plotting come from the United Nations Statistics Division. The color-blind test in this report was simulated with Colblindor (website: https://www.color-blindness.com/). All efforts behind the above-mentioned online resources are highly appreciated.

I would also like to take this opportunity to extend my special gratitude to Ms. Mojdeh Shirazi-Manesh, my online course facilitator at RMIT, for her informative lectures and detailed guidance on this assignment.