Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Visualizing The Richest Countries in The World (2020).


Objective:

The visualization shows the richest countries in the world by GDP per capita at purchasing power parity (current prices) for the year 2020. It groups countries by continent and by GDP per capita range (“$100K and more”, “$50K - $99.9K”, “$10K - $49.9K” and “Less than $10K”) and uses rank labels to highlight the richest countries in the world and within each continent.

Target Audience:

Given that the data is taken from the World Economic Outlook database, the target audience is likely to be the general public with an interest in the world economy.

The visualization chosen had the following three main issues:

  • Visual Bombardment: The visualization contains too many components, which can distract a viewer from the actual objective of the plot. For example, the decorative country flags and maps add no useful information. Few people can identify a country by its flag, so a viewer has to read the country’s name below the flag anyway. This is time consuming and draws attention away from the objective of the visualization.

  • Poorly Illustrated Legend: The legend explains the GDP per capita ranges, but each range is encoded only as a thin colored border around a country’s flag, which is difficult to spot. It therefore takes the audience extra time to identify which range a country belongs to.

  • Poor Scaling and Ranking: The visualization displays only the top 10 countries in each continent’s grid. This can lead the audience to wrong inferences. For example, from the visualization one might conclude that Oceania has the most countries with a GDP per capita below $10k, whereas according to the data Africa has the most countries in that range.
    Ranking countries within a continent can also distract from the objective of the visualization. For example, the GDP per capita of Qatar and Seychelles differs by about $105.8k, yet both are ranked number 1 in their respective continents. As the title says, the objective is to visualize the richest countries in the world, not in each continent. Countries should therefore be ranked by GDP per capita across the whole world rather than within each continent.
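
The ranking issue can be illustrated with a small sketch in base R. The data frame `toy` and its column names are hypothetical. The Qatar and Seychelles values follow the figures quoted above ($138.9k and roughly $105.8k less), while the Singapore and Mauritius values are illustrative placeholders, not taken from the source data.

```r
# Toy data in thousands of dollars. Only the Qatar and Seychelles values
# follow the figures quoted in the text; the other two rows are placeholders.
toy <- data.frame(
  Country   = c("Qatar", "Singapore", "Seychelles", "Mauritius"),
  continent = c("Asia", "Asia", "Africa", "Africa"),
  gdp_k     = c(138.9, 105.7, 33.1, 24.0),
  stringsAsFactors = FALSE
)

# Rank across all countries: the highest GDP per capita gets rank 1.
toy$global_rank <- rank(-toy$gdp_k, ties.method = "min")

# Rank within each continent, as the original visualization does.
toy$continent_rank <- ave(
  -toy$gdp_k, toy$continent,
  FUN = function(x) rank(x, ties.method = "min")
)

print(toy)
```

In this toy example Qatar and Seychelles both receive a within-continent rank of 1, while their global ranks differ, which is the distinction the reconstruction relies on.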

Code

The following code was used to fix the issues identified in the original.

# Importing the packages 
library(readr)
library(dplyr)
library(forcats)
library(ggplot2)

# Setting the working directory where the data is present
setwd("C:/Users/moham/OneDrive - RMIT University/Documents/RMIT/Courses/Sem 2/Data Viz/Assignment 2")

# Reading the data required for the visualisation
# The website the visualisation is taken from does not provide continent data, so an open continent dataset is used to map the countries to their respective continents.
df1 <- read_csv("Data_viz.csv")
df2 <- read_csv("continents.csv")

# Merging the two datasets using left_join()
df <- left_join(df1, df2, by = "Country")

# Data Manipulation:
# For better readability, the GDP per capita values for 2020 are converted to thousands of dollars followed by subsetting of data. 
df <- df %>%
  select(continent, Country, `2020`) %>%
  mutate(`2020` = round(`2020` / 1000, 1)) %>%
  mutate(Range = case_when(
    `2020` >= 100 ~ ">= $100k",
    `2020` >= 50  ~ "$50k-$99.9k",
    `2020` >= 10  ~ "$10k-$49.9k",
    `2020` < 10   ~ "<$10k"
  ))

# Removing rows that have missing value
df <- df[complete.cases(df),]

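# Defining the rank and changing data types
# This assumes the source data is already sorted by GDP per capita in descending order, so the row position is the rank.
Rank <- seq_len(nrow(df))
df <- cbind(df, Rank)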

df$Rank <- factor(df$Rank, ordered = TRUE)
df$continent <- as.factor(df$continent)
df$Range <- factor(df$Range, levels = c("<$10k", "$10k-$49.9k", "$50k-$99.9k", ">= $100k"), ordered=TRUE)
str(df)
## 'data.frame':    169 obs. of  5 variables:
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 4 3 4 3 4 3 3 4 ...
##  $ Country  : chr  "Qatar $138.9K" "Macao SAR" "Luxembourg" "Singapore" ...
##  $ 2020     : num  139 113 112 106 87 ...
##  $ Range    : Ord.factor w/ 4 levels "<$10k"<"$10k-$49.9k"<..: 4 4 4 4 3 3 3 3 3 3 ...
##  $ Rank     : Ord.factor w/ 169 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8 9 10 ...
# Adding ranks to the country labels and sorting them based on the GDP per capita
df <- df %>%
  mutate(CR = paste0(Country, " (", Rank, ")")) %>%
  mutate(CR = fct_reorder(CR, `2020`))%>%
  arrange(continent, desc(`2020`))

# Plotting:
background <- "#EFF1F0"

p1 <- ggplot(data = df, aes(x = `CR`, y = `2020`, fill = `Range`))+
  
      geom_col(colour = "black") + coord_flip() +
  
      facet_wrap(continent~., scales = "free_y")+
  
      labs(title = "Richest Countries in the World based on their GDP per Capita for the year 2020",
          subtitle = "Countries ranked by their GDP per capita (shown after a country's name)\n",
          x = "Countries and their ranks\n",
          y = "\nGDP per capita in thousand dollars",
          fill = "GDP per capita Range",
          caption = "Source: Howmuch.net- https://howmuch.net/sources/richest-countries-in-the-world-2020")+
  
      theme(title = element_text(size = 90, hjust = 0.5, face = "bold"),
            axis.text.y = element_text(size = 55),
            axis.text.x = element_text(size = 55),
            axis.title.x = element_text(size = 80, hjust = 0.5, vjust = 1),
            axis.title.y = element_text(size = 80),
            text=element_text(family="Georgia"),
            strip.text.x = element_text(size = 70),
            legend.key.size = unit(9, "cm"),
            legend.text = element_text(size = 55),
            legend.title = element_text(size = 70, hjust = 0.5),
            legend.position = c(0.85, 0.25),
            legend.background = element_rect(fill = background),
            plot.caption = element_text(size = 60, hjust = 0.5, vjust = 0, face = "plain"),
            panel.grid.major = element_line(size = 0.5, linetype = "solid", colour = "white"),
            panel.grid.minor = element_line(size = 0.5, linetype = "solid", colour = "white"))+
  
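      # Refer to the column directly inside aes(); using df$ here can misalign labels across facets
      geom_text(aes(label = paste0("$", `2020`, "k")), size = 17, hjust = -0.1, check_overlap = TRUE,
            position = position_dodge(0.1), color = "darkblue", fontface = "bold")+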
  
      scale_fill_brewer(palette = "OrRd")+
  
      scale_y_continuous(breaks=seq(0,140,by=20))
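
One thing that may be worth checking in the code above: `left_join()` leaves `continent` as `NA` for any country whose name has no match in the continent dataset, and `complete.cases()` then drops those rows without warning. The following is a minimal sketch in base R of listing unmatched names before filtering. The data frames `gdp` and `continents` are hypothetical stand-ins for `Data_viz.csv` and `continents.csv`, and the missing Macao SAR entry is invented purely for illustration.

```r
# Hypothetical stand-ins for the two input files (values in thousands of dollars).
gdp <- data.frame(
  Country = c("Qatar", "Macao SAR", "Luxembourg"),
  gdp_k   = c(138.9, 113, 112),
  stringsAsFactors = FALSE
)
continents <- data.frame(
  Country   = c("Qatar", "Luxembourg"),
  continent = c("Asia", "Europe"),
  stringsAsFactors = FALSE
)

# Country names present in the GDP table but absent from the lookup table.
unmatched <- setdiff(gdp$Country, continents$Country)
print(unmatched)

# Base-R equivalent of a left join: all.x = TRUE keeps unmatched rows with NA.
merged <- merge(gdp, continents, by = "Country", all.x = TRUE)

# Number of rows that a complete.cases() filter would silently remove.
n_dropped <- sum(!complete.cases(merged))
print(n_dropped)
```

If `unmatched` is non-empty, the country names can be harmonised in one of the two tables before joining, so that no country disappears from the plot unintentionally.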

Reconstruction

The following plot fixes the main issues in the original.
Note: The number in parentheses after a country’s name is its rank by GDP per capita, and the value at the tip of each bar is that country’s GDP per capita in thousands of dollars.