Original


Source: Altinok, Angrist and Patrinos (2018); Maddison Project Database 2020 (Bolt and van Zanden, 2020)


Objective

The main objective of the original data visualisation is to depict the average learning outcome and GDP per capita of various countries. The learning outcomes come from standardised achievement tests taken by students across the world. The chart shows GDP per capita on the y-axis and the average learning outcome on the x-axis. The test results are pooled across subjects such as mathematics, science and reading, at both primary and secondary levels of education, and GDP is adjusted for price differences over time. (Note: only the year 2015 is considered here.) Three major issues are evident from the visualisation:

  • Visual Perception and Colour Issue: The graph does not convey a clear message, as a good data visualisation should, and the audience cannot readily recognise any important pattern from it. Each bubble in the plot represents a country. The colour scheme does not consider colour-blind readers, who will find it difficult to distinguish green from red. For example, the big green circle for India contains tiny red circles inside it that represent countries such as Nicaragua, Honduras and Guatemala. Similar cases are repeated all over the graph.

  • Deceptive method - Area and Size as quantity: The size of each bubble represents the country's population. Only the countries with huge populations, namely China, India and the US, can be seen clearly. First, population is not needed to answer the practical question. Second, all other countries are reduced to dots that do not reveal their populations accurately, which creates a deceptive impression on the audience. Moreover, because these dots overlap one another, it is hard to hover over a particular country and distinguish it from its neighbours. As a result, the bubble size does not convey a clear message.

  • Visual Bombardment: Kirk's third guiding principle is "Creating accessibility through intuitive design". Judged against it, the many colours in the plot do not support effective visual communication. Grouping the countries by continent makes the visualisation more complex and harder to understand. Information is lost in the visual bombardment of the large dataset used to plot the graph: there are far too many countries, and displaying all of them as over-plotted bubbles is inappropriate.
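The bubble-size problem in the second point follows from how area encodings scale. Below is a minimal sketch. It assumes that bubbles are drawn with area proportional to population, which is the usual bubble-chart convention but is not confirmed for the original chart. The maximum radius of 20 and the population shares are hypothetical values chosen only for illustration. Because radius grows with the square root of the value, small countries shrink very quickly.

```r
# Radius of a bubble whose AREA is proportional to its value.
# Assumption: the largest value is drawn with the (hypothetical) maximum radius max_r.
bubble_radius <- function(value, max_value, max_r = 20) {
  max_r * sqrt(value / max_value)
}

# Hypothetical population shares relative to the largest country (not real figures)
shares <- c(1, 0.25, 0.01, 0.001)

radii <- bubble_radius(shares, max_value = 1)
areas <- pi * radii^2

print(radii)              # radii shrink with the square root of the share
print(areas / areas[1])   # area ratios equal the population shares
```

With a maximum radius of 20, a country holding 1% of the largest population is drawn with a radius of 2. A country holding 0.1% gets a radius of about 0.63, which is why most countries in the original chart appear only as dots.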

Reference

Code

The following code was used to fix the issues identified in the original.

library(dplyr)    # filter(), top_n()
library(ggplot2)  # bar charts

# Read the data from the CSV file
data <- read.csv('learning-outcomes-vs-gdp-per-capita.csv')

data1 <- filter(data, Year == 2015)        # keep only the year 2015
data1 <- subset(data1, select = -c(6, 7))  # drop the unwanted columns 6 and 7

#Renaming the columns 1,4 and 5
names(data1)[1] <- "Country"
names(data1)[4] <- "Avg_learning_outcome"
names(data1)[5] <- "GDP"

#View(data1)

#filtering top 10 countries on the basis of GDP
top_10 <- top_n(data1, 10, GDP)
top_10
##                 Country Code Year Avg_learning_outcome    GDP     Continent
## 1               Ireland  IRL 2015               535.42  54278        Europe
## 2                Kuwait  KWT 2015               372.30  71354          Asia
## 3            Luxembourg  LUX 2015               508.88  55972        Europe
## 4                Norway  NOR 2015               530.48  82713        Europe
## 5                 Qatar  QAT 2015               438.90 156029          Asia
## 6          Saudi Arabia  SAU 2015               375.60  51681          Asia
## 7             Singapore  SGP 2015               619.17  65660          Asia
## 8           Switzerland  CHE 2015               543.83  59307        Europe
## 9  United Arab Emirates  ARE 2015               460.49  74746          Asia
## 10        United States  USA 2015               529.09  52591 North America
# Bar chart: GDP per capita of the top 10 countries
plot1 <- ggplot(top_10, aes(x = Country, y = GDP)) +
  geom_col(color = '#1A237E', fill = "#0277BD") +
  geom_text(aes(label = GDP), vjust = -0.2, size = 3.0) +
  labs(title = "GDP per Capita of Top 10 Countries in Year 2015",
       x = "Countries", y = "GDP per capita") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 11),
        axis.text.y = element_text(size = 12),
        axis.title = element_text(face = "bold"))

# Bar chart: learning outcome of the top 10 GDP countries
plot2 <- ggplot(top_10, aes(x = Country, y = Avg_learning_outcome)) +
  geom_col(color = '#1A237E', fill = "#0277BD") +
  geom_text(aes(label = Avg_learning_outcome), vjust = -0.2, size = 3.0) +
  labs(title = "Learning Outcome of Top 10 GDP Countries in Year 2015",
       x = "Countries", y = "Average Learning Outcome") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 11),
        axis.text.y = element_text(size = 12),
        axis.title = element_text(face = "bold"))

Data Reference

Reconstruction

The graphs given below resolve the issues stated for the original plot. The top 10 countries by GDP per capita are selected, and their GDP per capita is shown in a bar plot. A second bar plot shows the average learning outcome of these highest-GDP countries so that it can be read at a glance. To accommodate colour-blind readers, a single colour is used throughout, so the learning outcomes can be compared without eye strain.
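The comparison that the two bar plots support can also be checked numerically. The following is a minimal base-R sketch. It uses only the ten rows printed in the top_10 output of the Code section (the Code, Year and Continent columns are omitted). It ranks each country by GDP per capita and by average learning outcome. The rank columns GDP_rank and Learning_rank are hypothetical names introduced here for illustration.

```r
# Top 10 countries by GDP per capita in 2015, copied from the top_10 output
top_10 <- data.frame(
  Country = c("Ireland", "Kuwait", "Luxembourg", "Norway", "Qatar",
              "Saudi Arabia", "Singapore", "Switzerland",
              "United Arab Emirates", "United States"),
  Avg_learning_outcome = c(535.42, 372.30, 508.88, 530.48, 438.90,
                           375.60, 619.17, 543.83, 460.49, 529.09),
  GDP = c(54278, 71354, 55972, 82713, 156029,
          51681, 65660, 59307, 74746, 52591)
)

# Rank each country on both measures (1 = highest value)
top_10$GDP_rank <- rank(-top_10$GDP)
top_10$Learning_rank <- rank(-top_10$Avg_learning_outcome)

# Sort by learning outcome so the two rankings can be read side by side
ranked <- top_10[order(top_10$Learning_rank),
                 c("Country", "GDP_rank", "Learning_rank")]
print(ranked)
```

For example, Qatar ranks first by GDP per capita but eighth by learning outcome, while Singapore ranks fifth by GDP per capita and first by learning outcome.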