Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The main objective of the visualisation/infographic is to show the diversity of languages spoken around the world in terms of native speakers. The target audience could be someone who is starting to take an interest in learning a new language; the infographic could help them decide which language is right for them based on how many people they could communicate with. It could also be used as a marketing tool aimed at people looking to learn something new.
The visualisation chosen had the following three main issues:
Inaccurate data- The biggest issue with the visualisation is that it uses incorrect data to prove a point. The total number of English speakers in the world is not 1.8 billion but rather 1.34 billion. The data source used for this infographic is not credible enough to guarantee accurate statistics.
Difficult to understand- Even though pie charts are a poor way to make a point, they at least let the reader get the gist of the visual by comparing angles. Here even that luxury is taken away, as donut charts give little sense of how the values are distributed, especially for the smaller shares of the donut.
Color palette- A big concern for me is that the color palette does not cater for colorblind people. Shades of brown and grey should not be placed side by side with green, because people with colorblindness can struggle to tell them apart when interpreting charts and plots.
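As a rough way to act on the palette concern, the RColorBrewer package ships a table, brewer.pal.info, that flags which of its palettes are considered colorblind friendly. The sketch below filters that table for qualitative palettes with at least ten colors. The threshold of ten is an assumption chosen to match a chart of ten languages, and the variable names are illustrative.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame shipped with RColorBrewer.
# Columns: maxcolors, category ("seq", "div", "qual"), colorblind (logical).
info <- brewer.pal.info

# Assumption: the chart needs at least ten distinct colors, one per language.
n_needed <- 10

# Keep qualitative palettes that are flagged colorblind friendly
# and offer enough colors.
safe <- info[info$colorblind &
               info$category == "qual" &
               info$maxcolors >= n_needed, ]

# Palette names are stored as row names.
print(rownames(safe))
```

Any palette name returned this way can be passed to scale_fill_brewer(palette = ...) in ggplot2.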
Reference
The following code was used to fix the issues identified in the original.
# Load packages for reading Excel files, plotting and reshaping
library(readxl)
library(ggplot2)
library(reshape2)

# Read the native-speaker and total-speaker tables
firstLanguages <- read_excel("FirstLanguages.xlsx")
totalSpeakers <- read_excel("TotalLanguages.xlsx")
colnames(firstLanguages) <- c("Language", "spoken", "Official Language", "Worldwide Percentage", "First Language Speakers(In millions)")

# Join the two tables on Language and keep only the columns to plot
df <- merge(firstLanguages, totalSpeakers, by = "Language")
keeps <- c("Language", "First Language Speakers(In millions)", "Total Speakers (In Millions)")
maindf <- df[keeps]

# Set the order in which the languages appear on the axis
maindf$Language <- factor(maindf$Language, levels = c("Chinese", "Hindi", "Spanish", "English", "Arabic", "Bengali", "Portuguese", "Russian", "Japanese", "Punjabi"))

# Reshape to long format: one row per language and measure
plotdf <- melt(maindf, id.vars = "Language")

# Horizontal bars, one facet per measure
plot2 <- ggplot(data = plotdf, aes(x = Language, y = value, fill = Language)) +
  labs(y = "Population in Millions", title = "Native speakers vs total speakers for the 10 most popular languages") +
  geom_bar(stat = "identity") + coord_flip() +
  facet_wrap(~ variable) + scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(face = "bold", color = "#993333", size = 8),
        legend.background = element_rect(fill = "white", color = "white"))

# Display the plot
plot2
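To make the reshaping step concrete, here is a minimal sketch of what melt does to a wide table before faceting. The data frame, its column names and its numbers are placeholders invented for illustration, not real speaker counts.

```r
library(reshape2)

# Placeholder values only; these are NOT real speaker counts.
wide <- data.frame(
  Language = c("A", "B"),
  First = c(10, 20),
  Total = c(15, 40)
)

# melt stacks every non-id column into two columns:
# `variable` (the former column name) and `value` (the cell content).
long <- melt(wide, id.vars = "Language")

print(long)
```

Each measure column becomes a level of variable, which is what facet_wrap(~ variable) splits on to draw one panel per measure.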
Data Reference
Worldwide distribution of languages. (2019). Worlddata.info. Retrieved May 2, 2021, from https://www.worlddata.info/languages/index.php
What are the top 200 most spoken languages? (2018, October 3). Ethnologue. Retrieved May 2, 2021, from https://www.ethnologue.com/guides/ethnologue200
The following plot fixes the main issues in the original.