Objective
This visualisation is taken from the report on alcohol consumption in the world published on the website Our World in Data. The report as a whole consists of various charts comparing alcohol consumption across the world along dimensions such as demographics, age, gender and history. It also sheds light on the relationship between alcohol consumption and health, crime, etc., its impact on expenditure, and how it varies with income.
The visualisation chosen for this assignment is "Alcohol consumption by sex", which compares the share of men and women who drank alcohol in 2016. It is a bubble chart with the percentage of women who drank alcohol on the X axis and the percentage of men who drank alcohol on the Y axis. Each bubble represents a country, and the size of each bubble represents that country's population. A legend groups the bubbles by the continent each country belongs to. The visualisation had the following three main issues:
*Deception: visual bombardment - 1) There is too much information for a single chart, which leads to clutter, so at first glance it is difficult to understand what exactly the reader is meant to interpret. 2) Three measures (percentage of women, percentage of men and population) are plotted against two dimensions (country and continent). Plotting every country of the world in a single chart crowds the view so much that no individual number can be read without hovering over it.
*Ethical issues such as perceived bias - 1) The chart plots the percentage of women and the percentage of men on the X and Y axes respectively, and the third measure (bubble size) is the population of the country. When we actually compare the numbers, however, the percentages of men and women add up to more than 100% for some countries. This means each percentage is taken over that gender's own population, not over the total population. This is misleading: because the chart places these per-gender percentages alongside the country's total population, a reader could easily assume they are in proportion to each other. Plotting these measures together also brings no insight, since per-gender percentages cannot be meaningfully compared with total population; it would have made more sense to express the percentages relative to the total population.
*Color issues - 1) The chart aims to compare the share of men and women consuming alcohol in each country, but the colors group the countries by continent. Since color plays a vital role in perception, this draws attention away from the main objective of the chart.
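The percentage mismatch described under perceived bias can be written out explicitly. As a sketch, let $P$ be a country's total population, $s_f$ the share of that population that is female (the World Bank indicator listed under Data Reference), and $p_m$, $p_f$ the share of men and of women who drank alcohol, each measured over its own gender. Assuming only two genders, so that the male share is $1 - s_f$, the shares of the whole population ($q$) and the corresponding head counts ($N$) are as follows. The numbers in the last line are illustrative only, not taken from the data.

```latex
\begin{aligned}
q_m &= p_m\,(1 - s_f), \qquad q_f = p_f\,s_f, \qquad q_{\text{non}} = 1 - q_m - q_f,\\
N_m &= q_m P, \qquad N_f = q_f P, \qquad N_{\text{non}} = q_{\text{non}} P,\\
&s_f = 0.5,\; p_m = 0.6,\; p_f = 0.5 \;\Rightarrow\; p_m + p_f = 1.1,\quad q_m = 0.30,\; q_f = 0.25,\; q_{\text{non}} = 0.45.
\end{aligned}
```

The per-gender percentages in the example sum to 110%, while the population-relative shares $q_m + q_f + q_{\text{non}}$ always sum to 1, which is what a part-of-whole chart needs.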
Reference
The following code was used to fix the issues identified in the original.
library(readr)
library(ggplot2)
library(dplyr)
library(tidyr)          # provides pivot_longer()
options(scipen = 999)   # print axis values in full rather than scientific notation
# Local path on the author's machine; adjust to wherever Data.csv is stored
AlcoholData <- read_csv("C:/Users/cinat/OneDrive/Desktop/Cina/Sem 2/DataViz/Data.csv")
if (interactive()) View(AlcoholData)
# Order country codes by total population so the bars appear sorted
AlcoholData$Code <- reorder(AlcoholData$Code, AlcoholData$Total_Population)
# Named vectors tie each column to its color and legend label; unnamed vectors
# would be matched to the alphabetically sorted group names instead
group_colors <- c(Total_NonAlcohol = "grey", Male_Alcohol = "pink", Female_Alcohol = "blue")
group_labels <- c(Total_NonAlcohol = "Non Alcohol Population", Male_Alcohol = "Male Alcohol Consumption", Female_Alcohol = "Female Alcohol Consumption")
# convert data to long format
data_long <- pivot_longer(AlcoholData, cols = c("Total_NonAlcohol", "Male_Alcohol", "Female_Alcohol"), names_to = "group", values_to = "value")
if (interactive()) View(data_long)
# create stacked bar chart of actual head counts
ggp_act <- ggplot(data_long, aes(x = value, y = Code, fill = group)) +
  geom_col() +
  labs(title = "Share of men vs women who drank alcohol in the year 2016 (Actuals)", x = "Number of people", y = "Country code") +
  scale_fill_manual(values = group_colors, labels = group_labels, name = "Group")
# same mapping, renamed to the percentage columns (same order: non-alcohol, male, female)
perc_cols <- c("Total_NonAlcP", "Male_AlcP", "Female_AlcP")
data_long_P <- pivot_longer(AlcoholData, cols = all_of(perc_cols), names_to = "group", values_to = "value")
# create stacked bar chart of percentages of the total population
ggp_perc <- ggplot(data_long_P, aes(x = value, y = Code, fill = group)) +
  geom_col() +
  geom_text(aes(label = value), position = position_stack(vjust = 0.5)) +  # add value labels
  labs(title = "Share of men vs women who drank alcohol in the year 2016 (Percentages)", x = "Percentage of total population (%)", y = "Country code") +
  scale_fill_manual(values = setNames(group_colors, perc_cols), labels = setNames(group_labels, perc_cols), name = "Group")
Data Reference
*World Bank staff estimates based on age/sex distributions of United Nations Population Division’s World Population Prospects: 2022 Revision, from https://data.worldbank.org/indicator/SP.POP.TOTL.FE.ZS
The following plot fixes the main issues in the original.
Assumptions made: 1) There are only two genders, male and female. 2) Although the major issue highlighted was data clutter, the data in the reconstruction plot is subsetted only for convenience of plotting in R. In an actual visualisation tool, the continents could instead be moved into an external filter rather than plotted in the graph itself, or a scroll view could show all countries together, sorted by total population.
Data pre-processing: 1) Added the actual male and female population distribution and, based on the total population, calculated the actual number of people of each gender who drank alcohol, along with the corresponding percentages of the total population. 2) Subsetted the data to the countries of a single continent, South America (for convenience of plotting and displaying in an R graph).
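The two pre-processing steps can be sketched in base R as below. This is a minimal illustration, not the author's actual procedure: the input columns `Continent`, `Female_Pop_Pct`, `Male_Drank_Pct` and `Female_Drank_Pct`, the helper `preprocess()`, and the two made-up countries are all hypothetical. Only the output column names match those used in the plotting code.

```r
# Hypothetical input: column names and the two toy rows are placeholders,
# not the author's data.
raw <- data.frame(
  Code             = c("AAA", "BBB"),
  Continent        = c("South America", "Europe"),
  Total_Population = c(1000000, 2000000),
  Female_Pop_Pct   = c(50, 51),  # % of the population that is female
  Male_Drank_Pct   = c(60, 70),  # % of men who drank alcohol
  Female_Drank_Pct = c(50, 40)   # % of women who drank alcohol
)

# Hypothetical helper: derive counts and population-relative percentages,
# then keep only the countries of one continent.
preprocess <- function(df, continent) {
  # Split the total population by gender (two-gender assumption)
  female_pop <- df$Total_Population * df$Female_Pop_Pct / 100
  male_pop   <- df$Total_Population - female_pop

  # Actual number of drinkers per gender, and everyone else
  df$Male_Alcohol     <- male_pop * df$Male_Drank_Pct / 100
  df$Female_Alcohol   <- female_pop * df$Female_Drank_Pct / 100
  df$Total_NonAlcohol <- df$Total_Population - df$Male_Alcohol - df$Female_Alcohol

  # Percentages of the whole population, so the three parts sum to 100
  # (up to rounding to one decimal place)
  df$Male_AlcP     <- round(100 * df$Male_Alcohol / df$Total_Population, 1)
  df$Female_AlcP   <- round(100 * df$Female_Alcohol / df$Total_Population, 1)
  df$Total_NonAlcP <- round(100 * df$Total_NonAlcohol / df$Total_Population, 1)

  # Subset to a single continent
  df[df$Continent == continent, , drop = FALSE]
}

south_america <- preprocess(raw, "South America")
print(south_america[, c("Code", "Male_AlcP", "Female_AlcP", "Total_NonAlcP")])
```

Because every percentage shares the same denominator (total population), the three stacked segments of each bar add up to the whole country, which avoids the over-100% sums of the original chart.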