Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: [OC] Unvaccinated 80+ year olds (ECDC) : dataisbeautiful, 2021


Objective

The data visualization above shows the percentage of people aged 80 and over who are unvaccinated in each European country. The map is aimed at the general public and government authorities who want to compare these percentages across European countries.

The visualisation chosen had the following three main issues:

  • Color Issue:
    • The same color is used for adjacent countries, which makes them hard to tell apart. For example, neighbouring countries share the same blue or the same green shade, so the user cannot easily distinguish one from the other. Moreover, the map draws no borders, which would have helped the user separate one country from the next.
  • Improper Labelling and Scaling:
    • The visualization would have been more informative if each country had been labelled with its total population in this age group, so that users could compare the population with the corresponding unvaccinated percentage. Because there is no scale, it is hard to compare countries and see which one has the highest share of unvaccinated elderly people. Moreover, there are no country name labels, which leaves users unsure which percentage belongs to which country.
  • Deceptive Issue:
    • A few countries are shown with 0% unvaccinated, which implies that everyone aged 80 and over is vaccinated, but a closer look at the data shows that some percentage of elderly people in those countries remain unvaccinated. For example, Portugal, Ireland, and Spain are shown with 0% unvaccinated people, while the data gives unvaccinated percentages of 2.44%, 6.87%, and 2.75% respectively. This gives the target audience misleading information. Furthermore, the percentage labels on the map are placed so close together that it is difficult to tell which percentage belongs to which country.
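
The unvaccinated percentage discussed above is the share of the age group with no recorded first dose. A minimal sketch of that arithmetic in base R is given here; the country names `A` and `B` and all figures are hypothetical placeholders, not values from the ECDC dataset.

```r
# Hypothetical figures for illustration only; not taken from the ECDC dataset.
population  <- c(A = 500000, B = 120000)   # people aged 80+ per country
first_doses <- c(A = 480000, B = 117000)   # first doses given to that age group

# Share of the 80+ population with no recorded first dose, in percent,
# rounded to two decimal places.
unvaccinated_pct <- round((population - first_doses) / population * 100, 2)

print(unvaccinated_pct)
```

A value that rounds to exactly zero would only appear if the recorded first doses matched the population estimate, which is why a 0% label on the map deserves a second look.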

Reference

Code

The following code was used to fix the issues identified in the original.

library(readr)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)
library(readxl)

#reading the data
VaccinatedDataset <- read_excel("dataset_2021-W35.xlsx")

#Renaming the required columns by column number
names(VaccinatedDataset)[1] <- "ReportingCountry"
names(VaccinatedDataset)[2] <- "AgeGroup"
names(VaccinatedDataset)[5] <- "GroupPopulation"
names(VaccinatedDataset)[7] <- "FirstDose"

#Preprocessing the data
VaccinatedDataset <- subset(x = VaccinatedDataset,subset= AgeGroup =="Age80+",select = c(ReportingCountry,AgeGroup,GroupPopulation,FirstDose))

#Converting the population and dose columns to numeric

VaccinatedDataset$GroupPopulation <- as.numeric(VaccinatedDataset$GroupPopulation)
VaccinatedDataset$FirstDose <- as.numeric(VaccinatedDataset$FirstDose)

#Summing first doses per country and mutating new columns as required.
FilteredDataset <- VaccinatedDataset %>% group_by(ReportingCountry,GroupPopulation) %>% summarise(TotalDose=sum(FirstDose))
FilteredDataset <- FilteredDataset %>% mutate(UnvaccinatedPopulation = abs(GroupPopulation-TotalDose))
FilteredDataset <- FilteredDataset %>% mutate(UnvaccinatedPercentagePopulation = (format(round((UnvaccinatedPopulation/GroupPopulation)*100,2))))
FilteredDataset <- FilteredDataset %>% mutate(VaccinatedPopulation = abs(GroupPopulation-UnvaccinatedPopulation))

#This code reorders the levels of the ReportingCountry factor by increasing GroupPopulation, so the largest population is drawn at the top after coord_flip()
FilteredDataset$ReportingCountry <- FilteredDataset$ReportingCountry %>%
 factor(levels = FilteredDataset$ReportingCountry[order(+FilteredDataset$GroupPopulation)])

#Converting the data into long format for Stacked bar chart.
data_long = gather(FilteredDataset,Type,Value,VaccinatedPopulation,UnvaccinatedPopulation)

#Specifying colour for bar chart.
typecolour = c(UnvaccinatedPopulation = "thistle2", VaccinatedPopulation = "skyblue2",GroupPopulation = "skyblue2")

#To display the numbers in standard (non-scientific) notation.
options(scipen=999)

#Creating the Stacked bar chart
p1<-ggplot(data_long,aes(x = ReportingCountry, y = Value,fill = factor(Type,levels=c("VaccinatedPopulation","UnvaccinatedPopulation"))))+
  geom_bar(stat="identity") + coord_flip()+   
  scale_fill_manual(values = typecolour,breaks=c('UnvaccinatedPopulation','GroupPopulation'))+
  geom_text(data = data_long %>% filter(Type == "UnvaccinatedPopulation"),aes(label= paste(UnvaccinatedPercentagePopulation,"%") ),position = position_stack(vjust = 0),size = 3,fontface = "bold")+
  geom_text(data = data_long%>% filter(Type == "VaccinatedPopulation"),aes(label = comma(GroupPopulation)),size = 3,fontface = "bold",hjust=-0.5)+
  labs(title ="Unvaccinated Population Age80+ (European Countries)",subtitle='Plot of Unvaccinated Elderly People Population in Percentage till 15th Sept 2021', y = 'Total "Age80+" Population', x = "European Countries",fill="Legend") +
  scale_y_continuous(expand = c(.09, .09),labels = scales::comma) +
  theme_minimal()+
  theme(panel.grid.major = element_blank(),
        plot.title = element_text(color="black", size = 15, face = "bold"),
        plot.subtitle = element_text(color='grey21'),
        axis.title.x = element_text(vjust = -1, size = 11, face = "bold"),
        axis.title.y = element_text(size = 11, face = "bold"),
        legend.position = "bottom",
        legend.title = element_text(face = "bold" ))
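
The long-format step in the code above uses `gather()`, which tidyr now marks as superseded in favour of `pivot_longer()`. The following is a minimal sketch of the equivalent reshaping, assuming a reasonably recent tidyr; the data frame `wide`, the country codes `AA` and `BB`, and all values are made up for illustration and do not come from the ECDC file.

```r
library(tidyr)

# Made-up values for two placeholder countries (not from the ECDC file).
wide <- data.frame(
  ReportingCountry       = c("AA", "BB"),
  VaccinatedPopulation   = c(480000, 117000),
  UnvaccinatedPopulation = c(20000, 3000)
)

# One row per country and population type, as a stacked bar chart needs.
data_long_sketch <- pivot_longer(
  wide,
  cols      = c(VaccinatedPopulation, UnvaccinatedPopulation),
  names_to  = "Type",
  values_to = "Value"
)

print(data_long_sketch)
```

The row order differs from `gather()` (rows are grouped by country rather than by type), which does not matter for the stacked bar chart because the stacking order is set by the factor levels of `Type`.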

Data Reference

Reconstruction

The following plot fixes the main issues in the original.