Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: ESPN Cricinfo (2019).


Objective

The original data visualisation shows, for each team in the 2019 Cricket World Cup, the percentage of the team's total tournament runs that was scored by its top batsman. The target audience is the general public.

The visualisation chosen had the following three main issues:

  • The visualisation is a pie chart, but the percentages do not sum to 100%, because each value is a share of a different team's total. The number of categories is also large. A pie chart is not recommended in these situations, as it can confuse the audience about what the visualisation is meant to show. For example, a reader might conclude from the chart that Kane Williamson was the highest run scorer of the tournament with 30% of all runs, which is not the case.
  • The same colours are used for several different categories in the pie chart, which can confuse the audience.
  • The sectors of the pie chart are not mutually exclusive.

Attempts have been made to rectify the above issues.
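
The first issue can be checked numerically: a pie chart encodes parts of a single whole, so its values should sum to 100%. The following is a minimal sketch of such a check, assuming a hypothetical helper `sums_to_whole()` and placeholder percentages (these are not the figures from the original chart).

```r
# Hypothetical helper: TRUE when the percentages form one whole (within a tolerance).
sums_to_whole <- function(pct, tol = 0.5) {
  abs(sum(pct) - 100) <= tol
}

# Placeholder values: each number is a share of a DIFFERENT team's total,
# so together they are not parts of one whole.
team_shares <- c(30, 25, 20)
sums_to_whole(team_shares)   # FALSE: unsuitable for a pie chart

# Placeholder values: shares of ONE total.
one_team <- c(50, 30, 20)
sums_to_whole(one_team)      # TRUE: a pie chart is at least well defined
```

When the check fails, a bar chart is usually the safer choice, because bars compare values without implying that they add up to a whole.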

Reference

Code

The dataset was created by manually entering the values read from the original visualisation into an Excel workbook with three columns: Batsmen, Country and Percentage (the percentage of team runs scored).
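
For readers without the workbook, the expected structure can be sketched directly in R. Only the column names and the Kane Williamson figure (30%) come from the text above. The other rows and the name `runs_sketch` are hypothetical placeholders, not the real data.

```r
# Sketch of the three-column layout described above.
# Only the first row's percentage is quoted in the text; the rest are placeholders.
runs_sketch <- data.frame(
  Batsmen    = c("Kane Williamson", "Batsman B", "Batsman C"),
  Country    = c("New Zealand", "Team B", "Team C"),
  Percentage = c(30, 20, 15),
  stringsAsFactors = FALSE
)

# Inspect the column names and types
str(runs_sketch)
```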

The following code was used to fix the issues identified in the original.

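# Load the plotting and Excel-import packages
library(ggplot2)
library(readxl)

# Read the manually entered data (columns: Batsmen, Country, Percentage)
runs <- read_excel("Desktop/Book1.xlsx")

# One fill colour per bar; the vector length must equal the number of rows in `runs`
colors <- c("grey", "blue", "green", "yellow", "darkgreen", "lightgreen", "red", "cyan", "purple", "pink")

# Horizontal bar chart of each top scorer's share of their team's runs
p2 <- ggplot(data = runs, aes(x = Batsmen, y = Percentage)) +
  geom_bar(stat = "identity", fill = colors) +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Percentage of total team runs scored by the top scorer of each team in World Cup 2019",
    y = "Percentage of total team runs scored by the top scorer over the entire tournament",
    x = "Batsman (Country)"
  )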
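
One possible refinement, not part of the original code, is to sort the bars by value and map the fill colour to the country, so that each colour is tied to a team by the data and not by row order. The sketch below assumes placeholder data and the hypothetical names `runs_demo`, `Label` and `p_demo`.

```r
library(ggplot2)

# Hypothetical placeholder rows; only the column names follow the dataset described above.
runs_demo <- data.frame(
  Batsmen    = c("Batsman A", "Batsman B", "Batsman C"),
  Country    = c("Team A", "Team B", "Team C"),
  Percentage = c(30, 22, 18)
)

# Combine name and country into one axis label, e.g. "Batsman A (Team A)"
runs_demo$Label <- paste0(runs_demo$Batsmen, " (", runs_demo$Country, ")")

# reorder() sorts the bars by Percentage; fill = Country gives one colour per team
p_demo <- ggplot(runs_demo, aes(x = reorder(Label, Percentage), y = Percentage, fill = Country)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  labs(x = "Batsman (Country)", y = "Percentage of total team runs")

print(p_demo)
```

Sorting makes the ranking readable at a glance, and a data-driven fill avoids a mismatch between the colour vector and the number of rows.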

Data Reference

Reconstruction

The following plot fixes the main issues in the original.