Original


Source: ESPN Cricinfo Feature Section (2019).


Objective

The article set out to identify, on statistics alone, which top cricket city would win a World Cup. It was divided into sections on runs and wickets, with several graphics ranking cities in different categories: best batting averages, best bowling averages, most Man of the Match awards, and finally a calculation of what the best team would look like. The overall purpose was to use these statistics to determine the strongest lineup from each of the two top cities, based on their players' output since the start of the game. These lineups were then used to stage a hypothetical World Cup final between the two cities, with the winner chosen subjectively. The target audience was the general cricket public, who would be intrigued by which cities have produced the best talent and, in turn, curious to see what a hypothetical World Cup final would look like.

The visualisation chosen had the following three main issues:

Colour: The colours used in the visualisation shown above make it hard to read and draw attention away from the necessary information. There is too much colour, which distracts from the statistics the reader is meant to read and makes the visualisation harder to understand and interpret correctly.

Deception Issues: The visualisation attempts to use area and size to depict the number of runs per city, but because each city is represented by one of its buildings, the areas are not scaled to the values. Sydney, which has the highest average, is depicted with the smallest building. This may be because the Sydney Opera House is the building most people identify with Sydney. However, a bar chart or similar chart would have shown the highest average, and the discrepancy between the scores, far more clearly.

Visualisation Choice: The graphic is simply not well suited to the data or to the message it hopes to convey. Changing it to a bar chart would make it easier for the audience to see which cities rank highest on their statistics.

Reference

Code

The following code was used to fix the issues identified in the original. A simple bar chart was chosen because it shows the highest averages more easily and suits this dataset. A more neutral colour scheme was also chosen so that more people can read the visual easily.

library(ggplot2)  # provides ggplot(), geom_col() and scale_fill_viridis_d()

# Total runs and average for each city
cricket <- data.frame(City = c("Sydney", "Launceston",
                               "Cape Town", "Delhi", "Mumbai"),
                      Runs = c(5416, 2657, 4501, 3172, 3938),
                      Average = c(45.37, 44.80, 44.29, 36.53, 35.51))

# Bar height shows total runs; the label above each bar shows the average
cc1 <- ggplot(cricket, aes(x = City, y = Runs, fill = City)) +
  geom_col(width = 0.5) +
  # sprintf() keeps two decimals so 44.80 is not printed as 44.8
  geom_text(aes(label = sprintf("%.2f", Average)), vjust = -1, colour = "black") +
  scale_fill_viridis_d() +
  labs(title = "Total Runs by City", subtitle = "The number above each column is the average runs for that city", x = "City", y = "Runs") +
  scale_y_continuous(limits = c(0, 6000))
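
The critique above turns on which city has the highest average, so it can help to make that ranking explicit before plotting. The following is a minimal base R sketch, not part of the original reconstruction: it reuses the same figures, and the names `ranked` and `average_gap` are assumptions introduced here for illustration. Turning `City` into a factor with ranked levels is one way to stop a plot from falling back to alphabetical order on the x-axis.

```r
# Same figures as the reconstruction code in this report.
cricket <- data.frame(
  City = c("Sydney", "Launceston", "Cape Town", "Delhi", "Mumbai"),
  Runs = c(5416, 2657, 4501, 3172, 3938),
  Average = c(45.37, 44.80, 44.29, 36.53, 35.51)
)

# Sort the rows by average, highest first.
ranked <- cricket[order(-cricket$Average), ]

# Fix the factor levels in ranked order so a chart would keep this
# ordering instead of sorting the cities alphabetically.
ranked$City <- factor(ranked$City, levels = ranked$City)

# Difference between the highest and lowest average (illustrative name).
average_gap <- max(ranked$Average) - min(ranked$Average)

print(ranked)
```

Passing `ranked` instead of `cricket` to a plotting call would then place the bars from the highest average to the lowest, which is the comparison the original graphic obscured.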

Data Reference

Reconstruction

The following bar chart fixes the main issues in the original.