
Original


Source: Visual Capitalist (Mendoza, 2022).


Objective

The objective of this Sankey diagram is to highlight the best-selling video game console platforms and show how their sales are divided among parent companies and regions.

The infographic is aimed at a general audience of console gamers who may be curious about the popularity of their favourite platforms.

However, the chosen visualisation has three main issues:

  • It is too complicated. A Sankey diagram works best with a small number of labels, but with this many platforms the graph becomes difficult to follow and confuses the audience. Many of these platforms also contribute little to the intended goal of highlighting the best-selling consoles.

  • Ornamental x-axis. The flowing shapes are meant to connect each platform's share of sales to its company and region. However, they subconsciously imply a transition or change along the x-axis, even though nothing is changing. The x-axis exists only as an aesthetic way of linking the categories, and the same links can be shown much more simply.

  • Deceptive practices. The infographic implies that its source data is an authentic record of platform sales. However, on closer inspection, the data source, VGChartz, is suspected of estimating sales rather than reporting them. Console manufacturers often keep their original sales data private, and the figures they do announce publicly are likely inflated for marketing purposes. Furthermore, the original dataset contained many missing values for the regional sales of smaller platforms, yet the infographic displays regional sales for all of them. Estimates are the only option when company data is private, but presenting them as authentic and inventing figures where none exist can be considered deceptive.
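One way to address the first issue is to keep only the top-selling platforms before plotting. The helper below, `top_platforms()`, is a hypothetical base-R sketch and not part of the original analysis. It assumes one row per platform and the same four regional columns used in the Code section. The cut-off `n` is an arbitrary choice. Missing regional values are skipped when ranking and left as `NA`, so no figures are invented. As a side effect, platforms with gaps are ranked on their reported figures only.

```r
# Hypothetical helper (not from the original analysis): keep only the n
# best-selling platforms so the chart is not crowded with minor consoles.
# Assumes `cs` has one row per platform and these four regional columns.
top_platforms <- function(cs, n = 10,
                          regions = c("North America", "Europe",
                                      "Japan", "Rest of World")) {
  # Total of the reported regional figures; na.rm = TRUE skips missing
  # regions instead of filling them with invented values
  totals <- rowSums(cs[, regions, drop = FALSE], na.rm = TRUE)
  # Row indices ordered from highest to lowest total, truncated to n
  keep <- order(totals, decreasing = TRUE)[seq_len(min(n, nrow(cs)))]
  # Return the selected rows unchanged, so NA values stay NA
  cs[keep, , drop = FALSE]
}
```

For example, `top_platforms(cs, n = 10)` could be passed to `pivot_longer()` in place of `cs`.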

Reference

Code

The following code was used to fix the issues identified in the original.

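library(readxl)
library(ggplot2)  # horizontal geom_col() without coord_flip() needs ggplot2 >= 3.3.0
library(tidyr)

# Read the VGChartz console sales data (one row per platform)
cs <- read_excel('vgchartz.xlsx')

# Reshape the four regional columns into long format; values_drop_na = TRUE
# drops regions with no reported sales instead of plotting invented figures
cs_long <- pivot_longer(cs,
                        cols = c('North America', 'Europe', 'Japan', 'Rest of World'),
                        names_to = 'Region',
                        values_to = 'Sales',
                        values_drop_na = TRUE)

# Horizontal stacked bars with one panel per parent company; scales and
# space set to "free_y" so each panel lists only that company's platforms
p1 <- ggplot(cs_long, aes(x = Sales, y = Platform, fill = Region)) +
  geom_col(position = "stack") +
  facet_grid(Company ~ ., scales = "free_y", space = "free_y") +
  labs(title = "Best Selling Video Game Consoles of All Time",
       x = "Units Sold (Millions)", y = "Platform") +
  theme(plot.title = element_text(hjust = 0.5))

p1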
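Because the third issue concerns regional figures that were never reported, it may also help to make those gaps visible before plotting rather than hiding them. The helper below, `missing_regions()`, is a hypothetical base-R sketch that is not part of the original code. It counts the missing regional values for each platform and assumes the same column names as above.

```r
# Hypothetical check (not part of the original analysis): count how many
# regional sales figures are missing for each platform, so gaps can be
# reported rather than filled in. Column names are assumed.
missing_regions <- function(cs, regions = c("North America", "Europe",
                                            "Japan", "Rest of World")) {
  data.frame(
    Platform = cs$Platform,
    # is.na() on the regional columns gives a logical matrix;
    # rowSums() counts the TRUE values in each row
    Missing = rowSums(is.na(cs[, regions, drop = FALSE]))
  )
}
```

Running `subset(missing_regions(cs), Missing > 0)` would then list the platforms with incomplete regional data.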

Data Reference

Reconstruction

The following plot fixes the main issues in the original.