MATH2404 - Data Visualisation

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Figure 1. Australian Federal Government: Revenue and Expenditure for 2016-17 (original)

Source: Commonwealth of Australia (2016, p. 25).

Objective

The above pie charts were used by the Australian Federal Government to communicate detailed information on revenue and spending in its annual budget projection papers for 2016-17. Pie charts are considered a form of infographic, which means they are designed to grab readers’ attention rather than to help readers understand patterns that may be present in the data (Gelman and Unwin, 2013).

The information presented in these pie charts is intended for a wide audience. For example:

Fellow parliamentarians who are in government, or in opposition;
Journalists who will communicate the budget results widely through a variety of means such as news websites and social media;
The general Australian public, especially voters and taxpayers;
Corporate leaders, who will consider the potential impact on their business strategy;
International investors, including credit rating agencies, who make assessments which affect the Federal Government’s access to, and cost of, capital; and,
Foreign governments, who will consider any potential impact on two-way trade (e.g. sales tariffs).

At a superficial level, communicating the projection of revenue and spending is a straightforward matter. We would like to know where the funds are coming from (revenue), where the funds are being spent (expenses), and what the difference is (surplus/deficit). In accounting terms, this is essentially an Income Statement (Investopedia, 2021). As presented, however, the information falls short for the following three reasons:

There is a lot of detail, but no clear summary

If we are to understand the Federal Government’s projected revenue and spending, we need to know the total revenue, total spending, and the difference (surplus/deficit). However, this summary is not readily accessible in the pie charts, which are multi-coloured and overflowing with detail. To find the summary, we are forced to explore the wordy paragraph at the top of the page, where we locate total revenue and spending; even then, the difference (surplus/deficit) is not provided to the reader. Having located total revenue ($416.90 billion) and total spending ($450.60 billion), we can punch the numbers into our trusty calculator, only to find the Federal Government will have a deficit of ($33.70 billion) in 2016-17. Some explaining to do to the voters then.
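
The calculator step can be reproduced in a few lines of R. This is only a minimal sketch: it takes the category amounts (in $ billion) as they are entered in the data under the 'Code' tab, and the names `revenue`, `expenses` and `balance` are illustrative rather than part of the original report.

```r
# Category amounts in $ billion, as read from the two pie charts
revenue  <- c(201.3, 4.8, 7.5, 71.0, 6.1, 64.8, 18.4, 3.4, 14.0, 25.6)
expenses <- c(158.6, 89.1, 71.4, 33.7, 27.2, 22.7, 47.9)

# Totals for each side of the budget
total_revenue  <- sum(revenue)
total_expenses <- sum(expenses)

# A negative balance indicates a deficit
balance <- total_revenue - total_expenses

cat(sprintf("Total revenue:  %.1f\n", total_revenue))   # 416.9
cat(sprintf("Total expenses: %.1f\n", total_expenses))  # 450.6
cat(sprintf("Balance:        %.1f\n", balance))         # -33.7
```

The category amounts sum to the totals quoted in the budget paper, which suggests the pie charts and the summary paragraph are at least internally consistent.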

Poor visual alignment creates complexity

For the individual items on each of the pie charts, it is necessary to follow detailed categories and numbers in a circular manner, which means that taking in the information at a glance is not really possible. As there is no linear alignment for the individual categories, it is easy to lose track if one looks away from the page. There is also no clear order around the circle, whether alphabetical or numerical. With 17 categories across the two charts, this creates added visual complexity when processing the information.

There is also further complexity when trying to compare information between the two pie charts. Because they are not located side-by-side, we find ourselves looking up-and-down, which is less instinctive for a Western audience, who are more used to reading information from left-to-right.

Colour Scheme

With 17 categories to observe, the author has used the same blue-green-grey colour palette in both pie charts, presumably to avoid an explosion of colours across the page.

This creates some confusion when comparing the two charts, as colours that appear in both instinctively suggest a relationship between categories. On closer inspection, however, this is clearly not the case. For example, there is no clear relationship between individuals’ income tax (olive green) and social security and welfare (also olive green).

The high prevalence of blue-green-grey in this colour scheme also potentially disadvantages individuals with tritan colour blindness (tritanopia / tritanomaly). While it has a low prevalence in the overall population, it is more likely to occur later in life due to conditions affecting the ageing eye, such as glaucoma and cataracts. Affected individuals do not see blue colours well, and may find it difficult to differentiate between blue and green (Pacheco-Cutillas et al., 1999).

References

Commonwealth of Australia (2016) Budget 2016-17 Overview 3 May 2016. Available at: <https://archive.budget.gov.au/2016-17/glossies/Budget2016-17-Overview.pdf>. [Accessed 1 August 2021].
Gelman, A. & Unwin, A. (2013) “Infovis and statistical graphics: Different goals, different looks.” Journal of Computational and Graphical Statistics 22.1 (2013): 2-28. Available at: <https://primo-direct-apac.hosted.exlibrisgroup.com/permalink/f/aqirjb/TN_cdi_proquest_journals_1324508998>. [Accessed 1 August 2021].
Investopedia (2021) The Three Major Financial Statements: How They’re Interconnected. Available at: <https://www.investopedia.com/ask/answers/031815/how-are-three-major-financial-statements-related-each-other.asp>. [Accessed 1 August 2021].
Pacheco-Cutillas M., Sahraie A., & Edgar D.F. (1999) “Acquired colour vision defects in glaucoma - their detection and clinical significance.” British Journal of Ophthalmology (1999); 83:1396-1402. Available at: <https://bjo.bmj.com/content/bjophthalmol/83/12/1396.full.pdf>. [Accessed 1 August 2021].

Code

Reconstruction of the original charts commenced by entering the data into R as a dataframe, with a view to building a Sankey diagram using googleVis. The following code was then used to address the issues identified in the original charts.

Key inspirations for the coding are detailed in the Data References list at the bottom of this page.

# Load libraries
library(tidyverse)
library(googleVis)

# Create the dataframe of flows. The empty string '' acts as a central node:
# the first 7 rows flow out of it to the expense categories, and the last
# 10 rows flow into it from the revenue categories.
Rev_spend <- data.frame(
  rs_from = c(
    rep('', 7),
    'Individuals income tax $201.3 billion',
    'Fringe benefits tax $4.8 billion',
    'Superannuation taxes $7.5 billion',
    'Company and resource rent taxes $71.0 billion',
    'Other taxes $6.1 billion',
    'Sales taxes $64.8 billion',
    'Fuels excise $18.4 billion',
    'Other excise $3.4 billion',
    'Customs duty $14.0 billion',
    'Non-tax revenue $25.6 billion'
  ),
  rs_to = c(
    'Social security and welfare ($158.6 billion)',
    'Other purposes ($89.1 billion)',
    'Health ($71.4 billion)',
    'Education ($33.7 billion)',
    'Defence ($27.2 billion)',
    'General public services ($22.7 billion)',
    'All other functions ($47.9 billion)',
    rep('', 10)
  ),
  # Weights in $ billion: 7 expense flows, then 10 revenue flows
  rs_weights = c(
    158.6, 89.1, 71.4, 33.7, 27.2, 22.7, 47.9,
    201.3, 4.8, 7.5, 71.0, 6.1, 64.8, 18.4, 3.4, 14.0, 25.6
  )
)

# Create colours for links and nodes
colors_link <- c('orange', rep('lightblue', 17), rep('orange', 7))
# Collapse the vector into a JavaScript array literal, e.g. "['orange','lightblue',...]"
colors_link_array <- paste0("[", paste0("'", colors_link, "'", collapse = ','), "]")

colors_node <- c('white', rep('darkorange', 7), rep('blue', 14))
colors_node_array <- paste0("[", paste0("'", colors_node, "'", collapse = ','), "]")

# Build the sankey options string (a JavaScript object literal) with colours, font size and nodePadding
opts <- paste0("{
        link: { colorMode: 'source',
                colors: ", colors_link_array ," },
        node: { colors: ", colors_node_array ," , label: { 
                                         fontSize: 13,
                                         bold: true
                                         }, nodePadding: 30 } 
      }" )

# Build the Sankey chart
FB_Sankey <- gvisSankey(
  Rev_spend,
  from = "rs_from", to = "rs_to", weight = "rs_weights",
  options = list(height = 500, width = 915, sankey = opts)
)

# Display the chart in the browser
plot(FB_Sankey)

# print(FB_Sankey, 'chart') is deactivated in this tab; the active version is reproduced under the 'Reconstruction' tab.
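
The node colour vector above relies on fixed repeat counts. As a hedged alternative, the sketch below derives the node colours from the data, so the counts cannot drift if a category is added or removed. It assumes that Google Charts assigns node colours in the order in which node names first appear in the rows (from, then to), which matches how the hard-coded vector is laid out but is not confirmed here. The helper `build_js_array`, the small `flows` dataframe and its category names are illustrative only and not part of the original code.

```r
# Hypothetical helper: turn a character vector into a JavaScript array literal
build_js_array <- function(x) {
  paste0("[", paste0("'", x, "'", collapse = ","), "]")
}

# Small illustrative dataframe with the same shape as Rev_spend:
# expense rows flow out of the central node '', revenue rows flow into it
flows <- data.frame(
  rs_from = c("", "", "Tax A", "Tax B", "Tax C"),
  rs_to   = c("Spend X", "Spend Y", "", "", ""),
  stringsAsFactors = FALSE
)

# Node names in order of first appearance, reading row by row (from, then to)
node_names <- unique(as.vector(t(flows[, c("rs_from", "rs_to")])))

# Colour each node by its role (assumed colour assignment order, see above)
is_hub     <- node_names == ""
is_expense <- node_names %in% flows$rs_to & !is_hub
node_cols  <- ifelse(is_hub, "white", ifelse(is_expense, "darkorange", "blue"))

node_cols_js <- build_js_array(node_cols)
print(node_cols_js)
# "['white','darkorange','darkorange','blue','blue','blue']"
```

The resulting string could be pasted into the options string in place of `colors_node_array`, although the rendered colours would need to be checked by eye.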

Data References

Gesmann, M. (2014) Sankey diagrams with googleVis | mage’s blog. [online] Available at: <https://www.magesblog.com/post/2014-03-25-sankey-diagrams-with-googlevis/> [Accessed 1 August 2021].
Google Developers (2021) Sankey Diagram | Charts | Google Developers. [online] Available at: <https://developers.google.com/chart/interactive/docs/gallery/sankey> [Accessed 1 August 2021].
Rdocumentation.org (2018) gvisSankey function - RDocumentation. [online] Available at: <https://www.rdocumentation.org/packages/googleVis/versions/0.6.0/topics/gvisSankey> [Accessed 1 August 2021].
Vadym, B. & Tallino, M. (2016) How to change node and link colors in R googleVis sankey chart. [online] Available at: <https://stackoverflow.com/questions/29371977/how-to-change-node-and-link-colors-in-r-googlevis-sankey-chart> [Accessed 1 August 2021].

Reconstruction

The following Sankey diagram reconstructs the information contained in the pie charts under the ‘Original’ tab.

Figure 2. Australian Federal Government: Revenue and Expenditure for 2016-17 (re-constructed)

Total Revenue $416.90 billion		Total Expenses ($450.60 billion)
	*Deficit ($33.70 billion)*

The Sankey diagram addresses the shortcomings from the original pie charts as follows:

There is a clear hierarchy and summary

When comparing quantitative variables, Cleveland and McGill (1985) recommend features such as position, length, angle and slope over less-preferred features such as area, volume and density. We can see then that our Sankey diagram primarily uses the recommended features of position and length, whereas the original pie charts mainly used the less-preferred feature of area.

The reconstructed chart also allows us to locate the summary in an instant. The total revenue, spending, and difference (surplus/deficit) are at the top of the chart. The positioning is instinctual as there is a clear hierarchy from top-to-bottom commencing with the chart title, category summary, and finally category detail.

There is a simple linear alignment

With 17 categories to observe, it is important to create instinctual simplicity in the visual alignment of the Sankey diagram. This helps to reduce the cognitive load when visually processing the information, and also helps the reader to quickly re-orientate and order the information as they investigate the detail.

This Sankey diagram has achieved this by:

creating a flow from left-to-right (revenue received arrives from the left, to then depart by expenses paid on the right); and,
creating order from top-to-bottom (largest value items at the top, lowest value items at the bottom).
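
The top-to-bottom ordering principle can be illustrated with a short sketch. It uses the expense categories and amounts from the chart data; the names `expenses` and `expenses_sorted` are illustrative. Note that the Google Charts Sankey layout positions nodes itself, so sorting the input rows is an assumption about how one might encourage this order, not a guarantee of the rendered result.

```r
# Expense categories and amounts in $ billion, in the order entered in the data
expenses <- data.frame(
  category = c("Social security and welfare", "Other purposes", "Health",
               "Education", "Defence", "General public services",
               "All other functions"),
  billions = c(158.6, 89.1, 71.4, 33.7, 27.2, 22.7, 47.9),
  stringsAsFactors = FALSE
)

# Sort so that the largest category comes first and the smallest comes last
expenses_sorted <- expenses[order(expenses$billions, decreasing = TRUE), ]
rownames(expenses_sorted) <- NULL

print(expenses_sorted)
```

After sorting, "All other functions" ($47.9 billion) moves from last place to fourth, between Health and Education.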

The colour scheme is simpler and more friendly

The blue-green-grey palette of 10 colours in the original chart has been reduced to a palette of 4 colours in the re-constructed chart.

Colours in the Sankey diagram have been used to create a clear distinction between the categories of revenue (blue) and expenses (orange). The long flowing links use a lighter shade, whereas the end nodes use a darker colour. As the end-nodes are aligned vertically, the darker shade has the effect of creating bullet-points for each category. A larger node means that category has a larger value.

The clear distinction between blue and orange also means it is unlikely to disadvantage individuals with colour blindness such as:

protanopia / protanomaly (difficulty distinguishing between red-green);
deuteranopia / deuteranomaly (difficulty distinguishing between green-yellow, or blue-purple); and,
tritanopia / tritanomaly (difficulty distinguishing between blue-green).

(Baglin, 2020)

The simpler colour scheme therefore not only reduces cognitive load, but also helps to orientate the reader in an instinctive manner.

References

Baglin, J. (2020) Chapter 3 Visual Perception and Colour | Data Visualisation: From Theory to Practice. [online] Available at: <https://dark-star-161610.appspot.com/secured/_book/index.html> [Accessed 1 August 2021].
Cleveland, W.S. & McGill, R. (1985) “Graphical Perception and Graphical Methods for Analyzing Scientific Data.” Science: Vol 229, Issue 4716, pp.828-833. [online] Available at: <https://doi.org/10.1126/science.229.4716.828> [Accessed 1 August 2021].

MATH2404 - Data Visualisation - Assignment 2

Deconstruct, Reconstruct Web Report

John Mihail s3868554

01 August 2021
