Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Business Insider Australia (2014)


Objective

This section briefly critiques the original data visualisation (see above). To the best of the author's knowledge, based on a web search, no such critique has been published before.

In the author's opinion, the original data visualisation was targeted at the general public of Australia, with the objective of showing how Australians, on average, spend their income. According to the original article, this income is made up of wages, investment income, welfare payments and the imputed rent value of home ownership [1]. The author also thinks the visualisation was intended for a wider international audience, perhaps to serve as a point of reference for comparisons.

The chosen visualisation has three main issues:

  • Precision: It uses a doughnut chart, a chart type known for relatively low precision. It is hard to tell the exact proportion of an expenditure category without looking at the accompanying label. For example, ignoring the percentage labels, one may conclude that the expenditures for Durables and Hospitality are equal.
  • Choice of Color: It uses a red-green color combination which, without the explanatory labels, may cause difficulties for people with color blindness. The issue is worse when these colors sit next to each other, as they do for the Rent and Other Services categories in the original visualisation. In addition, it reuses the same color for different expenditure categories, which may confuse the audience. For example, the Taxes and Rent categories are both assigned red.
  • Multi-Year Comparison: A doughnut chart makes it difficult to compare data across multiple years. One might argue that a doughnut chart could be created for each year and the charts placed side by side, but this is still not very intuitive and requires the audience to shift focus as they move from one chart to another.

Reference

Code

The following code was used to fix the issues identified in the original. Note that the data file loaded in this code had already been preprocessed, e.g. grouping of observations, transforming monetary values into ratios (%), and reshaping the data from wide to long format. The code that performed this preprocessing is deliberately not included in this section, as the main focus of the task was the reconstruction of the data visualisation.

library(dplyr)
library(readr)
library(ggplot2)

# read data from file
expenditure <- read_csv(file="avg_weekly_expenditure.csv")

# set the order for year
expenditure$Year <- factor(expenditure$Year,
                           levels = c("1984", "1988-89", "1993-94", "1998-99",
                                      "2003-04", "2009-10", "2015-16"), ordered = TRUE)

# set the order for expenditure category
expenditure$Category <- factor(expenditure$Category,
                               levels = 
                                 c("Insurance", "Mortgage/Rent", "Income Tax", "Other Goods/Services",
                                   "Personal Care", "Education", "Recreation", "Communication",
                                   "Transport", "Health", "Apparel", "Tobacco",
                                   "Food and Beverages", "Energy", "Housing"), ordered = TRUE)

# list of colors, one per expenditure category (bar), used in the faceted plot
colors <- c("#fccde5", "#ffffb3", "#bebada", "#fb8072", 
            "#80b1d3", "#cc4c02", "#f768a1", "#8dd3c7",
            "#d9d9d9", "#377eb8", "#984ea3", "#a65628",
            "#e5d8bd", "#41b6c4", "#fdb462")

# generate faceted grid plot
expenditure_plot <- ggplot(data = expenditure, aes(x = Category, y = Percentage, fill = Category)) +
  geom_bar(stat = 'identity') + 
  scale_fill_manual(values = colors) +
  coord_flip()

# one panel per survey year; fix the value axis to 0-25% and label each bar
expenditure_plot <- expenditure_plot + 
  facet_grid(. ~ Year, scales = 'free_y') + 
  scale_y_continuous(limits = c(0, 25)) +
  geom_text(aes(label = Percentage), hjust = -0.2)

expenditure_plot <- expenditure_plot +
  labs(title = "Australian Household Expenditure",
       subtitle  = "Household Expenditure (Percentage), 1984 to 2015-16",
       caption = "Source: Australian Bureau of Statistics")

# additional plot enhancements
final_plot <- expenditure_plot +
  theme(panel.grid = element_blank(),
        legend.position = 'none',
        panel.background = element_rect(fill = "#f7fcfd"),
        title = element_text(size = 16, face = 'bold'),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_text(size = 12, face = 'bold'),
        strip.text = element_text(size = 14, face = 'bold')
  )
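
The preprocessing mentioned before the code (grouping, conversion to percentages, and reshaping from wide to long) is not shown in this report. As a rough illustration only, the sketch below shows one way such steps could be written with dplyr and tidyr. The data frame `raw`, its column names, the item names and all dollar values are hypothetical and made up for the example; they are not ABS figures and this is not the author's actual preprocessing code.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format input: one row per detailed item, one column per
# survey year, values in dollars per week (made-up numbers, not ABS data).
raw <- tibble(
  Item      = c("Bread", "Meat", "Electricity", "Gas"),
  Category  = c("Food and Beverages", "Food and Beverages", "Energy", "Energy"),
  `2009-10` = c(20, 30, 25, 25),
  `2015-16` = c(30, 30, 20, 20)
)

expenditure_long <- raw %>%
  # reshape from wide (one column per year) to long (one row per item and year)
  pivot_longer(cols = c(`2009-10`, `2015-16`),
               names_to = "Year", values_to = "Dollars") %>%
  # group detailed items into broad categories within each year
  group_by(Year, Category) %>%
  summarise(Dollars = sum(Dollars), .groups = "drop") %>%
  # convert dollar amounts into a percentage of each year's total
  group_by(Year) %>%
  mutate(Percentage = round(100 * Dollars / sum(Dollars), 1)) %>%
  ungroup() %>%
  select(Year, Category, Percentage)
```

The result has one row per year and category with a `Percentage` column, which is the long shape the plotting code above expects in its `Year`, `Category` and `Percentage` columns.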

Data Reference

Reconstruction

The following plot fixes the main issues in the original. Note that the expenditure categories and their proportions do not exactly match those in the original data visualisation. This limitation arises because the author could not find the original data, despite attempting to retrieve it directly from the source referenced in the chart, the Australian Bureau of Statistics (ABS). It is highly likely that the original publication performed further data processing, including adding new variables sourced from another provider (i.e. IBISWorld). The reconstruction therefore uses data retrieved directly from the ABS, with some processing, e.g. grouping of categories and reshaping the data from wide to long format. In addition, data from previous survey periods were added to illustrate the ability of the reconstructed visualisation to display multi-year data.