Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
This section briefly critiques the original data visualisation (see above). To the best of the author's knowledge, based on web searches, no such critique has been published before.
In the author's opinion, the original data visualisation was aimed at the Australian general public, with the objective of showing how Australians, on average, spend their income. According to the original article, income comprises wages, investment income, welfare payments and the imputed rental value of home ownership [1]. The author also thinks the visualisation was intended for a wider international audience, perhaps as a point of reference for comparisons.
The chosen visualisation had the following three main issues:
Reference
The following code was used to fix the issues identified in the original. Note that the data file loaded in this code had already been preprocessed, e.g. grouping of observations, transforming monetary values into percentages, and reshaping the data from wide to long format. The code that performed the preprocessing was deliberately left out of this section, as the main focus of this task was the reconstruction of the data visualisation.
library(dplyr)
library(readr)
library(ggplot2)
# read data from file
expenditure <- read_csv(file="avg_weekly_expenditure.csv")
# set the order for year
expenditure$Year <- factor(expenditure$Year,
                           levels = c("1984", "1988-89", "1993-94", "1998-99",
                                      "2003-04", "2009-10", "2015-16"),
                           ordered = TRUE)
# set the order for expenditure category
expenditure$Category <- factor(expenditure$Category,
                               levels = c("Insurance", "Mortgage/Rent", "Income Tax",
                                          "Other Goods/Services", "Personal Care",
                                          "Education", "Recreation", "Communication",
                                          "Transport", "Health", "Apparel", "Tobacco",
                                          "Food and Beverages", "Energy", "Housing"),
                               ordered = TRUE)
# list of colors to be used for each bars in the faceted plot
colors <- c("#fccde5", "#ffffb3", "#bebada", "#fb8072",
            "#80b1d3", "#cc4c02", "#f768a1", "#8dd3c7",
            "#d9d9d9", "#377eb8", "#984ea3", "#a65628",
            "#e5d8bd", "#41b6c4", "#fdb462")
# generate faceted grid plot
expenditure_plot <- ggplot(data = expenditure,
                           aes(x = Category, y = Percentage, fill = Category)) +
  geom_bar(stat = 'identity') +
  scale_fill_manual(values = colors) +
  coord_flip()
# one panel per survey year, with the percentage printed next to each bar
expenditure_plot <- expenditure_plot +
  facet_grid(. ~ Year, scales = 'free_y') +
  scale_y_continuous(limits = c(0, 25)) +
  geom_text(aes(label = Percentage), hjust = -0.2)
# title, subtitle and data source
expenditure_plot <- expenditure_plot +
  labs(title = "Australian Household Expenditure",
       subtitle = "Household Expenditure (Percentage), 1984 to 2015-16",
       caption = "Source: Australian Bureau of Statistics")
# additional plot enhancements
final_plot <- expenditure_plot +
  theme(panel.grid = element_blank(),
        legend.position = 'none',
        panel.background = element_rect(fill = "#f7fcfd"),
        title = element_text(size = 16, face = 'bold'),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_text(size = 12, face = 'bold'),
        strip.text = element_text(size = 14, face = 'bold')
  )
Data Reference
The following plot fixes the main issues in the original. Note that the expenditure categories and their proportions do not exactly match the original data visualisation. This limitation arises because the author could not find the original data, despite attempting to retrieve it directly from the source referenced in the chart, in this case the Australian Bureau of Statistics (ABS). It is highly likely that the original publication performed further data processing, including adding new features/variables sourced from another provider (i.e. IBISWorld). The reconstruction was therefore done using data retrieved directly from the ABS, with some data processing, e.g. grouping of categories and reshaping the data from wide to long format. In addition, data from previous survey periods were added to illustrate the ability of the reconstructed visualisation to display multi-year data.
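The data processing mentioned above (grouping of categories, conversion to percentages, reshaping from wide to long format) is not shown in this report. The block below is only a minimal sketch of how such steps could look with dplyr and tidyr; it is not the author's actual preprocessing code. The input table `raw_wide`, its column names `Item` and `Group`, and all dollar values are made up for illustration and are not ABS figures.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format input: one row per detailed item, one column per
# survey year, values in dollars per week (illustrative numbers only).
raw_wide <- data.frame(
  Item  = c("Bread", "Meat", "Electricity", "Bus fares"),
  Group = c("Food and Beverages", "Food and Beverages", "Energy", "Transport"),
  `2009-10` = c(20, 30, 25, 25),
  `2015-16` = c(25, 35, 30, 10),
  check.names = FALSE  # keep the year labels as column names
)

expenditure_long <- raw_wide %>%
  # reshape from wide to long: one row per item and survey year
  pivot_longer(cols = c(`2009-10`, `2015-16`),
               names_to = "Year", values_to = "Dollars") %>%
  # group detailed items into broader expenditure categories
  group_by(Year, Category = Group) %>%
  summarise(Dollars = sum(Dollars), .groups = "drop") %>%
  # convert dollar amounts into a percentage of each year's total
  group_by(Year) %>%
  mutate(Percentage = round(100 * Dollars / sum(Dollars), 1)) %>%
  ungroup() %>%
  select(Year, Category, Percentage)

print(expenditure_long)
```

The result has the same three columns (`Year`, `Category`, `Percentage`) that the plotting code expects, so a table built this way could be written out with `readr::write_csv()` and loaded as shown in the Code tab.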