Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: www.reddit.com/r/dataisbeautiful


Objective

The original visualisation was sourced from the subreddit ‘DataIsBeautiful’. Its objective was to highlight the disproportionate change in the price of Freddos (a popular chocolate bar manufactured by Cadbury) relative to the national living wage (NLW) per hour in the United Kingdom between April 1999 and April 2020. The disproportionality between the two variables suggested that the number of Freddos affordable had decreased despite an increase in minimum wages, because the price of a Freddo had risen by more than inflation alone would explain. The target audience was fellow data visualisation enthusiasts.
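The quantity being tracked is the hourly wage divided by the price of one bar. A minimal sketch in R, using made-up figures (the values and the names `wage_per_hour` and `freddo_price` are illustrative assumptions, not taken from the data set), shows how that count can fall even while wages rise:

```r
# Hypothetical figures for illustration only; not taken from the data set.
wage_per_hour <- c(4.00, 8.00)   # hourly wage in pounds at two points in time
freddo_price  <- c(0.10, 0.25)   # price of one Freddo in pounds at the same points

# Number of Freddos that one hour of work buys at each point.
freddos_affordable <- wage_per_hour / freddo_price

# Relative change in each quantity between the two points.
wage_growth  <- wage_per_hour[2] / wage_per_hour[1]
price_growth <- freddo_price[2] / freddo_price[1]

print(freddos_affordable)
print(c(wage_growth = wage_growth, price_growth = price_growth))
```

In this made-up case the wage doubles while the price rises 2.5 times, so the number of Freddos affordable drops even though the wage has gone up.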

The visualisation chosen had the following three main issues:

  • Each bar represented the number of Freddos affordable at the corresponding NLW/hour in the UK, over time, and each bar was filled with an image of the Freddo packaging. Because only the height of a bar changes in a bar plot, taller bars (representing a higher number of Freddos affordable) contained an image stretched vertically relative to its width. The distorted images are visually distracting, which can be classified as a perceptual issue. To tackle this, the number of Freddos affordable was plotted as a line instead of bars. A line also shows movement (increase or decrease) in the number of Freddos affordable relative to movement in the NLW/hour through time, so the original objective is preserved: a deviation of one line from the other indicates a disproportionate movement in the two values.
  • The visualisation used dual y-axes to show the number of Freddos affordable and the NLW per hour across time. The two axes had different scales, and at first glance it was not clear which axis belonged to the plotted line and which to the plotted bars. To avoid the deception (intended or not) caused by dual axes, the visualisation was reconstructed with a single y-axis shared by both variables. Because the two variables are on different scales, both were standardised, and the standardised variables “scaled_NLW_per_hour” and “scaled_freddos_affordable” were then plotted.
  • The x-axis of the time series (and the data used by the original content creator) consisted of only 23 time points. The series was intended to be semi-annual (prices observed each April and October from April 1999 to April 2020), so there should have been 43 time points. This suggested missing data. Using the referenced data sources, the missing time points were added to the new data set (on which the reconstruction is based), together with the corresponding values of the number of Freddos affordable and the NLW/hour.
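The second and third points can be sketched in R as follows. This is a minimal illustration, not the preprocessing actually used: it assumes that "standardised" means z-scores (subtract the mean, divide by the standard deviation) and that observations are dated to the first of April and October. The `demo` data frame, its unscaled column names and its values are hypothetical.

```r
# Semi-annual grid from April 1999 to April 2020.
# Assumption: each observation is dated to the first of the month.
time_points <- seq(as.Date("1999-04-01"), as.Date("2020-04-01"), by = "6 months")
length(time_points)  # 43 (21 years x 2, plus the final April)

# Hypothetical values for illustration only; not the real data.
demo <- data.frame(
  NLW_per_hour       = c(3.60, 5.00, 6.50, 8.00),
  freddos_affordable = c(36, 40, 30, 32)
)

# Z-score standardisation: (x - mean(x)) / sd(x).
# scale() returns a one-column matrix, so convert it back to a plain vector.
demo$scaled_NLW_per_hour       <- as.numeric(scale(demo$NLW_per_hour))
demo$scaled_freddos_affordable <- as.numeric(scale(demo$freddos_affordable))

print(demo)
```

After standardisation both columns have mean 0 and standard deviation 1, which is what allows them to share a single y-axis.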

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(readr)

# Set the working directory to the folder that contains datavis.csv.
setwd("~/Desktop/Data Visualisation/assignment 2/working/reconstruction in R.")

# Import the data set.
freddos <- read_csv("datavis.csv")

# Reconstruction: both standardised series are drawn against a single y-axis.
p <- ggplot(freddos, aes(x = time))

p1 <- p +
  geom_line(aes(y = scaled_NLW_per_hour), colour = "dodgerblue", group = 1) +
  geom_line(aes(y = scaled_freddos_affordable), colour = "red", group = 1) +
  # Dashed line marking the time point of the probable intervention.
  geom_vline(xintercept = 22, linetype = "dashed", colour = "darkgrey") +
  # annotate() draws each label once; geom_text() with constant aesthetics
  # would redraw the label for every row of the data.
  annotate("text", x = 28.5, y = 2.5, label = "Probable Intervention Effect",
           colour = "darkgrey") +
  annotate("text", x = 37.75, y = 1.65, label = "scaled_NLW_per_hour",
           colour = "dodgerblue", angle = 45, size = 2.5) +
  annotate("text", x = 37.75, y = -0.45, label = "scaled_freddos_affordable",
           colour = "red", size = 2.5) +
  theme_bw() +
  labs(title = "No. of Freddos Affordable with NLW/Hour, Across Time",
       x = "Semi-Annual Time Points (1999 April-2020 April)",
       y = "Scaled: No. of Freddos Affordable & the NLW/Hour") +
  # Label every fifth time point to keep the x-axis readable.
  scale_x_discrete(breaks = freddos$time[seq(1, length(freddos$time), by = 5)])

# Display the plot.
p1

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

As the reconstructed visualisation shows, the number of Freddos affordable initially increases along with the NLW/hour in the UK (perhaps due to inflation). However, following the probable intervention effect (the dashed vertical line), the number of Freddos affordable fell sharply and stayed at a relatively lower value despite continued increases in the NLW/hour. This suggests that the actual price of a Freddo rose by more than inflation alone would explain. The deviation of the two lines from each other represents this disproportionality.