
Original


Source: Daniel Bowen (2018).


Objective

The audience is train commuters in the Melbourne Metropolitan Area, particularly those travelling during the morning peak (7:00 to 9:30 am). The objective is to inform the audience of trends in the number of train services whose passenger loads are considered to be above or below capacity.

The visualisation chosen had the following three main issues:

  • When reading a report entitled “PT crowding increasing”, the audience is likely more interested in the proportion of surveyed services above or below the benchmark than in the total number of services surveyed in each category. It is the proportion, not the absolute number, of surveyed services that indicates how likely readers themselves are to travel on an overcrowded train. Because the number of services surveyed changes from year to year, the audience cannot easily infer that proportion from this visualisation. Showing absolute numbers may also mislead the audience into mistaking the sample size for the population size (i.e. mistaking the number of surveyed services for the number of services that actually ran in each period).
  • Commuters will generally want to use this data to check whether it confirms or refutes their own experience of overcrowded trains. However, overcrowding varies greatly between train lines. For example, somebody living on the Alamein line would rarely experience overcrowding, while somebody on the South Morang line would have experienced an increasing number of overcrowded services. This visualisation aggregates data across the entire metropolitan network and thus hides this variability completely.
  • The use of colour isn’t ideal. Red-green colour blindness is the most common form of colour vision deficiency. A reader who cannot distinguish the two colours could still, in principle, make sense of the chart, but it becomes significantly less self-explanatory: without the colours, readers would likely find it harder to tell that “Above Benchmark” (red) is “bad” and “Below Benchmark” (green) is “good”.
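The first two issues come down to plotting a share per line rather than a network-wide count. A minimal R sketch of that conversion follows. It assumes the raw survey table has one row per line and survey period, with hypothetical count columns `Above_Benchmark` and `Below_Benchmark` that may not match the real data's column names; the numbers in the example are made up for illustration and are not survey results.

```r
# Hypothetical helper: share of surveyed services at or below the load benchmark.
# Assumes counts of surveyed services above and below the benchmark are supplied.
share_within_capacity <- function(above, below) {
  surveyed <- above + below
  # Return NA (rather than NaN) when no services were surveyed
  ifelse(surveyed > 0, below / surveyed, NA_real_)
}

# Illustrative, made-up counts for two hypothetical lines in one survey period
loads <- data.frame(
  Line            = c("Line A", "Line B"),
  Above_Benchmark = c(2, 15),
  Below_Benchmark = c(38, 25)
)
loads$Services_Within_Capacity <- share_within_capacity(
  loads$Above_Benchmark, loads$Below_Benchmark
)
loads
```

Dividing by the number of services surveyed in each line and period removes the effect of the sample size changing between years, so two periods can be compared directly even when different numbers of services were surveyed.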

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)

# Load the survey data; Year and Line are treated as discrete categories
data <- read.csv("./PassengerLoads.csv")
data$Year <- factor(data$Year)
data$Line <- factor(data$Line)

# One small panel per line so that differences between lines stay visible
p1 <- ggplot(data, aes(x = Year, y = Services_Within_Capacity, group = Line)) +
  geom_line(aes(colour = Line)) +
  facet_grid(rows = vars(Line)) +
  xlab("Survey Month") +
  ylab("% at or below Passenger Capacity") +
  labs(title = "Melbourne Metropolitan Commuter Train Passenger Loads during AM Peak",
       subtitle = "Train Services delivered at or below nominal Passenger Capacity by Line / Corridor") +
  # Services_Within_Capacity is assumed to be a proportion (0-1), shown as a percentage
  scale_y_continuous(labels = scales::percent, breaks = c(0.5, 0.75, 1)) +
  theme_minimal() +
  theme(strip.text.y = element_text(angle = 0),  # horizontal line names on the strips
        legend.position = "none",                # strips already name each line
        panel.spacing.y = unit(1, "lines"))

p1
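`scales::percent` multiplies its input by 100, so the plot above only reads correctly if `Services_Within_Capacity` is stored as a proportion between 0 and 1. If the CSV stored percentages (0 to 100) instead, the axis labels would be 100 times too large and the breaks at 0.5, 0.75 and 1 would fall outside the data. The hypothetical `as_proportion()` guard below is a sketch that could be run before plotting; it assumes that any value above 1 means the whole column is in percent units, which would misfire on a percent column whose values are all at or below 1.

```r
# Hypothetical helper: make sure a share column is on a 0-1 scale
# before it is passed to scale_y_continuous(labels = scales::percent).
as_proportion <- function(x) {
  if (any(x > 1, na.rm = TRUE)) {
    # Assumption: values above 1 mean the column holds percentages (0-100)
    x <- x / 100
  }
  if (any(x < 0 | x > 1, na.rm = TRUE)) {
    stop("values are outside the 0-1 range even after rescaling")
  }
  x
}

# Possible use before plotting:
# data$Services_Within_Capacity <- as_proportion(data$Services_Within_Capacity)
as_proportion(c(95, 62.5))
as_proportion(c(0.95, 0.625))
```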

Data Reference

Reconstruction

The following plot fixes the main issues in the original.