Data Visualisation Assignment 2

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Source: Martha Kang McGill. Where Do Americans Share Transportation?

Objective & Target Audience

Objective. Since the visualisation was created by a graphic designer and contains no numerals, one objective was probably to grab attention purely as a work of art, offering aesthetic pleasure through sharp fonts and bright colours. Once the meaning of the colours is interpreted, it reads as an infographic that highlights the differences in the percentages of commuters using each method of travel to work in 8 U.S. cities.
Target Audience. Since the visualisation lacks numerals, it was probably not intended for a technical audience. As it was first released as a graphic design piece, it was most likely aimed at the design community, to demonstrate a way of rendering numerical data engagingly without emphasising the numbers themselves.

Issues

The visualisation chosen has the following three main issues:

Misleading Areas. The city abbreviations (CHI, LA, etc.) have different lengths (2 or 3 letters) and are filled with coloured regions like a stacked bar chart adding to 100%. The height of each coloured region represents a percentage value, but the width (and therefore the area) of the region has not been taken into account. For a given percentage along the y-axis, the coloured region in a 3-letter abbreviation is 50% larger in area than in a 2-letter abbreviation, assuming letters of equal width.
City Order. The cities are listed in an order that does not help with interpreting or comparing the data. The order is neither alphabetical nor geographic. It may have been chosen for aesthetic reasons only, to form columns with letter counts of 3, 2, 3: form instead of function.
No Numbers. Numerical values are completely missing. The y-axis tick marks are shown, but nothing indicates that the major ticks mark 25% intervals.
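
The 50% figure in the first issue follows from simple arithmetic. The lines below are a minimal sketch of it, assuming every letter occupies the same width (an idealisation, since real glyph widths and shapes vary); the variable names are hypothetical and not part of the reconstruction code.

```r
# Assumption: each letter occupies one unit of width; glyph shapes are ignored.
letter_width <- 1
share <- 0.25  # any fixed percentage, drawn as the height of a coloured band

area_2_letters <- 2 * letter_width * share  # e.g. "LA"
area_3_letters <- 3 * letter_width * share  # e.g. "CHI"

# Relative increase in coloured area for the same percentage
extra_area <- area_3_letters / area_2_letters - 1
extra_area  # 0.5
```

The ratio does not depend on the value of `share`, so the distortion applies equally to every commuting type.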

Reference

Martha Kang McGill (2008). Where Do Americans Share Transportation? February 19, 2010, from shareable.net website: https://www.shareable.net/where-do-americans-share-transportation/

Code

The following code was used to fix the issues identified in the original.

library(tidyverse)
library(ggplot2)
library(scales) # for label_percent
library(colorspace) # for lighten

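# base palette, one colour per commuting type
colors = c('#31A354','#3962AD','#7B91CA','#A8321D','#EA6125')
# lighten all but the 3rd and 5th colours, which (given type_order below) fill public transport and driving alone
colors = c(lighten(colors[1:2], amount=.7), colors[3], lighten(colors[4], amount=.7), colors[5])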

type_order = rev(c("% who drive themselves", "% who carpool", "% who take public transport", "% who walk", "% who bike, work at home"))
city_order = c('Chicago','Los Angeles','New York','Atlanta','San Francisco','Houston','Washington DC','Seattle')

commuting = tribble(     ~TYPE,   ~LA,   ~SF,   ~DC,  ~ATL,  ~CHI,  ~NYC,  ~HOU,  ~SEA,
      "% who drive themselves",  0.72,  0.36, 0.348, 0.674, 0.502, 0.227, 0.754, 0.533,
               "% who carpool", 0.108, 0.079, 0.059, 0.078, 0.094,  0.05, 0.126, 0.086,
 "% who take public transport", 0.072, 0.341, 0.383, 0.114, 0.265, 0.557, 0.045, 0.182,
                  "% who walk", 0.029, 0.094, 0.118, 0.041, 0.065, 0.101, 0.021, 0.086,
    "% who bike, work at home", 0.071, 0.126, 0.092, 0.093, 0.074, 0.065, 0.054, 0.113
)

# set column order
commuting = commuting %>% select(TYPE, CHI, LA, NYC, ATL, SF, HOU, DC, SEA)
# set city names
colnames(commuting) = c('TYPE', city_order)
# convert to long format
commuting = commuting %>% gather(CITY, PERCENT, 2:9)

# arrange plots in order of "% who drive themselves" descending
city_order = commuting %>% filter(TYPE == "% who drive themselves") %>% arrange(desc(PERCENT)) %>% pull(CITY)

commuting = commuting %>% mutate(
  TYPE = factor(TYPE, levels=type_order),
  CITY = factor(CITY, levels=city_order)
)

family = "Arial"

p1 <- ggplot(commuting, aes(fill=TYPE, x=PERCENT, y=TYPE)) +
  labs(title = "COMMUTING TO WORK", y = "", x = "",
       caption = "Source: US Census Bureau American Community Survey COMMUTING CHARACTERISTICS (2008) Credit: Ashley Mallia") +
  # one horizontal bar per commuting type
  geom_bar(stat="identity", position="stack", width = .8) +
  # print the rounded percentage just past the end of each bar
  geom_text(aes(label = paste0(round(PERCENT*100),'%')),
            family = family, size = 3, hjust=-0.2, color="grey50") +
  scale_fill_manual(values=colors) +
  theme_bw() + theme(
    plot.title = element_text(family = "Arial Narrow", size = 20, face="bold"),
    plot.caption = element_text(family = family, size = 9, face="italic"),
    axis.title = element_text(family = family, size = 9),
    axis.text = element_text(family = family, size = 9),
    legend.position = "none",
    strip.background = element_blank(),
    strip.text = element_text(family = family, size = 11, face="bold")
  ) +
  scale_x_continuous(breaks=seq(.2,.8,.2), labels = label_percent(), limits = c(0,.8)) +
  coord_fixed(ratio = 1/8) +
  # one panel per city, with the city name in upper case
  facet_wrap(~CITY, ncol=3, labeller = labeller(CITY = toupper))
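
Because the reconstruction replaces stacked regions with separate bars, the chart itself no longer shows that each city's shares add up to 100%, so it may be worth confirming this before plotting. The sketch below is not part of the original code. It rebuilds the same table under the hypothetical name `shares`, reshapes it with `pivot_longer()` (the tidyr function that supersedes `gather()`), and checks the totals; `shares_long`, `totals` and `drive_order` are likewise illustrative names.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Same figures as the commuting table above, stored under a hypothetical name
shares <- tribble(
  ~TYPE,                         ~LA,   ~SF,   ~DC,   ~ATL,  ~CHI,  ~NYC,  ~HOU,  ~SEA,
  "% who drive themselves",      0.72,  0.36,  0.348, 0.674, 0.502, 0.227, 0.754, 0.533,
  "% who carpool",               0.108, 0.079, 0.059, 0.078, 0.094, 0.05,  0.126, 0.086,
  "% who take public transport", 0.072, 0.341, 0.383, 0.114, 0.265, 0.557, 0.045, 0.182,
  "% who walk",                  0.029, 0.094, 0.118, 0.041, 0.065, 0.101, 0.021, 0.086,
  "% who bike, work at home",    0.071, 0.126, 0.092, 0.093, 0.074, 0.065, 0.054, 0.113
)

# Long format: one row per (TYPE, CITY) pair
shares_long <- shares %>%
  pivot_longer(cols = -TYPE, names_to = "CITY", values_to = "PERCENT")

# Total share per city
totals <- shares_long %>%
  group_by(CITY) %>%
  summarise(TOTAL = sum(PERCENT), .groups = "drop")

# Allow for floating-point rounding when comparing with 1
stopifnot(all(abs(totals$TOTAL - 1) < 1e-8))

# Cities ordered by share of solo drivers, highest first
drive_order <- shares_long %>%
  filter(TYPE == "% who drive themselves") %>%
  arrange(desc(PERCENT)) %>%
  pull(CITY)
```

With these figures every city totals 1, and `drive_order` runs from HOU (highest share of solo drivers) to NYC (lowest), which is the panel order the facets follow.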

Data Reference

United States Census Bureau. COMMUTING CHARACTERISTICS (2008), American Community Survey, 1-Year Estimates Subject Tables (TableID S0801). Website https://data.census.gov/

Reconstruction

The following plot fixes the main issues in the original.

Deconstruct, Reconstruct Web Report

Ashley Mallia (s3773716)
