Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: The Economist (2021).


Objective

This visualisation was published in an article in The Economist (2021) entitled Which is the strongest Olympic team of all time?. The newspaper is mostly read in English speaking Western nations and publishes news articles with topics ranging across current affairs, international business, politics, technology, and culture. The article was most likely targeting general news readers and the newspaper’s regular readership, as the Olympics is a mainstream and prominent topic when it comes about. The objective was to identify an advantage for Olympic host nations, the influence of politics on performance, and/or the strongest Olympic team of all time.

The visualisation had the following three main issues:

  • Lack of focus: The title of this visualisation is Home Advantage, while the title of the article is Which is the strongest Olympic team of all time?, while the article’s subtitle says ‘Politics and geography often determine national success at the Olympics’ (The Economist, 2021). These titles represent three objectives that are quite distinct. While it is not always a bad thing for a visualisation to tell multiple stories, if done inelegantly it can cause it to spread itself too thinly. In this case, because the visualisation is trying to do too much, the viewer is left scanning over the graphic without knowing what part to latch onto, or when think they have found something to focus on, it doesn’t lead to anywhere consistent or satisfying. It is unclear if or how we are meant to focus on or contextualise, for example, the host nations’ performances, the big chunks occupied by the US and the Soviets, or the narrowing disparity of medal distribution as the century progresses. It would be better for this visualisation to pick one objective and focus on showing that.
  • Unanswered questions: Of the questions it poses, it fails to answer any. If the objective was to illustrate an advantage for host nations, unless it was one of the top three to win the most medals, the host nation’s performance is invisible. This is true for a large portion of the latter half of the 20th century, which tells us little about host nations. If the objective was to illustrate the strongest Olympic team of all time, we can see certain countries dominating at different periods. But how are we to define the ‘strongest Olympic team of all time’? Was it the US in 1904 when they won over 80% of the medals, despite there being a smaller selection of teams and events? Likely not, but how we are meant to interpret the ‘strongest team of all time’ remains unclear however we look at it. Lastly, if the objective was to demonstrate the political influence on Olympic performance, the visualisation indirectly demonstrates this, but only for those who have an understanding of 20th century Western history.
  • Muddled colours: Countries were a variable in this visualisation, and a lot of different ones made it into the top three to win the most medals of their year. Squinting my eyes, I managed to count 19 different countries. Consequently, the creator had to use a large number of colours to distinguish between them. As expected, the palette ends up muddled and unappealing. Not only is it mucky, but the rules behind the colouring appear to break their own logic. Germany and Russia have both have tumultuous national identities throughout the 20th century, so the creator used lighter and darker shades of the same colour to represent a relation between their various identities. But when we look at Italy and Finland, as just one example, we can see a lighter and darker shade of purple, despite these two nations being completely unrelated.

Reference

Code

The following code was used to fix the issues identified in the original.

# install.packages("magrittr")
# install.packages("tidyverse")
# install.packages("readxl")

# load libraries
library(magrittr)
library(tidyverse)
library(readxl)

# load data
oly <- read_excel("medal_table.xlsx")

# calculate medal share by year
props <- oly %>%
  group_by(Year) %>% 
  mutate(freq = (Total / sum(Total)) * 100) %>% 
  data.frame()

# filter for country with highest medal share by year...
first <- props %>% 
  group_by(Year) %>% 
  slice_max(order_by = freq, n = 1) %>% 
  summarise(freq1 = median(freq))

# ...then second highest...
second <- props %>% 
  group_by(Year) %>% 
  slice_max(order_by = freq, n = 3) %>% 
  summarise(freq2 = median(freq))

# ...and third highest...
third <- props %>% 
  group_by(Year) %>% 
  slice_max(order_by = freq, n = 5) %>% 
  summarise(freq3 = median(freq))

# filter for host country medal share by year
host_medals <- props %>% 
  filter(Host == TRUE)

# filter out unnecesssary columns
host_medals <- host_medals %>%  
  select(Year,Nation,freq)

# join all the tables
medals_freq <- host_medals %>% 
  left_join(first, by = "Year") %>% 
  left_join(second, by = "Year") %>% 
  left_join(third, by = "Year")

# create long form data
medals_freq2 <- medals_freq %>% gather(
  `freq`, 
  `freq1`, 
  `freq2`, 
  `freq3`, 
  key = "type", 
  value = "freqs"
  ) %>% 
  data.frame()

# set type column as factor with labels
medals_freq2$type <- factor(
  medals_freq2$type, 
  levels = c(
    "freq", 
    "freq1",
    "freq2",
    "freq3"
  ),
  labels = c(
    "Host nation", 
    "Nation with most medals",
    "Second most",
    "Third most"
  )
)

# join with wide form table (needed to create range line)
medals_freq2 <- medals_freq2 %>%
  left_join(medals_freq, by = "Year")

# filter out unnecesssary columns
medals_freq2 <- medals_freq2 %>% 
  select(Year,type,freqs,freq,freq1,freq3)

# WWI and WWII annotions - decided not to use in final
annotation <- data.frame(
  x = c(1916,1942),
  y = c(30,30),
  label = c("WWI", "WWII")
  )

# instantiate ggplot object
p1 <- ggplot(
  medals_freq2, 
  aes(y=freqs, x=Year, color=type, size=type)
) +
  
  # show all x axis ticks
  scale_x_continuous(
    "Year", 
    labels = as.character(medals_freq2$Year), 
    breaks = medals_freq2$Year
    ) +
  
  # add range lines layer
  geom_linerange(
    aes(ymin=freq1,ymax=freq3), 
    size = 1.5, 
    colour = "grey90"
    )  +
  
  # add data points layer 
  geom_point() +
  
  # comment-in below to see with annotation
  
  # geom_text(
  #   data=annotation, 
  #   aes(x=x, y=y, label=label), 
  #   color="black", 
  #   size=2.5 , angle=0
  #   ) +
  
  # specify colours dor data points
  scale_color_manual(values = c(
    "blue",
    "black",
    "grey50",
    "grey70"
    )) +
  
  # re-adding the Host nation data point so it overlaps the others
  geom_point(
    aes(y=freq), 
    size = 2.5, 
    shape = 21, 
    fill = "blue", 
    colour = "blue"
    ) +
  
  # edit data point sizes
  scale_size_manual(values = c(2.5,1,1,1,1)) +
  
  # classic theme with no grid lines
  theme_classic() +
  
  # fix up titles
  labs(
    y = "",
    title = "Home advantage?", 
    subtitle = "Share of medals won by Olympic host nation, %",
    color = "", 
    size = ""
    ) +

  # 1. adjust x ticks - neither hjust nor margin worked to fix the
  # positioning of the ticks, so they are slightly unaligned
  # unfortunately
  # 2. add y axis grid
  theme(
    axis.text.x = element_text(angle = 90, size = 7),
    panel.grid.major.y = element_line(linetype =  1)
    )

Data Reference

Reconstruction

The following plot fixes the main issues in the original.