The Original, Code and Reconstruction sections below describe the issues and how they were fixed.

Original


Source: The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games


Objective

To inform the audience about the total cost of each Olympic Games from 1968 to 2016 and how much each Games overspent relative to its initial budget.

  • Show total cost

  • Show cost overrun

  • Show the year, host country and season of each Games

  • Compare the initial cost estimate with the overspend of each Games

Target audience

Members of the general public who want to learn about the total expenditure of host countries on the Olympic Games.

The visualization chosen had the following three main issues.

  • Issue 1 is visual overload, which can mislead the viewer. The chart packs too much information into a single graph, and some of it does not help answer the original objective, namely the host city and the Olympic logo. These elements can create a false impression, and with so much information on display the viewer may find it difficult to draw a relevant conclusion. At the same time, other variables that directly affect the cost of each Olympic Games should be highlighted on the graph (Flyvbjerg, 2021): the country, year, season, initial cost and overspend.

  • Issue 2 is an inappropriate visualization method, which prevents the chart from achieving the original objectives. The purpose of the visualization is to illustrate the initial cost, the overrun and the total expenditure of each Olympic Games. However, the original chart shows total expenditure in billions of dollars but the overrun as a percentage. For a general audience, these mixed units make the information hard to interpret and make it difficult to compare initial costs with overruns.

  • Issue 3 is redundant legends. The legends for sports-related costs and cost overrun are unnecessary. They encode each Games' costs as circle sizes, but viewers find it difficult to match a circle to the sizes shown in the legend. Moreover, each circle is already labeled with its exact value, which makes the legends redundant. Although circle size may give a rough sense of each country's costs, such approximate areas make it difficult to compare countries precisely.
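Issue 2 hinges on turning a percentage overrun into dollars. Assuming, as in the Oxford study, that the overrun is measured against the initial budget (actual cost = initial budget × (1 + overrun / 100)), the split can be worked through in R for a single Games. The figures are the 1976 Canada row of the dataset used in the Code section; the variable names are illustrative only.

```r
# Worked example: split a total cost into initial budget and overspend.
# Assumption: overrun (%) is relative to the initial budget, so
#   actual cost = initial budget * (1 + overrun_pct / 100)
cost <- 6.093        # actual (outturn) cost, $ billion (1976 Canada)
overrun_pct <- 720   # cost overrun, % of the initial budget

initial <- cost / (1 + overrun_pct / 100)  # initial budget, $ billion
overspend <- cost - initial                # amount spent above budget

round(c(initial = initial, overspend = overspend), 2)
# initial ≈ 0.74, overspend ≈ 5.35
```

In dollar terms the 720% overrun becomes about $5.35 billion spent on top of a roughly $0.74 billion budget, a comparison a general audience can read directly from bar lengths.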

Reference

Code

The following code was used to fix the issues identified in the original.

#load packages
library(tidyverse)  # includes readr (read_csv), tidyr (gather) and ggplot2
library(extrafont)  # helps register system fonts such as Times New Roman for plotting

#PREPROCESS DATA
#read the dataset and convert it to a data frame
#adjust the path to wherever olympic.csv is saved
OriginalDataset<-read_csv("olympic.csv")
df<-data.frame(OriginalDataset)
df
##    year   country continent season   cost PercentageOverrun
## 1  1968    France    Eroupe    win  0.888               181
## 2  1976    Canada  Americas    sum  6.093               720
## 3  1980      U.S.  Americas    win  0.435               324
## 4  1988    Canada  Americas    win  1.109                65
## 5  1992    France    Eroupe    win  1.997               137
## 6  1992     Spain    Eroupe    sum  9.687               266
## 7  1994    Norway    Eroupe    win  2.228               277
## 8  1996      U.S.  Americas    sum  4.143               151
## 9  1998     Japan      Asia    win  2.227                56
## 10 2000 Australia Australia    sum  5.026                90
## 11 2002      U.S.  Americas    win  2.520                24
## 12 2004    Greece    Eroupe    sum  2.942                49
## 13 2006     Italy    Eroupe    win  4.366                80
## 14 2008     China      Asia    sum  6.810                 2
## 15 2010    Canada  Americas    win  2.540                13
## 16 2012      U.K.    Eroupe    sum 14.957                76
## 17 2014    Russia      Asia    win 21.900               289
## 18 2016    Brazil  Americas    sum  4.557                51
#concatenate 3 columns year, season and country
df$YearSeasonCountry <- paste(df$year, df$season, df$country)

#convert the percentage overrun into dollar amounts ($ billion)
#actual cost = initial budget * (1 + PercentageOverrun/100), so the initial budget is:
df$initial <- round((df$cost * 100) / (100 + df$PercentageOverrun), 1)

#overspend = actual cost minus initial budget
df$overspend <- round((df$cost - df$initial), 1)

#data is in wide format 
df
##    year   country continent season   cost PercentageOverrun  YearSeasonCountry
## 1  1968    France    Eroupe    win  0.888               181    1968 win France
## 2  1976    Canada  Americas    sum  6.093               720    1976 sum Canada
## 3  1980      U.S.  Americas    win  0.435               324      1980 win U.S.
## 4  1988    Canada  Americas    win  1.109                65    1988 win Canada
## 5  1992    France    Eroupe    win  1.997               137    1992 win France
## 6  1992     Spain    Eroupe    sum  9.687               266     1992 sum Spain
## 7  1994    Norway    Eroupe    win  2.228               277    1994 win Norway
## 8  1996      U.S.  Americas    sum  4.143               151      1996 sum U.S.
## 9  1998     Japan      Asia    win  2.227                56     1998 win Japan
## 10 2000 Australia Australia    sum  5.026                90 2000 sum Australia
## 11 2002      U.S.  Americas    win  2.520                24      2002 win U.S.
## 12 2004    Greece    Eroupe    sum  2.942                49    2004 sum Greece
## 13 2006     Italy    Eroupe    win  4.366                80     2006 win Italy
## 14 2008     China      Asia    sum  6.810                 2     2008 sum China
## 15 2010    Canada  Americas    win  2.540                13    2010 win Canada
## 16 2012      U.K.    Eroupe    sum 14.957                76      2012 sum U.K.
## 17 2014    Russia      Asia    win 21.900               289    2014 win Russia
## 18 2016    Brazil  Americas    sum  4.557                51    2016 sum Brazil
##    initial overspend
## 1      0.3       0.6
## 2      0.7       5.4
## 3      0.1       0.3
## 4      0.7       0.4
## 5      0.8       1.2
## 6      2.6       7.1
## 7      0.6       1.6
## 8      1.7       2.4
## 9      1.4       0.8
## 10     2.6       2.4
## 11     2.0       0.5
## 12     2.0       0.9
## 13     2.4       2.0
## 14     6.7       0.1
## 15     2.2       0.3
## 16     8.5       6.5
## 17     5.6      16.3
## 18     3.0       1.6
#restructure data to long format (one row per Games and cost component)
#the value column reuses the name 'cost', so the total-cost column no longer appears

df<- gather(df, key="initial/overspend",value="cost",initial:overspend)
df
##    year   country continent season PercentageOverrun  YearSeasonCountry
## 1  1968    France    Eroupe    win               181    1968 win France
## 2  1976    Canada  Americas    sum               720    1976 sum Canada
## 3  1980      U.S.  Americas    win               324      1980 win U.S.
## 4  1988    Canada  Americas    win                65    1988 win Canada
## 5  1992    France    Eroupe    win               137    1992 win France
## 6  1992     Spain    Eroupe    sum               266     1992 sum Spain
## 7  1994    Norway    Eroupe    win               277    1994 win Norway
## 8  1996      U.S.  Americas    sum               151      1996 sum U.S.
## 9  1998     Japan      Asia    win                56     1998 win Japan
## 10 2000 Australia Australia    sum                90 2000 sum Australia
## 11 2002      U.S.  Americas    win                24      2002 win U.S.
## 12 2004    Greece    Eroupe    sum                49    2004 sum Greece
## 13 2006     Italy    Eroupe    win                80     2006 win Italy
## 14 2008     China      Asia    sum                 2     2008 sum China
## 15 2010    Canada  Americas    win                13    2010 win Canada
## 16 2012      U.K.    Eroupe    sum                76      2012 sum U.K.
## 17 2014    Russia      Asia    win               289    2014 win Russia
## 18 2016    Brazil  Americas    sum                51    2016 sum Brazil
## 19 1968    France    Eroupe    win               181    1968 win France
## 20 1976    Canada  Americas    sum               720    1976 sum Canada
## 21 1980      U.S.  Americas    win               324      1980 win U.S.
## 22 1988    Canada  Americas    win                65    1988 win Canada
## 23 1992    France    Eroupe    win               137    1992 win France
## 24 1992     Spain    Eroupe    sum               266     1992 sum Spain
## 25 1994    Norway    Eroupe    win               277    1994 win Norway
## 26 1996      U.S.  Americas    sum               151      1996 sum U.S.
## 27 1998     Japan      Asia    win                56     1998 win Japan
## 28 2000 Australia Australia    sum                90 2000 sum Australia
## 29 2002      U.S.  Americas    win                24      2002 win U.S.
## 30 2004    Greece    Eroupe    sum                49    2004 sum Greece
## 31 2006     Italy    Eroupe    win                80     2006 win Italy
## 32 2008     China      Asia    sum                 2     2008 sum China
## 33 2010    Canada  Americas    win                13    2010 win Canada
## 34 2012      U.K.    Eroupe    sum                76      2012 sum U.K.
## 35 2014    Russia      Asia    win               289    2014 win Russia
## 36 2016    Brazil  Americas    sum                51    2016 sum Brazil
##    initial/overspend cost
## 1            initial  0.3
## 2            initial  0.7
## 3            initial  0.1
## 4            initial  0.7
## 5            initial  0.8
## 6            initial  2.6
## 7            initial  0.6
## 8            initial  1.7
## 9            initial  1.4
## 10           initial  2.6
## 11           initial  2.0
## 12           initial  2.0
## 13           initial  2.4
## 14           initial  6.7
## 15           initial  2.2
## 16           initial  8.5
## 17           initial  5.6
## 18           initial  3.0
## 19         overspend  0.6
## 20         overspend  5.4
## 21         overspend  0.3
## 22         overspend  0.4
## 23         overspend  1.2
## 24         overspend  7.1
## 25         overspend  1.6
## 26         overspend  2.4
## 27         overspend  0.8
## 28         overspend  2.4
## 29         overspend  0.5
## 30         overspend  0.9
## 31         overspend  2.0
## 32         overspend  0.1
## 33         overspend  0.3
## 34         overspend  6.5
## 35         overspend 16.3
## 36         overspend  1.6
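`gather()` still works but has been superseded by `pivot_longer()` since tidyr 1.0.0. A minimal sketch of the same wide-to-long reshape, assuming tidyr 1.0.0 or later and using two rows of the raw data (1976 Canada and 2008 China); the names `wide`, `long`, `component` and `amount` are hypothetical.

```r
library(tidyr)

# Two rows of the raw data: total cost ($ billion) and overrun (%)
wide <- data.frame(
  YearSeasonCountry = c("1976 sum Canada", "2008 sum China"),
  cost = c(6.093, 6.810),
  PercentageOverrun = c(720, 2)
)

# Split each total into initial budget and overspend, rounded to 0.1
wide$initial <- round(wide$cost * 100 / (100 + wide$PercentageOverrun), 1)
wide$overspend <- round(wide$cost - wide$initial, 1)

# Reshape: one row per Games and cost component
long <- pivot_longer(
  wide,
  cols = c(initial, overspend),
  names_to = "component",
  values_to = "amount"  # a new name, so the total `cost` column is kept
)
long
```

Unlike `gather()`, `pivot_longer()` orders the rows by Games (each Games' initial row followed by its overspend row) rather than by component, and choosing a fresh value name avoids any clash with the existing `cost` column.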
#VISUALIZATION 
#plot a base: cost on the x-axis, one bar per Games, filled by continent
  
p<- ggplot(data=df, aes(x=cost, y=YearSeasonCountry,fill=continent))+

#plot stacked bars (initial budget and overspend segments for each Games)
  geom_col(position = position_stack(reverse = FALSE),colour="black")+
  
#add value labels centered in each segment (x, y and fill are inherited from the base)
  geom_text(aes(label=cost), size=1.7,
            position=position_stack(vjust=.5),check_overlap=TRUE)+
  
#add titles
  labs(
    title="OLYMPIC GAMES ALWAYS GO OVER BUDGET (1968-2016)",
    subtitle= "Displaying initial estimate and over expenditure in $ billion",
    x="Initial Budget (left) and Overspend (right)",
    y="Year - Season - Country",
    caption="Source: The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games") + 

#change theme  
  theme_light() +
  
#customize titles and legend 
  theme(
        text= element_text(family= "Times New Roman"),
        
        legend.title =element_blank(),
        legend.text = element_text(size=7), 
        legend.position = "top",
        legend.box="horizontal",
        legend.box.just = "left",
        legend.key.size = unit(0.3,'cm'),
        
        plot.title = element_text(size= 9,face = "bold",hjust=0.5),
        plot.caption= element_text(size=7,color="grey", face="italic",hjust = 1),
        plot.subtitle = element_text(size=9,colour = "black", face="italic", hjust=0.5),
    
        axis.text=element_text(size=7),
        axis.title=element_text(size=7, face="bold")
        )

#display the plot
p

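The printed data show that the source CSV spells one continent label as "Eroupe", and because the bars are filled by `continent`, that spelling would carry into the plot legend. A minimal base-R sketch of correcting it, shown on a hypothetical excerpt of the column:

```r
# Hypothetical excerpt of the continent column as read from olympic.csv
continent <- c("Eroupe", "Americas", "Asia", "Eroupe", "Australia")

# Replace only the misspelt label; all other values are left unchanged
continent[continent == "Eroupe"] <- "Europe"
continent
```

Applied to the report's data, the equivalent line would be `df$continent[df$continent == "Eroupe"] <- "Europe"`, run before `ggplot()` is called.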
Data Reference

Flyvbjerg, B., Stewart, A., & Budzier, A. (2016). The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games. SSRN, 8,12.

Reconstruction

The following plot fixes the main issues in the original.