Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Source: The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games (2016)
Objective
To inform the audience about the total cost of each Olympic Games from 1968 to 2016 and how much each Games overspent relative to its initial budget.
Show total cost
Show cost overrun
Show the year, host country and season of each Games
Compare the initial estimated cost and the overspend of each Games
Target audience
The general public who want to gain knowledge of the total expenditure of the host country.
The visualization chosen had the following three main issues.
Issue 1 is visual bombardment, which misleads the viewer. The chart packs too much information into a single graphic. Some of it, namely the host city and the Olympic logo of each Games, does not help answer the original objective. This clutter creates a false impression, and with so much to take in the viewer may struggle to draw a relevant conclusion. The variables that do matter should be highlighted instead, because they directly affect the cost of each Olympic Games (Flyvbjerg, 2021): the country, year, season, initial cost and over-expenditure.
Issue 2 is an improper choice of visualization method, which prevents the chart from achieving its original objectives. The purpose of the visualization is to illustrate the initial cost, the overrun and the total expense of each Olympic Games. However, the original chart shows total expenses in billions of dollars and overruns in percentages. Because the two measures use different units, the general public cannot easily compare the initial costs with the overrun expenditures.
Issue 3 concerns the legends. The legends for sports-related costs and cost overruns are unnecessary. They explain that circle size encodes the amount for each Games, but viewers find it difficult to match a circle on the chart to a reference size in the legend. Moreover, each circle is already labelled with its exact value, which makes the legends redundant. Although circle sizes may give viewers a rough sense of each country's costs, areas are hard to judge, so comparisons between countries remain approximate.
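The unit mismatch described in Issue 2 disappears once the percentage overrun is converted into dollars. The following is a minimal sketch of that arithmetic, assuming the reported cost is the final cost and the overrun percentage is measured against the initial budget. The function name `split_cost` and the example figures are illustrative and do not come from the original chart or code.

```r
# Split a final cost into its initial budget and its overspend.
# Assumption: `cost` is the final cost and `pct_overrun` is the overrun
# expressed as a percentage of the initial budget, so that
#   cost = initial * (1 + pct_overrun / 100)
split_cost <- function(cost, pct_overrun) {
  initial <- cost / (1 + pct_overrun / 100)
  overspend <- cost - initial
  data.frame(initial = initial, overspend = overspend)
}

# Illustrative figures: a final cost of $3 billion after a 50% overrun
res <- split_cost(3, 50)
res
```

With both parts expressed in $ billion, the initial budget and the overspend can be placed on the same axis.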
Reference
Amoros, R. (2016, August 31). The Olympic Games Always Go Over Budget, in One Chart (1968-2016). Retrieved from https://howmuch.net/articles/olympic-costs
Flyvbjerg, B. (2021, July). How Much Do the Olympics Cost? Retrieved from https://towardsdatascience.com/how-much-do-the-olympics-cost-ef0170bc71f7
The following code was used to fix the issues identified in the original.
#load packages
#tidyverse attaches ggplot2, tidyr and readr (which provides read_csv)
library(tidyverse)
#extrafont makes system fonts such as Times New Roman available to the plot
library(extrafont)
#PREPROCESS DATA
#read the dataset and set dataframe
OriginalDataset<-read_csv("C:/Users/linhn/Downloads/olympic.csv")
df<-data.frame(OriginalDataset)
df
## year country continent season cost PercentageOverrun
## 1 1968 France Eroupe win 0.888 181
## 2 1976 Canada Americas sum 6.093 720
## 3 1980 U.S. Americas win 0.435 324
## 4 1988 Canada Americas win 1.109 65
## 5 1992 France Eroupe win 1.997 137
## 6 1992 Spain Eroupe sum 9.687 266
## 7 1994 Norway Eroupe win 2.228 277
## 8 1996 U.S. Americas sum 4.143 151
## 9 1998 Japan Asia win 2.227 56
## 10 2000 Australia Australia sum 5.026 90
## 11 2002 U.S. Americas win 2.520 24
## 12 2004 Greece Eroupe sum 2.942 49
## 13 2006 Italy Eroupe win 4.366 80
## 14 2008 China Asia sum 6.810 2
## 15 2010 Canada Americas win 2.540 13
## 16 2012 U.K. Eroupe sum 14.957 76
## 17 2014 Russia Asia win 21.900 289
## 18 2016 Brazil Americas sum 4.557 51
#concatenate 3 columns year, season and country
df$YearSeasonCountry <- paste(df$year, df$season, df$country)
#convert 'overrun' from % to $ billion
#cost is the final cost, so initial budget = cost / (1 + overrun% / 100)
df$initial<-round((df$cost *100)/(100+df$PercentageOverrun),1)
#overspend is the part of the final cost above the initial budget
df$overspend <- round((df$cost - df$initial),1)
#data is in wide format
df
## year country continent season cost PercentageOverrun YearSeasonCountry
## 1 1968 France Eroupe win 0.888 181 1968 win France
## 2 1976 Canada Americas sum 6.093 720 1976 sum Canada
## 3 1980 U.S. Americas win 0.435 324 1980 win U.S.
## 4 1988 Canada Americas win 1.109 65 1988 win Canada
## 5 1992 France Eroupe win 1.997 137 1992 win France
## 6 1992 Spain Eroupe sum 9.687 266 1992 sum Spain
## 7 1994 Norway Eroupe win 2.228 277 1994 win Norway
## 8 1996 U.S. Americas sum 4.143 151 1996 sum U.S.
## 9 1998 Japan Asia win 2.227 56 1998 win Japan
## 10 2000 Australia Australia sum 5.026 90 2000 sum Australia
## 11 2002 U.S. Americas win 2.520 24 2002 win U.S.
## 12 2004 Greece Eroupe sum 2.942 49 2004 sum Greece
## 13 2006 Italy Eroupe win 4.366 80 2006 win Italy
## 14 2008 China Asia sum 6.810 2 2008 sum China
## 15 2010 Canada Americas win 2.540 13 2010 win Canada
## 16 2012 U.K. Eroupe sum 14.957 76 2012 sum U.K.
## 17 2014 Russia Asia win 21.900 289 2014 win Russia
## 18 2016 Brazil Americas sum 4.557 51 2016 sum Brazil
## initial overspend
## 1 0.3 0.6
## 2 0.7 5.4
## 3 0.1 0.3
## 4 0.7 0.4
## 5 0.8 1.2
## 6 2.6 7.1
## 7 0.6 1.6
## 8 1.7 2.4
## 9 1.4 0.8
## 10 2.6 2.4
## 11 2.0 0.5
## 12 2.0 0.9
## 13 2.4 2.0
## 14 6.7 0.1
## 15 2.2 0.3
## 16 8.5 6.5
## 17 5.6 16.3
## 18 3.0 1.6
#restructure data to long format (the gathered values replace the original cost column)
df<- gather(df, key="initial/overspend",value="cost",initial:overspend)
df
## year country continent season PercentageOverrun YearSeasonCountry
## 1 1968 France Eroupe win 181 1968 win France
## 2 1976 Canada Americas sum 720 1976 sum Canada
## 3 1980 U.S. Americas win 324 1980 win U.S.
## 4 1988 Canada Americas win 65 1988 win Canada
## 5 1992 France Eroupe win 137 1992 win France
## 6 1992 Spain Eroupe sum 266 1992 sum Spain
## 7 1994 Norway Eroupe win 277 1994 win Norway
## 8 1996 U.S. Americas sum 151 1996 sum U.S.
## 9 1998 Japan Asia win 56 1998 win Japan
## 10 2000 Australia Australia sum 90 2000 sum Australia
## 11 2002 U.S. Americas win 24 2002 win U.S.
## 12 2004 Greece Eroupe sum 49 2004 sum Greece
## 13 2006 Italy Eroupe win 80 2006 win Italy
## 14 2008 China Asia sum 2 2008 sum China
## 15 2010 Canada Americas win 13 2010 win Canada
## 16 2012 U.K. Eroupe sum 76 2012 sum U.K.
## 17 2014 Russia Asia win 289 2014 win Russia
## 18 2016 Brazil Americas sum 51 2016 sum Brazil
## 19 1968 France Eroupe win 181 1968 win France
## 20 1976 Canada Americas sum 720 1976 sum Canada
## 21 1980 U.S. Americas win 324 1980 win U.S.
## 22 1988 Canada Americas win 65 1988 win Canada
## 23 1992 France Eroupe win 137 1992 win France
## 24 1992 Spain Eroupe sum 266 1992 sum Spain
## 25 1994 Norway Eroupe win 277 1994 win Norway
## 26 1996 U.S. Americas sum 151 1996 sum U.S.
## 27 1998 Japan Asia win 56 1998 win Japan
## 28 2000 Australia Australia sum 90 2000 sum Australia
## 29 2002 U.S. Americas win 24 2002 win U.S.
## 30 2004 Greece Eroupe sum 49 2004 sum Greece
## 31 2006 Italy Eroupe win 80 2006 win Italy
## 32 2008 China Asia sum 2 2008 sum China
## 33 2010 Canada Americas win 13 2010 win Canada
## 34 2012 U.K. Eroupe sum 76 2012 sum U.K.
## 35 2014 Russia Asia win 289 2014 win Russia
## 36 2016 Brazil Americas sum 51 2016 sum Brazil
## initial/overspend cost
## 1 initial 0.3
## 2 initial 0.7
## 3 initial 0.1
## 4 initial 0.7
## 5 initial 0.8
## 6 initial 2.6
## 7 initial 0.6
## 8 initial 1.7
## 9 initial 1.4
## 10 initial 2.6
## 11 initial 2.0
## 12 initial 2.0
## 13 initial 2.4
## 14 initial 6.7
## 15 initial 2.2
## 16 initial 8.5
## 17 initial 5.6
## 18 initial 3.0
## 19 overspend 0.6
## 20 overspend 5.4
## 21 overspend 0.3
## 22 overspend 0.4
## 23 overspend 1.2
## 24 overspend 7.1
## 25 overspend 1.6
## 26 overspend 2.4
## 27 overspend 0.8
## 28 overspend 2.4
## 29 overspend 0.5
## 30 overspend 0.9
## 31 overspend 2.0
## 32 overspend 0.1
## 33 overspend 0.3
## 34 overspend 6.5
## 35 overspend 16.3
## 36 overspend 3.0
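The reshape above uses `gather()`, which tidyr's documentation now marks as superseded by `pivot_longer()`. A minimal sketch of the same wide-to-long step with the newer function follows, assuming tidyr 1.0.0 or later. The toy data frame, its values and the column names `component` and `amount` are illustrative only; they are chosen so that the new value column does not share a name with the existing `cost` column.

```r
library(tidyr)

# Toy wide-format data with hypothetical values (not taken from the dataset)
wide <- data.frame(
  YearSeasonCountry = c("Games A", "Games B"),
  initial = c(2.0, 4.0),
  overspend = c(1.0, 0.5)
)

# One row per Games and cost component
long <- pivot_longer(
  wide,
  cols = c(initial, overspend),
  names_to = "component",
  values_to = "amount"
)
long
```

Unlike `gather()`, `pivot_longer()` keeps the rows of each Games together, so the result alternates between `initial` and `overspend` instead of listing all initial budgets first.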
#VISUALIZATION
#plot a base
p<- ggplot(data=df, aes(x=cost, y=YearSeasonCountry, fill=continent))+
  #plot stacked bars
  geom_col(position = position_stack(reverse = FALSE), colour="black")+
  #add value labels, centred in each bar segment
  geom_text(aes(label=cost), size=1.7,
            position=position_stack(vjust=.5), check_overlap=TRUE)+
  #add titles
  labs(
    title="OLYMPIC GAMES ALWAYS GO OVER BUDGET (1968-2016)",
    subtitle="Displaying initial estimate and over expenditure in $ billion",
    x="Initial Budget (left) and Overspend (right)",
    y="Year - Season - Country",
    caption="Source: The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games") +
  #change theme
  theme_light() +
  #customize titles and legend
  theme(
    text=element_text(family="Times New Roman"),
    legend.title=element_blank(),
    legend.text=element_text(size=7),
    legend.position="top",
    legend.box="horizontal",
    legend.box.just="left",
    legend.key.size=unit(0.3,'cm'),
    plot.title=element_text(size=9, face="bold", hjust=0.5),
    plot.caption=element_text(size=7, color="grey", face="italic", hjust=1),
    plot.subtitle=element_text(size=9, colour="black", face="italic", hjust=0.5),
    axis.text=element_text(size=7),
    axis.title=element_text(size=7, face="bold")
  )
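In the plot code above, fill colour encodes the continent, so the two segments of each bar are told apart only by their position. One possible refinement, sketched here on toy data, is to map the cost component to transparency as well. This is a suggestion and not part of the original reconstruction; the data frame `toy`, its values and the column names `game`, `component` and `amount` are hypothetical.

```r
library(ggplot2)

# Toy long-format data with hypothetical values: two rows per Games
toy <- data.frame(
  game = rep(c("Games A", "Games B"), each = 2),
  component = rep(c("initial", "overspend"), times = 2),
  amount = c(2.0, 1.0, 4.0, 0.5)
)

p_toy <- ggplot(toy, aes(x = amount, y = game, alpha = component)) +
  # reverse = TRUE is intended to place the first level ("initial")
  # next to the axis, with "overspend" stacked to its right
  geom_col(fill = "steelblue", colour = "black",
           position = position_stack(reverse = TRUE)) +
  # solid segment for the initial budget, lighter segment for the overspend
  scale_alpha_manual(values = c(initial = 1, overspend = 0.4)) +
  labs(x = "Cost ($ billion)", y = NULL, alpha = NULL) +
  theme_light()

# Printing the object draws the plot
p_toy
```

A single fill colour is used here to keep the sketch short; in the full plot the `fill = continent` mapping could stay alongside the `alpha` mapping.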
Data Reference
Flyvbjerg, B., Stewart, A., & Budzier, A. (2016). The Oxford Olympics Study 2016: Cost and Cost Overrun at the Games. SSRN, 8, 12.
The following plot fixes the main issues in the original.