Assignment 2

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Source: The New York Times (2022).

Objective

With the recent conclusion of the mega sporting event in Beijing, the common question was “Which were the top performing countries in the Winter Olympics?”

The official Olympic medal table, ranks the nations based on their number of gold medals, with silver and bronze only used to break ties.
Another method (predominantly employed in the USA) is to rank countries based on the total medals won. In this case, a silver and bronze medal are considered equivalent to the gold medal.

Which method is more accurate? Is there a different method?

The above visualization explores this question and its main objective is to provide the audience a different perspective on how a country would rank in the Winter Olympics, if weighted ranking systems were followed.

The charts show all the places a country might land on a medals table, given different ways of measuring the relative worth of a gold medal to a silver, and a silver to a bronze.

The upper-right corner considers only the gold medals. The lower-left corner considers all medals equal. Everywhere in between is a weighed scoring method. (Considering gold is worth 2 or 5 times silver and silver is worth 2 or 5 times bronze)

The target audience for this visualization are -
1) the sport fans who closely follow Olympics
2) the sports analysts and statisticians, to critique the ranking system
3) the general public who are just keen on the Olympic results.

The chosen visualization had the following three main issues:

Colour - The chosen colour scale is not user friendly. The colour changes on the individual heat maps are sometimes hard to notice. The scale (rainbow spectrum) spans shades of red and green, and hence not colour blindness safe. The contrast of the labels are bad and a strain on the eye. Some colour gradients are hard to comprehend as its unlabelled.
Data Interpretation - The visualization lists only 9 countries. The reader could be misled into thinking other countries did not win any medals. Additionally, even upon understanding how to read the graph, the visualization is complicated. It is not straightforward to figure out when and how the ranks change across different gradients. The x and y axis labels are not maintained throughout the visualization.
Failure to answer a practical question - It performs poorly when you want to compare standings of different countries in the same/different measurement method. The weightage from individual medals contributing to the overall rank, is not visually showcased.

Reference

What’s Going On in This Graph? | Olympics Medal Count. (2022). Retrieved 29 April 2022, from The New York Times website: https://www.nytimes.com/2022/02/24/learning/whats-going-on-in-this-graph-march-2-2022.html

Code

The following code was used to fix the issues identified in the original.

#List of packages used
library(openxlsx)  #To read excel files
library(dplyr) #To use mutate function during data pre processing
library(ggplot2) #To plot graphs
library(naniar) #To combine individual plots
library(tidyr) #To convert data to long format
library(ggpubr) #To combine multiple graphs

#Read the data and store it as a data frame.
#Source - https://www.kaggle.com/datasets/arjunprasadsarkhel/2022-winter-olympics-beijing
olympics <- read.xlsx("Medals_2022.xlsx")

#Next using the mutate function we add columns to our dataframe.
#Weighted model 1 - Gold worth 2x Silver, Silver worth 2x Bronze.

temp <- mutate(olympics, Gold_4=4*as.numeric(olympics$Gold), Silver_2=2*as.numeric(olympics$Silver), WeightedRank_GSB_221 = 4*as.numeric(olympics$Gold)+2*as.numeric(olympics$Silver)+1*as.numeric(olympics$Bronze))

#Convert all quantitative variables to numeric type and df is our new dataframe
df <- temp %>% select(-NOC) 
df <- lapply(df,as.numeric)
df$Country <- temp$NOC
df <- df %>% as.data.frame()

#Compute the rank for new model
df$Rank_221 <- c(rank(-df$WeightedRank_GSB_221,ties.method = "min"))

#Convert to long format
df_long <- df %>% pivot_longer(names_to = "Medals", values_to = "Points", cols = c(7,8,4))

#Convert Medals datatype to factor and update the labels
df_long$Medals <- factor(df_long$Medals,c( 'Points from bronze medals', 'Points from silver medals','Points from gold medals'), levels = c('Bronze', 'Silver_2','Gold_4' ))

#Plot the graph with Points on x axis and  Country reordered according to the rank on y axis    
p1 <- ggplot(data = df_long, 
            aes(x = Points, y = reorder(Country, desc(df_long$Rank_221)), fill=df_long$Medals)) +
  geom_bar(stat = "identity") +
  #Add rank label
  geom_text(aes(label = df_long$Rank_221, group = Country, x=0), stat = 'summary', fun = mean,
              hjust = "top", family="Georgia")
#Add the total points to the end of bar graph
p1 <- p1 +
  geom_text(aes(label = stat(x), group = Country),
  stat = 'summary', fun = sum, colour="black", size=4, fontface="italic", hjust =-0, vjust=0.5)

#Add individual contribution of the medals to the stacked bar graph 
p1 <- p1 + geom_text(aes(label = Points),
             hjust = "top",position = position_stack(vjust=0.5))

#Update the colouring and exclude legends
#Reference - https://rdrr.io/github/sadleskorn/apply303/man/medal_counts2.html
p1 <- p1 + scale_fill_manual(values = c("#cd7f32", "#c0c0c0", "#ffd700"),
                             name='Medal weightage',
                             labels=c('Bronze', 'Silver', 'Gold'))+ xlab("Total points") + ylab("Countries")+labs(title = "Weighted Ranking - Gold worth 2x Silver, Silver worth 2x Bronze")+
  theme(plot.title = element_text(face="bold"), legend.position = "top", legend.box.background = element_rect(colour = "black"))


#Repeat the above steps for other models
#Weighted model 2 - Gold worth 5x Silver, Silver worth 2x Bronze.
temp <- mutate(olympics, Gold_10=10*as.numeric(olympics$Gold), Silver_2=2*as.numeric(olympics$Silver), WeightedRank_GSB_521 = 10*as.numeric(olympics$Gold)+2*as.numeric(olympics$Silver)+1*as.numeric(olympics$Bronze))

#Convert quantitative variables as numeric datatype
df2 <- temp %>% select(-NOC) 
df2 <- lapply(df2,as.numeric)
df2$Country <- temp$NOC
df2 <- df2 %>% as.data.frame()
df2$Rank_521 <- c(rank(-df2$WeightedRank_GSB_521,ties.method = "min"))

#Data preprocessing
df2_long <- df2 %>% pivot_longer(names_to = "Medals", values_to = "Points", cols = c(7,8,4))
df2_long$Medals <- factor(df2_long$Medals,c( 'Points from bronze medals', 'Points from silver medals','Points from gold medals'), levels = c('Bronze', 'Silver_2','Gold_10' ))

#Plot the graph                         
p2 <- ggplot(data = df2_long, 
            aes(x = Points, y = reorder(Country, desc(df2_long$Rank_521)), fill=df2_long$Medals)) +
  geom_bar(stat = "identity")+
  geom_text(aes(label = df2_long$Rank_521, group = Country, x=0), stat = 'summary', fun = mean,
              hjust = "top", family="Georgia")

p2 <- p2 +
  geom_text(aes(label = stat(x), group = Country),
  stat = 'summary', fun = sum, colour="black", size=4, fontface="italic", hjust =-0, vjust=0.5)

p2 <- p2 + geom_text(aes(label = Points),
             hjust = "top",position = position_stack(vjust=0.5))

p2 <- p2 + scale_fill_manual(values = c("#cd7f32", "#c0c0c0", "#ffd700"),
                             name='Medal weightage',
                             labels=c('Bronze', 'Silver', 'Gold'))+ xlab("Total points") + ylab("Countries")+labs(title = "\nWeighted Ranking - Gold worth 5x Silver, Silver worth 2x Bronze")+
  theme(plot.title = element_text(face="bold"), legend.position = "top", legend.box.background = element_rect(colour = "black"))

#Weighted model 3 - Gold worth 2x Silver, Silver worth 5x Bronze.
temp <- mutate(olympics, Gold_10=10*as.numeric(olympics$Gold), Silver_5=5*as.numeric(olympics$Silver), WeightedRank_GSB_251 = 10*as.numeric(olympics$Gold)+5*as.numeric(olympics$Silver)+1*as.numeric(olympics$Bronze))

#Data Pre-processing
df3 <- temp %>% select(-NOC) 
df3 <- lapply(df3,as.numeric)
df3$Country <- temp$NOC
df3 <- df3 %>% as.data.frame()
df3$Rank_251 <- c(rank(-df3$WeightedRank_GSB_251,ties.method = "min"))

df3_long <- df3 %>% pivot_longer(names_to = "Medals", values_to = "Points", cols = c(7,8,4))

df3_long$Medals <- factor(df3_long$Medals,c( 'Points from bronze medals', 'Points from silver medals','Points from gold medals'), levels = c('Bronze', 'Silver_5','Gold_10' ))

#Plot the graph                        
p3 <- ggplot(data = df3_long, 
            aes(x = Points, y = reorder(Country, desc(df3_long$Rank_251)), fill=df3_long$Medals)) +
  geom_bar(stat = "identity")+
  geom_text(aes(label = df3_long$Rank_251, group = Country, x=0), stat = 'summary', fun = mean,
              hjust = "top", family="Georgia")

p3 <- p3 +
  geom_text(aes(label = stat(x), group = Country),
  stat = 'summary', fun = sum, colour="black", size=4, fontface="italic", hjust =-0, vjust=0.5)
 
p3 <- p3 + geom_text(aes(label = Points),
             hjust = "top",position = position_stack(vjust=0.5))

p3 <- p3 + scale_fill_manual(values = c("#cd7f32", "#c0c0c0", "#ffd700"),
                             name='Medal weightage',
                             labels=c('Bronze', 'Silver', 'Gold'))+ xlab("Total points") + ylab("Countries")+labs(title = "\nWeighted Ranking - Gold worth 2x Silver, Silver worth 5x Bronze")+
  theme(plot.title = element_text(face="bold"), legend.position = "top", legend.box.background = element_rect(colour = "black"))


#Weighted model 4 - Gold worth 5x Silver, Silver worth 5x Bronze.
temp <- mutate(olympics, Gold_25=25*as.numeric(olympics$Gold), Silver_5=5*as.numeric(olympics$Silver), WeightedRank_GSB_551 = 25*as.numeric(olympics$Gold)+5*as.numeric(olympics$Silver)+1*as.numeric(olympics$Bronze))
#Data Pre-processing
df4 <- temp %>% select(-NOC) 
df4 <- lapply(df4,as.numeric)
df4$Country <- temp$NOC
df4 <- df4 %>% as.data.frame()
df4$Rank_551 <- c(rank(-df4$WeightedRank_GSB_551,ties.method = "min"))

df4_long <- df4 %>% pivot_longer(names_to = "Medals", values_to = "Points", cols = c(7,8,4))

df4_long$Medals <- factor(df4_long$Medals,c( 'Points from bronze medals', 'Points from silver medals','Points from gold medals'), levels = c('Bronze', 'Silver_5','Gold_25' ))

#Plot the graph                      
p4 <- ggplot(data = df4_long, 
            aes(x = Points, y = reorder(Country, desc(df4_long$Rank_551)), fill=df4_long$Medals)) +
  geom_bar(stat = "identity")+
  geom_text(aes(label = df4_long$Rank_551, group = Country, x=0), stat = 'summary', fun = mean,
              hjust = "top", family="Georgia")

p4 <- p4 +
  geom_text(aes(label = stat(x), group = Country),
  stat = 'summary', fun = sum, colour="black", size=4, fontface="italic", hjust =-0, vjust=0.5)

p4 <- p4 + geom_text(aes(label = Points),
             hjust = "top",position = position_stack(vjust=0.5))

p4 <- p4 + scale_fill_manual(values = c("#cd7f32", "#c0c0c0", "#ffd700"),
                             name='Medal weightage',
                             labels=c('Bronze', 'Silver', 'Gold'))+ xlab("Total points") + ylab("Countries")+ labs(title = "\nWeighted Ranking - Gold worth 5x Silver, Silver worth 5x Bronze")+
  theme(plot.title = element_text(face="bold"), legend.position = "top", legend.box.background = element_rect(colour = "black"))

#Model 5 - All medals carry equal weight.
temp <- mutate(olympics, Gold=1*as.numeric(olympics$Gold), Silver=1*as.numeric(olympics$Silver), Total = 1*as.numeric(olympics$Gold)+1*as.numeric(olympics$Silver)+1*as.numeric(olympics$Bronze))

#Data pre-processing
df5 <- temp %>% select(-NOC) 
df5 <- lapply(df5,as.numeric)
df5$Country <- temp$NOC
df5 <- df5 %>% as.data.frame()
df5$Rank <- c(rank(-df5$Total,ties.method = "min"))

df5_long <- df5 %>% pivot_longer(names_to = "Medals", values_to = "Points", cols = c(2,3,4))
df5_long$Medals <- factor(df5_long$Medals,c( 'Points from bronze medals', 'Points from silver medals','Points from gold medals'), levels = c('Bronze', 'Silver','Gold' ))

#Plot the graph                        
p5 <- ggplot(data = df5_long, 
            aes(x = Points, y = reorder(Country, desc(df5_long$Rank)), fill=Medals)) +
  geom_bar(stat = "identity")+
  geom_text(aes(label = df5_long$Rank, group = Country, x=0), stat = 'summary', fun = mean,
              hjust = "top", family="Georgia")

p5 <- p5 +
  geom_text(aes(label = stat(x), group = Country),
  stat = 'summary', fun = sum, colour="black", size=4, fontface="italic", hjust =-0, vjust=0.5)

p5 <- p5 + geom_text(aes(label = Points),
             hjust = "top",position = position_stack(vjust=0.5))

p5 <- p5 + scale_fill_manual(values = c("#cd7f32", "#c0c0c0", "#ffd700"),
                             name='Medal weightage',
                             labels=c('Bronze', 'Silver', 'Gold'))+ 
                               xlab("Total points") + 
                               ylab("Countries") + 
                               labs(title = "Rank by Total Medals")+
  theme(plot.title = element_text(face="bold"), legend.position = "top", legend.box.background = element_rect(colour = "black"))

#Combine all the graphs
#Note: Same could have been achieved through faceting graph, however since we wish to showcase the medal contributions in the stacked bar chart, we follow this approach

p <- ggarrange(p5, p1, p2, p3, p4, common.legend = TRUE, legend = "top", ncol = 2, nrow = 3)
p <- annotate_figure(p, 
                     top = text_grob("Beijing Winter Olympics 2022 - Exploring different ranking systems\n",  color = "black", face = "bold", size = 20),
                     bottom = text_grob("Data source: Kaggle (2022 Winter Olympics Beijing)", color = "blue",
                                   face = "italic", size = 12)
                     )

Data Reference

2022 Winter Olympics Beijing. (2022). Retrieved 1 May 2022, from https://www.kaggle.com/datasets/arjunprasadsarkhel/2022-winter-olympics-beijing

Reconstruction

The following plot fixes the main issues in the original.

Instructions for reading the graph:
1) The countries are sorted based on the overall points (highest to lowest) a country scored according to the weightage system. The rank of each country appears at the start of the stacked bar.
2) Gold, Silver and Bronze colour stack represents the contribution of each medal type to the overall points total and actual value is represented in the centre. It is colour blind safe.
3) The overall points can be viewed at the end of the stack.

Assignment 2

Deconstruct, Reconstruct Web Report

Yeshaswi Aralaguppe Muralidhar (S3825856)

Original

Code

Reconstruction