Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: ACMA Research and Analysis Section (2015).


Objective

This graph was published on the Cricinfo website, in an article about which top cricket city would win the 2019 World Cup.

Under this main topic, the author discusses match-winning factors, and the primary objective of this graph was to portray the cricket cities with the best batting averages.

However, I believe the visualization was difficult to understand and had the following three main issues:

  • The graph does not convey its meaning visually. You need to read the text to know what it is visualising. According to the text in the graph, the highest batting average is from Sydney, yet in the visualization Sydney is the smallest shape compared with the other four cities. The author has not given any values for the proportions shown. The size, height or width of a shape does not encode any difference: Mumbai has the lowest average but stands tallest in the graph. The graph therefore has no visual gist.

  • The colors used in this graph are not user friendly. Red and blue are primary colors and are also very strong. Strong colors with high contrast can produce afterimages when the viewer looks away from the screen, which strains the eyes.

  • There is also no clear description of the period the data covers. It could span every World Cup from the first one up to 2019, or only the World Cups from some later starting point up to 2019.

Reference

Code

The following code was used to fix the issues identified in the original.

# Load libraries
library(ggplot2)
library(dplyr)
# Read the data
df <- read.csv("C:/data/Cricinfo_Dataset_1975to2019.csv")
head(df, 5)
##                                   Row.Labels       Country              City
## 1   England - St Lawrence Ground, Canterbury      England         Canterbury
## 2      New Zealand - Sky Stadium, Wellington  New Zealand         Wellington
## 3       Australia - Melbourne Cricket Ground    Australia          Melbourne
## 4          Australia - Manuka Oval, Canberra    Australia           Canberra
## 5 South Africa - City Oval, Pietermaritzburg South Africa   Pietermaritzburg
##   Average.of.Ave Runs Mat
## 1         112.15  407   1
## 2          84.54 2048   4
## 3          71.94 2493   5
## 4          63.59 1711   3
## 5          60.30  691   2
# Summarise Mat, the number of matches
summary(df$Mat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.274   6.000  17.000
# Keep only rows with 4 or more matches (the mean of Mat is 4.274)
df2 <- filter(df, Mat >= 4)
head(df2, 4)
##                                                     Row.Labels       Country
## 1                        New Zealand - Sky Stadium, Wellington  New Zealand 
## 2                         Australia - Melbourne Cricket Ground    Australia 
## 3           South Africa - The Wanderers Stadium, Johannesburg South Africa 
## 4 India - Vidarbha Cricket Association Stadium, Jamtha, Nagpur        India 
##            City Average.of.Ave Runs Mat
## 1    Wellington          84.54 2048   4
## 2     Melbourne          71.94 2493   5
## 3  Johannesburg          57.87 2469   5
## 4        Nagpur          56.09 2775   6
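
The cut-off of 4 used above appears to come from the mean of Mat (4.274). A minimal sketch, assuming that is the intended rule, derives the threshold from the data instead of typing it in. The names `df_demo`, `min_matches` and `df_filtered` are hypothetical, and the rows are copied from the head(df, 5) output above rather than the full dataset, so the threshold here works out to 3 rather than 4.

```r
library(dplyr)

# Illustrative rows copied from the head(df, 5) output (not the full dataset)
df_demo <- data.frame(
  City = c("Canterbury", "Wellington", "Melbourne", "Canberra", "Pietermaritzburg"),
  Average.of.Ave = c(112.15, 84.54, 71.94, 63.59, 60.30),
  Runs = c(407, 2048, 2493, 1711, 691),
  Mat = c(1, 4, 5, 3, 2)
)

# Assumed rule: threshold is the mean number of matches, rounded down
min_matches <- floor(mean(df_demo$Mat))

# Keep only the rows that meet the threshold
df_filtered <- filter(df_demo, Mat >= min_matches)
df_filtered
```

Deriving the threshold this way keeps the filter in step with the data if the file is updated, at the cost of the cut-off no longer being a fixed, easily quoted number.
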
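# Select the top 5 averages
df3 <- df2 %>%
  group_by(City) %>%
  # arrange() ignores the grouping by default, so this sorts all rows
  # by average (highest first) before head() takes the first 5
  arrange(desc(Average.of.Ave)) %>%
  head(5)
df3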
## # A tibble: 5 x 6
## # Groups:   City [5]
##   Row.Labels                            Country City  Average.of.Ave  Runs   Mat
##   <chr>                                 <chr>   <chr>          <dbl> <int> <int>
## 1 New Zealand - Sky Stadium, Wellington "New Z~ " We~           84.5  2048     4
## 2 Australia - Melbourne Cricket Ground  "Austr~ "Mel~           71.9  2493     5
## 3 South Africa - The Wanderers Stadium~ "South~ " Jo~           57.9  2469     5
## 4 India - Vidarbha Cricket Association~ "India~ " Na~           56.1  2775     6
## 5 England - Riverside Ground, Chester-~ "Engla~ "Che~           53.0  2340     5
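
A top-n selection of this kind can also be written with dplyr's slice_max(), which ranks by the column explicitly and so does not depend on how the rows happen to be ordered. The quoted values in the tibble output above (for example `" We~`) suggest that some City strings carry leading spaces, so the sketch also trims them with trimws(). The data frame `df2_demo` and the result name `top_demo` are hypothetical; the rows are the four shown in the head(df2, 4) output, deliberately shuffled, with the leading spaces assumed from the printed output.

```r
library(dplyr)

# Illustrative rows from the head(df2, 4) output, shuffled on purpose;
# the leading spaces in City are an assumption based on the tibble printout
df2_demo <- data.frame(
  City = c(" Nagpur", " Wellington", " Johannesburg", "Melbourne"),
  Average.of.Ave = c(56.09, 84.54, 57.87, 71.94),
  Runs = c(2775, 2048, 2469, 2493),
  Mat = c(6, 4, 5, 5)
)

top_demo <- df2_demo %>%
  # Remove stray whitespace so axis labels and reorder() see clean names
  mutate(City = trimws(City)) %>%
  # Take the 3 highest averages; with_ties = FALSE caps the result at n rows
  slice_max(Average.of.Ave, n = 3, with_ties = FALSE)

top_demo
```

With the full data, `n = 5` would give the five cities used in the chart.
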
# Graph
# Labels are typed in by hand, in the same row order as df3
text1 <- c("Avg=84.54", "Avg=71.94", "Avg=57.87", "Avg=56.09", "Avg=53.05")
text2 <- c("Runs=2048", "Runs=2493", "Runs=2469", "Runs=2775", "Runs=2340")


p <- ggplot(data = df3, aes(x = reorder(City, -Average.of.Ave), y = Average.of.Ave)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
  geom_text(aes(label = text1), vjust = 1.6, size = 3.5, color = "white", nudge_y = -5) +
  geom_text(aes(label = text2), size = 3.5, color = "black", nudge_y = -3) +
  xlab("City") + ylab("Batting Averages") +
  ggtitle("Cities with Best Batting Averages in World Cup 1975 - 2019") +
  theme_minimal()

p + theme(
  plot.title = element_text(color = "black", size = 12, face = "bold"),
  axis.title.x = element_text(color = "black", size = 10, face = "bold"),
  axis.title.y = element_text(color = "black", size = 10, face = "bold")
)

# Same chart with bars in ascending order and labels taken from the data columns
p <- ggplot(data = df3, aes(x = reorder(City, Average.of.Ave), y = Average.of.Ave)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
  # Show the average to 2 decimal places
  geom_text(aes(label = round(Average.of.Ave, 2)), vjust = 1.6, size = 3.5, color = "white", nudge_y = -5) +
  geom_text(aes(label = Runs), size = 3.5, color = "black", nudge_y = -3) +
  xlab("City") + ylab("Batting Averages") +
  ggtitle("Cities with Best Batting Averages in World Cup 1975 - 2019") +
  theme_minimal()

p + theme(
  plot.title = element_text(color = "black", size = 12, face = "bold"),
  axis.title.x = element_text(color = "black", size = 10, face = "bold"),
  axis.title.y = element_text(color = "black", size = 10, face = "bold")
)
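
The "Avg=" and "Runs=" label strings used in the first chart can also be built from the data columns, so they cannot drift out of step with the rows they describe. This is a minimal sketch, not the code used for the reconstruction; `labels_demo`, `avg_label` and `runs_label` are hypothetical names, and the values are the four rows shown in the head(df2, 4) output.

```r
# Illustrative values copied from the head(df2, 4) output
labels_demo <- data.frame(
  Average.of.Ave = c(84.54, 71.94, 57.87, 56.09),
  Runs = c(2048L, 2493L, 2469L, 2775L)
)

# Format the average with exactly 2 decimal places
avg_label <- sprintf("Avg=%.2f", labels_demo$Average.of.Ave)

# Prefix the run totals
runs_label <- paste0("Runs=", labels_demo$Runs)

avg_label
runs_label
```

The same expressions could be placed directly inside `aes(label = ...)` in a geom_text() layer, which removes the need for separate label vectors.
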

Data Reference

Reconstruction

The following plot fixes the main issues in the original.