Objective
Explain the objective of the original data visualisation and the targeted audience.
This graph was published on the Cricinfo website, in an article about which top cricket city would win the World Cup in 2019.
Within this topic, the author discusses match-winning factors, and the primary objective of the graph was to show the cricket cities with the best batting averages.
However, I believe the visualisation was difficult to understand and had the following three main issues:
The graph does not visually convey its meaning; you need to read the text to know what it is visualising. According to the text in the graph, the highest batting average is from Sydney, yet in the visualisation Sydney is the smallest of the five cities. The author has not attached any values to the shapes. The size, height and width of the shapes do not reflect the differences in the data: Mumbai has the lowest average but stands tallest in the graph. The graph therefore gives no visual gist.
The colors used in this graph are not user friendly. Red and blue are primary colors and are very strong. Strong colors with high contrast can produce afterimages when the viewer looks away from the screen, which strains the eyes.
There is also no clear description of the period the data covers. It could span every World Cup from the first one up to 2019, or only the World Cups from some later starting point up to 2019.
Reference
The following code was used to fix the issues identified in the original.
# Load libraries
library(ggplot2)
library(dplyr)
# Reading the data
df <- read.csv("C:/data/Cricinfo_Dataset_1975to2019.csv")
head(df, 5)
## Row.Labels Country City
## 1 England - St Lawrence Ground, Canterbury England Canterbury
## 2 New Zealand - Sky Stadium, Wellington New Zealand Wellington
## 3 Australia - Melbourne Cricket Ground Australia Melbourne
## 4 Australia - Manuka Oval, Canberra Australia Canberra
## 5 South Africa - City Oval, Pietermaritzburg South Africa Pietermaritzburg
## Average.of.Ave Runs Mat
## 1 112.15 407 1
## 2 84.54 2048 4
## 3 71.94 2493 5
## 4 63.59 1711 3
## 5 60.30 691 2
# Finding the summary of Mat variable - which is the number of matches variable
summary(df$Mat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.274 6.000 17.000
# Keeping venues with 4 or more matches (the mean of Mat is about 4)
df2 <- filter(df, Mat >= 4)
head(df2, 4)
## Row.Labels Country
## 1 New Zealand - Sky Stadium, Wellington New Zealand
## 2 Australia - Melbourne Cricket Ground Australia
## 3 South Africa - The Wanderers Stadium, Johannesburg South Africa
## 4 India - Vidarbha Cricket Association Stadium, Jamtha, Nagpur India
## City Average.of.Ave Runs Mat
## 1 Wellington 84.54 2048 4
## 2 Melbourne 71.94 2493 5
## 3 Johannesburg 57.87 2469 5
## 4 Nagpur 56.09 2775 6
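The cut-off of 4 matches corresponds to the mean of Mat (4.274) in the summary above. As a minimal sketch, the threshold could be derived from the data rather than typed in by hand. This assumes the intent was to round the mean down to a whole number of matches; the `toy` data frame and the name `min_matches` are made up for illustration and are not part of the Cricinfo dataset.

```r
# Hypothetical toy data standing in for the Cricinfo dataset
toy <- data.frame(
  City = c("A", "B", "C", "D", "E"),
  Mat  = c(1, 2, 3, 6, 9),
  stringsAsFactors = FALSE
)

# Assumption: round the mean number of matches down to a whole-match threshold
min_matches <- floor(mean(toy$Mat))

# Keep only the venues that hosted at least that many matches
toy_filtered <- toy[toy$Mat >= min_matches, ]

print(min_matches)
print(toy_filtered$City)
```

Deriving the threshold this way keeps the filter consistent if the dataset is later updated with more matches.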
# Selecting the top 5 averages
# head() only takes the first rows, so sort by average (descending) first
df3 <- df2 %>%
  group_by(City) %>%
  arrange(desc(Average.of.Ave)) %>%
  head(5)
df3
## # A tibble: 5 x 6
## # Groups: City [5]
## Row.Labels Country City Average.of.Ave Runs Mat
## <chr> <chr> <chr> <dbl> <int> <int>
## 1 New Zealand - Sky Stadium, Wellington "New Z~ " We~ 84.5 2048 4
## 2 Australia - Melbourne Cricket Ground "Austr~ "Mel~ 71.9 2493 5
## 3 South Africa - The Wanderers Stadium~ "South~ " Jo~ 57.9 2469 5
## 4 India - Vidarbha Cricket Association~ "India~ " Na~ 56.1 2775 6
## 5 England - Riverside Ground, Chester-~ "Engla~ "Che~ 53.0 2340 5
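Taking the first rows only yields the highest averages if the rows are sorted by average beforehand. A minimal base R sketch of a sort-then-select step, using made-up values (the `toy` data frame below is illustrative only):

```r
# Hypothetical toy data, deliberately not sorted by average
toy <- data.frame(
  City           = c("A", "B", "C", "D"),
  Average.of.Ave = c(50.2, 84.5, 61.0, 71.9),
  stringsAsFactors = FALSE
)

# Sort by batting average, highest first
ranked <- toy[order(toy$Average.of.Ave, decreasing = TRUE), ]

# Take the top 2 rows of the sorted data
top2 <- head(ranked, 2)

print(top2$City)
```

Sorting explicitly makes the selection independent of the row order in the source file.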
# Graph
# Label vectors; their order must match the row order of df3
text1 <- c("Avg=84.54", "Avg=71.94", "Avg=57.87", "Avg=56.09", "Avg=53.05")
text2 <- c("Runs=2048", "Runs=2493", "Runs=2469", "Runs=2775", "Runs=2340")
# Bars ordered from highest to lowest batting average
p <- ggplot(data = df3, aes(x = reorder(City, -Average.of.Ave), y = Average.of.Ave)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
  geom_text(aes(label = text1), vjust = 1.6, size = 3.5, color = "white", nudge_y = -5) +
  geom_text(aes(label = text2), size = 3.5, color = "black", nudge_y = -3) +
  xlab("City") + ylab("Batting Averages") +
  ggtitle("Cities with Best Batting Averages in World Cup 1975 - 2019") +
  theme_minimal()
# Bold title and axis labels
p + theme(
  plot.title = element_text(color = "black", size = 12, face = "bold"),
  axis.title.x = element_text(color = "black", size = 10, face = "bold"),
  axis.title.y = element_text(color = "black", size = 10, face = "bold")
)
# Same plot with labels taken directly from df3 and bars in ascending order
p <- ggplot(data = df3, aes(x = reorder(City, Average.of.Ave), y = Average.of.Ave)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
  geom_text(aes(label = signif(Average.of.Ave)), vjust = 1.6, size = 3.5, color = "white", nudge_y = -5) +
  geom_text(aes(label = Runs), size = 3.5, color = "black", nudge_y = -3) +
  xlab("City") + ylab("Batting Averages") +
  ggtitle("Cities with Best Batting Averages in World Cup 1975 - 2019") +
  theme_minimal()
# Bold title and axis labels
p + theme(
  plot.title = element_text(color = "black", size = 12, face = "bold"),
  axis.title.x = element_text(color = "black", size = 10, face = "bold"),
  axis.title.y = element_text(color = "black", size = 10, face = "bold")
)
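The first plot uses hand-typed label vectors (`text1`, `text2`), which only line up with the bars while their order matches the rows of the data. A minimal sketch of building the same kind of labels from the columns themselves, in base R with made-up rows (the `toy` data frame and the names `avg_labels` and `runs_labels` are hypothetical):

```r
# Hypothetical toy data with the same column names as df3
toy <- data.frame(
  City           = c("A", "B"),
  Average.of.Ave = c(84.54, 71.94),
  Runs           = c(2048, 2493),
  stringsAsFactors = FALSE
)

# Format the average to two decimals and prefix each value with its name
avg_labels  <- sprintf("Avg=%.2f", toy$Average.of.Ave)
runs_labels <- paste0("Runs=", toy$Runs)

print(avg_labels)
print(runs_labels)
```

Because each label is computed from its own row, it stays attached to the right bar even if the rows are re-sorted or the data changes.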
Data Reference
The following plot fixes the main issues in the original.