Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The data visualisation above shows the number of asylum seekers who arrived by boat and could not be safely returned during two financial years, 2013-14 and 2014-15. It was published by the Refugee Council of Australia in March 2020.
The target audience includes politicians, who may use the figures for political purposes, and readers who want to learn about this period of history. The visualisation can also be interpreted by members of the general public with an interest in data analytics.
The visualisation chosen had the following three main issues:
Colour issue: Three shades of red (light red, red and dark red) represent Crew, Total (including crew) and Children, which can confuse the audience. Viewers may find it hard to distinguish the categories.
Data layout and deceptive methods: Each month contains several smuggling ventures, but the vertical bar chart does not convey this to the audience. For example, September 2013 and October 2013 each have five ventures, yet viewers are unlikely to see that. In addition, there is no dividing line between ventures, which may lead to misreading.
Issue with data integrity (inappropriate and insufficient data): Although there are no records for the other months, those months should still appear in the data to give the audience a better overview. Moreover, the data should break down everyone on board by category, rather than giving only the total number of people on board and two categories, Crew and Children.
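As a rough illustration of the third issue, the sketch below pads a monthly summary with zero-count months so that the whole period appears on the axis. The venture counts per month are read off the venture labels used in the reconstruction code. The data frame names (recorded, all_months, padded) are hypothetical, and the July 2013 to June 2015 window assumes Australian financial years. It is one possible approach, not part of the reconstruction code.

```r
# Ventures per month, counted from the venture labels in the reconstruction code
recorded <- data.frame(
  month = as.Date(c("2013-09-01", "2013-10-01", "2013-11-01",
                    "2013-12-01", "2014-07-01")),
  ventures = c(5, 5, 5, 7, 1)
)

# Assumption: the two financial years run from July 2013 to June 2015
all_months <- data.frame(
  month = seq(as.Date("2013-07-01"), as.Date("2015-06-01"), by = "month")
)

# Left join so every month is kept, then replace missing counts with zero
padded <- merge(all_months, recorded, by = "month", all.x = TRUE)
padded$ventures[is.na(padded$ventures)] <- 0
```

Plotting padded instead of recorded would show the empty months explicitly, which makes the concentration of ventures in late 2013 easier to see.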
Reference
The following code was used to fix the issues identified in the original.
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2)
# Import data
data <- read.csv(file = "data.csv", header = TRUE)
# Tidy and reshape the data
colnames(data) <- c("Venture", "Total_including_crew", "Crew", "Children")
data[1:23,1]<- c("vent 01 (09/2013)","vent 02 (09/2013)","vent 03 (09/2013)",
"vent 04 (09/2013)","vent 05 (09/2013)","vent 06 (10/2013)",
"vent 07 (10/2013)","vent 08 (10/2013)","vent 09 (10/2013)","vent 10 (10/2013)",
"vent 11 (11/2013)","vent 12 (11/2013)","vent 13 (11/2013)","vent 14 (11/2013)",
"vent 15 (11/2013)","vent 16 (12/2013)","vent 17 (12/2013)","vent 18 (12/2013)",
"vent 19 (12/2013)","vent 20 (12/2013)","vent 21 (12/2013)","vent 22 (12/2013)",
"vent 23 (07/2014)")
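# Derive the people on board who are neither crew nor children
data1 <- mutate(data, Others = Total_including_crew - Crew - Children)
# Keep Venture, Crew, Children and Others (the total is no longer needed)
data1 <- data1[, c("Venture", "Crew", "Children", "Others")]
# Reshape to long format: one row per venture and category
data2 <- data1 %>%
  gather(key = "Category", value = "Number_of_people", Crew, Children, Others) %>%
  group_by(Venture)
# Order categories to match the stacking order, then centre each label in its segment
data2 <- data2 %>% arrange(Venture, desc(Category))
data2 <- mutate(data2, lab_ypos = cumsum(Number_of_people) - 0.5 * Number_of_people)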
# Plot data
p <- ggplot(data = data2, aes(x = Venture, y = Number_of_people, label = Number_of_people)) +
  geom_col(aes(fill = Category), width = 0.9) +
  ggtitle("BOATS THAT COULD NOT BE SAFELY RETURNED FROM 2013") +
  # Label non-zero segments only; geom_text uses fontface (face is ignored with a warning)
  geom_text(data = subset(data2, Number_of_people != 0),
            aes(y = lab_ypos, label = Number_of_people, group = Category),
            color = "black", size = 2.5, fontface = "bold") +
  coord_flip() +
  theme(axis.text.x = element_text(face = "bold", size = 8),
        axis.text.y = element_text(face = "bold", size = 7),
        axis.title.x = element_text(face = "bold", size = 10, color = "Red"),
        axis.title.y = element_text(face = "bold", size = 10, color = "Red"),
        title = element_text(colour = "#7F3D17", face = "bold", size = 11.5))
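The lab_ypos column in the code above places each label at the midpoint of its bar segment: the cumulative height up to and including the segment, minus half of the segment's own height. The small sketch below repeats that calculation on a single made-up venture so the arithmetic can be checked by hand. The toy data frame and its counts are hypothetical and are not taken from the source data.

```r
library(dplyr)

# Hypothetical counts for one venture (illustration only)
toy <- data.frame(
  Venture = "vent A",
  Category = c("Children", "Crew", "Others"),
  Number_of_people = c(10, 4, 30)
)

toy_labels <- toy %>%
  group_by(Venture) %>%
  # Reverse alphabetical order matches how the stacked bar is built from the base
  arrange(Venture, desc(Category)) %>%
  # Midpoint of each segment = running total minus half the segment
  mutate(lab_ypos = cumsum(Number_of_people) - 0.5 * Number_of_people) %>%
  ungroup()

# Others:   30 - 15 = 15
# Crew:     34 -  2 = 32
# Children: 44 -  5 = 39
```

Sorting by desc(Category) matters here: the running total has to accumulate in the same order as the segments are stacked, otherwise the labels land in the wrong segments.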
Data Reference
The following plot fixes the main issues in the original.