Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Australian Bureau of Statistics (2020).


Objective and Targeted Audience

The selected data visualization was taken from the Australian Bureau of Statistics (ABS) website. The data covers 1988 to 2020, and the plot was released on December 11th, 2020. The visualization shows the proportion of employees in casual employment over this period. The ABS uses entitlement to paid leave as its main indicator of whether an employee is in casual employment, and this data is essential for policy makers and public works professionals when making policy decisions. The target audience for this data visualization therefore includes, but is not limited to, policy makers, public works professionals and employers.

The visualization has the following main problems:

  • Confusing time series (line layout): The visualization splits each time series into pre-2004 and post-2004 segments, although both segments report the same variable (casual employment rate) over time for the same gender groups.

  • Difficult to match the legend to the plot: Because of the previous issue, the plot has 6 different time series, each with a different color, to identify the gender groups across the two periods (pre- and post-2004). This makes it difficult to tell which group each line refers to. The choice of colors is also poor, since the pre- and post-2004 colors for the same group bear no resemblance to each other.

  • Unclear objective: The visualization does not convey a clear objective. The reader has to read the accompanying article to understand the plot’s message.

Reference

Code

The following code was used to fix the issues identified in the original.

#Load Libraries 
library(dplyr)
library(ggplot2)
library(tidyverse) 
library(janitor)
library(lubridate) 
library(extrafont)
library(showtext)
library(grid)
#Font Import

font_add("Oswald", "/Library/Fonts/Oswald-Regular.ttf")
showtext_auto()
#loadfonts() 
fonts() #Checking available Fonts 
## Load data
dt <- read_csv("Share of casual employment.csv",skip=1,na="0") 
head(dt)
#Data Wrangling 

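##Rename Column 1
dt <- dt %>%
  rename(Years = 1) # Rename by position: the unnamed first column is read as X1 or ...1 depending on the readr version
head(dt)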

##Convert Year column from chr to date
dt$Years <- as.Date(paste0("01-", dt$Years), format="%d-%b-%y")
#dt$Years <- format(dt$Years, "%b-%y")
class(dt$Years) #Checking the new class date

##Merge pre and post 2004 data by columns Men, Women and Total 
dt_merge <- dt %>%
  unite("Men", `Men (pre-2004) (%)`,`Men (2004+) (%)`, remove=FALSE, na.rm=TRUE)%>%
  unite("Women",`Women (pre-2004) (%)`,`Women (2004+) (%)`, remove=FALSE, na.rm=TRUE)%>%
  unite("Total", `Total (pre-2004) (%)`,`Total (2004+) (%)`, remove=FALSE, na.rm=TRUE)

head(dt_merge)

##Deleting pre and post columns 
drops <- c("Men (pre-2004) (%)","Women (pre-2004) (%)","Total (pre-2004) (%)","Men (2004+) (%)", "Women (2004+) (%)", "Total (2004+) (%)") 
dt_merge<-dt_merge[,!(names(dt_merge) %in% drops)]
head(dt_merge) 

##Cleaning Notes 
dt_merge<- slice(dt_merge, 1:129) #Removing notes from the dataset
head(dt_merge)

##Deleting rows with blank values
dt_mod <- dt_merge%>% 
  filter(Men!="" | Women!="" |Total!="") 
dt_mod <- arrange(dt_mod, desc(Years))
View(dt_mod)

##Convert to dbl 
dt_mod$Total<- as.numeric(dt_mod$Total)
dt_mod$Men<- as.numeric(dt_mod$Men)
dt_mod$Women<- as.numeric(dt_mod$Women)
##Plotting
date_vline <- as.Date(c("2004-01-01"))  # Set x coordinate for the intercept line 
data_curve_start <-as.Date(c("2010-01-01")) # Start of the arrow
annotation_start <- as.Date(c("2012-01-01")) # Locating annotation 1 
annotation_2004 <- as.Date(c("2003-01-01"))  # Locating annotation 2 (Year 2004)

#p1 - Main line and points plot containing the aes for each line. 
p1 <- dt_mod %>%
  ggplot() +
  geom_line(aes(x=Years, y=Total, col="Total")) +
  geom_line(aes(x=Years, y=Men, col="Men")) +
  geom_line(aes(x=Years, y=Women, col="Women")) +
  # Named values tie each colour to its series explicitly instead of relying on alphabetical order
  scale_color_manual(values = c(Men = "darkblue", Total = "springgreen4", Women = "deeppink3")) +
  geom_point(aes(x=Years, y=Total), shape=21, color="black", fill="darkgreen") +
  geom_point(aes(x=Years, y=Men), shape=21, color="darkblue", fill="steelblue1") +
  geom_point(aes(x=Years, y=Women), shape=21, color="hotpink4", fill="indianred2")

 

#p2 - Name the title, x and y labels and add intercept line in x. 
p2<- p1 + labs(x="Years", y="Employees(%)", title="Share of casual employment", subtitle = "Proportion of employees in casual employment ", caption ="Source: Australian Bureau of Statistics 

      Notes:                                                                               
      1. pre-2004 series: includes Owner Managers of Incorporated Enterprises (OMIEs). 
      2. 2004+ series: excludes OMIEs.
      3. From August 2014, casual employment is collected quarterly in the Labour Force Survey. 
      4. The ABS plans to produce historically comparable estimates for the pre-2004 period, excluding OMIEs, in the future.") + 
  
  geom_vline(xintercept=date_vline, col="brown4", size=1.0, linetype="dotdash")+  # Year 2004 Intercept line 
  scale_x_date(date_breaks = "5 year", date_labels = "%Y") 


#p3 - Set the theme of background, legends, Title and subtitle

p3 <- p2 + theme (
  legend.position = c(0.95, 0.8),
  legend.direction = "horizontal",
  legend.justification = c("right","bottom"),
  legend.box.just = "right", 
  legend.title=element_blank(),
  legend.background = element_rect(fill="white"),
  legend.key = element_rect(fill = "white"),
  legend.key.width = unit(1.0, "cm"),
  panel.border = element_blank(),
  panel.background = element_rect(fill = "white", color = NA),
  panel.grid.major = element_line(size = 0.5, linetype = 'dotted',
                                colour = "#69b3a2"), 
  panel.grid.minor = element_line(size = 0.05, linetype = 'dotted',
                                colour = "#69b3a2"),
  plot.background = element_rect(fill="grey95"),  # Change background color to grey
  axis.line = element_line(size=1.0, colour = "black"),
  text = element_text(family='Oswald', color="grey20"),
  plot.title = element_text(size = rel(2), hjust=0), # Position and size of Title
  plot.subtitle = element_text(size = rel(1), hjust=0), # Position and size of Subtitle
  plot.caption = element_text(hjust = 0))
 

#p4 - Add annotations
p4 <- p3 + annotate("text", x = annotation_start, 
                y = 13, label = "post-2004 series excludes OMIEs", family="Oswald", fontface="bold.italic", angle=0, size=3, 
                colour='black') +
           annotate("text", x = annotation_2004, 
                y = 16, label = "Year = 2004", family="Oswald", fontface="bold", angle=90, size=4, 
                colour='black') + 
           # annotate() draws the arrow once; geom_curve(aes(...)) with constant values would redraw it for every data row
           annotate("curve", x = data_curve_start, y = 15, xend = date_vline, yend = 13,
                colour='deeppink4', size=0.5, arrow = arrow(length = unit(0.04, "npc")))

p4 # Display the final plot

Data Reference

Reconstruction

The reconstruction of the original visualization can be seen below and addresses the main issues highlighted previously.

In the reconstruction, the pre- and post-2004 data were combined for each gender group, which gives a cleaner, continuous view of each series. A vertical reference line was added at 2004 to mark the break point between the two series. To make the plot’s message clearer, an annotation was added explaining the difference between the pre- and post-2004 data, along with a subtitle that clarifies the variable. Each group (Men, Women and Total) is now identified by a single color. Lastly, the notes were left-aligned and spaced so that each note sits on its own line, which makes them easier to read.
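The merge of the pre- and post-2004 columns described above could also be sketched with dplyr's coalesce(), which keeps the values numeric rather than passing them through text. This is only a minimal illustration: the toy data frame and its numbers are made up for the example (they are not the ABS figures), and only the column names follow those used in the Code tab.

```r
library(dplyr)

# Toy data (hypothetical values, NOT the ABS figures): each observation is
# reported either in the pre-2004 column or in the 2004+ column.
toy <- tibble(
  Years = as.Date(c("2002-08-01", "2003-08-01", "2004-08-01", "2005-08-01")),
  `Men (pre-2004) (%)` = c(20.1, 20.5, NA, NA),
  `Men (2004+) (%)`    = c(NA, NA, 17.0, 17.2)
)

# coalesce() returns, row by row, the first non-missing value among its
# arguments, so the two partial columns collapse into one numeric series.
toy_merged <- toy %>%
  mutate(Men = coalesce(`Men (pre-2004) (%)`, `Men (2004+) (%)`)) %>%
  select(Years, Men)

print(toy_merged)
```

If a row had a value in both columns, coalesce() would keep the pre-2004 one, so any overlap between the two series would need to be checked before relying on this approach.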