Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
Context: The COVID-19 pandemic has widened existing gender gaps and stalled progress toward gender parity in several economies and industries (World Economic Forum 2021). Current projections estimate that it will take 135.6 years to close the overall global gender gap (World Economic Forum 2021). Similarly, in the United States (US) the pandemic threatens to erase the gender gains achieved between 2015 and 2020 (Thomas et al. 2020).
Purpose: To emphasise the slow progress toward gender parity in US corporations between 2015 and 2020 across occupation categories (the corporate pipeline), and to highlight the categories where gender equity gains were most pronounced.
Target audience: Broadly speaking, the target audience is the general population, in the US and elsewhere, with an interest in gender equity. Audiences with a particular interest in this research may include women, corporations, institutions, and national and global forums.
Analysis: Data visualisations are a powerful tool to captivate an audience, highlight important findings, and strengthen understanding of the data (Baglin 2020).
The visualisation chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
suppress <- suppressPackageStartupMessages # hide package startup messages (not warnings) in the output
suppress(library(readr))
suppress(library(dplyr))
suppress(library(tidyr))
suppress(library(magrittr))
suppress(library(ggplot2))
setwd("~/RMIT (Data Science)/Data Visualisation (MATH2150)/Assignment 2/Submission")
# Data collation and preprocessing were conducted separately in R, and a CSV file containing the data was created.
# Code for this initial step is provided in the submission document for reference.
# Import data set, view data structure, allocate levels within occupation category, and view header of data set
data <- read_csv("Data_Category.csv")
str(data)
## spec_tbl_df [30 x 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Job_Role : chr [1:30] "Entry Level" "Entry Level" "Entry Level" "Entry Level" ...
## $ Category : chr [1:30] "Men_2020" "Women_2020" "Men_2015" "Women_2015" ...
## $ Employee Percent: num [1:30] 53 47 55.1 44.9 2.1 62 38 63.2 36.8 1.2 ...
## - attr(*, "spec")=
## .. cols(
## .. Job_Role = col_character(),
## .. Category = col_character(),
## .. `Employee Percent` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Order the job roles from the bottom to the top of the corporate pipeline
data$Job_Role <- factor(data$Job_Role,
                        levels=c("Entry Level", "Manager", "Senior Manager / Director",
                                 "Vice President", "Senior Vice President", "C-Suite"),
                        ordered=TRUE)
head(data)
## # A tibble: 6 x 3
## Job_Role Category `Employee Percent`
## <ord> <chr> <dbl>
## 1 Entry Level Men_2020 53
## 2 Entry Level Women_2020 47
## 3 Entry Level Men_2015 55.1
## 4 Entry Level Women_2015 44.9
## 5 Entry Level Women_Change_2015_2020 2.1
## 6 Manager Men_2020 62
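The `Women_Change_2015_2020` rows appear to be the percentage-point difference between the `Women_2020` and `Women_2015` rows for each job role. The preprocessing code is not shown here, so the following is only a minimal sketch under that assumption, using the Entry Level and Manager values from the output above. The object names `women_2020`, `women_2015`, `women_change` and `change_rows` are hypothetical.

```r
# Assumption: change = Women_2020 - Women_2015, in percentage points.
# Values are copied from the str() and head() output above.
women_2020 <- c("Entry Level" = 47.0, "Manager" = 38.0)
women_2015 <- c("Entry Level" = 44.9, "Manager" = 36.8)

# Round to one decimal place to remove floating-point noise
women_change <- round(women_2020 - women_2015, 1)

# Rebuild the rows in the same long format as the CSV file
change_rows <- data.frame(
  Job_Role = names(women_change),
  Category = "Women_Change_2015_2020",
  `Employee Percent` = unname(women_change),
  check.names = FALSE  # keep the space in the column name
)
print(change_rows)
```

If the assumption holds, the two computed values should match the 2.1 and 1.2 reported for Entry Level and Manager in the imported data.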
# Filter data for plotting and order the categories for plotting
plot_categories <- c("Men_2020", "Women_Change_2015_2020", "Women_2015")
data_plot <- data %>% filter(Category %in% plot_categories)
data_plot$Category <- factor(data_plot$Category, levels=plot_categories, ordered=TRUE)
head(data_plot)
## # A tibble: 6 x 3
## Job_Role Category `Employee Percent`
## <ord> <ord> <dbl>
## 1 Entry Level Men_2020 53
## 2 Entry Level Women_2015 44.9
## 3 Entry Level Women_Change_2015_2020 2.1
## 4 Manager Men_2020 62
## 5 Manager Women_2015 36.8
## 6 Manager Women_Change_2015_2020 1.2
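The three categories kept for plotting appear to be chosen so that each stacked bar totals 100%: men in 2020, women in 2015, and the change for women since 2015 together account for all employees at a level in 2020. A quick sanity check of that property can be sketched with the six rows shown above. The object names `check` and `totals`, and the shortened column name `Percent`, are hypothetical.

```r
# The six rows from the head(data_plot) output above
check <- data.frame(
  Job_Role = rep(c("Entry Level", "Manager"), each = 3),
  Category = rep(c("Men_2020", "Women_2015", "Women_Change_2015_2020"), times = 2),
  Percent  = c(53, 44.9, 2.1, 62, 36.8, 1.2)
)

# Sum the stacked segments within each job role
totals <- tapply(check$Percent, check$Job_Role, sum)
print(totals)

# Each bar should reach 100 (allowing for floating-point tolerance)
stopifnot(isTRUE(all.equal(as.numeric(totals), c(100, 100))))
```

Running the same check on the full `data_plot` would flag any job role whose segments do not add up, for example because of rounding in the source figures.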
# Reconstruct data visualisation to a stacked bar chart of gender categories using ggplot2
# with occupational categories on the Y-axis and employee percentage on the X-axis
p1 <- ggplot(data_plot, aes(fill=Category, x=`Employee Percent`, y=Job_Role)) +
  geom_bar(position="stack", stat="identity", width=0.8)
p1 <- p1 + theme(panel.background=element_rect(fill="lightsteelblue1")) +
  # name=NULL removes the legend title; reverse the legend to match the stacking order
  scale_fill_manual(values=c("midnightblue", "darkorange1", "goldenrod1"), name=NULL,
                    labels=c("Total Men in 2020", "% point change for Women from 2015 to 2020",
                             "Total Women in 2015"), guide=guide_legend(reverse = TRUE)) +
  scale_x_continuous(breaks=seq(0, 100, by=10)) +
  # Keep only the vertical major grid lines
  theme(panel.grid.major.y=element_blank(),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.x=element_line(size=0.5, linetype='solid', colour="lightsteelblue3")) +
  theme(axis.ticks=element_blank(), axis.title.y = element_blank()) +
  theme(text=element_text(family="sans")) +
  labs(title="Since 2015, we've seen only modest signs of progress in the\nrepresentation of women in the corporate pipeline.",
       subtitle="Representation of employees by level", x="% of employees") +
  theme(plot.title=element_text(face="bold", color="black", size=11),
        plot.subtitle=element_text(face="bold", color="black", size=9, vjust=-1.5),
        axis.title.x=element_text(face="bold", color="gray0", size=9),
        axis.text.x=element_text(color="gray10", size=8),
        axis.text.y=element_text(color="gray0", size=8)) +
  theme(legend.position="top", legend.key.size = unit(0.3, 'cm'),
        legend.text=element_text(color="gray10", size=8))
Data Reference
The following plot fixes the main issues in the original.