Assignment 2

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Source: Coury et al 2020.

Objective

Context: The COVID-19 pandemic has widened existing gender gaps and stalled progress toward gender parity in several economies and industries (World Economic Forum 2021). Current projections estimate it will take 135.6 years to close the overall global gender gap (World Economic Forum 2021). Similarly, the United States (US) has seen the pandemic threaten to erase gender gains achieved between 2015 and 2020 (Thomas et al 2020).

Purpose: To emphasise the slow progress toward gender parity in US corporations between 2015 and 2020 across occupation categories (the corporate pipeline), and to highlight the categories where gender equity gains were most pronounced.

Target audience: Broadly speaking, the target audience is the general population, in the US and elsewhere, with an interest in gender equity. Audiences with a particular interest in this research may include women, corporations, institutions, and national and global forums.

Analysis: Data visualisations are a powerful tool to captivate an audience, highlight important findings, and strengthen understanding of the data (Baglin 2020).

The visualisation chosen had the following three main issues:

Category position: The occupation categories are grouped together and arranged to suggest a pathway; however, the hierarchical nature of these categories is not explicitly reflected in the visualisation. Additional labels and annotations are required to differentiate between the categories and highlight their relationship.
Proportion position: The area representing the proportion of women is centred within each occupation category, which makes it difficult to judge the accuracy of the scale and the distance from parity, an important factor in equity discussions. The inclusion of both male and female employees in the visualisation is commendable, and colour is used effectively to contrast the employee categories.
Connectedness and continuity across the years: With very little change from year to year, it is difficult to ascertain progress toward gender parity between 2015 and 2020, or to identify the occupation categories where gender equity gains were most pronounced.
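
To make "distance from parity" concrete, the following is a minimal sketch in base R. It uses only the two levels whose figures are visible in the data preview in the Code tab (Entry Level and Manager). It assumes parity means a 50% share, and the names `women`, `parity`, `Gap_2015`, `Gap_2020` and `Gain` are illustrative rather than taken from the submission.

```r
# Women's share of employees (%), copied from the data preview in the Code tab.
# Only Entry Level and Manager are shown there, so only those are used here.
women <- data.frame(
  Job_Role   = c("Entry Level", "Manager"),
  Women_2015 = c(44.9, 36.8),
  Women_2020 = c(47, 38)
)

parity <- 50  # assumption: parity means women hold half of the roles

# Percentage points still needed to reach parity in each year
women$Gap_2015 <- parity - women$Women_2015
women$Gap_2020 <- parity - women$Women_2020

# Percentage-point gain for women between 2015 and 2020
women$Gain <- women$Women_2020 - women$Women_2015

print(women)
```

Read this way, both the remaining gap and the five-year gain are expressed in percentage points, which is the comparison the original visualisation makes hard to see.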

Reference

Coury, S, Huang, J, Kumar, A, Prince, S, Krivkovich, A & Yee, L 2020, ‘Women in the Workplace 2020’, LeanIn.Org and McKinsey, 30 September, viewed 16 September 2021 https://www.mckinsey.com/featured-insights/diversity-and-inclusion/women-in-the-workplace

Code

The following code was used to fix the issues identified in the original.

suppress <- suppressPackageStartupMessages # alias: hides package startup messages (not warnings) in the output
suppress(library(readr))
suppress(library(dplyr))
suppress(library(tidyr))
suppress(library(magrittr))
suppress(library(ggplot2))

setwd("~/RMIT (Data Science)/Data Visualisation (MATH2150)/Assignment 2/Submission")

# Data collation and preprocessing were conducted separately in R, and a CSV file containing the data was created.
# Code for this initial step is provided in the submission document for reference.

# Import data set, view data structure, allocate levels within occupation category, and view header of data set
data <- read_csv("Data_Category.csv")
str(data)

## spec_tbl_df [30 x 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Job_Role        : chr [1:30] "Entry Level" "Entry Level" "Entry Level" "Entry Level" ...
##  $ Category        : chr [1:30] "Men_2020" "Women_2020" "Men_2015" "Women_2015" ...
##  $ Employee Percent: num [1:30] 53 47 55.1 44.9 2.1 62 38 63.2 36.8 1.2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Job_Role = col_character(),
##   ..   Category = col_character(),
##   ..   `Employee Percent` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

data$Job_Role <- factor(c(data$Job_Role), levels=c("Entry Level", "Manager", "Senior Manager / Director", "Vice President", "Senior Vice President", "C-Suite"), ordered=TRUE)
head(data)

## # A tibble: 6 x 3
##   Job_Role    Category               `Employee Percent`
##   <ord>       <chr>                               <dbl>
## 1 Entry Level Men_2020                             53  
## 2 Entry Level Women_2020                           47  
## 3 Entry Level Men_2015                             55.1
## 4 Entry Level Women_2015                           44.9
## 5 Entry Level Women_Change_2015_2020                2.1
## 6 Manager     Men_2020                             62

# Filter data for plotting and order the categories for plotting
data_plot <- data %>% filter(Category %in% c("Men_2020", "Women_Change_2015_2020", "Women_2015"))
data_plot$Category <- factor(c(data_plot$Category), levels=c("Men_2020", "Women_Change_2015_2020", "Women_2015"), ordered=TRUE)
head(data_plot)

## # A tibble: 6 x 3
##   Job_Role    Category               `Employee Percent`
##   <ord>       <ord>                               <dbl>
## 1 Entry Level Men_2020                             53  
## 2 Entry Level Women_2015                           44.9
## 3 Entry Level Women_Change_2015_2020                2.1
## 4 Manager     Men_2020                             62  
## 5 Manager     Women_2015                           36.8
## 6 Manager     Women_Change_2015_2020                1.2

# Reconstruct data visualisation to a stacked bar chart of gender categories using ggplot2
# with occupational categories on the Y-axis and employee percentage on the X-axis
p1 <- ggplot(data_plot, aes(fill=Category, x=`Employee Percent`, y=Job_Role)) +
  geom_bar(position="stack", stat="identity", width=0.8)
p1 <- p1 + theme(panel.background=element_rect(fill="lightsteelblue1")) +
  scale_fill_manual(values=c("midnightblue", "darkorange1", "goldenrod1"), name=NULL,
                    labels=c("Total Men in 2020", "% point change for Women from 2015 to 2020",
                             "Total Women in 2015"), guide=guide_legend(reverse = TRUE)) +
  scale_x_continuous(breaks=seq(0, 100, by=10)) +
  theme(panel.grid.major.y=element_blank(),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.x=element_line(size=0.5, linetype='solid', colour="lightsteelblue3")) +
  theme(axis.ticks=element_blank(), axis.title.y = element_blank()) +
  theme(text=element_text(family="sans")) +
  labs(title="Since 2015, we've seen only modest signs of progress in the\nrepresentation of women in the corporate pipeline.",
       subtitle="Representation of employees by level", x="% of employees") +
  theme(plot.title=element_text(face="bold", color="black", size=11),
        plot.subtitle=element_text(face="bold", color="black", size=9, vjust=-1.5),
        axis.title.x=element_text(face="bold", color="gray0", size=9),
        axis.text.x=element_text(color="gray10", size=8),
        axis.text.y=element_text(color="gray0", size=8)) +
  theme(legend.position="top", legend.key.size = unit(0.3, 'cm'),
        legend.text=element_text(color="gray10", size=8))
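
The stacked bar relies on the three plotted categories adding up to 100% within each level: men in 2020, women in 2015, and the percentage-point change for women. The following is a small sketch of that check in base R, using only the rows visible in the `head(data_plot)` output above. The names `check_df` and `totals` are hypothetical, and the column is renamed `Employee_Percent` here purely for convenience.

```r
# Rows copied from the head(data_plot) output; the other levels are omitted
# because their values are not shown in the preview.
check_df <- data.frame(
  Job_Role = rep(c("Entry Level", "Manager"), each = 3),
  Category = rep(c("Men_2020", "Women_2015", "Women_Change_2015_2020"), times = 2),
  Employee_Percent = c(53, 44.9, 2.1, 62, 36.8, 1.2)
)

# Sum the stacked segments within each level
totals <- tapply(check_df$Employee_Percent, check_df$Job_Role, sum)
print(totals)

# Each bar should span the full 0-100 range (allowing for floating-point error)
stopifnot(all(abs(totals - 100) < 1e-8))
```

Because each bar spans the full axis, the boundary between the men's segment and the women's segments can be read directly against the 50% gridline.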

Data Reference

Thomas, R, Cooper, M, Cardazone, G, Urban, K, Bohrer, A, Long, M, Yee, L, Krivkovich, A, Huang, J, Prince, S, Kumar, A & Coury, S 2020, ‘Women in the Workplace 2020 - Corporate America is at Critical Crossroads’, LeanIn.Org and McKinsey, p. 8, viewed 16 September 2021 https://wiw-report.s3.amazonaws.com/Women_in_the_Workplace_2020.pdf

Reconstruction

The following plot fixes the main issues in the original.

Deconstruct, Reconstruct Web Report

Juliana Potulic (9204937R)