Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Visualizing the Massive Gender Pay Gap Across U.S. Industries (Source: howmuch.net)


Objective: The visualization compares median annual earnings for men and women in the United States, industry by industry, to show how large the income disparity between genders is. It aims to highlight potential inequality in wage distribution across sectors.

Audience: The target audience includes policymakers, gender studies scholars, HR professionals, and members of the general public interested in gender equality and economic distribution.

Critique

The visualization chosen had the following three main issues:

  • Complexity and Clarity in Visualization: The main problem is the circular design. Because the values are arranged around a circle rather than along a common baseline, it is hard to compare the earnings of men and women within an industry or across industries. The circular arrangement is visually appealing, but it gets in the way of reading the differences in the data directly.

  • Color and Perceptual Issues: The second issue is the use of color. The visualization uses a color gradient to denote income levels, which is difficult to interpret accurately, especially for people with color vision deficiencies. The slight changes in color do not communicate the differences in earnings effectively and may make the disparities look smaller than they are.

  • Label Clarity and Accessibility: The third issue is the handling of data labels, such as the earnings range labels and industry labels. The labels are small and tightly packed, so readers cannot take in the information quickly, especially on small screens or if they have visual impairments. This makes the visualization less effective at conveying information clearly.
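The color issue above can be illustrated with a short sketch. It is not part of the submitted reconstruction; it only shows one possible way to encode gender with two colors that stay distinguishable under common color vision deficiencies. The two hex values are the blue and orange of the Okabe-Ito palette, chosen here as an assumption, and the names `gender_colors`, `demo` and `demo_plot` are hypothetical. The earnings figures are the two rows with complete industry names from the `head(data)` output in the Code section.

```r
library(ggplot2)

# Assumption: Okabe-Ito blue and orange as the two categorical colors
gender_colors <- c("Men" = "#0072B2", "Women" = "#E69F00")

# Two rows taken from the head(data) output in the Code section (illustration only)
demo <- data.frame(
  Industry = rep(c("Finance and insurance", "Retail trade"), each = 2),
  Gender   = rep(c("Men", "Women"), times = 2),
  Earnings = c(83660, 50456, 30592, 21415)
)

# Grouped horizontal bars: length carries the value, color only separates the groups
demo_plot <- ggplot(demo, aes(x = Industry, y = Earnings, fill = Gender)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  scale_fill_manual(values = gender_colors) +
  coord_flip()
```

Because bar length carries the earnings value, the color only has to separate two categories, which avoids relying on small steps in a gradient.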

References

The reference to the original data visualisation chosen, the data source(s) used for the reconstruction and any other sources used for this assignment are as follows:

Code

The following code was used to fix the issues identified in the original.

# Loading necessary libraries
library(ggplot2)
library(dplyr)
library(readxl)
library(scales)
library(tidyr)
# Loading the data
data <- read_excel("data.xlsx")

# Viewing the first few rows of the dataset
head(data)
## # A tibble: 6 × 5
##    Rank Industry                   Median earnings (Men…¹ Median earnings (Wom…²
##   <dbl> <chr>                                       <dbl>                  <dbl>
## 1     1 Finance and insurance                       83660                  50456
## 2     2 Other services, except pu…                  35778                  22083
## 3     3 Professional, scientific,…                  84749                  53152
## 4     4 Agriculture, forestry, fi…                  32021                  20689
## 5     5 Management of companies &…                  85219                  58718
## 6     6 Retail trade                                30592                  21415
## # ℹ abbreviated names: ¹​`Median earnings (Men)`, ²​`Median earnings (Women)`
## # ℹ 1 more variable: `Women's earnings (as a % of men's earning)` <dbl>
# Transforming data from wide to long format
data_new <- data %>%
  pivot_longer(
    cols = c(`Median earnings (Men)`, `Median earnings (Women)`),
    names_to = "Gender",
    values_to = "Earnings"
  ) %>%
  mutate(Gender = recode(Gender, `Median earnings (Men)` = "Men", `Median earnings (Women)` = "Women"))

# Creating a grouped bar chart (bars become horizontal after coord_flip())
output <- ggplot(data_new, aes(x = Industry, y = Earnings, fill = Gender)) +
  # geom_col() plots the values as given, equivalent to geom_bar(stat = "identity")
  geom_col(position = position_dodge(width = 0.7)) +
  # Value labels in thousands, placed just past the end of each bar
  geom_text(aes(label = label_number(scale = .001, suffix = "k")(Earnings)),
            position = position_dodge(width = 0.7), vjust = 0.5, hjust = -0.1, color = "black", size = 2) +
  scale_y_continuous(limits = c(0, 90000), labels = label_number(scale = .001, suffix = "k")) +
  scale_fill_manual(values = c("Men" = "skyblue", "Women" = "violet")) +
  labs(title = "Comparative Median Annual Earnings by Gender \n Across Industries in the U.S.",
       x = "Industry",
       y = "Median Annual Earnings \n ($)",
       fill = "Gender") +
  # With coord_flip(), axis.text.x styles the horizontal (earnings) axis
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(size = 12, hjust = 0.5)) +
  coord_flip()
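
As an optional extension of the code above, the gap itself can be computed and used to order the industries, which makes the largest disparities easier to spot. This is a minimal sketch and not part of the submitted reconstruction. It uses only the two rows with complete industry names from the `head(data)` output, and the names `sample_data`, `gap_data`, `gap_dollars`, `women_pct_of_men` and `gap_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Two rows copied from the head(data) output above (illustration only)
sample_data <- data.frame(
  Industry = c("Retail trade", "Finance and insurance"),
  Men      = c(30592, 83660),
  Women    = c(21415, 50456)
)

# Absolute gap in dollars and women's earnings as a percentage of men's
gap_data <- sample_data %>%
  mutate(
    gap_dollars      = Men - Women,
    women_pct_of_men = round(100 * Women / Men, 1)
  ) %>%
  arrange(desc(gap_dollars))

# Horizontal bars ordered by the size of the gap
gap_plot <- ggplot(gap_data, aes(x = reorder(Industry, gap_dollars), y = gap_dollars)) +
  geom_col(fill = "grey40") +
  labs(x = "Industry", y = "Earnings gap (men minus women, $)") +
  coord_flip()
```

With the full dataset, the same `mutate()` and `reorder()` steps would apply to `data` after renaming the two earnings columns; the percentage could also be read directly from the existing `Women's earnings (as a % of men's earning)` column.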

Reconstruction

The following plot fixes the main issues in the original.