Original


Source: https://howmuch.net/articles/men-vs-women-comparing-income-by-industry.


Objective

The objective of the original data visualization was to show the pay gap between men and women across all industries in the United States. The intended audience is U.S. citizens.

The visualization chosen had the following three main issues:

  • The colour scale is not appropriate for this chart. Since the purpose of the visualization is to show the pay gap between women and men, colour should be used only to encode gender.
  • In a circle chart, the area of a circle grows with the square of its radius, so an increase in radius is not proportional to the increase in area. This makes industries with high median earnings look more extreme than they should.
  • The chart is hard to read accurately: the audience has to rely on the labelled values to work out the pay gap between men and women, and the visualization does not show the overall (all-industry) pay gap between men and women.
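The second issue can be checked with a little arithmetic. The following R sketch assumes, as that bullet implies, that the original chart maps earnings linearly to circle radius; the two industries and their earnings are hypothetical, chosen only so that one value is double the other.

```r
# Hypothetical earnings: industry B earns twice as much as industry A
earnings <- c(A = 40000, B = 80000)

# Linear radius encoding (assumed): radius proportional to earnings
radius_linear <- earnings / max(earnings)   # A = 0.5, B = 1
area_linear <- pi * radius_linear^2
area_linear[["B"]] / area_linear[["A"]]     # 4: B looks four times as large

# Area encoding: radius proportional to sqrt(earnings), so area tracks the data
radius_sqrt <- sqrt(earnings / max(earnings))
area_sqrt <- pi * radius_sqrt^2
area_sqrt[["B"]] / area_sqrt[["A"]]         # 2: matches the data ratio
```

Under a linear radius encoding, doubling the value quadruples the drawn area, which is the exaggeration described above. Scaling the radius by the square root of the value is the usual fix for circle charts; the reconstruction sidesteps the issue by encoding earnings as bar length.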

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(tidyr)
library(dplyr)
# fileEncoding = "UTF-8-BOM" strips the byte-order mark, so the first
# column is read as "Industry" rather than "ï..Industry"
pay <- read.csv("reorganized dataset.csv", fileEncoding = "UTF-8-BOM")
head(pay)
##                                                       Industry
## 1 Civilian employed population 16 years and over with earnings
## 2                   Agriculture, forestry, fishing and hunting
## 3                Mining, quarrying, and oil and gas extraction
## 4                                                 Construction
## 5                                                Manufacturing
## 6                                              Wholesale trade
##   Estimated.Median.earnings..dollars.  Male Female Percent
## 1                               39777 45893  32436   0.707
## 2                               30299 32021  20689   0.646
## 3                               71960 73037  65364   0.895
## 4                               41870 42098  39222   0.932
## 5                               48406 52026  37694   0.725
## 6                               48553 51407  40630   0.790
# Drop the all-industry summary row (row 1) and keep the per-gender columns
paygender <- pay[-1, c("Industry", "Male", "Female")]
head(paygender)
##                                        Industry  Male Female
## 2    Agriculture, forestry, fishing and hunting 32021  20689
## 3 Mining, quarrying, and oil and gas extraction 73037  65364
## 4                                  Construction 42098  39222
## 5                                 Manufacturing 52026  37694
## 6                               Wholesale trade 51407  40630
## 7                                  Retail trade 30592  21415
# All-industry medians from row 1, used for the reference lines in the plot
Median_Earning_Male <- pay[1, "Male"]
Median_Earning_Male
## [1] 45893
Median_Earning_Female <- pay[1, "Female"]
Median_Earning_Female
## [1] 32436
# Reshape to long format: one row per industry and gender
# (pivot_longer() is the current replacement for the superseded gather())
paygender <- paygender %>%
  pivot_longer(c(Male, Female), names_to = "Gender",
               values_to = "Median_earning(US dollars)")
p1 <- ggplot(data = paygender,
             aes(y = Industry,
                 x = `Median_earning(US dollars)`, fill = Gender))
# Dodged horizontal bars plus all-industry median reference lines
# (linewidth requires ggplot2 >= 3.4.0; older versions use size)
P <- p1 + geom_bar(position = "dodge", stat = "identity") +
  # Named values pin each fill colour to its gender
  scale_fill_manual(values = c(Female = "#FFC0CB", Male = "#0096FF")) +
  geom_vline(xintercept = c(Median_Earning_Male, Median_Earning_Female),
             colour = c("blue", "red"),
             linewidth = 1, alpha = 0.8) +
  # annotate() draws each label once; geom_text() with constant aes()
  # would redraw it for every row of paygender
  annotate("text", x = Median_Earning_Male, y = 1.5,
           label = paste0("$", Median_Earning_Male),
           colour = "blue", angle = 270, vjust = 1) +
  annotate("text", x = Median_Earning_Female, y = 1.5,
           label = paste0("$", Median_Earning_Female),
           colour = "red", angle = 270, vjust = 1) +
  annotate("text", x = Median_Earning_Male, y = 2,
           label = "All industries (male)",
           colour = "blue", angle = 270, vjust = -0.5) +
  annotate("text", x = Median_Earning_Female, y = 2,
           label = "All industries (female)",
           colour = "red", angle = 270, vjust = -0.5) +
  labs(title = "U.S. median earnings for men and women by industry in 2019",
       x = "Median earnings (U.S. dollars)",
       y = "Industry") +
  theme(axis.title = element_text(size = 15),
        plot.title = element_text(size = 20),
        panel.background = element_rect(colour = "#000000", fill = "#FFFFFF",
                                        linetype = 1, linewidth = 1),
        panel.grid.major = element_line(linewidth = 0.5, linetype = 2,
                                        colour = "#000000"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_line(linewidth = 0.25, linetype = 2,
                                        colour = "#000000"))
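The third issue notes that readers have to work out the pay gap from labelled values. A minimal base R sketch of computing the gap explicitly follows; it is not part of the reconstruction above. It uses the six industries printed by head(paygender), and the names gap_df, Ratio and Gap are hypothetical. Ratio is women's median earnings divided by men's, the same quantity as the Percent column in the dataset (for all industries, 32436 / 45893 rounds to 0.707).

```r
# Per-industry medians copied from the head(paygender) output above
# (first six industries only)
gap_df <- data.frame(
  Industry = c("Agriculture, forestry, fishing and hunting",
               "Mining, quarrying, and oil and gas extraction",
               "Construction", "Manufacturing",
               "Wholesale trade", "Retail trade"),
  Male   = c(32021, 73037, 42098, 52026, 51407, 30592),
  Female = c(20689, 65364, 39222, 37694, 40630, 21415),
  stringsAsFactors = FALSE
)

# Women's median earnings per dollar of men's (same as the Percent column)
gap_df$Ratio <- round(gap_df$Female / gap_df$Male, 3)
# Absolute gap in U.S. dollars
gap_df$Gap <- gap_df$Male - gap_df$Female
# Widest relative gap first
gap_df <- gap_df[order(gap_df$Ratio), ]
print(gap_df)
```

Sorting by Ratio puts the industries with the widest relative gap first; a chart of this ratio would show the gap directly instead of asking readers to compare two bar lengths.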

Data Reference

Reconstruction

The following plot fixes the main issues in the original.