Objective
The objective of the original data visualization was to show the pay gap between men and women across all industries in the United States. The target audience is U.S. citizens.
The visualization chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(tidyr)
library(dplyr)
pay<- read.csv("reorganized dataset.csv")
head(pay)
## ï..Industry
## 1 Civilian employed population 16 years and over with earnings
## 2 Agriculture, forestry, fishing and hunting
## 3 Mining, quarrying, and oil and gas extraction
## 4 Construction
## 5 Manufacturing
## 6 Wholesale trade
## Estimated.Median.earnings..dollars. Male Female Percent
## 1 39777 45893 32436 0.707
## 2 30299 32021 20689 0.646
## 3 71960 73037 65364 0.895
## 4 41870 42098 39222 0.932
## 5 48406 52026 37694 0.725
## 6 48553 51407 40630 0.790
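The `ï..Industry` column name in the output above is the usual sign of a UTF-8 byte-order mark (BOM) at the start of the CSV file. As a minimal sketch of one way to avoid it, the example below writes a small demo file with a BOM and reads it back with `fileEncoding = "UTF-8-BOM"`. The temporary file and its two rows are illustrative only; they are not the dataset used in this report.

```r
# Hypothetical demo file: a tiny CSV that starts with a UTF-8 byte-order mark.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_text <- "Industry,Male,Female\nConstruction,42098,39222\nManufacturing,52026,37694\n"
writeBin(c(bom, charToRaw(csv_text)), tmp)

# "UTF-8-BOM" asks R to drop the mark before it parses the header row.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

If the real file were read this way, the first column would be named `Industry`, and every later reference to `ï..Industry` would have to change to match.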
paygender <- pay[-1,c(1,3,4)]
head(paygender)
## ï..Industry Male Female
## 2 Agriculture, forestry, fishing and hunting 32021 20689
## 3 Mining, quarrying, and oil and gas extraction 73037 65364
## 4 Construction 42098 39222
## 5 Manufacturing 52026 37694
## 6 Wholesale trade 51407 40630
## 7 Retail trade 30592 21415
Median_Earning_Male <- pay[1,3]
head(Median_Earning_Male)
## [1] 45893
Median_Earning_Female <- pay[1,4]
head(Median_Earning_Female)
## [1] 32436
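The `Percent` column printed earlier appears to be the female-to-male ratio of median earnings. The sketch below checks that reading against the six rows shown by `head(pay)`; the data frame `shown` and its short industry labels are retyped here for illustration and are not part of the original script.

```r
# The six rows printed by head(pay), retyped by hand for this check.
shown <- data.frame(
  Industry = c("All industries", "Agriculture", "Mining",
               "Construction", "Manufacturing", "Wholesale trade"),
  Male    = c(45893, 32021, 73037, 42098, 52026, 51407),
  Female  = c(32436, 20689, 65364, 39222, 37694, 40630),
  Percent = c(0.707, 0.646, 0.895, 0.932, 0.725, 0.790)
)

# Female median earnings as a share of male median earnings.
shown$Ratio <- round(shown$Female / shown$Male, 3)

# Gap in dollars between the male and female medians.
shown$Gap <- shown$Male - shown$Female

shown[, c("Industry", "Percent", "Ratio", "Gap")]
```

For these rows the recomputed ratio matches `Percent`, so across all industries women's median earnings are about 70.7% of men's, a gap of $13,457.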
# Reshape to long format: one row per industry and gender.
paygender <- paygender %>%
  gather(Male, Female, key = 'Gender',
         value = 'Median_earning(US dollars)')

# Horizontal bars: industry on the y-axis, median earnings on the x-axis.
p1 <- ggplot(data = paygender,
             aes(y = ï..Industry,
                 x = `Median_earning(US dollars)`, fill = Gender))

P <- p1 + geom_col(position = "dodge") +
  # Name the colours so they do not depend on factor-level order.
  scale_fill_manual(values = c(Female = "#FFC0CB", Male = "#0096FF")) +
  # Reference lines at the all-industry medians (row 1 of `pay`).
  geom_vline(xintercept = c(Median_Earning_Male, Median_Earning_Female),
             colour = c("blue", "red"),
             size = 1, alpha = 0.8) +
  # annotate() draws each label once; geom_text() with constant aesthetics
  # would redraw the same label for every row of the data.
  annotate("text",
           x = c(Median_Earning_Male, Median_Earning_Female), y = 1.5,
           label = paste0("$", c(Median_Earning_Male, Median_Earning_Female)),
           colour = c("blue", "red"), angle = 270, vjust = 1) +
  annotate("text",
           x = c(Median_Earning_Male, Median_Earning_Female), y = 2,
           label = c("All industries (male)", "All industries (female)"),
           colour = c("blue", "red"), angle = 270, vjust = -0.5) +
  labs(title = "U.S. median earnings for men and women by industry, 2019",
       x = "Median earnings (U.S. dollars)",
       y = "Industry") +
  # Note: ggplot2 3.4.0 and later prefer `linewidth` over `size` for lines.
  theme(axis.title = element_text(size = 15),
        plot.title = element_text(size = 20),
        panel.background = element_rect(colour = "#000000",
                                        fill = "#FFFFFF",
                                        linetype = 1, size = 1),
        panel.grid.major = element_line(size = 0.5, linetype = 2,
                                        colour = "#000000"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_line(size = 0.25, linetype = 2,
                                        colour = "#000000"))
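The reshaping step in the code above uses `gather()`, which the tidyr documentation marks as superseded by `pivot_longer()`. As a hedged sketch, assuming tidyr 1.0.0 or later is installed, the same wide-to-long reshape could be written as follows. The two-row data frame `wide` and its clean `Industry` column name are illustrative stand-ins for `paygender`.

```r
library(tidyr)

# Illustrative stand-in for `paygender`: two rows copied from the printed output.
wide <- data.frame(
  Industry = c("Construction", "Manufacturing"),
  Male     = c(42098, 52026),
  Female   = c(39222, 37694)
)

# One row per industry and gender, with the same column names the plot expects.
long <- pivot_longer(
  wide,
  cols      = c(Male, Female),
  names_to  = "Gender",
  values_to = "Median_earning(US dollars)"
)

long
```

The result has four rows and the columns `Industry`, `Gender`, and `Median_earning(US dollars)`, so it could feed the same `ggplot()` call once the industry column name is adjusted.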
Data Reference
The following plot fixes the main issues in the original.