Women have faced disparities in the workforce for many years. Until recent times, legal and cultural practices, combined with the inertia of longstanding religious and educational conventions, restricted women’s entry into and participation in the workforce.
The gender pay gap is the difference between what women and men are paid. Most commonly, it refers to the median annual pay of all women who work full time and year-round, compared with the pay of a similar cohort of men. It is important to identify which employment sectors and occupations have a significant gender pay gap. It is also important to determine whether there has been any improvement in closing the gap over time.
We perform an exploratory data analysis on the historical data about women’s earnings and employment status. We use summaries and plots to find patterns and spot anomalies. Once we identify the gaps, we can work towards addressing them in specific sectors and promoting equality.
library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
library(ggalt)
## Warning: package 'ggalt' was built under R version 3.6.2
| Package | Description |
|---|---|
| library(tidyr) | For changing the layout of your data sets, to convert data into the tidy format |
| library(DT) | For HTML display of data |
| library(ggplot2) | For customizable graphical representation |
| library(dplyr) | For data manipulation |
| library(tidyverse) | Collection of R packages designed for data science that works harmoniously with other packages |
| library(kableExtra) | To display table in a fancy way |
| library(lubridate) | Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not |
| library(readxl) | The readxl package makes it easy to get data out of Excel and into R |
| library(highcharter) | Highcharter is an R wrapper for the Highcharts JavaScript library and its modules |
| library(scales) | The idea of the scales package is to implement scales in a way that is graphics system agnostic |
| library(RColorBrewer) | RColorBrewer is an R package that allows users to create colourful graphs with pre-made color palettes that visualize data in a clear and distinguishable manner |
| library(wesanderson) | Wes Anderson color palettes for R |
| library(plotly) | Plotly’s R graphing library makes interactive, publication-quality graphs |
| library(shiny) | Shiny is an R package that makes it easy to build interactive web apps straight from R |
| library(ggalt) | Extra geoms for ggplot2, including geom_dumbbell, which is used for the dumbbell plots |
The data about women in the workforce come from the Bureau of Labor Statistics and the Census Bureau. They include historical data about women’s earnings and employment status, as well as detailed information about specific occupations and earnings from 2013 to 2016.
The data used in the analysis can be found here. The data consists of three tables.
The first table contains information about the major employment sectors, occupations, the proportion of women, and the earnings of women as a percentage of men’s earnings in each occupation. It has 2088 observations and 12 variables.
| VARIABLE | CLASS | DESCRIPTION |
|---|---|---|
| year | integer | Year |
| occupation | character | Specific job/career |
| major_category | character | Broad category of occupation |
| minor_category | character | Fine category of occupation |
| total_workers | double | Total estimated full-time workers > 16 years old |
| workers_male | double | Estimated MALE full-time workers > 16 years old |
| workers_female | double | Estimated FEMALE full-time workers > 16 years old |
| percent_female | double | The percent of females for specific occupation |
| total_earnings | double | Total estimated median earnings for full-time workers > 16 years old |
| total_earnings_male | double | Estimated MALE median earnings for full-time workers > 16 years old |
| total_earnings_female | double | Estimated FEMALE median earnings for full-time workers > 16 years old |
| wage_percent_of_male | double | Female wages as percent of male wages - NA for occupations with small sample size |
The second table describes women’s earnings as a percentage of men’s earnings for different age groups over time. It has 264 observations and 3 variables.
| VARIABLE | CLASS | DESCRIPTION |
|---|---|---|
| Year | integer | Year |
| group | character | Age group |
| percent | double | Female salary percent of male salary |
The third table contains the proportions of women and men working part time and full time over the years. It has 49 observations and 7 variables.
| VARIABLE | CLASS | DESCRIPTION |
|---|---|---|
| year | double | Year |
| total_full_time | double | Percent of total employed people usually working full time |
| total_part_time | double | Percent of total employed people usually working part time |
| full_time_female | double | Percent of employed women usually working full time |
| part_time_female | double | Percent of employed women usually working part time |
| full_time_male | double | Percent of employed men usually working full time |
| part_time_male | double | Percent of employed men usually working part time |
jobs_gender <- read.csv("jobs_gender.csv")
earnings_female <- read.csv("earnings_female.csv")
employed_gender <- read.csv("employed_gender.csv")
We now take a look at the structure of the data and their summary statistics. The summaries help us spot anomalies such as negative or extreme values. They also indicate which fields have missing values and how many.
str(jobs_gender)
## 'data.frame': 2088 obs. of 12 variables:
## $ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ occupation : Factor w/ 522 levels "Accountants and auditors",..: 69 218 265 6 289 415 5 87 178 82 ...
## $ major_category : Factor w/ 8 levels "Computer, Engineering, and Science",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ minor_category : Factor w/ 23 levels "Architecture and Engineering",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ total_workers : int 1024259 977284 14815 43015 754514 44198 109703 489048 990611 14656 ...
## $ workers_male : int 782400 681627 8375 17775 440078 16141 72873 354369 460842 3387 ...
## $ workers_female : int 241859 295657 6440 25240 314436 28057 36830 134679 529769 11269 ...
## $ percent_female : num 23.6 30.3 43.5 58.7 41.7 63.5 33.6 27.5 53.5 76.9 ...
## $ total_earnings : int 120254 73557 67155 61371 78455 74114 62187 99167 70456 71927 ...
## $ total_earnings_male : int 126142 81041 71530 75190 91998 90071 66579 101318 90278 97552 ...
## $ total_earnings_female: int 95921 60759 65325 55860 65040 66052 55079 90940 57406 68207 ...
## $ wage_percent_of_male : num 76 75 91.3 74.3 70.7 ...
summary(jobs_gender)
## year occupation
## Min. :2013 Accountants and auditors : 4
## 1st Qu.:2014 Actors : 4
## Median :2014 Actuaries : 4
## Mean :2014 Adhesive bonding machine operators and tenders: 4
## 3rd Qu.:2015 Administrative services managers : 4
## Max. :2016 Advertising and promotions managers : 4
## (Other) :2064
## major_category
## Production, Transportation, and Material Moving :444
## Natural Resources, Construction, and Maintenance:328
## Sales and Office :280
## Service :272
## Computer, Engineering, and Science :236
## Management, Business, and Financial :232
## (Other) :296
## minor_category total_workers
## Production : 308 Min. : 658
## Office and Administrative Support : 208 1st Qu.: 18687
## Construction and Extraction : 152 Median : 58997
## Installation, Maintenance, and Repair : 144 Mean : 196055
## Healthcare Practitioners and Technical: 128 3rd Qu.: 187415
## Management : 120 Max. :3758629
## (Other) :1028
## workers_male workers_female percent_female total_earnings
## Min. : 0 Min. : 0 Min. : 0.00 Min. : 17266
## 1st Qu.: 10765 1st Qu.: 2364 1st Qu.: 10.73 1st Qu.: 32410
## Median : 32302 Median : 15238 Median : 32.40 Median : 44437
## Mean : 111515 Mean : 84540 Mean : 36.00 Mean : 49762
## 3rd Qu.: 102644 3rd Qu.: 63327 3rd Qu.: 57.31 3rd Qu.: 61012
## Max. :2570385 Max. :2290818 Max. :100.00 Max. :201542
##
## total_earnings_male total_earnings_female wage_percent_of_male
## Min. : 12147 Min. : 7447 Min. : 50.88
## 1st Qu.: 35702 1st Qu.: 28872 1st Qu.: 77.56
## Median : 46825 Median : 40191 Median : 85.16
## Mean : 53138 Mean : 44681 Mean : 84.03
## 3rd Qu.: 65015 3rd Qu.: 54813 3rd Qu.: 90.62
## Max. :231420 Max. :166388 Max. :117.40
## NA's :4 NA's :65 NA's :846
We see that there are 4 missing values in the column ‘total_earnings_male’, 65 missing values in ‘total_earnings_female’ and 846 missing values in ‘wage_percent_of_male’ in the first table, ‘jobs_gender’.
Since 4 and 65 correspond to 0.19% and 3.11% of the dataset respectively, we can remove those observations from further analysis. However, 846 is a significant fraction (about 40.5%), so we do not remove those observations. Instead, the missing values can be computed using the formula total_earnings_female / total_earnings_male × 100.
We rename the field ‘wage_percent_of_male’ to ‘wage_percent_female_wrt_male’ for clarity.
We also see from the summary() of the jobs_gender table that the minimum value of both workers_male and workers_female is 0. This indicates that there are certain occupations where only men or only women are employed.
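The shares quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it hard-codes the counts reported by summary(jobs_gender) instead of reading the CSV, and the names n_obs, n_missing, pct_missing and first_row_pct are assumptions introduced here.

```r
# Counts copied from the summary() output above (not recomputed from the CSV).
n_obs <- 2088
n_missing <- c(
  total_earnings_male   = 4,
  total_earnings_female = 65,
  wage_percent_of_male  = 846
)

# Share of rows with a missing value, in percent.
pct_missing <- round(n_missing / n_obs * 100, 2)
print(pct_missing)

# The fill-in formula applied to the first row shown by str():
# female earnings 95921, male earnings 126142.
first_row_pct <- 95921 / 126142 * 100
print(round(first_row_pct, 1))
```

The last value agrees with the 76 shown for the first row of wage_percent_of_male in the str() output, which suggests the formula matches how the column was originally built.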
jobs_gender%>% filter(workers_female==0)%>% count()
## # A tibble: 1 x 1
## n
## <int>
## 1 18
jobs_gender%>% filter(workers_male==0)%>% count()
## # A tibble: 1 x 1
## n
## <int>
## 1 3
These observations have NA values in their corresponding earnings variables, so they are dropped together with the other rows that have missing earnings.
jobs_gender <- jobs_gender %>% filter(!is.na(total_earnings_male) & !is.na(total_earnings_female)) %>% rename(wage_percent_female_wrt_male = wage_percent_of_male) %>%
# Fill each missing percentage from the earnings in the same row
mutate(wage_percent_female_wrt_male = ifelse(is.na(wage_percent_female_wrt_male), total_earnings_female / total_earnings_male * 100, wage_percent_female_wrt_male))
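When missing values are filled from a computed column, the replacement has to be subset with the same logical index as the target, so that each row receives the value computed from its own earnings. A minimal base R sketch on made-up numbers (the toy data frame and its values are hypothetical, not taken from the dataset):

```r
# Hypothetical toy data: two of three rows lack the percentage.
toy <- data.frame(
  total_earnings_female        = c(60000, 45000, 30000),
  total_earnings_male          = c(80000, 50000, 40000),
  wage_percent_female_wrt_male = c(75, NA, NA)
)

# Compute the ratio for every row, then copy it only into the NA positions.
computed   <- toy$total_earnings_female / toy$total_earnings_male * 100
is_missing <- is.na(toy$wage_percent_female_wrt_male)
toy$wage_percent_female_wrt_male[is_missing] <- computed[is_missing]

print(toy)
```

Indexing both sides with is_missing keeps the rows aligned. Assigning the full-length computed vector to the shorter NA subset would instead take values from the wrong rows.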
We will now look into the table earnings_female that provides us data regarding percentage earnings of women of various age groups over the years.
str(earnings_female)
## 'data.frame': 264 obs. of 3 variables:
## $ Year : int 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 ...
## $ group : Factor w/ 8 levels "16-19 years",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ percent: num 62.3 64.2 64.4 65.7 66.5 67.6 68.1 69.5 69.8 70.2 ...
summary(earnings_female)
## Year group percent
## Min. :1979 16-19 years:33 Min. :56.80
## 1st Qu.:1987 20-24 years:33 1st Qu.:69.40
## Median :1995 25-34 years:33 Median :75.50
## Mean :1995 35-44 years:33 Mean :76.88
## 3rd Qu.:2003 45-54 years:33 3rd Qu.:86.90
## Max. :2011 55-64 years:33 Max. :95.40
## (Other) :66
unique(earnings_female$group)
## [1] Total, 16 years and older 16-19 years
## [3] 20-24 years 25-34 years
## [5] 35-44 years 45-54 years
## [7] 55-64 years 65 years and older
## 8 Levels: 16-19 years 20-24 years 25-34 years 35-44 years ... Total, 16 years and older
Here we find a group named “Total, 16 years and older” in the group column. It aggregates the other age groups and gives no insight into any specific group, so we remove those rows from the data set.
earnings_female <- earnings_female %>%
filter(str_detect(group, "Total, 16 years and older") == FALSE)
Now we take a look at the employed_gender table.
str(employed_gender)
## 'data.frame': 49 obs. of 7 variables:
## $ year : int 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 ...
## $ total_full_time : num 86 85.5 84.8 84.4 84.3 84.4 84.2 83.4 83.3 83.3 ...
## $ total_part_time : num 14 14.5 15.2 15.6 15.7 15.6 15.8 16.6 16.7 16.7 ...
## $ full_time_female: num 75.1 74.9 73.9 73.2 73.1 73.2 73.2 72.4 72.5 72.6 ...
## $ part_time_female: num 24.9 25.1 26.1 26.8 26.9 26.8 26.8 27.6 27.5 27.4 ...
## $ full_time_male : num 92.2 91.8 91.5 91.2 91.1 91.4 91.2 90.6 90.6 90.5 ...
## $ part_time_male : num 7.8 8.2 8.5 8.8 8.9 8.6 8.8 9.4 9.4 9.5 ...
summary(employed_gender)
## year total_full_time total_part_time full_time_female
## Min. :1968 Min. :80.30 Min. :14.00 Min. :71.90
## 1st Qu.:1980 1st Qu.:81.80 1st Qu.:16.80 1st Qu.:73.20
## Median :1992 Median :82.60 Median :17.40 Median :73.90
## Mean :1992 Mean :82.64 Mean :17.36 Mean :73.86
## 3rd Qu.:2004 3rd Qu.:83.20 3rd Qu.:18.20 3rd Qu.:74.70
## Max. :2016 Max. :86.00 Max. :19.70 Max. :75.40
## part_time_female full_time_male part_time_male
## Min. :24.60 Min. :86.60 Min. : 7.80
## 1st Qu.:25.30 1st Qu.:89.00 1st Qu.: 9.60
## Median :26.10 Median :89.50 Median :10.50
## Mean :26.14 Mean :89.49 Mean :10.51
## 3rd Qu.:26.80 3rd Qu.:90.40 3rd Qu.:11.00
## Max. :28.10 Max. :92.20 Max. :13.40
We will use the employed_gender table as is, since it has no concerning issues.
Now the data is ready for analysis.
The cleaned data can be found below:
datatable(earnings_female, filter = 'top')
datatable(employed_gender, filter = 'top')
datatable(jobs_gender, filter = 'top')
We grouped the jobs_gender data by major category, which gave us an idea of where the pay gap is largest and smallest. A few interesting observations from the analysis are:
In the Production, Transportation, and Material Moving category, the maximum salary is earned by women (a difference of around 22% from the maximum earned by men), yet the percentage pay gap between men and women is the largest of all categories. This is a category with 75% men in the workforce, where the minimum salary earned by men is about 189% higher than the minimum salary earned by women.
In the Healthcare Practitioners and Technical category, where women outnumber men in the workplace, women still earn around 20% less than men.
Further analysis indicates that the earnings of women are largely independent of the representation of women in the workplace in each category.
We can also see a trend that the pay gap increases with the age of the women.
summary1 <- jobs_gender %>%
group_by(major_category) %>%
summarize( mean_earnings_female = sum(total_earnings_female)/n(), mean_earnings_male = sum(total_earnings_male)/n()) %>%
mutate(perc_difference = (mean_earnings_male - mean_earnings_female)/mean_earnings_female*100)
mean_perc_diff <- mean(summary1$perc_difference)
ggplot(summary1, aes(x= major_category, y= perc_difference)) +
geom_col(color = "black", fill="#C91B1B")+
geom_text(aes(label = round(perc_difference,0)), size = 3, hjust=1.5, color = "white") +
ggtitle("Percentage Difference in Earnings \n by Major Occupational Category") +
xlab("Major Occupation Category") +
ylab("Percentage Difference") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank()) +
coord_flip()
We see that men earn substantially more than women in all major occupational categories. The percentage difference is as high as 25% in categories like Production, Transportation, and Material Moving and Management, Business, and Financial. The smallest difference (around 13%) is in Natural Resources, Construction, and Maintenance. The average pay gap is around 19% across all categories.
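The percentage difference used here is measured relative to female earnings, which gives a larger figure than measuring it relative to male earnings. The sketch below illustrates this with the overall means printed by summary(jobs_gender). These are the means before any rows are removed, not the per-category means behind the chart, so the result is only a rough cross-check; the variable names are assumptions introduced here.

```r
# Overall means copied from the summary(jobs_gender) output above.
mean_male   <- 53138
mean_female <- 44681

# Same definition as perc_difference: gap relative to female earnings.
gap_vs_female <- (mean_male - mean_female) / mean_female * 100

# Alternative convention: gap relative to male earnings.
gap_vs_male <- (mean_male - mean_female) / mean_male * 100

print(round(c(gap_vs_female = gap_vs_female, gap_vs_male = gap_vs_male), 1))
```

The first figure comes out close to the 19% average reported above, while the second is a few points lower, so the chosen base should be stated whenever a pay gap percentage is quoted.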
summary2 <- jobs_gender %>%
group_by(major_category) %>%
summarize( max_earnings_female = max(total_earnings_female), max_earnings_male = max(total_earnings_male), min_earnings_female = min(total_earnings_female), min_earnings_male = min(total_earnings_male)) %>%
mutate(perc_max_difference = (max_earnings_male - max_earnings_female)/max_earnings_female*100, perc_min_difference = (min_earnings_male - min_earnings_female)/min_earnings_female*100) %>%
mutate(if_female_max= perc_max_difference <0, if_female_min= perc_min_difference>0) %>%
select(major_category,perc_max_difference,perc_min_difference,if_female_max,if_female_min)
ggplot(summary2, aes(x= major_category, y= perc_max_difference, fill=if_female_max)) +
geom_bar(stat ="identity", color = "black") +
scale_fill_manual(values=c("#040059", "#37FE00"), labels = c("Male earning more", "Female earning more")) +
geom_text(aes(label = round(perc_max_difference,0)), size = 3,hjust= 0, color = "black")+
ggtitle("Who earns the maximum salary \n and by how much?") +
ylab("Percentage Difference in Maximum Salary of Male and Female") +
xlab("Major Occupational Category") +
coord_flip() +
ylim(-55,55) +
guides(fill=guide_legend(" ")) +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
ggplot(summary2, aes(x= major_category, y= perc_min_difference,fill=if_female_min)) +
geom_col(color = "black") +
scale_fill_manual(values=c("#040059", "#37FE00"), labels = c("Male earning less", "Female earning less")) +
geom_text(aes(label = round(perc_min_difference,0)), size = 3, hjust= 0) +
ggtitle("Who earns the minimum salary \n and by how much?") +
ylab("Percentage Difference in Minimum Salary of Male and Female") +
xlab("Major Occupational Category") +
coord_flip() +
guides(fill=guide_legend(" ")) +
ylim ( -50, 200) +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
Comparing the maximum and minimum earnings in each major category, we see that women earn the minimum salary in most categories and the maximum salary in about three. In Production, Transportation, and Material Moving, women earn both the maximum and the minimum salary. This is also the category with the highest difference in mean salary.
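The sign convention behind the two charts can be checked on a small example. The numbers below are hypothetical and only illustrate how the flags if_female_max and if_female_min follow from the sign of each difference, and how a difference of 189% translates into a ratio.

```r
# Hypothetical extremes for one category (not values from the dataset).
max_male <- 200000
max_female <- 180000
min_male <- 30000
min_female <- 20000

# Differences are measured relative to the female value.
perc_max_difference <- (max_male - max_female) / max_female * 100
perc_min_difference <- (min_male - min_female) / min_female * 100

# TRUE when women hold the highest value / when women hold the lowest value.
if_female_max <- perc_max_difference < 0
if_female_min <- perc_min_difference > 0

# A difference of 189% relative to the female value means the male value
# is 1 + 189/100 times the female value.
ratio_for_189 <- 1 + 189 / 100

print(c(perc_max_difference = perc_max_difference,
        perc_min_difference = perc_min_difference,
        ratio_for_189 = ratio_for_189))
```

In this toy case men hold the maximum and women hold the minimum, so if_female_max is FALSE and if_female_min is TRUE.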
summary3 <- jobs_gender %>%
group_by(major_category) %>%
summarise(total_female_percent = sum(workers_female)/sum(total_workers)*100,
total_male_percent = sum(workers_male)/sum(total_workers)*100) %>%
gather(key=gender, value = proportion, total_female_percent:total_male_percent)
ggplot(summary3, aes(x=major_category, y= proportion, fill= gender)) +
geom_col(color = "black") +
scale_fill_manual(values=c("#37FE00", "#040059"), labels = c("Percentage of Women", "Percentage of Men")) +
ggtitle("Representation of Women in Each \n Occupational Category") +
ylab("Proportion Value") +
xlab("Major Category")+
coord_flip() +
guides(fill=guide_legend(" ")) +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
We see that the category of Natural Resources, Construction, and Maintenance is highly male dominated. Healthcare Practitioners and Technical has the largest proportion of women but still has a pay gap of about 20%. Service, Sales and Office, and Management, Business, and Financial each have about 50% representation of women.
correlation <- jobs_gender %>% group_by(major_category) %>%
summarize(cor =cor(percent_female,wage_percent_female_wrt_male))
kable(correlation) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
| major_category | cor |
|---|---|
| Computer, Engineering, and Science | -0.1158826 |
| Education, Legal, Community Service, Arts, and Media | 0.0837506 |
| Healthcare Practitioners and Technical | 0.2916140 |
| Management, Business, and Financial | -0.1011816 |
| Natural Resources, Construction, and Maintenance | -0.0228289 |
| Production, Transportation, and Material Moving | -0.1330825 |
| Sales and Office | 0.2720451 |
| Service | 0.0047345 |
The correlations above are weak (all below 0.3 in absolute value), which suggests that the representation of women in an occupational category has little association with their earnings relative to men.
5. Now we use the earnings_female table to visualize the average earnings of women as a percentage of men’s earnings for various age groups over the years.
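One way to judge how weak these correlations are is to look at the largest absolute value and its square, which approximates the share of variance that one variable explains in the other under a linear fit. The sketch below copies the coefficients from the table above; the vector name cors is an assumption introduced here.

```r
# Correlation coefficients copied from the table above.
cors <- c(
  "Computer, Engineering, and Science"                   = -0.1158826,
  "Education, Legal, Community Service, Arts, and Media" =  0.0837506,
  "Healthcare Practitioners and Technical"               =  0.2916140,
  "Management, Business, and Financial"                  = -0.1011816,
  "Natural Resources, Construction, and Maintenance"     = -0.0228289,
  "Production, Transportation, and Material Moving"      = -0.1330825,
  "Sales and Office"                                     =  0.2720451,
  "Service"                                              =  0.0047345
)

# Strongest linear association, in absolute terms.
strongest <- cors[which.max(abs(cors))]
print(strongest)

# Squared correlation: approximate share of variance explained, in percent.
max_r_squared_pct <- round(max(cors^2) * 100, 1)
print(max_r_squared_pct)
```

Even in the category with the strongest association, the squared correlation stays under 10%, which is consistent with reading these values as weak.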
data2 <- earnings_female %>%
group_by(group) %>%
summarise(average_percent_female = sum(percent)/n())
data2 %>% ggplot(aes(x=group, y=average_percent_female)) +
geom_col(fill= "#1FCA19" , color = "black") +
coord_flip() +
geom_text(aes(label = round(average_percent_female,0)), size = 3, hjust=2)+
ggtitle("Percentage Earnings of Women of Various Age Groups") +
xlab("Age Group of Women") +
ylab("Percentage Earning of Women with Respect to Men") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
Women in the 16-19 and 20-24 age groups face a smaller pay gap than the other groups. The pay gap tends to widen with age. However, it narrows again for women aged 65 years and older.
We grouped the data by time frame, which gave us an idea of when the pay gap is largest and smallest. A few interesting observations from the analysis are:
In every major category, the salary earned by women in 2016 is less than the salary earned by men in 2013.
Even though the proportion of women in each major category is more or less the same from 2013 to 2016, there are some categories where the mean salary earned by women fluctuates over that period.
We also see that the share of women working part time is around 3 times that of men, but this ratio decreases over the years.
Younger women face a smaller gender pay gap than older women.
1. We check how the salary has changed for each major occupational category over the time period for both men and women.
#Jobs Gender Table: Increment in each year
data_year_female <- jobs_gender %>%
select(year, major_category, total_earnings_female) %>%
group_by(year,major_category) %>%
summarise(average_earning_female= mean(total_earnings_female)) %>%
spread(key=year, value = average_earning_female) %>%
mutate(Increment= round(((`2016`-`2013`)/`2013`)*100, digits = 2))
kable(data_year_female) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
| major_category | 2013 | 2014 | 2015 | 2016 | Increment |
|---|---|---|---|---|---|
| Computer, Engineering, and Science | 68092.78 | 69346.10 | 69480.19 | 70813.66 | 4.00 |
| Education, Legal, Community Service, Arts, and Media | 45673.12 | 45950.57 | 46768.31 | 46638.64 | 2.11 |
| Healthcare Practitioners and Technical | 64665.71 | 67537.39 | 69305.42 | 70697.45 | 9.33 |
| Management, Business, and Financial | 57625.17 | 58736.60 | 58756.09 | 61163.57 | 6.14 |
| Natural Resources, Construction, and Maintenance | 37153.62 | 39296.11 | 38833.38 | 38880.52 | 4.65 |
| Production, Transportation, and Material Moving | 31820.57 | 31745.33 | 32541.09 | 33674.94 | 5.83 |
| Sales and Office | 36016.64 | 36828.29 | 37481.26 | 38097.60 | 5.78 |
| Service | 31766.88 | 31579.30 | 32184.32 | 32418.42 | 2.05 |
ggplot(data_year_female, aes(x=`2013`, xend=`2016`, y=major_category )) +
geom_dumbbell()+
geom_segment(aes(x=`2013`,
xend=`2016`,
y=major_category,
yend=major_category),
color="#b2b2b2", size=1.5) +
geom_dumbbell(color="black",
size_x=5.5,
size_xend = 5.5,
colour_x="#FFEF00",
colour_xend = "#37FE00") +
geom_text(aes(label = round(Increment,0)), size = 5, hjust = 1) +
ggtitle("Salary Change of Women From 2013 to 2016") +
ylab("Major Occupation Category") + xlab("Average Salary of Women") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
data_year_male<- jobs_gender %>%
select(year, major_category, total_earnings_male) %>%
group_by(year,major_category) %>%
summarise(average_earning_male= mean(total_earnings_male)) %>%
spread(key=year, value = average_earning_male) %>%
mutate(Increment= round(((`2016`-`2013`)/`2013`)*100, digits = 2))
kable(data_year_male) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
| major_category | 2013 | 2014 | 2015 | 2016 | Increment |
|---|---|---|---|---|---|
| Computer, Engineering, and Science | 77964.64 | 80126.39 | 80844.20 | 81857.02 | 4.99 |
| Education, Legal, Community Service, Arts, and Media | 53320.83 | 53617.74 | 55066.60 | 55606.64 | 4.29 |
| Healthcare Practitioners and Technical | 78265.61 | 80933.71 | 81960.77 | 84787.16 | 8.33 |
| Management, Business, and Financial | 73004.07 | 73326.62 | 73625.83 | 74912.86 | 2.61 |
| Natural Resources, Construction, and Maintenance | 43227.41 | 42728.63 | 44106.50 | 44549.85 | 3.06 |
| Production, Transportation, and Material Moving | 40030.11 | 40654.44 | 40460.63 | 41954.31 | 4.81 |
| Sales and Office | 43940.64 | 44313.67 | 45059.54 | 46633.70 | 6.13 |
| Service | 35301.51 | 36612.79 | 36800.38 | 37859.16 | 7.25 |
ggplot(data_year_male, aes(x=`2013`, xend=`2016`, y=major_category )) +
geom_dumbbell() +
geom_segment(aes(x=`2013`,
xend=`2016`,
y=major_category,
yend=major_category),
color="#b2b2b2", size=1.5) +
geom_dumbbell(color="black",
size_x=5.5,
size_xend = 5.5,
colour_x="#FFEF00",
colour_xend = "#040059") +
geom_text(aes(label = round(Increment,0)), size = 5, hjust= 1) +
ggtitle("Salary Change of Men From 2013 to 2016") +
ylab("Major Occupation Category") + xlab("Average Salary of Men") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
We see that even though there are certain categories where the percentage increment in salary from 2013 to 2016 (indicated by the number on the dumbbell plot) is higher for women, the actual picture is very different. There is a large pay gap every year in each major occupational category: in all categories, the average salary women earned in 2016 is well below what men earned three years earlier, in 2013.
2. Also, we look at the proportion of women in each occupational category over the period from 2013 to 2016.
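The comparison between women’s 2016 averages and men’s 2013 averages can be verified directly from the two tables above. The sketch below copies the rounded values from those tables instead of recomputing them from jobs_gender; the vector names are assumptions introduced here.

```r
categories <- c(
  "Computer, Engineering, and Science",
  "Education, Legal, Community Service, Arts, and Media",
  "Healthcare Practitioners and Technical",
  "Management, Business, and Financial",
  "Natural Resources, Construction, and Maintenance",
  "Production, Transportation, and Material Moving",
  "Sales and Office",
  "Service"
)

# 2016 column of the women's table and 2013 column of the men's table.
women_2016 <- c(70813.66, 46638.64, 70697.45, 61163.57,
                38880.52, 33674.94, 38097.60, 32418.42)
men_2013   <- c(77964.64, 53320.83, 78265.61, 73004.07,
                43227.41, 40030.11, 43940.64, 35301.51)

# How far women's 2016 average falls short of men's 2013 average, in percent.
shortfall_pct <- round((men_2013 - women_2016) / men_2013 * 100, 1)
names(shortfall_pct) <- categories
print(shortfall_pct)

# TRUE if the claim holds in every category.
print(all(women_2016 < men_2013))
```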
#Visualizing proportion by year
summary5 <- jobs_gender %>%
select(year, major_category, total_workers, workers_male, workers_female) %>%
group_by(year,major_category) %>%
summarise(total_female_percent = sum(workers_female)/sum(total_workers),
total_male_percent = sum(workers_male)/sum(total_workers)) %>%
gather(key=gender, value = proportion, "total_female_percent":"total_male_percent")
summary5 %>% ggplot(aes(x=major_category, y= proportion, fill = gender)) +
geom_col() +
scale_fill_manual(values=c("#37FE00", "#040059"), labels= c("Percentage of Women", "Percentage of Men")) +
facet_wrap(~year) +
coord_flip() +
ggtitle("Representation of Women Over the Years") +
xlab("Major Category") +
ylab("Proportion Value") +
guides(fill=guide_legend(" ")) +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
We see that the representation of women in each of the categories has remained almost constant over the four years. As seen earlier, Natural Resources is one of the major occupational sectors that is heavily male dominated throughout the four years.
We see that the sharpest drop in earnings of women happens in Natural Resources in 2015, and the sharpest rise happens in Service in 2014. In the categories of Computer, Healthcare and Education the salary percentages fluctuate, while they have remained fairly constant in Management, Production and Sales.
#employed_gender : part time women VS men
summary7 <- employed_gender %>%
group_by(year) %>%
mutate(Ratio = sum(part_time_female)/sum(part_time_male))
ggplot(data= summary7, aes(x=year, y=Ratio, fill=year)) +
geom_line(size=1.5, color = "#0016FF") +
ggtitle("Representation of Women in Part Time Jobs") +
xlab("Year") +
ylab("Ratio of part time female by part time male ") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
The ratio of the share of women working part time to the share of men working part time keeps decreasing over the years, which indicates that more women are taking full-time jobs. In the 1970s the ratio was about 3 to 1; it has now fallen to about 2 to 1.
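The ratio plotted above is a ratio of two percentages (the share of employed women working part time divided by the share of employed men working part time), not a ratio of head counts. As a rough check, the sketch below recomputes it for the first ten years using the values visible in the str(employed_gender) output; the vector names are assumptions introduced here.

```r
# First ten years of data, copied from the str(employed_gender) output above.
year             <- 1968:1977
part_time_female <- c(24.9, 25.1, 26.1, 26.8, 26.9, 26.8, 26.8, 27.6, 27.5, 27.4)
part_time_male   <- c(7.8, 8.2, 8.5, 8.8, 8.9, 8.6, 8.8, 9.4, 9.4, 9.5)

# Share of women working part time relative to the share of men doing so.
ratio <- round(part_time_female / part_time_male, 2)
names(ratio) <- year
print(ratio)
```

For these years the ratio stays close to 3, which matches the "3 to 1" reading for the 1970s.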
#employed_gender : full time women VS men
summary8 <-employed_gender %>%
group_by(year) %>%
mutate ( Ratio= sum(full_time_female)/sum(full_time_male))
#Plot
ggplot (data= summary8, aes(x=year, y=Ratio, fill=year)) +
geom_line(size=1.5, color = "#0016FF") +
ggtitle("Representation of Women in Full Time Jobs") +
xlab("Year") +
ylab("Ratio of full time female by full time male ") +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
From the above output we see that the share of women working full time is lower than that of men in every year. The ratio is more or less constant, with some positive increment in the last few years. In the 1970s the ratio was 100 women for about 123 men, and it improved to 100 women for about 117 men by 2010.
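The "100 women for N men" phrasing is the full-time ratio inverted and scaled by 100. The sketch below shows that conversion for the first ten years, using the values visible in the str(employed_gender) output. As with the part-time ratio, these are ratios of percentages, not of head counts, and the vector names are assumptions introduced here.

```r
# First ten years of data, copied from the str(employed_gender) output above.
year             <- 1968:1977
full_time_female <- c(75.1, 74.9, 73.9, 73.2, 73.1, 73.2, 73.2, 72.4, 72.5, 72.6)
full_time_male   <- c(92.2, 91.8, 91.5, 91.2, 91.1, 91.4, 91.2, 90.6, 90.6, 90.5)

# Ratio as plotted: female share divided by male share.
ratio <- full_time_female / full_time_male

# Inverted and scaled: male share per 100 units of female share.
men_per_100_women <- round(100 / ratio, 1)
names(men_per_100_women) <- year
print(men_per_100_women)
```

For these early years the figure sits between roughly 123 and 125, in line with the order of magnitude quoted above.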
#employed_gender : part time women VS full time women
summary9 <- employed_gender %>%
gather(key= time, value= proportion,part_time_female,full_time_female) %>%
select(year, time, proportion)
#plot
ggplot(data = summary9 , aes(x= year, y=proportion, fill=(time))) +
geom_bar(stat= 'identity', position='dodge' , color = "Black") +
scale_fill_manual(values=c("#0021FF" , "#FF0000"), labels = c("Full Time", "Part Time")) +
ggtitle("Women Employees Over Various Years") +
xlab("Year") +ylab("Percent of Women Employees") +
guides(fill=guide_legend(" ")) +
theme(legend.position = "right",
plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
axis.title.y = element_text(),
axis.title.x = element_text(),
axis.ticks = element_blank())
From the above two graphs we see that the split between full-time and part-time work differs greatly between the genders. About 26% of women work part time on average, while only about 10% of men do.
We see that the percentage of earnings of women with respect to men is increasing for women in the 25 to 64 age groups. The variation is irregular for women aged 16-19 years, 20-24 years, and 65 years and older.
Based on the available data, we have tried to analyze the trends and patterns in the earnings of women in comparison with men with respect to various factors. This resulted in a few insights.
Factor 1: Major Occupational Categories:
We grouped and split the data by major category, which gave us an idea of where the pay gap is largest and smallest. We see a large gap between the earnings of women and men, in men’s favour, in all major occupational categories.
It is also observed that in six out of eight occupational categories the lowest median salary is earned by women, whereas in five of the eight categories the highest salary is earned by men. This points towards the possibility of men holding a larger proportion of well-paid jobs in each of these sectors.
Factor 2: Representation of Women in the Workforce
We see that the proportion of women in the workforce in each of the sectors has not changed significantly over the years. Their representation has no apparent effect on their pay. It is also seen that they face a pay gap of around 10 to 20% in certain occupations, such as nursing, where their representation is above 80%. Women are almost half of the workforce. Yet, on average, women continue to earn considerably less than men.
Factor 3: Full-Time and Part-Time
Taking ratios of women to men in part-time and full-time jobs, we see that women outnumber men in part-time jobs, while full-time jobs have greater male representation. This suggests the possible presence of a bias. Women tend to take part-time jobs in order to meet other family expectations.
Factor 4: Time
We grouped and split the data by year to see whether the pay gap increases or decreases over time. The trend suggests that although women’s earnings relative to men’s have risen over the years, the change is not significant for some age groups. It has remained nearly the same for the oldest and youngest age groups.