Synopsis

Women have been challenged by inequality in the workforce throughout history. Until recent times, legal and cultural practices, combined with the inertia of longstanding religious and educational conventions, restricted women’s entry and participation in the workforce.

The gender pay gap is the gap between what men and women are paid. Most commonly, it refers to the median annual pay of all women who work full time and year-round, compared to the pay of a similar cohort of men. It is important to identify which employment sectors and occupations have a significant gender pay gap. It is also important to identify whether there has been any improvement in bridging the gap over time.

We perform an exploratory data analysis on historical data about women’s earnings and employment status. We use summaries and graphs to find patterns and to spot anomalies. Once we identify the gaps, we can work toward bridging them in particular sectors and promote equality.

Packages Required

library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
library(ggalt)
## Warning: package 'ggalt' was built under R version 3.6.2
Package Description
library(tidyr) For changing the layout of your data sets, to convert data into the tidy format
library(DT) For HTML display of data
library(ggplot2) For customizable graphical representation
library(dplyr) For data manipulation
library(tidyverse) Collection of R packages designed for data science that works harmoniously with other packages
library(kableExtra) To display table in a fancy way
library(lubridate) Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not
library(readxl) The readxl package makes it easy to get data out of Excel and into R
library(highcharter) Highcharter is an R wrapper for the Highcharts JavaScript library and its modules
library(scales) The idea of the scales package is to implement scales in a way that is graphics system agnostic
library(RColorBrewer) RColorBrewer is an R package that allows users to create colourful graphs with pre-made color palettes that visualize data in a clear and distinguishable manner
library(wesanderson) Wes Anderson color palettes for R
library(plotly) Plotly’s R graphing library makes interactive, publication-quality graphs
library(shiny) Shiny is an R package that makes it easy to build interactive web apps straight from R

Data Preparation

Data Source

The data comprise historical records of women’s earnings and employment status, along with detailed occupation and earnings information for 2013-2016, from the Bureau of Labor Statistics and the Census Bureau.

Explanation of Source Data

The data used in the analysis can be found here. The data consists of three tables.

The first one contains information about the major employment sectors, occupations, the proportion of women, and the earnings of women as a percentage of men's in each occupation. It has 2088 observations and 12 variables.

jobs_gender.csv
VARIABLE CLASS DESCRIPTION
year integer Year
occupation character Specific job/career
major_category character Broad category of occupation
minor_category character Fine category of occupation
total_workers double Total estimated full-time workers > 16 years old
workers_male double Estimated MALE full-time workers > 16 years old
workers_female double Estimated FEMALE full-time workers > 16 years old
percent_female double The percent of females for specific occupation
total_earnings double Total estimated median earnings for full-time workers > 16 years old
total_earnings_male double Estimated MALE median earnings for full-time workers > 16 years old
total_earnings_female double Estimated FEMALE median earnings for full-time workers > 16 years old
wage_percent_of_male double Female wages as percent of male wages - NA for occupations with small sample size


The second table describes the earnings of women as a percentage of men's earnings, for different age groups over time. It has 264 observations and 3 variables.

earnings_female.csv
VARIABLE CLASS DESCRIPTION
Year integer Year
group character Age group
percent double Female salary percent of male salary


The third table contains the proportions of women and men working part-time and full-time over time. It has 49 observations and 7 variables.

employed_gender.csv
VARIABLE CLASS DESCRIPTION
year double Year
total_full_time double Percent of total employed people usually working full time
total_part_time double Percent of total employed people usually working part time
full_time_female double Percent of employed women usually working full time
part_time_female double Percent of employed women usually working part time
full_time_male double Percent of employed men usually working full time
part_time_male double Percent of employed men usually working part time


Data Cleaning

jobs_gender <- read.csv("jobs_gender.csv")
earnings_female <- read.csv("earnings_female.csv")
employed_gender <- read.csv("employed_gender.csv")

We now take a look at the structure of the data and their summary statistics. The summaries help us spot anomalies such as negative or extreme values. They also indicate the fields with missing values and their counts.

str(jobs_gender)
## 'data.frame':    2088 obs. of  12 variables:
##  $ year                 : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ occupation           : Factor w/ 522 levels "Accountants and auditors",..: 69 218 265 6 289 415 5 87 178 82 ...
##  $ major_category       : Factor w/ 8 levels "Computer, Engineering, and Science",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ minor_category       : Factor w/ 23 levels "Architecture and Engineering",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ total_workers        : int  1024259 977284 14815 43015 754514 44198 109703 489048 990611 14656 ...
##  $ workers_male         : int  782400 681627 8375 17775 440078 16141 72873 354369 460842 3387 ...
##  $ workers_female       : int  241859 295657 6440 25240 314436 28057 36830 134679 529769 11269 ...
##  $ percent_female       : num  23.6 30.3 43.5 58.7 41.7 63.5 33.6 27.5 53.5 76.9 ...
##  $ total_earnings       : int  120254 73557 67155 61371 78455 74114 62187 99167 70456 71927 ...
##  $ total_earnings_male  : int  126142 81041 71530 75190 91998 90071 66579 101318 90278 97552 ...
##  $ total_earnings_female: int  95921 60759 65325 55860 65040 66052 55079 90940 57406 68207 ...
##  $ wage_percent_of_male : num  76 75 91.3 74.3 70.7 ...
summary(jobs_gender)
##       year                                               occupation  
##  Min.   :2013   Accountants and auditors                      :   4  
##  1st Qu.:2014   Actors                                        :   4  
##  Median :2014   Actuaries                                     :   4  
##  Mean   :2014   Adhesive bonding machine operators and tenders:   4  
##  3rd Qu.:2015   Administrative services managers              :   4  
##  Max.   :2016   Advertising and promotions managers           :   4  
##                 (Other)                                       :2064  
##                                           major_category
##  Production, Transportation, and Material Moving :444   
##  Natural Resources, Construction, and Maintenance:328   
##  Sales and Office                                :280   
##  Service                                         :272   
##  Computer, Engineering, and Science              :236   
##  Management, Business, and Financial             :232   
##  (Other)                                         :296   
##                                 minor_category total_workers    
##  Production                            : 308   Min.   :    658  
##  Office and Administrative Support     : 208   1st Qu.:  18687  
##  Construction and Extraction           : 152   Median :  58997  
##  Installation, Maintenance, and Repair : 144   Mean   : 196055  
##  Healthcare Practitioners and Technical: 128   3rd Qu.: 187415  
##  Management                            : 120   Max.   :3758629  
##  (Other)                               :1028                    
##   workers_male     workers_female    percent_female   total_earnings  
##  Min.   :      0   Min.   :      0   Min.   :  0.00   Min.   : 17266  
##  1st Qu.:  10765   1st Qu.:   2364   1st Qu.: 10.73   1st Qu.: 32410  
##  Median :  32302   Median :  15238   Median : 32.40   Median : 44437  
##  Mean   : 111515   Mean   :  84540   Mean   : 36.00   Mean   : 49762  
##  3rd Qu.: 102644   3rd Qu.:  63327   3rd Qu.: 57.31   3rd Qu.: 61012  
##  Max.   :2570385   Max.   :2290818   Max.   :100.00   Max.   :201542  
##                                                                       
##  total_earnings_male total_earnings_female wage_percent_of_male
##  Min.   : 12147      Min.   :  7447        Min.   : 50.88      
##  1st Qu.: 35702      1st Qu.: 28872        1st Qu.: 77.56      
##  Median : 46825      Median : 40191        Median : 85.16      
##  Mean   : 53138      Mean   : 44681        Mean   : 84.03      
##  3rd Qu.: 65015      3rd Qu.: 54813        3rd Qu.: 90.62      
##  Max.   :231420      Max.   :166388        Max.   :117.40      
##  NA's   :4           NA's   :65            NA's   :846

We see that there are 4 missing values in the column ‘total_earnings_male’, 65 missing values in ‘total_earnings_female’, and 846 missing values in ‘wage_percent_of_male’ in the first table, ‘jobs_gender’.

Since 4 and 65 correspond to 0.19% and 3.11% of the dataset respectively, we can remove those observations from further analysis. However, 846 is a significant fraction, so we keep those observations. Their values can be computed using the formula total_earnings_female / total_earnings_male * 100.
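
The shares of missing values quoted above can be checked with a few lines of base R. This is a minimal sketch: the toy data frame and its values are made up for illustration, while the counts 4, 65, and 846 and the row count 2088 come from the str() and summary() output.

```r
# Toy data frame (made-up values) standing in for jobs_gender
toy <- data.frame(
  total_earnings_male   = c(50000, NA, 42000, 61000),
  total_earnings_female = c(45000, 30000, NA, NA)
)

# Percentage of missing values per column
missing_pct <- round(colMeans(is.na(toy)) * 100, 2)
print(missing_pct)

# Shares implied by the NA counts reported by summary() for 2088 rows
quoted <- round(c(4, 65, 846) / 2088 * 100, 2)
print(quoted)
```

The same colMeans(is.na(...)) call could be applied to the real table to get the per-column shares directly.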

We rename the field ‘wage_percent_of_male’ to ‘wage_percent_female_wrt_male’ for clarity.

We also see from the summary() of the jobs_gender table that the minimum value of both workers_male and workers_female is 0. This indicates that there are certain occupations with only male or only female workers.

jobs_gender%>% filter(workers_female==0)%>% count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1    18
jobs_gender%>% filter(workers_male==0)%>% count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1     3

These observations have NA values in the corresponding earnings variables, so they are removed along with the other rows that have missing earnings.

jobs_gender <- jobs_gender %>% filter(!is.na(total_earnings_male) &  !is.na(total_earnings_female)) %>% rename(wage_percent_female_wrt_male = wage_percent_of_male) 

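# Fill only the missing percentages, using the earnings from the same rows
missing_pct <- is.na(jobs_gender$wage_percent_female_wrt_male)
jobs_gender$wage_percent_female_wrt_male[missing_pct] <- jobs_gender$total_earnings_female[missing_pct]/jobs_gender$total_earnings_male[missing_pct] *100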
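
When only the missing entries of a column are filled, the right-hand side has to be subset with the same logical index as the left-hand side. Otherwise R takes replacement values from the start of the full vector and they no longer line up with their rows. A minimal sketch on made-up data (the column names mirror jobs_gender; the values are hypothetical):

```r
# Toy data (made-up values); the column names mirror jobs_gender
toy <- data.frame(
  total_earnings_male          = c(50000, 40000, 80000),
  total_earnings_female        = c(45000, 30000, 60000),
  wage_percent_female_wrt_male = c(90, NA, NA)
)

# Rows whose percentage is missing
idx <- is.na(toy$wage_percent_female_wrt_male)

# Subset the right-hand side with the same index so row i gets row i's ratio
toy$wage_percent_female_wrt_male[idx] <-
  toy$total_earnings_female[idx] / toy$total_earnings_male[idx] * 100

print(toy$wage_percent_female_wrt_male)
```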

We will now look into the table earnings_female that provides us data regarding percentage earnings of women of various age groups over the years.

str(earnings_female)
## 'data.frame':    264 obs. of  3 variables:
##  $ Year   : int  1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 ...
##  $ group  : Factor w/ 8 levels "16-19 years",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ percent: num  62.3 64.2 64.4 65.7 66.5 67.6 68.1 69.5 69.8 70.2 ...
summary(earnings_female)
##       Year              group       percent     
##  Min.   :1979   16-19 years:33   Min.   :56.80  
##  1st Qu.:1987   20-24 years:33   1st Qu.:69.40  
##  Median :1995   25-34 years:33   Median :75.50  
##  Mean   :1995   35-44 years:33   Mean   :76.88  
##  3rd Qu.:2003   45-54 years:33   3rd Qu.:86.90  
##  Max.   :2011   55-64 years:33   Max.   :95.40  
##                 (Other)    :66
unique(earnings_female$group)
## [1] Total, 16 years and older 16-19 years              
## [3] 20-24 years               25-34 years              
## [5] 35-44 years               45-54 years              
## [7] 55-64 years               65 years and older       
## 8 Levels: 16-19 years 20-24 years 25-34 years 35-44 years ... Total, 16 years and older

Here we find a group named “Total, 16 years and older” in the group column. This is an aggregate across all ages and gives no insight into individual age groups, so we remove those rows from the data set.

earnings_female <- earnings_female %>% 
                          filter(str_detect(group, "Total, 16 years and older") == FALSE)
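
Because the label to drop is a fixed string, an exact comparison is an alternative to pattern matching. The sketch below uses base R on a made-up three-row table (only the group labels are taken from the data) and also drops the unused factor level, which would otherwise still appear in later summaries.

```r
# Toy table (made-up percentages); group labels follow earnings_female
toy <- data.frame(
  group   = factor(c("Total, 16 years and older", "16-19 years", "20-24 years")),
  percent = c(62.3, 85.0, 78.1)
)

# Keep every row except the aggregate, then drop the now-empty factor level
kept <- droplevels(toy[toy$group != "Total, 16 years and older", ])

print(levels(kept$group))
```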

Now we take a look at the employed_gender table.

str(employed_gender)
## 'data.frame':    49 obs. of  7 variables:
##  $ year            : int  1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 ...
##  $ total_full_time : num  86 85.5 84.8 84.4 84.3 84.4 84.2 83.4 83.3 83.3 ...
##  $ total_part_time : num  14 14.5 15.2 15.6 15.7 15.6 15.8 16.6 16.7 16.7 ...
##  $ full_time_female: num  75.1 74.9 73.9 73.2 73.1 73.2 73.2 72.4 72.5 72.6 ...
##  $ part_time_female: num  24.9 25.1 26.1 26.8 26.9 26.8 26.8 27.6 27.5 27.4 ...
##  $ full_time_male  : num  92.2 91.8 91.5 91.2 91.1 91.4 91.2 90.6 90.6 90.5 ...
##  $ part_time_male  : num  7.8 8.2 8.5 8.8 8.9 8.6 8.8 9.4 9.4 9.5 ...
summary(employed_gender)
##       year      total_full_time total_part_time full_time_female
##  Min.   :1968   Min.   :80.30   Min.   :14.00   Min.   :71.90   
##  1st Qu.:1980   1st Qu.:81.80   1st Qu.:16.80   1st Qu.:73.20   
##  Median :1992   Median :82.60   Median :17.40   Median :73.90   
##  Mean   :1992   Mean   :82.64   Mean   :17.36   Mean   :73.86   
##  3rd Qu.:2004   3rd Qu.:83.20   3rd Qu.:18.20   3rd Qu.:74.70   
##  Max.   :2016   Max.   :86.00   Max.   :19.70   Max.   :75.40   
##  part_time_female full_time_male  part_time_male 
##  Min.   :24.60    Min.   :86.60   Min.   : 7.80  
##  1st Qu.:25.30    1st Qu.:89.00   1st Qu.: 9.60  
##  Median :26.10    Median :89.50   Median :10.50  
##  Mean   :26.14    Mean   :89.49   Mean   :10.51  
##  3rd Qu.:26.80    3rd Qu.:90.40   3rd Qu.:11.00  
##  Max.   :28.10    Max.   :92.20   Max.   :13.40

We will use the employed_gender table as is, since there are no concerning issues.

Now the data is ready for analysis.

Cleaned Data

The cleaned data can be found below:

earnings_female

datatable(earnings_female, filter = 'top')

employed_gender

datatable(employed_gender, filter = 'top')

jobs_gender

datatable(jobs_gender, filter = 'top')

Exploratory Data Analysis

a. Analysis by occupational category

We grouped and summarized the jobs_gender data by major category, which gave us an idea of where the pay gap is largest and smallest. A few interesting observations from the analysis are:

  • In the Production, Transportation, and Material Moving category, the maximum salary earned by women exceeds the maximum salary earned by men by around 22%, yet the percentage pay gap between men and women is the largest of all categories. About 75% of the workforce in this category is male, and the minimum salary earned by men is about 189% higher than the minimum salary earned by women.

  • In the Healthcare Practitioners and Technical category, where women outnumber men in the workplace, women still earn around 20% less than men.

  • Further analysis indicates that the earnings of women are largely independent of the representation of women in the workplace for each category.

  • We can also see the trend that as the age of women increases, the pay gap also increases.

  1. We summarize and visualize the mean earnings of women in comparison to men for each major category of occupation.
summary1 <- jobs_gender %>% 
  group_by(major_category) %>%
  summarize( mean_earnings_female = sum(total_earnings_female)/n(), mean_earnings_male = sum(total_earnings_male)/n()) %>% 
  mutate(perc_difference = (mean_earnings_male - mean_earnings_female)/mean_earnings_female*100)

 mean_perc_diff <- mean(summary1$perc_difference)

ggplot(summary1, aes(x= major_category, y= perc_difference)) +
  geom_col(color = "black", fill="#C91B1B")+
  geom_text(aes(label = round(perc_difference,0)), size = 3, hjust=1.5, color = "white") +
  ggtitle("Percentage Difference in Earnings \n by Major Occupational Category") +
  xlab("Major Occupation Category") + 
  ylab("Percentage Difference") + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank()) + 
  coord_flip() 

We see that men earn significantly more than women in all major occupational categories. The percentage difference is as high as 25% in categories such as Production, Transportation, and Material Moving and Management, Business, and Financial. The smallest difference (around 13%) is in Natural Resources, Construction, and Maintenance. The average pay gap across all categories is around 19%.
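
The percentage difference above is measured relative to women's earnings, (male - female) / female * 100, so it is not the same number as "women earn X% of what men earn". The conversion follows from male / female = 1 + difference / 100. A small sketch, where gap_to_share is a hypothetical helper name and the inputs are the approximate values quoted in the text:

```r
# Convert a gap measured relative to women's earnings into
# women's earnings expressed as a percentage of men's earnings.
gap_to_share <- function(perc_difference) {
  100 / (1 + perc_difference / 100)
}

gaps <- c(13, 19, 25)  # approximate values quoted in the text
shares <- round(gap_to_share(gaps), 1)
print(shares)
```

A 25% difference on this scale therefore corresponds to women earning 80% of men's mean earnings.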

  2. We will look at the minimum and maximum salaries in each category. We are interested in learning whether these salaries are earned by women or by men.
summary2 <- jobs_gender %>% 
                  group_by(major_category) %>%
                  summarize( max_earnings_female = max(total_earnings_female), max_earnings_male = max(total_earnings_male), min_earnings_female = min(total_earnings_female), min_earnings_male = min(total_earnings_male)) %>%
                  mutate(perc_max_difference = (max_earnings_male - max_earnings_female)/max_earnings_female*100, perc_min_difference = (min_earnings_male - min_earnings_female)/min_earnings_female*100) %>%
                  mutate(if_female_max= perc_max_difference <0, if_female_min= perc_min_difference>0) %>%
                  select(major_category,perc_max_difference,perc_min_difference,if_female_max,if_female_min)
  ggplot(summary2, aes(x= major_category, y= perc_max_difference, fill=if_female_max)) + 
        geom_bar(stat ="identity", color = "black") + 
        scale_fill_manual(values=c("#040059", "#37FE00"), labels = c("Male earning more", "Female earning more")) +
        geom_text(aes(label = round(perc_max_difference,0)), size = 3,hjust= 0, color = "black")+
        ggtitle("Who earns the maximum salary \n and by how much?") +
        ylab("Percentage Difference in Maximum Salary of Male and Female") +
        xlab("Major Occupational Category") +
        coord_flip() + 
        ylim(-55,55) +
        guides(fill=guide_legend(" ")) + 
        theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

 ggplot(summary2, aes(x= major_category, y= perc_min_difference,fill=if_female_min)) + 
  geom_col(color = "black") + 
  scale_fill_manual(values=c("#040059", "#37FE00"), labels = c("Male earning less", "Female earning less")) +
  geom_text(aes(label = round(perc_min_difference,0)), size = 3, hjust= 0) +
  ggtitle("Who earns the minimum salary \n and by how much?") +
  ylab("Percentage Difference in Minimum Salary of Male and Female") +
  xlab("Major Occupational Category") +
  coord_flip() + 
  guides(fill=guide_legend(" ")) +
  ylim ( -50, 200) + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

Comparing the maximum and minimum earnings in each major category, we see that women earn the minimum salaries in most categories. They earn the maximum salaries in about three major categories. In Production, Transportation, and Material Moving, women earn both the maximum and the minimum salary. This is also the category with the highest difference in mean salary.

  3. We now take a look at the proportion of women in each of the major categories.
summary3 <- jobs_gender %>%
  group_by(major_category) %>%
  summarise(total_female_percent = sum(workers_female)/sum(total_workers)*100,
            total_male_percent = sum(workers_male)/sum(total_workers)*100)  %>% 
  gather(key=gender, value = proportion, total_female_percent:total_male_percent)

ggplot(summary3, aes(x=major_category, y= proportion, fill= gender)) + 
  geom_col(color = "black") + 
  scale_fill_manual(values=c("#37FE00", "#040059"), labels = c("Percentage of Women", "Percentage of Men")) +
  ggtitle("Representation of Women in Each \n Occupational Category") +
  ylab("Proportion Value") +
  xlab("Major Category")+ 
  coord_flip() +
  guides(fill=guide_legend(" ")) + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

We see that the Natural Resources, Construction, and Maintenance category is heavily male dominated. The Healthcare Practitioners and Technical category has the largest proportion of women but still has a pay gap of around 20%. The Service, Sales and Office, and Management, Business, and Financial categories have about 50% women.

  4. To study the relationship between the earnings of women and their representation in each category, we look at the correlation values.
correlation <- jobs_gender %>% group_by(major_category) %>% 
  summarize(cor =cor(percent_female,wage_percent_female_wrt_male))

kable(correlation) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
major_category cor
Computer, Engineering, and Science -0.1158826
Education, Legal, Community Service, Arts, and Media 0.0837506
Healthcare Practitioners and Technical 0.2916140
Management, Business, and Financial -0.1011816
Natural Resources, Construction, and Maintenance -0.0228289
Production, Transportation, and Material Moving -0.1330825
Sales and Office 0.2720451
Service 0.0047345

The correlations are weak (all below 0.3 in absolute value), which suggests that the representation of women in an occupational category has little linear association with their earnings relative to men.
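
One way to read these coefficients is through their squares, which give the share of variance in the wage percentage that a linear relationship with percent_female would account for. A short sketch using the values from the table above:

```r
# Correlation coefficients copied from the table above
cors <- c(-0.1158826, 0.0837506, 0.2916140, -0.1011816,
          -0.0228289, -0.1330825, 0.2720451, 0.0047345)

# Strongest correlation in absolute terms
strongest <- round(max(abs(cors)), 2)

# Largest share of variance explained, in percent
max_r2_pct <- round(max(cors^2) * 100, 1)

print(strongest)
print(max_r2_pct)
```

Even in the category with the strongest correlation, less than a tenth of the variation is explained linearly.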

  5. Now we use the earnings_female table to visualize the average earnings of women as a percentage of men's earnings for the various age groups over the years.

data2 <- earnings_female %>%
  group_by(group) %>%
  summarise(average_percent_female = sum(percent)/n()) 


data2 %>% ggplot(aes(x=group, y=average_percent_female)) + 
  geom_col(fill= "#1FCA19" , color = "black") + 
  coord_flip() +
  geom_text(aes(label = round(average_percent_female,0)), size = 3, hjust=2)+
  ggtitle("Percentage Earnings of Women of Various Age Groups") +
  xlab("Age Group of Women") +
  ylab("Percentage Earning of Women with Respect to Men") + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

Women in the 16-19 and 20-24 age groups face a smaller pay gap than the other groups. The pay gap tends to widen with age. However, it narrows again for women aged 65 and older.

b. Analysis with respect to Time

We grouped and summarized the data by year, which gave us an idea of when the pay gap is largest and smallest. A few interesting observations from the analysis are:

  • In every major category, the salary earned by women in 2016 is less than the salary earned by men in 2013.

  • Even though the proportion of women in each major category is more or less the same from 2013 to 2016, in some categories the mean salary earned by women fluctuates over this period.

  • The share of part-time workers among employed women is around 3 times that among employed men, but this ratio decreases over the years.

  • Younger women face a smaller gender pay gap than older women.

  1. We check how the salary has changed for each major occupational category over the time period for both men and women.

#Jobs Gender Table:  Increment in each year
data_year_female <- jobs_gender %>%
  select(year, major_category, total_earnings_female) %>%
  group_by(year,major_category) %>%
  summarise(average_earning_female= mean(total_earnings_female)) %>%
  spread(key=year, value = average_earning_female) %>% 
  mutate(Increment= round(((`2016`-`2013`)/`2013`)*100, digits = 2))

kable(data_year_female) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
major_category 2013 2014 2015 2016 Increment
Computer, Engineering, and Science 68092.78 69346.10 69480.19 70813.66 4.00
Education, Legal, Community Service, Arts, and Media 45673.12 45950.57 46768.31 46638.64 2.11
Healthcare Practitioners and Technical 64665.71 67537.39 69305.42 70697.45 9.33
Management, Business, and Financial 57625.17 58736.60 58756.09 61163.57 6.14
Natural Resources, Construction, and Maintenance 37153.62 39296.11 38833.38 38880.52 4.65
Production, Transportation, and Material Moving 31820.57 31745.33 32541.09 33674.94 5.83
Sales and Office 36016.64 36828.29 37481.26 38097.60 5.78
Service 31766.88 31579.30 32184.32 32418.42 2.05
ggplot(data_year_female, aes(x=`2013`, xend=`2016`, y=major_category )) + 
geom_dumbbell()+
   geom_segment(aes(x=`2013`, 
                         xend=`2016`, 
                         y=major_category,
                         yend=major_category), 
                     color="#b2b2b2", size=1.5) +
        geom_dumbbell(color="black", 
                      size_x=5.5, 
                      size_xend = 5.5,
                      colour_x="#FFEF00", 
                      colour_xend = "#37FE00") +
       geom_text(aes(label = round(Increment,0)), size = 5, hjust = 1) +
 ggtitle("Salary Change of Women From 2013 to 2016") +
  ylab("Major Occupation Category") + xlab("Average Salary of Women") + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

data_year_male<- jobs_gender %>%
  select(year, major_category, total_earnings_male) %>%
  group_by(year,major_category) %>%
  summarise(average_earning_male= mean(total_earnings_male)) %>%
  spread(key=year, value = average_earning_male) %>% 
  mutate(Increment= round(((`2016`-`2013`)/`2013`)*100, digits = 2))

kable(data_year_male) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive", "sm"), full_width = F, fixed_thead = T)
major_category 2013 2014 2015 2016 Increment
Computer, Engineering, and Science 77964.64 80126.39 80844.20 81857.02 4.99
Education, Legal, Community Service, Arts, and Media 53320.83 53617.74 55066.60 55606.64 4.29
Healthcare Practitioners and Technical 78265.61 80933.71 81960.77 84787.16 8.33
Management, Business, and Financial 73004.07 73326.62 73625.83 74912.86 2.61
Natural Resources, Construction, and Maintenance 43227.41 42728.63 44106.50 44549.85 3.06
Production, Transportation, and Material Moving 40030.11 40654.44 40460.63 41954.31 4.81
Sales and Office 43940.64 44313.67 45059.54 46633.70 6.13
Service 35301.51 36612.79 36800.38 37859.16 7.25
ggplot(data_year_male, aes(x=`2013`, xend=`2016`, y=major_category  )) + 
geom_dumbbell() +
   geom_segment(aes(x=`2013`, 
                         xend=`2016`, 
                         y=major_category,
                         yend=major_category), 
                     color="#b2b2b2", size=1.5) +
        geom_dumbbell(color="black", 
                      size_x=5.5, 
                      size_xend = 5.5,
                      colour_x="#FFEF00", 
                      colour_xend = "#040059") +
       geom_text(aes(label = round(Increment,0)), size = 5, hjust= 1) +
 ggtitle("Salary Change of Men From 2013 to 2016") +
  ylab("Major Occupation Category") + xlab("Average Salary of Men") + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

We see that even though there are certain categories where the percentage increment in salary from 2013 to 2016 (indicated by the number on the dumbbell plot) is higher for women, the actual picture is very different. There is a large pay gap in every year and in each major occupational category: in all categories, the salary women earn in 2016 is well below what men earned three years earlier, in 2013.
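
This comparison can be checked directly from the two tables above. The sketch below copies the 2016 averages for women and the 2013 averages for men and computes how far women's 2016 earnings fall short of men's 2013 earnings; the variable names are chosen here for illustration.

```r
category <- c("Computer, Engineering, and Science",
              "Education, Legal, Community Service, Arts, and Media",
              "Healthcare Practitioners and Technical",
              "Management, Business, and Financial",
              "Natural Resources, Construction, and Maintenance",
              "Production, Transportation, and Material Moving",
              "Sales and Office",
              "Service")

# Average earnings copied from the tables above
women_2016 <- c(70813.66, 46638.64, 70697.45, 61163.57,
                38880.52, 33674.94, 38097.60, 32418.42)
men_2013   <- c(77964.64, 53320.83, 78265.61, 73004.07,
                43227.41, 40030.11, 43940.64, 35301.51)

# Shortfall of women's 2016 earnings relative to men's 2013 earnings, in percent
shortfall_pct <- round((men_2013 - women_2016) / men_2013 * 100, 1)
print(data.frame(category, shortfall_pct))

# TRUE when women's 2016 average is below men's 2013 average in every category
all_below <- all(women_2016 < men_2013)
print(all_below)
```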

  2. We also look at the proportion of women in each occupational category over the period from 2013 to 2016.

#Visualizing proportion by year
summary5 <- jobs_gender %>%
  select(year, major_category, total_workers, workers_male, workers_female) %>%
  group_by(year,major_category) %>%
  summarise(total_female_percent = sum(workers_female)/sum(total_workers),
            total_male_percent = sum(workers_male)/sum(total_workers)) %>%
  gather(key=gender, value = proportion, "total_female_percent":"total_male_percent")

summary5 %>% ggplot(aes(x=major_category, y= proportion, fill = gender)) + 
  geom_col() +
  scale_fill_manual(values=c("#37FE00", "#040059"), labels= c("Percentage of Women", "Percentage of Men")) +
  facet_wrap(~year) + 
  coord_flip() + 
  ggtitle("Representation of Women Over the Years") + 
  xlab("Major Category") +
  ylab("Proportion Value") +
  guides(fill=guide_legend(" ")) + 
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

We see that the representation of women in each of the categories has remained almost constant over the four years. As seen earlier, Natural Resources is one of the major occupational sectors that is heavily male dominated throughout the four years.

We see that the most drastic drop in the earnings of women happens in Natural Resources in 2015. The most drastic rise happens in Service in 2014. In the Computer, Healthcare, and Education categories the salary percentages fluctuate, while they have remained fairly constant in Management, Production, and Sales.

  3. Now we use the employed_gender table to derive insights for our analysis.
#employed_gender : part time women VS men

summary7 <- employed_gender %>% 
  group_by(year)  %>%  
  mutate(Ratio = sum(part_time_female)/sum(part_time_male))


ggplot(data= summary7, aes(x=year, y=Ratio, fill=year)) +
  geom_line(size=1.5, color = "#0016FF") +  
  ggtitle("Representation of Women in Part Time Jobs") + 
  xlab("Year") +
  ylab("Ratio of part time female by part time male ") +
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

The ratio of women to men in part-time jobs keeps decreasing over the years, which suggests that more women are taking full-time jobs. In the 1970s there were about 3 women for every man in part-time work; the ratio has since fallen to about 2.
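The ratio arithmetic can be checked on a small scale. The following is a minimal sketch, assuming a toy data frame that borrows the column names year, part_time_female and part_time_male from employed_gender. The name toy_employed and all of its values are invented for illustration and are not taken from the dataset.

```r
library(dplyr)

# Toy data with the same column names as employed_gender.
# The numbers are made up for illustration only.
toy_employed <- data.frame(
  year             = c(1970, 1990, 2010),
  part_time_female = c(30, 27, 26),
  part_time_male   = c(10, 11, 13)
)

# With one row per year, the ratio is a plain column-wise division.
toy_ratio <- toy_employed %>%
  mutate(Ratio = part_time_female / part_time_male)

toy_ratio
```

If the real table also holds one row per year, grouping by year and summing should give the same values as this direct division. A ratio that falls towards 1 means the two part-time figures are converging.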

#employed_gender : full time women VS men
summary8 <- employed_gender %>%
  group_by(year) %>%
  mutate(Ratio = sum(full_time_female) / sum(full_time_male))

#Plot
ggplot(data = summary8, aes(x = year, y = Ratio)) +
  geom_line(size = 1.5, color = "#0016FF") +
  ggtitle("Representation of Women in Full Time Jobs") +
  xlab("Year") +
  ylab("Ratio of full time female by full time male") +
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

From the above output we see that in every year there are fewer women than men in full-time jobs. The ratio of full-time women to men has stayed more or less the same, with some positive increase in the last few years. In the 1970s the ratio was about 100 women for 123 men, and it improved to about 100 women for 117 men by 2010.

#employed_gender : part time women VS full time women

summary9 <- employed_gender %>%
  gather(key= time, value= proportion,part_time_female,full_time_female) %>%
  select(year, time, proportion)

#plot
ggplot(data = summary9 , aes(x= year, y=proportion, fill=(time))) +
  geom_bar(stat= 'identity', position='dodge' , color = "Black") + 
  scale_fill_manual(values=c("#0021FF" , "#FF0000"), labels = c("Full Time", "Part Time")) +
  ggtitle("Women Employees Over Various Years") + 
  xlab("Year") +
  ylab("Percent of Women Employees") +
  guides(fill=guide_legend(" ")) +
  theme(legend.position = "right",
        plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(color = "darkblue", hjust = 0.5),
        axis.title.y = element_text(),
        axis.title.x = element_text(),
        axis.ticks = element_blank())

From the above two graphs we see that the split between full-time and part-time jobs differs greatly between the genders. About 30% of women work in part-time jobs, while only about 10% of men do.

  1. Now we look into the earning female table to derive insights for our analysis.

We see that women's earnings as a percentage of men's earnings are increasing for women in the 25 to 64 age range. The variation is irregular for women aged 16 to 19 years, 20 to 24 years, and 65 years and older.
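One way to compare age groups is to compute the change in the earnings percentage between the first and last year for each group. The following is a minimal sketch, assuming a long table with columns named year, group and percent. These column names, the name toy_earnings and all values are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical long-format table: one row per year and age group.
# percent = women's earnings as a percentage of men's (made-up values).
toy_earnings <- data.frame(
  year    = c(1980, 2010, 1980, 2010),
  group   = c("25-34 years", "25-34 years", "16-19 years", "16-19 years"),
  percent = c(70, 90, 88, 90)
)

# Change from the earliest to the latest year within each age group.
toy_change <- toy_earnings %>%
  group_by(group) %>%
  arrange(year, .by_group = TRUE) %>%
  summarise(change = last(percent) - first(percent))

toy_change
```

A large positive change marks a group where the gap has narrowed, and a value near zero marks a group where it has barely moved. A first-to-last difference hides fluctuations in between, so it complements a line plot over all years rather than replacing it.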

Summary

Based on the available data, we have tried to analyze the trends and patterns in the earnings of women compared with men with respect to several factors. This produced a few insights.

Factor 1: Major Occupational Categories:

We grouped and divided the entire data set by major category, which gave us an idea of where the pay gap is largest and smallest. We see that there is a large difference in the earnings of women compared with men in all major occupational categories.

It is also observed that in six out of eight occupational categories the lowest median salary is earned by a woman, whereas in five of these eight categories the highest salary is earned by a man. This points to the possibility of men holding a larger proportion of highly paid jobs in each of these sectors.

Factor 2: Representation of Women in the Workforce

We see that the proportion of women in the workforce in each sector has not changed significantly over the years. Their representation has no evident effect on their pay. It is also seen that they suffer a pay gap of around 10 to 20% in certain occupations like Nursing, where their representation is above 80%. Women are almost 50% of the workforce. Yet, on average, women continue to earn considerably less than men.

Factor 3: Full-Time and Part-Time

Taking ratios of the number of women to men in part-time and full-time jobs, we see that women outnumber men in part-time jobs. Full-time jobs have more male representation. This suggests the possibility of a bias. Women tend to take part-time jobs in order to meet other family expectations.

Factor 4: Time

We grouped and divided the data by year, which indicates whether the pay gap increases or decreases over time. The trend suggests that although women's earnings have risen relative to men's over the years, the change is not significant for some age groups. It has stayed nearly the same for the oldest and youngest age groups.