library(dplyr)
library(ggplot2)
library(GGally)
load(file = "Datasets/OPM94.RData")
load(file = "Datasets/OPM2008.RData")
Throughout history, women have had fewer rights than men, and in modern society there is an ongoing debate over whether politics and the economy are affected by sexism. In terms of wages, there are many claims that a gender wage gap exists between men and women. Statistics on the gender wage gap have become so prominent that former United States president Barack Obama discussed the matter. Society is working to bridge the wage gap, and it is important that its patterns are recognized and addressed. This research report explores the gender pay gap, which can be described as the difference between the average hourly pay of women and men. The report also identifies several variables that contribute to the gap, including years of education, grade, and years of service. The disparity in wages is statistically significant in both samples and, because the samples are random, plausibly generalizes to the wider population of federal employees. The following data provide substantial evidence that the gender pay gap is a serious issue.
Do women receive lower pay for equal work irrespective of the level of qualification?
To answer this question, we will use a random sample of federal employees.
load(file = "Datasets/OPM94.RData")
names(opm94)
## [1] "x" "sal" "grade" "patco" "major" "age"
## [7] "male" "vet" "handvet" "hand" "yos" "edyrs"
## [13] "promo" "exit" "supmgr" "race" "minority" "grade4"
## [19] "promo01" "supmgr01" "male01" "exit01" "vet01"
Two datasets are provided: the 2008 dataset has 9,074 observations of 22 variables, and the 1994 dataset has 1,000 observations of 24 variables. The variables include, but are not limited to, salary, grade, edyrs, race, sex, yos, vet status, and age.
opm94 %>% group_by(male) %>% summarise(Mean_Salary = mean(sal, na.rm = TRUE))
## # A tibble: 2 x 2
## male Mean_Salary
## <fct> <dbl>
## 1 female 34223.
## 2 male 46999.
opm2008 %>% group_by(male) %>% summarise(Mean_Salary = mean(salary, na.rm = TRUE))
## Warning: Factor `male` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 2
## male Mean_Salary
## <fct> <dbl>
## 1 Female 63902.
## 2 Male 74841.
## 3 <NA> 73327.
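To put the two sets of group means on a common scale, the printed values can be turned into a female-to-male salary ratio. The snippet below is only an arithmetic sketch that reuses the rounded means from the output above; the object names are illustrative and not part of the original analysis.

```r
# Rounded group means copied from the two summaries above
mean_1994 <- c(female = 34223, male = 46999)
mean_2008 <- c(female = 63902, male = 74841)

# Women's mean salary as a share of men's mean salary
ratio_1994 <- unname(mean_1994["female"] / mean_1994["male"])
ratio_2008 <- unname(mean_2008["female"] / mean_2008["male"])

round(c(`1994` = ratio_1994, `2008` = ratio_2008), 3)
```

On these rounded means, women's average salary was roughly 73% of men's in 1994 and roughly 85% in 2008.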
The results of fitting a bivariate model with salary as the outcome and gender as the predictor for opm94 are shown below:
lm(sal ~ male01, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ male01, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31945 -11537 -3092 9591 71883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34222.8 749.9 45.64 <2e-16 ***
## male01 12776.6 1046.3 12.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16500 on 993 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1305, Adjusted R-squared: 0.1297
## F-statistic: 149.1 on 1 and 993 DF, p-value: < 2.2e-16
opm94 <- opm94 %>% mutate(female01 = if_else(male01 == 0, 1, 0 ))
lm(sal ~ female01, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ female01, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31945 -11537 -3092 9591 71883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46999.4 729.8 64.40 <2e-16 ***
## female01 -12776.6 1046.3 -12.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16500 on 993 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1305, Adjusted R-squared: 0.1297
## F-statistic: 149.1 on 1 and 993 DF, p-value: < 2.2e-16
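The two opm94 fits above are the same model with the reference group switched. The intercept is the mean salary of the group coded 0, and the slope is the difference in group means, so the `female01` intercept equals the `male01` intercept plus its slope (34222.8 + 12776.6 = 46999.4) and the slope only changes sign. A minimal sketch of that relationship, using a small made-up data frame rather than the OPM data:

```r
# Toy data (made-up values, not drawn from the OPM files)
toy <- data.frame(
  sal    = c(30000, 34000, 38000, 42000, 47000, 52000),
  male01 = c(0, 0, 0, 1, 1, 1)
)
toy$female01 <- ifelse(toy$male01 == 0, 1, 0)

fit_male   <- lm(sal ~ male01,   data = toy)
fit_female <- lm(sal ~ female01, data = toy)

# Intercept = mean of the reference group; slope = difference in group means
coef(fit_male)
coef(fit_female)

# Switching the reference group flips the slope's sign and shifts the intercept
all.equal(unname(coef(fit_female)["female01"]), -unname(coef(fit_male)["male01"]))
all.equal(unname(coef(fit_female)["(Intercept)"]), unname(sum(coef(fit_male))))
```

Because both parameterizations describe the same fit, the residuals, R-squared, and F-statistic are identical, as the two summaries above show.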
The results of fitting a bivariate model with salary as the outcome and gender as a predictor for opm2008 are shown below:
lm(salary ~ male, data = opm2008) %>% summary()
##
## Call:
## lm(formula = salary ~ male, data = opm2008)
##
## Residuals:
## Min 1Q Median 3Q Max
## -51916 -22631 -5674 18366 139731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 63902.0 413.7 154.47 <2e-16 ***
## maleMale 10938.5 606.8 18.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28810 on 9058 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.03463, Adjusted R-squared: 0.03452
## F-statistic: 324.9 on 1 and 9058 DF, p-value: < 2.2e-16
opm2008 <- opm2008 %>% mutate(female = if_else(male == 0, 1, 0 ))
lm(salary ~ female, data = opm2008) %>% summary()
##
## Call:
## lm(formula = salary ~ female, data = opm2008)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49339 -23945 -5550 18553 145587
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68985 308 224 <2e-16 ***
## female NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29320 on 9059 degrees of freedom
## (14 observations deleted due to missingness)
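The `NA` estimate for `female` and the note "1 not defined because of singularities" indicate that `female` takes a single value in every row used for the fit, so its effect cannot be separated from the intercept. A likely cause, judging from the earlier output in which `male` is a factor with levels `Female` and `Male`, is that the comparison `male == 0` never matches. The sketch below illustrates this on a made-up data frame and shows one possible recode; the level labels are assumed from the output above, and the salary values are invented.

```r
# Toy data (made-up values); `male` mimics the factor seen in opm2008
toy <- data.frame(
  salary = c(60000, 64000, 68000, 70000, 75000, 80000, 72000),
  male   = factor(c("Female", "Female", "Female", "Male", "Male", "Male", NA))
)

# Comparing the factor with 0 never matches, so the dummy is constant
toy$female_bad <- ifelse(toy$male == 0, 1, 0)
table(toy$female_bad, useNA = "ifany")

# Comparing with the level label gives a usable 0/1 indicator (NA stays NA)
toy$female <- ifelse(toy$male == "Female", 1, 0)

# lm() drops the row with missing gender before fitting
fit <- lm(salary ~ female, data = toy)
coef(fit)
```

With an indicator that actually varies, the intercept is the mean salary of the reference group (here, men) and the `female` coefficient is the difference in group means.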
As the table above shows, many predictors of salary are strongly correlated with the predictor variable.
Below are a few plots for opm94 showing the relationship between salary and grade, years of service, and years of education.
ggplot(data=opm94) + geom_point(mapping = aes(x=grade, y = sal))
## Warning: Removed 5 rows containing missing values (geom_point).
ggplot(data=opm94) + geom_point(mapping = aes(x=yos, y = sal))
## Warning: Removed 5 rows containing missing values (geom_point).
ggplot(data=opm94) + geom_point(mapping = aes(x=edyrs, y = sal))
## Warning: Removed 5 rows containing missing values (geom_point).
Below are a few plots for opm2008 showing the relationship between salary and grade, years of service, and years of education.
ggplot(data=opm2008) + geom_point(mapping = aes(x=grade, y = salary))
## Warning: Removed 8 rows containing missing values (geom_point).
ggplot(data=opm2008) + geom_point(mapping = aes(x=yos, y = salary))
## Warning: Removed 8 rows containing missing values (geom_point).
ggplot(data=opm2008) + geom_point(mapping = aes(x=edyrs, y = salary))
## Warning: Removed 8 rows containing missing values (geom_point).
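Scatterplots show the shape of each relationship, but a numeric summary makes the strength of the associations easier to compare. One option, not part of the original analysis, is a correlation matrix computed with base R's `cor()`. The sketch below uses a made-up data frame that borrows the opm94 column names.

```r
# Toy data (made-up values) with the same column names as opm94
toy <- data.frame(
  sal   = c(25000, 31000, 36000, 44000, 52000, NA, 61000),
  grade = c(4, 5, 7, 9, 11, 12, 13),
  yos   = c(2, 10, 5, 12, 20, 8, 25),
  edyrs = c(12, 12, 14, 16, 16, 18, 18)
)

# Pairwise Pearson correlations; rows with any missing value are dropped
cor_mat <- cor(toy[, c("sal", "grade", "yos", "edyrs")], use = "complete.obs")

# Correlation of salary with each of the other variables
round(cor_mat["sal", ], 2)
```

With the real data, `toy` would be replaced by `opm94`, or by `opm2008` with `salary` in place of `sal`.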
For women, the variables that most influence salary in opm94 and opm2008 rank as follows:
1. Grade
2. Education years
3. Years of service
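The research question asks about pay for equal work, which the bivariate models cannot answer on their own. One common approach, not shown in the original analysis, is to add grade, education, and years of service as controls and see how much of the gender coefficient remains. The sketch below runs on simulated data, so its coefficients say nothing about the OPM files; the column names mirror opm94 and every generating value is an assumption chosen for illustration.

```r
set.seed(1)
n <- 200

# Simulated data (not OPM data): men sit in higher grades on average, and
# salary depends on grade, education, and service but not directly on gender
sim <- data.frame(male01 = rbinom(n, 1, 0.5))
sim$grade <- round(8 + 2 * sim$male01 + rnorm(n, sd = 2))
sim$edyrs <- round(14 + rnorm(n, sd = 2))
sim$yos   <- round(runif(n, 0, 30))
sim$sal   <- 5000 + 3500 * sim$grade + 500 * sim$edyrs + 200 * sim$yos +
  rnorm(n, sd = 3000)

# Raw gap versus the gap among employees with equal grade, education, and service
fit_raw      <- lm(sal ~ male01, data = sim)
fit_adjusted <- lm(sal ~ male01 + grade + edyrs + yos, data = sim)

coef(fit_raw)["male01"]
coef(fit_adjusted)["male01"]
```

In this simulation the raw gap comes entirely from the difference in grades, so the adjusted coefficient shrinks toward zero. With the real data, a gender coefficient that stays large after adjustment would point to unequal pay for comparable positions, while a coefficient that shrinks would point to differences in grade, education, or tenure.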
The resulting model has three main predictors: grade, education years, and years of service. According to these statistics, men receive a much greater salary than women. Comparing the means for both genders, men on average made $12,776 more in 1994 and $10,939 more in 2008. While the gender wage gap has narrowed, there is still a significant disparity between the salaries that the two genders receive. The conclusion could be extended to a larger population, as the statistics are consistent with claims that have been made about the wage gap. In other words, these data provide convincing evidence that men receive a higher salary than women.