library(magrittr)
library(dplyr)
library(ggplot2)
#Katelyn Burton
#April 27, 2020
####EXECUTIVE SUMMARY
#Throughout history, women have had fewer rights than men, and in modern society there is an ongoing debate over whether sexism affects politics and the economy. In terms of wages, many claim that there is a gender wage gap between men and women. Statistics on the gender wage gap have become so widespread that former United States President Barack Obama discussed the issue. Society is working to close the wage gap, and it is important that patterns in the gap are recognized and addressed. This report explores the gender pay gap, defined here as the difference between the average pay of men and women. It also identifies several variables that contribute to the gap, including years of education, grade, and years of service. The disparity in wages is statistically significant, and the report argues that it can be generalized to a larger population. The analysis below provides substantial evidence that the gender pay gap is a serious issue.
####ANALYSIS
###Variables
#The outcome variable in this analysis is salary (sal in opm94, salary in opm2008), a numeric variable. In opm94, salaries range from $15,054 to $116,529; in opm2008, they range from $19,646 to $214,572.
#The independent variable of interest is gender (male or female), a categorical variable.
#Many other variables may contribute to the effect of gender on salary. The main ones are:
# - sex
# - salary
# - occupation
# - age
# - years of experience
# - grade
# - race
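###Exploratory Analysis
#As a first step in determining whether men earn higher salaries than women, compare mean salary by gender in each dataset. The difference in means shows the size of the gap; whether it is statistically significant is tested in the models below.
#Mean salary by gender in opm94 (salary variable: sal)
opm94 %>% group_by(male) %>% summarise(Mean_Salary = mean(sal, na.rm = TRUE))
#Mean salary by gender in opm2008 (salary variable: salary)
opm2008 %>% group_by(male) %>% summarise(Mean_Salary = mean(salary, na.rm = TRUE))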
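#The group means show how large the gap is, but not whether it is larger than chance variation alone would produce. A minimal sketch of one way to check, assuming base R's t.test() (a Welch two-sample t-test) and a small made-up data frame named toy standing in for opm94; toy and its values are hypothetical and are not OPM results.

```r
# Hypothetical toy data standing in for opm94 (not OPM data):
# salary and a 0/1 gender indicator (0 = female, 1 = male)
toy <- data.frame(
  sal    = c(30000, 32000, 35000, 31000, 45000, 47000, 50000, 44000),
  male01 = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Mean salary for each gender, and the gap (men minus women)
group_means <- tapply(toy$sal, toy$male01, mean)
gap <- unname(group_means["1"] - group_means["0"])

# Welch two-sample t-test of the difference in mean salary by gender
test <- t.test(sal ~ male01, data = toy)
gap
test$p.value
```

#With the real data, the analogous call would presumably be t.test(sal ~ male01, data = opm94), assuming male01 is coded 0/1 as the recoding in the models below implies.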
###Modeling
#The results of fitting a bivariate model with salary as the outcome and gender as the predictor for opm94 are shown below:
#Bivariate model for opm94: salary on the male indicator
lm(sal ~ male01, data = opm94) %>% summary()
#Recode gender as a female indicator (1 = female, 0 = male) and refit
opm94 <- opm94 %>% mutate(female01 = if_else(male01 == 0, 1, 0))
lm(sal ~ female01, data = opm94) %>% summary()
#The results of fitting a bivariate model with salary as the outcome and gender as a predictor for opm2008 are shown below:
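#Bivariate model for opm2008: salary on the male indicator
lm(salary ~ male, data = opm2008) %>% summary()
#Recode gender as a female indicator (1 = female, 0 = male) and refit
opm2008 <- opm2008 %>% mutate(female = if_else(male == 0, 1, 0))
lm(salary ~ female, data = opm2008) %>% summary()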
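#One way to read these paired models: with a single 0/1 predictor, the intercept equals the mean salary of the group coded 0, and the slope equals the difference between the two group means. Swapping the male indicator for the female indicator therefore moves the intercept to the other group and flips the sign of the slope. A minimal sketch with a hypothetical toy data frame (not OPM data):

```r
# Hypothetical toy data (not OPM data): salary and a 0/1 male indicator
toy <- data.frame(
  sal    = c(30000, 32000, 35000, 31000, 45000, 47000, 50000, 44000),
  male01 = c(0, 0, 0, 0, 1, 1, 1, 1)
)
# Same recoding as if_else(male01 == 0, 1, 0) when male01 is 0/1
toy$female01 <- 1 - toy$male01

fit_male   <- lm(sal ~ male01, data = toy)
fit_female <- lm(sal ~ female01, data = toy)

# Intercept = women's mean salary; slope = men's mean minus women's mean
coef(fit_male)
# Intercept = men's mean salary; slope = the same gap with the opposite sign
coef(fit_female)
```

#Read this way, the intercept of the male-indicator model is the expected salary of women, and its coefficient is the expected gap in favor of men.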
#In the opm94 model with male01 as the predictor, the intercept ($34,222) is the expected salary of women (male01 = 0), and the coefficient on male01 shows that men are expected to earn $12,776 more. This coefficient is statistically significant. However, many other characteristics influence salary and may be correlated with gender; if they are left out of the model, the gender coefficient may be biased. Below is a correlation matrix of salary and the main variables that may contribute to the effect of gender on salary:
opm94 %>% select(sal, grade, yos, edyrs) %>% cor(use = "pairwise.complete.obs")
#As the table above shows, several of these variables are strongly correlated with salary.
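#The bias concern raised above can be illustrated with simulated data. In this sketch, which uses made-up numbers rather than OPM data, grade raises salary and men are on average in higher grades, so the bivariate male01 coefficient picks up part of the effect of grade; adding grade as a control brings the estimate close to the simulated direct effect of 2,000. The variable names mirror the report's, but every data-generating value is an assumption chosen for illustration.

```r
# Hypothetical simulated data (not OPM data): grade drives salary,
# and men are, on average, two grades higher than women.
set.seed(1)
n      <- 500
male01 <- rbinom(n, 1, 0.5)
grade  <- round(7 + 2 * male01 + rnorm(n, sd = 2))
sal    <- 10000 + 3000 * grade + 2000 * male01 + rnorm(n, sd = 3000)
sim    <- data.frame(sal, male01, grade)

# Bivariate model: the male01 coefficient absorbs part of grade's effect
b_biv  <- coef(lm(sal ~ male01, data = sim))["male01"]
# Model controlling for grade: close to the simulated direct effect of 2000
b_ctrl <- coef(lm(sal ~ male01 + grade, data = sim))["male01"]

c(bivariate = unname(b_biv), controlled = unname(b_ctrl))
```

#In this setup the bivariate estimate should land near 8,000 (the direct 2,000 plus 3,000 per grade times the 2-grade difference), while the controlled estimate should land near 2,000.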
#Below are scatter plots for opm94 showing the relationship between salary and grade, years of service, and years of education:
ggplot(data = opm94) + geom_point(mapping = aes(x = grade, y = sal))
ggplot(data = opm94) + geom_point(mapping = aes(x = yos, y = sal))
ggplot(data = opm94) + geom_point(mapping = aes(x = edyrs, y = sal))
#Below are scatter plots for opm2008 showing the relationship between salary and grade, years of service, and years of education:
ggplot(data = opm2008) + geom_point(mapping = aes(x = grade, y = salary))
ggplot(data = opm2008) + geom_point(mapping = aes(x = yos, y = salary))
ggplot(data = opm2008) + geom_point(mapping = aes(x = edyrs, y = salary))
#The variables that most strongly influence salary for women in opm94 and opm2008 rank as follows:
# 1- Grade
# 2- Years of education
# 3- Years of service
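#The ranking above is read from the scatter plots. A sketch of one way to attach numbers to it, assuming the absolute Pearson correlation with salary as the ranking criterion; the data below are simulated, not OPM data, so the output only illustrates the method. Correlation measures linear association with salary, not the causal influence of each variable.

```r
# Hypothetical simulated data (not OPM data) with salary and three predictors
set.seed(2)
n     <- 300
grade <- sample(5:15, n, replace = TRUE)
edyrs <- 12 + 0.4 * grade + rnorm(n, sd = 1.5)
yos   <- pmax(0, 2 + 0.8 * grade + rnorm(n, sd = 6))
sal   <- 8000 + 3500 * grade + 500 * edyrs + 100 * yos + rnorm(n, sd = 4000)
sim   <- data.frame(sal, grade, edyrs, yos)

# Correlation of each predictor with salary, using pairwise-complete observations
r <- cor(sim, use = "pairwise.complete.obs")["sal", c("grade", "edyrs", "yos")]

# Rank predictors from strongest to weakest linear association with salary
ranking <- sort(abs(r), decreasing = TRUE)
ranking
```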
####RESULTS
#The analysis identifies three main predictors of salary: grade, years of education, and years of service. On average, men receive substantially higher salaries than women. Comparing mean salaries by gender, men earned $12,776 more than women in 1994 and $10,939 more in 2008. While the gender wage gap has narrowed, there is still a significant disparity between the salaries the two genders receive. The conclusion could be extended to a larger population, as the statistics are consistent with claims that have been made about the wage gap. In other words, these data provide substantial evidence that men receive higher salaries than women.
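#The narrowing of the gap can be quantified directly from the two mean differences reported above. A minimal sketch of the arithmetic; both figures are nominal dollars as reported, so this comparison does not adjust for inflation between 1994 and 2008.

```r
# Mean salary gaps (men minus women) reported above, in nominal dollars
gap_1994 <- 12776
gap_2008 <- 10939

change     <- gap_2008 - gap_1994        # absolute change in the gap
pct_change <- 100 * change / gap_1994    # change relative to the 1994 gap

round(c(change = change, pct_change = pct_change), 1)
```

#The gap shrank by $1,837, or roughly 14.4% of its 1994 size.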
names(opm94)
summary(opm94$male)
names(opm2008)
summary(opm2008$male)
#Salary by gender in opm2008
ggplot(data = opm2008) + geom_point(mapping = aes(x = male, y = salary))
#Same plot with points colored by gender; factor() gives a discrete color scale
ggplot(data = opm2008) + geom_point(mapping = aes(x = male, y = salary, color = factor(male)))
#Mean years of education, years of service, and grade by gender in opm94
opm94 %>% select(male, edyrs) %>% group_by(male) %>% summarise(mean_edyrs = mean(edyrs, na.rm = TRUE))
opm94 %>% select(male, yos) %>% group_by(male) %>% summarise(mean_yos = mean(yos, na.rm = TRUE))
opm94 %>% select(male, grade) %>% group_by(male) %>% summarise(mean_grade = mean(grade, na.rm = TRUE))
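#The three per-variable summaries of opm94 can also be produced as a single table. A sketch using base R's aggregate() on a hypothetical toy data frame (not OPM data). Passing na.rm = TRUE through to mean() drops missing values column by column, as the separate summaries do; the formula interface, aggregate(cbind(edyrs, yos, grade) ~ male, ...), would by default drop any row with a missing value in any of the three columns.

```r
# Hypothetical toy data standing in for opm94 (not OPM data)
toy <- data.frame(
  male  = c(0, 0, 0, 1, 1, 1),
  edyrs = c(14, 16, NA, 16, 18, 14),
  yos   = c(5, 10, 8, 12, NA, 20),
  grade = c(7, 9, 8, 11, 12, 10)
)

# Mean of each variable by gender in one table; na.rm = TRUE is passed to mean()
# so missing values are dropped separately within each column
by_gender <- aggregate(toy[c("edyrs", "yos", "grade")],
                       by = list(male = toy$male),
                       FUN = mean, na.rm = TRUE)
by_gender
```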