MSDS Winter 2018 R Workshop

Jiadi Li

Final Project

Question for Analysis:

How does gender difference affect professors’ salaries?

0.Data Import: place the original Salaries.csv in a GitHub repository and have R read it from the raw file link.

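#stringsAsFactors = TRUE keeps rank, discipline and sex as factors on R >= 4.0, matching the summary output shown
ProfSalary <- read.csv("https://raw.githubusercontent.com/xiaoxiaogao-DD/Testing/master/Salaries.csv",header = TRUE,stringsAsFactors = TRUE)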
#view the first 6 lines of the file
head(ProfSalary)
##   X      rank discipline yrs.since.phd yrs.service  sex salary
## 1 1      Prof          B            19          18 Male 139750
## 2 2      Prof          B            20          16 Male 173200
## 3 3  AsstProf          B             4           3 Male  79750
## 4 4      Prof          B            45          39 Male 115000
## 5 5      Prof          B            40          41 Male 141500
## 6 6 AssocProf          B             6           6 Male  97000

1.Data Exploration: summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

summary(ProfSalary)
##        X              rank     discipline yrs.since.phd    yrs.service   
##  Min.   :  1   AssocProf: 64   A:181      Min.   : 1.00   Min.   : 0.00  
##  1st Qu.:100   AsstProf : 67   B:216      1st Qu.:12.00   1st Qu.: 7.00  
##  Median :199   Prof     :266              Median :21.00   Median :16.00  
##  Mean   :199                              Mean   :22.31   Mean   :17.61  
##  3rd Qu.:298                              3rd Qu.:32.00   3rd Qu.:27.00  
##  Max.   :397                              Max.   :56.00   Max.   :60.00  
##      sex          salary      
##  Female: 39   Min.   : 57800  
##  Male  :358   1st Qu.: 91000  
##               Median :107300  
##               Mean   :113706  
##               3rd Qu.:134185  
##               Max.   :231545

Since the data is not yet separated by gender, what we can summarize so far is that the number of male professors in this dataset is approximately 9 times the number of female professors (358 vs. 39). While the population of female scholars in certain disciplines is reasonably small, a gap of this size may reveal gender inequality. Further explanation is required.
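The ratio quoted above can be reproduced directly from the counts printed by summary(). The snippet below is a minimal sketch that hard-codes those two counts (39 and 358) rather than reading the file; the variable names are illustrative only.

```r
# Counts copied from the summary(ProfSalary) output above
gender_counts <- c(Female = 39, Male = 358)

# Ratio of male to female professors
ratio <- unname(gender_counts["Male"] / gender_counts["Female"])
round(ratio, 2)   # 9.18

# Share of each gender in the sample, in percent
share <- round(100 * prop.table(gender_counts), 1)
share             # Female 9.8, Male 90.2
```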

2.Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example - if it makes sense you could sum two columns together)

#Rename the column "sex" to "gender" (matched by name rather than by position)
colnames(ProfSalary)[colnames(ProfSalary)=="sex"] <- "gender"
#Create a subset for all male professors
MaleProf <- subset(ProfSalary,gender=="Male",select = c(X,rank,discipline,yrs.service,salary))
summary(MaleProf)
##        X                rank     discipline  yrs.service   
##  Min.   :  1.0   AssocProf: 54   A:163      Min.   : 0.00  
##  1st Qu.:102.2   AsstProf : 56   B:195      1st Qu.: 7.00  
##  Median :202.5   Prof     :248              Median :18.00  
##  Mean   :202.1                              Mean   :18.27  
##  3rd Qu.:300.8                              3rd Qu.:27.00  
##  Max.   :397.0                              Max.   :60.00  
##      salary      
##  Min.   : 57800  
##  1st Qu.: 92000  
##  Median :108043  
##  Mean   :115090  
##  3rd Qu.:134864  
##  Max.   :231545
#Create a subset for all female professors
FemaleProf <- subset(ProfSalary,gender=="Female",select = c(X,rank,discipline,yrs.service,salary))
summary(FemaleProf)
##        X              rank    discipline  yrs.service        salary      
##  Min.   : 10   AssocProf:10   A:18       Min.   : 0.00   Min.   : 62884  
##  1st Qu.: 77   AsstProf :11   B:21       1st Qu.: 4.00   1st Qu.: 77250  
##  Median :149   Prof     :18              Median :10.00   Median :103750  
##  Mean   :171                             Mean   :11.56   Mean   :101002  
##  3rd Qu.:250                             3rd Qu.:17.50   3rd Qu.:117003  
##  Max.   :362                             Max.   :36.00   Max.   :161101

Disciplines are divided proportionally between the two genders, which makes it unnecessary to consider discipline as a factor in this observation. For ranks, about 2/3 of the male scholars in the study are full professors (248 of 358), while the female scholars are spread more evenly across the associate professor, assistant professor and professor categories (10, 11 and 18 of 39). This situation might or might not reveal discrimination in advancement, depending on each professor's years of service, which will be discussed in the next section. Even though the maximum salary for males is 1.44 times the maximum salary for females, the maximum years of service for males is 1.67 times the maximum years of service for females. Comparing the means in the same way leads to a similar conclusion. In this case, the data should be evaluated on a case-by-case or group-by-group basis.
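The proportions and ratios discussed above can be checked with a few lines of R. This is a sketch that re-enters the counts and maxima from the two summary() outputs by hand; the object names (rank_counts, discipline_counts and so on) are illustrative and are not part of the original analysis.

```r
# Rank counts copied from summary(MaleProf) and summary(FemaleProf)
rank_counts <- rbind(
  Male   = c(AssocProf = 54, AsstProf = 56, Prof = 248),
  Female = c(AssocProf = 10, AsstProf = 11, Prof = 18)
)
# Row-wise proportions: share of each rank within each gender
rank_share <- round(prop.table(rank_counts, margin = 1), 2)
rank_share

# Discipline counts copied from the same outputs
discipline_counts <- rbind(
  Male   = c(A = 163, B = 195),
  Female = c(A = 18,  B = 21)
)
discipline_share <- round(prop.table(discipline_counts, margin = 1), 2)
discipline_share

# Ratios of the maxima quoted in the text
max_salary_ratio  <- 231545 / 161101   # about 1.44
max_service_ratio <- 60 / 36           # about 1.67
```

The discipline shares come out nearly identical for both genders (about 46% in discipline A), while the share of full professors differs (about 69% of men vs. 46% of women).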

3.Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Please explore the many other options in R packages such as ggplot2.

#Create boxplots of salary by rank for each gender
boxplot(salary~rank,data = FemaleProf,col = "pink",main = "Boxplot of Salary vs. Rank (Female)")

boxplot(salary~rank,data = MaleProf,col = "light green",main = "Boxplot of Salary vs. Rank (Male)")

#Create histogram for salary
hist(ProfSalary$salary,col= blues9,xlab = "Salary",main = "Histogram of Professor's Salary")
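The section prompt also asks for a scatter plot. One possible sketch plots salary against years of service, coloured by gender. The data frame toy below is a hypothetical stand-in with made-up values so that the block runs on its own; in the report it would be replaced by ProfSalary with its gender, yrs.service and salary columns.

```r
# Hypothetical toy rows standing in for ProfSalary (values are made up)
toy <- data.frame(
  gender      = factor(c("Male", "Female", "Male", "Female", "Male")),
  yrs.service = c(3, 10, 20, 16, 35),
  salary      = c(80000, 78000, 110000, 105000, 120000)
)

# Look up a plotting colour for each row from its gender
point_cols <- c(Female = "pink", Male = "lightblue")[as.character(toy$gender)]

# Scatter plot of salary against years of service
plot(toy$yrs.service, toy$salary, col = point_cols, pch = 19,
     xlab = "Years of Service", ylab = "Salary",
     main = "Salary vs. Years of Service by Gender")
legend("topleft", legend = c("Female", "Male"),
       col = c("pink", "lightblue"), pch = 19)
```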

#Create vectors based on yrs of services (Male)
M_g1 <- mean(MaleProf$salary[MaleProf$yrs.service<=15])
M_g2 <- mean(MaleProf$salary[MaleProf$yrs.service>15&MaleProf$yrs.service<=30])
M_g3 <- mean(MaleProf$salary[MaleProf$yrs.service>30])
Mgroup <- c(M_g1,M_g2,M_g3)
#Create vectors based on yrs of services (Female)
F_g1 <- mean(FemaleProf$salary[FemaleProf$yrs.service<=15])
F_g2 <- mean(FemaleProf$salary[FemaleProf$yrs.service>15&FemaleProf$yrs.service<=30])
F_g3 <- mean(FemaleProf$salary[FemaleProf$yrs.service>30])
Fgroup <- c(F_g1,F_g2,F_g3)
MFlabel <- c("0-15 yrs","16-30 yrs","30+ yrs")
#Create a 2 x 3 matrix of mean salaries (rows: gender, columns: years of service group)
M_F <- rbind(Mgroup,Fgroup)
#Create barplot; ylim uses the overall maximum so that neither gender's bars are cut off
barplot(M_F,beside = TRUE, ylim = c(0,max(M_F)),names.arg = MFlabel,main = "Mean of Professors' Salary by Yrs of Service Group",col = c("light blue","pink"))

For the 0-15 yrs and 16-30 yrs service groups, male professors’ mean salary is somewhat higher. The situation is reversed in the 30+ yrs group, where female professors’ mean salary is higher.
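The grouping above can also be written more compactly with cut() and tapply(), which also makes it easy to check how many professors fall into each group (a mean over very few people, as may be the case for women with 30+ years, is not very reliable). The sketch below uses a hypothetical data frame toy with made-up values so that it runs on its own; the break points match the <=15, 16-30 and >30 groups used above.

```r
# Hypothetical toy rows standing in for ProfSalary (values are made up)
toy <- data.frame(
  gender      = c("Male", "Male", "Male", "Female", "Female", "Female"),
  yrs.service = c(3, 20, 35, 10, 16, 31),
  salary      = c(80000, 110000, 120000, 78000, 105000, 125000)
)

# Intervals are right-closed by default: (-Inf,15], (15,30], (30,Inf)
toy$service_group <- cut(toy$yrs.service,
                         breaks = c(-Inf, 15, 30, Inf),
                         labels = c("0-15 yrs", "16-30 yrs", "30+ yrs"))

# Mean salary for every gender x service group combination
group_means <- tapply(toy$salary, list(toy$gender, toy$service_group), mean)
group_means

# Number of observations behind each mean
group_sizes <- table(toy$gender, toy$service_group)
group_sizes
```

The resulting group_means matrix can be passed to barplot(group_means, beside = TRUE) in place of the hand-built matrix.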

#Create vectors based on rank (Male)
M_g4 <- mean(MaleProf$salary[MaleProf$rank=="AssocProf"])
M_g5 <- mean(MaleProf$salary[MaleProf$rank=="AsstProf"])
M_g6 <- mean(MaleProf$salary[MaleProf$rank=="Prof"])
Mgroup2 <- c(M_g4,M_g5,M_g6)
#Create vectors based on rank (Female)
F_g4 <- mean(FemaleProf$salary[FemaleProf$rank=="AssocProf"])
F_g5 <- mean(FemaleProf$salary[FemaleProf$rank=="AsstProf"])
F_g6 <- mean(FemaleProf$salary[FemaleProf$rank=="Prof"])
Fgroup2 <- c(F_g4,F_g5,F_g6)
MFlabel2 <- c("AssocProf","AsstProf","Prof")
#Create a 2 x 3 matrix of mean salaries (rows: gender, columns: rank)
M_F2 <- rbind(Mgroup2,Fgroup2)
#Create barplot; ylim is based on the rank means (M_F2) rather than the years of service means
barplot(M_F2,beside = TRUE, ylim = c(0,max(M_F2)),names.arg = MFlabel2,main = "Mean of Professors' Salary by Rank",col = c("light blue","pink"))

On average, male professors’ salary is slightly higher than female professors’ in every rank group. No substantial difference is observed.
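Whether a gap within rank is statistically significant cannot be read off a bar plot; one common way to check is a linear model of salary on rank and gender. The following is only a sketch on simulated data (generated with set.seed() and not taken from Salaries.csv), so its numbers say nothing about the real dataset; with the real data, sim would be replaced by ProfSalary.

```r
set.seed(1)  # reproducible simulated data, NOT the Salaries.csv values
n <- 60
sim <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  rank   = factor(sample(c("AsstProf", "AssocProf", "Prof"), n, replace = TRUE))
)

# Simulated salaries depend on rank only, plus random noise
base_salary <- c(AsstProf = 80000, AssocProf = 95000, Prof = 125000)
sim$salary <- unname(base_salary[as.character(sim$rank)]) + rnorm(n, sd = 10000)

# Salary modelled on rank and gender; the genderMale row estimates the
# male-female difference after accounting for rank
fit <- lm(salary ~ rank + gender, data = sim)
gender_row <- coef(summary(fit))["genderMale", ]
gender_row  # estimate, standard error, t value and p-value
```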

4.Conclusion

The dataset was collected at a college to monitor salary differences between genders. Based on the observations above, the analysis finds no gender discrimination in terms of salary level.

First, even though the maximum salary of male professors is considerably higher than that of female professors (231,545 vs. 161,101), the maximum years of service also differ considerably (60 vs. 36). In this case, no discrimination should be reported.

From the graphs, which compare the average salary for each years of service group and each rank group, we may still conclude that discrimination is not a problem on this campus.

However, the difference in the size of the two gender groups is huge (39 female vs. 358 male). While this does not necessarily indicate discrimination during the hiring process, gathering additional data and background information is recommended.