I chose to look at salaries for college professors using the Salaries dataset. I’ve always wanted to work at a college, so I thought the data would help me see what kind of salaries professors can make. I also thought it would be interesting to see whether there is a gender salary gap within this college.
library(RCurl)
library(tidyverse)
url <- getURL("https://rawgit.com/nschettini/CUNY-MSDS-Bridge-R/master/Salaries.csv")
salaries <- read.csv(text = url)
head(salaries)
##   X      rank discipline yrs.since.phd yrs.service  sex salary
## 1 1      Prof          B            19          18 Male 139750
## 2 2      Prof          B            20          16 Male 173200
## 3 3  AsstProf          B             4           3 Male  79750
## 4 4      Prof          B            45          39 Male 115000
## 5 5      Prof          B            40          41 Male 141500
## 6 6 AssocProf          B             6           6 Male  97000

   

Q1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

summary(salaries)
##        X              rank     discipline yrs.since.phd    yrs.service   
##  Min.   :  1   AssocProf: 64   A:181      Min.   : 1.00   Min.   : 0.00  
##  1st Qu.:100   AsstProf : 67   B:216      1st Qu.:12.00   1st Qu.: 7.00  
##  Median :199   Prof     :266              Median :21.00   Median :16.00  
##  Mean   :199                              Mean   :22.31   Mean   :17.61  
##  3rd Qu.:298                              3rd Qu.:32.00   3rd Qu.:27.00  
##  Max.   :397                              Max.   :56.00   Max.   :60.00  
##      sex          salary      
##  Female: 39   Min.   : 57800  
##  Male  :358   1st Qu.: 91000  
##               Median :107300  
##               Mean   :113706  
##               3rd Qu.:134185  
##               Max.   :231545
cat("The mean of all salaries is", mean(salaries$salary))
## The mean of all salaries is 113706.5
cat("The median of all salaries is", median(salaries$salary))
## The median of all salaries is 107300

 

Median salary by gender and rank

aggregate(salary ~ (sex + rank), salaries, median)
##      sex      rank   salary
## 1 Female AssocProf  90556.5
## 2   Male AssocProf  95626.5
## 3 Female  AsstProf  77000.0
## 4   Male  AsstProf  80182.0
## 5 Female      Prof 120257.5
## 6   Male      Prof 123996.0

 

Median Salary based on gender.

aggregate(salary ~ sex , salaries, median)
##      sex salary
## 1 Female 103750
## 2   Male 108043

 

aggregate(salary ~  rank, salaries, median)
##        rank   salary
## 1 AssocProf  95626.5
## 2  AsstProf  79800.0
## 3      Prof 123321.5

According to the data above, Professors have the highest median salary, followed by Associate Professors, then Assistant Professors. The data also shows that males have a higher median salary than females, both overall and within each rank.
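
To make the size of the gap within each rank explicit, here is a minimal sketch that recomputes it from the medians printed above. The values are copied by hand from the `aggregate()` output rather than read from the CSV, and the data frame name `medians` and its column names are my own, chosen for illustration.

```r
# Median salaries copied from the aggregate(salary ~ (sex + rank), ...) output above.
medians <- data.frame(
  rank   = c("AssocProf", "AsstProf", "Prof"),
  female = c(90556.5, 77000.0, 120257.5),
  male   = c(95626.5, 80182.0, 123996.0)
)

# Absolute gap (male minus female) and the gap as a percentage of the male median.
medians$gap     <- medians$male - medians$female
medians$gap_pct <- round(100 * medians$gap / medians$male, 1)

print(medians)
```

The gap is positive in every rank, roughly 3 to 5 percent of the male median. This is only a descriptive comparison: with 39 women in the data, it says nothing about statistical significance.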

Q2. Data Wrangling. Please perform some basic transformations.

rename1 <- rename(salaries, gender = sex, title = rank, total_salary = salary, years.since.phd = yrs.since.phd, years.service = yrs.service)

head(rename1)
##   X     title discipline years.since.phd years.service gender total_salary
## 1 1      Prof          B              19            18   Male       139750
## 2 2      Prof          B              20            16   Male       173200
## 3 3  AsstProf          B               4             3   Male        79750
## 4 4      Prof          B              45            39   Male       115000
## 5 5      Prof          B              40            41   Male       141500
## 6 6 AssocProf          B               6             6   Male        97000

 

total_males <- subset(salaries, sex == "Male")
count(total_males)
## # A tibble: 1 x 1
##       n
##   <int>
## 1   358
total_females <- subset(salaries, sex == "Female")
count(total_females)
## # A tibble: 1 x 1
##       n
##   <int>
## 1    39
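
The two counts are easier to compare as shares of the whole faculty. Here is a small sketch that uses the counts printed above (typed in by hand, not read from the CSV); the names `counts` and `share_pct` are my own.

```r
# Counts copied from the count() output above.
counts <- c(Female = 39, Male = 358)

# Share of each group as a percentage of all 397 rows.
share_pct <- round(100 * prop.table(counts), 1)

print(share_pct)
```

With the full data frame loaded, `table(salaries$sex)` should give the same counts in a single call, without creating two subsets first.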

 

Q3. Graphics

It makes sense to start by comparing the number of males and females in this dataset for this college:

ggplot(salaries, aes(x= sex)) + geom_bar(aes(fill=rank)) + 
  ggtitle("# of Female vs. Male Professors, by rank") +
  xlab("Female vs. Male") +
  ylab("Count")

 

The graph above shows that males far outnumber females in this dataset.

ggplot(salaries, aes(x = salary, fill = rank)) + geom_histogram(bins = 10) +
  theme_dark() +
  xlab("Salaries") +
  ylab("Freq. of Salaries") +
  ggtitle("Histogram of Salaries for Professors") +
  geom_vline(xintercept=median(salaries$salary), col='yellow')+
  geom_vline(xintercept=mean(salaries$salary), col='orange')

This histogram shows the distribution of salaries in our dataset, with the median highlighted in yellow and the mean highlighted in orange.
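
The mean sits to the right of the median, which suggests the distribution is skewed toward higher salaries. As a rough check, this sketch compares the numbers from the `summary(salaries)` output (copied by hand); the variable names are my own.

```r
# Values copied from the summary(salaries) output above.
q1  <- 91000
med <- 107300
q3  <- 134185
avg <- 113706.5

# A mean above the median hints at a longer right tail.
mean_minus_median <- avg - med

# Compare the spread of the lower and upper halves of the middle 50%.
lower_spread <- med - q1
upper_spread <- q3 - med

cat("Mean minus median:", mean_minus_median, "\n")
cat("Q1 to median:", lower_spread, " Median to Q3:", upper_spread, "\n")
```

The upper half of the interquartile range is wider than the lower half, which is consistent with a right-skewed distribution, although these two comparisons are informal checks and not a formal test.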

ggplot(salaries, aes(y = salary, x = rank)) + geom_boxplot(aes(fill=factor(rank))) +
  theme_dark() +
  xlab("Rank") +
  ylab("Salary") +
  ggtitle("Median Salary for Different Ranks")

This boxplot shows the three different ranks of our data. This shows visually that Professors make the most, followed by Associate Professors, and lastly Assistant Professors.

 ggplot(salaries, aes(x= yrs.service, y= salary)) + 
   geom_point(aes(color = rank, shape = sex), size = 4) +
   theme_dark() +
  xlab("Years of Service") +
  ylab("Salary") +
  ggtitle("Years of Service vs. Salary") +
  geom_smooth(method=lm)

 ggplot(salaries, aes(x= yrs.since.phd, y= salary)) + 
   geom_point(aes(color = rank), size = 4) +
   theme_dark()+
  xlab("Years since PhD was obtained") +
  ylab("Salary") +
  ggtitle("Years since PhD Obtained vs. Salary") +
  geom_smooth(method=lm)
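
The line that `geom_smooth(method=lm)` draws is a simple linear regression of salary on the x variable. To show what that fit looks like as numbers, here is a sketch using only the six rows printed by `head(salaries)` at the top. That is an assumption made to keep the example self-contained, so the coefficients are not representative of the full 397-row dataset. The name `sample_rows` is my own.

```r
# The six rows shown by head(salaries), typed in by hand (not the full dataset).
sample_rows <- data.frame(
  yrs.service = c(18, 16, 3, 39, 41, 6),
  salary      = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Same kind of model that geom_smooth(method = lm) fits by default: y ~ x.
fit <- lm(salary ~ yrs.service, data = sample_rows)

# Intercept and slope (estimated salary change per extra year of service).
print(coef(fit))
```

Running the same call with `data = salaries` would give the slope behind the line in the first scatter plot, and `summary(fit)` would show how much of the variation in salary the fit explains.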

   

Conclusions:

According to the data, the higher your rank within the college, the higher your median salary: professors make more than associate professors, and associate professors make more than assistant professors.
The median salary across all ranks at this college is about 107k, which is a fairly decent salary. Males make slightly more than females at this college (about 108k vs. 104k median salary, M vs. F). However, this college doesn’t employ many female professors (39 women vs. 358 men).
It would be interesting to see a dataset for colleges nationwide and local (CUNY) and rerun this analysis to see how they compare.