The CSV file is saved in the Github.
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.5
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(knitr)
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 3.2.5
salary1<- read.csv("https://raw.githubusercontent.com/xkong100/IS607/master/Project2/Salaries.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = c("", "NA"))
kable(head(salary1))
| rank | discipline | yrs.since.phd | yrs.service | sex | salary | |
|---|---|---|---|---|---|---|
| 1 | Prof | B | 19 | 18 | Male | 139750 |
| 2 | Prof | B | 20 | 16 | Male | 173200 |
| 3 | AsstProf | B | 4 | 3 | Male | 79750 |
| 4 | Prof | B | 45 | 39 | Male | 115000 |
| 5 | Prof | B | 40 | 41 | Male | 141500 |
| 6 | AssocProf | B | 6 | 6 | Male | 97000 |
salary1 <- data.frame(salary1[,-4])
salary2 <- salary1 %>% filter(yrs.service <=20)
salary3 <- salary2 %>% arrange(salary)
kable(head(salary3))
| Var.1 | rank | discipline | yrs.service | sex | salary |
|---|---|---|---|---|---|
| 238 | AsstProf | A | 6 | Female | 63100 |
| 227 | AsstProf | A | 1 | Male | 63900 |
| 65 | AsstProf | B | 3 | Male | 68404 |
| 241 | AsstProf | A | 3 | Male | 69200 |
| 235 | AsstProf | A | 3 | Male | 69700 |
| 228 | AssocProf | A | 7 | Male | 70000 |
salary3 %>% summarise (Mean_salary_under20yrs = mean(salary), min_salary_under20yrs = min(salary), max_salary_under20yrs= max(salary))
## Mean_salary_under20yrs min_salary_under20yrs max_salary_under20yrs
## 1 108768.8 63100 204000
The mean salary is $108768.8, the minimum salary is $63100, and the maximum salary is $204000 when the professors service under 20 years.
salary4 <- salary1 %>% filter(yrs.service > 20)
salary5 <- salary4 %>% arrange(salary)
kable(head(salary5))
| Var.1 | rank | discipline | yrs.service | sex | salary |
|---|---|---|---|---|---|
| 283 | Prof | A | 51 | Male | 57800 |
| 124 | AssocProf | A | 22 | Female | 62884 |
| 318 | Prof | B | 45 | Male | 67559 |
| 300 | AssocProf | A | 39 | Male | 70700 |
| 299 | Prof | A | 43 | Male | 72300 |
| 232 | AssocProf | A | 24 | Female | 73300 |
salary5 %>% summarise (Mean_salary_morethatn20yrs = mean(salary), min_salary_morethan20yrs = min(salary), max_salary_morethan20yrs= max(salary))
## Mean_salary_morethatn20yrs min_salary_morethan20yrs
## 1 122103.9 57800
## max_salary_morethan20yrs
## 1 231545
The mean salary is $122103.9, the minimum salary is $57800, and the maximum salary is $231545 when the professors service more than 20 years.
By comparing the mean salary, we can conclude that, the service time and the salary has positive relationship.
salary6 <- salary1 %>% spread(sex, salary)
salary6$Female[is.na(salary6$Female)] <-0
salary6$Male[is.na(salary6$Male)] <- 0
salary7 <- within(salary6, {Female <- as.numeric(as.character(Female))
Male <-as.numeric(as.character(Male))})
kable(head(salary7))
| Var.1 | rank | discipline | yrs.service | Female | Male |
|---|---|---|---|---|---|
| 1 | Prof | B | 18 | 0 | 139750 |
| 2 | Prof | B | 16 | 0 | 173200 |
| 3 | AsstProf | B | 3 | 0 | 79750 |
| 4 | Prof | B | 39 | 0 | 115000 |
| 5 | Prof | B | 41 | 0 | 141500 |
| 6 | AssocProf | B | 6 | 0 | 97000 |
salary7 %>% summarise (Mean_salary_Female = mean(Female), max_salary_Female= max(Female),Mean_salary_male=mean(Male), max_salary_male=max(Male))
## Mean_salary_Female max_salary_Female Mean_salary_male max_salary_male
## 1 9922.151 161101 103784.3 231545
From our analysis, we can conclude that Female’s salary is lower than the Male’s salary by comparing their mean salaries and max salaries.