(1) Create a .CSV file that includes all of the information above.

The CSV file is saved in the Github.

(2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

(3) Analysis ( I did (2) and (3) together)

library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.5
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(knitr)
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 3.2.5
salary1<- read.csv("https://raw.githubusercontent.com/xkong100/IS607/master/Project2/Salaries.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = c("", "NA"))
kable(head(salary1))
rank discipline yrs.since.phd yrs.service sex salary
1 Prof B 19 18 Male 139750
2 Prof B 20 16 Male 173200
3 AsstProf B 4 3 Male 79750
4 Prof B 45 39 Male 115000
5 Prof B 40 41 Male 141500
6 AssocProf B 6 6 Male 97000

We want to analyze professors’ salary when they service under 20 years.

salary1 <- data.frame(salary1[,-4])
salary2 <- salary1 %>% filter(yrs.service <=20)
salary3 <- salary2 %>% arrange(salary)
kable(head(salary3))
Var.1 rank discipline yrs.service sex salary
238 AsstProf A 6 Female 63100
227 AsstProf A 1 Male 63900
65 AsstProf B 3 Male 68404
241 AsstProf A 3 Male 69200
235 AsstProf A 3 Male 69700
228 AssocProf A 7 Male 70000
salary3 %>% summarise (Mean_salary_under20yrs = mean(salary), min_salary_under20yrs = min(salary), max_salary_under20yrs= max(salary))
##   Mean_salary_under20yrs min_salary_under20yrs max_salary_under20yrs
## 1               108768.8                 63100                204000

The mean salary is $108768.8, the minimum salary is $63100, and the maximum salary is $204000 when the professors service under 20 years.

We also want to find out professor’s salary when they service more than 20 years.

salary4 <- salary1 %>% filter(yrs.service > 20)
salary5 <- salary4 %>% arrange(salary)
kable(head(salary5))
Var.1 rank discipline yrs.service sex salary
283 Prof A 51 Male 57800
124 AssocProf A 22 Female 62884
318 Prof B 45 Male 67559
300 AssocProf A 39 Male 70700
299 Prof A 43 Male 72300
232 AssocProf A 24 Female 73300
salary5 %>% summarise (Mean_salary_morethatn20yrs = mean(salary), min_salary_morethan20yrs = min(salary), max_salary_morethan20yrs= max(salary))
##   Mean_salary_morethatn20yrs min_salary_morethan20yrs
## 1                   122103.9                    57800
##   max_salary_morethan20yrs
## 1                   231545

The mean salary is $122103.9, the minimum salary is $57800, and the maximum salary is $231545 when the professors service more than 20 years.

By comparing the mean salary, we can conclude that, the service time and the salary has positive relationship.

We also want to find out the relationship between salary and genders.

salary6 <- salary1 %>% spread(sex, salary)
salary6$Female[is.na(salary6$Female)] <-0
salary6$Male[is.na(salary6$Male)] <- 0
salary7 <- within(salary6, {Female <- as.numeric(as.character(Female))
Male <-as.numeric(as.character(Male))})
kable(head(salary7))
Var.1 rank discipline yrs.service Female Male
1 Prof B 18 0 139750
2 Prof B 16 0 173200
3 AsstProf B 3 0 79750
4 Prof B 39 0 115000
5 Prof B 41 0 141500
6 AssocProf B 6 0 97000
salary7 %>% summarise (Mean_salary_Female = mean(Female),  max_salary_Female= max(Female),Mean_salary_male=mean(Male), max_salary_male=max(Male))
##   Mean_salary_Female max_salary_Female Mean_salary_male max_salary_male
## 1           9922.151            161101         103784.3          231545

From our analysis, we can conclude that Female’s salary is lower than the Male’s salary by comparing their mean salaries and max salaries.