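urlfile <- "https://raw.githubusercontent.com/juanellemarks/Salaries-for-professor/master/salaries%20for%20professor.csv"
# stringsAsFactors = TRUE keeps rank, discipline and sex as factors, which the summary() output below relies on (the default became FALSE in R 4.0.0)
dataset <- read.csv(urlfile, stringsAsFactors = TRUE)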
head(dataset, 20)
## rank discipline yrs.since.phd yrs.service sex salary
## 1 Prof B 19 18 Male 139750
## 2 Prof B 20 16 Male 173200
## 3 AsstProf B 4 3 Male 79750
## 4 Prof B 45 39 Male 115000
## 5 Prof B 40 41 Male 141500
## 6 AssocProf B 6 6 Male 97000
## 7 Prof B 30 23 Male 175000
## 8 Prof B 45 45 Male 147765
## 9 Prof B 21 20 Male 119250
## 10 Prof B 18 18 Female 129000
## 11 AssocProf B 12 8 Male 119800
## 12 AsstProf B 7 2 Male 79800
## 13 AsstProf B 1 1 Male 77700
## 14 AsstProf B 2 0 Male 78000
## 15 Prof B 20 18 Male 104800
## 16 Prof B 12 3 Male 117150
## 17 Prof B 19 20 Male 101000
## 18 Prof A 38 34 Male 103450
## 19 Prof A 37 23 Male 124750
## 20 Prof A 39 36 Female 137000
summary(dataset)
## rank discipline yrs.since.phd yrs.service sex
## AssocProf: 64 A:181 Min. : 1.00 Min. : 0.00 Female: 39
## AsstProf : 67 B:216 1st Qu.:12.00 1st Qu.: 7.00 Male :358
## Prof :266 Median :21.00 Median :16.00
## Mean :22.31 Mean :17.61
## 3rd Qu.:32.00 3rd Qu.:27.00
## Max. :56.00 Max. :60.00
## salary
## Min. : 57800
## 1st Qu.: 91000
## Median :107300
## Mean :113706
## 3rd Qu.:134185
## Max. :231545
# mean of the attribute 'salary'
mean(dataset$salary)
## [1] 113706.5
# mean of the attribute 'yrs.service'
mean(dataset$yrs.service)
## [1] 17.61461
# median of the attribute 'salary'
median(dataset$salary)
## [1] 107300
# median of the attribute 'yrs.service'
median(dataset$yrs.service)
## [1] 16
# Subset of the first 75 rows and columns 1, 4, 5 and 6 (rank, yrs.service, sex, salary) of the original data set
dataset2 <- dataset[1:75, c("rank", "yrs.service", "sex", "salary")]
head(dataset2,20)
## rank yrs.service sex salary
## 1 Prof 18 Male 139750
## 2 Prof 16 Male 173200
## 3 AsstProf 3 Male 79750
## 4 Prof 39 Male 115000
## 5 Prof 41 Male 141500
## 6 AssocProf 6 Male 97000
## 7 Prof 23 Male 175000
## 8 Prof 45 Male 147765
## 9 Prof 20 Male 119250
## 10 Prof 18 Female 129000
## 11 AssocProf 8 Male 119800
## 12 AsstProf 2 Male 79800
## 13 AsstProf 1 Male 77700
## 14 AsstProf 0 Male 78000
## 15 Prof 18 Male 104800
## 16 Prof 3 Male 117150
## 17 Prof 20 Male 101000
## 18 Prof 34 Male 103450
## 19 Prof 23 Male 124750
## 20 Prof 36 Female 137000
names(dataset2)<-c("Post", "Service.Years", "Gender","Pay")
head(dataset2,20)
## Post Service.Years Gender Pay
## 1 Prof 18 Male 139750
## 2 Prof 16 Male 173200
## 3 AsstProf 3 Male 79750
## 4 Prof 39 Male 115000
## 5 Prof 41 Male 141500
## 6 AssocProf 6 Male 97000
## 7 Prof 23 Male 175000
## 8 Prof 45 Male 147765
## 9 Prof 20 Male 119250
## 10 Prof 18 Female 129000
## 11 AssocProf 8 Male 119800
## 12 AsstProf 2 Male 79800
## 13 AsstProf 1 Male 77700
## 14 AsstProf 0 Male 78000
## 15 Prof 18 Male 104800
## 16 Prof 3 Male 117150
## 17 Prof 20 Male 101000
## 18 Prof 34 Male 103450
## 19 Prof 23 Male 124750
## 20 Prof 36 Female 137000
summary(dataset2)
## Post Service.Years Gender Pay
## AssocProf:12 Min. : 0.00 Female:10 Min. : 68404
## AsstProf :16 1st Qu.: 4.00 Male :65 1st Qu.: 89890
## Prof :47 Median :15.00 Median :103450
## Mean :16.01 Mean :108656
## 3rd Qu.:24.00 3rd Qu.:122275
## Max. :45.00 Max. :231545
# mean of the attribute 'Pay'
mean(dataset2$Pay)
## [1] 108655.7
# mean of the attribute 'Service.Years'
mean(dataset2$Service.Years)
## [1] 16.01333
# median of the attribute 'Pay'
median(dataset2$Pay)
## [1] 103450
# median of the attribute 'Service.Years'
median(dataset2$Service.Years)
## [1] 15
The mean salary of the original data set ($113 706.50) is approximately $5 051 higher than the mean salary of the new data set ($108 655.70). The mean years of service differ by approximately 1.6 years, with the original data set having the larger mean of 17.6 years. The median salary is $107 300 in the original data set and $103 450 in the new data set, a difference of $3 850. The median years of service differ by 1 year, with the original data set having the larger median of 16.
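The differences quoted above can be checked with a short sketch. It does not download the data again; it assumes the means and medians printed earlier, typed in by hand. The vector names `original`, `subset75` and `diffs` are illustrative and are not part of the original analysis.

```r
# Summary statistics copied by hand from the outputs printed earlier
original <- c(mean_salary = 113706.5, mean_service = 17.61461,
              median_salary = 107300, median_service = 16)
subset75 <- c(mean_salary = 108655.7, mean_service = 16.01333,
              median_salary = 103450, median_service = 15)

# Element-wise difference: original data set minus the 75-row subset
diffs <- original - subset75
print(round(diffs, 2))
```

A positive entry in `diffs` means the original data set has the larger value for that statistic.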
require(plyr)
## Loading required package: plyr
# Rename the values in column 1 (Post) to their full titles
dataset2$Post <- mapvalues(dataset2$Post,
                           from = c("AssocProf", "AsstProf", "Prof"),
                           to = c("AssociateProfessor", "AssistantProfessor", "Professor"))
head(dataset2,20)
## Post Service.Years Gender Pay
## 1 Professor 18 Male 139750
## 2 Professor 16 Male 173200
## 3 AssistantProfessor 3 Male 79750
## 4 Professor 39 Male 115000
## 5 Professor 41 Male 141500
## 6 AssociateProfessor 6 Male 97000
## 7 Professor 23 Male 175000
## 8 Professor 45 Male 147765
## 9 Professor 20 Male 119250
## 10 Professor 18 Female 129000
## 11 AssociateProfessor 8 Male 119800
## 12 AssistantProfessor 2 Male 79800
## 13 AssistantProfessor 1 Male 77700
## 14 AssistantProfessor 0 Male 78000
## 15 Professor 18 Male 104800
## 16 Professor 3 Male 117150
## 17 Professor 20 Male 101000
## 18 Professor 34 Male 103450
## 19 Professor 23 Male 124750
## 20 Professor 36 Female 137000
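The renaming step above could also be done without plyr. The following is a minimal base R sketch that uses a named lookup vector on a small made-up vector of ranks. The names `post`, `lookup` and `renamed` are hypothetical and stand in for the `Post` column; this is an alternative, not the method used in the analysis.

```r
# Toy stand-in for the Post column (hypothetical values, not the real data)
post <- c("Prof", "AsstProf", "AssocProf", "Prof")

# Named lookup vector: names are the old labels, values are the new labels
lookup <- c(AssocProf = "AssociateProfessor",
            AsstProf  = "AssistantProfessor",
            Prof      = "Professor")

# Index the lookup by the old labels, drop the names, and store as a factor
renamed <- factor(unname(lookup[as.character(post)]))
print(renamed)
```

Wrapping the input in `as.character()` lets the same line work whether the column is stored as a factor or as character strings. Any label that is missing from `lookup` would become `NA`, whereas `mapvalues` leaves unmatched values unchanged.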