RBridge Week Two Assignment

Question One

Part One: Read the .csv file into R from GitHub and use the summary function to get an overview:

urlfile<-"https://raw.githubusercontent.com/juanellemarks/Salaries-for-professor/master/salaries%20for%20professor.csv"

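# stringsAsFactors = TRUE keeps text columns as factors (the default became FALSE in R 4.0.0);
# the per-level counts in the summary output below rely on factor columns.
dataset<-read.csv(urlfile, stringsAsFactors = TRUE)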
head(dataset, 20)

##         rank discipline yrs.since.phd yrs.service    sex salary
## 1       Prof          B            19          18   Male 139750
## 2       Prof          B            20          16   Male 173200
## 3   AsstProf          B             4           3   Male  79750
## 4       Prof          B            45          39   Male 115000
## 5       Prof          B            40          41   Male 141500
## 6  AssocProf          B             6           6   Male  97000
## 7       Prof          B            30          23   Male 175000
## 8       Prof          B            45          45   Male 147765
## 9       Prof          B            21          20   Male 119250
## 10      Prof          B            18          18 Female 129000
## 11 AssocProf          B            12           8   Male 119800
## 12  AsstProf          B             7           2   Male  79800
## 13  AsstProf          B             1           1   Male  77700
## 14  AsstProf          B             2           0   Male  78000
## 15      Prof          B            20          18   Male 104800
## 16      Prof          B            12           3   Male 117150
## 17      Prof          B            19          20   Male 101000
## 18      Prof          A            38          34   Male 103450
## 19      Prof          A            37          23   Male 124750
## 20      Prof          A            39          36 Female 137000

summary(dataset)

##         rank     discipline yrs.since.phd    yrs.service        sex     
##  AssocProf: 64   A:181      Min.   : 1.00   Min.   : 0.00   Female: 39  
##  AsstProf : 67   B:216      1st Qu.:12.00   1st Qu.: 7.00   Male  :358  
##  Prof     :266              Median :21.00   Median :16.00               
##                             Mean   :22.31   Mean   :17.61               
##                             3rd Qu.:32.00   3rd Qu.:27.00               
##                             Max.   :56.00   Max.   :60.00               
##      salary      
##  Min.   : 57800  
##  1st Qu.: 91000  
##  Median :107300  
##  Mean   :113706  
##  3rd Qu.:134185  
##  Max.   :231545
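
The per-level counts that summary reports for rank, discipline and sex appear only when those columns are factors. The following is a minimal sketch of how the stringsAsFactors argument of read.csv controls that. The three-row CSV text is a hand-typed excerpt of the data shown above, and the names csv_text, as_factor and as_char are illustrative only:

```r
# Hand-typed excerpt of the first three rows (three of the six columns),
# so the example runs without downloading the file.
csv_text <- "rank,sex,salary
Prof,Male,139750
Prof,Male,173200
AsstProf,Male,79750"

# Text columns become factors: summary() will count each level.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Text columns stay character: summary() will only report length and class.
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(summary(as_factor))
print(summary(as_char))
```

If the summary of the full data set shows "Length / Class / Mode" for rank instead of counts, the column was probably read as character rather than factor.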

Part Two: Display the mean and median for at least two attributes of the dataset:

# mean of the attribute 'salary'
mean(dataset$salary)

## [1] 113706.5

# mean of the attribute 'yrs.service'
mean(dataset$yrs.service)

## [1] 17.61461

# median of the attribute 'salary'
median(dataset$salary)

## [1] 107300

# median of the attribute 'yrs.service'
median(dataset$yrs.service)

## [1] 16
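
The four separate calls above could also be collapsed into one. This is only a sketch of an alternative, not part of the assignment's required approach: it uses sapply on a small data frame retyped from the first five rows shown earlier, and the names sample_rows and stats are illustrative:

```r
# First five rows of yrs.service and salary, retyped by hand from the
# head() output above so the example is self-contained.
sample_rows <- data.frame(
  yrs.service = c(18, 16, 3, 39, 41),
  salary      = c(139750, 173200, 79750, 115000, 141500)
)

# Apply mean() and median() to every column; the result is a matrix
# with one row per statistic and one column per attribute.
stats <- sapply(sample_rows, function(col) c(mean = mean(col), median = median(col)))
print(stats)
```

On the full data set the same call would be applied to dataset[, c("yrs.service", "salary")].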

Question Two

Create a new data frame with a subset of the columns and rows.

# Subset of the first 75 rows and columns 1, 4, 5 and 6 of the original data set.
# Indexing a data frame with [rows, columns] already returns a data frame.
dataset2<-dataset[1:75, c(1,4,5,6)]
head(dataset2,20)

##         rank yrs.service    sex salary
## 1       Prof          18   Male 139750
## 2       Prof          16   Male 173200
## 3   AsstProf           3   Male  79750
## 4       Prof          39   Male 115000
## 5       Prof          41   Male 141500
## 6  AssocProf           6   Male  97000
## 7       Prof          23   Male 175000
## 8       Prof          45   Male 147765
## 9       Prof          20   Male 119250
## 10      Prof          18 Female 129000
## 11 AssocProf           8   Male 119800
## 12  AsstProf           2   Male  79800
## 13  AsstProf           1   Male  77700
## 14  AsstProf           0   Male  78000
## 15      Prof          18   Male 104800
## 16      Prof           3   Male 117150
## 17      Prof          20   Male 101000
## 18      Prof          34   Male 103450
## 19      Prof          23   Male 124750
## 20      Prof          36 Female 137000
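
Selecting columns by position works, but it breaks silently if the column order ever changes. A hedged sketch of two alternatives, selecting columns by name and selecting rows by a condition, is shown below. The six-row data frame is retyped from the output above, and the threshold of 10 years is an arbitrary value chosen for illustration:

```r
# First six rows of the original data, retyped by hand.
sample_rows <- data.frame(
  rank          = c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  discipline    = rep("B", 6),
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  yrs.service   = c(18, 16, 3, 39, 41, 6),
  sex           = rep("Male", 6),
  salary        = c(139750, 173200, 79750, 115000, 141500, 97000)
)

keep_cols <- c("rank", "yrs.service", "sex", "salary")

# Same idea as the positional subset, but columns are picked by name.
by_name <- sample_rows[1:4, keep_cols]

# Rows picked by a logical condition instead of a fixed range
# (the cut-off of 10 years is an assumption made for this example).
by_condition <- sample_rows[sample_rows$yrs.service > 10, keep_cols]

print(by_name)
print(by_condition)
```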

Question Three

Create new column names for the new data frame:

names(dataset2)<-c("Post", "Service.Years", "Gender","Pay")
head(dataset2,20)

##         Post Service.Years Gender    Pay
## 1       Prof            18   Male 139750
## 2       Prof            16   Male 173200
## 3   AsstProf             3   Male  79750
## 4       Prof            39   Male 115000
## 5       Prof            41   Male 141500
## 6  AssocProf             6   Male  97000
## 7       Prof            23   Male 175000
## 8       Prof            45   Male 147765
## 9       Prof            20   Male 119250
## 10      Prof            18 Female 129000
## 11 AssocProf             8   Male 119800
## 12  AsstProf             2   Male  79800
## 13  AsstProf             1   Male  77700
## 14  AsstProf             0   Male  78000
## 15      Prof            18   Male 104800
## 16      Prof             3   Male 117150
## 17      Prof            20   Male 101000
## 18      Prof            34   Male 103450
## 19      Prof            23   Male 124750
## 20      Prof            36 Female 137000
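
Assigning a full vector to names() replaces every column name at once and depends on the column order. When only one column needs a new name, matching on the old name is a safer option. A minimal sketch, using a two-row data frame retyped from the output above (the name df is illustrative):

```r
# Two rows with the original column names, retyped by hand.
df <- data.frame(
  rank        = c("Prof", "AsstProf"),
  yrs.service = c(18, 3),
  sex         = c("Male", "Male"),
  salary      = c(139750, 79750)
)

# Rename only the column currently called "salary"; the others are untouched.
names(df)[names(df) == "salary"] <- "Pay"
print(names(df))
```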

Question Four

Part One: Use the summary function to get an overview of the new data frame:

summary(dataset2)

##         Post    Service.Years      Gender        Pay        
##  AssocProf:12   Min.   : 0.00   Female:10   Min.   : 68404  
##  AsstProf :16   1st Qu.: 4.00   Male  :65   1st Qu.: 89890  
##  Prof     :47   Median :15.00               Median :103450  
##                 Mean   :16.01               Mean   :108656  
##                 3rd Qu.:24.00               3rd Qu.:122275  
##                 Max.   :45.00               Max.   :231545

Part Two: Print the mean and median for the same two attributes as in the original data set:

# mean of the attribute 'Pay'
mean(dataset2$Pay)

## [1] 108655.7

# mean of the attribute 'Service.Years'
mean(dataset2$Service.Years)

## [1] 16.01333

# median of the attribute 'Pay'
median(dataset2$Pay)

## [1] 103450

# median of the attribute 'Service.Years'
median(dataset2$Service.Years)

## [1] 15

Part Three: Comparison of the mean and median of the two attributes in the original and new data sets:

The mean salary of the original data set is approximately $5 051 higher than that of the new data set (about $113 707 versus $108 656). The mean years of service differ by approximately 1.6 years, with the original data set having the larger mean (17.6 versus 16.0 years). The median salary is $107 300 in the original data set and $103 450 in the new data set, a difference of $3 850. The median years of service differ by 1 year, with the original data set again having the larger value (16 versus 15).
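
The differences discussed above can be recomputed directly from the values printed earlier. The sketch below hard-codes those printed (rounded) results rather than reloading the data, and the vector names original, subset75 and differences are illustrative:

```r
# Values copied from the printed mean() and median() output above.
original <- c(mean_salary = 113706.5, mean_service = 17.61461,
              median_salary = 107300, median_service = 16)
subset75 <- c(mean_salary = 108655.7, mean_service = 16.01333,
              median_salary = 103450, median_service = 15)

# Element-wise difference: original data set minus the 75-row subset.
differences <- original - subset75
print(round(differences, 2))
```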

Question Five

Rename at least three values in a column from the new dataset so that every value in that column is renamed:

require(plyr)

## Loading required package: plyr

# Rename every value in column 1 (Post): 'from' lists the old values, 'to' the new ones.
x<-mapvalues(dataset2[,1], from=c("AssocProf","AsstProf","Prof"), to=c("AssociateProfessor", "AssistantProfessor", "Professor"))
dataset2[,1]<-x
head(dataset2,20)

##                  Post Service.Years Gender    Pay
## 1           Professor            18   Male 139750
## 2           Professor            16   Male 173200
## 3  AssistantProfessor             3   Male  79750
## 4           Professor            39   Male 115000
## 5           Professor            41   Male 141500
## 6  AssociateProfessor             6   Male  97000
## 7           Professor            23   Male 175000
## 8           Professor            45   Male 147765
## 9           Professor            20   Male 119250
## 10          Professor            18 Female 129000
## 11 AssociateProfessor             8   Male 119800
## 12 AssistantProfessor             2   Male  79800
## 13 AssistantProfessor             1   Male  77700
## 14 AssistantProfessor             0   Male  78000
## 15          Professor            18   Male 104800
## 16          Professor             3   Male 117150
## 17          Professor            20   Male 101000
## 18          Professor            34   Male 103450
## 19          Professor            23   Male 124750
## 20          Professor            36 Female 137000
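
The same renaming can be done without plyr. The sketch below is a base R alternative, not the approach used above: it relies on a named lookup vector, with the first six Post values retyped from the earlier output and the names post, lookup and renamed chosen for illustration:

```r
# First six values of the Post column before renaming, retyped by hand.
post <- factor(c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"))

# Named vector: names are the old values, elements are the new values.
lookup <- c(AssocProf = "AssociateProfessor",
            AsstProf  = "AssistantProfessor",
            Prof      = "Professor")

# as.character() matters here: indexing with a factor would use its
# integer codes rather than its labels.
renamed <- factor(unname(lookup[as.character(post)]))
print(renamed)
```

For the data frame above, the equivalent call would index lookup with as.character(dataset2[,1]) and assign the result back to that column.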