Description of the Assignment

Exploratory Data Analysis in R. Choose an interesting dataset and use R graphics to describe the data. You may use base R graphics or a graphics package of your choice. Include at least one example of each of the following:
• histogram
• boxplot
• scatterplot

Brief description of the selected dataset

Salaries for Professors - The 2008-09 nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S.

The data were collected as part of the on-going effort of the college’s administration to monitor salary differences between male and female faculty members.

The dataset is included in the car package (since car 3.0 it ships in carData, which car loads automatically).

Load the ggplot2 and car packages

require(ggplot2)
## Loading required package: ggplot2
require(car)
## Loading required package: car

Subset the data to exclude faculty with less than 1 year of service

# Keep only faculty with at least one year of service
salDF <- subset(Salaries, yrs.service > 0)
# Sort by salary, lowest first
salDF <- salDF[order(salDF$salary), ]
head(salDF)
##          rank discipline yrs.since.phd yrs.service    sex salary
## 283      Prof          A            51          51   Male  57800
## 124 AssocProf          A            25          22 Female  62884
## 238  AsstProf          A             7           6 Female  63100
## 227  AsstProf          A             3           1   Male  63900
## 318      Prof          B            46          45   Male  67559
## 65   AsstProf          B             4           3   Male  68404

Show frequency distributions of salary and years of service

# Histograms
hist(salDF$salary
     ,main = "Distribution of Salaries for Professors \n 2008-2009 Academic Year"
     ,ylab = "Frequency of Salaries"
     ,xlab = "Salaries \n (nine-month salary total, in dollars)"
     ,col = "grey")

hist(salDF$yrs.service
     ,main = "Distribution of Years of Service"
     ,xlab = "Years of Service"
     ,col = "tan")
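
To inspect the bin counts behind a histogram without drawing it, hist() accepts plot = FALSE. The sketch below is illustrative only: it uses the six lowest salaries from the head(salDF) output rather than the full dataset, and the 5,000-dollar bin width is an assumption chosen for the example.

```r
# Six lowest salaries, copied from the head(salDF) output (not the full data)
low_sal <- c(57800, 62884, 63100, 63900, 67559, 68404)

# Explicit break points; the 5,000-dollar bin width is an arbitrary choice
h <- hist(low_sal, breaks = seq(55000, 70000, by = 5000), plot = FALSE)

# One row per bin: lower edge, upper edge, and number of salaries in the bin
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```

Passing the same breaks argument to the plotting calls would make the drawn bars match these counts.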

Show median salary across the 3 ranks

# Boxplot using ggplot
# Refer to columns by bare name inside aes(); ggplot looks them up in salDF
ggplot(salDF
       ,aes(x = rank, y = salary)) +
       geom_boxplot() +
       xlab("Rank") +
       ylab("Salary") +
       ggtitle("Median Salary In Different Ranks")
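
The centre line of each box is the group median, which can also be computed directly with tapply(). The sketch below is a minimal illustration, assuming only the six lowest-paid rows from the head(salDF) output, so the numbers are not the medians of the full dataset.

```r
# Six lowest-paid rows, copied from the head(salDF) output (not the full data)
demo <- data.frame(
  rank   = factor(c("Prof", "AssocProf", "AsstProf", "AsstProf", "Prof", "AsstProf"),
                  levels = c("AsstProf", "AssocProf", "Prof")),
  salary = c(57800, 62884, 63100, 63900, 67559, 68404)
)

# Median salary per rank: the value drawn as the centre line of each box
med_by_rank <- tapply(demo$salary, demo$rank, median)
print(med_by_rank)
```

Running the same tapply() call on salDF$salary and salDF$rank would give the medians that the boxplot displays.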

Show the relationship, if any, between years of service and salary across the 3 ranks

# Scatterplot using ggplot
ggplot(salDF, aes(yrs.service,salary)) + 
      geom_point() + 
      facet_wrap(~rank) +
      xlab("Years of Service") +
      ylab("Salary") +
      ggtitle("Relationship Between Years of Service and Salary")
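
A per-rank correlation coefficient is one way to put a number on what the faceted scatterplot shows. The following is a sketch, not part of the original analysis: cor_by_rank is a hypothetical helper, and the toy data frame is made up so that the expected result is obvious.

```r
# Pearson correlation between years of service and salary within each rank.
# Assumes columns named rank, yrs.service and salary, as in salDF.
cor_by_rank <- function(df) {
  groups <- split(df, df$rank, drop = TRUE)  # drop ranks with no rows
  sapply(groups, function(g) cor(g$yrs.service, g$salary))
}

# Made-up rows, for illustration only (not taken from the Salaries data)
toy <- data.frame(
  rank        = c("AsstProf", "AsstProf", "AsstProf", "Prof", "Prof", "Prof"),
  yrs.service = c(1, 2, 3, 10, 20, 30),
  salary      = c(72000, 74000, 76000, 120000, 110000, 100000)
)
print(cor_by_rank(toy))
```

Calling cor_by_rank(salDF) would apply the same calculation to the real data.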

Show other distributions of categorical values

# Summary count per Gender and Rank
table(salDF$sex, salDF$rank)
##         
##          AsstProf AssocProf Prof
##   Female        9        10   17
##   Male         48        54  248
# Summary count per Discipline (A=Theoretical Dept, B=Applied Dept) and Gender
table(salDF$discipline, salDF$sex)
##    
##     Female Male
##   A     16  161
##   B     20  189
# Summary count per Discipline and Rank
table(salDF$discipline, salDF$rank)
##    
##     AsstProf AssocProf Prof
##   A       21        26  130
##   B       36        38  135
# Average Years of Service per Rank
aggregate(yrs.service ~ rank, salDF, mean)
##        rank yrs.service
## 1  AsstProf    2.789474
## 2 AssocProf   11.953125
## 3      Prof   22.901887
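
Raw counts are hard to compare across ranks of different size, so proportions can help. As a minimal sketch, the block below re-enters the sex-by-rank counts printed by table(salDF$sex, salDF$rank) and converts them to within-rank shares using prop.table().

```r
# Counts copied from the table(salDF$sex, salDF$rank) output
counts <- matrix(c( 9, 10,  17,
                   48, 54, 248),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex  = c("Female", "Male"),
                                 rank = c("AsstProf", "AssocProf", "Prof")))

# Column proportions: within each rank, the share of women and of men
share <- prop.table(counts, margin = 2)
print(round(share, 3))
```

On the real data the same result comes from prop.table(table(salDF$sex, salDF$rank), margin = 2).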

Lastly, show median salary by gender and rank

# Use bare column names in the formula so the result column is named "salary"
aggregate(salary ~ sex + rank, salDF, median)
##      sex      rank   salary
## 1 Female  AsstProf  77500.0
## 2   Male  AsstProf  80027.5
## 3 Female AssocProf  90556.5
## 4   Male AssocProf  95626.5
## 5 Female      Prof 122960.0
## 6   Male      Prof 123996.0
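
The gap between the male and female medians in each rank follows directly from the aggregate() output. The sketch below re-enters those six medians by hand and subtracts them; it describes the medians only and says nothing about statistical significance.

```r
# Medians copied from the aggregate() output, in the same row order
med <- data.frame(
  sex    = rep(c("Female", "Male"), times = 3),
  rank   = rep(c("AsstProf", "AssocProf", "Prof"), each = 2),
  salary = c(77500.0, 80027.5, 90556.5, 95626.5, 122960.0, 123996.0)
)

# Rows are ordered by rank within each sex, so the two vectors line up
male   <- med$salary[med$sex == "Male"]
female <- med$salary[med$sex == "Female"]

# Male minus female median salary, in dollars, per rank
gap <- setNames(male - female, unique(med$rank))
print(gap)
```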