Exploratory Data Analysis in R
Choose an interesting dataset and use R graphics to describe the data. You may use base R graphics or a graphics package of your choice. Include at least one example of each of the following:
• histogram
• boxplot
• scatterplot
Salaries for Professors: the 2008-09 nine-month academic salaries for Assistant Professors, Associate Professors, and Professors at a college in the U.S.
The data were collected as part of the college administration's ongoing effort to monitor salary differences between male and female faculty members.
The dataset is included in the car package (in car 3.0 and later it ships in the companion carData package, which car loads automatically).
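Before plotting, it can help to check what the dataset contains. The following is a minimal sketch, assuming the car package is installed and exposes `Salaries` as described above. It uses only base R inspection functions.

```r
library(car)  # assumption: attaching car makes the Salaries data frame available

# Column types and the first few values of each variable
str(Salaries)

# Numeric summaries for numeric columns, level counts for factors
summary(Salaries)

# Number of missing values per column, worth checking before plotting
colSums(is.na(Salaries))
```

The `str()` output shows which columns are factors (rank, discipline, sex) and which are numeric, which in turn suggests which plot types suit each variable.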
require(ggplot2)
## Loading required package: ggplot2
require(car)
## Loading required package: car
# Keep only faculty with more than zero years of service
salDF <- subset(Salaries, yrs.service > 0)
# Sort by salary, lowest first
salDF <- salDF[order(salDF$salary), ]
head(salDF)
## rank discipline yrs.since.phd yrs.service sex salary
## 283 Prof A 51 51 Male 57800
## 124 AssocProf A 25 22 Female 62884
## 238 AsstProf A 7 6 Female 63100
## 227 AsstProf A 3 1 Male 63900
## 318 Prof B 46 45 Male 67559
## 65 AsstProf B 4 3 Male 68404
# Histograms
hist(salDF$salary
,main = "Distribution of Salaries for Professors \n 2008-2009 Academic Year"
,ylab = "Frequency of Salaries"
,xlab = "Salaries \n (nine-month salary total, in dollars)"
,col = "grey")
hist(salDF$yrs.service
,main = "Distribution of Years of Service"
,xlab = "Years of Service"
,col = "tan")
# Boxplot using ggplot
ggplot(salDF
,aes(x = rank, y = salary)) +
geom_boxplot() +
xlab("Rank") +
ylab("Salary") +
ggtitle("Median Salary In Different Ranks")
# Scatterplot using ggplot
ggplot(salDF, aes(yrs.service,salary)) +
geom_point() +
facet_wrap(~rank) +
xlab("Years of Service") +
ylab("Salary") +
ggtitle("Relationship Between Years of Service and Salary")
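The faceted scatterplot shows the relationship visually. One possible way to make it easier to read is to overlay a straight-line fit in each panel and compute the correlation within each rank. This is a sketch only: the linear fit and the Pearson correlation are illustrative choices added here, not part of the original analysis, and the names `p_trend` and `r_by_rank` are hypothetical.

```r
library(ggplot2)
library(car)  # assumption: provides the Salaries data frame

# Same filter as in the analysis: drop faculty with zero years of service
salDF <- subset(Salaries, yrs.service > 0)

# Scatterplot per rank with a least-squares line (illustrative choice)
p_trend <- ggplot(salDF, aes(x = yrs.service, y = salary)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~rank) +
  xlab("Years of Service") +
  ylab("Salary") +
  ggtitle("Years of Service vs. Salary, with Linear Trend per Rank")
print(p_trend)

# Pearson correlation between years of service and salary within each rank
r_by_rank <- sapply(split(salDF, salDF$rank),
                    function(d) cor(d$yrs.service, d$salary))
print(round(r_by_rank, 2))
```

A correlation near zero within a rank would suggest that years of service alone explains little of the salary variation at that rank.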
# Summary count per Gender and Rank
table(salDF$sex, salDF$rank)
##
## AsstProf AssocProf Prof
## Female 9 10 17
## Male 48 54 248
# Summary count per Discipline (A=Theoretical Dept, B=Applied Dept) and Gender
table(salDF$discipline, salDF$sex)
##
## Female Male
## A 16 161
## B 20 189
# Summary count per Discipline and Rank
table(salDF$discipline, salDF$rank)
##
## AsstProf AssocProf Prof
## A 21 26 130
## B 36 38 135
# Average Years of Service per Rank
aggregate(yrs.service ~ rank, salDF, mean)
## rank yrs.service
## 1 AsstProf 2.789474
## 2 AssocProf 11.953125
## 3 Prof 22.901887
# Median Salary per Gender and Rank
aggregate(salary ~ sex + rank, salDF, median)
## sex rank salary
## 1 Female AsstProf 77500.0
## 2 Male AsstProf 80027.5
## 3 Female AssocProf 90556.5
## 4 Male AssocProf 95626.5
## 5 Female Prof 122960.0
## 6 Male Prof 123996.0
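Because the data were collected to monitor salary differences between male and female faculty, the medians above could also be shown graphically. The following is a minimal sketch, assuming the same filtered data frame as in the analysis. The plot name `p_sex` and the choice of grouped boxplots are illustrative, not the original author's method.

```r
library(ggplot2)
library(car)  # assumption: provides the Salaries data frame

# Same filter as in the analysis: drop faculty with zero years of service
salDF <- subset(Salaries, yrs.service > 0)

# Side-by-side boxplots of salary by sex within each rank
p_sex <- ggplot(salDF, aes(x = rank, y = salary, fill = sex)) +
  geom_boxplot() +
  xlab("Rank") +
  ylab("Salary (nine-month, in dollars)") +
  ggtitle("Salary by Rank and Sex")
print(p_sex)
```

The counts in the tables above show far fewer women than men at every rank, so the female boxes rest on small samples and should be read with caution.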