The column names are listed below. There are 725 observations and 11 variables in total.
faculty2 <- read.csv("C:/Users/Li Xu/Documents/aaa/CU Boulder/GEOG5023/faculty.csv",
sep = ",", header = TRUE)
names(faculty2)
## [1] "AYSALARY" "R1" "R2" "R7" "PRIOREXP" "YRBG"
## [7] "YRRANK" "TERMDEG" "YRDG" "EMINENT" "FEMALE"
nrow(faculty2)
## [1] 725
ncol(faculty2)
## [1] 11
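As a complement to `names()`, `nrow()`, and `ncol()`, the same structural checks can be sketched in a few calls. The data frame `toy` below is a hypothetical stand-in with made-up values, not the faculty data; only the column names are borrowed from the output above.

```r
# Toy stand-in for faculty2; the values are made up for illustration only.
toy <- data.frame(
  AYSALARY = c(41000, 52500, 38750, 67000),
  YRBG     = c(3, 12, 1, 20),
  FEMALE   = c(1, 0, 1, 0)
)

dims <- dim(toy)                           # c(rows, columns) in a single call
str(toy)                                   # type and first values of every column
missing_per_column <- colSums(is.na(toy))  # number of NA values in each column

print(dims)
print(missing_per_column)
```

Checking for missing values at this stage is a design choice rather than a requirement: it helps explain any rows that later drop out of plots or summaries.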
The salary distribution is roughly normal, apart from a right tail of several very high values.
hist(faculty2$AYSALARY, breaks = seq(20000, 120000, by = 2000), main = "Faculty Annual Salary",
xlab = "Annual Salary (US Dollars)", col = "blue")
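A visual impression of normality can be backed up with a few numeric checks. The sketch below is one possible approach, not part of the original analysis: it uses simulated, right-skewed values (`toy_salary` is hypothetical and does not come from the faculty data) to show a sample skewness, a normal Q-Q plot, and a Shapiro-Wilk test.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical right-skewed salaries; NOT the faculty data.
toy_salary <- 30000 + rlnorm(200, meanlog = 9.5, sdlog = 0.5)

# Moment-based sample skewness: values well above 0 indicate a longer right tail.
skewness <- mean((toy_salary - mean(toy_salary))^3) / sd(toy_salary)^3

# Normal Q-Q plot: points curving upward at the right end suggest high outliers.
qqnorm(toy_salary, main = "Normal Q-Q Plot (toy data)")
qqline(toy_salary, col = "blue")

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(toy_salary)

print(skewness)
print(sw$p.value)
```

With the real data, the same calls would take `faculty2$AYSALARY` in place of `toy_salary`.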
The boxplot below indicates that male faculty members typically earn more than female faculty members.
boxplot(faculty2$AYSALARY ~ faculty2$FEMALE, main = "Faculty Annual Salary by gender",
ylab = "Annual Salary (US Dollars)", xlab = "Gender (0=Male, 1=Female)",
col = rainbow(2))
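The gap seen in a boxplot can be quantified with group summaries and a two-sample test. The following is a minimal sketch on made-up numbers (`toy` is hypothetical; only the column names `AYSALARY` and `FEMALE` come from the data set), and the Welch t-test is an assumed choice of test rather than the method used in this report.

```r
# Made-up salaries for four men (FEMALE = 0) and four women (FEMALE = 1).
toy <- data.frame(
  AYSALARY = c(52000, 61000, 48000, 75000, 45000, 50000, 47000, 58000),
  FEMALE   = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Mean and median salary within each gender group.
group_mean   <- tapply(toy$AYSALARY, toy$FEMALE, mean)
group_median <- tapply(toy$AYSALARY, toy$FEMALE, median)

# Welch two-sample t-test (does not assume equal variances).
tt <- t.test(AYSALARY ~ FEMALE, data = toy)

print(group_mean)
print(group_median)
print(tt$p.value)
```

A raw comparison like this ignores rank and years of service, so a difference in group means is not by itself evidence of unequal pay for equal work.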
The scatterplot below suggests a positive correlation between salary and the number of years of employment.
plot(faculty2$YRBG, faculty2$AYSALARY, main = "Faculty Annual Salary vs Time of Employment",
xlab = "Time of employment (years)", ylab = "Annual Salary (US Dollars)",
pch = 1, col = "blue")
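The strength of the relationship in a scatterplot can be summarised with a correlation coefficient and a fitted line. This is a sketch on made-up values (`toy` is hypothetical; the column names follow the data set), assuming a simple linear model is an adequate first description.

```r
# Made-up years of employment and salaries; NOT the faculty data.
toy <- data.frame(
  YRBG     = c(1, 3, 5, 8, 12, 15, 20, 25),
  AYSALARY = c(38000, 41000, 40500, 47000, 52000, 50000, 61000, 66000)
)

# Pearson correlation and its significance test.
r  <- cor(toy$YRBG, toy$AYSALARY)
ct <- cor.test(toy$YRBG, toy$AYSALARY)

# Simple linear regression of salary on years of employment.
fit <- lm(AYSALARY ~ YRBG, data = toy)

# Scatterplot with the fitted regression line overlaid.
plot(toy$YRBG, toy$AYSALARY,
     xlab = "Time of employment (years)",
     ylab = "Annual Salary (US Dollars)")
abline(fit, col = "red")

print(r)
print(ct$p.value)
print(coef(fit))
```

The slope of `fit` estimates the average salary change per additional year, which is easier to interpret than the correlation alone.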
By combining R1, R2, and R7 into one categorical variable, we can create the boxplot below. Full professors have the highest salaries, and instructors or lecturers have the lowest.
# Collapse the rank indicator columns into a single position label
faculty2$POSITION[faculty2$R1 == 1] <- "FP"  # full professor
faculty2$POSITION[faculty2$R2 == 1] <- "AP"  # associate professor
faculty2$POSITION[faculty2$R7 == 1] <- "IL"  # instructor/lecturer
# Use "Other" rather than the string "NA", which is easily confused with a missing value
faculty2$POSITION[faculty2$R1 == 0 & faculty2$R2 == 0 & faculty2$R7 == 0] <- "Other"
boxplot(faculty2$AYSALARY ~ faculty2$POSITION, main = "Faculty Annual Salary by position",
ylab = "Annual Salary (US Dollars)", xlab = "Position (FP=Full Professor, AP=Associate Professor, IL=Instructor/Lecturer, Other=Other)",
col = rainbow(4))
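Combining the indicator columns can also be written as a single vectorised expression that yields a factor with an explicit level order. The sketch below uses made-up rows (`toy` is hypothetical), assumes the indicators are mutually exclusive, and uses "Other" as an assumed label for rows where none of R1, R2, or R7 is set.

```r
# Made-up rows with one-hot rank indicators; NOT the faculty data.
toy <- data.frame(
  AYSALARY = c(80000, 60000, 35000, 48000, 90000, 33000),
  R1 = c(1, 0, 0, 0, 1, 0),
  R2 = c(0, 1, 0, 0, 0, 0),
  R7 = c(0, 0, 1, 0, 0, 1)
)

# Nested ifelse(): the first indicator equal to 1 decides the label.
toy$POSITION <- ifelse(toy$R1 == 1, "FP",
                ifelse(toy$R2 == 1, "AP",
                ifelse(toy$R7 == 1, "IL", "Other")))

# A factor with explicit levels controls the left-to-right order in boxplot().
toy$POSITION <- factor(toy$POSITION, levels = c("FP", "AP", "IL", "Other"))

counts  <- table(toy$POSITION)                         # rows per position
medians <- tapply(toy$AYSALARY, toy$POSITION, median)  # median salary per position

print(counts)
print(medians)
```

Ordering the levels from senior to junior is a presentation choice; without it, R sorts the groups alphabetically.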
Created by: Li Xu; Created on: 01/22/2013; Updated on: 01/26/2013