In-class Exercise 1: Introduction to R

Key Concepts: 1.

Part 1: Import the faculty.csv file into R. What are the column names? How many observations are there? How many variables?

# Import the data
faculty <- read.csv("E:/Quant/inClassExercises/InClassExerciseData/faculty.csv")

# Get the column names
names(faculty)
##  [1] "AYSALARY" "R1"       "R2"       "R7"       "PRIOREXP" "YRBG"    
##  [7] "YRRANK"   "TERMDEG"  "YRDG"     "EMINENT"  "FEMALE"

# How many observations and variables? names() returned 11 column names, so
# there are 11 variables. length() on a data frame also counts its columns.
length(faculty)
## [1] 11
# The length of any single column gives the number of observations (rows).
length(faculty$AYSALARY)
## [1] 725
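
As a complement to length(), base R also reports the size of a data frame directly. The sketch below uses a small made-up data frame named toy (hypothetical, not the real faculty data) so that it runs without the CSV file; with the real data you would pass faculty instead.

```r
# Hypothetical stand-in for the faculty data: 3 rows, 2 columns.
# The values are invented for illustration only.
toy <- data.frame(AYSALARY = c(41000, 52500, 60250),
                  FEMALE   = c(0, 1, 0))

dims   <- dim(toy)    # integer vector: c(number of rows, number of columns)
n_obs  <- nrow(toy)   # number of observations (rows)
n_vars <- ncol(toy)   # number of variables (columns)

# str() prints the dimensions plus the type and first values of each column
str(toy)
```

dim() answers both questions in one call, while str() is handy for checking that each column was read in with the type you expected.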

Part 2: Is annual salary normally distributed?

# We can plot a histogram to see whether the salaries look normally distributed
hist(faculty$AYSALARY)

# We can also use a boxplot
boxplot(faculty$AYSALARY)
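
Histograms and boxplots give a visual impression only. A normal Q-Q plot and the Shapiro-Wilk test are two further base R checks that could be used here. The sketch below runs on simulated, deliberately right-skewed values (the vector salary is hypothetical and is not the faculty data); with the real data you would use faculty$AYSALARY.

```r
# Simulated right-skewed "salaries" (hypothetical values, for illustration only)
set.seed(1)
salary <- rlnorm(200, meanlog = 10.8, sdlog = 0.3)

# Normal Q-Q plot: points close to the reference line suggest normality
qqnorm(salary)
qqline(salary)

# Shapiro-Wilk test: the null hypothesis is that the data are normal,
# so a small p-value is evidence against normality
sw <- shapiro.test(salary)
sw$p.value
```

shapiro.test() accepts between 3 and 5000 values, so it would work on the 725 observations in this exercise.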

Part 3: Does it appear that male and female faculty members make the same annual salary?

# To answer this, we can plot 2 side-by-side boxplots and compare them.
boxplot(faculty$AYSALARY ~ faculty$FEMALE, main = "Annual Salary broken into earnings \n by men and women", 
    ylab = "Annual Salary", xlab = "Sex \n (0 = men; 1 = women)")
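
The boxplots compare the groups visually. To put numbers next to them, one option is to compute a summary statistic per group with tapply(). This is a minimal sketch on an invented data frame toy (hypothetical values that say nothing about the real salaries); with the real data the same call would use faculty$AYSALARY and faculty$FEMALE.

```r
# Invented example data: 0 = men, 1 = women, as in the exercise coding
toy <- data.frame(AYSALARY = c(40000, 50000, 60000, 45000, 55000, 65000),
                  FEMALE   = c(0, 0, 0, 1, 1, 1))

# Median salary within each level of FEMALE; the result is named "0" and "1"
medians <- tapply(toy$AYSALARY, toy$FEMALE, median)
medians
```

The median matches the thick line drawn inside each box, so these values can be read against the plot directly.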

Part 4: Does there appear to be a relationship between salary and the number of years of employment?

# We can explore relationships using scatterplots. It is not clear which YEAR
# column measures years of employment, so each candidate is plotted in turn.
plot(faculty$AYSALARY, faculty$YRBG)  # candidate YEAR column: YRBG

plot(faculty$AYSALARY, faculty$YRRANK)  # candidate YEAR column: YRRANK

plot(faculty$AYSALARY, faculty$YRDG)  # candidate YEAR column: YRDG
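
A scatterplot can be backed up with a correlation coefficient and a fitted line. The sketch below uses simulated data (years and salary are hypothetical vectors, and the built-in linear relationship is an assumption of the simulation, not a finding about the faculty data). With the real data you would substitute one of the YEAR columns and faculty$AYSALARY.

```r
# Simulated data with a known positive relationship (illustration only)
set.seed(42)
years  <- sample(0:30, 100, replace = TRUE)
salary <- 40000 + 1200 * years + rnorm(100, sd = 5000)

# Pearson correlation: values near +1 or -1 indicate a strong linear relationship
r <- cor(years, salary)
r

# Scatterplot with the least-squares line overlaid
plot(years, salary, xlab = "Years", ylab = "Annual Salary")
abline(lm(salary ~ years))
```

Putting salary on the y-axis follows the usual convention of plotting the outcome of interest vertically.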

BONUS: Create a new variable combining R1, R2, and R7 into one categorical variable of rank. Does one category appear to have higher salaries?

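# Hint: the ifelse() pattern, shown on a different dataset (shotData is not
# loaded in this exercise, so this line is left as a comment)
# shotData$white <- ifelse(shotData$Race == 0, 1, 0)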

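# Combine the columns. First, create a new field: 1, 2, or 7 for the matching
# rank indicator, and 99 for rows where none of the three indicators is set.
faculty$Ranks <- ifelse(faculty$R1 == 1, 1, ifelse(faculty$R2 == 1, 2, 
    ifelse(faculty$R7 == 1, 7, 99)))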

# Create one boxplot per category of the new Ranks field.
boxplot(faculty$AYSALARY ~ faculty$Ranks)
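
One possible refinement is to store the combined rank as a labelled factor and to mark unclassified rows as NA instead of 99, so they do not appear as a separate box. This is a sketch on an invented data frame toy (hypothetical). The labels "R1", "R2", and "R7" simply reuse the column names, since the exercise does not say which academic rank each indicator stands for.

```r
# Invented indicator columns: one row per rank, plus one row with no rank set
toy <- data.frame(R1 = c(1, 0, 0, 0),
                  R2 = c(0, 1, 0, 0),
                  R7 = c(0, 0, 1, 0))

# Numeric code per row; NA when none of the indicators equals 1
code <- ifelse(toy$R1 == 1, 1,
        ifelse(toy$R2 == 1, 2,
        ifelse(toy$R7 == 1, 7, NA)))

# Convert to a factor with readable labels
toy$Rank <- factor(code, levels = c(1, 2, 7), labels = c("R1", "R2", "R7"))

# Count rows per category, including any unclassified (NA) rows
table(toy$Rank, useNA = "ifany")
```

With a factor on the right-hand side of the formula, boxplot() labels each box with the factor level and silently drops the NA rows.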
