Data used for answering the questions:
Salaries <- read.csv("~/Uni/SS 2015/R/Salaries.csv")
Section 1: Dataset Description
I searched Google for R datasets and found the following website: https://vincentarelbundock.github.io/Rdatasets/datasets.html. I downloaded the “Salaries” dataset as a CSV file and imported it into R.
The data was originally collected at a US college. It records the nine-month academic salary of Assistant Professors, Associate Professors and Professors for the 2008-09 academic year. The original goal of the data collection was to determine whether salaries differed between female and male faculty members.
The dataset consists of 397 rows and 7 columns.
The names of the columns are “X”, “rank”, “discipline”, “yrs.since.phd”, “yrs.service”, “sex” and “salary”. The variable “X” is a row number that identifies an individual faculty member. “rank” is a factor with 3 levels and shows whether the person is an Assistant Professor, an Associate Professor or a Professor. The variable “discipline” is a factor with 2 levels (A and B): “A” means the person works in a “theoretical” department, and “B” means the person works in an “applied” department. “yrs.since.phd” gives the number of years since the person obtained their PhD. “yrs.service” gives the number of years the person has already worked in their profession. “sex” is a factor with the levels “Male” and “Female”. The variable “salary” gives the nine-month salary in US dollars.
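The description above can be checked directly in R. The following is a minimal sketch, not part of the original analysis: it rebuilds the six rows printed by head(Salaries) in Section 3 as a small data frame (called toy here, a name chosen for illustration) so that it runs without the CSV file. Note that since R 4.0.0, read.csv() no longer converts strings to factors by default, so with a current R version the columns rank, discipline and sex may need stringsAsFactors = TRUE or an explicit factor() call to behave as described.

```r
# Illustrative stand-in: the six rows shown by head(Salaries) in Section 3
toy <- data.frame(
  X             = 1:6,
  rank          = factor(c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf")),
  discipline    = factor(c("B", "B", "B", "B", "B", "B")),
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  yrs.service   = c(18, 16, 3, 39, 41, 6),
  sex           = factor(c("Male", "Male", "Male", "Male", "Male", "Male")),
  salary        = c(139750, 173200, 79750, 115000, 141500, 97000)
)

dim(toy)           # number of rows and columns
str(toy)           # type of every column
levels(toy$rank)   # factor levels, sorted alphabetically by default
table(toy$rank)    # number of rows per level
```

With the full dataset, dim(Salaries) should report the 397 rows and 7 columns mentioned above.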
Section 2: Questions
What are the mean, median and standard deviation of the salaries at the college?
How are the salaries distributed?
Is there a difference in salary between men and women?
Is there a relationship between salary in euros and the number of years a person has been in service or the number of years since the person obtained their PhD?
What is the mean salary for Assistant Professors, Associate Professors and Professors respectively?
Section 3: Analysis
# Convert the salary from US dollars to euros at a fixed rate of 0.9 euros per dollar
Salaries$salary.euro <- Salaries$salary * 0.9
head(Salaries)
##   X      rank discipline yrs.since.phd yrs.service  sex salary salary.euro
## 1 1      Prof          B            19          18 Male 139750      125775
## 2 2      Prof          B            20          16 Male 173200      155880
## 3 3  AsstProf          B             4           3 Male  79750       71775
## 4 4      Prof          B            45          39 Male 115000      103500
## 5 5      Prof          B            40          41 Male 141500      127350
## 6 6 AssocProf          B             6           6 Male  97000       87300
sd(Salaries$salary)
## [1] 30289.04
mean(Salaries$salary)
## [1] 113706.5
median(Salaries$salary)
## [1] 107300
salary.male <- subset(Salaries, subset = sex == "Male")$salary
salary.female <- subset(Salaries, subset = sex == "Female")$salary
test.result1 <- t.test(x = salary.male,
                       y = salary.female,
                       alternative = "two.sided"
)
test.result1
##
## Welch Two Sample t-test
##
## data: salary.male and salary.female
## t = 3.1615, df = 50.122, p-value = 0.002664
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5138.102 23037.916
## sample estimates:
## mean of x mean of y
## 115090.4 101002.4
test.result2 <- cor.test(x = Salaries$yrs.service,
                         y = Salaries$salary.euro
)
test.result2
##
## Pearson's product-moment correlation
##
## data: Salaries$yrs.service and Salaries$salary.euro
## t = 7.0602, df = 395, p-value = 7.529e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2443740 0.4193506
## sample estimates:
## cor
## 0.3347447
Salaries.lm <- lm(salary.euro ~ yrs.since.phd,
                  data = Salaries)
summary(Salaries.lm)
##
## Call:
## lm(formula = salary.euro ~ yrs.since.phd, data = Salaries)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -75754 -17489  -2572  14478  92145
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   82546.82    2489.21  33.162   <2e-16 ***
## yrs.since.phd   886.81      96.63   9.177   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24780 on 395 degrees of freedom
## Multiple R-squared: 0.1758, Adjusted R-squared: 0.1737
## F-statistic: 84.23 on 1 and 395 DF, p-value: < 2.2e-16
plot(x = 1, y = 1, xlab = "Time", ylab = "Salary in Euro",
     type = "n", main = "The Effect of Time since phd and Years of Service on Salary in Euro",
     xlim = c(0, 60), ylim = c(0, 210000))
points(x = Salaries$yrs.since.phd, y = Salaries$salary.euro, pch = 16, cex = 0.7, col = "red")
points(x = Salaries$yrs.service, y = Salaries$salary.euro, pch = 16, cex = 0.7, col = "blue")
summary(lm(formula = salary.euro ~ yrs.since.phd, data = Salaries))
##
## Call:
## lm(formula = salary.euro ~ yrs.since.phd, data = Salaries)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -75754 -17489  -2572  14478  92145
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   82546.82    2489.21  33.162   <2e-16 ***
## yrs.since.phd   886.81      96.63   9.177   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24780 on 395 degrees of freedom
## Multiple R-squared: 0.1758, Adjusted R-squared: 0.1737
## F-statistic: 84.23 on 1 and 395 DF, p-value: < 2.2e-16
summary(lm(formula = salary.euro ~ yrs.service, data = Salaries))
##
## Call:
## lm(formula = salary.euro ~ yrs.service, data = Salaries)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -73739 -18460  -3399  14775  91752
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89977.19    2174.94   41.37  < 2e-16 ***
## yrs.service   701.61      99.38    7.06 7.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25720 on 395 degrees of freedom
## Multiple R-squared: 0.1121, Adjusted R-squared: 0.1098
## F-statistic: 49.85 on 1 and 395 DF, p-value: 7.529e-12
abline(a = 82546.82, b = 886.81, lwd = 3, col = "red")
abline(a = 89977.19, b = 701.61, lwd = 3, col = "blue")
legend("bottomright",
       legend = c("years since phd", "years service"),
       col = c("red", "blue"),
       pch = c(16, 16),
       bg = "white"
)
hist(x = Salaries$salary,
     main = "Salary Distribution",
     xlab = "Salary"
)
abline(v = mean(Salaries$salary), lwd = 2, col = "blue")
abline(v = median(Salaries$salary), lwd = 2, col = "red")
legend("topright",
       legend = c("Mean", "Median"),
       col = c("blue", "red"),
       lty = 1, lwd = 2,  # mean and median are drawn as lines, so show line samples
       bg = "white"
)
# The formula is passed unnamed: the name of this argument differs between R versions
aggregate(salary ~ rank,
          FUN = mean,
          na.rm = TRUE,
          data = Salaries
)
##        rank    salary
## 1 AssocProf  93876.44
## 2  AsstProf  80775.99
## 3      Prof 126772.11
par(mfrow = c(1, 2))
plot(x = 1, y = 1, xlab = "Time", ylab = "Salary in Euro",
     type = "n", main = "The Effect of Time since phd and Years of Service on Salary in Euro",
     xlim = c(0, 60), ylim = c(0, 210000))
points(x = Salaries$yrs.since.phd, y = Salaries$salary.euro, pch = 16, cex = 0.7, col = "red")
points(x = Salaries$yrs.service, y = Salaries$salary.euro, pch = 16, cex = 0.7, col = "blue")
# Intercepts and slopes from the euro-scale regressions, matching the euro y-axis
abline(a = 82546.82, b = 886.81, lwd = 3, col = "red")
abline(a = 89977.19, b = 701.61, lwd = 3, col = "blue")
hist(x = Salaries$salary,
     main = "Salary Distribution",
     xlab = "Salary"
)
abline(v = mean(Salaries$salary), lwd = 2, col = "blue")
abline(v = median(Salaries$salary), lwd = 2, col = "red")
legend("topright",
       legend = c("Mean", "Median"),
       col = c("blue", "red"),
       lty = 1, lwd = 2,  # mean and median are drawn as lines, so show line samples
       bg = "white"
)
# Return the mean nine-month salary (in US dollars) for one academic rank
Professor.Salary <- function(what) {
  valid.ranks <- c("AsstProf", "AssocProf", "Prof")
  # Reject anything that is not one of the three rank labels
  if (!(what %in% valid.ranks)) {
    return("Please enter AsstProf, AssocProf or Prof")
  }
  mean(Salaries$salary[Salaries$rank == what])
}
Professor.Salary("AsstProf")
## [1] 80775.99
Professor.Salary("AssocProf")
## [1] 93876.44
Professor.Salary("Prof")
## [1] 126772.1
Professor.Salary("LOL")
## [1] "Please enter AsstProf, AssocProf or Prof"
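The function above handles one rank per call. As a possible alternative (a sketch, not the original code), tapply() computes the group means for all ranks in one step, and indexing the result by name gives the same kind of lookup. The data frame toy is an illustrative stand-in built from the six rows shown by head(Salaries), so its numbers differ from the full-data means above.

```r
# Illustrative stand-in: rank and salary of the six rows shown by head(Salaries)
toy <- data.frame(
  rank   = factor(c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf")),
  salary = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Mean salary for every rank at once; the result is indexed by rank name
rank.means <- tapply(toy$salary, toy$rank, mean)
rank.means

# Look up a single rank by name
rank.means["Prof"]
```

With the real data, toy would simply be replaced by Salaries.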
Section 4: Conclusion
The analysis of the dataset provided several interesting results. Some were surprising and others were not, and together they offer insights for potential future policy changes.
Descriptive statistics indicate that the salaries at the college are approximately normally distributed, with a slight right skew: the mean (about 113 700 dollars) lies above the median (107 300 dollars). Only a few people earn less than 50 000 dollars or more than 200 000 dollars, while most people have a salary around 100 000 dollars. Descriptive statistics also indicate a relationship between salary and rank: Professors have the highest mean salary (about 126 800 dollars), followed by Associate Professors (about 93 900 dollars) and Assistant Professors (about 80 800 dollars). Future analyses need to determine whether this relationship is statistically significant.
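One way to test whether mean salary differs by rank is a one-way analysis of variance. The following is a minimal sketch of that follow-up analysis, not a result from this report: the data frame toy contains made-up salaries purely to make the example runnable, and with the real data the call would be aov(salary ~ rank, data = Salaries).

```r
# Hypothetical salaries, three per rank, for illustration only
toy <- data.frame(
  rank   = factor(rep(c("AsstProf", "AssocProf", "Prof"), each = 3)),
  salary = c(78000, 80000, 82000,
             92000, 94000, 96000,
             120000, 126000, 132000)
)

# One-way ANOVA: does mean salary differ between ranks?
rank.aov <- aov(salary ~ rank, data = toy)
summary(rank.aov)

# p-value of the overall F-test
p.value <- summary(rank.aov)[[1]][["Pr(>F)"]][1]
p.value

# Pairwise comparisons between ranks (Tukey's honest significant differences)
TukeyHSD(rank.aov)
```

The overall F-test only says that at least one rank differs; the pairwise comparisons show which ranks differ from each other.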
Unfortunately, at this college there is still a statistically significant difference between the salaries of men and women. Over the nine-month period, men earn on average approximately 14 000 dollars more than women (about 115 100 versus 101 000 dollars).
test.result1
##
## Welch Two Sample t-test
##
## data: salary.male and salary.female
## t = 3.1615, df = 50.122, p-value = 0.002664
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5138.102 23037.916
## sample estimates:
## mean of x mean of y
## 115090.4 101002.4
However, the analysis did not control for rank, years since PhD or years in service. Future analyses need to account for these factors to see whether policy interventions might be necessary to adjust the salary distribution at the college.
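A possible next step is a multiple regression that includes sex together with the other variables, so that the coefficient for sex describes the salary gap among people of the same rank and experience. The sketch below uses made-up data (toy) so that it runs on its own; its numbers carry no meaning for the college. With the real data, the same formulas would be fitted with data = Salaries.

```r
# Hypothetical data for illustration only: 12 people, 4 per rank
toy <- data.frame(
  sex           = factor(rep(c("Female", "Male"), times = 6)),
  rank          = factor(rep(c("AsstProf", "AssocProf", "Prof"), each = 4)),
  yrs.since.phd = c(3, 4, 6, 5, 10, 12, 14, 11, 22, 25, 30, 28),
  yrs.service   = c(1, 3, 2, 4, 8, 7, 11, 10, 18, 20, 27, 21),
  salary        = c(76000, 79000, 78000, 82000,
                    90000, 95000, 93000, 97000,
                    118000, 128000, 124000, 131000)
)

# Unadjusted gap: difference between the mean salaries of men and women
fit.raw <- lm(salary ~ sex, data = toy)

# Adjusted gap: the same comparison with rank and experience held constant
fit.adj <- lm(salary ~ sex + rank + yrs.since.phd + yrs.service, data = toy)

coef(fit.raw)["sexMale"]
coef(fit.adj)["sexMale"]
summary(fit.adj)
```

If the coefficient sexMale shrinks clearly once the other variables are included, part of the raw difference is explained by rank and experience rather than by sex itself.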
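Another interesting result was the effect of time on salary. Both the time since a person obtained their PhD and their time in service had an effect on the size of the salary. People who have held their PhD for longer earn more: each additional year since the PhD is associated with roughly 887 euros more salary. Furthermore, people with more years in service also have a higher salary, by roughly 702 euros per additional year. Regression analyses revealed that both relationships were statistically significant (p < 0.001), although each predictor on its own explains only a modest share of the variance in salary (R-squared of 0.18 and 0.11 respectively).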
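Typing the intercept and slope into abline() by hand makes it easy to mix up the dollar and euro scales. As a hedged alternative, abline() also accepts the fitted model itself, so the line always matches the model that was estimated. The sketch below uses the six rows shown by head(Salaries) as a small stand-in data frame (toy, a name chosen for illustration), so its coefficients are not those of the full dataset.

```r
# Illustrative stand-in: years since PhD and salary of the six rows from head(Salaries)
toy <- data.frame(
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  salary        = c(139750, 173200, 79750, 115000, 141500, 97000)
)
toy$salary.euro <- toy$salary * 0.9

fit.usd <- lm(salary ~ yrs.since.phd, data = toy)
fit.eur <- lm(salary.euro ~ yrs.since.phd, data = toy)

# Rescaling the outcome by 0.9 rescales intercept and slope by the same factor
coef(fit.usd)
coef(fit.eur)

plot(toy$yrs.since.phd, toy$salary.euro, pch = 16, col = "red",
     xlab = "Years since PhD", ylab = "Salary in Euro")
abline(fit.eur, lwd = 3, col = "red")  # intercept and slope taken from the model

# 95 percent confidence intervals for intercept and slope
confint(fit.eur)
```

Because the euro salary is the dollar salary multiplied by 0.9, the euro coefficients are 0.9 times the dollar coefficients, while t-values and p-values stay the same.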