salary = read.csv("C:\\Thach\\UTS\\Teaching\\TRM\\Practical Data Analysis\\2024_Autumn semester\\Data\\Professorial Salaries.csv")
library(table1)
table1(~ Sex + Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary, data = salary)
| | Overall (N=397) |
|---|---|
| Sex | |
| Female | 39 (9.8%) |
| Male | 358 (90.2%) |
| Rank | |
| AssocProf | 64 (16.1%) |
| AsstProf | 67 (16.9%) |
| Prof | 266 (67.0%) |
| Discipline | |
| A | 181 (45.6%) |
| B | 216 (54.4%) |
| Yrs.since.phd | |
| Mean (SD) | 22.3 (12.9) |
| Median [Min, Max] | 21.0 [1.00, 56.0] |
| Yrs.service | |
| Mean (SD) | 17.6 (13.0) |
| Median [Min, Max] | 16.0 [0, 60.0] |
| NPubs | |
| Mean (SD) | 18.2 (14.0) |
| Median [Min, Max] | 13.0 [1.00, 69.0] |
| Ncits | |
| Mean (SD) | 40.2 (16.9) |
| Median [Min, Max] | 35.0 [1.00, 90.0] |
| Salary | |
| Mean (SD) | 114000 (30300) |
| Median [Min, Max] | 107000 [57800, 232000] |
library(ggplot2)
# after_stat(density) replaces the deprecated ..density.. notation
ggplot(data = salary, aes(x = Salary)) +
  geom_histogram(aes(y = after_stat(density)), color = "white", fill = "blue") +
  ggtitle("Professors' salaries (USD)") +
  theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
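The message above asks for an explicit bin width. One common choice is the Freedman-Diaconis rule, sketched below. The vector `x` is a simulated, hypothetical stand-in for `salary$Salary` so that the block runs on its own; it is not the real data, and the rule is only one reasonable option.

```r
# Hypothetical stand-in for salary$Salary (simulated, NOT the real data)
set.seed(1)
x <- rlnorm(397, meanlog = log(107000), sdlog = 0.25)

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
bw <- 2 * IQR(x) / length(x)^(1/3)

# Build the histogram with that width if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(Salary = x), aes(x = Salary)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                   color = "white", fill = "blue") +
    theme_bw()
}
```

With the real data, the same width could be computed from `salary$Salary` and passed as `binwidth = bw` in the plot above; call `print(p)` to draw the figure.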
m1 = lm(Salary ~ Ncits, data = salary)
summary(m1)
##
## Call:
## lm(formula = Salary ~ Ncits, data = salary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61660 -23012 -5654 20638 120083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 105664.57 3899.87 27.094 <2e-16 ***
## Ncits 199.93 89.36 2.237 0.0258 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30140 on 395 degrees of freedom
## Multiple R-squared: 0.01251, Adjusted R-squared: 0.01001
## F-statistic: 5.005 on 1 and 395 DF, p-value: 0.02583
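As a check on how the numbers in this output relate to each other, the sketch below recomputes the t value, P value, F statistic and R-squared from the reported slope and standard error. The inputs are copied from the output above; small differences in the last digit come from rounding.

```r
# Values copied from the summary(m1) output
est <- 199.93   # slope for Ncits
se  <- 89.36    # standard error of the slope
df  <- 395      # residual degrees of freedom

t_val <- est / se                   # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df)    # two-sided P value from the t distribution
f_val <- t_val^2                    # with one predictor, F equals t squared
r2    <- f_val / (f_val + df)       # R-squared implied by F and the residual df

round(c(t = t_val, p = p_val, F = f_val, R2 = r2), 4)
```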
par(mfrow = c(2,2))
plot(m1)
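Interpretation: The diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage) indicate that the model assumptions are met.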
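The visual check can be complemented with a couple of numeric summaries of the residuals. The helper `check_lm()` below is hypothetical (it is not part of the original analysis) and is demonstrated on R's built-in `cars` data so that it runs on its own; treat it as a rough sketch rather than a formal test procedure.

```r
# Hypothetical helper: quick numeric checks on an lm fit
check_lm <- function(fit) {
  res <- residuals(fit)
  list(
    # Shapiro-Wilk test of normality of the residuals
    shapiro_p  = shapiro.test(res)$p.value,
    # Rank correlation between |residuals| and fitted values;
    # values far from 0 hint at non-constant variance
    spread_cor = cor(abs(res), fitted(fit), method = "spearman")
  )
}

# Demonstration on built-in data (not the salary data)
fit <- lm(dist ~ speed, data = cars)
res <- check_lm(fit)
res
```

For the model in this analysis, the call would be `check_lm(m1)`.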
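There was evidence (P = 0.0258) that each additional citation was associated with an average increase of $199.9 in professors’ salaries (95% confidence interval: $25 to $375). However, the number of citations explained only about 1.3% of the variation in salary (R-squared = 0.0125).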
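The range quoted above corresponds to a 95% confidence interval for the slope. A minimal sketch of that calculation, using only the estimate, standard error and degrees of freedom reported in the model summary:

```r
# Values copied from the summary(m1) output
est <- 199.93   # slope for Ncits
se  <- 89.36    # standard error of the slope
df  <- 395      # residual degrees of freedom

# 95% CI: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
round(ci, 1)
```

With the fitted model available, `confint(m1, "Ncits")` should give the same interval up to rounding.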