# Introduction
This study examines which factors determine professors' salaries at universities. Transparent and fair pay matters both for sustaining a good learning environment and for recruiting and retaining capable faculty. Using statistical methods, chiefly linear regression, the study analyzes how salary relates to years since the Ph.D., years of service, academic rank, discipline, and sex. The goal is practical evidence that universities can use to refine compensation policy, stay competitive in attracting a diverse and talented faculty, and contribute to the wider discussion of fairness in education. The sections below describe the data, the analysis, and the findings.
### Description of the data
The data contain 397 individuals (professors) and 6 variables.
library(readxl)
# Load the data first: calling head(data) before this line prints R's built-in data() function instead of the data set
data <- readxl::read_excel("E:/Khi tôi học/2023.1/Thống kê ứng dụng/xlsx/ProfessorSalaries.xlsx")
# Inspect the first rows, the dimensions, and the column names
head(data)
dim(data)
names(data)
str(data)
## tibble [397 × 6] (S3: tbl_df/tbl/data.frame)
## $ rank : chr [1:397] "Prof" "Prof" "AsstProf" "Prof" ...
## $ discipline : chr [1:397] "B" "B" "B" "B" ...
## $ yrs.since.phd: num [1:397] 19 20 4 45 40 6 30 45 21 18 ...
## $ yrs.service : num [1:397] 18 16 3 39 41 6 23 45 20 18 ...
## $ sex : chr [1:397] "Male" "Male" "Male" "Male" ...
## $ salary : num [1:397] 139750 173200 79750 115000 141500 ...
summary(data)
## rank discipline yrs.since.phd yrs.service
## Length:397 Length:397 Min. : 1.00 Min. : 0.00
## Class :character Class :character 1st Qu.:12.00 1st Qu.: 7.00
## Mode :character Mode :character Median :21.00 Median :16.00
## Mean :22.31 Mean :17.61
## 3rd Qu.:32.00 3rd Qu.:27.00
## Max. :56.00 Max. :60.00
## sex salary
## Length:397 Min. : 57800
## Class :character 1st Qu.: 91000
## Mode :character Median :107300
## Mean :113706
## 3rd Qu.:134185
## Max. :231545
hist(data$yrs.since.phd, main = "Years Since Ph.D.", xlab = "Years")
library(ggplot2)
# Fit a linear regression model
model <- lm(salary ~ yrs.since.phd + yrs.service, data = data)
# Summarize the model
summary(model)
##
## Call:
## lm(formula = salary ~ yrs.since.phd + yrs.service, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79735 -19823 -2617 15149 106149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89912.2 2843.6 31.620 < 2e-16 ***
## yrs.since.phd 1562.9 256.8 6.086 2.75e-09 ***
## yrs.service -629.1 254.5 -2.472 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27360 on 394 degrees of freedom
## Multiple R-squared: 0.1883, Adjusted R-squared: 0.1842
## F-statistic: 45.71 on 2 and 394 DF, p-value: < 2.2e-16
# Visualize the regression results
ggplot(data, aes(x = yrs.since.phd, y = salary)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Linear Regression: Salary ~ Years Since PhD")
## `geom_smooth()` using formula = 'y ~ x'
# Create simulated (hypothetical) data
deptrai <- data.frame(
  Salary = rnorm(100),
  yrs_since_phd = rnorm(100),
  yrs_service = rnorm(100),
  academic_rank = factor(sample(c("Assistant", "Associate", "Full"), 100, replace = TRUE)),
  discipline = factor(sample(c("A", "B"), 100, replace = TRUE)),
  gender = factor(sample(c("Male", "Female"), 100, replace = TRUE))
)
# Display summary information for the "deptrai" data set
summary(deptrai)
## Salary yrs_since_phd yrs_service academic_rank
## Min. :-3.1012459 Min. :-2.246945 Min. :-3.264948 Assistant:35
## 1st Qu.:-0.5169933 1st Qu.:-0.780496 1st Qu.:-0.553335 Associate:34
## Median :-0.0572677 Median : 0.052960 Median :-0.040682 Full :31
## Mean :-0.0003183 Mean : 0.003702 Mean :-0.004434
## 3rd Qu.: 0.6786466 3rd Qu.: 0.791358 3rd Qu.: 0.711722
## Max. : 2.9018228 Max. : 1.851144 Max. : 2.149423
## discipline gender
## A:59 Female:58
## B:41 Male :42
##
##
##
##
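Because rnorm() and sample() draw new values on every run, the summary above changes each time the report is knitted. A minimal sketch of one way to make such simulated data reproducible, assuming a fixed seed is acceptable. The seed value 2023 and the helper name make_fake_faculty are illustrative and not part of the original analysis:

```r
# Hypothetical helper: builds a simulated data set shaped like "deptrai"
make_fake_faculty <- function(n = 100, seed = 2023) {
  set.seed(seed)  # fix the random number generator state before drawing
  data.frame(
    Salary        = rnorm(n),
    yrs_since_phd = rnorm(n),
    yrs_service   = rnorm(n),
    academic_rank = factor(sample(c("Assistant", "Associate", "Full"), n, replace = TRUE)),
    discipline    = factor(sample(c("A", "B"), n, replace = TRUE)),
    gender        = factor(sample(c("Male", "Female"), n, replace = TRUE))
  )
}

first_draw  <- make_fake_faculty()
second_draw <- make_fake_faculty()
identical(first_draw, second_draw)  # same seed, so both draws match
```

Calling set.seed() once at the top of the script would have the same effect for the whole report.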
# Build the regression formula as an unevaluated expression
formula_text <- bquote(Salary == beta[0] + beta[1] %*% yrs.since.phd + beta[2] %*% yrs.service + beta[3] %*% academic.rank + beta[4] %*% discipline + beta[5] %*% gender + epsilon)
# Print the formula to the console; deparse() keeps the terms in written order, whereas as.character() splits the call into its operator and arguments
cat("Regression Formula:\n", deparse(formula_text, width.cutoff = 500), "\n")
## Regression Formula:
## Salary == beta[0] + beta[1] %*% yrs.since.phd + beta[2] %*% yrs.service + beta[3] %*% academic.rank + beta[4] %*% discipline + beta[5] %*% gender + epsilon
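The formula lists academic rank as a single term, but lm() expands a factor with k levels into k - 1 indicator columns, which is why the multiple regression later in this report shows two rank coefficients. A small sketch of that expansion on a toy data frame. The name toy_ranks is illustrative, and the label "AssocProf" is an assumption for the reference level, which the output in this report never prints:

```r
# Toy factor with three rank levels (labels partly assumed, see lead-in)
toy_ranks <- data.frame(
  rank = factor(c("AssocProf", "AsstProf", "Prof", "Prof"))
)

# model.matrix() builds the design matrix that lm() would use internally
design <- model.matrix(~ rank, data = toy_ranks)
colnames(design)  # intercept plus one indicator per non-reference level
design
```

The first factor level in alphabetical order becomes the reference category, so each rank coefficient is a difference from that level rather than an absolute salary.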
boxplot(data$salary, main="Boxplot of Salary")
par(mfrow = c(2, 3)) # Set up a 2x3 grid so that all six plots below fit on one page
hist(data$yrs.since.phd, main="Histogram of Years Since PhD", xlab="Years Since PhD")
hist(data$yrs.service, main="Histogram of Years of Service", xlab="Years of Service")
hist(data$salary, main="Histogram of Salary", xlab="Salary")
barplot(table(data$rank), main="Bar Plot of Academic Rank", xlab="Rank", ylab="Frequency", col="lightblue")
barplot(table(data$discipline), main="Bar Plot of Discipline", xlab="Discipline", ylab="Frequency", col="lightgreen")
barplot(table(data$sex), main="Bar Plot of Gender", xlab="Gender", ylab="Frequency", col="lightpink")
# Scatterplot matrix
pairs(data[, c("yrs.since.phd", "yrs.service", "salary")], main="Scatterplot Matrix")
# Correlation matrix
cor_matrix <- cor(data[, c("yrs.since.phd", "yrs.service", "salary")])
print("Correlation Matrix:")
## [1] "Correlation Matrix:"
print(cor_matrix)
## yrs.since.phd yrs.service salary
## yrs.since.phd 1.0000000 0.9096491 0.4192311
## yrs.service 0.9096491 1.0000000 0.3347447
## salary 0.4192311 0.3347447 1.0000000
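The correlation of about 0.91 between years since Ph.D. and years of service points to collinearity when both enter the same model. With exactly two predictors, the variance inflation factor reduces to 1 / (1 - r^2). A short worked calculation using the value printed above (the variable names are illustrative):

```r
# Correlation between yrs.since.phd and yrs.service, copied from the matrix above
r_phd_service <- 0.9096491

# Variance inflation factor for a model with only these two predictors
vif_two_predictors <- 1 / (1 - r_phd_service^2)
round(vif_two_predictors, 2)
```

The result is close to 5.8, meaning the sampling variance of each slope is inflated almost sixfold relative to uncorrelated predictors. This figure applies only to the two-predictor model; with rank, discipline, and sex added, the VIF would have to be computed from the full design.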
# Simple linear regression: Salary ~ yrs.since.phd
lm_yrs_since_phd <- lm(salary ~ yrs.since.phd, data = data)
print("Simple Linear Regression: Salary ~ yrs.since.phd")
## [1] "Simple Linear Regression: Salary ~ yrs.since.phd"
print(summary(lm_yrs_since_phd))
##
## Call:
## lm(formula = salary ~ yrs.since.phd, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -84171 -19432 -2858 16086 102383
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91718.7 2765.8 33.162 <2e-16 ***
## yrs.since.phd 985.3 107.4 9.177 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27530 on 395 degrees of freedom
## Multiple R-squared: 0.1758, Adjusted R-squared: 0.1737
## F-statistic: 84.23 on 1 and 395 DF, p-value: < 2.2e-16
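The slope estimate and its standard error are enough to form a 95% confidence interval, estimate ± t × SE, with 395 residual degrees of freedom. A worked sketch using the rounded values printed above; confint(lm_yrs_since_phd) should give the same interval from the fitted object, up to rounding:

```r
# Values copied from the coefficient table above
estimate  <- 985.3
std_error <- 107.4
df_resid  <- 395

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df = df_resid)

# Lower and upper confidence limits for the slope
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 1)
```

The interval runs from roughly 774 to 1196 salary units per additional year since the Ph.D., so it excludes zero by a wide margin.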
# Simple linear regression: Salary ~ yrs.service
lm_yrs_service <- lm(salary ~ yrs.service, data = data)
print("Simple Linear Regression: Salary ~ yrs.service")
## [1] "Simple Linear Regression: Salary ~ yrs.service"
print(summary(lm_yrs_service))
##
## Call:
## lm(formula = salary ~ yrs.service, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81933 -20511 -3776 16417 101947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 99974.7 2416.6 41.37 < 2e-16 ***
## yrs.service 779.6 110.4 7.06 7.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28580 on 395 degrees of freedom
## Multiple R-squared: 0.1121, Adjusted R-squared: 0.1098
## F-statistic: 49.85 on 1 and 395 DF, p-value: 7.529e-12
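Years of service has a positive slope here (779.6) but a negative one (-629.1) in the earlier model that also includes years since Ph.D. A small simulation sketch of how two highly correlated predictors can produce this pattern. Every number below is invented for illustration and none is estimated from the professor data:

```r
set.seed(1)
n <- 400

# Two predictors that move together, as in the real data
yrs_phd     <- runif(n, min = 1, max = 56)
yrs_service <- 0.8 * yrs_phd + rnorm(n, sd = 4)  # may dip slightly below 0; fine for illustration

# Assumed true model: service lowers salary once yrs_phd is held fixed
salary <- 90000 + 1500 * yrs_phd - 600 * yrs_service + rnorm(n, sd = 10000)

# Slope of yrs_service alone versus adjusted for yrs_phd
slope_alone    <- coef(lm(salary ~ yrs_service))["yrs_service"]
slope_adjusted <- coef(lm(salary ~ yrs_phd + yrs_service))["yrs_service"]
c(alone = unname(slope_alone), adjusted = unname(slope_adjusted))
```

Fitted alone, years of service absorbs the positive effect of the omitted, correlated predictor, so its slope is positive. Once years since Ph.D. is in the model, the slope reflects the remaining (here negative) association.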
# Multiple linear regression
multiple_regression_model <- lm(salary ~ yrs.since.phd + yrs.service + rank + discipline + sex, data = data)
# Summary of the regression model
summary(multiple_regression_model)
##
## Call:
## lm(formula = salary ~ yrs.since.phd + yrs.service + rank + discipline +
## sex, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65248 -13211 -1775 10384 99592
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78862.8 4990.3 15.803 < 2e-16 ***
## yrs.since.phd 535.1 241.0 2.220 0.02698 *
## yrs.service -489.5 211.9 -2.310 0.02143 *
## rankAsstProf -12907.6 4145.3 -3.114 0.00198 **
## rankProf 32158.4 3540.6 9.083 < 2e-16 ***
## disciplineB 14417.6 2342.9 6.154 1.88e-09 ***
## sexMale 4783.5 3858.7 1.240 0.21584
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22540 on 390 degrees of freedom
## Multiple R-squared: 0.4547, Adjusted R-squared: 0.4463
## F-statistic: 54.2 on 6 and 390 DF, p-value: < 2.2e-16
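To make the coefficients concrete, here is a worked prediction for a hypothetical male full professor in discipline B with 20 years since the Ph.D. and 18 years of service. The profile is invented for illustration; predict(multiple_regression_model, newdata = ...) would compute the same quantity from the fitted object:

```r
# Coefficients copied from the summary above (names are illustrative)
coefs <- c(intercept = 78862.8, yrs_since_phd = 535.1, yrs_service = -489.5,
           rank_prof = 32158.4, discipline_b = 14417.6, sex_male = 4783.5)

# Hypothetical profile: 1 switches an indicator on, numbers are years
profile <- c(intercept = 1, yrs_since_phd = 20, yrs_service = 18,
             rank_prof = 1, discipline_b = 1, sex_male = 1)

# Linear predictor: sum of coefficient times predictor value
predicted_salary <- sum(coefs * profile[names(coefs)])
predicted_salary
```

The predicted salary is about 132,113. Setting rank_prof to 0 gives the prediction for the reference rank with the same experience, which is lower by exactly the rankProf coefficient.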
This study examined factors associated with faculty salaries using a data set of 397 professors that records academic rank, discipline, years since the Ph.D., years of service, and sex. Histograms, bar plots, a scatterplot matrix, and a correlation matrix described the variables. Years since the Ph.D. and years of service are strongly correlated (r ≈ 0.91), and each is moderately correlated with salary (r ≈ 0.42 and r ≈ 0.33, respectively).
The multiple regression with all five predictors explained about 45% of the variation in salary (adjusted R-squared 0.4463), compared with about 18% and 11% for the simple regressions on years since the Ph.D. and years of service. Academic rank and discipline were the strongest predictors: full professors earned about 32,158 more and assistant professors about 12,908 less than the reference rank (the level omitted from the output, presumably associate professor), and discipline B paid about 14,418 more than discipline A. Years of service is positively related to salary when considered alone (about 780 per year), but its coefficient becomes negative (-489.5) once the other predictors are held fixed, which is consistent with its strong collinearity with years since the Ph.D. The estimated difference for male professors (4,783.5) was not statistically significant (p = 0.216).
These results give universities a starting point for reviewing compensation structures with fairness, equity, and diversity in mind. The study has limitations: the model leaves more than half of the salary variation unexplained, and the two experience measures are hard to separate. Although the estimated sex difference was not statistically significant in this sample, pay equity remains worth monitoring as part of the broader effort to maintain an inclusive and equitable environment for faculty.