The dataset comprises 200 public service employees with information on annual salary, years of employment, and gender. The mean salary is approximately €122,304 with a standard deviation of €79,030; the median (€97,496) lies well below the mean, indicating a right-skewed salary distribution. The mean employment duration is 15.73 years (SD ≈ 9.04). The sample is balanced by gender, with 100 female and 100 male employees.
library(openxlsx)  # provides read.xlsx(); the package must be installed
salary_data = read.xlsx("SalaryData.xlsx")
summary(salary_data)
## years_empl salary gender
## Min. : 0.007167 Min. : 30203 Length:200
## 1st Qu.: 7.790195 1st Qu.: 54208 Class :character
## Median :16.191430 Median : 97496 Mode :character
## Mean :15.734362 Mean :122304
## 3rd Qu.:22.908421 3rd Qu.:179447
## Max. :29.666752 Max. :331348
# Gender distribution
table(salary_data$gender)
##
## Female Male
## 100 100
# Means and standard deviations of salary and years of employment
mean(salary_data$salary)
## [1] 122303.5
sd(salary_data$salary)
## [1] 79030.12
mean(salary_data$years_empl)
## [1] 15.73436
sd(salary_data$years_empl)
## [1] 9.035618
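Because the models later in this section are fitted separately by gender, it can help to report the same descriptive statistics within each group. The following is a minimal sketch using base R's `tapply()`. The data frame `toy` is a made-up stand-in with the same column names as `salary_data`, so the numbers it produces are illustrative only; with the real data, `salary_data` would replace `toy`.

```r
# Hypothetical toy data with the same columns as salary_data (not the real sample)
toy <- data.frame(
  years_empl = c(1, 5, 10, 20, 2, 8, 15, 25),
  salary     = c(35000, 45000, 70000, 150000, 34000, 52000, 90000, 200000),
  gender     = c("Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male")
)

# Group-wise means and standard deviations; tapply returns one value per gender
salary_mean <- tapply(toy$salary, toy$gender, mean)
salary_sd   <- tapply(toy$salary, toy$gender, sd)
years_mean  <- tapply(toy$years_empl, toy$gender, mean)
years_sd    <- tapply(toy$years_empl, toy$gender, sd)

# One row per statistic, one column per gender
print(rbind(salary_mean, salary_sd, years_mean, years_sd))
```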
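A visual inspection of the scatterplot reveals a positive association between years of employment and salary. The trend appears non-linear, with salary rising more steeply at longer tenures, which is consistent with growth at a roughly constant percentage rate per year rather than by a constant euro amount.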
plot(salary_data$years_empl, salary_data$salary,
     main = "Association between Experience and Salary",
     xlab = "Years of Employment",
     ylab = "Salary (€)")
To linearise the relationship, the dependent variable (salary) is transformed using the natural logarithm. A linear model is then estimated to quantify the association.
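The rationale for the transformation: if salary grows by a roughly constant percentage each year, i.e. salary = exp(a + b · years), then log(salary) = a + b · years is linear in years. A minimal sketch on simulated data shows that `lm()` on the log scale recovers the generating parameters. The values of `a` and `b`, the noise level and the seed are arbitrary assumptions chosen for illustration, not estimates from this sample.

```r
set.seed(1)
n <- 200
a <- 10.4   # assumed log-salary at zero years (illustrative)
b <- 0.07   # assumed growth rate on the log scale (illustrative)

years <- runif(n, min = 0, max = 30)
# Multiplicative (log-normal) noise: salary itself is a convex curve in years
salary <- exp(a + b * years + rnorm(n, sd = 0.1))

# On the log scale the relationship is linear, so lm() fits it well
fit_log <- lm(log(salary) ~ years)
print(coef(fit_log))  # estimates should lie close to a and b
```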
salary_data$log_salary = log(salary_data$salary)
# linear model
lm(log_salary ~ years_empl, data = salary_data)
##
## Call:
## lm(formula = log_salary ~ years_empl, data = salary_data)
##
## Coefficients:
## (Intercept) years_empl
## 10.383 0.071
The estimated linear regression model uses the natural logarithm of salary as the dependent variable and years of employment as the independent variable. The intercept represents the expected log-salary for an individual with zero years of employment, while the slope coefficient indicates the expected change in log-salary for each additional year of employment.
Exponentiating the slope coefficient allows interpretation on the original salary scale. With the estimated slope of 0.071, exp(0.071) ≈ 1.074, so each additional year of employment is associated with approximately 7.4% higher salary; for small coefficients the slope itself (here 7.1%) is a close approximation. Because the dependent variable is logged, this is a multiplicative (percentage) effect rather than a fixed euro amount. The model contains no other predictors, so this is an unadjusted association rather than an effect holding other factors constant.
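The back-transformation can be done directly from the printed coefficients. A minimal sketch: `b0` and `b1` are simply the rounded values from the `lm()` output above, so the results inherit that rounding.

```r
# Coefficients as printed in the lm() output
b0 <- 10.383  # intercept: expected log-salary at zero years of employment
b1 <- 0.071   # slope: change in log-salary per additional year

# exp() of a predicted log-salary gives a typical (median-like) salary,
# not the arithmetic mean salary
baseline_salary <- exp(b0)

# Exact multiplicative effect of one extra year and the implied percentage change
growth_factor <- exp(b1)
pct_change    <- 100 * (growth_factor - 1)

round(c(baseline = baseline_salary, factor = growth_factor, pct = pct_change), 3)
```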
The coefficient of determination (R²), which is reported by `summary()` rather than by the printed `lm()` object above, gives the proportion of variance in log-salary explained by years of employment. A high R² would indicate that employment duration is a strong predictor of (log-)salary.
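A minimal sketch of extracting R² from a fitted model. The data are simulated (an assumption; not the salary sample), and the example also checks that in a regression with a single predictor R² equals the squared Pearson correlation between predictor and response.

```r
set.seed(42)
years      <- runif(100, 0, 30)
log_salary <- 10.4 + 0.07 * years + rnorm(100, sd = 0.3)  # simulated, illustrative

fit <- lm(log_salary ~ years)
r2  <- summary(fit)$r.squared  # R^2 lives in the summary, not the lm object print

# With one predictor, R^2 equals the squared correlation
r2_cor <- cor(years, log_salary)^2
c(r2 = r2, r2_cor = r2_cor)
```

Note that this R² refers to log-salary; it is not the share of variance explained in salary measured in euros.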
Separate regression models estimated for male and female employees show similar intercepts but different slopes. The slope for male employees (0.0764, about 7.9% per year) is higher than that for female employees (0.0656, about 6.8% per year), indicating steeper salary growth per year of employment for men. The intercepts are almost identical (10.381 for men, 10.385 for women), which implies similar predicted starting salaries of about €32,200 for men and €32,400 for women.
These results suggest that men and women in this sample begin their careers at comparable pay levels, but men's salaries grow faster with tenure, so the predicted gap widens over time (about 24% in favour of men after 20 years). This pattern points to gender-based differences in salary progression within the public service sector; because the models include no other variables and the difference in slopes has not been tested formally, the finding is descriptive rather than conclusive.
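To make the widening gap concrete, the group-specific coefficients from the two models reported in the output below can be turned into predicted salaries at a few tenures. A minimal sketch: the tenures 0, 10 and 20 years are chosen only for illustration, and exp() of a predicted log-salary estimates a typical (median-like) salary rather than a mean.

```r
# Group-specific coefficients copied from the separate lm() fits
coefs <- rbind(
  Male   = c(intercept = 10.38095, slope = 0.07637),
  Female = c(intercept = 10.38460, slope = 0.06562)
)

tenure <- c(0, 10, 20)  # illustrative tenures in years

# Predicted log-salary: rows = gender, columns = tenure
pred_log <- coefs[, "intercept"] + outer(coefs[, "slope"], tenure)
rownames(pred_log) <- rownames(coefs)
colnames(pred_log) <- paste0(tenure, "y")

pred_salary <- round(exp(pred_log))

# Male-to-female ratio of predicted salaries at each tenure
ratio <- exp(pred_log["Male", ] - pred_log["Female", ])
print(pred_salary)
print(round(ratio, 3))
```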
# Model for males
lm(log(salary) ~ years_empl, data = salary_data[salary_data$gender == "Male", ])
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = salary_data[salary_data$gender ==
## "Male", ])
##
## Coefficients:
## (Intercept) years_empl
## 10.38095 0.07637
# Model for females
lm(log(salary) ~ years_empl, data = salary_data[salary_data$gender == "Female", ])
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = salary_data[salary_data$gender ==
## "Female", ])
##
## Coefficients:
## (Intercept) years_empl
## 10.38460 0.06562
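Two separate fits show the differences between groups but not whether they exceed sampling noise. A common alternative, sketched here under the assumption that a single pooled model is acceptable, is one regression with a tenure-by-gender interaction: its coefficients reproduce the point estimates of the two separate fits exactly, and `summary()` then reports a standard error and p-value for the difference in slopes. The data frame `sim` is simulated for illustration; with the real data, `salary_data` would take its place.

```r
set.seed(7)
n   <- 200
sim <- data.frame(
  years_empl = runif(n, 0, 30),
  gender     = rep(c("Female", "Male"), each = n / 2)
)
# Simulated salaries with a slightly steeper log-slope for men (illustrative only)
sim$salary <- exp(10.38 + ifelse(sim$gender == "Male", 0.076, 0.066) * sim$years_empl +
                    rnorm(n, sd = 0.1))

# Pooled model; "Female" is the reference level (first alphabetically)
fit_int <- lm(log(salary) ~ years_empl * gender, data = sim)

# Separate fits, as in the text above
fit_m <- lm(log(salary) ~ years_empl, data = sim[sim$gender == "Male", ])
fit_f <- lm(log(salary) ~ years_empl, data = sim[sim$gender == "Female", ])

b <- coef(fit_int)
# Female slope = years_empl; male slope = years_empl + interaction term
male_slope <- b[["years_empl"]] + b[["years_empl:genderMale"]]

# Estimate, standard error, t value and p-value for the slope difference
summary(fit_int)$coefficients["years_empl:genderMale", ]
```

The `years_empl:genderMale` coefficient is the estimated male-minus-female difference in log-slopes. Its standard error differs slightly from what the separate fits would suggest because the pooled model assumes a common residual variance for both groups.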
The average salary in the sample is €122,304 (SD ≈ €79,030), with a mean employment duration of 15.73 years (SD ≈ 9.04). The sample includes 100 male and 100 female employees.
A scatterplot indicates a positive but non-linear relationship between salary and years of employment. After a logarithmic transformation of salary, a linear regression model shows that each additional year of employment is associated with approximately 7.4% higher salary (β ≈ 0.071).
Separate models reveal a higher slope for male employees (β ≈ 0.076) than for female employees (β ≈ 0.066), indicating faster salary growth for men, while the nearly identical intercepts suggest similar starting salaries.