This dataset contains salary data for 200 public service employees, including their years of employment and gender. The sample is evenly divided by gender, with 100 males and 100 females. The average salary is approximately €122,304, and the average length of employment is about 15.7 years.
summary(df)
## years_empl salary gender
## Min. : 0.007167 Min. : 30203 Length:200
## 1st Qu.: 7.790195 1st Qu.: 54208 Class :character
## Median :16.191430 Median : 97496 Mode :character
## Mean :15.734362 Mean :122304
## 3rd Qu.:22.908421 3rd Qu.:179447
## Max. :29.666752 Max. :331348
table(df$gender)
##
## Female Male
## 100 100
sd(df$salary)
## [1] 79030.12
sd(df$years_empl)
## [1] 9.035618
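The standard deviations of salary (about €79,030) and years of employment (about 9.0 years) are large relative to their means, indicating substantial variability in both income and tenure. The mean salary (€122,304) also sits well above the median (€97,496), which points to a right-skewed salary distribution.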
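One way to put the two spreads on a common footing is the coefficient of variation (standard deviation divided by the mean). The sketch below hard-codes the summary statistics printed above rather than reading `df`; the helper name `cv` is introduced here for illustration only.

```r
# Summary statistics copied from the summary() and sd() output above
salary_mean <- 122304
salary_sd   <- 79030.12
tenure_mean <- 15.734362
tenure_sd   <- 9.035618

# Coefficient of variation: standard deviation as a share of the mean
# (hypothetical helper, not part of the original analysis)
cv <- function(sd_value, mean_value) sd_value / mean_value

cv_salary <- cv(salary_sd, salary_mean)
cv_tenure <- cv(tenure_sd, tenure_mean)

round(c(salary = cv_salary, years_empl = cv_tenure), 2)
```

Both ratios come out above one half, so the spread is large relative to the mean for salary and for tenure alike.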
The scatterplot below shows a strong positive association between years of employment and salary. However, the relationship is clearly non-linear: salaries rise more steeply at longer tenures, a convex pattern consistent with exponential growth. This suggests that a log transformation of salary might linearize the relationship.
library(ggplot2)  # provides ggplot(), geom_point(), labs(), theme_minimal()
ggplot(df, aes(x = years_empl, y = salary)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  labs(title = "Scatterplot: Years of Employment vs Salary",
       x = "Years of Employment",
       y = "Salary (€)") +
  theme_minimal()
We apply a logarithmic transformation to the salary variable to better model the exponential trend observed.
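In equation form, the log-linear specification fitted below can be written as follows. The symbols $\beta_0$, $\beta_1$ and $\varepsilon_i$ are notation introduced here for illustration; `log()` in R is the natural logarithm.

```latex
\log(\text{salary}_i) = \beta_0 + \beta_1\,\text{years\_empl}_i + \varepsilon_i
\quad\Longleftrightarrow\quad
\text{salary}_i = e^{\beta_0}\, e^{\beta_1\,\text{years\_empl}_i}\, e^{\varepsilon_i}
```

Under this form, each additional year multiplies salary by the constant factor $e^{\beta_1}$, which is why a straight line on the log scale corresponds to an exponential curve in euros.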
model <- lm(log(salary) ~ years_empl, data = df)
summary(model)
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
The slope on years_empl is positive and highly significant (p < 2e-16), and the model explains about 92% of the variance in log salary (R-squared = 0.917). Because the relationship is linear on the log scale, salary grows by a roughly constant percentage each year, which means the increase in euros gets larger as tenure rises.
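To see how a constant slope on the log scale translates into growing euro increments, the fitted line can be back-transformed. This is a minimal sketch that hard-codes the coefficients printed above; the helper name `fitted_salary` and the tenure values 0, 10, 20 and 30 are illustrative assumptions. Under the usual assumption of symmetric errors on the log scale, `exp()` of a fitted log value estimates a typical (median) salary rather than the arithmetic mean.

```r
# Coefficients copied from the summary(model) output above
b0 <- 10.382774  # intercept on the log scale
b1 <- 0.070998   # slope per year of employment

# Back-transform a fitted log salary to euros (hypothetical helper)
fitted_salary <- function(years) exp(b0 + b1 * years)

years <- c(0, 10, 20, 30)  # illustrative tenure values

data.frame(
  years_empl     = years,
  fitted_salary  = round(fitted_salary(years)),
  # Euro gain over the following year; grows with tenure
  gain_next_year = round(fitted_salary(years + 1) - fitted_salary(years))
)
```

The ratio between consecutive years is always `exp(b1)`, while the euro gain in the last column increases from row to row.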
To interpret the slope as a percentage change in salary, we exponentiate it:
increase_pct <- round((exp(coef(model)[2]) - 1) * 100, 2)
increase_pct
## years_empl
## 7.36
Each additional year of employment is associated with an approximately 7.36% increase in salary. This reflects a compounding salary growth structure typical in many public service pay systems.
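The exponentiated slope can be compared with the common shortcut of reading the raw slope as a percentage, and it also implies a doubling time. The sketch below uses only the slope reported above; the variable names are illustrative.

```r
b1 <- 0.070998  # slope copied from the summary(model) output above

approx_pct     <- 100 * b1             # shortcut: read the slope directly as a percentage
exact_pct      <- 100 * (exp(b1) - 1)  # exact percentage change per additional year
doubling_years <- log(2) / b1          # years for the fitted salary to double

round(c(approx_pct = approx_pct,
        exact_pct = exact_pct,
        doubling_years = doubling_years), 2)
```

The shortcut understates the growth slightly (about 7.10% against 7.36%), and the fitted salary doubles roughly every 9.8 years.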
We estimate separate regression models for males and females to compare how salary growth differs by gender.
library(dplyr)  # provides filter(); without it, stats::filter() would be called instead
model_male <- lm(log(salary) ~ years_empl, data = filter(df, gender == "Male"))
model_female <- lm(log(salary) ~ years_empl, data = filter(df, gender == "Female"))
summary(model_male)
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = filter(df, gender ==
## "Male"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56063 -0.08644 0.00333 0.06960 0.38121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.380951 0.030790 337.15 <2e-16 ***
## years_empl 0.076372 0.001698 44.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.153 on 98 degrees of freedom
## Multiple R-squared: 0.9538, Adjusted R-squared: 0.9533
## F-statistic: 2023 on 1 and 98 DF, p-value: < 2.2e-16
summary(model_female)
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = filter(df, gender ==
## "Female"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71847 -0.07628 0.01426 0.10656 0.40887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.384598 0.036725 282.8 <2e-16 ***
## years_empl 0.065623 0.002025 32.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1825 on 98 degrees of freedom
## Multiple R-squared: 0.9146, Adjusted R-squared: 0.9138
## F-statistic: 1050 on 1 and 98 DF, p-value: < 2.2e-16
Both male and female employees show a positive salary trajectory over time, but the slope is higher for males (0.0764) than for females (0.0656). After exponentiating, this translates to annual salary increases of approximately 7.9% for men versus 6.8% for women. The intercepts are almost identical (10.381 vs 10.385), so fitted starting salaries are similar and the gap opens up with tenure: at 20 years of employment the fitted male salary is roughly 24% higher than the fitted female salary. Such a difference could point to disparities in advancement or role responsibilities and warrants further investigation.
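As a rough check on whether the two slopes differ by more than sampling noise, the reported estimates and standard errors can be compared directly. This is a sketch, not the analysis used in the report: it assumes the male and female samples are independent and uses a normal approximation, and all variable names are illustrative.

```r
# Slopes and standard errors copied from the two summary() outputs above
b_male   <- 0.076372; se_male   <- 0.001698
b_female <- 0.065623; se_female <- 0.002025

slope_diff <- b_male - b_female
se_diff    <- sqrt(se_male^2 + se_female^2)  # assumes independent samples
z          <- slope_diff / se_diff
p_value    <- 2 * pnorm(-abs(z))             # two-sided, normal approximation

# Implied annual salary growth per group, in percent
growth_pct <- 100 * (exp(c(male = b_male, female = b_female)) - 1)

round(c(slope_diff = slope_diff, z = z), 3)
round(growth_pct, 2)
p_value
```

With the reported values, z is roughly 4.07, so under these assumptions the slope difference is unlikely to be due to chance alone. A single model with an interaction term, for example `lm(log(salary) ~ years_empl * gender, data = df)`, would be the more conventional way to test the same difference.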