1. Sample description

This dataset contains salary data for 200 public service employees, including their years of employment and gender. The sample is evenly divided by gender, with 100 males and 100 females. The average salary is approximately €122,304, and the average length of employment is about 15.7 years.

summary(df)
##    years_empl            salary          gender         
##  Min.   : 0.007167   Min.   : 30203   Length:200        
##  1st Qu.: 7.790195   1st Qu.: 54208   Class :character  
##  Median :16.191430   Median : 97496   Mode  :character  
##  Mean   :15.734362   Mean   :122304                     
##  3rd Qu.:22.908421   3rd Qu.:179447                     
##  Max.   :29.666752   Max.   :331348
table(df$gender)
## 
## Female   Male 
##    100    100
sd(df$salary)
## [1] 79030.12
sd(df$years_empl)
## [1] 9.035618

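The standard deviations of salary (~€79,030) and years of employment (~9 years) indicate substantial variability in both income and tenure. The mean salary (€122,304) is also well above the median (€97,496), which points to a right-skewed salary distribution.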
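To put the two spreads on a comparable footing, one option is the coefficient of variation (standard deviation divided by the mean). The sketch below re-enters the summary figures reported above by hand rather than recomputing them from `df`; the variable names are illustrative only.

```r
# Summary figures copied from the output above (not recomputed from df)
mean_salary   <- 122304
median_salary <- 97496
sd_salary     <- 79030.12
mean_years    <- 15.734362
sd_years      <- 9.035618

# Coefficient of variation: spread relative to the mean (unitless)
cv_salary <- sd_salary / mean_salary
cv_years  <- sd_years / mean_years

# A mean well above the median hints at a right-skewed distribution
mean_to_median <- mean_salary / median_salary

round(c(cv_salary = cv_salary, cv_years = cv_years,
        mean_to_median = mean_to_median), 3)
```

With the data loaded, the same quantities could be obtained directly, for example `sd(df$salary) / mean(df$salary)`.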

2. Association between years and salary as scatterplot

The scatterplot below shows a strong positive association between years of employment and salary. However, the relationship is clearly non-linear: salaries rise more steeply at longer tenures, a convex pattern consistent with exponential growth. This suggests that a transformation of salary, such as the logarithm, might help linearize the data.

library(ggplot2)  # provides ggplot(), geom_point(), labs(), theme_minimal()
ggplot(df, aes(x = years_empl, y = salary)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  labs(title = "Scatterplot: Years of Employment vs Salary",
       x = "Years of Employment",
       y = "Salary (€)") +
  theme_minimal()
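One way to check whether a log transformation would linearize such a pattern is to compare the linear correlation on the raw and log scales. The sketch below uses simulated data, not the report's `df`: the constants 10.4, 0.07 and 0.2 and the names `years_sim` and `salary_sim` are assumptions chosen only to resemble the pattern described above.

```r
set.seed(1)  # reproducible simulation

# Hypothetical data: tenure uniform on 0-30 years, log salary linear in tenure.
# The constants 10.4, 0.07 and 0.2 are assumptions, not estimates from df.
n          <- 200
years_sim  <- runif(n, min = 0, max = 30)
salary_sim <- exp(10.4 + 0.07 * years_sim + rnorm(n, sd = 0.2))

# Pearson correlation measures linear association only
cor_raw <- cor(years_sim, salary_sim)       # salary on the original scale
cor_log <- cor(years_sim, log(salary_sim))  # salary on the log scale

c(raw = cor_raw, log = cor_log)
```

If the relationship is exponential, the log-scale correlation should come out higher than the raw one. On the real data, adding `scale_y_log10()` to the ggplot call above would give the corresponding visual check.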

3. Estimate salary by years of employment

We apply a logarithmic transformation to the salary variable to better model the exponential trend observed.

model <- lm(log(salary) ~ years_empl, data = df)
summary(model)
## 
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years_empl   0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
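
Because the model is fitted on the log scale, predictions have to be back-transformed with `exp()` to be read in euros. The following sketch does this with the coefficients copied from the output above; `predict_salary` is a hypothetical helper, not part of the original analysis.

```r
# Coefficients copied from the summary(model) output above
b0 <- 10.382774  # intercept on the log scale
b1 <- 0.070998   # slope per year of employment

# Hypothetical helper: back-transform the fitted line to euros.
# exp() of a log-scale prediction estimates the median (not the mean) salary
# if the residuals are roughly normal on the log scale.
predict_salary <- function(years) exp(b0 + b1 * years)

years_grid <- c(0, 10, 20, 30)
data.frame(years = years_grid,
           predicted_salary = round(predict_salary(years_grid)))
```

With the fitted object available, `exp(predict(model, newdata = data.frame(years_empl = years_grid)))` should give the same values up to rounding of the coefficients.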

4. Interpretation

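The regression model is statistically significant (F = 2191, p < 2.2e-16) and explains about 92% of the variance in log salary (R-squared = 0.917). The positive slope on the log scale means that salary grows by a roughly constant percentage per year, so in absolute terms it increases at an accelerating rate with more years of service.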

To interpret the slope of the model:

increase_pct <- round((exp(coef(model)[2]) - 1) * 100, 2)
increase_pct
## years_empl 
##       7.36

Each additional year of employment is associated with an approximate 7.36% increase in salary. This is consistent with a compounding salary growth structure, as found in many public service pay systems.
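
The point estimate can be complemented with an interval. As a rough sketch, the code below builds a 95% confidence interval for the slope from the estimate and standard error reported in the model summary and converts its limits to percentages; the helper `to_pct` is introduced here for illustration.

```r
# Estimate and standard error copied from the summary(model) output above
b1       <- 0.070998
se_b1    <- 0.001517
df_resid <- 198  # residual degrees of freedom

# 95% interval on the log scale, using the t distribution
t_crit <- qt(0.975, df = df_resid)
ci_log <- b1 + c(-1, 1) * t_crit * se_b1

# Convert slope and interval limits to percentage change per year
to_pct <- function(b) (exp(b) - 1) * 100
pct    <- to_pct(b1)
ci_pct <- to_pct(ci_log)

round(c(estimate = pct, lower = ci_pct[1], upper = ci_pct[2]), 2)
```

With the fitted object available, `confint(model)` returns the log-scale interval directly, and the same `exp()` conversion applies.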

5. (Voluntary) Gender effects

We estimate separate regression models for males and females to compare how salary growth differs by gender.

library(dplyr)  # filter() below is dplyr::filter(), not stats::filter()
model_male <- lm(log(salary) ~ years_empl, data = filter(df, gender == "Male"))
model_female <- lm(log(salary) ~ years_empl, data = filter(df, gender == "Female"))

summary(model_male)
## 
## Call:
## lm(formula = log(salary) ~ years_empl, data = filter(df, gender == 
##     "Male"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56063 -0.08644  0.00333  0.06960  0.38121 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.380951   0.030790  337.15   <2e-16 ***
## years_empl   0.076372   0.001698   44.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.153 on 98 degrees of freedom
## Multiple R-squared:  0.9538, Adjusted R-squared:  0.9533 
## F-statistic:  2023 on 1 and 98 DF,  p-value: < 2.2e-16
summary(model_female)
## 
## Call:
## lm(formula = log(salary) ~ years_empl, data = filter(df, gender == 
##     "Female"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.71847 -0.07628  0.01426  0.10656  0.40887 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.384598   0.036725   282.8   <2e-16 ***
## years_empl   0.065623   0.002025    32.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1825 on 98 degrees of freedom
## Multiple R-squared:  0.9146, Adjusted R-squared:  0.9138 
## F-statistic:  1050 on 1 and 98 DF,  p-value: < 2.2e-16

Both male and female employees show a positive salary trajectory over time. The slope is higher for males (0.0764) than for females (0.0656), which corresponds to salary increases of roughly 7.9% versus 6.8% per additional year of employment. The intercepts are nearly identical (10.381 and 10.385), so predicted starting salaries are similar and the gap widens with tenure. Such a difference could point to disparities in advancement or role responsibilities and warrants further investigation. Because the two models were fitted separately, they do not by themselves test whether the slopes differ significantly.
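
As a rough check on the slope difference, the two estimates can be compared with an approximate z-test. This sketch uses only the slopes and standard errors reported in the two summaries and assumes the male and female subsamples are independent and large enough for a normal approximation; it is not part of the original analysis.

```r
# Slopes and standard errors copied from the two summaries above
b_male   <- 0.076372; se_male   <- 0.001698
b_female <- 0.065623; se_female <- 0.002025

# Implied percentage salary growth per year for each group
pct_male   <- (exp(b_male) - 1) * 100
pct_female <- (exp(b_female) - 1) * 100

# Approximate z-test for the difference between two independent slopes
# (assumption: independent subsamples, normal approximation adequate)
diff_slope <- b_male - b_female
se_diff    <- sqrt(se_male^2 + se_female^2)
z_stat     <- diff_slope / se_diff
p_value    <- 2 * pnorm(-abs(z_stat))

round(c(pct_male = pct_male, pct_female = pct_female,
        z = z_stat, p = p_value), 4)
```

A more standard alternative, given access to the data, would be a single model with an interaction term, for example `lm(log(salary) ~ years_empl * gender, data = df)`, where the interaction coefficient estimates the slope difference directly.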