1. Sample description

This dataset contains salary (€), years of employment and gender for public service employees.

names(df) <- trimws(names(df))
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)
nrow(df)
## [1] 200
table(df$gender)
## 
## Female   Male 
##    100    100
mean_salary <- mean(df$salary)
mean_years_empl <- mean(df$years)
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30203   54208   97496  122303  179447  331348

2. Association between years of employment and salary (scatterplot)

The scatterplot below shows years of employment (x-axis) against salary in euros (y-axis). It shows a clear upward trend: salary tends to rise with years of employment. The spread of the points also shows that salaries vary among employees with the same length of employment. A linear trend line is added in green to highlight the overall positive pattern.

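# Scatterplot of years of employment (independent) vs. salary (dependent)
plot(df$years, df$salary, xlab = "Years of employment", ylab = "Salary (€)")
# Add the linear trend line in green
abline(lm(salary ~ years, data = df), col = "green")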

3. Estimate salary by years of employment

The scatterplot shows a non-linear relationship between salary and years of employment. Salaries rise more steeply in later years, which suggests exponential rather than linear growth. To account for this, we apply a logarithmic transformation to the salary variable. The transformation linearizes the relationship and tends to stabilize the variance, which makes the data better suited to linear regression. The model below estimates log-transformed salary as a function of years of employment.
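The effect of the transformation can be illustrated on simulated data. The sketch below is not part of the original analysis: the variables `years_sim` and `salary_sim` and all parameter values are hypothetical, chosen only to mimic exponential salary growth. It fits a line to the raw and to the log-transformed values and compares the two fits.

```r
# Hypothetical simulated data (NOT the report's dataset): salary grows
# exponentially with years, with multiplicative noise
set.seed(1)
years_sim  <- runif(200, min = 0, max = 40)
salary_sim <- exp(10.4 + 0.07 * years_sim + rnorm(200, sd = 0.2))

# Straight line fitted to the raw salary vs. to log salary
fit_linear <- lm(salary_sim ~ years_sim)
fit_log    <- lm(log(salary_sim) ~ years_sim)

# Share of variance explained on each scale
r2_linear <- summary(fit_linear)$r.squared
r2_log    <- summary(fit_log)$r.squared

# Slope on the log scale; should be close to the simulated growth rate of 0.07
slope_log <- unname(coef(fit_log)["years_sim"])

c(r2_linear = r2_linear, r2_log = r2_log, slope_log = slope_log)
```

Because the two models have different dependent variables, their R-squared values are only a rough indication of which scale is closer to linear. A residual plot of each fit would be the more direct check.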

# Log-transform salary and regress it on years of employment
df$log_salary <- log(df$salary)

model <- lm(log_salary ~ years, data = df)
summary(model)
## 
## Call:
## lm(formula = log_salary ~ years, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years        0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
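Because the dependent variable is on the log scale, the coefficients are easier to read after back-transformation. The sketch below copies the two estimates from the output above and converts the slope into a percentage change per year. `predict_salary` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary(model) output above
b0 <- 10.382774  # intercept: expected log salary at 0 years of employment
b1 <- 0.070998   # slope: change in log salary per additional year

# Percentage change in salary associated with one additional year
pct_per_year <- (exp(b1) - 1) * 100

# Hypothetical helper: back-transform a log-scale prediction to euros
predict_salary <- function(years) exp(b0 + b1 * years)

pct_per_year
predict_salary(c(0, 10, 20))
```

With the fitted object available, `exp(predict(model, newdata = data.frame(years = c(0, 10, 20))))` should give the same values. Under the usual assumption of normally distributed errors on the log scale, the back-transformed prediction is closer to a median than to a mean salary.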

4. Interpretation

The model shows a clear and statistically significant relationship between years of employment and salary (p < 2e-16). Because the dependent variable is log salary, the slope of 0.071 describes proportional growth: each additional year of employment is associated with a salary increase of about 7.4% (exp(0.071) - 1 ≈ 0.074). The model's fit is strong, with years of employment accounting for about 92% of the variance in log salary (R-squared = 0.917).