1. Sample description

The dataset records salary (in €), years of employment, and gender.

names(df) <- trimws(names(df))
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)
nrow(df)
## [1] 200
table(df$gender)
## 
## Female   Male 
##    100    100
mean_salary <- mean(df$salary)
mean_years_empl <- mean(df$years)
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30203   54208   97496  122304  179447  331348
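
The means and standard deviations computed above are stored but never displayed. As a minimal sketch of how they could be collected into one table, the example below runs on simulated data. The simulated `df` and the names `desc` and `by_gender` are assumptions for illustration and are not part of the original analysis.

```r
# Simulated stand-in for the real data (assumption: these values are made up)
set.seed(1)
n <- 200
df <- data.frame(
  years  = round(runif(n, 0, 30)),
  gender = factor(rep(c("Female", "Male"), each = n / 2))
)
df$salary <- exp(10.4 + 0.07 * df$years + rnorm(n, sd = 0.2))

# One row per variable: mean, standard deviation, and median
desc <- data.frame(
  variable = c("salary", "years"),
  mean     = c(mean(df$salary), mean(df$years)),
  sd       = c(sd(df$salary), sd(df$years)),
  median   = c(median(df$salary), median(df$years))
)
print(desc)

# Mean salary within each gender group
by_gender <- aggregate(salary ~ gender, data = df, FUN = mean)
print(by_gender)
```

Replacing the simulated `df` with the real data frame would give the corresponding table for the sample described here.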


2. Association between years of employment and salary (scatterplot)

The scatterplot shows years of employment on the x-axis and salary in euros on the y-axis. There is a clear upward trend, indicating a positive association: employees with more years of employment generally earn more. Salaries nevertheless vary noticeably among employees with the same number of years. A green linear trend line marks the overall positive trajectory.

plot(x = df$years, y = df$salary, xlab = "Years of employment", ylab = "Salary (€)")
abline(lm(salary ~ years, data = df), col = "green", lwd = 2)
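
The strength of the association seen in the plot can also be put into numbers with `cor()`. The sketch below uses simulated data, so the vectors `years` and `salary` are assumptions and the resulting coefficients do not describe the real sample.

```r
# Simulated stand-in for the real data (assumption: these values are made up)
set.seed(1)
n <- 200
years  <- round(runif(n, 0, 30))
salary <- exp(10.4 + 0.07 * years + rnorm(n, sd = 0.2))

# Pearson measures linear association; Spearman only assumes a monotone relationship
r_pearson  <- cor(years, salary, method = "pearson")
r_spearman <- cor(years, salary, method = "spearman")

# Pearson correlation after log-transforming salary
r_log <- cor(years, log(salary))

print(c(pearson = r_pearson, spearman = r_spearman, pearson_log = r_log))
```

If the relationship is curved, the Spearman coefficient and the correlation on the log scale may come out higher than the plain Pearson coefficient, which would be one hint that a straight line on the raw scale is not the best description.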


3. Estimate salary by years of employment

An initial visual inspection indicated that the relationship between salary and years of employment might be non-linear. Salaries appear to increase more rapidly in the later years, suggesting an exponential or multiplicative growth pattern rather than a steady, additive rise. To better capture this pattern, a logarithmic transformation is applied to the salary variable. This transformation helps to stabilize variance and linearize the relationship, making it more appropriate for linear regression analysis. The resulting model estimates the logarithm of salary based on years of employment.

df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years, data = df)
summary(model)
## 
## Call:
## lm(formula = log_salary ~ years, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years        0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
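
Because the model is fitted on the log scale, its predictions have to be exponentiated to get back to euros. The sketch below does this by hand with the two coefficients reported in the output above. The values in `years_new` are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model) output above
b0 <- 10.382774  # intercept
b1 <- 0.070998   # slope for years

# Hypothetical years of employment to predict for (assumption)
years_new <- c(0, 10, 20, 30)

# Prediction on the log scale, then back-transformed to euros
pred_log <- b0 + b1 * years_new
pred_eur <- exp(pred_log)

pred_table <- data.frame(years = years_new, predicted_salary_eur = round(pred_eur))
print(pred_table)
```

With the fitted object available, `exp(predict(model, newdata = data.frame(years = years_new)))` should give the same values. If the residuals are roughly normal on the log scale, these back-transformed values are better read as typical (median) salaries than as arithmetic means.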


4. Interpretation

The model shows a strong, statistically significant relationship between years of employment and salary (slope 0.071, p < 2e-16). Because salary is log-transformed, the slope describes a proportional rather than an absolute change: each additional year of employment is associated with a salary that is roughly 7.4% higher (exp(0.071) ≈ 1.074), so salaries grow at an approximately constant percentage rate. The fit is strong: years of employment explain about 92% of the variance in log salary (R² = 0.917).
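
The percentage reading of the slope can be checked directly from the reported estimate and standard error. This is a minimal sketch that assumes the usual t-based confidence interval with the 198 residual degrees of freedom from the model output. The variable names are chosen for illustration.

```r
# Estimate and standard error of the years coefficient, from the model output
b1 <- 0.070998
se <- 0.001517

# Percentage change in salary per additional year of employment
pct_per_year <- (exp(b1) - 1) * 100

# 95% confidence interval for that percentage (t distribution, 198 df)
t_crit <- qt(0.975, df = 198)
pct_ci <- (exp(b1 + c(-1, 1) * t_crit * se) - 1) * 100

# Years needed for the predicted salary to double at this growth rate
doubling_years <- log(2) / b1

print(round(c(pct = pct_per_year, lower = pct_ci[1], upper = pct_ci[2],
              doubling = doubling_years), 2))
```

Under these assumptions the growth rate comes out at about 7.4% per year, and the predicted salary doubles roughly every 10 years of employment.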