The dataset contains three variables: salary (in €), years of employment, and gender.
names(df) <- trimws(names(df))
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)
nrow(df)
## [1] 200
table(df$gender)
##
## Female   Male
##    100    100
mean_salary <- mean(df$salary)
mean_years_empl <- mean(df$years)
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)  # use the numeric column created above, as in mean(df$years)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30203   54208   97496  122304  179447  331348
The scatterplot shows salary in euros (y-axis) against years of employment (x-axis). There is a clear upward trend, indicating a positive association: employees with more years of employment generally earn higher salaries. Salaries still vary noticeably among individuals with the same number of years employed. A green least-squares line is included to highlight the overall upward trend.
plot(x = df$years, y = df$salary, xlab = "Years of employment", ylab = "Salary (€)")
abline(lm(salary ~ years, data = df), col = "green", lwd = 2)
Visual inspection suggests that the relationship between salary and years of employment may be non-linear: salaries appear to rise faster at longer tenures, which points to multiplicative (exponential) growth rather than a steady, additive rise. To capture this pattern, salary is transformed with the natural logarithm. The transformation helps to stabilize the variance and linearize the relationship, making it better suited to linear regression. The resulting model estimates log salary as a linear function of years of employment.
df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years, data = df)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years, data = df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.77041 -0.12197 -0.00111  0.15234  0.41044
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years        0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
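The reasoning behind the log transformation can be illustrated with a minimal sketch on simulated data. Nothing below uses the report's dataset: the names (`sim`, `fit_raw`, `fit_log`) and the parameter values (baseline 10.4, growth rate 0.07, noise SD 0.2) are assumptions chosen only to mimic multiplicative salary growth.

```r
# Simulated data only: all names and parameter values are hypothetical.
set.seed(42)
n <- 200
sim_years <- runif(n, min = 0, max = 35)

# Multiplicative growth: each extra year scales salary by exp(0.07),
# and the noise also acts multiplicatively (additive on the log scale).
sim_salary <- exp(10.4 + 0.07 * sim_years + rnorm(n, sd = 0.2))
sim <- data.frame(years = sim_years, salary = sim_salary)

fit_raw <- lm(salary ~ years, data = sim)       # straight line on the euro scale
fit_log <- lm(log(salary) ~ years, data = sim)  # straight line on the log scale

# Slope on the log scale; should be close to the simulated rate of 0.07
coef(fit_log)["years"]

# Residual spread for below-median vs above-median tenure, per model
long_tenure <- sim$years > median(sim$years)
tapply(resid(fit_raw), long_tenure, sd)
tapply(resid(fit_log), long_tenure, sd)
```

If growth really is multiplicative, the log-scale slope should land close to the simulated 0.07, and the raw-scale residuals should spread out much more for long tenures than for short ones, while the log-scale residuals stay roughly constant.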
The model shows a strong, statistically significant relationship between years of employment and salary (slope 0.071, p < 2e-16). Because salary is log-transformed, the slope describes a proportional rather than an absolute change: each additional year of employment is associated with a salary about 7.4% higher (exp(0.071) ≈ 1.074), which implies a roughly constant percentage growth rate over time. The fit is strong as well: years of employment account for about 92% of the variation in log salary (R² = 0.917).
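The proportional interpretation can be made concrete by back-transforming the coefficients printed in the summary output. The sketch below is only an illustration: the names `b0`, `b1`, `pct_per_year`, `doubling_years` and `pred_10`, and the choice of 10 years as an example, are assumptions and not part of the original analysis.

```r
# Coefficients copied from the summary(model) output above
b0 <- 10.382774  # intercept: expected log salary at 0 years of employment
b1 <- 0.070998   # slope: change in log salary per additional year

# Percentage change in salary associated with one additional year
pct_per_year <- (exp(b1) - 1) * 100

# Years needed for the predicted salary to double at that constant rate
doubling_years <- log(2) / b1

# Predicted salary in euros for a hypothetical employee with 10 years
pred_10 <- exp(b0 + b1 * 10)

round(c(pct_per_year = pct_per_year, doubling_years = doubling_years), 2)
round(pred_10)
```

With these coefficients the slope corresponds to a salary roughly 7.4% higher per additional year, so the predicted salary doubles in a little under 10 years. Note that `exp()` of a predicted log salary is closer to a typical (median-like) salary than to the arithmetic mean.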