The dataset contains three variables: ‘Years of Employment’, ‘Gender’, and ‘Salary’. A basic descriptive analysis of the sample is presented below.
##
## Female Male
## 100 100
Gender is evenly represented in the sample, with an equal number of male and female participants.
## Group Mean_Salary
## 1 Overall 122303.5
## 2 Female 109140.8
## 3 Male 135466.1
Men have a higher mean salary (€135,466.14) than women (€109,140.76).
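The size of the gap can be worked out directly from the reported group means. The sketch below uses only the figures printed above; the variable names are chosen here for illustration and are not taken from the original analysis code.

```r
# Group means and group sizes as reported in the output above.
mean_female <- 109140.76
mean_male   <- 135466.14
n_female    <- 100
n_male      <- 100

# Absolute and relative gap between the two group means.
gap_abs <- mean_male - mean_female
gap_rel <- gap_abs / mean_female

# With known group sizes, the overall mean is the size-weighted average
# of the group means.
overall <- (n_female * mean_female + n_male * mean_male) / (n_female + n_male)

round(gap_abs, 2)        # absolute gap in euros
round(100 * gap_rel, 1)  # gap as a percentage of the female mean
round(overall, 2)        # should match the reported overall mean
```

The absolute gap comes to €26,325.38, or about 24.1% of the female mean, and the weighted average reproduces the reported overall mean of 122303.5 up to rounding.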
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.007167 7.790195 16.191430 15.734362 22.908421 29.666752
Years of employment in the sample range from just above 0 to approximately 29.67 years, with a mean of 15.73 years and a median of 16.19 years.
The scatterplot illustrates the relationship between ‘Years of Employment’ and ‘Salary’. A clear upward trend can be observed: as years of employment increase, salary tends to rise as well, which points to a strong positive association between the two variables. The trend is not strictly linear, however: salary growth appears to accelerate after approximately 15 to 20 years of employment.
# Scatterplot of salary against years of employment
plot(df$years_empl, df$salary,
     xlab = "Years of Employment",
     ylab = "Salary (€)",
     main = "Relationship between Years of Employment and Salary")
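One way to read the accelerating pattern described above: if salary grows by a roughly constant percentage each year, the raw curve bends upward while its logarithm follows a straight line. The sketch below illustrates this with invented toy values (a hypothetical starting salary and growth rate); it does not use the dataset.

```r
# Toy illustration only: these values are invented, not taken from the dataset.
years  <- 0:30
start  <- 30000   # hypothetical starting salary
growth <- 0.07    # hypothetical constant growth rate on the log scale

salary <- start * exp(growth * years)

# On the raw scale, each yearly increase is larger than the one before ...
raw_steps <- diff(salary)

# ... while on the log scale every yearly step is the same size,
# i.e. log(salary) is exactly linear in years.
log_steps <- diff(log(salary))

range(raw_steps)
range(log_steps)

# Regressing log(salary) on years recovers the growth rate as the slope.
toy_fit <- lm(log(salary) ~ years)
coef(toy_fit)
```

In this idealized case the slope of the log-scale fit equals the assumed growth rate; real salary data would scatter around such a line.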
A logarithmic transformation is applied to the ‘Salary’ variable in order to linearize its relationship with years of employment.
# Fit a log-linear model: log(salary) as a linear function of years of employment
model <- lm(log(salary) ~ years_empl, data = df)
summary(model)
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
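The quantities in this summary are linked to one another, which gives a quick way to sanity-check the output. The sketch below recomputes the t value, the F-statistic, and R-squared from the printed slope, standard error, and residual degrees of freedom. It relies on the identities for a regression with a single predictor, and the variable names are illustrative.

```r
# Values copied from the summary() output above.
estimate  <- 0.070998   # slope for years_empl
std_error <- 0.001517   # its standard error
df_resid  <- 198        # residual degrees of freedom

# The t value is the estimate divided by its standard error.
t_value <- estimate / std_error

# With a single predictor, the F-statistic is the squared t value ...
f_stat <- t_value^2

# ... and R-squared follows from F and the residual degrees of freedom.
r_squared <- f_stat / (f_stat + df_resid)

round(c(t = t_value, F = f_stat, R2 = r_squared), 4)
```

The recomputed values agree with the printed t value (46.81), F-statistic (2191), and multiple R-squared (0.9171) up to the rounding of the inputs.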
When years of employment is 0, that is, for someone just starting the job, the expected log-salary is approximately 10.38, which corresponds to a salary of roughly €32,300 (exp(10.383) ≈ 32,300). Each additional year of employment is associated with an increase of 0.071 in expected log-salary. Because the model uses the logarithm of salary as its response, this coefficient can be interpreted in percentage terms:
Specifically, an increase of 0.071 in log-salary corresponds to a salary increase of about 7.4% per additional year of employment, since exp(0.071) ≈ 1.074. This log-linear approach is useful because salary often grows roughly exponentially with experience, that is, by a similar percentage each year, and taking the logarithm turns that pattern into a straight line that linear regression can fit. Here the model accounts for about 92% of the variance in log-salary (R² = 0.917).
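The back-transformation from the log scale can be written out as a short calculation. The sketch below uses only the two coefficients printed in the model summary; `predict_salary` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary() output above.
intercept <- 10.382774
slope     <- 0.070998

# Percentage change in salary per additional year: 100 * (exp(b) - 1).
pct_per_year <- 100 * (exp(slope) - 1)

# Salary implied at zero years of employment (back-transformed intercept).
salary_at_zero <- exp(intercept)

# Hypothetical helper: back-transformed prediction for a given number of years.
predict_salary <- function(years) exp(intercept + slope * years)

round(pct_per_year, 2)                 # about 7.36, i.e. roughly 7.4%
round(salary_at_zero)                  # roughly 32,300 euros
round(predict_salary(c(0, 10, 20)))    # implied salaries at 0, 10 and 20 years
```

Note that exponentiating a predicted log-salary gives a typical (median-like) salary on the original scale rather than the arithmetic mean, so these figures are best read as approximate.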