1. Sample description

The dataset contains three variables: ‘Years of Employment’, ‘Gender’, and ‘Salary’. A basic descriptive analysis of the sample is presented below.

## 
## Female   Male 
##    100    100

Gender is evenly represented in the sample, with 100 female and 100 male participants (n = 200).

##     Group Mean_Salary
## 1 Overall    122303.5
## 2  Female    109140.8
## 3    Male    135466.1
## [1] "Compared to females, who have a mean salary of 109140.76 men have a higher salary of 135466.14"
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.007167  7.790195 16.191430 15.734362 22.908421 29.666752

Years of employment in the sample range from nearly 0 (0.007) to approximately 29.67 years, with a mean of 15.73 years and a median of 16.19 years.
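The report does not show the R code behind the descriptive output above. The sketch below shows one way such a summary could be computed. It assumes that `df` has the columns `years_empl` and `salary` used later in the report, plus a gender column whose real name is not shown and is called `gender` here as a hypothetical. The original data are not available, so the sketch simulates a stand-in data set, and its numbers will not match the output above.

```r
# Hypothetical stand-in data: 100 women and 100 men (NOT the original data set)
set.seed(1)
df <- data.frame(
  gender     = rep(c("Female", "Male"), each = 100),
  years_empl = runif(200, min = 0, max = 30)
)
# Assumed log-linear salary process, for illustration only
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Number of participants per gender
gender_counts <- table(df$gender)
print(gender_counts)

# Mean salary overall and by gender
group_means <- data.frame(
  Group       = c("Overall", "Female", "Male"),
  Mean_Salary = c(mean(df$salary),
                  mean(df$salary[df$gender == "Female"]),
                  mean(df$salary[df$gender == "Male"]))
)
print(group_means)

# Min, quartiles, median and mean of years of employment
years_summary <- summary(df$years_empl)
print(years_summary)
```

With equal group sizes, as in this sample, the overall mean salary is exactly the average of the two group means.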

2. Association between years of employment and salary (scatterplot)

The scatterplot illustrates the relationship between ‘Years of Employment’ and ‘Salary’. A clear upward trend is visible: salary tends to rise as years of employment increase, indicating a strong positive association between the two variables. The trend is not strictly linear, however; salary growth appears to accelerate noticeably after approximately 15 to 20 years of employment.

plot(df$years_empl, df$salary,
     xlab = "Years of Employment",
     ylab = "Salary (€)",
     main = "Relationship between Years of Employment and Salary")
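The visual impressions of a strong positive association and of accelerating growth can also be checked numerically. The sketch below uses simulated stand-in data, again assumed and not the original `df`. Pearson's correlation measures linear association, and Spearman's rank correlation measures monotone association. A lowess smoother is overlaid to make curvature easier to see. Comparing the smoothed salary gain over two equally long windows of years (the 1 to 15 and 15 to 29 year cut points are arbitrary choices) gives a rough test for acceleration.

```r
# Hypothetical stand-in data (assumed log-linear process, NOT the original df)
set.seed(2)
years  <- runif(200, min = 0, max = 30)
salary <- exp(10.4 + 0.07 * years + rnorm(200, sd = 0.2))

# Linear (Pearson) vs. rank-based monotone (Spearman) association
r_pearson  <- cor(years, salary, method = "pearson")
r_spearman <- cor(years, salary, method = "spearman")
# Pearson correlation after log-transforming salary
r_log      <- cor(years, log(salary))

# Lowess smoother: returns list(x = sorted years, y = smoothed salary)
lo_fit    <- lowess(years, salary)
fitted_at <- function(x0) approx(lo_fit$x, lo_fit$y, xout = x0, rule = 2)$y

# Smoothed salary gain over two 14-year windows;
# a larger late gain indicates accelerating (convex) growth
early_gain <- fitted_at(15) - fitted_at(1)
late_gain  <- fitted_at(29) - fitted_at(15)

# Scatterplot with the smoother overlaid
plot(years, salary, xlab = "Years of Employment", ylab = "Salary (EUR)")
lines(lo_fit, col = "red", lwd = 2)
```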


3. Estimating salary from years of employment

A natural-log transformation is applied to the ‘Salary’ variable to linearize its relationship with years of employment.
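The reasoning behind the transformation can be written out explicitly. Assume salary grows by a constant percentage per year, with a multiplicative error term ε. Taking natural logarithms then turns this growth into a model that is linear in years of employment, which `lm()` can fit:

```latex
\log(\text{salary}_i) = \beta_0 + \beta_1\,\text{years\_empl}_i + \varepsilon_i
\;\Longleftrightarrow\;
\text{salary}_i = e^{\beta_0}\,\bigl(e^{\beta_1}\bigr)^{\text{years\_empl}_i}\,e^{\varepsilon_i}
```

Under this assumed model, each additional year multiplies salary by the constant factor exp(β1). This is why the slope is read in percentage terms rather than in euros.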

model <- lm(log(salary) ~ years_empl, data = df)
summary(model)
## 
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years_empl   0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
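The sketch below is a sanity check of the method, not a reproduction of the results above. It simulates salaries from an assumed log-linear process whose parameters are close to the estimates above. The noise level of 0.19 mirrors the residual standard error, and the simulated data are hypothetical. It then fits the same `lm(log(salary) ~ years_empl)` model and back-transforms the slope and its 95% confidence interval with `exp()`, so that they read as percentage changes per year.

```r
# Hypothetical simulated data (NOT the original df)
set.seed(3)
n   <- 200
sim <- data.frame(years_empl = runif(n, min = 0, max = 30))
# Assumed true model: log(salary) = 10.38 + 0.071 * years + noise (sd 0.19)
sim$salary <- exp(10.38 + 0.071 * sim$years_empl + rnorm(n, sd = 0.19))

# Same specification as in the report
sim_model <- lm(log(salary) ~ years_empl, data = sim)
b  <- coef(sim_model)
ci <- confint(sim_model)   # 95% intervals on the log scale

# Back-transform: multiplicative effect of one extra year and its interval
growth_factor <- exp(b[["years_empl"]])
growth_ci     <- exp(ci["years_empl", ])
print(round(100 * (growth_factor - 1), 2))   # % change in salary per year
print(round(100 * (growth_ci - 1), 2))       # 95% interval, in %
```

If the fitted slope lands close to the assumed 0.071, the log-linear specification recovers a known growth rate.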


4. Interpretation

When years of employment is 0 (i.e., someone just starting their job), the predicted log-salary is approximately 10.38. Each additional year of employment is associated with an increase in log-salary of about 0.071. Because salary enters the model on the logarithmic scale, this coefficient can be interpreted in percentage terms:

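Specifically, an increase of 0.071 in log-salary multiplies salary by exp(0.071) ≈ 1.074, which corresponds to a salary increase of about 7.4% per additional year of employment. The log-linear approach is useful because salary growth with experience is often multiplicative, meaning a roughly constant percentage per year rather than a constant amount. Taking the logarithm turns such growth into a straight-line relationship that linear regression can fit; here the model explains about 92% of the variance in log-salary (R² = 0.917).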
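The back-transformation arithmetic can be done directly from the coefficients in the `summary(model)` output. The sketch below shows that exp(10.382774) gives a baseline of about €32,300 at 0 years. It also shows that the exact yearly change is 100 × (exp(0.070998) − 1) ≈ 7.36%, which is slightly larger than the 7.10% from the common "coefficient × 100" shortcut. The 10-year prediction is an arbitrary illustration. Because the model is fitted on the log scale, exp() of a prediction estimates a typical (median, assuming roughly symmetric residuals) salary rather than the arithmetic mean.

```r
# Coefficients as reported in the summary(model) output
b0 <- 10.382774   # intercept: predicted log-salary at 0 years
b1 <- 0.070998    # slope: change in log-salary per additional year

# Back-transformed baseline salary at 0 years (original EUR scale)
baseline <- exp(b0)

# Exact percentage change per year vs. the "b1 * 100" approximation
pct_exact  <- 100 * (exp(b1) - 1)
pct_approx <- 100 * b1

# Illustrative prediction after 10 years of employment
salary_10 <- exp(b0 + 10 * b1)

print(round(c(baseline = baseline, pct_exact = pct_exact,
              pct_approx = pct_approx, salary_10 = salary_10), 2))
```

The approximation exp(b) − 1 ≈ b is only accurate for small coefficients, so the exact form is the safer choice when reporting percentage effects.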