This data frame contains the duration of employment in years
(years_empl), the salary and the gender of the population. The mean
salary of the entire population is 122304 €.
mean(df$salary)
## [1] 122303.5
## [1] 15.73436
##    years_empl            salary          gender
##  Min.   : 0.007167   Min.   : 30203   Length:200
##  1st Qu.: 7.790195   1st Qu.: 54208   Class :character
##  Median :16.191430   Median : 97496   Mode  :character
##  Mean   :15.734362   Mean   :122304
##  3rd Qu.:22.908421   3rd Qu.:179447
##  Max.   :29.666752   Max.   :331348
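The descriptive statistics above could be produced along the following lines. This is only a sketch: the original data are not available here, so `df` is rebuilt as a hypothetical stand-in with synthetic values (the exponential shape and the `gender` labels are assumptions), and its numbers will not match the output shown.

```r
# Hypothetical stand-in for the report's data frame.
# All values are synthetic and only mimic the column layout.
set.seed(1)
df <- data.frame(
  years_empl = runif(200, min = 0, max = 30),
  gender = sample(c("female", "male"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)
# Assumed exponential growth of salary with years of employment
df$salary <- 30000 * exp(0.08 * df$years_empl)

mean_salary <- mean(df$salary)      # mean salary of the population
mean_years  <- mean(df$years_empl)  # mean duration of employment
overview    <- summary(df)          # five-number summary plus mean per column

print(mean_salary)
print(mean_years)
print(overview)
```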
An almost linear association can be seen; however, it more closely resembles an exponential function. Because linear regression assumes a linear relationship, a linear association needs to be modelled.
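Comparing Spearman’s (0.9608431) and Pearson’s (0.908204) coefficients shows a noticeable difference. Spearman’s coefficient only measures whether the association is monotonic, whereas Pearson’s measures how linear it is, so the higher Spearman value indicates a non-linear association. Therefore, the dependent variable is first linearised using the regression model.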
## [1] 0.9608431
## [1] 0.908204
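The comparison of the two coefficients can be illustrated with a small sketch. The data here are synthetic and noise-free (an assumed exponential curve, not the report's data), chosen so that the effect is easy to see: a perfectly monotonic but curved relationship gives a Spearman coefficient of 1 while Pearson's stays below 1.

```r
# Synthetic, noise-free example: salary grows exponentially with years.
# The growth rate 0.08 and base salary 30000 are illustrative assumptions.
years  <- seq(0, 30, length.out = 200)
salary <- 30000 * exp(0.08 * years)

# Pearson measures linear association, Spearman measures monotonic association
pearson  <- cor(years, salary, method = "pearson")
spearman <- cor(years, salary, method = "spearman")

print(pearson)
print(spearman)
```

A gap between the two values, as in the report, is a hint rather than a formal test of non-linearity.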
##
## Call:
## lm(formula = df$salary ~ df$years_empl)
##
## Coefficients:
##   (Intercept)  df$years_empl
##         -2684           7944
The second plot now shows a linear association between years of employment and salary. The individual salaries scatter around the regression line (blue line), meaning that salary rises as the duration of employment in years increases.
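One possible way to obtain such a linearised plot is sketched below. The report does not state which transformation was applied, so the log transform of salary is an assumption made purely for illustration, and `df` is again a synthetic stand-in rather than the original data.

```r
# Synthetic stand-in data with multiplicative noise (illustrative assumption)
set.seed(42)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- 30000 * exp(0.08 * df$years_empl + rnorm(200, sd = 0.1))

# Linear model on the raw scale, as in the lm() call shown in the report
fit_raw <- lm(salary ~ years_empl, data = df)

# Assumed linearisation: model log(salary) instead of salary
fit_log <- lm(log(salary) ~ years_empl, data = df)

# Scatter plot on the log scale with the fitted regression line in blue
plot(df$years_empl, log(df$salary),
     xlab = "Years of employment", ylab = "log(salary)")
abline(fit_log, col = "blue")

print(coef(fit_raw))
print(coef(fit_log))
```

Passing `data = df` instead of writing `df$salary ~ df$years_empl` keeps the coefficient names short and lets `predict()` work with new data.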