1. Sample description

This data frame contains the duration of employment in years (years_empl), the salary, and the gender of 200 people. The mean salary of the sample is 122304 € and the mean duration of employment is 15.73 years.

## [1] 122303.5
## [1] 15.73436

##    years_empl            salary          gender         
##  Min.   : 0.007167   Min.   : 30203   Length:200        
##  1st Qu.: 7.790195   1st Qu.: 54208   Class :character  
##  Median :16.191430   Median : 97496   Mode  :character  
##  Mean   :15.734362   Mean   :122304                     
##  3rd Qu.:22.908421   3rd Qu.:179447                     
##  Max.   :29.666752   Max.   :331348
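The summary figures above could be produced along the following lines. This is only a sketch: the original data are not included in this report, so `df` is filled with simulated stand-in values (the seed, growth rate, and gender codes are assumptions), and the printed numbers will differ from the ones shown above.

```r
# Hypothetical stand-in for the real data frame, which is not shown here.
set.seed(1)
n <- 200
years_empl <- runif(n, min = 0, max = 30)
salary <- 30000 * exp(0.08 * years_empl) * exp(rnorm(n, sd = 0.1))
gender <- sample(c("f", "m"), n, replace = TRUE)
df <- data.frame(years_empl, salary, gender, stringsAsFactors = FALSE)

# Mean salary and mean duration of employment.
mean_salary <- mean(df$salary)
mean_years <- mean(df$years_empl)
print(mean_salary)
print(mean_years)

# Five-number summary plus mean for numeric columns;
# length, class, and mode for the character column.
print(summary(df))
```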


2. Association between years and salary as scatterplot.

The association looks almost linear, but it more closely resembles an exponential function. To fit a linear regression in R, however, the association has to be modelled as linear.
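A scatterplot of this kind might be drawn with base R as sketched below. The data are simulated stand-ins (an assumption, since the real `df` is not part of this report), and the axis labels are illustrative.

```r
# Hypothetical stand-in data with an exponential-looking trend.
set.seed(1)
n <- 200
years_empl <- runif(n, min = 0, max = 30)
salary <- 30000 * exp(0.08 * years_empl) * exp(rnorm(n, sd = 0.1))
df <- data.frame(years_empl, salary)

# Scatterplot of salary against years of employment.
plot(df$years_empl, df$salary,
     xlab = "Years of employment",
     ylab = "Salary (EUR)",
     main = "Salary by years of employment")
```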


3. Estimate salary by years of employment

Comparing Spearman’s (0.9608431) and Pearson’s (0.908204) correlation coefficients shows a noticeable difference. Because Spearman’s coefficient only measures monotonic association while Pearson’s measures linear association, this indicates a non-linear association. Therefore, the dependent variable is first linearised using the regression model.

## [1] 0.9608431
## [1] 0.908204
## 
## Call:
## lm(formula = df$salary ~ df$years_empl)
## 
## Coefficients:
##   (Intercept)  df$years_empl  
##         -2684           7944
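The two correlation coefficients and the linear model could be computed roughly as follows. This is a sketch on simulated stand-in data, so its results will not match the output above. The log-transformed model `fit_log` is an assumption: it is one common way to linearise an exponential-looking relationship, not necessarily the method used in this report. The last lines use the rounded coefficients reported above to work out a predicted salary after 10 years.

```r
# Hypothetical stand-in data; the real df is not shown in this report.
set.seed(1)
n <- 200
years_empl <- runif(n, min = 0, max = 30)
salary <- 30000 * exp(0.08 * years_empl) * exp(rnorm(n, sd = 0.1))
df <- data.frame(years_empl, salary)

# Rank-based (monotonic) versus linear correlation.
rho_spearman <- cor(df$years_empl, df$salary, method = "spearman")
r_pearson <- cor(df$years_empl, df$salary, method = "pearson")
print(rho_spearman)
print(r_pearson)

# Simple linear regression of salary on years of employment.
fit_linear <- lm(salary ~ years_empl, data = df)
print(fit_linear)

# Assumption: linearise by modelling log(salary) instead of salary.
fit_log <- lm(log(salary) ~ years_empl, data = df)
print(coef(fit_log))

# Worked example with the rounded coefficients from the report's output:
# predicted salary = intercept + slope * years.
intercept <- -2684
slope <- 7944
predicted_10y <- intercept + slope * 10
print(predicted_10y)  # 76756
```

Passing `data = df` and using bare column names in the formula, rather than `df$salary ~ df$years_empl`, keeps the coefficient labels short and lets `predict()` work with new data.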


4. Interpretation

The second plot now shows a linear association between years of employment and salary. The individual salaries scatter around the regression line (blue line). This means that salary rises as the duration of employment in years increases.
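A plot with a blue regression line, as described here, might be produced as sketched below. The data are simulated stand-ins and the use of base graphics is an assumption. The report's own plotting code is not shown.

```r
# Hypothetical stand-in data; the real df is not shown in this report.
set.seed(1)
n <- 200
years_empl <- runif(n, min = 0, max = 30)
salary <- 30000 * exp(0.08 * years_empl) * exp(rnorm(n, sd = 0.1))
df <- data.frame(years_empl, salary)

# Fit the linear model and overlay its regression line in blue.
fit <- lm(salary ~ years_empl, data = df)
plot(df$years_empl, df$salary,
     xlab = "Years of employment",
     ylab = "Salary (EUR)")
abline(fit, col = "blue", lwd = 2)

# A positive slope means salary rises with years of employment.
slope <- coef(fit)[["years_empl"]]
print(slope)
```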