1. Sample description

First we describe our dataset.

## [1] 200

We have 200 cases. Now let’s check the gender distribution.

## 
## Female   Male 
##    100    100
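The two outputs above could be produced along the following lines. This is only a sketch: the report does not show its data, so the data frame below is simulated, and the column name `gender` is an assumption (`salary` and `years_empl` appear in the model call later in the report).

```r
# Hypothetical stand-in data; the real data frame is not shown in the report.
set.seed(1)
df <- data.frame(
  gender     = rep(c("Female", "Male"), each = 100),  # column name assumed
  years_empl = runif(200, min = 0, max = 30)
)
# Simulated salaries that grow roughly exponentially with years of employment
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.3))

n_cases <- nrow(df)                # number of cases (rows)
gender_counts <- table(df$gender)  # frequency of each gender category

print(n_cases)
print(gender_counts)
```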

Next, we look at the minimum, median, mean and maximum of both variables. The first summary is for salary, the second for years of employment.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30203   54208   97496  122304  179447  331348
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.007167  7.790195 16.191430 15.734362 22.908421 29.666752

Lastly, we check the standard deviations of both variables (salary first, then years of employment).

## [1] 79030.12
## [1] 9.035618
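A minimal sketch of how such summaries and standard deviations might be computed, assuming a data frame `df` with the columns `salary` and `years_empl`. The data here is simulated for illustration, so the numbers will not match the report.

```r
# Hypothetical stand-in data; values will differ from the report's output.
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.3))

# Five-number summary plus mean for each variable
salary_summary <- summary(df$salary)
years_summary  <- summary(df$years_empl)

# Sample standard deviations
salary_sd <- sd(df$salary)
years_sd  <- sd(df$years_empl)

print(salary_summary)
print(years_summary)
print(salary_sd)
print(years_sd)
```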


2. Association between years and salary as scatterplot.

Next, we will visualize the relation between years of experience and salary with a scatterplot.

The scatterplot shows a likely nonlinear relation between the two variables.
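The plot itself is not reproduced in this text. A scatterplot of this kind could be drawn with base R roughly as follows; the data is simulated and the axis labels are assumptions.

```r
# Hypothetical stand-in data; the report's own data is not shown.
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.3))

# Salary on the original scale against years of employment
plot(df$years_empl, df$salary,
     xlab = "Years of employment",
     ylab = "Salary",
     main = "Salary by years of employment")
```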


3. Estimate salary by years of employment

To linearize the relation, we will apply a log transformation to the salary variable. But first we check again for a possible linear relation using the Pearson and Spearman correlation coefficients.

Pearson:

## [1] 0.908204

Spearman:

## [1] 0.9608431

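A Pearson coefficient of 0.91 indicates a strong linear association, so a log transformation is not strictly necessary. The Spearman coefficient of 0.96 measures monotonic rather than linear association, and the fact that it is higher than the Pearson coefficient is consistent with the curvature seen in the scatterplot. We will still apply the transformation to see whether the residuals can be improved.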
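Both coefficients can be obtained from `cor()` by switching its `method` argument. The sketch below uses simulated data, so the values differ from those above. It also illustrates one property worth keeping in mind: Spearman's coefficient is rank-based, so it is the same before and after a log transformation of salary, whereas Pearson's coefficient generally changes.

```r
# Hypothetical stand-in data; coefficients will differ from the report.
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.3))

# Linear association on the original salary scale
r_pearson <- cor(df$years_empl, df$salary, method = "pearson")

# Monotonic (rank-based) association
r_spearman <- cor(df$years_empl, df$salary, method = "spearman")

# Same coefficients after log-transforming salary
r_pearson_log  <- cor(df$years_empl, log(df$salary), method = "pearson")
r_spearman_log <- cor(df$years_empl, log(df$salary), method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
print(c(pearson_log = r_pearson_log, spearman_log = r_spearman_log))
```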

Scatterplot of log(salary) against years of employment with a fitted regression line:

## 
## Call:
## lm(formula = log(df$salary) ~ df$years_empl)
## 
## Coefficients:
##   (Intercept)  df$years_empl  
##        10.383          0.071
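A sketch of how the fit and the plot might be produced, again on simulated data. Unlike the call shown above, it passes the data frame through the `data` argument instead of writing `df$` inside the formula. That is a stylistic choice made here, not the report's method: it gives shorter coefficient names and lets `predict()` work with new data.

```r
# Hypothetical stand-in data; estimates will differ from the report.
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.3))

# Linear regression of log(salary) on years of employment
fit <- lm(log(salary) ~ years_empl, data = df)
print(coef(fit))

# Scatterplot on the log scale with the fitted line overlaid
plot(df$years_empl, log(df$salary),
     xlab = "Years of employment",
     ylab = "log(salary)")
abline(fit)

# Residuals against fitted values, to judge whether the transformation helps
plot(fitted(fit), resid(fit),
     xlab = "Fitted log(salary)",
     ylab = "Residuals")
abline(h = 0, lty = 2)
```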


4. Interpretation

The model shows a clear linear relationship between years of employment and the logarithm of salary. The coefficient is positive (0.071) and the intercept is 10.383. This means that the longer someone has been employed, the more they earn. More precisely, each additional year of employment is associated with a salary increase of (exp(0.071) - 1) * 100 ≈ 7.36%.
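The back-transformation can be checked directly from the two reported coefficients. The sketch below also converts the intercept to the salary scale and compounds the effect over ten years; both are illustrations added here, not results stated in the report, and the variable names are arbitrary.

```r
# Coefficients as reported in the model output
intercept <- 10.383
slope     <- 0.071

# Percentage change in salary per additional year of employment
pct_per_year <- (exp(slope) - 1) * 100
print(round(pct_per_year, 2))  # 7.36

# Predicted salary at zero years of employment, back on the original scale
baseline_salary <- exp(intercept)
print(round(baseline_salary))

# The effect compounds multiplicatively: percentage change over ten years
pct_ten_years <- (exp(10 * slope) - 1) * 100
print(round(pct_ten_years, 1))
```

Because the model is linear in log(salary), the yearly effect multiplies rather than adds, which is why the ten-year change is larger than ten times the one-year change.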