First we describe our dataset.
## [1] 200
We have 200 cases. Now let’s check the gender distribution.
##
## Female Male
## 100 100
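A minimal sketch of how the two checks above could be produced. The original data are not shown, so `df` is rebuilt here as a hypothetical stand-in with simulated values, and the column name `gender` is an assumption. Only `salary` and `years_empl` appear in the report itself.

```r
# Hypothetical stand-in for the report's data frame (simulated values).
set.seed(1)
df <- data.frame(
  gender     = rep(c("Female", "Male"), each = 100),  # assumed column name
  years_empl = runif(200, min = 0, max = 30)
)
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Number of cases (rows) in the dataset.
n_cases <- nrow(df)
print(n_cases)

# Frequency table of the gender variable.
gender_counts <- table(df$gender)
print(gender_counts)
```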
Next, we will look at the minimum, median, mean and maximum of both variables: salary first, then years of employment.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30203 54208 97496 122304 179447 331348
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.007167 7.790195 16.191430 15.734362 22.908421 29.666752
Lastly, we check the standard deviations of both variables, again salary first and then years of employment.
## [1] 79030.12
## [1] 9.035618
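The descriptive statistics above are the kind of output that `summary()` and `sd()` return. The sketch below runs them on a hypothetical, simulated stand-in for `df`, so its numbers will not match the values reported here.

```r
# Hypothetical stand-in for the report's data frame (simulated values).
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Five-number summary plus mean for each variable.
summary_salary <- summary(df$salary)
summary_years  <- summary(df$years_empl)
print(summary_salary)
print(summary_years)

# Sample standard deviations.
sd_salary <- sd(df$salary)
sd_years  <- sd(df$years_empl)
print(sd_salary)
print(sd_years)
```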
Next, we will visualize the relation between years of experience and salary with a scatterplot.
The scatterplot shows a likely nonlinear relation between the two variables.
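One way such a scatterplot could be drawn with base R graphics is sketched below. The helper `plot_salary()` and its `log_salary` argument are hypothetical names introduced for illustration, and `df` is again a simulated stand-in for the real data.

```r
# Hypothetical stand-in for the report's data frame (simulated values).
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Hypothetical helper: scatterplot of salary against years of employment.
# With log_salary = TRUE the y axis shows log(salary) instead.
plot_salary <- function(data, log_salary = FALSE) {
  y <- if (log_salary) log(data$salary) else data$salary
  plot(
    data$years_empl, y,
    xlab = "Years of employment",
    ylab = if (log_salary) "log(salary)" else "Salary"
  )
  invisible(NULL)
}
```

Calling `plot_salary(df)` draws the raw relation, and `plot_salary(df, log_salary = TRUE)` draws the same points on a log scale for comparison.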
To linearize the relation, we will apply a log transformation to the salary variable. But first we check the strength of the relation with the Pearson and Spearman correlation coefficients.
Pearson:
## [1] 0.908204
Spearman:
## [1] 0.9608431
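A Pearson coefficient close to 1 indicates a strong linear association, so a log transformation is not strictly required. The Spearman coefficient measures monotonic rather than linear association, and the fact that it is higher (0.96 vs. 0.91) is consistent with the curvature seen in the scatterplot. We will therefore still apply the transformation to see whether the residuals improve.
Scatterplot with log-transformed salary and a regression line: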
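As a sketch, both coefficients can be computed with `cor()` and its `method` argument. The data below are a simulated, hypothetical stand-in for `df`, so the values will differ from the ones reported. The last lines illustrate why the two coefficients can disagree: Spearman works on ranks, so it does not change when salary is log-transformed, whereas Pearson does.

```r
# Hypothetical stand-in for the report's data frame (simulated values).
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Pearson: strength of the linear association.
r_pearson <- cor(df$years_empl, df$salary, method = "pearson")

# Spearman: strength of the monotonic association (rank-based).
r_spearman <- cor(df$years_empl, df$salary, method = "spearman")

print(r_pearson)
print(r_spearman)

# Spearman is unchanged by a monotonic transformation such as log();
# Pearson generally is not.
r_spearman_log <- cor(df$years_empl, log(df$salary), method = "spearman")
r_pearson_log  <- cor(df$years_empl, log(df$salary), method = "pearson")
print(r_spearman_log)
print(r_pearson_log)
```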
##
## Call:
## lm(formula = log(df$salary) ~ df$years_empl)
##
## Coefficients:
## (Intercept) df$years_empl
## 10.383 0.071
The model shows a clear linear relationship between years of employment and log salary. The coefficient is positive (0.071) and the intercept is 10.38. This means that the more years someone has been employed, the higher their salary. More precisely, it suggests that each additional year of employment is associated with a salary that is (exp(0.071) - 1) * 100 ≈ 7.36% higher.
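The percentage interpretation can be checked with a short sketch. It fits the same kind of log-linear model on a simulated, hypothetical stand-in for `df` (so the fitted slope will differ from 0.071), and then applies the conversion to the coefficient reported above. Unlike the original call, the formula here uses the `data` argument, which is a stylistic choice that makes `predict()` easier to use later.

```r
# Hypothetical stand-in for the report's data frame (simulated values).
set.seed(1)
df <- data.frame(years_empl = runif(200, min = 0, max = 30))
df$salary <- exp(10.4 + 0.07 * df$years_empl + rnorm(200, sd = 0.2))

# Log-linear model: log(salary) as a linear function of years employed.
fit <- lm(log(salary) ~ years_empl, data = df)
slope <- coef(fit)[["years_empl"]]

# Percentage change in salary per additional year, from the fitted slope.
pct_per_year <- (exp(slope) - 1) * 100
print(pct_per_year)

# The same conversion applied to the coefficient reported in the text.
pct_reported <- (exp(0.071) - 1) * 100
print(round(pct_reported, 2))  # 7.36
```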