The dataset contains 200 employees.
# sample size
nrow(df)
## [1] 200
# overall summary
summary(df)
##    years_empl            salary          gender
##  Min.   : 0.007167   Min.   : 30203   Length:200
##  1st Qu.: 7.790195   1st Qu.: 54208   Class :character
##  Median :16.191430   Median : 97496   Mode  :character
##  Mean   :15.734362   Mean   :122304
##  3rd Qu.:22.908421   3rd Qu.:179447
##  Max.   :29.666752   Max.   :331348
We have 200 employees in the sample.
Years of employment (years_empl): min = 0.007, max = 29.667, mean = 15.734, SD = 9.036.
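Descriptive statistics of this kind can be computed along the following lines. This is a minimal sketch, not the code used for the report: `df_demo` is a small simulated stand-in for the real `df`, the gender labels "F" and "M" are assumed for illustration, and only the column names `years_empl`, `salary` and `gender` come from the summary output.

```r
# Hypothetical stand-in data; swap df_demo for the real df.
set.seed(1)
df_demo <- data.frame(
  years_empl = runif(50, min = 0, max = 30),
  salary     = runif(50, min = 30000, max = 330000),
  gender     = sample(c("F", "M"), 50, replace = TRUE),  # assumed labels
  stringsAsFactors = FALSE
)

# Min, max, mean and SD for each numeric column (one column per variable)
num_cols <- c("years_empl", "salary")
desc <- sapply(df_demo[num_cols], function(x) {
  c(Min = min(x), Max = max(x), Mean = mean(x), SD = sd(x))
})
round(desc, 3)

# Counts per category for the character column
table(df_demo$gender)
```

Because `summary()` reports only length and class for a character column, `table()` is one way to see how the sample splits by gender.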
Figure 1 shows that salary rises with years of employment, but not at
a constant rate.
Beyond about 15 years, salaries appear to climb much more steeply,
exceeding €200 000 by 25–30 years of employment.
plot(df$years_empl, df$salary)
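One way to make a non-constant rate of increase easier to see is to overlay a smoothed curve on the scatter plot. The sketch below uses base R's `lowess()` on simulated data. The variables `years` and `salary` and the exponential shape are assumptions for illustration only and are not derived from the real dataset.

```r
# Hypothetical stand-in data with a convex salary curve; not the real df.
set.seed(1)
years  <- runif(100, min = 0, max = 30)
salary <- 30000 * exp(0.07 * years) + rnorm(100, sd = 10000)

plot(years, salary, xlab = "Years of employment", ylab = "Salary")

# lowess() returns a list with sorted x values and smoothed y values
smooth <- lowess(years, salary)
lines(smooth, col = "red", lwd = 2)
```

With the real data, the same two calls would take `df$years_empl` and `df$salary` as arguments. A smoother that bends upward would support the visual impression described above.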
Next, we fit a simple linear regression of salary on years_empl.
# scatter plot with the fitted regression line overlaid
plot(df$years_empl, df$salary)
abline(lm(salary ~ years_empl, data = df))
lm(salary ~ years_empl, data = df)
##
## Call:
## lm(formula = salary ~ years_empl, data = df)
##
## Coefficients:
## (Intercept) years_empl
## -2684 7944
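The fitted model implies that each additional year of employment is associated with about €7,944 more in salary on average:
est. salary = -2684 + 7944 * years_empl
The negative intercept (an estimated salary of -2684 at zero years) is not meaningful on its own. It is consistent with the non-constant rate of increase seen in Figure 1, which a straight line cannot capture.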
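The fitted equation can be applied directly to obtain estimated salaries. The following is a small worked sketch using the rounded coefficients from the `lm()` output. The helper name `est_salary` is introduced here for illustration and is not part of the original analysis.

```r
# Coefficients copied (rounded) from the lm() output above
b0 <- -2684   # intercept
b1 <- 7944    # estimated change in salary per additional year

# est_salary() is an illustrative helper, not part of the original analysis
est_salary <- function(years) b0 + b1 * years

est_salary(c(0, 10, 20))
# 0 years: -2684; 10 years: 76756; 20 years: 156196
```

If the model were stored in an object, `predict()` with a `newdata` data frame containing `years_empl` should give the same values up to rounding of the coefficients.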