1. Sample description

The dataset contains 200 employees.


# sample size
nrow(df)

## [1] 200

# overall summary
summary(df)

##    years_empl            salary          gender         
##  Min.   : 0.007167   Min.   : 30203   Length:200        
##  1st Qu.: 7.790195   1st Qu.: 54208   Class :character  
##  Median :16.191430   Median : 97496   Mode  :character  
##  Mean   :15.734362   Mean   :122304                     
##  3rd Qu.:22.908421   3rd Qu.:179447                     
##  Max.   :29.666752   Max.   :331348

We have 200 employees in the sample.

Years of employment:

Min = 0.007, Max = 29.667, Mean = 15.734, SD = 9.036
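
The summary() output does not include a standard deviation, so the SD has to be computed separately with sd(). A minimal sketch of how the four values could be collected in one step; the vector below is a made-up stand-in for df$years_empl, not the homework data:

```r
# Hypothetical stand-in for df$years_empl (illustrative values only)
years_empl <- c(2, 4, 6, 8, 10)

# Collect the descriptive statistics in one named vector
desc <- c(
  Min  = min(years_empl),
  Max  = max(years_empl),
  Mean = mean(years_empl),
  SD   = sd(years_empl)  # sample SD, divides by n - 1
)

# Round to three decimals, as in the values reported above
round(desc, 3)
```

With the real data, the stand-in vector would be replaced by df$years_empl.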

2. Association between years and salary as scatterplot.

Figure 1 shows that salary rises with years of employment, but not at a constant rate.
Beyond about 15 years, salaries appear to climb much more steeply, exceeding €200,000 at 25–30 years of employment.

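# scatterplot of salary against years of employment, with labelled axes
plot(df$years_empl, df$salary, xlab = "Years of employment", ylab = "Salary (EUR)")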
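
One way to check visually whether the increase is constant is to overlay a smoothed curve on the scatterplot. The sketch below uses lowess() from base R, which is not part of the original analysis. It runs on simulated data because the homework data set is not reproduced here, so the variable names, the seed, and the curved relationship are all assumptions:

```r
# Simulated stand-in data (an assumption, NOT the homework data)
set.seed(1)
years  <- runif(200, min = 0, max = 30)
salary <- 30000 + 300 * years^2 + rnorm(200, sd = 10000)

# Scatterplot with a locally weighted smoother on top
plot(years, salary, xlab = "Years of employment", ylab = "Salary (EUR)")
smooth <- lowess(years, salary)  # list with sorted x and smoothed y
lines(smooth, lwd = 2)
```

If the smoothed curve bends upward rather than following a straight line, the rate of increase is not constant.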

3. Estimate salary by years of employment

Next, we fit a simple linear regression of salary on years_empl.

# fit the model once and reuse it for the plot and the printed coefficients
fit <- lm(salary ~ years_empl, data = df)
plot(df$years_empl, df$salary)
abline(fit)
fit

## 
## Call:
## lm(formula = salary ~ years_empl, data = df)
## 
## Coefficients:
## (Intercept)   years_empl  
##       -2684         7944

4. Interpretation

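The fitted model estimates that each additional year of employment is associated with about €7,944 more salary on average. The intercept (-2684) is the predicted salary at zero years of employment; a negative salary is not plausible, which is consistent with the scatterplot's indication that the relationship is not strictly linear.

est. salary = -2684 + 7944 * years_empl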
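
To make the estimated equation concrete, it can be evaluated at a few values of years_empl. This is a small sketch using the rounded coefficients from the printed lm() output; the helper name predict_salary is hypothetical and only serves the illustration:

```r
# Rounded coefficients copied from the printed lm() output
intercept <- -2684
slope     <- 7944

# Hypothetical helper: estimated salary for a given number of years
predict_salary <- function(years) intercept + slope * years

est <- predict_salary(c(0, 10, 20))
est
# -2684 76756 156196
```

Each step of 10 years adds 10 * 7944 = 79440 to the estimate, which is what a constant slope implies.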

Homework 3

2025-05-22
