1. Sample description

The dataset comprises 200 public service employees and three variables: annual salary in euros, years of employment, and gender. Salaries range from €30,028 to €255,381, with a mean of approximately €108,491 (SD ≈ €60,116) and a median of €93,164; the mean exceeding the median indicates a right-skewed distribution. The average length of employment is about 15.67 years (SD ≈ 8.76). The sample contains 131 male and 69 female employees.

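library(openxlsx)  # assumed source of read.xlsx(); the original does not show which package was loaded
salary_data = read.xlsx("SalaryData.xlsx")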
summary(salary_data)
##    years_exp             salary          gender         
##  Min.   : 0.007167   Min.   : 30028   Length:200        
##  1st Qu.: 7.749079   1st Qu.: 60076   Class :character  
##  Median :16.662402   Median : 93164   Mode  :character  
##  Mean   :15.666479   Mean   :108491                     
##  3rd Qu.:22.823554   3rd Qu.:150437                     
##  Max.   :29.666752   Max.   :255381
# Gender distribution
table(salary_data$gender)
## 
## Female   Male 
##     69    131
# Means and standard deviations
mean(salary_data$salary)
## [1] 108490.9
sd(salary_data$salary)
## [1] 60116.36
mean(salary_data$years_exp)
## [1] 15.66648
sd(salary_data$years_exp)
## [1] 8.760716
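
The individual mean() and sd() calls above can also be collected into one table. The following is a minimal sketch: the helper `describe_numeric()` and the data frame `toy` are hypothetical names introduced here, and the toy values are made up for illustration and are not taken from SalaryData.xlsx.

```r
# Toy data for illustration only; these values are NOT from SalaryData.xlsx
toy <- data.frame(
  years_exp = c(1, 5, 12, 20),
  salary    = c(32000, 45000, 80000, 150000),
  gender    = c("Female", "Male", "Male", "Female")
)

# Hypothetical helper: n, mean and SD for every numeric column in one table
describe_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  data.frame(
    variable = names(num),
    n        = sapply(num, function(x) sum(!is.na(x))),
    mean     = sapply(num, mean, na.rm = TRUE),
    sd       = sapply(num, sd, na.rm = TRUE),
    row.names = NULL
  )
}

desc <- describe_numeric(toy)
print(desc)
print(nrow(toy))           # number of observations
print(table(toy$gender))   # counts per gender
```

Calling `describe_numeric(salary_data)` would apply the same summary to the real data, assuming it has been loaded as shown above.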


2. Association between years and salary as scatterplot

A visual inspection of the scatterplot reveals a positive association between years of employment and salary. However, the relationship does not appear to be linear: salary seems to rise more steeply at higher years of employment, which points to roughly exponential growth. This suggests that a transformation of the dependent variable is needed to model the association more accurately.

plot(salary_data$years_exp, salary_data$salary,
     main = "Association between Experience and Salary",
     xlab = "Years of Experience",
     ylab = "Salary (€)")


3. Estimate salary by years of employment

Due to the non-linear pattern observed in the scatterplot, a logarithmic transformation of the salary variable is applied to linearise the relationship. A linear regression model is then estimated, with log(salary) as the dependent variable and years of employment as the independent variable.
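
Written out, the model and what it implies on the original salary scale look as follows. The symbols \beta_0, \beta_1 and \varepsilon_i are notation introduced here for the intercept, the slope and the error term; they do not appear in the original output.

```latex
\begin{align}
\log(\text{salary}_i) &= \beta_0 + \beta_1\,\text{years}_i + \varepsilon_i \\
\text{salary}_i &= e^{\beta_0}\, e^{\beta_1\,\text{years}_i}\, e^{\varepsilon_i} \\
\frac{\widehat{\text{salary}}(\text{years}+1)}{\widehat{\text{salary}}(\text{years})} &= e^{\beta_1}
\end{align}
```

The last line shows why the slope is read as a growth factor: each additional year multiplies the predicted salary by e^{\beta_1}, regardless of the starting point.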

salary_data$log_salary = log(salary_data$salary)

# Estimate linear model: log(salary) ~ years of experience
lm(log_salary ~ years_exp, data = salary_data)
## 
## Call:
## lm(formula = log_salary ~ years_exp, data = salary_data)
## 
## Coefficients:
## (Intercept)    years_exp  
##    10.43644      0.06332

4. Interpretation

The model estimates the natural logarithm of salary as a linear function of years of employment. The intercept is approximately 10.44 and the slope coefficient is approximately 0.063, so each additional year of employment is associated with an average increase of 0.063 in log-salary. Exponentiating the slope (e^0.063 ≈ 1.065) translates this into a salary that is about 6.5% higher for each additional year. Exponentiating the intercept (e^10.436 ≈ 34,100) gives a predicted salary of roughly €34,100 at zero years of employment. Years of employment is the only predictor in the model, so these figures are not adjusted for gender or other characteristics.
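
The back-transformation can be checked with a few lines of arithmetic. This is a minimal sketch that copies the rounded coefficients from the lm() output above; the variable names and the 10-year example are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from the lm() output above (rounded as printed)
b0 <- 10.43644   # intercept on the log scale
b1 <- 0.06332    # slope on the log scale

# Percentage change in predicted salary per additional year
pct_per_year <- (exp(b1) - 1) * 100

# Predicted salary at zero years of employment
start_salary <- exp(b0)

# Predicted salary after 10 years (illustrative value of years_exp)
salary_10 <- exp(b0 + b1 * 10)

print(pct_per_year)
print(start_salary)
print(salary_10)
```

Note that exp() of a predicted log-salary is a prediction on the original scale in the sense of a typical (geometric mean) salary; it is generally somewhat below the arithmetic mean salary at that level of experience.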

5. (Voluntary) Gender effects

To explore potential gender differences in the relationship between salary and years of employment, separate linear models are estimated for male and female employees. In both cases, the dependent variable is log-transformed salary, and the independent variable is years of employment.

# Model for males
lm(log(salary) ~ years_exp, data = salary_data[salary_data$gender == "Male", ])
## 
## Call:
## lm(formula = log(salary) ~ years_exp, data = salary_data[salary_data$gender == 
##     "Male", ])
## 
## Coefficients:
## (Intercept)    years_exp  
##    10.54381      0.06296
# Model for females
lm(log(salary) ~ years_exp, data = salary_data[salary_data$gender == "Female", ])
## 
## Call:
## lm(formula = log(salary) ~ years_exp, data = salary_data[salary_data$gender == 
##     "Female", ])
## 
## Coefficients:
## (Intercept)    years_exp  
##      9.8912       0.0819

For male employees, the slope coefficient is approximately 0.063, corresponding to an estimated salary increase of about 6.5% per year of employment (e^0.063 ≈ 1.065). For female employees, the slope is higher at about 0.082, indicating an increase of about 8.5% per year (e^0.082 ≈ 1.085). However, the intercept for female employees is lower (9.89 versus 10.54 on the log scale), suggesting that starting salaries are lower for women despite their steeper growth rate over time.
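
To put the two fitted lines side by side, the coefficients can be back-transformed and the point where the lines intersect can be computed. This is a rough sketch using the rounded coefficients printed above; the data frame `coefs` and the variable `crossover_years` are names introduced here for illustration.

```r
# Coefficients copied from the two lm() outputs above (rounded as printed)
coefs <- data.frame(
  gender    = c("Male", "Female"),
  intercept = c(10.54381, 9.8912),
  slope     = c(0.06296, 0.0819)
)

# Predicted salary at zero years of employment: exp(intercept)
coefs$start_salary <- exp(coefs$intercept)

# Percentage growth per additional year: (exp(slope) - 1) * 100
coefs$pct_per_year <- (exp(coefs$slope) - 1) * 100

# Years of employment at which the two fitted log-salary lines intersect
crossover_years <- (coefs$intercept[1] - coefs$intercept[2]) /
  (coefs$slope[2] - coefs$slope[1])

print(coefs)
print(crossover_years)
```

With these rounded coefficients the lines intersect at roughly 34 years, which is beyond the longest employment duration in the sample (about 29.7 years). Within the observed range, the fitted salary for women therefore stays below the fitted salary for men, even though it grows faster.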