The dataset comprises 200 observations of public service employees on three variables: annual salary in euros, years of employment, and gender. The mean salary is approximately €108,491 with a standard deviation of €60,116; the median of €93,164 lies below the mean, which suggests a right-skewed salary distribution. The average length of employment is about 15.67 years (SD ≈ 8.76). The sample contains 131 male and 69 female employees.
library(openxlsx)  # assumption: read.xlsx() is taken from the openxlsx package
salary_data = read.xlsx("SalaryData.xlsx")
summary(salary_data)
## years_exp salary gender
## Min. : 0.007167 Min. : 30028 Length:200
## 1st Qu.: 7.749079 1st Qu.: 60076 Class :character
## Median :16.662402 Median : 93164 Mode :character
## Mean :15.666479 Mean :108491
## 3rd Qu.:22.823554 3rd Qu.:150437
## Max. :29.666752 Max. :255381
# Gender distribution
table(salary_data$gender)
##
## Female Male
## 69 131
# Means and standard deviations
mean(salary_data$salary)
## [1] 108490.9
sd(salary_data$salary)
## [1] 60116.36
mean(salary_data$years_exp)
## [1] 15.66648
sd(salary_data$years_exp)
## [1] 8.760716
## 2. Association between years and salary as scatterplot.
A visual inspection of the scatterplot reveals a positive association between years of employment and salary. However, the relationship does not appear strictly linear: salary seems to rise more steeply at higher years of employment than at lower ones. This curvature suggests that a transformation of the dependent variable may be necessary to model the association more accurately.
plot(salary_data$years_exp, salary_data$salary,
main = "Association between Experience and Salary",
xlab = "Years of Experience",
ylab = "Salary (€)")
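Whether a log transformation of the dependent variable straightens such a pattern can also be checked numerically. The following is a minimal sketch on simulated data, assuming salary grows by a roughly constant percentage per year; the variables `years` and `salary` and all parameter values below are hypothetical and are not taken from SalaryData.xlsx.

```r
# Hypothetical simulated data (NOT the SalaryData.xlsx sample):
# salary grows by a constant percentage per year, with multiplicative noise.
set.seed(1)
years  <- runif(200, min = 0, max = 30)
salary <- exp(10.4 + 0.06 * years + rnorm(200, sd = 0.2))

# On the euro scale, a squared term picks up the upward curvature.
fit_raw_quad <- lm(salary ~ years + I(years^2))
curvature_raw <- coef(fit_raw_quad)[["I(years^2)"]]

# On the log scale, a straight line recovers the simulated growth rate.
fit_log <- lm(log(salary) ~ years)
slope_log <- coef(fit_log)[["years"]]

curvature_raw  # positive: the curve steepens with years on the euro scale
slope_log      # close to the simulated value of 0.06
```

A positive squared term on the original scale together with a stable slope on the log scale is the kind of pattern for which log-transforming the dependent variable is a reasonable choice.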
Due to the non-linear pattern observed in the scatterplot, a logarithmic transformation of the salary variable is applied to linearise the relationship. A linear regression model is then estimated, with log(salary) as the dependent variable and years of employment as the independent variable.
salary_data$log_salary = log(salary_data$salary)
# Estimate linear model: log(salary) ~ years of experience
lm(log_salary ~ years_exp, data = salary_data)
##
## Call:
## lm(formula = log_salary ~ years_exp, data = salary_data)
##
## Coefficients:
## (Intercept) years_exp
## 10.43644 0.06332
The linear regression model expresses the natural logarithm of salary as a function of years of employment. The intercept is approximately 10.44 and the slope coefficient is 0.063, so each additional year of employment is associated with an average increase of 0.063 in log-salary. Exponentiating the slope (e^0.063 ≈ 1.065) translates this into a salary that is approximately 6.5% higher for each additional year of employment. The model contains no other predictors, so this figure is not adjusted for further factors such as gender.
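The conversion from a log-scale slope to a percentage change can be reproduced in a few lines. This is a small sketch that uses the slope printed in the output above; the object names `b_years`, `growth_factor` and `pct_increase` are illustrative only.

```r
# Slope of years_exp copied from the regression output above
b_years <- 0.06332

# Multiplicative change in salary per additional year
growth_factor <- exp(b_years)

# The same change expressed as a percentage
pct_increase <- (growth_factor - 1) * 100
round(pct_increase, 1)  # 6.5
```

For small coefficients, 100 × b (here 6.3%) is a common approximation of this exact value.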
To explore potential gender differences in the relationship between salary and years of employment, separate linear models are estimated for male and female employees. In both cases, the dependent variable is log-transformed salary, and the independent variable is years of employment.
# Model for males
lm(log(salary) ~ years_exp, data = salary_data[salary_data$gender == "Male", ])
##
## Call:
## lm(formula = log(salary) ~ years_exp, data = salary_data[salary_data$gender ==
## "Male", ])
##
## Coefficients:
## (Intercept) years_exp
## 10.54381 0.06296
# Model for females
lm(log(salary) ~ years_exp, data = salary_data[salary_data$gender == "Female", ])
##
## Call:
## lm(formula = log(salary) ~ years_exp, data = salary_data[salary_data$gender ==
## "Female", ])
##
## Coefficients:
## (Intercept) years_exp
## 9.8912 0.0819
For male employees, the slope coefficient is approximately 0.063, which corresponds to an estimated 6.5% increase in salary per year of employment (e^0.063 ≈ 1.065). For female employees, the slope is higher at about 0.082, indicating an increase of about 8.5% per year (e^0.082 ≈ 1.085). However, the intercept is lower for female employees (9.89 compared with 10.54 on the log scale), suggesting lower initial salaries for women despite their steeper growth rate over time.
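The comparison of the two fitted lines can be made concrete by back-transforming the reported coefficients. The sketch below copies the coefficients from the two outputs above and relies on the fitted log-linear models only; the object names are illustrative, and the back-transformed intercepts are model-implied values at zero years of employment, not observed salaries.

```r
# Coefficients copied from the two gender-specific outputs above
male   <- c(intercept = 10.54381, slope = 0.06296)
female <- c(intercept = 9.8912,   slope = 0.0819)

# Model-implied salary at zero years of employment (in euros)
start_male   <- exp(male[["intercept"]])
start_female <- exp(female[["intercept"]])

# Percentage change in salary per additional year
pct_male   <- (exp(male[["slope"]]) - 1) * 100
pct_female <- (exp(female[["slope"]]) - 1) * 100

# Years of employment at which the two fitted lines intersect
crossover <- (male[["intercept"]] - female[["intercept"]]) /
  (female[["slope"]] - male[["slope"]])

round(c(pct_male = pct_male, pct_female = pct_female,
        crossover_years = crossover), 1)
```

Under these estimates the two lines intersect at roughly 34.5 years, which lies beyond the longest employment observed in the sample (about 29.7 years), so the fitted line for women stays below the fitted line for men over the whole observed range.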