This dataset contains salary (€), years of employment and gender for public service employees.
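# clean column names and coerce the variables used below
# (df is assumed to have been read in already; the loading code is not shown)
names(df) <- trimws(names(df))
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years_empl)  # numeric copy of years of employment
df$gender <- as.factor(df$gender)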
nrow(df)
## [1] 200
table(df$gender)
##
## Female Male
## 100 100
# descriptive statistics for salary and years of employment (not printed)
mean_salary <- mean(df$salary)
mean_years <- mean(df$years)
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)
summary(df$salary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30203 54208 97496 122303 179447 331348
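The preparation step at the top converts salary with `as.numeric()`. That is safe when the column is character, but if the file was read with `stringsAsFactors = TRUE` (the default before R 4.0.0), `as.numeric()` on a factor returns the internal level codes instead of the values. The sketch below shows the pitfall on a made-up three-row data frame and a hypothetical helper, `to_numeric_checked()`, which is not part of the original analysis: it converts via character and warns when values fail to parse.

```r
# Hypothetical helper (not from the original analysis): convert a column to
# numeric without the factor pitfall and warn about values that fail to parse
to_numeric_checked <- function(x, name = "x") {
  # as.numeric() on a factor returns level codes, so go via character first
  if (is.factor(x)) x <- as.character(x)
  out <- suppressWarnings(as.numeric(x))
  n_bad <- sum(is.na(out) & !is.na(x))
  if (n_bad > 0) {
    warning(sprintf("%s: %d value(s) could not be converted to numeric", name, n_bad))
  }
  out
}

# Toy data frame (illustrative only): padded column name and a factor column
toy <- data.frame(" salary " = factor(c("30203", "54208", "97496")),
                  years_empl = c("1", "5", "12"),
                  check.names = FALSE)
names(toy) <- trimws(names(toy))

as.numeric(toy$salary)                    # [1] 1 2 3  (level codes, not salaries)
to_numeric_checked(toy$salary, "salary")  # [1] 30203 54208 97496
```

Checking the count of failed conversions matters because `as.numeric()` only emits a warning for unparseable strings, which is easy to miss in a knitted report.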
The scatterplot below visualizes the relationship between years of employment (x-axis) and salary in euros (y-axis). It shows a clear upward trend, indicating a positive association: as years of employment increase, salary tends to rise. However, the spread of data points also suggests some variability in salaries for employees with the same length of employment. A linear trend line is added in green to highlight the general positive pattern.
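# scatterplot of years (independent) vs. salary (dependent) with a linear trend line
plot(df$years, df$salary, xlab = "Years of employment", ylab = "Salary (€)")
abline(lm(salary ~ years, data = df), col = "green", lwd = 2)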
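A straight trend line can hide curvature. One quick visual check, swapping in a nonparametric smoother, is to overlay base R's `lowess()` next to the linear fit: if the two curves diverge at the ends, a straight line is a poor summary. The sketch below runs on simulated data; the variable names and the growth parameters (a starting salary of 32000 growing about 7% per year with multiplicative noise) are assumptions chosen only to mimic an exponential pattern, so it does not reproduce the document's figure.

```r
# Simulated data (hypothetical, NOT the document's dataset):
# exponential salary growth with multiplicative noise
set.seed(1)
years  <- runif(200, min = 0, max = 35)
salary <- 32000 * exp(0.07 * years + rnorm(200, sd = 0.19))

# Scatterplot with a straight-line fit (green) and a lowess smoother (red)
plot(years, salary, xlab = "Years of employment", ylab = "Salary (EUR)")
abline(lm(salary ~ years), col = "green", lwd = 2)
smoother <- lowess(years, salary)
lines(smoother, col = "red", lwd = 2)

# Numeric counterpart of the visual check: a positive quadratic term
# indicates that salary bends upward instead of following a straight line
quad_fit <- lm(salary ~ years + I(years^2))
coef(quad_fit)["I(years^2)"]
```

With the real data, the same two lines (`abline()` and `lines(lowess(...))`) can be added to the existing scatterplot.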
The graph shows a non-linear relationship between salary and years of employment: salaries rise more steeply in later years, suggesting exponential rather than linear growth. To account for this, we apply a logarithmic transformation to the salary variable. If growth is roughly exponential, this transformation linearizes the relationship and tends to stabilize the variance, which makes the data better suited to linear regression. The model below estimates log-transformed salary as a function of years of employment.
# regress log-transformed salary on years of employment
df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years, data = df)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## years 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
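The claim that the log transformation stabilizes the variance can be checked by comparing how the residual spread changes across fitted values for a raw-scale and a log-scale model. The sketch below does this on simulated data again; the data-generating parameters and the helper `spread_ratio()` are hypothetical and not part of the original analysis. It also shows that regressing `log(salary)` on years recovers the simulated growth rate of 0.07.

```r
# Simulated data (hypothetical, NOT the document's dataset)
set.seed(42)
years  <- runif(200, min = 0, max = 35)
salary <- 32000 * exp(0.07 * years + rnorm(200, sd = 0.19))

fit_raw <- lm(salary ~ years)        # linear model on the raw scale
fit_log <- lm(log(salary) ~ years)   # linear model on the log scale

# Hypothetical helper: residual SD in the upper half of fitted values divided
# by the residual SD in the lower half; values near 1 suggest constant variance
spread_ratio <- function(fit) {
  upper <- fitted(fit) > median(fitted(fit))
  sd(resid(fit)[upper]) / sd(resid(fit)[!upper])
}
spread_ratio(fit_raw)
spread_ratio(fit_log)
coef(fit_log)[["years"]]   # close to the simulated growth rate of 0.07

# Residuals vs. fitted values for both models, side by side
old_par <- par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "salary ~ years")
plot(fitted(fit_log), resid(fit_log), main = "log(salary) ~ years")
par(old_par)
```

For the document's own fit, `plot(model, which = 1)` draws the standard residuals-vs-fitted diagnostic directly from the `lm` object.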
The model reveals a clear and statistically significant relationship between years of employment and salary (p < 2e-16 for the slope). Because the dependent variable is log-transformed, the slope of 0.071 means that each additional year of employment is associated with a salary roughly exp(0.071) - 1 ≈ 7.4% higher. Growth is therefore proportional: salaries tend to rise by a consistent percentage each year rather than by a fixed euro amount. The model's strong fit supports this interpretation, with years of employment accounting for about 92% of the variance in log salary (R² = 0.917).
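Log-scale coefficients are easier to communicate once converted back to the salary scale. The sketch below copies the two estimates by hand from the `summary(model)` output (with the fitted object available, `coef(model)` would be used instead) and derives the percentage change per year, the implied doubling time, and the predicted salary at zero years. Assuming roughly normal residuals on the log scale, the back-transformed intercept estimates the median (geometric-mean) salary rather than the arithmetic mean.

```r
# Coefficients copied from the summary(model) output
b0 <- 10.382774   # intercept (log scale)
b1 <- 0.070998    # slope for years (log scale)

pct_per_year  <- 100 * (exp(b1) - 1)   # % change in salary per extra year
doubling_time <- log(2) / b1           # years for predicted salary to double
baseline      <- exp(b0)               # predicted median salary at years = 0 (EUR)

round(pct_per_year, 2)    # 7.36
round(doubling_time, 1)   # 9.8
baseline                  # about 32,300
```

In words: the fitted model implies salaries roughly double every ten years of employment, starting from a predicted median of about €32,300.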