1. Sample description

The dataset contains salary information, years of employment, and gender of the public service employees.

# Clean column names to avoid issues with spaces
names(df) <- trimws(names(df))

# Convert data types 
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)

# Number of rows (observations)
nrow(df)
## [1] 200
# Frequency table for gender
table(df$gender)
## 
## Female   Male 
##    100    100
# Means
mean_salary <- mean(df$salary)
mean_years_empl <- mean(df$years)

# Standard deviations
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)

# Five-number summary and mean of salary
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30203   54208   97496  122304  179447  331348
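
The means and standard deviations computed above are stored but never printed. A minimal sketch of how they could be shown in one table, overall and by gender, follows. The data frame `demo` is a simulated stand-in with hypothetical values (it is not the report's data), so the numbers it produces are for illustration only.

```r
# Hypothetical stand-in for df; simulated values, NOT the report's data
set.seed(42)
demo <- data.frame(
  salary = round(exp(rnorm(200, mean = 11.4, sd = 0.6))),
  years  = sample(0:30, 200, replace = TRUE),
  gender = factor(rep(c("Female", "Male"), each = 100))
)

# Mean and standard deviation of each numeric variable, in one table
num_vars <- c("salary", "years")
desc <- data.frame(
  mean = sapply(demo[num_vars], mean),
  sd   = sapply(demo[num_vars], sd)
)
print(round(desc, 1))

# The same statistics split by gender
by_gender <- aggregate(cbind(salary, years) ~ gender, data = demo,
                       FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(by_gender)
```

Applied to the real data, `demo` would simply be replaced by `df`.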


2. Association between years and salary as scatterplot.

The scatterplot displays the relationship between years of employment (x-axis) and salary (y-axis). A positive relationship is evident: salaries tend to rise with more years on the job. However, the trend does not appear to be strictly linear, so a straight line is only a rough summary of the data.

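# Scatterplot of years (independent) vs. salary (dependent)
plot(x = df$years, y = df$salary,
     xlab = "Years of employment", ylab = "Salary")
# Overlay the least-squares line from a simple linear fit
abline(lm(salary ~ years, data = df), col = "blue", lwd = 2)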



3. Estimate salary by years of employment

The relationship between salary and years of employment does not look linear. To linearize it, we take the natural logarithm of salary and then fit a linear regression of log salary on years of employment.

df$log_salary <- log(df$salary)

model <- lm(log_salary ~ years, data = df)
summary(model)
## 
## Call:
## lm(formula = log_salary ~ years, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years        0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
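
One way to check whether the log transformation helped is to compare residual plots for the untransformed and the log model. R-squared values are not directly comparable when the response is on different scales, so the residual pattern is the more useful diagnostic. The sketch below uses simulated data with hypothetical parameters chosen only to loosely resemble the fitted coefficients. It is not the report's data, and `sim`, `fit_lin`, and `fit_log` are illustrative names.

```r
# Simulated stand-in for df (hypothetical values, NOT the report's data)
set.seed(1)
sim <- data.frame(years = runif(200, min = 0, max = 30))
sim$salary <- exp(10.4 + 0.07 * sim$years + rnorm(200, sd = 0.2))

# Fit the untransformed and the log-transformed model
fit_lin <- lm(salary ~ years, data = sim)
fit_log <- lm(log(salary) ~ years, data = sim)

# Residuals vs. fitted values, side by side; curvature or a funnel shape
# in a panel suggests that the corresponding model is misspecified
op <- par(mfrow = c(1, 2))
plot(fitted(fit_lin), resid(fit_lin),
     main = "salary ~ years", xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log),
     main = "log(salary) ~ years", xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```

With the real data, the same two `lm()` calls would be run on `df`. Residuals scattered evenly around zero in the right-hand panel would support the log specification.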


4. Interpretation

The model suggests that salary increases with additional years of employment. Because the outcome is log salary, the coefficient on years (0.071) represents a change on the log scale: each extra year of employment is associated with an increase in salary of approximately 7.4% (exp(0.071) - 1 ≈ 0.074). This relationship is statistically significant (p < 2e-16), and years of employment explain about 92% of the variance in log salary, indicating that experience is a meaningful predictor of earnings in this sample of public service employees.
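
The conversion from a log-scale coefficient to a percentage change can be sketched as follows. The slope is copied from the regression output above, and the variable names are illustrative.

```r
# Slope on the log-salary scale, copied from the regression output
b_years <- 0.070998

# Exact multiplicative effect of one additional year, in percent
pct_exact <- (exp(b_years) - 1) * 100

# Small-coefficient approximation: 100 * b
pct_approx <- 100 * b_years

# Roughly 7.4 (exact) versus 7.1 (approximation)
print(round(c(exact = pct_exact, approx = pct_approx), 1))
```

The approximation `100 * b` is close for small coefficients but understates the effect slightly. The exact form also compounds correctly over several years, for example `(exp(5 * b_years) - 1) * 100` for five additional years.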