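The dataset contains the salary, years of employment, and gender of 200 public service employees (69 female, 131 male). Salaries range from 30,028 to 255,381, with a median of 93,164 and a mean of 108,491, so the salary distribution is skewed to the right.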
# Clean column names to avoid issues with spaces
names(df) <- trimws(names(df))
# Convert data types
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years)
df$gender <- as.factor(df$gender)
# Number of rows (observations)
nrow(df)
## [1] 200
# Frequency table for gender
table(df$gender)
##
## Female   Male
##     69    131
# Means
mean_salary <- mean(df$salary)
mean_years <- mean(df$years)
# Standard deviations
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30028   60076   93164  108491  150437  255381
The scatterplot shows the relationship between years of employment and salary. The relationship is positive: salaries tend to rise with more years on the job. However, the trend may not be strictly linear, as salary increases appear to level off with greater experience.
# Scatterplot of Years (independent) vs Salary (dependent)
plot(x = df$years, y = df$salary,
     xlab = "Years of employment", ylab = "Salary")
abline(lm(salary ~ years, data = df), col = "blue", lwd = 2)
A non-linear relationship is observed between salary and years of employment. To make this relationship more linear, we apply a logarithmic transformation to the salary variable and then fit a linear regression model.
df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years, data = df)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years, data = df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.74993 -0.11686  0.00666  0.11146  0.77461
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.436444   0.032197  324.14   <2e-16 ***
## years        0.063322   0.001795   35.28   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2218 on 198 degrees of freedom
## Multiple R-squared: 0.8628, Adjusted R-squared: 0.8621
## F-statistic: 1245 on 1 and 198 DF, p-value: < 2.2e-16
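The model indicates that salary tends to rise with additional years of employment. Because the dependent variable is the natural log of salary, the coefficient on years is read on the log scale: each extra year of employment is associated with an increase of about 0.0633 in log salary, which corresponds to a salary that is roughly 6.5% higher (exp(0.0633) ≈ 1.065). This relationship is statistically significant (p < 2e-16), and years of employment explains about 86% of the variation in log salary (R-squared = 0.8628), which makes it a strong predictor of salary in this sample.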
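The percentage interpretation of the slope can be checked with a short calculation. The sketch below copies the two estimates from the `summary(model)` output by hand; the variable names (`b0`, `b1`, `pct_per_year`, and so on) are ours and not part of the original analysis. With a fitted model object in the session, `coef(model)` would presumably give the same numbers.

```r
# Estimates copied from the summary(model) output above
b0 <- 10.436444  # intercept on the log-salary scale
b1 <- 0.063322   # slope for years on the log-salary scale

# Percentage change in salary associated with one extra year of employment
pct_per_year <- (exp(b1) - 1) * 100

# Multiplicative effect compounds: percentage change over ten years
pct_per_decade <- (exp(10 * b1) - 1) * 100

# Predicted salary at zero years of employment (back-transformed intercept)
salary_at_zero <- exp(b0)

round(c(pct_per_year = pct_per_year,
        pct_per_decade = pct_per_decade,
        salary_at_zero = salary_at_zero), 2)
```

The one-year effect comes out at about 6.5%. Note that back-transforming a prediction from a log model estimates something closer to the median than the mean salary, so `salary_at_zero` should be read with that in mind.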
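The remaining step, fitting the regression separately by gender, could be sketched as follows. This is only an illustration under stated assumptions: the data frame `sim` is simulated so that the code runs on its own, and its numbers say nothing about the real dataset. In the actual analysis, `df` (with its `log_salary`, `years`, and `gender` columns) would take the place of `sim`.

```r
# SYNTHETIC stand-in for df: simulated values, NOT the real salary data
set.seed(1)
n <- 200
sim <- data.frame(
  years  = runif(n, min = 0, max = 35),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
sim$salary     <- exp(10.4 + 0.06 * sim$years + rnorm(n, sd = 0.2))
sim$log_salary <- log(sim$salary)

# Fit one log-linear model per gender
models_by_gender <- lapply(split(sim, sim$gender), function(d) {
  lm(log_salary ~ years, data = d)
})

# Intercepts and slopes side by side, one column per gender
coef_table <- sapply(models_by_gender, coef)
print(coef_table)

# Alternative: a single model with an interaction term, whose
# years:genderMale coefficient tests whether the slopes differ
interaction_model <- lm(log_salary ~ years * gender, data = sim)
summary(interaction_model)$coefficients
```

Comparing the two `years` slopes shows whether salary grows at a different rate for women and men, and comparing the intercepts shows the difference in starting salary on the log scale. The interaction model is one way to check whether such differences are statistically distinguishable rather than just visible in the point estimates.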