This dataset contains the salary, years of employment, and gender of public service employees.
# Clean column names to avoid issues with spaces
names(df) <- trimws(names(df))
# Convert data types
df$salary <- as.numeric(df$salary)
df$years <- as.numeric(df$years)
df$gender <- as.factor(df$gender)
# Number of rows (observations)
nrow(df)
## [1] 200
# Frequency table for gender
table(df$gender)
##
## Female   Male
##     69    131
# Means
mean_salary <- mean(df$salary)
mean_years <- mean(df$years)
# Standard deviations
sd_salary <- sd(df$salary)
sd_years <- sd(df$years)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30028   60076   93164  108491  150437  255381
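The means and standard deviations computed above are stored but never printed. A minimal sketch of collecting them into one table follows. It uses a small made-up data frame (`toy`, with invented values) so that it runs on its own; the names `toy`, `num_vars`, `desc`, and `gender_share` are introduced here for illustration, and the same calls would apply to `df`.

```r
# Hypothetical toy data standing in for df (values are invented for illustration)
toy <- data.frame(
  salary = c(32000, 45000, 61000, 88000, 120000, 150000),
  years  = c(1, 4, 8, 15, 20, 24),
  gender = factor(c("Female", "Male", "Male", "Female", "Male", "Male"))
)

# Mean and standard deviation of each numeric variable, one row per variable
num_vars <- c("salary", "years")
desc <- data.frame(
  variable  = num_vars,
  mean      = sapply(toy[num_vars], mean),
  sd        = sapply(toy[num_vars], sd),
  row.names = NULL
)
print(desc)

# Gender shares as proportions rather than raw counts
gender_share <- prop.table(table(toy$gender))
print(round(gender_share, 3))
```

Reporting proportions next to the counts makes the imbalance between groups easier to see at a glance.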
The scatterplot below shows the relationship between years of employment and salary. There appears to be a positive association: as years of employment increase, salary tends to increase as well. The pattern suggests a possible non-linear relationship, so a straight line may not describe it well over the whole range of years.
# Scatterplot of Years (independent) vs Salary (dependent)
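plot(x = df$years, y = df$salary,
     xlab = "Years of employment", ylab = "Salary")
# Overlay the least-squares line as a visual reference for linearity
abline(lm(salary ~ years, data = df), col = "green", lwd = 2)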
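One way to judge non-linearity by eye is to overlay a smoother that does not assume a straight line and compare it with the linear fit. The sketch below does this on simulated data, not on the report's dataset; the assumption built into the simulation is that salary grows roughly exponentially with years, and the parameter values are illustrative only.

```r
set.seed(1)

# Simulated data (assumption): salary grows roughly exponentially with years
years  <- runif(200, min = 0, max = 30)
salary <- exp(10.4 + 0.06 * years + rnorm(200, sd = 0.2))

plot(years, salary,
     xlab = "Years of employment", ylab = "Salary",
     main = "Salary vs. years (simulated)")

# Straight-line fit
abline(lm(salary ~ years), col = "green", lwd = 2)

# LOWESS smoother: follows the local trend without assuming linearity
sm <- lowess(years, salary)
lines(sm, col = "red", lwd = 2)

legend("topleft", legend = c("Linear fit", "LOWESS"),
       col = c("green", "red"), lwd = 2, bty = "n")
```

If the smoother bends away from the straight line in a systematic way, a transformation of one of the variables is worth considering.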
We observe a non-linear relationship between salary and years of employment. To linearize the association, we apply a logarithmic transformation to the salary variable and fit a linear regression model.
df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years, data = df)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years, data = df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.74993 -0.11686  0.00666  0.11146  0.77461
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.436444   0.032197  324.14   <2e-16 ***
## years        0.063322   0.001795   35.28   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2218 on 198 degrees of freedom
## Multiple R-squared: 0.8628, Adjusted R-squared: 0.8621
## F-statistic: 1245 on 1 and 198 DF, p-value: < 2.2e-16
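The model shows that log salary increases with years of employment. The slope of 0.0633 is on the log scale, so each additional year of employment is associated with a salary that is higher by a factor of exp(0.0633) ≈ 1.065, or about 6.5%. The coefficient is statistically significant (t = 35.28, p < 2e-16), and years of employment explain about 86% of the variance in log salary (R² = 0.8628). This describes an association in the sample, not necessarily a causal effect.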
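Because the outcome is log salary, the slope is easier to read as a percentage change. The sketch below re-enters the estimate, standard error, and residual degrees of freedom from the printed summary by hand and back-transforms them. The names `b`, `se`, `df_resid`, `pct`, and `pct_ci` are introduced here for illustration; with the fitted object available, `coef(model)` and `confint(model)` would supply the same log-scale inputs.

```r
# Values copied from the printed summary(model) output
b        <- 0.063322   # slope for years (log-salary scale)
se       <- 0.001795   # its standard error
df_resid <- 198        # residual degrees of freedom

# Percentage change in salary per additional year: (exp(b) - 1) * 100
pct <- (exp(b) - 1) * 100
print(round(pct, 1))   # about 6.5

# 95% confidence interval on the log scale, then back-transformed to percent
ci_log <- b + c(-1, 1) * qt(0.975, df_resid) * se
pct_ci <- (exp(ci_log) - 1) * 100
print(round(pct_ci, 2))
```

For small slopes, 100 × b is a close approximation to the exact percentage, which is why 0.0633 and 6.5% look so similar here.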