The sample includes 100 women and 100 men (see Table 1), so both genders are equally represented. The years of employment and salary variables are both metric. The dependent variable, salary, has a mean of 122,303.45 and a standard deviation of 79,030.12. The independent variable, years of employment, has a mean of 15.73 and a standard deviation of 9.04.
The scatter plot showed a strong positive association between salary and years of employment: there is a clear upward trend, but it is curved rather than linear.
plot(df$years_empl, df$salary)
#### 3. Estimate salary by years of employment
To linearize the relationship, we took the base-10 logarithm of the dependent variable salary, which turns the geometric (multiplicative) salary growth into an arithmetic (additive) one.
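Written out, the idea behind the transformation looks as follows. The symbols $b_0$ (intercept) and $b_1$ (slope) are placeholder notation introduced here for illustration, not names used in the analysis itself:

$$
\log_{10}(\text{salary}) = b_0 + b_1 \cdot \text{years\_empl}
\quad\Longleftrightarrow\quad
\text{salary} = 10^{b_0} \cdot \left(10^{b_1}\right)^{\text{years\_empl}}
$$

Under this form, each additional year of employment adds $b_1$ to the log of salary, which is the same as multiplying the salary itself by the constant factor $10^{b_1}$.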
lm(df$salary ~ df$years_empl)
##
## Call:
## lm(formula = df$salary ~ df$years_empl)
##
## Coefficients:
## (Intercept) df$years_empl
## -2684 7944
lm(log(salary) ~ years_empl, data = df)
##
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
##
## Coefficients:
## (Intercept) years_empl
## 10.383 0.071
# Add linear regression line & plot
plot(df$years_empl, log10(df$salary),
main = "Linearized: Years of Experience vs log10(Salary)",
xlab = "Years of Experience", ylab = "log10(Salary)")
lm(log10(salary) ~ years_empl, data = df)
##
## Call:
## lm(formula = log10(salary) ~ years_empl, data = df)
##
## Coefficients:
## (Intercept) years_empl
## 4.50918 0.03083
abline(lm(log10(salary) ~ years_empl, data = df)) # response must be log10(salary) so the line matches the plot axes
#### 4. Interpretation
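The linear model showed that for a person with zero years of employment, the estimated salary is about -2,684 (the intercept), and the salary increases by about 7,944 with each additional year of employment (the slope). A negative starting salary is not plausible, which is consistent with the curved pattern that a straight line does not capture.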
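The transformed model shows that with each extra year of employment, the log (base 10) of salary goes up by 0.03083, which equals 0.071 on the natural-log scale. Back-transformed, this corresponds to a salary increase of roughly 7.4% per additional year.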
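The two log models printed above differ only in the base of the logarithm, so their coefficients should differ by a factor of `log(10)`. The following is a minimal sketch that checks this and back-transforms the slope and intercept into more readable quantities. It uses only the coefficients copied from the printed output; the variable names (`b0_ln`, `b1_log10`, `growth_pct`, `start_salary`) are hypothetical and chosen for illustration.

```r
# Coefficients copied from the regression output (rounded as printed)
b0_ln    <- 10.383    # intercept of lm(log(salary) ~ years_empl)
b1_ln    <- 0.071     # slope of lm(log(salary) ~ years_empl)
b0_log10 <- 4.50918   # intercept of lm(log10(salary) ~ years_empl)
b1_log10 <- 0.03083   # slope of lm(log10(salary) ~ years_empl)

# Natural-log coefficients divided by log(10) should reproduce
# the log10 coefficients (up to rounding in the printed output)
b0_ln / log(10)
b1_ln / log(10)

# Each extra year multiplies salary by 10^b1_log10;
# express that factor as a percentage increase
growth_pct <- (10^b1_log10 - 1) * 100
growth_pct

# Back-transformed intercept: estimated salary at zero years of employment
start_salary <- 10^b0_log10
start_salary
```

Reading the slope as a percentage change per year is usually easier to communicate than a change in log units, and unlike the untransformed model the back-transformed intercept is a positive salary.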