1. Sample description

The dataset consists of three variables: “years of employment”, “gender”, and “salary”. The sample is first described with simple summary statistics:

## 
## Female   Male 
##    100    100

The sample is evenly split by gender, with 100 female and 100 male observations.

##     Group Mean_Salary
## 1 Overall    122303.5
## 2  Female    109140.8
## 3    Male    135466.1
## [1] "Compared to females, who have a mean salary of 109140.76 men have a higher salary of 135466.14"
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.007167  7.790195 16.191430 15.734362 22.908421 29.666752

Years of employment range from about 0.01 to about 29.67 years, with a mean of 15.73 years and a median of 16.19 years.
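The summaries above could be reproduced along the following lines. This is only a sketch: `describe_sample` is a hypothetical helper, and it assumes the data frame has columns named `gender`, `salary` and `years_empl` (the name `gender` is a guess, since the report does not show the code that produced the output).

```r
# Hypothetical helper: summarises a data frame with columns
# gender, salary and years_empl (the column name `gender` is assumed).
describe_sample <- function(df) {
  # Number of observations per gender
  counts <- table(df$gender)

  # Mean salary overall and within each gender group
  group_means <- tapply(df$salary, df$gender, mean)
  mean_salary <- data.frame(
    Group = c("Overall", names(group_means)),
    Mean_Salary = c(mean(df$salary), as.numeric(group_means))
  )

  # Min, quartiles, median, mean and max for years of employment
  years_summary <- summary(df$years_empl)

  list(counts = counts, mean_salary = mean_salary, years_summary = years_summary)
}

# Usage with the data frame from this report:
# describe_sample(df)
```

Returning a list keeps the three summaries together, so each can be printed or reused separately.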

2. Association between years of employment and salary (scatterplot)

The scatterplot shows the relationship between the variables “years of employment” and “salary”. There is a clear upward trend: as years_empl increases, salary increases too, which suggests a strong positive association between the two variables. However, the trend is not linear; salary rises more steeply after roughly 15 to 20 years of employment.

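# Scatterplot of salary against years of employment, with labelled axes
plot(df$years_empl, df$salary, xlab = "Years of employment", ylab = "Salary")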


3. Estimate salary by years of employment

To make the relationship between salary and years of employment linear, a log transformation is applied to the variable “salary”.
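One way to motivate the transformation is to suppose that salary grows by a roughly constant percentage per year. This is an assumption suggested by the curved scatterplot, not something tested here, and the symbols $S_0$ (salary at 0 years) and $\beta$ (growth rate) are notation introduced only for this sketch. Under that assumption, taking logs turns the exponential curve into a straight line:

```latex
\text{salary} = S_0 \, e^{\beta \cdot \text{years\_empl}}
\quad\Longrightarrow\quad
\log(\text{salary}) = \log(S_0) + \beta \cdot \text{years\_empl}
```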

# Fit a linear model of log(salary) on years of employment
model <- lm(log(salary) ~ years_empl, data = df)
summary(model)
## 
## Call:
## lm(formula = log(salary) ~ years_empl, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years_empl   0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16


4. Interpretation

At 0 years of employment, i.e. for someone who has just started the job, the expected log-salary is about 10.3828, which corresponds to a salary of roughly 32,300 on the original scale. Each additional year of employment increases the expected log-salary by about 0.071, which corresponds to a salary increase of around 7.4% per year.
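The percentage figure can be checked by back-transforming the slope. Writing $\hat{\beta}_1$ for the estimated years_empl coefficient and $x$ for a given number of years (both symbols are notation introduced here, not part of the model output), one extra year multiplies the predicted salary by $e^{\hat{\beta}_1}$:

```latex
\frac{\widehat{\text{salary}}(x + 1)}{\widehat{\text{salary}}(x)}
  = e^{\hat{\beta}_1}
  = e^{0.070998}
  \approx 1.0736,
\qquad
100 \, \bigl(e^{\hat{\beta}_1} - 1\bigr) \approx 7.4\%
```

Because the coefficient is small, reading it directly as a percentage (7.1%) gives a close but slightly low approximation.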