1. Sample description

This section describes the sample. It contains 200 observations; the mean salary is about 108,491 and the mean work experience is about 15.7 years.

df$salary <- as.numeric(df$salary)
df$years_exp <- as.numeric(df$years_exp)
df$gender <- as.factor(df$gender)

nrow(df)
## [1] 200
mean_salary <- mean(df$salary, na.rm = TRUE)
mean_years <- mean(df$years_exp, na.rm = TRUE)
mean_salary
## [1] 108490.9
mean_years
## [1] 15.66648
sd_salary <- sd(df$salary, na.rm = TRUE)
sd_years <- sd(df$years_exp, na.rm = TRUE)
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30028   60076   93164  108491  150437  255381
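
One caveat for the type conversions at the top of this chunk: if a column was read in as a factor, `as.numeric()` returns the internal level codes rather than the printed values. Whether that applies to `df` cannot be told from the output above. The sketch below uses a made-up vector (`salary_raw` is hypothetical, not taken from `df`) to show the difference and a safer conversion.

```r
# Hypothetical toy vector, not taken from df: salaries stored as a factor.
salary_raw <- factor(c("60076", "30028", "93164"))

# as.numeric() on a factor returns the level codes, not the values.
codes <- as.numeric(salary_raw)

# Converting to character first recovers the actual numbers.
values <- as.numeric(as.character(salary_raw))

print(codes)
print(values)
```

If the columns are already character or numeric, the plain `as.numeric()` calls used above are sufficient.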


2. Association between years and salary as scatterplot.

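The plot shows the relationship between years of experience (independent variable) and salary (dependent variable). The association is positive.

The Pearson correlation is r = 0.913 and the Spearman rank correlation is 0.940; both indicate a strong positive association. The higher rank correlation suggests that the relationship is monotonic but not strictly linear.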

plot(df$years_exp, df$salary)

# Note: years_exp is the response here, so the slope is years per unit of salary.
lm(df$years_exp ~ df$salary)
## 
## Call:
## lm(formula = df$years_exp ~ df$salary)
## 
## Coefficients:
## (Intercept)    df$salary  
##    1.232249     0.000133
cor(df$years_exp, df$salary, use = "complete.obs", method = "pearson")
## [1] 0.9129636
cor(df$years_exp, df$salary, use = "complete.obs", method = "spearman")
## [1] 0.9396925
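
The text treats salary as the dependent variable, while the `lm()` call above has years of experience on the left-hand side. A minimal sketch of both directions follows, using simulated data because `df` is not available here; `toy` and its generating parameters (10.4, 0.06, 0.2) are arbitrary assumptions, not estimates from this report. It also illustrates two general facts: the product of the two slopes equals the squared Pearson correlation, and the Spearman correlation is unchanged by a monotone transformation such as `log()`.

```r
set.seed(42)  # simulated data; NOT the df used in this report
toy <- data.frame(years_exp = runif(200, min = 0, max = 35))
toy$salary <- exp(10.4 + 0.06 * toy$years_exp + rnorm(200, sd = 0.2))

# Salary as the response (the direction described in the text).
fit_salary <- lm(salary ~ years_exp, data = toy)
# Years as the response (the direction of the call above).
fit_years <- lm(years_exp ~ salary, data = toy)

# The two slopes multiply to the squared Pearson correlation.
r_pearson <- cor(toy$years_exp, toy$salary, method = "pearson")
slope_product <- unname(coef(fit_salary)["years_exp"] * coef(fit_years)["salary"])

# Spearman works on ranks, so log-transforming salary does not change it.
rho_raw <- cor(toy$years_exp, toy$salary, method = "spearman")
rho_log <- cor(toy$years_exp, log(toy$salary), method = "spearman")

print(c(slope_product = slope_product, r_squared = r_pearson^2))
print(c(rho_raw = rho_raw, rho_log = rho_log))
```

On the real data, `lm(salary ~ years_exp, data = df)` would give the slope in salary units per year of experience.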


3. Estimate salary by years of employment

The scatterplot suggests a non-linear relationship between salary and years of experience. Salary is therefore log-transformed, and a linear regression of log salary on years of experience is fitted.

df$salary_model <- log(df$salary)
model <- lm(salary_model ~ years_exp, data = df)
summary(model)
## 
## Call:
## lm(formula = salary_model ~ years_exp, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.74993 -0.11686  0.00666  0.11146  0.77461 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.436444   0.032197  324.14   <2e-16 ***
## years_exp    0.063322   0.001795   35.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2218 on 198 degrees of freedom
## Multiple R-squared:  0.8628, Adjusted R-squared:  0.8621 
## F-statistic:  1245 on 1 and 198 DF,  p-value: < 2.2e-16
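
Because the model is fitted on the log scale, predictions have to be transformed back with `exp()` to be read as salaries. The sketch below does this by hand with the two coefficients reported in the summary; the `years` values are hypothetical and chosen only for illustration. Note that `exp()` of a predicted log salary estimates the median rather than the mean salary if the residuals are roughly normal.

```r
# Coefficients copied from the summary output above.
intercept <- 10.436444
slope <- 0.063322

# Hypothetical values of experience, chosen only for illustration.
years <- c(0, 10, 20, 30)

# Prediction on the log scale, then back-transformed to the salary scale.
log_salary_hat <- intercept + slope * years
salary_hat <- exp(log_salary_hat)

print(data.frame(years, log_salary_hat, salary_hat = round(salary_hat)))
```

With the fitted object itself, `exp(predict(model, newdata = data.frame(years_exp = years)))` should give the same values, assuming the model was fitted with a `data` argument so that `newdata` is matched by column name.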


4. Interpretation

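The model shows a positive linear relationship between years of experience and log salary.

R² = 0.863, so the model explains about 86.3% of the variance in log salary; the slope is statistically significant (p < 2e-16). Because salary is log-transformed, each additional year of experience is associated with an increase of 0.063 in log salary, which corresponds to a salary that is about 6.5% higher (exp(0.0633) ≈ 1.065).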
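
The slope of a log-linear model can be converted into a percentage change per year. A minimal sketch, using the estimate, standard error and residual degrees of freedom reported in the summary output (the variable names are chosen here for illustration):

```r
# Values copied from the summary output of the log-salary model.
slope <- 0.063322   # coefficient for years of experience
se <- 0.001795      # its standard error
df_resid <- 198     # residual degrees of freedom

# Percentage change in salary per additional year of experience.
pct_per_year <- (exp(slope) - 1) * 100

# 95% confidence interval, computed on the log scale and then transformed.
t_crit <- qt(0.975, df = df_resid)
ci_pct <- (exp(slope + c(-1, 1) * t_crit * se) - 1) * 100

print(round(pct_per_year, 2))
print(round(ci_pct, 2))
```

The interval is built on the log scale first because that is the scale on which the estimate and its standard error were obtained.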
