R Markdown

1. Sample description

The following section describes the sample. For instance, the sample contains 200 employees (100 female, 100 male); the mean salary is 122303.4 and the mean length of employment is 15.73 years.

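library(openxlsx)    # read.xlsx(); assumed, since the call below matches openxlsx's signature
library(dplyr)       # %>% and filter()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(ggplot2)     # ggplot()

df <- read.xlsx("C:/Users/User/Downloads/SalaryData.xlsx")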

df$salary <- as.numeric(df$salary)
df$years_empl <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)
num_row <- nrow(df)
cat("num_row=", num_row)
## num_row= 200
mean_salary <- round(mean(df$salary, na.rm = TRUE), 2)
cat("mean_salary=", mean_salary)
## mean_salary= 122303.4
mean_years <- round(mean(df$years_empl, na.rm = TRUE), 2)
cat("mean_years =", mean_years)
## mean_years = 15.73
table(df$gender) %>%
  kable() %>%
  kable_styling()
Var1 Freq
Female 100
Male 100
sd_salary <- sd(df$salary)
cat("sd_salary =", sd_salary)
## sd_salary = 79030.12
sd_years <- sd(df$years_empl)
cat("sd_years =", sd_years)
## sd_years = 9.035618
summary(df$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30203   54208   97496  122304  179447  331348

2. Association between years and salary as scatterplot.

The plot describes the relationship between years of employment (independent variable) and salary (dependent variable, plotted on a log scale). The association appears to be positive: as years of employment increase, salary also increases.

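# Map columns by name inside aes(); the y axis shows log(salary), so label it accordingly
ggplot(df, aes(x = years_empl, y = log(salary))) +
  geom_point() + geom_line() + theme_light() +
  labs(title = "Association between years of employment and salary", x = "Years of employment", y = "log(salary)")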

3. Estimate salary by years of employment

df$log_salary <- log(df$salary)

model <- lm(log_salary ~ years_empl, data = df)
summary(model)
## 
## Call:
## lm(formula = log_salary ~ years_empl, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.382774   0.027501  377.54   <2e-16 ***
## years_empl   0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
coef(model)
## (Intercept)  years_empl 
## 10.38277446  0.07099772
exp(coef(model)[2]) - 1
## years_empl 
## 0.07357878

4. Interpretation

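The quantity exp(coef(model)[2]) - 1 tells you how much salary increases, as a proportion, for each additional year of employment. The slope on years_empl is about 0.071 in log-salary units, and exp(0.071) - 1 is about 0.0736, so each additional year of employment is associated with a salary increase of about 7.4%.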
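
As a worked check of this interpretation, the short sketch below converts the slope of the log-linear model into a percentage change. It hard-codes the years_empl estimate printed by coef(model) above; the names b1, growth_factor, and pct_per_year are illustrative and not part of the original analysis.

```r
# Slope on years_empl as printed by coef(model) above (log-salary units per year)
b1 <- 0.07099772

# In log(salary) = b0 + b1 * years, one extra year multiplies salary by exp(b1)
growth_factor <- exp(b1)

# Percentage change in salary per additional year of employment
pct_per_year <- (growth_factor - 1) * 100
round(pct_per_year, 2)
```

The result rounds to 7.36, which matches the 0.07357878 printed above once it is multiplied by 100.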

5. (Voluntary) Gender effects

df$log_salary <- log(df$salary)

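# Use bare column names in the formula so that lm() takes them from the subset passed as `data`.
# With df$log_salary ~ df$years_empl, `data` is ignored and the pooled fit from section 3
# is returned, as in the output below.
male_model <- lm(log_salary ~ years_empl, data = filter(df, gender == "Male"))
female_model <- lm(log_salary ~ years_empl, data = filter(df, gender == "Female"))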

summary(male_model)
## 
## Call:
## lm(formula = df$log_salary ~ df$years_empl, data = filter(df, 
##     df$gender == "Male"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   10.382774   0.027501  377.54   <2e-16 ***
## df$years_empl  0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16
summary(female_model)
## 
## Call:
## lm(formula = df$log_salary ~ df$years_empl, data = filter(df, 
##     df$gender == "Female"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77041 -0.12197 -0.00111  0.15234  0.41044 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   10.382774   0.027501  377.54   <2e-16 ***
## df$years_empl  0.070998   0.001517   46.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared:  0.9171, Adjusted R-squared:  0.9167 
## F-statistic:  2191 on 1 and 198 DF,  p-value: < 2.2e-16

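6. Interpretation

Both summaries above are identical to each other and to the pooled model in section 3: they show the same coefficients and 198 residual degrees of freedom, which corresponds to all 200 observations. Their Call lines show the formula df$log_salary ~ df$years_empl, which takes the columns from the full data frame and ignores the gender subset passed as data. With 100 observations per gender, separate fits would report 98 residual degrees of freedom. This output therefore does not show whether the percentage change in salary per additional year of employment differs by gender.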
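
One way to check whether the slope differs by gender is to fit each subset with bare column names in the formula, or to fit a single model with an interaction term. The following is a minimal sketch under stated assumptions: the data frame sim is simulated for illustration only (it is not the SalaryData.xlsx sample), the intercept, slope, and noise level are made up, and base R subset() is used in place of dplyr's filter() to keep the block self-contained.

```r
# Hypothetical data for illustration only; not the SalaryData.xlsx sample
set.seed(1)
n <- 100
sim <- data.frame(
  gender     = factor(rep(c("Female", "Male"), each = n)),
  years_empl = runif(2 * n, min = 0, max = 30)
)
# Assumed data-generating process: the same slope of 0.07 for both genders
sim$log_salary <- 10.4 + 0.07 * sim$years_empl + rnorm(2 * n, sd = 0.2)

# Separate fits: bare column names, so lm() reads them from the subset in `data`
male_fit   <- lm(log_salary ~ years_empl, data = subset(sim, gender == "Male"))
female_fit <- lm(log_salary ~ years_empl, data = subset(sim, gender == "Female"))

# Each subset has 100 rows and 2 coefficients, so 98 residual degrees of freedom
df.residual(male_fit)
df.residual(female_fit)

# Percentage change in salary per additional year, by gender
(exp(coef(male_fit)["years_empl"]) - 1) * 100
(exp(coef(female_fit)["years_empl"]) - 1) * 100

# Single model with an interaction: the years_empl:genderMale row estimates
# how much the slope for males differs from the slope for females
int_fit <- lm(log_salary ~ years_empl * gender, data = sim)
summary(int_fit)$coefficients
```

A quick sanity check for the subset fits is the residual degrees of freedom: 98 per gender indicates that each model used only its own 100 rows, whereas 198 indicates that the full data set was used.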
