The following section gives a sample description of the data, for instance the mean number of years people have been employed and the mean salary.
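# Packages used below; read.xlsx() is assumed to come from openxlsx
library(openxlsx)
library(dplyr)
library(knitr)
library(kableExtra)
library(ggplot2)
df <- read.xlsx("C:/Users/User/Downloads/SalaryData.xlsx")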
df$salary <- as.numeric(df$salary)
df$years_empl <- as.numeric(df$years_empl)
df$gender <- as.factor(df$gender)
num_row <- nrow(df)
cat("num_row=", num_row)
## num_row= 200
mean_salary <- round(mean(df$salary, na.rm = TRUE), 2)
cat("mean_salary=", mean_salary)
## mean_salary= 122303.4
mean_years <- round(mean(df$years_empl, na.rm = TRUE), 2)
cat("mean_years =", mean_years)
## mean_years = 15.73
table(df$gender) %>%
  kable() %>%
  kable_styling()
| Var1 | Freq |
|---|---|
| Female | 100 |
| Male | 100 |
sd_salary <- sd(df$salary)
cat("sd_salary =", sd_salary)
## sd_salary = 79030.12
sd_years <- sd(df$years_empl)
cat("sd_years =", sd_years)
## sd_years = 9.035618
summary(df$salary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30203 54208 97496 122304 179447 331348
The plot describes the relationship between years of employment (independent variable) and salary (dependent variable, shown on a log scale). The association appears to be positive: as years of employment increase, salary also increases.
# Use bare column names inside aes(); the y axis shows log(salary)
ggplot(df, aes(years_empl, log(salary))) +
  geom_point() +
  geom_line() +
  theme_light() +
  labs(title = "Association between years of employment and salary",
       x = "years of employment", y = "log(salary)")
#### 3. Estimate salary by years of employment
df$log_salary <- log(df$salary)
model <- lm(log_salary ~ years_empl, data = df)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years_empl, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
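As a rough illustration of how the fitted line turns into a salary estimate, the sketch below plugs the rounded coefficients from the summary above into the log-linear equation and back-transforms with `exp()`. The helper name `predict_salary` and the example value of 10 years are hypothetical, chosen only for illustration; with the fitted object available, `exp(predict(model, newdata = ...))` should give the same kind of result.

```r
# Coefficients copied (rounded) from the summary(model) output above
intercept <- 10.382774
slope     <- 0.070998

# Hypothetical helper: estimated salary for a given number of years employed.
# The model is linear in log(salary), so the prediction is back-transformed.
predict_salary <- function(years) {
  exp(intercept + slope * years)
}

salary_0  <- predict_salary(0)   # estimated salary at the start of employment
salary_10 <- predict_salary(10)  # estimated salary after 10 years

cat("estimated salary at 0 years: ", round(salary_0), "\n")
cat("estimated salary at 10 years:", round(salary_10), "\n")
```

Because the model was fitted on the log scale, a back-transformed value like this is better read as a typical (median-like) salary than as an arithmetic mean.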
coef(model)
## (Intercept) years_empl
## 10.38277446 0.07099772
exp(coef(model)[2]) - 1
## years_empl
## 0.07357878
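This tells you how much salary increases, in percentage terms, for each additional year of employment. Because the model is fitted on log salary, the proportional change is exp(0.071) - 1 ≈ 0.0736, so each additional year of employment is associated with a salary increase of about 7.4%.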
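The conversion from a log-scale slope to a percentage change can be worked through directly. This is a minimal sketch that uses only the slope printed by `coef(model)` above; the variable names are illustrative, and the 10-year figure is included only to show that the effect compounds rather than adds.

```r
# Slope on years_empl, copied from the coef(model) output above
b_years <- 0.07099772

# Exact percentage change in salary for one additional year of employment
pct_exact <- (exp(b_years) - 1) * 100

# Common approximation for small coefficients: 100 * b
pct_approx <- b_years * 100

# Implied change over 10 years compounds, so it is more than 10 * pct_exact
pct_10yr <- (exp(10 * b_years) - 1) * 100

cat("exact % per year: ", round(pct_exact, 2), "\n")
cat("approx % per year:", round(pct_approx, 2), "\n")
cat("% over 10 years:  ", round(pct_10yr, 1), "\n")
```

The exact and approximate figures are close here because the slope is small; the gap grows as the coefficient gets larger.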
df$log_salary <- log(df$salary)
male_model <- lm(df$log_salary ~df$years_empl, data = filter(df, df$gender == "Male"))
female_model <- lm(df$log_salary ~df$years_empl, data = filter(df,df$gender == "Female"))
summary(male_model)
##
## Call:
## lm(formula = df$log_salary ~ df$years_empl, data = filter(df,
## df$gender == "Male"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## df$years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
summary(female_model)
##
## Call:
## lm(formula = df$log_salary ~ df$years_empl, data = filter(df,
## df$gender == "Female"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## df$years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
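The two summaries are identical to each other and to the pooled model above: the same coefficients and 198 residual degrees of freedom, although each gender has only 100 observations. This happens because the formulas refer to `df$log_salary` and `df$years_empl` directly, so `lm()` takes the variables from the full data frame and ignores the filtered `data` argument. These outputs therefore do not show whether the percentage change in salary per additional year of employment differs by gender. To compare the groups, the formula should use bare column names (`log_salary ~ years_empl`) so that each model is fitted on its own subset.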
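A minimal sketch of a by-gender comparison follows. Since the original spreadsheet is not reproduced here, it runs on a simulated data frame `sim` whose column names mirror `df`. The simulated values, the seed, and the object names are all assumptions made for illustration and say nothing about the real data. The sketch uses base R `subset()` rather than `filter()` only to stay free of package dependencies.

```r
set.seed(1)

# Hypothetical stand-in for df (simulated, NOT the real SalaryData.xlsx)
n <- 200
sim <- data.frame(
  years_empl = runif(n, 0, 30),
  gender     = factor(rep(c("Female", "Male"), each = n / 2))
)
sim$log_salary <- 10.4 + 0.07 * sim$years_empl + rnorm(n, sd = 0.2)

# Bare column names make lm() take the variables from the subset in `data`
male_fit   <- lm(log_salary ~ years_empl, data = subset(sim, gender == "Male"))
female_fit <- lm(log_salary ~ years_empl, data = subset(sim, gender == "Female"))

# Sanity check: 100 rows per group leaves 98 residual degrees of freedom
df.residual(male_fit)
df.residual(female_fit)

# A single model with an interaction term tests whether the slopes differ
inter_fit <- lm(log_salary ~ years_empl * gender, data = sim)
summary(inter_fit)$coefficients["years_empl:genderMale", ]
```

In the interaction model, the `years_empl:genderMale` row estimates how much the slope for men differs from the slope for women, and its p-value gives a direct test of that difference, which two separate summaries do not provide.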