R Markdown

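library(readxl)  # provides read_excel()
employee <- read_excel("C:/Users/Kendall/Downloads/employeenumeric.xls")  # path is specific to the author's machine
View(employee)  # opens a spreadsheet-style viewer (interactive sessions only)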
# 1. What are the dimensions of the dataset?
dim(employee)
## [1] 474   5
# 2. What variables are included in the data?
  1. Answer: There are 5 columns (variables) and 474 rows (observations)
  2. Answer: The variables are gender, current salary, years of education, minority classification, and date of birth
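Questions 1 and 2 can also be answered directly in R. The sketch below is minimal and uses a small hypothetical data frame (`toy_employee`) instead of the Excel file. Its column names match those used in the code here, except `Date of Birth`, whose exact name is an assumption.

```r
# Hypothetical stand-in for the Excel data: 6 rows, same 5 variables.
# The "Date of Birth" column name is an assumption; the others match the code above.
toy_employee <- data.frame(
  Gender = c("m", "f", "m", "f", "m", "f"),
  `Current Salary` = c(34000, 27000, 30000, 26000, 35000, 28000),
  `Years of Education` = c(15, 15, 12, 16, 15, 15),
  `Minority Classification` = c(0, 0, 1, 1, 0, 1),
  `Date of Birth` = as.Date(c("1960-02-03", "1958-05-23", "1962-07-26",
                              "1955-04-15", "1961-08-12", "1963-01-30")),
  check.names = FALSE  # keep the spaces in the column names
)

dim(toy_employee)    # rows (observations), then columns (variables)
names(toy_employee)  # the variable names
str(toy_employee)    # each variable's type and its first few values
```

In the sketch, `check.names = FALSE` keeps the spaces in the column names, matching how `read_excel()` names columns by default.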
# Filter the data so only individuals with exactly 15 years of education are included
emp15 <- employee[employee$`Years of Education`== 15, ]
# 3. What are the dimensions of the new dataset?
dim(emp15)
## [1] 116   5
  3. Answer: The new dataset has 5 columns (variables) and 116 rows (observations)
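Note that `== 15` keeps only people with exactly 15 years of education, while `>= 15` would keep everyone with 15 or more. A minimal sketch of the difference on a hypothetical education column (`toy`):

```r
# Hypothetical toy data; only the education column matters here.
toy <- data.frame(`Years of Education` = c(12, 15, 15, 16, 19, 15),
                  check.names = FALSE)

exactly15 <- toy[toy$`Years of Education` == 15, , drop = FALSE]  # same filter as above
atleast15 <- toy[toy$`Years of Education` >= 15, , drop = FALSE]  # "15 or more" instead

nrow(exactly15)  # 3
nrow(atleast15)  # 5
```

One design note: if the education column contained missing values, `[` indexing would return rows of `NA` for them, whereas `subset()` drops those rows.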
# Conduct a t-test for mean salary and gender
# 4. What is the null hypothesis?
attach(emp15)  # makes emp15's columns available by name in the commands that follow
t.test(`Current Salary`[Gender=="m"], `Current Salary`[Gender=="f"])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[Gender == "m"] and `Current Salary`[Gender == "f"]
## t = 5.0443, df = 102.38, p-value = 1.977e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3930.779 9024.884
## sample estimates:
## mean of x mean of y 
##  33527.83  27050.00
# 5. Why include only those with 15 years of education?
# 6. What is the t-statistic for the difference in salaries between men and women?
# 7. What is the p-value?
# 8. What are the limits of the 95% CI?
# 9. Does the 95% CI include 0?
# 10. What are the mean salaries for men and women?
# 11. Interpret results (conclusion)
  4. Answer: The null hypothesis is that the true mean salary is the same for men and women (the true difference in means is 0), i.e. gender has no relationship with mean salary.
  5. Answer: Including only those with 15 years of education controls for education as a potential confounder: because everyone in the subset has the same level of education, education cannot explain any salary difference between men and women.
  6. Answer: t = 5.0443
  7. Answer: p-value = 1.977e-06
  8. Answer: The limits are $3930.78 to $9024.88
  9. Answer: No, the value 0 is not part of this confidence interval
  10. Answer: The mean salary is $33527.83 for men and $27050.00 for women
  11. Answer: I reject the null hypothesis (no difference in mean salary) in favor of the alternative hypothesis (mean salary differs by gender), because the p-value (1.977e-06) is far below 0.05 and the 95% CI excludes 0. Among people with 15 years of education, men earn about $6478 more than women on average.
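The Welch statistic reported by `t.test()` can be reproduced by hand, which makes it clearer where the t-statistic, degrees of freedom, p-value, and CI come from. A minimal sketch on made-up salary vectors (`men_sal` and `women_sal` are hypothetical, not the employee data):

```r
# Hypothetical salaries for two groups (not the real data).
men_sal   <- c(33000, 36000, 31500, 35250, 32700, 34800)
women_sal <- c(27000, 26100, 28500, 25800, 27900)

# Welch's t statistic: difference in means over the unpooled standard error
v1 <- var(men_sal) / length(men_sal)
v2 <- var(women_sal) / length(women_sal)
se <- sqrt(v1 + v2)
diff_means <- mean(men_sal) - mean(women_sal)
t_stat <- diff_means / se

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_w <- (v1 + v2)^2 / (v1^2 / (length(men_sal) - 1) + v2^2 / (length(women_sal) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_val <- 2 * pt(-abs(t_stat), df_w)
ci <- diff_means + c(-1, 1) * qt(0.975, df_w) * se

# t.test() defaults to the Welch test (var.equal = FALSE), so it should agree
res <- t.test(men_sal, women_sal)
all.equal(unname(res$statistic), t_stat)  # TRUE
all.equal(unname(res$parameter), df_w)    # TRUE
```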
# Conduct another t-test to determine whether salary differs by minority status
t.test(`Current Salary`[`Minority Classification` == 0], `Current Salary`[`Minority Classification` == 1])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[`Minority Classification` == 0] and `Current Salary`[`Minority Classification` == 1]
## t = 2.4432, df = 59.458, p-value = 0.01755
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   664.4916 6673.2519
## sample estimates:
## mean of x mean of y 
##  32507.33  28838.46
# 12. What is the t-statistic for difference in salaries by minority status?
# 13. What is the p-value?
# 14. What are the limits of the 95% CI?
# 15. Does the 95% CI contain the value 0?
# 16. What are the mean salaries for minorities and non-minorities?
# 17. Conclusion
  12. Answer: t = 2.4432
  13. Answer: p-value = 0.01755
  14. Answer: The limits of the 95% CI are $664.49 to $6673.25
  15. Answer: No, the 95% CI does not contain the value 0
  16. Answer: The mean salary is $28838.46 for minorities and $32507.33 for non-minorities
  17. Answer: I reject the null hypothesis that there is no relationship between minority status and salary, because the p-value (0.01755) is below 0.05 and the 95% CI excludes 0. Non-minorities earn about $3669 more than minorities on average.
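Answers 12 to 17 can be read straight off the object that `t.test()` returns, which avoids transcription mistakes. A sketch on hypothetical `salary` and `minority` vectors, assuming the coding 0 = non-minority and 1 = minority used in the question 20 table:

```r
# Hypothetical salaries by minority status (0 = non-minority, 1 = minority).
salary   <- c(32000, 35500, 30500, 33800, 31200, 28000, 29900, 28600, 27700)
minority <- c(0, 0, 0, 0, 0, 1, 1, 1, 1)

res <- t.test(salary[minority == 0], salary[minority == 1])

t_value     <- unname(res$statistic)                  # question 12
p_value     <- res$p.value                            # question 13
ci_limits   <- as.numeric(res$conf.int)               # question 14 (lower, upper)
has_zero    <- ci_limits[1] <= 0 && 0 <= ci_limits[2] # question 15
group_means <- unname(res$estimate)                   # question 16 (group 0, then group 1)
reject_h0   <- p_value < 0.05                         # question 17, at alpha = 0.05

round(c(t = t_value, p = p_value), 4)
```

For a two-sided test, `reject_h0` and `!has_zero` always agree: the 95% CI excludes 0 exactly when p < 0.05.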
# 18. Compare, using a t-test for the difference in means, minority vs. non-minority men. Is there a significant difference at 95%?
men <- subset(emp15, Gender == "m")
t.test(`Current Salary`~ `Minority Classification`, data = men)
## 
##  Welch Two Sample t-test
## 
## data:  Current Salary by Minority Classification
## t = 2.4005, df = 40.643, p-value = 0.02104
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##   702.7293 8164.9289
## sample estimates:
## mean in group 0 mean in group 1 
##        34489.38        30055.56
  18. Answer: Yes. The difference is significant at the 95% level because the p-value (0.02104) is below 0.05; the 95% CI ($702.73 to $8164.93) also excludes 0.
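The formula call used for questions 18 and 19 runs the same Welch test as the two-vector call used earlier. A sketch on a hypothetical `toy_men` data frame, checking that the formula version subtracts group 1 from group 0 (the grouping levels are taken in sorted order):

```r
# Hypothetical data frame shaped like `men` above (spaces in column names kept).
toy_men <- data.frame(
  `Current Salary` = c(34000, 36500, 33200, 35100, 30200, 29800, 31000),
  `Minority Classification` = c(0, 0, 0, 0, 1, 1, 1),
  check.names = FALSE
)

# Formula interface: difference is mean of group 0 minus mean of group 1
by_formula <- t.test(`Current Salary` ~ `Minority Classification`, data = toy_men)

# Two-vector interface on the same two groups
by_vectors <- with(toy_men, t.test(`Current Salary`[`Minority Classification` == 0],
                                   `Current Salary`[`Minority Classification` == 1]))

all.equal(unname(by_formula$statistic), unname(by_vectors$statistic))  # TRUE
```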
# 19. Compare, using a t-test for the difference in means, minority vs. non-minority women. Is there a significant difference at 95%?
women <- subset(emp15, Gender == "f")
t.test(`Current Salary`~ `Minority Classification`, data = women)
## 
##  Welch Two Sample t-test
## 
## data:  Current Salary by Minority Classification
## t = 0.62398, df = 11.646, p-value = 0.5447
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3139.546  5647.546
## sample estimates:
## mean in group 0 mean in group 1 
##           27354           26100
  19. Answer: No. The difference is not significant at the 95% level because the p-value (0.5447) is above 0.05; the 95% CI (-$3139.55 to $5647.55) contains 0.
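Questions 18 and 19 run the same test once per gender, and that repetition can be written as a loop over `split()`. A minimal sketch on a hypothetical `toy15` data frame with made-up values:

```r
# Hypothetical data shaped like emp15 (values made up).
toy15 <- data.frame(
  Gender = rep(c("m", "f"), each = 6),
  `Current Salary` = c(34000, 36500, 33200, 30200, 29800, 31000,
                       27500, 28100, 26900, 26400, 25800, 26300),
  `Minority Classification` = rep(c(0, 0, 0, 1, 1, 1), times = 2),
  check.names = FALSE
)

# Run the minority comparison once per gender and keep t and p
by_gender <- lapply(split(toy15, toy15$Gender), function(d) {
  res <- t.test(`Current Salary` ~ `Minority Classification`, data = d)
  c(t = unname(res$statistic), p = res$p.value)
})
results <- do.call(rbind, by_gender)  # one row per gender ("f", "m")
results
results[, "p"] < 0.05  # significant at the 95% level?
```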
# 20. Fill in the following table
| Mean Salaries | Male | Female |
|---|---|---|
| Non-Minority | 34489.38 | 27354 |
| Minority | 30055.56 | 26100 |
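The table for question 20 can be filled in with `tapply()`, which applies `mean` to every gender-by-minority cell. A sketch on hypothetical data (`toy15`); the relabeling step assumes the codes sort as 0, 1 and f, m:

```r
# Hypothetical data shaped like emp15 (values made up for illustration).
toy15 <- data.frame(
  Gender = c("m", "m", "m", "m", "f", "f", "f", "f"),
  `Current Salary` = c(34000, 35000, 30000, 31000, 27000, 28000, 26000, 26500),
  `Minority Classification` = c(0, 0, 1, 1, 0, 0, 1, 1),
  check.names = FALSE
)

# Rows: minority status (0, 1); columns: gender (f, m)
mean_table <- with(toy15, tapply(`Current Salary`,
                                 list(`Minority Classification`, Gender), mean))

# Readable labels (relies on the sorted order 0, 1 and f, m)
dimnames(mean_table) <- list(c("Non-Minority", "Minority"), c("Female", "Male"))
mean_table  # values: Non-Minority 27500 (F), 34500 (M); Minority 26250 (F), 30500 (M)
```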

(Questions 21 and 22 do not appear in this write-up.)

# 23. Include the plot and describe what it tells you about how differences in salary relate to the interaction between gender and minority status.
interaction.plot(Gender, `Minority Classification`, `Current Salary`)  # mean salary by gender, one line per minority group

  23. Answer: For people with 15 years of education, the plot suggests only a modest interaction between gender and minority status. Both lines rise from women to men, so men earn more than women in both minority groups, and the non-minority line sits above the minority line for both genders. The lines are not perfectly parallel, though: the non-minority line is somewhat steeper, because the minority gap is larger among men ($34489.38 vs. $30055.56) than among women ($27354 vs. $26100).
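The interaction plot can be summarized with one number: how much the minority gap differs between men and women. This is a minimal sketch that uses the four means from the question 20 table. It is descriptive arithmetic only, not a significance test of the interaction.

```r
# Mean salaries copied from the question 20 table.
means <- matrix(c(34489.38, 27354,
                  30055.56, 26100),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Non-Minority", "Minority"), c("Male", "Female")))

# Minority gap within each gender (non-minority minus minority)
gap_male   <- means["Non-Minority", "Male"]   - means["Minority", "Male"]    # about 4433.82
gap_female <- means["Non-Minority", "Female"] - means["Minority", "Female"]  # 1254

# Difference in differences: 0 would mean perfectly parallel lines in the plot
interaction_contrast <- gap_male - gap_female  # about 3179.82
```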