library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   3.5.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
file.exists("~/Library/CloudStorage/OneDrive-UniversityofNorthCarolinaatChapelHill/GEOG 391/employeenumeric.xls")
## [1] TRUE
employeenumeric <- read_excel("~/Library/CloudStorage/OneDrive-UniversityofNorthCarolinaatChapelHill/GEOG 391/employeenumeric.xls")
glimpse(employeenumeric)
## Rows: 474
## Columns: 5
## $ Gender                    <chr> "m", "m", "f", "f", "m", "m", "m", "f", "f",…
## $ `Current Salary`          <dbl> 57000, 40200, 21450, 21900, 45000, 32100, 36…
## $ `Years of Education`      <dbl> 15, 16, 12, 8, 15, 15, 15, 12, 15, 12, 16, 8…
## $ `Minority Classification` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,…
## $ `Date of Birth`           <dbl> 19027, 21328, 10800, 17272, 20129, 21419, 20…

Question 1

The data set contains five variables (columns) and 474 observations (rows).

Question 2

head(employeenumeric)
## # A tibble: 6 × 5
##   Gender `Current Salary` `Years of Education` `Minority Classification`
##   <chr>             <dbl>                <dbl>                     <dbl>
## 1 m                 57000                   15                         0
## 2 m                 40200                   16                         0
## 3 f                 21450                   12                         0
## 4 f                 21900                    8                         0
## 5 m                 45000                   15                         0
## 6 m                 32100                   15                         0
## # ℹ 1 more variable: `Date of Birth` <dbl>

The variables included in the dataset are gender, current salary, years of education, minority classification, and date of birth.

Question 3

employee15 <- employeenumeric %>% 
  filter(`Years of Education` == 15)
dim(employee15)
## [1] 116   5

The dimensions for the new subset are 116 rows by 5 columns.

Question 4

The null hypothesis is that mean salary does not differ between men and women with 15 years of education, that is, the true difference in means is zero.

Question 5

Restricting the sample to employees with the same number of years of education (15) controls for that variable, so any salary difference between groups cannot be attributed to differences in education.

Question 6

attach(employee15)
t.test(`Current Salary`[Gender == "m"], `Current Salary`[Gender == "f"])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[Gender == "m"] and `Current Salary`[Gender == "f"]
## t = 5.0443, df = 102.38, p-value = 1.977e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3930.779 9024.884
## sample estimates:
## mean of x mean of y 
##  33527.83  27050.00
detach(employee15)

The test statistic for the difference in salary between men and women with 15 years of education is 5.0443.
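As a side note, the same Welch test can be run without attach() and detach() by using the formula interface of t.test(). The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `gender` and `salary`), not the employee file. One thing to watch: the formula interface orders groups alphabetically, so the difference is computed as f minus m and the sign of t flips relative to the vector call used in this report.

```r
# Hypothetical toy data, for illustration only (not the employee file)
toy <- data.frame(
  gender = c("m", "m", "m", "m", "f", "f", "f", "f"),
  salary = c(33000, 35000, 31000, 36000, 27000, 26000, 29000, 25000)
)

# Formula interface: response ~ grouping variable (Welch test is the default)
res_formula <- t.test(salary ~ gender, data = toy)

# Vector interface, as used in this report (m first, then f)
res_vector <- t.test(toy$salary[toy$gender == "m"],
                     toy$salary[toy$gender == "f"])

# Same test, opposite sign: the formula version compares f to m
res_formula$statistic
res_vector$statistic
```

The p-value and the width of the confidence interval are identical in both calls; only the direction of the difference changes.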

Question 7

The p-value for the above test statistic is 1.977e-06, which is practically zero. This indicates a statistically significant difference in mean salary between men and women with 15 years of education, so we can reject the null hypothesis of no difference.

Question 8

The lower limit for the 95% CI is 3930.779 and the upper limit for the 95% CI is 9024.884.

Question 9

The CI does not contain zero, which means the difference in mean salary between men and women is statistically significant at the 5% level.
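To see why the interval and the p-value agree, the 95% CI can be rebuilt from the numbers printed by t.test() in Question 6. This is only a rough check under the usual Welch formula (estimate plus or minus the critical t value times the standard error); the variable names are made up for illustration, and the inputs are the rounded values from the output, so the result matches the reported limits only up to rounding.

```r
# Values copied from the t.test() output in Question 6
mean_m   <- 33527.83
mean_f   <- 27050.00
t_stat   <- 5.0443
df_welch <- 102.38

diff_means <- mean_m - mean_f       # estimated difference in means
se_diff    <- diff_means / t_stat   # standard error implied by t = diff / SE
t_crit     <- qt(0.975, df_welch)   # two-sided 95% critical value

ci <- c(lower = diff_means - t_crit * se_diff,
        upper = diff_means + t_crit * se_diff)
ci  # close to the reported 3930.779 and 9024.884
```

Because the estimated difference is more than `t_crit` standard errors away from zero, the interval excludes zero and the two-sided p-value falls below 0.05.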

Question 10

em15f <- employee15 %>% 
  filter(Gender == "f")

em15m <- employee15 %>% 
  filter(Gender == "m")

mean(em15m$`Current Salary`)
## [1] 33527.83
mean(em15f$`Current Salary`)
## [1] 27050

The mean salary for men with 15 years of education is $33,527.83, while the mean salary for women is $27,050.00.

Question 11

Based on the results from the previous questions, we can reject the null hypothesis that there is no difference in mean salary between men and women with 15 years of education. The data support the alternative hypothesis that mean salary differs by gender at this education level.

Question 12

attach(employee15)
t.test(`Current Salary`[`Minority Classification` == 0], `Current Salary`[`Minority Classification` == 1])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[`Minority Classification` == 0] and `Current Salary`[`Minority Classification` == 1]
## t = 2.4432, df = 59.458, p-value = 0.01755
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   664.4916 6673.2519
## sample estimates:
## mean of x mean of y 
##  32507.33  28838.46
detach(employee15)

The test statistic for the difference in salaries between minorities and non-minorities at 15 years of education is 2.4432.

Question 13

The p-value for the above t-test is 0.01755.

Question 14

The lower limit for the 95% CI is 664.4916 and the upper limit for the 95% CI is 6673.2519.

Question 15

The CI does not contain zero, which means the difference in mean salary between non-minority and minority employees is statistically significant at the 5% level.

Question 16

em150 <- employee15 %>% 
  filter(`Minority Classification` == 0)

em151 <- employee15 %>% 
  filter(`Minority Classification` == 1)

mean(em150$`Current Salary`)
## [1] 32507.33
mean(em151$`Current Salary`)
## [1] 28838.46

The mean current salary for non-minorities with 15 years of education is $32,507.33 and the mean current salary for minorities with 15 years of education is $28,838.46.

Question 17

Since the p-value (0.01755) for the difference in salary between minorities and non-minorities at 15 years of education is below 0.05, we can reject the null hypothesis of no difference in mean salary. The data support the alternative hypothesis that the two group means differ.

Question 18

attach(em15m)
t.test(`Current Salary`[`Minority Classification` == 0], `Current Salary`[`Minority Classification` == 1])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[`Minority Classification` == 0] and `Current Salary`[`Minority Classification` == 1]
## t = 2.4005, df = 40.643, p-value = 0.02104
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   702.7293 8164.9289
## sample estimates:
## mean of x mean of y 
##  34489.38  30055.56
detach(em15m)

The test statistic for the difference in salary between non-minority and minority men with 15 years of education is 2.4005, with a p-value of 0.02104. This indicates a statistically significant difference in mean salary at the 5% significance level between non-minority and minority men with 15 years of education.

Question 19

attach(em15f)
t.test(`Current Salary`[`Minority Classification` == 0], `Current Salary`[`Minority Classification` == 1])
## 
##  Welch Two Sample t-test
## 
## data:  `Current Salary`[`Minority Classification` == 0] and `Current Salary`[`Minority Classification` == 1]
## t = 0.62398, df = 11.646, p-value = 0.5447
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3139.546  5647.546
## sample estimates:
## mean of x mean of y 
##     27354     26100
detach(em15f)

The test statistic for the difference in salary between non-minority and minority women with 15 years of education is 0.62398, with a p-value of 0.5447. This indicates that there is no statistically significant difference in mean salary at the 5% significance level between non-minority and minority women with 15 years of education.

Question 20

em150m <- em15m %>% 
  filter(`Minority Classification` == 0)

em151m <- em15m %>% 
  filter(`Minority Classification` == 1)

em150f <- em15f %>% 
  filter(`Minority Classification` == 0)

em151f <- em15f %>% 
  filter(`Minority Classification` == 1)

mean(em150m$`Current Salary`)
## [1] 34489.38
mean(em151m$`Current Salary`)
## [1] 30055.56
mean(em150f$`Current Salary`)
## [1] 27354
mean(em151f$`Current Salary`)
## [1] 26100

Mean Salaries    Male        Female
Non-Minority     34489.38    27354.00
Minority         30055.56    26100.00
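
The four subsets and four mean() calls above could also be collapsed into one step. A minimal sketch with base R's tapply(), again on a made-up data frame (`toy`, with hypothetical columns `gender`, `minority`, and `salary`) rather than the employee file:

```r
# Hypothetical toy data, for illustration only (not the employee file)
toy <- data.frame(
  gender   = c("m", "m", "m", "m", "f", "f", "f", "f"),
  minority = c(0, 0, 1, 1, 0, 0, 1, 1),
  salary   = c(34000, 36000, 30000, 31000, 27000, 28000, 26000, 25000)
)

# Mean salary for every minority-by-gender cell, returned as a 2 x 2 table
cell_means <- tapply(toy$salary,
                     list(minority = toy$minority, gender = toy$gender),
                     mean)
cell_means
```

Rows are the minority codes (0, 1) and columns are the gender codes (f, m), so the layout mirrors the table above with the columns in alphabetical order.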

Question 23

attach(employee15) 
interaction.plot(Gender, `Minority Classification`, `Current Salary`) 

detach(employee15) 

Among employees with 15 years of education, minority men and minority women both have lower mean salaries than their non-minority counterparts. Likewise, men earn a higher mean salary than women regardless of minority status. In this sample, both minority status and being female are associated with a lower mean salary, although the minority gap is wider among men, and among women it was not statistically significant (Question 19).
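
The interaction pattern can also be put in numbers by comparing the minority gap within each gender, using the four cell means reported in Question 20. This is simple descriptive arithmetic, not a formal test of interaction, and the object names are chosen here only for illustration.

```r
# Cell means copied from Question 20
means <- matrix(c(34489.38, 27354.00,
                  30055.56, 26100.00),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Non-Minority", "Minority"),
                                c("Male", "Female")))

# Non-minority minus minority mean salary, separately for each gender
minority_gap <- means["Non-Minority", ] - means["Minority", ]
minority_gap

# How much larger the gap is for men than for women
gap_difference <- minority_gap[["Male"]] - minority_gap[["Female"]]
gap_difference
```

Non-parallel lines in the interaction plot correspond to a nonzero `gap_difference`; whether that difference is statistically meaningful would need a model with an interaction term, which is outside what was run here.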