R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

library(readxl)
employeenumeric <- read_excel("C:/Users/Kins/Downloads/employeenumeric.xls")
employeenumeric  # print the tibble that was already read in above
## # A tibble: 474 × 5
##    Gender CurrentSalary YearsofEducation MinorityClassification DateofBirth
##    <chr>          <dbl>            <dbl>                  <dbl>       <dbl>
##  1 m              57000               15                      0       19027
##  2 m              40200               16                      0       21328
##  3 f              21450               12                      0       10800
##  4 f              21900                8                      0       17272
##  5 m              45000               15                      0       20129
##  6 m              32100               15                      0       21419
##  7 m              36000               15                      0       20571
##  8 f              21900               12                      0       24233
##  9 f              27900               15                      0       16825
## 10 f              24000               12                      0       16846
## # ℹ 464 more rows
Question 1: The dimensions of the dataset are 474 x 5 (474 rows and 5 columns).
Question 2: The 5 variables are Gender, Current Salary, Years of Education, Minority Classification, and Date of Birth
library(sf)
## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
employeefilter <- employeenumeric |>
  filter(YearsofEducation == 15) |>
  glimpse()
## Rows: 116
## Columns: 5
## $ Gender                 <chr> "m", "m", "m", "m", "f", "m", "f", "m", "f", "f…
## $ CurrentSalary          <dbl> 57000, 45000, 32100, 36000, 27900, 27750, 35100…
## $ YearsofEducation       <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
## $ MinorityClassification <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1,…
## $ DateofBirth            <dbl> 19027, 20129, 21419, 20571, 16825, 22114, 17955…
Question 3: The new dimensions are 116 x 5
attach(employeefilter)
t.test(CurrentSalary[Gender == "m"], CurrentSalary[Gender == "f"])
## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "m"] and CurrentSalary[Gender == "f"]
## t = 5.0443, df = 102.38, p-value = 1.977e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3930.779 9024.884
## sample estimates:
## mean of x mean of y 
##  33527.83  27050.00
Question 4: The null hypothesis is that there is no difference in mean salaries between men and women with 15 years of education.
Question 5: Filtering to employees with 15 years of education holds education constant, so that the only factors left to compare are gender and minority classification.
Question 6: t = 5.0443
Question 7: The p-value = 1.977e-06
Question 8: The 95% confidence interval is (3930.78, 9024.88).
Question 9: No, it does not include 0, so the difference is statistically significant.
Question 10: The mean salary is $33,527.83 for men and $27,050.00 for women.
Question 11: Since the p-value is small and the 95% confidence interval for the difference does not include 0, we can reject the null hypothesis. This tells us that, among individuals with 15 years of education, men earn more than women on average.
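To make the link between the t statistic, the degrees of freedom, and the confidence interval concrete, here is a minimal sketch that reproduces the Welch output by hand. The data frame `toy` and its salaries are invented for illustration and are not the assignment data; the names `se`, `t_manual`, `df_manual`, and `ci_manual` are likewise assumptions.

```r
# Hypothetical salaries, invented for illustration only (not the assignment data)
toy <- data.frame(
  Gender = c("m", "m", "m", "m", "f", "f", "f", "f"),
  CurrentSalary = c(34000, 31000, 36500, 33000, 27000, 26500, 28500, 25000)
)

men   <- toy$CurrentSalary[toy$Gender == "m"]
women <- toy$CurrentSalary[toy$Gender == "f"]

# Welch two-sample t-test (the default, var.equal = FALSE)
res <- t.test(men, women)

# Same statistic by hand: difference in means divided by its standard error
se <- sqrt(var(men) / length(men) + var(women) / length(women))
t_manual <- (mean(men) - mean(women)) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- se^4 / ((var(men) / length(men))^2 / (length(men) - 1) +
                     (var(women) / length(women))^2 / (length(women) - 1))

# 95% confidence interval for the difference in means
ci_manual <- (mean(men) - mean(women)) + c(-1, 1) * qt(0.975, df_manual) * se

print(res)
```

If the interval built this way excludes 0, the two-sided p-value is below 0.05, which is why Questions 7 and 9 lead to the same conclusion.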
t.test(CurrentSalary[MinorityClassification == 1], CurrentSalary[MinorityClassification == 0])
## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[MinorityClassification == 1] and CurrentSalary[MinorityClassification == 0]
## t = -2.4432, df = 59.458, p-value = 0.01755
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6673.2519  -664.4916
## sample estimates:
## mean of x mean of y 
##  28838.46  32507.33
Question 12: t = -2.4432
Question 13: p-value = 0.01755
Question 14: The 95% confidence interval is (-6673.25, -664.49).
Question 15: It does not contain 0.
Question 16: The mean salary is $28,838.46 for employees with a minority classification and $32,507.33 for employees without one.
Question 17: Minority employees with 15 years of education earn less on average than non-minority employees, and the difference is statistically significant (p = 0.01755).
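The tests in this report depend on `attach(employeefilter)`, which puts the column names on the search path. As a possible alternative, the sketch below uses the formula interface of `t.test()` with a `data` argument, so no `attach()` is needed. The data frame `toy` is invented for illustration and is not the assignment data. Note that the formula interface orders groups by level ("f" before "m", 0 before 1), so the sign of t is flipped relative to the calls used in this report.

```r
# Hypothetical data, invented for illustration only (not the assignment data)
toy <- data.frame(
  Gender = rep(c("m", "f"), each = 6),
  MinorityClassification = rep(c(0, 0, 0, 1, 1, 1), times = 2),
  CurrentSalary = c(35000, 33500, 36000, 30000, 31500, 29000,
                    27500, 26000, 28000, 26500, 25000, 27000)
)

# Welch t-test of salary by gender; estimates are reported as f, then m
by_gender <- t.test(CurrentSalary ~ Gender, data = toy)

# Welch t-test of salary by minority classification, men only
men_only <- t.test(CurrentSalary ~ MinorityClassification,
                   data = subset(toy, Gender == "m"))

print(by_gender)
print(men_only)
```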
t.test(CurrentSalary[Gender=="m" & MinorityClassification==1], CurrentSalary[Gender=="m" & MinorityClassification==0])
## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "m" & MinorityClassification == 1] and CurrentSalary[Gender == "m" & MinorityClassification == 0]
## t = -2.4005, df = 40.643, p-value = 0.02104
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8164.9289  -702.7293
## sample estimates:
## mean of x mean of y 
##  30055.56  34489.38
Question 18: Because the p-value is < 0.05 and the confidence interval does not contain 0, there is a statistically significant difference.
t.test(CurrentSalary[Gender=="f" & MinorityClassification==1], CurrentSalary[Gender=="f" & MinorityClassification==0])
## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "f" & MinorityClassification == 1] and CurrentSalary[Gender == "f" & MinorityClassification == 0]
## t = -0.62398, df = 11.646, p-value = 0.5447
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5647.546  3139.546
## sample estimates:
## mean of x mean of y 
##     26100     27354
Question 19: Since the p-value is > 0.05 and the confidence interval contains 0, there is no statistically significant difference.
Question 20:

| Mean Salaries | Male | Female |
| --- | --- | --- |
| Non-Minority | $34,489.38 | $27,354.00 |
| Minority | $30,055.56 | $26,100.00 |
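A table of group means like the one in Question 20 can also be produced in a single call rather than copied from four separate `t.test()` outputs. The sketch below uses `tapply()` on a data frame `toy` whose values are invented for illustration and are not the assignment data.

```r
# Hypothetical data, invented for illustration only (not the assignment data)
toy <- data.frame(
  Gender = rep(c("m", "f"), each = 6),
  MinorityClassification = rep(c(0, 0, 0, 1, 1, 1), times = 2),
  CurrentSalary = c(35000, 33500, 36000, 30000, 31500, 29000,
                    27500, 26000, 28000, 26500, 25000, 27000)
)

# Mean salary for every combination of minority classification and gender.
# Rows are labelled "0" and "1"; columns are labelled "f" and "m".
means <- with(toy, tapply(CurrentSalary,
                          list(Minority = MinorityClassification, Gender = Gender),
                          mean))
print(round(means, 2))
```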
interaction.plot(Gender, `MinorityClassification`, `CurrentSalary`)

Question 23: The plot shows that the difference in salary between minority and non-minority employees depends on gender. It also shows the salary difference for employees classified as a minority: although the confidence interval for women indicated no statistically significant difference, a visible difference remains in the plot.
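The dependence on gender can be checked with simple arithmetic on the four means reported in Question 20. The sketch below only re-uses those reported means; the object names `means` and `gap` are illustrative assumptions.

```r
# Mean salaries reported in Question 20 (filled column by column)
means <- matrix(c(34489.38, 30055.56, 27354, 26100),
                nrow = 2,
                dimnames = list(Minority = c("Non-Minority", "Minority"),
                                Gender = c("Male", "Female")))

# Non-minority minus minority mean salary, separately for each gender
gap <- means["Non-Minority", ] - means["Minority", ]
print(gap)
# Male: 4433.82, Female: 1254.00

# How much larger the minority gap is for men than for women
gap_difference <- unname(gap["Male"] - gap["Female"])
print(gap_difference)
# 3179.82
```

Non-parallel lines in an interaction plot correspond to this kind of unequal gap. The arithmetic alone does not say whether the interaction itself is statistically significant.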