geog391 Practice 6

library(readxl)
# Read the spreadsheet once, then print the resulting tibble
employeenumeric <- read_excel("C:/Users/Kins/Downloads/employeenumeric.xls")
employeenumeric

## # A tibble: 474 × 5
##    Gender CurrentSalary YearsofEducation MinorityClassification DateofBirth
##    <chr>          <dbl>            <dbl>                  <dbl>       <dbl>
##  1 m              57000               15                      0       19027
##  2 m              40200               16                      0       21328
##  3 f              21450               12                      0       10800
##  4 f              21900                8                      0       17272
##  5 m              45000               15                      0       20129
##  6 m              32100               15                      0       21419
##  7 m              36000               15                      0       20571
##  8 f              21900               12                      0       24233
##  9 f              27900               15                      0       16825
## 10 f              24000               12                      0       16846
## # ℹ 464 more rows

Question 1: The dimensions of the dataset are 474 x 5 (474 rows and 5 columns).

Question 2: The 5 variables are Gender, Current Salary, Years of Education, Minority Classification, and Date of Birth

library(sf)

## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

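# Keep only employees with exactly 15 years of education.
# YearsofEducation is numeric (<dbl>), so compare against the number 15, not the string "15".
employeefilter <- employeenumeric |>
  filter(YearsofEducation == 15) |>
  glimpse()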

## Rows: 116
## Columns: 5
## $ Gender                 <chr> "m", "m", "m", "m", "f", "m", "f", "m", "f", "f…
## $ CurrentSalary          <dbl> 57000, 45000, 32100, 36000, 27900, 27750, 35100…
## $ YearsofEducation       <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
## $ MinorityClassification <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1,…
## $ DateofBirth            <dbl> 19027, 20129, 21419, 20571, 16825, 22114, 17955…

Question 3: The new dimensions are 116 x 5

attach(employeefilter)
t.test(CurrentSalary[Gender == "m"], CurrentSalary[Gender == "f"])

## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "m"] and CurrentSalary[Gender == "f"]
## t = 5.0443, df = 102.38, p-value = 1.977e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3930.779 9024.884
## sample estimates:
## mean of x mean of y 
##  33527.83  27050.00

Question 4: The null hypothesis is that there is no difference in mean salaries between men and women with 15 years of education.

Question 5: Filtering to employees with 15 years of education holds education constant, so that the only factors left to compare are gender and minority classification.

Question 6: t = 5.0443
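
For reference, the Welch two-sample t-test named in the output computes its statistic and approximate degrees of freedom from the group means, sample variances, and group sizes. The symbols below (means, variances, and sizes with subscripts 1 and 2 for the two groups, and ν for the degrees of freedom) are notation introduced here, not taken from the assignment:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

Because ν is an approximation rather than a simple count, it is generally not a whole number, which is why the output reports df = 102.38.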

Question 7: The p-value = 1.977e-06

Question 8: The 95% confidence interval is (3930.78, 9024.88).

Question 9: No, it does not include 0, so the difference is statistically significant.

Question 10: The mean salary is $33,527.83 for men and $27,050.00 for women.

Question 11: Since the p-value is small and the 95% confidence interval for the difference does not include 0, we can reject the null hypothesis. This tells us that, among individuals with 15 years of education, men earn more than women on average.
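
As a check on how these numbers fit together, the confidence interval and p-value can be approximately recovered from the rounded values printed in the t-test output for men and women. This is a sketch, not part of the original analysis: the variable names are made up for illustration, and because the inputs are rounded, the results only match the printed output to within rounding.

```r
# Rounded values copied from the t.test() output for men vs. women
mean_m <- 33527.83
mean_f <- 27050.00
t_stat <- 5.0443
df     <- 102.38

# Difference in means, and the standard error implied by t = diff / se
diff_means <- mean_m - mean_f
se         <- diff_means / t_stat

# 95% confidence interval: diff +/- (critical t value) * (standard error)
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(ci)
print(p_value)
```

The interval is centered on the difference in means, which is why an interval that excludes 0 and a p-value below 0.05 lead to the same conclusion.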

t.test(CurrentSalary[MinorityClassification == 1], CurrentSalary[MinorityClassification == 0])

## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[MinorityClassification == 1] and CurrentSalary[MinorityClassification == 0]
## t = -2.4432, df = 59.458, p-value = 0.01755
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6673.2519  -664.4916
## sample estimates:
## mean of x mean of y 
##  28838.46  32507.33

Question 12: t = -2.4432

Question 13: p-value = 0.01755

Question 14: The 95% confidence interval is (-6673.2519, -664.4916).

Question 15: It does not contain 0.

Question 16: The mean salary is $28,838.46 for employees with a minority classification and $32,507.33 for employees without one.

Question 17: Among employees with 15 years of education, minority employees earn less on average than non-minority employees, and the difference is statistically significant (p = 0.01755).
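
The comparisons in this document index the attached columns directly. One alternative, sketched here on a small made-up data frame (the `toy` values are invented for illustration and are not the assignment data), is the formula interface of `t.test()`, which takes the data frame as an argument and so does not need `attach()`.

```r
# Made-up salaries for illustration only; not the employee data
toy <- data.frame(
  Gender        = c("m", "m", "m", "m", "f", "f", "f", "f"),
  CurrentSalary = c(34000, 36000, 33000, 35500, 27000, 28000, 26500, 27500)
)

# Welch two-sample t-test of salary by gender (unequal variances is the default)
result <- t.test(CurrentSalary ~ Gender, data = toy)

# Groups are ordered alphabetically, so "f" comes before "m"
print(result$estimate)
print(result$conf.int)
```

With the formula interface the difference is taken as the first group minus the second ("f" minus "m" here), so the sign of t and of the confidence interval is flipped relative to a call that lists the male salaries first.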

t.test(CurrentSalary[Gender=="m" & MinorityClassification==1], CurrentSalary[Gender=="m" & MinorityClassification==0])

## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "m" & MinorityClassification == 1] and CurrentSalary[Gender == "m" & MinorityClassification == 0]
## t = -2.4005, df = 40.643, p-value = 0.02104
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8164.9289  -702.7293
## sample estimates:
## mean of x mean of y 
##  30055.56  34489.38

Question 18: Because the p-value is < 0.05 and the confidence interval does not contain 0, there is a statistically significant difference in mean salary between minority and non-minority men.

t.test(CurrentSalary[Gender=="f" & MinorityClassification==1], CurrentSalary[Gender=="f" & MinorityClassification==0])

## 
##  Welch Two Sample t-test
## 
## data:  CurrentSalary[Gender == "f" & MinorityClassification == 1] and CurrentSalary[Gender == "f" & MinorityClassification == 0]
## t = -0.62398, df = 11.646, p-value = 0.5447
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5647.546  3139.546
## sample estimates:
## mean of x mean of y 
##     26100     27354

Question 19: Since the p-value is > 0.05 and the confidence interval contains 0, there is no statistically significant difference in mean salary between minority and non-minority women.

Question 20:

| Mean Salaries | Male | Female |
|---|---|---|
| Non-Minority | $34,489.38 | $27,354 |
| Minority | $30,055.56 | $26,100 |

interaction.plot(Gender, `MinorityClassification`, `CurrentSalary`)

Question 23: The plot shows that the difference in salary between minority and non-minority employees depends on gender. It also shows the salary gap for employees classified as a minority: although the confidence interval in Question 19 indicated no statistically significant difference, there is still a visible difference in the plot.
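
The dependence on gender described here can be quantified from the four group means reported in Question 20. The sketch below, whose object names are chosen for illustration, computes the minority gap within each gender and the difference between those gaps. It says nothing about statistical significance, which is what the t-tests address.

```r
# Group means copied from the Question 20 table
means <- matrix(
  c(34489.38, 27354,
    30055.56, 26100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Non-Minority", "Minority"), c("Male", "Female"))
)

# Non-minority minus minority mean salary, within each gender
gap <- means["Non-Minority", ] - means["Minority", ]
print(gap)

# Difference between the male and female gaps
print(gap[["Male"]] - gap[["Female"]])
```

Parallel lines in an interaction plot would correspond to equal gaps. Here the male gap ($4,433.82) is larger than the female gap ($1,254.00), a difference of $3,179.82.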

Kinsey Snodgrass

2025-10-30
