library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
Public_School_Characteristics_2022_23 <- read_csv("Public_School_Characteristics_2022-23.csv")
## Rows: 101390 Columns: 77
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): NCESSCH, SURVYEAR, STABR, LEAID, ST_LEAID, LEA_NAME, SCH_NAME, LST...
## dbl (54): X, Y, OBJECTID, STATUS, TOTFRL, FRELCH, REDLCH, DIRECTCERT, PK, KG...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
clean_Public_School_Characteristics_2022_23 <- Public_School_Characteristics_2022_23 |>
  select(HI, WH, STUTERATIO, TOTFRL) |>
  drop_na(HI, WH, TOTFRL, STUTERATIO)
cor(clean_Public_School_Characteristics_2022_23)
## HI WH STUTERATIO TOTFRL
## HI 1.00000000 0.05218346 0.05564767 0.72181277
## WH 0.05218346 1.00000000 0.05735714 0.18051615
## STUTERATIO 0.05564767 0.05735714 1.00000000 0.05683012
## TOTFRL 0.72181277 0.18051615 0.05683012 1.00000000
pairs(clean_Public_School_Characteristics_2022_23)
cor.test(clean_Public_School_Characteristics_2022_23$HI, clean_Public_School_Characteristics_2022_23$TOTFRL, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: clean_Public_School_Characteristics_2022_23$HI and clean_Public_School_Characteristics_2022_23$TOTFRL
## z = 216.95, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.4690714
Correlation method justification: I chose Kendall’s tau because the observations for the two variables are not normally distributed, which rules out Pearson’s method, and they contain many repeated (tied) values, which makes Spearman’s rho less suitable.
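The two conditions named in the justification (non-normality and repeated values) can be checked directly. The following is a minimal sketch on made-up toy data, not the NCES file: the data frame `toy` and the helper `tie_share()` are hypothetical names introduced only for illustration. On the real data, the Shapiro-Wilk step would need a random sample, because `shapiro.test()` accepts at most 5000 observations.

```r
# Toy data (hypothetical, NOT the NCES file): skewed counts with many ties.
set.seed(1)
toy <- data.frame(
  HI     = rpois(200, lambda = 3),
  TOTFRL = rpois(200, lambda = 5)
)

# Hypothetical helper: share of observations whose value also
# appears at least once more in the same vector (i.e. is tied).
tie_share <- function(x) {
  mean(duplicated(x) | duplicated(x, fromLast = TRUE))
}

# Proportion of tied observations in each column.
sapply(toy, tie_share)

# Shapiro-Wilk normality test; a small p-value suggests non-normality.
# shapiro.test() only accepts between 3 and 5000 observations.
shapiro.test(toy$HI)$p.value

# The same pair of variables under each correlation method, for comparison.
methods <- c("pearson", "spearman", "kendall")
estimates <- sapply(methods, function(m) cor(toy$HI, toy$TOTFRL, method = m))
estimates
```

Running the same three-method comparison on the cleaned school data would show how much the choice of method changes the estimate for HI and TOTFRL.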
Explanation of findings: Kendall’s tau was used above to test the association between “HI”, the number of Hispanic students at each school surveyed, and “TOTFRL”, the number of students in the Free and Reduced Lunch program at each school surveyed. The p-value is reported as less than 2.2e-16, which is the floor R uses when printing p-values, so we can reject the null hypothesis that tau equals zero, that is, that the two variables are not associated. The estimated tau of 0.47 indicates a moderate positive association: schools with more Hispanic students tend to have more students on Free and Reduced Lunch. Because both variables are school-level counts, this result describes schools, not the lunch status of individual Hispanic or non-Hispanic students.
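To report the estimate and p-value without copying them from the printed output, they can be pulled from the object that `cor.test()` returns. This is a small sketch on simulated toy vectors (`x` and `y` are hypothetical stand-ins, not the HI and TOTFRL columns). It also shows where the printed "< 2.2e-16" comes from: R prints any p-value below `.Machine$double.eps` that way.

```r
# Toy vectors (hypothetical stand-ins for HI and TOTFRL).
set.seed(42)
x <- rpois(300, lambda = 4)
y <- x + rpois(300, lambda = 4)  # constructed to be positively associated with x

# Kendall's rank correlation test, as in the analysis above.
res <- cor.test(x, y, method = "kendall")

tau <- unname(res$estimate)  # the tau estimate as a plain number
p   <- res$p.value           # the underlying p-value

c(tau = tau, p = p)

# TRUE when the p-value is below machine epsilon, which is the
# case in which R prints it as "< 2.2e-16".
p < .Machine$double.eps
```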