Homework 5

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
districts <- read_excel("district.xls")

districts_variables <- districts |> select(DAGC4X21R, DPFPAHSAP, DPETECOP)

summary(districts_variables$DAGC4X21R)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -1.00   93.20   96.90   93.91  100.00  100.00     133
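
The minimum of -1.00 in this summary is not a valid percentage, which suggests it may be a placeholder code rather than a real graduation rate. The following is a minimal sketch, assuming negative values should be treated as missing, of recoding them to NA before any rows are dropped. The data frame `toy_rates` and its values are made up for illustration and do not come from district.xls.

```r
library(dplyr)

# Toy values for illustration only; not taken from district.xls.
toy_rates <- data.frame(DAGC4X21R = c(96.9, -1, 100, 93.2, NA))

# Assumption: negative values are placeholder codes, not real rates,
# so they are recoded to NA. Existing NA values stay NA.
toy_clean <- toy_rates |>
  mutate(DAGC4X21R = if_else(DAGC4X21R < 0, NA_real_, DAGC4X21R))

summary(toy_clean$DAGC4X21R)
```

Whether -1 really marks missing or masked data should be checked against the data dictionary for the file before applying this to `districts_variables`.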

summary(districts_variables$DPFPAHSAP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.1578  0.1000  3.4000       5

summary(districts_variables$DPETECOP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   47.95   61.90   60.75   77.15  100.00

cleaned_districts_variables <- districts_variables |> drop_na()

cor(cleaned_districts_variables)

##              DAGC4X21R    DPFPAHSAP    DPETECOP
## DAGC4X21R  1.000000000 -0.006462118 -0.23646133
## DPFPAHSAP -0.006462118  1.000000000  0.01921436
## DPETECOP  -0.236461334  0.019214365  1.00000000

pairs(~ DPFPAHSAP + DAGC4X21R + DPETECOP, data = cleaned_districts_variables)

cor.test(
  cleaned_districts_variables$DPFPAHSAP,
  cleaned_districts_variables$DAGC4X21R,
  method = "kendall"
)

## 
##  Kendall's rank correlation tau
## 
## data:  cleaned_districts_variables$DPFPAHSAP and cleaned_districts_variables$DAGC4X21R
## z = -1.0363, p-value = 0.3001
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##         tau 
## -0.02478076

Explanation

I chose Kendall’s tau because it is rank-based and works well with smaller sample sizes. The full dataset contains 1,074 observations, and my focus is on 40 districts within Bexar County; the test shown above, however, uses every district with complete data on the three variables rather than the Bexar County subset. The estimated association between high school allotment and graduation rates is slightly negative and very close to zero (tau = -0.025), and it is not statistically significant (p = 0.3001), so these data do not provide evidence of a genuine association between the two variables.
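
The test above runs on all rows of `cleaned_districts_variables`. The following is a minimal sketch of restricting the data to one county before running Kendall’s tau. It assumes the county is stored in a column called `CNTYNAME` (a hypothetical name here, which would also have to be kept in the earlier `select()` call), and it uses a made-up `toy_districts` data frame so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the district file; every value here is made up.
# CNTYNAME is an assumed name for the county column.
toy_districts <- data.frame(
  CNTYNAME  = c("BEXAR", "BEXAR", "BEXAR", "BEXAR", "BEXAR", "TRAVIS", "TRAVIS"),
  DPFPAHSAP = c(0.0, 0.1, 0.0, 0.4, NA, 0.0, 0.3),
  DAGC4X21R = c(96.9, 93.2, 100.0, 88.5, 91.0, 97.4, 95.1)
)

# Keep only the county of interest, then drop rows missing either variable.
bexar <- toy_districts |>
  filter(CNTYNAME == "BEXAR") |>
  drop_na(DPFPAHSAP, DAGC4X21R)

# exact = FALSE requests the normal approximation, which avoids the
# warning R gives when an exact p-value is requested on tied data.
bexar_test <- cor.test(bexar$DPFPAHSAP, bexar$DAGC4X21R,
                       method = "kendall", exact = FALSE)
bexar_test
```

With the real file, the number of rows in the filtered data frame is a quick check that the subset matches the intended 40 districts.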

Janet N. Ekezie

2025-10-09
