library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(readxl)
district_data <- read_excel("district.xls")
# Rename TEA column codes; parse text-stored numbers but leave district_name as text
clean_data <- district_data |>
  select(district_name = DISTNAME, staar_meets = DDA00A001222R,
         exp_instruction = DPFEAINSP, exp_stuservices = DPFPAREGP,
         econ_disadv = DPETECOP, teacher_exp = DPSTEXPA) |>
  mutate(across(where(is.character) & !district_name, readr::parse_number)) |>
  # Keep only districts with complete data on all five analysis variables
  drop_na(staar_meets, exp_instruction, exp_stuservices, econ_disadv, teacher_exp)
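Applying `readr::parse_number` to every character column would also try to convert `district_name`, turning each name into `NA` and raising one parsing failure per district. A minimal sketch on a hypothetical three-row tibble (`toy` and its values are made up, not taken from district.xls) shows how to exclude the name column from the conversion:

```r
library(dplyr)
library(readr)
library(tidyr)

# Hypothetical toy data: names are text, the measures arrive as character strings
toy <- tibble(
  district_name = c("CAYUGA ISD", "ELKHART ISD", "NECHES ISD"),
  staar_meets   = c("45", "52%", NA),
  econ_disadv   = c("38.5", "61.2", "40.0")
)

toy_clean <- toy |>
  # Parse every character column except the district name
  mutate(across(where(is.character) & !district_name, readr::parse_number)) |>
  # Drop rows where a numeric measure is still missing
  drop_na(staar_meets, econ_disadv)

toy_clean
```

`parse_number` strips non-numeric characters such as `%`, so "52%" becomes 52, while the names stay intact.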
Variable definitions:
- staar_meets: Percentage of students meeting grade level on all STAAR subjects
- exp_instruction: Percentage of total expenditures spent on classroom instruction
- exp_stuservices: Percentage of total expenditures spent on student support services
- econ_disadv: Percentage of students identified as economically disadvantaged
- teacher_exp: Average years of teacher experience within the district
cor(clean_data[, c("staar_meets", "exp_instruction", "exp_stuservices", "econ_disadv", "teacher_exp")],use = "complete.obs", method = "pearson")
##                 staar_meets exp_instruction exp_stuservices econ_disadv teacher_exp
## staar_meets       1.0000000       0.2150228      0.35432970  -0.6964191  0.33336067
## exp_instruction   0.2150228       1.0000000      0.48358599  -0.1924036  0.12971478
## exp_stuservices   0.3543297       0.4835860      1.00000000  -0.4761955 -0.02474583
## econ_disadv      -0.6964191      -0.1924036     -0.47619545   1.0000000 -0.23277614
## teacher_exp       0.3333607       0.1297148     -0.02474583  -0.2327761  1.00000000
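To see which variables track STAAR performance most closely, the staar_meets column of the correlation matrix above can be ordered by absolute correlation. This sketch copies the printed values into a named vector so it runs without district.xls; with the real data, `cor_mat["staar_meets", -1]` would give the same vector if the matrix were saved as `cor_mat` (a hypothetical name).

```r
# staar_meets correlations copied from the printed matrix
r_staar <- c(
  exp_instruction = 0.2150228,
  exp_stuservices = 0.3543297,
  econ_disadv     = -0.6964191,
  teacher_exp     = 0.3333607
)

# Order predictors by strength of association, ignoring the sign
ranked <- r_staar[order(abs(r_staar), decreasing = TRUE)]
round(ranked, 3)
```

Sorting on `abs()` keeps the sign visible in the output while ranking only by magnitude.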
pairs(clean_data[, c("staar_meets", "exp_instruction", "exp_stuservices", "econ_disadv", "teacher_exp")], main = "Pairs Plot: Achievement, Spending, Poverty, Experience", pch = 19, col = rgb(0,0,0,0.4))
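ggplot2 is loaded but not used. A single-panel scatterplot with a least-squares line gives a closer look at the econ_disadv and staar_meets pair than the small pairs-plot panel. This sketch uses a simulated data frame, `sim`, as a hypothetical stand-in for clean_data; with the real data, replace `sim` with `clean_data`.

```r
library(ggplot2)

# Simulated stand-in for clean_data (hypothetical values, not the TEA data)
set.seed(42)
sim <- data.frame(econ_disadv = runif(300, 0, 100))
sim$staar_meets <- 70 - 0.5 * sim$econ_disadv + rnorm(300, sd = 8)

p <- ggplot(sim, aes(x = econ_disadv, y = staar_meets)) +
  geom_point(alpha = 0.4) +
  # Least-squares line; its slope has the same sign as Pearson's r
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    x = "Economically Disadvantaged (%)",
    y = "STAAR Meets (%)",
    title = "STAAR performance vs. economic disadvantage"
  )

print(p)
```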
cor.test(clean_data$staar_meets, clean_data$econ_disadv, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: clean_data$staar_meets and clean_data$econ_disadv
## t = -33.561, df = 1196, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7244803 -0.6660534
## sample estimates:
## cor
## -0.6964191
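The t statistic and confidence interval reported by `cor.test()` follow directly from r and the sample size. As a worked check, this sketch recomputes them from the printed values, assuming n = df + 2 (1,198 districts); the interval uses the Fisher z-transformation, which is how `cor.test()` builds its Pearson interval.

```r
# Values taken from the cor.test() output
r  <- -0.6964191
df <- 1196        # df = n - 2
n  <- df + 2      # 1198 districts

# t statistic for H0: rho = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% CI: transform r to z, add +/- 1.96 standard errors, transform back
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)
round(ci, 4)
```

The narrow interval reflects the large number of districts: with n near 1,200, even the lower bound indicates a strong negative correlation.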
This analysis examined how district-level factors relate to academic performance across Texas. STAAR Meets (%) served as the outcome measure, while Instructional Expenditure (%), Student Services Expenditure (%), Teacher Experience (Years), and Economically Disadvantaged (%) served as the explanatory variables.
The results showed a strong negative correlation between Economically Disadvantaged (%) and STAAR Meets (%): r = -0.696, 95% CI [-0.724, -0.666], p < .001, across 1,198 districts. In other words, districts with more economically disadvantaged students tend to have lower STAAR performance. The remaining variables had weaker positive correlations with STAAR Meets (%): Student Services Expenditure (r = 0.354), Teacher Experience (r = 0.333), and Instructional Expenditure (r = 0.215). These associations point in a favorable direction but are far smaller than the one with poverty.
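Student Services Expenditure is itself correlated with Economically Disadvantaged (r = -0.476 in the correlation matrix), so some of its association with STAAR Meets (%) may run through poverty. One way to check, not used in the original analysis, is a first-order partial correlation, which measures the link between two variables while holding a third constant. This sketch plugs the three relevant coefficients from the printed matrix into the standard formula; it is a descriptive check, not a causal estimate.

```r
# Pairwise correlations copied from the printed matrix
r_sy <- 0.3543297   # staar_meets vs exp_stuservices
r_sz <- -0.6964191  # staar_meets vs econ_disadv
r_yz <- -0.4761955  # exp_stuservices vs econ_disadv

# Partial correlation of staar_meets and exp_stuservices, holding econ_disadv constant
partial_r <- (r_sy - r_sz * r_yz) / sqrt((1 - r_sz^2) * (1 - r_yz^2))
round(partial_r, 3)
```

With these values the partial correlation is roughly 0.04, far below the raw r of 0.354, which suggests most of the spending association overlaps with poverty.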
Pearson’s correlation was used because all of the variables are continuous and their relationships look roughly linear in the pairs plot. A Spearman rank correlation pointed to the same conclusion; because Spearman works on ranks, this agreement suggests the relationship is not driven by a few unusual districts. The staar_meets vs. econ_disadv panel of the pairs plot shows the pattern directly: as the share of economically disadvantaged students rises, performance tends to fall.
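The Spearman check is not shown in the code. A minimal sketch of running it alongside Pearson follows, on simulated data (`econ_sim` and `staar_sim` are hypothetical stand-ins). On the real data the equivalent call would be `cor.test(clean_data$staar_meets, clean_data$econ_disadv, method = "spearman", exact = FALSE)`; `exact = FALSE` requests the large-sample p-value, since R cannot compute an exact Spearman p-value when tied values are present, as is likely with rounded percentages.

```r
# Simulated data standing in for the real variables (hypothetical values)
set.seed(1)
econ_sim  <- runif(200, 0, 100)
staar_sim <- 80 - 0.5 * econ_sim + rnorm(200, sd = 8)

# Pearson uses raw values; Spearman correlates the ranks of each variable
pearson  <- cor.test(staar_sim, econ_sim, method = "pearson")
spearman <- cor.test(staar_sim, econ_sim, method = "spearman", exact = FALSE)

round(c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate)), 3)
```

When both estimates have the same sign and similar size, the association is unlikely to hinge on a handful of extreme points.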
Overall, the pattern is clear: poverty is the strongest correlate of student performance in this data. Spending and teacher experience are also associated with higher performance, but the economic challenges facing students weigh the heaviest. For me, this reinforces how important it is to think about funding fairness and targeted support for schools serving higher-need students.