library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
setwd("~/Desktop/UTSA/Quantitative Methods/RStudio")
district <- read_excel("district.xls")
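# Keep the outcome (annual graduate count) and the two predictors (teacher turnover, revenue per pupil); drop rows with missing values
clean_district <- district |> select(DA0GR21N, DPSTURNR, DPFRAALLK) |> drop_na()
# Regress graduate count on teacher turnover and revenue per pupil
district_model <- lm(DA0GR21N ~ DPSTURNR + DPFRAALLK, data = clean_district)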
summary(district_model)
##
## Call:
## lm(formula = DA0GR21N ~ DPSTURNR + DPFRAALLK, data = clean_district)
##
## Residuals:
## Min 1Q Median 3Q Max
## -649.3 -325.2 -214.4 -37.9 11218.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 965.181647 91.409334 10.559 < 2e-16 ***
## DPSTURNR -14.029075 2.831411 -4.955 8.41e-07 ***
## DPFRAALLK -0.021141 0.003839 -5.507 4.57e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 853.2 on 1074 degrees of freedom
## Multiple R-squared: 0.04632, Adjusted R-squared: 0.04454
## F-statistic: 26.08 on 2 and 1074 DF, p-value: 8.705e-12
ANSWER: This model explains about 4.6% of the variation in annual graduate counts (R-squared = 0.0463), which means most of the differences between districts in graduate counts are due to factors not included in the model. Still, the overall model is statistically significant (F(2, 1074) = 26.08, p < .001), showing that at least one of the predictors is related to graduate count. Individually, teacher turnover and revenue per pupil are both significant (p < .001). Districts with higher teacher turnover tend to graduate fewer students, roughly 14 fewer graduates for every one-percentage-point increase in turnover, holding revenue per pupil constant. Higher revenue per pupil is also linked to slightly lower graduate counts, which could reflect that smaller or higher-need districts spend more per student. So while the relationships are statistically significant, they explain only a small portion of the overall variation in graduate counts.
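As a rough check on the fit statistics quoted above, the F-statistic, adjusted R-squared, and overall p-value can be recomputed from the R-squared and degrees of freedom printed by summary(). This is only a sketch: the inputs are rounded numbers copied from that output, so the results should match the printed values only up to rounding.

```r
# Values copied (rounded) from the summary() output
r2       <- 0.04632  # Multiple R-squared
p        <- 2        # number of predictors
df_resid <- 1074     # residual degrees of freedom
n        <- df_resid + p + 1  # implied number of districts

# F-statistic for the overall model, written in terms of R-squared
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Upper-tail p-value of the F distribution
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)

print(c(n = n, F = f_stat, adj_r2 = adj_r2, p_value = p_value))
```

The implied sample size is 1,077 districts, and the recomputed F and adjusted R-squared should land close to the 26.08 and 0.04454 shown in the output.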
Both of my independent variables are statistically significant (p < .001). The coefficient for teacher turnover (–14.03) means that for every one-percentage-point increase in teacher turnover, a district graduates about 14 fewer students, holding revenue per pupil constant. The coefficient for revenue per pupil (–0.02) shows a small negative relationship: each additional dollar of revenue per student is associated with about 0.02 fewer graduates, or roughly 21 fewer graduates per additional $1,000, holding turnover constant. This could be because smaller or higher-need districts often spend more per student. Overall, both variables are negatively associated with graduate counts. Because turnover is measured in percentage points and revenue in dollars, the raw coefficients are not directly comparable; the t-values (–4.96 and –5.51) suggest the two relationships are of similar strength.
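Since turnover is measured in percentage points and revenue in dollars, one common way to compare the two predictors is to refit the model on standardized (z-scored) variables. The sketch below uses simulated data, not the district file: the data frame `sim`, its column names, and the numbers used to generate it are all assumptions for illustration only.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; NOT the district data set)
n_sim <- 200
sim <- data.frame(
  turnover = runif(n_sim, 5, 40),        # percent
  revenue  = runif(n_sim, 8000, 30000)   # dollars per pupil
)
sim$grads <- 900 - 14 * sim$turnover - 0.02 * sim$revenue +
  rnorm(n_sim, sd = 300)

# Raw model: slopes are in graduates per unit of each predictor
raw_fit <- lm(grads ~ turnover + revenue, data = sim)

# Standardize every variable to mean 0 and standard deviation 1
sim$z_grads    <- as.numeric(scale(sim$grads))
sim$z_turnover <- as.numeric(scale(sim$turnover))
sim$z_revenue  <- as.numeric(scale(sim$revenue))

# Standardized model: slopes are in SDs of graduates per SD of predictor
std_fit <- lm(z_grads ~ z_turnover + z_revenue, data = sim)

print(coef(raw_fit))
print(coef(std_fit))
```

With the real data, the same steps would be applied to DA0GR21N, DPSTURNR, and DPFRAALLK in clean_district; the predictor with the larger standardized slope in absolute value has the stronger association on a common scale.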
plot(district_model,which=1)
The residuals vs. fitted plot looks mostly flat, which suggests the model generally meets the assumption of linearity. The red line stays close to zero across most fitted values, meaning the relationship between the predictors and graduate count is roughly linear. But there are a few large positive outliers (the largest residual is 11,218.8, while the smallest is only -649.3), which point more to a right-skewed outcome than to curvature and may mean that a small number of very large districts are pulling on the fit. Overall, the linearity assumption is mostly satisfied, but a few extreme data points might be influencing the model.
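One way to follow up on the extreme points is to flag influential observations with Cook's distance and to check whether a log transform of the outcome makes the residuals more symmetric. The sketch below runs on simulated, right-skewed data rather than the district file; the data frame `sim`, the 4/n cutoff (a common rule of thumb, not a fixed standard), and the helper `skew_ratio` are assumptions for illustration.

```r
set.seed(42)

# Simulated right-skewed outcome (hypothetical; NOT the district data set)
n_sim <- 300
sim <- data.frame(turnover = runif(n_sim, 5, 40))
sim$grads <- round(exp(5 - 0.03 * sim$turnover + rnorm(n_sim, sd = 1)))

# Linear model on the raw counts
fit <- lm(grads ~ turnover, data = sim)

# Cook's distance measures how much each observation shifts the fit
cooks   <- cooks.distance(fit)
flagged <- which(cooks > 4 / n_sim)  # rule-of-thumb cutoff
print(length(flagged))

# Same model with a log-transformed outcome (log1p handles zero counts)
log_fit <- lm(log1p(grads) ~ turnover, data = sim)

# Hypothetical helper: largest positive residual relative to largest negative
skew_ratio <- function(model) max(resid(model)) / abs(min(resid(model)))
print(c(raw = skew_ratio(fit), logged = skew_ratio(log_fit)))
```

Applied to district_model, the same calls (cooks.distance() and a refit with a logged outcome) would show whether the few very large residuals belong to a handful of districts and whether the transform reduces their pull.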