library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readxl)
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(car)
## Warning: package 'car' was built under R version 4.5.2
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
district_data <- read_excel("district.xls")
# Keep only the model variables and give them readable names.
clean_data <- district_data |>
  select(district_name = DISTNAME, staar_meets = DDA00A001222R,
         exp_instruction = DPFEAINSP, exp_stuservices = DPFPAREGP,
         econ_disadv = DPETECOP, teacher_exp = DPSTEXPA) |>
  # parse_number() runs on every character column, including district_name,
  # which cannot be parsed as a number (see the warning that follows).
  mutate(across(where(is.character), readr::parse_number)) |>
  drop_na(staar_meets, exp_instruction, exp_stuservices, econ_disadv, teacher_exp)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(where(is.character), readr::parse_number)`.
## Caused by warning:
## ! 1206 parsing failures.
## row col expected        actual
##   1  -- a number CAYUGA ISD   
##   2  -- a number ELKHART ISD  
##   3  -- a number FRANKSTON ISD
##   4  -- a number NECHES ISD   
##   5  -- a number PALESTINE ISD
## ... ... ........ .............
## See problems(...) for more details.
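
The parsing warning above lists district names as the failing values, which suggests `parse_number()` was also applied to `district_name`. A minimal sketch of one way to avoid that is shown here. The `toy` data frame and its values are made up for illustration, and it assumes the other columns may arrive as text.

```r
library(dplyr)
library(readr)
library(tidyr)

# Hypothetical stand-in for district_data; the values are invented.
toy <- data.frame(
  DISTNAME      = c("CAYUGA ISD", "ELKHART ISD", "FRANKSTON ISD"),
  DDA00A001222R = c("45", "52", NA),
  DPETECOP      = c("61.2", "70.5", "55.0")
)

toy_clean <- toy |>
  select(district_name = DISTNAME, staar_meets = DDA00A001222R,
         econ_disadv = DPETECOP) |>
  # Parse every character column except the district name.
  mutate(across(where(is.character) & !district_name, readr::parse_number)) |>
  drop_na(staar_meets, econ_disadv)

toy_clean
```

With this selection the district names stay as text, so they remain available for labelling or inspecting individual districts later.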
##Linear Model
model_funding <- lm(staar_meets ~ exp_instruction + exp_stuservices + econ_disadv + teacher_exp, data = clean_data)
summary(model_funding)
## 
## Call:
## lm(formula = staar_meets ~ exp_instruction + exp_stuservices + 
##     econ_disadv + teacher_exp, data = clean_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.330  -5.615  -0.043   5.481  44.027 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     51.74836    3.33568  15.514   <2e-16 ***
## exp_instruction  0.13957    0.05960   2.342   0.0194 *  
## exp_stuservices  0.05410    0.04248   1.273   0.2031    
## econ_disadv     -0.39060    0.01483 -26.340   <2e-16 ***
## teacher_exp      0.72842    0.08543   8.526   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.272 on 1193 degrees of freedom
## Multiple R-squared:  0.5212, Adjusted R-squared:  0.5196 
## F-statistic: 324.6 on 4 and 1193 DF,  p-value: < 2.2e-16

The model explains just over half of the variation in district STAAR performance (R² = 0.52, adjusted R² = 0.52). Instructional spending (b = 0.14, p = 0.019) and teacher experience (b = 0.73, p < 0.001) are positively and significantly associated with performance. Economic disadvantage is negatively associated with performance (b = -0.39, p < 0.001) and has by far the largest t-statistic in the model (t = -26.3). Student services spending is not statistically significant (p = 0.20). Because the predictors are measured on different scales, the raw coefficients cannot be compared directly to rank effect sizes. Overall, districts with more experienced teachers, higher instructional spending, and lower economic disadvantage tend to have stronger STAAR outcomes; these are associations, not causal effects.
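
To compare the relative strength of predictors, one option is to refit the model on standardized variables so that each coefficient is in standard-deviation units. The sketch below uses simulated data with the same column names as `clean_data`. The numbers are invented and are not the district results.

```r
set.seed(1)
n <- 200

# Simulated stand-in for clean_data; not the real district values.
sim <- data.frame(
  exp_instruction = rnorm(n, 55, 5),
  exp_stuservices = rnorm(n, 40, 8),
  econ_disadv     = runif(n, 10, 95),
  teacher_exp     = rnorm(n, 12, 3)
)
sim$staar_meets <- 50 + 0.15 * sim$exp_instruction - 0.4 * sim$econ_disadv +
  0.7 * sim$teacher_exp + rnorm(n, sd = 9)

# Center and scale every column, then refit the same formula.
sim_z <- as.data.frame(scale(sim))
model_z <- lm(staar_meets ~ exp_instruction + exp_stuservices +
                econ_disadv + teacher_exp, data = sim_z)

# Standardized slopes (intercept dropped); larger absolute value = stronger association.
round(coef(model_z)[-1], 3)
```

Applied to the real data, the same two steps would be `scale()` on the numeric columns of `clean_data` followed by the original `lm()` call.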

##Linearity Test
plot(model_funding, which = 1)

raintest(model_funding)
## 
##  Rainbow test
## 
## data:  model_funding
## Rain = 0.87852, df1 = 599, df2 = 594, p-value = 0.943

The residual plot shows points scattered fairly evenly around zero with no clear pattern or curve, which suggests the relationship between the predictors and STAAR performance is approximately linear. The rainbow test agrees: the result is non-significant (Rain = 0.88, p = 0.943), so there is no evidence against the linearity assumption.

##Independence Test
dwtest(model_funding)
## 
##  Durbin-Watson test
## 
## data:  model_funding
## DW = 1.878, p-value = 0.0164
## alternative hypothesis: true autocorrelation is greater than 0

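The Durbin-Watson statistic (DW = 1.878) is close to 2, but the test is still significant at the 5% level (p = 0.016), so the null hypothesis of no positive autocorrelation is rejected. The implied first-order autocorrelation is small (roughly 1 - DW/2 ≈ 0.06), and with about 1,200 observations even a weak pattern can reach significance. Because these are cross-sectional district data, the statistic also depends on the order of the rows in the file. The independence assumption is therefore only approximately met, not confirmed.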
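
The size of the autocorrelation behind a Durbin-Watson value can be approximated with the large-sample relation DW ≈ 2(1 - ρ). The sketch below applies it to the reported statistic and then uses simulated residuals (not the district data) to show that the statistic changes when the same values are reordered. `dw_stat` is a helper defined here for illustration only.

```r
# Durbin-Watson statistic computed directly from a residual vector.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Approximate first-order autocorrelation implied by the reported DW of 1.878.
dw_reported <- 1.878
rho_hat <- 1 - dw_reported / 2
rho_hat

# Simulated autocorrelated residuals: same values, two different row orders.
set.seed(123)
e <- as.numeric(arima.sim(list(ar = 0.5), n = 500))
dw_original <- dw_stat(e)
dw_shuffled <- dw_stat(sample(e))
c(original_order = dw_original, shuffled = dw_shuffled)
```

For cross-sectional data such as districts, this order dependence is the reason a Durbin-Watson result is usually read as a rough check, not a decisive one.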

##Homoscedasticity Test

plot(model_funding, which = 3)

bptest(model_funding)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_funding
## BP = 17.682, df = 4, p-value = 0.001424

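In the scale-location plot the residuals appear fairly evenly spread across the fitted values, with no major signs of increasing or decreasing variance. The Breusch-Pagan test, however, is significant (BP = 17.68, df = 4, p = 0.0014), which rejects the null hypothesis of constant variance. The model therefore shows some heteroscedasticity. The coefficient estimates remain unbiased, but the standard errors and p-values reported by summary() may be unreliable, so heteroscedasticity-robust standard errors are a safer basis for inference.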
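
The Breusch-Pagan output above reports p = 0.0014, so one common follow-up is to recompute the coefficient tests with heteroscedasticity-consistent standard errors. The sketch below uses `hccm()` from car and `coeftest()` from lmtest, both already loaded in this report. The data are simulated for illustration, and HC3 is one reasonable choice of estimator, not the only one.

```r
library(lmtest)
library(car)

# Simulated data whose error variance grows with x (not the district data).
set.seed(42)
n <- 300
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

# Breusch-Pagan test on the simulated fit.
bptest(fit)

# HC3 heteroscedasticity-consistent covariance matrix of the coefficients.
robust_vcov <- hccm(fit, type = "hc3")

# Same coefficient estimates, tested with robust standard errors.
robust_tests <- coeftest(fit, vcov. = robust_vcov)
robust_tests
```

On the model in this report the equivalent call would be `coeftest(model_funding, vcov. = hccm(model_funding, type = "hc3"))`.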

##Multicollinearity Test
vif(model_funding)
## exp_instruction exp_stuservices     econ_disadv     teacher_exp 
##        1.353765        1.715743        1.415755        1.122153
cor(clean_data[, c("staar_meets", "exp_instruction", "exp_stuservices", "econ_disadv", "teacher_exp")])
##                 staar_meets exp_instruction exp_stuservices econ_disadv
## staar_meets       1.0000000       0.2150228      0.35432970  -0.6964191
## exp_instruction   0.2150228       1.0000000      0.48358599  -0.1924036
## exp_stuservices   0.3543297       0.4835860      1.00000000  -0.4761955
## econ_disadv      -0.6964191      -0.1924036     -0.47619545   1.0000000
## teacher_exp       0.3333607       0.1297148     -0.02474583  -0.2327761
##                 teacher_exp
## staar_meets      0.33336067
## exp_instruction  0.12971478
## exp_stuservices -0.02474583
## econ_disadv     -0.23277614
## teacher_exp      1.00000000

The VIF values all fall between 1.1 and 1.7, well below the common cutoff of 5, so there is no sign of problematic multicollinearity. The correlation matrix agrees: the largest correlations among the predictors are between instructional and student services spending (0.48) and between student services spending and economic disadvantage (-0.48), nowhere near the 0.80 level that would suggest overlap. Each predictor therefore contributes largely distinct information to the model.
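
One assumption not covered by the checks so far is normality of the residuals. A minimal sketch of how it could be checked is a normal Q-Q plot together with a Shapiro-Wilk test. The data below are simulated, so the output says nothing about the district model.

```r
# Simulated regression with normal errors (not the district data).
set.seed(7)
x <- rnorm(500)
y <- 1 + 2 * x + rnorm(500)
fit <- lm(y ~ x)

# Normal Q-Q plot of the standardized residuals.
plot(fit, which = 2)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normal.
sw <- shapiro.test(residuals(fit))
sw
```

For the model in this report the same calls would be `plot(model_funding, which = 2)` and `shapiro.test(residuals(model_funding))`. With large samples the test can flag small departures that matter little in practice, so the plot is usually read alongside it.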

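##Summary

Overall, the model meets some but not all of the main assumptions of linear regression. The rainbow test and residual plot give no evidence against linearity, and the VIF results and correlation matrix show that multicollinearity is not an issue. The Breusch-Pagan test, however, rejects constant variance (p = 0.0014), and the Durbin-Watson test detects weak positive autocorrelation (p = 0.016). Normality of the residuals was not tested here. The coefficient estimates are still useful for describing relationships between funding and student performance, but their standard errors and p-values should be read with caution, ideally alongside heteroscedasticity-robust standard errors.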