library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
districts <- read_excel("district.xls")
cleaned_districts <- districts |> drop_na()
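`drop_na()` with no column arguments removes every row that has a missing value in *any* column of the sheet. That can discard districts that are complete for the four model variables. Here is a minimal sketch on a small hypothetical data frame. The values and the `OTHER_COL` column are invented for illustration. It shows the difference when the model columns are named explicitly:

```r
library(tidyr)

# Hypothetical stand-in for `districts`; all values are invented for illustration
toy <- data.frame(
  DAGC4X21R = c(95, NA, 90, 88),
  DPETECOP  = c(40, 55, NA, 70),
  OTHER_COL = c(NA, 1, 2, 3)  # a column the model never uses
)

# No columns named: drops rows with an NA anywhere (only row 4 survives)
all_cols <- drop_na(toy)

# Columns named: drops only rows missing a model variable (rows 1 and 4 survive)
model_cols <- drop_na(toy, DAGC4X21R, DPETECOP)

nrow(all_cols)    # 1
nrow(model_cols)  # 2
```

By default `lm()` already omits rows with missing values in the model variables (`na.action = na.omit`). The explicit column list therefore mainly matters for how many districts end up in `cleaned_districts`.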
# Multiple linear regression of DAGC4X21R on DPFPAHSAP, DZCAMPUS and DPETECOP
cleaned_districts_multiple <- lm(DAGC4X21R ~ DPFPAHSAP + DZCAMPUS + DPETECOP, data = cleaned_districts)
summary(cleaned_districts_multiple)
##
## Call:
## lm(formula = DAGC4X21R ~ DPFPAHSAP + DZCAMPUS + DPETECOP, data = cleaned_districts)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -70.938  -1.458   0.779   2.879  10.120
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 100.21309    1.05366  95.110  < 2e-16 ***
## DPFPAHSAP    -0.68293    0.96571  -0.707   0.4800
## DZCAMPUS     -0.02947    0.01314  -2.243   0.0256 *
## DPETECOP     -0.09734    0.01742  -5.589 4.92e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.729 on 319 degrees of freedom
## Multiple R-squared: 0.1158, Adjusted R-squared: 0.1075
## F-statistic: 13.92 on 3 and 319 DF, p-value: 1.493e-08
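The three fit statistics in the output above are linked by standard formulas. The model has 319 residual degrees of freedom plus 4 estimated coefficients, so it used n = 323 districts, with p = 3 predictors. The short sketch below recomputes the adjusted R-squared, the F-statistic and its p-value from the printed Multiple R-squared. That value is rounded, so the last digits can differ slightly from the output. On the fitted model itself, the same numbers are available as `summary(cleaned_districts_multiple)$adj.r.squared` and `summary(cleaned_districts_multiple)$fstatistic`.

```r
# Values read off the summary() output
r2       <- 0.1158              # Multiple R-squared (rounded)
p        <- 3                   # number of predictors
df_resid <- 319                 # residual degrees of freedom
n        <- df_resid + p + 1    # 323 districts used in the fit

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic: explained variance per predictor vs. residual variance per df
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# p-value for H0: all three slope coefficients are zero
p_value <- pf(f_stat, df1 = p, df2 = df_resid, lower.tail = FALSE)

round(adj_r2, 4)  # 0.1075
f_stat
p_value
```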
The Adjusted R-squared of 0.1075 indicates that the model explains only about 11% of the variance in DAGC4X21R. The very low p-value of the overall F-test (1.493e-08) gives enough evidence to reject the null hypothesis that all three slope coefficients are zero. Of the individual predictors, district size (DZCAMPUS) and the percentage of economically disadvantaged students within a district (DPETECOP) are significant, while DPFPAHSAP is not (p = 0.48). Holding the other predictors constant, a one-unit increase in DZCAMPUS or DPETECOP is associated with a decrease in DAGC4X21R of 0.02947 and 0.09734, respectively.
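Each slope is a per-unit effect with the other predictors held fixed. Two additions make that easier to read: scaling the slope to a larger change, and attaching a 95% confidence interval. The sketch below uses the printed estimates and standard errors. The 10-unit step is an arbitrary illustrative choice, not part of the original analysis. On the fitted model, `confint(cleaned_districts_multiple)` returns the same intervals directly.

```r
# Estimates and standard errors copied from the summary() output
est <- c(DPFPAHSAP = -0.68293, DZCAMPUS = -0.02947, DPETECOP = -0.09734)
se  <- c(DPFPAHSAP =  0.96571, DZCAMPUS =  0.01314, DPETECOP =  0.01742)
df_resid <- 319

# Predicted change in DAGC4X21R for a 10-unit increase in each predictor,
# holding the other two predictors fixed (illustrative step size)
effect_10 <- 10 * est

# 95% confidence intervals: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

effect_10
round(ci, 4)
```

The DPFPAHSAP interval spans zero, which matches its non-significant p-value. The DZCAMPUS and DPETECOP intervals lie entirely below zero.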
# Residuals vs Fitted plot, used to check the linearity assumption
plot(cleaned_districts_multiple, which = 1)
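As the Residuals vs Fitted plot illustrates, the relationship between the dependent and independent variables appears to be linear. However, the minimum residual of -70.938 points to at least one district the model fits poorly, since the middle half of the residuals lies between -1.458 and 2.879.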
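Judging linearity from the Residuals vs Fitted plot is a visual call. One way to back it up is a nested-model F-test: add a squared term for a predictor and use `anova()` to check whether it significantly improves the fit. The sketch below applies this to simulated data, not the district file. The variables `x` and `y` and their quadratic relationship are hypothetical.

```r
set.seed(42)

# Simulated data with a curved (quadratic) relationship
n <- 200
x <- runif(n, 0, 5)
y <- 1 + 2 * x + 3 * x^2 + rnorm(n, sd = 1)
sim <- data.frame(x, y)

# Straight-line fit, then the same model plus a squared term
linear_fit    <- lm(y ~ x, data = sim)
quadratic_fit <- lm(y ~ x + I(x^2), data = sim)

# Nested-model F-test: a small p-value means the squared term captures
# curvature that the straight-line model misses
comparison <- anova(linear_fit, quadratic_fit)
comparison
p_curvature <- comparison[["Pr(>F)"]][2]
```

For the district model, the analogous check would fit `update(cleaned_districts_multiple, . ~ . + I(DPETECOP^2))` and compare it with the original model using `anova()`.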