library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district <- read_excel("district.xls")
# Keep the outcome and the three predictors; drop districts with any missing value
clean_district_data <- district |>
  select(DDB00A001322R, DPSTBLFP, DPETBLAP, DPSTKIDR) |>
  drop_na()
summary(clean_district_data)
## DDB00A001322R DPSTBLFP DPETBLAP DPSTKIDR
## Min. :-1.00 Min. : 0.000 Min. : 0.00 Min. :-2.00
## 1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.: 1.20 1st Qu.:11.20
## Median :11.00 Median : 2.250 Median : 4.10 Median :13.10
## Mean :12.39 Mean : 7.998 Mean :10.15 Mean :12.99
## 3rd Qu.:17.00 3rd Qu.: 7.725 3rd Qu.:12.82 3rd Qu.:14.60
## Max. :90.00 Max. :100.000 Max. :98.10 Max. :37.30
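The summary shows negative minimums: -1 for DDB00A001322R and -2 for DPSTKIDR. A student-teacher ratio cannot be negative, so these values may be placeholder codes (for example, for masked or missing data) rather than real measurements. That reading is an assumption to check against the data's documentation. A minimal sketch that drops such rows, shown on a small hypothetical tibble (`toy` is illustrative, not the real data) so it runs without district.xls:

```r
library(dplyr)
library(tibble)

# Hypothetical rows loosely based on head() above, with two negative
# entries like the minimums in summary()
toy <- tibble(
  DDB00A001322R = c(3, 8, -1, 11),
  DPSTBLFP      = c(8.3, 2.9, 4.0, 9.6),
  DPETBLAP      = c(4.4, 4.0, 8.5, 25.1),
  DPSTKIDR      = c(12.3, 11.0, 10.8, -2)
)

# Keep only rows where every column is non-negative
# (assumes negative values are codes, not measurements)
toy_clean <- toy |>
  filter(if_all(everything(), \(x) x >= 0))

print(toy_clean)
```

On the real data, the same `filter()` step could be appended to the pipeline that builds `clean_district_data`, before fitting the model.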
head(clean_district_data)
## # A tibble: 6 × 4
## DDB00A001322R DPSTBLFP DPETBLAP DPSTKIDR
## <dbl> <dbl> <dbl> <dbl>
## 1 3 8.3 4.4 12.3
## 2 8 2.9 4 11
## 3 6 4 8.5 10.8
## 4 19 6.5 8.2 11.3
## 5 11 9.6 25.1 12.9
## 6 11 11.6 19.7 11
# Multiple linear regression of the STAAR outcome on the three predictors
model1 <- lm(DDB00A001322R ~ DPSTBLFP + DPETBLAP + DPSTKIDR, data = clean_district_data)
summary(model1)
##
## Call:
## lm(formula = DDB00A001322R ~ DPSTBLFP + DPETBLAP + DPSTKIDR,
## data = clean_district_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.391 -7.625 -1.753 4.156 75.382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.01138 1.75111 0.578 0.5637
## DPSTBLFP -0.11896 0.05023 -2.368 0.0181 *
## DPETBLAP 0.05846 0.04987 1.172 0.2414
## DPSTKIDR 0.90388 0.13319 6.786 1.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.55 on 1008 degrees of freedom
## Multiple R-squared: 0.04557, Adjusted R-squared: 0.04273
## F-statistic: 16.04 on 3 and 1008 DF, p-value: 3.411e-10
The overall F-test p-value of 3.411e-10 is below .05, so the model as a whole is statistically significant. However, the multiple R-squared of .04557 and adjusted R-squared of .04273 show that the model explains only about 4.6% of the variance in DDB00A001322R, or about 4.3% after adjusting for the number of predictors.
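The adjusted R-squared and the F statistic can both be recovered from the multiple R-squared, the number of predictors, and the sample size. The sketch below uses only numbers printed by `summary(model1)`: 1008 residual degrees of freedom plus 3 predictors plus the intercept gives 1012 districts. With the fitted model, `summary(model1)$adj.r.squared` and `summary(model1)$fstatistic` return these values directly.

```r
# Values copied from summary(model1) above
r2 <- 0.04557            # Multiple R-squared
p  <- 3                  # number of predictors
n  <- 1008 + p + 1       # residual df + predictors + intercept = observations

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic: explained variance per predictor over
# residual variance per residual degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
```

Both results match the printed output (0.04273 and 16.04), which shows why a model can be highly significant while explaining little variance: with about 1000 districts, even a small R-squared produces a large F statistic.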
At the .05 level, the significant variables are DPSTBLFP (percent of teachers who are African American, p = 0.0181) and DPSTKIDR (number of students per teacher, p = 1.96e-11). DPETBLAP is not significant (p = 0.2414).
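Picking out the significant predictors amounts to filtering the coefficient table on its p-value column. With the fitted model, `coef(summary(model1))` returns that table as a matrix with a `Pr(>|t|)` column. In this sketch the values are copied from the printed output above, and the hypothetical `coefs` data frame stands in for that matrix so the example runs on its own.

```r
# Coefficient table copied from summary(model1) above
coefs <- data.frame(
  term     = c("(Intercept)", "DPSTBLFP", "DPETBLAP", "DPSTKIDR"),
  estimate = c(1.01138, -0.11896, 0.05846, 0.90388),
  p_value  = c(0.5637, 0.0181, 0.2414, 1.96e-11)
)

alpha <- 0.05

# Keep predictors (not the intercept) whose p-value is below alpha
significant <- coefs$term[coefs$p_value < alpha & coefs$term != "(Intercept)"]
print(significant)
```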
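The DPSTKIDR estimate of 0.90388 means that, holding the other predictors constant, each additional student per teacher is associated with an increase of about 0.90 points in DDB00A001322R, the measure of African American students mastering grade level on STAAR testing. DPSTKIDR has the largest t value (6.786), but with an R-squared under 5% this is a modest association, and a regression on observational data does not show a causal effect. The DPSTBLFP estimate of -0.11896 means that each one-point increase in the percent of teachers who are African American is associated with a decrease of about 0.12 points in the outcome, a small negative association.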
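Confidence intervals show how precisely each coefficient is estimated. With the fitted model, `confint(model1)` computes 95% intervals directly as estimate ± t critical value × standard error. This sketch recomputes them from the printed estimates, standard errors, and 1008 residual degrees of freedom so it runs without the data, then scales the DPSTKIDR estimate to a hypothetical difference of 5 students per teacher.

```r
# Estimates and standard errors copied from summary(model1) above
est <- c(DPSTBLFP = -0.11896, DPSTKIDR = 0.90388)
se  <- c(DPSTBLFP =  0.05023, DPSTKIDR = 0.13319)
df_resid <- 1008

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))

# Predicted difference in the outcome for 5 more students per teacher,
# holding the other predictors fixed (hypothetical comparison)
diff_5 <- 5 * est[["DPSTKIDR"]]
print(diff_5)
```

Both intervals exclude zero, which is consistent with the p-values. The DPSTBLFP interval ends close to zero, however, so the size of that association is uncertain.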
plot(model1, which = 1)  # Residuals vs Fitted diagnostic plot
According to the Residuals vs Fitted plot, this model appears to violate the linearity assumption.
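The same Residuals vs Fitted diagnostic can be drawn with ggplot2, which the tidyverse already loads. The sketch below uses the built-in `mtcars` data as a stand-in so it runs without district.xls. Replacing `fit` with `model1` would plot the district model. A loess smoother that curves away from the dashed zero line suggests the linear form misses some structure.

```r
library(ggplot2)

# Stand-in model on built-in data; swap in model1 for the district model
fit <- lm(mpg ~ wt + hp, data = mtcars)

diag_df <- data.frame(
  fitted = fitted(fit),
  resid  = resid(fit)
)

# Residuals vs fitted values with a zero reference line and a loess smoother
p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted values", y = "Residuals", title = "Residuals vs Fitted")

print(p)
```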