library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)  # not used below: read_csv() comes from readr, which tidyverse attaches
Public_School_Characteristics_2022_23 <- read_csv("Public_School_Characteristics_2022-23.csv")
## Rows: 101390 Columns: 77
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): NCESSCH, SURVYEAR, STABR, LEAID, ST_LEAID, LEA_NAME, SCH_NAME, LST...
## dbl (54): X, Y, OBJECTID, STATUS, TOTFRL, FRELCH, REDLCH, DIRECTCERT, PK, KG...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
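# Regress student-teacher ratio on FRL count, locale, and White and Hispanic enrollment
psc_model <- lm(STUTERATIO ~ TOTFRL + ULOCALE + WH + HI, data = Public_School_Characteristics_2022_23)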
summary(psc_model)
##
## Call:
## lm(formula = STUTERATIO ~ TOTFRL + ULOCALE + WH + HI, data = Public_School_Characteristics_2022_23)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -58.5   -3.4   -0.9    1.9 3584.4
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                13.9267697  0.2052804  67.843  < 2e-16 ***
## TOTFRL                      0.0012621  0.0003691   3.420 0.000627 ***
## ULOCALE12-City: Mid-size    0.2429880  0.3463773   0.702 0.482985
## ULOCALE13-City: Small      -0.2138650  0.3390579  -0.631 0.528197
## ULOCALE21-Suburb: Large    -0.2444225  0.2369523  -1.032 0.302297
## ULOCALE22-Suburb: Mid-size  0.3357734  0.4382972   0.766 0.443627
## ULOCALE23-Suburb: Small    -0.2419736  0.5411577  -0.447 0.654774
## ULOCALE31-Town: Fringe     -0.1383734  0.4598818  -0.301 0.763500
## ULOCALE32-Town: Distant    -0.4625817  0.3494587  -1.324 0.185603
## ULOCALE33-Town: Remote      1.5417867  0.4050560   3.806 0.000141 ***
## ULOCALE41-Rural: Fringe    -0.7278903  0.2868867  -2.537 0.011176 *
## ULOCALE42-Rural: Distant   -1.5699833  0.2992098  -5.247 1.55e-07 ***
## ULOCALE43-Rural: Remote    -2.4718333  0.3444846  -7.175 7.26e-13 ***
## WH                          0.0043632  0.0002875  15.177  < 2e-16 ***
## HI                          0.0031666  0.0004385   7.222 5.16e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.13 on 97084 degrees of freedom
## (4291 observations deleted due to missingness)
## Multiple R-squared: 0.007507, Adjusted R-squared: 0.007364
## F-statistic: 52.45 on 14 and 97084 DF, p-value: < 2.2e-16
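The dependent variable is STUTERATIO (student-teacher ratio). The independent variables are TOTFRL (total number of students with free or reduced-price lunch status), ULOCALE (locale code, a categorical variable covering city, suburb, town, and rural areas), WH (total number of White students), and HI (total number of Hispanic students).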
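Because ULOCALE is a text column, `lm()` turns it into a set of 0/1 indicator variables, one for every locale level except a reference level. The sketch below shows this on a tiny made-up data frame (the three locale labels are a hypothetical subset, not rows from the NCES file); `model.matrix()` prints the design matrix that `lm()` would fit.

```r
# Tiny made-up data frame (hypothetical rows) with a character locale column
toy <- data.frame(
  ULOCALE = c("11-City: Large", "12-City: Mid-size", "43-Rural: Remote", "11-City: Large")
)

# lm() treats character predictors as factors; factor() sorts the levels,
# so "11-City: Large" becomes the reference level with no column of its own
toy$ULOCALE <- factor(toy$ULOCALE)
baseline <- levels(toy$ULOCALE)[1]
baseline

# model.matrix() builds the same 0/1 indicator columns lm() uses;
# each column name is the variable name pasted to the level
X <- model.matrix(~ ULOCALE, data = toy)
X
```

This is why a coefficient such as `ULOCALE33-Town: Remote` is a difference from the reference locale rather than an absolute level.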
The overall F-test p-value is reported as < 2.2e-16, which is extremely low, so we can reject the null hypothesis that TOTFRL, ULOCALE, WH, and HI together have no effect on STUTERATIO (that is, that all slope coefficients are zero).
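The `< 2.2e-16` in the output is a printing floor (R's machine epsilon), not the exact p-value. One quick way to see this is to recompute the p-value from the printed F statistic and its degrees of freedom with `pf()`; the numbers below are copied from the summary above.

```r
# F = 52.45 on 14 and 97084 degrees of freedom, as printed by summary(psc_model)
f_stat   <- 52.45
df_model <- 14
df_resid <- 97084

# Upper-tail probability of the F distribution = p-value of the overall F-test
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
p_value

# summary() prints "< 2.2e-16" because .Machine$double.eps (about 2.2e-16)
# is its display floor; the actual p-value is far smaller
p_value < .Machine$double.eps
```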
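The R-squared value of 0.0075 (adjusted 0.0074) is also extremely low: the independent variables (TOTFRL, ULOCALE, WH, HI) explain only about 0.75% of the variance in student-teacher ratio. In other words, although these variables are jointly significant, they account for almost none of the school-to-school variation in STUTERATIO, so most of that variation comes from factors outside this model (or from noise and data errors). The residuals point the same way: the residual standard error is 22.13 and the largest residual is 3584.4, which suggests a few extreme STUTERATIO values that may be inflating the unexplained variance.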
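R-squared is 1 minus (residual sum of squares / total sum of squares), i.e. the share of the variance in the outcome that the model explains. As a consistency check, the snippet below plugs the printed R-squared and degrees of freedom into the standard formulas for adjusted R-squared and the overall F statistic; all input values are copied from the summary output.

```r
# Values copied from summary(psc_model)
r2       <- 0.007507          # Multiple R-squared
k        <- 14                # slope coefficients (numerator df of the F-test)
df_resid <- 97084             # residual degrees of freedom
n        <- df_resid + k + 1  # schools actually used in the fit

# Sanity check: rows read (101390) minus rows dropped for missingness (4291)
n == 101390 - 4291

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# The overall F statistic depends only on R-squared and the degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

round(c(adj_r2 = adj_r2, f_stat = f_stat), 4)
```

Both values agree with the printed 0.007364 and 52.45, which shows that the significant F-test and the tiny R-squared describe the same fit: with about 97,000 schools, even predictors that explain well under 1% of the variance are detected as significant.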
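Based on the coefficient p-values, the significant predictors (at the 0.05 level) are TOTFRL, Town: Remote, Rural: Fringe, Rural: Distant, Rural: Remote, WH, and HI; the remaining locale levels are not significant. Each locale coefficient compares that locale with the reference level, which is the one level missing from the table (presumably 11-City: Large). TOTFRL, with a p-value of 0.0006, lets us reject the null hypothesis that it does not affect STUTERATIO, although the effect is small: each additional FRL student is associated with a 0.0013 increase in the ratio, holding the other variables constant. More interestingly, the p-value of 7.26e-13 for Rural: Remote lets us reject the null hypothesis that it does not affect student-teacher ratio; its estimate of -2.47 means rural remote schools have about 2.5 fewer students per teacher than the reference locale, all else equal. Because several locale levels are significant while the others are not, we cannot rule out an effect of locale: it does appear to matter, but mainly for remote towns and rural schools. A joint F-test on all ULOCALE levels would show whether locale as a whole is significant.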
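One way to test locale as a whole, rather than level by level, is a joint F-test that drops all ULOCALE indicators at once; for the real model this would be `drop1(psc_model, test = "F")`. The sketch below runs the same test on simulated data, since the NCES file is not bundled here; the data, the column names (`locale`, `frl`, `ratio`), and the effect sizes are all made up for illustration.

```r
# Simulated stand-in data (hypothetical): a 3-level locale factor and a count predictor
set.seed(1)
n <- 300
sim <- data.frame(
  locale = factor(sample(c("City", "Rural", "Town"), n, replace = TRUE)),
  frl    = rpois(n, 200)
)
# Build in a locale effect: "Rural" schools get ratios about 2 points lower
sim$ratio <- 15 + 0.002 * sim$frl - 2 * (sim$locale == "Rural") + rnorm(n, sd = 1)

fit <- lm(ratio ~ frl + locale, data = sim)

# drop1() refits the model without each term and reports an F-test for the
# whole term, so all locale indicators are tested together (2 df here)
tests <- drop1(fit, test = "F")
print(tests)
tests["locale", "Pr(>F)"]
```

A small joint p-value for the locale term means locale matters overall, even if some of its individual levels are not significant on their own.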
# Residuals vs. fitted values, to check the linearity assumption
plot(psc_model, which = 1)
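It seems that this model meets the assumption of linearity. In the residuals-vs-fitted plot, the red curve is a smoothed (lowess) trend of the residuals, not the fitted regression line; linearity looks reasonable when that curve stays roughly flat along zero and the residuals show no curved pattern. Most residuals cluster near zero (the middle half lie between -3.4 and 1.9), but with a maximum residual of 3584.4 a few extreme points can squeeze the bulk of the data into a thin band, so the plot may hide patterns among typical schools.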
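Because a single residual of 3584.4 stretches the vertical axis, it may help to flag extreme points and redraw the plot without them. The sketch below does this on simulated data with one planted error; for the real model the analogous starting point would be `rstandard(psc_model)`, and the cutoff of 4 is an arbitrary choice for illustration, not a standard rule.

```r
# Simulated data (hypothetical): a clean linear trend plus one gross data error
set.seed(42)
sim <- data.frame(x = runif(200, min = 0, max = 50))
sim$y <- 10 + 0.3 * sim$x + rnorm(200, sd = 2)
sim$y[50] <- 1000  # plant a single implausible value in row 50

fit <- lm(y ~ x, data = sim)

# Standardized residuals put every residual on a common scale;
# flag rows more than 4 standard deviations from the fit
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 4)
flagged

# Keep the unflagged rows (this also works when nothing is flagged),
# refit, and redraw the residuals-vs-fitted plot
keep <- !(seq_len(nrow(sim)) %in% flagged)
fit_clean <- lm(y ~ x, data = sim[keep, ])
plot(fit_clean, which = 1)
```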