library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
districtbasehw6 <- read_xls("district.xls")
districtbasehw6cleaned <- districtbasehw6 %>%
  select(DISTNAME, DZCAMPUS, DAGC4X21R, DA0AT21R, DPSTURNR, DPSTKIDR, DPFEAINSP, DZEXADMP) %>%
  na.omit()
head(districtbasehw6cleaned)
## # A tibble: 6 × 8
## DISTNAME DZCAMPUS DAGC4X21R DA0AT21R DPSTURNR DPSTKIDR DPFEAINSP DZEXADMP
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CAYUGA ISD 3 100 96.7 19.1 12.3 49.6 9.1
## 2 ELKHART ISD 4 100 96 13.9 11 60.3 6.9
## 3 FRANKSTON ISD 3 95.2 95.4 21.6 10.8 54.2 8.3
## 4 NECHES ISD 2 95.8 95.8 18.3 11.3 53.7 10.7
## 5 PALESTINE ISD 6 99 93.7 17.9 12.9 54.6 8.3
## 6 WESTWOOD ISD 4 97.8 94.5 30.6 11 50.6 8.5
Dependent variable: graduation rate = DAGC4X21R (4-YR LONGITUDINAL GRADUATION RATE (CLASS OF 2021) DISTRICT EXCL)
Independent variables: attendance rate = DA0AT21R; teacher turnover rate = DPSTURNR; number of students per teacher = DPSTKIDR; instructional expenditure percentage = DPFEAINSP; central administrative expenditure = DZEXADMP
districtbasehw6cleaned_model <- lm(DAGC4X21R ~ DA0AT21R + DPSTURNR + DPSTKIDR + DPFEAINSP + DZEXADMP,
data = districtbasehw6cleaned)
summary(districtbasehw6cleaned_model)
##
## Call:
## lm(formula = DAGC4X21R ~ DA0AT21R + DPSTURNR + DPSTKIDR + DPFEAINSP +
## DZEXADMP, data = districtbasehw6cleaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -95.800 -1.604 1.615 4.456 22.070
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.375e+01 1.341e+01 -2.517 0.012 *
## DA0AT21R 1.683e+00 1.268e-01 13.279 < 2e-16 ***
## DPSTURNR 1.283e-03 4.140e-02 0.031 0.975
## DPSTKIDR 1.414e-04 1.409e-01 0.001 0.999
## DPFEAINSP -4.314e-01 8.586e-02 -5.024 5.92e-07 ***
## DZEXADMP -9.074e-01 1.386e-01 -6.547 9.10e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.6 on 1065 degrees of freedom
## Multiple R-squared: 0.1827, Adjusted R-squared: 0.1788
## F-statistic: 47.61 on 5 and 1065 DF, p-value: < 2.2e-16
The multiple R-squared is 0.1827, which means that the five independent variables together explain about 18.27% of the variation in graduation rates. The p-value for the overall F-test is < 2.2e-16, which is extremely small and indicates that the model as a whole is statistically significant. DA0AT21R (attendance rate), DZEXADMP (expenditures on central administration), and DPFEAINSP (expenditures on instruction) all have statistically significant coefficients, with attendance showing the strongest evidence of a relationship with graduation rates (t = 13.279, p < 2e-16). By comparison, teacher turnover rate (p = 0.975) and number of students per teacher (p = 0.999) are not statistically significant.
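As a quick arithmetic check on the summary output, the adjusted R-squared and the F-statistic can be recovered from the reported R-squared and degrees of freedom. This is only a sketch: the variable names below are made up for illustration, and the inputs are the rounded values printed by `summary()`, so the results match the output only up to rounding.

```r
# Values copied from the summary() output above (rounded as printed).
r_squared   <- 0.1827
n_predictor <- 5      # DA0AT21R, DPSTURNR, DPSTKIDR, DPFEAINSP, DZEXADMP
resid_df    <- 1065   # residual degrees of freedom
n_obs       <- resid_df + n_predictor + 1  # districts left after na.omit()

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / resid_df

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_stat <- (r_squared / n_predictor) / ((1 - r_squared) / resid_df)

# Upper-tail p-value of the overall F-test.
f_p_value <- pf(f_stat, n_predictor, resid_df, lower.tail = FALSE)

print(round(c(adj_r_squared = adj_r_squared, f_stat = f_stat), 4))
print(f_p_value)
```

Both recomputed values should line up with the "Adjusted R-squared" and "F-statistic" lines in the output, which is a useful sanity check on how those quantities are related.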
The variable with the largest effect per unit is attendance (DA0AT21R): a one-point increase in the attendance rate is associated with a 1.683-point increase in the graduation rate, holding all other variables constant. The other two significant variables have negative coefficients. A one-point increase in DPFEAINSP (expenditures on instruction) is associated with a 0.431-point decrease in the graduation rate, and a one-point increase in DZEXADMP (expenditures on central administration) is associated with a 0.907-point decrease, again with all other variables held constant.
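To make the "per unit" reading of the coefficients concrete, here is a small sketch that multiplies each significant coefficient by a change in its variable. The coefficients are copied from the summary output; the scenario itself (attendance up 2 points, central administration spending down 1 point) is purely hypothetical and only illustrates the arithmetic.

```r
# Coefficients of the significant predictors, copied from summary() above.
coefs <- c(DA0AT21R = 1.683, DPFEAINSP = -0.4314, DZEXADMP = -0.9074)

# Hypothetical scenario (an assumption for illustration, not from the data):
# attendance rises 2 points, instruction share is unchanged,
# central administration share falls 1 point.
change <- c(DA0AT21R = 2, DPFEAINSP = 0, DZEXADMP = -1)

# Each variable's contribution to the predicted change in graduation rate,
# holding the remaining predictors constant.
contribution <- coefs * change[names(coefs)]
total_change <- sum(contribution)

print(contribution)
print(total_change)
```

Because the DZEXADMP coefficient is negative, a decrease in that variable contributes a positive change to the predicted graduation rate.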
plot(districtbasehw6cleaned_model, which=1)
Looking at the residuals vs. fitted plot, I would say the model does violate the assumption of linearity. The red line has a bit of a bow, indicating a curve in the data.
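One possible follow-up to a bowed residual line is to test whether a squared term improves the fit. The sketch below uses simulated data, not the district data, and the names `sim`, `attendance`, and `grad_rate` are made up for the example; it only shows the mechanics of comparing a straight-line model against a curved one.

```r
# Simulated stand-in data (NOT the district data): the outcome is built
# to curve with the predictor so the comparison has something to detect.
set.seed(42)
sim <- data.frame(attendance = runif(300, min = 85, max = 100))
sim$grad_rate <- 100 - 0.3 * (100 - sim$attendance)^2 + rnorm(300, sd = 3)

# Straight-line model versus a model with an added quadratic term.
linear_fit <- lm(grad_rate ~ attendance, data = sim)
curved_fit <- lm(grad_rate ~ poly(attendance, 2), data = sim)

# Nested-model F-test: a small p-value suggests the quadratic term
# captures curvature that the straight-line model misses.
comparison <- anova(linear_fit, curved_fit)
curve_p_value <- comparison[["Pr(>F)"]][2]

print(comparison)
```

Calling `plot(curved_fit, which = 1)` afterwards would show whether the bow in the red line flattens out once the curved term is included.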