Homework 6

library(ggplot2)

Load your chosen dataset into Rmarkdown

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

districtbasehw6 <- read_xls("district.xls")

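# Keep the district name, DZCAMPUS, the outcome, and the five predictors; drop rows with any missing value
districtbasehw6cleaned <- districtbasehw6 %>%
  select(DISTNAME, DZCAMPUS, DAGC4X21R, DA0AT21R, DPSTURNR, DPSTKIDR, DPFEAINSP, DZEXADMP) %>%
  na.omit()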
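
Because `na.omit()` runs after `select()`, only missing values in the eight kept columns remove a district. Running it on the full spreadsheet first would also drop districts that have gaps in columns the model never uses. Below is a minimal base-R sketch of that difference. It uses a made-up four-row data frame in place of `district.xls`, and the `UNUSED` column and all values are hypothetical:

```r
# Hypothetical toy data standing in for district.xls; all values are made up
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DAGC4X21R = c(95.2, NA, 99.0, 97.8),
  DA0AT21R  = c(95.4, 96.0, 93.7, 94.5),
  UNUSED    = c(1, 2, NA, 4)  # a column the model never uses
)
keep <- c("DISTNAME", "DAGC4X21R", "DA0AT21R")

# Select first, then drop incomplete rows: only NAs in kept columns matter
select_then_omit <- na.omit(toy[, keep])

# Drop incomplete rows first: the NA in UNUSED also removes "C ISD"
omit_then_select <- na.omit(toy)[, keep]

nrow(select_then_omit)  # 3
nrow(omit_then_select)  # 2
```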

head(districtbasehw6cleaned)

## # A tibble: 6 × 8
##   DISTNAME      DZCAMPUS DAGC4X21R DA0AT21R DPSTURNR DPSTKIDR DPFEAINSP DZEXADMP
##   <chr>            <dbl>     <dbl>    <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 CAYUGA ISD           3     100       96.7     19.1     12.3      49.6      9.1
## 2 ELKHART ISD          4     100       96       13.9     11        60.3      6.9
## 3 FRANKSTON ISD        3      95.2     95.4     21.6     10.8      54.2      8.3
## 4 NECHES ISD           2      95.8     95.8     18.3     11.3      53.7     10.7
## 5 PALESTINE ISD        6      99       93.7     17.9     12.9      54.6      8.3
## 6 WESTWOOD ISD         4      97.8     94.5     30.6     11        50.6      8.5

Select the dependent variable you are interested in, along with independent variables which you believe are causing the dependent variable

Dependent variable: graduation rate = DAGC4X21R (full label: "4-YR LONGITUDINAL GRADUATION RATE (CLASS OF 2021) DISTRICT EXCL")

Independent variables:
- Attendance rate = DA0AT21R
- Teacher turnover rate = DPSTURNR
- Number of students per teacher = DPSTKIDR
- Instructional expenditure percentage = DPFEAINSP
- Central administrative expenditure = DZEXADMP

create a linear model using the “lm()” command, save it to some object
call a “summary()” on your new model

districtbasehw6cleaned_model <- lm(DAGC4X21R ~ DA0AT21R + DPSTURNR + DPSTKIDR + DPFEAINSP + DZEXADMP, 
                                  data = districtbasehw6cleaned)

summary(districtbasehw6cleaned_model)

## 
## Call:
## lm(formula = DAGC4X21R ~ DA0AT21R + DPSTURNR + DPSTKIDR + DPFEAINSP + 
##     DZEXADMP, data = districtbasehw6cleaned)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.800  -1.604   1.615   4.456  22.070 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.375e+01  1.341e+01  -2.517    0.012 *  
## DA0AT21R     1.683e+00  1.268e-01  13.279  < 2e-16 ***
## DPSTURNR     1.283e-03  4.140e-02   0.031    0.975    
## DPSTKIDR     1.414e-04  1.409e-01   0.001    0.999    
## DPFEAINSP   -4.314e-01  8.586e-02  -5.024 5.92e-07 ***
## DZEXADMP    -9.074e-01  1.386e-01  -6.547 9.10e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.6 on 1065 degrees of freedom
## Multiple R-squared:  0.1827, Adjusted R-squared:  0.1788 
## F-statistic: 47.61 on 5 and 1065 DF,  p-value: < 2.2e-16
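
The numbers in this output can also be pulled from the fitted object instead of read off the printout. The sketch below uses simulated data, and the variables `x1`, `x2`, and `y` are hypothetical stand-ins, not columns of the district file. `coef(summary(fit))` returns the coefficient table as a matrix, and `summary(fit)$r.squared` holds the multiple R-squared.

```r
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)  # simulated: x2 has no true effect on y

fit <- lm(y ~ x1 + x2)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))

# p-values for the predictors only (row 1 is the intercept)
p_values <- coef_table[-1, "Pr(>|t|)"]

# Names of predictors significant at the 0.05 level
significant <- names(p_values)[p_values < 0.05]

summary(fit)$r.squared  # multiple R-squared of the fit
significant             # includes "x1"; "x2" would appear only by chance
```

Applied to `districtbasehw6cleaned_model`, the same filter at 0.05 would keep DA0AT21R, DPFEAINSP, and DZEXADMP. These are the three starred rows in the table above.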

interpret the model’s r-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables?

The multiple R-squared is 0.1827. This means the five predictors together explain about 18.27% of the variation in district graduation rates, and roughly 82% remains unexplained. The adjusted R-squared (0.1788) penalizes for the number of predictors and is nearly the same. The overall F-test p-value is < 2.2e-16, which is extremely small, so the model as a whole is statistically significant (F = 47.61 on 5 and 1065 DF). DA0AT21R (attendance rate), DZEXADMP (expenditures on central administration), and DPFEAINSP (expenditures on instruction) are all significant at the 0.001 level. Attendance shows the strongest evidence of a relationship with graduation rates (t = 13.279, p < 2e-16). By contrast, teacher turnover rate (DPSTURNR, p = 0.975) and number of students per teacher (DPSTKIDR, p = 0.999) are not statistically significant.
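
A quick way to check the interpretation above is to confirm that the summary statistics are internally consistent. The sketch below recomputes them using only the printed F statistic and degrees of freedom. Because these values are copied from the output, the results are approximate up to rounding. The overall p-value is the upper tail of an F(5, 1065) distribution, and R-squared can be recovered as F·df1 / (F·df1 + df2).

```r
# Values copied from the summary() output above
f_stat <- 47.61
df1    <- 5              # number of predictors
df2    <- 1065           # residual degrees of freedom
n      <- df1 + df2 + 1  # districts used in the fit (1071)

# Overall p-value: probability of an F(5, 1065) value at least this large
p_overall <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_overall < 2.2e-16      # TRUE, consistent with "p-value: < 2.2e-16"

# R-squared recovered from F: R2 = F * df1 / (F * df1 + df2)
r2 <- f_stat * df1 / (f_stat * df1 + df2)
round(r2, 4)             # 0.1827

# Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - df1 - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2
round(adj_r2, 4)         # 0.1788
```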

Choose some significant independent variables. Interpret its Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable?

By far the largest estimate in my model belongs to attendance (DA0AT21R). Its estimate of 1.683 means that, holding the other variables constant, each one-percentage-point increase in the attendance rate is associated with an increase of about 1.683 percentage points in the graduation rate. The other two significant variables both have negative estimates. For DPFEAINSP (expenditures on instruction), each one-unit increase is associated with a decrease of about 0.431 percentage points in the graduation rate. For DZEXADMP (expenditures on central administration), each one-unit increase is associated with a decrease of about 0.907 percentage points, again holding all other variables constant.
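
To see what "holding all other variables constant" means mechanically, compare predictions for two cases that differ by exactly one unit in a single predictor. This minimal sketch uses simulated data. The variables, the intercept, and the true slopes 1.7 and -0.9 are hypothetical, loosely echoing the signs in the homework model. The difference between the two predictions equals the fitted coefficient, and a negative coefficient produces a drop.

```r
set.seed(42)
n  <- 200
x1 <- runif(n, 90, 100)  # simulated stand-in for an attendance rate
x2 <- runif(n, 5, 12)    # simulated stand-in for a spending percentage
y  <- -70 + 1.7 * x1 - 0.9 * x2 + rnorm(n, sd = 3)

fit <- lm(y ~ x1 + x2)

# Two hypothetical cases that differ only in x1, by exactly one unit
base      <- data.frame(x1 = 95, x2 = 8)
x1_plus1  <- data.frame(x1 = 96, x2 = 8)
effect_x1 <- unname(predict(fit, x1_plus1) - predict(fit, base))

# Same idea for x2, holding x1 fixed
x2_plus1  <- data.frame(x1 = 95, x2 = 9)
effect_x2 <- unname(predict(fit, x2_plus1) - predict(fit, base))

# Each one-unit change in prediction equals that predictor's coefficient
all.equal(effect_x1, unname(coef(fit)["x1"]))  # TRUE
all.equal(effect_x2, unname(coef(fit)["x2"]))  # TRUE
effect_x2 < 0  # TRUE: a negative estimate means y falls as x2 rises
```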

Does the model you create meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”

plot(districtbasehw6cleaned_model, which=1)

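Looking at the residuals-vs-fitted plot, I would say the model does violate the assumption of linearity. The red smoothed line bows instead of staying flat along zero. This suggests a curved relationship that the straight-line model does not capture.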
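
One way to back up the visual judgment is to add a curved term and test whether it improves the fit. The sketch below uses simulated data with a deliberately quadratic relationship, and all variables are hypothetical. For the homework model, the analogous (untested) step would be to add a term such as `I(DA0AT21R^2)` to the formula and then compare the two fits with `anova()`.

```r
set.seed(1)
x <- runif(300, 0, 10)
y <- 5 + 0.5 * x^2 + rnorm(300, sd = 2)  # simulated, truly curved relationship

fit_lin  <- lm(y ~ x)           # straight-line model (misspecified)
fit_quad <- lm(y ~ x + I(x^2))  # adds a squared term

# plot(fit_lin, which = 1)   # red smoother would show a U-shaped bow
# plot(fit_quad, which = 1)  # red smoother should lie roughly flat near zero

# F-test of the nested models: does the squared term improve the fit?
comparison <- anova(fit_lin, fit_quad)
comparison[2, "Pr(>F)"] < 0.05  # TRUE
```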


Aaron Rodriguez

2025-03-30