HW 6

library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

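# Load the district-level data
district <- read_excel("district.xls")
# Keep the outcome and the three predictors, then drop rows with any missing value
clean_district_data <- district |> select(DDB00A001322R, DPSTBLFP, DPETBLAP, DPSTKIDR) |> drop_na()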

summary(clean_district_data)

##  DDB00A001322R      DPSTBLFP          DPETBLAP        DPSTKIDR    
##  Min.   :-1.00   Min.   :  0.000   Min.   : 0.00   Min.   :-2.00  
##  1st Qu.: 5.00   1st Qu.:  0.000   1st Qu.: 1.20   1st Qu.:11.20  
##  Median :11.00   Median :  2.250   Median : 4.10   Median :13.10  
##  Mean   :12.39   Mean   :  7.998   Mean   :10.15   Mean   :12.99  
##  3rd Qu.:17.00   3rd Qu.:  7.725   3rd Qu.:12.82   3rd Qu.:14.60  
##  Max.   :90.00   Max.   :100.000   Max.   :98.10   Max.   :37.30

head(clean_district_data)

## # A tibble: 6 × 4
##   DDB00A001322R DPSTBLFP DPETBLAP DPSTKIDR
##           <dbl>    <dbl>    <dbl>    <dbl>
## 1             3      8.3      4.4     12.3
## 2             8      2.9      4       11  
## 3             6      4        8.5     10.8
## 4            19      6.5      8.2     11.3
## 5            11      9.6     25.1     12.9
## 6            11     11.6     19.7     11

model1 <- lm(DDB00A001322R ~ DPSTBLFP + DPETBLAP + DPSTKIDR, data = clean_district_data)

summary(model1)

## 
## Call:
## lm(formula = DDB00A001322R ~ DPSTBLFP + DPETBLAP + DPSTKIDR, 
##     data = clean_district_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.391  -7.625  -1.753   4.156  75.382 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.01138    1.75111   0.578   0.5637    
## DPSTBLFP    -0.11896    0.05023  -2.368   0.0181 *  
## DPETBLAP     0.05846    0.04987   1.172   0.2414    
## DPSTKIDR     0.90388    0.13319   6.786 1.96e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.55 on 1008 degrees of freedom
## Multiple R-squared:  0.04557,    Adjusted R-squared:  0.04273 
## F-statistic: 16.04 on 3 and 1008 DF,  p-value: 3.411e-10

The overall F-test p-value of 3.411e-10 is well below .05, so the model as a whole is statistically significant. However, the multiple R-squared of .04557 and adjusted R-squared of .04273 show that the model explains only about 4.6% and 4.3% of the variance in the outcome, respectively.
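As a quick check on these figures, the sketch below recomputes them from numbers copied out of the summary(model1) output. The variable names (r2, f_stat, df_model, df_resid) are only illustrative and are not part of the original analysis.

```r
# Values copied from the summary(model1) output above.
r2 <- 0.04557      # Multiple R-squared
f_stat <- 16.04    # F-statistic
df_model <- 3      # numerator degrees of freedom (number of predictors)
df_resid <- 1008   # residual degrees of freedom

# Share of variance explained, as a percentage.
r2_pct <- round(100 * r2, 1)

# Adjusted R-squared penalizes R-squared for the number of predictors.
n <- df_resid + df_model + 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# The upper tail of the F distribution gives the overall p-value.
f_p <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

cat("R-squared (%):", r2_pct, "\n")
cat("Adjusted R-squared:", round(adj_r2, 5), "\n")
cat("Overall F-test p-value:", signif(f_p, 3), "\n")
```

Because the F-statistic is entered in rounded form, the recomputed p-value should match the reported 3.411e-10 only approximately.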

Looking at the individual coefficient p-values, the significant predictors at the .05 level are DPSTBLFP (percent of teachers who are African American, p = 0.0181) and DPSTKIDR (number of students per teacher, p = 1.96e-11). DPETBLAP (p = 0.2414) is not significant.
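The same selection can be sketched in code. The p-values below are typed in from the coefficient table above; with the fitted model available, they could instead be read from coef(summary(model1))[, "Pr(>|t|)"]. The names p_values and alpha are illustrative only.

```r
# p-values copied from the Coefficients table of summary(model1),
# leaving out the intercept.
p_values <- c(DPSTBLFP = 0.0181, DPETBLAP = 0.2414, DPSTKIDR = 1.96e-11)

# Conventional 5% significance threshold.
alpha <- 0.05

# Names of the predictors whose p-value falls below the threshold.
significant <- names(p_values)[p_values < alpha]
print(significant)
```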

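The DPSTKIDR estimate of 0.90388 means that, holding the other predictors constant, each additional student per teacher is associated with a 0.90-unit increase in DDB00A001322R (African American students mastering grade level for STAAR testing). The DPSTBLFP estimate of -0.11896 means that each one-point increase in the percent of African American teachers is associated with a 0.12-unit decrease in the same outcome. Because the predictors are measured on different scales, the raw estimates alone do not show which relationship is stronger, and with an R-squared under 5% these are best read as associations rather than strong effects.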
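One rough way to put the estimates on a comparable footing is to multiply each one by the interquartile range of its predictor, which gives the predicted change in the outcome when that predictor moves from its 1st to its 3rd quartile. The sketch below does this with values copied from the two summary() outputs above. It is an illustrative comparison, not part of the original analysis, and the names coefs, q1, q3, and iqr_effect are made up for the example.

```r
# Coefficient estimates copied from summary(model1).
coefs <- c(DPSTBLFP = -0.11896, DPETBLAP = 0.05846, DPSTKIDR = 0.90388)

# 1st and 3rd quartiles copied from summary(clean_district_data).
q1 <- c(DPSTBLFP = 0.000, DPETBLAP = 1.20, DPSTKIDR = 11.20)
q3 <- c(DPSTBLFP = 7.725, DPETBLAP = 12.82, DPSTKIDR = 14.60)

# Predicted change in the outcome for a move from Q1 to Q3 in each
# predictor, holding the other predictors constant.
iqr_effect <- coefs * (q3 - q1)
print(round(iqr_effect, 2))
```

On this scale the students-per-teacher ratio still has the largest predicted change (about 3.07 units), compared with about -0.92 for DPSTBLFP and 0.68 for DPETBLAP.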

plot(model1, which = 1)

According to the residuals vs. fitted plot (which = 1), it appears that this model violates the assumption of linearity.
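A visual reading of the residuals vs. fitted plot can be backed up with an informal numeric check. The sketch below uses simulated stand-in data with a deliberately curved relationship (the data, x, y, and fit are hypothetical, not the district data) and tests whether the squared fitted values add explanatory power, in the spirit of a RESET-style check. For the report itself, model1 and clean_district_data would take the place of the stand-ins.

```r
set.seed(1)

# Simulated stand-in data with a curved relationship (hypothetical).
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(200, sd = 2)

# Straight-line fit that ignores the curvature.
fit <- lm(y ~ x)

# Residuals vs. fitted: a bent trend line suggests a missed nonlinear pattern.
plot(fit, which = 1)

# Informal check: add the squared fitted values as an extra predictor
# and compare the two nested models with an F-test.
yhat_sq <- fitted(fit)^2
fit_sq <- lm(y ~ x + yhat_sq)
check <- anova(fit, fit_sq)

# A small p-value here points to nonlinearity the straight-line model misses.
p_curvature <- check[["Pr(>F)"]][2]
print(p_curvature)
```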

Alexis Garay

2025-10-23