library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7, 
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1, 
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3, 
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5, 
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2, 
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
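The NA filter above drops rows with a missing readiness value. It keeps one value that cannot be a real rate, however: -1. The base-R check below reports how many rows the filter removed and flags any readiness value outside 0 to 100. It is a minimal sketch. Whether -1 is a suppression or missing-data code is an assumption that should be confirmed against the data's documentation.

```r
# Data as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]

# How many rows did the NA filter remove?
n_dropped <- nrow(df) - nrow(df_clean)
cat("Rows kept:", nrow(df_clean), "| rows dropped:", n_dropped, "\n")

# A readiness rate should lie between 0 and 100; flag anything outside that range.
# Treating such a value as a missing-data code (rather than a real measurement)
# is an assumption to check against the source documentation.
out_of_range <- df_clean[df_clean$DA0CC21R < 0 | df_clean$DA0CC21R > 100, ]
print(out_of_range)
```

If -1 turns out to be a code rather than a measurement, the model should be refit without that row.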
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)
summary(model)
## 
## Call:
## lm(formula = DA0CC21R ~ DPETECOP, data = df_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.887 -10.962  -1.040   5.053  37.508 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  65.1533     6.9964   9.312 6.40e-10 ***
## DPETECOP     -0.6126     0.1022  -5.996 2.14e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15 on 27 degrees of freedom
## Multiple R-squared:  0.5711, Adjusted R-squared:  0.5552 
## F-statistic: 35.95 on 1 and 27 DF,  p-value: 2.139e-06

The R-squared value is 0.5711 (adjusted R-squared: 0.5552). This means approximately 57% of the variance in college readiness (DA0CC21R) across the 29 schools with complete data is explained by the percentage of socioeconomically disadvantaged students (DPETECOP). This suggests that socioeconomic status is an important predictor of college readiness.
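R-squared can be recomputed directly from its definition as a check on the summary output. It equals 1 minus the ratio of the residual sum of squares to the total sum of squares. Adjusted R-squared additionally accounts for the number of predictors. This base-R sketch rebuilds the model from the data above and compares both values with what `summary(model)` reports.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# R-squared = 1 - (residual sum of squares / total sum of squares)
y   <- df_clean$DA0CC21R
ssr <- sum(residuals(model)^2)
sst <- sum((y - mean(y))^2)
r2_manual <- 1 - ssr / sst

# Adjusted R-squared penalises extra parameters: n observations, p = 1 predictor
n <- nrow(df_clean)
p <- 1
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Manual values side by side with those reported by summary(model)
round(c(r2 = r2_manual, adj_r2 = adj_r2_manual,
        r2_summary = summary(model)$r.squared,
        adj_r2_summary = summary(model)$adj.r.squared), 4)
```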

P-values and significance: The overall model p-value is 2.139e-06 (F-statistic: 35.95 on 1 and 27 degrees of freedom). This is well below the typical 0.05 threshold, so the model is statistically significant.
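The overall p-value in the summary is the upper-tail probability of the F-statistic under an F distribution with 1 and 27 degrees of freedom. The short base-R sketch below recomputes it with `pf()` from the values stored in `summary(model)$fstatistic`.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# summary() stores the F-statistic as a named vector: value, numdf, dendf
fstat <- summary(model)$fstatistic

# Overall p-value = P(F > observed F) under the null of no relationship
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)

cat("F =", round(fstat[["value"]], 2), "on", fstat[["numdf"]], "and",
    fstat[["dendf"]], "DF, p =", signif(p_overall, 4), "\n")
```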

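DPETECOP p-value: 2.14e-06. This is highly significant, so we can reject the null hypothesis that there is no linear relationship between socioeconomic status and college readiness. With a single predictor, this t-test is equivalent to the overall F-test: (-5.996)² ≈ 35.95, the F-statistic.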
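The slope's t-statistic is the estimate divided by its standard error. Its two-sided p-value comes from a t distribution with the model's residual degrees of freedom (27 here). The base-R sketch below recomputes both from the coefficient table. It also checks that t² equals the overall F-statistic, as expected for a one-predictor model.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# Coefficient table: columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))
b1    <- coefs["DPETECOP", "Estimate"]
se_b1 <- coefs["DPETECOP", "Std. Error"]

# t statistic for H0: slope = 0, and its two-sided p-value
t_b1 <- b1 / se_b1
p_b1 <- 2 * pt(abs(t_b1), df = df.residual(model), lower.tail = FALSE)

# With one predictor, the squared t statistic equals the overall F statistic
f_val <- summary(model)$fstatistic[["value"]]
c(t = t_b1, p = p_b1, t_squared = t_b1^2, F = f_val)
```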

DPETECOP estimate (beta coefficient): -0.6126. This means that for every 1-percentage-point increase in socioeconomically disadvantaged students, the predicted college readiness rate decreases by about 0.61 percentage points.
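The slope can be made concrete by comparing predictions at two hypothetical schools. The base-R sketch below picks 50% and 60% disadvantaged. Those two values are illustrative choices within the observed range of DPETECOP, not data points. The predicted rates differ by exactly 10 times the slope. The sketch also reports a 95% confidence interval for the slope with `confint()`, which shows how precisely the -0.6126 is estimated.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# Two hypothetical schools, 10 percentage points apart in disadvantaged share
new_schools <- data.frame(DPETECOP = c(50, 60))
pred <- predict(model, newdata = new_schools)

# The difference in predicted readiness equals 10 * slope
diff_10pt <- unname(pred[2] - pred[1])

# 95% confidence interval for the DPETECOP slope
ci <- confint(model, "DPETECOP", level = 0.95)

print(round(pred, 2))
print(round(diff_10pt, 2))
print(round(ci, 3))
```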

plot(model, which = 1)

The red line in the residuals vs. fitted plot should be approximately horizontal if the linearity assumption is met. In our plot, there is some deviation from horizontal: a slight pattern is visible, and the line bends slightly at the extremes. However, the deviation isn’t severe enough to invalidate the linear model. So while this model isn’t perfect, it shows a strong negative relationship between socioeconomic disadvantage and college readiness, explaining about 57% of the variation in college readiness rates. The relationship is highly significant and practically meaningful.
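The residual plot is judged by eye. One way to put a number on "bends slightly at the extremes" is to add a squared DPETECOP term and compare the two nested models with an F-test via `anova()`. This is a swapped-in check, not the method used above. A small p-value would point to curvature. A large one only means these 29 points do not show strong evidence of it.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# Same model plus a squared term to allow the line to bend
model_quad <- lm(DA0CC21R ~ DPETECOP + I(DPETECOP^2), data = df_clean)

# Nested-model F-test: does the squared term reduce residual error significantly?
cmp <- anova(model, model_quad)
p_curve <- cmp[["Pr(>F)"]][2]

print(cmp)
cat("p-value for curvature:", signif(p_curve, 3), "\n")
```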

ggplot(df_clean, aes(x = DPETECOP, y = DA0CC21R)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Relationship between Socioeconomically Disadvantaged Students and College Readiness",
    x = "Percentage of Socioeconomically Disadvantaged Students",
    y = "College Readiness Rate"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

There’s a clear systematic relationship between socioeconomic status and college readiness. Schools serving more disadvantaged populations face greater challenges in preparing students for college. However, the spread of points around the fitted line suggests that some schools are more successful than others at overcoming these challenges. Perhaps these schools are in better-funded areas or benefit from stronger parental and home support.
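The schools that "overcome" the trend are the ones with the largest positive residuals. Their actual readiness rate is furthest above what their disadvantaged share predicts. The base-R sketch below ranks rows by residual and lists the top five. The data frame has no school identifiers, so rows can only be traced back by their original row number. Explanations such as funding or home support cannot be tested with these two columns alone.

```r
# Data and model as defined above, repeated so this chunk runs on its own
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10))
df_clean <- df[!is.na(df$DA0CC21R), ]
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)

# Predicted readiness and residual (actual minus predicted) for each school
df_clean$predicted <- fitted(model)
df_clean$residual  <- residuals(model)

# Largest positive residuals: schools doing better than their disadvantaged
# share alone would predict (row names are the original row numbers)
cols <- c("DPETECOP", "DA0CC21R", "predicted", "residual")
top_over <- head(df_clean[order(-df_clean$residual), cols], 5)
print(round(top_over, 1))
```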