library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- data.frame(
  DPETECOP = c(91.9, 86.7, 73.8, 90.1, 82.5, 60.2, 96.9, 90.3, 79.9, 83.7, 90.7,
               60.1, 86.1, 63.6, 64.6, 46.2, 8.1, 22.3, 17.8, 87.1, 80.9, 56.1,
               57.6, 97.3, 86.7, 19.5, 88.3, 89.0, 3.7, 88.0, 90.3, 84.7, 59.3,
               66.4, 81.8, 13.5, 25.7, 48.9, 68.7, 86.1),
  DA0CC21R = c(NA, 0, 25, -1, 19.4, 12.5, NA, NA, 7.9, 7.8, NA, 48.9, 34.9, 14.5,
               24, 25, 97.7, 68.1, NA, NA, NA, NA, NA, NA, NA, 66.7, 16.1, 13.2,
               35, 11.6, 8.8, 12.4, 51.2, 35.8, 9.2, 34, 32.3, 35.6, 16.9, 10)
)
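# Keep only rows with an observed college readiness value
df_clean <- df[!is.na(df$DA0CC21R), ]
# Regress college readiness on the share of disadvantaged students
model <- lm(DA0CC21R ~ DPETECOP, data = df_clean)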
summary(model)
##
## Call:
## lm(formula = DA0CC21R ~ DPETECOP, data = df_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.887 -10.962 -1.040 5.053 37.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.1533 6.9964 9.312 6.40e-10 ***
## DPETECOP -0.6126 0.1022 -5.996 2.14e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15 on 27 degrees of freedom
## Multiple R-squared: 0.5711, Adjusted R-squared: 0.5552
## F-statistic: 35.95 on 1 and 27 DF, p-value: 2.139e-06
The R-squared value is 0.5711. This means approximately 57.11% of the variance in college readiness (DA0CC21R) is explained by the percentage of socioeconomically disadvantaged students (DPETECOP). This suggests that socioeconomic status is an important predictor of college readiness.
P-values and Significance: Overall model p-value: 2.139e-06. This is well below the typical 0.05 threshold, indicating that the model is statistically significant.
DPETECOP p-value: 2.14e-06. Highly significant; we can reject the null hypothesis that there is no linear relationship between socioeconomic status and college readiness. With a single predictor, this t-test is equivalent to the overall F-test, which is why the two p-values agree.
DPETECOP Estimate (Beta Coefficient): -0.6126. This means that for every 1 percentage point increase in socioeconomically disadvantaged students, the predicted college readiness rate decreases by about 0.61 percentage points.
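To keep the figures quoted in the text in step with the fitted model, they can be read from the summary object instead of being typed by hand. The following is a minimal sketch, assuming the `model` object fitted above; the helper name `fit_stats` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: collect the headline statistics of a simple linear model.
fit_stats <- function(model) {
  s <- summary(model)
  coefs <- coef(s)   # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  list(
    r_squared = s$r.squared,
    slope     = coefs[2, "Estimate"],
    slope_p   = coefs[2, "Pr(>|t|)"],
    # Overall F-test p-value; summary() prints it but does not store it
    model_p   = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    # 95% confidence interval for the slope
    slope_ci  = confint(model)[2, ]
  )
}

# Usage with the model fitted above:
# fit_stats(model)
```

Reporting the confidence interval alongside the point estimate gives a sense of how precisely the slope is estimated from only 29 observations.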
plot(model, which = 1)
The red line in the residuals vs fitted plot should be approximately
horizontal if the linearity assumption is met. In our plot, there is
some deviation from horizontal: a slight pattern is visible, and the
relationship bends slightly at the extremes. However, the deviation
isn’t severe enough to invalidate the linear model. While the model
isn’t perfect, it shows a clear negative relationship between
socioeconomic disadvantage and college readiness, explaining about 57%
of the variation in college readiness rates. The relationship is highly
significant and practically meaningful.
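The visual check can be complemented by a numeric one. The sketch below compares the straight-line fit with a fit that adds a squared term, which is one common way to probe for curvature and not a method used in the analysis above. It assumes the `df_clean` data frame from earlier; the function name `curvature_test` is hypothetical.

```r
# Hypothetical helper: F-test of a linear fit against a quadratic fit.
curvature_test <- function(data, response, predictor) {
  linear_formula <- reformulate(predictor, response = response)
  quadratic_formula <- reformulate(
    c(predictor, sprintf("I(%s^2)", predictor)),
    response = response
  )
  linear    <- lm(linear_formula, data = data)
  quadratic <- lm(quadratic_formula, data = data)
  # Row 2 of the returned table tests whether the squared term improves the fit
  anova(linear, quadratic)
}

# Usage with the data above:
# curvature_test(df_clean, "DA0CC21R", "DPETECOP")
```

A small `Pr(>F)` value in the second row would point to curvature that the straight line misses; a large one would be consistent with the linear model being adequate.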
ggplot(df_clean, aes(x = DPETECOP, y = DA0CC21R)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Relationship between Socioeconomically Disadvantaged Students and College Readiness",
    x = "Percentage of Socioeconomically Disadvantaged Students",
    y = "College Readiness Rate"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
There’s a clear systemic relationship between socioeconomic status and
college readiness. Schools serving more disadvantaged populations face
greater challenges in preparing students for college. However, the spread
of points suggests that some schools are more successful than others at
overcoming these challenges. Perhaps these schools are in better-funded
areas or have stronger parental and home support.
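One way to see which schools do better or worse than their share of disadvantaged students would predict is to rank the residuals of the fitted line. This is a minimal sketch, assuming `model` and `df_clean` from above; `rank_by_residual` is a hypothetical helper, and it assumes `data` holds exactly the rows used to fit `model`.

```r
# Hypothetical helper: attach fitted values and residuals, largest residual first.
# Assumes `data` contains exactly the rows used to fit `model` (no rows dropped).
rank_by_residual <- function(model, data) {
  out <- data
  out$fitted   <- fitted(model)
  out$residual <- residuals(model)  # observed minus predicted
  out[order(out$residual, decreasing = TRUE), ]
}

# Usage with the objects above:
# head(rank_by_residual(model, df_clean), 5)  # schools furthest above the line
# tail(rank_by_residual(model, df_clean), 5)  # schools furthest below the line
```

Schools at the top of the ranking have higher college readiness than the line predicts, which makes them natural candidates for a closer look at funding or home support.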