The quality of public education is a cornerstone of societal development, influencing individual success and community well-being. This project analyzes public school data in West Virginia, a state with unique demographic, economic, and educational challenges. It aims to uncover trends and correlations between student achievement, financial investment in education, and socioeconomic factors across West Virginia counties. More specifically, I want to see whether county unemployment rates are associated with proficiency scores.
The analysis uses the following data:
Assessment Data: Proficiency scores by county and by school. For some parts of this project, students are grouped by age to see how different age groups respond to unemployment.
Demographic Data: The unemployment rate of each county.
Children are always affected by the world around them, and learning early in school is crucial to reaching their full potential later in life. I think that unemployment plays a role in how well schools perform on standardized tests, and I expect scores to be worse where unemployment is higher.
The correlation matrix shows strong positive correlations among the revenue variables, which makes sense. There is also a fairly strong positive correlation among the testing proficiencies, which also adds up. One thing that surprised me is how low the correlation between revenues and proficiencies is: it is slightly positive, but not strong enough to work with here. Instead, we will look at unemployment and median income and how those two relate to testing scores. As shown above, enrollment and unemployment rate are negatively correlated across counties, meaning that counties with lower enrollment tend to have higher unemployment. This could be for a few reasons:
People cannot afford to send their children to school.
People who are unemployed may not prioritize education for their children and may ask them to help pay the bills.
Education may be poor in these areas, making it hard to find a job after graduation.
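The correlations discussed above could be computed along the following lines. This is a minimal sketch on simulated data: the `counties` data frame, its column names, and every value in it are assumptions made up for illustration, not the real West Virginia figures.

```r
# Simulated county-level data; every column and value here is hypothetical
# and only stands in for the real dataset used in this report.
set.seed(1)
n <- 55  # West Virginia has 55 counties
unemployed <- runif(n, min = 3, max = 12)  # unemployment rate in percent

# Enrollment is deliberately built to fall as unemployment rises
enrollment <- 9000 - 500 * unemployed + rnorm(n, sd = 800)
math_proficiency <- 40 - 0.5 * unemployed + rnorm(n, sd = 6)

counties <- data.frame(unemployed, enrollment, math_proficiency)

# Pairwise Pearson correlations between all numeric columns
corr_matrix <- cor(counties, use = "complete.obs")
print(round(corr_matrix, 2))

# Test whether one specific correlation differs from zero
enroll_test <- cor.test(counties$enrollment, counties$unemployed)
print(enroll_test$estimate)
print(enroll_test$p.value)
```

A negative entry in the `enrollment` row and `unemployed` column only says the two move in opposite directions across counties. It cannot tell the three explanations in the list apart.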
However, we want to look at how unemployment affects children. If children are helping to pay the bills, that should show up as lower standardized testing scores for high school students. Let’s look into that next.
This graph shows an interesting pattern: according to the trend lines, middle schoolers are more likely to be proficient in science than high schoolers, while high schoolers tend to be more proficient in math. West Virginia schedules its standardized testing so that science is taken in grades 5, 8, and 11, or every third year. Math and writing, on the other hand, are tested in grades 3-8 and again in grade 11.
##
## Call:
## lm(formula = math_proficiency ~ unemployed + reading_proficiency +
##     science_proficiency, data = middle_schools)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17.2696  -4.0279   0.3458   3.9211  13.6862
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         -7.10783    3.43424  -2.070   0.0407 *
## unemployed          -0.05161    0.24905  -0.207   0.8362
## reading_proficiency  0.64358    0.08700   7.397 2.21e-11 ***
## science_proficiency  0.20821    0.08450   2.464   0.0152 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.386 on 118 degrees of freedom
## Multiple R-squared: 0.5653, Adjusted R-squared: 0.5543
## F-statistic: 51.16 on 3 and 118 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = math_proficiency ~ unemployed + reading_proficiency +
##     science_proficiency, data = high_schools)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.7650  -3.8178   0.3681   3.5141  14.4111
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         -3.25929    3.15593  -1.033   0.3041
## unemployed          -0.07868    0.20138  -0.391   0.6968
## reading_proficiency  0.22430    0.10138   2.212   0.0291 *
## science_proficiency  0.59225    0.12263   4.830 4.72e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.38 on 104 degrees of freedom
## Multiple R-squared: 0.6758, Adjusted R-squared: 0.6664
## F-statistic: 72.26 on 3 and 104 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = reading_proficiency ~ unemployed + math_proficiency +
##     science_proficiency, data = middle_schools)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -16.132  -3.561   0.155   2.691  18.387
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         21.11279    2.36029   8.945 6.21e-15 ***
## unemployed          -0.28166    0.21630  -1.302    0.195
## math_proficiency     0.49227    0.06655   7.397 2.21e-11 ***
## science_proficiency  0.32667    0.06956   4.696 7.22e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.585 on 118 degrees of freedom
## Multiple R-squared: 0.6217, Adjusted R-squared: 0.6121
## F-statistic: 64.65 on 3 and 118 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = reading_proficiency ~ unemployed + math_proficiency +
##     science_proficiency, data = high_schools)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.7395  -3.2221  -0.0972   2.8832  15.4264
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         20.62494    2.21357   9.318 2.25e-15 ***
## unemployed          -0.25092    0.18890  -1.328   0.1870
## math_proficiency     0.20040    0.09058   2.212   0.0291 *
## science_proficiency  0.88488    0.09445   9.369 1.73e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.085 on 104 degrees of freedom
## Multiple R-squared: 0.7915, Adjusted R-squared: 0.7855
## F-statistic: 131.6 on 3 and 104 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = science_proficiency ~ unemployed + reading_proficiency +
##     math_proficiency, data = middle_schools)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.502  -4.608  -1.410   3.251  23.629
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)          2.41031    3.70758   0.650   0.5169
## unemployed          -0.03258    0.26462  -0.123   0.9022
## reading_proficiency  0.48206    0.10265   4.696 7.22e-06 ***
## math_proficiency     0.23501    0.09538   2.464   0.0152 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.784 on 118 degrees of freedom
## Multiple R-squared: 0.4602, Adjusted R-squared: 0.4465
## F-statistic: 33.54 on 3 and 118 DF, p-value: 9.485e-16
##
## Call:
## lm(formula = science_proficiency ~ unemployed + reading_proficiency +
##     math_proficiency, data = high_schools)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.3218  -1.9418   0.2942   2.5901  11.6788
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         -4.94909    2.24046  -2.209   0.0294 *
## unemployed          -0.01227    0.14564  -0.084   0.9330
## reading_proficiency  0.51726    0.05521   9.369 1.73e-15 ***
## math_proficiency     0.30932    0.06405   4.830 4.72e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.888 on 104 degrees of freedom
## Multiple R-squared: 0.8182, Adjusted R-squared: 0.813
## F-statistic: 156.1 on 3 and 104 DF, p-value: < 2.2e-16
Each model uses one proficiency as the dependent variable and is fit separately for middle school and high school students. Some of the models fit better than others, and there are some trends to uncover here. Unemployment is not statistically significant in any of the six models (the smallest p-value is 0.187), although its coefficient is negative in every one. The R-squared values are noticeably higher for high schools than for middle schools (for example, 0.82 versus 0.46 for science). This could be for a few reasons:
Some middle schools include elementary grades in their data. The building a student attends depends on where they live: some elementary schools in West Virginia end in third grade, while others run through sixth grade. This may account for some of the variance in middle school testing.
Data from high schools may be more complete and concise than the potentially more extensive middle school data.
High school students may take standardized testing more seriously because it serves as a benchmark for college acceptance.
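The comparison between school levels can be sketched by fitting one formula to both data sets and collecting the unemployment coefficient and R-squared from each fit. The example below uses simulated data: `simulate_schools()` is a hypothetical helper, and its values are invented, so the printed numbers will not match the summaries above. Only the sample sizes (122 and 108 schools, implied by the reported residual degrees of freedom plus four estimated coefficients) and the variable names come from the output.

```r
set.seed(42)

# Hypothetical helper that simulates school-level data; the real
# middle_schools and high_schools data frames are not reproduced here.
simulate_schools <- function(n) {
  ability <- rnorm(n, mean = 40, sd = 8)  # shared driver of all three scores
  data.frame(
    unemployed = runif(n, min = 3, max = 12),
    math_proficiency = ability - 5 + rnorm(n, sd = 5),
    reading_proficiency = ability + 5 + rnorm(n, sd = 5),
    science_proficiency = ability - 10 + rnorm(n, sd = 5)
  )
}

# 118 + 4 = 122 middle schools and 104 + 4 = 108 high schools
school_data <- list(
  middle_schools = simulate_schools(122),
  high_schools = simulate_schools(108)
)

model_formula <- math_proficiency ~ unemployed + reading_proficiency +
  science_proficiency

# Fit the same formula to each school level and keep the key numbers
results <- do.call(rbind, lapply(names(school_data), function(level) {
  fit <- summary(lm(model_formula, data = school_data[[level]]))
  data.frame(
    level = level,
    unemployed_estimate = fit$coefficients["unemployed", "Estimate"],
    unemployed_p_value = fit$coefficients["unemployed", "Pr(>|t|)"],
    r_squared = fit$r.squared,
    residual_df = fit$df[2]
  )
}))
print(results)
```

Putting the two fits in one table makes the R-squared gap and the unemployment p-values easier to compare than reading six separate summaries. Because the other two proficiencies are predictors in every model, the unemployment coefficient measures its association with the outcome after the other scores are already accounted for.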
## Importance of first k=2 (out of 4) components:
##                           PC1    PC2
## Standard deviation     1.5791 0.9736
## Proportion of Variance 0.6234 0.2370
## Cumulative Proportion  0.6234 0.8604
## Standard deviations (1, .., p=4):
## [1] 1.5790856 0.9736123 0.5507143 0.5052538
##
## Rotation (n x k) = (4 x 2):
##                            PC1         PC2
## science_proficiency  0.5667548 -0.14152212
## math_proficiency     0.5637386 -0.04662995
## reading_proficiency  0.5700716 -0.13906040
## unemployed          -0.1897529 -0.97900937
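PC1 most likely reflects overall proficiency: the science, math, and reading loadings are all high (about 0.56 to 0.57) and nearly equal, while unemployment loads only weakly (-0.19).
PC2 most likely reflects unemployment, which dominates it with a loading of -0.98.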
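The variance proportions in the PCA output follow directly from the reported standard deviations, as the short check below shows. The numbers are copied from the output above. The header format suggests the analysis was run with something like `prcomp(..., scale. = TRUE, rank. = 2)`, but that call is an assumption, since the report does not show it.

```r
# Standard deviations of the four components, copied from the PCA output
sdev <- c(1.5790856, 0.9736123, 0.5507143, 0.5052538)

# Each component's variance is its squared standard deviation; the
# proportion of variance is that value divided by the total variance.
variance <- sdev^2
prop_var <- variance / sum(variance)
cum_prop <- cumsum(prop_var)

print(round(prop_var[1:2], 4))  # proportions for PC1 and PC2
print(round(cum_prop[2], 4))    # cumulative proportion after two components

# PC2 loadings, copied from the rotation matrix. Squared loadings sum to 1,
# so each one is that variable's share of the component.
pc2_loadings <- c(
  science_proficiency = -0.14152212,
  math_proficiency = -0.04662995,
  reading_proficiency = -0.13906040,
  unemployed = -0.97900937
)
print(round(pc2_loadings^2, 3))
```

The total variance comes out very close to 4, which is what four standardized variables would give. The squared loadings show that unemployment accounts for nearly all of PC2, which supports reading PC2 as an unemployment axis.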
The tree tells a similar story. It predicts math proficiency using unemployment, enrollment, and the other testing scores. The jump at the end is not as big in percentage terms: only a 9% increase from the second-highest bucket to the highest proficiency bucket.
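A regression tree of this kind could be fit as in the sketch below. The report does not say which package or settings produced the tree, so the use of `rpart`, the simulated `schools` data frame, and all of its values are assumptions for illustration only.

```r
library(rpart)  # assumption: the report does not name its tree package

# Simulated school-level data; all values are hypothetical
set.seed(7)
n <- 200
schools <- data.frame(
  unemployed = runif(n, min = 3, max = 12),
  enrollment = round(runif(n, min = 100, max = 1500)),
  reading_proficiency = rnorm(n, mean = 45, sd = 8)
)
schools$science_proficiency <- 0.8 * schools$reading_proficiency - 10 +
  rnorm(n, sd = 4)
schools$math_proficiency <- 0.6 * schools$reading_proficiency + 5 +
  rnorm(n, sd = 4)

# Regression tree: predict math proficiency from the other variables
tree_fit <- rpart(
  math_proficiency ~ unemployed + enrollment + reading_proficiency +
    science_proficiency,
  data = schools,
  method = "anova"
)

# Each leaf ("bucket") predicts the mean math proficiency of its schools
leaves <- tree_fit$frame[tree_fit$frame$var == "<leaf>", c("n", "yval")]
leaves <- leaves[order(leaves$yval), ]
print(leaves)

# Gap between the two highest-scoring buckets
top_two <- tail(leaves$yval, 2)
print(diff(top_two))

# Which predictors the tree relied on most
print(tree_fit$variable.importance)
```

Sorting the leaves by predicted value is one way to read off the gap between the top two buckets. The variable importance scores show how much the tree actually relied on unemployment compared with the other test scores.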
I have a few recommendations based on the analysis of this project.
Although I did not use enrollment in any of my regression models, there is a negative correlation between student enrollment and unemployment. Some students in West Virginia are limited by factors outside of their control, such as transportation or severe weather. Some students do not have a bus route, or the bus takes far too long to pick them up. Students who live in heavily rural areas may not be able to get through the snow to reach school on some days.
Unemployment may not have the strongest relationship with testing scores, but the relationship is still somewhat negative: the unemployment coefficient is below zero in all six models, even though none is statistically significant. The lower the unemployment rate, and the less struggle young students see around them, the better they tend to do.
Test scores in West Virginia middle schools are not the best across the board. On top of that, schools that test well get more money than schools that don’t test well. This is flawed because it boosts the students who are already doing well, when the students who are doing poorly are the ones who really need the help rather than being left to continue struggling.
Dr. Garrett’s Class Notes
ChatGPT for some Visualization Features