WV County Education Outcomes Prediction

1. Why Unemployment Affects Education More Than School Funding

Created by Olivia Staud. Updated April 28, 2025 https://rpubs.com/ostaud/1304810

This analysis examines whether local job conditions or school funding is more strongly associated with student performance in West Virginia. I compared county-level test scores, government education spending data, and unemployment rates. Unemployment had a stronger link to low science scores than spending did: its correlation with proficiency (r = -0.67) implies that unemployment alone is associated with about 45% of the variation in scores across counties. The multivariable regression in Section 3 gives a more mixed picture, with an R-squared of 0.24 on the training counties.

## # A tibble: 55 × 7
##    county    school school_name    population_group subgroup science_proficiency
##    <chr>     <chr>  <chr>          <chr>            <chr>                  <dbl>
##  1 Barbour   999    Barbour Count… Total Population Total                   26.0
##  2 Berkeley  999    Berkeley Coun… Total Population Total                   28.6
##  3 Boone     999    Boone County … Total Population Total                   19.6
##  4 Braxton   999    Braxton Count… Total Population Total                   22.6
##  5 Brooke    999    Brooke County… Total Population Total                   21.1
##  6 Cabell    999    Cabell County… Total Population Total                   30.8
##  7 Calhoun   999    Calhoun Count… Total Population Total                   27.8
##  8 Clay      999    Clay County T… Total Population Total                   23.3
##  9 Doddridge 999    Doddridge Cou… Total Population Total                   31.3
## 10 Fayette   999    Fayette Count… Total Population Total                   17.4
## # ℹ 45 more rows
## # ℹ 1 more variable: proficiency <dbl>

2. Data Description

This analysis combines three datasets covering West Virginia’s 55 counties. The WV Department of Education provided science proficiency scores from 2021 standardized tests. The US Census Bureau supplied detailed school spending figures from 2022. County unemployment statistics came from the Bureau of Labor Statistics covering 2018-2022.

Data preparation included removing educational service cooperatives, standardizing county names across datasets, and calculating per-pupil metrics. Counties with any missing values were excluded from the final analysis dataset.
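The preparation steps described above can be sketched in base R as follows. This is a minimal illustration, not the code behind the report: the object names (`scores`, `finance`, `unemp`, `clean_county`) are assumptions, the rows are copied from the tables shown in this report, and the per-pupil revenue columns assume that the revenue totals are recorded in thousands of dollars. Boone is left out of the toy unemployment table only to show how a county with a missing value drops out.

```r
# Toy rows copied from the tables in this report (object names are illustrative).
scores <- data.frame(
  county = c("Barbour", "Boone", "Braxton", "Calhoun", "Clay"),
  science_proficiency = c(26.0, 19.6, 22.6, 27.8, 23.3)
)
finance <- data.frame(
  name = c("BARBOUR CO SCH DIST", "BOONE CO SCH DIST", "BRAXTON CO SCH DIST",
           "CALHOUN CO SCH DIST", "CLAY CO SCH DIST"),
  enroll  = c(2144, 3177, 1747, 861, 1669),
  tfedrev = c(7559, 8194, 5479, 3254, 6157),
  tstrev  = c(16584, 26858, 12748, 9953, 17655),
  tlocrev = c(5872, 14564, 6404, 3190, 2791),
  ppcstot = c(11885, 14663, 13153, 16085, 13825)
)
# Boone is omitted here on purpose, to demonstrate the missing-value rule.
unemp <- data.frame(
  county = c("Barbour", "Braxton", "Calhoun", "Clay"),
  unemployed = c(10.1, 14.4, 12.2, 11.2)
)

# Turn "BARBOUR CO SCH DIST" into "Barbour" so the three tables share a key.
clean_county <- function(x) {
  x <- tolower(sub(" CO SCH DIST$", "", x))
  x <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", x, perl = TRUE)  # title case
  sub("^Mc([a-z])", "Mc\\U\\1", x, perl = TRUE)              # e.g. McDowell
}
finance$county <- clean_county(finance$name)

# Per-pupil revenue by source (assumption: totals are in thousands of dollars,
# so these columns are thousands of dollars per enrolled student).
finance$pp_fed_rev   <- finance$tfedrev / finance$enroll
finance$pp_state_rev <- finance$tstrev  / finance$enroll
finance$pp_local_rev <- finance$tlocrev / finance$enroll

# Full join on county, then keep only counties with no missing values.
merged <- merge(merge(scores, finance, by = "county", all = TRUE),
                unemp, by = "county", all = TRUE)
analysis <- merged[complete.cases(merged), ]
print(analysis[, c("county", "science_proficiency", "unemployed", "pp_fed_rev")])
```

Joining with `all = TRUE` and filtering afterwards makes the exclusion explicit: a county that fails to match because of a naming difference shows up as a row of missing values rather than disappearing silently.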

Key variables:

Science proficiency: Percentage of students meeting standards (roughly 17%-40%)
Unemployment rate (unemployed): County-level unemployment percentage (roughly 5%-15%)
Per-pupil spending (ppcstot): Total spending per enrolled student (mostly $11,000-$17,000, with a few higher-spending counties such as Doddridge at $23,563)
Revenue sources: Per-pupil federal, state, and local funding amounts, derived from tfedrev, tstrev, tlocrev, and enroll

## # A tibble: 55 × 8
##    name                  enroll tfedrev tstrev tlocrev totalexp ppcstot county  
##    <chr>                  <dbl>   <dbl>  <dbl>   <dbl>    <dbl>   <dbl> <chr>   
##  1 BARBOUR CO SCH DIST     2144    7559  16584    5872    28021   11885 Barbour 
##  2 BERKELEY CO SCH DIST   19722   48407 140127   86699   264253   12704 Berkeley
##  3 BOONE CO SCH DIST       3177    8194  26858   14564    48642   14663 Boone   
##  4 BRAXTON CO SCH DIST     1747    5479  12748    6404    24417   13153 Braxton 
##  5 BROOKE CO SCH DIST      2582    6791  17114   21352    41908   15642 Brooke  
##  6 CABELL CO SCH DIST     11667   42518  88337   66699   183621   14538 Cabell  
##  7 CALHOUN CO SCH DIST      861    3254   9953    3190    15154   16085 Calhoun 
##  8 CLAY CO SCH DIST        1669    6157  17655    2791    25963   13825 Clay    
##  9 DODDRIDGE CO SCH DIST   1082    3455   3999   31752    38493   23563 Doddrid…
## 10 FAYETTE CO SCH DIST     5594   15293  51759   23477    83373   13777 Fayette 
## # ℹ 45 more rows

## # A tibble: 55 × 2
##    county   unemployed
##    <chr>         <dbl>
##  1 McDowell       15.1
##  2 Braxton        14.4
##  3 Logan          13.3
##  4 Calhoun        12.2
##  5 Roane          11.7
##  6 Clay           11.2
##  7 Mingo          11.2
##  8 Webster        11.1
##  9 Monroe         10.6
## 10 Barbour        10.1
## # ℹ 45 more rows

3. Methods

I used both unsupervised and supervised learning techniques to analyze the relationships between economic factors, funding sources, and educational outcomes.

First, I examined correlations between key variables:

The correlation matrix shows that unemployment has a strong negative relationship with proficiency scores (r = -0.67), while per-pupil spending shows only a weak positive correlation (r = 0.21). Federal revenue per pupil has a moderate negative correlation (r = -0.42), suggesting that targeted federal funding may not translate into higher test scores.
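A correlation matrix of this kind can be computed with `cor()`. The sketch below uses only the four counties that appear in all three tables printed in this report, so its values will not match the full 55-county results; the data frame name `county_data` is an assumption, and `science_proficiency` stands in for the `proficiency` column used in the models. The last lines show the arithmetic linking the reported r = -0.67 to a share of variance.

```r
# Four counties that appear in every table shown in this report.
county_data <- data.frame(
  county = c("Barbour", "Braxton", "Calhoun", "Clay"),
  science_proficiency = c(26.0, 22.6, 27.8, 23.3),
  unemployed = c(10.1, 14.4, 12.2, 11.2),
  ppcstot = c(11885, 13153, 16085, 13825)
)

# Pearson correlations between the numeric variables.
cor_matrix <- cor(county_data[, c("science_proficiency", "unemployed", "ppcstot")])
print(round(cor_matrix, 2))

# Squaring a bivariate correlation gives the share of variance in one
# variable that is linearly associated with the other.
r_reported <- -0.67            # value reported in the text for all counties
r_squared  <- r_reported^2     # 0.4489, i.e. about 45%
print(r_squared)
```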

Next, I visualized relationships with scatter plots:

For unsupervised learning, I applied k-means clustering to identify natural groupings of counties:

Summary of County Clusters
cluster  count  avg_proficiency  avg_unemployment  avg_spending
1            4            25.75              4.80      20415.75
2           20            22.01              9.86      14241.75
3           31            27.37              5.54      13843.06
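A cluster summary like the one above could be produced along the following lines. This is a hedged sketch rather than the report's actual code: the county values are simulated around the cluster averages in the table because the full dataset is not reproduced here, and the seed, the spread of the simulated values, the choice to standardize the variables, and `nstart = 25` are all assumptions.

```r
set.seed(42)  # arbitrary seed so the simulated example is repeatable

# Simulated stand-in for the 55 counties, centered on the reported cluster means.
sizes <- c(4, 20, 31)
sim <- data.frame(
  proficiency = rnorm(55, mean = rep(c(25.75, 22.01, 27.37), times = sizes), sd = 2),
  unemployed  = rnorm(55, mean = rep(c(4.80, 9.86, 5.54), times = sizes), sd = 1),
  ppcstot     = rnorm(55, mean = rep(c(20416, 14242, 13843), times = sizes), sd = 800)
)

# Standardize first: k-means uses Euclidean distance, so spending in dollars
# would otherwise dominate the percentage-scale variables.
features <- scale(sim[, c("proficiency", "unemployed", "ppcstot")])

# Three clusters; several random starts reduce the risk of a poor local optimum.
km <- kmeans(features, centers = 3, nstart = 25)
sim$cluster <- km$cluster

# Per-cluster averages on the original scale, plus the county count.
cluster_summary <- aggregate(cbind(proficiency, unemployed, ppcstot) ~ cluster,
                             data = sim, FUN = mean)
cluster_summary$count <- as.vector(table(sim$cluster))
print(cluster_summary)
```

Cluster labels from `kmeans()` are arbitrary, so cluster 1 in one run need not correspond to cluster 1 in another.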

For supervised learning, I used both linear regression and decision tree models, with train/test validation:

## 
## Call:
## lm(formula = proficiency ~ unemployed + ppcstot + pp_fed_rev + 
##     pp_state_rev + pp_local_rev, data = train_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.4234  -3.1536  -0.1643   3.5720  13.8746 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  32.941332  11.334037   2.906  0.00639 **
## unemployed   -0.071521   0.348640  -0.205  0.83868   
## ppcstot      -0.002498   0.001599  -1.563  0.12737   
## pp_fed_rev   -0.754078   1.018847  -0.740  0.46430   
## pp_state_rev  2.263180   1.465293   1.545  0.13172   
## pp_local_rev  2.474299   1.034773   2.391  0.02248 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.501 on 34 degrees of freedom
## Multiple R-squared:  0.2407, Adjusted R-squared:  0.129 
## F-statistic: 2.155 on 5 and 34 DF,  p-value: 0.08242

Model Performance on Test Data
Model              RMSE   R_squared
Linear Regression  10.64  0.17
Decision Tree       6.56  0.02
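The train/test comparison above could be set up roughly as follows. The regression summary reports 34 residual degrees of freedom with 6 estimated coefficients, which implies 40 training counties and 15 held out for testing; the sketch mirrors that split. Everything else is an assumption made for illustration: the data are simulated, the seed is arbitrary, and the test R-squared is computed as 1 - SSE/SST, which may differ from the definition used in the report.

```r
library(rpart)  # decision trees; a recommended package shipped with R

set.seed(2025)  # arbitrary seed
n <- 55

# Simulated stand-in for the county data; ranges and effect sizes are made up.
sim <- data.frame(
  unemployed   = runif(n, 4, 15),
  ppcstot      = runif(n, 11000, 24000),
  pp_fed_rev   = runif(n, 1, 5),
  pp_state_rev = runif(n, 3, 12),
  pp_local_rev = runif(n, 1, 30)
)
sim$proficiency <- 30 - 0.5 * sim$unemployed + rnorm(n, sd = 5)

# 40 training counties and 15 test counties.
train_idx  <- sample(n, 40)
train_data <- sim[train_idx, ]
test_data  <- sim[-train_idx, ]

model_formula <- proficiency ~ unemployed + ppcstot + pp_fed_rev +
  pp_state_rev + pp_local_rev
lm_fit   <- lm(model_formula, data = train_data)
tree_fit <- rpart(model_formula, data = train_data, method = "anova")

# Out-of-sample error metrics for a fitted model.
evaluate <- function(fit, newdata) {
  pred   <- predict(fit, newdata = newdata)
  actual <- newdata$proficiency
  c(RMSE = sqrt(mean((actual - pred)^2)),
    R_squared = 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2))
}

performance <- rbind(
  "Linear Regression" = evaluate(lm_fit, test_data),
  "Decision Tree"     = evaluate(tree_fit, test_data)
)
print(round(performance, 2))
```

With only 15 test counties, both metrics vary considerably from one random split to another, so repeating the split or using cross-validation would give a steadier comparison.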

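The regression output does not single out unemployment as the strongest predictor once the funding variables are included. In the fitted model, the unemployment coefficient is -0.07 (p = 0.84), so each 1-point increase in the unemployment rate is associated with a decrease of only about 0.07 percentage points in proficiency, and the estimate is not statistically distinguishable from zero. The only predictor significant at the 5% level is local revenue per pupil (estimate 2.47, p = 0.022). The model has an R-squared of 0.24 (adjusted 0.13), meaning it explains about 24% of the variation in proficiency among the 40 training counties, and the overall F-test falls short of significance at the 5% level (p = 0.082). The 45% figure is consistent with the bivariate correlation between unemployment and proficiency (0.67 squared is about 0.45), not with this model.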
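The fit statistics in the regression summary can be cross-checked from the reported R-squared alone. The short calculation below uses only numbers printed in the summary (R-squared of 0.2407, 5 predictors, 34 residual degrees of freedom); the variable names are illustrative.

```r
r2          <- 0.2407   # Multiple R-squared from the summary
p           <- 5        # number of predictors
df_residual <- 34       # residual degrees of freedom from the summary
n_train     <- df_residual + p + 1   # 40 training counties

# Adjusted R-squared penalizes the fit for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n_train - 1) / df_residual

# Overall F-statistic and its p-value for the null of no linear relationship.
f_stat  <- (r2 / p) / ((1 - r2) / df_residual)
p_value <- pf(f_stat, p, df_residual, lower.tail = FALSE)

print(round(c(n_train = n_train, adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value), 4))
```

The results agree with the summary up to rounding of the reported R-squared: an adjusted R-squared of about 0.129 and an F-statistic of about 2.16 on 5 and 34 degrees of freedom.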

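In the decision tree model, unemployment is the primary splitting variable: counties with unemployment rates below 10.2% typically have higher proficiency scores regardless of their spending levels. On the test data, however, the tree explains very little of the variation in scores (R-squared of 0.02), so this split should be read as descriptive rather than predictive.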

4. Limitations

This analysis has several important constraints that affect interpretation:

The data come from different time periods (2021 for assessment data, 2022 for spending, and 2018-2022 for unemployment), potentially creating temporal misalignment that could affect the observed relationships.
County-level aggregation masks school-to-school differences within counties. Some counties may have both high and low-performing schools that aren’t captured in this analysis.
The analysis doesn’t account for non-economic demographic factors such as parental education levels, family structure, and healthcare access that likely influence educational outcomes.
While strong correlations were identified, this study cannot establish causation. Both unemployment and educational outcomes could be influenced by underlying historical or structural factors.
The dataset is limited to 55 counties, which restricts the statistical power of the analysis and may make it difficult to detect more subtle relationships.
The k-means clustering algorithm is sensitive to outliers, which could affect the county groupings identified in the unsupervised learning portion of the analysis.

Future research should incorporate school-level data, additional socioeconomic indicators, and longitudinal analysis to better understand the complex relationships between economic conditions and educational outcomes.

5. References

Sources included:

West Virginia Department of Education. (2021). Historical Assessment Results SY15 to SY21. Retrieved from provided dataset.
U.S. Census Bureau. (2022). Annual Survey of School System Finances (Form F-33). Retrieved from provided dataset.
U.S. Bureau of Labor Statistics. (2022). Unemployment rates by county in West Virginia. Retrieved from provided dataset.

Claude AI assisted with:

Setting up the R Markdown document with echo=FALSE to hide code in the published output
Implementing the correlation visualization with ggcorrplot
Suggesting the appropriate syntax for the usmap visualization
Helping with the implementation of k-means clustering for county grouping