1. Why Unemployment Affects Education More Than School Funding

Created by Olivia Staud. Updated April 28, 2025 https://rpubs.com/ostaud/1304810

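This analysis examines whether local job conditions or school funding have a bigger effect on student performance in West Virginia. I compared county-level science test scores, government education spending data, and unemployment rates. Unemployment had a stronger link to low science scores than spending did. Its correlation with proficiency (r = -0.67) means unemployment on its own accounts for about 45% of the variation in scores across counties, compared with about 4% for per-pupil spending (r = 0.21). A multiple regression that also included the funding variables did not isolate unemployment as a significant predictor, so the relationship is best read as a strong association.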

## # A tibble: 55 × 7
##    county    school school_name    population_group subgroup science_proficiency
##    <chr>     <chr>  <chr>          <chr>            <chr>                  <dbl>
##  1 Barbour   999    Barbour Count… Total Population Total                   26.0
##  2 Berkeley  999    Berkeley Coun… Total Population Total                   28.6
##  3 Boone     999    Boone County … Total Population Total                   19.6
##  4 Braxton   999    Braxton Count… Total Population Total                   22.6
##  5 Brooke    999    Brooke County… Total Population Total                   21.1
##  6 Cabell    999    Cabell County… Total Population Total                   30.8
##  7 Calhoun   999    Calhoun Count… Total Population Total                   27.8
##  8 Clay      999    Clay County T… Total Population Total                   23.3
##  9 Doddridge 999    Doddridge Cou… Total Population Total                   31.3
## 10 Fayette   999    Fayette Count… Total Population Total                   17.4
## # ℹ 45 more rows
## # ℹ 1 more variable: proficiency <dbl>

2. Data Description

This analysis combines three datasets covering West Virginia's 55 counties. The WV Department of Education provided science proficiency scores from 2021 standardized tests. The US Census Bureau supplied detailed school spending figures from 2022. County unemployment statistics for 2018-2022 came from the Bureau of Labor Statistics.

Data preparation included removing educational service cooperatives, standardizing county names across datasets, and calculating per-pupil metrics. Counties with any missing values were excluded from the final analysis dataset.
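The preparation steps above can be sketched in base R as follows. This is a minimal illustration, not the author's hidden code. It uses four counties whose values appear in the printed tibbles, plus a hypothetical cooperative row (`EXAMPLE SERVICE COOPERATIVE`) to show the filtering step, and it assumes cooperatives can be recognized because their names lack the ` CO SCH DIST` suffix. Census revenue totals are assumed to be in thousands of dollars, so the per-pupil revenue columns are in thousands of dollars per pupil.

```r
# Four counties copied from the printed tibbles, plus one HYPOTHETICAL
# cooperative row that the filter should remove.
finance <- data.frame(
  name    = c("BARBOUR CO SCH DIST", "BRAXTON CO SCH DIST",
              "CALHOUN CO SCH DIST", "CLAY CO SCH DIST",
              "EXAMPLE SERVICE COOPERATIVE"),
  enroll  = c(2144, 1747, 861, 1669, NA),
  tfedrev = c(7559, 5479, 3254, 6157, NA),
  tstrev  = c(16584, 12748, 9953, 17655, NA),
  tlocrev = c(5872, 6404, 3190, 2791, NA),
  ppcstot = c(11885, 13153, 16085, 13825, NA)
)
scores <- data.frame(county = c("Barbour", "Braxton", "Calhoun", "Clay"),
                     science_proficiency = c(26.0, 22.6, 27.8, 23.3))
unemp  <- data.frame(county = c("Clay", "Calhoun", "Braxton", "Barbour"),
                     unemployed = c(11.2, 12.2, 14.4, 10.1))

# Drop educational service cooperatives (ASSUMPTION: only county districts
# end in " CO SCH DIST").
finance <- finance[grepl(" CO SCH DIST$", finance$name), ]

# Standardize county names to one upper-case join key, so "BARBOUR CO SCH DIST"
# matches "Barbour" (and "MCDOWELL" would match "McDowell").
finance$key <- sub(" CO SCH DIST$", "", finance$name)
scores$key  <- toupper(scores$county)
unemp$key   <- toupper(unemp$county)

# Per-pupil revenue by source (thousands of dollars per pupil).
finance$pp_fed_rev   <- finance$tfedrev / finance$enroll
finance$pp_state_rev <- finance$tstrev  / finance$enroll
finance$pp_local_rev <- finance$tlocrev / finance$enroll

# Join the three sources on the key, then drop counties with missing values.
analysis <- merge(scores, unemp[, c("key", "unemployed")], by = "key")
analysis <- merge(analysis,
                  finance[, c("key", "ppcstot", "pp_fed_rev",
                              "pp_state_rev", "pp_local_rev")],
                  by = "key")
analysis <- na.omit(analysis)
analysis
```

Joining on an upper-case key rather than on the raw names avoids silent mismatches from capitalization differences between the Census and BLS files.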

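Key variables: enroll is enrollment; tfedrev, tstrev, and tlocrev are total federal, state, and local revenue, and totalexp is total expenditure (Census totals, in thousands of dollars); ppcstot is current spending per pupil in dollars; unemployed is the county unemployment rate in percent.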

## # A tibble: 55 × 8
##    name                  enroll tfedrev tstrev tlocrev totalexp ppcstot county  
##    <chr>                  <dbl>   <dbl>  <dbl>   <dbl>    <dbl>   <dbl> <chr>   
##  1 BARBOUR CO SCH DIST     2144    7559  16584    5872    28021   11885 Barbour 
##  2 BERKELEY CO SCH DIST   19722   48407 140127   86699   264253   12704 Berkeley
##  3 BOONE CO SCH DIST       3177    8194  26858   14564    48642   14663 Boone   
##  4 BRAXTON CO SCH DIST     1747    5479  12748    6404    24417   13153 Braxton 
##  5 BROOKE CO SCH DIST      2582    6791  17114   21352    41908   15642 Brooke  
##  6 CABELL CO SCH DIST     11667   42518  88337   66699   183621   14538 Cabell  
##  7 CALHOUN CO SCH DIST      861    3254   9953    3190    15154   16085 Calhoun 
##  8 CLAY CO SCH DIST        1669    6157  17655    2791    25963   13825 Clay    
##  9 DODDRIDGE CO SCH DIST   1082    3455   3999   31752    38493   23563 Doddrid…
## 10 FAYETTE CO SCH DIST     5594   15293  51759   23477    83373   13777 Fayette 
## # ℹ 45 more rows
## # A tibble: 55 × 2
##    county   unemployed
##    <chr>         <dbl>
##  1 McDowell       15.1
##  2 Braxton        14.4
##  3 Logan          13.3
##  4 Calhoun        12.2
##  5 Roane          11.7
##  6 Clay           11.2
##  7 Mingo          11.2
##  8 Webster        11.1
##  9 Monroe         10.6
## 10 Barbour        10.1
## # ℹ 45 more rows

3. Methods

I used both unsupervised and supervised learning techniques to analyze the relationships between economic factors, funding sources, and educational outcomes.

First, I examined correlations between key variables:

The correlation matrix shows that unemployment has a strong negative relationship with proficiency scores (r = -0.67), while per-pupil spending has only a weak positive correlation (r = 0.21). Federal revenue per pupil has a moderate negative correlation (r = -0.42). This suggests that targeted federal funding does not by itself translate into higher test scores, although because such funding is directed toward higher-need districts, the negative sign may reflect need rather than an effect of the funding.
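These correlations translate directly into shares of variance explained. For a single predictor, a linear model's R-squared equals the squared correlation, which is where a figure of about 45% for unemployment comes from. The sketch below does that arithmetic in R and then checks the identity on synthetic data; the simulated slope and noise level are arbitrary assumptions, not estimates from the county data.

```r
# Correlations with proficiency reported in the text.
r <- c(unemployed = -0.67, ppcstot = 0.21, pp_fed_rev = -0.42)

# With a single predictor, the share of variance explained is r squared:
# about 0.45 for unemployment, 0.04 for spending, 0.18 for federal revenue.
r_squared <- r^2
round(r_squared, 2)

# Check the identity R^2 = r^2 for a one-predictor linear model on
# SYNTHETIC data (not the West Virginia counties).
set.seed(1)
x <- runif(55, 3, 15)
y <- 35 - 0.9 * x + rnorm(55, sd = 4)
fit <- lm(y ~ x)
all.equal(summary(fit)$r.squared, cor(x, y)^2)  # TRUE
```

The identity holds only for one predictor; once several predictors enter a model, the combined R-squared is no longer the sum of the individual squared correlations.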

Next, I visualized relationships with scatter plots:

For unsupervised learning, I applied k-means clustering to identify natural groupings of counties:

Summary of County Clusters
Cluster  Counties  Avg. proficiency (%)  Avg. unemployment (%)  Avg. per-pupil spending ($)
1        4         25.75                 4.80                   20,416
2        20        22.01                 9.86                   14,242
3        31        27.37                 5.54                   13,843
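A cluster summary of this shape can be produced with base R's `kmeans()` followed by `aggregate()`. The sketch below runs on synthetic stand-in data for 55 counties. The choice of features, the standardization with `scale()`, and `nstart = 25` are assumptions, since the report does not show which settings were used.

```r
set.seed(42)
# SYNTHETIC stand-in for the 55-county analysis table.
counties <- data.frame(
  proficiency = runif(55, 15, 35),
  unemployed  = runif(55, 3, 15),
  ppcstot     = runif(55, 11000, 24000)
)

# Standardize so per-pupil dollars do not dominate the Euclidean distance.
features <- scale(counties[, c("proficiency", "unemployed", "ppcstot")])

# k-means with 3 clusters; several random starts reduce the risk of a
# poor local optimum.
km <- kmeans(features, centers = 3, nstart = 25)
counties$cluster <- km$cluster

# Per-cluster means on the original (unscaled) variables, plus cluster sizes.
summary_tbl <- aggregate(counties[, c("proficiency", "unemployed", "ppcstot")],
                         by = list(cluster = counties$cluster), FUN = mean)
summary_tbl$count <- km$size[summary_tbl$cluster]
summary_tbl
```

Without `scale()`, spending (in the thousands of dollars) would dominate the distance calculation and the clusters would largely reflect spending alone.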

For supervised learning, I used both linear regression and decision tree models, with train/test validation:

## 
## Call:
## lm(formula = proficiency ~ unemployed + ppcstot + pp_fed_rev + 
##     pp_state_rev + pp_local_rev, data = train_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.4234  -3.1536  -0.1643   3.5720  13.8746 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  32.941332  11.334037   2.906  0.00639 **
## unemployed   -0.071521   0.348640  -0.205  0.83868   
## ppcstot      -0.002498   0.001599  -1.563  0.12737   
## pp_fed_rev   -0.754078   1.018847  -0.740  0.46430   
## pp_state_rev  2.263180   1.465293   1.545  0.13172   
## pp_local_rev  2.474299   1.034773   2.391  0.02248 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.501 on 34 degrees of freedom
## Multiple R-squared:  0.2407, Adjusted R-squared:  0.129 
## F-statistic: 2.155 on 5 and 34 DF,  p-value: 0.08242

Model Performance on Test Data
Model              RMSE   R-squared
Linear regression  10.64  0.17
Decision tree      6.56   0.02
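The train/test comparison can be sketched as follows. The regression output implies 40 training counties (34 residual degrees of freedom plus 6 coefficients), which leaves 15 for testing; the data below are synthetic and the random split is an assumption. The sketch computes R-squared two ways because, if both models were scored on the same held-out counties, the table is hard to reconcile with the usual 1 - SSE/SST definition. Under that definition a lower RMSE always gives a higher R-squared, yet the decision tree has both the lower RMSE and the lower R-squared. The squared correlation between predictions and observations does not have that property, so it may be what the table reports.

```r
set.seed(123)
n <- 55
# SYNTHETIC county data using the column names from the model formula.
dat <- data.frame(
  unemployed   = runif(n, 3, 15),
  ppcstot      = runif(n, 11000, 24000),
  pp_fed_rev   = runif(n, 1.5, 4.5),
  pp_state_rev = runif(n, 5, 12),
  pp_local_rev = runif(n, 1, 30)
)
dat$proficiency <- 35 - 0.8 * dat$unemployed + rnorm(n, sd = 4)

# Random 40/15 split of counties.
train_idx  <- sample(seq_len(n), 40)
train_data <- dat[train_idx, ]
test_data  <- dat[-train_idx, ]

fit <- lm(proficiency ~ unemployed + ppcstot + pp_fed_rev +
            pp_state_rev + pp_local_rev, data = train_data)
pred <- predict(fit, newdata = test_data)
obs  <- test_data$proficiency

rmse     <- sqrt(mean((obs - pred)^2))
r2_cor   <- cor(obs, pred)^2                                   # squared correlation
r2_skill <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2) # 1 - SSE/SST
c(RMSE = rmse, R2_cor = r2_cor, R2_skill = r2_skill)
```

With only 15 test counties, both metrics can shift noticeably with a different random split, so repeated splits or cross-validation would give a steadier estimate.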

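The regression results are more mixed than the correlations. With the funding variables held constant, the unemployment coefficient is small and not statistically significant (-0.07, SE 0.35, p = 0.84); the only predictor significant at the 5% level is local revenue per pupil (2.47, p = 0.022). The model's multiple R-squared of 0.24 (adjusted 0.13) means it explains about 24% of the variation in proficiency across the 40 training counties, and the overall F-test is not significant at the 5% level (p = 0.082). On the 15 held-out counties, its RMSE (10.64) is nearly double the training residual standard error (5.50), so the model generalizes poorly. The roughly 45% figure applies only to unemployment on its own (r = -0.67, and 0.67² ≈ 0.45); in the combined model, that association may overlap with information already carried by the funding variables.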
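The summary statistics in the regression output can be reproduced from the multiple R-squared, which also confirms that the model was fit on n = 40 counties (34 residual degrees of freedom plus 6 estimated coefficients) with p = 5 predictors. The F value below differs from the printed 2.155 only because R-squared is rounded to four digits here.

```latex
\begin{aligned}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.2407)\,\frac{39}{34} \approx 0.129,\\
F &= \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
   = \frac{0.2407 / 5}{0.7593 / 34} \approx 2.16,\\
t_{\text{unemployed}} &= \frac{\hat\beta_{\text{unemployed}}}{\operatorname{SE}(\hat\beta_{\text{unemployed}})}
   = \frac{-0.071521}{0.348640} \approx -0.205.
\end{aligned}
```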

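The decision tree points in the same direction as the correlations: unemployment is its primary splitting variable, and counties with unemployment rates below 10.2% typically have higher proficiency scores regardless of their spending levels. Its test-set R-squared of only 0.02, however, suggests that this split describes the training counties better than it predicts new ones.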
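How a tree arrives at a cut-off such as 10.2% can be illustrated with a single split. The sketch below is a simplified depth-one regression tree written in base R, not the package the report used (which is not shown). It tries every midpoint between sorted predictor values and keeps the one that minimizes the combined squared error of the two resulting groups. Full implementations such as `rpart` apply the same criterion recursively, with additional stopping rules. The toy unemployment and score values are made up.

```r
# Best single split of y on predictor x: choose the threshold that minimizes
# the total within-group sum of squared errors (a depth-1 regression tree).
best_split <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between adjacent values
  sse  <- vapply(cuts, function(cut) {
    left  <- y[x <  cut]
    right <- y[x >= cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  list(threshold = cuts[which.min(sse)], sse = min(sse))
}

# Toy data (made up): unemployment rate (%) and proficiency (%).
unemployed  <- c(5, 6, 7, 12, 13, 14)
proficiency <- c(30, 29, 31, 20, 21, 19)
stump <- best_split(unemployed, proficiency)
stump$threshold                                  # 9.5
mean(proficiency[unemployed < stump$threshold])  # 30
```

Using midpoints guarantees both groups are non-empty for every candidate cut, so the means inside the error calculation are always defined.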

4. Limitations

This analysis has several important constraints that affect interpretation:

Future research should incorporate school-level data, additional socioeconomic indicators, and longitudinal analysis to better understand the complex relationships between economic conditions and educational outcomes.

5. References

Sources included:

Claude AI assisted with:
- Setting up the R Markdown document with echo=FALSE to hide code in the published output
- Implementing the correlation visualization with ggcorrplot
- Suggesting the appropriate syntax for the usmap visualization
- Helping with the implementation of k-means clustering for county grouping