WV County Education Outcomes Prediction

By: Heidi Hartje and Sophie Beddingfield

https://rpubs.com/snb00005/1254867

Thesis:

Unemployment and local revenue are possible determinants of educational proficiency at the county level in West Virginia.

Intro:

Higher unemployment and lower revenue are correlated with decreased proficiency rates, which we attribute to limited funding, reduced community resources, and socioeconomic instability affecting students and educational institutions.

Our project looks at the unemployment rate, the amount of local revenue, and the proficiency rates in science, reading, and math for each county of West Virginia.
We wanted to measure the correlation between local revenue and proficiency to see whether less-funded counties fare worse educationally.
Our research showed that most local revenue comes from property taxes, so we initially planned to correlate homelessness with these factors. Ultimately, we found no correlation with homelessness, but we did find a small correlation between local revenue and unemployment.

Data Description:

We combined our data into one tibble with unemployment rates, amount of local revenue, and each subject’s proficiencies within each county of West Virginia.

unemployed: percentage of the population that is unemployed (from Dr. Garrett’s GitHub, 2022).
tlocrev: dollar amount of local revenue (funding) (from US Census Education Spending / Dr. Garrett’s shared files, 2022).
Science / Math / Reading Proficiency Rates: percentage of students proficient in each subject, as labeled for elementary and middle school, in each county of West Virginia (from Dr. Garrett’s shared files, 2022).
avg_proficiency: calculated field; the average of the science, math, and reading proficiency rates.
unemployed_low: 1/0 field indicating whether unemployed is 7% or below.
avgproficiency_low: 1/0 field indicating whether avg_proficiency is 31% or below.
tocrev_low: 1/0 field indicating whether local revenue is $14,000 or below.
(All of the low fields use the median as the cutoff.)
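
As a rough illustration of how the calculated fields above could be derived, here is a minimal sketch in base R. The five rows of data are invented for the example, and the helper low_flag is a hypothetical name; only the column names and the median-based cutoff come from the description above.

```r
# Toy data: the values are made up for illustration, not taken from the project.
t <- data.frame(
  unemployed = c(5.2, 7.0, 8.1, 6.4, 9.3),
  tlocrev = c(21000, 14000, 9000, 30000, 12000),
  `Science Proficiency Rate (%)` = c(30, 25, 22, 35, 20),
  `Math Proficiency Rate (%)` = c(33, 28, 24, 40, 21),
  `Reading Proficiency Rate (%)` = c(42, 38, 36, 47, 34),
  check.names = FALSE  # keep the spaces and symbols in the column names
)

subject_cols <- c(
  "Science Proficiency Rate (%)",
  "Math Proficiency Rate (%)",
  "Reading Proficiency Rate (%)"
)

# Average of the three subject proficiency rates for each county (row).
t$avg_proficiency <- rowMeans(t[, subject_cols])

# Hypothetical helper: 1 if the value is at or below the median, else 0.
low_flag <- function(x) as.integer(x <= median(x))

t$unemployed_low <- low_flag(t$unemployed)
t$avgproficiency_low <- low_flag(t$avg_proficiency)
t$tocrev_low <- low_flag(t$tlocrev)

print(t[, c("avg_proficiency", "unemployed_low", "avgproficiency_low", "tocrev_low")])
```

Using the median as the cutoff splits the counties into two groups of roughly equal size, which keeps the 1/0 classes balanced for the classification models later on.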

Method:

We used ggcorrplot, linear regression, k-means clustering, and neural networks.

Correlations:
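
A correlation plot of this kind starts from a correlation matrix. The sketch below computes one with base R on invented values; the data frame vars and its numbers are assumptions for illustration only.

```r
# Toy values for three of the variables described above (invented numbers).
vars <- data.frame(
  unemployed = c(5.2, 7.0, 8.1, 6.4, 9.3),
  tlocrev = c(21000, 14000, 9000, 30000, 12000),
  avg_proficiency = c(35, 30.3, 27.3, 40.7, 25)
)

# Pairwise Pearson correlations between all columns.
corr_matrix <- cor(vars, method = "pearson")
print(round(corr_matrix, 2))

# With the ggcorrplot package installed, the matrix could then be plotted:
# ggcorrplot::ggcorrplot(corr_matrix)
```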

Linear Regression Models:

Our first model regresses local revenue (tlocrev) on the unemployment rate. The R-squared is low (0.1497), which also matters for the neural network models later on.

## 
## Call:
## lm(formula = tlocrev ~ unemployed, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -37127 -14464  -7352   9991 114725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    50042       8848   5.655 6.36e-07 ***
## unemployed     -3545       1161  -3.055  0.00352 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24890 on 53 degrees of freedom
## Multiple R-squared:  0.1497, Adjusted R-squared:  0.1337 
## F-statistic: 9.331 on 1 and 53 DF,  p-value: 0.003522
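
To make the coefficients above concrete, the following sketch plugs a few unemployment rates into the fitted line. The intercept and slope are copied from the output; the function name predict_tlocrev and the chosen rates (5, 7, and 9 percent) are only illustrative.

```r
# Coefficients copied from the lm output above.
intercept <- 50042
slope <- -3545

# Hypothetical helper: fitted local revenue (in the units of tlocrev)
# for a given unemployment rate in percent.
predict_tlocrev <- function(unemployed) intercept + slope * unemployed

rates <- c(5, 7, 9)
fitted_revenue <- predict_tlocrev(rates)
print(data.frame(unemployed = rates, fitted_tlocrev = fitted_revenue))
```

Each additional percentage point of unemployment lowers the fitted local revenue by 3545, although the low R-squared means individual counties can sit far from this line.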

Because this relationship was not strong (the model explains only about 15% of the variation in local revenue), even after looking at multiple variables, we looked more closely at how the relationship differs across the individual proficiency rates.

First, we regressed local revenue on the science proficiency rate (lm2). Adjusted R-squared of 0.1824.

## 
## Call:
## lm(formula = tlocrev ~ `Science Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33061 -12609  -4457   6500 119543 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -25119.8    14261.8  -1.761 0.083950 .  
## `Science Proficiency Rate (%)`   1926.3      533.3   3.612 0.000675 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24180 on 53 degrees of freedom
## Multiple R-squared:  0.1975, Adjusted R-squared:  0.1824 
## F-statistic: 13.05 on 1 and 53 DF,  p-value: 0.000675

Second, we regressed local revenue on the math proficiency rate (lm3). Adjusted R-squared of 0.1279.

## 
## Call:
## lm(formula = tlocrev ~ `Math Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36129 -12019  -5106   5985 117024 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 -19480.5    15277.2  -1.275  0.20782   
## `Math Proficiency Rate (%)`   1427.4      477.9   2.987  0.00426 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24970 on 53 degrees of freedom
## Multiple R-squared:  0.1441, Adjusted R-squared:  0.1279 
## F-statistic: 8.923 on 1 and 53 DF,  p-value: 0.004259

Last, we regressed local revenue on the reading proficiency rate (lm4). Adjusted R-squared of 0.2679.

## 
## Call:
## lm(formula = tlocrev ~ `Reading Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -26126 -12845  -5310   6054 112884 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -68484.4    20754.2  -3.300  0.00173 ** 
## `Reading Proficiency Rate (%)`   2367.6      519.6   4.557 3.09e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22880 on 53 degrees of freedom
## Multiple R-squared:  0.2815, Adjusted R-squared:  0.2679 
## F-statistic: 20.76 on 1 and 53 DF,  p-value: 3.094e-05
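
Each output above reports both a multiple and an adjusted R-squared. As a quick check on how the two relate, this sketch applies the standard adjustment formula to the four reported multiple R-squared values. The sample size of 55 is inferred from the 53 residual degrees of freedom plus two estimated coefficients; adj_r2 is a hypothetical helper name.

```r
# Adjusted R-squared from multiple R-squared, n observations, p predictors.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 55  # inferred: 53 residual degrees of freedom + 2 coefficients
p <- 1   # one predictor in each model

# Multiple R-squared values copied from the four outputs above.
r2 <- c(unemployed = 0.1497, science = 0.1975, math = 0.1441, reading = 0.2815)

adjusted <- adj_r2(r2, n, p)
print(round(adjusted, 4))
```

The results agree with the reported adjusted values to within rounding. Reading proficiency has the strongest linear relationship with local revenue of the four, but it still leaves most of the variation unexplained.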

Graphs:

Proficiency Graphs: Here we can see that overall, the proficiencies fall in similar patterns across the state – where one proficiency is high or low the others usually follow.

K-Cluster:

This k-means cluster graph now incorporates local revenue. Compared with the proficiency graphs, it shows high and low revenues tracking high and low proficiencies.
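
For readers unfamiliar with the technique, here is a minimal k-means sketch in base R. The six rows of data, the choice of two clusters, and the variable names are all assumptions for illustration; the report does not state how many clusters were used.

```r
set.seed(42)  # k-means uses random starting centers

# Toy counties: invented values with a low group and a high group.
counties <- data.frame(
  tlocrev = c(9000, 11000, 12000, 30000, 34000, 40000),
  avg_proficiency = c(24, 26, 27, 36, 38, 41)
)

# Standardize so that revenue (large numbers) does not dominate the distance.
scaled <- scale(counties)

# Assumed k = 2; nstart reruns the algorithm from several random starts.
fit <- kmeans(scaled, centers = 2, nstart = 25)
counties$cluster <- fit$cluster

# Mean revenue and proficiency within each cluster.
print(aggregate(cbind(tlocrev, avg_proficiency) ~ cluster, data = counties, FUN = mean))
```

Scaling first matters here because local revenue is measured in thousands while proficiency is a percentage; without it the clusters would be driven almost entirely by revenue.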

Neural Networks:

1.

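Our first neural network uses low unemployment to predict low local revenue.
We used 20 nodes and a train/test ratio of 70/30.
The question: when unemployment is low, is local revenue predicted to be low as well?
The train data returned an accuracy of 0.7368.
The test data returned an accuracy of 0.7647. Note that the test confusion matrix printed below places all 17 counties off the diagonal, which does not agree with this accuracy figure, so the test result should be read with caution.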

## [1] 0.7368421

##                 
## vector_predicted  0  1
##                0 15  7
##                1  3 13

## [1] 0.7647059

##                 
## vector_predicted 0 1
##                0 0 9
##                1 8 0
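
The sketch below shows one way a 70/30 split of 55 counties and an accuracy figure can be computed in base R. The seed, the use of floor, and the column label "actual" are assumptions; the counts in train_cm are copied from the first (train) confusion matrix above.

```r
n_counties <- 55
set.seed(1)  # arbitrary seed so the split is reproducible

# 70% of the row indices for training, the rest for testing.
train_idx <- sample(seq_len(n_counties), size = floor(0.7 * n_counties))
test_idx <- setdiff(seq_len(n_counties), train_idx)
print(c(train = length(train_idx), test = length(test_idx)))

# Train confusion matrix copied from the output above
# (rows = predicted class, columns = actual class).
train_cm <- matrix(
  c(15, 3, 7, 13),
  nrow = 2,
  dimnames = list(vector_predicted = c("0", "1"), actual = c("0", "1"))
)

# Accuracy = correctly classified counties (the diagonal) / all counties.
train_accuracy <- sum(diag(train_cm)) / sum(train_cm)
print(train_accuracy)
```

A split of this kind yields 38 training and 17 test counties, which matches the totals in the confusion matrices above.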

2.

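Our second neural network relates low average proficiency to unemployment.
We used 20 nodes and a train/test ratio of 70/30.
The question: when average proficiency is low, what is the unemployment rate?
The train data returned an accuracy of 0.5000.
The test data returned an accuracy of 0.4706.
In both confusion matrices below, the network predicts class 1 for every county, so these accuracies only reflect the share of counties in that class and are no better than chance.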

## [1] 0.5

##                 
## vector_predicted  0  1
##                1 19 19

## [1] 0.4705882

##                 
## vector_predicted 0 1
##                1 9 8
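
When a model predicts only one class, the confusion matrix has a single row and sum(diag(...)) no longer gives the number of correct predictions. The sketch below computes accuracy by matching row and column labels instead. The helper accuracy_from_table and the column label "actual" are hypothetical; the counts are copied from the two matrices above.

```r
# Hypothetical helper: accuracy from a confusion matrix whose rows are
# predicted classes and whose columns are actual classes. It matches cells
# by label, so it also works when some predicted classes are missing.
accuracy_from_table <- function(cm) {
  shared <- intersect(rownames(cm), colnames(cm))
  correct <- sum(vapply(shared, function(k) cm[k, k], numeric(1)))
  correct / sum(cm)
}

# Second network: every county is predicted as class 1.
train_cm <- matrix(c(19, 19), nrow = 1,
                   dimnames = list(vector_predicted = "1", actual = c("0", "1")))
test_cm <- matrix(c(9, 8), nrow = 1,
                  dimnames = list(vector_predicted = "1", actual = c("0", "1")))

train_accuracy <- accuracy_from_table(train_cm)
test_accuracy <- accuracy_from_table(test_cm)
print(c(train = train_accuracy, test = test_accuracy))
```

Because the predictions are constant, each accuracy is simply the fraction of counties whose actual class is 1, which is why both values sit near 0.5 after a median split.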

Limitations:

Spending might have been a better variable to compare against the individual subjects, since it would let us explore how counties spent their money (books, technology, and so on).
We only used 2022 data, which limits the historical data available to explore, test on, and build forecasting models from.

Conclusion & Recommendations:

Based on our models, a regional focus on lowering the unemployment rate could positively affect local revenue and average proficiency.
However, local revenue is not as strongly correlated with proficiency as we originally expected.
Our recommendations for all counties of West Virginia (but mostly the ones with higher unemployment) are to focus on social and educational programs, focus on target groups, support smaller businesses, and ultimately make policy changes to ensure fair wages and opportunities.

Sources:

ChatGPT help for navigating “fips” on US plots and rowMeans in t.
NN coding help from sms activity in class.
K-means coding help from states activity in class.

Revenue data from US Census Education Spending. (https://www.dropbox.com/scl/fo/s29xwwg21irckz9gzjx39/AAPdqRYIvgEOqGr2P2BHk7E/us%20census%20ed%20spending?dl=0&rlkey=4h226idmd0n696zyjcrk2kegb&subfolder_nav_tracking=1)

Unemployment data from Dr. Garrett’s demographics shared file. (https://www.dropbox.com/scl/fo/s29xwwg21irckz9gzjx39/AD_fXBppotYa5nCRwZPYeu8/demographics?dl=0&rlkey=4h226idmd0n696zyjcrk2kegb&subfolder_nav_tracking=1)

Proficiency data from WV Summative Assessment Results. (https://www.dropbox.com/scl/fo/s29xwwg21irckz9gzjx39/AJeCsHSnsVztGEcjGdz_0fg/wv%20ed%20student%20achievement?dl=0&rlkey=4h226idmd0n696zyjcrk2kegb&subfolder_nav_tracking=1)

Sophie Beddingfield

2024-12-08
