By: Heidi Hartje and Sophie Beddingfield

Thesis:

Unemployment and local revenue are possible determinants of educational proficiency at the county level in West Virginia.

Intro:

We expect higher unemployment and lower revenue to be correlated with lower proficiency rates because of limited funding, reduced community resources, and socioeconomic instability affecting students and educational institutions.

  • Our project looks at the unemployment rate, amount of local revenue, and proficiency rates in science, reading, and math for each county of West Virginia.

  • We wanted to measure the correlation between local revenue and proficiencies to see whether less-funded counties were negatively affected educationally.

  • Our research showed that most local revenue comes from property taxes, so our initial idea was to correlate homelessness with these factors. Ultimately, we found no correlation with homelessness, but we did find a small correlation between local revenue and unemployment.

Data Description:

We combined our data into one tibble with unemployment rates, amount of local revenue, and each subject’s proficiencies within each county of West Virginia.

  • unemployed: percentage of the population that is unemployed. (From Dr. Garrett’s Github (2022))

  • tlocrev: dollar amount of local revenue/funding. (From US Census Education Spending / Dr. Garrett’s shared files (2022))

  • Science/Math/Reading Proficiency Rates: percentage of students proficient in each school subject, as labeled for elementary and middle school, in each county of West Virginia. (From Dr. Garrett’s shared files (2022))

  • avg_proficiency: calculated field; the average of the science, math, and reading proficiency rates.

  • unemployed_low: 1/0 field indicating if unemployed was 7% or below.

  • avgproficiency_low: 1/0 field indicating if avg_proficiency was 31% or below.

  • tocrev_low: 1/0 field indicating if local revenue is $14,000 or below.

  • (All the low fields are based on medians)
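
The derived fields above could be built along these lines. This is a minimal sketch in base R, not our actual code: the five county rows and the short column names science, math, and reading are made-up placeholders, and each flag is cut at the median of its column, as noted above.

```r
# Toy stand-in for the combined tibble; these values are illustrative only.
t <- data.frame(
  county     = c("A", "B", "C", "D", "E"),
  unemployed = c(5.5, 7.0, 9.2, 6.1, 8.4),
  tlocrev    = c(30000, 14000, 9000, 22000, 12000),
  science    = c(30, 25, 20, 33, 24),   # hypothetical short column names
  math       = c(35, 28, 22, 38, 27),
  reading    = c(45, 40, 36, 47, 39)
)

# avg_proficiency: mean of the three subject proficiency rates per county.
t$avg_proficiency <- rowMeans(t[, c("science", "math", "reading")])

# 1/0 flag: 1 when the value is at or below the column median.
low_flag <- function(x) as.integer(x <= median(x))

t$unemployed_low     <- low_flag(t$unemployed)
t$avgproficiency_low <- low_flag(t$avg_proficiency)
t$tocrev_low         <- low_flag(t$tlocrev)

print(t)
```

Cutting at the median keeps the two classes of each flag roughly balanced, which matters later when the flags are used as classification targets.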

Method:

Correlations:

Linear Regression Models:

  • Regressing local revenue on unemployment (tlocrev ~ unemployed) gives a low R-squared of 0.1497 (adjusted 0.1337). This weak fit also bears on the later NN models.
## 
## Call:
## lm(formula = tlocrev ~ unemployed, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -37127 -14464  -7352   9991 114725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    50042       8848   5.655 6.36e-07 ***
## unemployed     -3545       1161  -3.055  0.00352 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24890 on 53 degrees of freedom
## Multiple R-squared:  0.1497, Adjusted R-squared:  0.1337 
## F-statistic: 9.331 on 1 and 53 DF,  p-value: 0.003522
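
How a fit like the one above is produced, and how its R-squared values are read off, can be sketched as follows. The data here are simulated purely for illustration (55 rows, matching the 53 residual degrees of freedom plus two coefficients), and the object name lm1 is an assumption; only the formula tlocrev ~ unemployed comes from the output above.

```r
set.seed(42)

# Simulated stand-in data, NOT the county data used in this report.
n <- 55
t <- data.frame(unemployed = runif(n, min = 4, max = 12))
t$tlocrev <- 50000 - 3500 * t$unemployed + rnorm(n, sd = 25000)

# Simple linear regression of local revenue on the unemployment rate.
lm1 <- lm(tlocrev ~ unemployed, data = t)
s <- summary(lm1)

s$r.squared                               # multiple R-squared
s$adj.r.squared                           # adjusted R-squared
s$coefficients["unemployed", "Pr(>|t|)"]  # p-value of the slope
df.residual(lm1)                          # n minus 2 estimated coefficients
```

A slope can be statistically significant while R-squared stays low: the p-value says the slope is unlikely to be zero, while R-squared says how much of the spread in revenue the predictor accounts for.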

Because unemployment explains little of the variation in local revenue (the coefficient is significant at p = 0.0035, but R-squared is low), even after looking at multiple variables, we decided to look more closely at how the fit differs for each of the individual proficiencies.

  • First, we regressed local revenue on science proficiency rate (lm2). Adjusted R-squared of 0.1824 (multiple R-squared 0.1975).
## 
## Call:
## lm(formula = tlocrev ~ `Science Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33061 -12609  -4457   6500 119543 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -25119.8    14261.8  -1.761 0.083950 .  
## `Science Proficiency Rate (%)`   1926.3      533.3   3.612 0.000675 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24180 on 53 degrees of freedom
## Multiple R-squared:  0.1975, Adjusted R-squared:  0.1824 
## F-statistic: 13.05 on 1 and 53 DF,  p-value: 0.000675

  • Second, we regressed local revenue on math proficiency rate (lm3). Adjusted R-squared of 0.1279 (multiple R-squared 0.1441).
## 
## Call:
## lm(formula = tlocrev ~ `Math Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36129 -12019  -5106   5985 117024 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 -19480.5    15277.2  -1.275  0.20782   
## `Math Proficiency Rate (%)`   1427.4      477.9   2.987  0.00426 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24970 on 53 degrees of freedom
## Multiple R-squared:  0.1441, Adjusted R-squared:  0.1279 
## F-statistic: 8.923 on 1 and 53 DF,  p-value: 0.004259

  • Last, we regressed local revenue on reading proficiency rate (lm4). Adjusted R-squared of 0.2679 (multiple R-squared 0.2815).
## 
## Call:
## lm(formula = tlocrev ~ `Reading Proficiency Rate (%)`, data = t)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -26126 -12845  -5310   6054 112884 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -68484.4    20754.2  -3.300  0.00173 ** 
## `Reading Proficiency Rate (%)`   2367.6      519.6   4.557 3.09e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22880 on 53 degrees of freedom
## Multiple R-squared:  0.2815, Adjusted R-squared:  0.2679 
## F-statistic: 20.76 on 1 and 53 DF,  p-value: 3.094e-05
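
Each summary above prints both a multiple and an adjusted R-squared. The two are linked by the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1), and the short check below applies it to the printed values. It assumes n = 55 observations (53 residual degrees of freedom plus two coefficients) and p = 1 predictor per model; small differences are expected because the printed R-squared values are rounded to four digits.

```r
# Adjusted R-squared from multiple R-squared, sample size n, and p predictors.
adj_r2 <- function(r2, n, p = 1) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 53 + 2  # residual df + intercept + slope

# Values copied from the four lm summaries above.
reported <- data.frame(
  predictor = c("unemployed", "science", "math", "reading"),
  r2        = c(0.1497, 0.1975, 0.1441, 0.2815),
  adj       = c(0.1337, 0.1824, 0.1279, 0.2679)
)

reported$adj_check <- adj_r2(reported$r2, n)
reported$diff      <- abs(reported$adj_check - reported$adj)
print(reported)
```

With a single predictor and 55 counties the adjustment is small, so either figure leads to the same ranking: reading fits best, then science, unemployment, and math.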

Graphs:

Proficiency Graphs: Here we can see that overall, the proficiencies fall in similar patterns across the state – where one proficiency is high or low the others usually follow.

K-Cluster:

This k-cluster graph now incorporates local revenue. Compared with the proficiency graphs, it shows that high and low revenues follow high and low proficiencies.
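
A clustering of this kind could be sketched with base R's kmeans(). The settings are not stated in this report, so the number of clusters (3), the scaling step, and the simulated county values below are all assumptions made for illustration.

```r
set.seed(7)

# Simulated stand-in for 55 counties, NOT the report's data.
n <- 55
avg_proficiency <- runif(n, min = 20, max = 45)
d <- data.frame(
  avg_proficiency = avg_proficiency,
  tlocrev = 2000 * avg_proficiency + rnorm(n, sd = 10000)
)

# Scale first so the dollar amounts do not dominate the distance measure.
scaled <- scale(d)

# k-means with an assumed k = 3; nstart reruns from several random starts.
km <- kmeans(scaled, centers = 3, nstart = 25)
d$cluster <- factor(km$cluster)

# Average proficiency and revenue within each cluster.
aggregate(cbind(avg_proficiency, tlocrev) ~ cluster, data = d, FUN = mean)
```

If revenue and proficiency move together, the cluster means should rise and fall together across clusters, which is the pattern described above.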

Neural Networks:

1.

  • For our first NN, we modeled low unemployment on local revenue.

  • We used 20 nodes and a train/test ratio of 70/30.

  • When unemployment is low, is local revenue predicted to be low as well?

  • The train data returned an accuracy of 0.7368.

  • The test data returned an accuracy of 0.7647.

## [1] 0.7368421
##                 
## vector_predicted  0  1
##                0 15  7
##                1  3 13
## [1] 0.7647059
##                 
## vector_predicted 0 1
##                0 0 9
##                1 8 0
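
The workflow described for this network (70/30 split, 20 hidden nodes, accuracy from a table of predicted against actual classes) can be sketched as below. The package is not named in this report, so the use of nnet is an assumption, as are the direction of the model (low-revenue flag predicted from the low-unemployment flag) and the simulated data.

```r
library(nnet)  # assumption: the report does not say which NN package was used

set.seed(123)

# Simulated 1/0 flags for 55 counties, NOT the report's data.
n <- 55
d <- data.frame(unemployed_low = rbinom(n, 1, 0.5))
agree <- rbinom(n, 1, 0.75)  # target matches the predictor about 75% of the time
d$tocrev_low <- factor(ifelse(agree == 1, d$unemployed_low, 1 - d$unemployed_low),
                       levels = c(0, 1))

# 70/30 split: floor(0.7 * 55) = 38 training rows, 17 test rows.
train_idx <- sample(n, size = floor(0.7 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Single hidden layer with 20 nodes.
fit <- nnet(tocrev_low ~ unemployed_low, data = train, size = 20, trace = FALSE)

# Print the confusion table and return the share of correct predictions.
accuracy <- function(model, data) {
  vector_predicted <- predict(model, newdata = data, type = "class")
  print(table(vector_predicted, actual = data$tocrev_low))
  mean(vector_predicted == as.character(data$tocrev_low))
}

train_acc <- accuracy(fit, train)
test_acc  <- accuracy(fit, test)
```

With 55 counties this split gives 38 training and 17 test rows, the same totals as in the tables above, so each test county moves the test accuracy by about six percentage points.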

2.

  • For our second NN, we modeled low average proficiency on unemployment.

  • We used 20 nodes and a train/test ratio of 70/30.

  • When average proficiency is low, what is the unemployment rate?

  • The train data returned an accuracy of 0.5000.

  • The test data returned an accuracy of 0.4706.

## [1] 0.5
##                 
## vector_predicted  0  1
##                1 19 19
## [1] 0.4705882
##                 
## vector_predicted 0 1
##                1 9 8
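
In the two tables for the second NN, only the row for predicted class 1 appears, which means the network assigned every county to class 1. Accuracy is then just the share of counties whose actual class is 1. The arithmetic below reproduces the printed accuracies from the printed counts, assuming the table columns are the actual classes 0 and 1, and compares them with a majority-class baseline.

```r
# Counts copied from the tables above (assumed: columns = actual class,
# single row = predicted class 1).
train_counts <- c(`0` = 19, `1` = 19)
test_counts  <- c(`0` = 9,  `1` = 8)

# Accuracy when every case is predicted as class 1.
accuracy_all_one <- function(counts) unname(counts["1"] / sum(counts))

# Accuracy of always guessing the most common actual class.
baseline <- function(counts) max(counts) / sum(counts)

train_acc <- accuracy_all_one(train_counts)  # 19 / 38
test_acc  <- accuracy_all_one(test_counts)   # 8 / 17

print(c(train = train_acc, test = test_acc))
print(c(train_baseline = baseline(train_counts),
        test_baseline  = baseline(test_counts)))
```

Because the low flags are cut at the median, the classes are close to evenly split, so a model that always predicts one class lands near 0.5 and does no better than guessing.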

Limitations:

Conclusion & Recommendations:

Sources: