#Project 3

Introduction

The quality of public education is a cornerstone of societal development, influencing individual success and community well-being. This project analyzes public school data in West Virginia, a state with unique demographic, economic, and educational challenges. It aims to uncover trends and correlations between student achievement, financial investment in education, and socioeconomic factors across West Virginia counties. More specifically, I examine whether county unemployment rates are related to proficiency scores.

Data

This project uses the following data:

  • Assessment Data: Proficiency scores by county and by school. For some parts of this project, schools are grouped by level (middle school and high school) to see how different age groups respond to unemployment.

  • Demographic Data: The unemployment rate of each county.

Predictions

Children are always affected by the world around them, and learning that starts early in school shapes how much they know when they are older. I think that unemployment plays a role in how well schools do on standardized testing, and I expect scores to be worse where unemployment is higher.

Correlations

The correlation analysis shows high positive correlations among the revenue variables, which makes sense. There is also a fairly strong positive correlation among the testing proficiencies, which also adds up. One thing that surprised me is how low the correlation between revenues and proficiencies is. There is a slight positive correlation, but it is not strong enough to work with here. What we will look at instead is the correlation between unemployment and median income, and how those two relate to test scores. Enrollment and unemployment rates are negatively correlated across counties, meaning that counties with lower enrollment tend to have higher unemployment. There could be a few reasons for this.
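
The correlations discussed above could be computed along the lines of the sketch below. This is only an illustration under stated assumptions: the `counties` data frame is simulated, the `enrollment` column name is my guess, and none of the numbers are the real West Virginia values. `cor()` gives the full correlation matrix, and `cor.test()` checks whether a single pair differs from zero.

```r
# Simulated stand-in for the county-level data; every value here is made up.
set.seed(1)
n <- 55  # West Virginia has 55 counties
unemployed <- runif(n, 3, 12)
counties <- data.frame(
  unemployed = unemployed,
  enrollment = round(8000 - 500 * unemployed + rnorm(n, sd = 1500)),
  math_proficiency = 40 - 0.5 * unemployed + rnorm(n, sd = 6),
  reading_proficiency = 50 - 0.5 * unemployed + rnorm(n, sd = 6)
)

# Pearson correlation matrix of all numeric columns
cor_matrix <- cor(counties, use = "complete.obs")
print(round(cor_matrix, 2))

# One pair in detail: estimate plus a test of whether it differs from zero
enroll_test <- cor.test(counties$enrollment, counties$unemployed)
print(enroll_test$estimate)
print(enroll_test$p.value)
```

A negative entry for enrollment and unemployment in this matrix would correspond to the pattern described above, but a correlation alone does not say which way any effect runs.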

However, we want to look at how unemployment affects children. If, for example, high school students are working to help pay the bills, that should show up as lower standardized test scores for high schoolers. Let's look into that next.

This graph shows an interesting pattern: according to the trend lines, middle schoolers are more likely to be proficient in science than high schoolers, while high schoolers tend to be more proficient in math. West Virginia schedules its standardized testing so that science is taken only in grades 5, 8, and 11, or every third year. Math and writing, on the other hand, are tested in grades 3-8 and again in grade 11.

Linear Regression Models

## 
## Call:
## lm(formula = math_proficiency ~ unemployed + reading_proficiency + 
##     science_proficiency, data = middle_schools)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.2696  -4.0279   0.3458   3.9211  13.6862 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -7.10783    3.43424  -2.070   0.0407 *  
## unemployed          -0.05161    0.24905  -0.207   0.8362    
## reading_proficiency  0.64358    0.08700   7.397 2.21e-11 ***
## science_proficiency  0.20821    0.08450   2.464   0.0152 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.386 on 118 degrees of freedom
## Multiple R-squared:  0.5653, Adjusted R-squared:  0.5543 
## F-statistic: 51.16 on 3 and 118 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = math_proficiency ~ unemployed + reading_proficiency + 
##     science_proficiency, data = high_schools)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7650  -3.8178   0.3681   3.5141  14.4111 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -3.25929    3.15593  -1.033   0.3041    
## unemployed          -0.07868    0.20138  -0.391   0.6968    
## reading_proficiency  0.22430    0.10138   2.212   0.0291 *  
## science_proficiency  0.59225    0.12263   4.830 4.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.38 on 104 degrees of freedom
## Multiple R-squared:  0.6758, Adjusted R-squared:  0.6664 
## F-statistic: 72.26 on 3 and 104 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = reading_proficiency ~ unemployed + math_proficiency + 
##     science_proficiency, data = middle_schools)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.132  -3.561   0.155   2.691  18.387 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         21.11279    2.36029   8.945 6.21e-15 ***
## unemployed          -0.28166    0.21630  -1.302    0.195    
## math_proficiency     0.49227    0.06655   7.397 2.21e-11 ***
## science_proficiency  0.32667    0.06956   4.696 7.22e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.585 on 118 degrees of freedom
## Multiple R-squared:  0.6217, Adjusted R-squared:  0.6121 
## F-statistic: 64.65 on 3 and 118 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = reading_proficiency ~ unemployed + math_proficiency + 
##     science_proficiency, data = high_schools)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7395  -3.2221  -0.0972   2.8832  15.4264 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         20.62494    2.21357   9.318 2.25e-15 ***
## unemployed          -0.25092    0.18890  -1.328   0.1870    
## math_proficiency     0.20040    0.09058   2.212   0.0291 *  
## science_proficiency  0.88488    0.09445   9.369 1.73e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.085 on 104 degrees of freedom
## Multiple R-squared:  0.7915, Adjusted R-squared:  0.7855 
## F-statistic: 131.6 on 3 and 104 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = science_proficiency ~ unemployed + reading_proficiency + 
##     math_proficiency, data = middle_schools)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.502  -4.608  -1.410   3.251  23.629 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.41031    3.70758   0.650   0.5169    
## unemployed          -0.03258    0.26462  -0.123   0.9022    
## reading_proficiency  0.48206    0.10265   4.696 7.22e-06 ***
## math_proficiency     0.23501    0.09538   2.464   0.0152 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.784 on 118 degrees of freedom
## Multiple R-squared:  0.4602, Adjusted R-squared:  0.4465 
## F-statistic: 33.54 on 3 and 118 DF,  p-value: 9.485e-16
## 
## Call:
## lm(formula = science_proficiency ~ unemployed + reading_proficiency + 
##     math_proficiency, data = high_schools)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3218  -1.9418   0.2942   2.5901  11.6788 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -4.94909    2.24046  -2.209   0.0294 *  
## unemployed          -0.01227    0.14564  -0.084   0.9330    
## reading_proficiency  0.51726    0.05521   9.369 1.73e-15 ***
## math_proficiency     0.30932    0.06405   4.830 4.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.888 on 104 degrees of freedom
## Multiple R-squared:  0.8182, Adjusted R-squared:  0.813 
## F-statistic: 156.1 on 3 and 104 DF,  p-value: < 2.2e-16
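
The summaries above have the form printed by `summary()` on an `lm()` fit. As a rough sketch of how one of them could be produced and read, the code below fits the first formula on simulated data. The `middle_schools` data frame here is a made-up stand-in (only the column names and the row count implied by the degrees of freedom come from the output above), so the coefficients will not match the real ones.

```r
# Simulated stand-in for the middle school data; values are made up.
set.seed(42)
n <- 122  # 118 residual degrees of freedom plus 4 estimated coefficients
reading <- rnorm(n, mean = 45, sd = 9)
science <- 0.5 * reading + rnorm(n, mean = 5, sd = 7)
middle_schools <- data.frame(
  unemployed = runif(n, 3, 12),
  reading_proficiency = reading,
  science_proficiency = science
)
middle_schools$math_proficiency <-
  -7 + 0.6 * reading + 0.2 * science + rnorm(n, sd = 6)

# Same formula as the first Call shown above
fit <- lm(math_proficiency ~ unemployed + reading_proficiency +
            science_proficiency, data = middle_schools)
fit_summary <- summary(fit)
print(fit_summary)

# Pull out the two quantities the discussion focuses on
unemployed_p <- coef(fit_summary)["unemployed", "Pr(>|t|)"]
r_squared <- fit_summary$r.squared
cat("p-value for unemployed:", round(unemployed_p, 4), "\n")
cat("R-squared:", round(r_squared, 4), "\n")
```

Repeating the same call with `data = high_schools`, and with each proficiency in turn on the left-hand side, would give the six fits in this section.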

Each model uses one proficiency score as the dependent variable and is fit separately for middle schools and high schools. Some of the models fit better than others, and there are some trends to uncover here. Unemployment is not statistically significant in any of the six models (every p-value is above 0.18). The R-squared values are noticeably higher for high schools than for middle schools (0.68 versus 0.57 for math, 0.79 versus 0.62 for reading, and 0.82 versus 0.46 for science). There could be a couple of reasons for this.

PCA

## Importance of first k=2 (out of 4) components:
##                           PC1    PC2
## Standard deviation     1.5791 0.9736
## Proportion of Variance 0.6234 0.2370
## Cumulative Proportion  0.6234 0.8604
## Standard deviations (1, .., p=4):
## [1] 1.5790856 0.9736123 0.5507143 0.5052538
## 
## Rotation (n x k) = (4 x 2):
##                            PC1         PC2
## science_proficiency  0.5667548 -0.14152212
## math_proficiency     0.5637386 -0.04662995
## reading_proficiency  0.5700716 -0.13906040
## unemployed          -0.1897529 -0.97900937
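
To make the link between the reported standard deviations and the variance proportions explicit, the sketch below recomputes the proportions from the four standard deviations printed above. The squared standard deviations sum to about 4, which is what we would expect if the four variables were standardized before the PCA. The second half shows a `prcomp()` call that prints output of this shape. It runs on simulated data, and the `scores` data frame is a hypothetical stand-in, so its loadings will differ from the ones above.

```r
# Standard deviations copied from the output above
sdev <- c(1.5790856, 0.9736123, 0.5507143, 0.5052538)
variances <- sdev^2
prop_var <- variances / sum(variances)
print(round(sum(variances), 4))        # total variance across components
print(round(prop_var[1:2], 4))         # proportion of variance, PC1 and PC2
print(round(cumsum(prop_var)[2], 4))   # cumulative proportion after PC2

# Sketch of a call giving output of this form; the data are simulated.
set.seed(7)
n <- 100
ability <- rnorm(n)  # shared factor so the three scores move together
scores <- data.frame(
  science_proficiency = 40 + 8 * ability + rnorm(n, sd = 4),
  math_proficiency = 35 + 8 * ability + rnorm(n, sd = 4),
  reading_proficiency = 45 + 8 * ability + rnorm(n, sd = 4),
  unemployed = runif(n, 3, 12)
)
# scale. = TRUE standardizes each column; rank. = 2 keeps two components
pca <- prcomp(scores, scale. = TRUE, rank. = 2)
print(summary(pca))
print(pca)
```

In the reported rotation, the three proficiency scores load almost equally on PC1 (about 0.57 each), while PC2 is dominated by `unemployed` (about -0.98).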

Decision Tree

This tree tells much the same story. It predicts math proficiency from unemployment, enrollment, and the other test scores. The jump at the top end is not as large in percentage terms, with only a 9% increase from the second-highest bucket to the highest proficiency bucket.
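
A regression tree like the one described could be fit roughly as follows. This is a sketch under assumptions: the report does not say which package built the tree, so `rpart` is my choice here, the `schools` data frame is simulated, and the `enrollment` column name is a guess. The leaf means play the role of the proficiency "buckets" mentioned above.

```r
library(rpart)  # assumption: the source does not name the tree package

# Simulated stand-in for the school-level data; values are made up.
set.seed(3)
n <- 200
reading <- rnorm(n, mean = 45, sd = 9)
schools <- data.frame(
  unemployed = runif(n, 3, 12),
  enrollment = round(runif(n, 100, 1500)),
  reading_proficiency = reading,
  science_proficiency = 0.5 * reading + rnorm(n, mean = 5, sd = 7)
)
schools$math_proficiency <- -5 + 0.6 * schools$reading_proficiency +
  0.2 * schools$science_proficiency + rnorm(n, sd = 6)

# method = "anova" fits a regression tree for a numeric outcome
tree <- rpart(math_proficiency ~ unemployed + enrollment +
                reading_proficiency + science_proficiency,
              data = schools, method = "anova")
print(tree)

# Mean predicted math proficiency in each leaf, lowest to highest
leaf_means <- sort(unique(predict(tree)))
print(round(leaf_means, 1))

# Gap between the two highest buckets
top_gap <- diff(tail(leaf_means, 2))
print(round(top_gap, 1))
```

Comparing `top_gap` with the gaps between lower buckets is one way to judge whether the jump at the top of the tree is unusually large or small.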

Recommendations

I have a few recommendations based on the analysis of this project.

References