By: Will Bruton

Opening

For this project, I explored how various socioeconomic factors affect education outcomes in West Virginia counties.

Correlations

  • Summary: The correlation matrix reveals key relationships between socioeconomic and educational variables. BachelorsDegreePCT (r = 0.66) and MedianHouseholdIncome (r = 0.65) show the strongest positive correlations with proficiency, indicating that higher educational attainment and household income are linked to better academic performance. In contrast, LessThanHighSchoolEducationPCT (r = -0.59) and FamiliesInPovertyPCT (r = -0.53) show moderately strong negative correlations with proficiency, so counties with more poverty and lower education levels tend to have lower outcomes. The unemployed variable has a weaker negative correlation with proficiency (r = -0.32), suggesting its relationship is less direct than that of the other factors. Note that the proficiency column matches science_proficiency exactly in this matrix (r = 1), so the overall proficiency measure used here appears to be the science score.

##                                unemployed LessThanHighSchoolEducationPCT
## unemployed                      1.0000000                      0.5145454
## LessThanHighSchoolEducationPCT  0.5145454                      1.0000000
## BachelorsDegreePCT             -0.4481701                     -0.6872446
## MedianHouseholdIncome          -0.5867637                     -0.7486523
## FamiliesInPovertyPCT            0.5413968                      0.6939926
## science_proficiency            -0.3233399                     -0.5855356
## reading_proficiency            -0.4452155                     -0.4637887
## math_proficiency               -0.4935898                     -0.5829703
## proficiency                    -0.3233399                     -0.5855356
## enroll                         -0.2846968                     -0.3752641
## tfedrev                        -0.1738598                     -0.2501004
## tstrev                         -0.2494595                     -0.3272404
## tlocrev                        -0.3869128                     -0.4973038
## totalexp                       -0.2880562                     -0.3748346
## ppcstot                        -0.1362845                      0.0299694
##                                BachelorsDegreePCT MedianHouseholdIncome
## unemployed                             -0.4481701           -0.58676369
## LessThanHighSchoolEducationPCT         -0.6872446           -0.74865225
## BachelorsDegreePCT                      1.0000000            0.77282500
## MedianHouseholdIncome                   0.7728250            1.00000000
## FamiliesInPovertyPCT                   -0.5097746           -0.75107674
## science_proficiency                     0.6638683            0.65439837
## reading_proficiency                     0.6646356            0.61784621
## math_proficiency                        0.6088648            0.61868302
## proficiency                             0.6638683            0.65439837
## enroll                                  0.5934898            0.41526889
## tfedrev                                 0.4267642            0.20824154
## tstrev                                  0.5603278            0.36490679
## tlocrev                                 0.6491699            0.52957317
## totalexp                                0.5909950            0.40146903
## ppcstot                                -0.1626779            0.02370495
##                                FamiliesInPovertyPCT science_proficiency
## unemployed                               0.54139677         -0.32333991
## LessThanHighSchoolEducationPCT           0.69399256         -0.58553557
## BachelorsDegreePCT                      -0.50977463          0.66386828
## MedianHouseholdIncome                   -0.75107674          0.65439837
## FamiliesInPovertyPCT                     1.00000000         -0.52986745
## science_proficiency                     -0.52986745          1.00000000
## reading_proficiency                     -0.42167222          0.80578216
## math_proficiency                        -0.48041147          0.69107297
## proficiency                             -0.52986745          1.00000000
## enroll                                  -0.12157256          0.33813495
## tfedrev                                  0.01357396          0.17721750
## tstrev                                  -0.08859247          0.30593792
## tlocrev                                 -0.23004924          0.43846237
## totalexp                                -0.11429872          0.33811529
## ppcstot                                 -0.13373170          0.09355478
##                                reading_proficiency math_proficiency proficiency
## unemployed                             -0.44521548       -0.4935898 -0.32333991
## LessThanHighSchoolEducationPCT         -0.46378866       -0.5829703 -0.58553557
## BachelorsDegreePCT                      0.66463563        0.6088648  0.66386828
## MedianHouseholdIncome                   0.61784621        0.6186830  0.65439837
## FamiliesInPovertyPCT                   -0.42167222       -0.4804115 -0.52986745
## science_proficiency                     0.80578216        0.6910730  1.00000000
## reading_proficiency                     1.00000000        0.8692654  0.80578216
## math_proficiency                        0.86926543        1.0000000  0.69107297
## proficiency                             0.80578216        0.6910730  1.00000000
## enroll                                  0.38123787        0.2746682  0.33813495
## tfedrev                                 0.21218505        0.1535191  0.17721750
## tstrev                                  0.35787915        0.2439882  0.30593792
## tlocrev                                 0.47255820        0.4079433  0.43846237
## totalexp                                0.38453883        0.2895739  0.33811529
## ppcstot                                 0.07565188        0.1383753  0.09355478
##                                    enroll     tfedrev      tstrev     tlocrev
## unemployed                     -0.2846968 -0.17385978 -0.24945948 -0.38691283
## LessThanHighSchoolEducationPCT -0.3752641 -0.25010041 -0.32724037 -0.49730381
## BachelorsDegreePCT              0.5934898  0.42676417  0.56032778  0.64916987
## MedianHouseholdIncome           0.4152689  0.20824154  0.36490679  0.52957317
## FamiliesInPovertyPCT           -0.1215726  0.01357396 -0.08859247 -0.23004924
## science_proficiency             0.3381350  0.17721750  0.30593792  0.43846237
## reading_proficiency             0.3812379  0.21218505  0.35787915  0.47255820
## math_proficiency                0.2746682  0.15351914  0.24398820  0.40794332
## proficiency                     0.3381350  0.17721750  0.30593792  0.43846237
## enroll                          1.0000000  0.91314436  0.99057978  0.90451626
## tfedrev                         0.9131444  1.00000000  0.91044615  0.84820726
## tstrev                          0.9905798  0.91044615  1.00000000  0.85436386
## tlocrev                         0.9045163  0.84820726  0.85436386  1.00000000
## totalexp                        0.9883729  0.94484238  0.97557742  0.93707098
## ppcstot                        -0.3403577 -0.26212428 -0.37809676 -0.02957104
##                                  totalexp     ppcstot
## unemployed                     -0.2880562 -0.13628452
## LessThanHighSchoolEducationPCT -0.3748346  0.02996940
## BachelorsDegreePCT              0.5909950 -0.16267794
## MedianHouseholdIncome           0.4014690  0.02370495
## FamiliesInPovertyPCT           -0.1142987 -0.13373170
## science_proficiency             0.3381153  0.09355478
## reading_proficiency             0.3845388  0.07565188
## math_proficiency                0.2895739  0.13837526
## proficiency                     0.3381153  0.09355478
## enroll                          0.9883729 -0.34035766
## tfedrev                         0.9448424 -0.26212428
## tstrev                          0.9755774 -0.37809676
## tlocrev                         0.9370710 -0.02957104
## totalexp                        1.0000000 -0.25492361
## ppcstot                        -0.2549236  1.00000000
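
A matrix like the one above can be produced with base R's `cor()`. The sketch below is only an illustration: the real `t_CombinedNumbers` data is not reproduced here, so it builds a small simulated data frame (the name `counties` and all of its values are assumptions) that reuses a few of the same column names.

```r
# Simulated stand-in data; values are made up for illustration only.
set.seed(42)
n <- 55
bachelors   <- runif(n, 5, 40)
income      <- 30000 + 800 * bachelors + rnorm(n, sd = 6000)
poverty     <- 30 - 0.0003 * income + rnorm(n, sd = 3)
proficiency <- 9 + 0.3 * bachelors + 0.00015 * income + rnorm(n, sd = 4)

counties <- data.frame(
  BachelorsDegreePCT    = bachelors,
  MedianHouseholdIncome = income,
  FamiliesInPovertyPCT  = poverty,
  proficiency           = proficiency
)

# Pairwise Pearson correlations between every numeric column.
cor_matrix <- cor(counties, use = "complete.obs")
print(round(cor_matrix, 3))

# Rank the predictors by the absolute size of their correlation with proficiency.
prof_cor <- cor_matrix[, "proficiency"]
prof_cor <- prof_cor[names(prof_cor) != "proficiency"]
print(sort(abs(prof_cor), decreasing = TRUE))
```

Sorting by absolute value puts strong negative relationships (such as poverty) on the same footing as strong positive ones when deciding which variables to carry into a model.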

Linear Regression Model and Testing

  • Summary: The linear regression model identified BachelorsDegreePCT and MedianHouseholdIncome as statistically significant predictors of county-level proficiency scores (p = 0.015 and p = 0.029, respectively). Holding the other predictor constant, a 1 percentage point increase in the share of individuals with a bachelor’s degree is associated with an average increase of 0.3124 in proficiency scores, while a $10,000 increase in median household income is associated with an average increase of 1.564 in proficiency scores (the per-dollar coefficient of 1.564e-04 multiplied by 10,000). The model explains approximately 49% of the variance in proficiency (multiple R-squared = 0.4903; adjusted R-squared = 0.4707) and has a test RMSE of 4.31, close to the residual standard error of 4.232, indicating reasonably accurate predictions on unseen data.
## 
## Call:
## lm(formula = proficiency ~ BachelorsDegreePCT + MedianHouseholdIncome, 
##     data = t_CombinedNumbers)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3413 -2.7965 -0.4162  2.4509  8.6613 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           9.061e+00  3.327e+00   2.724  0.00877 **
## BachelorsDegreePCT    3.124e-01  1.241e-01   2.517  0.01496 * 
## MedianHouseholdIncome 1.564e-04  6.951e-05   2.250  0.02873 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.232 on 52 degrees of freedom
## Multiple R-squared:  0.4903, Adjusted R-squared:  0.4707 
## F-statistic: 25.01 on 2 and 52 DF,  p-value: 2.452e-08
## Test RMSE: 4.31043
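
The fit-then-test workflow behind this output can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the data are simulated, the data frame name `counties` and the 70/30 split are hypothetical choices (the report does not state the split that was actually used), and only the model formula and the income coefficient are taken from the output above.

```r
# Simulated stand-in data; values are made up for illustration only.
set.seed(1)
n <- 80
counties <- data.frame(BachelorsDegreePCT = runif(n, 5, 40))
counties$MedianHouseholdIncome <- 30000 + 800 * counties$BachelorsDegreePCT +
  rnorm(n, sd = 6000)
counties$proficiency <- 9 + 0.3 * counties$BachelorsDegreePCT +
  0.00015 * counties$MedianHouseholdIncome + rnorm(n, sd = 4)

# Hold out 30% of the rows for testing (assumed split ratio).
test_idx <- sample(n, size = round(0.3 * n))
train <- counties[-test_idx, ]
test  <- counties[test_idx, ]

# Same formula as in the report, fitted on the training rows only.
fit <- lm(proficiency ~ BachelorsDegreePCT + MedianHouseholdIncome, data = train)
print(summary(fit))

# Root mean squared error on the held-out rows.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
test_rmse <- rmse(test$proficiency, predict(fit, newdata = test))
cat("Test RMSE:", test_rmse, "\n")

# Rescale the reported per-dollar income coefficient to a per-$10,000 effect.
income_coef_per_dollar <- 1.564e-04
income_effect_per_10k  <- income_coef_per_dollar * 10000
cat("Change in proficiency per $10,000:", income_effect_per_10k, "\n")
```

Rescaling the income coefficient is worth doing explicitly because a per-dollar slope looks tiny next to the bachelor's degree slope, even though the two effects are of similar practical size.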

K-Means Clustering

  • Summary: The K-means clustering analysis grouped counties into three distinct clusters based on proficiency and the percentage of bachelor’s degree holders (BachelorsDegreePCT). Cluster 1 represented counties with low proficiency and low educational attainment, while Cluster 3 encompassed counties with high proficiency and a higher percentage of bachelor’s degrees. Cluster 2 fell in between, representing moderately performing regions. The summary below shows the variables after standardization, which is why each has a mean of 0.
##   proficiency      BachelorsDegreePCT MedianHouseholdIncome
##  Min.   :-1.8041   Min.   :-1.58674   Min.   :-2.30823     
##  1st Qu.:-0.6627   1st Qu.:-0.67710   1st Qu.:-0.49654     
##  Median :-0.1624   Median :-0.08207   Median : 0.05517     
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000     
##  3rd Qu.: 0.7031   3rd Qu.: 0.34881   3rd Qu.: 0.42511     
##  Max.   : 2.8357   Max.   : 4.11733   Max.   : 3.54957     
##  FamiliesInPovertyPCT
##  Min.   :-1.4506     
##  1st Qu.:-0.7303     
##  Median :-0.1209     
##  Mean   : 0.0000     
##  3rd Qu.: 0.4442     
##  Max.   : 3.6245
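
The standardize-then-cluster step can be sketched with base R's `scale()` and `kmeans()`. The data below are simulated, and the data frame name `counties`, the seed, and the `nstart` value are assumptions rather than the settings used in the report.

```r
# Simulated stand-in data; values are made up for illustration only.
set.seed(7)
n <- 55
counties <- data.frame(BachelorsDegreePCT = runif(n, 5, 40))
counties$proficiency <- 15 + 0.4 * counties$BachelorsDegreePCT + rnorm(n, sd = 3)

# Standardize so both variables have mean 0 and standard deviation 1;
# otherwise the variable with the larger numeric range dominates the distances.
scaled <- scale(counties[, c("proficiency", "BachelorsDegreePCT")])
print(summary(scaled))

# Three clusters; nstart reruns the algorithm from several random starts
# and keeps the solution with the lowest within-cluster sum of squares.
km <- kmeans(scaled, centers = 3, nstart = 25)

# Attach the labels and inspect each cluster's averages on the original scale.
counties$cluster <- factor(km$cluster)
print(aggregate(cbind(proficiency, BachelorsDegreePCT) ~ cluster,
                data = counties, FUN = mean))
print(table(counties$cluster))
```

K-means assigns cluster numbers arbitrarily, so which label ends up as the "low" or "high" group should be read off the cluster averages rather than assumed from the numbering.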

Decision Tree and Testing

  • Summary: The decision tree shows that counties with BachelorsDegreePCT >= 21% have the highest average proficiency score (32), highlighting the importance of educational attainment. For counties with lower bachelor’s degree attainment (<21%), poverty becomes the key factor: counties with FamiliesInPovertyPCT >= 14% have a lower average proficiency score of 22, while those with less poverty have a moderate score of 25. When tested on unseen data, the decision tree produced an RMSE of 5.10, indicating moderate predictive accuracy. It is somewhat less precise than the linear regression model (RMSE = 4.31), but it is still a usable model.

## Test RMSE: 5.100935
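
A regression tree of this kind can be fitted and scored as sketched below. The report does not say which package produced the tree, so the use of `rpart` here is an assumption, as are the data frame name `counties` and the 70/30 split. The simulated data are deliberately built around the thresholds and averages described above so that the fitted tree has a similar shape.

```r
library(rpart)

# Simulated stand-in data that mimics the split structure described in the
# summary (21% bachelor's degrees, 14% poverty); values are made up.
set.seed(3)
n <- 200
counties <- data.frame(
  BachelorsDegreePCT   = runif(n, 5, 40),
  FamiliesInPovertyPCT = runif(n, 5, 30)
)
counties$proficiency <- ifelse(
  counties$BachelorsDegreePCT >= 21, 32,
  ifelse(counties$FamiliesInPovertyPCT >= 14, 22, 25)
) + rnorm(n, sd = 2)

# Hold out 30% of the rows for testing (assumed split ratio).
test_idx <- sample(n, size = round(0.3 * n))
train <- counties[-test_idx, ]
test  <- counties[test_idx, ]

# method = "anova" requests a regression tree for a numeric outcome.
tree <- rpart(proficiency ~ BachelorsDegreePCT + FamiliesInPovertyPCT,
              data = train, method = "anova")
print(tree)

# Each prediction is the mean proficiency of the leaf the county falls into.
pred <- predict(tree, newdata = test)
test_rmse <- sqrt(mean((test$proficiency - pred)^2))
cat("Test RMSE:", test_rmse, "\n")
```

Because a tree predicts one constant value per leaf, its RMSE is often a little higher than a linear model's on smoothly varying data, which is consistent with the comparison in the summary.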

Feature Importance from Decision Tree

  • Summary: The feature importance analysis from the decision tree highlights the relative contributions of the predictors. BachelorsDegreePCT was the most important variable, followed by MedianHouseholdIncome and FamiliesInPovertyPCT. This aligns with findings from the regression model and clustering analysis, underscoring the role of socioeconomic factors in predicting proficiency. While MedianHouseholdIncome was not used directly in the tree’s final structure, its high importance score suggests it carries much of the same information as the variables that were used: it correlates at 0.77 with BachelorsDegreePCT and at -0.75 with FamiliesInPovertyPCT. Quantifying the contribution of each predictor gives a clearer picture of how these factors come into play.
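
One way a variable can score as important without appearing in the tree is through surrogate splits. The sketch below assumes the `rpart` package (the report does not name the package it used) and simulated data in which income is strongly correlated with bachelor's degree attainment. In `rpart`, the `variable.importance` component credits a variable both for splits it makes and for splits where it serves as a surrogate.

```r
library(rpart)

# Simulated stand-in data; income is built to track BachelorsDegreePCT closely.
set.seed(5)
n <- 200
bachelors <- runif(n, 5, 40)
income    <- 30000 + 800 * bachelors + rnorm(n, sd = 3000)
poverty   <- runif(n, 5, 30)
proficiency <- ifelse(bachelors >= 21, 32,
                      ifelse(poverty >= 14, 22, 25)) + rnorm(n, sd = 2)

counties <- data.frame(
  BachelorsDegreePCT    = bachelors,
  MedianHouseholdIncome = income,
  FamiliesInPovertyPCT  = poverty,
  proficiency           = proficiency
)

tree <- rpart(proficiency ~ ., data = counties, method = "anova")

# Importance scores, rescaled to percentages of the total.
importance <- tree$variable.importance
importance_pct <- 100 * importance / sum(importance)
print(round(importance_pct, 1))

# Variables that actually appear as split variables in the fitted tree.
used <- setdiff(unique(as.character(tree$frame$var)), "<leaf>")
print(used)
```

Comparing the two printed results shows whether any variable earns importance purely as a stand-in for a correlated predictor.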

Conclusion

The analysis shows that counties with higher median household incomes and higher adult educational attainment tend to have higher student proficiency. This may stem from the value placed on education within the home or the ability to access resources that support academic achievement. The findings support the thesis that socioeconomic factors play a critical role in shaping educational outcomes. Because the data are county-level, they describe associations across counties rather than effects on individual students, but the pattern is consistent: students are more likely to succeed in communities with higher household incomes and well-educated parents or guardians. Conversely, limited education and poverty are associated with lower proficiency, underscoring the importance of addressing these barriers to foster academic success.

Resources:

  • Site Used for Data: HD Pulse WV

  • Best Site Ever: EDA Textbook

  • ChatGPT: A major help with cleaning up my code and finding errors in it. It suggested several ways to visualize the data, such as the data frame, and helped with formatting my text in markdown and rewording bits and pieces.