1. Introduction

This analysis uses North Carolina 2020 election and demographic data to predict the percentage of Georgia voters who would vote Democratic. The predictors were the percentage of residents with a bachelor’s degree or higher, the percentage of non-English speakers, and the percentage of residents below the poverty line. I chose North Carolina and Georgia because they share a border, have nearly the same number of Electoral College votes, and have similar populations of around 10.5 million. Based on these predictors, I predict that Democrats will not win the popular vote in Georgia.

2. Data Description

There are 100 counties in North Carolina and 159 counties in Georgia. The county-level results were collected from Politico and are stored as one row per county. The demographic data came from the National Institute on Minority Health and Health Disparities (2018-2022).

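Key variables included:

-percent_voted_for_dem, Democratic share of the county vote (the outcome; on a 0 to 1 scale, judging by the model output)

-percent_bachelors, percentage of residents with a bachelor’s degree or higher

-percent_below_poverty, percentage of residents below the poverty line

-percent_non_eng, percentage of non-English speakers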

3. Methods

My independent variables were the percentage of residents with a bachelor’s degree or higher, the percentage of non-English speakers, and the percentage of residents below the poverty line. The dependent variable, the Democratic vote share, is on a 0 to 1 scale. The residuals ranged from about -0.20 to 0.27, with the middle half falling between -0.07 and 0.06, so most predictions are within about 7 percentage points of the observed values even though the largest misses exceed 20 points. The coefficients for the percentage with a bachelor’s degree and the percentage below the poverty line suggest that a one-percentage-point increase in each is associated with a 0.89 and a 1.71 percentage point increase in the Democratic vote share, respectively, holding the other predictors constant. Both are highly significant (p < 0.001), which is strong statistical evidence that these associations are unlikely to be due to chance alone. The coefficient for the percentage of non-English speakers is not statistically significant (p = 0.188).

The F-statistic of 25.74 (p-value = 2.6e-12, well below 0.05) indicates that the model as a whole is statistically significant. The residual standard error of 0.1018 shows that the predicted Democratic vote shares typically deviate from the actual values by about 10.18 percentage points. This is a moderate amount of error and suggests that the model could be improved.

The model has an adjusted R-squared of 0.4285, meaning that about 43% of the variation in the Democratic vote share is explained by the model. This makes it a fair, though not strong, model for these predictors.
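As a rough check on how these summary statistics relate to each other, the sketch below recomputes the adjusted R-squared and the F-statistic from the multiple R-squared, the number of counties, and the number of predictors reported in the model summary below. The variable names are illustrative and are not taken from the original analysis code.

```r
# Values copied from the full-model summary in this report.
r2 <- 0.4458   # multiple R-squared
n  <- 100      # North Carolina counties (rows in the data)
p  <- 3        # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic on p and n - p - 1 degrees of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Upper-tail p-value of the F-test.
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(round(adj_r2, 4))
print(round(f_stat, 2))
print(p_value)
```

Because the inputs are rounded to four decimals, the recomputed values should agree with the summary only up to rounding.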

## 
## Call:
## lm(formula = percent_voted_for_dem ~ percent_bachelors + percent_below_poverty + 
##     percent_non_eng, data = nc_merged)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19717 -0.06772 -0.02338  0.05585  0.26846 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.098762   0.059666  -1.655    0.101    
## percent_bachelors      0.008856   0.001223   7.241 1.11e-10 ***
## percent_below_poverty  0.017061   0.002494   6.841 7.35e-10 ***
## percent_non_eng        0.014654   0.011060   1.325    0.188    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1018 on 96 degrees of freedom
## Multiple R-squared:  0.4458, Adjusted R-squared:  0.4285 
## F-statistic: 25.74 on 3 and 96 DF,  p-value: 2.646e-12
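To make the coefficient interpretation concrete, the following sketch plugs a hypothetical county into the fitted equation from the summary above. The county's values (30% with a bachelor’s degree, 15% below the poverty line, 3% non-English speakers) are invented for illustration and do not come from the data set.

```r
# Coefficients copied from the full-model summary above.
coefs <- c(intercept             = -0.098762,
           percent_bachelors     =  0.008856,
           percent_below_poverty =  0.017061,
           percent_non_eng       =  0.014654)

# Hypothetical county (illustrative values, not from the data set).
county <- c(percent_bachelors = 30,
            percent_below_poverty = 15,
            percent_non_eng = 3)

# Fitted Democratic share on the 0 to 1 scale used by the model.
fitted_share <- coefs[["intercept"]] + sum(coefs[names(county)] * county)

# Same county with one more percentage point of bachelor's degree holders.
county_plus <- county
county_plus[["percent_bachelors"]] <- county[["percent_bachelors"]] + 1
fitted_plus <- coefs[["intercept"]] + sum(coefs[names(county_plus)] * county_plus)

print(fitted_share)
print(fitted_plus - fitted_share)
```

For this hypothetical county the fitted share works out to about 0.467, and the extra percentage point of bachelor’s degree holders adds 0.008856, or roughly 0.89 percentage points.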

There were very few outliers, and they did not noticeably skew the data.

This scatterplot compares the predicted Democratic vote percentage with the actual Democratic vote percentage. It shows a positive trend: counties with higher predicted values tend to have higher actual values.
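A predicted-versus-actual plot of this kind could be drawn roughly as follows. This is only a sketch: the data are simulated stand-ins rather than the report's values, and base R's `scatter.smooth()` is one of several ways to add a loess line.

```r
set.seed(1)  # reproducible illustration

# Simulated stand-ins for actual and predicted shares (not the report's data).
actual    <- runif(100, min = 0.2, max = 0.8)
predicted <- 0.2 + 0.6 * actual + rnorm(100, sd = 0.05)

# Scatterplot with a loess smoother drawn through the points.
scatter.smooth(predicted, actual,
               xlab = "Predicted Democratic share",
               ylab = "Actual Democratic share")

# Dashed 45-degree reference line: points on it are predicted exactly.
abline(a = 0, b = 1, lty = 2)

# Correlation between predicted and actual values.
r <- cor(predicted, actual)
print(r)
```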

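The loess line shows a consistent upward trend without much deviation, meaning that the predictions track the actual values reasonably well and capture a meaningful pattern.

The model summary below refits the same regression on the 80 training counties from an 80%/20% train/test split of the North Carolina data.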

## 
## Call:
## lm(formula = percent_voted_for_dem ~ percent_bachelors + percent_below_poverty + 
##     percent_non_eng, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19732 -0.07600 -0.01140  0.06503  0.26021 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.062282   0.067199  -0.927    0.357    
## percent_bachelors      0.008032   0.001386   5.795 1.46e-07 ***
## percent_below_poverty  0.016482   0.002730   6.037 5.36e-08 ***
## percent_non_eng        0.009973   0.012169   0.820    0.415    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1042 on 76 degrees of freedom
## Multiple R-squared:  0.4006, Adjusted R-squared:  0.377 
## F-statistic: 16.93 on 3 and 76 DF,  p-value: 1.614e-08
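The 76 residual degrees of freedom above are consistent with fitting four coefficients to 80 of the 100 counties. A minimal sketch of such an 80/20 split is shown here. The data frame `nc_demo` is simulated and hypothetical, the seed is arbitrary, and this is not the code actually used in the analysis.

```r
set.seed(42)  # arbitrary seed so the split is reproducible

# Simulated stand-in for the merged county data (hypothetical values).
n <- 100
nc_demo <- data.frame(
  percent_bachelors     = runif(n, 10, 60),
  percent_below_poverty = runif(n, 5, 30),
  percent_non_eng       = runif(n, 0, 8)
)
nc_demo$percent_voted_for_dem <- -0.1 +
  0.009 * nc_demo$percent_bachelors +
  0.017 * nc_demo$percent_below_poverty +
  rnorm(n, sd = 0.1)

# Randomly pick 80% of the rows for training; the rest form the test set.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- nc_demo[train_idx, ]
test  <- nc_demo[-train_idx, ]

# Fit the regression on the training rows only.
fit <- lm(percent_voted_for_dem ~ percent_bachelors + percent_below_poverty +
            percent_non_eng, data = train)

# Evaluate on the held-out rows with root mean squared error.
test_pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$percent_voted_for_dem - test_pred)^2))
print(rmse)
```

Comparing the test-set error with the residual standard error of the training fit gives a sense of whether the model generalizes beyond the counties it was fit on.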

The average of the predicted values is 40.4%, which suggests that about 40% of the vote in Georgia would go to the Democratic Party. Based on the factors above, I predict that the Democratic Party would not win Georgia.
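The averaging step can be sketched as below, assuming the training-model coefficients are applied to Georgia's county demographics. The three counties in `ga_demo` and their populations are invented for illustration. The population-weighted mean is shown only as a possible alternative, since a simple mean treats small and large counties equally.

```r
# Coefficients copied from the training-model summary in this report
# (intercept, percent_bachelors, percent_below_poverty, percent_non_eng).
coefs <- c(-0.062282, 0.008032, 0.016482, 0.009973)

# Hypothetical Georgia counties (illustrative values, not real data).
ga_demo <- data.frame(
  percent_bachelors     = c(20, 50, 15),
  percent_below_poverty = c(20, 10, 25),
  percent_non_eng       = c(2, 8, 1),
  population            = c(10000, 500000, 8000)
)

# Design matrix: a column of ones for the intercept, then the predictors.
X <- cbind(1, as.matrix(ga_demo[, c("percent_bachelors",
                                    "percent_below_poverty",
                                    "percent_non_eng")]))

# Predicted Democratic share per county (0 to 1 scale).
county_pred <- as.vector(X %*% coefs)

# Simple average across counties, as described in the text.
mean_pred <- mean(county_pred)

# Alternative: weight each county by its (hypothetical) population.
weighted_pred <- weighted.mean(county_pred, w = ga_demo$population)

print(mean_pred)
print(weighted_pred)
```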

4. Limitations

This analysis did not account for whether people in these demographic groups actually voted, which could make the demographic percentages a poor guide to the electorate. Turnout data by group would be needed for a more definitive answer. One example is the percentage of non-English speakers: a county may not provide language assistance, which could keep some of these residents from voting. As of now, twenty states are not required to provide non-English voting materials, one of which is North Carolina.

5. References

Sources included:

-Politico.com, North Carolina presidential results, https://www.politico.com/2020-election/results/north-carolina/

-National Institute on Minority Health and Health Disparities, Demographic data for North Carolina and Georgia, https://hdpulse.nimhd.nih.gov/data-portal/social/map?socialtopic=080&socialtopic_options=social_6&demo=00008&demo_options=poverty_3&race=00&race_options=race_7&sex=0&sex_options=sex_3&age=001&age_options=ageall_1&statefips=13&statefips_options=area_states

-ChatGPT was used to generate code for splitting the data into 80% for training and 20% for testing.