Part 1: Data Source

Census Site | https://hdpulse.nimhd.nih.gov/data-portal/social/map?socialtopic=090&socialtopic_options=social_6&demo=00012&demo_options=workforce_2&race=00&race_options=raceall_1&sex=0&sex_options=sex_3&age=916&age_options=age16_1&statefips=42&statefips_options=area_states

Election Results 2020 | https://www.electionreturns.pa.gov/General/SummaryResults?ElectionID=83&ElectionType=G&IsActive=0

From the 2020 Election Results:

From the Census Data Overall:

From Bachelors Data:

From Unemployed Data:

From Poverty Data:

Once I find solid county-level polling data, I will add more columns and features. The end goal is a single overarching tibble that combines all of my data, alongside the smaller supporting tibbles.
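A minimal sketch of how the per-county tables could be combined into one table, written in Python for illustration rather than with the tibbles themselves. The table names, the county labels, and every number are placeholders, and the inner-join behavior (keeping only counties present in every table) is an assumption about how missing counties should be handled.

```python
# Hypothetical per-county tables keyed by county name.
# All values are placeholders, not real figures.
bachelors = {"County A": 30.0, "County B": 18.5, "County C": 22.0}
unemployed = {"County A": 4.1, "County B": 6.3}
poverty = {"County A": 10.2, "County B": 15.8, "County C": 12.4}


def combine_by_county(**tables):
    """Inner-join several {county: value} tables into one row per county."""
    # Keep only counties that appear in every table, like an inner join.
    shared = set.intersection(*(set(t) for t in tables.values()))
    return [
        {"county": county, **{name: t[county] for name, t in tables.items()}}
        for county in sorted(shared)
    ]


combined = combine_by_county(
    bachelors=bachelors, unemployed=unemployed, poverty=poverty
)
for row in combined:
    print(row)
```

County C is dropped here because it has no unemployment value. A left join that keeps such counties with a missing entry would be the alternative if incomplete rows should stay visible.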

Part 2: Data Transformation

Part 3: Correlations

With the rework of my project, I have had to discard many of my previous correlations. For now, I will give my thoughts on some potentially intriguing relationships.

Later on, I hope to establish more relationships between the polling and the census data, which should give some indication of whether things will stay the same this cycle or change.
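One way to quantify such a relationship is the Pearson correlation coefficient between a census column and a results or polling column. The sketch below is in Python with only the standard library. The column names and values are placeholders chosen to be perfectly linear, not real county data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Placeholder values only: one entry per hypothetical county.
pct_bachelors = [10.0, 20.0, 30.0, 40.0]
vote_share = [35.0, 45.0, 55.0, 65.0]
r = pearson_r(pct_bachelors, vote_share)
print(round(r, 3))
```

A value near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests little linear relationship. It says nothing about causation.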

Part 4: Modeling

Potential Models

Model 1: Simple Linear Regression

  • The single variable I would like to focus on for this model is bachelor’s degrees by county, relating it to prior election results and current polling.
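A minimal sketch of a one-predictor least-squares fit, in Python with only the standard library. It assumes the predictor is the percentage of residents with a bachelor’s degree and the response is a vote share. Both column names and all values are placeholders, not real data.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder values only: one entry per hypothetical county.
pct_bachelors = [10.0, 20.0, 30.0, 40.0]
vote_share = [32.0, 41.0, 52.0, 59.0]
intercept, slope = fit_simple_ols(pct_bachelors, vote_share)
print(f"vote_share = {intercept:.2f} + {slope:.2f} * pct_bachelors")
```

The slope reads as the change in vote share associated with a one-point change in the bachelor’s degree percentage, under the assumption that a straight line is a reasonable description of the data.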

Model 2: Multiple Linear Regression (Continuous Variables)

  • For this model, I want to look at poverty and unemployment together to get an idea of their effect on polling and their influence in prior elections.
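A sketch of a two-predictor least-squares fit that solves the normal equations directly, in Python with only the standard library. The helper names are hypothetical, and the data are placeholders generated from a known formula so the fit can be checked. They are not real poverty or unemployment figures.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """Fit y = b0 + b1*x1 + b2*x2 + ... via the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Placeholder (poverty %, unemployment %) pairs for hypothetical counties.
predictors = [(10.0, 4.0), (15.0, 6.0), (12.0, 3.0), (20.0, 5.0), (8.0, 7.0)]
# Responses built as 60 - 1*poverty - 2*unemployment, so the true
# coefficients are known in advance.
vote_share = [42.0, 33.0, 42.0, 30.0, 38.0]
b0, b_poverty, b_unemployment = fit_multiple_ols(predictors, vote_share)
print(round(b0, 3), round(b_poverty, 3), round(b_unemployment, 3))
```

Solving the normal equations is fine for two or three predictors. With many or highly correlated predictors, a library routine based on a QR or SVD decomposition would be the more numerically stable choice.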

Model 3: Regression with a Categorical Variable

  • For the categorical variable, I will use the counties themselves and look at voting by gender, checking for a noticeable change between this election cycle’s polling and previous election results.
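A sketch of how a two-level categorical variable can enter a regression through dummy coding, in Python with only the standard library. It assumes gender is the categorical predictor, with "men" as the reference level. The labels, helper names, and support percentages are all placeholders.

```python
def dummy_code(labels, reference):
    """Return 0.0 for the reference level and 1.0 for the other level."""
    levels = set(labels)
    if reference not in levels or len(levels) != 2:
        raise ValueError("expected exactly two levels, including the reference")
    return [0.0 if label == reference else 1.0 for label in labels]


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Placeholder rows: one (gender, support %) observation per entry.
gender = ["men", "women", "men", "women", "men", "women"]
support = [44.0, 52.0, 46.0, 56.0, 48.0, 54.0]

is_women = dummy_code(gender, reference="men")
intercept, gap = fit_simple_ols(is_women, support)
print(f"reference mean = {intercept:.1f}, women minus men = {gap:.1f}")
```

With a single 0/1 dummy, the intercept is the mean of the reference group and the slope is the difference between the two group means. A variable with more levels, such as county, would need one dummy column per non-reference level.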

Part 5: Analysis of Results

Because I scrapped my previous project, I don’t have any solid results to analyze yet. From an initial look at the data by eye, I have identified several factors that I believe strongly influenced past voter decisions and may influence voting decisions in the future.