Election Results 2020 | https://www.electionreturns.pa.gov/General/SummaryResults?ElectionID=83&ElectionType=G&IsActive=0
From the 2020 Election Results:
County Name: The name of each county
Candidate Name: The name of the candidate
Votes: Total votes received by each candidate
Election Day Votes: Votes cast in person on Election Day
Mail Votes: Votes cast by mail-in ballot
Provisional Votes: Votes cast on provisional ballots, which are used when a voter's eligibility is uncertain
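A minimal sketch of one row of the election results as a typed record, written in Python only for illustration. It assumes, without having checked the downloaded file, that Votes should equal the sum of Election Day, Mail, and Provisional Votes. The class name, field names, and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountyResult:
    """One row of the 2020 results: a single candidate in a single county."""
    county_name: str
    candidate_name: str
    votes: int               # total votes for this candidate in this county
    election_day_votes: int  # votes cast in person on Election Day
    mail_votes: int          # votes cast by mail-in ballot
    provisional_votes: int   # votes cast on provisional ballots

    def components_match_total(self) -> bool:
        # Assumption: the total should equal the sum of the three vote types.
        return self.votes == (
            self.election_day_votes + self.mail_votes + self.provisional_votes
        )


# Hypothetical numbers, for illustration only (not real results).
row = CountyResult("Example County", "Candidate A", 1500, 900, 580, 20)
print(row.components_match_total())  # True
```

Running this check over every row is a cheap way to catch a misread column before any analysis.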
From the Census Data (Overall):
County: Name of the county
FIPS: Numeric FIPS code identifying the county
From the Bachelor's Degree Data:
Value (percent): Percentage of people in the county with a bachelor's degree
People: Number of people in the county with a bachelor's degree
From the Unemployment Data:
Value (percent): Percentage of people in the county who are unemployed
People: Number of unemployed people in the county
From the Poverty Data:
Value (percent): Poverty rate in the county, as a percentage
Families: Number of families in poverty in the county
Once I find solid county-level polling data, I will add more columns and features. The end goal is one overarching tibble that combines all of my data, along with the smaller supporting tibbles.
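Since the census tables share the FIPS column, one way to build the overarching table is to join them on FIPS. The sketch below does an inner join over plain Python dictionaries. The function name combine_by_fips, the table prefixes, and the short column key pct (standing in for Value (percent), which several tables share) are hypothetical, and the percentages are made up. The FIPS codes 42001 (Adams) and 42003 (Allegheny) are real Pennsylvania county codes.

```python
def combine_by_fips(county_names, **tables):
    """Inner-join per-county tables on FIPS into one row per county.

    county_names: FIPS -> county name (from the overall census data).
    tables: keyword name -> {FIPS -> {column: value}}; the keyword becomes a
    column prefix so that repeated column names stay distinct.
    """
    combined = {}
    for fips, name in county_names.items():
        if not all(fips in table for table in tables.values()):
            continue  # inner join: skip counties missing from any table
        row = {"FIPS": fips, "County": name}
        for prefix, table in tables.items():
            for column, value in table[fips].items():
                row[f"{prefix}_{column}"] = value
        combined[fips] = row
    return combined


# Real Pennsylvania FIPS codes; all percentages are made up.
counties = {"42001": "Adams", "42003": "Allegheny"}
bachelors = {"42001": {"pct": 20.0}, "42003": {"pct": 40.0}}
unemployed = {"42001": {"pct": 5.0}}  # Allegheny deliberately missing
combined = combine_by_fips(counties, bachelors=bachelors, unemployed=unemployed)
print(combined)
# {'42001': {'FIPS': '42001', 'County': 'Adams', 'bachelors_pct': 20.0, 'unemployed_pct': 5.0}}
```

The prefixes keep the repeated Value (percent) columns apart. The election results only carry County Name, so they would have to be attached by matching county names after normalizing case and spacing.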
Initial Transformation of Data: The first step will be removing some rows at the beginning and end of the census data, since they contain extraneous information that came with the download rather than county records.
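A sketch of the row trimming, assuming each census download is a CSV with a few preamble lines before the header and notes after the data. Rather than counting rows to cut, it keeps only rows whose FIPS value is a five-digit code not ending in 000. Treating codes ending in 000 as state totals is an assumption about these files, and the sample text is invented.

```python
import csv
import io
import re

# Assumption: county rows carry a five-digit FIPS code that does not end in "000";
# codes ending in "000" are treated as state-level summary rows.
COUNTY_FIPS = re.compile(r"^\d{2}(?!000)\d{3}$")


def county_rows(csv_text, fips_column="FIPS"):
    """Drop preamble and footer lines, keeping only county-level records."""
    lines = csv_text.splitlines()
    # The header is the first line with a cell exactly equal to the FIPS column name.
    for header_index, line in enumerate(lines):
        cells = [cell.strip() for cell in next(csv.reader([line]), [])]
        if fips_column in cells:
            break
    else:
        raise ValueError(f"no header row containing {fips_column!r}")
    reader = csv.DictReader(io.StringIO("\n".join(lines[header_index:])))
    # Footer notes have no FIPS value (None), so they fail the pattern and are dropped.
    return [
        row for row in reader
        if COUNTY_FIPS.match((row.get(fips_column) or "").strip())
    ]


# Invented sample: two preamble lines, a state total, two counties, a footer note.
csv_text = """Pennsylvania bachelor's degree data
Downloaded for a class project
County,FIPS,Value (percent),People
Pennsylvania,42000,30.0,100
Adams,42001,20.0,10
Allegheny,42003,40.0,50
Source: hypothetical footer note
"""
rows = county_rows(csv_text)
print([row["County"] for row in rows])  # ['Adams', 'Allegheny']
```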
Missing Values: As previously mentioned, there are no missing values to address. Any missing entries are in data that is not relevant to my analysis, in columns that won't be used.
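A quick way to confirm that any gaps only affect unused columns is to count empty values per column. This is a hypothetical helper for rows stored as dictionaries (for example, as returned by Python's csv.DictReader); the sample rows are invented.

```python
def missing_report(rows, columns):
    """Count empty or absent values in each of the given columns."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                counts[column] += 1
    return counts


# Invented rows: the gap is only in People, which this check treats as unused.
rows = [
    {"County": "Adams", "FIPS": "42001", "People": ""},
    {"County": "Allegheny", "FIPS": "42003", "People": "50"},
]
print(missing_report(rows, ["County", "FIPS"]))  # {'County': 0, 'FIPS': 0}
print(missing_report(rows, ["People"]))          # {'People': 1}
```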
Over/Under Field: I may be able to add this, but I am unsure because I also included the Libertarian Party in my data. I may need tibbles that include all of the parties mentioned, plus one that focuses only on the two major-party candidates (Biden and Trump in the 2020 results; Harris and Trump in the current cycle).
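One way to handle the Libertarian votes, assuming the over/under field means whether a candidate's share is above or below 50%, is to compute shares twice: over all candidates and over only the two major parties. The party labels, function names, and vote counts below are hypothetical.

```python
def vote_shares(votes_by_party, parties=None):
    """Each party's share of the vote in one county.

    parties: optional subset that defines the denominator (e.g. the two major
    parties); by default every party in votes_by_party is included.
    """
    selected = list(parties) if parties else list(votes_by_party)
    total = sum(votes_by_party[p] for p in selected)
    if total == 0:
        raise ValueError("no votes for the selected parties")
    return {p: votes_by_party[p] / total for p in selected}


def over_under(shares, party, threshold=0.5):
    """'over' if the party's share exceeds the threshold, else 'under'."""
    return "over" if shares[party] > threshold else "under"


# Hypothetical county totals, not real results.
county = {"DEM": 480, "REP": 490, "LIB": 30}
all_party = vote_shares(county)
two_party = vote_shares(county, ["DEM", "REP"])
print(over_under(all_party, "REP"))  # under (490 / 1000 = 0.49)
print(over_under(two_party, "REP"))  # over (490 / 970, about 0.505)
```

In this made-up county the Republican label flips from under to over once the Libertarian votes are left out of the denominator, which is why keeping both the all-party and the two-party tables seems worthwhile.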
With the rework of my project, I have had to discard many of my previous correlations. Here are my thoughts on some potentially intriguing relationships:
Education vs Voter Choice: I believe there will be a strong relationship between a county's education level and which way its voters lean
Poverty vs Voter Choice: I believe a county's poverty rate will be related to how its voters choose
Unemployment vs Voter Choice: Lastly, I think unemployment will be a strong indicator of voter tendencies in each county
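Each of these hypotheses can first be checked with a Pearson correlation between a county-level factor (for example, the bachelor's Value (percent)) and a candidate's vote share. A minimal sketch in plain Python; the sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical county values: bachelor's Value (percent) vs. one candidate's share.
bachelors_pct = [18.0, 25.0, 31.0, 44.0]
candidate_share = [0.66, 0.58, 0.52, 0.41]
r = pearson(bachelors_pct, candidate_share)
print(round(r, 3))
```

Values of r near +1 or -1 suggest a strong linear relationship and values near 0 suggest little linear relationship. The same function can be run on the poverty and unemployment percentages to get one r per hypothesis.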
Later on, I hope to establish more relationships between the polling and census data, which could indicate whether voting patterns will stay the same this cycle or change.
Model 1: Simple Linear Regression
Model 2: Multiple Linear Regression (Continuous Variables)
Model 3: Regression with a Categorical Variable
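All three models can be fit by ordinary least squares, the same kind of fit R's lm() produces. The sketch below solves the normal equations directly, which is fine for a few predictors across Pennsylvania's 67 counties but less numerically stable than the QR decomposition lm() uses. It assumes the categorical variable in Model 3 is a predictor encoded as a 0/1 dummy column; the variable names and data are hypothetical and were constructed so the exact coefficients are known.

```python
def ols(predictors, y):
    """Ordinary least squares: [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    predictors: one list of predictor values per county (an intercept is added).
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        # Zero out this column in every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Model 1: simple linear regression, share = b0 + b1 * education (hypothetical data).
education = [1, 2, 3, 4]
share = [5, 8, 11, 14]
print(ols([[x] for x in education], share))  # [2.0, 3.0]

# Model 2: multiple regression on two continuous predictors (data built from 1 + 2*x1 - x2).
x1 = [1, 0, 1, 2, 3]
x2 = [0, 1, 1, 1, 5]
y2 = [1 + 2 * a - b for a, b in zip(x1, x2)]
model2 = ols([[a, b] for a, b in zip(x1, x2)], y2)

# Model 3: a categorical predictor as a 0/1 dummy (hypothetical "urban" flag).
urban = ["no", "no", "yes", "yes"]
x3 = [1, 2, 1, 3]
dummy = [1 if u == "yes" else 0 for u in urban]
y3 = [10 + 2 * x + 5 * d for x, d in zip(x3, dummy)]
model3 = ols([[x, d] for x, d in zip(x3, dummy)], y3)
```

In Model 3 the dummy's coefficient is the average shift in the outcome for the "yes" group relative to the "no" group, holding the continuous predictor fixed.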
Because I scrapped my previous project, I don't yet have any solid results to analyze. From an initial look at the data, I have identified several factors that I believe strongly influenced voter decisions in the past and may influence voting decisions in the future.