With the 2020 election over, and while waiting for the certified final results to be uploaded, I thought it would be a good idea to examine the results of the 2016 election. In particular, I wanted to look in detail at the presidential election results for the closest state by percentage points, and at how the vote in that state is distributed by county.
To look at the results, we first have to import the data. For the state election results, I decided to use the ‘politicaldata’ library, which was built by G. Elliott Morris, a data journalist currently working for The Economist. Its pres_results dataset contains presidential election results from 1976 onward; we will only need the data from 2016. But first let’s install the libraries we will use. The additional libraries are tidyverse for manipulating the data and patchwork for combining different elements of graphs.
As we can see from the charts, the results were very close. Now let’s look at the state of Michigan to understand how different counties voted for the different candidates. For county election results, we will use election results from here. To make sure the code runs, download the data, unzip it, and keep the unzipped dataverse_files folder (which contains the csv file) in the same folder as the script. We also use the library usmap to plot a map of Michigan with its counties.
if (!requireNamespace('politicaldata', quietly = TRUE))
{
install.packages('politicaldata')
}
if (!requireNamespace('usmap', quietly = TRUE))
{
install.packages("usmap")
}
library('usmap')
library('politicaldata')
library(tidyverse)
library(patchwork)
library(knitr)
election_county <- read_csv("dataverse_files/countypres_2000-2016.csv") %>%
  filter(year == 2016, state_po == "MI") %>%
  mutate(fips = FIPS)
Now that we have the data, let’s look through it to find the five closest states in the presidential election. We start by filtering the results to the year 2016. We then create a new column holding the absolute value of the difference between the Republican and Democratic vote shares, sort by it, and view the first 5 results.
election_results_table <- pres_results %>%
  filter(year == 2016) %>%
  mutate(difference_percentage = abs(dem - rep)) %>%
  arrange(difference_percentage) %>%
  slice_head(n = 5) %>%
  mutate(difference_votes = difference_percentage * total_votes) %>%
  select(!year)
kable(election_results_table, format = "html", align = 'c', row.names = TRUE)
| | state | total_votes | dem | rep | other | difference_percentage | difference_votes |
|---|---|---|---|---|---|---|---|
| 1 | MI | 4799284 | 0.4727453 | 0.4749756 | 0.0504131 | 0.0022303 | 10704 |
| 2 | NH | 744296 | 0.4682626 | 0.4645867 | 0.0509891 | 0.0036760 | 2736 |
| 3 | PA | 6115402 | 0.4785362 | 0.4857789 | 0.0356850 | 0.0072427 | 44292 |
| 4 | WI | 2976150 | 0.4645384 | 0.4721818 | 0.0591180 | 0.0076434 | 22748 |
| 5 | FL | 9420039 | 0.4782332 | 0.4902194 | 0.0315312 | 0.0119863 | 112911 |
As we can see from the table, the closest state was Michigan, with a difference of 0.223 percentage points between the Democratic and Republican presidential candidates. The five closest states were Michigan, New Hampshire, Pennsylvania, Wisconsin and Florida. One interesting thing to note is that we’re viewing the states in order of difference in percentage points. This means that even though the difference in votes in New Hampshire, for example, is very small (2,736 votes) compared to the other states, the total number of votes cast there was also much smaller, which is why it is not first in the list. If we rank by the smallest difference in vote totals instead, we get a different table with some different states.
election_results_table <- pres_results %>%
  filter(year == 2016) %>%
  mutate(difference_percentage = abs(dem - rep)) %>%
  mutate(difference_votes = difference_percentage * total_votes) %>%
  arrange(difference_votes) %>%
  slice_head(n = 5) %>%
  select(!year)
kable(election_results_table, format = "html", align = 'c', row.names = TRUE)
| | state | total_votes | dem | rep | other | difference_percentage | difference_votes |
|---|---|---|---|---|---|---|---|
| 1 | NH | 744296 | 0.4682626 | 0.4645867 | 0.0509891 | 0.0036760 | 2736 |
| 2 | MI | 4799284 | 0.4727453 | 0.4749756 | 0.0504131 | 0.0022303 | 10704 |
| 3 | ME | 771892 | 0.4634521 | 0.4347668 | 0.1017811 | 0.0286854 | 22142 |
| 4 | WI | 2976150 | 0.4645384 | 0.4721818 | 0.0591180 | 0.0076434 | 22748 |
| 5 | NV | 1125385 | 0.4791782 | 0.4550070 | 0.0658148 | 0.0241713 | 27202 |
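To see why the two orderings differ, here is a minimal sketch in base R that redoes the arithmetic for Michigan and New Hampshire. The figures are copied from the tables above; the data frame name margins and the two sorted copies are only illustrative and are not part of the analysis itself.

```r
# Vote shares and totals for two states, copied from the tables above
margins <- data.frame(
  state = c("MI", "NH"),
  total_votes = c(4799284, 744296),
  dem = c(0.4727453, 0.4682626),
  rep = c(0.4749756, 0.4645867)
)

# Margin as a share of the vote, and the same margin converted to votes
margins$difference_percentage <- abs(margins$dem - margins$rep)
margins$difference_votes <- round(margins$difference_percentage * margins$total_votes)

# Sorting by share puts Michigan first; sorting by votes puts New Hampshire first
by_percentage <- margins[order(margins$difference_percentage), ]
by_votes <- margins[order(margins$difference_votes), ]

print(by_percentage)
print(by_votes)
```

Michigan has the smaller margin as a share of the vote, but because New Hampshire has far fewer voters, its larger share still translates into fewer votes.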
For the purpose of our analysis, the difference in percentage points is the better measure of closeness because it does not depend on the size of the state’s electorate. Now let’s create a bar chart so that we can see how close the election results were in the five states: Michigan, New Hampshire, Pennsylvania, Wisconsin and Florida. Since the vote share of all other candidates is very low, I decided not to show it in this graph.
# Reshape the data into long format so that we can make a chart
election_results_chart <- pres_results %>%
  filter(year == 2016) %>%
  mutate(difference_percentage_points = abs(dem - rep) * 100) %>%
  arrange(difference_percentage_points) %>%
  slice_head(n = 5) %>%
  select(State = state, Democrats = dem, Republicans = rep) %>%
  pivot_longer(!State, names_to = "Party", values_to = "Vote_totals") %>%
  mutate(Vote_totals = Vote_totals * 100)
# Draw a grouped bar chart of vote percentage by state and party
ggplot(election_results_chart, aes(fill = Party, y = Vote_totals, x = factor(State), color = Party)) +
  scale_fill_manual(values = c("#0015BC", "#FF0000")) +
  scale_color_manual(values = c("#0015BC", "#FF0000")) +
  geom_bar(position = "dodge", stat = "identity") +
  ylim(0, 100) +
  xlab("States") +
  ylab("Vote totals (by percentage)") +
  ggtitle("Breakdown of vote percentage for the top five closest \n states for the 2016 Presidential Election") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "bold"),
        legend.text = element_text(face = "bold", size = 12),
        legend.title = element_text(face = "bold", size = 12),
        axis.text.x = element_text(face = "bold", size = 12),
        axis.text.y = element_text(face = "bold", size = 12),
        axis.title.x = element_text(face = "bold", size = 12),
        axis.title.y = element_text(face = "bold", size = 12))
Now let’s take a look at a map of Michigan with all its counties and try to visualize how the votes are distributed across them. To do that, we will create a heat map that shows where most of the votes come from.
rep_votes <- election_county %>%
  filter(party == "republican")
plot_usmap(data = rep_votes, values = "candidatevotes", regions = "counties", include = c("MI")) +
  scale_fill_continuous(low = "white", high = "#FF0000", name = "Number of Votes", labels = scales::comma) +
  labs(title = "Michigan: Number of Republican votes",
       subtitle = "Distribution of votes by county. Darker color indicates more votes") +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 8),
        panel.background = element_rect(color = "black", fill = "lightblue"),
        legend.position = c(0.05, 0.06),
        legend.background = element_rect(colour = "black"))
The problem with this map is that it shows raw vote counts, so the counties that stand out are simply the most populated ones. We can see that a lot of Republican votes come from Wayne County, even though it tends to lean heavily Democratic, simply because a lot of people live there. To get a more useful chart, we can standardize the votes so that the map shows each county’s proportion of the vote. We do this by dividing the votes the candidate received in each county by the total votes cast in that county.
rep_votes <- election_county %>%
  filter(party == "republican") %>%
  mutate(standardized_vote = candidatevotes / totalvotes * 100)
plot_usmap(data = rep_votes, values = "standardized_vote", regions = "counties", include = c("MI")) +
  scale_fill_continuous(low = "white", high = "#FF0000", name = "Percentage of Votes", labels = scales::comma,
                        breaks = seq(20, 100, length.out = 9)) +
  labs(title = "Michigan: Percentage of Republican votes",
       subtitle = "Distribution of vote percentages by county. Darker color \n indicates higher percentage of vote won.") +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 8),
        panel.background = element_rect(color = "black", fill = "lightblue"),
        legend.position = c(0.05, 0.06),
        legend.background = element_rect(colour = "black"))
Now we get a chart that is more readable. For readers who can tell Michigan’s counties apart, the areas where the Republican candidate won a greater percentage of the vote tended to be more rural and sparsely populated. Here’s a table of the five Michigan counties where the Republican candidate won the highest percentage of the vote.
rep_county_table <- election_county %>%
  filter(year == 2016, party == "republican") %>%
  mutate(percentage_vote = candidatevotes / totalvotes * 100) %>%
  arrange(desc(percentage_vote)) %>%
  slice_head(n = 5) %>%
  select(county, percentage_vote)
kable(rep_county_table, format = "html", align = 'c', row.names = TRUE)
| | county | percentage_vote |
|---|---|---|
| 1 | Missaukee | 73.60940 |
| 2 | Hillsdale | 70.68706 |
| 3 | Sanilac | 69.85298 |
| 4 | Montmorency | 69.83430 |
| 5 | Oscoda | 69.80113 |
Now let’s move on to the Democratic votes in the state of Michigan.
dem_votes <- election_county %>%
  filter(party == "democrat") %>%
  mutate(standardized_vote = candidatevotes / totalvotes * 100)
plot_usmap(data = dem_votes, values = "standardized_vote", regions = "counties", include = c("MI")) +
  scale_fill_continuous(low = "white", high = "#0015BC", name = "Percentage of Votes", labels = scales::comma,
                        breaks = seq(20, 100, length.out = 9)) +
  labs(title = "Michigan: Percentage of Democratic votes",
       subtitle = "Distribution of vote percentages by county. Darker color \n indicates higher percentage of vote won.") +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 8),
        panel.background = element_rect(color = "black", fill = "lightblue"),
        legend.position = c(0.05, 0.06),
        legend.background = element_rect(colour = "black"))
We can see that the Democratic candidate won a greater percentage of the vote in more urban and densely populated areas. Here’s the top five table for reference.
dem_county_table <- election_county %>%
  filter(year == 2016, party == "democrat") %>%
  mutate(percentage_vote = candidatevotes / totalvotes * 100) %>%
  arrange(desc(percentage_vote)) %>%
  slice_head(n = 5) %>%
  select(county, percentage_vote)
kable(dem_county_table, format = "html", align = 'c', row.names = TRUE)
| | county | percentage_vote |
|---|---|---|
| 1 | Washtenaw | 68.13255 |
| 2 | Wayne | 66.78049 |
| 3 | Ingham | 60.32576 |
| 4 | Kalamazoo | 53.16590 |
| 5 | Genesee | 52.34493 |
The state of Michigan was won by around 10,000 votes, and the result could have swung either way just through a difference in turnout between urban and rural areas. Charts help us see patterns that we might otherwise only theorize about. Even in cases where we know something for a fact, it’s always nice to get visual confirmation. It’s also important to remember that our choice of visualization determines whether or not we can glean useful information from it.