Let’s test two theories about Montana’s political landscape, based on election results data from the 2018 US Senate Race (where Democrat Jon Tester defeated Republic Matt Rosendale in a close race).
The first theory is that more populated counties support Democrats and their counterparts support Republics.
The second theory is that Democrats are more common in the Western parts of the state, and vice versa.
library(tidyverse)
library(leaflet)
library(sf)
library(readxl)
library(DT)
library(plotly)
library(broom)
library(tidycensus)
census_api_key("d41baf61ae7a8464397c04c3c41151966dd81cb1")
To install your API key for use in future sessions, run this function with `install = TRUE`.
I obtained election results data from http://mtelectionresults.gov
The election results are a spreadsheet, so I will load it now.
senate_counties <- read_xlsx("Statewide Results.xlsx", sheet = 1, range = "B7:E63")
When I first viewed the data, I saw there were unnecessary rows and columns. So I fixed that by giving a cells range in the R code.
senate_counties <- read_xlsx("Statewide Results.xlsx", sheet = 1)
New names:
* `` -> ...2
* `` -> ...3
* `` -> ...4
* `` -> ...5
When I look again, it looks good now.
Next, I will Use glimpse() to look at senate_counties.
glimpse(senate_counties)
Observations: 56
Variables: 4
$ County [3m[38;5;246m<chr>[39m[23m "Beaverhead", "Big Horn", "Blaine", "Broadwater", "Carbon", "Carter", "Cascade", "Choutea…
$ `JON TESTER\r\nDemocrat` [3m[38;5;246m<dbl>[39m[23m 1876, 3027, 1961, 1071, 2680, 128, 17435, 1275, 1942, 281, 1233, 2892, 281, 1964, 19652, …
$ `MATT ROSENDALE\r\nRepublican` [3m[38;5;246m<dbl>[39m[23m 2866, 1558, 982, 2086, 3209, 602, 15566, 1312, 2762, 631, 2700, 1208, 951, 3640, 26759, 2…
$ `RICK BRECKENRIDGE\r\nLibertarian` [3m[38;5;246m<dbl>[39m[23m 155, 91, 76, 104, 178, 22, 1008, 70, 179, 29, 140, 136, 57, 189, 1349, 1434, 30, 89, 21, …
I will use rename() to set terms “Republican” and “Democrat” instead of the names and the rest of the existing long chain of info, this will make the data easier to work with:
senate_counties <- senate_counties %>%
rename(Republican = "MATT ROSENDALE\r\nRepublican") %>%
rename(Democrat = "JON TESTER\r\nDemocrat") %>%
rename(Libertarian = "RICK BRECKENRIDGE\r\nLibertarian")
Now I’ll see if that looks better.
glimpse(senate_counties)
Observations: 56
Variables: 4
$ County [3m[38;5;246m<chr>[39m[23m "Beaverhead", "Big Horn", "Blaine", "Broadwater", "Carbon", "Carter", "Cascade", "Chouteau", "Custer", "Daniels"…
$ Democrat [3m[38;5;246m<dbl>[39m[23m 1876, 3027, 1961, 1071, 2680, 128, 17435, 1275, 1942, 281, 1233, 2892, 281, 1964, 19652, 33251, 81, 3754, 130, 6…
$ Republican [3m[38;5;246m<dbl>[39m[23m 2866, 1558, 982, 2086, 3209, 602, 15566, 1312, 2762, 631, 2700, 1208, 951, 3640, 26759, 21248, 571, 1153, 303, 1…
$ Libertarian [3m[38;5;246m<dbl>[39m[23m 155, 91, 76, 104, 178, 22, 1008, 70, 179, 29, 140, 136, 57, 189, 1349, 1434, 30, 89, 21, 52, 227, 226, 51, 365, …
Yay, it does!
We want to know the Republican - Democrat difference in each county, as a percentage.
Since the Democrat candidate won this election, I want to see the Democrat minus Republic difference per county, and I want that number as a percentage.
The R code below accomplishes that task.
senate_counties <- senate_counties %>%
mutate(total_votes = Republican + Democrat + Libertarian) %>%
mutate(Dem_advantage = Democrat/total_votes - Republican/total_votes) %>%
mutate(Dem_advantage = round(Dem_advantage*100, 1))
senate_counties %>%
arrange(-Dem_advantage)
NA
I also want to look at census data, in addition to election data.
The R code below brings that useful census data into the fold.
mt_counties <- get_acs(geography = "county",
variables = "B01003_001",
state = "MT",
geometry = TRUE)
Getting data from the 2014-2018 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '30' for state 'MT'
|
| | 0%
|
|= | 0%
|
|= | 1%
|
|== | 1%
|
|== | 2%
|
|=== | 2%
|
|=== | 3%
|
|==== | 3%
|
|==== | 4%
|
|===== | 4%
|
|====== | 4%
|
|====== | 5%
|
|======= | 5%
|
|======= | 6%
|
|======== | 6%
|
|======== | 7%
|
|========= | 7%
|
|========= | 8%
|
|========== | 8%
|
|=========== | 8%
|
|=========== | 9%
|
|============ | 9%
|
|============ | 10%
|
|============= | 10%
|
|============= | 11%
|
|============== | 11%
|
|=============== | 12%
|
|================ | 12%
|
|================ | 13%
|
|================= | 13%
|
|================= | 14%
|
|================== | 14%
|
|================== | 15%
|
|=================== | 15%
|
|=================== | 16%
|
|==================== | 16%
|
|===================== | 16%
|
|===================== | 17%
|
|====================== | 17%
|
|====================== | 18%
|
|======================= | 18%
|
|======================= | 19%
|
|======================== | 19%
|
|======================== | 20%
|
|========================= | 20%
|
|========================== | 21%
|
|=========================== | 21%
|
|=========================== | 22%
|
|============================ | 22%
|
|============================ | 23%
|
|============================= | 23%
|
|============================= | 24%
|
|============================== | 24%
|
|=============================== | 25%
|
|================================ | 25%
|
|================================ | 26%
|
|================================= | 26%
|
|================================= | 27%
|
|================================== | 27%
|
|================================== | 28%
|
|=================================== | 28%
|
|==================================== | 28%
|
|==================================== | 29%
|
|===================================== | 29%
|
|===================================== | 30%
|
|====================================== | 30%
|
|====================================== | 31%
|
|======================================= | 31%
|
|======================================= | 32%
|
|======================================== | 32%
|
|========================================= | 32%
|
|========================================= | 33%
|
|========================================== | 33%
|
|========================================== | 34%
|
|=========================================== | 34%
|
|=========================================== | 35%
|
|============================================ | 35%
|
|============================================= | 36%
|
|============================================== | 36%
|
|============================================== | 37%
|
|=============================================== | 37%
|
|=============================================== | 38%
|
|================================================ | 38%
|
|================================================ | 39%
|
|================================================= | 39%
|
|================================================== | 40%
|
|=================================================== | 41%
|
|==================================================== | 41%
|
|==================================================== | 42%
|
|===================================================== | 42%
|
|===================================================== | 43%
|
|====================================================== | 43%
|
|====================================================== | 44%
|
|======================================================= | 44%
|
|======================================================== | 45%
|
|========================================================= | 45%
|
|========================================================= | 46%
|
|========================================================== | 46%
|
|========================================================== | 47%
|
|=========================================================== | 47%
|
|=========================================================== | 48%
|
|============================================================ | 48%
|
|============================================================= | 48%
|
|============================================================= | 49%
|
|============================================================== | 49%
|
|============================================================== | 50%
|
|=============================================================== | 50%
|
|=============================================================== | 51%
|
|================================================================ | 51%
|
|================================================================ | 52%
|
|================================================================= | 52%
|
|================================================================== | 52%
|
|================================================================== | 53%
|
|=================================================================== | 53%
|
|=================================================================== | 54%
|
|==================================================================== | 54%
|
|==================================================================== | 55%
|
|===================================================================== | 55%
|
|===================================================================== | 56%
|
|====================================================================== | 56%
|
|======================================================================= | 56%
|
|======================================================================= | 57%
|
|======================================================================== | 57%
|
|======================================================================== | 58%
|
|========================================================================= | 58%
|
|========================================================================= | 59%
|
|========================================================================== | 59%
|
|========================================================================== | 60%
|
|=========================================================================== | 60%
|
|============================================================================ | 60%
|
|============================================================================ | 61%
|
|============================================================================= | 61%
|
|============================================================================= | 62%
|
|============================================================================== | 62%
|
|============================================================================== | 63%
|
|=============================================================================== | 63%
|
|================================================================================ | 64%
|
|================================================================================= | 64%
|
|================================================================================= | 65%
|
|================================================================================== | 65%
|
|================================================================================== | 66%
|
|=================================================================================== | 66%
|
|=================================================================================== | 67%
|
|==================================================================================== | 67%
|
|===================================================================================== | 68%
|
|====================================================================================== | 69%
|
|======================================================================================= | 69%
|
|======================================================================================= | 70%
|
|======================================================================================== | 70%
|
|======================================================================================== | 71%
|
|========================================================================================= | 71%
|
|========================================================================================= | 72%
|
|========================================================================================== | 72%
|
|=========================================================================================== | 72%
|
|=========================================================================================== | 73%
|
|============================================================================================ | 73%
|
|============================================================================================ | 74%
|
|============================================================================================= | 74%
|
|============================================================================================= | 75%
|
|============================================================================================== | 75%
|
|============================================================================================== | 76%
|
|=============================================================================================== | 76%
|
|================================================================================================ | 76%
|
|================================================================================================ | 77%
|
|================================================================================================= | 77%
|
|================================================================================================= | 78%
|
|================================================================================================== | 78%
|
|================================================================================================== | 79%
|
|=================================================================================================== | 79%
|
|=================================================================================================== | 80%
|
|==================================================================================================== | 80%
|
|===================================================================================================== | 80%
|
|===================================================================================================== | 81%
|
|====================================================================================================== | 81%
|
|====================================================================================================== | 82%
|
|======================================================================================================= | 82%
|
|======================================================================================================= | 83%
|
|======================================================================================================== | 83%
|
|========================================================================================================= | 84%
|
|========================================================================================================== | 84%
|
|========================================================================================================== | 85%
|
|=========================================================================================================== | 85%
|
|=========================================================================================================== | 86%
|
|============================================================================================================ | 86%
|
|============================================================================================================ | 87%
|
|============================================================================================================= | 87%
|
|============================================================================================================= | 88%
|
|============================================================================================================== | 88%
|
|=============================================================================================================== | 88%
|
|=============================================================================================================== | 89%
|
|================================================================================================================ | 89%
|
|================================================================================================================ | 90%
|
|================================================================================================================= | 90%
|
|================================================================================================================= | 91%
|
|================================================================================================================== | 91%
|
|================================================================================================================== | 92%
|
|=================================================================================================================== | 92%
|
|==================================================================================================================== | 92%
|
|==================================================================================================================== | 93%
|
|===================================================================================================================== | 93%
|
|===================================================================================================================== | 94%
|
|====================================================================================================================== | 94%
|
|====================================================================================================================== | 95%
|
|======================================================================================================================= | 95%
|
|======================================================================================================================= | 96%
|
|======================================================================================================================== | 96%
|
|========================================================================================================================= | 97%
|
|========================================================================================================================== | 97%
|
|========================================================================================================================== | 98%
|
|=========================================================================================================================== | 98%
|
|=========================================================================================================================== | 99%
|
|============================================================================================================================ | 99%
|
|============================================================================================================================ | 100%
|
|=============================================================================================================================| 100%
There are problems with minor differences in the two data sets I’m using (election and census data) with the name of Lewis and Clark county and with some unnecessary words in one dataset.
The code below fixes that issue, and renames the census column called estimate to Population, for later use:
senate_counties[25, "County"] <- "Lewis and Clark" # Changes "&" "and"
mt_counties <- mt_counties %>%
mutate(County = gsub(" County, Montana", "", NAME)) %>% # Removes unnecessary words
rename(Population = estimate) # Renames the 'estimate' to 'Population'
Let’s join the two to make a new data set.
senate_election <- mt_counties %>%
full_join(senate_counties)
Joining, by = "County"
Let’s make a pretty table to look closer.
senate_election %>%
as_tibble() %>%
select(County, Population, Democrat, Republican, Libertarian, total_votes, Dem_advantage) %>%
datatable()
NA
This code creates a beautiful and informative choropleth.
vote_colors <- colorNumeric(palette = "viridis", domain = senate_election$Dem_advantage)
senate_election %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1,
fillColor = ~vote_colors(Dem_advantage),
label = ~paste0(County, ", Democrat advantage = ", Dem_advantage),
highlight = highlightOptions(weight = 2)) %>%
setView(-110, 47, zoom = 6) %>%
addLegend(pal = vote_colors, values = ~Dem_advantage)
sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
Notice that if you hover your mouse over each county, it shows the Democrat advantage (where ever there was actually a Democrat disadvantage, a negative number will be displayed). Many populous Montana counties (Missoula, Gallatin, Cascade, Silverbow, etc.) are colored in a yellow-green shade because they have large Democrat advantages. Meanwhile, the most populous county actually has a modest Democrat disadvantage. But certainly there still seems to be a trend that the other most populous counties seem to learn toward Democrat.
So, do Democrats prevail in the more populated counties and vice versa? Let’s keep investigating with the data at hand.
A plotly graph can help us out with that.
senate_election %>%
plot_ly(x = ~Population, y = ~Dem_advantage) %>%
add_markers()
It certainly still looks like Democrats do better in the more populous counties!
Let’s enhance this plotly graph.
senate_election %>%
plot_ly(x = ~Population,
y = ~Dem_advantage,
hoverinfo = "text",
text = ~paste("County:",
County, "<br>",
"Population: ", Population, "<br>",
"Democrat advantage: ", Dem_advantage)) %>%
add_markers(marker = list(opacity = 0.7)) %>%
layout(title = "Predicting Democrat Vote Advantage from Population, by County",
xaxis = list(title = "County population"),
yaxis = list(title = "Democrat vote advantage"))
NA
That looks better!
Now… Let’s create a regression model using a lineal modeal.
pop_model <- lm(Dem_advantage ~ Population, data = senate_election)
Summary is one way to look at the data from the model.
summary(pop_model)
Call:
lm(formula = Dem_advantage ~ Population, data = senate_election)
Residuals:
Min 1Q Median 3Q Max
-47.948 -15.782 -3.247 12.013 71.228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.428e+01 4.084e+00 -5.946 2.08e-07 ***
Population 3.761e-04 1.100e-04 3.418 0.00121 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 26.44 on 54 degrees of freedom
Multiple R-squared: 0.1779, Adjusted R-squared: 0.1626
F-statistic: 11.68 on 1 and 54 DF, p-value: 0.001207
But, hey that all looks like a mess! Not very easy on the eyes…
So let’s use the broom package in R to get some prettier tables to look at.
tidy(pop_model)
glance(pop_model)
NA
NA
R-squared is about .18, meaning 18% of the variance in vote is attributable to population counts. The correlation is about .4, which is statistically significant, providing evidence for the theory that the larger the population, the more dominant Democrats are in a given county.
Now I want to plot the regression line onto the chart.
senate_election %>%
plot_ly(x = ~Population,
y = ~Dem_advantage,
hoverinfo = "text",
text = ~paste("County:",
County, "<br>",
"Population: ", Population, "<br>",
"Democrat advantage: ", Dem_advantage)) %>%
add_markers(showlegend = F, marker = list(opacity = 0.7)) %>%
layout(title = "Predicting Democrat Vote Advantage from Population, by County",
xaxis = list(title = "County population"),
yaxis = list(title = "Democrat vote advantage")) %>%
add_lines(y = ~fitted(pop_model))
NA
That regression line is a nice visual cue for the case we’ve made.
So, if you look further West in Montana, do you find more Democrats, and vice versa? Let’s look at geographic data per county to find out.
senate_election <- senate_election %>%
mutate(Longitude = as_tibble(st_coordinates(st_centroid(senate_election$geometry)))$X) %>%
mutate(Latitude = as_tibble(st_coordinates(st_centroid(senate_election$geometry)))$Y)
st_centroid does not give correct centroids for longitude/latitude datast_centroid does not give correct centroids for longitude/latitude data
The R code above and below is intended to show roughly the center of each county.
senate_election %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1) %>%
setView(-110, 47, zoom = 6) %>%
addCircleMarkers(~Longitude, ~Latitude)
sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
As the graph shows, in the farthest Easterly longtitudes, there is slight trend for a Republican advantage, and vice versa.
senate_election %>%
plot_ly(x = ~Longitude, y = ~Dem_advantage) %>%
add_markers()
Now let’s do the linear regression for the same data.
longitude_lm <- lm(Dem_advantage ~ Longitude, data = senate_election)
tidy(longitude_lm)
glance(longitude_lm)
The r-squared is again about .18, and the correlation is about the same .4. This confirms that the West is Democrat territory.
Let’s make the graph more complete with the line, and some labels.
senate_election %>%
plot_ly(x = ~Longitude,
y = ~Dem_advantage,
hoverinfo = "text",
text = ~paste("County:", County, "<br>", "Longitude: ", Longitude, "<br>", "Democrat advantage: ", Dem_advantage)) %>%
add_markers(marker = list(opacity = 0.7), showlegend = F) %>%
layout(title = "Predicting Democrat Vote Advantage from Longitude, by County",
xaxis = list(title = "County longitude"),
yaxis = list(title = "Democrat vote advantage")) %>%
add_lines(y = ~fitted(longitude_lm))
NA
NA
How about a multiple regression to predict Democrat advantage from county population and longitude at the same time.
multiple_lm <- lm(Dem_advantage ~ Population + Longitude, data = senate_election)
tidy(multiple_lm)
glance(multiple_lm)
Wow, they both retain their statistical analysis. The case is looking ever-stronger for our theory.
2D this whole time is getting a bit repetitive; let’s do a sweet 3D graph for a change.
senate_election %>%
plot_ly(x = ~Longitude, y = ~Population, z = ~Dem_advantage,
text = ~County, hoverinfo = "text") %>%
add_markers(opacity = .7, showlegend = F)
Wow, that’s eye-popping! And informative!
Matching the broader trend across the entire country, the more populous areas in Montana tend to lean Democrat. The more rural areas tend to lean Republican. And yes, it really is true; on average, if you’re in the Western part of Montana, you’re surrounded by Democrats. If you’re in the Eastern part of Montana, you’re surrounded by Republicans. Meanwhile, Yellowstone county, home of MSU-Billings, seems to be a good place for everyone to hang out, with apparently plentiful Democrat and Republic voters–which makes sense, because it’s right in the middle of the state.