Introduction:
In this data analysis, we are hypothesizing that the lower the population, the higher the Republican advantage will be for the state of Montana. We are focusing on only the most recent Senate data resutls from a previous election.
Load packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.0 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(leaflet)
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.1.3, PROJ 4.9.2
library(readxl)
library(DT)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(broom)
library(tidycensus)
Read in Senate data
senate_counties <- read_xlsx("Statewide Results.xlsx", sheet = 1)
## New names:
## * `` -> `..2`
## * `` -> `..3`
## * `` -> `..4`
## * `` -> `..5`
Read in Statewide Results from data
senate_counties <- read_xlsx("Statewide Results.xlsx", sheet = 1, range = "B7:E63")
Rename columns
senate_counties <- senate_counties %>%
rename(Democrat = "JON TESTER\r\nDemocrat") %>%
rename(Republican = "MATT ROSENDALE\r\nRepublican") %>%
rename(Libertarian = "RICK BRECKENRIDGE\r\nLibertarian")
Compare the votes between the three categories along with the calculated “Rep_advantage”
senate_counties <- senate_counties %>%
mutate(total_votes = Republican + Democrat + Libertarian) %>%
mutate(Repub_advantage = Republican/total_votes - Democrat/total_votes) %>%
mutate(Repub_advantage = round(Repub_advantage*100, 1))
senate_counties %>%
arrange(-Repub_advantage)
Retrieve census data in order to compare votes.
mt_counties <- get_acs(geography = "county",
variables = "B01003_001",
state = "MT",
geometry = TRUE)
## Getting data from the 2013-2017 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======= | 12%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========= | 15%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================= | 52%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
senate_counties[25, "County"] <- "Lewis and Clark" # Changes "&" "and"
mt_counties <- mt_counties %>%
mutate(County = gsub(" County, Montana", "", NAME)) %>% # Removes unnecessary words
rename(Population = estimate) # Renames the 'estimate' to 'Population'
Join the census data with the senate data by county
senate_election <- mt_counties %>%
full_join(senate_counties)
## Joining, by = "County"
Display the results
senate_election %>%
as_tibble() %>%
select(County, Population, Democrat, Republican, Libertarian, total_votes, Repub_advantage) %>%
datatable()
Dispaly the resutls in a choropleth.
vote_colors <- colorNumeric(palette = "viridis", domain = senate_election$Repub_advantage)
senate_election %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1,
fillColor = ~vote_colors(Repub_advantage),
label = ~paste0(County, ", Republican advantage = ", Repub_advantage),
highlight = highlightOptions(weight = 2)) %>%
setView(-110, 47, zoom = 6) %>%
addLegend(pal = vote_colors, values = ~Repub_advantage)
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
Display the resutls of population versus repub_advantage. As one can see from this graphic, the lower the population the higher the Republican advantage.
senate_election %>%
plot_ly(x = ~Population,
y = ~Repub_advantage,
hoverinfo = "text",
text = ~paste("County:",
County, "<br>",
"Population: ", Population, "<br>",
"Republican advantage: ", Repub_advantage)) %>%
add_markers(marker = list(opacity = 0.7)) %>%
layout(title = "Predicting Republican Vote Advantage from Population, by County",
xaxis = list(title = "County population"),
yaxis = list(title = "Republican vote advantage"))
Create a regression model
pop_model <- lm(Repub_advantage ~ Population, data = senate_election)
According to this anaylysis, the r factor is about a little under 0.18 which is actually pretty good. The p value is bout a 0.12 which means that there is some statistical significance here.
tidy(pop_model)
glance(pop_model)
To note here: the lower the county population demonstrates a higher Republican advantage. A few county outliers include: Missoula, Gallatin, Yellowstone and Flathead as they have a much higher popualtion.
senate_election %>%
plot_ly(x = ~Population,
y = ~Repub_advantage,
hoverinfo = "text",
text = ~paste("County:",
County, "<br>",
"Population: ", Population, "<br>",
"Republican advantage: ", Repub_advantage)) %>%
add_markers(showlegend = F, marker = list(opacity = 0.7)) %>%
layout(title = "Predicting Republican Vote Advantage from Population, by County",
xaxis = list(title = "County population"),
yaxis = list(title = "Republican vote advantage")) %>%
add_lines(y = ~fitted(pop_model))
In addtion, another area to test is the affect of longitude on votes
senate_election <- senate_election %>%
mutate(Longitude = as_tibble(st_coordinates(st_centroid(senate_election$geometry)))$X) %>%
mutate(Latitude = as_tibble(st_coordinates(st_centroid(senate_election$geometry)))$Y)
## Warning in st_centroid.sfc(senate_election$geometry): st_centroid does not
## give correct centroids for longitude/latitude data
## Warning in st_centroid.sfc(senate_election$geometry): st_centroid does not
## give correct centroids for longitude/latitude data
Create a map displaying election data based on centroid analysis. It is important to note that the cirlces are about in the center of county.
senate_election %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1) %>%
setView(-110, 47, zoom = 6) %>%
addCircleMarkers(~Longitude, ~Latitude)
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
Calculate a linear regression of longitatude versus republican advantage. Once again, the r factor is around 0.17 and the p value is 0.14 which is relatively significant.
longitude_lm <- lm(Repub_advantage ~ Longitude, data = senate_election)
tidy(longitude_lm)
glance(longitude_lm)
Create a graph depicting the republican advantage based on longitude. As one can see here, the farther west you go the more republican advantage is noted.
senate_election %>%
plot_ly(x = ~Longitude,
y = ~Repub_advantage,
hoverinfo = "text",
text = ~paste("County:", County, "<br>", "Longitude: ", Longitude, "<br>", "Republican advantage: ", Repub_advantage)) %>%
add_markers(marker = list(opacity = 0.7), showlegend = F) %>%
layout(title = "Predicting Republican Vote Advantage from Longitude, by County",
xaxis = list(title = "County longitude"),
yaxis = list(title = "Republican vote advantage")) %>%
add_lines(y = ~fitted(longitude_lm))
Finally, compare multiple regressions. The r factor is 0.27 and p-value is 0.02 which means that there is a small significance.
multiple_lm <- lm(Repub_advantage ~ Population + Longitude, data = senate_election)
tidy(multiple_lm)
glance(multiple_lm)
Display this data in a graph
senate_election %>%
plot_ly(x = ~Longitude, y = ~Population, z = ~Repub_advantage,
text = ~County, hoverinfo = "text") %>%
add_markers(opacity = .7, showlegend = F)
In conclusion, this Senate analysis showed that there is some significance to “the more west you go in the state of Montana, the more of a republican advantage you have.” Also, lower county populations also impacted the Republican advantage. In regards to Senator Tester winning the vote, it was pretty close to not being in the Democratic favor.