library(readr)
url <- "http://dl.tufts.edu/file_assets/generic/tufts:MS115.003.001.00001/0"
if (!file.exists("all-votes.tsv")) {
download.file(url, "nnv-all-votes.zip")
unzip("nnv-all-votes.zip", files = "all-votes.tsv")
}
nnv <- read_tsv("all-votes.tsv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(stringr)
names(nnv) <- names(nnv) %>% str_to_lower() %>% str_replace_all("\\ ", "_")
The report should be broken into two parts. The first part should be an exploration of the dataset where you examine what is in the data and explain (briefly) what you are trying to figure out and what you are seeing. Answer these kinds of questions, using plots if you need to.
What kinds of elections were there?
nnv$type %>%
unique()
## [1] "General" "Legislative" "Special"
## [4] "Legislastive" "Special Legislative" "Special Election"
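Some of the labels printed above look like variants of one another. A minimal sketch of how they might be consolidated before counting, assuming that "Legislastive" is a misspelling of "Legislative" and that "Special Election" means the same as "Special" (both are assumptions, and the `lookup` table is hypothetical):

```r
# Sample of `type` values mirroring the labels printed above.
type <- c("General", "Legislative", "Special", "Legislastive",
          "Special Legislative", "Special Election")

# Assumed mapping from variant label to preferred label.
lookup <- c("Legislastive" = "Legislative",
            "Special Election" = "Special")

# Replace a label only when it appears in the lookup table.
type_clean <- ifelse(type %in% names(lookup), lookup[type], type)

unique(type_clean)
```

Applied to the real data, the same replacement could be done inside `mutate()` so that the bar chart counts each kind of election once.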
How many of each kind of election?
nnv %>%
ggplot(aes(x=type)) +
geom_bar(stat = "count")
How many candidates and how often do they appear?
nnv$name %>%
unique() %>%
length()
## [1] 34880
nnv$name_id %>%
unique() %>%
length()
## [1] 33763
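Above, I used the unique() and length() functions to count the distinct candidates by name (34,880) and by name_id (33,763). The name count is higher, which suggests that some candidates are recorded under more than one spelling of their name.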
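The second half of the question, how often each candidate appears, could be answered by counting rows per candidate ID. The sketch below uses a made-up miniature table (the IDs and names are hypothetical, not taken from the dataset) and assumes that `name_id` identifies a candidate across spellings:

```r
library(dplyr)

# Hypothetical miniature of nnv: one row per candidate per return.
toy <- tibble(
  name_id = c("AA0001", "AA0001", "BB0002", "AA0001", "CC0003", "BB0002"),
  name    = c("John Smith", "Jno. Smith", "Abel Lee", "John Smith",
              "Tom Roe", "Abel Lee")
)

# Number of rows per candidate ID, most frequent first.
appearances <- toy %>%
  count(name_id, sort = TRUE)

# Number of distinct name spellings recorded under each ID.
spellings <- toy %>%
  group_by(name_id) %>%
  summarise(n_names = n_distinct(name))

appearances
spellings
```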
Which parties?
nnv$affiliation %>%
unique() %>%
length()
## [1] 180
nnv_aff <-nnv %>%
filter(affiliation != "null")
nnv_aff$affiliation %>%
unique() %>%
head()
## [1] "Federalist" "Democrat" "Republican" "Democratic"
## [5] "Pro-Slavery" "Restrictionist"
What does each row in the dataset represent?
Each row is a particular candidate running in a particular election in a particular year.
Which years are in the dataset, and how many elections are there in each year?
nnv <- nnv %>%
mutate(year = str_extract(date, "\\d{4}") %>% as.integer())
nnv %>%
group_by(year) %>%
summarise(unique_elements = n_distinct(id))
## Source: local data frame [40 x 2]
##
## year unique_elements
## (int) (int)
## 1 1787 69
## 2 1788 159
## 3 1789 186
## 4 1790 168
## 5 1791 156
## 6 1792 194
## 7 1793 167
## 8 1794 243
## 9 1795 202
## 10 1796 357
## .. ... ...
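The year column depends on the pattern `"\\d{4}"` picking the first run of four digits out of each date. A small sketch of that behaviour, using hypothetical date strings (the formats in the real `date` column may differ):

```r
library(stringr)

# Hypothetical date strings; the last one is missing.
dates <- c("1796-11-07", "1804", "1812-04", NA)

# str_extract() returns the first match of four consecutive digits,
# and NA where the input is NA or contains no match.
years <- as.integer(str_extract(dates, "\\d{4}"))

years
```

Rows whose date has no four-digit run end up with a missing year, so they would be grouped under `NA` in the summary.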
Which states are represented?
nnv_state <- nnv %>%
filter(state != "null")
nnv_state$state %>%
unique()
## [1] "Vermont" "Missouri" "New Hampshire" "Tennessee"
## [5] "New York" "NY" "Alabama" "Maryland"
## [9] "Indiana" "Georgia" "New Jersey" "Massachusetts"
## [13] "Rhode Island" "Maine" "North Carolina" "Kentucky"
## [17] "Mississippi" "Illinois" "Louisiana" "Ohio"
## [21] "South Carolina" "Virginia" "Delaware" "Connecticut"
## [25] "Pennsylvania"
Now it’s time to take a look at the Old Dominion! Virginia exceptionalism at its finest. So why not start with the richest county in the United States and view the votes in Loudoun’s Electoral College elections?
nnv_va <- nnv %>%
filter(state == "Virginia", county == "Loudoun", office =="Electoral College") %>%
group_by(affiliation) %>%
select(-id, -territory, -township, -ward, -borough, -parish)
nnv_va <- nnv_va %>%
filter(affiliation != "null")
ggplot(nnv_va, aes(x = year, y = vote,
color = affiliation)) +
geom_point() +
labs(title = "Early Republic Electoral College Votes in Loudoun",
x= "Election Years",
y = "Number of Votes")
The graph above shows the number of votes by affiliation in Loudoun County between 1795 and 1820. Although the vote totals are small, the rise and fall of the different political parties is easy to track. Initially the Republican party was the most popular, but by 1814 the Federalist party had taken over. The graph also shows a short-term rise in the popularity of the Quid party, which won the Electoral College election in 1807.
The next graph also looks at the Electoral College elections, this time for Fairfax County. Since the two counties border each other, it will be interesting to observe the similarities and differences in how they voted in the same elections.
nnv_va_fx <- nnv %>%
filter(state == "Virginia", county == "Fairfax", office =="Electoral College") %>%
group_by(affiliation) %>%
select(-id, -territory, -township, -ward, -borough, -parish)
nnv_va_fx <- nnv_va_fx %>%
filter(affiliation != "null")
ggplot(nnv_va_fx, aes(x = year, y = vote,
color = affiliation)) +
geom_point() +
labs(title = "Early Republic Electoral College Votes in Fairfax",
x= "Election Years",
y = "Number of Votes")
The election results in Loudoun and Fairfax differ. For instance, in Fairfax the Quid party never wins an Electoral College election, and the Federalist party was more popular in the county’s early elections before eventually losing almost every year to the Republicans.
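Statements about which party won in a given year can be checked against a summary table rather than read off the scatter plot. A minimal sketch, using made-up counties, parties, and vote counts, and assuming that `vote` holds a numeric count per row:

```r
library(dplyr)

# Made-up vote counts; none of these values come from the dataset.
toy <- tibble(
  county      = c("County X", "County X", "County X", "County Y", "County Y"),
  year        = c(1800L, 1800L, 1800L, 1800L, 1800L),
  affiliation = c("Party A", "Party A", "Party B", "Party A", "Party B"),
  vote        = c(70, 50, 95, 80, 140)
)

winners <- toy %>%
  # Total votes per party within each county and year.
  group_by(county, year, affiliation) %>%
  summarise(votes = sum(vote, na.rm = TRUE), .groups = "drop") %>%
  # Keep the party with the highest total in each county and year.
  group_by(county, year) %>%
  slice_max(votes, n = 1) %>%
  ungroup()

winners
```

With the real data, the same pipeline could start from the filtered Loudoun or Fairfax table instead of `toy`.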
Now that I’ve looked at two Northern Virginia counties, I’m interested in how the Federalist, Republican, and Quid parties fared across Virginia in the Electoral College elections.
nnv_virginia <- nnv %>%
filter(state == "Virginia", office =="Electoral College") %>%
group_by(affiliation) %>%
select(-id, -territory, -township, -ward, -borough, -parish)
nnv_virginia <- nnv_virginia %>%
filter(affiliation != "null")
ggplot(nnv_virginia, aes(x = year, y = vote,
color = affiliation)) +
geom_point()+
labs(title = "Early Republic Electoral College Votes in Virginia",
x= "Election Years",
y = "Number of Votes") +
scale_y_log10()
## Warning: Removed 230 rows containing missing values (geom_point).
While this plot is helpful and shows the number of votes in each election year, it would be clearer if I separated the parties into separate graphs.
ggplot(nnv_virginia, aes(x = year, y = vote)) +
geom_count(shape = 1, alpha = 0.6) +
facet_wrap(~ affiliation) +
scale_y_log10()
## Warning: Removed 230 rows containing non-finite values (stat_sum).
The graph above separates the three major political parties and shows the density of votes for each year. This gives a better idea of the support for each party.
However, what if I wanted to view the number of votes in specific counties and compare how often people voted during the Early Republic? There are far too many counties in Virginia to facet separately by county, but by paring down the number I can at least view some differences beyond Fairfax and Loudoun. By matching county against a list of names inside filter, I can view nine counties and compare them. (These counties were chosen at random.)
# %in% tests whether each county appears in the list; == would instead
# recycle the list row by row and silently drop most matching rows.
# state is restricted to Virginia because county names such as Orange
# and York are not unique to Virginia.
nnv_virginia <- nnv %>%
filter(state == "Virginia",
county %in% c("Loudoun", "Fairfax", "King George", "Surry", "York", "Henrico", "Orange", "Augusta", "Chesterfield")) %>%
group_by(affiliation) %>%
select(-id, -territory, -township, -ward, -borough, -parish)
ggplot(nnv_virginia, aes(x = year, y = vote)) +
geom_count(shape = 1, alpha = 0.6) +
facet_wrap(~ county) +
scale_y_log10()
## Warning: Removed 124 rows containing non-finite values (stat_sum).
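When a column is compared against a vector of several names, `==` and `%in%` behave differently: `==` recycles the shorter vector position by position, while `%in%` tests membership. A minimal base R sketch with made-up values (not taken from the dataset):

```r
# Made-up county values for four rows.
county <- c("Fairfax", "Loudoun", "Fairfax", "Loudoun")
wanted <- c("Loudoun", "Fairfax")

# `==` recycles `wanted` along the rows: row 1 is compared with "Loudoun",
# row 2 with "Fairfax", row 3 with "Loudoun" again, and so on.
eq_match <- county == wanted

# `%in%` asks, for each row, whether the value occurs anywhere in `wanted`.
in_match <- county %in% wanted

eq_match
in_match
```

In this example every row belongs to one of the wanted counties, yet `==` matches none of them because the positions never line up. When the column length is not a multiple of the list length, R also warns that the longer object length is not a multiple of the shorter one.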