library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(stringr)
library(readr)
url <- "http://dl.tufts.edu/file_assets/generic/tufts:MS115.003.001.00001/0"
if (!file.exists("all-votes.tsv")) {
download.file(url, "nnv-all-votes.zip")
unzip("nnv-all-votes.zip", files = "all-votes.tsv")
}
nnv <- read_tsv("all-votes.tsv")
names(nnv) <-names(nnv) %>% str_to_lower() %>% str_replace_all("\\ ", "_")
- What kinds of elections were there?
nnv$type %>%
unique()
## [1] "General" "Legislative" "Special"
## [4] "Legislastive" "Special Legislative" "Special Election"
There are five different types of election recorded in the dataset. However, there is a transcription error that lists “Legislative” as “Legislastive.”
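One way to fold the misspelling back into the correct category before counting is sketched below. It uses a small made-up data frame (`toy`) in place of `nnv`, so the rows are illustrative only. `dplyr::recode()` leaves every value it is not told about unchanged.

```r
library(dplyr)

# Toy stand-in for nnv; these rows are invented for illustration.
toy <- data.frame(
  type = c("General", "Legislative", "Legislastive", "Special"),
  stringsAsFactors = FALSE
)

# Map the misspelled label onto the intended one; other values pass through.
toy_fixed <- toy %>%
  mutate(type = recode(type, "Legislastive" = "Legislative"))

toy_fixed %>% count(type)
```

The same `mutate()` call could be applied to `nnv` itself before running `count(type)`.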
- How many of each kind of election?
nnv %>% count(type)
## Source: local data frame [6 x 2]
##
## type n
## (chr) (int)
## 1 General 620179
## 2 Legislastive 11
## 3 Legislative 24576
## 4 Special 14770
## 5 Special Election 14
## 6 Special Legislative 75
ggplot(nnv, aes(x = type)) +
geom_bar(stat = "count")
- How many candidates and how often do they appear?
nnv_clean_nameID <- nnv%>%
filter(name_id!="null")
nnv_clean_nameID$name_id %>%
unique() %>%
length()
## [1] 33761
nnv %>% count(name, name_id) %>%
ungroup() %>%
arrange(desc(n))
## Source: local data frame [41,738 x 3]
##
## name name_id n
## (chr) (chr) (int)
## 1 Caleb Strong SC0023 6282
## 2 William Phillips PW0081 6136
## 3 Elbridge Gerry GE0049 5420
## 4 others null 3993
## 5 James Sullivan SJ0366 3930
## 6 William Heath HW0171 3910
## 7 Edward H. Robbins RE0018 3693
## 8 scattering null 3650
## 9 Samuel Adams AS0022 3390
## 10 Moses Gill GM0022 3298
## .. ... ... ...
There are a total of 33,761 distinct candidate IDs over the timespan of this dataset. I removed the “null” values from the name_id column before counting. The most common individual in the dataset is “Caleb Strong,” with 6,282 observations.
- Which parties?
nnv_aff<- nnv %>%
filter(affiliation != "null")
nnv_aff$affiliation %>%
unique()
## [1] "Federalist"
## [2] "Democrat"
## [3] "Republican"
## [4] "Democratic"
## [5] "Pro-Slavery"
## [6] "Restrictionist"
## [7] "Anti-Restrictionist"
## [8] "Coalition"
## [9] "Federal"
## [10] "Republicans"
## [11] "Federalists"
## [12] "Clintonian"
## [13] "Bucktail"
## [14] "Quid"
## [15] "Lewis Ticket"
## [16] "Clinton Ticket"
## [17] "Lewisite"
## [18] "Anti-Division"
## [19] "Division"
## [20] "Administration"
## [21] "Anti-Administration"
## [22] "Administration Ticket"
## [23] "Tammany Ticket"
## [24] "Clintonian Ticket"
## [25] "Anti-Federal"
## [26] "Anti-Federalist"
## [27] "Clintonian / Federal"
## [28] "Federalist/Clintonian"
## [29] "Independent/Federalist"
## [30] "Indepedent/Federalist"
## [31] "1st Ticket Democratic Republican"
## [32] "2nd Ticket Democratic Republican"
## [33] "Clinton"
## [34] "Republican/Clintonian"
## [35] "Clintonian/Federal"
## [36] "Clintonian/Federalist"
## [37] "High Federalist"
## [38] "Low Federalist"
## [39] "High Republican"
## [40] "Low Republican"
## [41] "Others"
## [42] "Union"
## [43] "Independent"
## [44] "The People's Ticket"
## [45] "King Caucus Ticket"
## [46] "People's Candidate"
## [47] "People's Ticket"
## [48] "King Caucus"
## [49] "Republican/Federalist"
## [50] "Anti-Federalist (Republican)"
## [51] "Union of Parties"
## [52] "Federal / Clintonian"
## [53] "Federal/Clintonian"
## [54] "Anti-Clintonian"
## [55] "Anti-Clinton"
## [56] "nul"
## [57] "Anti Federal"
## [58] "Republican / Anti Federalist"
## [59] "Republican / Anti Fedealist"
## [60] "Republican/ Anti Federalist"
## [61] "Republican/Anti-Federalist"
## [62] "Anti-Caucus"
## [63] "Caucus"
## [64] "Opposition Republican"
## [65] "No Jew-Bill Ticket"
## [66] "Jew-Bill Ticket"
## [67] "Federalist / I"
## [68] "Opposition"
## [69] "Federalist/Opposition"
## [70] "Caucus Ticket"
## [71] "Anti-Caucus Ticket"
## [72] "Moderates"
## [73] "Violents"
## [74] "Jacksonian"
## [75] "Adamite"
## [76] "Crawfordite"
## [77] "Pig Point or Lower Candidates"
## [78] "Choptank Bridge or Upper Candidates"
## [79] "Chesapeake"
## [80] "Potomac"
## [81] "Democrat/Republican"
## [82] "The Federal Republican Ticket"
## [83] "Crawford"
## [84] "Jackson"
## [85] "Jeffersonian"
## [86] "Bank Tax Ticket"
## [87] "Anti-Bank Tax Ticket"
## [88] "Independent Republican"
## [89] "War Ticket"
## [90] "Peace Ticket"
## [91] "Federal Republican"
## [92] "American"
## [93] "French"
## [94] "Peacemaker"
## [95] "Warhawk"
## [96] "Pro-Administration"
## [97] "Federalist/Union"
## [98] "Peace"
## [99] "Anti-Relief"
## [100] "Relief"
## [101] "For New Election"
## [102] "Against New Election"
## [103] "Constitutionalist"
## [104] "Court"
## [105] "Country"
## [106] "New-Electionist"
## [107] "Williamsite"
## [108] "Anti-Slavery"
## [109] "opp. Republican"
## [110] "Federalist and Republican"
## [111] "Federal/Quid"
## [112] "Tammany"
## [113] "Anti-Tammany"
## [114] "Democratic Republican"
## [115] "Opposition/Federal"
## [116] "Republican/Quid"
## [117] "Minority"
## [118] "Federalist/Quid"
## [119] "Republican / Federalist"
## [120] "Anti-Tolerationist"
## [121] "Tolerationist"
## [122] "Anit-Republican"
## [123] "Jacobin"
## [124] "Democrats"
## [125] "Federal/Independent Republican"
## [126] "Leibite"
## [127] "Whig"
## [128] "Leib"
## [129] "Constitutionalist/Federalist"
## [130] "Federalist/Constitutionalist"
## [131] "Constitutional/Federalist"
## [132] "Constutionalist"
## [133] "Constitutionalsit"
## [134] "Repblican"
## [135] "Hiester ticket"
## [136] "Findlay ticket"
## [137] "Hiesterites"
## [138] "Anti-Republican"
## [139] "Tory"
## [140] "Federalist Republican"
## [141] "Federal Republicans"
## [142] "Quid/Federalist"
## [143] "Constitutionalist/Republican"
## [144] "Friends of Peace"
## [145] "British"
## [146] "Republican Danville"
## [147] "Republican Bloomsburg"
## [148] "Old School Republican"
## [149] "Republican - New School"
## [150] "Republican - Old School"
## [151] "New School Democrat"
## [152] "Old School Democrats"
## [153] "Federlaist"
## [154] "Conventionalist"
## [155] "Constitutional"
## [156] "Federalist/Republican"
## [157] "First Democratic Ticket"
## [158] "Second Democratic Ticket"
## [159] "Friends of the Constitution"
## [160] "Snyderites or Jacobins"
## [161] "American Republican"
## [162] "Oppositional Republican"
## [163] "Federal/Constitutional"
## [164] "New School"
## [165] "Old School"
## [166] "Constitutional/Federal"
## [167] "Ticket in favor of the division of old Northumberland County"
## [168] "Second Republican Ticket"
## [169] "Non-Descripts"
## [170] "Snyderites"
## [171] "Old School Democrat"
## [172] "Democrat/Old School"
## [173] "Republican-Democratic Delegation"
## [174] "Federal-Independent Democratic Delegation"
## [175] "Findlayite"
## [176] "Binnite"
## [177] "Federalist/Constitutional"
## [178] "Federal/Constitutionalist"
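The list above contains several labels that differ only by spelling or spacing (for example "Federlaist", "Repblican", and "Federal / Clintonian" versus "Federal/Clintonian"). A hedged sketch of one cleanup pass follows. The `fixes` lookup is a hypothetical, hand-built table covering only two of the variants, and whether two labels really denote the same party is a historical judgment the code cannot make.

```r
library(stringr)

# A few labels copied from the output above.
aff <- c("Federlaist", "Repblican", "Federal / Clintonian",
         "Federal/Clintonian", "Whig")

# Hypothetical lookup of misspelling -> intended spelling (hand-built, incomplete).
fixes <- c("Federlaist" = "Federalist", "Repblican" = "Republican")

# Remove spaces around slashes, then apply the lookup where a fix exists.
aff_clean <- str_replace_all(aff, "\\s*/\\s*", "/")
aff_clean <- ifelse(aff_clean %in% names(fixes), fixes[aff_clean], aff_clean)
aff_clean <- unname(aff_clean)

unique(aff_clean)
```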
- What does each row in the dataset represent?
Each row represents the total votes for one candidate in a specific election, in a specific year, for a specific region. However, the compilers of this dataset also chose to calculate vote totals at various geographic levels, so the same votes are repeated as additional observations.
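To make the double-counting risk concrete, here is a small sketch with invented numbers. It assumes, as the Massachusetts filter later in this document does, that the statewide total rows are the ones whose geographic columns are missing (`NA`). If the file stores the string "null" instead, the test would need to change.

```r
library(dplyr)

# Invented rows: one candidate, one election, reported at town and state level.
toy <- data.frame(
  name = c("A. Candidate", "A. Candidate", "A. Candidate"),
  town = c("Town 1", "Town 2", NA),
  vote = c(40, 60, 100),
  stringsAsFactors = FALSE
)

# Summing every row counts the town votes twice.
naive_total <- sum(toy$vote)

# Keeping only the aggregate row (no town recorded) gives the statewide total.
state_total <- toy %>%
  filter(is.na(town)) %>%
  summarise(vote = sum(vote)) %>%
  pull(vote)
```

Here `naive_total` is twice `state_total`, which is why any sum over the full dataset should first pick a single geographic level.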
- Which years are in the dataset, and how many elections are there in each year?
nnv <- nnv %>%
mutate(year = str_extract(date, "\\d{4}") %>% as.integer())
nnv %>%
group_by(year) %>%
summarise(unique_elements = n_distinct(id))
## Source: local data frame [40 x 2]
##
## year unique_elements
## (int) (int)
## 1 1787 69
## 2 1788 159
## 3 1789 186
## 4 1790 168
## 5 1791 156
## 6 1792 194
## 7 1793 167
## 8 1794 243
## 9 1795 202
## 10 1796 357
## .. ... ...
- Which states are represented?
nnv_clean_state<-nnv %>%
filter(state != "null")
nnv_clean_state$state %>%
unique()
## [1] "Vermont" "Missouri" "New Hampshire" "Tennessee"
## [5] "New York" "NY" "Alabama" "Maryland"
## [9] "Indiana" "Georgia" "New Jersey" "Massachusetts"
## [13] "Rhode Island" "Maine" "North Carolina" "Kentucky"
## [17] "Mississippi" "Illinois" "Louisiana" "Ohio"
## [21] "South Carolina" "Virginia" "Delaware" "Connecticut"
## [25] "Pennsylvania"
I started the dataset exploration portion of the assignment by filtering the dataset down to just Massachusetts. As you had shown us some visualizations for Governor, I decided to look at the Lieutenant Governor position. I noticed that Samuel Adams received votes for that office in 1787 and 1788, but that he also received votes for other offices (Governor, U.S. House of Representatives, Electoral College).
nnv_ma <-nnv %>%
filter(state == "Massachusetts") %>%
filter(is.na(district) & is.na(town)& is.na(city)& is.na(county)) %>%
arrange(year)
nnv_ma %>%
filter(year==1787, name_id=="AS0022") %>%
ggplot(aes(x=office))+
geom_bar(stat = "identity", aes(y=vote), fill="red") +
geom_text(aes(label=vote, y=vote), position=position_dodge(width=0.9), vjust=-0.25)+
labs(title="Samuel Adams' 1787 votes per Office", y="Total Votes", x='')+
theme(axis.text.x =
element_text(size = 10,
angle = 45,
hjust = 1,
vjust = 1))
I decided to use a geom_line to map out Samuel Adams’ political career. This graph shows the drastic increase in votes for both “Lieutenant Governor” and “Governor.” Any time a point appears, he received at least one vote for that office.
Following this line of inquiry further, I wanted to know how many individuals, within the state of Massachusetts and the dataset's date range, received votes for multiple political offices in the same election year.
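The code for the line graph is not shown in this document. A minimal sketch of how such a plot could be built follows; it is not necessarily the code that produced the figure. The `toy` data frame and its vote numbers are invented placeholders, and in practice the input would presumably be `nnv_ma` filtered to `name_id == "AS0022"`.

```r
library(ggplot2)

# Invented placeholder rows with the same columns used above (year, office, vote).
toy <- data.frame(
  year   = c(1787, 1788, 1789, 1787, 1788, 1789),
  office = rep(c("Governor", "Lieutenant Governor"), each = 3),
  vote   = c(10, 20, 30, 100, 200, 300),
  stringsAsFactors = FALSE
)

# One line per office; points mark each year with a recorded vote total.
career_plot <- ggplot(toy, aes(x = year, y = vote, color = office)) +
  geom_line() +
  geom_point() +
  labs(title = "Votes per office by year", x = "Year", y = "Total Votes")

career_plot
```

Mapping `office` to `color` keeps each office on its own line, so a rise in one office is not hidden by another.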
nnv_ma_count <- nnv_ma %>% count(name, name_id, office, year) %>%
filter(name_id != "null") %>%
count(name, name_id, year) %>%
filter(n>=2)
nnv_ma_count %>%
ggplot(aes(x = n))+
geom_bar(stat = "count")+
labs(title="Total Number of Occurrences by Office", y="Total Count", x='Number of Offices')
This shows how many times a candidate received votes for 2, 3, 4, or 5 offices in a single election year.