nnv

library(readr)
url <- "http://dl.tufts.edu/file_assets/generic/tufts:MS115.003.001.00001/0"
if (!file.exists("all-votes.tsv")) {
  download.file(url, "nnv-all-votes.zip")
  unzip("nnv-all-votes.zip", files = "all-votes.tsv")
}
nnv <- read_tsv("all-votes.tsv")

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(tidyr)
library(stringr)

names(nnv) <- names(nnv) %>% str_to_lower() %>% str_replace_all("\\ ", "_")

The report should be broken into two parts. The first part should be an exploration of the dataset where you examine what is in the data and explain (briefly) what you are trying to figure out and what you are seeing. Answer these kinds of questions, using plots if you need to.

What kinds of elections were there?

nnv$type %>%
    unique()

## [1] "General"             "Legislative"         "Special"            
## [4] "Legislastive"        "Special Legislative" "Special Election"

How many of each kind of election?

nnv %>%
    ggplot(aes(x=type)) +
    geom_bar(stat = "count")
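
The list of election types above includes "Legislastive", which looks like a misspelling of "Legislative" (that is my assumption, not something the dataset documents). A minimal sketch of how such a label could be folded into the intended one before counting, using a toy vector in place of nnv$type and a hypothetical lookup called fixes:

```r
# Toy vector standing in for nnv$type (labels copied from the output above).
type <- c("General", "Legislative", "Legislastive", "Special",
          "Special Election", "General")

# Hypothetical lookup: misspelled label on the left, intended label on the right.
fixes <- c("Legislastive" = "Legislative")

# Replace only the labels that have an entry in the lookup.
needs_fix <- type %in% names(fixes)
type_clean <- type
type_clean[needs_fix] <- unname(fixes[type[needs_fix]])

# Tabulate how many rows fall under each cleaned label.
type_counts <- table(type_clean)
print(type_counts)
```

Whether "Special" and "Special Election" should also be merged is a separate judgment call that this sketch leaves alone.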

How many candidates and how often do they appear?

nnv$name %>%
  unique() %>%
  length()

## [1] 34880

nnv$name_id %>%
  unique() %>%
  length()

## [1] 33763

Above, I used the unique and length functions to count the distinct candidates, first by name and then by name_id. The two counts differ: 34,880 distinct names versus 33,763 distinct IDs.
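
The counts above say how many candidates there are, but not how often each one appears. A rough sketch of one way to get at that, using base R's table and sort on a toy vector (the IDs below are made up and only stand in for nnv$name_id):

```r
# Toy data standing in for nnv$name_id; these IDs are invented.
name_id <- c("AB0001", "CD0002", "AB0001", "EF0003", "AB0001", "CD0002")

# Count rows per candidate and sort so the most frequent come first.
appearances <- sort(table(name_id), decreasing = TRUE)
print(head(appearances))

# Number of distinct candidates, the same figure as length(unique(name_id)).
n_candidates <- length(appearances)
```

On the real data, the head of this table would list the candidates with the most rows.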

Which parties?

nnv$affiliation %>%
  unique() %>%
  length()

## [1] 180

nnv_aff <- nnv %>%
  filter(affiliation != "null")

nnv_aff$affiliation %>%
  unique() %>%
  head()

## [1] "Federalist"     "Democrat"       "Republican"     "Democratic"    
## [5] "Pro-Slavery"    "Restrictionist"

What does each row in the dataset represent?

Each row is a particular candidate running in a particular election in a particular year.

Which years are in the dataset, and how many elections are there in each year?

nnv <- nnv %>% 
  mutate(year = str_extract(date, "\\d{4}") %>% as.integer())
  
nnv %>%
  group_by(year) %>%
  summarise(unique_elements = n_distinct(id))

## Source: local data frame [40 x 2]
## 
##     year unique_elements
##    (int)           (int)
## 1   1787              69
## 2   1788             159
## 3   1789             186
## 4   1790             168
## 5   1791             156
## 6   1792             194
## 7   1793             167
## 8   1794             243
## 9   1795             202
## 10  1796             357
## ..   ...             ...
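
The year column comes from pulling the first four-digit run out of the date string and then counting distinct election IDs per year. Here is a small base R sketch of the same idea. The date strings and IDs are made up, and the real format of nnv$date is an assumption on my part:

```r
# Made-up dates and election IDs; the real nnv$date format may differ.
date <- c("1796-11-08", "1796-11-08", "1800", "1800-04", "unknown")
id   <- c("e1", "e1", "e2", "e3", "e4")

# regexpr() locates the first run of four digits (-1 when there is none);
# regmatches() returns the matched text for the rows that did match.
m <- regexpr("[0-9]{4}", date)
year <- rep(NA_integer_, length(date))
year[m > 0] <- as.integer(regmatches(date, m))

# Count distinct election IDs within each year; rows with no year are dropped.
elections_per_year <- tapply(id, year, function(x) length(unique(x)))
print(elections_per_year)
```

Rows whose date has no four-digit run end up with a missing year, so they drop out of the per-year count rather than raising an error.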

Which states are represented?

nnv_state <- nnv %>%
  filter(state != "null")

nnv_state$state %>%
  unique()

##  [1] "Vermont"        "Missouri"       "New Hampshire"  "Tennessee"     
##  [5] "New York"       "NY"             "Alabama"        "Maryland"      
##  [9] "Indiana"        "Georgia"        "New Jersey"     "Massachusetts" 
## [13] "Rhode Island"   "Maine"          "North Carolina" "Kentucky"      
## [17] "Mississippi"    "Illinois"       "Louisiana"      "Ohio"          
## [21] "South Carolina" "Virginia"       "Delaware"       "Connecticut"   
## [25] "Pennsylvania"

Now it’s time to take a look at the Old Dominion! Virginia exceptionalism at its finest. So why not start with the richest county in the United States and view the votes in Loudoun’s Electoral College elections?

nnv_va <- nnv %>%
  filter(state == "Virginia", county == "Loudoun", office == "Electoral College") %>%
  group_by(affiliation) %>%
  # Drop the ID and the geography columns not used here (borough included).
  select(-id, -territory, -township, -ward, -borough, -parish)

nnv_va <- nnv_va %>%
  filter(affiliation != "null")

ggplot(nnv_va, aes(x = year, y = vote, color = affiliation)) +
  geom_point() +
  labs(title = "Early Republic Electoral College Votes in Loudoun",
       x = "Election Years",
       y = "Number of Votes")

The graph above shows the number of votes by affiliation between the years 1795 and 1820 in Loudoun County. While the numbers aren’t massive, it is easy to track the rise and fall of different political parties over the years. Initially, the Republican party was most popular; however, by 1814 the Federalist party had taken over. Interestingly, the graph shows a short-term rise in the popularity of the Quid party, which won the Electoral College election in 1807.

The next graph also looks at the Electoral College elections, except this time for Fairfax County. Since these two counties border each other, it will be interesting to observe the similarities and differences in voting for the same elections.

nnv_va_fx <- nnv %>%
  filter(state == "Virginia", county == "Fairfax", office == "Electoral College") %>%
  group_by(affiliation) %>%
  # Drop the ID and the geography columns not used here (borough included).
  select(-id, -territory, -township, -ward, -borough, -parish)

nnv_va_fx <- nnv_va_fx %>%
  filter(affiliation != "null")

ggplot(nnv_va_fx, aes(x = year, y = vote, color = affiliation)) +
  geom_point() +
  labs(title = "Early Republic Electoral College Votes in Fairfax",
       x = "Election Years",
       y = "Number of Votes")

Interestingly, the election results in Loudoun and Fairfax differ. For instance, in Fairfax the Quid party never wins an Electoral College election, and the Federalist party was more popular early on in county elections before eventually losing almost every year to the Republicans.

However, now that I’ve looked at two Northern Virginia counties, I’m interested in how the Federalist, Republican, and Quid parties fared across Virginia in the Electoral College elections.

nnv_virginia <- nnv %>%
  filter(state == "Virginia", office == "Electoral College") %>%
  group_by(affiliation) %>%
  # Drop the ID and the geography columns not used here (borough included).
  select(-id, -territory, -township, -ward, -borough, -parish)

nnv_virginia <- nnv_virginia %>%
  filter(affiliation != "null")

ggplot(nnv_virginia, aes(x = year, y = vote, color = affiliation)) +
  geom_point() +
  labs(title = "Early Republic Electoral College Votes in Virginia",
       x = "Election Years",
       y = "Number of Votes") +
  scale_y_log10()

## Warning: Removed 230 rows containing missing values (geom_point).
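
A warning like this one usually means some rows cannot be placed on the axis. Two common causes on a log scale are missing vote counts and counts of zero, since log10(0) is -Inf. I have not checked which applies to the 230 rows here, so the sketch below only illustrates the mechanism with made-up numbers:

```r
# Made-up vote counts; the zero and the NA stand in for problem rows.
vote <- c(120, 0, NA, 35, 1)

# log10(0) is -Inf and log10(NA) is NA, so neither can be drawn on a log axis.
log_vote <- log10(vote)
plottable <- is.finite(log_vote)

# Number of rows a log-scaled plot would have to drop.
n_dropped <- sum(!plottable)
print(n_dropped)
```

Running the same is.finite check on the real vote column would show how many rows fall into each case.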

While this plot is helpful and shows me the number of votes in each election year, it would be clearer if I separated the parties into separate graphs.

ggplot(nnv_virginia, aes(x = year, y = vote)) +
  geom_count(shape = 1, alpha = 0.6) +
  facet_wrap(~ affiliation) +
  scale_y_log10()

## Warning: Removed 230 rows containing non-finite values (stat_sum).

The graph above separates the three major political parties and shows the density of votes for each year. This gives a better idea of support for each party.

However, what if I want to view the number of votes in specific counties and compare how often people voted during the Early Republic? There are far too many counties in Virginia to facet separately by county, but by paring down the number I can at least view some differences beyond Fairfax and Loudoun. By filtering on a list of county names, I can view nine counties and compare them. (These counties were chosen at random.)

nnv_virginia <- nnv %>%
  # %in% keeps every row whose county is in the list; == would recycle the
  # vector against the column row by row and silently miss most matches.
  filter(county %in% c("Loudoun", "Fairfax", "King George", "Surry", "York", "Henrico", "Orange", "Augusta", "Chesterfield")) %>%
  group_by(affiliation) %>%
  select(-id, -territory, -township, -ward, -borough, -parish)

ggplot(nnv_virginia, aes(x = year, y = vote)) +
  geom_count(shape = 1, alpha = 0.6) +
  facet_wrap(~ county) +
  scale_y_log10()

## Warning: Removed 124 rows containing non-finite values (stat_sum).
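
When filtering on several county names at once, the choice of operator matters. Comparing a column to a vector with == lines the two up element by element and recycles the shorter one, whereas %in% tests each row for membership in the list. A minimal base R illustration with made-up values (county and wanted below are toy vectors, not the real data):

```r
# Made-up county column and a short list of counties to keep.
county <- c("Fairfax", "Loudoun", "Surry", "Loudoun", "Fairfax", "York")
wanted <- c("Loudoun", "Fairfax")

# == recycles `wanted` down the column: row 1 is compared with "Loudoun",
# row 2 with "Fairfax", row 3 with "Loudoun" again, and so on.
by_equals <- county == wanted

# %in% asks, for every row, whether its county appears anywhere in `wanted`.
by_in <- county %in% wanted

print(sum(by_equals))
print(sum(by_in))
```

In this toy case == matches no rows at all, while %in% finds all four Loudoun and Fairfax rows. When the column length is not a multiple of the list length, R also warns that the longer object length is not a multiple of the shorter one.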


Stephanie Walters

3/16/2016