This is how I’d go about using the tidyverse packages (dplyr, tidyr, purrr, et al.) to import and tidy booth-level data on NZ election results. The code produces a tidy data frame in which each row is an observation (the number of votes for an individual candidate at a particular booth).
First we load the packages we’ll need. I use rvest for web scraping and the tidyverse for everything else.
library(rvest)
## Loading required package: xml2
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2.9000 ✓ purrr 0.3.4.9000
## ✓ tibble 3.0.3.9000 ✓ dplyr 1.0.2.9000
## ✓ tidyr 1.1.2.9000 ✓ stringr 1.4.0.9000
## ✓ readr 1.3.1.9000 ✓ forcats 0.5.0.9000
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
## x purrr::pluck() masks rvest::pluck()
Now we extract the URLs for the individual CSVs from the NZ election results website:
cand_list_url <- "https://archive.electionresults.govt.nz/electionresults_2017/statistics/votes-by-voting-place-electorate-index.html"
cand_csv_urls <- cand_list_url %>%
  read_html() %>%
  html_nodes("td:nth-child(3) a") %>%
  html_attr("href")
base_url <- "https://archive.electionresults.govt.nz/electionresults_2017/statistics/"
cand_csv_urls <- paste0(base_url, cand_csv_urls)
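To see what the selector and `paste0()` step are doing without hitting the live site, here is a minimal sketch on a made-up HTML table. The table contents, the file names and the `example.org` base URL are all hypothetical; only the selector and the rvest calls mirror the code above.

```r
library(rvest)

# Hypothetical stand-in for the index page: the third cell of each row holds the CSV link
toy_html <- '<table>
<tr><td>1</td><td>Toy Electorate</td><td><a href="toy_cand_1.csv">CSV</a></td></tr>
<tr><td>2</td><td>Other Electorate</td><td><a href="toy_cand_2.csv">CSV</a></td></tr>
</table>'

# Same selector as above: links inside the third <td> of each row
toy_hrefs <- read_html(toy_html) %>%
  html_nodes("td:nth-child(3) a") %>%
  html_attr("href")

# The hrefs are relative, so prepend a (hypothetical) base URL
toy_base_url <- "https://example.org/statistics/"
toy_urls <- paste0(toy_base_url, toy_hrefs)
```

The point is that `html_attr("href")` returns the links exactly as written in the page, so relative links need the base URL pasted on before they can be downloaded.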
That gives us a character vector of full URLs for the individual CSVs.
Now we download the CSVs:
# Make sure the target folder exists before downloading
dir.create("data", showWarnings = FALSE)
cand_filenames <- file.path("data", basename(cand_csv_urls))
walk2(cand_csv_urls, cand_filenames, download.file)
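If you re-run the script often, you may not want to download every file again each time. Below is one possible sketch of a wrapper that skips files already on disk. `download_if_missing()` and its `downloader` argument are hypothetical names of mine, not part of any package; the default simply calls `download.file()`.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
# `downloader` is any function taking (url, destfile, mode); it defaults to download.file().
download_if_missing <- function(url, dest, downloader = utils::download.file) {
  # Create the destination folder if needed
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(dest)) {
    # mode = "wb" writes the file in binary mode, which avoids line-ending changes on Windows
    downloader(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage with the vectors built above (not run here, as it needs network access):
# walk2(cand_csv_urls, cand_filenames, download_if_missing)
```

The trade-off is that a partially downloaded or stale file would be kept, so delete the `data` folder when you want a fresh copy.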
Now that we have the CSVs locally, we can import them. The CSVs are formatted in a non-standard way: there don’t appear to be enough commas in the initial rows. To make sure we get all the information we want, I first read each CSV from the third row onwards, then read it again just to get the electorate name from the second row.
cands <- map(cand_filenames, read_csv, skip = 2)
# We have to read each file twice because it's formatted incorrectly, with too
# few commas in the initial rows
get_elec_names <- function(filename) {
  x <- read_csv(filename, skip = 1, n_max = 1) %>%
    names()
  x[1]
}
elec_names <- map_chr(cand_filenames, get_elec_names)
cands <- set_names(cands, elec_names)
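An alternative way to grab the electorate name, if you’d rather not parse the file as a CSV a second time, is to read the raw second line with base R. This is only a sketch on a made-up file: the toy file contents (including what sits on the first row) and the helper name `get_elec_name_base()` are my assumptions, and it won’t cope with a name that itself contains a comma.

```r
# Hypothetical toy file: an initial row, an electorate-name row, then the real header and data
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Toy title row",
    "Toy Electorate 1",
    ",,\"SMITH, Ann\",\"JONES, Bob\",Total Valid Candidate Votes",
    "Toy Area,Booth A,20,5,25"
  ),
  toy_path
)

# Hypothetical helper: read only the first two lines and keep the second one
get_elec_name_base <- function(filename) {
  second_line <- readLines(filename, n = 2)[2]
  # Keep the text before the first comma, in case the row has trailing commas
  trimws(strsplit(second_line, ",", fixed = TRUE)[[1]][1])
}

elec_name <- get_elec_name_base(toy_path)
```

With the real files this could be mapped over `cand_filenames` with `map_chr()`, in the same way as `get_elec_names()` above.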
Now we have a list (cands), each element of which is a dataframe corresponding to an electorate CSV.
Now the fun stuff. We wrangle each dataframe to get it in a tidy format, in which each row is an observation (booth-level number of votes for an individual candidate). Note that I drop the little table at the bottom of each CSV - I’m only reading the main table.
# Create a function to tidy the electorate-candidate dfs
tidy_elec_df <- function(df) {
  df %>%
    rename(area = 1, booth = 2) %>%
    # Move place type ('advance' v regular) to its own column
    mutate(place_type = if_else(is.na(booth),
                                area,
                                NA_character_),
           area = if_else(!is.na(place_type),
                          NA_character_,
                          area)) %>%
    # Fill blank cells with the previous non-blank text
    fill(place_type, area, booth, .direction = "down") %>%
    # Drop the weird little table underneath the main table
    filter(!is.na(`Total Valid Candidate Votes`)) %>%
    # Drop booth totals
    select(-`Total Valid Candidate Votes`) %>%
    # Pivot to long/tidy format
    gather(key = candidate, value = votes,
           -place_type, -area, -booth) %>%
    mutate(votes = as.numeric(votes))
}
# Apply the function to each df in the list
cands_df <- map_dfr(cands, tidy_elec_df, .id = "electorate")
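To make the reshaping easier to follow, here is a small sketch of the same idea on made-up data. The toy data frame, booth names and candidates are hypothetical. It also swaps `gather()` for `pivot_longer()`, which tidyr now recommends in its place, and filters on `booth` rather than on a totals column because the toy data has none, so it is an illustration rather than a drop-in replacement for the function above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mimicking the layout of one electorate CSV:
# place-type header rows have no booth, and area is blank when it repeats
toy <- tibble(
  area  = c("Advance Voting Places", "Auckland City", NA,
            "Ordinary Voting Places", "Grey Lynn"),
  booth = c(NA, "Booth A", "Booth B", NA, "Booth C"),
  `SMITH, Ann` = c(NA, 20, 11, NA, 8),
  `JONES, Bob` = c(NA, 5, 7, NA, 3)
)

toy_long <- toy %>%
  # Rows without a booth are place-type headers: move that label to its own column
  mutate(
    place_type = if_else(is.na(booth), area, NA_character_),
    area = if_else(!is.na(place_type), NA_character_, area)
  ) %>%
  # Carry the last non-missing label down into the blank cells
  fill(place_type, area, .direction = "down") %>%
  # Drop the header rows now that their label has been copied down
  filter(!is.na(booth)) %>%
  # One row per booth-candidate pair
  pivot_longer(
    cols = -c(place_type, area, booth),
    names_to = "candidate",
    values_to = "votes"
  )
```

One difference to be aware of: `pivot_longer()` returns rows grouped by booth, whereas `gather()` returns them grouped by candidate, so the row order differs even though the contents are the same.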
Now you have a tidy dataframe of booth-level results for each electorate. It has 47951 rows and looks like this:
| electorate | area | booth | place_type | candidate | votes |
|---|---|---|---|---|---|
| Auckland Central 1 | Auckland City | Atrium, Takutai Square | Advance Voting Places | EDWARDS, Frank Torrens | 20 |
| Auckland Central 1 | Auckland City | Auckland University, AUSA Club Space, The Quad, Alfred Street | Advance Voting Places | EDWARDS, Frank Torrens | 11 |
| Auckland Central 1 | Auckland City | AUT University, Level 4 Library Foyer, WA Building, 55 Wellesley Street East | Advance Voting Places | EDWARDS, Frank Torrens | 23 |
| Auckland Central 1 | Auckland City | Liston House, 30-32 Hobson Street | Advance Voting Places | EDWARDS, Frank Torrens | 6 |
| Auckland Central 1 | Freemans Bay | Victoria Park New World, 2 College Hill | Advance Voting Places | EDWARDS, Frank Torrens | 24 |
| Auckland Central 1 | Freemans Bay | Auckland Hospital Mobile & Advance Voting | Advance Voting Places | EDWARDS, Frank Torrens | 3 |
| Auckland Central 1 | Grey Lynn | Grey Lynn Community Centre, Oval Room, 510 Richmond Road | Advance Voting Places | EDWARDS, Frank Torrens | 8 |
| Auckland Central 1 | Grey Lynn | Grey Lynn Library Hall, 474 Great North Road | Advance Voting Places | EDWARDS, Frank Torrens | 3 |
| Auckland Central 1 | Mangere | Auckland International Airport Terminal, International Departure Lounge | Advance Voting Places | EDWARDS, Frank Torrens | 0 |
| Auckland Central 1 | Mt Albert | UNITEC, Te Puna Building, Student Drop in Space, Level 1, 139 Carrington Road | Advance Voting Places | EDWARDS, Frank Torrens | 0 |
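Once the data is in this shape, summaries are short. As a hedged illustration, the sketch below totals votes per candidate within an electorate using a made-up data frame with the same kind of columns as the table above; the electorate, booth and candidate values are hypothetical, and with the real data you would pass `cands_df` instead.

```r
library(dplyr)

# Hypothetical data in the same long format as the tidy result
toy_df <- tibble(
  electorate = "Toy Electorate 1",
  booth = c("Booth A", "Booth A", "Booth B", "Booth B"),
  candidate = c("SMITH, Ann", "JONES, Bob", "SMITH, Ann", "JONES, Bob"),
  votes = c(20, 5, 11, 7)
)

# Sum votes across booths for each electorate-candidate pair
toy_totals <- toy_df %>%
  count(electorate, candidate, wt = votes, name = "total_votes")
```

Because each row is a single booth-candidate observation, `count()` with `wt = votes` adds up the votes rather than counting rows.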