This is how I’d use the tidyverse packages (dplyr, tidyr, purrr, et al.) to import and tidy booth-level data on NZ election results. The code produces a tidy data frame in which each row is one observation: the number of votes for an individual candidate at a particular booth.

Load packages

First we load the packages we’ll need. I use rvest for web scraping and the tidyverse for everything else.

library(rvest)
## Loading required package: xml2
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2.9000     ✓ purrr   0.3.4.9000
## ✓ tibble  3.0.3.9000     ✓ dplyr   1.0.2.9000
## ✓ tidyr   1.1.2.9000     ✓ stringr 1.4.0.9000
## ✓ readr   1.3.1.9000     ✓ forcats 0.5.0.9000
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()         masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag()            masks stats::lag()
## x purrr::pluck()          masks rvest::pluck()

Scrape website

Now we extract the URLs for the individual CSVs from the NZ election results website:

cand_list_url <- "https://archive.electionresults.govt.nz/electionresults_2017/statistics/votes-by-voting-place-electorate-index.html"

cand_csv_urls <- cand_list_url %>%
  read_html() %>%
  html_nodes("td:nth-child(3) a") %>%
  html_attr("href") 

base_url <- "https://archive.electionresults.govt.nz/electionresults_2017/statistics/"

cand_csv_urls <- paste0(base_url, cand_csv_urls)

Now we have URLs for the individual CSVs. Here’s what they look like:

https://archive.electionresults.govt.nz/electionresults_2017/statistics/csv/candidate-votes-by-voting-place-1.csv
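
To see the scraping step in isolation, here is a minimal sketch that runs offline. The HTML fragment is made up for illustration (I'm assuming the CSV link sits in the third cell of each table row, as the selector implies), and the `toy_` names are hypothetical. It also uses xml2::url_absolute() as an alternative to paste0(): it resolves relative links against a base URL, so it should still work if a page mixes relative and absolute links.

```r
library(rvest)

# Hypothetical HTML fragment standing in for the index page (an assumption,
# not the real markup): the CSV link sits in the third cell of each row.
toy_page <- read_html('
  <table>
    <tr><td>1</td><td>Electorate A</td><td><a href="csv/candidate-votes-by-voting-place-1.csv">CSV</a></td></tr>
    <tr><td>2</td><td>Electorate B</td><td><a href="csv/candidate-votes-by-voting-place-2.csv">CSV</a></td></tr>
  </table>
')

# Same selector as in the post: every link inside the third cell of a row
toy_hrefs <- toy_page %>%
  html_nodes("td:nth-child(3) a") %>%
  html_attr("href")

base_url <- "https://archive.electionresults.govt.nz/electionresults_2017/statistics/"

# Resolve the relative links against the base URL
toy_urls <- xml2::url_absolute(toy_hrefs, base_url)
toy_urls
```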

Download

Now we download the CSVs:

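cand_filenames <- file.path("data", basename(cand_csv_urls))

# Make sure the destination folder exists before downloading
dir.create("data", showWarnings = FALSE)

walk2(cand_csv_urls, cand_filenames, download.file)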
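
If you re-run the script often, you may not want to download every file each time. The sketch below is one possible way to handle that; download_if_missing() is a hypothetical helper of my own, not something from purrr or base R. It creates the destination folder if needed and skips files that already exist locally.

```r
# Hypothetical helper (not part of any package): download a file unless a
# local copy already exists, creating the destination folder if needed.
download_if_missing <- function(url, destfile) {
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    download.file(url, destfile, quiet = TRUE)
  }
  # Return the local path invisibly so the helper works inside walk2()
  invisible(destfile)
}

# With the objects defined in this post, this could stand in for the
# walk2() call:
# purrr::walk2(cand_csv_urls, cand_filenames, download_if_missing)
```

Note that the helper only checks whether the file exists, not whether it is complete, so a partially downloaded file would need to be deleted by hand.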

Import CSVs

Now that we have the CSVs locally, we can import them. The CSVs are formatted in a non-standard way: there don’t appear to be enough commas in the initial rows. To make sure we get all the information we want, I first read each CSV from the third row onwards, then read it again just to get the electorate name from the second row.

cands <- map(cand_filenames, read_csv,
             skip = 2)

# We have to read the file twice because it's formatted incorrectly, with too 
# few commas in the initial rows
get_elec_names <- function(filename) {
  x <- read_csv(filename, skip = 1, n_max = 1) %>%
    names()
  x[1]
}

elec_names <- map_chr(cand_filenames, get_elec_names)

cands <- set_names(cands, elec_names)

Now we have a list (cands), each element of which is a dataframe corresponding to an electorate CSV.
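
An alternative for the second read is to grab the electorate row as plain text with readr::read_lines(), which avoids CSV parsing (and any parsing warnings) for that row. Here is a minimal sketch on a toy file. The file contents are my assumption about the general layout (a title row, the electorate name, then the main table with two blank header cells), not a copy of a real file, and the `toy_` names are hypothetical. In the real files the raw line may carry trailing commas or quotes that would need stripping.

```r
library(readr)

# Toy file mimicking the layout described above (an assumption about the
# real files): a title row, the electorate name, then the main table.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Toy title row",
  "Toy Electorate 1",
  ",,\"SMITH, Ann\",\"JONES, Bob\",Total Valid Candidate Votes",
  "Suburb A,Hall 1,10,5,15"
), toy_path)

# Read only the second line, as plain text rather than as CSV
toy_elec_name <- read_lines(toy_path, skip = 1, n_max = 1)

# Read the main table from the third line onwards, as in the post;
# readr fills in names for the two blank header cells
toy_table <- read_csv(toy_path, skip = 2)
```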

Tidy CSVs

Now the fun stuff. We wrangle each dataframe to get it in a tidy format, in which each row is an observation (booth-level number of votes for an individual candidate). Note that I drop the little table at the bottom of each CSV - I’m only reading the main table.

# Create a function to tidy the electorate-candidate dfs
tidy_elec_df <- function(df) {
  df %>%
    rename(area = 1, booth = 2) %>%
    # Move place type ('advance' vs. regular) to its own column
    mutate(place_type = if_else(is.na(booth),
                                area,
                                NA_character_),
           area = if_else(!is.na(place_type),
                          NA_character_,
                          area)) %>%
    # Fill blank cells with the previous non-blank text
    fill(place_type, area, booth, .direction = "down") %>%
    # Drop the weird little table underneath the main table
    filter(!is.na(`Total Valid Candidate Votes`)) %>%
    # Drop booth totals
    select(-`Total Valid Candidate Votes`) %>%
    # Pivot to long/tidy format
    gather(key = candidate, value = votes,
           -place_type, -area, -booth) %>%
    mutate(votes = as.numeric(votes))
}

# Apply the function to each df in the list

cands_df <- map_dfr(cands, tidy_elec_df, .id = "electorate")
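
The trickiest part of the function is the if_else() plus fill() pattern, so here is a small sketch of it on a toy data frame. The toy data is invented and assumes the layout the function implies: place-type headings sit in the first column on rows where the booth is blank, and an area name is only written on the first booth of each area. I've used pivot_longer() here, which is the current tidyr equivalent of gather(); it returns the same rows but in a different order.

```r
library(dplyr)
library(tidyr)

# Invented data mimicking the assumed CSV layout
toy <- tibble(
  area  = c("Advance Voting Places", "Suburb A", NA,
            "Ordinary Voting Places", "Suburb B"),
  booth = c(NA, "Hall 1", "Hall 2", NA, "School 1"),
  `SMITH, Ann` = c(NA, 10, 4, NA, 7),
  `JONES, Bob` = c(NA, 5, 6, NA, 1),
  `Total Valid Candidate Votes` = c(NA, 15, 10, NA, 8)
)

toy_tidy <- toy %>%
  # Heading rows have no booth: copy their text into place_type
  # and blank it out of area
  mutate(place_type = if_else(is.na(booth), area, NA_character_),
         area = if_else(!is.na(place_type), NA_character_, area)) %>%
  # Carry headings and area names down to the rows beneath them
  fill(place_type, area, .direction = "down") %>%
  # Heading rows have no totals, so this drops them
  filter(!is.na(`Total Valid Candidate Votes`)) %>%
  select(-`Total Valid Candidate Votes`) %>%
  # One row per booth-candidate pair
  pivot_longer(cols = -c(area, booth, place_type),
               names_to = "candidate", values_to = "votes")

toy_tidy
```

The result has six rows (three booths by two candidates), and each booth carries the place type of the heading above it.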

Result

Now you have a tidy dataframe of booth-level results for each electorate. It has 47,951 rows and looks like this:

electorate | area | booth | place_type | candidate | votes
Auckland Central 1 | Auckland City | Atrium, Takutai Square | Advance Voting Places | EDWARDS, Frank Torrens | 20
Auckland Central 1 | Auckland City | Auckland University, AUSA Club Space, The Quad, Alfred Street | Advance Voting Places | EDWARDS, Frank Torrens | 11
Auckland Central 1 | Auckland City | AUT University, Level 4 Library Foyer, WA Building, 55 Wellesley Street East | Advance Voting Places | EDWARDS, Frank Torrens | 23
Auckland Central 1 | Auckland City | Liston House, 30-32 Hobson Street | Advance Voting Places | EDWARDS, Frank Torrens | 6
Auckland Central 1 | Freemans Bay | Victoria Park New World, 2 College Hill | Advance Voting Places | EDWARDS, Frank Torrens | 24
Auckland Central 1 | Freemans Bay | Auckland Hospital Mobile & Advance Voting | Advance Voting Places | EDWARDS, Frank Torrens | 3
Auckland Central 1 | Grey Lynn | Grey Lynn Community Centre, Oval Room, 510 Richmond Road | Advance Voting Places | EDWARDS, Frank Torrens | 8
Auckland Central 1 | Grey Lynn | Grey Lynn Library Hall, 474 Great North Road | Advance Voting Places | EDWARDS, Frank Torrens | 3
Auckland Central 1 | Mangere | Auckland International Airport Terminal, International Departure Lounge | Advance Voting Places | EDWARDS, Frank Torrens | 0
Auckland Central 1 | Mt Albert | UNITEC, Te Puna Building, Student Drop in Space, Level 1, 139 Carrington Road | Advance Voting Places | EDWARDS, Frank Torrens | 0
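
Because each row is a single booth-candidate observation, aggregating is straightforward. As a rough sketch, here is how you might total votes per candidate within each electorate. The toy data frame below is invented and just has the same columns as the result above; with the real data you would start the pipeline from cands_df instead.

```r
library(dplyr)

# Invented data with the same columns as the tidy result
toy_df <- tibble(
  electorate = "Toy Electorate 1",
  area = c("Suburb A", "Suburb A", "Suburb B", "Suburb B"),
  booth = c("Hall 1", "Hall 1", "School 1", "School 1"),
  place_type = "Advance Voting Places",
  candidate = c("SMITH, Ann", "JONES, Bob", "SMITH, Ann", "JONES, Bob"),
  votes = c(10, 5, 7, 1)
)

# Total votes per candidate within each electorate, largest first
toy_totals <- toy_df %>%
  group_by(electorate, candidate) %>%
  summarise(total_votes = sum(votes), .groups = "drop") %>%
  arrange(desc(total_votes))

toy_totals
```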