Often, when you’re looking at some super complicated data presentation online and want to convert the underlying data into a nice table, there’s an elegant way to proceed lurking underneath the site.
Consider:
a cool map like this from the Times on the 2020 election
or the Facebook/Meta IPO in historical context
or the ways in which Obama could put together a victory over Romney in 2012
…these are all JavaScript!
(Quick aside–JavaScript was created at Netscape in the mid-1990s to make websites more dynamic and elaborate. As Internet Explorer’s dominance receded, the emerging open-source browsers settled on JavaScript as an efficient client-side scripting language. Most elaborate web designs use JavaScript.)
Of key relevance to us–underpinning these elaborate websites are often simple data hierarchies loaded invisibly in your browser. You can often locate these on a site and load them directly, ready for us to do statistics.
The Guardian published a pretty elaborate and elegant graph from last month’s general election. It can be accessed here: https://www.theguardian.com/politics/ng-interactive/2024/jul/04/uk-general-election-results-2024-live-in-full
Poke around. Zoom into London or Birmingham or Manchester and see how small UK constituencies are!
And whoa, these data are disaggregated!
Mmmmm, how to scrape it, though? Are we going to do some goofy SelectorGadget’ing around?
Go to your Chrome window with the Guardian map open.
Press Command + Option + I (for Mac) or CTRL + Shift + I (for Windows) to Inspect the page.
It should return something like this:
It may look forbidding, but it’s really just a symbolic depiction of the objects that make up the page’s visual form.
Click on the Network tab in the tab header
We’re going to exploit the JSON object’s size (anything with all these candidates and their vote totals must be sizable).
Now refresh the web page. You’ll see all the objects reloaded. Sort these objects by their size. Also enter the search term “.json” in the filter text field.
Inspect the output of each object. You might find this item particularly interesting
Right-click the thinresults.json object and select Copy URL. We can load this JSON object in R.
library(tidyverse)
library(magrittr)
library(rvest)
library(jsonlite)
j1 <- "https://interactive.guim.co.uk/2024/07/elex-data/production/data/ge/thinresults.json" %>%
jsonlite::read_json()
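Before pulling anything apart, it’s worth a quick peek at what came back. A minimal sketch (the exact fields depend on the Guardian’s schema):
# how many records did we get, and what does one look like?
j1 %>% length
j1[[1]] %>% names
j1[[1]] %>% str(max.level = 1)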
The first item in the j1 list describes candidates:
t1 <- j1 %>%
  map(
    \(i)
    i %>%
      # keep fields 1-4 and 6-12 of each record
      extract(
        j1[[1]] %>%
          names %>%
          extract(c(1:4, 6:12))
      ) %>%
      # named list -> two-column tibble -> one wide row
      enframe %>%
      mutate(
        value = value %>%
          unlist
      ) %>%
      spread(name, value)
  ) %>%
  list_rbind
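If every field unlisted cleanly, t1 should hold one row per constituency (650 seats in 2024). A quick sanity check:
t1 %>% glimpse
t1 %>% nrow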
And this item indexes election results
t2 <- j1 %>%
  map(
    \(i){
      # field 13 of each record holds the nested list of candidate results
      k <- i %>%
        extract2(
          j1[[1]] %>%
            names %>%
            extract(13)
        )
      k %>%
        map(
          \(m)
          m %>%
            # drop empty fields, then flatten each candidate into one wide row
            discard(is.null) %>%
            enframe %>%
            mutate(
              value = value %>% unlist
            ) %>%
            spread(
              name, value
            )
        ) %>%
        list_rbind %>%
        # carry over the constituency identifiers from the parent record
        mutate(
          ons = i$ons,
          name = i$name
        )
    },
    .progress = TRUE
  ) %>%
  list_rbind
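From here the two tables can be combined. A sketch, assuming ons and name are among the fields kept in t1 (check names(t1) first; the candidate-level column names, including whatever holds the vote counts, come straight from the Guardian’s JSON):
# how many candidates stood in each constituency?
t2 %>%
  count(ons, name, sort = TRUE)

# attach the constituency-level fields to each candidate row
t2 %>%
  left_join(t1, by = c("ons", "name"))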