After the Brexit referendum, a petition surfaced on the Internet requesting a second referendum. Intuition suggests that the territories where “Remain” won would be the ones asking for a second referendum, but is this the case?
Luckily, the UK Parliament website makes all the petition data available for download in JSON format, so we can freely analyze it with RStudio.
To start, we simply download the content of the petition data into a variable:
library(jsonlite)  # provides fromJSON with the flatten argument
json_data <- fromJSON("https://petition.parliament.uk/petitions/131215.json",flatten=TRUE)
If we inspect the output with str(json_data) (not shown), we can see that the petition data is divided by country and by parliamentary constituency.
First of all, we start by looking at who in the world would love to keep Britain in the EU. So, we keep only the per-country signature data:
# filter the petitions by country
sgn_by_country = json_data$data$attributes$signatures_by_country
Then, we look at the top countries that want Britain in the EU. We sort the data frame by signature count in descending order and display the top 20 countries using head:
sgn_by_country = sgn_by_country[order(-sgn_by_country$signature_count),]
head(sgn_by_country,n=20)
## name code signature_count
## 210 United Kingdom GB 3575162
## 69 France FR 24966
## 178 Spain ES 15721
## 11 Australia AU 15629
## 211 United States US 14787
## 74 Germany DE 9875
## 38 Canada CA 5506
## 76 Gibraltar GI 5379
## 139 New Zealand NZ 4597
## 137 Netherlands NL 4201
## 92 Ireland IE 4089
## 95 Italy IT 3592
## 190 Switzerland CH 3119
## 18 Belgium BE 3100
## 85 Hong Kong HK 2903
## 209 United Arab Emirates AE 2728
## 169 Singapore SG 1700
## 189 Sweden SE 1688
## 174 South Africa ZA 1482
## 155 Portugal PT 1399
Ooh, so it seems that the French don’t hate Britons so much! The number of signatures from Gibraltar is also very interesting, considering that the population of the territory is just about 35,000.
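To put the Gibraltar figure in perspective, the ratio can be worked out directly. This is only a back-of-the-envelope sketch: the population is the rough 35,000 quoted above rather than an official figure, and signatories need not be residents.

```r
# Signature count for Gibraltar, taken from the table above
gibraltar_signatures <- 5379
# Approximate population quoted in the text (an assumption, not official data)
gibraltar_population <- 35000

# Signatures as a percentage of the approximate population
gibraltar_percent <- 100 * gibraltar_signatures / gibraltar_population
print(round(gibraltar_percent, 1))
```

Under these assumptions, the signatures amount to roughly 15% of the territory's population.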
So, to have a clearer picture, let’s put this data on a beautiful world map!
library(rworldmap)  # provides joinCountryData2Map and mapCountryData
sgn_by_country$log_signature_count = log10(sgn_by_country$signature_count)
dataMap <- joinCountryData2Map(sgn_by_country,joinCode="ISO2",nameJoinColumn="code")
## 213 codes from your data successfully matched countries in the map
## 9 codes from your data failed to match with a country code in the map
## 28 codes from the map weren't represented in your data
mapData = mapCountryData(dataMap, nameColumnToPlot="signature_count",mapTitle="Second Brexit Referendum",catMethod=10^(0:7),addLegend = FALSE)
do.call( addMapLegendBoxes, c(mapData,title="# signatures",x="bottomleft"))
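The catMethod=10^(0:7) argument supplies break points at powers of ten, so each colour class covers one order of magnitude. The following base-R sketch shows which decade a few counts from the table fall into. It uses cut() purely for illustration and does not claim to reproduce how rworldmap bins values internally.

```r
# Break points at powers of ten: 1, 10, ..., 10,000,000
decade_breaks <- 10^(0:7)

# A few signature counts taken from the table above
counts <- c("United Kingdom" = 3575162, "France" = 24966, "Portugal" = 1399)

# cut() assigns each count to the interval (10^k, 10^(k+1)] it falls into;
# labels = FALSE returns the interval index instead of a text label
decade_class <- cut(counts, breaks = decade_breaks, labels = FALSE)
names(decade_class) <- names(counts)
print(decade_class)
```

The United Kingdom sits alone in the top class, which is why a linear colour scale would flatten every other country into a single shade.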
Now let’s look at how popular the petition is in Britain. As before, we keep only the constituency data from the JSON and look at the top 20.
signatures_by_constituency = json_data$data$attributes$signatures_by_constituency
signatures_by_constituency = signatures_by_constituency[order(-signatures_by_constituency$signature_count),]
head(signatures_by_constituency[,c(1,4)],n=20)
## name signature_count
## 294 Hornsey and Wood Green 21865
## 126 Bristol West 20926
## 269 Hampstead and Kilburn 20742
## 292 Holborn and St Pancras 19752
## 262 Hackney North and Stoke Newington 18742
## 456 Richmond Park 18556
## 305 Islington North 18224
## 263 Hackney South and Shoreditch 17672
## 122 Brighton, Pavilion 17625
## 66 Battersea 17261
## 171 Cities of London and Westminster 17102
## 76 Bethnal Green and Bow 17002
## 209 Dulwich and West Norwood 16647
## 311 Kensington 16623
## 306 Islington South and Finsbury 16493
## 577 Vauxhall 16414
## 541 Streatham 16076
## 74 Bermondsey and Old Southwark 16068
## 142 Cambridge 16046
## 440 Poplar and Limehouse 15929
Now let’s do the map. Unfortunately, Northern Ireland will be excluded from our map, since the geometry data I found doesn’t include it.
First, we read the map data and match the name of the constituencies.
# load the map (readOGR comes from the rgdal package)
library(rgdal)
ukMap = readOGR("https://github.com/martinjc/UK-GeoJSON/raw/master/json/electoral/gb/topo_wpc.json","wpc")
## OGR data source with driver: GeoJSON
## Source: "https://github.com/martinjc/UK-GeoJSON/raw/master/json/electoral/gb/topo_wpc.json", layer: "wpc"
## with 632 features
## It has 2 fields
# match the names in the map
ind = match(as.character(ukMap@data$id),signatures_by_constituency$ons_code)
ukMap@data$name = signatures_by_constituency$name[ind]
ukMap@data$votes = signatures_by_constituency$signature_count[ind]
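The join above relies on match(), which returns, for each element of its first argument, the position of the first equal element in the second, or NA when there is none. A minimal sketch with made-up data (the X.. codes and the names are placeholders, not real ONS codes or constituencies):

```r
# Hypothetical constituency table; codes and names are placeholders
signatures <- data.frame(
  ons_code = c("X01", "X02", "X03"),
  name = c("Alpha", "Beta", "Gamma"),
  signature_count = c(120, 450, 80),
  stringsAsFactors = FALSE
)

# Codes as they might appear in the map: a different order,
# plus one code ("X99") that is missing from the table
map_ids <- c("X03", "X99", "X01")

ind <- match(map_ids, signatures$ons_code)
print(ind)                              # positions in `signatures`, NA if absent
print(signatures$signature_count[ind])  # counts aligned with map_ids
```

Indexing with NA yields NA, so polygons without a counterpart in the petition data carry a missing value instead of raising an error.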
Then, we prepare the dataframe for plotting:
# build the dataframe to print
constituencies.map <- data.frame(id=0:(length(ukMap@data$name)-1),
Constituency=as.character(ukMap@data$name))
library(ggplot2)  # provides fortify and ggplot
plotData = fortify(ukMap)
## Regions defined for each Polygons
plotData = merge(plotData,constituencies.map,by="id")
ind = match(plotData$Constituency,as.factor(signatures_by_constituency$name))
plotData$votes = signatures_by_constituency$signature_count[ind]
And finally, the maps!
theme_set(theme_minimal())
p = ggplot(data=plotData,
aes(x=long, y=lat,
group=group))
p = p +
geom_map(data = plotData,
map = plotData,
aes(map_id=id, x=long, y=lat, group=group,
fill=votes),
color="white", size=0.001) +
scale_fill_gradient(
#guide = "legend",
breaks = c(0,200,400,800,1600,3200,6400,12800,25600),
name = "# signatures",
low='green', high='red', trans="log")
p = p +
labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(),
axis.ticks.x = element_blank(),axis.text.x = element_blank(),
plot.title = element_text(lineheight=1, face="bold")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p
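The non-zero breaks passed to scale_fill_gradient double at each step, which makes them evenly spaced once the log transformation is applied. A quick base-R check of that arithmetic, offered as an illustration only. Note that a break at 0 has no finite position on a log scale.

```r
# The non-zero breaks used in the plot above
breaks <- c(200, 400, 800, 1600, 3200, 6400, 12800, 25600)

# On a log scale, successive breaks are separated by a constant log(2)
log_gaps <- diff(log(breaks))
print(log_gaps)

# log(0) is -Inf, so a break at 0 cannot be placed on a log-transformed scale
print(log(0))
```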
Hmm, nice. But it’s not that useful: we should normalize petition signatures by the number of voters in each constituency. Thankfully, the UK’s Electoral Commission is very helpful: its website offers the 2015 electoral data for download.
For convenience, I pre-processed and cleaned the data a bit before feeding it into R. Let’s see which are the top 20 constituencies by percentage of the electorate requesting a second referendum:
library(RCurl)  # provides getURL
electoralData = read.csv(
text = getURL("https://raw.githubusercontent.com/basaldella/second-brexit/master/UkConstituencyData2015.csv"),
header = TRUE,
stringsAsFactors = FALSE)
electoralData = merge(electoralData,signatures_by_constituency,by="name")
electoralData$signaturePercent = 100* electoralData$signature_count / electoralData$electorate
electoralData = electoralData[order(-electoralData$signaturePercent),]
head(electoralData[,c(1,9)],20)
## name signaturePercent
## 139 Cities of London and Westminster 28.03974
## 288 Hornsey and Wood Green 27.59095
## 307 Kensington 27.19153
## 263 Hampstead and Kilburn 25.86446
## 301 Islington North 24.85339
## 302 Islington South and Finsbury 24.20920
## 129 Chelsea and Fulham 24.02092
## 466 Richmond Park 24.00424
## 619 Westminster North 23.70962
## 88 Brighton, Pavilion 23.02206
## 92 Bristol West 22.93612
## 286 Holborn and St Pancras 22.76520
## 31 Battersea 22.67872
## 186 Ealing Central and Acton 22.00722
## 179 Dulwich and West Norwood 21.73947
## 262 Hammersmith 21.43688
## 457 Putney 21.29124
## 256 Hackney North and Stoke Newington 21.26076
## 625 Wimbledon 21.13951
## 257 Hackney South and Shoreditch 20.79768
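The normalization step boils down to a merge on the constituency name followed by a division. The sketch below uses made-up constituencies and numbers (not the real electoral data) and also shows that merge() by default keeps only the names present in both tables:

```r
# Hypothetical electorate sizes
electorate_df <- data.frame(
  name = c("Alpha", "Beta", "Gamma"),
  electorate = c(60000, 80000, 50000),
  stringsAsFactors = FALSE
)

# Hypothetical signature counts; "Gamma" is misspelled here on purpose
signature_df <- data.frame(
  name = c("Alpha", "Beta", "Gama"),
  signature_count = c(15000, 4000, 2500),
  stringsAsFactors = FALSE
)

# Inner join: rows whose name appears in only one table are dropped
merged <- merge(electorate_df, signature_df, by = "name")
merged$signaturePercent <- 100 * merged$signature_count / merged$electorate
print(merged)
```

Only "Alpha" and "Beta" survive the merge, so any constituency whose name differs between the two sources silently disappears from the ranking.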
Let’s place it again on a beautiful map to make it clearer:
ind = match(plotData$Constituency,as.factor(electoralData$name))
plotData$votesPercent = electoralData$signaturePercent[ind]
p = ggplot(data=plotData,
aes(x=long, y=lat,
group=group))
p = p +
geom_map(data = plotData,
map = plotData,
aes(map_id=id, x=long, y=lat, group=group,
fill=votesPercent),
color="white", size=0.001) +
scale_fill_gradient(
#guide = "legend",
breaks = c(0,5,10,15,20,25),
name = "% of voters\nrequesting 2nd\nreferendum",
low='green', high='red')
p = p +
labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(),
axis.ticks.x = element_blank(),axis.text.x = element_blank(),
plot.title = element_text(lineheight=1, face="bold")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p
Unfortunately, we have some grey areas: these are constituencies whose names don’t match between the different data sources.
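One way to track down such gaps is to list the names that appear in one source but not in the other. A sketch with short name vectors, assuming the mismatch comes from differences in spelling or punctuation (the "Brighton Pavilion" variant without the comma is made up for illustration):

```r
# Constituency names as they might appear in two sources
map_names <- c("Brighton, Pavilion", "Bristol West", "Cambridge")
electoral_names <- c("Brighton Pavilion", "Bristol West", "Cambridge")

# Names in the map that have no exact counterpart in the electoral data
unmatched <- setdiff(map_names, electoral_names)
print(unmatched)

# Equivalent check via match(): NA marks entries that would be drawn in grey
is_unmatched <- is.na(match(map_names, electoral_names))
print(is_unmatched)
```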
Now we aggregate per region:
aggData = electoralData[,c(3,5,8)]
regionalData <-aggregate(data = aggData, electorate~region,FUN=sum, na.rm=TRUE)
regionalData$signaturePercentRegion <-aggregate(data = aggData, signature_count~region,FUN=sum, na.rm=TRUE)$signature_count
regionalData$signaturePercentRegion = 100 * regionalData$signaturePercentRegion / regionalData$electorate
regionalData = regionalData[order(-regionalData$signaturePercentRegion),]
head(regionalData,n=20)
## region electorate signaturePercentRegion
## 3 London 5407830 14.758397
## 8 South East 6409317 9.325502
## 9 South West 4000463 8.275117
## 1 Eastern 4365302 7.431216
## 6 North West 5230395 6.371259
## 10 wales 2231815 6.156111
## 2 East Midlands 3354204 5.876476
## 12 Yorkshire and the Humber 3862394 5.721788
## 11 West Midlands 4102205 5.521713
## 4 NI 1236765 5.213521
## 5 North East 1923727 4.829479
## 7 Scotland 4099532 4.260072
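Note that the regional percentage is obtained by summing signatures and electorate first and dividing afterwards, which weights each constituency by its size. Averaging the per-constituency percentages would give a different number. A small sketch with made-up figures, using a single aggregate() call with cbind() as one possible alternative to the two calls above:

```r
# Hypothetical constituencies in two regions
toy <- data.frame(
  region = c("North", "North", "South"),
  electorate = c(10000, 90000, 50000),
  signature_count = c(5000, 9000, 5000)
)

# Sum both columns per region in one call
regional <- aggregate(cbind(electorate, signature_count) ~ region,
                      data = toy, FUN = sum)
regional$signaturePercentRegion <-
  100 * regional$signature_count / regional$electorate
print(regional)

# For comparison: the unweighted mean of constituency percentages in "North"
north <- toy[toy$region == "North", ]
north_mean_percent <- mean(100 * north$signature_count / north$electorate)
print(north_mean_percent)
```

In this toy case the weighted figure for "North" is 14%, while the unweighted mean of the two constituency percentages is 30%, because the small constituency is given the same weight as the large one.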
Well, London aside (it voted Remain and leads by a wide margin), it seems that the regions where Leave won are the ones now requesting a second referendum the most. Scotland doesn’t seem to care so much - probably, they would just prefer independence. The same holds for Northern Ireland (NI in the output), with just 5% of voters requesting the referendum.
Now we plot this data on a map:
# pre-plot
electoralData = merge(electoralData,regionalData[,c(1,3)],by="region")
ind = match(plotData$Constituency,as.factor(electoralData$name))
plotData$votesPercentRegion = electoralData$signaturePercentRegion[ind]
p = ggplot(data=plotData,
aes(x=long, y=lat,
group=group))
p = p +
geom_map(data = plotData,
map = plotData,
aes(map_id=id, x=long, y=lat, group=group,
fill=votesPercentRegion),
color="white", size=0.001) +
scale_fill_gradient(
#guide = "legend",
#breaks = c(0,5,10,15,20,25),
name = "% of voters\nrequesting 2nd\nreferendum",
low='green', high='red')
p = p +
labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(),
axis.ticks.x = element_blank(),axis.text.x = element_blank(),
plot.title = element_text(lineheight=1, face="bold")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p
Beautiful, eh? This gives a very clear picture of how Englishmen seem to be the ones regretting their decision the most! Anyway, I am by no means a political scientist: I just wanted to have a bit of fun with R, so I’ll leave the interpretation of this picture to someone else.
I took many ideas and code from this GitHub repo. Spatial data for the maps comes from Martin Chorley’s UK-GeoJSON project and is licensed under a CC-BY 4.0 license. Petition data and electoral body data come from the UK’s institutional websites (linked above).
Using electoral data to compute these maps is probably not the best choice, because the petition can be signed by anyone, not just people with voting rights.
Moreover, suspicions arose that the petition had been manipulated by citizens signing twice, or giving away their postal code. Anyway, the data has been periodically cleaned by the authorities: for example, I have one dump file with thousands of signatures from Vatican City, which have since disappeared.
Source code and processed data for this experiment are available on GitHub.
This notebook is licensed with a CC-BY 4.0 license. Please attribute by linking back to this file or to the corresponding github repo.