After the Brexit referendum, a petition surfaced on the Internet requesting a second referendum. Intuition may tell us that the territories where “Remain” won are the ones signing the petition, but is this really the case?

Luckily, the UK Parliament website offers all the petition data in JSON format, available for download, so we can freely analyze it with RStudio.
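
A quick note on setup: the chunks below never load their packages explicitly. Based on the functions used throughout the post (fromJSON, joinCountryData2Map, readOGR, fortify, getURL), a minimal setup block would look something like this:

# packages assumed by the code in this post
library(jsonlite)   # fromJSON
library(rworldmap)  # joinCountryData2Map, mapCountryData, addMapLegendBoxes
library(rgdal)      # readOGR
library(ggplot2)    # fortify, ggplot
library(RCurl)      # getURL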

To start, we simply download the content of the petition data into a variable:

json_data <- fromJSON("https://petition.parliament.uk/petitions/131215.json",flatten=TRUE)

If we analyze the output with str(json_data) (not shown), we can see that the petition data is broken down both by country and by parliamentary constituency.
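
For a quicker overview than the full str() dump, we can also just peek at the attribute names and at the two tables we are going to use (a convenience sketch, not part of the original analysis):

# list the available attributes
names(json_data$data$attributes)

# the two data frames used in the rest of this post
str(json_data$data$attributes$signatures_by_country, max.level = 1)
str(json_data$data$attributes$signatures_by_constituency, max.level = 1)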

First of all, we start by looking at who in the world would love to keep Britain in the EU. So, we keep only the signatures-by-country data:

# filter the petitions by country
sgn_by_country = json_data$data$attributes$signatures_by_country

Then, we look at the top countries that want Britain to stay in the EU. We sort the data frame by signature count and display the top 20 countries using head:

sgn_by_country = sgn_by_country[order(-sgn_by_country$signature_count),]
head(sgn_by_country,n=20)
##                     name code signature_count
## 210       United Kingdom   GB         3575162
## 69                France   FR           24966
## 178                Spain   ES           15721
## 11             Australia   AU           15629
## 211        United States   US           14787
## 74               Germany   DE            9875
## 38                Canada   CA            5506
## 76             Gibraltar   GI            5379
## 139          New Zealand   NZ            4597
## 137          Netherlands   NL            4201
## 92               Ireland   IE            4089
## 95                 Italy   IT            3592
## 190          Switzerland   CH            3119
## 18               Belgium   BE            3100
## 85             Hong Kong   HK            2903
## 209 United Arab Emirates   AE            2728
## 169            Singapore   SG            1700
## 189               Sweden   SE            1688
## 174         South Africa   ZA            1482
## 155             Portugal   PT            1399

Ooh, so it seems that the French don’t hate Britons that much after all! The number of signatures from Gibraltar is also very interesting, considering that the population of the territory is only about 35,000.
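
Just to put the Gibraltar figure in perspective, here is a quick back-of-the-envelope check, using the ~35,000 population figure mentioned above:

# signatures from Gibraltar as a share of its ~35,000 inhabitants
gib_count = sgn_by_country$signature_count[sgn_by_country$code == "GI"]
100 * gib_count / 35000   # roughly 15% of the whole population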

So, to have a clearer picture, let’s put this data on a beautiful world map!

sgn_by_country$log_signature_count = log10(sgn_by_country$signature_count)
dataMap <- joinCountryData2Map(sgn_by_country,joinCode="ISO2",nameJoinColumn="code")
## 213 codes from your data successfully matched countries in the map
## 9 codes from your data failed to match with a country code in the map
## 28 codes from the map weren't represented in your data
mapData = mapCountryData(dataMap, nameColumnToPlot="signature_count",mapTitle="Second Brexit Referendum",catMethod=10^(0:7),addLegend = FALSE)
do.call( addMapLegendBoxes, c(mapData,title="# signatures",x="bottomleft"))
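
As a side note, the log_signature_count column computed above is never actually used: catMethod=10^(0:7) already buckets the raw counts into logarithmic categories. If you prefer to plot the log-transformed column directly, a variant would look something like this:

# alternative: plot the pre-computed log10 counts with evenly spaced breaks
mapCountryData(dataMap,
               nameColumnToPlot="log_signature_count",
               mapTitle="Second Brexit Referendum (log10 signatures)",
               catMethod=0:7,
               addLegend=TRUE)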

Now let’s see how popular the petition is within Britain. As before, we keep only the constituency data from the JSON and look at the top 20.

signatures_by_constituency = json_data$data$attributes$signatures_by_constituency
signatures_by_constituency = signatures_by_constituency[order(-signatures_by_constituency$signature_count),]

head(signatures_by_constituency[,c(1,4)],n=20)
##                                  name signature_count
## 294            Hornsey and Wood Green           21865
## 126                      Bristol West           20926
## 269             Hampstead and Kilburn           20742
## 292            Holborn and St Pancras           19752
## 262 Hackney North and Stoke Newington           18742
## 456                     Richmond Park           18556
## 305                   Islington North           18224
## 263      Hackney South and Shoreditch           17672
## 122                Brighton, Pavilion           17625
## 66                          Battersea           17261
## 171  Cities of London and Westminster           17102
## 76              Bethnal Green and Bow           17002
## 209          Dulwich and West Norwood           16647
## 311                        Kensington           16623
## 306      Islington South and Finsbury           16493
## 577                          Vauxhall           16414
## 541                         Streatham           16076
## 74       Bermondsey and Old Southwark           16068
## 142                         Cambridge           16046
## 440              Poplar and Limehouse           15929

Now let’s do the map. Unfortunately, Northern Ireland will be excluded from our map, since the geometry data I found doesn’t include it.

First, we read the map data and match the constituency names.

# load the map
ukMap = readOGR("https://github.com/martinjc/UK-GeoJSON/raw/master/json/electoral/gb/topo_wpc.json","wpc")
## OGR data source with driver: GeoJSON 
## Source: "https://github.com/martinjc/UK-GeoJSON/raw/master/json/electoral/gb/topo_wpc.json", layer: "wpc"
## with 632 features
## It has 2 fields
# match the names in the map
ind = match(as.character(ukMap@data$id),signatures_by_constituency$ons_code)
ukMap@data$name = signatures_by_constituency$name[ind]
ukMap@data$votes = signatures_by_constituency$signature_count[ind]

Then, we prepare the dataframe for plotting:

# build the dataframe to print
constituencies.map <- data.frame(id=0:(length(ukMap@data$name)-1),
                                 Constituency=as.character(ukMap@data$name))
plotData = fortify(ukMap)
## Regions defined for each Polygons
plotData = merge(plotData,constituencies.map,by="id")
ind = match(plotData$Constituency,as.factor(signatures_by_constituency$name))
plotData$votes = signatures_by_constituency$signature_count[ind]

And finally, the map!

theme_set(theme_minimal())

p = ggplot(data=plotData,
       aes(x=long, y=lat,
           group=group))

p = p + 
  geom_map(data = plotData,
           map = plotData,
           aes(map_id=id, x=long, y=lat, group=group,
               fill=votes),
           color="white", size=0.001) +
  scale_fill_gradient(
    #guide = "legend",
    breaks = c(0,200,400,800,1600,3200,6400,12800,25600),
    name = "# signatures",
    low='green', high='red', trans="log")

p = p +  
  labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
  theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(), 
        axis.ticks.x = element_blank(),axis.text.x = element_blank(), 
        plot.title = element_text(lineheight=1, face="bold")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
  
  
p

Hmm, nice. But it’s not that useful: we should normalize the petition signatures by the number of registered voters in each constituency. Thankfully, the UK’s Electoral Commission is very helpful: we can download the 2015 electoral data from its website.

For convenience, I pre-processed and cleaned the data a bit before feeding it into R. Let’s see which are the top 20 constituencies by percentage of the electorate requesting a second referendum:

electoralData = read.csv(
  text = getURL("https://raw.githubusercontent.com/basaldella/second-brexit/master/UkConstituencyData2015.csv"),
  header = TRUE,
  stringsAsFactors = FALSE)

electoralData = merge(electoralData,signatures_by_constituency,by="name")
electoralData$signaturePercent = 100* electoralData$signature_count / electoralData$electorate
electoralData = electoralData[order(-electoralData$signaturePercent),]
head(electoralData[,c(1,9)],20)
##                                  name signaturePercent
## 139  Cities of London and Westminster         28.03974
## 288            Hornsey and Wood Green         27.59095
## 307                        Kensington         27.19153
## 263             Hampstead and Kilburn         25.86446
## 301                   Islington North         24.85339
## 302      Islington South and Finsbury         24.20920
## 129                Chelsea and Fulham         24.02092
## 466                     Richmond Park         24.00424
## 619                 Westminster North         23.70962
## 88                 Brighton, Pavilion         23.02206
## 92                       Bristol West         22.93612
## 286            Holborn and St Pancras         22.76520
## 31                          Battersea         22.67872
## 186          Ealing Central and Acton         22.00722
## 179          Dulwich and West Norwood         21.73947
## 262                       Hammersmith         21.43688
## 457                            Putney         21.29124
## 256 Hackney North and Stoke Newington         21.26076
## 625                         Wimbledon         21.13951
## 257      Hackney South and Shoreditch         20.79768

Let’s place it again on a beautiful map to make it clearer:

ind = match(plotData$Constituency,as.factor(electoralData$name))
plotData$votesPercent = electoralData$signaturePercent[ind]

p = ggplot(data=plotData,
       aes(x=long, y=lat,
           group=group))

p = p + 
  geom_map(data = plotData,
           map = plotData,
           aes(map_id=id, x=long, y=lat, group=group,
               fill=votesPercent),
           color="white", size=0.001) +
  scale_fill_gradient(
    #guide = "legend",
    breaks = c(0,5,10,15,20,25),
    name = "% of voters\nrequesting 2nd\nreferendum",
    low='green', high='red')

p = p +  
  labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
  theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(), 
        axis.ticks.x = element_blank(),axis.text.x = element_blank(), 
        plot.title = element_text(lineheight=1, face="bold")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
  
  
p

Unfortunately, we have some grey areas, where the data from the different sources don’t match.
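
To see exactly which constituencies fell through the cracks, a quick diagnostic (not part of the original analysis) is to list the map polygons that ended up without a value:

# constituencies on the map that got no signature percentage after the merge
missing = unique(plotData$Constituency[is.na(plotData$votesPercent)])
length(missing)
head(as.character(missing))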

Now we aggregate per region:

aggData = electoralData[,c(3,5,8)]
regionalData <-aggregate(data = aggData, electorate~region,FUN=sum, na.rm=TRUE)
regionalData$signaturePercentRegion <-aggregate(data = aggData, signature_count~region,FUN=sum, na.rm=TRUE)$signature_count
regionalData$signaturePercentRegion = 100 * regionalData$signaturePercentRegion / regionalData$electorate

regionalData = regionalData[order(-regionalData$signaturePercentRegion),]
head(regionalData,n=20)
##                      region electorate signaturePercentRegion
## 3                    London    5407830              14.758397
## 8                South East    6409317               9.325502
## 9                South West    4000463               8.275117
## 1                   Eastern    4365302               7.431216
## 6                North West    5230395               6.371259
## 10                    wales    2231815               6.156111
## 2             East Midlands    3354204               5.876476
## 12 Yorkshire and the Humber    3862394               5.721788
## 11            West Midlands    4102205               5.521713
## 4                        NI    1236765               5.213521
## 5                North East    1923727               4.829479
## 7                  Scotland    4099532               4.260072

Well, it seems that the regions where Leave won are the ones now requesting a second referendum the most. Scotland doesn’t seem to care that much - probably, they would just prefer independence. The same holds for Northern Ireland (NI in the output), with just 5% of voters requesting the referendum.
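
For context, the overall signature rate across the whole of the UK can be computed from the same per-constituency data (a quick sanity check, not in the original post):

# overall signature rate, all constituencies combined
100 * sum(aggData$signature_count, na.rm=TRUE) / sum(aggData$electorate, na.rm=TRUE)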

Now we plot this data on a map:

# pre-plot 
electoralData = merge(electoralData,regionalData[,c(1,3)],by="region")
ind = match(plotData$Constituency,as.factor(electoralData$name))
plotData$votesPercentRegion = electoralData$signaturePercentRegion[ind]

p = ggplot(data=plotData,
       aes(x=long, y=lat,
           group=group))

p = p + 
  geom_map(data = plotData,
           map = plotData,
           aes(map_id=id, x=long, y=lat, group=group,
               fill=votesPercentRegion),
           color="white", size=0.001) +
  scale_fill_gradient(
    #guide = "legend",
    #breaks = c(0,5,10,15,20,25),
    name = "% of voters\nrequesting 2nd\nreferendum",
    low='green', high='red')

p = p +  
  labs(x="", y="", title="Petition for second Brexit referendum")+ #labels
  theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(), 
        axis.ticks.x = element_blank(),axis.text.x = element_blank(), 
        plot.title = element_text(lineheight=1, face="bold")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
  
  
p

Beautiful, eh? This gives a very clear picture of how the English seem to be the ones regretting their decision the most! Anyway, I am by no means a political scientist: I just wanted to have a bit of fun with R, so I’ll leave the interpretation of this picture to someone else.

Credits

I took many ideas and code from this github repo. Spatial data for the maps comes from Martin Chorley’s UK-GeoJSON project and is licensed under a CC-BY 4.0 license. Petition data and electoral data come from the UK’s institutional websites (linked above).

Disclaimer

Using electoral data to normalize these maps is probably not the best choice, because the petition can be signed by anyone, not just people with voting rights.

Moreover, suspicions arose that the petition had been manipulated by citizens signing twice, or giving away their postal codes. Anyway, the data has been periodically cleaned by the authorities: for example, I have a dump file with thousands of signatures from Vatican City, which have since disappeared.

Data and source

The source code and processed data for this experiment are available on github.

License

This notebook is licensed under a CC-BY 4.0 license. Please attribute by linking back to this file or to the corresponding github repo.