Introduction

Much of the debate following the referendum on leaving the EU has focused on interpretations of the decision made by the electorate as a whole. It is impossible to know what motivated individual voters. However there may be some interpretable patterns in the results themselves that can be analysed statistically. The votes cast in each local authority are likely to be linked to the specific characteristics of the population. This analysis looks at these general patterns of voting across the country as a function of demography.

Finding the data

Unlike general elections, in which votes are declared for constituencies, the votes cast for the referendum were aggregated at the level of local authorities. This provides an opportunity to combine them with demographic data at the same level. So although data are not available on individual voters, the broader patterns can be analysed.

The raw data are available here.

http://www.electoralcommission.org.uk/__data/assets/file/0014/212135/EU-referendum-result-data.csv

Demographic data are available here.

https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland/mid2015/ukmye2015.zip

The reshape library or dplyr can be used to manipulate these data and merge them into a single data set. The code for this is not shown here. However a link is provided to the merged data set that resulted for England uploaded into CartoDB. The map can be queried and the underlying data set itself downloaded for further analysis.
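As a rough illustration of the merging step, the sketch below joins a results table to a population table on a shared area code using base R's merge(). The values are invented, and the population columns (pop_total, pop_nonuk) are hypothetical names rather than those in the real files.

```r
# Toy stand-ins for the two downloaded tables (values are invented)
votes <- data.frame(Area_Code = c("E01", "E02", "E03"),
                    Pct_Leave = c(55.2, 48.1, 61.7))
demog <- data.frame(code = c("E01", "E02", "E04"),
                    pop_total = c(100000, 250000, 80000),
                    pop_nonuk = c(5000, 60000, 2000))

# Inner join on the area code; authorities missing from either table are dropped
merged <- merge(votes, demog, by.x = "Area_Code", by.y = "code")

# Derive the percentage of residents born outside the UK
merged$pernonuk <- 100 * merged$pop_nonuk / merged$pop_total
merged
```

Only authorities present in both tables survive the join, so it is worth checking the row count against the number of authorities expected.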

Percentage voting leave UK

CartoDB link to results map

https://dgolicher.carto.com/viz/54540ebe-49b8-11e6-8189-0e3ff518bd15/public_map

The influence of migration

A great deal of the controversy regarding the referendum centred on the leave campaign’s focus on immigration as the major issue. If people’s direct experience of losing jobs to immigrants influenced their votes, it would be logical to expect a positive relationship between the proportion of the population not born in the UK (many of whom could not vote) and the leave vote at the local authority level.

library(ggplot2)
g0<-ggplot(d,aes(x=pernonuk,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))+scale_x_log10()+geom_smooth(method=lm)
g1<-g1+labs(x="Percentage of population born outside the UK, log scale",y="Percentage voting leave")
g1

In fact the opposite tendency is observed. There is a lot of scatter around the trend line, but the trend is clearly significant overall. Local authorities with higher resident immigrant populations tended on average to record a lower proportion of votes for leave. These aggregated data, however, may be confounded by the fact that the authorities with the highest percentage of residents born outside the UK were in London, which overwhelmingly favoured remain. So the results should also be looked at by region.
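One way to check the significance of a trend like this is to fit the same straight line that geom_smooth(method=lm) draws once the x axis has been logged. The sketch below does this on simulated data, not the real results; the numbers used to generate the data are arbitrary assumptions.

```r
set.seed(1)
# Simulated stand-in for the merged data (not the real results)
n <- 200
pernonuk <- exp(runif(n, log(1), log(50)))   # 1% to 50%, spread evenly on a log scale
Pct_Leave <- 60 - 8 * log10(pernonuk) + rnorm(n, sd = 8)
sim <- data.frame(pernonuk, Pct_Leave)

# Linear model on the log10 scale, matching the logged x axis of the plot
fit <- lm(Pct_Leave ~ log10(pernonuk), data = sim)

# Estimate, standard error, t value and p value for the trend
slope <- coef(summary(fit))["log10(pernonuk)", ]
slope
```

With the real data frame the same call would be made with data=d in place of the simulated table.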

g1+facet_wrap("Region")

The same general trend emerges, although there is of course once again a great deal of unexplained scatter in the results.

The analysis above was based on the overall percentage of the population born outside the UK. However, the argument used in favour of “leave” was that recent migration is putting pressure on services, causing resentment. Data are also available on net migration to each authority that can be used to test this assumption. Allowance should be made for the size of the resident population, so the percentage of recent immigrants in the population of each local authority has been calculated.
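The exact calculation behind the migration percentage is not shown, so the sketch below is only one plausible reading: net migration (arrivals minus departures) divided by the resident population. The figures are invented and the Pop column is a hypothetical name.

```r
# Toy figures; the Area and Pop columns are hypothetical
la <- data.frame(Area = c("A", "B", "C"),
                 MigIn = c(1200, 300, 150),
                 MigOut = c(500, 100, 200),
                 Pop = c(100000, 60000, 50000))

# Net migration as a percentage of the resident population
la$permig <- 100 * (la$MigIn - la$MigOut) / la$Pop

# A log10 axis cannot show zero or negative values, so flag them first
la$plottable <- la$permig > 0
la
```

Authorities with net outward migration would have to be dropped or handled separately before plotting on a log scale.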

g0<-ggplot(d,aes(x=permig,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))+geom_smooth(method=lm)
g1<-g1+labs(x="Percent net migration",y="Percentage voting leave")
g1<-g1+scale_x_log10()
g1

Remarkably this trend is also downwards. Areas with the highest percentage of migrants voted to remain. Once again this may be due to the London effect, so it should be looked at by region.

g1+facet_wrap("Region")

Even at the regional level the trend is mainly negative, with the notable exception of the East Midlands. Media coverage has focused on towns such as Boston as if they were representative of the general mood of the country. In reality Boston, and the East Midlands as a whole, are clearly not representative of the overall trends. Furthermore, Boston itself has a very small number of registered voters, so the high leave vote from that authority in particular had very little influence on the overall vote.

Migration works both ways. Much of the polemic has surrounded the net migration figures. Part of the issue is that temporary migration, especially of students from the EU, is recorded in the figures. The areas with high inward migration also have high rates of outward migration.

g0<-ggplot(dd,aes(x=MigIn,y=MigOut))
g1<-g0+geom_point(aes(col=win))+scale_x_log10()+geom_smooth(method=loess)
g1<-g1+labs(x="Inward migration",y="Outward migration")
g1

Notice that the migration figures for the majority of local authorities are still relatively low. The totals are dominated by movements within a very small number of local authorities.

g1+facet_wrap("Region")

High mobility is striking. There is a large amount of outward migration in areas receiving immigrants. It can therefore be assumed that most of the outward migration is not the result of British citizens emigrating to either the EU or other countries. It is much more likely that the figures represent two-way movements back and forth of migrants from the EU. In the case of Birmingham, which was the only authority with high rates of migration to vote marginally to leave, much of the migration is to the Indian subcontinent rather than the EU.

Migration for the Birmingham local authority shown in the CartoDB map

Migration figures for Birmingham

CartoDB link to migration map

https://dgolicher.carto.com/viz/d05c8378-490a-11e6-b712-0e8c56e2ffdb/public_map

Looking at outward migration as a percentage of inward migration shows that net migration figures will be highly influenced by any change in outward migration. The numbers of people migrating out of an area range from around 40% to 65% of the numbers arriving, depending on the region. Any decrease in inward migration that is also associated with an increase in outward migration could lower net migration very quickly. This may well occur as a result of changes in perception of the UK after the vote, leaving the migration issue largely irrelevant in coming years.

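library(reshape)
# Long format: one row per authority and flow type
mig<-melt(dd,id.vars="Region",measure.vars=c("MigOut","MigIn"))
# Sum each flow within each region
mig<-cast(mig,Region~variable,sum)
# Outward migration as a percentage of inward migration
mig$Percent_outmigration<-round(100*mig$MigOut/mig$MigIn,0)
mig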

##           Region MigOut  MigIn Percent_outmigration
## 1           East  18552  45741                   41
## 2  East Midlands  14734  36625                   40
## 3         London  87205 221106                   39
## 4     North East   8738  15099                   58
## 5     North West  27949  50166                   56
## 6       Scotland  18200  37800                   48
## 7     South East  40698  76200                   53
## 8     South West  22573  37645                   60
## 9          Wales  10777  16699                   65
## 10 West Midlands  22014  46313                   48
## 11     Yorkshire  18904  39790                   48

If Brexit induces a recession, or makes immigrants feel less comfortable living in the UK, outward migration could quite quickly result in a fall in net migration, or even in an overall fall in the number of non-UK citizens living in the country, without any further legal constraints on movement actually being imposed. Furthermore, the analysis shows that most authorities receiving high numbers of migrants are actually in favour of remaining in the EU. Migrants likely to feel most unwelcome are those living away from the areas with high numbers of migrants, again with the notable exception of a few authorities in the East Midlands such as Boston.
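To see how sensitive net migration is to outward flows, the sketch below works through a purely hypothetical scenario using the London totals from the table above. The 20% changes are arbitrary assumptions chosen for illustration, not a forecast.

```r
# Regional totals for London taken from the table above
mig_in  <- 221106
mig_out <- 87205
net_now <- mig_in - mig_out

# Hypothetical scenario: arrivals fall by 20% and departures rise by 20%
net_new <- mig_in * 0.8 - mig_out * 1.2

# Percentage fall in net migration under the scenario
pct_fall <- 100 * (net_now - net_new) / net_now
round(c(net_now = net_now, net_new = net_new, pct_fall = pct_fall), 1)
```

Under these assumptions net migration to London falls from 133,901 to about 72,239, a drop of roughly 46%, even though each flow has changed by only a fifth.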

Effect of population age structure

To simplify the analysis I calculated the percentage of the total voting age population aged over 40 in each local authority as an indicator of the age structure. Polling by Lord Ashcroft suggested much greater support for remain among younger voters. The results of that poll are available here.

http://lordashcroftpolls.com/2016/06/how-the-united-kingdom-voted-and-why

Turnout is also a function of age. London and the surrounding areas have a relatively young population compared to the rest of the country. Coastal areas have a higher percentage of retired people.
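The age indicator described above could be computed along the following lines. This is a minimal sketch on invented counts; the age-band layout is an assumption and the ONS file itself is organised differently. Voting age is taken as 18, as it was for the referendum.

```r
# Toy population counts by age band (values and layout are invented)
pop <- data.frame(
  Area     = rep(c("A", "B"), each = 4),
  age_from = rep(c(0, 18, 40, 65), times = 2),
  count    = c(20000, 35000, 30000, 15000,
               15000, 20000, 35000, 30000))

# Keep the voting-age population (18 and over)
adults <- pop[pop$age_from >= 18, ]
older  <- adults[adults$age_from >= 40, ]

# Adults aged 40 or more as a percentage of all adults, per area
over40  <- tapply(older$count, older$Area, sum)
total   <- tapply(adults$count, adults$Area, sum)
POver40 <- 100 * over40 / total
round(POver40, 1)
```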

Proportion over 40

CartoDB link to map of proportion over 40

https://dgolicher.carto.com/viz/37fde61a-49de-11e6-83ba-0e3ff518bd15/public_map

g0<-ggplot(dd,aes(x=POver40,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))
g1<-g1+labs(x="Percentage of voting age population aged 40 or more",y="Percentage voting leave")
g1+geom_smooth(aes(x=POver40,y=Pct_Leave),method="loess")

g1+geom_smooth(method=lm)+facet_wrap("Region")

g0<-ggplot(dd,aes(x=POver40,y=Pct_Turnout))
g1<-g0+geom_point(aes(color=win))+geom_smooth(method="lm")
g1<- g1+labs(x="Percentage of voting age population aged 40 or more",y="Percent turnout")
g1

This is more or less as expected, but notice that some of the authorities with the highest turnout voted to remain. This will become very important later on in the analysis.

g1+facet_wrap("Region")

Turnout as an influence on the vote

Based on the clear relationships between turnout, age structure and voting leave, we might expect the leave vote to increase with turnout.

g0<-ggplot(dd,aes(x=Pct_Turnout,y=Pct_Leave))
g1<-g0+geom_point(aes(color=win))+geom_smooth(method="lm")
g1<-g1+labs(x="Percent turnout",y="Percent voting leave")
g1

In fact the relationship is almost completely flat. At face value turnout does not appear to be related to the vote. At a regional level the relationship is even the converse of that expected in the South East and the North East.

g1+facet_wrap("Region")

How can we explain this odd looking result?

We need to look again at the relationship between age and turnout. Areas with an older demographic have a higher turnout in general and they tend to favour leave. However, there is a lot of scatter around the trend line. One way of looking at this scatter statistically is to first build a model of turnout as predicted by demography and then use the residuals from this model (distance from the expected trend line) as a predictor of the leave vote. In other words, we first take into account the demographic factor before looking at the effect of turnout. Because turnout in Scotland was much lower overall than in England and Wales, and Scotland voted overwhelmingly to remain, we will remove Scottish local authorities as they would distort the model for the rest of the country.

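# Assumption: Eng is dd with the Scottish authorities removed (its construction is not shown)
Eng<-subset(dd,Region!="Scotland")
# Turnout as predicted by age structure
mod<-lm(Pct_Turnout~POver40,data=Eng)
# Residuals: turnout above or below that expected for the age structure
Eng$turnout_residuals<-residuals(mod)
g0<-ggplot(Eng,aes(x=turnout_residuals,y=Pct_Leave))
g1<-g0+geom_point()+geom_smooth(method="lm")
g1<-g1+labs(x="Turnout after holding for age",y="Percent voting leave")
g1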

Now there is a very clear trend. The leave vote declines in authorities where there was a higher turnout than expected given the age structure of the population. Higher turnout does in fact favour the remain vote, but this effect is obscured by the fact that turnout tends to be naturally higher in the areas that favoured leave, due to the underlying relationship between the age of voters and their voting preference. These two factors cancel each other out, which explains the ambiguous effect for turnout alone. Once the age effect has been factored into the analysis, higher turnout clearly does favour the remain vote.

g1+facet_wrap("Region")

The effect is even clearer when broken down by region, as regional differences in turnout are also accounted for.
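The masking effect described above can be reproduced on simulated data. The sketch below is not the real data: the coefficients used to generate it are arbitrary assumptions, chosen so that the age effect and the turnout effect roughly offset one another. It also fits an additive model with both predictors for comparison.

```r
set.seed(42)
n <- 300
# Simulated authorities: older areas turn out more and lean leave, while
# turnout beyond what age predicts leans remain
POver40     <- rnorm(n, mean = 60, sd = 6)
Pct_Turnout <- 20 + 0.8 * POver40 + rnorm(n, sd = 3)
Pct_Leave   <- 55 + 1.3 * POver40 - 1.2 * Pct_Turnout + rnorm(n, sd = 5)
sim <- data.frame(POver40, Pct_Turnout, Pct_Leave)

# Raw slope: the two effects largely cancel
raw <- coef(lm(Pct_Leave ~ Pct_Turnout, data = sim))[["Pct_Turnout"]]

# Slope on the turnout residuals after removing the age trend
sim$turnout_residuals <- residuals(lm(Pct_Turnout ~ POver40, data = sim))
res <- coef(lm(Pct_Leave ~ turnout_residuals, data = sim))[["turnout_residuals"]]

# Slope for turnout in an additive model with both predictors
add <- coef(lm(Pct_Leave ~ Pct_Turnout + POver40, data = sim))[["Pct_Turnout"]]

round(c(raw = raw, residual = res, additive = add), 3)
```

The residual slope and the additive-model slope coincide. This is a general property of least squares (the Frisch-Waugh-Lovell result), so the two approaches estimate the same turnout effect.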

Implications of the turnout effect

The implications of this pattern are very important. A naive analysis of the raw vote that did not factor in the demographic effect might conclude that turnout had little effect on the result and that an increase in turnout would simply have increased the numbers of people voting on both sides. However, this is not the case. Most of the authorities with a high percentage of leave votes also had a high proportion of older voters. A high proportion of these voters did already turn out to vote. So any overall increase in the turnout would have had the effect of also increasing the turnout after holding for demography. This was the variable associated with votes for remain. This argument may be rather technical and hard to accept for non-statisticians. However, holding for confounding variables is a standard statistical technique that is very robust. There is no doubt that the result is valid.

A formal way to analyse this is to build an additive linear model in which the percent voting leave is predicted by turnout and percentage of voters over 40.

mod<-lm(Pct_Leave~Pct_Turnout+POver40,data=Eng)
summary(mod)

## 
## Call:
## lm(formula = Pct_Leave ~ Pct_Turnout + POver40, data = Eng)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.7915  -5.0374  -0.1608   4.9680  24.2007 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 71.17709    6.28839   11.32   <2e-16 ***
## Pct_Turnout -1.18444    0.11238  -10.54   <2e-16 ***
## POver40      1.08213    0.06998   15.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.546 on 345 degrees of freedom
## Multiple R-squared:  0.4094, Adjusted R-squared:  0.406 
## F-statistic: 119.6 on 2 and 345 DF,  p-value: < 2.2e-16

The coefficient for turnout is negative and highly significant, thus confirming formally that higher turnout, after taking demography into account, is associated with a lower percentage voting leave. This calls into question claims that there was an “overwhelming mandate” in favour of leaving. If more people had turned out to vote then the result would almost certainly have been to remain.
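To make the size of the turnout coefficient concrete, the sketch below plugs the estimates from the summary above into the fitted equation for a hypothetical authority. The turnout and age values are invented, and the result describes an association across authorities rather than the behaviour of individual voters.

```r
# Coefficients copied from the model summary above
b0 <- 71.17709
b_turnout <- -1.18444
b_over40 <- 1.08213

# Predicted leave share for a given turnout and age structure
predict_leave <- function(turnout, over40) {
  b0 + b_turnout * turnout + b_over40 * over40
}

# Hypothetical authority: 60% of adults aged 40 or more, turnout of 72% versus 75%
p72 <- predict_leave(72, 60)
p75 <- predict_leave(75, 60)
round(c(turnout_72 = p72, turnout_75 = p75, difference = p75 - p72), 2)
```

With the age structure held fixed, three extra points of turnout lower the predicted leave share from about 50.8% to about 47.3%.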

Turnout after holding for age structure

Turnout after holding for demography

CartoDB link to map of turnout after holding for age structure

https://dgolicher.carto.com/viz/6268c778-49e1-11e6-88a6-0e233c30368f/public_map

Creating the maps

The code below is included for my reference in order to document the steps I used. It uses a local PostGIS database to link the tables to a shapefile. The resulting shapefile for England was then uploaded to CartoDB to produce the maps that are linked to this document.

library(RODBC)
con<-odbcConnect("elections")
odbcQuery(con,"drop table england_data")

## [1] 1

sqlSave(con,Eng,"england_data",safer=FALSE)
query<-"drop table referendum;
create table referendum as
select e.*,geom from 
england_data e,
local_authorities l
where e.area_code = l.code"
odbcQuery(con,query)

## [1] 1

Voting at the last election

It is unfortunately not possible to directly compare the results of the referendum with the votes from the last election as the boundaries of the reporting districts are different. However, a rough and ready solution to this is to take the mean of the percentage votes cast for each party associated with the centroids of the constituencies falling within the boundaries. This can fail in a few cases when no centroid falls in the area. In others several constituencies with different voting patterns may be combined, so there are some errors. However as a guide to overall patterns it is acceptable.

Local authority and constituency boundaries

ge<-read.csv("general_election.csv")
ge<-merge(ge,dd,by.x="code",by.y="Area_Code")
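The averaging step described above might look like the sketch below. It assumes the spatial lookup has already been done, so each constituency carries the code of the local authority containing its centroid; la_code is a hypothetical column name and the vote shares are invented.

```r
# Toy constituency results; la_code is the local authority whose boundary
# contains each constituency centroid (the spatial lookup itself is not shown)
cons <- data.frame(
  constituency = c("c1", "c2", "c3", "c4"),
  la_code = c("E01", "E01", "E02", "E03"),
  labour  = c(40, 50, 30, 20),
  con     = c(35, 25, 45, 50),
  ukip    = c(15, 17, 12, 20))

# Mean percentage for each party across the constituencies assigned to each authority
ge_la <- aggregate(cbind(labour, con, ukip) ~ la_code, data = cons, FUN = mean)
names(ge_la)[1] <- "code"
ge_la
```

The mean is unweighted, so constituencies of different sizes count equally, which is one source of the errors mentioned above.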

Effect of the UKIP vote

We would expect a very strong relationship between the percent voting for UKIP at the last election and the percent voting leave.

g0<-ggplot(ge,aes(x=ukip,y=leave))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="loess")
g1<-g1+labs(x="Percent voting for UKIP",y="Percent voting leave")
g1

The clear result also holds at the regional level. However, notice that a large number of non-UKIP voters contributed to the overall leave vote. In many cases this was over 40% of the total vote.

g1+facet_wrap("region")

Labour vote

In order to look at the relationship between the leave vote and the proportion voting Labour we should first subtract the UKIP vote under the assumption that almost all UKIP voters went for leave.

g0<-ggplot(ge,aes(x=labour,y=leave-ukip))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting Labour at last election",y="Percent non UKIP leave")
g1

The relationship is apparently quite flat, which suggests that the non UKIP leave vote was drawn from fairly equal numbers of Labour and Conservative voters. However this does not take into account the turnout effect that we have already determined to be an important factor.

g1+facet_wrap("region")

Turnout in Labour voting areas

Labour voting areas tend to have low turnouts at general elections, in part because many Labour held seats are very safe and voters traditionally have not been motivated to vote. They also may have more young voters, particularly in London.

g0<-ggplot(ge,aes(x=labour,y=Pct_Turnout))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting Labour at the last election",y="Percent turnout")
g1

g1+facet_wrap("region")

Conservative turnout

Conservative voters are generally older and thus more likely to turn out to vote.

g0<-ggplot(ge,aes(x=con,y=Pct_Turnout))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting Conservative at the last election",y="Percent turnout")
g1

g1+facet_wrap("region")

The UKIP vote and migration

If it were true that experience of high levels of migration leads voters to turn to UKIP we would expect a positive relationship between the percentage of migrants and the UKIP vote.

g0<-ggplot(ge,aes(x=permig,y=ukip))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent recent migrants",y="Percent voting UKIP at last election")
g1<-g1+scale_x_log10()
g1

The converse is true. In general terms, areas with high levels of migration have lower percentages of UKIP voters. Most UKIP supporters live in areas with low migration, suggesting that their worries about migration are not based on personal experience of migration.

g1+facet_wrap("region")

The UK referendum on leaving the EU