Much of the debate following the referendum on leaving the EU has focused on interpretations of the decision made by the electorate as a whole. It is impossible to know what motivated individual voters. However there may be some interpretable patterns in the results themselves that can be analysed statistically. The votes cast in each local authority are likely to be linked to the specific characteristics of the population. This analysis looks at these general patterns of voting across the country as a function of demography.
Unlike general elections, in which votes are declared for constituencies, the votes cast in the referendum were aggregated at the level of local authorities. This provides an opportunity to combine them with demographic data at the same level. So although data are not available on individual voters, the broader patterns can be analysed.
The raw data are available here.
http://www.electoralcommission.org.uk/__data/assets/file/0014/212135/EU-referendum-result-data.csv
Demographic data are available here.
The reshape library or dplyr can be used to manipulate these data and merge them into a single data set. The code for this is not shown here. However a link is provided to the merged data set that resulted for England uploaded into CartoDB. The map can be queried and the underlying data set itself downloaded for further analysis.
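The merge step described above might look something like the following minimal sketch. It uses base R's `merge` rather than reshape or dplyr. The two small data frames, the area codes, the values and the definition of the `win` column are all illustrative assumptions, not the author's actual data or code.

```r
# Toy stand-ins for the two sources; every value here is invented for illustration.
results <- data.frame(
  Area_Code = c("E001", "E002", "E003"),
  Region    = c("London", "East Midlands", "South West"),
  Pct_Leave = c(35.2, 61.8, 52.4),
  stringsAsFactors = FALSE
)
demog <- data.frame(
  Area_Code = c("E001", "E002", "E003"),
  pernonuk  = c(38.1, 9.4, 6.7),  # percent of residents born outside the UK
  stringsAsFactors = FALSE
)

# Join the two tables on the shared local authority code.
d <- merge(results, demog, by = "Area_Code")

# Assumed definition of the 'win' column used to colour the points in the plots.
d$win <- ifelse(d$Pct_Leave > 50, "Leave", "Remain")
d
```

Any authority missing from either table is silently dropped by the default inner join, so it is worth comparing row counts before and after merging.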
Percentage voting leave UK
https://dgolicher.carto.com/viz/54540ebe-49b8-11e6-8189-0e3ff518bd15/public_map
Much of the controversy surrounding the referendum centred on the leave campaign's focus on immigration as the major issue. If people's direct experience of losing jobs to immigrants influenced their votes, it would be logical to expect a positive relationship at the local authority level between the proportion of the population not born in the UK (many of whom could not vote) and the leave vote.
library(ggplot2)
g0<-ggplot(d,aes(x=pernonuk,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))+scale_x_log10()+geom_smooth(method=lm)
g1<-g1+labs(x="Percentage of population born outside the UK, log scale",y="Percentage voting leave")
g1
In fact the opposite tendency is observed. There is a lot of scatter around the trend line, but the overall trend is clearly significant. Local authorities with larger resident immigrant populations tended on average to record a lower proportion of votes for leave. These aggregated data may, however, be confounded by the fact that the authorities with the highest percentage of residents born outside the UK were in London, which overwhelmingly favoured remain. So the results should also be looked at by region.
g1+facet_wrap("Region")
The same general trend emerges, although there is of course once again a great deal of unexplained scatter in the results.
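The plots show the trend visually; one way to put numbers on it is to fit the same straight line with `lm`, overall and then separately within each region. The sketch below runs on simulated data (the region labels, slope and noise level are invented), so it only illustrates the mechanics and says nothing about the real results.

```r
set.seed(42)
# Simulated local authorities, NOT the real data:
# leave share is made to fall with log10(pernonuk).
n <- 120
d_sim <- data.frame(
  Region   = rep(c("A", "B", "C"), each = n / 3),
  pernonuk = exp(runif(n, log(2), log(50)))
)
d_sim$Pct_Leave <- 65 - 12 * log10(d_sim$pernonuk) + rnorm(n, sd = 6)

# Overall trend on the same log scale as the plot.
overall <- lm(Pct_Leave ~ log10(pernonuk), data = d_sim)
summary(overall)$coefficients

# One fitted slope per region, mirroring the facet_wrap("Region") panels.
slopes <- sapply(split(d_sim, d_sim$Region), function(x) {
  coef(lm(Pct_Leave ~ log10(pernonuk), data = x))[2]
})
slopes
```

Replacing `d_sim` with the merged data set `d` would give the slope, standard error and p-value behind the line drawn by `geom_smooth`.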
The analysis above was based on the overall percentage of the population born outside the UK. However the argument used in favour of “leave” was that recent migration is putting pressure on services causing resentment. There are also data available on net migration to each authority that can be used to test this assumption. Allowance should be made for the size of the resident population so a percentage of recent immigrants in the population of each local authority has been calculated.
g0<-ggplot(d,aes(x=permig,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))+geom_smooth(method=lm)
g1<-g1+labs(x="Percent net migration",y="Percentage voting leave")
g1<-g1+scale_x_log10()
g1
Remarkably this trend is also downwards. Areas with the highest percentage of migrants voted to remain. Once again this may be due to the London effect, so it should be looked at by region.
g1+facet_wrap("Region")
Even at the regional level the trend is mainly negative, with the notable exception of the East Midlands. Media coverage has focused on towns such as Boston as if they were representative of the general mood of the country. In reality Boston, and the East Midlands as a whole, are clearly not representative of the overall trends. Furthermore Boston itself has a very small number of registered voters, so the high leave vote from that authority had very little influence on the overall vote.
Migration works both ways. Much of the polemic has surrounded the net migration figures. Part of the issue is that temporary migration, especially of students from the EU, is recorded in the figures. The areas with high inward migration also have high rates of outward migration.
g0<-ggplot(dd,aes(x=MigIn,y=MigOut))
g1<-g0+geom_point(aes(col=win))+scale_x_log10()+geom_smooth(method=loess)
g1<-g1+labs(x="Inward migration",y="Outward migration")
g1
Notice that the migration figures for the majority of local authorities are still relatively low. The totals are dominated by movements within a very small number of local authorities.
g1+facet_wrap("Region")
The high mobility is striking. There is a large amount of outward migration in the areas receiving immigrants. It can therefore be assumed that most of the outward migration is not the result of British citizens emigrating to either the EU or other countries. It is much more likely that the figures represent two-way movements back and forth of migrants, both from the EU and from elsewhere. In the case of Birmingham, which was the only authority with high rates of migration to vote (marginally) to leave, much of the migration is to the Indian subcontinent rather than the EU.
https://dgolicher.carto.com/viz/d05c8378-490a-11e6-b712-0e8c56e2ffdb/public_map
Looking at outward migration as a percentage of inward migration shows that net migration figures will be highly sensitive to any change in outward migration. The numbers of people migrating out of an area range from around 40% to 65% of the numbers arriving, depending on the region. Any decrease in inward migration that is also associated with an increase in outward migration could lower net migration very quickly. This may well occur as a result of changes in perception of the UK after the vote, leaving the migration issue largely irrelevant in coming years.
library(reshape)
# Sum inward and outward migration by region
mig<-melt(dd,id.vars="Region",measure.vars=c("MigOut","MigIn"))
mig<-cast(mig,Region~variable,sum)
mig$Percent_outmigration<-round(100*mig$MigOut/mig$MigIn,0)
mig
##           Region MigOut  MigIn Percent_outmigration
## 1           East  18552  45741                   41
## 2  East Midlands  14734  36625                   40
## 3         London  87205 221106                   39
## 4     North East   8738  15099                   58
## 5     North West  27949  50166                   56
## 6       Scotland  18200  37800                   48
## 7     South East  40698  76200                   53
## 8     South West  22573  37645                   60
## 9          Wales  10777  16699                   65
## 10 West Midlands  22014  46313                   48
## 11     Yorkshire  18904  39790                   48
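To see how sensitive net migration is to changes in both flows, the regional totals in the table above can be pushed through a simple what-if calculation. The 10% shift used here is a purely hypothetical scenario chosen for illustration, not a forecast.

```r
# Regional totals copied from the table above.
mig_tab <- data.frame(
  Region = c("East", "East Midlands", "London", "North East", "North West",
             "Scotland", "South East", "South West", "Wales",
             "West Midlands", "Yorkshire"),
  MigOut = c(18552, 14734, 87205, 8738, 27949, 18200,
             40698, 22573, 10777, 22014, 18904),
  MigIn  = c(45741, 36625, 221106, 15099, 50166, 37800,
             76200, 37645, 16699, 46313, 39790)
)

# Hypothetical scenario: inward migration falls 10% while outward rises 10%.
shift <- 0.10
mig_tab$Net          <- mig_tab$MigIn - mig_tab$MigOut
mig_tab$Net_scenario <- mig_tab$MigIn * (1 - shift) - mig_tab$MigOut * (1 + shift)
mig_tab$Pct_change   <- round(100 * (mig_tab$Net_scenario - mig_tab$Net) / mig_tab$Net, 1)
mig_tab[, c("Region", "Net", "Net_scenario", "Pct_change")]
```

With these inputs London's net figure falls by about 23%, more than twice the size of the 10% shift applied to each flow, and the proportional fall is larger still in regions where outward migration is a bigger share of inward migration.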
If Brexit induces a recession, or makes immigrants feel less comfortable living in the UK, outward migration could quite quickly produce a fall in net migration, or even an overall fall in the number of non-UK citizens living in the country, without any further legal constraints on movement actually being imposed. Furthermore, the analysis shows that most authorities receiving high numbers of migrants actually favoured remaining in the EU. The migrants likely to feel most unwelcome are those living away from the areas with high numbers of migrants, again with the notable exception of a few authorities in the East Midlands such as Boston.
To simplify the analysis I calculated the percentage of the total voting age population aged over 40 in each local authority as an indicator of the age structure. Polling by Lord Ashcroft suggested much greater support for remain among younger voters. The results of that poll are available here.
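As a rough illustration of that indicator, the calculation could look like the sketch below. The age-band column names and all of the counts are hypothetical; the real demographic tables will be laid out differently.

```r
# Hypothetical age-band counts for three local authorities (invented numbers).
ages <- data.frame(
  Area_Code  = c("E001", "E002", "E003"),
  age_18_39  = c(90000, 30000, 25000),
  age_40plus = c(70000, 50000, 55000)
)

# Share of the voting-age population aged 40 or more, as a percentage.
voting_age   <- ages$age_18_39 + ages$age_40plus
ages$POver40 <- round(100 * ages$age_40plus / voting_age, 1)
ages
```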
http://lordashcroftpolls.com/2016/06/how-the-united-kingdom-voted-and-why
Turnout is also a function of age. London and the surrounding areas have a relatively young population compared to the rest of the country. Coastal areas have a higher percentage of retired people.
Proportion over 40
https://dgolicher.carto.com/viz/37fde61a-49de-11e6-83ba-0e3ff518bd15/public_map
g0<-ggplot(dd,aes(x=POver40,y=Pct_Leave))
g1<-g0+geom_point(aes(col=win))
g1<-g1+labs(x="Percentage of voting age population aged 40 or more",y="Percentage voting leave")
g1+geom_smooth(aes(x=POver40,y=Pct_Leave),method="loess")
g1+geom_smooth(method=lm)+facet_wrap("Region")
g0<-ggplot(dd,aes(x=POver40,y=Pct_Turnout))
g1<-g0+geom_point(aes(color=win))+geom_smooth(method="lm")
g1<- g1+labs(x="Percentage of voting age population aged 40 or more",y="Percent turnout")
g1
This is more or less as expected, but notice that some of the authorities with the highest turnout voted to remain. This will become very important later on in the analysis.
g1+facet_wrap("Region")
Based on the clear relationships between turnout, age structure and voting leave we might expect the leave vote to increase with turnout.
g0<-ggplot(dd,aes(x=Pct_Turnout,y=Pct_Leave))
g1<-g0+geom_point(aes(color=win))+geom_smooth(method="lm")
g1<-g1+labs(x="Percent turnout",y="Percent voting leave")
g1
In fact the relationship is almost completely flat. At face value turnout does not appear to be related to the vote. At a regional level the relationship is even the converse of that expected in the South East and the North East.
g1+facet_wrap("Region")
How can we explain this odd looking result?
We need to look again at the relationship between age and turnout. Areas with an older demographic have a higher turnout in general and they tend to favour leave. However there is a lot of scatter around the trend line. One way of looking at this scatter statistically is to first build a model of turnout as predicted by demography and then use the residuals from this model (the distance from the expected trend line) as an explanatory variable for the leave vote. In other words we first take into account the demographic factor before looking at the effect of turnout. Because turnout in Scotland was much lower overall than in England and Wales, and Scotland voted overwhelmingly to remain, we will remove Scottish local authorities, as they would distort the model for the rest of the country.
mod<-lm(Pct_Turnout~POver40,data=Eng)
Eng$turnout_residuals<-residuals(mod)
g0<-ggplot(Eng,aes(x=turnout_residuals,y=Pct_Leave))
g1<-g0+geom_point()+geom_smooth(method="lm")
g1<-g1+labs(x="Turnout after holding for age",y="Percent voting leave")
g1
Now there is a very clear trend. The leave vote declines in authorities where there was a higher turnout than expected given the age structure of the population. Higher turnout does in fact favour the remain vote, but this effect is obscured by the fact that turnout tends to be naturally higher in the areas that favoured leave due to the underlying relationship between the age of voters and their voting preference. These two factors cancel each other out which explains the ambiguous effect for turnout alone. Once the age effect has been factored into the analysis then higher turnout clearly does favour the remain vote.
g1+facet_wrap("Region")
The effect is even clearer when broken down by region, as regional differences in turnout are also accounted for.
The implications of this pattern are very important. A naive analysis of the raw vote that did not factor in the demographic effect might conclude that turnout had little effect on the result and that an increase in turnout would simply have increased the numbers of people voting on both sides. However this is not the case. Most of the authorities with a high percentage of leave votes also had a high proportion of older voters, and a high proportion of these voters already turned out to vote. So any overall increase in turnout would also have increased the turnout after holding for demography, which was the variable associated with votes for remain. This argument may be rather technical and hard to accept for non-statisticians. However holding for confounding variables is a standard statistical technique that is very robust. There is no doubt that the result is valid.
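The cancelling-out described above can be reproduced with a small simulation. The sketch below uses made-up coefficients and noise levels, chosen so that the age effect and the turnout effect roughly offset each other; it is an illustration of the statistical mechanism only and is not fitted to the referendum data.

```r
set.seed(1)
n   <- 300
age <- runif(n, 40, 70)                        # percent of voters aged 40 or more
turnout <- 40 + 0.5 * age + rnorm(n, sd = 3)   # older areas turn out more
# Leave share rises with age but falls with turnout once age is fixed.
leave <- 75 + 0.74 * age - 1 * turnout + rnorm(n, sd = 5)

raw      <- coef(lm(leave ~ turnout))["turnout"]        # naive slope
adjusted <- coef(lm(leave ~ turnout + age))["turnout"]  # slope holding age fixed

# Residual approach used in the text: the turnout left over after accounting for age.
turnout_res <- residuals(lm(turnout ~ age))
via_resid   <- coef(lm(leave ~ turnout_res))["turnout_res"]

round(c(raw = unname(raw), adjusted = unname(adjusted),
        via_resid = unname(via_resid)), 3)
```

The naive slope comes out close to zero while the adjusted slope recovers the negative effect built into the simulation. The slope from the residual approach is identical to the turnout coefficient in the two-variable model, which is why the two methods lead to the same conclusion.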
A formal way to analyse this is to build an additive linear model in which the percent voting leave is predicted by turnout and percentage of voters over 40.
mod<-lm(Pct_Leave~Pct_Turnout+POver40,data=Eng)
summary(mod)
##
## Call:
## lm(formula = Pct_Leave ~ Pct_Turnout + POver40, data = Eng)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -22.7915  -5.0374  -0.1608   4.9680  24.2007
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.17709    6.28839   11.32   <2e-16 ***
## Pct_Turnout -1.18444    0.11238  -10.54   <2e-16 ***
## POver40      1.08213    0.06998   15.46   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.546 on 345 degrees of freedom
## Multiple R-squared: 0.4094, Adjusted R-squared: 0.406
## F-statistic: 119.6 on 2 and 345 DF, p-value: < 2.2e-16
The coefficient for turnout (-1.18) is negative and highly significant, thus confirming formally that higher turnout, after taking demography into account, is associated with a lower percentage voting leave. This calls into question claims that there was an “overwhelming mandate” in favour of leaving. If more people had turned out to vote then the result would almost certainly have been to remain.
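To make the size of the turnout coefficient concrete, the fitted equation can be evaluated by hand. The sketch below copies the estimates from the `summary(mod)` output above; the example authority (60% of voters aged 40 or more, turnout of 72% or 75%) is hypothetical, and the result describes an association between authorities rather than the behaviour of individual voters.

```r
# Coefficients copied from the summary(mod) output above.
b0        <- 71.17709
b_turnout <- -1.18444
b_age     <- 1.08213

# Predicted percent voting leave from the additive model.
predict_leave <- function(turnout, over40) {
  b0 + b_turnout * turnout + b_age * over40
}

# Hypothetical authority: same age structure, two different turnouts.
p72 <- predict_leave(72, 60)
p75 <- predict_leave(75, 60)
c(turnout_72 = p72, turnout_75 = p75, difference = p75 - p72)
```

Under the model a three-point rise in turnout at a fixed age structure lowers the predicted leave share by about 3.6 points (3 × 1.18444).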
Turnout after holding for demography
https://dgolicher.carto.com/viz/6268c778-49e1-11e6-88a6-0e233c30368f/public_map
The code below is included for my own reference in order to document the steps I used. It uses a local PostGIS database to link the tables to a shapefile. The resulting shapefile for England was then uploaded to CartoDB to produce the maps that are linked to this document.
library(RODBC)
con<-odbcConnect("elections")
odbcQuery(con,"drop table england_data")
## [1] 1
sqlSave(con,Eng,"england_data",safer=FALSE)
query<-"drop table referendum;
create table referendum as
select e.*,geom from
england_data e,
local_authorities l
where e.area_code = l.code"
odbcQuery(con,query)
## [1] 1
It is unfortunately not possible to directly compare the results of the referendum with the votes from the last election as the boundaries of the reporting districts are different. However, a rough and ready solution to this is to take the mean of the percentage votes cast for each party associated with the centroids of the constituencies falling within the boundaries. This can fail in a few cases when no centroid falls in the area. In others several constituencies with different voting patterns may be combined, so there are some errors. However as a guide to overall patterns it is acceptable.
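The averaging step can be sketched in base R once each constituency centroid has been assigned to a local authority. The spatial assignment itself is assumed to have been done elsewhere (for example in PostGIS), and the codes, column names and percentages below are invented for illustration.

```r
# Assumed output of the spatial step: each constituency centroid already matched
# to the local authority it falls in (NA where none matched). Values are invented.
cons <- data.frame(
  constituency = c("C1", "C2", "C3", "C4"),
  code         = c("LA1", "LA1", "LA2", NA),
  ukip         = c(12, 18, 25, 9),
  labour       = c(40, 30, 20, 45)
)
authorities <- data.frame(code = c("LA1", "LA2", "LA3"))

# Unweighted mean of the party percentages per local authority.
# Rows with a missing code are dropped by aggregate's default NA handling.
means <- aggregate(cbind(ukip, labour) ~ code, data = cons, FUN = mean)

# Keep every authority so that those with no centroid show up as NA.
ge_sketch <- merge(authorities, means, by = "code", all.x = TRUE)
ge_sketch
```

Authorities with no matching centroid appear with missing values, which makes the failure case mentioned above easy to spot. Because the mean is unweighted, a small constituency counts as much as a large one, which is one source of the errors noted in the text.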
Local authority and constituency boundaries
ge<-read.csv("general_election.csv")
ge<-merge(ge,dd,by.x="code",by.y="Area_Code")
We would expect a very strong relationship with the percent voting for UKIP at the last election and the percent voting leave.
g0<-ggplot(ge,aes(x=ukip,y=leave))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="loess")
g1<-g1+labs(x="Percent voting for UKIP",y="Percent voting leave")
g1
The clear result also holds at the regional level. However notice that a large number of non UKIP voters contributed to the overall leave vote. In many cases this was over 40% of the total vote.
g1+facet_wrap("region")
In order to look at the relationship between the leave vote and the proportion voting Labour we should first subtract the UKIP vote under the assumption that almost all UKIP voters went for leave.
g0<-ggplot(ge,aes(x=labour,y=leave-ukip))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting labour at last election",y="Percent non UKIP leave")
g1
The relationship is apparently quite flat, which suggests that the non UKIP leave vote was drawn from fairly equal numbers of Labour and Conservative voters. However this does not take into account the turnout effect that we have already determined to be an important factor.
g1+facet_wrap("region")
Labour voting areas tend to have low turnouts at general elections, in part because many Labour held seats are very safe and voters traditionally have not been motivated to vote. They also may have more young voters, particularly in London.
g0<-ggplot(ge,aes(x=labour,y=Pct_Turnout))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting Labour at the last election",y="Percent turnout")
g1
g1+facet_wrap("region")
Conservative voters are generally older and thus more likely to turn out to vote.
g0<-ggplot(ge,aes(x=con,y=Pct_Turnout))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent voting Conservative at the last election",y="Percent turnout")
g1
g1+facet_wrap("region")
If it were true that experience of high levels of migration leads voters to turn to UKIP we would expect a positive relationship between the percentage of migrants and the UKIP vote.
g0<-ggplot(ge,aes(x=permig,y=ukip))
g1<-g0+geom_point()
g1<-g1+geom_smooth(method="lm")
g1<-g1+labs(x="Percent recent migrants",y="Percent voting UKIP at last election")
g1<-g1+scale_x_log10()
g1
The converse is true. In general terms areas with high levels of migration have lower percentages of UKIP voters. Most UKIP supporters live in areas with low migration suggesting that their worries about migration are not based on personal experience of migration.
g1+facet_wrap("region")
The analysis provides clear evidence that the remain campaign led by George Osborne targeted the wrong section of the electorate. The campaign was aimed at the voters who decide marginal seats at a general election rather than at the country as a whole. The leave campaign won because younger voters did not turn out in large enough numbers, especially in Labour-held areas. The widely held belief that high levels of immigration increase the UKIP and leave vote among the resident population is not supported by the results, with the notable exception of a few areas in the East Midlands. A positive campaign targeting young voters in areas with high migration would almost certainly have led to an overall vote to remain. A generalised fear of the effects of migration, rather than direct experience of high levels of migration, was responsible for the emphasis on migration as a factor in the vote.