OK, so we have the results of the referendum:
results.summary
## registered.count remain.count leave.count turnout.count turnout.prop
## 1 46500001 16141241 17410742 33551983 0.7215
## remain.prop leave.prop
## 1 0.4811 0.5189
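As a quick sanity check, the proportions in this summary can be re-derived from the three counts. This is only a sketch: the variable names mirror the column names in the output, and it assumes turnout is simply remain plus leave votes.

```r
# Counts copied from the results.summary output above.
registered.count <- 46500001
remain.count     <- 16141241
leave.count      <- 17410742

# Assumption: turnout is the sum of remain and leave votes.
turnout.count <- remain.count + leave.count

turnout.prop <- turnout.count / registered.count
remain.prop  <- remain.count / turnout.count
leave.prop   <- leave.count / turnout.count

round(c(turnout = turnout.prop, remain = remain.prop, leave = leave.prop), 4)
```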
Then we have the following data to try to recreate the age-disaggregated voting behaviour:
## age.group count registered.count registered.prop turnout.prop
## 1 18-24 5878472 4220131 0.7179 0.36
## 2 25-34 8822757 6511195 0.7380 0.58
## 3 35-44 8378302 7121557 0.8500 0.72
## 4 45-54 9196082 8340846 0.9070 0.75
## 5 55-64 7452381 6923262 0.9290 0.81
## 6 65+ 11611167 11077053 0.9540 0.83
## remain.prop
## 1 0.73
## 2 0.62
## 3 0.52
## 4 0.44
## 5 0.43
## 6 0.40
So first we see how well this data can recreate the actual results.
## data.source registered.count turnout.count turnout.prop remain.count
## 1 actual 46500001 33551983 0.7215 16141241
## 2 recreation 44194044 31480692 0.7123 14958220
## leave.count remain.prop leave.prop
## 1 17410742 0.4811 0.5189
## 2 16522472 0.4752 0.5248
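The recreation row can be rebuilt from the age table like this. It is a minimal sketch, not necessarily the code that produced the output: the data frame name `age` is made up here, and it assumes votes per group are registered count times turnout proportion, split by the remain proportion.

```r
# Age-group figures copied from the table above.
age <- data.frame(
  age.group        = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  registered.count = c(4220131, 6511195, 7121557, 8340846, 6923262, 11077053),
  turnout.prop     = c(0.36, 0.58, 0.72, 0.75, 0.81, 0.83),
  remain.prop      = c(0.73, 0.62, 0.52, 0.44, 0.43, 0.40)
)

# Votes per age group (assumed: registered * turnout, then split by remain share).
age$turnout.count <- age$registered.count * age$turnout.prop
age$remain.count  <- age$turnout.count * age$remain.prop
age$leave.count   <- age$turnout.count - age$remain.count

# Aggregate over age groups to get the national recreation.
recreation <- data.frame(
  data.source      = "recreation",
  registered.count = sum(age$registered.count),
  turnout.count    = sum(age$turnout.count),
  remain.count     = sum(age$remain.count),
  leave.count      = sum(age$leave.count)
)
recreation$turnout.prop <- recreation$turnout.count / recreation$registered.count
recreation$remain.prop  <- recreation$remain.count / recreation$turnout.count
recreation$leave.prop   <- recreation$leave.count / recreation$turnout.count

recreation
```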
OK, so about 2.3 million voters are missing: people who have registered since 2014. But the turnout and results are very close. The recreation slightly underestimates turnout (by about 0.9 percentage points) and slightly overestimates leave (by about 0.6 points).
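The gaps between the two rows can be read off directly. A small sketch, using the rounded figures from the table; the vector names `actual`, `recreation` and `gap` are just for illustration, and a negative gap means the recreation falls short of the actual figure.

```r
# Figures copied from the comparison table above.
actual     <- c(registered.count = 46500001, turnout.prop = 0.7215, leave.prop = 0.5189)
recreation <- c(registered.count = 44194044, turnout.prop = 0.7123, leave.prop = 0.5248)

gap <- recreation - actual

# Difference in registered voters (negative: missing from the recreation).
gap["registered.count"]

# Differences in turnout and leave share, in percentage points.
gap.points <- round(100 * gap[c("turnout.prop", "leave.prop")], 2)
gap.points
```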
So we need to add new registrations, but there is no data for that. There are only nice open-access, age-disaggregated numbers of applications to register, not of successful registrations. Apparently a lot are ‘just in case’ applications from people who are already on the register. The level of redundancy presumably varies by age, but how is not clear. There have been 18 million applications since 2014, so clearly most of them cannot have been new registrations.
But we’ll take the most recent 2.3 million applications and add them to the register, to get closer to the true number of voters.
## data.source registered.count turnout.count turnout.prop remain.count
## 1 actual 46500001 33551983 0.7215 16141241
## 2 recreation2 46481118 32841849 0.7066 15717344
## leave.count remain.prop leave.prop
## 1 17410742 0.4811 0.5189
## 2 17124505 0.4786 0.5214
Now it takes only a slight manual readjustment of the turnout estimates to match the actual results with the model:
## data.source registered.count turnout.count turnout.prop remain.count
## 1 actual 46500001 33551983 0.7215 16141241
## 2 recreation3 46481118 33535065 0.7215 16133448
## leave.count remain.prop leave.prop
## 1 17410742 0.4811 0.5189
## 2 17401617 0.4811 0.5189
So the following is then a realistic model of the age-disaggregated voting in the referendum, based on the best available data:
## age.group registered.prop turnout.prop remain.prop
## 1 18-24 0.7179 0.400 0.73
## 2 25-34 0.7380 0.620 0.62
## 3 35-44 0.8500 0.725 0.52
## 4 45-54 0.9070 0.770 0.44
## 5 55-64 0.9290 0.810 0.43
## 6 65+ 0.9540 0.830 0.40
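To see how large the manual readjustment was, the turnout column here can be compared with the one in the original age table. A short sketch; the data frame and its column names `original`, `adjusted` and `change.points` are invented for this comparison.

```r
# Turnout proportions copied from the first age table and from the final model.
turnout <- data.frame(
  age.group = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  original  = c(0.36, 0.58, 0.72, 0.75, 0.81, 0.83),
  adjusted  = c(0.400, 0.620, 0.725, 0.770, 0.810, 0.830)
)

# Size of the manual adjustment in percentage points.
turnout$change.points <- round(100 * (turnout$adjusted - turnout$original), 1)

turnout
```

The adjustments are concentrated in the younger groups; the two oldest groups are unchanged.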
OK, now let’s add the life expectancy