OK, so we have the results of the referendum:

results.summary
##   registered.count remain.count leave.count turnout.count turnout.prop
## 1         46500001     16141241    17410742      33551983       0.7215
##   remain.prop leave.prop
## 1      0.4811     0.5189

Then we have the following data to try to recreate the age-disaggregated voting behaviour:

##   age.group    count registered.count registered.prop turnout.prop
## 1     18-24  5878472          4220131          0.7179         0.36
## 2     25-34  8822757          6511195          0.7380         0.58
## 3     35-44  8378302          7121557          0.8500         0.72
## 4     45-54  9196082          8340846          0.9070         0.75
## 5     55-64  7452381          6923262          0.9290         0.81
## 6       65+ 11611167         11077053          0.9540         0.83
##   remain.prop
## 1        0.73
## 2        0.62
## 3        0.52
## 4        0.44
## 5        0.43
## 6        0.40

So first we see how well this data can recreate the actual results.

##   data.source registered.count turnout.count turnout.prop remain.count
## 1      actual         46500001      33551983       0.7215     16141241
## 2  recreation         44194044      31480692       0.7123     14958220
##   leave.count remain.prop leave.prop
## 1    17410742      0.4811     0.5189
## 2    16522472      0.4752     0.5248
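The recreation row is just the age-group table aggregated up: registered voters times turnout gives votes per group, and votes times the remain share gives remain votes. A minimal sketch of that calculation, assuming the age table is held in a data frame called `age` (the name and the exact code are my assumptions, only the numbers come from the table above):

```r
# Age-group inputs copied from the table above
age <- data.frame(
  age.group        = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  registered.count = c(4220131, 6511195, 7121557, 8340846, 6923262, 11077053),
  turnout.prop     = c(0.36, 0.58, 0.72, 0.75, 0.81, 0.83),
  remain.prop      = c(0.73, 0.62, 0.52, 0.44, 0.43, 0.40)
)

# Estimated votes cast and remain votes in each age group
votes  <- age$registered.count * age$turnout.prop
remain <- votes * age$remain.prop

# Aggregate to national totals, rounded like the printed table
recreation <- data.frame(
  data.source      = "recreation",
  registered.count = sum(age$registered.count),
  turnout.count    = round(sum(votes)),
  turnout.prop     = round(sum(votes) / sum(age$registered.count), 4),
  remain.count     = round(sum(remain)),
  leave.count      = round(sum(votes) - sum(remain)),
  remain.prop      = round(sum(remain) / sum(votes), 4),
  leave.prop       = round(1 - sum(remain) / sum(votes), 4)
)
recreation
```

Run as is, this should give back the `recreation` row shown above (44194044 registered, 31480692 votes, 14958220 remain).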

OK, so there are about 2.3 million voters missing, registered since 2014. But the turnout and results are very close: the recreation slightly underestimates the turnout (by about 0.9 percentage points) and slightly overestimates leave (by about 0.6 points).

So we need to add new registrations, but there is no data for that: there are only nice open-access, age-disaggregated numbers of applications to register, not of successful registrations. Apparently a lot are ‘just in case’ applications from people who are already on the register, and the level of redundancy presumably varies by age, but how is not clear. There have been 18 million applications since 2014, so clearly most of them cannot have been new registrations.

But we’ll take the most recent 2.3 million applications, treat them as new registrations, and add them to the register to get closer to the true number of voters.

##   data.source registered.count turnout.count turnout.prop remain.count
## 1      actual         46500001      33551983       0.7215     16141241
## 2 recreation2         46481118      32841849       0.7066     15717344
##   leave.count remain.prop leave.prop
## 1    17410742      0.4811     0.5189
## 2    17124505      0.4786     0.5214
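Mechanically, this step adds a count to each age group's `registered.count` and re-aggregates. The sketch below shows the idea only. The helper `recreate()` is hypothetical, it assumes new registrants turn out and vote like the rest of their age group, and the `new.registered` vector is made-up placeholder numbers, since the actual age split of the added registrations is not shown here.

```r
# Age-group inputs copied from the age table earlier in the post
age <- data.frame(
  age.group        = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  registered.count = c(4220131, 6511195, 7121557, 8340846, 6923262, 11077053),
  turnout.prop     = c(0.36, 0.58, 0.72, 0.75, 0.81, 0.83),
  remain.prop      = c(0.73, 0.62, 0.52, 0.44, 0.43, 0.40)
)

# Hypothetical helper: add registrations per age group, then aggregate.
# Assumption: added voters share their age group's turnout and remain rates.
recreate <- function(age, new.registered = 0) {
  registered <- age$registered.count + new.registered
  votes      <- registered * age$turnout.prop
  remain     <- votes * age$remain.prop
  c(registered.count = sum(registered),
    turnout.prop     = sum(votes) / sum(registered),
    remain.prop      = sum(remain) / sum(votes))
}

baseline <- recreate(age)

# Placeholder split skewed towards younger voters (NOT the real numbers)
new.registered <- c(1e6, 1e6, 3e5, 0, 0, 0)
adjusted <- recreate(age, new.registered)

rbind(baseline, adjusted)
```

With any split weighted towards the younger groups, overall turnout goes down and the remain share goes up, which is the direction of the change between the first recreation and `recreation2`.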

Now only a slight manual readjustment of the turnout estimates is needed for the model to match the actual results:

##   data.source registered.count turnout.count turnout.prop remain.count
## 1      actual         46500001      33551983       0.7215     16141241
## 2 recreation3         46481118      33535065       0.7215     16133448
##   leave.count remain.prop leave.prop
## 1    17410742      0.4811     0.5189
## 2    17401617      0.4811     0.5189

So the following is then a realistic model of the age-disaggregated voting in the referendum, based on the best available data:

##   age.group registered.prop turnout.prop remain.prop
## 1     18-24          0.7179        0.400        0.73
## 2     25-34          0.7380        0.620        0.62
## 3     35-44          0.8500        0.725        0.52
## 4     45-54          0.9070        0.770        0.44
## 5     55-64          0.9290        0.810        0.43
## 6       65+          0.9540        0.830        0.40

OK, now let’s add the life expectancy