Richard Harris, School of Geographical Sciences, University of Bristol
September 2016
Multilevel modelling has been promoted as a way to disentangle the scales at which the processes that separate people operate
Its variance estimates do not translate directly into a readily interpretable index
However, it is straightforward to link the model to classic approaches.
The Index of Dissimilarity (ID)
\[ {ID}_{XY}={0.5}\times\sum{\big|Y-X\big|} \]
where \( Y = \frac{y}{\sum{y}} \) (an observed value) and \( X = \frac{x}{\sum{x}} \) (an expected value)
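A minimal base-R sketch of the formula above, assuming `y` and `x` are vectors of counts for the same set of areas. The function name `id_index` is illustrative and not part of the original code:

```r
# Index of Dissimilarity from two vectors of counts over the same areas.
# y: counts for the group of interest; x: counts for the comparison population.
id_index <- function(y, x) {
  Y <- y / sum(y)  # observed share of the group in each area
  X <- x / sum(x)  # expected share, from the comparison population
  0.5 * sum(abs(Y - X))
}

# Toy example: the group is concentrated in areas 1 and 3
id_index(y = c(10, 0, 10), x = c(5, 10, 5))  # returns 0.5
```

An ID of 0 means the two distributions are identical; an ID of 1 means they do not overlap at all.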
If \( Y = \beta_{0} + \beta_{1}X + \epsilon \)
then, setting \( \beta_{0} = 0 \) and \( \beta_{1} = 1 \),
\[ Y - X = \epsilon \]
\[ {ID}_{XY}={0.5}\times\sum{\big|Y-X\big|} \]
\[ {ID}_{XY}={0.5}\times\sum{\big|\epsilon\big|} \]
If \( Y = \beta_{0} + \beta_{1}X + \zeta_{i} + \eta_{j} + \theta_{k} + \dots \)
where \( \epsilon = \zeta_{i} + \eta_{j} + \theta_{k} + \dots \)
then, again setting \( \beta_{0} = 0 \) and \( \beta_{1} = 1 \),
\[ {ID}_{XY}={0.5}\times\sum{\big|\zeta_{i} + \eta_{j} + \theta_{k} + \dots\big|} \]
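Because the absolute value is taken of the summed residuals, the ID is not simply the sum of separate per-level contributions: effects of opposite sign at different levels can cancel. A small sketch with made-up residual components (the values and the name `id_from_components` are hypothetical):

```r
# Rows are neighbourhoods; columns are residual components at each level
# (zeta, eta, theta). Values are invented purely for illustration.
components <- cbind(zeta  = c( 0.02, -0.03,  0.01),
                    eta   = c(-0.01,  0.01,  0.00),
                    theta = c( 0.01,  0.00, -0.01))

# ID from the combined residual, as in the formula above
id_from_components <- function(m) 0.5 * sum(abs(rowSums(m)))

id_from_components(components)  # 0.02
# Treating each level separately gives a larger total,
# because the opposite-signed effects no longer cancel
0.5 * sum(abs(components))      # 0.05
```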
head(mydata[,1:5], n=3)
OA ny nx Y X
1 E00000001 2 150 6.792329e-07 3.547841e-06
2 E00000003 20 177 6.792329e-06 4.186452e-06
3 E00000005 10 254 3.396165e-06 6.007677e-06
with(mydata, 0.5*sum(abs(Y - X)))
[1] 0.7112702
ols <- lm(Y ~ 0, offset=X, data=mydata)
r <- residuals(ols)
0.5*sum(abs(r))
[1] 0.7112702
library(lme4)
mlm <- lmer(Y ~ 0 + (1|LSOA11CD) + (1|MSOA11CD) + (1|LAD11CD), offset=X, data=mydata)
# results: the OA-level residual plus the LSOA, MSOA and LAD effects for each OA
head(results, n=3)
       Residual      LSOA11CD     MSOA11CD      LAD11CD
1 -2.774405e-06 -2.438521e-06 1.455065e-06 8.892527e-07
2  2.700080e-06 -2.438521e-06 1.455065e-06 8.892527e-07
3 -2.517309e-06 -2.438521e-06 1.455065e-06 8.892527e-07
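The slide does not show how `results` is built. One way to assemble it, assuming `residuals(mlm)` returns the OA-level residuals and `ranef(mlm)` the estimated effects for each grouping factor, is the hypothetical helper below (`build_results` is not part of lme4):

```r
# Hypothetical helper: one row per OA and one column per level, so that
# rowSums() recovers Y - X. `resid` is the vector of level-1 residuals and
# `re` is a list of data frames shaped like the output of lme4::ranef():
# group codes as row names and an "(Intercept)" column.
build_results <- function(resid, re, data, levels) {
  out <- data.frame(Residual = unname(resid))
  for (lv in levels) {
    # look up each OA's group effect using its group code
    out[[lv]] <- re[[lv]][as.character(data[[lv]]), "(Intercept)"]
  }
  out
}

# Usage with the model fitted above (requires lme4):
# results <- build_results(residuals(mlm), ranef(mlm), mydata,
#                          c("LSOA11CD", "MSOA11CD", "LAD11CD"))
```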
rs <- rowSums(results)
0.5 * sum(abs(rs))
[1] 0.7112702
Use the variance partition coefficient (VPC) and 'holdback scores' to gauge the degree of segregation at each level of the hierarchy. Holdback scores measure the change in the ID if the residual differences at a given level were ignored (set to zero).
varshare(mlm)
Residual LSOA11CD MSOA11CD LAD11CD
16.45926 10.99240 45.02912 27.51923
holdback(results)
[1] -5.2 -2.4 -10.8 -29.0
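The definitions of `varshare()` and `holdback()` are not given here. The sketch below shows one plausible reading, assuming the variance shares are percentages of the summed variance estimates and the holdback score is the percentage change in the ID when one level's column of `results` is zeroed. Both functions are hypothetical stand-ins, not the author's code:

```r
# Hypothetical stand-in for varshare(): share (%) of the total variance at
# each level, from a named vector of variance estimates. With lme4 these
# could be taken from
#   vc <- as.data.frame(VarCorr(mlm)); setNames(vc$vcov, vc$grp)
vpc_share <- function(v) 100 * v / sum(v)

# Hypothetical stand-in for holdback(): % change in the ID when the
# residual differences at one level are set to zero, keeping the others.
holdback_sketch <- function(results) {
  id_all <- 0.5 * sum(abs(rowSums(results)))
  sapply(names(results), function(lv) {
    held <- results
    held[[lv]] <- 0                        # ignore this level
    id_held <- 0.5 * sum(abs(rowSums(held)))
    round(100 * (id_held - id_all) / id_all, 1)
  })
}
```

A negative score means the ID falls when that level is ignored, i.e. that level contributes to the measured segregation.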
Neighbourhood-level differences are treated as a series of localised disturbances from a general trend, \( |\epsilon_{(u,v)}| \), where \( (u, v) \) denotes a location.
\[ {ID}_{XY} \propto\sum\big|\epsilon_{(u,v)}\big| \]
Summing over a sub-region of the map, where \( n \) is the number of neighbourhoods in the sub-region and \( N \) is the number across the whole study space, gives
\[ share = \frac{\sum_{}^{n} |\epsilon_{(u,v)}|}{\sum_{}^{N} |\epsilon_{(u,v)}|} \]
Some regions contain more neighbourhoods than others. To allow for this,
\[ impact = \frac{\sum_{}^{n} |\epsilon_{(u,v)}|}{n\overline{|\epsilon|}} \]
Or, scaled
\[ z \approx \frac{\sum_{}^{n} |\epsilon|}{n({se}_\epsilon)} \]
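A minimal sketch of the share and impact measures, assuming `eps` holds the neighbourhood residuals, `region` gives each neighbourhood's sub-region, and \( \overline{|\epsilon|} \) is the mean absolute residual across the whole study space. The function name is hypothetical, and the z score is left out because \( {se}_\epsilon \) is not defined on this slide:

```r
# Share and impact of each sub-region (hypothetical helper).
# eps: neighbourhood residuals; region: sub-region label per neighbourhood.
region_impact <- function(eps, region) {
  a <- abs(eps)
  total <- tapply(a, region, sum)     # sum of |eps| within each sub-region
  n <- tapply(a, region, length)      # neighbourhoods per sub-region
  data.frame(region = names(total),
             share  = as.vector(total / sum(a)),          # over all N
             impact = as.vector(total / (n * mean(a))),   # 1 = average
             row.names = NULL)
}

region_impact(eps = c(0.01, -0.01, 0.02, -0.02),
              region = c("A", "A", "B", "B"))
```

Here sub-region B holds two thirds of the total and has an impact of about 1.33, meaning its neighbourhoods deviate more than the average neighbourhood.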
Census population (England)
ID = 0.686
E(randomisation) = 0.041
holdback and % variance:
GORs: -18.5%; 10.4%
LAs: -11.0%; 24.1%
MSOAs: -14.7%; 48.7%
LSOAs: -5.1%; 16.8%
'Significant' places: Leicester, Harrow, Slough, Redbridge, Newham, Tower Hamlets
Schools population (England)
ID = 0.766
E(randomisation) = 0.095
holdback and % variance:
GORs: -10.7%; 1.5%
LAs: -21.5%; 22.4%
MSOAs: -17.7%; 54.4%
LSOAs: -6.3%; 21.88%
'Significant' places: Bradford, Blackburn, Oldham, Leicester, Luton, Slough, Redbridge, Newham, Tower Hamlets
Asian here means the Bangladeshi, Indian and Pakistani groups.
Although the school population seems more segregated than the Census population
The Modifiable Areal Unit Problem (MAUP)
Use gridded data
Please see the pre-publication paper
Or contact rich.harris@bris.ac.uk