A spatially disaggregated, multilevel index of dissimilarity

Richard Harris, School of Geographical Sciences, University of Bristol
September 2016

Measuring segregation using multilevel models

Promoted as a way to disentangle the scales at which the processes that separate people operate

  • e.g. micro effects (small areas) from meso effects (e.g. local authorities) from macro effects (e.g. regions)
  • Based on the idea of measuring spatial heterogeneity as a variance term, where greater variance indicates greater segregation
  • Published examples include Leckie et al. (2012), Leckie & Goldstein (2015) and Manley et al. (2016)

Disadvantage

Variance does not translate directly into a readily interpretable index

  • It is not constrained to lie within a given range
  • It is dependent upon the measurement units (problematic for measures of, e.g. benefit claimants over time)
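To see the unit dependence concretely, here is a minimal sketch with made-up claimant counts (the values and variable names are hypothetical, not taken from the data used in this talk). The same rates expressed as proportions and as percentages give variances that differ by a factor of 10,000, although the underlying pattern is identical.

```r
# Hypothetical counts for five small areas (illustrative values only)
claimants  <- c(12, 30, 7, 45, 21)
population <- c(400, 500, 350, 600, 450)

rate <- claimants / population

# Variance of the same quantity in two different measurement units
var_prop <- var(rate)         # rates as proportions
var_pct  <- var(100 * rate)   # rates as percentages

# The ratio depends only on the change of units, not on the geography
var_pct / var_prop
```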

However, it is straightforward to link to classic approaches.

Classic measure of segregation

The Index of Dissimilarity (ID)

\[ {ID}_{XY}={0.5}\times\sum{\big|Y-X\big|} \]

where \( Y = \frac{y}{\sum{y}} \) (an observed value) and \( X = \frac{x}{\sum{x}} \) (an expected value)
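A small worked example may help fix ideas. The counts below are invented and `index_of_dissimilarity` is a hypothetical helper name; the sketch simply applies the formula above and checks the two extremes, 0 for identical distributions and 1 for complete separation.

```r
# Hypothetical helper applying the ID formula to two vectors of counts
index_of_dissimilarity <- function(y, x) {
  Y <- y / sum(y)   # share of group y found in each area
  X <- x / sum(x)   # share of group x found in each area
  0.5 * sum(abs(Y - X))
}

# Four invented areas: y is unevenly spread, x is spread evenly
id_example <- index_of_dissimilarity(y = c(10, 0, 30, 60), x = c(25, 25, 25, 25))

# Identical distributions (up to scale) give no segregation
id_same <- index_of_dissimilarity(y = c(5, 10, 15), x = c(10, 20, 30))

# Groups that never share an area give complete segregation
id_apart <- index_of_dissimilarity(y = c(10, 0), x = c(0, 10))
```

Here the absolute differences in shares are 0.15, 0.25, 0.05 and 0.35, which sum to 0.8, so `id_example` is 0.4.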

Within a regression framework

If \( Y = \beta_{0} + \beta_{1}X + \epsilon \)

then, setting \( \beta_{0} = 0 \) and \( \beta_{1} = 1 \),

\[ Y - X = \epsilon \]

\[ {ID}_{XY}={0.5}\times\sum{\big|Y-X\big|} \]

\[ {ID}_{XY}={0.5}\times\sum{\big|\epsilon\big|} \]

Can be extended

If \( Y = \beta_{0} + \beta_{1}X + \zeta_{i} + \eta_{j} + \theta_{k} ... \)

where \( \epsilon = \zeta_{i} + \eta_{j} + \theta_{k} ... \)

then

\[ {ID}_{XY}={0.5}\times\sum{\big|\zeta_{i} + \eta_{j} + \theta_{k} ...\big|} \]
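The identity can be checked numerically without fitting a model. The sketch below uses invented counts and simple group means as stand-ins for the level effects (an assumption for illustration only; a multilevel model would estimate shrunken effects instead), and confirms that the components sum back to \( \epsilon \), so the ID is unchanged.

```r
# Invented data: six neighbourhoods nested in three districts and two regions
d <- data.frame(
  district = c("A", "A", "B", "B", "C", "C"),
  region   = c("R1", "R1", "R1", "R1", "R2", "R2"),
  y = c(5, 15, 40, 20, 10, 10),
  x = c(30, 20, 10, 10, 15, 15)
)
d$Y <- d$y / sum(d$y)
d$X <- d$x / sum(d$x)

eps <- d$Y - d$X                      # total difference per neighbourhood

# Stand-in decomposition using group means (not model estimates)
theta <- ave(eps, d$region)           # region-level component
eta   <- ave(eps - theta, d$district) # district-level component, net of region
zeta  <- eps - theta - eta            # what is left at neighbourhood level

id_direct <- 0.5 * sum(abs(eps))
id_parts  <- 0.5 * sum(abs(zeta + eta + theta))
```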

In R (index)

head(mydata[,1:5], n=3)
         OA ny  nx            Y            X
1 E00000001  2 150 6.792329e-07 3.547841e-06
2 E00000003 20 177 6.792329e-06 4.186452e-06
3 E00000005 10 254 3.396165e-06 6.007677e-06
with(mydata, 0.5*sum(abs(Y - X)))
[1] 0.7112702

In R (OLS)

ols <- lm(Y ~ 0, offset=X, data=mydata)
r <- residuals(ols)
0.5*sum(abs(r))
[1] 0.7112702

In R (MLM)

library(lme4)
mlm <- lmer(Y ~ 0 + (1|LSOA11CD) + (1|MSOA11CD) + (1|LAD11CD), offset=X, data=mydata)
head(results, n=3)
                     LSOA11CD     MSOA11CD      LAD11CD
1 -2.774405e-06 -2.438521e-06 1.455065e-06 8.892527e-07
2  2.700080e-06 -2.438521e-06 1.455065e-06 8.892527e-07
3 -2.517309e-06 -2.438521e-06 1.455065e-06 8.892527e-07
rs <- rowSums(results)
0.5 * sum(abs(rs))
[1] 0.7112702
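The construction of `results` is not shown above. One way such a table might be assembled from the fitted model is sketched below; `assemble_results` and the column name `Residual` are hypothetical, and the sketch assumes no rows were dropped for missing values. It relies only on `ranef()` from lme4 and `residuals()`.

```r
# Hypothetical helper: one column for the residual plus one per level,
# each holding the estimated effect for the unit a row belongs to.
assemble_results <- function(model, data, levels) {
  stopifnot(requireNamespace("lme4", quietly = TRUE))
  re  <- lme4::ranef(model)   # list of data frames, one per grouping factor
  out <- data.frame(Residual = unname(residuals(model)))
  for (lev in levels) {
    # Row names of each ranef table are the unit codes, so index by code
    out[[lev]] <- re[[lev]][as.character(data[[lev]]), "(Intercept)"]
  }
  out
}

# Usage, assuming `mlm` and `mydata` as above:
# results <- assemble_results(mlm, mydata, c("LSOA11CD", "MSOA11CD", "LAD11CD"))
```

Because the model has no fixed effects and X enters as an offset, the row sums of such a table equal Y - X, which is why the ID is recovered exactly.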

In R (MLM ctd.)

Use the variance partition coefficient (VPC) & 'holdback scores' to gauge the degree of segregation at each level of the hierarchy. Holdback scores measure the change in the ID if the residual differences at a given level were ignored (set to zero).

varshare(mlm)
Residual LSOA11CD MSOA11CD  LAD11CD 
16.45926 10.99240 45.02912 27.51923 
holdback(results)
[1]  -5.2  -2.4 -10.8 -29.0
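The `holdback()` function itself is not listed here. A minimal sketch of the idea, assuming the score is the percentage change in the ID when one column of the results table is set to zero, might look like this (`holdback_sketch` and the toy values are hypothetical and are not the function used above):

```r
# Hypothetical re-implementation of the holdback idea
holdback_sketch <- function(results) {
  m <- as.matrix(results)
  id_full <- 0.5 * sum(abs(rowSums(m)))
  scores <- vapply(seq_len(ncol(m)), function(j) {
    reduced <- m
    reduced[, j] <- 0                        # ignore differences at this level
    id_reduced <- 0.5 * sum(abs(rowSums(reduced)))
    100 * (id_reduced - id_full) / id_full   # percentage change in the ID
  }, numeric(1))
  names(scores) <- colnames(m)
  scores
}

# Toy table: four neighbourhoods, a residual and one higher level
toy <- data.frame(
  Residual = c(0.02, -0.02, 0.01, -0.01),
  District = c(0.03, 0.03, -0.03, -0.03)
)
toy_scores <- holdback_sketch(toy)
```

In this toy case, dropping the district column halves the ID, whereas dropping the residual column leaves it unchanged.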

Investigating spatial heterogeneity

Neighbourhood level differences treated as a series of localised disturbances from a general trend: \( |\epsilon_{(u,v)}| \) where \( (u, v) \) denotes a location.

\[ {ID}_{XY} \propto\sum\big|\epsilon_{(u,v)}\big| \]

Then, summing over sub-regions of the map, where \( n \) is the number of neighbourhoods in the sub-region and \( N \) is the number across the whole study space,

\[ share = \frac{\sum_{}^{n} |\epsilon_{(u,v)}|}{\sum_{}^{N} |\epsilon_{(u,v)}|} \]
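As a sketch of the share calculation, with invented residuals and sub-region labels (all names and values here are hypothetical):

```r
# Invented neighbourhood residuals and the sub-region each one falls in
eps       <- c(0.04, -0.01, 0.02, -0.03, 0.005, -0.025)
subregion <- c("North", "North", "South", "South", "South", "East")

# Numerator: sum of |eps| within each sub-region
abs_by_region <- tapply(abs(eps), subregion, sum)

# Denominator: sum of |eps| over the whole study space
share <- abs_by_region / sum(abs(eps))
```

The shares sum to one across sub-regions, so each can be read as that sub-region's portion of the overall ID.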

Investigating spatial heterogeneity (ctd.)

Some regions contain more neighbourhoods than others. To allow for this,

\[ impact = \frac{\sum_{}^{n} |\epsilon_{(u,v)}|}{n\overline{|\epsilon|}} \]

Or, scaled

\[ z \approx \frac{\sum_{}^{n} |\epsilon|}{n({se}_\epsilon)} \]
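The impact measure can be sketched in the same way. The values below are invented, and the sketch assumes that \( \overline{|\epsilon|} \) is the mean absolute residual across the whole study space; the scaled version is not shown because it depends on how \( {se}_\epsilon \) is estimated.

```r
# Invented neighbourhood residuals and sub-region labels
eps       <- c(0.04, -0.01, 0.02, -0.03, 0.005, -0.025)
subregion <- c("North", "North", "South", "South", "South", "East")

n         <- tapply(eps, subregion, length)   # neighbourhoods per sub-region
abs_sum   <- tapply(abs(eps), subregion, sum) # sum of |eps| per sub-region
mean_abs  <- mean(abs(eps))                   # assumed: mean over all N

# Observed sum relative to what n 'average' neighbourhoods would contribute
impact <- abs_sum / (n * mean_abs)
```

Values above 1 indicate a sub-region whose neighbourhoods contribute more than average to the ID, values below 1 less than average.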

Example: Asian-White British segregation 2011


Census population (England)

ID = 0.686
E(randomisation) = 0.041

holdback and % variance:
GORs: -18.5%; 10.4%
LAs: -11.0%; 24.1%
MLOAs: -14.7%; 48.7%
LLOAs: -5.1%; 16.8%

'Significant' places: Leicester, Harrow, Slough, Redbridge, Newham, Tower Hamlets


Schools population (England)

ID = 0.766
E(randomisation) = 0.095

holdback and % variance:
GORs: -10.7%; 1.5%
LAs: -21.5%; 22.4%
MLOAs: -17.7%; 54.4%
LLOAs: -6.3%; 21.88%

'Significant' places: Bradford, Blackburn, Oldham, Leicester, Luton, Slough, Redbridge, Newham, Tower Hamlets
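Both sets of results report an E(randomisation) value, the ID that would be expected by chance alone. How it was calculated is not described here; one plausible approach, offered only as an assumption, is to reallocate the members of one group at random in proportion to the other and average the resulting ID. All data and names below are invented.

```r
set.seed(1)

id <- function(y, x) 0.5 * sum(abs(y / sum(y) - x / sum(x)))

# Invented study area: 50 equal-sized areas, group y concentrated in 5 of them
x <- rep(200, 50)
y <- c(rep(100, 5), rep(0, 45))

observed_id <- id(y, x)

# Reallocate the members of y at random, in proportion to x, many times
sims <- replicate(500, {
  y_sim <- as.vector(rmultinom(1, size = sum(y), prob = x / sum(x)))
  id(y_sim, x)
})
expected_id <- mean(sims)
```

The expected value is above zero even under random allocation, which is why it is a useful baseline against which to read the observed ID.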

Observations

Asian here means the Bangladeshi, Indian and Pakistani groups.

Although the school population seems more segregated than the Census population:

  • It has decreased since 2002 (ID = 0.782)
  • Likely to be a demographic effect
  • The White British are older on average and more likely to move out of urban areas
  • The increased difference at the regional scale for the Census population is driven by London: see full paper

Two limitations

The Modifiable Areal Unit Problem (MAUP)

  • The districts and regions especially differ markedly in shape and population size

Model doesn't capture any spatial spillover

  • between e.g. neighbouring places that are contiguous but in different regions

Possible solutions

Use gridded data

Further information

Please see the pre-publication paper: available here

Or contact rich.harris@bris.ac.uk