Wong’s Decomposition of Dissimilarity

Wong (2003) outlines a mathematical method for assessing how much spatial units at different scales contribute to the segregation value calculated at the smallest scale. His method is based on the index of dissimilarity, possibly the most popular segregation measure in the social sciences. Dissimilarity (\(D\)) measures how evenly two population groups are distributed across spatial units relative to their overall proportions of the total population.

Consider a city with two nested spatial units: a “region” unit (e.g. city wards), and smaller “local” units (e.g. enumeration districts or EDs). Larger spatial units are more likely to include sizable numbers of both population groups, resulting in a lower \(D\) value when calculating dissimilarity based on ward counts. Smaller EDs tend to be dominated by a single group, resulting in a larger \(D\) value. The difference between these scores can be interpreted as the additional contribution of EDs to segregation on top of segregation due to wards (Wong 2003, p. 183). This is the property of \(D\) that allows one to assess scale-specific segregation by simply taking differences between the city-level results for each unit.
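A toy example makes the scale effect concrete. The sketch below uses a made-up city of two wards with two EDs each and applies the standard formula for \(D\) at both scales. The `toy` table and the `d_index()` helper are illustrative assumptions, not part of the Providence data used later.

```r
# hypothetical city: 2 wards, each split into 2 EDs
toy <- data.frame(
  ward = c(1, 1, 2, 2),
  ed   = c(1, 2, 3, 4),
  w    = c(90, 30, 60, 20),
  b    = c(10, 70, 40, 80)
)

# dissimilarity from two vectors of unit-level counts
d_index <- function(w, b) sum(abs(w / sum(w) - b / sum(b))) / 2

# D using EDs as the unit
d_ed <- d_index(toy$w, toy$b)

# D using wards as the unit: add up ED counts within each ward first
ward_counts <- aggregate(cbind(w, b) ~ ward, data = toy, FUN = sum)
d_ward <- d_index(ward_counts$w, ward_counts$b)

c(ED = d_ed, Ward = d_ward, Difference = d_ed - d_ward)
```

In this made-up city the ward-level value (0.2) is lower than the ED-level value (0.5) because each ward combines a whiter ED with a blacker one. The difference of 0.3 is what the EDs add on top of the wards.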

Under Wong’s method of decomposition, each ward in a city contributes to the overall value of \(D_{ED}\) in two ways: through the overall counts of each population group within the larger “region” unit (\(RD_{j}\)), and through how evenly those populations are distributed across the smaller “local” units within it (\(LD_{j}\)). The values of \(RD_{j}\) for each ward in a city add up to the value of \(D_{Ward}\), and the values of \(LD_{j}\) for each ward add up to the additional contribution of EDs within wards, \(D_{ED} - D_{Ward}\) (Wong 2003, pp. 183-4).

Decomposing D for Providence

We can explore Wong’s formulas for calculating \(RD_{j}\) and \(LD_{j}\), and their relationship to \(D_{ED}\), using real data, in this case Providence in 1940. First, I load the R packages I need and read in a file of black and white population counts by ED.

library(readr) #for read_csv()
library(dplyr) #manipulating data
library(purrr) #vectorizing functions
library(here) #file management

# read data
pvd <- read_csv(here("data", "pvd_ward_ed_counts.csv"))

# what does data look like?
pvd
## # A tibble: 269 x 4
##     ward    ed     w     b
##    <dbl> <dbl> <dbl> <dbl>
##  1     1     1   239    23
##  2     1     2   416   434
##  3     1     3  1201    44
##  4     1     4  1541   250
##  5     1     5  1230   122
##  6     1     6  1000   347
##  7     1     7  1300    98
##  8     1     8   392    58
##  9     1     9   991    85
## 10     1    10   948    85
## # … with 259 more rows

I built two dedicated functions to compute the spatial components of \(D\), \(RD_{j}\) and \(LD_{j}\). The formula for computing \(RD_{j}\) for a single “region” unit is found in equation 10 (Wong 2003, p. 183): \[RD_{j} = |\frac{\sum_{i \in j} w_{ij}}{W} - \frac{\sum_{i \in j} b_{ij}}{B}|\] where \(w_{ij}\) and \(b_{ij}\) are the white and black population counts for each ED \(i\) in ward \(j\), and \(W\) and \(B\) are the total white and black populations in the whole city.

# ward_rows is the population counts for EDs in a single ward
calc_rd <- function(ward_rows, W, B){
  # result divided by 2 to scale with D index
  abs((sum(ward_rows$w)/W) - (sum(ward_rows$b)/B))/2
}

Then, we can set up a function to calculate \(LD_{j}\) for the same region unit following equation 12 (Wong 2003, p. 184): \[LD_{j} = \sum_{i \in j} |\frac{w_{ij}}{W} - \frac{b_{ij}}{B}|-RD_{j}\]

# ward_rows is the population counts for EDs in a single ward
calc_ld <- function(ward_rows, W, B){
  RD <- calc_rd(ward_rows, W = W, B = B)
  LHS <- sum(abs(ward_rows$w/W - ward_rows$b/B))/2
  LHS - RD
}
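The two formulas differ only in where the absolute value is taken: \(RD_{j}\) takes it after summing over a ward's EDs, while the first term of \(LD_{j}\) takes it ED by ED. A minimal sketch with one made-up ward shows the effect. The counts and city totals are assumptions chosen for easy arithmetic, and the division by 2 follows the scaling used in the functions above.

```r
# hypothetical ward with two EDs, in a city of 200 white and 200 black residents
w <- c(90, 30)
b <- c(10, 70)
W <- 200
B <- 200

# signed gap between each ED's share of the white and black populations
gaps <- w / W - b / B          # 0.40 and -0.20

# regional component: gaps are netted out within the ward before abs()
rd <- abs(sum(gaps)) / 2       # 0.20 / 2 = 0.10

# local component: abs() is applied per ED, then the regional part is removed
ld <- sum(abs(gaps)) / 2 - rd  # 0.30 - 0.10 = 0.20

c(RD = rd, LD = ld)
```

Because the two EDs lean in opposite directions, they partly cancel at the ward scale, and that cancelled portion is what \(LD_{j}\) captures. If every ED in a ward leaned the same way, \(LD_{j}\) would be zero.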

We can also write a function to calculate dissimilarity for the city as a whole using either spatial unit, ward or ED, found in equation 1 (Wong 2003, p. 181): \[D=\frac{1}{2}\sum_{i}|\frac{w_{i}}{W} - \frac{b_{i}}{B}|\]

# rows is the full table of ED counts; unit is the unquoted column (ward or ed) to group by
diss <- function(rows, unit){
  # capture the unquoted column name so it can be passed on to group_by()
  unit <- enquo(unit)
  
  # get group counts for chosen unit
  counts <- rows %>% 
    group_by(!!unit) %>%
    summarise(
      w = sum(w),
      b = sum(b)
    )
  
  # calculate D
  sum(abs(counts$w/sum(counts$w)-counts$b/sum(counts$b)))/2
}
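For readers who prefer to avoid tidy evaluation, the same calculation can be sketched in base R by passing the unit as a column name string. `diss_base()` and the `toy` table are hypothetical names introduced for illustration. The function assumes a table with `w` and `b` columns like the Providence data.

```r
# base R variant: `unit` is a column name given as a string
diss_base <- function(rows, unit) {
  # rowsum() totals the w and b columns within each value of the unit column
  counts <- rowsum(as.data.frame(rows[, c("w", "b")]), group = rows[[unit]])
  sum(abs(counts$w / sum(counts$w) - counts$b / sum(counts$b))) / 2
}

# hypothetical city: 2 wards, each split into 2 EDs
toy <- data.frame(
  ward = c(1, 1, 2, 2),
  ed   = c(1, 2, 3, 4),
  w    = c(90, 30, 60, 20),
  b    = c(10, 70, 40, 80)
)

diss_base(toy, "ward")
diss_base(toy, "ed")
```

Taking a string keeps the function free of package dependencies, at the cost of the unquoted `diss(pvd, ward)` call style used in the rest of this post.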

We can now compute \(RD_{j}\) and \(LD_{j}\) for each ward. To do this, I split the Providence data into a list containing separate tables for each ward. I can then apply the \(RD_{j}\) and \(LD_{j}\) functions to these tables, creating a vector containing the values for each ward.

wards <- pvd %>% 
  group_by(ward) %>% 
  group_split()

rd_values <- wards %>% 
  # apply RD function
  map_dbl(
    calc_rd,
    # specify total counts for Providence
    W = sum(pvd$w),
    B = sum(pvd$b)
  )

ld_values <- wards %>% 
  # apply LD function
  map_dbl(
    calc_ld,
    # specify total counts for Providence
    W = sum(pvd$w),
    B = sum(pvd$b)
  )

Comparing D to its Components

First, let’s add up the \(RD_{j}\) values.

sum(rd_values)
## [1] 0.5600013

Now, we can compare this to the value of \(D_{Ward}\).

diss(pvd, ward)
## [1] 0.5600013

We get the exact same result! Next, we can confirm that the \(LD_{j}\) values for each ward add up to the remaining contribution of EDs to dissimilarity at the ED level.

sum(ld_values)
## [1] 0.2459095
diss(pvd, ed) - diss(pvd, ward)
## [1] 0.2459095

Again, it’s the exact same result! This confirms that taking the difference of dissimilarity calculated at different scales is equivalent to adding up decomposed values based on spatial units. Finally, we can see that when we add up all the spatial components, we get the value of \(D_{ED}\).

sum(rd_values) + sum(ld_values)
## [1] 0.8059108
diss(pvd, ed)
## [1] 0.8059108
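The agreement is not a coincidence. A short derivation shows why the components must add up. It assumes the \(\frac{1}{2}\) scaling applied in the functions above and writes \(w_{j} = \sum_{i \in j} w_{ij}\) and \(b_{j} = \sum_{i \in j} b_{ij}\) for the ward totals, a notation introduced here for convenience:

```latex
\begin{aligned}
\sum_{j} RD_{j} &= \frac{1}{2}\sum_{j}\left|\frac{w_{j}}{W} - \frac{b_{j}}{B}\right| = D_{Ward} \\
\sum_{j} LD_{j} &= \frac{1}{2}\sum_{j}\sum_{i \in j}\left|\frac{w_{ij}}{W} - \frac{b_{ij}}{B}\right| - \sum_{j} RD_{j} = D_{ED} - D_{Ward} \\
\sum_{j}\left(RD_{j} + LD_{j}\right) &= D_{ED}
\end{aligned}
```

By the triangle inequality each \(LD_{j}\) is non-negative, so \(D_{ED}\) can never be smaller than \(D_{Ward}\).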

This allows us to think about how much each scale contributes to the total segregation measured at the finest scale across Providence.

c(
  "Ward" = sum(rd_values) / diss(pvd, ed),
  "ED" = sum(ld_values) / diss(pvd, ed)
)
##      Ward        ED 
## 0.6948676 0.3051324
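The same logic extends to individual wards, since each ward's \(RD_{j}\) and \(LD_{j}\) are themselves additive pieces of \(D_{ED}\). The sketch below uses a made-up four-ED city rather than the Providence file, and base R's `split()` and `vapply()` in place of `group_split()` and `map_dbl()`. The `toy` table and the helper names `ward_rd()` and `ward_ld()` are assumptions for illustration.

```r
# hypothetical city: 2 wards, each split into 2 EDs
toy <- data.frame(
  ward = c(1, 1, 2, 2),
  ed   = c(1, 2, 3, 4),
  w    = c(90, 30, 60, 20),
  b    = c(10, 70, 40, 80)
)
W <- sum(toy$w)
B <- sum(toy$b)

# regional and local components for one ward's rows, scaled by 1/2 like D
ward_rd <- function(rows) abs(sum(rows$w) / W - sum(rows$b) / B) / 2
ward_ld <- function(rows) sum(abs(rows$w / W - rows$b / B)) / 2 - ward_rd(rows)

# one table per ward, then one RD and one LD value per ward
by_ward <- split(toy, toy$ward)
rd <- vapply(by_ward, ward_rd, numeric(1))
ld <- vapply(by_ward, ward_ld, numeric(1))

# the components add up to D at the ED scale
d_ed <- sum(rd) + sum(ld)

# share of D_ED attributable to each ward (columns), split by scale (rows)
shares <- rbind(Ward = rd, ED = ld) / d_ed
shares
colSums(shares)
```

In this toy city, ward 1 accounts for 60% of \(D_{ED}\) and ward 2 for 40%, with most of ward 1's share coming from unevenness among its EDs rather than from the ward's overall composition.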

Bibliography

Wong, DWS. 2003. “Spatial Decomposition of Segregation Indices: A Framework Toward Measuring Segregation at Multiple Levels.” Geographical Analysis 35(3): 179-94.