Wong (2003) outlines a mathematical method for assessing how spatial units at different scales contribute to the value of a segregation index calculated at the smallest scale. His method is based on the index of dissimilarity, possibly the most popular segregation measure in the social sciences. Dissimilarity (\(D\)) measures how evenly two population groups are distributed across spatial units relative to their overall proportions of the total population.
Consider a city with two nested spatial units: larger “region” units (e.g. city wards) and smaller “local” units (e.g. enumeration districts, or EDs). Larger spatial units are more likely to include sizable numbers of both population groups, resulting in a lower \(D\) value when dissimilarity is calculated from ward counts. Smaller EDs tend to be dominated by a single group, resulting in a higher \(D\) value. The difference between these scores can be interpreted as the additional contribution of EDs to segregation on top of segregation due to wards (Wong 2003, p. 183). Because EDs nest within wards, merging them into wards can only cancel out deviations from the city-wide proportions, never add to them, so \(D_{ED}\) is never smaller than \(D_{Ward}\). This property of \(D\) is what allows one to assess scale-specific segregation by simply taking differences between city-level results for each unit.
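To make the scale effect concrete, here is a minimal base R sketch on a hypothetical toy city of two wards, each split into two EDs. The counts and the `d_index()` helper are invented for illustration and are not part of Wong's paper or the Providence data.

```r
# Hypothetical toy city (made-up counts): 2 wards, each split into 2 EDs
toy <- data.frame(
  ward = c(1, 1, 2, 2),
  ed   = c(1, 2, 3, 4),
  w    = c(80, 40, 20, 60),
  b    = c(20, 10, 60, 10)
)

# Index of dissimilarity after aggregating counts to the column named by `unit`
d_index <- function(df, unit) {
  w <- tapply(df$w, df[[unit]], sum)
  b <- tapply(df$b, df[[unit]], sum)
  sum(abs(w / sum(w) - b / sum(b))) / 2
}

d_ward <- d_index(toy, "ward")  # 0.3
d_ed   <- d_index(toy, "ed")    # 0.5
d_ed - d_ward                   # 0.2: extra segregation revealed by EDs
```

At the ward level the two groups look only moderately uneven (\(D = 0.3\)), but the ED counts reveal more segregation (\(D = 0.5\)); the difference of 0.2 is the additional contribution of EDs.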
Under Wong’s method of decomposition, each ward in a city contributes to the overall value of \(D_{ED}\) in two ways: through the overall counts of each population group within the larger “region” unit (\(RD_{j}\)), and through how evenly those populations are distributed across the smaller “local” units within it (\(LD_{j}\)). The \(RD_{j}\) values for all wards in a city add up to \(D_{Ward}\), and the \(LD_{j}\) values add up to the additional contribution of EDs within wards, \(D_{ED} - D_{Ward}\) (Wong 2003, pp. 183-4).
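Both additivity statements follow from adding and subtracting the ward-level term inside the ED-level formula. Writing \(RD_{j}\) and \(LD_{j}\) with the factor of \(\frac{1}{2}\) that `calc_rd()` and `calc_ld()` apply, and letting \(x_{ij} = \frac{w_{ij}}{W} - \frac{b_{ij}}{B}\), the identity can be sketched as:

\[D_{ED} = \frac{1}{2}\sum_{j}\sum_{i \in j}\left|x_{ij}\right| = \sum_{j}\underbrace{\frac{1}{2}\left|\sum_{i \in j} x_{ij}\right|}_{RD_{j}} + \sum_{j}\underbrace{\left(\frac{1}{2}\sum_{i \in j}\left|x_{ij}\right| - \frac{1}{2}\left|\sum_{i \in j} x_{ij}\right|\right)}_{LD_{j}}\]

The first sum is simply \(D\) computed from ward totals, i.e. \(D_{Ward}\). By the triangle inequality, \(\left|\sum_{i \in j} x_{ij}\right| \le \sum_{i \in j}\left|x_{ij}\right|\), so each \(LD_{j} \ge 0\), with equality when every ED in ward \(j\) deviates from the city-wide proportions in the same direction.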
We can explore Wong’s formulas for calculating \(RD_{j}\) and \(LD_{j}\), and their relationship to \(D_{ED}\), using real data, in this case Providence in 1940. First, I load the R packages I need and read in a file of black and white population counts by ED.
library(readr) #for read_csv()
library(dplyr) #manipulating data
library(purrr) #vectorizing functions
library(here) #file management
# read data
pvd <- read_csv(here("data", "pvd_ward_ed_counts.csv"))
# what does data look like?
pvd
## # A tibble: 269 x 4
##     ward    ed     w     b
##    <dbl> <dbl> <dbl> <dbl>
##  1     1     1   239    23
##  2     1     2   416   434
##  3     1     3  1201    44
##  4     1     4  1541   250
##  5     1     5  1230   122
##  6     1     6  1000   347
##  7     1     7  1300    98
##  8     1     8   392    58
##  9     1     9   991    85
## 10     1    10   948    85
## # … with 259 more rows
I built dedicated functions to compute \(D\) and its two spatial components, \(RD_{j}\) and \(LD_{j}\). The formula for computing \(RD_{j}\) for a single “region” unit is given in equation 10 (Wong 2003, p. 183): \[RD_{j} = \left|\frac{\sum_{i \in j} w_{ij}}{W} - \frac{\sum_{i \in j} b_{ij}}{B}\right|\] where \(w_{ij}\) and \(b_{ij}\) are the white and black population counts for each ED \(i\) in ward \(j\), and \(W\) and \(B\) are the total white and black populations of the whole city. Because \(D\) includes a factor of \(\frac{1}{2}\), the function below halves this quantity so that the ward values sum directly to \(D_{Ward}\).
# ward_rows is the population counts for EDs in a single ward
calc_rd <- function(ward_rows, W, B){
  # result divided by 2 to scale with D index
  abs((sum(ward_rows$w)/W) - (sum(ward_rows$b)/B))/2
}
Then, we can set up a function to calculate \(LD_{j}\) for the same region unit following equation 12 (Wong 2003, p. 184): \[LD_{j} = \sum_{i \in j} |\frac{w_{ij}}{W} - \frac{b_{ij}}{B}|-RD_{j}\]
# ward_rows is the population counts for EDs in a single ward
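calc_ld <- function(ward_rows, W, B){
  # ward-level component, already halved by calc_rd()
  RD <- calc_rd(ward_rows, W = W, B = B)
  # sum of ED-level deviations within the ward, halved to match D
  LHS <- sum(abs(ward_rows$w/W - ward_rows$b/B))/2
  # what remains is the within-ward (ED) contribution
  LHS - RD
}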
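As a hand-checkable illustration, the sketch below applies the same logic as `calc_rd()` and `calc_ld()` (repeated so the block runs on its own) to a hypothetical toy city of two wards with two EDs each; the counts are invented. In ward 1 both EDs are whiter than the city as a whole, so their deviations point the same way and \(LD_{1}\) is zero. In ward 2 one ED is whiter and the other blacker than the city, so the deviations partly cancel at the ward level and \(LD_{2}\) picks up the difference.

```r
# Hypothetical toy city (made-up counts): 2 wards, each split into 2 EDs
toy <- data.frame(
  ward = c(1, 1, 2, 2),
  ed   = c(1, 2, 3, 4),
  w    = c(80, 40, 20, 60),
  b    = c(20, 10, 60, 10)
)

# Same logic as calc_rd() and calc_ld() above, repeated so this block runs alone
calc_rd <- function(ward_rows, W, B) {
  abs(sum(ward_rows$w) / W - sum(ward_rows$b) / B) / 2
}
calc_ld <- function(ward_rows, W, B) {
  RD <- calc_rd(ward_rows, W = W, B = B)
  sum(abs(ward_rows$w / W - ward_rows$b / B)) / 2 - RD
}

# split() keeps the ward IDs as list names, so the results stay labelled
toy_wards <- split(toy, toy$ward)
W <- sum(toy$w)
B <- sum(toy$b)

rd_toy <- vapply(toy_wards, calc_rd, numeric(1), W = W, B = B)
ld_toy <- vapply(toy_wards, calc_ld, numeric(1), W = W, B = B)

rd_toy
ld_toy
sum(rd_toy) + sum(ld_toy)
```

Here \(RD_{1} = RD_{2} = 0.15\), summing to 0.3 (\(D_{Ward}\) for the toy city), while \(LD_{1} = 0\) and \(LD_{2} = 0.2\), for a total of 0.5 (\(D_{ED}\)). Using base R's `split()` rather than `group_split()` is a design choice: `split()` returns a list named by ward ID, so `vapply()` keeps those names on the results.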
We can also write a function to calculate dissimilarity for the city as a whole, using either wards or EDs as the spatial unit, following equation 1 (Wong 2003, p. 181): \[D=\frac{1}{2}\sum_{i}\left|\frac{w_{i}}{W} - \frac{b_{i}}{B}\right|\]
# rows is the ED-level population counts for the whole city;
# unit is the (unquoted) column to aggregate by, e.g. ward or ed
diss <- function(rows, unit){
  # capture the unquoted column name so it can be used inside group_by()
  unit <- enquo(unit)
  # get group counts for chosen unit
  counts <- rows %>%
    group_by(!!unit) %>%
    summarise(
      w = sum(w),
      b = sum(b)
    )
  # calculate D
  sum(abs(counts$w/sum(counts$w) - counts$b/sum(counts$b)))/2
}
We can now compute \(RD_{j}\) and \(LD_{j}\) for each ward. To do this, I split the Providence data into a list containing separate tables for each ward. I can then apply the \(RD_{j}\) and \(LD_{j}\) functions to these tables, creating a vector containing the values for each ward.
wards <- pvd %>%
  group_by(ward) %>%
  group_split()
rd_values <- wards %>%
  # apply RD function
  map_dbl(
    calc_rd,
    # specify total counts for Providence
    W = sum(pvd$w),
    B = sum(pvd$b)
  )
ld_values <- wards %>%
  # apply LD function
  map_dbl(
    calc_ld,
    # specify total counts for Providence
    W = sum(pvd$w),
    B = sum(pvd$b)
  )
First, let’s add up the \(RD_{j}\) values.
sum(rd_values)
## [1] 0.5600013
Now, we can compare this to the value of \(D_{Ward}\).
diss(pvd, ward)
## [1] 0.5600013
We get the exact same result! Next, we can confirm that the \(LD_{j}\) values of all wards add up to the remaining contribution of EDs to dissimilarity at the ED level, \(D_{ED} - D_{Ward}\).
sum(ld_values)
## [1] 0.2459095
diss(pvd, ed) - diss(pvd, ward)
## [1] 0.2459095
Again, it’s the exact same result! This confirms that taking the difference between dissimilarity values calculated at different scales is equivalent to adding up the decomposed values for each spatial unit. Finally, we can see that adding up all the spatial components gives the value of \(D_{ED}\).
sum(rd_values) + sum(ld_values)
## [1] 0.8059108
diss(pvd, ed)
## [1] 0.8059108
This allows us to think about how much each scale contributes to the total segregation at the finest scale across Providence.
c(
  "Ward" = sum(rd_values) / diss(pvd, ed),
  "ED" = sum(ld_values) / diss(pvd, ed)
)
## Ward ED
## 0.6948676 0.3051324
Wong, DWS. 2003. “Spatial Decomposition of Segregation Indices: A Framework Toward Measuring Segregation at Multiple Levels.” Geographical Analysis 35(3): 179-94.