The index of dissimilarity has long been the industry standard for measuring segregation. Since 1976, a host of new measures, rediscovered old indices, and a variety of definitions for segregation have been proposed. There is little agreement about which measure to use under which circumstances Massey and Denton (1988) attempted to quell the disagreement by incorporating many indices under one conceptual framework.
Methodological issues in the measurement of spatial segregation Segregation can be thought of as the extent to which individuals of different groups occupy or experience different social environments. A measure of segregation, then, requires that we define the social environment of each individual that we quantify the extent to which these social environments differ across individuals.
The Dimensions of Residential Segregation Segregation can be thought of as the degree to which two or more groups live separately from one another. Living apart could imply that groups are segregated in a variety of ways. Researchers argue for the adoption of one index and exclude others—fruitless according to Massey and Denton. Massey and Denton (1988) identify 5 distinct dimensions of residential segregation:
Evenness is the degree to which the percentage of minority members within residential areas approaches the minority percentage of the entire neighborhood
Exposure is the degree of potential contact between minority and majority members in neighborhoods
Concentration is the relative amount of physical space occupied by a minority group
Centralization is the degree to which minority members settle in and around the center of a neighborhood
Clustering is the extent to which minority areas adjoin one another in space
As an excellent reference for formulas, consult the US Census Bureau’s report on segregation measures
First we need to load some libraries. We will use R to get all of our census data for us, either from the 2010 100% Summary File 1 (via the UScensus2010 library suite), or the acs package.
library(spdep)
library(seg)
library(UScensus2010)
library(UScensus2010tract)
##
## UScensus2010tract: US Census 2010 Tract Level Shapefiles and Additional Demographic
## Data
## Version 1.00 created on 2011-11-06
## copyright (c) 2011, Zack W. Almquist, University of California-Irvine
## Type help(package="UScensus2010tract") to get started.
##
## For citation information, type citation("UScensus2010tract").
library(UScensus2010county)
##
## UScensus2010county: US Census 2010 County Level Shapefiles and Additional
## Demographic Data
## Version 1.00 created on 2011-11-06
## copyright (c) 2011, Zack W. Almquist, University of California-Irvine
## Type help(package="UScensus2010county") to get started.
##
## For citation information, type citation("UScensus2010county").
library(acs)
library(sp)
library(RColorBrewer)
Before we can use the SF 1 data, you must install the various geographies you want to use to make your segregation indices. I’ll be using tracts, summed up to counties. So, i’ll do: install.tract("osx") and install.county("osx") since i’m on a Mac. I’ve had bad luck using install.*("windows") on my Windows box, and typically use install.*("linux") on either linux or windows.
The first example mimics that from Sparks (2014), Where I show how to calculate three common measures of segregation: Dissimiliarity, Isolation/Interaction and Theile’s Entropy index for multiple groups.
The general process is 1. Create a lower-level unit data set, that has your populations subdivided by race, SES, etc. Here i’m using census tracts. + 1a do any calculations, such as percentages on this data set. 2. Sum the populations up to the higher level unit, such as the county below + 2a Again, do any calculations you need to do 3. Use the formula to put the various parts together. 4. Use tapply() to sum your index across lower level units within higher level units 5. congratulate yourself.
#create a set of three race variables, total, white and black for each census tract in the state of Texas from the 2010 SF1 data
mydem<-demographics(dem=c("P0030001", "P0030002", "P0030003","P0030004","P0030005","P0030006","P0030007","P0030008"), state="48", statefips=TRUE, level="tract")
#assemble a data frame with sensical names and create a county FIPS code for each tract
#First, I assemble the "other" race group, which in the SF1 is composed of 5 tables P0030004 to P0030008
other<-data.frame(cofips=substr(rownames(mydem), 1,5),fips=rownames(mydem),oth1=unlist(mydem[1,"P0030004"]),
oth2=unlist(mydem[1,"P0030005"]),
oth3=unlist(mydem[1,"P0030006"]),
oth4=unlist(mydem[1,"P0030007"]),
oth5=unlist(mydem[1,"P0030008"]))
#build the tract population data
trdat<-data.frame(cofips=substr(rownames(mydem), 1,5),fips=rownames(mydem), total=unlist(mydem[1,"P0030001"]), white=unlist(mydem[1,"P0030002"]), black=unlist(mydem[1,"P0030003"]),other=apply(other[,3:7],1,sum))
#sort the data by county and tract
trdat<-trdat[order(trdat$cofips, trdat$fips),]
#look at the first few cases
head(trdat)
## cofips fips total white black other
## 4686 48001 48001950100 4685 4012 452 221
## 4676 48001 48001950401 5422 1825 2266 1331
## 4677 48001 48001950402 7535 2591 3248 1696
## 4683 48001 48001950500 4377 2737 800 840
## 4682 48001 48001950600 6405 3831 1674 900
## 4681 48001 48001950700 2640 1051 1164 425
#We need the county-level totals for the total population and each race group
co_total<-tapply(trdat$total, trdat$cofips, sum)
co_total<-data.frame(cofips=names(unlist(co_total)), pop=unlist(co_total))
co_wht<-tapply(trdat$white, trdat$cofips, sum)
co_wht<-data.frame(cofips=names(unlist(co_wht)), pop=unlist(co_wht))
co_blk<-tapply(trdat$black, trdat$cofips, sum)
co_blk<-data.frame(cofips=names(unlist(co_blk)), pop=unlist(co_blk))
co_oth<-tapply(trdat$other, trdat$cofips, sum)
co_oth<-data.frame(cofips=names(unlist(co_oth)), pop=unlist(co_oth))
#For the multi-group measure of segregation, we also need population proportions
c_pwhite<-co_wht$pop/co_total$pop
c_pblack<-co_blk$pop/co_total$pop
c_pother<-co_oth$pop/co_total$pop
#we assemble them into a county-level data frame with easy names
county_dat<-data.frame(cofips=co_total$cofips, co_total=co_total$pop, co_wht_total=co_wht$pop, co_blk_total=co_blk$pop, co_oth_total=co_oth$pop, c_pwhite=c_pwhite, c_pblack=c_pblack, c_pother=c_pother)
#Now I make the county-level Entropy measure to be used later, it's easier to do it before
#merging back to the tracts.
county_dat$c_ent<-county_dat$c_pwhite*(log(1/county_dat$c_pwhite))+county_dat$c_pblack*(log(1/county_dat$c_pblack))+county_dat$c_pother*(log(1/county_dat$c_pother))
county_dat$c_ent<-ifelse(is.na(county_dat$c_ent)==T, 0,county_dat$c_ent)
#we merge the county data back to the tract data by the county FIPS code
merged<-merge(x=trdat,y=county_dat, by="cofips", all.x=T )
#have a look and make sure it looks ok
head(merged)
## cofips fips total white black other co_total co_wht_total
## 1 48001 48001950100 4685 4012 452 221 58458 38632
## 2 48001 48001950401 5422 1825 2266 1331 58458 38632
## 3 48001 48001950402 7535 2591 3248 1696 58458 38632
## 4 48001 48001950500 4377 2737 800 840 58458 38632
## 5 48001 48001950600 6405 3831 1674 900 58458 38632
## 6 48001 48001950700 2640 1051 1164 425 58458 38632
## co_blk_total co_oth_total c_pwhite c_pblack c_pother c_ent
## 1 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 2 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 3 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 4 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 5 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 6 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
That is the mechanics for setting up the data. Now we will calculate the indices: ###2 group segregation measures Now we begin the segregation calculations, First we calculate the tract-specific contribution to the county dissimilarity index, then we use the tapply() function to sum the tract-specific contributions within counties. The Dissimiliarity index for blacks and whites is:
\[D = .5* \sum_i \left | \frac{b_i}{B} - \frac{w_i}{W} \right |\] where \(b_i\) is the number of blacks in each tract, B is the number of blacks in the county, \(w_i\) is the number of whites in the tract, and W is the number of whites in the county.
merged$d.wb<-(abs(merged$white/merged$co_wht_total - merged$black/merged$co_blk_total))
The county-level dissimilarity index is just the sum of the tract values within a county, this is easily done with the tapply() function, which in this case will sum the tract-specific contributions to the index within each county.
dissim.wb.tr<-.5*tapply(merged$d.wb, merged$cofips, sum, na.rm=T)
hist(dissim.wb.tr)
Next is the interaction index for blacks and white. In the calculation first population is minority population, second is non-minority population. The formula is: \[\text{Interaction} = \sum_i \frac{b_i}{B} * \frac{w_i}{t_i} \]
merged$int.wb<-(merged$black/merged$co_blk_total * merged$white/merged$total)
#Again, we sum these within each county
int.wb.tr<-tapply(merged$int.wb, merged$cofips, sum, na.rm=T)
head(int.wb.tr)
## 48001 48003 48005 48007 48009 48011
## 0.5198444 0.7840112 0.5427863 0.8582850 0.9498711 0.9331931
hist(int.wb.tr)
Next is is the isolation index for blacks. The formula is:
\[\text{Interaction} = \sum_i \frac{b_i}{B} * \frac{b_i}{t_i} \]
merged$iso.b<-(merged$black/merged$co_blk_total * merged$black/merged$total)
#Again, we sum these within each county
isol.b.tr<-tapply(merged$iso.b, merged$cofips, sum, na.rm=T)
head(isol.b.tr)
## 48001 48003 48005 48007 48009 48011
## 0.307966255 0.018252381 0.290047066 0.018043037 0.006946279 0.005786428
hist(isol.b.tr)
So, those are the typical dissimilarity, interaction and isolation indices for two groups.
Next, we calculate Theile’s multi-group Entropy index using whites, blacks and all others, but of course we could break this into as many groups as you want, with only a few more lines of code. This is a more complicated formula, and is:
\[E = \sum_i \left [\frac{ t_i(E - E_i)}{ET} \right ]\] Where \[E_i = p_i \text{ln} \left (\frac{1}{p_i}\right)+(1-p_i) \text{ln} \left ( \frac{1}{1-p_i} \right)\] and \[E = P \text{ln} \left (\frac{1}{P}\right) + (1-P) \text(ln) \left ( \frac{1}{1-P} \right)\]
#Just like for the counties, we also need population proportions at the tract level
merged$tr_pwhite<-merged$white/merged$total
merged$tr_pblk<-merged$black/merged$total
merged$tr_pother<-merged$other/merged$total
#This is the tract-level entropy measure
merged$tr_ent<-merged$tr_pwhite*(log(1/merged$tr_pwhite))+merged$tr_pblk*(log(1/merged$tr_pblk))+merged$tr_pother*(log(1/merged$tr_pother))
merged$tr_ent<-ifelse(is.na(merged$tr_ent)==T, 0,merged$tr_ent)
head(merged)
## cofips fips total white black other co_total co_wht_total
## 1 48001 48001950100 4685 4012 452 221 58458 38632
## 2 48001 48001950401 5422 1825 2266 1331 58458 38632
## 3 48001 48001950402 7535 2591 3248 1696 58458 38632
## 4 48001 48001950500 4377 2737 800 840 58458 38632
## 5 48001 48001950600 6405 3831 1674 900 58458 38632
## 6 48001 48001950700 2640 1051 1164 425 58458 38632
## co_blk_total co_oth_total c_pwhite c_pblack c_pother c_ent
## 1 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 2 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 3 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 4 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 5 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## 6 12310 7516 0.6608505 0.2105785 0.1285709 0.8655364
## d.wb int.wb iso.b tr_pwhite tr_pblk tr_pother
## 1 0.067133614 0.03144356 0.003542495 0.8563501 0.09647812 0.04717182
## 2 0.136837356 0.06195912 0.076931154 0.3365917 0.41792696 0.24548137
## 3 0.196781777 0.09072816 0.113734109 0.3438620 0.43105508 0.22508295
## 4 0.005860187 0.04063780 0.011878056 0.6253141 0.18277359 0.19191227
## 5 0.036820508 0.08133742 0.035541334 0.5981265 0.26135831 0.14051522
## 6 0.067351845 0.03764382 0.041691160 0.3981061 0.44090909 0.16098485
## tr_ent
## 1 0.5024684
## 2 1.0759163
## 3 1.0654821
## 4 0.9210035
## 5 0.9338686
## 6 1.0217681
#Now I calculate each tract's contribution to the H index
merged$Hcalc<-(merged$total*(merged$c_ent-merged$tr_ent))/(merged$co_total*merged$c_ent)
merged$Hcalc<-ifelse(is.na(merged$Hcalc)==T, 0, merged$Hcalc)
#Then, we sum across tracts within counties to get the H index for each county
hindex<-unlist(tapply(merged$Hcalc, merged$cofips, sum))
head(hindex)
## 48001 48003 48005 48007 48009 48011
## 0.11995495 0.01286551 0.13729605 0.01428389 0.03167802 0.00000000
hist(hindex)
The Neighborhood sorting index is one measure of economic segregation that uses incomes within areas, related to a larger area (such as tracts within metros) Finally, I do the GNSI index for incomes. For a reference, consult this working paper .
This is an approximate index, since the original paper, I think, assumes you have individual household incomes with geographic locations, which I don’t have. Instead I relate tract median household income to the county’s median household income.
We use the acs package to extract our data, to do this, you must have registered with the Census Bureau’s API service here. I use the 2010 5-year ACS summary file. For reference, here is a nice list of tables in the ACS.
I don’t show you my actual key, but you can enter yours like: mykey<-"theykeycensussendsyou"
####Jargowski GNSI
#Get 2010 ACS median household incomes for tracts in Texas
txtracts<-geo.make(state="TX",county="*", tract="*")
income.tract<-acs.fetch(key=mykey,endyear = 2010, span = 5, geography = txtracts, table.number = "B19013")
## Warning: NAs introduced by coercion
#prepare the income data
tx.trincome<-data.frame(income=income.tract@estimate)
tx.trincome$ids<-paste(income.tract@geography$state, sprintf("%03d", income.tract@geography$county),sprintf("%05s", income.tract@geography$tract), sep="")
names(tx.trincome)<-c("medhhincome", "tractfips")
tx.trincome$cofips<-substr(tx.trincome$tractfips, 1,5)
head(tx.trincome)
## medhhincome tractfips
## Census Tract 9501, Anderson County, Texas 46027 48001950100
## Census Tract 9504.01, Anderson County, Texas 61442 48001950401
## Census Tract 9504.02, Anderson County, Texas 61538 48001950402
## Census Tract 9505, Anderson County, Texas 32529 48001950500
## Census Tract 9506, Anderson County, Texas 40409 48001950600
## Census Tract 9507, Anderson County, Texas 30245 48001950700
## cofips
## Census Tract 9501, Anderson County, Texas 48001
## Census Tract 9504.01, Anderson County, Texas 48001
## Census Tract 9504.02, Anderson County, Texas 48001
## Census Tract 9505, Anderson County, Texas 48001
## Census Tract 9506, Anderson County, Texas 48001
## Census Tract 9507, Anderson County, Texas 48001
#average incomes at the county level
tx.coicome<-tapply(tx.trincome$medhhincome, tx.trincome$cofips, mean, na.rm=T)
tx.coicome<-data.frame(cofips=rownames(tx.coicome), countyincome=unlist(tx.coicome))
head(tx.coicome)
## cofips countyincome
## 48001 48001 43519.36
## 48003 48003 53190.00
## 48005 48005 40554.88
## 48007 48007 44245.60
## 48009 48009 53552.00
## 48011 48011 60530.00
mergetractcodat<-merge(tx.trincome, tx.coicome, by.x="cofips", by.y="cofips", all.x=T)
mergetractcodat$difftrco<-mergetractcodat$medhhincome-mergetractcodat$countyincome
#here I find the average tract median household income within a county
meanincs<-unlist(tapply(as.numeric(as.character(mergetractcodat$medhhincome)), mergetractcodat$cofips, mean, na.rm=T))
meanincs<-data.frame(ids=as.character(names(meanincs)), comeaninc=meanincs)
#merge tract data to average county income
mergetractcodat2<-merge(mergetractcodat, meanincs, by.x="cofips", by.y="ids", all.x=T, sort=F)
#calculate the difference between each tract income and the county average
mergetractcodat2$dev<-mergetractcodat2$medhhincome-mergetractcodat2$comeaninc
mergetractcodat2$dev<-ifelse(is.na(mergetractcodat2$dev)==T, 0, mergetractcodat2$dev)
#Texas county shapefile, this comes from the USCensus2010tract library
data("texas.tract10")
datna<-texas.tract10[,c(1:5)]
head(datna@data)
## state county tract fips P0010001
## texas.tract10_0 48 113 014133 48113014133 3909
## texas.tract10_1 48 113 980000 48113980000 0
## texas.tract10_2 48 113 014134 48113014134 4065
## texas.tract10_3 48 113 014135 48113014135 4451
## texas.tract10_4 48 113 014131 48113014131 3811
## texas.tract10_5 48 113 014132 48113014132 2169
datna@data<-merge(datna@data,mergetractcodat2,by.x="fips", by.y="tractfips", all.x=T, sort=F )
head(datna@data)
## fips state county tract P0010001 cofips medhhincome countyincome
## 1 48113014133 48 113 014133 3909 48113 38344 54715.38
## 2 48113980000 48 113 980000 0 48113 NA 54715.38
## 3 48113014134 48 113 014134 4065 48113 124196 54715.38
## 4 48113014135 48 113 014135 4451 48113 74583 54715.38
## 5 48113014131 48 113 014131 3811 48113 53036 54715.38
## 6 48113014132 48 113 014132 2169 48113 69048 54715.38
## difftrco comeaninc dev
## 1 -16371.381 54715.38 -16371.381
## 2 NA 54715.38 0.000
## 3 69480.619 54715.38 69480.619
## 4 19867.619 54715.38 19867.619
## 5 -1679.381 54715.38 -1679.381
## 6 14332.619 54715.38 14332.619
So, again that step was just putting our data in the right format, tracts merged with county information.
Now we can calculate the GNSI. Again, this is a custom written function and it works, don’t tinker too much unless you have to. Basically, this function looks at the number of polygons within a county (# of tracts within a county). If there is 1, then the GNSI is 0. If there are 2 tracts then the spatial weight matrix is: \[W= \begin{bmatrix} .5& .5\\ .5 & .5 \end{bmatrix}\]
Otherwise, the spatial weight matrix will be calculated as normal.
As described in the working paper above, the GNSI can be written very simply, if you use the deviations of each tract from the larger are, and the spatial weight matrix, \(W\).
\[GNSI = \left ( \frac{y'W_k'W_k y}{y'y} \right) ^.5\]
here, \(y\) is the deviation of each tract from the larger area, and \(W\) is the spatial weight matrix for the tracts within a county. Note, that in this weight matrix, I use the tracts themselves in the calculations, so the diagonal of \(W\) is 1.
GNSI<-function(datna){
ncos<-length(unique(datna$county))
counties<-unique(datna$county)
res<-numeric(ncos)
for (i in 1:ncos){
#see if there is 1 tract in county == 0 spatial weight
if(length(datna@polygons[datna$county==counties[i]])==1){
res[i]<-0
} else #see if there are 2 tracst in county == .5 spatial weight
if(length(datna@polygons[datna$county==counties[i]])==2){
wts<-matrix(c(.5,.5,.5,.5), ncol=2, nrow=2)
y<-data.matrix(as.numeric(as.character(datna$dev[datna$county==counties[i]])))
y<-ifelse(is.na(y),0,y)
num<-t(y)%*%t(wts)%*%wts%*%y
denom<-t(y)%*%y
res[i]<-(num/denom)^.5
} else #if there are more than 2 tracts, I calculate the full blown queen contiguity matrix
if(length(datna@polygons[datna$county==counties[i]])>2){
nbs<-poly2nb(datna[datna$county==counties[i],],queen=T)
wts<-data.matrix(nb2mat(nbs, style="B", zero.policy=T))
ntr<-length(datna@polygons[datna$county==counties[i]])
diag(wts)<-1
wts<-wts/apply(wts, 1, sum)
y<-data.matrix(as.numeric(as.character(datna$dev[datna$county==counties[i]])))
y<-ifelse(is.na(y),0,y)
num<-t(y)%*%t(wts)%*%wts%*%y
denom<-t(y)%*%y
res[i]<-(num/denom)^.5
}
# print(i) #this prints the progress
list(result=res)
}
res1<-numeric(ncos)
for(i in 1:ncos){ res1[i]<-length(datna@polygons[datna$county==counties[i]]); res1[i]}
countydat<-data.frame(county=counties, gnsi=res, ntracts=res1)
return(countydat)
}
mycountyGNSI<-GNSI(datna)
mycountyGNSI$cofips<-paste("48",mycountyGNSI$county, sep="")
head(mycountyGNSI)
## county gnsi ntracts cofips
## 1 113 7.330552e-01 529 48113
## 2 227 4.134520e-01 10 48227
## 3 155 0.000000e+00 1 48155
## 4 153 0.000000e+00 2 48153
## 5 043 2.365989e-15 3 48043
## 6 159 4.855904e-01 3 48159
hist(mycountyGNSI$gnsi)
Reardon and O’Sullivan, 2004 describe how most measures of segregation may be made explicitly spatial. They are also kind enough to make an R library seg available to actually do this. Here, I will calculate some spatial segregation measures using the seg library. Below, I calculate the NH White/Hispanic spatial dissimiliarity,isolation, interaction and entropy index.
#ACS table B03002 is Hispanic ethnicity by race
race.tract<-acs.fetch(key=mykey, endyear=2010, span=5, geography=txtracts, table.number="B03002")
colnames(race.tract@estimate)
## [1] "B03002_001" "B03002_002" "B03002_003" "B03002_004" "B03002_005"
## [6] "B03002_006" "B03002_007" "B03002_008" "B03002_009" "B03002_010"
## [11] "B03002_011" "B03002_012" "B03002_013" "B03002_014" "B03002_015"
## [16] "B03002_016" "B03002_017" "B03002_018" "B03002_019" "B03002_020"
## [21] "B03002_021"
tx.trrace<-data.frame( nhwhite= race.tract@estimate[, 3],
nhblack=race.tract@estimate[, 4],
nhother=apply(race.tract@estimate[, c(5:11)], MARGIN = 1, sum ),
hispanic =race.tract@estimate[, 12])
tx.trrace$ids<-paste(race.tract@geography$state, sprintf("%03d", race.tract@geography$county),sprintf("%05s", race.tract@geography$tract), sep="")
tx.trrace$cofips<-substr(tx.trrace$ids, 1,5)
head(tx.trrace)
## nhwhite nhblack nhother
## Census Tract 9501, Anderson County, Texas 4367 643 76
## Census Tract 9504.01, Anderson County, Texas 2545 3342 161
## Census Tract 9504.02, Anderson County, Texas 1224 1824 110
## Census Tract 9505, Anderson County, Texas 1826 1229 77
## Census Tract 9506, Anderson County, Texas 3121 1194 352
## Census Tract 9507, Anderson County, Texas 1120 1363 40
## hispanic ids cofips
## Census Tract 9501, Anderson County, Texas 138 48001950100 48001
## Census Tract 9504.01, Anderson County, Texas 2098 48001950401 48001
## Census Tract 9504.02, Anderson County, Texas 647 48001950402 48001
## Census Tract 9505, Anderson County, Texas 1463 48001950500 48001
## Census Tract 9506, Anderson County, Texas 1535 48001950600 48001
## Census Tract 9507, Anderson County, Texas 749 48001950700 48001
race.sp<-texas.tract10[,c(1:4)]
head(race.sp@data)
## state county tract fips
## texas.tract10_0 48 113 014133 48113014133
## texas.tract10_1 48 113 980000 48113980000
## texas.tract10_2 48 113 014134 48113014134
## texas.tract10_3 48 113 014135 48113014135
## texas.tract10_4 48 113 014131 48113014131
## texas.tract10_5 48 113 014132 48113014132
race.sp<-merge(race.sp,tx.trrace,by.x="fips", by.y="ids", all.x=T, sort=F )
head(race.sp@data)
## fips state county tract nhwhite nhblack nhother hispanic
## 1323 48113014133 48 113 014133 1166 1451 751 739
## 1563 48113980000 48 113 980000 0 0 0 0
## 1324 48113014134 48 113 014134 2390 21 1308 154
## 1325 48113014135 48 113 014135 2839 455 879 525
## 1321 48113014131 48 113 014131 1813 1242 926 472
## 1322 48113014132 48 113 014132 906 555 678 224
## cofips
## 1323 48113
## 1563 48113
## 1324 48113
## 1325 48113
## 1321 48113
## 1322 48113
The spseg() function calculates the spatial versions of these measures, but it assumes you want to do this for a single area. We want to do this for many areas, so I write a function that will loop over all counties and return the indices for each county.
library(seg)
dat<-race.sp #spatialpolygondataframe with the race data
cos<-unique(dat$cofips) #unique county fips codes
spsegout<-function(dat, cos){
theil<-numeric(length(cos))
diss<-numeric(length(cos))
div<-numeric(length(cos))
isol<-numeric(length(cos))
inter<-numeric(length(cos))
county<-NULL
for (i in 1:length(cos)){
#Modify the two groups in the next line, this usees NH White and Hispanic
out<-spseg(dat[dat$cofips==cos[i],],data =dat@data[dat$cofips==cos[i],c("nhwhite","hispanic")], method="all", useC=T, negative.rm=T)
div[i]<-out@r #Diversity
diss[i]<-out@d #Dissimilarity
theil[i]<-out@h #Theil Entrophy
isol[i]<-out@p[2,2] #Isolation for Hispanics
inter[i]<-out@p[1,2] #Interaction for Whites and Hispanics
county[i]<-as.character(cos[i])
}
list(theil=theil, dissim=diss, diversity=div,isolation=isol, interaction=inter, county=county)
}
spsegs<-data.frame(spsegout(dat,cos))
head(spsegs)
## theil dissim diversity isolation interaction county
## 1 8.033904e-04 0.0228552227 0.001112563 0.5234456 0.5162524 48113
## 2 -2.646579e-03 0.0153709000 -0.003669424 0.4210627 0.4154946 48227
## 3 0.000000e+00 0.0000000000 0.000000000 0.2246596 0.2246596 48155
## 4 1.616968e-05 0.0008206314 0.000022407 0.5365595 0.5365566 48153
## 5 4.104968e-03 0.0933361597 0.005554689 0.4568777 0.4380643 48043
## 6 -1.502714e-02 0.0398262489 -0.020490717 0.1343850 0.1295927 48159
So, that gets us a bunch of stuff, now I merge these various indices into a single data frame so I can merge it to a shapefile:
#assemble the indices into a dataframe with nice names
county_dat<-data.frame(cofips=names(unlist(dissim.wb.tr)), dissim_wb=unlist(dissim.wb.tr), isolation_b=isol.b.tr, interaction_bw=int.wb.tr, TheileH=hindex)
county_dat<-merge(x=county_dat, y=mycountyGNSI[,-1], by="cofips", all.x=T, sort=F)
county_dat<-merge(county_dat, spsegs, by.x="cofips", by.y="county", all.x=T, sort=F)
#have a look at the indices
head(county_dat)
## cofips dissim_wb isolation_b interaction_bw TheileH gnsi ntracts
## 1 48001 0.4377915 0.307966255 0.5198444 0.11995495 0.5544800 11
## 2 48003 0.2227811 0.018252381 0.7840112 0.01286551 0.1046631 4
## 3 48005 0.4646370 0.290047066 0.5427863 0.13729605 0.3894168 17
## 4 48007 0.2247782 0.018043037 0.8582850 0.01428389 0.1582254 6
## 5 48009 0.3273754 0.006946279 0.9498711 0.03167802 0.4966145 3
## 6 48011 0.0000000 0.005786428 0.9331931 0.00000000 0.0000000 1
## theil dissim diversity isolation interaction
## 1 -1.709204e-02 0.037536406 -2.350235e-02 0.21092296 0.20177038
## 2 -1.256640e-05 0.004652813 -1.743278e-05 0.49123769 0.49056795
## 3 -1.741791e-02 0.032517304 -2.390528e-02 0.23896052 0.23292235
## 4 -2.926769e-03 0.017826152 -4.056732e-03 0.25805177 0.25541058
## 5 -6.158238e-03 0.042494147 -8.782818e-03 0.07363840 0.07080627
## 6 -2.220446e-16 0.000000000 0.000000e+00 0.09379885 0.09379885
#get a numeric summary of them
summary(county_dat)
## cofips dissim_wb isolation_b interaction_bw
## 48001 : 1 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 48003 : 1 1st Qu.:0.1054 1st Qu.:0.01492 1st Qu.:0.6910
## 48005 : 1 Median :0.2642 Median :0.05619 Median :0.7837
## 48007 : 1 Mean :0.2586 Mean :0.09986 Mean :0.7569
## 48009 : 1 3rd Qu.:0.4034 3rd Qu.:0.14921 3rd Qu.:0.8671
## 48011 : 1 Max. :0.7229 Max. :0.55264 Max. :0.9499
## (Other):248
## TheileH gnsi ntracts theil
## Min. :0.000000 Min. :0.0000 Min. : 1.00 Min. :-0.0473360
## 1st Qu.:0.005786 1st Qu.:0.0000 1st Qu.: 2.00 1st Qu.:-0.0051392
## Median :0.031409 Median :0.2950 Median : 5.00 Median : 0.0000000
## Mean :0.046053 Mean :0.2807 Mean : 20.73 Mean :-0.0019570
## 3rd Qu.:0.069289 3rd Qu.:0.4963 3rd Qu.: 10.00 3rd Qu.: 0.0008148
## Max. :0.335423 Max. :0.8226 Max. :786.00 Max. : 0.0924983
## NA's :1
## dissim diversity isolation interaction
## Min. :0.000000 Min. :-0.062936 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.007905 1st Qu.:-0.007055 1st Qu.:0.1685 1st Qu.:0.1644
## Median :0.018437 Median : 0.000000 Median :0.2595 Median :0.2550
## Mean :0.023783 Mean :-0.002927 Mean :0.3380 Mean :0.3332
## 3rd Qu.:0.033217 3rd Qu.: 0.001101 3rd Qu.:0.4878 3rd Qu.:0.4831
## Max. :0.290177 Max. : 0.098795 Max. :0.9878 Max. :0.9873
##
#Now we need to do some visualization of the indices
#load a US County Shapefile
#the texas county SF1 data is in texas.county10, it also has the county polygons
data("texas.county10")
#I only want the identifiers, so I get rid of the SF1 data
tx_sub<-texas.county10[, c(1:6)]
#I put the attribute table into a new object
#I merge the county attribute table to the segregation indices by county FIPS code. I am careful here not to re-sort the data, they need to be in the same order as the shapefile, and I want to keep all counties, even if they're missing the indices (which none are in this case)
tx_sub<-merge(x=tx_sub, y=county_dat, by.x="fips", by.y="cofips", all.x=T, sort=F)
Now, I map some of the indices:
#Next i plot the indices, using quantile breaks, and using a nice color ramp from ColorBrewer
cols<-brewer.pal(n=5, name="Reds")
spplot(obj=tx_sub, zcol="dissim_wb",at=quantile(tx_sub$dissim_wb, probs=c(0,.25, .5, .75, 1)), col.regions=cols , main="White-Black Dissimiliarity, 2010")
spplot(obj=tx_sub, zcol="TheileH",at=quantile(tx_sub$TheileH, probs=c(0,.25, .5, .75, 1)), col.regions=cols , main="Three-Race Theile H Index, 2010")
spplot(obj=tx_sub, zcol="gnsi",at=quantile(tx_sub$gnsi, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="General Neighborhood Sorting Index, 2010")
spplot(tx_sub, zcol="isolation",at=quantile(tx_sub$isolation, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="Hispanic Isolation Index, 2010")
spplot(tx_sub, zcol="interaction",at=quantile(tx_sub$interaction, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="Hispanic - White Interaction Index, 2010")
You could of course, export these or merge them into a individual level data file to include in a hierarchical model. I did this for my coauthors in this and this paper.