Some background on segregation

The index of dissimilarity has long been the industry standard for measuring segregation. Since 1976, a host of new measures, rediscovered old indices, and a variety of definitions for segregation have been proposed. There is little agreement about which measure to use under which circumstances Massey and Denton (1988) attempted to quell the disagreement by incorporating many indices under one conceptual framework.

Methodological issues in the measurement of spatial segregation Segregation can be thought of as the extent to which individuals of different groups occupy or experience different social environments. A measure of segregation, then, requires that we define the social environment of each individual that we quantify the extent to which these social environments differ across individuals.

The Dimensions of Residential Segregation Segregation can be thought of as the degree to which two or more groups live separately from one another. Living apart could imply that groups are segregated in a variety of ways. Researchers argue for the adoption of one index and exclude others—fruitless according to Massey and Denton. Massey and Denton (1988) identify 5 distinct dimensions of residential segregation:

Evenness is the degree to which the percentage of minority members within residential areas approaches the minority percentage of the entire neighborhood

Exposure is the degree of potential contact between minority and majority members in neighborhoods

Concentration is the relative amount of physical space occupied by a minority group

Centralization is the degree to which minority members settle in and around the center of a neighborhood

Clustering is the extent to which minority areas adjoin one another in space

As an excellent reference for formulas, consult the US Census Bureau’s report on segregation measures

Doing this in R

First we need to load some libraries. We will use R to get all of our census data for us, either from the 2010 100% Summary File 1 (via the UScensus2010 library suite), or the acs package.

library(spdep)
library(seg)
library(UScensus2010)
library(UScensus2010tract)
## 
## UScensus2010tract: US Census 2010 Tract Level Shapefiles and Additional Demographic
## Data 
## Version 1.00 created on 2011-11-06 
## copyright (c) 2011, Zack W. Almquist, University of California-Irvine
## Type help(package="UScensus2010tract") to get started.
## 
## For citation information, type citation("UScensus2010tract").
library(UScensus2010county)
## 
## UScensus2010county: US Census 2010 County Level Shapefiles and Additional
## Demographic Data 
## Version 1.00 created on 2011-11-06 
## copyright (c) 2011, Zack W. Almquist, University of California-Irvine
## Type help(package="UScensus2010county") to get started.
## 
## For citation information, type citation("UScensus2010county").
library(acs)
library(sp)
library(RColorBrewer)

Before we can use the SF 1 data, you must install the various geographies you want to use to make your segregation indices. I’ll be using tracts, summed up to counties. So, i’ll do: install.tract("osx") and install.county("osx") since i’m on a Mac. I’ve had bad luck using install.*("windows") on my Windows box, and typically use install.*("linux") on either linux or windows.

The first example mimics that from Sparks (2014), Where I show how to calculate three common measures of segregation: Dissimiliarity, Isolation/Interaction and Theile’s Entropy index for multiple groups.

The general process is 1. Create a lower-level unit data set, that has your populations subdivided by race, SES, etc. Here i’m using census tracts. + 1a do any calculations, such as percentages on this data set. 2. Sum the populations up to the higher level unit, such as the county below + 2a Again, do any calculations you need to do 3. Use the formula to put the various parts together. 4. Use tapply() to sum your index across lower level units within higher level units 5. congratulate yourself.

#create a set of three race variables, total, white and black for each census tract in the state of Texas from the 2010 SF1 data
mydem<-demographics(dem=c("P0030001", "P0030002", "P0030003","P0030004","P0030005","P0030006","P0030007","P0030008"), state="48", statefips=TRUE, level="tract")

#assemble a data frame with sensical names and create a county FIPS code for each tract
#First, I assemble the "other" race group, which in the SF1 is composed of 5 tables P0030004 to P0030008
other<-data.frame(cofips=substr(rownames(mydem), 1,5),fips=rownames(mydem),oth1=unlist(mydem[1,"P0030004"]),
                  oth2=unlist(mydem[1,"P0030005"]),
                  oth3=unlist(mydem[1,"P0030006"]),
                  oth4=unlist(mydem[1,"P0030007"]),
                  oth5=unlist(mydem[1,"P0030008"]))

#build the tract population data
trdat<-data.frame(cofips=substr(rownames(mydem), 1,5),fips=rownames(mydem), total=unlist(mydem[1,"P0030001"]), white=unlist(mydem[1,"P0030002"]), black=unlist(mydem[1,"P0030003"]),other=apply(other[,3:7],1,sum))

#sort the data by county and tract
trdat<-trdat[order(trdat$cofips, trdat$fips),]
#look at the first few cases
head(trdat)
##      cofips        fips total white black other
## 4686  48001 48001950100  4685  4012   452   221
## 4676  48001 48001950401  5422  1825  2266  1331
## 4677  48001 48001950402  7535  2591  3248  1696
## 4683  48001 48001950500  4377  2737   800   840
## 4682  48001 48001950600  6405  3831  1674   900
## 4681  48001 48001950700  2640  1051  1164   425
#We need the county-level totals for the total population and each race group
co_total<-tapply(trdat$total, trdat$cofips, sum)
co_total<-data.frame(cofips=names(unlist(co_total)), pop=unlist(co_total))
co_wht<-tapply(trdat$white, trdat$cofips, sum)
co_wht<-data.frame(cofips=names(unlist(co_wht)), pop=unlist(co_wht))
co_blk<-tapply(trdat$black, trdat$cofips, sum)
co_blk<-data.frame(cofips=names(unlist(co_blk)), pop=unlist(co_blk))
co_oth<-tapply(trdat$other, trdat$cofips, sum)
co_oth<-data.frame(cofips=names(unlist(co_oth)), pop=unlist(co_oth))

#For the multi-group measure of segregation, we also need population proportions
c_pwhite<-co_wht$pop/co_total$pop
c_pblack<-co_blk$pop/co_total$pop
c_pother<-co_oth$pop/co_total$pop

#we assemble them into a county-level data frame with easy names
county_dat<-data.frame(cofips=co_total$cofips, co_total=co_total$pop, co_wht_total=co_wht$pop, co_blk_total=co_blk$pop, co_oth_total=co_oth$pop, c_pwhite=c_pwhite, c_pblack=c_pblack, c_pother=c_pother)

#Now I make the county-level Entropy measure to be used later, it's easier to do it before
#merging back to the tracts.
county_dat$c_ent<-county_dat$c_pwhite*(log(1/county_dat$c_pwhite))+county_dat$c_pblack*(log(1/county_dat$c_pblack))+county_dat$c_pother*(log(1/county_dat$c_pother))
county_dat$c_ent<-ifelse(is.na(county_dat$c_ent)==T, 0,county_dat$c_ent)


#we merge the county data back to the tract data by the county FIPS code
merged<-merge(x=trdat,y=county_dat, by="cofips", all.x=T )
#have a look and make sure it looks ok
head(merged)
##   cofips        fips total white black other co_total co_wht_total
## 1  48001 48001950100  4685  4012   452   221    58458        38632
## 2  48001 48001950401  5422  1825  2266  1331    58458        38632
## 3  48001 48001950402  7535  2591  3248  1696    58458        38632
## 4  48001 48001950500  4377  2737   800   840    58458        38632
## 5  48001 48001950600  6405  3831  1674   900    58458        38632
## 6  48001 48001950700  2640  1051  1164   425    58458        38632
##   co_blk_total co_oth_total  c_pwhite  c_pblack  c_pother     c_ent
## 1        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 2        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 3        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 4        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 5        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 6        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364

That is the mechanics for setting up the data. Now we will calculate the indices: ###2 group segregation measures Now we begin the segregation calculations, First we calculate the tract-specific contribution to the county dissimilarity index, then we use the tapply() function to sum the tract-specific contributions within counties. The Dissimiliarity index for blacks and whites is:

\[D = .5* \sum_i \left | \frac{b_i}{B} - \frac{w_i}{W} \right |\] where \(b_i\) is the number of blacks in each tract, B is the number of blacks in the county, \(w_i\) is the number of whites in the tract, and W is the number of whites in the county.

merged$d.wb<-(abs(merged$white/merged$co_wht_total - merged$black/merged$co_blk_total))

The county-level dissimilarity index is just the sum of the tract values within a county, this is easily done with the tapply() function, which in this case will sum the tract-specific contributions to the index within each county.

dissim.wb.tr<-.5*tapply(merged$d.wb, merged$cofips, sum, na.rm=T)
hist(dissim.wb.tr)

Next is the interaction index for blacks and white. In the calculation first population is minority population, second is non-minority population. The formula is: \[\text{Interaction} = \sum_i \frac{b_i}{B} * \frac{w_i}{t_i} \]

merged$int.wb<-(merged$black/merged$co_blk_total * merged$white/merged$total)
#Again, we sum these within each county
int.wb.tr<-tapply(merged$int.wb, merged$cofips, sum, na.rm=T)
head(int.wb.tr)
##     48001     48003     48005     48007     48009     48011 
## 0.5198444 0.7840112 0.5427863 0.8582850 0.9498711 0.9331931
hist(int.wb.tr)

Next is is the isolation index for blacks. The formula is:

\[\text{Interaction} = \sum_i \frac{b_i}{B} * \frac{b_i}{t_i} \]

merged$iso.b<-(merged$black/merged$co_blk_total * merged$black/merged$total)
#Again, we sum these within each county
isol.b.tr<-tapply(merged$iso.b, merged$cofips, sum, na.rm=T)
head(isol.b.tr)
##       48001       48003       48005       48007       48009       48011 
## 0.307966255 0.018252381 0.290047066 0.018043037 0.006946279 0.005786428
hist(isol.b.tr)

So, those are the typical dissimilarity, interaction and isolation indices for two groups.

Multi-group segregation

Next, we calculate Theile’s multi-group Entropy index using whites, blacks and all others, but of course we could break this into as many groups as you want, with only a few more lines of code. This is a more complicated formula, and is:

\[E = \sum_i \left [\frac{ t_i(E - E_i)}{ET} \right ]\] Where \[E_i = p_i \text{ln} \left (\frac{1}{p_i}\right)+(1-p_i) \text{ln} \left ( \frac{1}{1-p_i} \right)\] and \[E = P \text{ln} \left (\frac{1}{P}\right) + (1-P) \text(ln) \left ( \frac{1}{1-P} \right)\]

#Just like for the counties, we also need population proportions at the tract level
merged$tr_pwhite<-merged$white/merged$total
merged$tr_pblk<-merged$black/merged$total
merged$tr_pother<-merged$other/merged$total

#This is the tract-level entropy measure
merged$tr_ent<-merged$tr_pwhite*(log(1/merged$tr_pwhite))+merged$tr_pblk*(log(1/merged$tr_pblk))+merged$tr_pother*(log(1/merged$tr_pother))
merged$tr_ent<-ifelse(is.na(merged$tr_ent)==T, 0,merged$tr_ent)

head(merged)
##   cofips        fips total white black other co_total co_wht_total
## 1  48001 48001950100  4685  4012   452   221    58458        38632
## 2  48001 48001950401  5422  1825  2266  1331    58458        38632
## 3  48001 48001950402  7535  2591  3248  1696    58458        38632
## 4  48001 48001950500  4377  2737   800   840    58458        38632
## 5  48001 48001950600  6405  3831  1674   900    58458        38632
## 6  48001 48001950700  2640  1051  1164   425    58458        38632
##   co_blk_total co_oth_total  c_pwhite  c_pblack  c_pother     c_ent
## 1        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 2        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 3        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 4        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 5        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
## 6        12310         7516 0.6608505 0.2105785 0.1285709 0.8655364
##          d.wb     int.wb       iso.b tr_pwhite    tr_pblk  tr_pother
## 1 0.067133614 0.03144356 0.003542495 0.8563501 0.09647812 0.04717182
## 2 0.136837356 0.06195912 0.076931154 0.3365917 0.41792696 0.24548137
## 3 0.196781777 0.09072816 0.113734109 0.3438620 0.43105508 0.22508295
## 4 0.005860187 0.04063780 0.011878056 0.6253141 0.18277359 0.19191227
## 5 0.036820508 0.08133742 0.035541334 0.5981265 0.26135831 0.14051522
## 6 0.067351845 0.03764382 0.041691160 0.3981061 0.44090909 0.16098485
##      tr_ent
## 1 0.5024684
## 2 1.0759163
## 3 1.0654821
## 4 0.9210035
## 5 0.9338686
## 6 1.0217681
#Now I calculate each tract's contribution to the H index
merged$Hcalc<-(merged$total*(merged$c_ent-merged$tr_ent))/(merged$co_total*merged$c_ent)
merged$Hcalc<-ifelse(is.na(merged$Hcalc)==T, 0, merged$Hcalc)

#Then, we sum across tracts within counties to get the H index for each county
hindex<-unlist(tapply(merged$Hcalc, merged$cofips, sum))
head(hindex)
##      48001      48003      48005      48007      48009      48011 
## 0.11995495 0.01286551 0.13729605 0.01428389 0.03167802 0.00000000
hist(hindex)

Economic Segregation

The Neighborhood sorting index is one measure of economic segregation that uses incomes within areas, related to a larger area (such as tracts within metros) Finally, I do the GNSI index for incomes. For a reference, consult this working paper .

This is an approximate index, since the original paper, I think, assumes you have individual household incomes with geographic locations, which I don’t have. Instead I relate tract median household income to the county’s median household income.

We use the acs package to extract our data, to do this, you must have registered with the Census Bureau’s API service here. I use the 2010 5-year ACS summary file. For reference, here is a nice list of tables in the ACS.

I don’t show you my actual key, but you can enter yours like: mykey<-"theykeycensussendsyou"

####Jargowski GNSI

#Get 2010 ACS median household incomes for tracts in Texas
txtracts<-geo.make(state="TX",county="*", tract="*")

income.tract<-acs.fetch(key=mykey,endyear = 2010, span = 5, geography = txtracts, table.number = "B19013")
## Warning: NAs introduced by coercion
#prepare the income data
tx.trincome<-data.frame(income=income.tract@estimate)
tx.trincome$ids<-paste(income.tract@geography$state, sprintf("%03d", income.tract@geography$county),sprintf("%05s", income.tract@geography$tract), sep="")
names(tx.trincome)<-c("medhhincome", "tractfips")
tx.trincome$cofips<-substr(tx.trincome$tractfips, 1,5)
head(tx.trincome)
##                                              medhhincome   tractfips
## Census Tract 9501, Anderson County, Texas          46027 48001950100
## Census Tract 9504.01, Anderson County, Texas       61442 48001950401
## Census Tract 9504.02, Anderson County, Texas       61538 48001950402
## Census Tract 9505, Anderson County, Texas          32529 48001950500
## Census Tract 9506, Anderson County, Texas          40409 48001950600
## Census Tract 9507, Anderson County, Texas          30245 48001950700
##                                              cofips
## Census Tract 9501, Anderson County, Texas     48001
## Census Tract 9504.01, Anderson County, Texas  48001
## Census Tract 9504.02, Anderson County, Texas  48001
## Census Tract 9505, Anderson County, Texas     48001
## Census Tract 9506, Anderson County, Texas     48001
## Census Tract 9507, Anderson County, Texas     48001
#average incomes at the county level
tx.coicome<-tapply(tx.trincome$medhhincome, tx.trincome$cofips, mean, na.rm=T)
tx.coicome<-data.frame(cofips=rownames(tx.coicome), countyincome=unlist(tx.coicome))
head(tx.coicome)
##       cofips countyincome
## 48001  48001     43519.36
## 48003  48003     53190.00
## 48005  48005     40554.88
## 48007  48007     44245.60
## 48009  48009     53552.00
## 48011  48011     60530.00
mergetractcodat<-merge(tx.trincome, tx.coicome, by.x="cofips", by.y="cofips", all.x=T)

mergetractcodat$difftrco<-mergetractcodat$medhhincome-mergetractcodat$countyincome

#here I find the average tract median household income within a county
meanincs<-unlist(tapply(as.numeric(as.character(mergetractcodat$medhhincome)), mergetractcodat$cofips, mean, na.rm=T))
meanincs<-data.frame(ids=as.character(names(meanincs)), comeaninc=meanincs)

#merge tract data to average county income
mergetractcodat2<-merge(mergetractcodat, meanincs, by.x="cofips", by.y="ids", all.x=T, sort=F)

#calculate the difference between each tract income and the county average
mergetractcodat2$dev<-mergetractcodat2$medhhincome-mergetractcodat2$comeaninc
mergetractcodat2$dev<-ifelse(is.na(mergetractcodat2$dev)==T, 0, mergetractcodat2$dev)

#Texas county shapefile, this comes from the USCensus2010tract library
data("texas.tract10")
datna<-texas.tract10[,c(1:5)]
head(datna@data)  
##                 state county  tract        fips P0010001
## texas.tract10_0    48    113 014133 48113014133     3909
## texas.tract10_1    48    113 980000 48113980000        0
## texas.tract10_2    48    113 014134 48113014134     4065
## texas.tract10_3    48    113 014135 48113014135     4451
## texas.tract10_4    48    113 014131 48113014131     3811
## texas.tract10_5    48    113 014132 48113014132     2169
datna@data<-merge(datna@data,mergetractcodat2,by.x="fips", by.y="tractfips", all.x=T, sort=F  )
head(datna@data)           
##          fips state county  tract P0010001 cofips medhhincome countyincome
## 1 48113014133    48    113 014133     3909  48113       38344     54715.38
## 2 48113980000    48    113 980000        0  48113          NA     54715.38
## 3 48113014134    48    113 014134     4065  48113      124196     54715.38
## 4 48113014135    48    113 014135     4451  48113       74583     54715.38
## 5 48113014131    48    113 014131     3811  48113       53036     54715.38
## 6 48113014132    48    113 014132     2169  48113       69048     54715.38
##     difftrco comeaninc        dev
## 1 -16371.381  54715.38 -16371.381
## 2         NA  54715.38      0.000
## 3  69480.619  54715.38  69480.619
## 4  19867.619  54715.38  19867.619
## 5  -1679.381  54715.38  -1679.381
## 6  14332.619  54715.38  14332.619

So, again that step was just putting our data in the right format, tracts merged with county information.

Now we can calculate the GNSI. Again, this is a custom written function and it works, don’t tinker too much unless you have to. Basically, this function looks at the number of polygons within a county (# of tracts within a county). If there is 1, then the GNSI is 0. If there are 2 tracts then the spatial weight matrix is: \[W= \begin{bmatrix} .5& .5\\ .5 & .5 \end{bmatrix}\]

Otherwise, the spatial weight matrix will be calculated as normal.

As described in the working paper above, the GNSI can be written very simply, if you use the deviations of each tract from the larger are, and the spatial weight matrix, \(W\).

\[GNSI = \left ( \frac{y'W_k'W_k y}{y'y} \right) ^.5\]

here, \(y\) is the deviation of each tract from the larger area, and \(W\) is the spatial weight matrix for the tracts within a county. Note, that in this weight matrix, I use the tracts themselves in the calculations, so the diagonal of \(W\) is 1.

GNSI<-function(datna){
  ncos<-length(unique(datna$county))
  counties<-unique(datna$county)
  res<-numeric(ncos)
  
  for (i in 1:ncos){
    #see if there is 1 tract in county == 0 spatial weight
    if(length(datna@polygons[datna$county==counties[i]])==1){
      res[i]<-0
    }  else #see if there are 2 tracst in county == .5 spatial weight
      if(length(datna@polygons[datna$county==counties[i]])==2){
        wts<-matrix(c(.5,.5,.5,.5), ncol=2, nrow=2)
        y<-data.matrix(as.numeric(as.character(datna$dev[datna$county==counties[i]])))
        y<-ifelse(is.na(y),0,y)
        num<-t(y)%*%t(wts)%*%wts%*%y
        denom<-t(y)%*%y
        res[i]<-(num/denom)^.5   
      }  else #if there are more than 2 tracts, I calculate the full blown queen contiguity matrix
        if(length(datna@polygons[datna$county==counties[i]])>2){
          nbs<-poly2nb(datna[datna$county==counties[i],],queen=T) 
          wts<-data.matrix(nb2mat(nbs, style="B", zero.policy=T))
          ntr<-length(datna@polygons[datna$county==counties[i]])
          diag(wts)<-1
          wts<-wts/apply(wts, 1, sum)
          y<-data.matrix(as.numeric(as.character(datna$dev[datna$county==counties[i]])))
          y<-ifelse(is.na(y),0,y)
          num<-t(y)%*%t(wts)%*%wts%*%y
          denom<-t(y)%*%y
          res[i]<-(num/denom)^.5
          
        }
  #  print(i) #this prints the progress
    list(result=res)
  }
  res1<-numeric(ncos)
  for(i in 1:ncos){ res1[i]<-length(datna@polygons[datna$county==counties[i]]); res1[i]}
  
  countydat<-data.frame(county=counties, gnsi=res, ntracts=res1)
  return(countydat)
}

mycountyGNSI<-GNSI(datna)
mycountyGNSI$cofips<-paste("48",mycountyGNSI$county, sep="")
head(mycountyGNSI)
##   county         gnsi ntracts cofips
## 1    113 7.330552e-01     529  48113
## 2    227 4.134520e-01      10  48227
## 3    155 0.000000e+00       1  48155
## 4    153 0.000000e+00       2  48153
## 5    043 2.365989e-15       3  48043
## 6    159 4.855904e-01       3  48159
hist(mycountyGNSI$gnsi)

Spatial segregation measures

Reardon and O’Sullivan, 2004 describe how most measures of segregation may be made explicitly spatial. They are also kind enough to make an R library seg available to actually do this. Here, I will calculate some spatial segregation measures using the seg library. Below, I calculate the NH White/Hispanic spatial dissimiliarity,isolation, interaction and entropy index.

#ACS table B03002 is Hispanic ethnicity by race 
race.tract<-acs.fetch(key=mykey, endyear=2010, span=5, geography=txtracts, table.number="B03002")

colnames(race.tract@estimate)
##  [1] "B03002_001" "B03002_002" "B03002_003" "B03002_004" "B03002_005"
##  [6] "B03002_006" "B03002_007" "B03002_008" "B03002_009" "B03002_010"
## [11] "B03002_011" "B03002_012" "B03002_013" "B03002_014" "B03002_015"
## [16] "B03002_016" "B03002_017" "B03002_018" "B03002_019" "B03002_020"
## [21] "B03002_021"
tx.trrace<-data.frame( nhwhite= race.tract@estimate[, 3],
                       nhblack=race.tract@estimate[, 4], 
                       nhother=apply(race.tract@estimate[, c(5:11)], MARGIN = 1, sum ),
                       hispanic =race.tract@estimate[, 12])

tx.trrace$ids<-paste(race.tract@geography$state, sprintf("%03d", race.tract@geography$county),sprintf("%05s", race.tract@geography$tract), sep="")

tx.trrace$cofips<-substr(tx.trrace$ids, 1,5)

head(tx.trrace)
##                                              nhwhite nhblack nhother
## Census Tract 9501, Anderson County, Texas       4367     643      76
## Census Tract 9504.01, Anderson County, Texas    2545    3342     161
## Census Tract 9504.02, Anderson County, Texas    1224    1824     110
## Census Tract 9505, Anderson County, Texas       1826    1229      77
## Census Tract 9506, Anderson County, Texas       3121    1194     352
## Census Tract 9507, Anderson County, Texas       1120    1363      40
##                                              hispanic         ids cofips
## Census Tract 9501, Anderson County, Texas         138 48001950100  48001
## Census Tract 9504.01, Anderson County, Texas     2098 48001950401  48001
## Census Tract 9504.02, Anderson County, Texas      647 48001950402  48001
## Census Tract 9505, Anderson County, Texas        1463 48001950500  48001
## Census Tract 9506, Anderson County, Texas        1535 48001950600  48001
## Census Tract 9507, Anderson County, Texas         749 48001950700  48001
race.sp<-texas.tract10[,c(1:4)]
head(race.sp@data)  
##                 state county  tract        fips
## texas.tract10_0    48    113 014133 48113014133
## texas.tract10_1    48    113 980000 48113980000
## texas.tract10_2    48    113 014134 48113014134
## texas.tract10_3    48    113 014135 48113014135
## texas.tract10_4    48    113 014131 48113014131
## texas.tract10_5    48    113 014132 48113014132
race.sp<-merge(race.sp,tx.trrace,by.x="fips", by.y="ids", all.x=T, sort=F  )
head(race.sp@data)           
##             fips state county  tract nhwhite nhblack nhother hispanic
## 1323 48113014133    48    113 014133    1166    1451     751      739
## 1563 48113980000    48    113 980000       0       0       0        0
## 1324 48113014134    48    113 014134    2390      21    1308      154
## 1325 48113014135    48    113 014135    2839     455     879      525
## 1321 48113014131    48    113 014131    1813    1242     926      472
## 1322 48113014132    48    113 014132     906     555     678      224
##      cofips
## 1323  48113
## 1563  48113
## 1324  48113
## 1325  48113
## 1321  48113
## 1322  48113

Calculate the spatial segregation indices

The spseg() function calculates the spatial versions of these measures, but it assumes you want to do this for a single area. We want to do this for many areas, so I write a function that will loop over all counties and return the indices for each county.

library(seg)

dat<-race.sp #spatialpolygondataframe with the race data
cos<-unique(dat$cofips) #unique county fips codes

spsegout<-function(dat, cos){
  theil<-numeric(length(cos))
  diss<-numeric(length(cos))
  div<-numeric(length(cos))
  isol<-numeric(length(cos))
  inter<-numeric(length(cos))
  county<-NULL
for (i in 1:length(cos)){
  #Modify the two groups in the next line, this usees NH White and Hispanic
  out<-spseg(dat[dat$cofips==cos[i],],data =dat@data[dat$cofips==cos[i],c("nhwhite","hispanic")],  method="all", useC=T, negative.rm=T)
  
  div[i]<-out@r #Diversity
  diss[i]<-out@d #Dissimilarity
 theil[i]<-out@h #Theil Entrophy
 isol[i]<-out@p[2,2] #Isolation for Hispanics
 inter[i]<-out@p[1,2] #Interaction for Whites and Hispanics
  county[i]<-as.character(cos[i])
}
list(theil=theil, dissim=diss, diversity=div,isolation=isol, interaction=inter, county=county)
}


spsegs<-data.frame(spsegout(dat,cos))

head(spsegs)
##           theil       dissim    diversity isolation interaction county
## 1  8.033904e-04 0.0228552227  0.001112563 0.5234456   0.5162524  48113
## 2 -2.646579e-03 0.0153709000 -0.003669424 0.4210627   0.4154946  48227
## 3  0.000000e+00 0.0000000000  0.000000000 0.2246596   0.2246596  48155
## 4  1.616968e-05 0.0008206314  0.000022407 0.5365595   0.5365566  48153
## 5  4.104968e-03 0.0933361597  0.005554689 0.4568777   0.4380643  48043
## 6 -1.502714e-02 0.0398262489 -0.020490717 0.1343850   0.1295927  48159

So, that gets us a bunch of stuff, now I merge these various indices into a single data frame so I can merge it to a shapefile:

#assemble the indices into a dataframe with nice names
county_dat<-data.frame(cofips=names(unlist(dissim.wb.tr)), dissim_wb=unlist(dissim.wb.tr), isolation_b=isol.b.tr, interaction_bw=int.wb.tr, TheileH=hindex)
county_dat<-merge(x=county_dat, y=mycountyGNSI[,-1], by="cofips", all.x=T, sort=F)
county_dat<-merge(county_dat, spsegs, by.x="cofips", by.y="county", all.x=T, sort=F)

#have a look at the indices
head(county_dat)
##   cofips dissim_wb isolation_b interaction_bw    TheileH      gnsi ntracts
## 1  48001 0.4377915 0.307966255      0.5198444 0.11995495 0.5544800      11
## 2  48003 0.2227811 0.018252381      0.7840112 0.01286551 0.1046631       4
## 3  48005 0.4646370 0.290047066      0.5427863 0.13729605 0.3894168      17
## 4  48007 0.2247782 0.018043037      0.8582850 0.01428389 0.1582254       6
## 5  48009 0.3273754 0.006946279      0.9498711 0.03167802 0.4966145       3
## 6  48011 0.0000000 0.005786428      0.9331931 0.00000000 0.0000000       1
##           theil      dissim     diversity  isolation interaction
## 1 -1.709204e-02 0.037536406 -2.350235e-02 0.21092296  0.20177038
## 2 -1.256640e-05 0.004652813 -1.743278e-05 0.49123769  0.49056795
## 3 -1.741791e-02 0.032517304 -2.390528e-02 0.23896052  0.23292235
## 4 -2.926769e-03 0.017826152 -4.056732e-03 0.25805177  0.25541058
## 5 -6.158238e-03 0.042494147 -8.782818e-03 0.07363840  0.07080627
## 6 -2.220446e-16 0.000000000  0.000000e+00 0.09379885  0.09379885
#get a numeric summary of them
summary(county_dat)
##      cofips      dissim_wb       isolation_b      interaction_bw  
##  48001  :  1   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  48003  :  1   1st Qu.:0.1054   1st Qu.:0.01492   1st Qu.:0.6910  
##  48005  :  1   Median :0.2642   Median :0.05619   Median :0.7837  
##  48007  :  1   Mean   :0.2586   Mean   :0.09986   Mean   :0.7569  
##  48009  :  1   3rd Qu.:0.4034   3rd Qu.:0.14921   3rd Qu.:0.8671  
##  48011  :  1   Max.   :0.7229   Max.   :0.55264   Max.   :0.9499  
##  (Other):248                                                      
##     TheileH              gnsi           ntracts           theil           
##  Min.   :0.000000   Min.   :0.0000   Min.   :  1.00   Min.   :-0.0473360  
##  1st Qu.:0.005786   1st Qu.:0.0000   1st Qu.:  2.00   1st Qu.:-0.0051392  
##  Median :0.031409   Median :0.2950   Median :  5.00   Median : 0.0000000  
##  Mean   :0.046053   Mean   :0.2807   Mean   : 20.73   Mean   :-0.0019570  
##  3rd Qu.:0.069289   3rd Qu.:0.4963   3rd Qu.: 10.00   3rd Qu.: 0.0008148  
##  Max.   :0.335423   Max.   :0.8226   Max.   :786.00   Max.   : 0.0924983  
##                     NA's   :1                                             
##      dissim           diversity           isolation       interaction    
##  Min.   :0.000000   Min.   :-0.062936   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.007905   1st Qu.:-0.007055   1st Qu.:0.1685   1st Qu.:0.1644  
##  Median :0.018437   Median : 0.000000   Median :0.2595   Median :0.2550  
##  Mean   :0.023783   Mean   :-0.002927   Mean   :0.3380   Mean   :0.3332  
##  3rd Qu.:0.033217   3rd Qu.: 0.001101   3rd Qu.:0.4878   3rd Qu.:0.4831  
##  Max.   :0.290177   Max.   : 0.098795   Max.   :0.9878   Max.   :0.9873  
## 
#Now we need to do some visualization of the indices
#load a US County Shapefile

#the texas county SF1 data is in texas.county10, it also has the county polygons
data("texas.county10")

#I only want the identifiers, so I get rid of the SF1 data
tx_sub<-texas.county10[, c(1:6)]
#I put the attribute table into a new object

#I merge the county attribute table to the segregation indices by county FIPS code. I am careful here not to re-sort the data, they need to be in the same order as the shapefile, and I want to keep all counties, even if they're missing the indices (which none are in this case)
tx_sub<-merge(x=tx_sub, y=county_dat, by.x="fips", by.y="cofips", all.x=T, sort=F)

Now, I map some of the indices:

#Next i plot the indices, using quantile breaks, and using a nice color ramp from ColorBrewer
cols<-brewer.pal(n=5, name="Reds")
spplot(obj=tx_sub, zcol="dissim_wb",at=quantile(tx_sub$dissim_wb, probs=c(0,.25, .5, .75, 1)), col.regions=cols , main="White-Black Dissimiliarity, 2010")

spplot(obj=tx_sub, zcol="TheileH",at=quantile(tx_sub$TheileH, probs=c(0,.25, .5, .75, 1)), col.regions=cols , main="Three-Race Theile H Index, 2010")

spplot(obj=tx_sub, zcol="gnsi",at=quantile(tx_sub$gnsi, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="General Neighborhood Sorting Index, 2010")

spplot(tx_sub, zcol="isolation",at=quantile(tx_sub$isolation, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="Hispanic Isolation Index, 2010")

spplot(tx_sub, zcol="interaction",at=quantile(tx_sub$interaction, probs=c(seq(0,1,length.out = 7)), na.rm=T), col.regions=brewer.pal(n=7, name="Reds") , main="Hispanic - White Interaction Index, 2010")

You could of course, export these or merge them into a individual level data file to include in a hierarchical model. I did this for my coauthors in this and this paper.