We saw last time how to use R to calculate various indices of segregation using data from the 2010 Census Summary File 1, and the 2010 ACS 5 year summary file. We will use the libraries acs
and seg
extensively, as well as the RQGIS
library to give us access to geoprocessing scripts from Qgis. I will also rely heavily on the tigris
package to download Census geographies directly into R, without having to download them seperately myself.
In the previous example, I used the nested GEOID variable that the Census provides in all of its summary data products. For that example, we used tracts nested within counties. This is easy because the tract GEOID variable has the county code nested within it. For example, the first tract in Bexar county, Texas is code 48029110100
, and the first five digits of this GEOID are the state and county fips code 48029
.
For other geographies, this is not as simple, as the GEOID variables are not nested in such a fashion. For example, Public Use Microdata areas (PUMAs) are nested within states, but cross county borders. So they do not have a 5 digit state/county FIPS code to nest tracts within.
In order to get around this problem, we can use the QGIS application to perform a geographic intersection between two spatial polygon layers that we want to use. This will allow us to merge the population data at the census tract level to a higher level, so we can calculate a segregation index.
Below, I show how to use R and Qgis to calculate segregation indices for PUMAs, school districts and core based statistical areas (CBSAs)
For this example, I will use the B03002 table from the 2015 5-year ACS summary file. This table contains counts of population by race and Hispanic ethnicity. I will extract census tracts within the state of Texas. These will be my smaller units that I will aggregate across, within the larger geographies to calculate the segregation measures.
library(rgdal)
library(tigris)
library(RColorBrewer)
library(RQGIS)
library(acs)
library(seg)
#Get 2010 ACS median household incomes for tracts in Texas
txtracts<-geo.make(state="TX",county="*", tract="*")
#ACS table B03002 is Hispanic ethnicity by race
race.tract<-acs.fetch(key=mykey, endyear=2014, span=5, geography=txtracts, table.number="B03002")
colnames(race.tract@estimate)
## [1] "B03002_001" "B03002_002" "B03002_003" "B03002_004" "B03002_005"
## [6] "B03002_006" "B03002_007" "B03002_008" "B03002_009" "B03002_010"
## [11] "B03002_011" "B03002_012" "B03002_013" "B03002_014" "B03002_015"
## [16] "B03002_016" "B03002_017" "B03002_018" "B03002_019" "B03002_020"
## [21] "B03002_021"
tx.trrace<-data.frame( nhwhite= race.tract@estimate[, "B03002_003"],
nhblack=race.tract@estimate[, "B03002_004"],
nhother=apply(race.tract@estimate[, c(5:11)], MARGIN = 1, sum ),
hispanic =race.tract@estimate[, "B03002_012"])
tx.trrace$ids<-paste(race.tract@geography$state,
sprintf("%03d", race.tract@geography$county),
sprintf("%05s", race.tract@geography$tract), sep="")
tx.trrace$cofips<-substr(tx.trrace$ids, 1,5)
head(tx.trrace)
## nhwhite nhblack nhother
## Census Tract 9501, Anderson County, Texas 3910 659 103
## Census Tract 9504.01, Anderson County, Texas 1959 2257 52
## Census Tract 9504.02, Anderson County, Texas 1753 2470 37
## Census Tract 9505, Anderson County, Texas 2267 932 105
## Census Tract 9506, Anderson County, Texas 3185 2138 214
## Census Tract 9507, Anderson County, Texas 556 1063 181
## hispanic ids cofips
## Census Tract 9501, Anderson County, Texas 486 48001950100 48001
## Census Tract 9504.01, Anderson County, Texas 1581 48001950401 48001
## Census Tract 9504.02, Anderson County, Texas 1337 48001950402 48001
## Census Tract 9505, Anderson County, Texas 1411 48001950500 48001
## Census Tract 9506, Anderson County, Texas 1038 48001950600 48001
## Census Tract 9507, Anderson County, Texas 880 48001950700 48001
Now, I will perform the calculations of segregation for the various geographies. All of these examples will follow the same pattern: * load higher level geographies from tigris
* Use QGIS to form the geographic intersection between the tracts and the higher level geographies. * Join the intersected data to the tracts using the tract GEOID * Calculate the segregation measures using a modification to the spseg()
function, so we can calculate segregation for each higher level geography separately * Merge the calculated segregation measures back to the higher level geographies and map them using spplot()
PUMAs are a very useful geography to work with, because we can get individual level data from the IPUMS on many topics, and we could include segregation as a higher level predictor in an analysis.
puma<-pumas(state = "TX")
tract<-tigris::tracts(state = "48", cb = FALSE, refresh=T)
myenv<-set_env()
## Warning: running command 'find /usr/local/Cellar/ -name 'QGIS.app'' had
## status 1
myenv
## $root
## [1] "/Applications/QGIS.app"
##
## $qgis_prefix_path
## [1] "/Applications/QGIS.app/Contents"
##
## $python_plugins
## [1] "/Applications/QGIS.app/Contents/Resources/python/plugins"
#union buffer and tracts
find_algorithms(search_term = "intersect", qgis_env = myenv)
## [1] "ERROR: Opening of authentication db FAILED"
## [2] "WARNING: Auth db query exec() FAILED"
## [3] "Intersection----------------------------------------->qgis:intersection"
## [4] "Line intersections----------------------------------->qgis:lineintersections"
## [5] "Fuzzy intersection (and)----------------------------->saga:fuzzyintersectionand"
## [6] "Intersect-------------------------------------------->saga:intersect"
## [7] "Line-polygon intersection---------------------------->saga:linepolygonintersection"
## [8] "Polygon self-intersection---------------------------->saga:polygonselfintersection"
## [9] "Polygon-line intersection---------------------------->saga:polygonlineintersection"
get_usage(alg="qgis:intersection", qgis_env = myenv, intern = T)
## [1] "ERROR: Opening of authentication db FAILED"
## [2] "WARNING: Auth db query exec() FAILED"
## [3] "ALGORITHM: Intersection"
## [4] "\tINPUT <ParameterVector>"
## [5] "\tINPUT2 <ParameterVector>"
## [6] "\tIGNORE_NULL <ParameterBoolean>"
## [7] "\tOUTPUT <OutputVector>"
## [8] ""
## [9] ""
## [10] ""
params <- get_args_man(alg = "qgis:intersection",
qgis_env = myenv)
params
## $INPUT
## [1] "None"
##
## $INPUT2
## [1] "None"
##
## $IGNORE_NULL
## [1] "False"
##
## $OUTPUT
## [1] "None"
wd<-"/Users/ozd504//Google Drive/classes/dem7263/class17/data"
params$INPUT <- puma
params$INPUT2<-tract
params$OUTPUT<-file.path(wd, "puma_tract.shp")
# path to the output shapefile
pumatract <- run_qgis(alg = "qgis:intersection",
params = params,
load_output = params$OUTPUT,
qgis_env = myenv)
## ERROR: Opening of authentication db FAILEDWARNING: Auth db query exec() FAILED
cols<-brewer.pal(n=12 , name="Set3")
cols<-sample(cols, size = length(unique(pumatract$PUMACE10)), replace = T)
plot(pumatract, col=cols[as.numeric(pumatract$PUMACE10)], border=grey(.9))
race.sp<-geo_join(pumatract,tx.trrace, "GEOID","ids")
head(race.sp@data)
## STATEFP10 PUMACE10 GEOID10
## 0 48 03601 4803601
## 1 48 03601 4803601
## 2 48 03601 4803601
## 3 48 03601 4803601
## 4 48 03601 4803601
## 5 48 03601 4803601
## NAMELSAD10 MTFCC10
## 0 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## 1 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## 2 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## 3 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## 4 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## 5 Brazos Valley COG (Outside Brazos County) & Milam County PUMA G6120
## FUNCSTAT10 ALAND10 AWATER10 INTPTLAT10 INTPTLON10 STATEFP
## 0 S 14147111500 202332263 +30.8493793 -096.4762632 48
## 1 S 14147111500 202332263 +30.8493793 -096.4762632 48
## 2 S 14147111500 202332263 +30.8493793 -096.4762632 48
## 3 S 14147111500 202332263 +30.8493793 -096.4762632 48
## 4 S 14147111500 202332263 +30.8493793 -096.4762632 48
## 5 S 14147111500 202332263 +30.8493793 -096.4762632 48
## COUNTYFP TRACTCE GEOID NAME NAMELSAD MTFCC FUNCSTAT
## 0 051 970400 48051970400 9704 Census Tract 9704 G5020 S
## 1 051 970500 48051970500 9705 Census Tract 9705 G5020 S
## 2 051 970100 48051970100 9701 Census Tract 9701 G5020 S
## 3 051 970300 48051970300 9703 Census Tract 9703 G5020 S
## 4 051 970200 48051970200 9702 Census Tract 9702 G5020 S
## 5 289 950100 48289950100 9501 Census Tract 9501 G5020 S
## ALAND AWATER INTPTLAT INTPTLON nhwhite nhblack nhother
## 0 538431560 8650811 +30.4627961 -096.4513889 1904 574 0
## 1 240096871 25958216 +30.3576068 -096.6084742 3023 524 200
## 2 254928575 3613845 +30.6293963 -096.6280153 942 100 107
## 3 26674773 265988 +30.5330133 -096.7009181 2554 713 78
## 4 646752996 7561628 +30.5083502 -096.7671764 3080 125 272
## 5 834389995 5216547 +31.4641407 -095.9191957 5175 622 112
## hispanic ids cofips
## 0 724 48051970400 48051
## 1 737 48051970500 48051
## 2 78 48051970100 48051
## 3 959 48051970300 48051
## 4 839 48051970200 48051
## 5 1640 48289950100 48289
dat<-race.sp #spatialpolygondataframe with the race data
cos<-unique(dat$PUMACE10) #unique county fips codes
spsegout<-function(dat, cos){
theil<-numeric(length(cos))
diss<-numeric(length(cos))
div<-numeric(length(cos))
isol<-numeric(length(cos))
inter<-numeric(length(cos))
pumace<-NULL
for (i in 1:length(cos)){
#Modify the two groups in the next line, this usees NH White and Hispanic
out<-spseg(dat[dat$PUMACE10==cos[i],],data =dat@data[dat$PUMACE10==cos[i],c("nhwhite", "hispanic")], method="all", useC=T, negative.rm = T)
div[i]<-out@r #Diversity
diss[i]<-out@d #Dissimilarity
theil[i]<-out@h #Theil Entrophy
isol[i]<-out@p[2,2] #Isolation for Hispanics
inter[i]<-out@p[1,2] #Interaction for Whites and Hispanics
pumace[i]<-as.character(cos[i])
}
list(theil=theil, dissim=diss, diversity=div,isolation=isol, interaction=inter,puma=pumace)
}
spsegs<-data.frame(spsegout(dat,cos))
head(spsegs)
## theil dissim diversity isolation interaction puma
## 1 0.0022004524 0.032573021 0.0027150884 0.2326321 0.2268293 03601
## 2 -0.0104645369 0.032606115 -0.0144918578 0.3538277 0.3471275 04501
## 3 -0.0005585957 0.011015954 -0.0007835573 0.2567259 0.2557366 04504
## 4 0.0004439682 0.025233774 0.0006152952 0.5097524 0.4987791 04625
## 5 0.0090911710 0.015208613 0.0118778626 0.8959877 0.8922094 04602
## 6 0.0001228756 0.002442282 0.0001572151 0.9128087 0.9126782 04618
puma2<-geo_join(puma, spsegs, "PUMACE10", "puma")
#Map the index
spplot(puma2, "interaction", at=seq(0,1, .2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic/Non Hispanic Interaction index by PUMAs within Texas")
spplot(puma2, "isolation", at=seq(0,1,.2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic Isolation index by PUMAs within Texas")
district<-school_districts(state = "TX", type="unified")
params <- get_args_man(alg = "qgis:intersection",
qgis_env = myenv)
params
## $INPUT
## [1] "None"
##
## $INPUT2
## [1] "None"
##
## $IGNORE_NULL
## [1] "False"
##
## $OUTPUT
## [1] "None"
params$INPUT <- district
params$INPUT2<-tract
params$OUTPUT<-file.path(wd, "schooldistrict_tract.shp")
# path to the output shapefile
sdtract <- run_qgis(alg = "qgis:intersection",
params = params,
load_output = params$OUTPUT,
qgis_env = myenv)
## ERROR: Opening of authentication db FAILEDWARNING: Auth db query exec() FAILED
cols<-brewer.pal(n=12 , name="Set3")
cols<-sample(cols, size = length(unique(sdtract$UNSDLEA)), replace = T)
plot(sdtract, col=cols[as.numeric(sdtract$UNSDLEA)], border=grey(.9))
race.sd<-geo_join(sdtract,tx.trrace,"geoid_2","ids" )
head(race.sd@data)
## STATEFP UNSDLEA GEOID NAME LSAD
## 0 48 32280 4832280 Nederland Independent School District 00
## 1 48 32280 4832280 Nederland Independent School District 00
## 2 48 32280 4832280 Nederland Independent School District 00
## 3 48 32280 4832280 Nederland Independent School District 00
## 4 48 32280 4832280 Nederland Independent School District 00
## 5 48 32280 4832280 Nederland Independent School District 00
## LOGRADE HIGRADE MTFCC SDTYP FUNCSTAT ALAND AWATER INTPTLAT
## 0 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## 1 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## 2 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## 3 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## 4 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## 5 PK 12 G5420 <NA> E 45411569 3622561 +29.9772516
## INTPTLON statefp_2 COUNTYFP TRACTCE geoid_2 name_2
## 0 -094.0054218 48 245 011001 48245011001 110.01
## 1 -094.0054218 48 245 980000 48245980000 9800
## 2 -094.0054218 48 245 007100 48245007100 71
## 3 -094.0054218 48 245 011002 48245011002 110.02
## 4 -094.0054218 48 245 011101 48245011101 111.01
## 5 -094.0054218 48 245 011203 48245011203 112.03
## NAMELSAD mtfcc_2 funcstat_2 aland_2 awater_2 intptlat_2
## 0 Census Tract 110.01 G5020 S 3683278 5424 +29.9587159
## 1 Census Tract 9800 G5020 S 4546184 20989 +29.9515811
## 2 Census Tract 71 G5020 S 14592830 642583 +29.9115605
## 3 Census Tract 110.02 G5020 S 2298990 0 +29.9740297
## 4 Census Tract 111.01 G5020 S 3586121 875 +29.9668309
## 5 Census Tract 112.03 G5020 S 3500203 130615 +29.9859820
## intptlon_2 nhwhite nhblack nhother hispanic ids cofips
## 0 -093.9940534 4042 159 578 709 48245011001 48245
## 1 -094.0206917 19 0 119 17 48245980000 48245
## 2 -094.0281336 2124 266 170 1325 48245007100 48245
## 3 -093.9818113 3123 60 114 349 48245011002 48245
## 4 -094.0059588 3951 43 166 551 48245011101 48245
## 5 -094.0500790 465 764 43 445 48245011203 48245
dat<-race.sd #spatialpolygondataframe with the race data
cos<-unique(dat$UNSDLEA) #unique school district codes
spsegout<-function(dat, cos){
theil<-numeric(length(cos))
diss<-numeric(length(cos))
div<-numeric(length(cos))
isol<-numeric(length(cos))
inter<-numeric(length(cos))
school<-NULL
for (i in 1:length(cos)){
#Modify the two groups in the next line, this usees NH White and Hispanic
out<-spseg(dat[dat$UNSDLEA==cos[i],],data =dat@data[dat$UNSDLEA==cos[i],c("nhwhite","hispanic")], method="all", useC=T, negative.rm=T)
div[i]<-out@r #Diversity
diss[i]<-out@d #Dissimilarity
theil[i]<-out@h #Theil Entrophy
isol[i]<-out@p[2,2] #Isolation for Hispanics
inter[i]<-out@p[1,2] #Interaction for Whites and Hispanics
school[i]<-as.character(cos[i])
}
list(theil=theil, dissim=diss, diversity=div,isolation=isol, interaction=inter,school=school)
}
spsegs<-data.frame(spsegout(dat,cos))
head(spsegs)
## theil dissim diversity isolation interaction school
## 1 1.984326e-03 3.915320e-03 2.659673e-03 0.1652343 0.1646716 32280
## 2 1.110223e-16 3.065399e-16 4.440892e-16 0.1301094 0.1301094 38490
## 3 1.136642e-03 3.504607e-02 1.559223e-03 0.4538503 0.4391095 28230
## 4 2.404674e-04 6.804121e-03 3.325919e-04 0.5628839 0.5626281 36390
## 5 1.813791e-03 3.071926e-02 2.466440e-03 0.3762909 0.3723087 21030
## 6 0.000000e+00 0.000000e+00 0.000000e+00 0.1403803 0.1403803 11040
sdseg<-geo_join(district, spsegs, "UNSDLEA", "school")
#Map the index
spplot(sdseg, "interaction", at=seq(0,1, .2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic/Non Hispanic Interaction index by School Districts within Texas")
spplot(sdseg, "isolation", at=seq(0,1, .2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic Isolation index by School Districts within Texas")
msa<-core_based_statistical_areas(cb=T)
names(msa)
## [1] "CSAFP" "CBSAFP" "AFFGEOID" "GEOID" "NAME" "LSAD"
## [7] "ALAND" "AWATER"
params <- get_args_man(alg = "qgis:intersection",
qgis_env = myenv)
params$INPUT <- msa
params$INPUT2<-tract
params$OUTPUT<-file.path(wd, "msa_tract.shp")
# path to the output shapefile
msatract <- run_qgis(alg = "qgis:intersection",
params = params,
load_output = params$OUTPUT,
qgis_env = myenv)
## ERROR: Opening of authentication db FAILEDWARNING: Auth db query exec() FAILED
msatract<-msatract[msatract$STATEFP=="48",]
cols<-brewer.pal(n=12 , name="Set3")
cols<-sample(cols, size = length(unique(msatract$NAME)), replace = T)
plot(msatract, col=cols[as.numeric(msatract$NAME)], border=grey(.9))
#spplot(msatract, "NAME", col.regions=brewer.pal(n=12, name = "Set3"), auto.key=FALSE)
race.msa<-geo_join(msatract,tx.trrace,"geoid_2","ids" )
head(race.msa@data)
## CSAFP CBSAFP AFFGEOID GEOID NAME LSAD ALAND AWATER
## 0 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## 1 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## 2 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## 3 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## 4 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## 5 <NA> 45020 310M200US45020 45020 Sweetwater, TX M2 2362061740 5052158
## STATEFP COUNTYFP TRACTCE geoid_2 name_2 NAMELSAD MTFCC
## 0 48 441 013600 48441013600 136 Census Tract 136 G5020
## 1 48 151 950300 48151950300 9503 Census Tract 9503 G5020
## 2 48 335 950400 48335950400 9504 Census Tract 9504 G5020
## 3 48 353 950100 48353950100 9501 Census Tract 9501 G5020
## 4 48 353 950500 48353950500 9505 Census Tract 9505 G5020
## 5 48 353 950200 48353950200 9502 Census Tract 9502 G5020
## FUNCSTAT aland_2 awater_2 INTPTLAT INTPTLON nhwhite nhblack
## 0 S 701675542 422010 +32.3997269 -100.0143087 4172 6
## 1 S 1308923420 4090575 +32.6420058 -100.3982035 1332 37
## 2 S 1982961660 7252257 +32.2664096 -100.9049589 1174 49
## 3 S 505340985 3533009 +32.4442471 -100.3446966 1698 35
## 4 S 1834440353 1516728 +32.2671635 -100.4183080 1772 0
## 5 S 3300907 2421 +32.4828094 -100.3972167 2921 179
## nhother hispanic ids cofips
## 0 253 880 48441013600 48441
## 1 8 319 48151950300 48151
## 2 156 473 48335950400 48335
## 3 30 358 48353950100 48353
## 4 97 811 48353950500 48353
## 5 98 1100 48353950200 48353
dat<-race.msa #spatialpolygondataframe with the race data
cos<-unique(dat$CBSAFP) #unique cbsa codes
spsegout<-function(dat, cos){
theil<-numeric(length(cos))
diss<-numeric(length(cos))
div<-numeric(length(cos))
isol<-numeric(length(cos))
inter<-numeric(length(cos))
cbsa<-NULL
for (i in 1:length(cos)){
#Modify the two groups in the next line, this usees NH White and Hispanic
out<-spseg(dat[dat$CBSAFP==cos[i],],data =dat@data[dat$CBSAFP==cos[i],c("nhwhite","hispanic")], method="all", useC=T, negative.rm=T)
div[i]<-out@r #Diversity
diss[i]<-out@d #Dissimilarity
theil[i]<-out@h #Theil Entrophy
isol[i]<-out@p[2,2] #Isolation for Hispanics
inter[i]<-out@p[1,2] #Interaction for Whites and Hispanics
cbsa[i]<-as.character(cos[i])
}
list(theil=theil, dissim=diss, diversity=div,isolation=isol, interaction=inter,cbsa=cbsa)
}
spsegs<-data.frame(spsegout(dat,cos))
head(spsegs)
## theil dissim diversity isolation interaction cbsa
## 1 -0.004304018 0.06677058 -0.006774552 0.2571533 0.23951264 45020
## 2 -0.009780796 0.02421253 -0.013079487 0.0904850 0.08868336 45500
## 3 0.013846127 0.04965099 0.018996401 0.6436788 0.62951323 18580
## 4 -0.023894987 0.04840618 -0.033088730 0.3163348 0.30051690 34420
## 5 0.000210398 0.01215022 0.000213342 0.1409942 0.13900592 11980
## 6 0.002167641 0.06438760 0.002958120 0.4640171 0.44042268 29500
msaseg<-geo_join(msa, spsegs, "CBSAFP", "cbsa", how="inner")
#Map the index
spplot(msaseg, "interaction", at=seq(0,1, .2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic/Non Hispanic Interaction index by CBSAs within Texas")
spplot(msaseg, "isolation", at=seq(0,1, .2), col.regions=brewer.pal(n=5, "Blues"), main="Hispanic Isolation index by CBSAs within Texas")
So, often times we calculate these indices so we can use them in another application. So, below, I load data from the 2014 CDC Behavioral Risk Factor Surveillance System (BRFSS) SMART City data. Link, which includes a geographic identifier for each respondent, which indicates which metropolitan statistical area (MSA) they live in. We can join this to the segregation measures we calculated for the core based statistical areas (basically MSAs) and use it in a model.
Below, I test whether Hispanics living in areas with high hispanic isolation are more likely to
library(car)
load("/Users/ozd504/Google Drive//classes/dem7283/class17/data/brfss_14.Rdata")
#The names in the data are very ugly, so I make them less ugly
nams<-names(brfss14)
head(nams, n=10)
## [1] "dispcode" "genhlth" "physhlth" "menthlth" "poorhlth" "hlthpln1"
## [7] "persdoc2" "medcost" "checkup1" "exerany2"
#we see some names are lower case, some are upper and some have a little _ in the first position. This is a nightmare.
set.seed(1234)
newnames<-gsub(pattern = "x_",replacement = "",x = nams)
names(brfss14)<-tolower(newnames)
#samps<-sample(1:nrow(brfss14), size = 10000, replace = F)
#brfss14<-brfss14[samps,]
#Poor or fair self rated health
#brfss14$badhealth<-ifelse(brfss14$genhlth %in% c(4,5),1,0)
brfss14$badhealth<-recode(brfss14$genhlth, recodes="4:5=1; 1:3=0; else=NA")
#race/ethnicity
brfss14$black<-recode(brfss14$racegr3, recodes="2=1; 9=NA; else=0")
brfss14$white<-recode(brfss14$racegr3, recodes="1=1; 9=NA; else=0")
brfss14$other<-recode(brfss14$racegr3, recodes="3:4=1; 9=NA; else=0")
brfss14$hispanic<-recode(brfss14$racegr3, recodes="5=1; 9=NA; else=0")
brfss14$race_eth<-recode(brfss14$racegr3, recodes="1='nhwhite'; 2='nh black'; 3:4='nh other'; 5='hispanic'; else=NA", as.factor.result = T)
brfss14$race_eth<-relevel(brfss14$race_eth, ref = "nhwhite")
#insurance
brfss14$ins<-ifelse(brfss14$hlthpln1==1,1,0)
#income grouping
brfss14$inc<-ifelse(brfss14$incomg==9, NA, brfss14$incomg)
#education level
brfss14$educ<-recode(brfss14$educa, recodes="1:3='0LTHS'; 4='2hsgrad';
5='3somecol'; 6='4colgrad';9=NA", as.factor.result=T)
brfss14$educ<-relevel(brfss14$educ, ref='0LTHS')
#employment
brfss14$employ<-recode(brfss14$employ, recodes="1:2='Employed'; 2:6='nilf';
7='retired'; 8='unable'; else=NA", as.factor.result=T)
brfss14$employ<-relevel(brfss14$employ, ref='Employed')
#marital status
brfss14$marst<-recode(brfss14$marital, recodes="1='married'; 2='divorced'; 3='widowed';
4='separated'; 5='nm';6='cohab'; else=NA", as.factor.result=T)
brfss14$marst<-relevel(brfss14$marst, ref='married')
#Age cut into intervals
brfss14$agec<-cut(brfss14$age80, breaks=c(0,24,39,59,79), include.lowest = T)
#Health behaviors
#insurance
brfss14$ins<-recode(brfss14$hlthpln1, recodes = "1=1; 2=0; else=NA" )
#smoking currently
brfss14$smoke<-recode(brfss14$smoker3, recodes="1:2='Current'; 3='Former';4='NeverSmoked'; else=NA", as.factor.result=T)
brfss14$smoke<-relevel(brfss14$smoke, ref = "NeverSmoked")
#bmi
brfss14$bmi<-ifelse(is.na(brfss14$bmi5)==T, NA, brfss14$bmi5/100)
txbrfss<-brfss14[grep("TX", brfss14$mmsaname),]
txbrfss<-merge(txbrfss, spsegs, by.x="mmsa", by.y="cbsa")
table(as.character(txbrfss$mmsaname))
##
## Austin-Round Rock, TX, Metropolitan Statistical Area
## 2261
## College Station-Bryan, TX, Metropolitan Statistical Area
## 577
## Corpus Christi, TX, Metropolitan Statistical Area
## 623
## El Paso, TX, Metropolitan Statistical Area
## 723
## Houston-The Woodlands-Sugar Land, TX, Metropolitan Statistical Area
## 2164
## San Antonio-New Braunfels, TX, Metropolitan Statistical Area
## 2283
## Wichita Falls, TX, Metropolitan Statistical Area
## 547
library(lme4)
bhtx<-glmer(badhealth~agec+race_eth+educ+(1|mmsa),
txbrfss, family=binomial, weights = mmsawt/mean(txbrfss$mmsawt, na.rm=T),
control=glmerControl(optCtrl=list(maxfun=2e4), optimizer="bobyqa"))
arm::display(bhtx, detail=T)
## glmer(formula = badhealth ~ agec + race_eth + educ + (1 | mmsa),
## data = txbrfss, family = binomial, control = glmerControl(optCtrl = list(maxfun = 20000),
## optimizer = "bobyqa"), weights = mmsawt/mean(txbrfss$mmsawt,
## na.rm = T))
## coef.est coef.se z value Pr(>|z|)
## (Intercept) -1.74 0.15 -11.29 0.00
## agec(24,39] 0.49 0.12 4.07 0.00
## agec(39,59] 1.13 0.11 9.91 0.00
## agec(59,79] 1.48 0.12 11.92 0.00
## race_ethhispanic 0.19 0.09 2.24 0.03
## race_ethnh black 0.42 0.10 4.17 0.00
## race_ethnh other -0.55 0.19 -2.90 0.00
## educ2hsgrad -0.71 0.09 -8.08 0.00
## educ3somecol -0.88 0.09 -9.99 0.00
## educ4colgrad -1.99 0.13 -15.56 0.00
##
## Error terms:
## Groups Name Std.Dev.
## mmsa (Intercept) 0.14
## Residual 1.00
## ---
## number of obs: 7805, groups: mmsa, 7
## AIC = 6418.2, DIC = 7901.8
## deviance = 7149.0
bhtx2<-glmer(badhealth~agec+race_eth+educ+race_eth*scale(isolation)+(1|mmsa),
txbrfss, family=binomial, weights = mmsawt/mean(txbrfss$mmsawt, na.rm=T),
control=glmerControl(optCtrl=list(maxfun=2e4), optimizer="bobyqa"))
arm::display(bhtx2, detail=T)
## glmer(formula = badhealth ~ agec + race_eth + educ + race_eth *
## scale(isolation) + (1 | mmsa), data = txbrfss, family = binomial,
## control = glmerControl(optCtrl = list(maxfun = 20000), optimizer = "bobyqa"),
## weights = mmsawt/mean(txbrfss$mmsawt, na.rm = T))
## coef.est coef.se z value Pr(>|z|)
## (Intercept) -1.76 0.15 -12.06 0.00
## agec(24,39] 0.50 0.12 4.11 0.00
## agec(39,59] 1.14 0.11 9.92 0.00
## agec(59,79] 1.48 0.13 11.81 0.00
## race_ethhispanic 0.13 0.09 1.54 0.12
## race_ethnh black 0.43 0.10 4.17 0.00
## race_ethnh other -0.64 0.20 -3.20 0.00
## educ2hsgrad -0.73 0.09 -8.23 0.00
## educ3somecol -0.91 0.09 -10.21 0.00
## educ4colgrad -2.01 0.13 -15.67 0.00
## scale(isolation) 0.01 0.10 0.13 0.90
## race_ethhispanic:scale(isolation) 0.21 0.12 1.79 0.07
## race_ethnh black:scale(isolation) -0.05 0.20 -0.27 0.79
## race_ethnh other:scale(isolation) 0.87 0.32 2.76 0.01
##
## Error terms:
## Groups Name Std.Dev.
## mmsa (Intercept) 0.05
## Residual 1.00
## ---
## number of obs: 7805, groups: mmsa, 7
## AIC = 6405.1, DIC = 7911.4
## deviance = 7143.2
anova(bhtx, bhtx2)
## Data: txbrfss
## Models:
## bhtx: badhealth ~ agec + race_eth + educ + (1 | mmsa)
## bhtx2: badhealth ~ agec + race_eth + educ + race_eth * scale(isolation) +
## bhtx2: (1 | mmsa)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## bhtx 11 6418.2 6494.7 -3198.1 6396.2
## bhtx2 15 6405.1 6509.5 -3187.5 6375.1 21.089 4 0.0003041 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1