In this example, I will show how to fit a multi-level model that includes a predictor at the macro level. I will also consider the cross-level interaction effect, where we are interested in contextualizing the effect of an individual level predictor within the context of the macro level predictor. The data we use this time merges data from the 2012 CDC Behavioral Risk Factor Surveillance System (BRFSS) SMART county data and the 2010 American Community Survey 5-year estimates at the county level. Our outcome of interest is a person’s obesity status, measured using the BRFSS’s BMI variable, and using the cutoff rule of obese weight is a BMI greater than 30.

Again, we are assuming a set of propositions where we have an individual level outcome, and predictor variables at both the micro and macro levels. This can be shown in several different ways.

Multi level propositions Figures from Kawachi and Berkman (2003)

The first case is a macro to micro proposition, which may be exemplified by a statement such as: “Individuals in areas with high rates of poverty are more likely to be obese”.

Whereas the second frame illustrates a more specific special case, where there is a macro level effect, net of the individual level predictor, and may be stated “Individuals who are racial/ethnic minorities are more likely to be obese, compared to Non-Hispanic whites, and living in areas of high poverty also increase the risk of obesity”.

The last panel illustrates what is known as a cross level interaction, or a macro-micro interaction. This is where the relationship between x and y is dependent on Z. This leads to the statement “Individuals who are racial/ethnic minorities, living in areas with high poverty rates are at higher risk of obesity”.

First we load our data and libraries

library(lme4)
## Loading required package: Matrix
library(arm)
## Loading required package: MASS
## 
## arm (Version 1.10-1, built: 2018-4-12)
## Working directory is C:/Users/ozd504/Google Drive/classes/dem7473/code
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:arm':
## 
##     logit
library(haven)
library(tidycensus)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#load brfss
brfss_12<-read_xpt("~/Google Drive/classes/dem7473/data/CNTY12.xpt")
nams<-names(brfss_12)
newnames<-gsub("_", "", nams)
names(brfss_12)<-make.names(tolower(newnames), unique = T)

brfss_12$statefip<-sprintf("%02d", brfss_12$state )
brfss_12$cofip<-sprintf("%03d", brfss_12$cnty )
brfss_12$cofips<-paste(brfss_12$statefip, brfss_12$cofip, sep="")

#insurance
brfss_12$ins<-ifelse(brfss_12$hlthpln1==1,1,0)
#smoking currently
brfss_12$smoke<-Recode(brfss_12$smoker3, recodes="1:2=1; 3:4=0; else=NA")
#bmi
brfss_12$bmi<-ifelse(is.na(brfss_12$bmi5)==T, NA, brfss_12$bmi5/100)
#poor or fair health
brfss_12$badhealth<-ifelse(brfss_12$genhlth %in% c(4,5),1,0)

#race
brfss_12$black<-Recode(brfss_12$racegr2, recodes="2=1; 9=NA; else=0", as.factor=T)
brfss_12$white<-Recode(brfss_12$racegr2, recodes="1=1; 9=NA; else=0", as.factor=T)
brfss_12$other<-Recode(brfss_12$racegr2, recodes="3:4=1; 9=NA; else=0", as.factor=T)
brfss_12$hispanic<-Recode(brfss_12$racegr2, recodes="5=1; 9=NA; else=0", as.factor=T)

#have a personal doctor?
brfss_12$doc<-Recode(brfss_12$persdoc2, recodes="1:2=1; 3=0; else=NA", as.factor=F)

#needed care in last year but couldn't get it because of cost
brfss_12$medacc<-Recode(brfss_12$medcost, recodes="1=1;2=0;else=NA")

#education level
brfss_12$lths<-Recode(brfss_12$educa, recodes="1:3=1;9=NA; else=0", as.factor=F)
brfss_12$coll<-Recode(brfss_12$educa, recodes="5:6=1;9=NA; else=0", as.factor=F)

#employment
brfss_12$employ<-Recode(brfss_12$employ, recodes="1:2='Employed'; 2:6='nilf'; 7='retired'; 8='unable'; else=NA", as.factor=T)
brfss_12$employ<-relevel(brfss_12$employ, ref='Employed')

#marital status
brfss_12$marst<-Recode(brfss_12$marital, recodes="1='married'; 2='divorced'; 3='widowed'; 4='separated'; 5='nm';6='cohab'; else=NA", as.factor=T)
brfss_12$marst<-relevel(brfss_12$marst, ref='married')

#income
brfss_12$inc<-as.factor(ifelse(brfss_12$incomg==9, NA, brfss_12$incomg))

#Age cut into intervals
brfss_12$agec<-cut(brfss_12$age, breaks=c(0,24,39,59,79,99))

brfss_12$agez<-as.numeric(scale(brfss_12$age, scale =T,center=T))

brfss_12$bmiz<-as.numeric(scale(brfss_12$bmi, scale = T , center=T))

brfss_12$male<-ifelse(brfss_12$sex==1, 1, 0)
brfss_12<-brfss_12%>%
  #select(!mrace)%>%
  select(cofips,male, agec, agez, bmi, bmiz, black, hispanic, other,  marst, employ, lths, coll, badhealth)%>%
  filter(complete.cases(.))

Now we will begin fitting the multilevel regression model with the county that the person lives in being the higher level.

Higher level (Macro) variables

As a rule, when using ANY continuous predictor, and Especially continuous predictors at the higher level, you should always standardize (z-score) them. This is done for 2 reasons, first is that it makes the computer happy, meaning that, if the scale of all predictor variables is on a similar scale as the variance components of the model, the algorithms (Laplace approximation) will converge faster. Second is that scaling your higher level variables makes the interpretation of the effect of the higher level variable easier.

You should always standardize your macro level variables prior to merging them back to the individual level data.

To measure macro level variables, I will include some Census variables from the ACS 2011 5 Year estimates load in ACS data from tidycensus. The first set of variables includes information on the economic conditions of the county, specifically poverty and unemployment.

usacs<-get_acs(geography = "county", year = 2011,
                variables=c( "DP05_0001E", "DP03_0009P", "DP03_0062E", "DP03_0119PE",
                             "DP05_0001E","DP02_0009PE","DP02_0008PE", "DP02_0040E","DP02_0038E",
                             "DP02_0066PE","DP02_0067PE","DP02_0080PE","DP02_0092PE",
                             "DP03_0005PE","DP03_0028PE","DP03_0062E","DP03_0099PE","DP03_0101PE",
                             "DP03_0119PE","DP04_0046PE","DP05_0072PE","DP05_0073PE",
                             "DP05_0066PE","DP05_0033","DP05_0032", "DP05_0072PE", "DP02_0113PE",  "DP04_0003PE") ,
                summary_var = "B01001_001",
                geometry = F, output = "wide")
## Getting data from the 2007-2011 5-year ACS
## Using the ACS Data Profile
## Using the ACS Data Profile
usacs<-usacs%>%
  mutate(totpop= DP05_0001E, fertrate = DP02_0040E,pwhite=DP05_0072PE/100, nwhite=DP05_0032E,nblack=DP05_0033E,
         pblack=DP05_0073PE/100 , phisp=DP05_0066PE/100, pfemhh=DP02_0008PE/100,
         phsormore=DP02_0066PE/100,punemp=DP03_0009PE/100, medhhinc=DP03_0062E,
         ppov=DP03_0119PE/100, pforn=DP02_0092PE/100,plep=DP02_0113PE/100,  pvacant=DP04_0003PE/100)%>%
  select(GEOID,NAME, totpop, pwhite, pblack, phisp, pfemhh, phsormore, punemp, medhhinc, ppov, pforn,plep, pvacant)


head(usacs)

I also have segregation measures at the county level that I made here, I will merge these in too.

load(url("https://github.com/coreysparks/data/blob/master/segregation_county.Rdata?raw=true"))

#Merge files by county FIPS codes
joindata<-merge(usacs, county_dat, by.x="GEOID", by.y="cofips", all.x=T)

Clean and standardize contextual data

myscale<-function(x){as.numeric(scale(x))}
joindata<-joindata %>%
  mutate_at(c("ppov","punemp", "pblack","phisp", "pfemhh", "phsormore", "medhhinc", "pvacant", "dissim_wb", "isolation_b"),myscale)

joindata<-joindata%>%filter(complete.cases(.))
head(joindata)

Now, I merge the data back to the individual level data:

merged<-merge(x=brfss_12, y=joindata, by.x="cofips", by.y="GEOID")
merged<-merged%>%
  filter(complete.cases(.))

Models

Here I fit a binomial model (binary logistic regression) for obesity status (1= obese, 0=not obese) First a simple random intercept model.

I subset the data to only have a few states, this is purely for faster computer time:

#California,Illinois,  Massachusetts, New York, Ohio, Texas
#merged<-merged[merged$statefip%in%c("06", "25","17", "36", "39","48"),]

Here I fit the random intercept logistic regression:

Model 1 - individual level effects

fit.mix.bin<-glmer(I(bmi>=30)~male+agez+lths+coll+black+hispanic+other+(1|cofips),
                   family=binomial, data=merged, 
                   control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
library(MuMIn)
r.squaredGLMM(fit.mix.bin)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help
## page.
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
##                    R2m        R2c
## theoretical 0.01959797 0.03106952
## delta       0.01288166 0.02042186
cor(fitted(fit.mix.bin),fit.mix.bin@frame$`I(bmi >= 30)`)^2
## [1] 0.02181931
#Tests of coefficitns
display(fit.mix.bin, detail=T, digits=4)
## glmer(formula = I(bmi >= 30) ~ male + agez + lths + coll + black + 
##     hispanic + other + (1 | cofips), data = merged, family = binomial, 
##     control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se  z value  Pr(>|z|)
## (Intercept)  -0.9784   0.0179 -54.6339   0.0000
## male          0.0684   0.0102   6.6792   0.0000
## agez          0.0151   0.0053   2.8404   0.0045
## lths          0.1472   0.0204   7.1981   0.0000
## coll         -0.2308   0.0116 -19.8379   0.0000
## black1        0.6689   0.0175  38.2151   0.0000
## hispanic1     0.2449   0.0203  12.0421   0.0000
## other1       -0.0561   0.0219  -2.5679   0.0102
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.1974  
##  Residual             1.0000  
## ---
## number of obs: 206037, groups: cofips, 209
## AIC = 234913, DIC = 233704.3
## deviance = 234299.6
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin, method="Wald"))
##                 2.5 %    97.5 %
## .sig01             NA        NA
## (Intercept) 0.3629417 0.3893359
## male        1.0495139 1.0925001
## agez        1.0047008 1.0258995
## lths        1.1130302 1.2058978
## coll        0.7759953 0.8122053
## black1      1.8863078 2.0202818
## hispanic1   1.2275581 1.3294182
## other1      0.9058014 0.9868034

Model 2

Here is a model that only considers the macro level variable on the outcome:

fit.mix.bin.mac<-glmer(I(bmi>=30)~ppov+pblack+phsormore+isolation_b+(1|cofips),nAGQ=10,
                       family=binomial, data=merged, 
                       control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
r.squaredGLMM(fit.mix.bin.mac)
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
##                     R2m        R2c
## theoretical 0.004055100 0.01542205
## delta       0.002650732 0.01008106
display(fit.mix.bin.mac, detail=T, digits=4)
## glmer(formula = I(bmi >= 30) ~ ppov + pblack + phsormore + isolation_b + 
##     (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa", 
##     optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
##             coef.est coef.se  z value  Pr(>|z|)
## (Intercept)  -0.9318   0.0232 -40.2147   0.0000
## ppov          0.1078   0.0470   2.2922   0.0219
## pblack        0.0740   0.0445   1.6649   0.0959
## phsormore    -0.0515   0.0334  -1.5414   0.1232
## isolation_b  -0.0106   0.0317  -0.3350   0.7377
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.1949  
##  Residual             1.0000  
## ---
## number of obs: 206037, groups: cofips, 209
## AIC = 237356, DIC = 236159.5
## deviance = 236752.0
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin.mac, method="Wald"))
##                 2.5 %    97.5 %
## .sig01             NA        NA
## (Intercept) 0.3763670 0.4121511
## ppov        1.0157449 1.2213663
## pblack      0.9869636 1.1749159
## phsormore   0.8896213 1.0140800
## isolation_b 0.9297515 1.0529285

So, in counties that have higher poverty rates, the risk of obesity is higher, the effect of % black and % with more than high school are marginally significant, while there is not effect of living in more highly segregated counties.

Model 3

Then I do a multi-level model where I include the poverty rate and the segregation index at the county level:

fit.mix.bin.ml<-glmer(I(bmi>=30)~male+agez+lths+coll+black+hispanic+other+ppov+pblack+phsormore+isolation_b+(1|cofips),nAGQ=10,
                      family=binomial, data=merged,
                      control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))

r.squaredGLMM(fit.mix.bin.ml)
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
##                    R2m        R2c
## theoretical 0.02092099 0.03155204
## delta       0.01375364 0.02074258
display(fit.mix.bin.ml, detail=T, digits=4)
## glmer(formula = I(bmi >= 30) ~ male + agez + lths + coll + black + 
##     hispanic + other + ppov + pblack + phsormore + isolation_b + 
##     (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa", 
##     optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
##             coef.est coef.se  z value  Pr(>|z|)
## (Intercept)  -0.9201   0.0250 -36.8760   0.0000
## male          0.0685   0.0102   6.6857   0.0000
## agez          0.0149   0.0053   2.7977   0.0051
## lths          0.1460   0.0205   7.1390   0.0000
## coll         -0.2307   0.0116 -19.8262   0.0000
## black1        0.6681   0.0177  37.6847   0.0000
## hispanic1     0.2379   0.0205  11.6160   0.0000
## other1       -0.0588   0.0219  -2.6897   0.0072
## ppov          0.0904   0.0461   1.9612   0.0499
## pblack        0.0139   0.0436   0.3186   0.7500
## phsormore    -0.0136   0.0328  -0.4154   0.6778
## isolation_b  -0.0303   0.0311  -0.9730   0.3305
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.1900  
##  Residual             1.0000  
## ---
## number of obs: 206037, groups: cofips, 209
## AIC = 234908, DIC = 233722.2
## deviance = 234301.9
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin.ml, method="Wald"))
##                 2.5 %    97.5 %
## .sig01             NA        NA
## (Intercept) 0.3794464 0.4184360
## male        1.0495976 1.0925990
## agez        1.0044730 1.0256707
## lths        1.1117420 1.2045474
## coll        0.7760454 0.8122682
## black1      1.8839944 2.0195889
## hispanic1   1.2186961 1.3205820
## other1      0.9033318 0.9841700
## ppov        1.0000571 1.1980130
## pblack      0.9308806 1.1045459
## phsormore   0.9250662 1.0519503
## isolation_b 0.9128140 1.0311730
#Compare this to the first model, to see if the higher level variables contribute to the model
anova(fit.mix.bin, fit.mix.bin.ml)

These results show, that net of individual level variables, the county level predictors tell us something about obesity risk, specifically in counties that have higher poverty rates, the odds of being obese is increased, albeit marginally.

Model 4

Finally, I fit the Cross level interaction model between poverty and racial, using the individual level race variables as my micro level variable of interest. This would answer the question, “Do racial/ethnic minorities living in High poverty areas face added risk of being obese” or “Do racial/ethnic minorities living in highly segregated area face added risk of being obese”

To do this in your model, you simply include a term of individualvar*highervar term:

Poverty*Race This model will test whether racial/ethnic minorities living in areas of high poverty are more likely to be obese

See: Black, Jennifer L. and James Macinko. 2008. “Neighborhoods and Obesity.” Nutrition Reviews 66(1):2-20.

Boardman, Jason D., Jarron M. Saint Onge, Richard G. Rogers, and Justin T. Denney. 2005. “Race Differentials in Obesity: The Impact of Place.” Journal of Health and Social Behavior 46(3):229-43.

fit.mix.bin.cl<-glmer(I(bmi>=30)~male+agez+lths+coll+(black+hispanic+other)*ppov+ (1|cofips),nAGQ=10,
                      family=binomial, data=merged,
                      control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
display(fit.mix.bin.cl, detail=T)
## glmer(formula = I(bmi >= 30) ~ male + agez + lths + coll + (black + 
##     hispanic + other) * ppov + (1 | cofips), data = merged, family = binomial, 
##     control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 4e+05)), 
##     nAGQ = 10)
##                coef.est coef.se z value Pr(>|z|)
## (Intercept)     -0.97     0.02  -45.49    0.00  
## male             0.07     0.01    6.77    0.00  
## agez             0.01     0.01    2.78    0.01  
## lths             0.14     0.02    7.03    0.00  
## coll            -0.23     0.01  -19.81    0.00  
## black1           0.69     0.02   37.08    0.00  
## hispanic1        0.27     0.02   12.16    0.00  
## other1           0.02     0.03    0.85    0.40  
## ppov             0.04     0.03    1.25    0.21  
## black1:ppov      0.10     0.03    2.95    0.00  
## hispanic1:ppov   0.11     0.03    3.08    0.00  
## other1:ppov      0.20     0.04    4.94    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.19    
##  Residual             1.00    
## ---
## number of obs: 206037, groups: cofips, 209
## AIC = 234877, DIC = 233691.4
## deviance = 234271.3
exp(confint(fit.mix.bin.cl, method="Wald"))
##                    2.5 %    97.5 %
## .sig01                NA        NA
## (Intercept)    0.3645351 0.3962304
## male           1.0505563 1.0936053
## agez           1.0043713 1.0255746
## lths           1.1094350 1.2021077
## coll           0.7761218 0.8123550
## black1         1.9213772 2.0666419
## hispanic1      1.2546821 1.3689423
## other1         0.9702818 1.0788855
## ppov           0.9800859 1.0953950
## black1:ppov    1.0327110 1.1725550
## hispanic1:ppov 1.0394226 1.1903933
## other1:ppov    1.1312281 1.3303615
anova(fit.mix.bin.ml, fit.mix.bin.cl)

Which shows that all race/ethnic minorities face a disadvantage in terms of obesity risk in counties that have higher poverty rates.

Segregation * Race

See: Chang, V. W., A. E. Hillier, and N. K. Mehta. 2009. “Neighborhood Racial Isolation, Disorder and Obesity.” Social Forces 87(4):2063-92.

Kershaw, K. N., S. S. Albrecht, and M. R. Carnethon. 2013. “Racial and Ethnic Residential Segregation, the Neighborhood Socioeconomic Environment, and Obesity Among Blacks and Mexican Americans.” American Journal of Epidemiology 177(4):299-309.

fit.mix.bin.cl2<-glmer(I(bmi>=30)~male+agez+lths+coll+(black+hispanic+other)+black*isolation_b+ (1|cofips), nAGQ=10,
                      family=binomial, data=merged,
                      control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
display(fit.mix.bin.cl2, detail=T)
## glmer(formula = I(bmi >= 30) ~ male + agez + lths + coll + (black + 
##     hispanic + other) + black * isolation_b + (1 | cofips), data = merged, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
##                    coef.est coef.se z value Pr(>|z|)
## (Intercept)         -0.98     0.02  -51.93    0.00  
## male                 0.07     0.01    6.82    0.00  
## agez                 0.01     0.01    2.65    0.01  
## lths                 0.15     0.02    7.18    0.00  
## coll                -0.23     0.01  -19.72    0.00  
## black1               0.51     0.03   16.83    0.00  
## hispanic1            0.24     0.02   11.99    0.00  
## other1              -0.06     0.02   -2.59    0.01  
## isolation_b         -0.01     0.01   -1.13    0.26  
## black1:isolation_b   0.09     0.01    6.36    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.20    
##  Residual             1.00    
## ---
## number of obs: 206037, groups: cofips, 209
## AIC = 234876, DIC = 233663.7
## deviance = 234258.8
exp(confint(fit.mix.bin.cl2, method="Wald"))
##                        2.5 %    97.5 %
## .sig01                    NA        NA
## (Intercept)        0.3635170 0.3912834
## male               1.0510656 1.0941439
## agez               1.0036789 1.0248724
## lths               1.1127368 1.2056284
## coll               0.7769819 0.8132588
## black1             1.5731043 1.7726847
## hispanic1          1.2263674 1.3281594
## other1             0.9053418 0.9863467
## isolation_b        0.9627714 1.0102660
## black1:isolation_b 1.0644780 1.1254128
anova(fit.mix.bin.ml, fit.mix.bin.cl2)

The segregation interaction model shows that, only for blacks, if they are living in a county that is highly segregated, this increases their risk of being obese, compared to whites in a county of average segregation. This model shows improvement compared to the multilevel model.

Fitted Values

Often times in these models, it can be useful to highlight specific cases to tell a story

Here I use the highest, an average and the lowest intercept counties to illustrate the differences.

which.max(ranef(fit.mix.bin.cl)$cofips$`(Intercept)`)
## [1] 78
which(ranef(fit.mix.bin.cl)$cofips$`(Intercept)`> -.01 &ranef(fit.mix.bin.cl)$cofips$`(Intercept)` <.01)
##  [1]  30  70 111 129 134 138 171 179 193 205
which.min(ranef(fit.mix.bin.cl)$cofips$`(Intercept)`)
## [1] 137
rownames(ranef(fit.mix.bin.cl)$cofips)[c(78, 30, 137)]
## [1] "26099" "09009" "36047"

We can get fitted values out of a model by using the predict() method for (lg)mer model fits. Here, I construct the fixed effect matrix for a few people from New Haven, CT 09009, Maccomb county, Michigan 26009 and Kings County, NY 36047.

#Get my different types of people from the fixed effects
newdat1<-expand.grid(male = c(0,1),agez=0, lths=0, coll=0, 
                     black=levels(merged$black),
                     hispanic=levels(merged$hispanic),
                     other=levels(merged$other))
#remove some nonsensicals, people who are multiple ethnicities, etc.
newdat1<-newdat1[c( -7:-8, -15:-16),]

#I need 3 counties worth of people
newdat1<-rbind(newdat1, newdat1, newdat1)
#Put in the county identifier
newdat1$cofips<-c(rep("26099", 12), rep("09009", 12), rep("36047", 12))
#get the poverty z score for these 3 counties
povz <- joindata[joindata$GEOID %in%c("26099", "09009", "36047") , "ppov"]

newdat1$ppov<-c(rep(povz[1], 12), rep(povz[2], 12), rep(povz[3], 12))

#Here's our three counties of example cases:
newdat1
newdat1$fitted<-predict(fit.mix.bin.cl,newdata=newdat1, type="response")

newdat1