Lecture notes

Fundamental problem

Names - Mixed effects models - Hierarchical linear models - Random effects models

These all refer to the same suite of models, but different disciplines/authors/applications have managed to confuse us all.

Perspectives in Multi-level Modeling

Basic multilevel perspectives * The Social epidemiology perspective * General âecologicalâ models * Longitudinal models for change

Each of these situate humans within some higher, contextual level that is assumed to either influence or be influenced by their actions/behaviors

And we would ideally like to be able to incorporate covariates at the individual and higher level that could influence behaviors

Multi-stage sampling

Non-random sampling
Population consists of known sub-groups called clusters
A 2 -stage sample might be households within neighborhoods, or children within schools
We may choose a random sample of schools/neighborhoods at the first stage, and a random sample of people within each school/neighborhood as the second stage
We need to be careful because the observations in the second stage are not independent of one another
Increased probability of selection for children in a selected school
This type of sampling leads to dependent observations

Here’s a picture of this:

Multistage Sampling

Multi-level propositions

When we have a research statement that involves individuals within some context, this is a multi-level proposition. In this sense, we are interested in questions that relate variables at different levels, the micro and the macro. This also holds in general if a sample was collected with a multi-stage sampling scheme.

In a multilevel proposition, variables are present at two different levels, and we are interested in the relationship between both the micro and macro level association with our outcome, y.

Multi-level

This can be contrasted with a purely micro level proposition, where all our observed variables are the level of the individual

micro-level

Likewise, if we are only interested in the relationship between macro level variables, we have this situation:

macro-level

Macro - micro propositions

We commonly encounter the situation where a macro level variable effects a micro level outcome. This can happen in several different ways.

macro-micro

The first case is a macro to micro proposition, which may be exemplified by a statement such as: “Individuals in areas with high environmental contamination leads to higher risk of death”.

Whereas the second frame illustrates a more specific special case, where there is a macro level effect, net of the individual level predictor, and may be stated “For individuals with the a given level of education, living in areas with high environmental contamination leads to higher risk of death”.

The last panel illustrates what is known as a cross level interaction, or a macro-micro interaction. This is where the relationship between x and y is dependent on Z. This leads to the statement “Individuals with low levels of education, living in areas with high environmental contamination have higher risk of death”.

Longitudinal Models

These kinds of models are used when you have observations on the same individuals over multiple time points * Don’t have to be the same times/ages for each observation + More flexibility * These models treat the individual as the higher level unit, and you are interested in studying * Change over time within an individual * Impacts of prior circumstances on later outcomes * Two modeling strategies will allow us to consider individual change over time, versus population averaged change over time.

Multilevel data Preface

Not all data sets will allow you to do multi-level modeling
- Many data sets don’t have any âhigher levelâ units identified, or the units they do have are not necessarily meaningful
Not all problems are multi-level problems
- Unless you are specifying a problem that is interested in how some characteristic of some âhigher levelâ structure is influencing behavior, these models are not for you.

Linear Mixed Model

In the traditional linear model, groups are treated as âfixedâ effects
ANOVA, ANCOVA, MANOVA
For instance the ANOVA model assumes that the group effects are fixed and do not change relative to the reference group
Also the groups represent distinct representations of all possible groups in the population
This model is of the form:

\[y_{ij} = \mu + u_{j} + e_{ij}\]

where \[\mu\] is the grand mean, \(u_{j}\) is the âfixed effectâ of the \(j^{th}\) group and \(e_{ij}\) is the residual for each individual

======================================================== This model assumes that you are capturing all variation in y by the group factor differences, \(u_{j}\) alone.

If you have all your groups,

and your only predictor is the group (factor) level,

and if you expect there to be directional differences across groups a priori,

then this is probably the model for you.

You might use this framework if you want to crudely model the effect of âregion of residenceâ in an analysis. + i.e. is the mean different across my âregionâ?

ANOVA and ANCOVA

The ANOVA and ANCOVA models are extremely useful if:
You simply want to test differences in the mean across groups (ANOVA)

\[ y_{ij} = \mu + u_{j} + e_{ij} \]

Each cell (group) has its mean defined by:

\[ \mu + u_{j} \]

And we typically set one group as the “reference group”
We test if each \[\mu_{j} = 0\] using a t-test and see if all our group means are equal
Also, the global F-test will show us if ANY of the means are different from one another
In the ANCOVA model, you want to examine the effect of a covariate in each group (ANCOVA)

\[ y_{ij} = \mu + u_{j}*\beta x_{i}+ e_{ij} \] + as a simple example

This model contains the interaction between the group factor \(u_{j}\) and \(x_i\)
This is often called the parallel slopes model, because it is testing the assumption from the simpler model:

\[ y_{ij} = \mu + u_{j}+\beta x_{i}+ e_{ij} \]

That all groups have the same \(\beta\) effect on the mean

Basic Random Effect Models

Consider the ANOVA model:

\[ y_{ij} = \mu + u_{j} + e_{ij} \]

The random effects model assumes that each of the group means \(\mu + u_j\) are composed of a grand mean and an iid random effect
This differs from the ANOVA model because the \(u_j\)’s are not considered fixed, by setting a comparison group.

\[ y_{ij} = \mu + u_{j} + e_{ij} \]

Generally, this iid random effect, u is assumed to come from :

\[u_j \sim N(0, \sigma^2)\]

the random effects are centered around the mean \(\mu\)
So that’s why there’s a 0 mean, and the variation in the groups is modeled by the estimated variance in the distribution \(\sigma^2\) +Basically, if \(\sigma^2\) = 0, then there is no variation between groups!
This model is called the random intercept model, because only the intercepts are allowed to vary randomly

Choosing…

There are differences between these classic models and the linear mixed model. As a rule, you use the fixed-effects models when:
1. You know that each group is regarded as unique, and you want to draw conclusions on each of these specific groups,
and you also know all the groups a priori e.g. sex or race
1. If the groups can be considered as a sample from some (real or hypothetical) population of groups, and you want to draw conclusions about this population of groups, then the random effects model is appropriate.
WHY? because if you have a LARGE number of groups,
say \(n_j\) > 10, then the odds that you are really interested in all possible difference in the means is probably pretty low

Forms of the random effect model

There are 2 basic forms for the mixed model
These models may be extended in MANY, MANY, MANY more ways
which is why we’re here
Random Intercepts model
Random Slopes model

Random Intercept Model

The random intercept model assumes you have:
j groups (j=1 to J)
i individuals within the j groups (i=1 to \(n_j\))
for each individual in the j groups you have measured \(y_{ij}\) and \(x_{ij}\) and potentially for each group j, we have may have measured \(z_j\) which is a covariate measured at the group level For example, you do a survey on health and you measure:
y = the health status of each individual
x= SES, race, etc of each individual
j = the county each individual lives in, and
z = the poverty rate or median income in the county
We write our full model, with k predictors as: \[y_{ij} = \beta_{0j} + \sum_{k} {\beta_k x_{ik}} + \gamma z_j + e_{ij}\]
This model has a few features that we can use or not use, as it suits us
e.g. if we don’t have a group-level predictor z, then we won’t have that component of the model
\(\beta_{0j}\) is called the random intercept,
We can write the random intercept as: \[ \beta_{0j} = \beta_0 + u_{j} \]
i.e. a fixed mean intercept and each group’s iid deviation from it
and again u is assumed to come from : \[u_j \sim N(0, \sigma^2)\]
Graphically the \(\beta_{0j}\) term can be seen as:

Random Intercepts

Variance components

This model also estimates variance components, so you can see how much variability is accounted for by adding the random intercept term.
if var( \(y_{ij}\)) is the total variance, \(\sigma^2\)
and var(\(u_j\)) is the higher level variance in the random intercepts, \(\sigma^2 _{u}\)
and var(\(e_{ij}\)) is the residual individual level variance, \(\sigma^2 _{e}\)
we can write the total variance as:

\(\sigma^2\) = \(\sigma^2 _{e}+\sigma^2 _{u}\) + These are called the âvariance componentsâ of the model, + and separate the variance into differences between individuals and differences between groups

the correlation between any two individuals within a given group is:

\[\rho(y_{ij},y_{i'j}) = \frac{ \sigma^2 _{u} }{ \sigma^2 _{u} + \sigma^2 _{e} }\]

is called the intra-class correlation coefficient, and can be interpreted as the correlation between 2 random individuals in a random group, but, I find it more informative to interpret as the fraction of the variance that is due to the groups.

Empirical Example

In this example, I introduce how to fit the multi-level model using the lme4 package. This example considers the linear case of the model, where the outcome is assumed to be continuous, and the model error term is assumed to be Gaussian. Subsequent examples will highlight the Generalized Linear Mixed Model (GLMM).

This example shows how to: * Examine variation between groups using fixed effects * Fit the basic random intercept model :

\(y_{ij} = \mu + u_{j} + e_{ij}\) with \(u_j \sim N(0, \sigma^2)\)

Where the intercepts (\(u_j\)) for each group vary randomly around the overall mean (\(\mu\))

*I also illustrate how to include group-level covariates and how to fit the random slopes model and the model for cross level interactions

The example merges data from the 2014 CDC Behavioral Risk Factor Surveillance System (BRFSS) SMART MSA data. Link and the 2010 American Community Survey 5-year estimates at the MSA level. More details on these data are found below.

#load brfss
library(car)
library(stargazer)
library(survey)
library(sjPlot)
library(ggplot2)
library(pander)
library(knitr)
load("brfss_14.Rdata")
set.seed(12345)
#samps<-sample(1:nrow(brfss14), size = 40000, replace=F)
#brfss14<-brfss14[samps,]
#The names in the data are very ugly, so I make them less ugly
nams<-names(brfss14)
#we see some names are lower case, some are upper and some have a little _ in the first position. This is a nightmare.
newnames<-gsub(pattern = "x_",replacement =  "",x =  nams)
names(brfss14)<-tolower(newnames)

#BMI
brfss14$bmi<-ifelse(is.na(brfss14$bmi5)==T, NA, brfss14$bmi5/100)

#Poor or fair self rated health
#brfss14$badhealth<-ifelse(brfss14$genhlth %in% c(4,5),1,0)
brfss14$badhealth<-recode(brfss14$genhlth, recodes="4:5=1; 1:3=0; else=NA")
#race/ethnicity
brfss14$black<-recode(brfss14$racegr3, recodes="2=1; 9=NA; else=0")
brfss14$white<-recode(brfss14$racegr3, recodes="1=1; 9=NA; else=0")
brfss14$other<-recode(brfss14$racegr3, recodes="3:4=1; 9=NA; else=0")
brfss14$hispanic<-recode(brfss14$racegr3, recodes="5=1; 9=NA; else=0")
brfss14$race_eth<-recode(brfss14$racegr3, recodes="1='nhwhite'; 2='nh black'; 3='nh other';
                         4='nh multirace'; 5='hispanic'; else=NA", as.factor.result = T)
brfss14$race_eth<-relevel(brfss14$race_eth, ref = "nhwhite")
#insurance
brfss14$ins<-ifelse(brfss14$hlthpln1==1,1,0)

#income grouping
brfss14$inc<-ifelse(brfss14$incomg==9, NA, brfss14$incomg)

#education level
brfss14$educ<-recode(brfss14$educa, recodes="1:2='0Prim'; 3='1somehs'; 4='2hsgrad';
                     5='3somecol'; 6='4colgrad';9=NA", as.factor.result=T)
#brfss14$educ<-relevel(brfss14$educ, ref='0Prim')

#employment
brfss14$employ<-recode(brfss14$employ, recodes="1:2='Employed'; 2:6='nilf';
                       7='retired'; 8='unable'; else=NA", as.factor.result=T)
brfss14$employ<-relevel(brfss14$employ, ref='Employed')

#marital status
brfss14$marst<-recode(brfss14$marital, recodes="1='married'; 2='divorced'; 3='widowed';
                      4='separated'; 5='nm';6='cohab'; else=NA", as.factor.result=T)
brfss14$marst<-relevel(brfss14$marst, ref='married')

#Age cut into intervals
brfss14$agec<-cut(brfss14$age80, breaks=c(0,24,39,59,79,99), include.lowest = T)

I want to see how many people we have in each MSA in the data:

#Now we will begin fitting the multilevel regression model with the msa
#that the person lives in being the higher level
head(data.frame(name=table(brfss14$mmsaname),id=unique(brfss14$mmsa)))

##                                                          name.Var1
## 1                      Aberdeen, SD, Micropolitan Statistical Area
## 2             Aguadilla-Isabela, PR, Metropolitan Statistical Area
## 3                   Albuquerque, NM, Metropolitan Statistical Area
## 4 Allentown-Bethlehem-Easton, PA-NJ, Metropolitan Statistical Area
## 5                     Anchorage, AK, Metropolitan Statistical Area
## 6 Atlanta-Sandy Springs-Roswell, GA, Metropolitan Statistical Area
##   name.Freq    id
## 1       620 10100
## 2       544 10380
## 3      1789 10740
## 4      1095 10900
## 5      1785 11260
## 6      2776 12060

#people within each msa

#How many total MSAs are in the data?
length(table(brfss14$mmsa))

## [1] 132

#counties

Higher level predictors

We will often be interested in factors at both the individual AND contextual levels. To illustrate this, I will use data from the American Community Survey measured at the MSA level. Specifically, I use the DP3 table, which provides economic characteristics of places, from the 2010 5 year ACS Link.

library(acs)
#Get 2010 ACS median household incomes for tracts in Texas
msaacs<-geo.make(msa="*")

#ACS table B03002 is Hispanic ethnicity by race 
acsecon<-acs.fetch(key=mykey, endyear=2010, span=5, geography=msaacs, variable = c("B19083_001","B17001_001","B17001_002" ))

colnames(acsecon@estimate)

## [1] "B19083_001" "B17001_001" "B17001_002"

msaecon<-data.frame(gini=acsecon@estimate[, "B19083_001"], 
ppoverty=acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"],
giniz=scale(acsecon@estimate[, "B19083_001"]), 
ppovertyz=scale(acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"]))

msaecon$ids<-paste(acsecon@geography$metropolitanstatisticalareamicropolitanstatisticalarea)
head(msaecon)

##                                                  gini  ppoverty    giniz
## Adjuntas, PR Micro Area                         0.525 0.5925656 2.718785
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area 0.529 0.5577311 2.847562
## Coamo, PR Micro Area                            0.532 0.5633984 2.944145
## Fajardo, PR Metro Area                          0.483 0.4309070 1.366623
## Guayama, PR Metro Area                          0.518 0.4980518 2.493425
## Jayuya, PR Micro Area                           0.502 0.5446137 1.978315
##                                                 ppovertyz   ids
## Adjuntas, PR Micro Area                          6.271934 10260
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area  5.762039 10380
## Coamo, PR Micro Area                             5.844994 17620
## Fajardo, PR Metro Area                           3.905632 21940
## Guayama, PR Metro Area                           4.888474 25020
## Jayuya, PR Micro Area                            5.570031 27580

Let’s see the geographic variation in these economic indicators:

library(tigris)

## 
## Attaching package: 'tigris'

## The following object is masked from 'package:graphics':
## 
##     plot

msa<-core_based_statistical_areas(cb=T)
msa_ec<-geo_join(msa, msaecon, "CBSAFP", "ids", how="inner")

library(RColorBrewer)
library(sp)
spplot(msa_ec, "gini", at=quantile(msa_ec$gini), col.regions=brewer.pal(n=6, "Reds"), col="transparent")

Merge the MSA data to the BRFSS data

joindata<-merge(brfss14, msaecon, by.x="mmsa",by.y="ids", all.x=T)
joindata$bmiz<-scale(joindata$bmi, center=T, scale=T)
joindata<-joindata[complete.cases(joindata[, c("bmiz", "race_eth", "agec", "educ", "gini")]),]
#and merge the data back to the kids data


head(joindata[, c("bmiz", "race_eth", "agec", "educ", "gini","ppoverty", "mmsa")])

##         bmiz race_eth    agec     educ  gini  ppoverty  mmsa
## 1 -1.0300232  nhwhite (39,59]    0Prim 0.423 0.1034143 10100
## 2 -0.7833572  nhwhite (24,39] 4colgrad 0.423 0.1034143 10100
## 3 -0.9588377  nhwhite (59,79] 4colgrad 0.423 0.1034143 10100
## 4 -0.6674739  nhwhite (39,59] 4colgrad 0.423 0.1034143 10100
## 5  3.5821344  nhwhite (59,79]  2hsgrad 0.423 0.1034143 10100
## 6 -0.3231347  nhwhite (39,59] 3somecol 0.423 0.1034143 10100

As a general rule, I will do a basic fixed-effects ANOVA as a precursor to doing full multi-level models, just to see if there is any variation amongst my higher level units (groups). If I do not see any variation in my higher level units, I generally will not proceed with the process of multi-level modeling.

fit.an<-lm(bmiz~as.factor(mmsa), joindata)
anova(fit.an)

## Analysis of Variance Table
## 
## Response: bmiz
##                     Df Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(mmsa)    113   1210 10.7120  10.676 < 2.2e-16 ***
## Residuals       175697 176286  1.0034                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(lme4)
library(lmerTest)
library(arm)
fit<-lmer(bmiz~agec+educ+race_eth+(1|mmsa), data=joindata, na.action=na.omit)
arm::display(fit, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + (1 | mmsa), 
##     data = joindata, na.action = na.omit)
##                      coef.est coef.se t value
## (Intercept)           -0.37     0.02  -18.19 
## agec(24,39]            0.45     0.01   38.76 
## agec(39,59]            0.61     0.01   55.74 
## agec(59,79]            0.54     0.01   49.98 
## agec(79,99]            0.14     0.01   10.64 
## educ1somehs            0.00     0.02    0.23 
## educ2hsgrad           -0.05     0.02   -2.82 
## educ3somecol          -0.05     0.02   -2.82 
## educ4colgrad          -0.24     0.02  -14.58 
## race_ethhispanic       0.12     0.01   11.67 
## race_ethnh black       0.32     0.01   35.90 
## race_ethnh multirace   0.11     0.02    5.38 
## race_ethnh other      -0.11     0.01   -8.14 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  mmsa     (Intercept) 0.07    
##  Residual             0.98    
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 491887, DIC = 491660.7
## deviance = 491758.9

fit2<-lmer(bmiz~agec+educ+race_eth+(1|mmsa),
           weights = mmsawt/mean(mmsawt, na.rm=T), data=joindata,  na.action=na.omit)
arm::display(fit2, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + (1 | mmsa), 
##     data = joindata, weights = mmsawt/mean(mmsawt, na.rm = T), 
##     na.action = na.omit)
##                      coef.est coef.se t value
## (Intercept)           -0.41     0.02  -24.55 
## agec(24,39]            0.49     0.01   59.11 
## agec(39,59]            0.65     0.01   82.13 
## agec(59,79]            0.59     0.01   69.58 
## agec(79,99]            0.18     0.01   13.05 
## educ1somehs            0.00     0.01    0.13 
## educ2hsgrad           -0.05     0.01   -3.59 
## educ3somecol          -0.05     0.01   -3.53 
## educ4colgrad          -0.26     0.01  -19.72 
## race_ethhispanic       0.14     0.01   17.01 
## race_ethnh black       0.32     0.01   42.99 
## race_ethnh multirace   0.07     0.02    3.43 
## race_ethnh other      -0.26     0.01  -22.53 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  mmsa     (Intercept) 0.08    
##  Residual             1.00    
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 634543, DIC = 634313.4
## deviance = 634413.4

anova(fit, fit2)

## Data: joindata
## Models:
## object: bmiz ~ agec + educ + race_eth + (1 | mmsa)
## ..1: bmiz ~ agec + educ + race_eth + (1 | mmsa)
##        Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## object 15 491789 491940 -245879   491759                        
## ..1    15 634443 634595 -317207   634413     0      0          1

rand(fit)

## Analysis of Random effects Table:
##      Chi.sq Chi.DF p.value    
## mmsa    549      1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Which shows significant variation in average BMI across counties.

As a note, if my outcome is binomial or poisson, I would do p instead of lm() as this would give you the correct test for the other distributions.

Fitting the basic multi-level model

The random intercept model assumes you have:
j groups (j=1 to J)
i individuals within the j groups (i=1 to \(n_j\))
for each individual in the j groups you have measured \(y_{ij}\) and \(x_{ij}\) and potentially for each group j, we have may have measured \(z_j\) which is a covariate measured at the group level For example, you do a survey on health and you measure:
y = the health status of each individual
x= SES, race, etc of each individual
j = the higher level each individual lives in, and
z = the poverty rate or median income in the higher level

To specify a random intercept model for higher levels, we add, a model term that is (1|HIGHER LEVEL VARIABLE), which tells R to fit only a random intercept for each higher levely, in our case it will be (1|mmsa)

fit.mix<-lmer(bmiz~agec+educ+race_eth+(1|mmsa), data=joindata)
#do a test for the random effect
rand(fit.mix)

## Analysis of Random effects Table:
##      Chi.sq Chi.DF p.value    
## mmsa    549      1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

display(fit.mix, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + (1 | mmsa), 
##     data = joindata)
##                      coef.est coef.se t value
## (Intercept)           -0.37     0.02  -18.19 
## agec(24,39]            0.45     0.01   38.76 
## agec(39,59]            0.61     0.01   55.74 
## agec(59,79]            0.54     0.01   49.98 
## agec(79,99]            0.14     0.01   10.64 
## educ1somehs            0.00     0.02    0.23 
## educ2hsgrad           -0.05     0.02   -2.82 
## educ3somecol          -0.05     0.02   -2.82 
## educ4colgrad          -0.24     0.02  -14.58 
## race_ethhispanic       0.12     0.01   11.67 
## race_ethnh black       0.32     0.01   35.90 
## race_ethnh multirace   0.11     0.02    5.38 
## race_ethnh other      -0.11     0.01   -8.14 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  mmsa     (Intercept) 0.07    
##  Residual             0.98    
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 491887, DIC = 491660.7
## deviance = 491758.9

So we see that our standard deviation at the MSA level is .1, and the standard deviation at the individual level (residual standard deviation) is .99. Square these to get variances, of course. Our fixed effects are interpreted as normal, older people have higher BMI’s than younger people, those with less than High School education have higher BMI’s, while people with college education have lower BMI’s than those with a high school education only. In terms of race/ethnicity, African Americans and Hispanics have higher average BMI’s, compared to Non-Hispanic Whites, while those of some other race/ethnicity have lower average BMI’s compared to whites. See, just like ordinary regression.

Some may be interested in getting the intra-class correlation coefficient. While I don’t usually pay attention to this, here it is:

#it can be a little hairy to get it, but it can be done using various part of VarCorr()
ICC1<-VarCorr(fit)$mmsa[1]/( VarCorr(fit)$mmsa[1]+attr(VarCorr(fit), "sc"))
ICC1

## [1] 0.004799509

So less than 1% of the variance in BMI is due to difference between MSAs. That’s not much, but according to our random effect testing, it’s not, statistically speaking, 0.

Sometimes, to gain the appreciation, we may want to plot the random effects, I first show the fixed effects, then the random effects:

#I need to rescale the estimates to the BMI scale from the z-score scale
meanbmi<-mean(joindata$bmi, na.rm=T)
sdbmi<-sd(joindata$bmi, na.rm=T)


fixcoef<-fit.an$coefficients[1]+fit.an$coefficients[-1]
fixcoef<-(fixcoef*sdbmi)+meanbmi

plot(NULL,ylim=c(20, 30), xlim=c(0,1), 
     ylab="Intercept", xlab="Age") # get the ylim from a summary of rancoefs1
title(main="Fixed Effect Model")

for (i in 1: length(fixcoef)[1]){
  #I plug in a beta, here it's the effect of age from fit.mix
  abline(a=fixcoef[i], b=.01,  lwd=1.5, col="green")
}

#It may be easier to visualize the random intercepts by plotting them
rancoefs1<-ranef(fit.mix)$mmsa+fixef(fit.mix)[1]
rancoefs1<-(rancoefs1*sdbmi)+meanbmi
summary(rancoefs1)

##   (Intercept)   
##  Min.   :24.48  
##  1st Qu.:25.30  
##  Median :25.62  
##  Mean   :25.58  
##  3rd Qu.:25.85  
##  Max.   :26.42

plot(NULL,ylim=c(20,30), xlim=c(0,1),
     ylab="Intercept", xlab="Age") # get the ylim from a summary of rancoefs1
title(main="Random Intercept Models")

for (i in 1: length(rancoefs1[,1])){
  #I plug in a beta, here it's the effect of age from fit.mix
  abline(a=rancoefs1[i,1], b=.01,  lwd=1.5, col="maroon")
}

Comparing estimates from the linear mixed model to traditional estimates

Shrinkage

In mixed models, we observe an effect referred to as “shrinkage” of the estimates of the means for each group.

This term refers to how group-level estimates in multi level models use information from all groups to produce the individual group estimates.

So, if we have a group with a larger sample size, the estimate of the mean in that group will have lots of information, and low variance in the estimate.

While a group with a small sample size will have less information about the mean for that group, and generally a higher variance estimate of the mean. Following Gelman and Hill p477, if we have a multilevel model where

\[y_i \sim N \left( \alpha_{j[i]}, \sigma_y^2 \right)\] and \[\alpha_{j} \sim N\left( \mu_{\alpha}, \sigma_{\alpha}^2 \right)\] and \(n_j\) is the sample size within each group. The multilevel estimate of the mean in each group, \(\alpha_j\) is

\[\alpha_j^{multilevel} = \omega_j \mu_{\alpha} + (1-\omega_j) \bar y_j\] where \[\omega_j = 1- \frac{\sigma_{\alpha}^2}{\sigma_{\alpha}^2+\sigma_{y}^2/n_j} \] is a pooling, or weighting factor. If \(\omega\) = 1 the we have complete pooling, and the group means equal the population mean, or when \(\omega\)=0, we have no pooling and the group means are totally defined with no contribution of the overall mean. This factor \(1-\omega_j\) is called the shrinkage factor, and describes how much information about each group mean is contributed from the overall population mean, versus the means of each group individually.

So what’s the difference? It may be informative to plot the estimated BMI’s for each MSA from the OLS and multilevel models. This section illustrates the effects of “pooling” as Gelman & Hill ch 12.

#these models are good for estimating group means better than traditional methods
#this follows the examples in chapter 12 of Gelman and Hill, I stole the code directly from them.

#complete pooling, this model fits the grand mean ONLY
fit.cp<-lm(bmi~1, joindata)
display(fit.cp)

## lm(formula = bmi ~ 1, data = joindata)
##             coef.est coef.se
## (Intercept) 27.84     0.01  
## ---
## n = 175811, k = 1
## residual sd = 6.07, R-Squared = 0.00

#No pooling i.e. fixed effects regression, this model fits separate means for each MSA using OLS
lm.unpooled<-lm(bmi~factor(mmsa)-1, joindata)


#partial pooling, this model fits the population mean and MSA deviations using multilevel models
fit0<-lmer(bmi~1+(1|mmsa), joindata)

#partial pooling with covariate
fit0.1<-lmer(bmi~ppovertyz+gini+(1|mmsa), joindata)


#Plot the means of the counties
J<-length(unique(joindata$mmsa))
ns<-as.numeric(table(fit0.1@frame$mmsa))
sample.size <- as.numeric(table(fit0.1@frame$mmsa))
sample.size.jittered <- sample.size*exp (runif (J, -.1, .1))

par (mar=c(5,5,4,2)+.1)
plot (sample.size.jittered, coef(lm.unpooled), cex.lab=1.2, cex.axis=1.2,
      xlab="sample size in MSA j", 
      ylab=expression (paste("est. intercept, ", alpha[j], "   (no pooling)")),
      pch=20, log="x", ylim=c(25, 30), yaxt="n", xaxt="n")
axis (1, quantile(ns), cex.axis=1.1)
axis (2, seq(25, 30), cex.axis=1.1)
for (j in 1:J){
  lines (rep(sample.size.jittered[j],2),
         coef(lm.unpooled)[j] + c(-1,1)*se.coef(lm.unpooled)[j], lwd=.5)
}
abline (coef(fit.cp)[1], 0, lwd=.5)
title(main="Estimates of MSA Means from the Fixed Effect Model")

#plot MLM estimates of MSA means + se's
par (mar=c(5,5,4,2)+.1)
a.hat.M1 <- coef(fit0)$mmsa[,1]
a.se.M1 <- se.coef(fit0)$mmsa
ns<-as.numeric(table(fit0@frame$mmsa))
plot (as.numeric(ns), t(a.hat.M1), cex.lab=1.2, cex.axis=1.1,
      xlab="sample size in MSA j",
      ylab=expression (paste("est. intercept, ", alpha[j], "(multilevel model)")),
      pch=20, log="x", ylim=c(25, 30), yaxt="n", xaxt="n")
axis (1, quantile(ns), cex.axis=1.1)
axis (2, seq(25,30), cex.axis=1.1)
for (j in 1:length(unique(joindata$mmsa))){
  lines (rep(as.numeric(ns)[j],2),
         as.vector(a.hat.M1[j]) + c(-1,1)*a.se.M1[j], lwd=.5, col="gray10")
}
abline (coef(fit.cp)[1], 0, lwd=.5)
title(main="Estimates of MSA Means from the MLM")

#plot MLM estimates of MSA means + se's, model with MSA covariates
par (mar=c(5,5,4,2)+.1)
a.hat.M2 <- coef(fit0.1)$mmsa[,1]
a.se.M2 <- se.coef(fit0.1)$mmsa
ns<-as.numeric(table(fit0.1@frame$mmsa))
plot (as.numeric(ns), t(a.hat.M2), cex.lab=1.2, cex.axis=1.1,
      xlab="sample size in MSA j",
      ylab=expression (paste("est. intercept, ", alpha[j], "(multilevel model with covariates)")),
      pch=20, log="x", ylim=c(25, 30), yaxt="n", xaxt="n")
axis (1, quantile(ns), cex.axis=1.1)
axis (2, seq(25,30), cex.axis=1.1)
for (j in 1:length(unique(joindata$mmsa))){
  lines (rep(as.numeric(ns)[j],2),
         as.vector(a.hat.M2[j]) + c(-1,1)*a.se.M2[j], lwd=.5, col="gray10")
}
abline (coef(fit.cp)[1], 0, lwd=.5)
title(main="Estimates of MSA Means from the MLM with MSA predictors")

In the model with no pooling, we see greater variance in the estimates for MSAs with smaller sample sizes, although not terribly variable in this case since we have lots of data, and in the model with pooling, the variance in these estimates is reduced, becuase we are using the estimates from the MSAs with lots of information to estimate the population mean of the MSAs with great precision.

Multilevel model with group-level predictors

Now, I fit the same model above, but this time I include a predictor at the MSA level, the Gini coefficient.

#Now I estimate the multilevel model including the effects for the
#MSA level variables
fit.mix2<-lmer(bmiz~agec+educ+race_eth+giniz+(1|mmsa), data=joindata)
display(fit.mix2, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + giniz + 
##     (1 | mmsa), data = joindata)
##                      coef.est coef.se t value
## (Intercept)           -0.37     0.02  -17.98 
## agec(24,39]            0.45     0.01   38.76 
## agec(39,59]            0.61     0.01   55.76 
## agec(59,79]            0.54     0.01   50.02 
## agec(79,99]            0.14     0.01   10.69 
## educ1somehs            0.00     0.02    0.22 
## educ2hsgrad           -0.05     0.02   -2.84 
## educ3somecol          -0.05     0.02   -2.84 
## educ4colgrad          -0.24     0.02  -14.59 
## race_ethhispanic       0.12     0.01   11.88 
## race_ethnh black       0.32     0.01   35.99 
## race_ethnh multirace   0.11     0.02    5.40 
## race_ethnh other      -0.11     0.01   -8.11 
## giniz                 -0.02     0.01   -2.44 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  mmsa     (Intercept) 0.07    
##  Residual             0.98    
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 491891, DIC = 491647.1
## deviance = 491752.9

rand(fit.mix2)

## Analysis of Random effects Table:
##      Chi.sq Chi.DF p.value    
## mmsa    527      1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ICC2<-VarCorr(fit.mix2)$mmsa[1]/( VarCorr(fit.mix2)$mmsa[1]+attr(VarCorr(fit.mix2), "sc"))
ICC2

## [1] 0.004586173

#compare the random intercept ad multilevel model with a LRT
anova(fit.mix, fit.mix2)

## refitting model(s) with ML (instead of REML)

## Data: joindata
## Models:
## object: bmiz ~ agec + educ + race_eth + (1 | mmsa)
## ..1: bmiz ~ agec + educ + race_eth + giniz + (1 | mmsa)
##        Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)  
## object 15 491789 491940 -245879   491759                          
## ..1    16 491785 491946 -245876   491753 5.925      1    0.01493 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ICC goes down, which points to the fact that we are controlling away some of the between-MSA variance by including the MSA level predictor.

Random slope models

We expand the random effect model to include group-varying slopes by:

\[\begin{aligned} \ E( Y) = \beta_{0j}+ \beta_{ j} x\\ \beta_{0j} = \beta_0 + u_{1j}\\ \beta_{j} = \beta + u_{2j}\\ \end{aligned}\\\]

Where \(\beta_0\) is the average intercept, and \(u_{1j }\) is the group-specific deviation in the intercept And where \(\beta\) is the average slope, and \(u_{2j }\) is the group-specific deviation in the average slope

This effectively allows one or more of the individual level variables to have different effects in each of the groups.

#To do a random slope model, I do:
fit.mix3<-lmer(bmiz~agec+educ+race_eth+giniz+(race_eth|mmsa), joindata, REML=F)
display(fit.mix3, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + giniz + 
##     (race_eth | mmsa), data = joindata, REML = F)
##                      coef.est coef.se t value
## (Intercept)           -0.37     0.02  -18.14 
## agec(24,39]            0.45     0.01   38.72 
## agec(39,59]            0.60     0.01   55.66 
## agec(59,79]            0.54     0.01   49.95 
## agec(79,99]            0.14     0.01   10.68 
## educ1somehs            0.01     0.02    0.29 
## educ2hsgrad           -0.04     0.02   -2.68 
## educ3somecol          -0.04     0.02   -2.66 
## educ4colgrad          -0.24     0.02  -14.34 
## race_ethhispanic       0.11     0.02    7.17 
## race_ethnh black       0.31     0.01   21.45 
## race_ethnh multirace   0.10     0.03    3.92 
## race_ethnh other      -0.12     0.02   -5.98 
## giniz                 -0.01     0.01   -1.48 
## 
## Error terms:
##  Groups   Name                 Std.Dev. Corr                    
##  mmsa     (Intercept)          0.07                             
##           race_ethhispanic     0.09     -0.29                   
##           race_ethnh black     0.09     -0.18  0.38             
##           race_ethnh multirace 0.15      0.11 -0.65 -0.36       
##           race_ethnh other     0.12      0.07 -0.16 -0.09  0.39 
##  Residual                      0.98                             
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 491710, DIC = 491650.3
## deviance = 491650.3

#compare the models with a LRT
anova(fit.mix2, fit.mix3)

## refitting model(s) with ML (instead of REML)

## Data: joindata
## Models:
## object: bmiz ~ agec + educ + race_eth + giniz + (1 | mmsa)
## ..1: bmiz ~ agec + educ + race_eth + giniz + (race_eth | mmsa)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## object 16 491785 491946 -245876   491753                             
## ..1    30 491710 492013 -245825   491650 102.66     14  1.462e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#the random slope model fits better

#plot random slopes and intercepts
rancoefs2<-ranef(fit.mix3)

plot(NULL, ylim=c(-.6, 0), xlim=c(0,1),ylab="Intercept", xlab="Hispanic Ethnicity")
title (main="Random Slope and Intercept Model - Hispanic by Metro")

for (i in 1: dim(rancoefs2$mmsa)[1]){
  
  abline(a=fixef(fit.mix3)["(Intercept)"]+rancoefs2$mmsa[[1]][i],
         b=fixef(fit.mix3)["race_ethhispanic"]+rancoefs2$mmsa[[2]][i], col=2)
}

Cross level interaction effects

Here, I show the model for a cross-level interaction. This model fits an interaction term between (at least) one individual level variable and a group level variable. This allows you to ask very informative questions regarding individuals within specific contexts.

#Cross-level interaction model:
fit.mix4<-lmer(bmiz~agec+educ+race_eth+gini*race_eth+(1|mmsa), joindata, REML=F)
display(fit.mix4, detail=T)

## lme4::lmer(formula = bmiz ~ agec + educ + race_eth + gini * race_eth + 
##     (1 | mmsa), data = joindata, REML = F)
##                           coef.est coef.se t value
## (Intercept)                 0.00     0.14   -0.01 
## agec(24,39]                 0.45     0.01   38.75 
## agec(39,59]                 0.61     0.01   55.77 
## agec(59,79]                 0.54     0.01   50.05 
## agec(79,99]                 0.14     0.01   10.72 
## educ1somehs                 0.00     0.02    0.22 
## educ2hsgrad                -0.05     0.02   -2.83 
## educ3somecol               -0.05     0.02   -2.83 
## educ4colgrad               -0.24     0.02  -14.58 
## race_ethhispanic            0.03     0.18    0.19 
## race_ethnh black           -0.63     0.24   -2.63 
## race_ethnh multirace        1.17     0.49    2.40 
## race_ethnh other            0.29     0.34    0.86 
## gini                       -0.83     0.32   -2.64 
## race_ethhispanic:gini       0.20     0.40    0.51 
## race_ethnh black:gini       2.11     0.53    3.96 
## race_ethnh multirace:gini  -2.40     1.10   -2.18 
## race_ethnh other:gini      -0.91     0.76   -1.20 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  mmsa     (Intercept) 0.07    
##  Residual             0.98    
## ---
## number of obs: 175811, groups: mmsa, 114
## AIC = 491770, DIC = 491729.8
## deviance = 491729.8

This basically says that blacks in counties with higher than average income inequality rates (povz==1 vs povz==0) have higher BMI’s, while those of other race/ethnicities have lower BMI’s

#compare the models with a LRT
anova(fit.mix3, fit.mix4)

## Data: joindata
## Models:
## ..1: bmiz ~ agec + educ + race_eth + gini * race_eth + (1 | mmsa)
## object: bmiz ~ agec + educ + race_eth + giniz + (race_eth | mmsa)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## ..1    20 491770 491971 -245865   491730                             
## object 30 491710 492013 -245825   491650 79.534     10  6.195e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

DEM 7283 - Multi-level Models Example 1

Corey Sparks, PhD

March 27, 2017

Lecture notes

Fundamental problem

Perspectives in Multi-level Modeling

Multi-stage sampling

Multi-level propositions

Longitudinal Models

Multilevel data Preface

Linear Mixed Model

ANOVA and ANCOVA

Basic Random Effect Models

Choosing…

Forms of the random effect model

Random Intercept Model

Variance components

Empirical Example

Higher level predictors

Fitting the basic multi-level model

Comparing estimates from the linear mixed model to traditional estimates

Shrinkage

Multilevel model with group-level predictors

Random slope models

Cross level interaction effects