Introduction to Hierarchical Models

Corey Sparks
DEM 7903, Fall 2014

Fundamental problem

Names

Mixed effects models
Hierarchical linear models
Random effects models

These all refer to the same suite of models, but different disciplines/authors/applications have managed to confuse us all.

Perspectives in Multi-level Modeling

Basic multilevel perspectives

The Social epidemiology perspective
General “ecological” models
Longitudinal models for change

Each of these situates humans within some higher, contextual level that is assumed to either influence or be influenced by their actions/behaviors

Ideally, we would like to incorporate covariates at both the individual and the higher level that could influence behaviors

Example of a Multi-level problem

[Figure: example of a multi-level problem]

Longitudinal Models

These kinds of models are used when you have observations on the same individuals at multiple time points

- Observations don't have to be at the same times/ages for each individual
- More flexibility
These models treat the individual as the higher level unit, and you are interested in studying:
- Change over time within an individual
- Impacts of prior circumstances on later outcomes

Preface

Not all data sets will allow you to do multi-level modeling
- Many data sets don't have any “higher level” units identified, or the units they do have are not necessarily meaningful
Not all problems are multi-level problems
- Unless your question is about how some characteristic of a “higher level” structure influences behavior, these models are not for you.

Linear Mixed Model

In the traditional linear model, groups are treated as “fixed” effects
- ANOVA, ANCOVA, MANOVA
For instance, the ANOVA model assumes that the group effects are fixed and do not change relative to the reference group
- It also assumes that the observed groups represent all possible groups in the population
This model is of the form: \[ y_{ij} = \mu + u_{j} + e_{ij} \]
where $ \mu $ is the grand mean, $ u_{j} $ is the “fixed effect” of the $ j^{th} $ group and $ e_{ij} $ is the residual for each individual

This model assumes that all systematic variation in $ y $ is captured by the group differences, $ u_{j} $, alone.

If you have all your groups,

and your only predictor is the group (factor) level,

and if you expect there to be directional differences across groups a priori,

then this is probably the model for you.

You might use this framework if you want to crudely model the effect of “region of residence” in an analysis.

i.e. is the mean different across my “regions”?

ANOVA and ANCOVA

The ANOVA and ANCOVA models are extremely useful if:
- You simply want to test differences in the mean across groups (ANOVA) \[ y_{ij} = \mu + u_{j} + e_{ij} \]
- Each cell (group) has its mean defined by: \[ \mu + u_{j} \]
- And we typically set one group as the “reference group”
- We test whether each $ u_{j} = 0 $ using a t-test, to see if each group's mean equals that of the reference group
- Also, the global F-test will show us if ANY of the means are different from one another
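
As a rough illustration of the global F-test, here is a minimal sketch in Python using only the standard library. The data, the region names, and the function name `one_way_anova` are made up for illustration. The group effects are computed as deviations from the grand mean, matching the equation as written; reference-group coding would instead report each group's difference from one chosen group.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (grand_mean, group_effects, F) for a dict {group: [y values]}."""
    all_y = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_y)
    # u_j: each group's mean expressed as a deviation from the grand mean
    effects = {g: mean(ys) - grand_mean for g, ys in groups.items()}
    # Between-group and within-group sums of squares
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_y) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return grand_mean, effects, f_stat

# Hypothetical outcome values for three regions
regions = {
    "north": [4.0, 5.0, 6.0],
    "south": [6.0, 7.0, 8.0],
    "west": [8.0, 9.0, 10.0],
}
grand_mean, effects, f_stat = one_way_anova(regions)
print(grand_mean, effects, f_stat)
```

A large F relative to the F distribution with (J - 1, N - J) degrees of freedom suggests that at least one group mean differs from the others.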

ANOVA and ANCOVA

In the ANCOVA model, you want to examine the effect of a covariate in each group \[ y_{ij} = \mu + u_{j} + \beta_{j} x_{i} + e_{ij} \]
- as a simple example
- This model contains the interaction between the group factor $ u_{j} $ and $ x_i $, so each group gets its own slope $ \beta_{j} $
- This is often called a test of parallel slopes, because it tests the assumption of the simpler model: \[ y_{ij} = \mu + u_{j}+\beta x_{i}+ e_{ij} \]
- that all groups have the same $ \beta $ effect on the mean
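
To make the idea of group-specific slopes concrete, here is a small sketch under stated assumptions. When both the intercept and the slope are allowed to differ by group, the least-squares point estimates are the same as fitting a separate line within each group, which is what the code does. The data and the helper name `slope_intercept` are hypothetical. A formal test of parallel slopes would compare the two models, which is not shown here.

```python
from statistics import mean

def slope_intercept(x, y):
    """Least-squares slope and intercept for one simple regression."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Hypothetical data: group "a" follows y = 1 + 1x, group "b" follows y = 0 + 3x
data = {
    "a": ([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]),
    "b": ([0.0, 1.0, 2.0, 3.0], [0.0, 3.0, 6.0, 9.0]),
}
fits = {g: slope_intercept(x, y) for g, (x, y) in data.items()}
for g, (slope, intercept) in fits.items():
    print(g, "slope:", slope, "intercept:", intercept)
```

If the slopes were parallel, the estimated slopes would be close to one another and the simpler common-slope model would fit about as well.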

Basic Random Effect Models

Consider the ANOVA model:

\[ y_{ij} = \mu + u_{j} + e_{ij} \]
- The random effects model assumes that each of the group means $ \mu + u_j $ is composed of a grand mean and an iid random effect
- This differs from the ANOVA model because the $ u_j $'s are not treated as fixed contrasts with a comparison group.

Basic Random Effect Models

\[ y_{ij} = \mu + u_{j} + e_{ij} \]

Generally, this iid random effect, $ u_j $, is assumed to come from: \[ u_j \sim N(0, \sigma^2_{u}) \]
- the random effects are centered around the mean $ \mu $
- So that's why there's a 0 mean, and the variation in the groups is modeled by the estimated variance in the distribution, $ \sigma^2_{u} $
- Basically, if $ \sigma^2_{u} = 0 $, then there is no variation between groups!
This model is called the random intercept model, because only the intercepts are allowed to vary randomly
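
A quick simulation can show what the random-effect variance does. This is only a sketch: the function names, sample sizes, and parameter values are assumptions chosen for illustration. It draws one random effect per group and compares the spread of the group means when the between-group standard deviation is 0 versus 2.

```python
import random
from statistics import mean, pvariance

def simulate_groups(n_groups, n_per_group, mu, sigma_u, sigma_e, seed=1):
    """Simulate y_ij = mu + u_j + e_ij with u_j ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u_j = rng.gauss(0.0, sigma_u)  # one random effect shared by the whole group
        data.append([mu + u_j + rng.gauss(0.0, sigma_e) for _ in range(n_per_group)])
    return data

def variance_of_group_means(data):
    """Spread of the observed group means around their own average."""
    return pvariance([mean(ys) for ys in data])

no_group_variation = simulate_groups(50, 200, mu=10.0, sigma_u=0.0, sigma_e=1.0)
with_group_variation = simulate_groups(50, 200, mu=10.0, sigma_u=2.0, sigma_e=1.0)

print(variance_of_group_means(no_group_variation))    # only sampling noise remains
print(variance_of_group_means(with_group_variation))  # dominated by the group effects
```

With no between-group variance, the group means differ only through sampling noise, so their variance should be close to zero.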

Choosing...

There are differences between these classic models and the linear mixed model. As a rule, you use the fixed-effects models when:
- 1) Each group is regarded as unique, and you want to draw conclusions about each of these specific groups,
- and you also know all the groups a priori, e.g. sex or race

Choosing

2) If the groups can be considered as a sample from some (real or hypothetical) population of groups, and you want to draw conclusions about this population of groups, then the random effects model is appropriate.
- WHY? Because if you have a LARGE number of groups,
- say more than 10 groups ($ J > 10 $), then the odds that you are really interested in all possible differences in the means are probably pretty low

Forms of the random effect model

There are 2 basic forms for the mixed model:
- Random Intercepts model
- Random Slopes model
These models may be extended in MANY, MANY, MANY more ways
- which is why we're here

Random Intercept Model

The random intercept model assumes you have:
- j groups (j=1 to J)
- i individuals within the j groups (i=1 to $ n_j $)
- for each individual in the j groups you have measured $ y_{ij} $ and $ x_{ij} $
- and potentially, for each group j, we may have measured $ z_j $, which is a covariate measured at the group level
For example, you do a survey on health and you measure:
- y = the health status of each individual
- x = SES, race, etc. of each individual
- j = the county each individual lives in, and
- z = the poverty rate or median income in the county

Random Intercept Model

We write our full model, with $ K $ individual-level predictors, as: \[ y_{ij} = \beta_{0j} + \sum_{k=1}^{K} {\beta_k x_{ijk}} + \gamma z_j + e_{ij} \]
- This model has a few features that we can use or not use, as it suits us
- e.g. if we don't have a group-level predictor z, then we won't have that component of the model
- $ \beta_{0j} $ is called the random intercept,
- We can write the random intercept as: \[ \beta_{0j} = \beta_0 + u_{j} \]
- i.e. a fixed mean intercept and each group's iid deviation from it
- and again $ u_j $ is assumed to come from: \[ u_j \sim N(0, \sigma^2_{u}) \]
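
The structure of the data behind this model can be sketched by simulating it. Everything here is hypothetical: the function name, the parameter values, a single individual-level predictor x, and a group-level covariate z drawn to resemble a county poverty rate. The point is only to show that z and the random intercept are shared by everyone in a group, while x and the residual vary by individual.

```python
import random

def simulate_random_intercept(n_groups, n_per_group, beta0, beta1, gamma,
                              sigma_u, sigma_e, seed=42):
    """Simulate y_ij = (beta0 + u_j) + beta1 * x_ij + gamma * z_j + e_ij."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        z_j = rng.uniform(0.05, 0.30)   # group-level covariate (hypothetical poverty rate)
        u_j = rng.gauss(0.0, sigma_u)   # group's deviation from the mean intercept
        beta0_j = beta0 + u_j           # random intercept for group j
        for _ in range(n_per_group):
            x_ij = rng.gauss(0.0, 1.0)      # individual-level predictor
            e_ij = rng.gauss(0.0, sigma_e)  # individual-level residual
            y_ij = beta0_j + beta1 * x_ij + gamma * z_j + e_ij
            rows.append({"group": j, "x": x_ij, "z": z_j, "y": y_ij})
    return rows

rows = simulate_random_intercept(n_groups=5, n_per_group=4, beta0=1.0, beta1=0.5,
                                 gamma=-2.0, sigma_u=1.0, sigma_e=0.5)
for row in rows[:4]:
    print(row)
```

Each row is one individual, and the group-level value of z repeats on every row of the same group. This "long" layout is the form that mixed-model software typically expects.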

Random Intercept Model

Graphically the $ \beta_{0j} $ term can be seen as:

Variance components

This model also estimates variance components, so you can see how much variability is accounted for by adding the random intercept term.
- if var($ y_{ij} $) is the total variance, $ \sigma^2 $
- and var($ u_j $) is the higher level variance in the random intercepts, $ \sigma^2 _{u} $
- and var($ e_{ij} $) is the residual individual level variance, $ \sigma^2 _{e} $
we can write the total variance as: $ \sigma^2 = \sigma^2 _{e}+\sigma^2 _{u} $
- These are called the “variance components” of the model,
- and separate the variance into differences between individuals and differences between groups
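
One way to see the variance components is to estimate them from simulated data. The sketch below uses the classical ANOVA (method-of-moments) estimator for balanced groups, which is an assumption made here for simplicity. Mixed-model software usually estimates these quantities by maximum likelihood or REML instead. The function name and the simulation settings are hypothetical.

```python
import random
from statistics import mean

def variance_components(data):
    """ANOVA (method-of-moments) estimates for balanced groups.

    Returns (sigma_u^2, sigma_e^2), the between-group and within-group variances.
    """
    n_groups, n = len(data), len(data[0])
    grand_mean = mean([y for ys in data for y in ys])
    group_means = [mean(ys) for ys in data]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (y - m) ** 2 for ys, m in zip(data, group_means) for y in ys
    ) / (n_groups * (n - 1))
    sigma_e2 = ms_within
    # E[MS_between] = sigma_e^2 + n * sigma_u^2; truncate negative estimates at zero
    sigma_u2 = max(0.0, (ms_between - ms_within) / n)
    return sigma_u2, sigma_e2

# Simulate 200 groups of 30 with true sigma_u^2 = 4 and sigma_e^2 = 1
rng = random.Random(7)
data = []
for _ in range(200):
    u_j = rng.gauss(0.0, 2.0)
    data.append([5.0 + u_j + rng.gauss(0.0, 1.0) for _ in range(30)])

sigma_u2_hat, sigma_e2_hat = variance_components(data)
print(sigma_u2_hat, sigma_e2_hat)
```

The two estimates should land near the values used in the simulation, and their sum approximates the total variance of y.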

Variance components

The correlation between any two individuals within a given group is: \[ \rho(y_{ij},y_{i'j}) = \frac{ \sigma^2 _{u} }{ \sigma^2 _{u} + \sigma^2 _{e} } \]
This is called the intra-class correlation coefficient. It can be interpreted as the correlation between 2 random individuals in a random group, but I find it more informative to interpret it as the fraction of the total variance that is due to the groups.
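
As a small worked example of the formula, with made-up variance components (the function name is also just for illustration):

```python
def intraclass_correlation(sigma_u2, sigma_e2):
    """ICC = between-group variance / (between-group + residual variance)."""
    total = sigma_u2 + sigma_e2
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sigma_u2 / total

# Hypothetical components: between-group variance 1.0, residual variance 3.0
icc = intraclass_correlation(sigma_u2=1.0, sigma_e2=3.0)
print(icc)  # 0.25
```

Here a quarter of the total variance in the outcome lies between groups, and the remaining three quarters lies between individuals within groups.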