Hierarchical Linear Model

author: angelayuan
date: Sep 6, 2015

Hierarchical Data Structures

Many social and natural phenomena have a nested or clustered organization:

  • Children within classrooms within schools
  • Patients in a medical study grouped within doctors within different clinics
  • Effect sizes within studies within methods (meta-analysis)
  • Time of measurement within persons within organizations

Simpson’s Paradox: Clustering is important

  • The pattern seen in each group can reverse when the groups are combined: in the example below, Sam outscores Gina on both quizzes, yet Gina’s combined total is higher.
df <- data.frame(Quiz1 = c("60.0%", "90.0%"), Quiz2 = c("10.0%", "30.0%"))
rownames(df) <- c("Gina", "Sam")
df  # percentages: Sam wins each quiz
##      Quiz1 Quiz2
## Gina 60.0% 10.0%
## Sam  90.0% 30.0%
df_combine <- data.frame(Quiz1 = c("60/100", "9/10"), Quiz2 = c("1/10", "30/100"),
                         Total = c("61/110", "39/110"))
rownames(df_combine) <- c("Gina", "Sam")
df_combine  # raw counts: Gina wins the combined total
##       Quiz1  Quiz2  Total
## Gina 60/100   1/10 61/110
## Sam    9/10 30/100 39/110

What Is a Hierarchical Linear Model?

  • HLM is a complex form of ordinary least squares (OLS) regression that is used to analyze variance in the outcome variables when the predictor variables are at varying hierarchical levels.

  • HLM simultaneously investigates relationships within and between hierarchical levels of grouped data, making it more efficient at accounting for variance among variables at different levels than other existing analyses.

  • HLM has come to be known by several names, including multilevel, mixed-level, mixed linear, mixed-effects, random-effects, random-coefficient (regression), and (complex) covariance-components modeling.


Why Is Multilevel Analysis Needed?

Nesting creates dependencies in the data

  • Dependencies violate the assumptions of traditional statistical models (“independence of error”, “homogeneity of regression slopes”)
  • Dependencies result in inaccurate statistical estimates

Important to understand variation at different levels


Methods to analyze hierarchical data

This part compares and contrasts HLM with the methods used to analyze hierarchical data prior to HLM’s development.

  • Disaggregation: It deals with hierarchical data by ignoring the presence of group differences. It considers all relationships between variables to be context free and situated at level-1 of the hierarchy (i.e., at the individual level).
    • For example, all students in the same class would be assigned the same mean classroom-related scores (e.g., homework assignment load and teaching experience), and all students in the same school would be assigned the same mean school-related scores (e.g., school geographic location)
    • If teaching experience influences student breakfast consumption, the effects of the level-1 (student) and level-2 (classroom) variables on the outcome variable (GPA) cannot be disentangled.

[Figure: disaggregated analysis, all relationships modeled at level 1]

  • Aggregation: It deals with hierarchical data by ignoring lower-level individual differences, i.e., within-group variation is ignored and individuals are treated as homogeneous entities.

[Figure: aggregated analysis, within-group variation ignored]

  • HLM: HLM is needed because (1) nesting creates dependencies in the data, and (2) it is important to understand variation at different levels.
    • For example, we want to investigate the relationship between breakfast consumption and student GPA using HLM. Each level-1 (X,Y) unit (i.e., each student’s GPA and breakfast consumption) is identified by its level-2 cluster (i.e., that student’s classroom). Each level-2 cluster’s slope (i.e., each classroom’s slope) is also identified and analyzed separately.
    • Using HLM, both the within- and between-group regressions are taken into account to depict the relationship between breakfast consumption and GPA.
    • Disadvantages of HLM: (1) it requires large sample sizes for adequate power; and (2) it can only handle missing data at level 1 and removes groups with missing data at level 2 or above. It is therefore advantageous to increase the number of groups rather than the number of observations per group: a study with 30 groups of 30 observations each \((n = 30 \times 30 = 900)\) can have the same power as 150 groups with 5 observations each \((n = 5 \times 150 = 750)\).

[Figure: HLM, within- and between-group regressions modeled simultaneously]


Equations underlying HLM

HLM in this part is limited to two-level hierarchical data structures concerning continuous outcome variables.

Two models are developed in order to achieve the simultaneous investigation of the relationship within a given hierarchical level as well as the relationship across levels: one that reflects the relationship within lower level units, and a second that models how the relationship within lower level units varies between units.

Level-1 (Within-unit) models: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\) with \(E(r_{ij}) = 0\) and \(var(r_{ij}) = \sigma^{2}\)

  • \(Y_{ij}\): dependent variable measured for ith level-1 unit nested within the jth level-2 unit
  • \(X_{ij}\): value on the level-1 predictor
  • \(\beta_{0j}\): intercept for the jth level-2 unit
  • \(\beta_{1j}\): regression coefficient associated with \(X_{ij}\) for the jth level-2 unit
  • \(r_{ij}\): random error associated with the ith level-1 unit nested within the jth level-2 unit

Level-2 (between-unit) models: the level-1 regression coefficients \(\beta_{0j}\) and \(\beta_{1j}\) are used as outcome variables and related to each of the level-2 predictors.

\(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\)
\(\beta_{1j} = \gamma_{10} + \gamma_{11}G_{j} + U_{1j}\)
with \(E(U_{0j}) = 0\), \(E(U_{1j}) = 0\), \(E(\beta_{0j}) = \gamma_{00}\), \(E(\beta_{1j}) = \gamma_{10}\), \(var(\beta_{0j}) = var(U_{0j}) = \tau_{00}\), \(var(\beta_{1j}) = var(U_{1j}) = \tau_{11}\), \(cov(\beta_{0j}, \beta_{1j}) = cov(U_{0j}, U_{1j}) = \tau_{01}\), and \(cov(U_{0j}, r_{ij}) = cov(U_{1j}, r_{ij}) = 0\)

  • \(\beta_{0j}\): intercept for the jth level-2 unit
  • \(\beta_{1j}\): slope for the jth level-2 unit
  • \(G_{j}\): value on the level-2 predictor
  • \(\gamma_{00}\): overall mean intercept adjusted for G
  • \(\gamma_{01}\): regression coefficient associated with G relative to the level-1 intercept
  • \(\gamma_{10}\): overall mean slope adjusted for G
  • \(\gamma_{11}\): regression coefficient associated with G relative to the level-1 slope
  • \(U_{0j}\): random effects of the jth level-2 unit adjusted for G on the intercept
  • \(U_{1j}\): random effects of the jth level-2 unit adjusted for G on the slope

A combined (two-level) model: substitute level-2 model into level-1 model
\(Y_{ij} = \gamma_{00} + \gamma_{10}X_{ij} + \gamma_{01}G_{j} + \gamma_{11}G_{j}X_{ij} + U_{1j}X_{ij} + U_{0j} + r_{ij}\)

The combined model incorporates the level-1 and level-2 predictors \(X_{ij}\) and \(G_{j}\), a cross-level interaction term \((G_{j}X_{ij})\), as well as the composite error \((U_{1j}X_{ij} + U_{0j} + r_{ij})\). The two-level model is often termed a mixed model because it includes both fixed and random effects. The terms \(U_{0j}\) and \(U_{1j}\) show that there is dependency among the level-1 units nested within each level-2 unit, and they may take different values across level-2 units, leading to heterogeneous variances of the composite error terms.
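As a concrete illustration, the combined model maps directly onto the formula interface of R’s lme4 package. A minimal sketch, assuming a hypothetical data frame students with columns gpa (Y), breakfast (X), teachstyle (G), and classroom (the level-2 unit):

library(lme4)

# Fixed part: gamma00 + gamma10*X + gamma01*G + gamma11*G*X  (breakfast * teachstyle)
# Random part: U0j + U1j*X + rij  ((1 + breakfast | classroom) plus the residual)
fit <- lmer(gpa ~ breakfast * teachstyle + (1 + breakfast | classroom),
            data = students)

fixef(fit)    # the gamma estimates (fixed effects)
VarCorr(fit)  # tau00, tau11, tau01, and sigma^2 (variance-covariance components)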


Variables in HLM Models

Outcome variables
Predictors

  • Control variables
  • Explanatory variables

Variables at higher levels

  • Aggregated variables
  • Contextual variables

Estimation of effects

Two-level hierarchical models involve the estimation of three types of parameters

  • Fixed effects (do not vary across groups): Fixed Effects represent discrete, purposefully selected or existing values of a variable or factor
    • Fixed effects exert constant impact on DV
    • Random variability only occurs as a within subjects effect (level 1)
    • Can only generalize to particular values used
    • Fixed effects are represented by \(\gamma_{00}\), \(\gamma_{01}\), \(\gamma_{10}\), and \(\gamma_{11}\)
    • Fixed effects are estimated with generalized least squares (GLS)
    • GLS yields a weighted level-2 regression that gives more weight to groups (e.g., classrooms) with more accurate estimates of the outcome variables (i.e., the intercepts and slopes)
  • Random level-1 coefficients: random effects represent more continuous or randomly sampled values of a variable or factor
    • Random effects exert variable impact on DV
    • Variability occurs at level 1 and level 2
    • Can study and model variability
    • Can generalize to population of values
    • Random effects are represented by \(\beta_{0j}\) and \(\beta_{1j}\) (which can vary across groups)
    • Hierarchical models provide two estimates for random coefficients of a given group: (1) computing an OLS regression for the level-1 equation representing that group; and (2) the predicted values of \(\beta_{0j}\) and \(\beta_{1j}\) in the level-2 model
    • HLM programs use an empirical Bayes estimation strategy, which takes into consideration both estimation strategies by computing an optimally weighted combination of the two
  • Variance-covariance components
    • These include (1) the covariance between the level-2 error terms, i.e., \(cov(\beta_{0j}, \beta_{1j})\) or \(cov(U_{0j}, U_{1j})\), defined as \(\tau_{01}\); (2) the variance of the level-1 error term, i.e., \(var(r_{ij})\), denoted \(\sigma^{2}\); and (3) the variances of the level-2 error terms, i.e., \(var(\beta_{0j})\) or \(var(U_{0j})\), defined as \(\tau_{00}\), and \(var(\beta_{1j})\) or \(var(U_{1j})\), defined as \(\tau_{11}\)
    • Variance-covariance estimates are made using iterative numerical procedures for unbalanced designs
    • Raudenbush & Bryk (2002) suggest the following conceptual approaches to estimating variance-covariance components in unbalanced designs: (1) full maximum likelihood; (2) restricted maximum likelihood; and (3) Bayes estimation
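For reference, the full versus restricted maximum likelihood choice (discussed further below) is a single argument in lme4; a sketch, reusing the hypothetical students data from above:

# REML (restricted maximum likelihood) is the lme4 default;
# REML = FALSE requests full maximum likelihood instead.
fit_rml <- lmer(gpa ~ breakfast + (1 + breakfast | classroom),
                data = students, REML = TRUE)
fit_fml <- lmer(gpa ~ breakfast + (1 + breakfast | classroom),
                data = students, REML = FALSE)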

Use fixed effects if

  • The groups are regarded as unique entities
  • If group values are determined by researcher through design or manipulation
  • Small j (< 10); improves power

Use random effects if

  • Groups regarded as a sample from a larger population
  • Researcher wishes to test effects of group level variables
  • Researcher wishes to understand group level differences
  • Small j (< 10); improves estimation

Statistical Estimation in HLM Models

Estimation Methods

  • Traditional Maximum Likelihood (ML) Estimation
    • ML estimates model parameters by finding the set of population parameter values that maximizes a likelihood function
    • The likelihood function provides the probabilities of observing the sample data given particular parameter estimates
    • ML methods produce parameters that maximize the probability of finding the observed sample data
    • Disadvantages
      • Based on the likelihood function only
      • The likelihood can often only be approximated numerically
      • Not stable: a slight change in the model can lead to large changes in the results
      • Non-convergent cases: roughly 5% to 10%
      • Local maxima issues, depending on the initial values
      • Worse performance with small samples
    • Advantages
      • Requires only classical mathematical and statistical machinery
      • Computationally straightforward (e.g., the EM algorithm)
  • FML: Full Maximum Likelihood has two practical advantages
    • Computationally easier
    • Its deviance tests cover the regression coefficients as well as the variance components (see RML below)
  • RML: Restricted Maximum Likelihood
    • RML is expected to lead to better estimates, especially when j is small
    • With FML, the overall chi-square tests both the regression coefficients and the variance components; with RML, only the variance components are tested
    • Therefore, if the fixed portions of two models differ, FML must be used for nested deviance tests
  • Modern Bayesian Estimation
    • If only information from group j is used for estimation, we have the OLS estimate: \(\hat{\beta}_{0j} = \bar{Y}_{.j}\)
    • If only information from the population is used, the group is estimated by the grand mean: \(\hat{\gamma}_{00} = \bar{Y}_{..} = \sum_{j=1}^{J} \frac{n_{j}}{N}\bar{Y}_{.j}\)

[Figure: empirical Bayes estimates shrink group estimates toward the grand mean]

    • Use of prior and posterior information improves estimation (depending on the purpose)
    • Estimates “shrink” toward the grand mean: \(\hat{\beta}_{0j}^{EB} = \lambda_{j}\bar{Y}_{.j} + (1 - \lambda_{j})\hat{\gamma}_{00}\), where \(\lambda_{j}\) is the parameter reliability defined below (a numeric sketch follows at the end of this subsection)
    • The amount of shrinkage depends on the “badness” (unreliability) of the unit estimate
      • Low reliability results in greater shrinkage (if \(\lambda_{j} = 1\), there is no shrinkage; if \(\lambda_{j} = 0\), shrinkage is complete and the estimate is the grand mean \(\gamma_{00}\))
      • A small n within a j unit results in greater shrinkage, “borrowing” from larger units
    • Advantages
      • Based on the likelihood function and a prior distribution (thus providing a chance to incorporate previous research findings)
      • More accurate results
      • More stable
      • Always converges
      • Global maximum (no need to rely on initial values)
      • Better performance with small samples
    • Disadvantages
      • Requires more advanced mathematical and statistical machinery (Bayesian statistics)
      • Computationally more demanding (Markov chain Monte Carlo)
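A small numeric sketch of the shrinkage formula above (made-up values, no package needed):

shrink <- function(group_mean, grand_mean, lambda) {
  # Empirical Bayes estimate: a weighted combination of the group's OLS mean
  # and the grand mean, weighted by the reliability lambda.
  lambda * group_mean + (1 - lambda) * grand_mean
}
shrink(group_mean = 75, grand_mean = 70, lambda = 0.9)  # reliable unit: 74.5 (little shrinkage)
shrink(group_mean = 75, grand_mean = 70, lambda = 0.2)  # unreliable unit: 71 (heavy shrinkage)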

Computational Algorithms

  • Several algorithms exist for estimating HLM models
    • Expectation-Maximization (EM)
    • Fisher scoring
    • Iterative Generalized Least Squares (IGLS)
    • Restricted IGLS (RIGLS)
  • All are iterative search and evaluation procedures

Model Estimation

  • Start values are used to solve model equations on first iteration
  • This solution is used to compute initial model fit
  • Next iteration involves search for better parameter values
  • New values evaluated for fit, then a new set of parameter values tried
  • When additional changes produce no appreciable improvement, iteration process terminates (convergence)
  • Note that convergence and model fit are very different issues

Centering

  • The Original Metric
    • Sensible when 0 is a meaningful point on the original scale of the predictor
    • Not sensible or interpretable in many other contexts

[Figure: predictor in its original metric]

  • Centering around the grand mean
    • Predictors at level 1 (X’s) are expressed as deviations from the grand mean (M): \((X_{ij} - M)\)
    • Intercept now expresses the expected outcome value (Y) for someone whose value on predictor X is the same as the grand mean on that predictor
    • Centering is computationally more efficient
    • The intercept now represents the group mean adjusted for the grand mean, \((\bar{X}_{.j} - M)\); \(var(\beta_{0j}) = \tau_{00}\) is the variance among the level-2 unit means adjusted for the grand mean

[Figure: grand-mean centering]

  • Centering around the group mean
    • Individual scores are interpreted relative to their group mean \((X_{ij} - \bar{X}_{.j})\)
    • The individual deviation scores are orthogonal to the group means
    • Intercept represents the unadjusted mean achievement for the group
    • Unbiased estimates of within-group effects
    • May be necessary in random coefficient models if level 1 predictor affects the outcome at both level 1 and level 2
    • Can control for unmeasured between group differences
    • But can mask between group effects; interpretation is more complex

[Figure: group-mean centering]
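In R, both centerings are one-liners; a sketch, assuming a hypothetical data frame d with a level-1 predictor X and grouping variable g:

d$X_grand <- d$X - mean(d$X)                  # grand-mean centering: X - M
d$X_group <- d$X - ave(d$X, d$g, FUN = mean)  # group-mean centering: X - group mean
d$X_bar   <- ave(d$X, d$g, FUN = mean)        # group means, often re-entered as a level-2 predictor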

Parameter estimation

  • Coefficients and standard errors are estimated through maximum likelihood procedures (usually)
    • The ratio of a parameter to its standard error produces a Wald test evaluated through comparison to the normal distribution (z)
    • In the HLM software, a more conservative approach is used
      • t-tests are used for significance testing
      • t-tests are more accurate for fixed effects, small n, and nonnormal distributions
  • Variance components are tested separately (via \(\chi^{2}\) tests; see the hypothesis-testing example below)

Parameter reliability

  • Analogous to score reliability: ratio of true score variance to total variance (true score + error)
  • In HLM, ratio of true parameter variance to total variability
  • \(\lambda_{j} = n_{j}\times ICC / (1 + (n_{j}-1)\times ICC)\)

[Figure: parameter reliability]
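A quick sketch of how reliability grows with group size for a fixed ICC:

lambda <- function(n, icc) n * icc / (1 + (n - 1) * icc)  # parameter reliability
lambda(n = 5,  icc = 0.10)  # ~0.36
lambda(n = 30, icc = 0.10)  # ~0.77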


Options of HLM models

Given a two-level data structure, one may decide

  • Inclusion of level-1 predictor(s) or not
  • Inclusion of level-2 predictor(s) or not
  • Inclusion of predictors at both levels
  • Making the intercept vary randomly or not
  • Making the slopes vary randomly or not
  • Making both vary randomly or not
    • If both are fixed effects and no predictors are involved, the model is equivalent to a one-way ANOVA model

Other options: location of independent variables / Centering

  • Use the original metric of IV
  • Center around the grand mean \((\bar{X_{..}})\)
  • Center around the group mean \((\bar{X_{.j}})\)
  • Center around other locations of IVs
    Centering is not always necessary; its appropriate use will result in meaningful and interpretable parameters.

Two-Level HLM Models

Fixed Intercepts Model

  • The simplest HLM model is equivalent to a one-way ANOVA with fixed effects.
    Level-1 model: \(Y_{ij} = \beta_{0j} + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00}\)
    Combined model: \(Y_{ij} = \gamma_{00} + r_{ij}\)

  • This model simply estimates the grand mean \((\gamma_{00})\) and deviations from the grand mean \((r_{ij})\)

[Figure: fixed intercepts model]

ANOVA Model or Unconditional Model (random intercepts)

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    Combined model: \(Y_{ij} = \gamma_{00} + U_{0j} + r_{ij}\)
    Note that the addition of \(U_{0j}\) allows a different intercept for each j unit: a random-effects model.

[Figure: random intercepts (ANOVA) model]

  • In addition to providing parameter estimates, the ANOVA model provides information about the presence of level 2 variance (the ICC) and whether there are significant differences between level 2 units
  • This model is also called the Unconditional Model (because it is not “conditioned” by any predictors) or the empty model
  • Often used as a baseline model for comparison to more complex models
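In lme4, the unconditional model and the ICC can be obtained as follows (a sketch with the hypothetical students data from earlier):

m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)  # unconditional (empty) model

vc     <- as.data.frame(VarCorr(m0))
tau00  <- vc$vcov[vc$grp == "classroom"]  # between-group variance
sigma2 <- sigma(m0)^2                     # within-group variance
tau00 / (tau00 + sigma2)                  # ICC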

Conditional Models: ANCOVA

  • Adding a predictor to the ANOVA model results in an ANCOVA model with random intercepts.
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    \(\beta_{1j} = \gamma_{10}\)

  • Note that the effect of X is constrained to be the same fixed effect for every j unit (homogeneity of regression slopes)

[Figure: ANCOVA model, random intercepts with a common slope]

Conditional Models: Random Coefficients

  • An additional parameter results in random variation of the slopes.
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + U_{1j}\)

  • Note that both intercepts and slopes now vary from group to group.

[Figure: random coefficients model, intercepts and slopes vary by group]

  • Standardized coefficients at level 1: \(\beta_{1j}(SD_{X}/SD_{Y})\)
  • Standardized coefficients at level 2: analogous, e.g., \(\gamma_{01}(SD_{G}/SD_{Y})\)

Modeling Variation at Level 2: Intercepts as Outcomes

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + U_{1j}\)

  • Predictors (G’s) at level 2 are used to model variation in intercepts between the j units.

Modeling Variation at Level 2: Slopes as Outcomes

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + \gamma_{11}(G_{j}) + U_{1j}\)

  • G’s can be used to predict variation in slopes as well.

Variance Components Analysis

  • VCA allows estimation of the size of random variance components
    • Important issue when unbalanced designs are used
    • Iterative procedures must be used (usually Maximum Likelihood estimation)
  • Allows significance testing of whether there is variation in the components across units

  • Estimating Variance Components: Unconditional Model
    \(Var(Y_{ij}) = Var(U_{0j}) + Var(r_{ij}) = \tau_{00} + \sigma^{2}\)

  • Variance explained
    \(R^{2}\) at level 1 = \(1 - (\sigma^{2}_{cond} + \tau_{cond})/(\sigma^{2}_{uncond} + \tau_{uncond})\)
    \(R^{2}\) at level 2 = \(1 - [(\sigma^{2}_{cond}/n_{h}) + \tau_{cond}]/[(\sigma^{2}_{uncond}/n_{h}) + \tau_{uncond}]\)
    where \(n_{h}\) is the harmonic mean of n for the level-2 units: \(n_{h} = k/(1/n_{1} + 1/n_{2} + \dots + 1/n_{k})\)
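Continuing the hypothetical lme4 example, the level-1 variance explained by a predictor can be computed from the unconditional and conditional fits:

m1 <- lmer(gpa ~ breakfast + (1 | classroom), data = students)  # conditional model
(sigma(m0)^2 - sigma(m1)^2) / sigma(m0)^2  # R^2 at level 1 (proportional reduction)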

Comparing models

  • Deviance tests
  • Variance explained: Examine the reduction in unconditional model variance as predictors are added; a simpler level-2 formula: \(R^{2} = (\tau_{baseline} - \tau_{conditional})/ \tau_{baseline}\)

Testing a Nested Sequence of HLM Models

  1. Test unconditional model
  2. Add level 1 predictors
  • Determine if there is variation across groups
  • If not, fix the parameter
  • Decide whether to drop nonsignificant predictors
  • Test deviance, compute \(R^{2}\) if so desired
  3. Add level 2 predictors
  • Evaluate for significance
  • Test deviance, compute \(R^{2}\) if so desired
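A sketch of this sequence in lme4, with deviance tests between successive models (anova() refits with full ML when the fixed parts differ); data and variable names are hypothetical, as before:

m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)                      # unconditional
m1 <- lmer(gpa ~ breakfast + (1 | classroom), data = students)              # add level-1 predictor
m2 <- lmer(gpa ~ breakfast + (1 + breakfast | classroom), data = students)  # let the slope vary
m3 <- lmer(gpa ~ breakfast * teachstyle + (1 + breakfast | classroom),
           data = students)                                                 # add level-2 predictor
anova(m0, m1, m2, m3)  # likelihood-ratio (deviance) tests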

Three Level Models

Equations

  • Level-1 (i students)
    \(Y_{ijk} = \pi_{0jk} + \pi_{1jk}(a_{ijk}) + e_{ijk}\)
  • Level-2 (j classrooms)
    \(\pi_{0jk} = \beta_{00k} + \beta_{01k}(X_{jk}) + r_{0jk}\)
    \(\pi_{1jk} = \beta_{10k} + \beta_{11k}(X_{jk}) + r_{1jk}\)
  • Level-3 (k schools)
    \(\beta_{00k} = \gamma_{000} + \gamma_{001}(G_{k}) + U_{00k}\)
    \(\beta_{10k} = \gamma_{100} + \gamma_{101}(G_{k}) + U_{10k}\)

Partitioning Variance

  • Proportion of variance within classrooms (individual student differences) = \(\sigma^{2} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
  • Proportion of variance between classrooms within schools = \(\tau_{\pi} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
  • Proportion of variance between schools = \(\tau_{\beta} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
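A sketch of the three-level partition in lme4, assuming a hypothetical pupils data frame with columns score, classroom, and school:

# (1 | school/classroom) expands to (1 | school) + (1 | classroom:school)
m3l <- lmer(score ~ 1 + (1 | school/classroom), data = pupils)

vc       <- as.data.frame(VarCorr(m3l))
tau_beta <- vc$vcov[vc$grp == "school"]            # between schools
tau_pi   <- vc$vcov[vc$grp == "classroom:school"]  # between classrooms within schools
sigma2   <- sigma(m3l)^2                           # within classrooms
sigma2 / (sigma2 + tau_pi + tau_beta)              # proportion within classrooms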

Longitudinal models

  • Equations
    Level-1: \(Y_{tij} = \pi_{0ij} + \pi_{1ij}(time) + e_{tij}\)
    Level-2: \(\pi_{0ij} = \beta_{00j} + \beta_{01j}(X_{ij}) + r_{0ij}\)
    \(\pi_{1ij} = \beta_{10j} + \beta_{11j}(X_{ij}) + r_{1ij}\)
    Level-3: \(\beta_{00j} = \gamma_{000} + \gamma_{001}(G_{1j}) + U_{00j}\)
    \(\beta_{10j} = \gamma_{100} + \gamma_{101}(G_{1j}) + U_{10j}\)

  • The figures (from left to right, top to bottom) show (1) fitting a growth trajectory; (2) linear growth for individual students; (3) average linear growth by school; and (4) curvilinear longitudinal models.

[Figures: longitudinal growth models]
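A sketch of a linear growth model in lme4, assuming hypothetical long-format data long_data with one row per person-occasion (columns score, time, id):

growth <- lmer(score ~ time + (1 + time | id), data = long_data)
fixef(growth)  # average initial status and growth rate
ranef(growth)  # person-specific deviations from the averages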

Testing a Nested Sequence of HLM Longitudinal Models

  1. Test unconditional model
  2. Test Level 1 growth model
  3. After establishing the level 1 growth model, use it as the baseline for succeeding model comparisons
  4. Add level 2 predictors
  • Determine if there is variation across groups
  • If not, fix the parameter
  • Decide whether to drop nonsignificant predictors
  • Test deviance, compute \(R^{2}\) if so desired
  5. Add level 3 predictors
  • Evaluate for significance
  • Test deviance, compute \(R^{2}\) if so desired

Power in HLM Models

Factors Affecting Power

  • Sample Size
    • Number of participants per cluster (N)
    • Number of clusters (J)
  • Effect Size
  • Alpha level
  • Unexplained/residual variance
    • Terminology: “error” versus unexplained or residual
    • Residual variance reduces power
      • Anything that decreases residual variance increases power (e.g., more homogeneous participants, additional explanatory variables, etc.)
      • Unreliability of measurement contributes to residual variance
      • Treatment infidelity contributes to residual variance
      • Consider factors that may contribute to residual between-cluster variance
  • Design Effects
    • Intraclass correlation (ICC)
    • Between vs. within cluster variance
    • Treatment variability across clusters
    • Repeated measures
    • Blocking and matching
  • Statistical control

Randomization as a Control Tactic

  • What does randomization accomplish?
    • Controls bias in assignment of treatment (works OK with small J)
    • Turns confounding factors into randomly related effects (equivalence vs. randomness; does not work well with small J)
  • Applying an underpowered, small cluster randomized controlled trial (CRCT) may not be sufficient to achieve rigor
    • Consider other design approaches (e.g., interrupted time series, regression discontinuity designs)
    • Aggressively apply other tactics for experimental or statistical control
  • Not all designs are created equal
  • No single design is always best (not even randomized trials)

Improving Power Through Planned Design

  • Evaluate the validity of inferences for the planned design
  • Design to address most important potential study weaknesses
  • Realistic appraisal of study purpose, context, and odds of success
  • Importance of fostering better understanding of the factors influencing power
  • Planning that tailors design to study context and setting
    • Strategies for cluster recruitment
    • Prevention of missing data
    • Planning for realistic designs and for other strategies such as blocking, matching, and covariates

Design For Statistical Power

  • Stronger treatments!
  • Treatment fidelity
  • Blocking and matching
  • Repeated measures
  • Focused tests (df = 1)
  • Intraclass correlation
  • Statistical control, use of covariates
  • Restriction of range (IV and DV)
  • Measurement reliability and validity (IV and DV)

An Example of Model Building (in Chen 2008)

Topics: School effectiveness and student achievement
Purpose: to investigate schools’ contribution to students’ cognitive growth across various contexts: different SES, school types, sizes, etc.

  • DV (Y): (1) students’ achievement (math and reading); (2) achievement gain; (3) bivariate gains
  • Level-1 (student) IV (X): SES (Socio-economic Status)
  • Level-2 (school) IV (W): school types (public or private), sizes

Data system:

  • National Education Longitudinal Study of 1988 (NELS:88)
  • About 2,000 high schools (grade 10 to grade 12), 50,000 students

Null Models in HLM

  • Data: no predictor involved (scores from students from different schools)
  • Model (one-way ANOVA)
    Level 1 / within each school: \(Y_{ij} = \beta_{0j} + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
    Level 2 / between schools: \(\beta_{0j} = \gamma_{00} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
    Combined model: \(Y_{ij} = \gamma_{00} + U_{0j} + r_{ij}\)
    Unconditional ICC (intraclass correlation): \(ICC = \frac{\tau_{00}}{\tau_{00} + \sigma^2}\)
    The ICC measures the proportion of the total variance in the outcome (between- plus within-unit variance) that occurs between level-2 units

HLM with One Level-1 Predictor

  • Data: scores for students with various SES from schools (one level-1 predictor)
  • Model (one-way ANCOVA)
    • Level 1 / within each school:
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
      \(Y_{ij}\): Achievement for student i in school j
      \(\beta_{0j}\): Intercept for school j (given the group-mean centering, it is also the predicted achievement for a student with mean SES in school j)
      \(\beta_{1j}\): Slope for school j (given the group-mean centering, it is also the predicted achievement change for a student whose SES is one unit away from the mean SES in school j)
      \(X_{ij}\): SES for student i in school j
      \(\bar{X_{.j}}\): Mean SES in school j
      \(r_{ij}\): Random error associated with student i in school j
    • Level 2 / between schools: random intercept, fixed slope
      \(\beta_{0j} = \gamma_{00} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
      \(\beta_{1j} = \gamma_{10}\)

    • Combined model: \(Y_{ij} = \gamma_{00} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + U_{0j} + r_{ij}\)
      The only difference from the usual ANCOVA model is that the group effect \(U_{0j}\) is regarded as random here, whereas it is fixed in ANCOVA. The residual variance and ICC here are conditional on the level-1 predictor SES

HLM with One Level-2 Predictor

  • Data: scores for students from public and private schools (one level-2 predictor)
  • Model (means-as-outcomes)
    • Level 1 / within each school:
      \(Y_{ij} = \beta_{0j} + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
    • Level 2 / between schools:
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
      \(\beta_{0j}\): Intercept for school j
      \(\gamma_{00}\): Mean achievement for a school with G = 0
      \(\gamma_{01}\): Mean achievement difference between schools that differ by one unit on G
      \(U_{0j}\): Random error in predicting the level-1 intercept from the level-2 predictor G

    • Combined model: \(Y_{ij} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j} + r_{ij}\) with assumptions of \(cov(U,r) = 0\), \(cov(G,U) = 0\), and \(cov(G,r) = 0\)
      \(U_{0j} + r_{ij}\): random effect which is conditional on the effect of G
    • Conditional ICC: it measures the proportion of the total variance that remains between units after controlling for G. It is useful for determining whether a large amount of variance is left unexplained after G, which can prompt exploration of other level-2 predictors
    • Auxiliary statistics: by comparing the \(\tau\) estimates across the two models, we can develop an index of the proportional reduction in variance (PRV), i.e., variance explained at level 2

Combined Equation for a Two-level Model

  • Level1: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
  • Level2: \(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\)
  • Level2: \(\beta_{1j} = \gamma_{10} + \gamma_{11}G_{j} + U_{1j}\)
  • Combined model:
    • \(Y_{ij} = \gamma_{00} + \gamma_{01}G_{j} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + \gamma_{11}G_{j}(X_{ij} - \bar{X_{.j}}) + U_{0j} + U_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
    • Fixed Components: \(\gamma_{00} + \gamma_{01}G_{j} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + \gamma_{11}G_{j}(X_{ij} - \bar{X_{.j}})\)
    • Random Components: \(U_{0j} + U_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
    • Therefore this is a mixed model (fixed plus random components)!
    • Since the level-1 intercept and slopes are the outcomes at level 2, they are random variables rather than fixed parameters: \(var(\beta_{0j}) = var(U_{0j}) = \tau_{00}\), \(var(\beta_{1j}) = var(U_{1j}) = \tau_{11}\), \(cov(\beta_{0j}, \beta_{1j}) = cov(U_{0j}, U_{1j}) = \tau_{01}\), and \(cov(U_{0j}, r_{ij}) = cov(U_{1j}, r_{ij}) = 0\)

The following figures (from top to bottom) show results of a one-level model (IV uncentered), a one-level model (IV centered), and a two-level model (IV centered)

[Figures: one-level vs. two-level model results]


Another Example of Model Building (in Woltman et al. 2012): Hypothesis Testing

Do student breakfast consumption (level-1 predictor) and teaching style (level-2 predictor) influence student GPA (level-1 outcome)?

[Table: hypotheses and the five conditions required to support them]

To support the above hypotheses, HLM requires five conditions to be satisfied (see the table above).

  • Condition 1: There Is Systematic Within- and Between-Group Variance in GPA
    • It provides useful preliminary information and assures that there is appropriate variance to investigate the hypotheses
    • HLM applies a one-way ANOVA to partition the within- and between-group variance
      \(Y_{ij} = \beta_{0j} + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + U_{0j}\)
      \(\beta_{0j}\): mean GPA for classroom j
      \(\gamma_{00}\): grand mean GPA
      \(var(r_{ij}) = \sigma^{2}\): within group variance in GPA
      \(var(U_{0j}) = \tau_{00}\): between group variance in GPA

    • By running this initial ANOVA, HLM provides (1) the amount of variance within groups; (2) the amount of variance between groups; and (3) the ingredients for calculating the ICC (intraclass correlation): \(ICC = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}\)

  • Conditions 2 and 3: There Is Significant Variance in the Level-1 Intercepts and Slopes
    • Once within- and between-group variance has been partitioned, HLM applies a random coefficient regression to test the second and third conditions.
    • equations
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + U_{1j}\)
      \(\gamma_{00}\): mean of the intercepts across classrooms
      \(\gamma_{10}\): mean of the slopes across classrooms (Hypothesis 1)
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): variance in intercepts
      \(var(U_{1j}) = \tau_{11}\): variance in slopes

    • HLM runs a t-test on these parameters to assess whether they differ significantly from zero. This t-test reveals whether the pooled slope relating breakfast consumption to GPA differs from zero.
    • HLM runs a \(\chi^{2}\) test to assess whether the variance in the intercepts \((\tau_{00})\) and slopes \((\tau_{11})\) differs significantly from zero.
    • HLM calculates the percent of variance in GPA that is accounted for by breakfast consumption (the level-1 predictor):
      \(R^{2}_{\text{level-1}} = (\sigma^{2}_{\text{one-way ANOVA}} - \sigma^{2}_{\text{random regression}}) / \sigma^{2}_{\text{one-way ANOVA}}\)

    • Conditions 2 and 3 must be met before Conditions 4 and 5 can be tested!

  • Condition 4: The Variance in the Level-1 Intercept is Predicted by Teaching Style (Level-2 predictor)
    • Condition 4 assesses whether the significant variance at the intercepts (found in the second condition) is related to teaching style (Level-2 predictor). It is also known as the intercepts-as-outcomes model.
    • HLM uses another random regression model to assess whether teaching style is significantly related to the intercept while holding breakfast consumption constant.
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + U_{1j}\)
      \(\gamma_{00}\): Level-2 intercept
      \(\gamma_{01}\): Level-2 slope (Hypothesis 2)
      \(\gamma_{10}\): mean (pooled) slopes
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): residual intercept variance
      \(var(U_{1j}) = \tau_{11}\): variance in slopes

    • The residual variance \(\tau_{00}\) is assessed for significance using another \(\chi^{2}\) test. If this test indicates a significant value, other level-2 predictors can be added to account for this variance.
    • To assess how much variance in GPA is accounted for by teaching style, the variance attributable to teaching style is compared to the total intercept variance
      \(R^{2}_{\text{level-2 intercept model}} = (\tau_{00,\text{random regression}} - \tau_{00,\text{intercepts-as-outcomes}}) / \tau_{00,\text{random regression}}\)

  • Condition 5: The variance in the Level-1 Slope is Predicted by Teaching Style (Level-2 predictor)
    • Condition 5 assesses whether the difference in slopes is related to teaching style. It is known as the slopes-as-outcomes model.
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + \gamma_{11}(G_{j}) + U_{1j}\)
      \(\gamma_{00}\): Level-2 intercept
      \(\gamma_{01}\): Level-2 slope (Hypothesis 2)
      \(\gamma_{10}\): Level-2 intercept
      \(\gamma_{11}\): Level-2 slope (Hypothesis 3)
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): residual intercept variance
      \(var(U_{1j}) = \tau_{11}\): residual slope variance

    • The percent of slope variance attributable to teaching style (the level-2 moderator of the breakfast consumption-GPA relationship) is computed by comparing its systematic variance with the pooled variance in the slopes:
      \(R^{2}_{\text{level-2 slope model}} = (\tau_{11,\text{intercepts-as-outcomes}} - \tau_{11,\text{slopes-as-outcomes}}) / \tau_{11,\text{intercepts-as-outcomes}}\)


Regression Discontinuity and Interrupted Time Series Designs

Equation

  • Change in Intercept
    \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{2i}Treatment_{ij} + \epsilon_{ij}\)
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = (\pi_{0i} + \pi_{2i}) + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
  • Change in Slope
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{3i}(Treatment \times Time)_{ij} + \epsilon_{ij} = \pi_{0i} + (\pi_{1i} + \pi_{3i})Time_{ij} + \epsilon_{ij}\)
  • Change in Intercept and Slope
    \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{2i}Treatment_{ij} + \pi_{3i}(Treatment \times Time)_{ij} + \epsilon_{ij}\)
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = (\pi_{0i} + \pi_{2i}) + (\pi_{1i} + \pi_{3i})Time_{ij} + \epsilon_{ij}\)

[Figure: changes in intercept and slope in an interrupted time series]
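A sketch of the interrupted time series coding in R, reusing the hypothetical long_data and assuming the interruption occurs at time 6:

long_data$treatment <- as.integer(long_data$time >= 6)  # 0 before, 1 after the interruption
its <- lmer(score ~ time + treatment + time:treatment + (1 + time | id),
            data = long_data)  # treatment: intercept change; time:treatment: slope change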


Extensions of HLM

  • Categorical outcome variables
  • Cross-classified data
    • e.g. friendship circles
  • Multivariate (growth) models
  • Modeling of latent variables (measurement models)
    • Latent trait models (item response theory)
    • Factor analysis (structural equation models)
    • Latent class models
    • Latent profile models

Other HLM Topics

Individual growth (longitudinal) model

  • Subjects are measured repeatedly over time to study individual growth
    • The same measure is used (requiring vertical equating in IRT)
    • Data are collected at multiple time points
    • More time points (waves) are always recommended
  • Two-level models are used to represent an observations-within-persons design
    • The individual growth trajectory (level 1) depends on personal characteristics (level 2); level-2 variables are time invariant
    • Varying numbers and spacings of time points across persons are not a problem, which makes this approach especially well suited to handling missing data

Hierarchical Generalized Linear Model (HGLM)

  • HLM assumes the outcome is a linear function of the regression coefficients with normally distributed random terms
  • The assumptions of linearity and normality are not realistic for other types of outcome variables such as binary, count, or multinomial data
    • A linear model places no restriction on the predicted level-1 outcome (e.g., predicted probabilities can fall outside 0 and 1)
    • The random terms are constrained by the outcome and cannot be normally distributed
  • HGLM offers the capacity to model hierarchical data with nonlinear structural models and non-normally distributed error terms
    • It builds on the generalized linear model (GLM)
[Figure: GLM link functions]

  • Types of HGLM Outcomes
    • Bernoulli: binary, e.g., pass or fail
    • Binomial: number of passes in n trials
    • Multinomial: counts over several categories, e.g., how often each face appears in 30 throws of a die
    • Poisson: count data (e.g., the number of crimes for a person in five years, or in certain neighborhoods)
    • Ordinal: Likert-scale items
    • For all of the above distributions, the outcome is transformed, and the transformed outcome is linked to a linear function of the regression coefficients through a link function
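In R, HGLMs of this kind can be fit with lme4’s glmer(); a sketch with hypothetical data frames and variables:

# Bernoulli outcome (pass/fail) with a logit link
m_bin <- glmer(pass ~ breakfast + (1 | classroom),
               data = students, family = binomial(link = "logit"))

# Poisson outcome (counts) with a log link
m_pois <- glmer(crimes ~ 1 + (1 | neighborhood),
                data = crime_data, family = poisson(link = "log"))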

Conclusion (in Woltman et al., 2012)

HLM is a multi-step, time-consuming process. It can accommodate any number of hierarchical levels, but the workload increases exponentially with each added level. Compared to most other statistical methods commonly used in psychological research, HLM is relatively new, and guidelines for HLM are still being developed (Beaubien et al., 2001; Raudenbush & Bryk, 2002). Prior to conducting an HLM analysis, background interaction effects between predictor variables should be accounted for, and sufficient amounts of within- and between-level variance at all levels of the hierarchy should be ensured. HLM presumes that the data are normally distributed: when the assumption of normality for the predictor and/or outcome variable(s) is violated (e.g., through range restriction), HLM output is biased. Finally, as previously mentioned, the outcome variable(s) of interest must be situated at the lowest level of analysis in HLM (Beaubien et al., 2001).

Although HLM is relatively new, it is already being used in novel ways across a vast range of research domains. Examples of research questions analyzed using HLM include the effects of the environment on aspects of youth development (Avan & Kirkwood, 2001; Kotch, et al., 2008; Lyons, Terry, Martinovich, Peterson, & Bouska, 2001), longitudinal examinations of symptoms in chronic illness (Connelly, et al., 2007; Doorenbos, Given, Given, & Verbitsky, 2006), relationship quality based on sexual orientation (Kurdek, 1998), and interactions between patient and program characteristics in treatment programs (Chou, Hser, & Anglin, 1998).


How to conduct HLM in R
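A minimal end-to-end sketch using the lme4 package, with simulated two-level data so that everything runs as-is (all names and values are made up for illustration):

library(lme4)

# Simulate two-level data: 30 classrooms with 20 students each
set.seed(1)
J <- 30; n <- 20
classroom <- rep(1:J, each = n)
u0 <- rnorm(J, 0, 2)[classroom]    # random intercept deviations U0j
u1 <- rnorm(J, 0, 0.5)[classroom]  # random slope deviations U1j
breakfast <- rnorm(J * n)
gpa <- 70 + u0 + (1.5 + u1) * breakfast + rnorm(J * n, 0, 3)
students <- data.frame(gpa, breakfast, classroom = factor(classroom))

# 1. Unconditional model and ICC
m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)
vc <- as.data.frame(VarCorr(m0))
vc$vcov[vc$grp == "classroom"] / (vc$vcov[vc$grp == "classroom"] + sigma(m0)^2)  # ICC

# 2. Random coefficients model (random intercepts and slopes)
m1 <- lmer(gpa ~ breakfast + (1 + breakfast | classroom), data = students)
summary(m1)

# 3. Deviance test between nested models
anova(m0, m1)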


Reference

  • Woltman, H., et al. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52-69.
  • Sullivan, L., et al. (1999). Tutorial in biostatistics: An introduction to hierarchical linear modeling. Statistics in Medicine, 18, 855-888.
  • Chen, J. S. (2008). Introduction to HLM and some special topics.
  • Stevens, J. (2007). Hierarchical Linear Models.
  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). Thousand Oaks, CA: Sage Publications.
  • Willett, J. B., et al. (1998). The design and analysis of longitudinal studies of development and psychopathology in context: Statistical models and methodological recommendations. Development and Psychopathology, 10, 395-426.