Hierarchical Linear Model

author: angelayuan
date: Sep 6, 2015

Hierarchical Data Structures

Many social and natural phenomena have a nested or clustered organization:

  • Children within classrooms within schools
  • Patients in a medical study grouped within doctors within different clinics
  • Effect sizes within studies within methods (meta-analysis)
  • Time of measurement within persons within organizations

Simpson’s Paradox: Clustering is important

  • The pattern seen in each group can reverse when the groups are combined: in the example below, Sam outscores Gina on both quizzes, yet Gina’s combined total is higher.
df <- data.frame(Quiz1 = c("60.0%", "90.0%"), Quiz2 = c("10.0%", "30.0%"))
rownames(df) <- c("Gina", "Sam")
df  # percentages: Sam wins each quiz
##      Quiz1 Quiz2
## Gina 60.0% 10.0%
## Sam  90.0% 30.0%
df_combine <- data.frame(Quiz1 = c("60/100", "9/10"), Quiz2 = c("1/10", "30/100"),
                         Total = c("61/110", "39/110"))
rownames(df_combine) <- c("Gina", "Sam")
df_combine  # raw counts: Gina wins the combined total
##       Quiz1  Quiz2  Total
## Gina 60/100   1/10 61/110
## Sam    9/10 30/100 39/110

What Is a Hierarchical Linear Model?

  • HLM is a complex form of ordinary least squares (OLS) regression that is used to analyze variance in the outcome variables when the predictor variables are at varying hierarchical levels.

  • HLM simultaneously investigates relationships within and between hierarchical levels of grouped data, making it more efficient at accounting for variance among variables at different levels than other existing analyses.

  • HLM has come to be known by several names, including multilevel, mixed-level, mixed linear, mixed-effects, random-effects, random-coefficient (regression), and (complex) covariance-components modeling.


Why Is Multilevel Analysis Needed?

Nesting creates dependencies in the data

  • Dependencies violate the assumptions of traditional statistical models (“independence of error”, “homogeneity of regression slopes”)
  • Dependencies result in inaccurate statistical estimates

Important to understand variation at different levels


Methods to analyze hierarchical data

This part compares and contrasts HLM with the methods used to analyze hierarchical data prior to HLM’s development.

  • Disaggregation: It deals with hierarchical data by ignoring the presence of group differences. It considers all relationships between variables to be context free and situated at level-1 of the hierarchy (i.e., at the individual level).
    • For example, all students in the same class would be assigned the same mean classroom-related scores (e.g., homework assignment load and teaching experience), and all students in the same school would be assigned the same mean school-related scores (e.g., school geographic location)
    • If teaching experience influences student breakfast consumption, the effects of the level-1 (student) and level-2 (classroom) variables on the outcome variable (GPA) cannot be disentangled.

[Figure: disaggregated analysis, all relationships modeled at level 1]

  • Aggregation: It deals with hierarchical data by ignoring lower-level individual differences, i.e., within-group variation is ignored and individuals are treated as homogeneous entities.

[Figure: aggregated analysis, within-group variation ignored]

  • HLM: HLM is needed because (1) nesting creates dependencies in the data, and (2) it is important to understand variation at different levels.
    • For example, we want to investigate the relationship between breakfast consumption and student GPA using HLM. Each level-1 (X,Y) unit (i.e., each student’s GPA and breakfast consumption) is identified by its level-2 cluster (i.e., that student’s classroom). Each level-2 cluster’s slope (i.e., each classroom’s slope) is also identified and analyzed separately.
    • Using HLM, both the within- and between-group regressions are taken into account to depict the relationship between breakfast consumption and GPA.
    • Disadvantages of HLM: (1) it requires large sample sizes for adequate power; and (2) it can only handle missing data at level 1 and removes groups with missing data at level 2 or above. It is therefore advantageous to increase the number of groups rather than the number of observations per group: a study with 30 groups of 30 observations each \((n = 30 \times 30 = 900)\) can have the same power as 150 groups with 5 observations each \((n = 5 \times 150 = 750)\).

[Figure: HLM, within- and between-group regressions modeled simultaneously]


Equations underlying HLM

HLM in this part is limited to two-level hierarchical data structures concerning continuous outcome variables.

Two models are developed in order to achieve the simultaneous investigation of the relationship within a given hierarchical level as well as the relationship across levels: one that reflects the relationship within lower level units, and a second that models how the relationship within lower level units varies between units.

Level-1 (Within-unit) models: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\) with \(E(r_{ij}) = 0\) and \(var(r_{ij}) = \sigma^{2}\)

  • \(Y_{ij}\): dependent variable measured for ith level-1 unit nested within the jth level-2 unit
  • \(X_{ij}\): value on the level-1 predictor
  • \(\beta_{0j}\): intercept for the jth level-2 unit
  • \(\beta_{1j}\): regression coefficient associated with \(X_{ij}\) for the jth level-2 unit
  • \(r_{ij}\): random error associated with the ith level-1 unit nested within the jth level-2 unit

Level-2 (between-unit) models: the level-1 regression coefficients \(\beta_{0j}\) and \(\beta_{1j}\) are used as outcome variables and related to each of the level-2 predictors.

\(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\)
\(\beta_{1j} = \gamma_{10} + \gamma_{11}G_{j} + U_{1j}\)
with \(E(U_{0j}) = 0\), \(E(U_{1j}) = 0\), \(E(\beta_{0j}) = \gamma_{00}\), \(E(\beta_{1j}) = \gamma_{10}\), \(var(\beta_{0j}) = var(U_{0j}) = \tau_{00}\), \(var(\beta_{1j}) = var(U_{1j}) = \tau_{11}\), \(cov(\beta_{0j}, \beta_{1j}) = cov(U_{0j}, U_{1j}) = \tau_{01}\), and \(cov(U_{0j}, r_{ij}) = cov(U_{1j}, r_{ij}) = 0\)

  • \(\beta_{0j}\): intercept for the jth level-2 unit
  • \(\beta_{1j}\): slope for the jth level-2 unit
  • \(G_{j}\): value on the level-2 predictor
  • \(\gamma_{00}\): overall mean intercept adjusted for G
  • \(\gamma_{01}\): regression coefficient associated with G relative to the level-1 intercept
  • \(\gamma_{10}\): overall mean slope adjusted for G
  • \(\gamma_{11}\): regression coefficient associated with G relative to the level-1 slope
  • \(U_{0j}\): random effects of the jth level-2 unit adjusted for G on the intercept
  • \(U_{1j}\): random effects of the jth level-2 unit adjusted for G on the slope

A combined (two-level) model: substitute level-2 model into level-1 model
\(Y_{ij} = \gamma_{00} + \gamma_{10}X_{ij} + \gamma_{01}G_{j} + \gamma_{11}G_{j}X_{ij} + U_{1j}X_{ij} + U_{0j} + r_{ij}\)

The combined model incorporates the level-1 and level-2 predictors \(X_{ij}\) and \(G_{j}\), a cross-level interaction term \((G_{j}X_{ij})\), as well as the composite error \((U_{1j}X_{ij} + U_{0j} + r_{ij})\). The two-level model is often termed a mixed model because it includes both fixed and random effects. The terms \(U_{0j}\) and \(U_{1j}\) show that there is dependency among the level-1 units nested within each level-2 unit, and they may take different values across level-2 units, leading to heterogeneous variances of the composite error terms.
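As a concrete illustration, the combined model maps directly onto the formula interface of R’s lme4 package. A minimal sketch, assuming a hypothetical data frame students with columns gpa (Y), breakfast (X), teachstyle (G), and classroom (the level-2 unit):

library(lme4)

# Fixed part: gamma00 + gamma10*X + gamma01*G + gamma11*G*X  (breakfast * teachstyle)
# Random part: U0j + U1j*X + rij  ((1 + breakfast | classroom) plus the residual)
fit <- lmer(gpa ~ breakfast * teachstyle + (1 + breakfast | classroom),
            data = students)

fixef(fit)    # the gamma estimates (fixed effects)
VarCorr(fit)  # tau00, tau11, tau01, and sigma^2 (variance-covariance components)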


Variables in HLM Models

Outcome variables
Predictors

  • Control variables
  • Explanatory variables

Variables at higher levels

  • Aggregated variables
  • Contextual variables

Estimation of effects

Two-level hierarchical models involve the estimation of three types of parameters

  • Fixed effects (do not vary across groups): Fixed Effects represent discrete, purposefully selected or existing values of a variable or factor
    • Fixed effects exert constant impact on DV
    • Random variability only occurs as a within subjects effect (level 1)
    • Can only generalize to particular values used
    • Fixed effects are represented by \(\gamma_{00}\), \(\gamma_{01}\), \(\gamma_{10}\), and \(\gamma_{11}\)
    • Fixed effects are estimated with generalized least squares (GLS)
    • GLS yields a weighted level-2 regression that gives more weight to groups (e.g., classrooms) with more accurate estimates of the outcome variables (i.e., the intercepts and slopes)
  • Random level-1 coefficients: random effects represent more continuous or randomly sampled values of a variable or factor
    • Random effects exert variable impact on DV
    • Variability occurs at level 1 and level 2
    • Can study and model variability
    • Can generalize to population of values
    • Random effects are represented by \(\beta_{0j}\) and \(\beta_{1j}\) (which can vary across groups)
    • Hierarchical models provide two estimates for random coefficients of a given group: (1) computing an OLS regression for the level-1 equation representing that group; and (2) the predicted values of \(\beta_{0j}\) and \(\beta_{1j}\) in the level-2 model
    • HLM programs use an empirical Bayes estimation strategy, which takes into consideration both estimation strategies by computing an optimally weighted combination of the two
  • Variance-covariance components
    • These include (1) the covariance between the level-2 error terms, i.e., \(cov(\beta_{0j}, \beta_{1j})\) or \(cov(U_{0j}, U_{1j})\), defined as \(\tau_{01}\); (2) the variance of the level-1 error term, i.e., \(var(r_{ij})\), denoted \(\sigma^{2}\); and (3) the variances of the level-2 error terms, i.e., \(var(\beta_{0j})\) or \(var(U_{0j})\), defined as \(\tau_{00}\), and \(var(\beta_{1j})\) or \(var(U_{1j})\), defined as \(\tau_{11}\)
    • Variance-covariance estimates are made using iterative numerical procedures for unbalanced designs
    • Raudenbush & Bryk (2002) suggest the following conceptual approaches to estimating variance-covariance components in unbalanced designs: (1) full maximum likelihood; (2) restricted maximum likelihood; and (3) Bayes estimation
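For reference, the full versus restricted maximum likelihood choice (discussed further below) is a single argument in lme4; a sketch, reusing the hypothetical students data from above:

# REML (restricted maximum likelihood) is the lme4 default;
# REML = FALSE requests full maximum likelihood instead.
fit_rml <- lmer(gpa ~ breakfast + (1 + breakfast | classroom),
                data = students, REML = TRUE)
fit_fml <- lmer(gpa ~ breakfast + (1 + breakfast | classroom),
                data = students, REML = FALSE)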

Use fixed effects if

  • The groups are regarded as unique entities
  • If group values are determined by researcher through design or manipulation
  • Small j (< 10); improves power

Use random effects if

  • Groups regarded as a sample from a larger population
  • Researcher wishes to test effects of group level variables
  • Researcher wishes to understand group level differences
  • Small j (< 10); improves estimation

Statistical Estimation in HLM Models

Estimation Methods

  • Traditional Maximum Likelihood (ML) Estimation
    • ML estimates model parameters by finding the set of population parameter values that maximizes a likelihood function
    • The likelihood function provides the probabilities of observing the sample data given particular parameter estimates
    • ML methods produce parameters that maximize the probability of finding the observed sample data
    • Disadvantages
      • Based on the likelihood function only
      • The likelihood can often only be approximated numerically
      • Not stable: a slight change in the model can lead to large changes in the results
      • Non-convergent cases: roughly 5% to 10%
      • Local maxima issues, depending on the initial values
      • Worse performance with small samples
    • Advantages
      • Requires only classical mathematical and statistical machinery
      • Computationally straightforward (e.g., the EM algorithm)
  • FML: Full Maximum Likelihood has two practical advantages
    • Computationally easier
    • Its deviance tests cover the regression coefficients as well as the variance components (see RML below)
  • RML: Restricted Maximum Likelihood
    • RML is expected to lead to better estimates, especially when j is small
    • With FML, the overall chi-square tests both the regression coefficients and the variance components; with RML, only the variance components are tested
    • Therefore, if the fixed portions of two models differ, FML must be used for nested deviance tests
  • Modern Bayesian Estimation
    • If only information from group j is used for estimation, we have the OLS estimate: \(\hat{\beta}_{0j} = \bar{Y}_{.j}\)
    • If only information from the population is used, the group is estimated by the grand mean: \(\hat{\gamma}_{00} = \bar{Y}_{..} = \sum_{j=1}^{J} \frac{n_{j}}{N}\bar{Y}_{.j}\)

[Figure: empirical Bayes estimates shrink group estimates toward the grand mean]

    • Use of prior and posterior information improves estimation (depending on the purpose)
    • Estimates “shrink” toward the grand mean: \(\hat{\beta}_{0j}^{EB} = \lambda_{j}\bar{Y}_{.j} + (1 - \lambda_{j})\hat{\gamma}_{00}\), where \(\lambda_{j}\) is the parameter reliability defined below (a numeric sketch follows at the end of this subsection)
    • The amount of shrinkage depends on the “badness” (unreliability) of the unit estimate
      • Low reliability results in greater shrinkage (if \(\lambda_{j} = 1\), there is no shrinkage; if \(\lambda_{j} = 0\), shrinkage is complete and the estimate is the grand mean \(\gamma_{00}\))
      • A small n within a j unit results in greater shrinkage, “borrowing” from larger units
    • Advantages
      • Based on the likelihood function and a prior distribution (thus providing a chance to incorporate previous research findings)
      • More accurate results
      • More stable
      • Always converges
      • Global maximum (no need to rely on initial values)
      • Better performance with small samples
    • Disadvantages
      • Requires more advanced mathematical and statistical machinery (Bayesian statistics)
      • Computationally more demanding (Markov chain Monte Carlo)
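A small numeric sketch of the shrinkage formula above (made-up values, no package needed):

shrink <- function(group_mean, grand_mean, lambda) {
  # Empirical Bayes estimate: a weighted combination of the group's OLS mean
  # and the grand mean, weighted by the reliability lambda.
  lambda * group_mean + (1 - lambda) * grand_mean
}
shrink(group_mean = 75, grand_mean = 70, lambda = 0.9)  # reliable unit: 74.5 (little shrinkage)
shrink(group_mean = 75, grand_mean = 70, lambda = 0.2)  # unreliable unit: 71 (heavy shrinkage)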

Computational Algorithms

  • Several algorithms exist for estimating HLM models
    • Expectation-Maximization (EM)
    • Fisher scoring
    • Iterative Generalized Least Squares (IGLS)
    • Restricted IGLS (RIGLS)
  • All are iterative search and evaluation procedures

Model Estimation

  • Start values are used to solve model equations on first iteration
  • This solution is used to compute initial model fit
  • Next iteration involves search for better parameter values
  • New values evaluated for fit, then a new set of parameter values tried
  • When additional changes produce no appreciable improvement, iteration process terminates (convergence)
  • Note that convergence and model fit are very different issues

Centering

  • The Original Metric
    • Sensible when 0 is a meaningful point on the original scale of the predictor
    • Not sensible or interpretable in many other contexts

[Figure: predictor in its original metric]

  • Centering around the grand mean
    • Predictors at level 1 (X’s) are expressed as deviations from the grand mean (M): \((X_{ij} - M)\)
    • Intercept now expresses the expected outcome value (Y) for someone whose value on predictor X is the same as the grand mean on that predictor
    • Centering is computationally more efficient
    • The intercept now represents the group mean adjusted for the grand mean, \((\bar{X}_{.j} - M)\); \(var(\beta_{0j}) = \tau_{00}\) is the variance among the level-2 unit means adjusted for the grand mean

[Figure: grand-mean centering]

  • Centering around the group mean
    • Individual scores are interpreted relative to their group mean \((X_{ij} - \bar{X}_{.j})\)
    • The individual deviation scores are orthogonal to the group means
    • Intercept represents the unadjusted mean achievement for the group
    • Unbiased estimates of within-group effects
    • May be necessary in random coefficient models if level 1 predictor affects the outcome at both level 1 and level 2
    • Can control for unmeasured between group differences
    • But can mask between group effects; interpretation is more complex

[Figure: group-mean centering]
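In R, both centerings are one-liners; a sketch, assuming a hypothetical data frame d with a level-1 predictor X and grouping variable g:

d$X_grand <- d$X - mean(d$X)                  # grand-mean centering: X - M
d$X_group <- d$X - ave(d$X, d$g, FUN = mean)  # group-mean centering: X - group mean
d$X_bar   <- ave(d$X, d$g, FUN = mean)        # group means, often re-entered as a level-2 predictor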

Parameter estimation

  • Coefficients and standard errors are estimated through maximum likelihood procedures (usually)
    • The ratio of a parameter to its standard error produces a Wald test evaluated through comparison to the normal distribution (z)
    • In the HLM software, a more conservative approach is used
      • t-tests are used for significance testing
      • t-tests are more accurate for fixed effects, small n, and nonnormal distributions
  • Variance components are tested separately (via \(\chi^{2}\) tests; see the hypothesis-testing example below)

Parameter reliability

  • Analogous to score reliability: ratio of true score variance to total variance (true score + error)
  • In HLM, ratio of true parameter variance to total variability
  • \(\lambda_{j} = n_{j}\times ICC / (1 + (n_{j}-1)\times ICC)\)

[Figure: parameter reliability]
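A quick sketch of how reliability grows with group size for a fixed ICC:

lambda <- function(n, icc) n * icc / (1 + (n - 1) * icc)  # parameter reliability
lambda(n = 5,  icc = 0.10)  # ~0.36
lambda(n = 30, icc = 0.10)  # ~0.77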


Options of HLM models

Given a two-level data structure, one may decide

  • Inclusion of level-1 predictor(s) or not
  • Inclusion of level-2 predictor(s) or not
  • Inclusion of predictors at both levels
  • Making the intercept vary randomly or not
  • Making the slopes vary randomly or not
  • Making both vary randomly or not
    • If both are fixed effects and no predictors are involved, the model is equivalent to a one-way ANOVA model

Other options: location of independent variables / Centering

  • Use the original metric of IV
  • Center around the grand mean \((\bar{X_{..}})\)
  • Center around the group mean \((\bar{X_{.j}})\)
  • Center around other locations of IVs
    Centering is not always necessary; its appropriate use will result in meaningful and interpretable parameters.

Two-Level HLM Models

Fixed Intercepts Model

  • The simplest HLM model is equivalent to a one-way ANOVA with fixed effects.
    Level-1 model: \(Y_{ij} = \beta_{0j} + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00}\)
    Combined model: \(Y_{ij} = \gamma_{00} + r_{ij}\)

  • This model simply estimates the grand mean \((\gamma_{00})\) and deviations from the grand mean \((r_{ij})\)

[Figure: fixed intercepts model]

ANOVA Model or Unconditional Model (random intercepts)

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    Combined model: \(Y_{ij} = \gamma_{00} + U_{0j} + r_{ij}\)
    Note that the addition of \(U_{0j}\) allows a different intercept for each j unit: a random-effects model.

[Figure: random intercepts (ANOVA) model]

  • In addition to providing parameter estimates, the ANOVA model provides information about the presence of level 2 variance (the ICC) and whether there are significant differences between level 2 units
  • This model is also called the Unconditional Model (because it is not “conditioned” by any predictors) or the empty model
  • Often used as a baseline model for comparison to more complex models
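In lme4, the unconditional model and the ICC can be obtained as follows (a sketch with the hypothetical students data from earlier):

m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)  # unconditional (empty) model

vc     <- as.data.frame(VarCorr(m0))
tau00  <- vc$vcov[vc$grp == "classroom"]  # between-group variance
sigma2 <- sigma(m0)^2                     # within-group variance
tau00 / (tau00 + sigma2)                  # ICC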

Conditional Models: ANCOVA

  • Adding a predictor to the ANOVA model results in an ANCOVA model with random intercepts.
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    \(\beta_{1j} = \gamma_{10}\)

  • Note that the effect of X is constrained to be the same fixed effect for every j unit (homogeneity of regression slopes)

[Figure: ANCOVA model, random intercepts with a common slope]

Conditional Models: Random Coefficients

  • An additional parameter results in random variation of the slopes.
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + U_{1j}\)

  • Note that both intercepts and slopes now vary from group to group.

[Figure: random coefficients model, intercepts and slopes vary by group]

  • Standardized coefficients at level 1: \(\beta_{1j}(SD_{X}/SD_{Y})\)
  • Standardized coefficients at level 2: analogous, e.g., \(\gamma_{01}(SD_{G}/SD_{Y})\)

Modeling Variation at Level 2: Intercepts as Outcomes

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + U_{1j}\)

  • Predictors (G’s) at level 2 are used to model variation in intercepts between the j units.

Modeling Variation at Level 2: Slopes as Outcomes

  • Equations
    Level-1 model: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
    Level-2 model: \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
    \(\beta_{1j} = \gamma_{10} + \gamma_{11}(G_{j}) + U_{1j}\)

  • G’s can be used to predict variation in slopes as well.

Variance Components Analysis

  • VCA allows estimation of the size of random variance components
    • Important issue when unbalanced designs are used
    • Iterative procedures must be used (usually Maximum Likelihood estimation)
  • Allows significance testing of whether there is variation in the components across units

  • Estimating Variance Components: Unconditional Model
    \(Var(Y_{ij}) = Var(U_{0j}) + Var(r_{ij}) = \tau_{00} + \sigma^{2}\)

  • Variance explained
    \(R^{2}\) at level 1 = \(1 - (\sigma^{2}_{cond} + \tau_{cond})/(\sigma^{2}_{uncond} + \tau_{uncond})\)
    \(R^{2}\) at level 2 = \(1 - [(\sigma^{2}_{cond}/n_{h}) + \tau_{cond}]/[(\sigma^{2}_{uncond}/n_{h}) + \tau_{uncond}]\)
    where \(n_{h}\) is the harmonic mean of n for the level-2 units: \(n_{h} = k/(1/n_{1} + 1/n_{2} + \dots + 1/n_{k})\)
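Continuing the hypothetical lme4 example, the level-1 variance explained by a predictor can be computed from the unconditional and conditional fits:

m1 <- lmer(gpa ~ breakfast + (1 | classroom), data = students)  # conditional model
(sigma(m0)^2 - sigma(m1)^2) / sigma(m0)^2  # R^2 at level 1 (proportional reduction)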

Comparing models

  • Deviance tests
  • Variance explained: Examine the reduction in unconditional model variance as predictors are added; a simpler level-2 formula: \(R^{2} = (\tau_{baseline} - \tau_{conditional})/ \tau_{baseline}\)

Testing a Nested Sequence of HLM Models

  1. Test unconditional model
  2. Add level 1 predictors
  • Determine if there is variation across groups
  • If not, fix the parameter
  • Decide whether to drop nonsignificant predictors
  • Test deviance, compute \(R^{2}\) if so desired
  3. Add level 2 predictors
  • Evaluate for significance
  • Test deviance, compute \(R^{2}\) if so desired
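A sketch of this sequence in lme4, with deviance tests between successive models (anova() refits with full ML when the fixed parts differ); data and variable names are hypothetical, as before:

m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)                      # unconditional
m1 <- lmer(gpa ~ breakfast + (1 | classroom), data = students)              # add level-1 predictor
m2 <- lmer(gpa ~ breakfast + (1 + breakfast | classroom), data = students)  # let the slope vary
m3 <- lmer(gpa ~ breakfast * teachstyle + (1 + breakfast | classroom),
           data = students)                                                 # add level-2 predictor
anova(m0, m1, m2, m3)  # likelihood-ratio (deviance) tests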

Three Level Models

Equations

  • Level-1 (i students)
    \(Y_{ijk} = \pi_{0jk} + \pi_{1jk}(a_{ijk}) + e_{ijk}\)
  • Level-2 (j classrooms)
    \(\pi_{0jk} = \beta_{00k} + \beta_{01k}(X_{jk}) + r_{0jk}\)
    \(\pi_{1jk} = \beta_{10k} + \beta_{11k}(X_{jk}) + r_{1jk}\)
  • Level-3 (k schools)
    \(\beta_{00k} = \gamma_{000} + \gamma_{001}(G_{k}) + U_{00k}\)
    \(\beta_{10k} = \gamma_{100} + \gamma_{101}(G_{k}) + U_{10k}\)

Partitioning Variance

  • Proportion of variance within classrooms (individual student differences) = \(\sigma^{2} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
  • Proportion of variance between classrooms within schools = \(\tau_{\pi} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
  • Proportion of variance between schools = \(\tau_{\beta} / (\sigma^{2} + \tau_{\pi} +\tau_{\beta})\)
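A sketch of the three-level partition in lme4, assuming a hypothetical pupils data frame with columns score, classroom, and school:

# (1 | school/classroom) expands to (1 | school) + (1 | classroom:school)
m3l <- lmer(score ~ 1 + (1 | school/classroom), data = pupils)

vc       <- as.data.frame(VarCorr(m3l))
tau_beta <- vc$vcov[vc$grp == "school"]            # between schools
tau_pi   <- vc$vcov[vc$grp == "classroom:school"]  # between classrooms within schools
sigma2   <- sigma(m3l)^2                           # within classrooms
sigma2 / (sigma2 + tau_pi + tau_beta)              # proportion within classrooms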

Longitudinal models

  • Equations
    Level-1: \(Y_{tij} = \pi_{0ij} + \pi_{1ij}(time) + e_{tij}\)
    Level-2: \(\pi_{0ij} = \beta_{00j} + \beta_{01j}(X_{ij}) + r_{0ij}\)
    \(\pi_{1ij} = \beta_{10j} + \beta_{11j}(X_{ij}) + r_{1ij}\)
    Level-3: \(\beta_{00j} = \gamma_{000} + \gamma_{001}(G_{1j}) + U_{00j}\)
    \(\beta_{10j} = \gamma_{100} + \gamma_{101}(G_{1j}) + U_{10j}\)

  • The figures (from left to right, top to bottom) show (1) fitting a growth trajectory; (2) linear growth for individual students; (3) average linear growth by school; and (4) curvilinear longitudinal models.

[Figures: longitudinal growth models]
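A sketch of a linear growth model in lme4, assuming hypothetical long-format data long_data with one row per person-occasion (columns score, time, id):

growth <- lmer(score ~ time + (1 + time | id), data = long_data)
fixef(growth)  # average initial status and growth rate
ranef(growth)  # person-specific deviations from the averages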

Testing a Nested Sequence of HLM Longitudinal Models

  1. Test unconditional model
  2. Test Level 1 growth model
  3. After establishing the level 1 growth model, use it as the baseline for succeeding model comparisons
  4. Add level 2 predictors
  • Determine if there is variation across groups
  • If not, fix the parameter
  • Decide whether to drop nonsignificant predictors
  • Test deviance, compute \(R^{2}\) if so desired
  5. Add level 3 predictors
  • Evaluate for significance
  • Test deviance, compute \(R^{2}\) if so desired

Power in HLM Models

Factors Affecting Power

  • Sample Size
    • Number of participants per cluster (N)
    • Number of clusters (J)
  • Effect Size
  • Alpha level
  • Unexplained/residual variance
    • Terminology: “error” versus unexplained or residual
    • Residual variance reduces power
      • Anything that decreases residual variance increases power (e.g., more homogeneous participants, additional explanatory variables, etc.)
      • Unreliability of measurement contributes to residual variance
      • Treatment infidelity contributes to residual variance
      • Consider factors that may contribute to residual between-cluster variance
  • Design Effects
    • Intraclass correlation (ICC)
    • Between vs. within cluster variance
    • Treatment variability across clusters
    • Repeated measures
    • Blocking and matching
  • Statistical control

Randomization as a Control Tactic

  • What does randomization accomplish?
    • Controls bias in assignment of treatment (works OK with small J)
    • Turns confounding factors into randomly related effects (equivalence vs. randomness; does not work well with small J)
  • Applying an underpowered, small cluster randomized controlled trial (CRCT) may not be sufficient to achieve rigor
    • Consider other design approaches (e.g., interrupted time series, regression discontinuity designs)
    • Aggressively apply other tactics for experimental or statistical control
  • Not all designs are created equal
  • No single design is always best (not even randomized trials)

Improving Power Through Planned Design

  • Evaluate the validity of inferences for the planned design
  • Design to address most important potential study weaknesses
  • Realistic appraisal of study purpose, context, and odds of success
  • Importance of fostering better understanding of the factors influencing power
  • Planning that tailors design to study context and setting
    • Strategies for cluster recruitment
    • Prevention of missing data
    • Planning for realistic designs and for other strategies such as blocking, matching, and covariates

Design For Statistical Power

  • Stronger treatments!
  • Treatment fidelity
  • Blocking and matching
  • Repeated measures
  • Focused tests (df = 1)
  • Intraclass correlation
  • Statistical control, use of covariates
  • Restriction of range (IV and DV)
  • Measurement reliability and validity (IV and DV)

An Example of Model Building (in Chen 2008)

Topics: School effectiveness and student achievement
Purpose: to investigate schools’ contribution to students’ cognitive growth across various contexts: different SES, school types, sizes, etc.

  • DV (Y): (1) students’ achievement (math and reading); (2) achievement gain; (3) bivariate gains
  • Level-1 (student) IV (X): SES (Socio-economic Status)
  • Level-2 (school) IV (W): school types (public or private), sizes

Data system:

  • National Education Longitudinal Study of 1988 (NELS:88)
  • About 2,000 high schools (grade 10 to grade 12), 50,000 students

Null Models in HLM

  • Data: no predictor involved (scores from students from different schools)
  • Model (one-way ANOVA)
    Level 1 / within each school: \(Y_{ij} = \beta_{0j} + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
    Level 2 / between schools: \(\beta_{0j} = \gamma_{00} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
    Combined model: \(Y_{ij} = \gamma_{00} + U_{0j} + r_{ij}\)
    Unconditional ICC (intraclass correlation): \(ICC = \frac{\tau_{00}}{\tau_{00} + \sigma^2}\)
    The ICC measures the proportion of the total variance in the outcome (between- plus within-unit variance) that occurs between level-2 units

HLM with One Level-1 Predictor

  • Data: scores for students with various SES from schools (one level-1 predictor)
  • Model (one-way ANCOVA)
    • Level 1 / within each school:
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
      \(Y_{ij}\): Achievement for student i in school j
      \(\beta_{0j}\): Intercept for school j (given the group-mean centering, it is also the predicted achievement for a student with mean SES in school j)
      \(\beta_{1j}\): Slope for school j (given the group-mean centering, it is also the predicted achievement change for a student whose SES is one unit away from the mean SES in school j)
      \(X_{ij}\): SES for student i in school j
      \(\bar{X_{.j}}\): Mean SES in school j
      \(r_{ij}\): Random error associated with student i in school j
    • Level 2 / between schools: random intercept, fixed slope
      \(\beta_{0j} = \gamma_{00} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
      \(\beta_{1j} = \gamma_{10}\)

    • Combined model: \(Y_{ij} = \gamma_{00} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + U_{0j} + r_{ij}\)
      The only difference from the usual ANCOVA model is that the group effect \(U_{0j}\) is regarded as random here, whereas it is fixed in ANCOVA. The residual variance and ICC here are conditional on the level-1 predictor SES

HLM with One Level-2 Predictor

  • Data: scores for students from public and private schools (one level-2 predictor)
  • Model (means-as-outcomes)
    • Level 1 / within each school:
      \(Y_{ij} = \beta_{0j} + r_{ij}\) with \(r_{ij} \sim N(0,\sigma^{2})\)
    • Level 2 / between schools:
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\) with \(U_{0j} \sim N(0,\tau_{00})\)
      \(\beta_{0j}\): Intercept for school j
      \(\gamma_{00}\): Mean achievement for a school with G = 0
      \(\gamma_{01}\): Mean achievement difference between schools that differ by one unit on G
      \(U_{0j}\): Random error in predicting the level-1 intercept from the level-2 predictor G

    • Combined model: \(Y_{ij} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j} + r_{ij}\) with assumptions of \(cov(U,r) = 0\), \(cov(G,U) = 0\), and \(cov(G,r) = 0\)
      \(U_{0j} + r_{ij}\): random effect which is conditional on the effect of G
    • Conditional ICC: it measures the proportion of the total variance that remains between units after controlling for G. It is useful for determining whether a large amount of variance is left unexplained after G, which can prompt exploration of other level-2 predictors
    • Auxiliary statistics: by comparing the \(\tau\) estimates across the two models, we can develop an index of the proportional reduction in variance (PRV), i.e., variance explained at level 2

Combined Equation for a Two-level Model

  • Level1: \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
  • Level2: \(\beta_{0j} = \gamma_{00} + \gamma_{01}G_{j} + U_{0j}\)
  • Level2: \(\beta_{1j} = \gamma_{10} + \gamma_{11}G_{j} + U_{1j}\)
  • Combined model:
    • \(Y_{ij} = \gamma_{00} + \gamma_{01}G_{j} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + \gamma_{11}G_{j}(X_{ij} - \bar{X_{.j}}) + U_{0j} + U_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
    • Fixed Components: \(\gamma_{00} + \gamma_{01}G_{j} + \gamma_{10}(X_{ij} - \bar{X_{.j}}) + \gamma_{11}G_{j}(X_{ij} - \bar{X_{.j}})\)
    • Random Components: \(U_{0j} + U_{1j}(X_{ij} - \bar{X_{.j}}) + r_{ij}\)
    • Therefore this is a mixed model (fixed plus random components)!
    • Since the level-1 intercept and slopes are the outcomes at level 2, they are random variables rather than fixed parameters: \(var(\beta_{0j}) = var(U_{0j}) = \tau_{00}\), \(var(\beta_{1j}) = var(U_{1j}) = \tau_{11}\), \(cov(\beta_{0j}, \beta_{1j}) = cov(U_{0j}, U_{1j}) = \tau_{01}\), and \(cov(U_{0j}, r_{ij}) = cov(U_{1j}, r_{ij}) = 0\)

The following figures (from top to bottom) show results of a one-level model (IV uncentered), a one-level model (IV centered), and a two-level model (IV centered)

[Figures: one-level vs. two-level model results]


Another Example of Model Building (in Woltman et al. 2012): Hypothesis Testing

Do student breakfast consumption (level-1 predictor) and teaching style (level-2 predictor) influence student GPA (level-1 outcome)?

[Table: hypotheses and the five conditions required to support them]

To support the above hypotheses, HLM requires five conditions to be satisfied (see the table above).

  • Condition 1: There Is Systematic Within- and Between-Group Variance in GPA
    • It provides useful preliminary information and assures that there is appropriate variance to investigate the hypotheses
    • HLM applies a one-way ANOVA to partition the within- and between-group variance
      \(Y_{ij} = \beta_{0j} + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + U_{0j}\)
      \(\beta_{0j}\): mean GPA for classroom j
      \(\gamma_{00}\): grand mean GPA
      \(var(r_{ij}) = \sigma^{2}\): within group variance in GPA
      \(var(U_{0j}) = \tau_{00}\): between group variance in GPA

    • By running this initial ANOVA, HLM provides (1) the amount of variance within groups; (2) the amount of variance between groups; and (3) the ingredients for calculating the ICC (intraclass correlation): \(ICC = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}\)

  • Conditions 2 and 3: There Is Significant Variance in the Level-1 Intercepts and Slopes
    • Once within- and between-group variance has been partitioned, HLM applies a random coefficient regression to test the second and third conditions.
    • equations
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + U_{1j}\)
      \(\gamma_{00}\): mean of the intercepts across classrooms
      \(\gamma_{10}\): mean of the slopes across classrooms (Hypothesis 1)
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): variance in intercepts
      \(var(U_{1j}) = \tau_{11}\): variance in slopes

    • HLM runs a t-test on these parameters to assess whether they differ significantly from zero. This t-test reveals whether the pooled slope relating breakfast consumption to GPA differs from zero.
    • HLM runs a \(\chi^{2}\) test to assess whether the variance in the intercepts \((\tau_{00})\) and slopes \((\tau_{11})\) differs significantly from zero.
    • HLM calculates the percent of variance in GPA that is accounted for by breakfast consumption (the level-1 predictor):
      \(R^{2}_{\text{level-1}} = (\sigma^{2}_{\text{one-way ANOVA}} - \sigma^{2}_{\text{random regression}}) / \sigma^{2}_{\text{one-way ANOVA}}\)

    • Conditions 2 and 3 must be met before Conditions 4 and 5 can be tested!

  • Condition 4: The Variance in the Level-1 Intercept is Predicted by Teaching Style (Level-2 predictor)
    • Condition 4 assesses whether the significant variance at the intercepts (found in the second condition) is related to teaching style (Level-2 predictor). It is also known as the intercepts-as-outcomes model.
    • HLM uses another random regression model to assess whether teaching style is significantly related to the intercept while holding breakfast consumption constant.
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + U_{1j}\)
      \(\gamma_{00}\): Level-2 intercept
      \(\gamma_{01}\): Level-2 slope (Hypothesis 2)
      \(\gamma_{10}\): mean (pooled) slopes
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): residual intercept variance
      \(var(U_{1j}) = \tau_{11}\): variance in slopes

    • The residual variance \(\tau_{00}\) is assessed for significance using another \(\chi^{2}\) test. If this test indicates a significant value, other level-2 predictors can be added to account for this variance.
    • To assess how much variance in GPA is accounted for by teaching style, the variance attributable to teaching style is compared to the total intercept variance
      \(R^{2}_{\text{level-2 intercept model}} = (\tau_{00,\text{random regression}} - \tau_{00,\text{intercepts-as-outcomes}}) / \tau_{00,\text{random regression}}\)

  • Condition 5: The variance in the Level-1 Slope is Predicted by Teaching Style (Level-2 predictor)
    • Condition 5 assesses whether the difference in slopes is related to teaching style. It is known as the slopes-as-outcomes model.
      \(Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij}) + r_{ij}\)
      \(\beta_{0j} = \gamma_{00} + \gamma_{01}(G_{j}) + U_{0j}\)
      \(\beta_{1j} = \gamma_{10} + \gamma_{11}(G_{j}) + U_{1j}\)
      \(\gamma_{00}\): Level-2 intercept
      \(\gamma_{01}\): Level-2 slope (Hypothesis 2)
      \(\gamma_{10}\): Level-2 intercept
      \(\gamma_{11}\): Level-2 slope (Hypothesis 3)
      \(var(r_{ij}) = \sigma^{2}\): Level-1 residual variance
      \(var(U_{0j}) = \tau_{00}\): residual intercept variance
      \(var(U_{1j}) = \tau_{11}\): residual slope variance

    • The percent of slope variance attributable to teaching style (the level-2 moderator of the breakfast consumption-GPA relationship) is computed by comparing its systematic variance with the pooled variance in the slopes:
      \(R^{2}_{\text{level-2 slope model}} = (\tau_{11,\text{intercepts-as-outcomes}} - \tau_{11,\text{slopes-as-outcomes}}) / \tau_{11,\text{intercepts-as-outcomes}}\)


Regression Discontinuity and Interrupted Time Series Designs

Equation

  • Change in Intercept
    \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{2i}Treatment_{ij} + \epsilon_{ij}\)
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = (\pi_{0i} + \pi_{2i}) + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
  • Change in Slope
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{3i}(Treatment \times Time)_{ij} + \epsilon_{ij} = \pi_{0i} + (\pi_{1i} + \pi_{3i})Time_{ij} + \epsilon_{ij}\)
  • Change in Intercept and Slope
    \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \pi_{2i}Treatment_{ij} + \pi_{3i}(Treatment \times Time)_{ij} + \epsilon_{ij}\)
    • when Treatment = 0: \(Y_{ij} = \pi_{0i} + \pi_{1i}Time_{ij} + \epsilon_{ij}\)
    • when Treatment = 1: \(Y_{ij} = (\pi_{0i} + \pi_{2i}) + (\pi_{1i} + \pi_{3i})Time_{ij} + \epsilon_{ij}\)

[Figure: changes in intercept and slope in an interrupted time series]
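A sketch of the interrupted time series coding in R, reusing the hypothetical long_data and assuming the interruption occurs at time 6:

long_data$treatment <- as.integer(long_data$time >= 6)  # 0 before, 1 after the interruption
its <- lmer(score ~ time + treatment + time:treatment + (1 + time | id),
            data = long_data)  # treatment: intercept change; time:treatment: slope change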


Extensions of HLM

  • Categorical outcome variables
  • Cross-classified data
    • e.g. friendship circles
  • Multivariate (growth) models
  • Modeling of latent variables (measurement models)
    • Latent trait models (item response theory)
    • Factor analysis (structural equation models)
    • Latent class models
    • Latent profile models

Other HLM Topics

Individual growth (longitudinal) model

  • Subjects are measured repeatedly over time to study individual growth
    • The same measure is used (requiring vertical equating in IRT)
    • Data are collected at multiple time points
    • More time points (waves) are always recommended
  • Two-level models are used to represent an observations-within-persons design
    • The individual growth trajectory (level 1) depends on personal characteristics (level 2); level-2 variables are time invariant
    • Varying numbers and spacings of time points across persons are not a problem, which makes this approach especially well suited to handling missing data

Hierarchical Generalized Linear Model (HGLM)

  • HLM assumes the outcome is a linear function of the regression coefficients with normally distributed random terms
  • The assumptions of linearity and normality are not realistic for other types of outcome variables such as binary, count, or multinomial data
    • A linear model places no restriction on the predicted level-1 outcome (e.g., predicted probabilities can fall outside 0 and 1)
    • The random terms are constrained by the outcome and cannot be normally distributed
  • HGLM offers the capacity to model hierarchical data with nonlinear structural models and non-normally distributed error terms
    • It builds on the generalized linear model (GLM)
[Figure: GLM link functions]

  • Types of HGLM Outcomes
    • Bernoulli: binary, e.g., pass or fail
    • Binomial: number of passes in n trials
    • Multinomial: counts over several categories, e.g., how often each face appears in 30 throws of a die
    • Poisson: count data (e.g., the number of crimes for a person in five years, or in certain neighborhoods)
    • Ordinal: Likert-scale items
    • For all of the above distributions, the outcome is transformed, and the transformed outcome is linked to a linear function of the regression coefficients through a link function
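In R, HGLMs of this kind can be fit with lme4’s glmer(); a sketch with hypothetical data frames and variables:

# Bernoulli outcome (pass/fail) with a logit link
m_bin <- glmer(pass ~ breakfast + (1 | classroom),
               data = students, family = binomial(link = "logit"))

# Poisson outcome (counts) with a log link
m_pois <- glmer(crimes ~ 1 + (1 | neighborhood),
                data = crime_data, family = poisson(link = "log"))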

Conclusion (in Woltman et al., 2012)

HLM is a multi-step, time-consuming process. It can accommodate any number of hierarchical levels, but the workload increases exponentially with each added level. Compared to most other statistical methods commonly used in psychological research, HLM is relatively new, and guidelines for HLM are still being developed (Beaubien et al., 2001; Raudenbush & Bryk, 2002). Prior to conducting an HLM analysis, background interaction effects between predictor variables should be accounted for, and sufficient amounts of within- and between-level variance at all levels of the hierarchy should be ensured. HLM presumes that the data are normally distributed: when the assumption of normality for the predictor and/or outcome variable(s) is violated (e.g., through range restriction), HLM output is biased. Finally, as previously mentioned, the outcome variable(s) of interest must be situated at the lowest level of analysis in HLM (Beaubien et al., 2001).

Although HLM is relatively new, it is already being used in novel ways across a vast range of research domains. Examples of research questions analyzed using HLM include the effects of the environment on aspects of youth development (Avan & Kirkwood, 2001; Kotch, et al., 2008; Lyons, Terry, Martinovich, Peterson, & Bouska, 2001), longitudinal examinations of symptoms in chronic illness (Connelly, et al., 2007; Doorenbos, Given, Given, & Verbitsky, 2006), relationship quality based on sexual orientation (Kurdek, 1998), and interactions between patient and program characteristics in treatment programs (Chou, Hser, & Anglin, 1998).


How to conduct HLM in R
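A minimal end-to-end sketch using the lme4 package, with simulated two-level data so that everything runs as-is (all names and values are made up for illustration):

library(lme4)

# Simulate two-level data: 30 classrooms with 20 students each
set.seed(1)
J <- 30; n <- 20
classroom <- rep(1:J, each = n)
u0 <- rnorm(J, 0, 2)[classroom]    # random intercept deviations U0j
u1 <- rnorm(J, 0, 0.5)[classroom]  # random slope deviations U1j
breakfast <- rnorm(J * n)
gpa <- 70 + u0 + (1.5 + u1) * breakfast + rnorm(J * n, 0, 3)
students <- data.frame(gpa, breakfast, classroom = factor(classroom))

# 1. Unconditional model and ICC
m0 <- lmer(gpa ~ 1 + (1 | classroom), data = students)
vc <- as.data.frame(VarCorr(m0))
vc$vcov[vc$grp == "classroom"] / (vc$vcov[vc$grp == "classroom"] + sigma(m0)^2)  # ICC

# 2. Random coefficients model (random intercepts and slopes)
m1 <- lmer(gpa ~ breakfast + (1 + breakfast | classroom), data = students)
summary(m1)

# 3. Deviance test between nested models
anova(m0, m1)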


Reference

  • Woltman, H., et al. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52-69.
  • Sullivan, L., et al. (1999). Tutorial in biostatistics: An introduction to hierarchical linear modeling. Statistics in Medicine, 18, 855-888.
  • Chen, J. S. (2008). Introduction to HLM and some special topics.
  • Stevens, J. (2007). Hierarchical Linear Models.
  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). Thousand Oaks, CA: Sage Publications.
  • Willett, J. B., et al. (1998). The design and analysis of longitudinal studies of development and psychopathology in context: Statistical models and methodological recommendations. Development and Psychopathology, 10, 395-426.