DEM 7903 Week 6: Longitudinal models for change using GEE's

Corey S. Sparks, PhD
February 24, 2016

Introduction to GEE's

Up until now, we have used (G)LMM's to analyze data that were “clustered”:

Persons within neighborhoods
Survey data in general - stratified sampling
Persons over time / Longitudinal data

The next topic will introduce a modeling strategy that allows us to consider clustered data, but in a different fashion

GLMM's

GLMM's are commonly referred to as conditional models, because the model coefficients “\( \beta \)'s” are conditional on the random effects in the model.

Likewise, the mean is conditional on the random effects. This is another way of saying that the mean for a given covariate pattern is conditional on the group that the particular person is in.

\( \mu_{ij}^c = E(Y_{ij} | u_j) = X_{ij}\beta + u_j \)

GLMM's and GEE's

In contrast, Generalized Estimating Equations are referred to as marginal (population-averaged) models because they estimate only the overall mean, averaged over clusters, rather than a cluster-specific mean.

\( \mu_{ij} = X_{ij}\beta \)

Lee and Nelder (2004) provide a very good description of how these two methods compare to one another.
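
To make the conditional/marginal distinction concrete, here is a minimal sketch (not from the original slides). It averages the conditional mean over a normally distributed random intercept \( u_j \) by simple numerical integration. With the identity link used in the formulas above, the marginal and conditional means agree; with a nonlinear link such as the logit, they do not. The function names and the values `eta = 1.0` and `sigma = 2.0` are illustrative assumptions.

```python
import math

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    """Inverse identity link."""
    return z

def marginal_mean(eta, sigma, inv_link, n_grid=4001, width=8.0):
    """Average inv_link(eta + u) over u ~ N(0, sigma^2) on an evenly spaced grid."""
    step = 2.0 * width * sigma / (n_grid - 1)
    total, weight_sum = 0.0, 0.0
    for g in range(n_grid):
        u = -width * sigma + g * step
        w = math.exp(-0.5 * (u / sigma) ** 2)  # unnormalized normal density
        total += w * inv_link(eta + u)
        weight_sum += w
    return total / weight_sum

eta, sigma = 1.0, 2.0  # assumed linear predictor X*beta and random-intercept SD

# Conditional mean for a "typical" cluster (u_j = 0) versus the marginal mean
cond_identity = identity(eta)
marg_identity = marginal_mean(eta, sigma, identity)
cond_logit = expit(eta)
marg_logit = marginal_mean(eta, sigma, expit)

print("identity link:", cond_identity, marg_identity)
print("logit link:   ", cond_logit, marg_logit)
```

Under these assumptions the logit-link marginal probability is pulled toward 0.5 relative to the conditional one, which is why coefficients from a GLMM and a GEE are generally not directly comparable for non-identity links.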

Generalized Estimating Equations

Typically first attributed to Liang and Zeger, 1986
GEE's are regression models
Interested in modeling the mean response, while treating correlation within person/cluster as a nuisance
NOT based on maximum likelihood
Do not need a fully specified joint distribution for the outcome, only its marginal mean
Models can be fit for outcomes from any of the usual GLM distributions (e.g. normal, binomial, Poisson)

GEE's

For longitudinal data, we assume we have \( y_{ij} \) as our outcome on person i at time j. This could just as easily be persons within other types of clusters, like counties or sampling units.
Also have \( X_{ij} \), the matrix of predictors
Specify the relationship between \( y_{ij} \) and \( X_{ij} \) via a link function, as in a GLM
Focus is on the linear predictor of the link function - the mean
NOT INTERESTED in variance components, ONLY in regression coefficients

GEE's

Covariance structure
- We also may wish to model how observations are related to one another via some type of correlation structure between waves
- This directly implies that observations are NOT INDEPENDENT, and that's fine
- Observations between clusters are independent
- Errors within a cluster are correlated
- No assumption of common variance (homoskedasticity)

GEE's - Model form

A basic form of the model would be:

\( Y_{ij} = \beta_0 + \sum_k X_{ijk} \beta_k + CORR + error \)

Ordinary models that ignore the correlation within clusters / observations over time will tend to overestimate the standard errors of the \( \beta \)'s for time-varying predictors in a model with repeated observations.

Conversely, the standard errors of time-invariant predictors will tend to be underestimated.

GEE's - Model estimation

Given the mean function for the model and a specified correlation function, the model parameters may be estimated by finding the solution for:

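\[ U(\beta) = \sum_{i=1}^n \left( \frac{\partial \mu_i}{\partial \beta} \right)^T V_i^{-1} \left( Y_i - \mu_i(\beta) \right) = 0 \]

where \( Y_i \) and \( \mu_i \) stack the outcomes \( Y_{ij} \) and means \( \mu_{ij} \) for cluster i, and \( V_i \) is the working covariance matrix for that cluster. Solving this gives estimates of the \( \beta \)'s for the linear mean function.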
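
As a worked special case (a sketch, writing \( X_i \) and \( Y_i \) for the stacked predictors and outcomes of cluster i and holding \( V_i \) fixed): with the identity link, \( \mu_i = X_i \beta \), so the derivative of the mean with respect to \( \beta \) is just \( X_i \) and the estimating equation is linear in \( \beta \):

\[ \sum_{i=1}^n X_i^T V_i^{-1} (Y_i - X_i \beta) = 0 \quad \Rightarrow \quad \hat{\beta} = \left( \sum_{i=1}^n X_i^T V_i^{-1} X_i \right)^{-1} \sum_{i=1}^n X_i^T V_i^{-1} Y_i \]

If \( V_i = \sigma^2 I \), this reduces to the ordinary least squares estimate. For other link functions there is generally no closed form, and the equation is solved iteratively.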

GEE's - Model estimation

First, a naive linear regression analysis is carried out, assuming the observations within subjects are independent.
Then, residuals are calculated from the naive model (observed-predicted) and a working correlation matrix is estimated from these residuals.
Then the regression coefficients are refit, correcting for the correlation. (Iterative process)
The within-subject correlation structure is treated as a nuisance parameter: it is estimated only to improve the coefficient estimates and their standard errors, and is not itself of interest
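
The steps above can be sketched in a few lines of plain Python. This is an illustration under strong simplifying assumptions, not the algorithm as implemented in any GEE package: identity link, one predictor plus an intercept, an exchangeable working correlation, a simple moment estimator for \( \rho \) with no degrees-of-freedom correction, and no robust standard errors. All function names and the simulated data are hypothetical.

```python
import random

def solve_2x2(a, b):
    """Solve the 2x2 linear system a * beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def exch_inv_times(v, rho):
    """Return R^{-1} v, where R is exchangeable with off-diagonal rho."""
    shrink = rho / (1.0 + (len(v) - 1) * rho)
    total = sum(v)
    return [(vi - shrink * total) / (1.0 - rho) for vi in v]

def fit_beta(clusters, rho):
    """Solve sum_i X_i' R^{-1} (y_i - X_i beta) = 0 for [intercept, slope]."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for xs, ys in clusters:
        cols = [[1.0] * len(xs), list(xs)]               # columns of X_i
        w_cols = [exch_inv_times(c, rho) for c in cols]  # R^{-1} X_i
        for p in range(2):
            b[p] += sum(w * y for w, y in zip(w_cols[p], ys))
            for q in range(2):
                a[p][q] += sum(w * c for w, c in zip(w_cols[p], cols[q]))
    return solve_2x2(a, b)

def estimate_rho(clusters, beta):
    """Moment estimate: mean within-cluster cross-product / mean squared residual."""
    sq_sum, n_obs, cross_sum, n_pairs = 0.0, 0, 0.0, 0
    for xs, ys in clusters:
        r = [y - beta[0] - beta[1] * x for x, y in zip(xs, ys)]
        sq_sum += sum(e * e for e in r)
        n_obs += len(r)
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                cross_sum += r[j] * r[k]
                n_pairs += 1
    return (cross_sum / n_pairs) / (sq_sum / n_obs)

def gee_exchangeable(clusters, n_iter=10):
    beta = fit_beta(clusters, 0.0)           # step 1: naive fit (independence)
    rho = 0.0
    for _ in range(n_iter):
        rho = estimate_rho(clusters, beta)   # step 2: correlation from residuals
        beta = fit_beta(clusters, rho)       # step 3: refit using the correlation
    return beta, rho

# Simulated data (assumption): 200 persons, 3 waves, a shared person effect
rng = random.Random(1)
clusters = []
for _ in range(200):
    u = rng.gauss(0.0, 1.0)
    xs = [rng.uniform(0.0, 3.0) for _ in range(3)]
    ys = [1.0 + 0.5 * x + u + rng.gauss(0.0, 0.5) for x in xs]
    clusters.append((xs, ys))

beta, rho = gee_exchangeable(clusters)
print("intercept, slope:", beta)
print("working correlation rho:", rho)
```

The simulated outcome has intercept 1.0 and slope 0.5, and the shared person effect induces a positive within-person correlation, so the estimates can be checked against known values.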

GEE's - Correlation Structure

For three time points per person, the ordinary regression model treats the covariance of the residuals within a cluster/person over time as the matrix:

\[ \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 &0 \\ 0 & 0 & \sigma^2 \end{bmatrix} \]

which assumes the variances are constant and the residuals are independent over time

GEE's - Correlation Structure

But in a GEE, the model includes the correlation between measurements over time:

\[ \begin{bmatrix} \sigma_1 ^2 & a & c \\ a & \sigma_2 ^2 &b \\ c & b & \sigma_3 ^2 \end{bmatrix} \]

Which allows the variances over time to be different, as well as correlations between times to be present.

GEE's - Correlation Structure

Several types of correlation/covariance are commonly used in GEE's
When we fit a GEE, we have to assume a certain type of correlation for the repeated measures. These are typically:
- Independence - same as OLS
- Exchangeable/compound symmetry (simplest)
- Autoregressive
- Unstructured (most complicated)

GEE's - Correlation Structure - Independent

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 &0 \\ 0 & 0 & 1 \end{bmatrix} \]

GEE's - Correlation Structure - Exchangeable

\[ \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 &\rho \\ \rho &\rho & 1 \end{bmatrix} \]

GEE's - Correlation Structure - Autoregressive

\[ \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 &\rho\\ \rho^2 & \rho & 1 \end{bmatrix} \]

GEE's - Correlation Structure - Unstructured

\[ \begin{bmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 &\rho_3 \\ \rho_2 & \rho_3& 1 \end{bmatrix} \]
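
The four working correlation structures above can be written as small functions, which makes it easy to see how many parameters each one needs. This is an illustrative sketch in plain Python; the function names are hypothetical and do not come from any GEE library.

```python
def independence(n):
    """Identity matrix: no correlation between time points (0 parameters)."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]

def exchangeable(n, rho):
    """Same correlation rho between every pair of time points (1 parameter)."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """Correlation rho^|j-k| decays with the lag between time points (1 parameter)."""
    return [[float(rho) ** abs(j - k) for k in range(n)] for j in range(n)]

def unstructured(n, rhos):
    """Separate correlation for every pair (n(n-1)/2 parameters).

    rhos lists the upper-triangle entries row by row: (1,2), (1,3), (2,3), ...
    """
    corr = independence(n)
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    for (j, k), r in zip(pairs, rhos):
        corr[j][k] = corr[k][j] = r
    return corr

for name, mat in [("independence", independence(3)),
                  ("exchangeable", exchangeable(3, 0.3)),
                  ("AR(1)", ar1(3, 0.5)),
                  ("unstructured", unstructured(3, [0.1, 0.2, 0.3]))]:
    print(name)
    for row in mat:
        print("  ", row)
```

With three time points the unstructured form already needs three correlations; the count grows quadratically with the number of waves, which is why the simpler structures are often preferred when there are many time points.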