Observation level random effects

Definition

An observation level random effect (OLRE) is a random effect with a different level for each observation. Combined with a Poisson distribution is can capture moderate overdispersion.

Example

The model below is a simple mixed model. \(\delta_{ijk}\) is the ORLE.

\[Y_{ijk} \sim Poisson(\mu_{ijk})\]

\[\log(\mu_{ijk}) = \eta_{ijk}\]

\[\eta_{ijk} = \beta_0 + \beta_1 X_i + b_j + \delta_{ijk}\]

\[b_j \sim N(0, \sigma_b)\]

\[\delta_{ijk} \sim N(0, \sigma_{olre})\]

Let’s generate some overdispersed data using a negative binomial distribution.

set.seed(324)
n.i <- 10
n.j <- 10
n.k <- 10
beta.0 <- 1
beta.1 <- 0.3
sigma.b <- 0.5
theta <- 5
dataset <- expand.grid(
  X = seq_len(n.i),
  b = seq_len(n.j),
  Replicate = seq_len(n.k)
)
rf.b <- rnorm(n.j, mean = 0, sd = sigma.b)
dataset$eta <- beta.0 + beta.1 * dataset$X + rf.b[dataset$b]
dataset$mu <- exp(dataset$eta)
dataset$Y <- rnbinom(nrow(dataset), mu = dataset$mu, size = theta)
dataset$OLRE <- seq_len(nrow(dataset))

Next we fit the model with an observation level random effect.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: Y ~ X + (1 | b) + (1 | OLRE)
##    Data: dataset
## 
##      AIC      BIC   logLik deviance df.resid 
##   6871.8   6891.4  -3431.9   6863.8      996 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.06918 -0.41636 -0.00649  0.27116  1.55094 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  OLRE   (Intercept) 0.1732   0.4161  
##  b      (Intercept) 0.1475   0.3840  
## Number of obs: 1000, groups:  OLRE, 1000; b, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.002971   0.127699    7.85 4.02e-15 ***
## X           0.298558   0.005803   51.45  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## X -0.282

The estimates for \(\beta_0\), \(\beta_1\) and \(\sigma_b\) are reasonably close to the ones used to generate the data.

Check the sanity of the model

Models with an OLRE should be used carefully. Because OLRE can have a very strong influence on the model. One should always check the standard deviation of the ORLE. High standard deviations are a good indication for problems. Let’s show why.

The main model (without OLRE) is \(\gamma_{ijk} = \beta_0 + \beta_1 X_i + b_j\). The OLRE correct this for overdispersion \(\eta_{ijk} = \gamma_{ijk} + \delta_{ijk}\). Since \(\delta_{ijk} \sim N(0, \sigma_{olre})\), the 2.5% and 97.5% quantiles of this distribution define a range in which 95% of the plausible OLRE values are. The 2.5% quantile (\(\delta_{lcl}\)) indicates a strong but plausible downward correction, the 97.5% quantile (\(\delta_{ucl}\)) an upward correction. So due to the corrections we have \(\eta_{ijk} = (\gamma_{ijk} + \delta_{lcl}; \gamma_{ijk} + \delta_{ucl}) = \gamma_{ijk} + (\delta_{lcl};\delta_{ucl})\).In a Poisson distribution with log-link we have \(\log(\mu_{ijk}) = \eta_{ijk}\) or \(\mu_{ijk} = e ^{\eta_{ijk}} = e ^{\gamma_{ijk} + (\delta_{lcl};\delta_{ucl})}\). The ratio \(r_{ijk}\) between the upper and lower boundary of \(\mu_{ijk}\) is:

\[r_{ijk} = \frac{e ^{\gamma_{ijk} + \delta_{ucl}}}{e ^{\gamma_{ijk} + \delta_{lcl})}}\]

\[\log(r_{ijk}) = \log(\frac{e ^{\gamma_{ijk} + \delta_{ucl}}}{e ^{\gamma_{ijk} + \delta_{lcl})}})\]

\[\log(r_{ijk}) = \log(e ^{\gamma_{ijk} + \delta_{ucl}}) - \log(e ^{\gamma_{ijk} + \delta_{lcl}))}\]

\[\log(r_{ijk}) = \gamma_{ijk} + \delta_{ucl} - \gamma_{ijk} - \delta_{lcl} = \delta_{ucl} - \delta_{lcl}\]

The 2.5% quantile of \(N(0, \sigma_{olre})\) is \(\delta_{lcl} = -1.96 \sigma_{olre}\). The 97.5% quantile of \(N(0, \sigma_{olre})\) is \(\delta_{lcl} = +1.96 \sigma_{olre}\). Hence

\[\log(r_{ijk}) = 1.96 * \sigma_{olre} - (-1.96\sigma_{olre}) \approx 4 \sigma_{olre}\]

\[r_{ijk} \approx 50.3 e ^ {\sigma_{olre}}\]

So when two observations have the same fixed and random covariates, expect of the OLRE, then the ratio between a fitted value with high and low OLRE is a whopping 50 times \(e ^ {\sigma_{olre}}\).

This can be reasonable for small values of \(\sigma_{olre}\).

But it goes though the roof quickly with even moderate values of \(\sigma_{olre}\).

Example with strong overdispersion

theta <- 0.15
dataset$Y2 <- rnbinom(nrow(dataset), mu = dataset$mu, size = theta)

m2 <- glmer(Y2 ~ X + (1 | b) + (1 | OLRE), data = dataset, family = poisson)
summary(m2)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: Y2 ~ X + (1 | b) + (1 | OLRE)
##    Data: dataset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5612.8   5632.4  -2802.4   5604.8      996 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -0.45313 -0.35115 -0.26657  0.06685  0.17419 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  OLRE   (Intercept) 10.71    3.272   
##  b      (Intercept)  0.00    0.000   
## Number of obs: 1000, groups:  OLRE, 1000; b, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.16115    0.29531  -7.318 2.51e-13 ***
## X            0.27762    0.04227   6.568 5.11e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## X -0.850

m2.nb <- glmer.nb(Y2 ~ X + (1 | b), data = dataset)
summary(m2.nb)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.1442)  ( log )
## Formula: Y2 ~ X + (1 | b)
##    Data: dataset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5509.7   5529.3  -2750.8   5501.7      996 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.3794 -0.3777 -0.3706 -0.1276 11.8382 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  b      (Intercept) 0.1257   0.3545  
## Number of obs: 1000, groups:  b, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.93893    0.21541   4.359 1.31e-05 ***
## X            0.32380    0.02924  11.074  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## X -0.756

The estimates for \(\beta_0\), \(\beta_1\) and \(\sigma_b\) are quite different with the ORLE model. The negative binomial model performs reasonable for \(\beta_0\), \(\beta_1\) but not for \(\sigma_b\).

Observation level random effects

Thierry Onkelinx

27 juni 2016

Definition

Example

Check the sanity of the model

Example with strong overdispersion