An observation level random effect (OLRE) is a random effect with a different level for each observation. Combined with a Poisson distribution is can capture moderate overdispersion.
The model below is a simple mixed model. \(\delta_{ijk}\) is the ORLE.
\[Y_{ijk} \sim Poisson(\mu_{ijk})\]
\[\log(\mu_{ijk}) = \eta_{ijk}\]
\[\eta_{ijk} = \beta_0 + \beta_1 X_i + b_j + \delta_{ijk}\]
\[b_j \sim N(0, \sigma_b)\]
\[\delta_{ijk} \sim N(0, \sigma_{olre})\]
Let’s generate some overdispersed data using a negative binomial distribution.
set.seed(324)
n.i <- 10
n.j <- 10
n.k <- 10
beta.0 <- 1
beta.1 <- 0.3
sigma.b <- 0.5
theta <- 5
dataset <- expand.grid(
X = seq_len(n.i),
b = seq_len(n.j),
Replicate = seq_len(n.k)
)
rf.b <- rnorm(n.j, mean = 0, sd = sigma.b)
dataset$eta <- beta.0 + beta.1 * dataset$X + rf.b[dataset$b]
dataset$mu <- exp(dataset$eta)
dataset$Y <- rnbinom(nrow(dataset), mu = dataset$mu, size = theta)
dataset$OLRE <- seq_len(nrow(dataset))
Next we fit the model with an observation level random effect.
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: Y ~ X + (1 | b) + (1 | OLRE)
## Data: dataset
##
## AIC BIC logLik deviance df.resid
## 6871.8 6891.4 -3431.9 6863.8 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.06918 -0.41636 -0.00649 0.27116 1.55094
##
## Random effects:
## Groups Name Variance Std.Dev.
## OLRE (Intercept) 0.1732 0.4161
## b (Intercept) 0.1475 0.3840
## Number of obs: 1000, groups: OLRE, 1000; b, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.002971 0.127699 7.85 4.02e-15 ***
## X 0.298558 0.005803 51.45 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## X -0.282
The estimates for \(\beta_0\), \(\beta_1\) and \(\sigma_b\) are reasonably close to the ones used to generate the data.
Models with an OLRE should be used carefully. Because OLRE can have a very strong influence on the model. One should always check the standard deviation of the ORLE. High standard deviations are a good indication for problems. Let’s show why.
The main model (without OLRE) is \(\gamma_{ijk} = \beta_0 + \beta_1 X_i + b_j\). The OLRE correct this for overdispersion \(\eta_{ijk} = \gamma_{ijk} + \delta_{ijk}\). Since \(\delta_{ijk} \sim N(0, \sigma_{olre})\), the 2.5% and 97.5% quantiles of this distribution define a range in which 95% of the plausible OLRE values are. The 2.5% quantile (\(\delta_{lcl}\)) indicates a strong but plausible downward correction, the 97.5% quantile (\(\delta_{ucl}\)) an upward correction. So due to the corrections we have \(\eta_{ijk} = (\gamma_{ijk} + \delta_{lcl}; \gamma_{ijk} + \delta_{ucl}) = \gamma_{ijk} + (\delta_{lcl};\delta_{ucl})\).In a Poisson distribution with log-link we have \(\log(\mu_{ijk}) = \eta_{ijk}\) or \(\mu_{ijk} = e ^{\eta_{ijk}} = e ^{\gamma_{ijk} + (\delta_{lcl};\delta_{ucl})}\). The ratio \(r_{ijk}\) between the upper and lower boundary of \(\mu_{ijk}\) is:
\[r_{ijk} = \frac{e ^{\gamma_{ijk} + \delta_{ucl}}}{e ^{\gamma_{ijk} + \delta_{lcl})}}\]
\[\log(r_{ijk}) = \log(\frac{e ^{\gamma_{ijk} + \delta_{ucl}}}{e ^{\gamma_{ijk} + \delta_{lcl})}})\]
\[\log(r_{ijk}) = \log(e ^{\gamma_{ijk} + \delta_{ucl}}) - \log(e ^{\gamma_{ijk} + \delta_{lcl}))}\]
\[\log(r_{ijk}) = \gamma_{ijk} + \delta_{ucl} - \gamma_{ijk} - \delta_{lcl} = \delta_{ucl} - \delta_{lcl}\]
The 2.5% quantile of \(N(0, \sigma_{olre})\) is \(\delta_{lcl} = -1.96 \sigma_{olre}\). The 97.5% quantile of \(N(0, \sigma_{olre})\) is \(\delta_{lcl} = +1.96 \sigma_{olre}\). Hence
\[\log(r_{ijk}) = 1.96 * \sigma_{olre} - (-1.96\sigma_{olre}) \approx 4 \sigma_{olre}\]
Or
\[r_{ijk} \approx 50.3 e ^ {\sigma_{olre}}\]
So when two observations have the same fixed and random covariates, expect of the OLRE, then the ratio between a fitted value with high and low OLRE is a whopping 50 times \(e ^ {\sigma_{olre}}\).
This can be reasonable for small values of \(\sigma_{olre}\).
But it goes though the roof quickly with even moderate values of \(\sigma_{olre}\).
theta <- 0.15
dataset$Y2 <- rnbinom(nrow(dataset), mu = dataset$mu, size = theta)
m2 <- glmer(Y2 ~ X + (1 | b) + (1 | OLRE), data = dataset, family = poisson)
summary(m2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: Y2 ~ X + (1 | b) + (1 | OLRE)
## Data: dataset
##
## AIC BIC logLik deviance df.resid
## 5612.8 5632.4 -2802.4 5604.8 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.45313 -0.35115 -0.26657 0.06685 0.17419
##
## Random effects:
## Groups Name Variance Std.Dev.
## OLRE (Intercept) 10.71 3.272
## b (Intercept) 0.00 0.000
## Number of obs: 1000, groups: OLRE, 1000; b, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.16115 0.29531 -7.318 2.51e-13 ***
## X 0.27762 0.04227 6.568 5.11e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## X -0.850
m2.nb <- glmer.nb(Y2 ~ X + (1 | b), data = dataset)
summary(m2.nb)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.1442) ( log )
## Formula: Y2 ~ X + (1 | b)
## Data: dataset
##
## AIC BIC logLik deviance df.resid
## 5509.7 5529.3 -2750.8 5501.7 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.3794 -0.3777 -0.3706 -0.1276 11.8382
##
## Random effects:
## Groups Name Variance Std.Dev.
## b (Intercept) 0.1257 0.3545
## Number of obs: 1000, groups: b, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.93893 0.21541 4.359 1.31e-05 ***
## X 0.32380 0.02924 11.074 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## X -0.756
The estimates for \(\beta_0\), \(\beta_1\) and \(\sigma_b\) are quite different with the ORLE model. The negative binomial model performs reasonable for \(\beta_0\), \(\beta_1\) but not for \(\sigma_b\).