Generalized Linear Mixed Effects models

Alejandro Molina Moctezuma

Review

Simple linear model
Deterministic: \(E[y_i] = \beta_0 + \beta_1x_i\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)
If it’s continuous:

If it’s discrete:
Where \(x_i\) = 1 or 0

(Intercept)           x 
   7.087216    3.382339
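
With a 0/1 predictor, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. A minimal sketch of that, using simulated data; the values of `beta0`, `beta1`, and `sigma` are hypothetical and are not the data behind the coefficients shown above.

```python
import random
import statistics

random.seed(1)

# Hypothetical simulation settings (not taken from the slides).
beta0, beta1, sigma = 7.0, 3.5, 1.0

# x is a 0/1 group indicator; y combines the deterministic and stochastic parts.
x = [i % 2 for i in range(200)]
y = [random.gauss(beta0 + beta1 * xi, sigma) for xi in x]


def ols(x, y):
    """Closed-form least squares for one predictor: returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = ols(x, y)

mean0 = statistics.fmean([yi for xi, yi in zip(x, y) if xi == 0])
mean1 = statistics.fmean([yi for xi, yi in zip(x, y) if xi == 1])

print(intercept, mean0)      # intercept matches the mean of group 0
print(slope, mean1 - mean0)  # slope matches the difference in group means
```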

Review

Multiple groups
Deterministic: \(E[y_i] = \beta_0 + \beta_1x_{1,i} + \beta_2x_{2,i}\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)
Where \(x_1\) = 1 or 0
And \(x_2\) = 1 or 0
Group	Value of x1	Value of x2
1	0	0
2	1	0
3	0	1
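
The table above is dummy (indicator) coding with group 1 as the reference level. A small sketch of how that coding feeds the deterministic part; the function names `dummy_code` and `expected_value` are made up for illustration.

```python
def dummy_code(group, levels=(1, 2, 3)):
    """Return the indicators (x1, x2) for `group`; the first level is the reference."""
    if group not in levels:
        raise ValueError(f"unknown group: {group}")
    return tuple(int(group == level) for level in levels[1:])


def expected_value(group, beta0, beta1, beta2):
    """Deterministic part: E[y] = beta0 + beta1 * x1 + beta2 * x2."""
    x1, x2 = dummy_code(group)
    return beta0 + beta1 * x1 + beta2 * x2


for g in (1, 2, 3):
    print(g, dummy_code(g))
```

Under this coding, \(\beta_0\) is the mean of group 1, and \(\beta_1\) and \(\beta_2\) are the differences of groups 2 and 3 from group 1.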

Review

We can have multiple regression

Deterministic: \(E[y_i] = \beta_0 + \beta_1x_{1,i} + \beta_2x_{2,i}\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)

x1 = continuous
x2 = 1 or 0

Linear models can be very complex

As many \(\beta\)s as you want (limit: \(n-1\))
Continuous and categorical
Polynomial
Multiple continuous variables that interact

Assumptions

Linearity
Normality
Independence
Equal variance

GLM’s

Generalized linear models

How many assumptions does it break?

Glm’s

What do Glm’s do?

Link the expected response to the linear predictor through a link function
Allow a non-normal distribution for the response

If normal:

\[ \underbrace{E[y_i]}_{\text{expected value}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]
\[ y_i \sim \underbrace{N(mean=E[y_i], var=\sigma^2)}_{\text{stochastic}} \]
Poisson glm:
\[ \underbrace{log(\lambda_i)}_{\text{link function}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]

\[ y_i \sim \underbrace{Poisson(\lambda_i)}_{\text{stochastic}} \]
Negative Binomial glm:
\[ \underbrace{log(\mu_i)}_{\text{link function}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]

\[ y_i \sim \underbrace{NB(\mu_i,\theta)}_{\text{stochastic}} \]
\(variance = \mu_i + \frac{\mu_i^2}{\theta}\)

Glm’s

Hurdle models

Create two datasets: one with zeroes and ones, \(Z_i\) (zero count vs. non-zero count)
\(Z_i \sim Bernoulli (p_i)\)
\(logit(p_i) = \beta_0 + ...\)
and one with the non-zero counts, modeled with a zero-truncated Poisson or negative binomial
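
A minimal sketch of the data split behind a hurdle model. The counts are made up, the helper name `split_hurdle` is hypothetical, and coding \(Z_i = 1\) for a non-zero count is an assumption. For an intercept-only Bernoulli part, the estimate of \(\beta_0\) is the logit of the observed proportion of non-zero counts.

```python
import math


def split_hurdle(counts):
    """Split counts into the indicator Z_i (1 = non-zero) and the positive counts."""
    z = [1 if c > 0 else 0 for c in counts]
    positives = [c for c in counts if c > 0]
    return z, positives


def logit(p):
    return math.log(p / (1 - p))


counts = [0, 0, 3, 1, 0, 5, 2, 0, 1, 4]  # hypothetical counts
z, positives = split_hurdle(counts)

p_hat = sum(z) / len(z)    # proportion of non-zero counts
beta0_hat = logit(p_hat)   # intercept-only estimate on the logit scale

print(z)
print(positives)           # these go to the count part of the model
print(p_hat, beta0_hat)
```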

Mixed effects

Set nets on each of those sites
Measure 50, 43, 67, and 90 fish for mercury concentration
What assumptions are we breaking?

We don’t do this

\(E(Hg_i) = \beta_0 + \beta_1size_{i}\)
Pseudoreplication
\(E(Hg_i) = \beta_0 + \beta_1size_{i} + \beta_2 site2_i + \beta_3 site3_i + \beta_4 site4_i\)
We don’t care about the specific sites… we want population-wide inference!

What do we want to know?

Mercury concentration population-wide
Variance introduced by the placement of the nets
\[ Hg_{ij} = \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{\text{intercept}} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{ij}}_{\text{slope}} +\underbrace{\epsilon_{ij}}_{\text{ind var}} \]
\(\gamma_j \sim Normal(0,\sigma_\gamma)\)
\(\psi_j \sim Normal(0,\sigma_\psi)\)
\(\epsilon_{ij} \sim Normal(0,\sigma)\)
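
One way to read the model above is as a recipe for simulating data: draw one random intercept and one random slope per site, then add individual-level noise for each fish. A sketch under assumed values; every parameter value and the size range are hypothetical, and only the four sample sizes come from the slides.

```python
import random

random.seed(42)

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 0.2, 0.01
sigma_gamma, sigma_psi, sigma = 0.05, 0.002, 0.03
fish_per_site = [50, 43, 67, 90]  # sample sizes from the slides

rows = []  # (site, size, Hg)
for j, n_j in enumerate(fish_per_site, start=1):
    gamma_j = random.gauss(0, sigma_gamma)  # random intercept for site j
    psi_j = random.gauss(0, sigma_psi)      # random slope for site j
    for _ in range(n_j):
        size = random.uniform(100, 400)     # hypothetical fish sizes
        eps = random.gauss(0, sigma)        # individual-level variation
        hg = (beta0 + gamma_j) + (beta1 + psi_j) * size + eps
        rows.append((j, size, hg))


def population_mean(size):
    """Expected Hg with all random effects set to zero."""
    return beta0 + beta1 * size


print(len(rows), population_mean(200))
```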

Mixed models

If we set all random effects to zero, we get the population mean, which is also the predicted value for a randomly chosen individual

Mixed effects

Set nets on each of those sites
Measure 50, 43, 67, and 90 fish for number of parasites, or for presence of parasites
What assumptions are we breaking?

Generalized linear mixed effects models

The linear portion of a mixed effects model has a deterministic and a stochastic component
Count data
Binary data (1, 0)
Multinomial response variable

Generalized linear mixed effects models

We talked last week about overdispersion
Poisson, Negative Binomial, Bernoulli and Binomial are not parameterized in terms of separate mean and variance parameters
The Normal distribution is: \(Normal(\mu, \sigma)\)
Mean and variance of Poisson are both \(\lambda\)
Mean and variance of Bernoulli are a function of p
\(Mean = p\) and \(variance=p(1-p)\)
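
A quick numerical check of these mean-variance relationships, computed directly from the probability mass functions. The values `lam = 4.0` and `p = 0.3` are arbitrary, and the Poisson sum is truncated at 100 terms, which is an approximation that is negligible at this \(\lambda\).

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def moments(pmf, support):
    """Mean and variance of a discrete distribution over `support`."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var


lam = 4.0  # arbitrary example value
pois_mean, pois_var = moments(lambda k: poisson_pmf(k, lam), range(100))

p = 0.3    # arbitrary example value
bern_mean, bern_var = moments(lambda k: p if k == 1 else 1 - p, (0, 1))

print(pois_mean, pois_var)  # both close to lam
print(bern_mean, bern_var)  # p and p * (1 - p)
```

In both cases the variance is fixed once the mean is known, so there is no free parameter left to absorb extra variation.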

GLMM’s

How can we run mixed effects models?
Easy way: Add random effects to the linear predictor, leading to generalized linear mixed effect models
Essentially, you have two sources of variation
One is normally distributed, the other follows a different distribution

Example

A Poisson glm:

\(log(\lambda_i) = \beta_0 + \beta_1x_i\)
\(y_i \sim Poisson(\lambda_i)\)
A glmm: Poisson-normal
\(log(\lambda_{ij}) = (\beta_0+\gamma_j) + (\beta_1+\psi_j)x_{ij}\)
\(\gamma_j \sim N(0,\sigma_\gamma)\) , \(\psi_j\sim N(0,\sigma_\psi)\)
\(y_{ij} \sim Poisson(\lambda_{ij})\)
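
A simulation sketch of the two models, to show what the normal random effects add. All parameter values are hypothetical, every group is evaluated at the same covariate value `x`, and the Poisson sampler `rpois` is a hand-written helper (Knuth's method) because the Python standard library has none.

```python
import math
import random
import statistics

random.seed(7)


def rpois(lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k


# Hypothetical values, chosen only for illustration.
beta0, beta1 = 1.0, 0.5
sigma_gamma, sigma_psi = 0.5, 0.2
x = 1.0          # same covariate value for every group
n_groups = 5000

glm_counts, glmm_counts = [], []
for _ in range(n_groups):
    # Poisson glm: every observation shares the same lambda.
    glm_counts.append(rpois(math.exp(beta0 + beta1 * x)))
    # Poisson-normal glmm: each group draws its own random intercept and slope.
    gamma_j = random.gauss(0, sigma_gamma)
    psi_j = random.gauss(0, sigma_psi)
    glmm_counts.append(rpois(math.exp((beta0 + gamma_j) + (beta1 + psi_j) * x)))

glm_ratio = statistics.variance(glm_counts) / statistics.fmean(glm_counts)
glmm_ratio = statistics.variance(glmm_counts) / statistics.fmean(glmm_counts)

print(glm_ratio)   # close to 1: variance is about equal to the mean
print(glmm_ratio)  # well above 1: the random effects add variation
```

The counts pooled across groups are overdispersed relative to a Poisson, even though each group is Poisson given its own random effects.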

Example

Log norm

Parameter interpretation

Not easy to interpret!
In mixed models, if we set all random effects to 0, then we estimate the mean for a “typical” individual
“Typical” refers to a subject, site, individual, etc.
The two sources of variation follow different distributions
Typical individual does not equal “population average response”

Example

The two curves do not line up
This is due to the nonlinear inverse-logit transformation (random effects are normal on the logit scale)
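
A small numerical sketch of why the curves differ, assuming a logit link with a single normal random intercept. The values `eta = 2.0` and `sd = 2.0` are arbitrary, and the averaging uses a plain trapezoid rule rather than any package routine.

```python
import math


def expit(eta):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_pdf(z, sd):
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def population_mean_prob(eta, sd, n=2001, width=8.0):
    """Average expit(eta + gamma) over gamma ~ Normal(0, sd) by the trapezoid rule."""
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        gamma = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * expit(eta + gamma) * normal_pdf(gamma, sd) * step
    return total


eta, sd = 2.0, 2.0  # arbitrary example values
typical = expit(eta)                     # random effect set to zero
pop_avg = population_mean_prob(eta, sd)  # averaged over the random effect

print(typical, pop_avg)
```

The population average is pulled toward 0.5 relative to the typical individual, and the two only agree where the linear predictor is 0 or the random-effect variance is 0.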

Interpreting data

Individual response curves (black), the response curve for a typical individual with random effects set to zero, and the population mean response curve (blue), on the logit and probability scales

Take away

If you do glmm’s be very careful about interpretation

In general, transforming data can be risky

Solutions?

Package GLMMadaptive estimates marginal means
How can we run mixed effects models?
Easy way: Add random effects to the linear predictor, leading to generalized linear mixed effect models
Hard way: Generalized Estimating Equations
https://fw8051statistics4ecologists.netlify.app/gee