Mixed effect models: there be dragons

15/3/2021

Outline

What problem do mixed effect models solve?
Random versus fixed effects
Parameters
Equations for dragons

Intro to mixed effect models

Mixed models are also known as mixed effects models or multilevel models
They are used when the data have a hierarchical structure, as in longitudinal or panel data, repeated measures, time series and blocked experiments
They can include both fixed and random coefficients, together with multiple error terms
Their greatest benefit is that they save degrees of freedom

Challenges of mixed effect models

Super easy to code in R (end of next lecture)
Difficult to interpret (end of next lecture)
Somewhere in between to understand how they work (today)

Random and fixed effects

The fixed-versus-random choice only arises for categorical predictor variables (factors)
It depends less on the variable itself than on what you are interested in
Are you measuring a few specific instances of interest in themselves (=fixed) or a few randomly chosen instances interesting only as representatives of a population (=random).
If site is fixed, you report n site estimates (\(s_1, s_2, \ldots, s_n\)). If it is random, you report a single value, \(\sigma^2 = \text{variance}(s_1, s_2, \ldots, s_n)\). Which one do you want?
For a longer discussion see https://dynamicecology.wordpress.com/2015/11/04/is-it-a-fixed-or-random-effect/
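A minimal Python sketch of the two ways of reporting site, using made-up site estimates (the dictionary `site_estimates` and its values are hypothetical). Note that a fitted mixed model estimates the random effect variance by maximum likelihood or REML rather than by taking the plain sample variance of the site estimates; the sketch only mirrors the simplification on this slide.

```python
import statistics

# Hypothetical per-site estimates (e.g. mean test score per site); made-up values
site_estimates = {"s1": 48.0, "s2": 52.0, "s3": 50.0, "s4": 46.0}

# Fixed effect: report one estimate per site (n numbers)
for site, estimate in site_estimates.items():
    print(f"{site}: {estimate:.1f}")

# Random effect: report a single number, the variance among the sites
sigma2 = statistics.variance(list(site_estimates.values()))
print(f"sigma^2 = {sigma2:.3f}")
```

With n sites the fixed version grows to n numbers, while the random version stays at one.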

Parameters

degrees of freedom = sample size (n) - number of parameters (p)
the more parameters you have, the more data you need
As a rule of thumb, you need 10 times more data points than parameters you are trying to estimate
Parameters eat up degrees of freedom, and losing degrees of freedom costs power (next slide)
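The bookkeeping on this slide is simple arithmetic. A small Python sketch; `residual_df` and `rule_of_thumb_n` are hypothetical helper names, and n = 100 is an arbitrary sample size chosen for illustration:

```python
def residual_df(n, p):
    """Degrees of freedom left over: sample size n minus number of parameters p."""
    return n - p

def rule_of_thumb_n(p, per_parameter=10):
    """Minimum sample size under the '10 data points per parameter' rule of thumb."""
    return per_parameter * p

# Each extra parameter costs one degree of freedom and raises the data requirement
for p in (3, 10, 17):
    print(f"p = {p:2d}: df with n = 100 is {residual_df(100, p)}, "
          f"rule of thumb wants n >= {rule_of_thumb_n(p)}")
```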

degrees of freedom eat up power

Source	Sum of squares	Degrees of freedom	Mean squares	F
Garden	20	1	20	15
Error	24	18	s^2 = 1.3333
Total	44	19

Degrees of freedom (n-p)
- Garden: 2 levels, 1 parameter (the overall mean), therefore 2 - 1 = 1
- Error: 20 samples, 2 parameters (one mean per garden), therefore 20 - 2 = 18
- Total: add up the other two (1 + 18 = 19)
Mean squares (mean squared deviation, see lecture 2) = SS/df
F = mean square (treatment) / mean square (error) = 20/1.333 = 15 [think signal over noise]
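The mean squares and F in the table can be reproduced from the sums of squares and degrees of freedom; a short Python check using only the numbers in the table:

```python
# Sums of squares from the garden ANOVA table
ss_garden, ss_error = 20.0, 24.0

# Degrees of freedom = n - p
df_garden = 2 - 1    # 2 gardens, 1 parameter (overall mean)
df_error = 20 - 2    # 20 samples, 2 parameters (one mean per garden)
df_total = df_garden + df_error

# Mean square = SS / df
ms_garden = ss_garden / df_garden  # signal
ms_error = ss_error / df_error     # noise, s^2

f_ratio = ms_garden / ms_error
print(f"df: {df_garden}, {df_error}, {df_total}")
print(f"MS garden = {ms_garden:.3f}, MS error = {ms_error:.3f}, F = {f_ratio:.1f}")
```

Spending more parameters would shrink `df_error`, inflate `ms_error` for the same sum of squares, and so lower F.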

Dragons!

Taken from https://gkhajduk.github.io/2017-03-09-mixed-models/
Next lecture we will analyse this data
Today let's think about the model

Dragons

Imagine that we decided to train dragons and so we went out into the mountains and collected data on dragon intelligence (testScore) as a prerequisite. We sampled individuals over a range of body lengths and across three sites in eight different mountain ranges.

\[ \text{Model 1: } Y_i = \alpha + \beta bodylengths_i + \epsilon_i \text{ and } \epsilon_i \sim N(0,\sigma^2) \]

We are back to using \(\alpha\) for the intercept
This model has three unknown parameters: one intercept (\(\alpha\)), one slope (\(\beta\)) and the variance of the error term (\(\sigma^2\))
It assumes that the test score/body length relationship is the same in every mountain range
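To make Model 1 concrete, here is a Python sketch that simulates data from it and recovers its three parameters by ordinary least squares. The parameter values, the body-length range and the sample size of 200 are all hypothetical, not taken from the dragon data.

```python
import random

# Hypothetical "true" values for the parameters of Model 1 (SIGMA is a standard deviation)
ALPHA, BETA, SIGMA = 20.0, 0.3, 5.0

def simulate_model1(body_lengths, rng):
    """Y_i = alpha + beta * bodylength_i + e_i, with e_i ~ N(0, sigma^2)."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in body_lengths]

def fit_ols(x, y):
    """Least-squares intercept, slope and residual variance for a single line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Divide by n - 2: two regression coefficients were estimated from the data
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return intercept, slope, sigma2

rng = random.Random(42)
lengths = [rng.uniform(150.0, 250.0) for _ in range(200)]  # hypothetical body lengths
scores = simulate_model1(lengths, rng)
alpha_hat, beta_hat, sigma2_hat = fit_ols(lengths, scores)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.3f}, sigma^2 = {sigma2_hat:.2f}")
```

Note that `simulate_model1` ignores mountain range entirely, which is exactly the assumption stated above.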

Including mountain

\[\text{Model 2: } Y_{ij} = \alpha_j + \beta_j bodylengths_{ij} + \epsilon_{ij} \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

\(j = 1, \ldots, 8\) indexes the mountain ranges and \(i = 1, \ldots, n_j\) indexes the samples within mountain range \(j\)
This is an ANCOVA with mountain range as a factor, body length as a continuous explanatory variable and an interaction term (mountain:bodylength).
Here the regression lines are allowed to have a different intercept and a different slope for each mountain range. That is, the model does not assume that the test score/body length relationship is the same in every mountain range
But it has 17 parameters: 8 (mountain ranges) x 2 (intercept and slope) + 1 (error variance)
If we also included site (3 sites per mountain range), the number of intercepts and slopes would triple: (8 x 3 x 2) + 1 = 49
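The parameter counts for Models 1 and 2 follow a simple pattern. A Python sketch (`n_fixed_params` is a hypothetical helper) that reproduces the 3, 17 and 49 above, treating each site as its own group when site is included:

```python
def n_fixed_params(n_groups, per_group_intercept, per_group_slope):
    """Count parameters of a fixed-effect regression, including the error variance."""
    intercepts = n_groups if per_group_intercept else 1
    slopes = n_groups if per_group_slope else 1
    return intercepts + slopes + 1  # + 1 for sigma^2

MOUNTAINS, SITES = 8, 3
model1 = n_fixed_params(1, False, False)                      # one line for all data
model2 = n_fixed_params(MOUNTAINS, True, True)                # one line per mountain range
model2_site = n_fixed_params(MOUNTAINS * SITES, True, True)   # one line per site
print("Model 1:", model1)
print("Model 2:", model2)
print("Model 2 + site:", model2_site)
```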

Two simpler models

\[\text{Model 3: } Y_{ij} = \alpha_j + \beta bodylengths_{ij} + \epsilon_{ij} \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

Different intercepts but same slope for each mountain
10 parameters (8 (mountains) x 1 (intercept) + 1 (slope) + 1 (error term))

\[\text{Model 4: } Y_{ij} = \alpha + \beta_j bodylengths_{ij} + \epsilon_{ij} \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

The intercept is the same for every mountain range, but the slopes are allowed to differ
10 parameters (8 (mountains) x 1 (slope) + 1 (intercept) + 1 (error term))
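The difference between Models 3 and 4 is easiest to see in the predicted lines. In this Python sketch the intercepts and slopes are hypothetical values for only three mountain ranges: under Model 3 the gap between two mountain ranges is the same at every body length (parallel lines), whereas under Model 4 all lines meet at body length 0 (common intercept).

```python
# Hypothetical parameter values for three of the mountain ranges
alpha_j = [18.0, 22.0, 25.0]   # Model 3: one intercept per mountain range
beta = 0.3                     # Model 3: one shared slope
alpha = 20.0                   # Model 4: one shared intercept
beta_j = [0.25, 0.30, 0.40]    # Model 4: one slope per mountain range

def model3(x, j):
    """Predicted test score for mountain range j under Model 3."""
    return alpha_j[j] + beta * x

def model4(x, j):
    """Predicted test score for mountain range j under Model 4."""
    return alpha + beta_j[j] * x

for x in (0.0, 150.0, 250.0):
    m3 = " ".join(f"{model3(x, j):6.1f}" for j in range(3))
    m4 = " ".join(f"{model4(x, j):6.1f}" for j in range(3))
    print(f"x = {x:5.1f}  Model 3: {m3}  Model 4: {m4}")
```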

Enter the mixed effect model

The price of 14 (17 - 3) extra regression parameters can be high: each one costs a precious degree of freedom.
To avoid this, mixed modelling can be used.
But there is another motivation for using mixed modelling with these data:
- If mountain is used as a fixed term, we can only make a statement of testscore/bodylength relationships for these particular mountains
- whereas if we use it as a random component, we can predict the testscore/bodylength relationship for all similar mountains.

We’ll start with one of the simpler models

\[\text{Model 3: } Y_{ij} = \alpha_j + \beta bodylengths_{ij} + \epsilon_{ij} \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

\[\text{Model 5: } Y_{ij} = \alpha + \beta bodylengths_{ij} + a_j + \epsilon_{ij} \text{ where } \] \[ a_j \sim N(0,\sigma_a^2) \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

We assume there is only one regression line with a single intercept and a single slope. The single intercept (\(\alpha\)) and the single slope (\(\beta\)) are called the fixed parameters.
In addition, there is a random intercept \(a_j\), which shifts the intercept by a random amount in each mountain range.
So the unknown parameters are \(\alpha\), \(\beta\), the variance of the noise \(\sigma^2\) and the variance of the random intercept \(\sigma_a^2\). That is only 4 parameters, compared with 10 for Model 3. This is the magic of the mixed effect model.
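A Python sketch of the data-generating process behind Model 5, with hypothetical parameter values and a hypothetical 20 samples per mountain range: each mountain range draws its own intercept shift \(a_j\) from \(N(0, \sigma_a^2)\), yet only four parameters are needed however many mountain ranges there are. Fitting such a model is the subject of next lecture.

```python
import random

# Hypothetical values for the four parameters of Model 5
# (sigma and sigma_a are standard deviations; the variances are their squares)
params = {"alpha": 20.0, "beta": 0.3, "sigma": 5.0, "sigma_a": 8.0}

def simulate_model5(n_mountains, n_per_mountain, rng):
    """Return (mountain j, body length, test score) rows drawn from Model 5."""
    rows = []
    for j in range(n_mountains):
        a_j = rng.gauss(0.0, params["sigma_a"])     # random intercept for mountain j
        for _ in range(n_per_mountain):
            x = rng.uniform(150.0, 250.0)           # hypothetical body length
            eps = rng.gauss(0.0, params["sigma"])   # residual noise
            y = params["alpha"] + params["beta"] * x + a_j + eps
            rows.append((j, x, y))
    return rows

rng = random.Random(7)
data = simulate_model5(n_mountains=8, n_per_mountain=20, rng=rng)
print(len(data), "rows generated from", len(params), "parameters")
```

The eight values of `a_j` are not parameters to estimate; they are draws from one distribution, summarised by `sigma_a`.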

The mixed effect equivalent of model 2

Slopes and intercepts can both change between mountain ranges:

\[\text{Model 6: } Y_{ij} = \alpha + a_j + \beta bodylengths_{ij} + b_j bodylengths_{ij} + \epsilon_{ij} \text{ where } \]

\[ a_j \sim N(0,\sigma_a^2) \text{ and } b_j \sim N(0,\sigma_b^2) \text{ and } \epsilon_{ij} \sim N(0,\sigma^2) \]

\(b_j\) is the random variation of the slope in each mountain range
Only 5 parameters (\(\alpha\), \(\beta\), \(\sigma^2\), \(\sigma_a^2\), \(\sigma_b^2\)) as opposed to 17!
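A Python sketch of Model 6 with hypothetical parameter values: each mountain range gets its own line, with intercept \(\alpha + a_j\) and slope \(\beta + b_j\), but only the five listed parameters describe all eight lines. Here \(a_j\) and \(b_j\) are drawn independently, exactly as the equation above states. If the random intercepts and slopes are also allowed to be correlated (the default for a `(1 + x | group)` term in R's lme4), a sixth parameter, their covariance, is estimated as well.

```python
import random

# Hypothetical values for the five parameters of Model 6 (standard deviations, not variances)
ALPHA, BETA = 20.0, 0.3
SIGMA, SIGMA_A, SIGMA_B = 5.0, 8.0, 0.05
PARAMETERS = ("alpha", "beta", "sigma^2", "sigma_a^2", "sigma_b^2")

rng = random.Random(3)

# Each mountain range j gets its own line: intercept alpha + a_j, slope beta + b_j
lines = []
for j in range(8):
    a_j = rng.gauss(0.0, SIGMA_A)   # random intercept deviation
    b_j = rng.gauss(0.0, SIGMA_B)   # random slope deviation, drawn independently of a_j
    lines.append((ALPHA + a_j, BETA + b_j))

def simulate_score(j, body_length):
    """Draw one test score for mountain range j under Model 6."""
    intercept, slope = lines[j]
    return intercept + slope * body_length + rng.gauss(0.0, SIGMA)

for j, (intercept, slope) in enumerate(lines):
    print(f"mountain {j + 1}: intercept {intercept:6.2f}, slope {slope:.3f}")
print("parameters to estimate:", len(PARAMETERS))

example = simulate_score(0, 200.0)  # one simulated dragon from mountain range 1
```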

Next time

We will code this example as a linear mixed effect model
You can extend mixed effect models to non-normal responses (generalized linear mixed models), but that gets quite advanced (one for your own enjoyment)