Second part: GLMM

Back to our example

We set 4 nets
Sites were randomly chosen
We are interested in the content of Hg in Walleye
Hg is correlated with size. The bigger the fish, the higher Hg content it has
So, normally we could do one thing:
\(E(Hg_i) = \beta_0 + \beta_1*size_{i}\)

Linear model


Call:
lm(formula = Hg ~ size, data = HgDat_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.62057 -0.36606 -0.03252  0.35164  0.55112 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.861740   0.193306   4.458 5.45e-05 ***
size        0.010226   0.004662   2.194   0.0335 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3835 on 45 degrees of freedom
Multiple R-squared:  0.0966,    Adjusted R-squared:  0.07653 
F-statistic: 4.812 on 1 and 45 DF,  p-value: 0.03347

Use coefficients to estimate size at over 1.1 \(\mu g\) .
Limit fish of that size

The variance can affect the slope, the intercept, or both

Random effects introduce variance.
It can introduce variance to the intercept or to the slope

Mixed model

No mixed model:
\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4), mixed intercept:
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
What does a mixed intercept mean?

Mixed model with random intercept

This is the result of a mixed model:

What does the random intercept mean?
Why not simply do:
\(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
Model with effect of site

Mixed model with random intercept

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -132.3   -124.9     70.1   -140.3       43 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.141157 0.37571 
 Residual             0.001643 0.04053 
Number of obs: 47, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00164 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.604302   0.189016    3.20  0.00139 ** 
size        0.017296   0.000507   34.12  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

Mixed effects models: how to run them?

library(glmmTMB)
m2<- glmmTMB(Hg~size +(1|region), data=HgDat_df)

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1|region\)
Effect -> 1 (intercept)
Grouping factor -> region

Mixed effects model output

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -132.3   -124.9     70.1   -140.3       43 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.141157 0.37571 
 Residual             0.001643 0.04053 
Number of obs: 47, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00164 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.604302   0.189016    3.20  0.00139 ** 
size        0.017296   0.000507   34.12  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed model with random intercept and random slope

\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4)
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites

Random slope and intercept

m3<-glmmTMB(Hg~size +(1+size|region), data=HgDat2_df)
summary(m3)

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -133.9   -119.4     72.9   -145.9       76 

Random effects:

Conditional model:
 Groups   Name        Variance  Std.Dev. Corr  
 region   (Intercept) 3.323e-02 0.182284       
          size        4.497e-05 0.006706 -0.09 
 Residual             6.844e-03 0.082725       
Number of obs: 82, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00684 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.062721   0.098281  10.813   <2e-16 ***
size        0.029886   0.003458   8.642   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1 + size|region\)
Effect -> 1 (intercept) + size
Grouping factor -> region

Random slope and intercept

Only random slope

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (0 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -149.8   -140.1     78.9   -157.8       80 

Random effects:

Conditional model:
 Groups   Name Variance  Std.Dev.
 region   size 0.0001243 0.01115 
 Residual      0.0065550 0.08096 
Number of obs: 84, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00656 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.976118   0.032413  30.115  < 2e-16 ***
size        0.027121   0.005629   4.818 1.45e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

Zuur, A., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. Mixed Effects Models and Extensions in Ecology with R. (Springer New York, 2009).

https://www.highstat.com/

RIKZ data

RIKZ institute
Inter-tidal area (AKA beach): 9
In each beach, 5 samples were taken
Response variable: macro-fauna richness (AKA number of species)
NAP: height of sampling station compared to mean tidal level
Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels

RIKZ data

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)

Let’s work on this example

Objective

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 c

Exploring whether there is a relationship between richness and the two factors: NAP and exposure
We have an N of 45… but do we?
We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither
Also… there are mixed effects
Random intercept, random slope, both? –> let’s wait to discuss this
Step 1: Read the data… there is a potential issue

rikz<-read.table(file = "RIKZ.txt", header = TRUE, dec = ",")
str(rikz)

Step 2: decide our analysis! 🤔 Any ideas? any details?

The data

There seems to be an effect of NAP…
how about… a random effect of beach?
step 3: how do we decide random effects?
step 3: random effects of NAP, Exposure? or both?
What do we do?

Model selection with random models

WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME

Test mixed effects first.
We use the “global” or saturated fixed model
How many models?
Use AIC
What model structure?

Let’s run the models

Mixed effects models in glmmTMB. How do we run them?
We have both fixed and mixed effects models in glmmTMB.
In this specific example, we are trying to run 4 models
Trying to figure out which random effects to use in our model selection step.
The first step is to run 4 models, each with the same fixed effects, but different random effects.
Here is how you run those models in r:
\[ \underbrace{model2}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]
mixed models: x|g
x is an effect
g is grouping factor (categorical)
\(1 |beach\)
Effect -> 1 (intercept)
grouping factor –> beach
In our example we are LEAVING FIXED EFFECTS THE SAME. THEY ARE ALWAYS Richness~Nap*Exposure.
However, we are exploring 4 random effects:

No random effects
Random intercept (model 2, the one showed above)
Random slope
Random intercept and random slope

Model 1:

The no random effects model is:
\[ log(ERichness) \sim \beta_0 + \beta_1 (NAP)_i + \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
And the code to run it is:

model1<-glmmTMB(Richness~NAP*Exposure, data = rikz, family = poisson)

It has no random effect… so, it’s just run as a simple glm.

Model 2

The random intercept model is:

\[ log(ERichness) \sim (\beta_0 + \gamma_j) + \beta_1 (NAP)_i + \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
And the way to introduce than randomness \(\gamma_j\) is:
\[ \underbrace{model2}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]
or:

model2<-glmmTMB(Richness~NAP*Exposure + (1|Beach), data = rikz, family = poisson)

If you see the equation, we need to estimate one gamma for each beach. That is why we use (1|Beach). - x is an effect

g is grouping factor (categorical)
\(1 |beach\)

Model 3

Random slope:
\[ log(ERichness) \sim \beta_0 + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
Here, there is a random slope (\(\psi_j\)), and it affects NAP. Exposure is categorical, so, it doesn’t hav a slope. This is how we run it:
\[ \underbrace{model3}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (0+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]
Here, we don’t want it to estimate random intercepts, hence the 0. However, we want it to estimate random slopes, so, that’s why we ass NAP

x is an effect

g is grouping factor (categorical)
\(0 + NAP |beach\)
No random intercept (0), and NAP (random slope)

Model 4:

Random slope and intercept:
\[ log(ERichness) \sim (\beta_0 + \gamma_j) + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
Here, there is a random slope (\(\psi_j\)) and random intercept (\(\gamma_i\)). This is how we run it:
\[ \underbrace{model4}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]

Model selection with random models


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt     LL
Mod1 6 210.94       0.00   0.62   0.62 -98.36
Mod3 7 213.24       2.30   0.20   0.81 -98.10
Mod2 7 213.52       2.59   0.17   0.98 -98.25
Mod4 9 217.81       6.87   0.02   1.00 -97.33

Are the AIC values the same?

Notes on AIC

I recommend using the AICcmodavg package and making a list!

AICcmodavg::aictab(list(model1,model2,model3,model4))

Next step: let’s explore the model (factors!)

How do you explore the best model?

Next step: let’s explore the model (factors!)

Anova(glm(Richness~NAP*Exposure, data = rikz, family = poisson))

Analysis of Deviance Table (Type II tests)

Response: Richness
             LR Chisq Df Pr(>Chisq)    
NAP            58.915  1  1.646e-14 ***
Exposure       54.810  2  1.254e-12 ***
NAP:Exposure    3.644  2     0.1617    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Talk about Zuur’s approach here

Next step?


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt      LL
Mod2 4 209.37       0.00   0.69   0.69 -100.19
Mod1 6 210.94       1.57   0.31   1.00  -98.36
Mod3 2 259.47      50.10   0.00   1.00 -127.59
Mod4 3 265.87      56.50   0.00   1.00 -129.64
Mod5 1 323.84     114.47   0.00   1.00 -160.88

Evidence ratios

Burnham and Anderson

Evidence ratios


Evidence ratio between models 'Mod2' and 'Mod1':
2.19

Raffle tickets

Last step: Model validation

Of the best model.

Other steps?

Check assumptions (if normal)

Mixed effects models pt 2

For some of the other examples

An alternative to mixed models can be repeated measures methods

Split-plot with adjustments
Profile analysis
Linear mixed models with correlations

Mixed effects models pt 2

Split plot with adjustments

Repeated measures ANOVA

Model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk} \]

Split plot

You need to adjust the p-value!

It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret

You get p-values for time, for treatment and for the interaction

Profile analysis

Step 1: Run a MANOVA (what even is this?) –> Multivariate
Step 2: Run multiple ANOVAS one for each time period

Mixed models with correlation

More about this on Friday’s lab

Wednesday class homework

We will talk about visualization
We can discuss plots, or figure out how to make some plots that are important for you
Send a figure of a plot that you may not understand or want to learn how to plot it

Data visualization

My favorite part of stats/data analysis/ etc
You can be creative! but be smart and ethical.
Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

Pie chart with total over 100
Avoid pie charts. Grouped barplots are potentially teh best solution for “parts of a whole”. Stacked barplot can be challenging to distinguish.

Plotting

Estimate (point) and CI
Similar to a line and CI

Plots

Distribution
Correlation
Ranking
Part of a whole
Evolution (time series, line plot)
Map
Flows

Distribution plots

Distribution plots

Correlation

Scatterplot
Heatmap
Correlogram
Bubble

Scatterplot

Combining scatterplot and distribution

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Second part: GLMM

Back to our example

Back to our example

Linear model

Linear model

The variance can affect the slope, the intercept, or both

Mixed model

Mixed model with random intercept

Mixed model with random intercept

Mixed effects models: how to run them?

Mixed effects model output

Mixed model with random intercept and random slope

Random slope and intercept

Random slope and intercept

Only random slope

Example

RIKZ data

RIKZ data

RIKZ data

Let’s work on this example

Objective

The data

The data

Model selection with random models

Let’s run the models

Model 1:

Model 2

Model 3

Model 4:

Model selection with random models

Notes on AIC

Next step: let’s explore the model (factors!)

Next step: let’s explore the model (factors!)

Next step?

Evidence ratios

Evidence ratios

Last step: Model validation

Other steps?

Mixed effects models pt 2

Mixed effects models pt 2

Split plot with adjustments

Split plot

Profile analysis

Mixed models with correlation

Wednesday class homework

Data visualization

Misleading plots

Incorrect and misleading plots

Plotting

Plots

Distribution plots

Distribution plots

Distribution plots

Distribution plots

Correlation

Scatterplot

Combining scatterplot and distribution

Combining scatterplot and distribution

Combining scatterplot and distribution

Reminder

Distribution plots