Second part: GLMM

Back to our example

Back to our example

  • We set 4 nets

  • Sites were randomly chosen

  • We are interested in the content of Hg in Walleye

  • Hg is correlated with size. The bigger the fish, the higher Hg content it has

  • So, normally we could do one thing:

  • \(E(Hg_i) = \beta_0 + \beta_1 \times size_{i}\)

Linear model

Linear model


Call:
lm(formula = Hg ~ size, data = HgDat_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7029 -0.5174  0.1537  0.2376  0.4537 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.892114   0.186072   4.794 1.18e-05 ***
size        0.015275   0.004451   3.432  0.00111 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3758 on 58 degrees of freedom
Multiple R-squared:  0.1688,    Adjusted R-squared:  0.1545 
F-statistic: 11.78 on 1 and 58 DF,  p-value: 0.001111
  • Use coefficients to estimate size at over 1.1 \(\mu g\) .

  • Limit fish of that size

The variance can affect the slope, the intercept, or both

  • Random effects introduce variance.

  • It can introduce variance to the intercept or to the slope

Mixed model

  • No mixed model:

  • \[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]

  • i individuals, j sites (4), mixed intercept:

  • \[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

  • Variance comes from random “selection” of fish within a net

  • Variance comes from random “selection” of sites

  • What does a mixed intercept mean?

Mixed model with random intercept

This is the result of a mixed model:

  • What does the random intercept mean?
  • Why not simply do:
  • \(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
  • Model with effect of site

Mixed model with random intercept

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -162.8   -154.4     85.4   -170.8       56 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.132190 0.36358 
 Residual             0.002157 0.04644 
Number of obs: 60, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00216 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8506686  0.1832411   4.642 3.44e-06 ***
size        0.0167689  0.0005514  30.413  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[ Hg_{ij} = \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

  • Where:

  • \[ \epsilon \sim N(0,\sigma^2) \]

  • \[ \gamma_j \sim N(0,\sigma^2_\gamma) \]

Mixed effects models: how to run them?

library(glmmTMB)
m2<- glmmTMB(Hg~size +(1|region), data=HgDat_df)
  • Random effects are specified as \(x|g\)

  • x is an effect

  • g is grouping factor (categorical)

  • \(1|region\)

  • Effect -> 1 (intercept)
    Grouping factor -> region

Mixed effects model output

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -162.8   -154.4     85.4   -170.8       56 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.132190 0.36358 
 Residual             0.002157 0.04644 
Number of obs: 60, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00216 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8506686  0.1832411   4.642 3.44e-06 ***
size        0.0167689  0.0005514  30.413  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed model with random intercept and random slope

  • \[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]

  • i individuals, j sites (4)

  • \[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

  • Variance comes from random “selection” of fish within a net

  • Variance comes from random “selection” of sites

  • Where,

  • \[ \epsilon \sim N(0,\sigma^2) \]

  • \[ \gamma_j \sim N(0,\sigma^2_\gamma) \]

  • \[ \psi_j \sim N(0,\sigma^2_\psi) \]

Random slope and intercept

m3<-glmmTMB(Hg~size +(1+size|region), data=HgDat2_df)
summary(m3)
 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -160.4   -146.0     86.2   -172.4       75 

Random effects:

Conditional model:
 Groups   Name        Variance  Std.Dev. Corr  
 region   (Intercept) 1.439e-02 0.119958       
          size        8.452e-05 0.009194 -1.00 
 Residual             5.298e-03 0.072785       
Number of obs: 81, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.0053 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.964214   0.067079  14.374  < 2e-16 ***
size        0.021177   0.004656   4.549  5.4e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Random effects are specified as \(x|g\)

  • x is an effect

  • g is grouping factor (categorical)

  • \(1 + size|region\)

  • Effect -> 1 (intercept) + size
    Grouping factor -> region

Random slope and intercept

Only random slope

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (0 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -168.8   -159.2     88.4   -176.8       77 

Random effects:

Conditional model:
 Groups   Name Variance  Std.Dev.
 region   size 0.0001109 0.01053 
 Residual      0.0047275 0.06876 
Number of obs: 81, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00473 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.004771   0.028598   35.13  < 2e-16 ***
size        0.024258   0.005307    4.57 4.85e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

  1. Zuur, A., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. Mixed Effects Models and Extensions in Ecology with R. (Springer New York, 2009).

https://www.highstat.com/

RIKZ data

RIKZ data

  • RIKZ institute

  • Inter-tidal area (AKA beach): 9

  • In each beach, 5 samples were taken

  • Response variable: macro-fauna richness (AKA number of species)

  • NAP: height of sampling station compared to mean tidal level

  • Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels

RIKZ data

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 

Let’s work on this example

Objective

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 c

  • Exploring whether there is a relationship between richness and the two factors: NAP and exposure

  • We have an N of 45… but do we?

  • We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither

  • Also… there are mixed effects

  • Random intercept, random slope, both? –> let’s wait to discuss this

  • Step 1: Read the data… there is a potential issue

rikz<-read.table(file = "RIKZ.txt", header = TRUE, dec = ",")
str(rikz)
  • Step 2: decide our analysis! 🤔 Any ideas? any details?

The data

The data

  • There seems to be an effect of NAP…

  • how about… a random effect of beach?

  • step 3: how do we decide random effects?

  • step 3: random intercept? Or random slope? Or both? Or non

  • What do we do?

  • Exposure is categorical… so no random slope 😃

Model selection with random models

WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME

  • Test mixed effects first.

  • We use the “global” or saturated fixed model

  • How many models?

  • Use AIC

  • What model structure?

Model selection with random models

Four models. Run all with glmmTMB

Fixed effects: NAP*Exposure

Family: Poisson


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt     LL
Mod1 6 210.94       0.00   0.62   0.62 -98.36
Mod3 7 213.24       2.30   0.20   0.81 -98.10
Mod2 7 213.52       2.59   0.17   0.98 -98.25
Mod4 9 217.81       6.87   0.02   1.00 -97.33

Are the AIC values the same?

Notes on AIC

I recommend using the AICcmodavg package and making a list!

library(AICcmodavg)
aictab(list(model1,model2,model3,model4))

Next step: let’s explore the model (factors!)

How do you explore the best model?

Let’s look at the summary

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: Richness
               Chisq Df Pr(>Chisq)    
NAP          51.7328  1  6.359e-13 ***
Exposure     55.4525  2  9.092e-13 ***
NAP:Exposure  3.5604  2     0.1686    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next step: let’s explore the model (factors!)

Anova(model1)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: Richness
               Chisq Df Pr(>Chisq)    
NAP          51.7328  1  6.359e-13 ***
Exposure     55.4525  2  9.092e-13 ***
NAP:Exposure  3.5604  2     0.1686    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Talk about Zuur’s approach here

Next step?

model1<-glm(Richness~NAP*Exposure, data = rikz, family = poisson)
model2<-glm(Richness~NAP+Exposure, data = rikz, family = poisson)
model3<-glm(Richness~NAP, data = rikz, family = poisson)
model4<-glm(Richness~Exposure, data = rikz, family = poisson)
model5<-glm(Richness~1, data = rikz, family = poisson)

modeltab<-AICcmodavg::aictab(list(model1,model2,model3,model4,model5))
modeltab

Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt      LL
Mod2 4 209.37       0.00   0.69   0.69 -100.19
Mod1 6 210.94       1.57   0.31   1.00  -98.36
Mod3 2 259.47      50.10   0.00   1.00 -127.59
Mod4 3 265.87      56.50   0.00   1.00 -129.64
Mod5 1 323.84     114.47   0.00   1.00 -160.88

Evidence ratios

Burnham and Anderson

Evidence ratios


Evidence ratio between models 'Mod2' and 'Mod1':
2.19 
  • Raffle tickets

Last step: Model validation

Of the best model.

Other steps?

Check assumptions (if normal)

Mixed effects models pt 2

For some of the other examples

An alternative to mixed models can be repeated measures methods

  1. Split-plot with adjustments
  2. Profile analysis
  3. Linear mixed models with correlations

Mixed effects models pt 2

Split plot with adjustments

Repeated measures ANOVA

Model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk} \]

Split plot

You need to adjust the p-value!

It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret

You get p-values for time, for treatment and for the interaction

Profile analysis

  • Step 1: Run a MANOVA (what even is this?) –> Multivariate

  • Step 2: Run multiple ANOVAS one for each time period

Mixed models with correlation

More about this on Friday’s lab

Next week: Data visualization

  • My favorite part of stats/data analysis/ etc

  • You can be creative! but be smart and ethical.

  • Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

  • Pie chart with total over 100

  • Avoid pie charts. Grouped barplots are potentially teh best solution for “parts of a whole”. Stacked barplot can be challenging to distinguish.

Plotting

  • Estimate (point) and CI

  • Similar to a line and CI

Plots

  • Distribution

  • Correlation

  • Ranking

  • Part of a whole

  • Evolution (time series, line plot)

  • Map

  • Flows

Distribution