Second part: GLMM

Back to our example

We set 4 nets
Sites were randomly chosen
We are interested in the content of Hg in Walleye
Hg is correlated with size. The bigger the fish, the higher Hg content it has
So, normally we could do one thing:
\(E(Hg_i) = \beta_0 + \beta_1 \times size_{i}\)

Linear model


Call:
lm(formula = Hg ~ size, data = HgDat_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7029 -0.5174  0.1537  0.2376  0.4537 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.892114   0.186072   4.794 1.18e-05 ***
size        0.015275   0.004451   3.432  0.00111 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3758 on 58 degrees of freedom
Multiple R-squared:  0.1688,    Adjusted R-squared:  0.1545 
F-statistic: 11.78 on 1 and 58 DF,  p-value: 0.001111

Use coefficients to estimate size at over 1.1 \(\mu g\) .
Limit fish of that size

The variance can affect the slope, the intercept, or both

Random effects introduce variance.
It can introduce variance to the intercept or to the slope

Mixed model

No mixed model:
\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4), mixed intercept:
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
What does a mixed intercept mean?

Mixed model with random intercept

This is the result of a mixed model:

What does the random intercept mean?
Why not simply do:
\(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
Model with effect of site

Mixed model with random intercept

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -162.8   -154.4     85.4   -170.8       56 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.132190 0.36358 
 Residual             0.002157 0.04644 
Number of obs: 60, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00216 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8506686  0.1832411   4.642 3.44e-06 ***
size        0.0167689  0.0005514  30.413  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[ Hg_{ij} = \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

Where:
\[ \epsilon \sim N(0,\sigma^2) \]
\[ \gamma_j \sim N(0,\sigma^2_\gamma) \]

Mixed effects models: how to run them?

library(glmmTMB)
m2<- glmmTMB(Hg~size +(1|region), data=HgDat_df)

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1|region\)
Effect -> 1 (intercept)
Grouping factor -> region

Mixed effects model output

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -162.8   -154.4     85.4   -170.8       56 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.132190 0.36358 
 Residual             0.002157 0.04644 
Number of obs: 60, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00216 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8506686  0.1832411   4.642 3.44e-06 ***
size        0.0167689  0.0005514  30.413  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed model with random intercept and random slope

\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4)
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
Where,
\[ \epsilon \sim N(0,\sigma^2) \]
\[ \gamma_j \sim N(0,\sigma^2_\gamma) \]
\[ \psi_j \sim N(0,\sigma^2_\psi) \]

Random slope and intercept

m3<-glmmTMB(Hg~size +(1+size|region), data=HgDat2_df)
summary(m3)

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -160.4   -146.0     86.2   -172.4       75 

Random effects:

Conditional model:
 Groups   Name        Variance  Std.Dev. Corr  
 region   (Intercept) 1.439e-02 0.119958       
          size        8.452e-05 0.009194 -1.00 
 Residual             5.298e-03 0.072785       
Number of obs: 81, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.0053 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.964214   0.067079  14.374  < 2e-16 ***
size        0.021177   0.004656   4.549  5.4e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1 + size|region\)
Effect -> 1 (intercept) + size
Grouping factor -> region

Random slope and intercept

Only random slope

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (0 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -168.8   -159.2     88.4   -176.8       77 

Random effects:

Conditional model:
 Groups   Name Variance  Std.Dev.
 region   size 0.0001109 0.01053 
 Residual      0.0047275 0.06876 
Number of obs: 81, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00473 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.004771   0.028598   35.13  < 2e-16 ***
size        0.024258   0.005307    4.57 4.85e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

Zuur, A., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. Mixed Effects Models and Extensions in Ecology with R. (Springer New York, 2009).

https://www.highstat.com/

RIKZ data

RIKZ institute
Inter-tidal area (AKA beach): 9
In each beach, 5 samples were taken
Response variable: macro-fauna richness (AKA number of species)
NAP: height of sampling station compared to mean tidal level
Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels

RIKZ data

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)

Let’s work on this example

Objective

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 c

Exploring whether there is a relationship between richness and the two factors: NAP and exposure
We have an N of 45… but do we?
We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither
Also… there are mixed effects
Random intercept, random slope, both? –> let’s wait to discuss this
Step 1: Read the data… there is a potential issue

rikz<-read.table(file = "RIKZ.txt", header = TRUE, dec = ",")
str(rikz)

Step 2: decide our analysis! 🤔 Any ideas? any details?

The data

There seems to be an effect of NAP…
how about… a random effect of beach?
step 3: how do we decide random effects?
step 3: random intercept? Or random slope? Or both? Or non
What do we do?
Exposure is categorical… so no random slope 😃

Model selection with random models

WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME

Test mixed effects first.
We use the “global” or saturated fixed model
How many models?
Use AIC
What model structure?

Model selection with random models

Four models. Run all with glmmTMB

Fixed effects: NAP*Exposure

Family: Poisson


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt     LL
Mod1 6 210.94       0.00   0.62   0.62 -98.36
Mod3 7 213.24       2.30   0.20   0.81 -98.10
Mod2 7 213.52       2.59   0.17   0.98 -98.25
Mod4 9 217.81       6.87   0.02   1.00 -97.33

Are the AIC values the same?

Notes on AIC

I recommend using the AICcmodavg package and making a list!

library(AICcmodavg)
aictab(list(model1,model2,model3,model4))

Next step: let’s explore the model (factors!)

How do you explore the best model?

Let’s look at the summary

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: Richness
               Chisq Df Pr(>Chisq)    
NAP          51.7328  1  6.359e-13 ***
Exposure     55.4525  2  9.092e-13 ***
NAP:Exposure  3.5604  2     0.1686    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next step: let’s explore the model (factors!)

Anova(model1)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: Richness
               Chisq Df Pr(>Chisq)    
NAP          51.7328  1  6.359e-13 ***
Exposure     55.4525  2  9.092e-13 ***
NAP:Exposure  3.5604  2     0.1686    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Talk about Zuur’s approach here

Next step?

model1<-glm(Richness~NAP*Exposure, data = rikz, family = poisson)
model2<-glm(Richness~NAP+Exposure, data = rikz, family = poisson)
model3<-glm(Richness~NAP, data = rikz, family = poisson)
model4<-glm(Richness~Exposure, data = rikz, family = poisson)
model5<-glm(Richness~1, data = rikz, family = poisson)

modeltab<-AICcmodavg::aictab(list(model1,model2,model3,model4,model5))
modeltab


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt      LL
Mod2 4 209.37       0.00   0.69   0.69 -100.19
Mod1 6 210.94       1.57   0.31   1.00  -98.36
Mod3 2 259.47      50.10   0.00   1.00 -127.59
Mod4 3 265.87      56.50   0.00   1.00 -129.64
Mod5 1 323.84     114.47   0.00   1.00 -160.88

Evidence ratios

Burnham and Anderson

Evidence ratios


Evidence ratio between models 'Mod2' and 'Mod1':
2.19

Raffle tickets

Last step: Model validation

Of the best model.

Other steps?

Check assumptions (if normal)

Mixed effects models pt 2

For some of the other examples

An alternative to mixed models can be repeated measures methods

Split-plot with adjustments
Profile analysis
Linear mixed models with correlations

Mixed effects models pt 2

Split plot with adjustments

Repeated measures ANOVA

Model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk} \]

Split plot

You need to adjust the p-value!

It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret

You get p-values for time, for treatment and for the interaction

Profile analysis

Step 1: Run a MANOVA (what even is this?) –> Multivariate
Step 2: Run multiple ANOVAS one for each time period

Mixed models with correlation

More about this on Friday’s lab

Next week: Data visualization

My favorite part of stats/data analysis/ etc
You can be creative! but be smart and ethical.
Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

Pie chart with total over 100
Avoid pie charts. Grouped barplots are potentially teh best solution for “parts of a whole”. Stacked barplot can be challenging to distinguish.

Plotting

Estimate (point) and CI
Similar to a line and CI

Plots

Distribution
Correlation
Ranking
Part of a whole
Evolution (time series, line plot)
Map
Flows

Distribution