Second part: GLMM

Back to our example

We set 4 nets
Sites were randomly chosen
We are interested in the content of Hg in Walleye
Hg is correlated with size. The bigger the fish, the higher Hg content it has
So, normally we could do one thing:
\(E(Hg_i) = \beta_0 + \beta_1*size_{i}\)

Linear model


Call:
lm(formula = Hg ~ size, data = HgDat_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.41690 -0.04704  0.05721  0.13802  0.29144 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.905404   0.087843  10.307 8.18e-15 ***
size        0.015685   0.002108   7.441 4.85e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2063 on 59 degrees of freedom
Multiple R-squared:  0.4841,    Adjusted R-squared:  0.4754 
F-statistic: 55.37 on 1 and 59 DF,  p-value: 4.846e-10

Use coefficients to estimate size at over 1.1 \(\mu g\) .
Limit fish of that size

The variance can affect the slope, the intercept, or both

Random effects introduce variance.
It can introduce variance to the intercept or to the slope

Mixed model

No mixed model:
\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4), mixed intercept:
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
What does a mixed intercept mean?

Mixed model with random intercept

This is the result of a mixed model:

What does the random intercept mean?
Why not simply do:
\(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
Model with effect of site

Mixed model with random intercept

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -156.3   -147.8     82.1   -164.3       57 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.038392 0.19594 
 Residual             0.002789 0.05281 
Number of obs: 61, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00279 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8541981  0.1006602   8.486   <2e-16 ***
size        0.0168444  0.0005563  30.279   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

Mixed effects models: how to run them?

library(glmmTMB)
m2<- glmmTMB(Hg~size +(1|region), data=HgDat_df)

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1|region\)
Effect -> 1 (intercept)
Grouping factor -> region

Mixed effects model output

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -156.3   -147.8     82.1   -164.3       57 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.038392 0.19594 
 Residual             0.002789 0.05281 
Number of obs: 61, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00279 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.8541981  0.1006602   8.486   <2e-16 ***
size        0.0168444  0.0005563  30.279   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed model with random intercept and random slope

\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4)
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites

Random slope and intercept

m3<-glmmTMB(Hg~size +(1+size|region), data=HgDat2_df)
summary(m3)

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -151.8   -137.5     81.9   -163.8       75 

Random effects:

Conditional model:
 Groups   Name        Variance  Std.Dev. Corr 
 region   (Intercept) 0.0684007 0.26154       
          size        0.0001164 0.01079  0.49 
 Residual             0.0048202 0.06943       
Number of obs: 81, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00482 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.273729   0.133954   9.509  < 2e-16 ***
size        0.032573   0.005441   5.987 2.14e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1 + size|region\)
Effect -> 1 (intercept) + size
Grouping factor -> region

Random slope and intercept

Only random slope

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (0 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -144.2   -135.0     76.1   -152.2       70 

Random effects:

Conditional model:
 Groups   Name Variance  Std.Dev.
 region   size 0.0001867 0.01367 
 Residual      0.0050884 0.07133 
Number of obs: 74, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00509 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.977918   0.035787  27.326  < 2e-16 ***
size        0.023133   0.006882   3.361 0.000776 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

Zuur, A., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. Mixed Effects Models and Extensions in Ecology with R. (Springer New York, 2009).

https://www.highstat.com/

RIKZ data

RIKZ institute
Inter-tidal area (AKA beach): 9
In each beach, 5 samples were taken
Response variable: macro-fauna richness (AKA number of species)
NAP: height of sampling station compared to mean tidal level
Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels

RIKZ data

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)

Let’s work on this example

Objective

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 c

Exploring whether there is a relationship between richness and the two factors: NAP and exposure
We have an N of 45… but do we?
We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither
Also… there are mixed effects
Random intercept, random slope, both? –> let’s wait to discuss this
Step 1: Read the data… there is a potential issue

rikz<-read.table(file = "RIKZ.txt", header = TRUE, dec = ",")
str(rikz)

Step 2: decide our analysis! 🤔 Any ideas? any details?

The data

There seems to be an effect of NAP…
how about… a random effect of beach?
step 3: how do we decide random effects?
step 3: random effects of NAP, Exposure? or both?
What do we do?

Model selection with random models

WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME

Test mixed effects first.
We use the “global” or saturated fixed model
How many models?
Use AIC
What model structure?

Model selection with random models


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt     LL
Mod1 6 210.94       0.00   0.62   0.62 -98.36
Mod3 7 213.24       2.30   0.20   0.81 -98.10
Mod2 7 213.52       2.59   0.17   0.98 -98.25
Mod4 9 217.81       6.87   0.02   1.00 -97.33

Are the AIC values the same?

Notes on AIC

I recommend using the AICcmodavg package and making a list!

AICcmodavg::aictab(list(model1,model2,model3,model4))

Next step: let’s explore the model (factors!)

How do you explore the best model?

Next step: let’s explore the model (factors!)

Anova(glm(Richness~NAP*Exposure, data = rikz, family = poisson))

Analysis of Deviance Table (Type II tests)

Response: Richness
             LR Chisq Df Pr(>Chisq)    
NAP            58.915  1  1.646e-14 ***
Exposure       54.810  2  1.254e-12 ***
NAP:Exposure    3.644  2     0.1617    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Talk about Zuur’s approach here

Next step?


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt      LL
Mod2 4 209.37       0.00   0.69   0.69 -100.19
Mod1 6 210.94       1.57   0.31   1.00  -98.36
Mod3 2 259.47      50.10   0.00   1.00 -127.59
Mod4 3 265.87      56.50   0.00   1.00 -129.64
Mod5 1 323.84     114.47   0.00   1.00 -160.88

Evidence ratios

Burnham and Anderson

Evidence ratios


Evidence ratio between models 'Mod2' and 'Mod1':
2.19

Raffle tickets

Last step: Model validation

Of the best model.

Other steps?

Check assumptions (if normal)

Mixed effects models pt 2

For some of the other examples

An alternative to mixed models can be repeated measures methods

Split-plot with adjustments
Profile analysis
Linear mixed models with correlations

Mixed effects models pt 2

Split plot with adjustments

Repeated measures ANOVA

Model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk} \]

Split plot

You need to adjust the p-value!

It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret

You get p-values for time, for treatment and for the interaction

Profile analysis

Step 1: Run a MANOVA (what even is this?) –> Multivariate
Step 2: Run multiple ANOVAS one for each time period

Mixed models with correlation

More about this on Friday’s lab

Wednesday class homework

We will talk about visualization
We can discuss plots, or figure out how to make some plots that are important for you
Send a figure of a plot that you may not understand or want to learn how to plot it

Data visualization

My favorite part of stats/data analysis/ etc
You can be creative! but be smart and ethical.
Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

Pie chart with total over 100
Avoid pie charts. Grouped barplots are potentially teh best solution for “parts of a whole”. Stacked barplot can be challenging to distinguish.

Plotting

Estimate (point) and CI
Similar to a line and CI

Plots

Distribution
Correlation
Ranking
Part of a whole
Evolution (time series, line plot)
Map
Flows

Distribution

Second part: GLMM

Back to our example

Back to our example

Linear model

Linear model

The variance can affect the slope, the intercept, or both

Mixed model

Mixed model with random intercept

Mixed model with random intercept

Mixed effects models: how to run them?

Mixed effects model output

Mixed model with random intercept and random slope

Random slope and intercept

Random slope and intercept

Only random slope

Example

RIKZ data

RIKZ data

RIKZ data

Let’s work on this example

Objective

The data

The data

Model selection with random models

Model selection with random models

Notes on AIC

Next step: let’s explore the model (factors!)

Next step: let’s explore the model (factors!)

Next step?

Evidence ratios

Evidence ratios

Last step: Model validation

Other steps?

Mixed effects models pt 2

Mixed effects models pt 2

Split plot with adjustments

Split plot

Profile analysis

Mixed models with correlation

Wednesday class homework

Data visualization

Misleading plots

Incorrect and misleading plots

Plotting

Plots

Distribution

Reminder