Second part: GLMM

Back to our example

Back to our example

  • We set 4 nets

  • Sites were randomly chosen

  • We are interested in the content of Hg in Walleye

  • Hg is correlated with size. The bigger the fish, the higher Hg content it has

  • So, normally we could do one thing:

  • \(E(Hg_i) = \beta_0 + \beta_1*size_{i}\)

Linear model

Linear model


Call:
lm(formula = Hg ~ size, data = HgDat_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.62057 -0.36606 -0.03252  0.35164  0.55112 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.861740   0.193306   4.458 5.45e-05 ***
size        0.010226   0.004662   2.194   0.0335 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3835 on 45 degrees of freedom
Multiple R-squared:  0.0966,    Adjusted R-squared:  0.07653 
F-statistic: 4.812 on 1 and 45 DF,  p-value: 0.03347
  • Use coefficients to estimate size at over 1.1 \(\mu g\) .

  • Limit fish of that size

The variance can affect the slope, the intercept, or both

  • Random effects introduce variance.

  • It can introduce variance to the intercept or to the slope

Mixed model

  • No mixed model:

  • \[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]

  • i individuals, j sites (4), mixed intercept:

  • \[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

  • Variance comes from random “selection” of fish within a net

  • Variance comes from random “selection” of sites

  • What does a mixed intercept mean?

Mixed model with random intercept

This is the result of a mixed model:

  • What does the random intercept mean?
  • Why not simply do:
  • \(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
  • Model with effect of site

Mixed model with random intercept

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -132.3   -124.9     70.1   -140.3       43 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.141157 0.37571 
 Residual             0.001643 0.04053 
Number of obs: 47, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00164 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.604302   0.189016    3.20  0.00139 ** 
size        0.017296   0.000507   34.12  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

Mixed effects models: how to run them?

library(glmmTMB)
m2<- glmmTMB(Hg~size +(1|region), data=HgDat_df)
  • Random effects are specified as \(x|g\)

  • x is an effect

  • g is grouping factor (categorical)

  • \(1|region\)

  • Effect -> 1 (intercept)
    Grouping factor -> region

Mixed effects model output

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 | region)
Data: HgDat_df

     AIC      BIC   logLik deviance df.resid 
  -132.3   -124.9     70.1   -140.3       43 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 region   (Intercept) 0.141157 0.37571 
 Residual             0.001643 0.04053 
Number of obs: 47, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00164 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.604302   0.189016    3.20  0.00139 ** 
size        0.017296   0.000507   34.12  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed model with random intercept and random slope

  • \[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]

  • i individuals, j sites (4)

  • \[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]

  • Variance comes from random “selection” of fish within a net

  • Variance comes from random “selection” of sites

Random slope and intercept

m3<-glmmTMB(Hg~size +(1+size|region), data=HgDat2_df)
summary(m3)
 Family: gaussian  ( identity )
Formula:          Hg ~ size + (1 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -133.9   -119.4     72.9   -145.9       76 

Random effects:

Conditional model:
 Groups   Name        Variance  Std.Dev. Corr  
 region   (Intercept) 3.323e-02 0.182284       
          size        4.497e-05 0.006706 -0.09 
 Residual             6.844e-03 0.082725       
Number of obs: 82, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00684 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.062721   0.098281  10.813   <2e-16 ***
size        0.029886   0.003458   8.642   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Random effects are specified as \(x|g\)

  • x is an effect

  • g is grouping factor (categorical)

  • \(1 + size|region\)

  • Effect -> 1 (intercept) + size
    Grouping factor -> region

Random slope and intercept

Only random slope

 Family: gaussian  ( identity )
Formula:          Hg ~ size + (0 + size | region)
Data: HgDat2_df

     AIC      BIC   logLik deviance df.resid 
  -149.8   -140.1     78.9   -157.8       80 

Random effects:

Conditional model:
 Groups   Name Variance  Std.Dev.
 region   size 0.0001243 0.01115 
 Residual      0.0065550 0.08096 
Number of obs: 84, groups:  region, 4

Dispersion estimate for gaussian family (sigma^2): 0.00656 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.976118   0.032413  30.115  < 2e-16 ***
size        0.027121   0.005629   4.818 1.45e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

  1. Zuur, A., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. Mixed Effects Models and Extensions in Ecology with R. (Springer New York, 2009).

https://www.highstat.com/

RIKZ data

RIKZ data

  • RIKZ institute

  • Inter-tidal area (AKA beach): 9

  • In each beach, 5 samples were taken

  • Response variable: macro-fauna richness (AKA number of species)

  • NAP: height of sampling station compared to mean tidal level

  • Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels

RIKZ data

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 

Let’s work on this example

Objective

flowchart TD

  A[RIKZ DATA] --> B(Beach 1)
  A --> C(Beach 2)
  A --> E(Beach 9)
  B --> F(site 1)
  B --> G(site 2)
  B --> H(site 3)
  B --> I(site 4)
  B --> J(site 5)
  C --> K(site 1)
  C --> L(site 2)
  C --> M(site 3)
  C --> N(site 4)
  C --> O(site 5)
  E --> P(site 1)
  E --> Q(site 2)
  E --> R(site 3)
  E --> S(site 4)
  E --> T(site 5)
 c

  • Exploring whether there is a relationship between richness and the two factors: NAP and exposure

  • We have an N of 45… but do we?

  • We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither

  • Also… there are mixed effects

  • Random intercept, random slope, both? –> let’s wait to discuss this

  • Step 1: Read the data… there is a potential issue

rikz<-read.table(file = "RIKZ.txt", header = TRUE, dec = ",")
str(rikz)
  • Step 2: decide our analysis! 🤔 Any ideas? any details?

The data

The data

  • There seems to be an effect of NAP…

  • how about… a random effect of beach?

  • step 3: how do we decide random effects?

  • step 3: random effects of NAP, Exposure? or both?

  • What do we do?

Model selection with random models

WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME

  • Test mixed effects first.

  • We use the “global” or saturated fixed model

  • How many models?

  • Use AIC

  • What model structure?

Let’s run the models

  • Mixed effects models in glmmTMB. How do we run them?

  • We have both fixed and mixed effects models in glmmTMB.

  • In this specific example, we are trying to run 4 models

  • Trying to figure out which random effects to use in our model selection step.

  • The first step is to run 4 models, each with the same fixed effects, but different random effects.

  • Here is how you run those models in r:

  • \[ \underbrace{model2}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]

  • mixed models: x|g

  • x is an effect

  • g is grouping factor (categorical)

  • \(1 |beach\)

  • Effect -> 1 (intercept)

  • grouping factor –> beach

  • In our example we are LEAVING FIXED EFFECTS THE SAME. THEY ARE ALWAYS Richness~Nap*Exposure.

  • However, we are exploring 4 random effects:

  1. No random effects
  2. Random intercept (model 2, the one showed above)
  3. Random slope
  4. Random intercept and random slope

Model 1:

  • The no random effects model is:

  • \[ log(ERichness) \sim \beta_0 + \beta_1 (NAP)_i + \beta_2 Exposure2_i ... \]

  • \[ Richness_i \sim Poisson(ERichness)_i \]

  • And the code to run it is:

model1<-glmmTMB(Richness~NAP*Exposure, data = rikz, family = poisson)
  • It has no random effect… so, it’s just run as a simple glm.

Model 2

The random intercept model is:

  • \[ log(ERichness) \sim (\beta_0 + \gamma_j) + \beta_1 (NAP)_i + \beta_2 Exposure2_i ... \]

  • \[ Richness_i \sim Poisson(ERichness)_i \]

  • And the way to introduce than randomness \(\gamma_j\) is:

  • \[ \underbrace{model2}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]

  • or:

model2<-glmmTMB(Richness~NAP*Exposure + (1|Beach), data = rikz, family = poisson)

If you see the equation, we need to estimate one gamma for each beach. That is why we use (1|Beach). - x is an effect

  • g is grouping factor (categorical)

  • \(1 |beach\)

Model 3

  • Random slope:

  • \[ log(ERichness) \sim \beta_0 + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]

  • \[ Richness_i \sim Poisson(ERichness)_i \]

  • Here, there is a random slope (\(\psi_j\)), and it affects NAP. Exposure is categorical, so, it doesn’t hav a slope. This is how we run it:

  • \[ \underbrace{model3}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (0+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]

  • Here, we don’t want it to estimate random intercepts, hence the 0. However, we want it to estimate random slopes, so, that’s why we ass NAP

x is an effect

  • g is grouping factor (categorical)

  • \(0 + NAP |beach\)

  • No random intercept (0), and NAP (random slope)

Model 4:

  • Random slope and intercept:

  • \[ log(ERichness) \sim (\beta_0 + \gamma_j) + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]

  • \[ Richness_i \sim Poisson(ERichness)_i \]

  • Here, there is a random slope (\(\psi_j\)) and random intercept (\(\gamma_i\)). This is how we run it:

  • \[ \underbrace{model4}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution}) \]

Or

Model selection with random models


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt     LL
Mod1 6 210.94       0.00   0.62   0.62 -98.36
Mod3 7 213.24       2.30   0.20   0.81 -98.10
Mod2 7 213.52       2.59   0.17   0.98 -98.25
Mod4 9 217.81       6.87   0.02   1.00 -97.33

Are the AIC values the same?

Notes on AIC

I recommend using the AICcmodavg package and making a list!

AICcmodavg::aictab(list(model1,model2,model3,model4))

Next step: let’s explore the model (factors!)

How do you explore the best model?

Next step: let’s explore the model (factors!)

Anova(glm(Richness~NAP*Exposure, data = rikz, family = poisson))
Analysis of Deviance Table (Type II tests)

Response: Richness
             LR Chisq Df Pr(>Chisq)    
NAP            58.915  1  1.646e-14 ***
Exposure       54.810  2  1.254e-12 ***
NAP:Exposure    3.644  2     0.1617    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Talk about Zuur’s approach here

Next step?


Model selection based on AICc:

     K   AICc Delta_AICc AICcWt Cum.Wt      LL
Mod2 4 209.37       0.00   0.69   0.69 -100.19
Mod1 6 210.94       1.57   0.31   1.00  -98.36
Mod3 2 259.47      50.10   0.00   1.00 -127.59
Mod4 3 265.87      56.50   0.00   1.00 -129.64
Mod5 1 323.84     114.47   0.00   1.00 -160.88

Evidence ratios

Burnham and Anderson

Evidence ratios


Evidence ratio between models 'Mod2' and 'Mod1':
2.19 
  • Raffle tickets

Last step: Model validation

Of the best model.

Other steps?

Check assumptions (if normal)

Mixed effects models pt 2

For some of the other examples

An alternative to mixed models can be repeated measures methods

  1. Split-plot with adjustments
  2. Profile analysis
  3. Linear mixed models with correlations

Mixed effects models pt 2

Split plot with adjustments

Repeated measures ANOVA

Model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk} \]

Split plot

You need to adjust the p-value!

It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret

You get p-values for time, for treatment and for the interaction

Profile analysis

  • Step 1: Run a MANOVA (what even is this?) –> Multivariate

  • Step 2: Run multiple ANOVAS one for each time period

Mixed models with correlation

More about this on Friday’s lab

Wednesday class homework

  • We will talk about visualization

  • We can discuss plots, or figure out how to make some plots that are important for you

  • Send a figure of a plot that you may not understand or want to learn how to plot it

Data visualization

  • My favorite part of stats/data analysis/ etc

  • You can be creative! but be smart and ethical.

  • Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

  • Pie chart with total over 100

  • Avoid pie charts. Grouped barplots are potentially teh best solution for “parts of a whole”. Stacked barplot can be challenging to distinguish.

Plotting

  • Estimate (point) and CI

  • Similar to a line and CI

Plots

  • Distribution

  • Correlation

  • Ranking

  • Part of a whole

  • Evolution (time series, line plot)

  • Map

  • Flows

Distribution plots

Distribution plots

Distribution plots

Distribution plots

Correlation

  • Scatterplot

  • Heatmap

  • Correlogram

  • Bubble

Scatterplot

Combining scatterplot and distribution

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Combining scatterplot and distribution

Combining scatterplot and distribution

Reminder

Distribution plots