Back to our example
Back to our example
We set 4 nets
Sites were randomly chosen
We are interested in the content of Hg in Walleye
Hg is correlated with size. The bigger the fish, the higher Hg content it has
So, normally we could do one thing:
\(E(Hg_i) = \beta_0 + \beta_1*size_{i}\)
Linear model
Linear model
Call:
lm(formula = Hg ~ size, data = HgDat_df)
Residuals:
Min 1Q Median 3Q Max
-0.41690 -0.04704 0.05721 0.13802 0.29144
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.905404 0.087843 10.307 8.18e-15 ***
size 0.015685 0.002108 7.441 4.85e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2063 on 59 degrees of freedom
Multiple R-squared: 0.4841, Adjusted R-squared: 0.4754
F-statistic: 55.37 on 1 and 59 DF, p-value: 4.846e-10
The variance can affect the slope, the intercept, or both
Mixed model
No mixed model:
\[
Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon
\]
i individuals, j sites (4), mixed intercept:
\[
Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var}
\]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
What does a mixed intercept mean?
Mixed model with random intercept
This is the result of a mixed model:
What does the random intercept mean?
Why not simply do:
\(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
Model with effect of site
Mixed model with random intercept
Family: gaussian ( identity )
Formula: Hg ~ size + (1 | region)
Data: HgDat_df
AIC BIC logLik deviance df.resid
-156.3 -147.8 82.1 -164.3 57
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region (Intercept) 0.038392 0.19594
Residual 0.002789 0.05281
Number of obs: 61, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00279
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8541981 0.1006602 8.486 <2e-16 ***
size 0.0168444 0.0005563 30.279 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\[
Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var}
\]
Mixed effects models: how to run them?
library (glmmTMB)
m2<- glmmTMB (Hg~ size + (1 | region), data= HgDat_df)
Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1|region\)
Effect -> 1 (intercept)
Grouping factor -> region
Mixed effects model output
Family: gaussian ( identity )
Formula: Hg ~ size + (1 | region)
Data: HgDat_df
AIC BIC logLik deviance df.resid
-156.3 -147.8 82.1 -164.3 57
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region (Intercept) 0.038392 0.19594
Residual 0.002789 0.05281
Number of obs: 61, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00279
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8541981 0.1006602 8.486 <2e-16 ***
size 0.0168444 0.0005563 30.279 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Mixed model with random intercept and random slope
\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4)
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
Random slope and intercept
m3<- glmmTMB (Hg~ size + (1 + size| region), data= HgDat2_df)
summary (m3)
Family: gaussian ( identity )
Formula: Hg ~ size + (1 + size | region)
Data: HgDat2_df
AIC BIC logLik deviance df.resid
-151.8 -137.5 81.9 -163.8 75
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
region (Intercept) 0.0684007 0.26154
size 0.0001164 0.01079 0.49
Residual 0.0048202 0.06943
Number of obs: 81, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00482
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.273729 0.133954 9.509 < 2e-16 ***
size 0.032573 0.005441 5.987 2.14e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1 + size|region\)
Effect -> 1 (intercept) + size
Grouping factor -> region
Random slope and intercept
Only random slope
Family: gaussian ( identity )
Formula: Hg ~ size + (0 + size | region)
Data: HgDat2_df
AIC BIC logLik deviance df.resid
-144.2 -135.0 76.1 -152.2 70
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region size 0.0001867 0.01367
Residual 0.0050884 0.07133
Number of obs: 74, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00509
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.977918 0.035787 27.326 < 2e-16 ***
size 0.023133 0.006882 3.361 0.000776 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RIKZ data
RIKZ data
RIKZ institute
Inter-tidal area (AKA beach): 9
In each beach, 5 samples were taken
Response variable: macro-fauna richness (AKA number of species)
NAP: height of sampling station compared to mean tidal level
Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels
RIKZ data
flowchart TD
A[RIKZ DATA] --> B(Beach 1)
A --> C(Beach 2)
A --> E(Beach 9)
B --> F(site 1)
B --> G(site 2)
B --> H(site 3)
B --> I(site 4)
B --> J(site 5)
C --> K(site 1)
C --> L(site 2)
C --> M(site 3)
C --> N(site 4)
C --> O(site 5)
E --> P(site 1)
E --> Q(site 2)
E --> R(site 3)
E --> S(site 4)
E --> T(site 5)
Let’s work on this example
Objective
flowchart TD
A[RIKZ DATA] --> B(Beach 1)
A --> C(Beach 2)
A --> E(Beach 9)
B --> F(site 1)
B --> G(site 2)
B --> H(site 3)
B --> I(site 4)
B --> J(site 5)
C --> K(site 1)
C --> L(site 2)
C --> M(site 3)
C --> N(site 4)
C --> O(site 5)
E --> P(site 1)
E --> Q(site 2)
E --> R(site 3)
E --> S(site 4)
E --> T(site 5)
c
Exploring whether there is a relationship between richness and the two factors: NAP and exposure
We have an N of 45… but do we?
We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither
Also… there are mixed effects
Random intercept, random slope, both? –> let’s wait to discuss this
Step 1: Read the data… there is a potential issue
rikz<- read.table (file = "RIKZ.txt" , header = TRUE , dec = "," )
str (rikz)
Step 2: decide our analysis! 🤔 Any ideas? any details?
rikz<-read.table(file = “RIKZ.txt”, header = TRUE, dec = “,”)
rikz$NAP<-as.numeric(rikz$NAP)
rikz$Beach<-factor(rikz$Beach)
rikz$Exposure<-factor(rikz$Exposure)
The data
The data
There seems to be an effect of NAP…
how about… a random effect of beach?
step 3: how do we decide random effects?
step 3: random effects of NAP, Exposure? or both?
What do we do?
Model selection with random models
WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME
Talk about Zuur’s weird non-Poisson, and normal analysis
Model selection with random models
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
Mod1 6 210.94 0.00 0.62 0.62 -98.36
Mod3 7 213.24 2.30 0.20 0.81 -98.10
Mod2 7 213.52 2.59 0.17 0.98 -98.25
Mod4 9 217.81 6.87 0.02 1.00 -97.33
Are the AIC values the same?
Notes on AIC
I recommend using the AICcmodavg package and making a list!
AICcmodavg:: aictab (list (model1,model2,model3,model4))
Next step: let’s explore the model (factors!)
How do you explore the best model?
Next step: let’s explore the model (factors!)
Anova (glm (Richness~ NAP* Exposure, data = rikz, family = poisson))
Analysis of Deviance Table (Type II tests)
Response: Richness
LR Chisq Df Pr(>Chisq)
NAP 58.915 1 1.646e-14 ***
Exposure 54.810 2 1.254e-12 ***
NAP:Exposure 3.644 2 0.1617
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Talk about Zuur’s approach here
Next step?
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
Mod2 4 209.37 0.00 0.69 0.69 -100.19
Mod1 6 210.94 1.57 0.31 1.00 -98.36
Mod3 2 259.47 50.10 0.00 1.00 -127.59
Mod4 3 265.87 56.50 0.00 1.00 -129.64
Mod5 1 323.84 114.47 0.00 1.00 -160.88
Evidence ratios
Burnham and Anderson
Evidence ratios
Evidence ratio between models 'Mod2' and 'Mod1':
2.19
Last step: Model validation
Of the best model.
Other steps?
Check assumptions (if normal)
Mixed effects models pt 2
Split plot with adjustments
Repeated measures ANOVA
Model:
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk}
\]
mu = grand mean
α i = effect of the i th treatment level
β j = effect of the j th level of time
α β i j = interaction effect between treatment and time
δ i k = subject effect
ϵ i j k = residual, unexplained variation
Split plot
You need to adjust the p-value!
It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret
You get p-values for time, for treatment and for the interaction
Mixed models with correlation
More about this on Friday’s lab
Wednesday class homework
We will talk about visualization
We can discuss plots, or figure out how to make some plots that are important for you
Send a figure of a plot that you may not understand or want to learn how to plot it
Data visualization
My favorite part of stats/data analysis/ etc
You can be creative! but be smart and ethical.
Plots are easy to “manipulate”
Misleading plots
Incorrect and misleading plots
Plotting
Estimate (point) and CI
Similar to a line and CI
Reminder