Back to our example
Back to our example
We set 4 nets
Sites were randomly chosen
We are interested in the content of Hg in Walleye
Hg is correlated with size. The bigger the fish, the higher Hg content it has
So, normally we could do one thing:
\(E(Hg_i) = \beta_0 + \beta_1*size_{i}\)
Linear model
Linear model
Call:
lm(formula = Hg ~ size, data = HgDat_df)
Residuals:
Min 1Q Median 3Q Max
-0.62057 -0.36606 -0.03252 0.35164 0.55112
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.861740 0.193306 4.458 5.45e-05 ***
size 0.010226 0.004662 2.194 0.0335 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3835 on 45 degrees of freedom
Multiple R-squared: 0.0966, Adjusted R-squared: 0.07653
F-statistic: 4.812 on 1 and 45 DF, p-value: 0.03347
The variance can affect the slope, the intercept, or both
Mixed model
No mixed model:
\[
Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon
\]
i individuals, j sites (4), mixed intercept:
\[
Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var}
\]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
What does a mixed intercept mean?
Mixed model with random intercept
This is the result of a mixed model:
What does the random intercept mean?
Why not simply do:
\(Hg_i \sim \beta_0 + \beta_1size_{i} + \beta_2rB_{i} + \beta_3rC_{i} + \beta_4rD_{i} \epsilon\)
Model with effect of site
Mixed model with random intercept
Family: gaussian ( identity )
Formula: Hg ~ size + (1 | region)
Data: HgDat_df
AIC BIC logLik deviance df.resid
-132.3 -124.9 70.1 -140.3 43
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region (Intercept) 0.141157 0.37571
Residual 0.001643 0.04053
Number of obs: 47, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00164
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.604302 0.189016 3.20 0.00139 **
size 0.017296 0.000507 34.12 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\[
Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{\beta_1size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var}
\]
Mixed effects models: how to run them?
library (glmmTMB)
m2<- glmmTMB (Hg~ size + (1 | region), data= HgDat_df)
Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1|region\)
Effect -> 1 (intercept)
Grouping factor -> region
Mixed effects model output
Family: gaussian ( identity )
Formula: Hg ~ size + (1 | region)
Data: HgDat_df
AIC BIC logLik deviance df.resid
-132.3 -124.9 70.1 -140.3 43
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region (Intercept) 0.141157 0.37571
Residual 0.001643 0.04053
Number of obs: 47, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00164
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.604302 0.189016 3.20 0.00139 **
size 0.017296 0.000507 34.12 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Mixed model with random intercept and random slope
\[ Hg_i \sim \beta_0 + \beta_1size_{i} + \epsilon \]
i individuals, j sites (4)
\[ Hg_{ij} \sim \underbrace{(\beta_0 +\underbrace{\gamma_j}_{\text{Random intercept}})}_{intercept} + \underbrace{(\beta_1+\underbrace{\psi_j}_{\text{Random slope}})size_{i}}_{slope} +\underbrace{\epsilon}_\text{ind var} \]
Variance comes from random “selection” of fish within a net
Variance comes from random “selection” of sites
Random slope and intercept
m3<- glmmTMB (Hg~ size + (1 + size| region), data= HgDat2_df)
summary (m3)
Family: gaussian ( identity )
Formula: Hg ~ size + (1 + size | region)
Data: HgDat2_df
AIC BIC logLik deviance df.resid
-133.9 -119.4 72.9 -145.9 76
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
region (Intercept) 3.323e-02 0.182284
size 4.497e-05 0.006706 -0.09
Residual 6.844e-03 0.082725
Number of obs: 82, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00684
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.062721 0.098281 10.813 <2e-16 ***
size 0.029886 0.003458 8.642 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Random effects are specified as \(x|g\)
x is an effect
g is grouping factor (categorical)
\(1 + size|region\)
Effect -> 1 (intercept) + size
Grouping factor -> region
Random slope and intercept
Only random slope
Family: gaussian ( identity )
Formula: Hg ~ size + (0 + size | region)
Data: HgDat2_df
AIC BIC logLik deviance df.resid
-149.8 -140.1 78.9 -157.8 80
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
region size 0.0001243 0.01115
Residual 0.0065550 0.08096
Number of obs: 84, groups: region, 4
Dispersion estimate for gaussian family (sigma^2): 0.00656
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.976118 0.032413 30.115 < 2e-16 ***
size 0.027121 0.005629 4.818 1.45e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RIKZ data
RIKZ data
RIKZ institute
Inter-tidal area (AKA beach): 9
In each beach, 5 samples were taken
Response variable: macro-fauna richness (AKA number of species)
NAP: height of sampling station compared to mean tidal level
Exposure (index). Of multiple environmental conditions. Treated as categorical, as there are three levels
RIKZ data
flowchart TD
A[RIKZ DATA] --> B(Beach 1)
A --> C(Beach 2)
A --> E(Beach 9)
B --> F(site 1)
B --> G(site 2)
B --> H(site 3)
B --> I(site 4)
B --> J(site 5)
C --> K(site 1)
C --> L(site 2)
C --> M(site 3)
C --> N(site 4)
C --> O(site 5)
E --> P(site 1)
E --> Q(site 2)
E --> R(site 3)
E --> S(site 4)
E --> T(site 5)
Let’s work on this example
Objective
flowchart TD
A[RIKZ DATA] --> B(Beach 1)
A --> C(Beach 2)
A --> E(Beach 9)
B --> F(site 1)
B --> G(site 2)
B --> H(site 3)
B --> I(site 4)
B --> J(site 5)
C --> K(site 1)
C --> L(site 2)
C --> M(site 3)
C --> N(site 4)
C --> O(site 5)
E --> P(site 1)
E --> Q(site 2)
E --> R(site 3)
E --> S(site 4)
E --> T(site 5)
c
Exploring whether there is a relationship between richness and the two factors: NAP and exposure
We have an N of 45… but do we?
We have multiple potential alternatives: there is an effect of NAP, an effect of Exposure, an effect of both, of neither
Also… there are mixed effects
Random intercept, random slope, both? –> let’s wait to discuss this
Step 1: Read the data… there is a potential issue
rikz<- read.table (file = "RIKZ.txt" , header = TRUE , dec = "," )
str (rikz)
Step 2: decide our analysis! 🤔 Any ideas? any details?
rikz<-read.table(file = “RIKZ.txt”, header = TRUE, dec = “,”)
rikz$NAP<-as.numeric(rikz$NAP)
rikz$Beach<-factor(rikz$Beach)
rikz$Exposure<-factor(rikz$Exposure)
The data
The data
There seems to be an effect of NAP…
how about… a random effect of beach?
step 3: how do we decide random effects?
step 3: random effects of NAP, Exposure? or both?
What do we do?
Model selection with random models
WE CANNOT TEST FIXED AND RANDOM EFFECTS AT THE SAME TIME
Talk about Zuur’s weird non-Poisson, and normal analysis
Model 1:
The no random effects model is:
\[
log(ERichness) \sim \beta_0 + \beta_1 (NAP)_i + \beta_2 Exposure2_i ...
\]
\[
Richness_i \sim Poisson(ERichness)_i
\]
And the code to run it is:
model1<- glmmTMB (Richness~ NAP* Exposure, data = rikz, family = poisson)
It has no random effect… so, it’s just run as a simple glm.
Model 2
The random intercept model is:
\[ log(ERichness) \sim (\beta_0 + \gamma_j) + \beta_1 (NAP)_i + \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
And the way to introduce than randomness \(\gamma_j\) is:
\[
\underbrace{model2}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution})
\]
or:
model2<- glmmTMB (Richness~ NAP* Exposure + (1 | Beach), data = rikz, family = poisson)
If you see the equation, we need to estimate one gamma for each beach. That is why we use (1|Beach). - x is an effect
Model 3
Random slope:
\[ log(ERichness) \sim \beta_0 + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
Here, there is a random slope (\(\psi_j\) ), and it affects NAP. Exposure is categorical, so, it doesn’t hav a slope. This is how we run it:
\[
\underbrace{model3}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (0+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution})
\]
Here, we don’t want it to estimate random intercepts, hence the 0. However, we want it to estimate random slopes, so, that’s why we ass NAP
x is an effect
Model 4:
Random slope and intercept:
\[ log(ERichness) \sim (\beta_0 + \gamma_j) + (\beta_1 * \psi_j * NAP_i )+ \beta_2 Exposure2_i ... \]
\[ Richness_i \sim Poisson(ERichness)_i \]
Here, there is a random slope (\(\psi_j\) ) and random intercept (\(\gamma_i\) ). This is how we run it:
\[
\underbrace{model4}_{\text{Model Name}}<- \underbrace{glmmTMB}_{\text{function for mixed effects models}} (\underbrace{\text{Richness~NAP*Exposure}}_{\text{Fixed effects}} \underbrace{+ (1+NAP|Beach)}_{Random effects}, data = rikz, \underbrace{family = poisson}_{Distribution})
\]
Or
Model selection with random models
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
Mod1 6 210.94 0.00 0.62 0.62 -98.36
Mod3 7 213.24 2.30 0.20 0.81 -98.10
Mod2 7 213.52 2.59 0.17 0.98 -98.25
Mod4 9 217.81 6.87 0.02 1.00 -97.33
Are the AIC values the same?
Notes on AIC
I recommend using the AICcmodavg package and making a list!
AICcmodavg:: aictab (list (model1,model2,model3,model4))
Next step: let’s explore the model (factors!)
How do you explore the best model?
Next step: let’s explore the model (factors!)
Anova (glm (Richness~ NAP* Exposure, data = rikz, family = poisson))
Analysis of Deviance Table (Type II tests)
Response: Richness
LR Chisq Df Pr(>Chisq)
NAP 58.915 1 1.646e-14 ***
Exposure 54.810 2 1.254e-12 ***
NAP:Exposure 3.644 2 0.1617
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Talk about Zuur’s approach here
Next step?
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
Mod2 4 209.37 0.00 0.69 0.69 -100.19
Mod1 6 210.94 1.57 0.31 1.00 -98.36
Mod3 2 259.47 50.10 0.00 1.00 -127.59
Mod4 3 265.87 56.50 0.00 1.00 -129.64
Mod5 1 323.84 114.47 0.00 1.00 -160.88
Evidence ratios
Burnham and Anderson
Evidence ratios
Evidence ratio between models 'Mod2' and 'Mod1':
2.19
Last step: Model validation
Of the best model.
Other steps?
Check assumptions (if normal)
Mixed effects models pt 2
Split plot with adjustments
Repeated measures ANOVA
Model:
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \alpha \beta_{ij} + \delta_{ik} + \epsilon_{ijk}
\]
mu = grand mean
α i = effect of the i th treatment level
β j = effect of the j th level of time
α β i j = interaction effect between treatment and time
δ i k = subject effect
ϵ i j k = residual, unexplained variation
Split plot
You need to adjust the p-value!
It is the best method to use for well-balanced, experimental data. Also, simplest to do and interpret
You get p-values for time, for treatment and for the interaction
Mixed models with correlation
More about this on Friday’s lab
Wednesday class homework
We will talk about visualization
We can discuss plots, or figure out how to make some plots that are important for you
Send a figure of a plot that you may not understand or want to learn how to plot it
Data visualization
My favorite part of stats/data analysis/ etc
You can be creative! but be smart and ethical.
Plots are easy to “manipulate”
Misleading plots
Incorrect and misleading plots
Plotting
Estimate (point) and CI
Similar to a line and CI
Distribution plots
Distribution plots
Correlation
Scatterplot
Heatmap
Correlogram
Bubble
Combining scatterplot and distribution
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Combining scatterplot and distribution
Combining scatterplot and distribution
Reminder