Mixed-Effects Model

Part 1: Repeatability of a morphological trait in walking sticks
- Fit a linear mixed-effects model
Part 2: Sexual Dimorphism in Penguins
- Fit a linear mixed-effects model
Part 3: Barley variety on crop yield
- Fit a linear mixed-effects model

The following workshop focuses on applying mixed-effects linear models to ecology-based datasets in R. Mixed-effects models require data to include random effects, which persist when multiple measurements are used on randomly sampled data.

The initial step is to set up the workspace and directory, and you should also load the lme4, nlme, visreg, lmerTest, ggplot2, emmeans, and dplyr libraries.

# Clear the environment from any previous workshops.
rm(list=ls())

# Load packages.
{
library(lme4)
library(nlme)
library(visreg)
library(lmerTest)
library(ggplot2)
library(emmeans)
library(dplyr)
}

# Get working directory
getwd()

## [1] "/Users/airiyamane/Downloads"

# Setting the new working directory
setwd("/Users/airiyamane/Downloads")

Part 1: Repeatability of a morphological trait in walking sticks

As a first workshop activity, we use a data set extracted from a paper from Nosil and Crespi (2006, Proceedings of the National Academy of Sciences 103: 9090–9095), who investigated adaptive radiation using Timema stick insects to understand the evolution of ecological and phenotypic diversity. The researchers focused on various traits, including color, body size, and shape, among different ecotypes utilizing distinct host plants. They conducted a field experiment that demonstrated the correlation between predator presence and the direction and strength of selection, which in turn influenced population divergence. We will obtain the repeatability of this experiment.

# 1. Read and examine the “timema.csv” dataset from MyCourses.
walkingsticks <- read.csv(file = "walkingsticks.csv", na = c("NA", ""))

# 2. Display the first few lines of the dataframe.
head(walkingsticks)

##   specimen headwidth
## 1        1      0.15
## 2        1      0.15
## 3        2      0.18
## 4        2      0.18
## 5        3      0.17
## 6        3      0.18

# 3.
# Calculate the mean head width for each specimen, as well as which of its
# two measurements is the smallest and which the largest. Save the result in
# the same data frame.
walkingstick <- mutate(group_by(walkingsticks, specimen), meanHeadwidth = mean(headwidth))
head(data.frame(walkingstick))

##   specimen headwidth meanHeadwidth
## 1        1      0.15         0.150
## 2        1      0.15         0.150
## 3        2      0.18         0.180
## 4        2      0.18         0.180
## 5        3      0.17         0.175
## 6        3      0.18         0.175

# 4.
# Order the specimens by their mean measurement, from smallest to largest.
walkingstick <- arrange(walkingstick, meanHeadwidth)
walkingstick$indiv <- rep(1:25, rep(2, 25))

# 5.
# Draw the plot.
# stat_summary layer with geom = "linerange" draws a line from the smallest to
# largest measurement for each individual. geom_point() overlays the points.
ggplot(walkingstick, aes(indiv, headwidth)) +
  stat_summary(fun.data = mean_se, geom = "linerange",
               colour = "black") +
  geom_point(color = "firebrick", size = 3) +
  labs(x = "Individual walking stick", y = "Head width (cm)") +
  theme_classic()

Fit a linear mixed-effects model

# 1.
# Fit a linear mixed-effects model, with "headwidths" as a response variable,
# and "specimen" as a random effect.
walkingstickAnova <- lmer(headwidth ~ (1 | specimen), data = walkingstick)

# 2.
# Get variance components of walkingstickAnova.
VarCorr(walkingstickAnova)

##  Groups   Name        Std.Dev.
##  specimen (Intercept) 0.015682
##  Residual             0.012884

The specimen (Intercept) Std.Dev. value of 0.015682 represents the variation between specimens. The Residual Std.Dev. value of 0.012884 represents the variation within specimens.

# 3.
# Calculate the repeatability of "headwidth"
SD_amsq <- 0.015682^2
SD_wisq <- 0.012884^2
repeatability <- SD_amsq/(SD_amsq + SD_wisq)

# Show the result.
repeatability

## [1] 0.5970178

Approximately 59.7% of the total variation in head width is explained by differences between specimens, while the remaining 40.3% is due to measurement error or other sources of within-specimen variation, this error is large.

# 5.
ggplot(walkingstick, aes(indiv, headwidth)) +
  stat_summary(fun.data = mean_se, geom = "linerange",
               colour = "black") +
  geom_point(color = "firebrick", size = 3) +
  geom_point(aes(y = fitted(walkingstickAnova)), color = "skyblue3", shape = 16, size = 3) +
  labs(x = "Individual walking stick", y = "Head width (cm)") +
  theme_classic()

Part 2: Sexual Dimorphism in Penguins

For the second part of this workshop, we are using data from Gorman et al.(2014, PLoS ONE, 9(3), e90081) that investigates ecological sexual size dimorphism and its effects on sex-specific foraging and breeding foraging niches. They use various techniques to look at genomic analyses such as gel electrophoresis and primers, while also turning to stable isotope signature analysis to look at trophic feeding and diet composition. This section will work on something a little simpler, investigating sexual size dimorphism in flipper length, under the effects of different species and islands, which will be the random effects for the linear mixed-effects model.

# 1. Read and examine the “penguins” dataset from MyCourses.
penguins<- read.csv(file = "penguins.csv", na = c("NA", ""))

# 2.View the first few lines of the dataset to ensure the data was properly read.
head(penguins)

##   rownames species    island bill_length_mm bill_depth_mm flipper_length_mm
## 1        1  Adelie Torgersen           39.1          18.7               181
## 2        2  Adelie Torgersen           39.5          17.4               186
## 3        3  Adelie Torgersen           40.3          18.0               195
## 4        4  Adelie Torgersen             NA            NA                NA
## 5        5  Adelie Torgersen           36.7          19.3               193
## 6        6  Adelie Torgersen           39.3          20.6               190
##   body_mass_g    sex
## 1        3750   male
## 2        3800 female
## 3        3250 female
## 4          NA   <NA>
## 5        3450 female
## 6        3650   male

# 3. Clean the data. Delete the unnecessary columns in the dataset 
# (bill_length_mm, bill_depth_mm, body_mass_g). Once that is done, notice that 
# there are NA values and blanks in the data. Index the blank spaces in the 
# dataset and transform them into NA values, then remove all rows that contain 
# an NA value. The data is now ready for analysis.
penguins <- penguins %>%
  select(-bill_length_mm, -bill_depth_mm, -body_mass_g)

penguins[penguins == ""] <- NA  # Convert blanks to NA
penguins <- na.omit(penguins)   # Remove rows with NA values

# 4. Create a boxplot of the data, showing flipper length for each sex. 
ggplot(penguins, aes(x = sex, y = flipper_length_mm)) +
  geom_boxplot(aes(fill = sex)) +
  theme_minimal() +
  labs(title = "Flipper Length by Sex",
       x = "Sex",
       y = "Flipper Length (mm)")

# 5. 
# Create another boxplot and play around with the data to familiarise yourself
# with it. Display two of island, species, and sex variables in different 
# combinations using one as the x-axis variable and another as the
# aes(col = variable) in geom_boxplot for example. 
ggplot(penguins, aes(x = island, y = flipper_length_mm, col = species)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Flipper Length by Island and Species",
       x = "Island",
       y = "Flipper Length (mm)")

Fit a linear mixed-effects model

# 1. 
# Fit a Linear Mixed-Effects model on the data and store it in a variable
# The random effects here are the species and island variables, sex is the 
# explanatory variable and flipper length the response variable
zpeng <- lmer(flipper_length_mm ~ sex + (sex|species) + (1|island), data = penguins)
# Here we use (sex|species) instead of (1|species) because we expect the
# effect of sex to be different by species, but we keep (1|island), because
# we do not think the effect of sex will change across islands. This
# allows the slope of sex effect to change for each species.

# 2. 
# Use emmeans to get the means for each sex 
sex_means <- emmeans(zpeng, "sex", data = penguins)
# Print the results
sex_means

##  sex    emmean   SE   df lower.CL upper.CL
##  female    198 7.94 2.02      164      232
##  male      205 9.04 2.02      166      243
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

# The mean for males is 205 and females 198, with SEs being 9.04 and 7.94 
# respectively

# 3. 
# Use the summary() function to find the variances for the random effects
summary(zpeng)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: flipper_length_mm ~ sex + (sex | species) + (1 | island)
##    Data: penguins
## 
## REML criterion at convergence: 2108.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.61449 -0.62406  0.00922  0.60828  2.90336 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  species  (Intercept) 186.957  13.673       
##           sexmale       3.991   1.998   0.95
##  island   (Intercept)   1.183   1.088       
##  Residual              31.667   5.627       
## Number of obs: 333, groups:  species, 3; island, 3
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)   
## (Intercept)  197.875      7.934   2.018  24.940  0.00153 **
## sexmale        6.934      1.311   2.130   5.289  0.02961 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## sexmale 0.815

# The variance for species is 185.97 while its sexmale value is 3.991. 
# The island variance is 1.18 and the residual is of 31.67.

# 4.
# The variance for species is extremely high, which means that species 
# is heavily responsible for variation. Its sexmale value means that the
# effect of sex does vary across species (different slopes).  The variance for 
# island is extremely low, meaning that it has little effect on variation in the
# model.

# 5.
# The intercept of 197.88 shows the value of females, while the sexmale 
# value shows the difference in males, meaning males have flippers that
# are on average 6.93 mm longer than female flippers. The associated std
# errors are 7.93 and 1.31 respectively. The t-values are high, suggesting 
# the intercept and difference are significant. The p-value is significant for 
# the difference. We conclude that there is sexual dimorphism in penguins 
# based on flipper length when considering the random effects.

# Fitting linear model to data and comparing results with and without random effects
zfixedpeng <- lm(flipper_length_mm ~ sex, data = penguins)
summary(zfixedpeng)

## 
## Call:
## lm(formula = flipper_length_mm ~ sex, data = penguins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.506 -10.364  -4.364  12.636  26.494 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  197.364      1.057 186.792  < 2e-16 ***
## sexmale        7.142      1.488   4.801 2.39e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.57 on 331 degrees of freedom
## Multiple R-squared:  0.06511,    Adjusted R-squared:  0.06229 
## F-statistic: 23.05 on 1 and 331 DF,  p-value: 2.391e-06

# The summary shows statistically highly significant difference in flipper length 
# between sexes with large t-values and extremely small p-values. This
# means that the random effects of species and island did not cause SSD to become 
# insignificant in the linear mixed-effects model.

Part 3: Barley variety on crop yield

# 1. 
# Read and examine the “barley” dataset from MyCourses.
barley <- read.csv(file = "barley.csv", stringsAsFactors = TRUE, na = c("NA", ""))

# 2. 
# View the first few lines of the dataset to ensure the data was properly 
# read.
head(barley)

##   rownames    yield   variety            site
## 1        1 27.00000 Manchuria University Farm
## 2        2 48.86667 Manchuria          Waseca
## 3        3 27.43334 Manchuria          Morris
## 4        4 39.93333 Manchuria       Crookston
## 5        5 32.96667 Manchuria    Grand Rapids
## 6        6 28.96667 Manchuria          Duluth

Fit a linear mixed-effects model

# 1. 
# Fit a mixed-effects linear model to the data. Store the output in an lmer 
# object. 
barley_model <- lmer(yield ~ variety + (1 | site), data = barley)

# 2. 
# Use the interaction.plot command to see how the mean yield changes between 
# different species separately for each site. What site has the greatest means 
# of barley yield? 
attach(barley)
interaction.plot(variety, site, yield)

# Q. What site has the greatest means of barley yield?
# A. Waseca

# 3.
# Using visreg, generate two plots to visualize model fits to data. In the 
# first plot, plot the fit of the response variable to each variety. In the 
# second plot, plot the fit of the data to variety separated by sites. 
visreg(barley_model, xvar = "variety")

visreg(barley_model, xvar = "variety", by = "site", scales = list(rot = 90))

# 4. 
# Obtain the model-based estimates of the variety means. 
emmeans(barley_model, "variety", data = barley)

##  variety          emmean   SE   df lower.CL upper.CL
##  Glabron            37.3 4.48 6.83     26.7     48.0
##  Manchuria          34.2 4.48 6.83     23.5     44.8
##  No. 457            40.2 4.48 6.83     29.6     50.9
##  No. 462            39.1 4.48 6.83     28.4     49.7
##  No. 475            31.8 4.48 6.83     21.2     42.5
##  Peatland           36.6 4.48 6.83     25.9     47.2
##  Svansota           34.0 4.48 6.83     23.4     44.7
##  Trebi              42.5 4.48 6.83     31.8     53.1
##  Velvet             34.5 4.48 6.83     23.8     45.1
##  Wisconsin No. 38   40.6 4.48 6.83     29.9     51.2
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

# Q. Did one variety produce more than the others? 
# A. Trebi variety yielded the most.

# 5. 
# Use the anova command to test the fixed effects. First use type I SS, then 
# use type III SS. You should get identical results. Why might that be the case? 
anova(barley_model, type = 1)  # Type I SS

## Type I Analysis of Variance Table with Satterthwaite's method
##         Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)   
## variety 646.26  71.807     9    45  3.6799 0.001612 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(barley_model, type = 3)  # Type III SS

## Type III Analysis of Variance Table with Satterthwaite's method
##         Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)   
## variety 646.26  71.807     9    45  3.6799 0.001612 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Indicates the data is balanced, that there are an equal number of observation
# for each treatment within each block.

Mixed-Effects Model

Workshopr 5 Solutions

Airi, Ani, Jed, William

2025-02-07

Part 1: Repeatability of a morphological trait in walking sticks

Fit a linear mixed-effects model

Part 2: Sexual Dimorphism in Penguins

Fit a linear mixed-effects model

Part 3: Barley variety on crop yield

Fit a linear mixed-effects model