The following workshop focuses on applying mixed-effects linear models to ecology-based datasets in R. Mixed-effects models require data to include random effects, which persist when multiple measurements are used on randomly sampled data.
The initial step is to set up the workspace and directory, and you should also load the lme4, nlme, visreg, lmerTest, ggplot2, emmeans, and dplyr libraries.
# Clear the environment from any previous workshops.
rm(list=ls())
# Load packages.
{
library(lme4)
library(nlme)
library(visreg)
library(lmerTest)
library(ggplot2)
library(emmeans)
library(dplyr)
}
# Get working directory
getwd()
## [1] "/Users/airiyamane/Downloads"
# Setting the new working directory
setwd("/Users/airiyamane/Downloads")
As a first workshop activity, we use a data set extracted from a paper from Nosil and Crespi (2006, Proceedings of the National Academy of Sciences 103: 9090–9095), who investigated adaptive radiation using Timema stick insects to understand the evolution of ecological and phenotypic diversity. The researchers focused on various traits, including color, body size, and shape, among different ecotypes utilizing distinct host plants. They conducted a field experiment that demonstrated the correlation between predator presence and the direction and strength of selection, which in turn influenced population divergence. We will obtain the repeatability of this experiment.
# 1. Read and examine the “timema.csv” dataset from MyCourses.
walkingsticks <- read.csv(file = "walkingsticks.csv", na = c("NA", ""))
# 2. Display the first few lines of the dataframe.
head(walkingsticks)
## specimen headwidth
## 1 1 0.15
## 2 1 0.15
## 3 2 0.18
## 4 2 0.18
## 5 3 0.17
## 6 3 0.18
# 3.
# Calculate the mean head width for each specimen, as well as which of its
# two measurements is the smallest and which the largest. Save the result in
# the same data frame.
walkingstick <- mutate(group_by(walkingsticks, specimen), meanHeadwidth = mean(headwidth))
head(data.frame(walkingstick))
## specimen headwidth meanHeadwidth
## 1 1 0.15 0.150
## 2 1 0.15 0.150
## 3 2 0.18 0.180
## 4 2 0.18 0.180
## 5 3 0.17 0.175
## 6 3 0.18 0.175
# 4.
# Order the specimens by their mean measurement, from smallest to largest.
walkingstick <- arrange(walkingstick, meanHeadwidth)
walkingstick$indiv <- rep(1:25, rep(2, 25))
# 5.
# Draw the plot.
# stat_summary layer with geom = "linerange" draws a line from the smallest to
# largest measurement for each individual. geom_point() overlays the points.
ggplot(walkingstick, aes(indiv, headwidth)) +
stat_summary(fun.data = mean_se, geom = "linerange",
colour = "black") +
geom_point(color = "firebrick", size = 3) +
labs(x = "Individual walking stick", y = "Head width (cm)") +
theme_classic()
# 1.
# Fit a linear mixed-effects model, with "headwidths" as a response variable,
# and "specimen" as a random effect.
walkingstickAnova <- lmer(headwidth ~ (1 | specimen), data = walkingstick)
# 2.
# Get variance components of walkingstickAnova.
VarCorr(walkingstickAnova)
## Groups Name Std.Dev.
## specimen (Intercept) 0.015682
## Residual 0.012884
The specimen (Intercept) Std.Dev. value of 0.015682 represents the variation between specimens. The Residual Std.Dev. value of 0.012884 represents the variation within specimens.
# 3.
# Calculate the repeatability of "headwidth"
SD_amsq <- 0.015682^2
SD_wisq <- 0.012884^2
repeatability <- SD_amsq/(SD_amsq + SD_wisq)
# Show the result.
repeatability
## [1] 0.5970178
# 5.
ggplot(walkingstick, aes(indiv, headwidth)) +
stat_summary(fun.data = mean_se, geom = "linerange",
colour = "black") +
geom_point(color = "firebrick", size = 3) +
geom_point(aes(y = fitted(walkingstickAnova)), color = "skyblue3", shape = 16, size = 3) +
labs(x = "Individual walking stick", y = "Head width (cm)") +
theme_classic()
For the second part of this workshop, we are using data from Gorman et al.(2014, PLoS ONE, 9(3), e90081) that investigates ecological sexual size dimorphism and its effects on sex-specific foraging and breeding foraging niches. They use various techniques to look at genomic analyses such as gel electrophoresis and primers, while also turning to stable isotope signature analysis to look at trophic feeding and diet composition. This section will work on something a little simpler, investigating sexual size dimorphism in flipper length, under the effects of different species and islands, which will be the random effects for the linear mixed-effects model.
# 1. Read and examine the “penguins” dataset from MyCourses.
penguins<- read.csv(file = "penguins.csv", na = c("NA", ""))
# 2.View the first few lines of the dataset to ensure the data was properly read.
head(penguins)
## rownames species island bill_length_mm bill_depth_mm flipper_length_mm
## 1 1 Adelie Torgersen 39.1 18.7 181
## 2 2 Adelie Torgersen 39.5 17.4 186
## 3 3 Adelie Torgersen 40.3 18.0 195
## 4 4 Adelie Torgersen NA NA NA
## 5 5 Adelie Torgersen 36.7 19.3 193
## 6 6 Adelie Torgersen 39.3 20.6 190
## body_mass_g sex
## 1 3750 male
## 2 3800 female
## 3 3250 female
## 4 NA <NA>
## 5 3450 female
## 6 3650 male
# 3. Clean the data. Delete the unnecessary columns in the dataset
# (bill_length_mm, bill_depth_mm, body_mass_g). Once that is done, notice that
# there are NA values and blanks in the data. Index the blank spaces in the
# dataset and transform them into NA values, then remove all rows that contain
# an NA value. The data is now ready for analysis.
penguins <- penguins %>%
select(-bill_length_mm, -bill_depth_mm, -body_mass_g)
penguins[penguins == ""] <- NA # Convert blanks to NA
penguins <- na.omit(penguins) # Remove rows with NA values
# 4. Create a boxplot of the data, showing flipper length for each sex.
ggplot(penguins, aes(x = sex, y = flipper_length_mm)) +
geom_boxplot(aes(fill = sex)) +
theme_minimal() +
labs(title = "Flipper Length by Sex",
x = "Sex",
y = "Flipper Length (mm)")
# 5.
# Create another boxplot and play around with the data to familiarise yourself
# with it. Display two of island, species, and sex variables in different
# combinations using one as the x-axis variable and another as the
# aes(col = variable) in geom_boxplot for example.
ggplot(penguins, aes(x = island, y = flipper_length_mm, col = species)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Flipper Length by Island and Species",
x = "Island",
y = "Flipper Length (mm)")
# 1.
# Fit a Linear Mixed-Effects model on the data and store it in a variable
# The random effects here are the species and island variables, sex is the
# explanatory variable and flipper length the response variable
zpeng <- lmer(flipper_length_mm ~ sex + (sex|species) + (1|island), data = penguins)
# Here we use (sex|species) instead of (1|species) because we expect the
# effect of sex to be different by species, but we keep (1|island), because
# we do not think the effect of sex will change across islands. This
# allows the slope of sex effect to change for each species.
# 2.
# Use emmeans to get the means for each sex
sex_means <- emmeans(zpeng, "sex", data = penguins)
# Print the results
sex_means
## sex emmean SE df lower.CL upper.CL
## female 198 7.94 2.02 164 232
## male 205 9.04 2.02 166 243
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
# The mean for males is 205 and females 198, with SEs being 9.04 and 7.94
# respectively
# 3.
# Use the summary() function to find the variances for the random effects
summary(zpeng)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: flipper_length_mm ~ sex + (sex | species) + (1 | island)
## Data: penguins
##
## REML criterion at convergence: 2108.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.61449 -0.62406 0.00922 0.60828 2.90336
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## species (Intercept) 186.957 13.673
## sexmale 3.991 1.998 0.95
## island (Intercept) 1.183 1.088
## Residual 31.667 5.627
## Number of obs: 333, groups: species, 3; island, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 197.875 7.934 2.018 24.940 0.00153 **
## sexmale 6.934 1.311 2.130 5.289 0.02961 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## sexmale 0.815
# The variance for species is 185.97 while its sexmale value is 3.991.
# The island variance is 1.18 and the residual is of 31.67.
# 4.
# The variance for species is extremely high, which means that species
# is heavily responsible for variation. Its sexmale value means that the
# effect of sex does vary across species (different slopes). The variance for
# island is extremely low, meaning that it has little effect on variation in the
# model.
# 5.
# The intercept of 197.88 shows the value of females, while the sexmale
# value shows the difference in males, meaning males have flippers that
# are on average 6.93 mm longer than female flippers. The associated std
# errors are 7.93 and 1.31 respectively. The t-values are high, suggesting
# the intercept and difference are significant. The p-value is significant for
# the difference. We conclude that there is sexual dimorphism in penguins
# based on flipper length when considering the random effects.
# Fitting linear model to data and comparing results with and without random effects
zfixedpeng <- lm(flipper_length_mm ~ sex, data = penguins)
summary(zfixedpeng)
##
## Call:
## lm(formula = flipper_length_mm ~ sex, data = penguins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.506 -10.364 -4.364 12.636 26.494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 197.364 1.057 186.792 < 2e-16 ***
## sexmale 7.142 1.488 4.801 2.39e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.57 on 331 degrees of freedom
## Multiple R-squared: 0.06511, Adjusted R-squared: 0.06229
## F-statistic: 23.05 on 1 and 331 DF, p-value: 2.391e-06
# The summary shows statistically highly significant difference in flipper length
# between sexes with large t-values and extremely small p-values. This
# means that the random effects of species and island did not cause SSD to become
# insignificant in the linear mixed-effects model.
# 1.
# Read and examine the “barley” dataset from MyCourses.
barley <- read.csv(file = "barley.csv", stringsAsFactors = TRUE, na = c("NA", ""))
# 2.
# View the first few lines of the dataset to ensure the data was properly
# read.
head(barley)
## rownames yield variety site
## 1 1 27.00000 Manchuria University Farm
## 2 2 48.86667 Manchuria Waseca
## 3 3 27.43334 Manchuria Morris
## 4 4 39.93333 Manchuria Crookston
## 5 5 32.96667 Manchuria Grand Rapids
## 6 6 28.96667 Manchuria Duluth
# 1.
# Fit a mixed-effects linear model to the data. Store the output in an lmer
# object.
barley_model <- lmer(yield ~ variety + (1 | site), data = barley)
# 2.
# Use the interaction.plot command to see how the mean yield changes between
# different species separately for each site. What site has the greatest means
# of barley yield?
attach(barley)
interaction.plot(variety, site, yield)
# Q. What site has the greatest means of barley yield?
# A. Waseca
# 3.
# Using visreg, generate two plots to visualize model fits to data. In the
# first plot, plot the fit of the response variable to each variety. In the
# second plot, plot the fit of the data to variety separated by sites.
visreg(barley_model, xvar = "variety")
visreg(barley_model, xvar = "variety", by = "site", scales = list(rot = 90))
# 4.
# Obtain the model-based estimates of the variety means.
emmeans(barley_model, "variety", data = barley)
## variety emmean SE df lower.CL upper.CL
## Glabron 37.3 4.48 6.83 26.7 48.0
## Manchuria 34.2 4.48 6.83 23.5 44.8
## No. 457 40.2 4.48 6.83 29.6 50.9
## No. 462 39.1 4.48 6.83 28.4 49.7
## No. 475 31.8 4.48 6.83 21.2 42.5
## Peatland 36.6 4.48 6.83 25.9 47.2
## Svansota 34.0 4.48 6.83 23.4 44.7
## Trebi 42.5 4.48 6.83 31.8 53.1
## Velvet 34.5 4.48 6.83 23.8 45.1
## Wisconsin No. 38 40.6 4.48 6.83 29.9 51.2
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
# Q. Did one variety produce more than the others?
# A. Trebi variety yielded the most.
# 5.
# Use the anova command to test the fixed effects. First use type I SS, then
# use type III SS. You should get identical results. Why might that be the case?
anova(barley_model, type = 1) # Type I SS
## Type I Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## variety 646.26 71.807 9 45 3.6799 0.001612 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(barley_model, type = 3) # Type III SS
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## variety 646.26 71.807 9 45 3.6799 0.001612 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Indicates the data is balanced, that there are an equal number of observation
# for each treatment within each block.