In ecology, a metacommunity is a network of communities of organisms that are interconnected by dispersal and colonization dynamics (Leibold et al. 2004). Here I describe a metacommunity simulation package (MCSim
) for the R statistical language. The current version of MCSim
can create lottery-based simulations of metacommunities, which can be used to test assumptions about the emergent patterns of metacommunities. See our paper in Ecological Modelling (Sokol et al. 2015).
This simulation is a zero-sum, individual-oriented, iterative, lottery-based simulation. That means that the number of individuals in the simulated metacommunity remains constant through time and each time step is a complete, synchronous generation for the entire metacommunity. A lottery process moves the simulation forward in time with each time step, allowing for stochastic metacommunity dynamics to occur. However, environmental filtering can also be built into the simulation, allowing for deterministic species sorting to occur.
Users can use igraph
to design network topology and seed simulations with observed data. Please find the most recent releases of MCSim
here, and the current version under development at https://github.com/sokole/MCSim.
In order to install the current version of MCSim
from github, you will need to have the devtools
package installed. install.packages("devtools")
Once devtools
is installed, you can install the current version of MCSim
# -- Install the current dev version of MCSim
devtools::install_github('sokole/MCSim')
# -- Install the version used in this tutorial
devtools::install_github('sokole/MCSim@v0.4.1.9001')
# v0.4.1.9001 is used in this demo
There are two steps to creating a metacommunity simulation in MCSim:
The landscape is the “game board” on which the simulation plays out, and it is created using the fn.make.landscape
function. A landscape can be made from a data.frame
, for example:
# A data frame with coordinates for 5 sites
xy.coordinates <- data.frame(
x = c(1, 2, 3, 4, 5),
y = c(1, 3, 1, 5, 2))
print(xy.coordinates)
## x y
## 1 1 1
## 2 2 3
## 3 3 1
## 4 4 5
## 5 5 2
Here are the sites that will make up the landscape plotted in xy space:
plot(y ~ x, xy.coordinates)
When making a landscape, we have to embed additional information about the simulation in the landscape so that MCSim can keep track of site characteristics, in addition to their location. For example, m
can be used to specify an immigration rate and JM
can be used to specify the metacommunity size (see Hubbell 2001), where JM
is the number of individual organisms that inhabit the metacommunity.
my.landscape <- MCSim::fn.make.landscape(
site.coords = xy.coordinates,
m = 0.5,
JM = 1000000)
The elements of a landscape include:
$site.info
, a data table of site characteristics.$site.coords
, the xy-coordinates of the sites.$dist.mat
, a distance matrix or a connectivity matrix created in igraph
$list.of.stuff
, a list that can be used to store extra information about the landscape, such as the properties of an igraph
tree.The $site.info
element of the list that is returned by fn.make.landscape
includes:
site.ID
, site labelsarea.m2
, the area of a site. This only matters if immigration is defined as no. individuals / m2JL
, the number of individuals that make each individual communityEf
, the “position” of a site along an environmental gradient that determines how the environmental filter will weight the lottery selection of species in the available source pool based on their habitat preferences, or “niche positions”.Ef.specificity
, the specificity, or strength, of the environmental filter. Default value is 0, which makes the filter select for a single point on the environmental gradient.IL
, the immigration rate at each site in individuals / m2m
, the immigration rate at each site reported as Hubbell’s mVectors can also be passed to fn.make.landscape
, for example:
my.landscape <- MCSim::fn.make.landscape(
site.coords = xy.coordinates,
m = c(0.5, 0.5, 0.1, 0.1, 0.01),
Ef = c(-1, -.25, .1, 1, 2),
JM = 1000000)
print(my.landscape$site.info)
## site.ID area.m2 JL Ef Ef.specificity IL m
## 1 1 1 2e+05 -1.00 0 199999.000 0.50
## 2 2 1 2e+05 -0.25 0 199999.000 0.50
## 3 3 1 2e+05 0.10 0 22222.111 0.10
## 4 4 1 2e+05 1.00 0 22222.111 0.10
## 5 5 1 2e+05 2.00 0 2020.192 0.01
Note that fn.make.landscape
will return a creative exclamation in the affirmative if it successfully creates a landscape.
There are alternative methods to creating a landscape, such as using igraph
to create a connectivity matrix that can be passed to fn.make.landscape
as a distance matrix. These methods will be explained in detail in a subsequent tutorial.
Once the landscape is created, you can pass the landscape object to fn.metaSIM
along with parameter settings that define the rules for how metacommunity dynamics will play out in the metacommunity simulation. Note that the current version of MCSim
is zero sum, which means there will always be JM individuals in the simulation during each generation.
set.seed(1234) #set random seed
# make a landscape
my.landscape <- MCSim::fn.make.landscape(
site.coords = xy.coordinates,
m = 0.5,
JM = 1000000)
# niche positions, niche breadths, and relative abundances for three species
niche.positions <- c(-.5, 0, .5)
niche.breadths <- c(.2, .2, 5)
regional.rel.abund <- c(.8, .1, .1)
# run a simulation with 10 generations
sim.result <- MCSim::fn.metaSIM(
landscape = my.landscape,
trait.Ef = niche.positions,
trait.Ef.sd = niche.breadths,
gamma.abund = regional.rel.abund,
W.r = 0,
nu = 0.001,
n.timestep = 10,
sim.ID = "my_test_sim",
output.dir.path = "my_sim_output_directory"
)
To run a simulation, we have to define:
niche.positions
, niche.breadths
, and gamma.abund
define the regional species pool that seeds the simulation.W.r
is the slope of the dispersal kernelnu
is the probability that a novel species will appear during a recruitment event.n.timestep
defines the number of generationssim.ID
provides a name for the simulation, which is saved along with parameter values in the output.dir.path
directory, which is created in the working directory of the R session.The simulation will create a list which will include the sim.result.name
, landscape
, dat.gamma.t0
(species abundances and trait characteristics at time 0), and J.long
(community composition at each site at each time step in long format). You can view first few rows of J.long
with head(sim.result$J.long)
print(sim.result$sim.result.name)
## [1] "SIM_my_test_sim_20160308_124118"
Note that a time stamp with the format yyyymmdd_hhmmss is appended to the end of the sim result name.
Resulting species counts for each site at each time step are listed in long format:
head(sim.result$J.long)
## timestep site spp count
## 1 1 1 spp1 29944
## 2 1 2 spp1 29868
## 3 1 3 spp1 30065
## 4 1 4 spp1 29946
## 5 1 5 spp1 30008
## 6 1 1 spp2 85120
vegan
package for RThe MCSim
package can also be used to create simulation scenarios based on empirical data sets. Here we use the mite data set available in the vegan
package for R (Oksanen et al. 2016).
Note that the empirical data set that we to seed the simulation includes
The basic steps to this analysis are:
ade4
package (Chessel et al. 2004) to estimate species’ niche positions and niche breadths# # -- Packages that need to be installed
# install.packages(c("ade4",
# "vegan",
# "plyr",
# "dplyr",
# "ggplot2"))
library(ade4)
library(vegan)
library(dplyr)
library(ggplot2)
# -------------------------------------------------------------
# -- Read in empirical data, calculate niches
# --------------------------
d.comm <- get(data("mite"))
# Here I'm only using the 10 most abundant mites in the data set
d.comm <- d.comm[,order(colSums(d.comm), decreasing = TRUE)][,1:10]
# Here I'm only using continuous environemntal variables
d.env <- scale(get(data("mite.env"))[,c("SubsDens","WatrCont")])
# Here are the xy-coordinates
d.geo <- get(data("mite.xy"))
# -------------------------------------------------------------
# -- Make a matrix with site information, useful for plotting later on
# --------------------------
d.siteinfo <- data.frame(
Site.code = paste0('site',c(1:nrow(d.env))),
get(data("mite.env"))
)
# -- calculate RAs from densities
d.comm.ra <- d.comm / rowSums(d.comm)
# -- calculate niches for species
dudi.pca.env <- dudi.pca(d.env, scale = TRUE, scan = FALSE, nf=1)
niche.species <- niche(dudi.pca.env, Y = d.comm.ra, scann = FALSE)
d.niche <- data.frame(
niche.pos=niche.species$li,
as.data.frame(niche.param(niche.species))
)
# -- calculate niche positions for each of the sites
d.site.niche <- data.frame(
d.siteinfo,
dudi.pca.env$li
)
# -- make 1-Dimensional axis for arranging sites in a plot along the environmental gradient
mod.pca.geo <- princomp(d.geo)
d.site.1D <- data.frame(
site.name = as.character(d.siteinfo$Site.code),
region = as.character(d.siteinfo$Topo),
pca.score = mod.pca.geo$scores[,1],
pca.rank = rank(mod.pca.geo$scores[,1])
)
The empirical data can be used to characterize the mite species’ niche preferences:
# -------------------------------------------------------------
# -- plot the coenoclines, species' niche positions along the environmental gradient
# --
# -- Note that the "rug" marks along the x-axis denote the sites' locations
# -- on the environmental gradient
# ------------------------
env.axis <- d.site.niche$Axis1
species.niche <- d.niche
niche.factor <- .5 #scale niche breadths
# -- function for plotting bell curves
fn_norm_curve <- function(sigma=1, mu = 0,...) {
curve(
(1/sigma * sqrt(2 * pi)) * exp((-1 *(x - mu)^2) / (2 * sigma^2)), ...) #formula for bell curve
}
# -- Initialize plot of coenoclines
plot(1,1,
xlim = c(min(env.axis), max(env.axis)),
ylim = c(0, 10),
type = 'n',
xlab = 'Environmental gradient (PCA axis 1 scores)',
ylab = 'Prob. dens.',
main = 'Mite niche positions and niche widths')
mypal <- rainbow(nrow(species.niche))
# -- loop to plot each species' habitat preference
for (i.spp in 1:ncol(d.comm)){
fn_norm_curve(
mu = species.niche[i.spp, 'Axis1'],
sigma = niche.factor*sqrt(species.niche[i.spp, 'Tol']),
add = TRUE,
col = mypal[i.spp])
}
# -- plot sites along the x-axis
rug(env.axis)
# -- legend for mite species codes
legend(x = 'topleft',
legend = row.names(species.niche),
lty = 1,
col = mypal,
cex = .75,
ncol = 4)
# -------------------------------------------------------------
# -- Metacommunity simulation using MCSim
# -- SCENARIO: Species sorting based on habitat preferences
# --------------------------
devtools::install_github('sokole/MCSim')
set.seed(1)
# -- make the landscape
# -- I arbitrarily chose m = 0.05 and JM = 1e6
simulation_landscape <- MCSim::fn.make.landscape(
site.coords = d.geo,
Ef = d.site.niche$Axis1,
m = 0.5,
JM = 1e6)
# -- IMPORTANT NOTES on the simulation
# -- 1. multiplying niche breadths (trait.Ef.sd) by 2 weakens the environmental
# -- filters slightly, allowing for more inter-specific co-occurrences.
# -- the simulation neutral
# -- 2. W.r = 5e6 produces a steep dispersal kernel such that dispersal
# -- effectively only occurs between adjacent sites.
simoutput<-MCSim::fn.metaSIM(
landscape = simulation_landscape,
output.dir.path = 'SIM_RESULTS',
scenario.ID = 'mite_example',
sim.ID = 'neutral.model',
trait.Ef = d.niche$Axis1,
trait.Ef.sd = sqrt(d.niche$Tol),
J.t0 = d.comm.ra,
n.timestep = 100,
W.r = 5e6,
nu = 0,
speciation.limit = 0,
save.sim = FALSE
)
# -----------------------------------------------------
# -- extract initial regional species pool from simulation
dat.gamma.t0 <- simoutput$dat.gamma.t0
dat.gamma.t0$env.rank <- rank(dat.gamma.t0$trait.Ef)
# -----------------------------------------------------
# -- make some dot blot plots
# ----------------------------
# -- extract timesteps to plot
J.long <- filter(simoutput$J.long, timestep%in%c(1,2,10,max(timestep)))
# -- group spp counts by time and site to calculate site count totals
J.time.site <- group_by(J.long, timestep, site)
JLs <- summarise(J.time.site, site.totals=sum(count))
J.JLs <- full_join(J.long, JLs, by=c('timestep','site'))
# -- calculate relative abundances (RAs) from counts and site totals, remove
# observations with RAs of 0
J.RAs.long<-J.JLs
J.RAs.long$RA<-J.RAs.long$count/J.RAs.long$site.totals
J.RAs.long<-filter(J.RAs.long, RA > 0)
# -- add environmental data and species names for plotting
J.RAs.long<-mutate(J.RAs.long,
spp.no = as.numeric(as.factor(spp)))
J.RAs.long$region <- d.site.1D[J.RAs.long$site, 'region']
J.RAs.long$pca.rank <- d.site.1D[J.RAs.long$site, 'pca.rank']
J.RAs.long$pca.score <- d.site.1D[J.RAs.long$site, 'pca.score']
J.RAs.long$env.rank <- dat.gamma.t0[J.RAs.long$spp,'env.rank']
# -- make a species characteristic data frame for labeling the spp axis
d.spp.RAs <- as.data.frame(J.RAs.long %>%
group_by(spp) %>%
summarise(max.RA = max(RA),
spp.no = spp.no[1],
env.rank = env.rank[1]))
# -- only label species with RAs over 0.40
spp.labels <- filter(d.spp.RAs, max.RA > .4)
# -- make plot
p <- ggplot(J.RAs.long, aes(pca.rank,
env.rank,
size = RA,
color = region))
p + geom_point() +
facet_grid(. ~ timestep) +
theme(axis.text.x = element_text(size = 8)) +
theme(axis.text.y = element_text(size = 6)) +
scale_size('Relative\nAbundance', range = c(.5,4)) +
scale_color_discrete('Region') +
labs(title = 'Neutral Model Metacommunity Simulation\nNo. generations ( ---> )',
x = 'Site ID',
y = 'Species ID') +
scale_y_continuous(
breaks = spp.labels$env.rank,
labels = spp.labels$spp
)
Conclusions. Given these model parameters for a species sorting scenario, local communities become dominated by taxa with matching habitat preferences after 100 generations.
# -------------------------------------------------------------
# -- Metacommunity simulation using MCSim
# -- SCENARIO: Neutral community model with limited dispersal
# --------------------------
set.seed(1)
# -- make the landscape
# -- I arbitrarily chose m = 0.05 and JM = 1e6
simulation_landscape <- MCSim::fn.make.landscape(
site.coords = d.geo,
Ef = d.site.niche$Axis1,
m = 0.5,
JM = 1e6)
# -- IMPORTANT NOTES on the simulation
# -- 1. multiplying niche breadths (trait.Ef.sd) by 1000 effectively makes
# -- the simulation neutral
# -- 2. W.r = 5e6 produces a steep dispersal kernel such that dispersal
# -- effectively only occurs between adjacent sites.
simoutput<-MCSim::fn.metaSIM(
landscape = simulation_landscape,
output.dir.path = 'SIM_RESULTS',
scenario.ID = 'mite_example',
sim.ID = 'neutral.model',
trait.Ef = d.niche$Axis1,
trait.Ef.sd = 1000 * (sqrt(d.niche$Tol)),
J.t0 = d.comm.ra,
n.timestep = 100,
W.r = 5e6,
nu = 0,
speciation.limit = 0,
save.sim = FALSE
)
# -----------------------------------------------------
# -- extract initial regional species pool from simulation
dat.gamma.t0 <- simoutput$dat.gamma.t0
dat.gamma.t0$env.rank <- rank(dat.gamma.t0$trait.Ef)
# -----------------------------------------------------
# -- make some dot blot plots
# ----------------------------
# -- extract timesteps to plot
J.long <- filter(simoutput$J.long, timestep%in%c(1,2,10,max(timestep)))
# -- group spp counts by time and site to calculate site count totals
J.time.site <- group_by(J.long, timestep, site)
JLs <- summarise(J.time.site, site.totals=sum(count))
J.JLs <- full_join(J.long, JLs, by=c('timestep','site'))
# -- calculate relative abundances (RAs) from counts and site totals, remove
# observations with RAs of 0
J.RAs.long<-J.JLs
J.RAs.long$RA<-J.RAs.long$count/J.RAs.long$site.totals
J.RAs.long<-filter(J.RAs.long, RA > 0)
# -- add environmental data and species names for plotting
J.RAs.long<-mutate(J.RAs.long,
spp.no = as.numeric(as.factor(spp)))
J.RAs.long$region <- d.site.1D[J.RAs.long$site, 'region']
J.RAs.long$pca.rank <- d.site.1D[J.RAs.long$site, 'pca.rank']
J.RAs.long$pca.score <- d.site.1D[J.RAs.long$site, 'pca.score']
J.RAs.long$env.rank <- dat.gamma.t0[J.RAs.long$spp,'env.rank']
# -- make a species characteristic data frame for labeling the spp axis
d.spp.RAs <- as.data.frame(J.RAs.long %>%
group_by(spp) %>%
summarise(max.RA = max(RA),
spp.no = spp.no[1],
env.rank = env.rank[1]))
# -- only label species with RAs over 0.40
spp.labels <- filter(d.spp.RAs, max.RA > .4)
# -- make plot
p <- ggplot(J.RAs.long, aes(pca.rank,
env.rank,
size = RA,
color = region))
p + geom_point() +
facet_grid(. ~ timestep) +
theme(axis.text.x = element_text(size = 8)) +
theme(axis.text.y = element_text(size = 6)) +
scale_size('Relative\nAbundance', range = c(.5,4)) +
scale_color_discrete('Region') +
labs(title = 'Neutral Model Metacommunity Simulation\nNo. generations ( ---> )',
x = 'Site ID',
y = 'Species ID') +
scale_y_continuous(
breaks = spp.labels$env.rank,
labels = spp.labels$spp
)
Conclusions. Given these model parameters for the neutral model simulation scenario, metacommunity composition remains relatively stable over 100 generations.
Chessel, D., A. B. Dufour, and J. Thioulouse. 2004. The ade4 package-I-One-table methods. R news 4:5–10.
Hubbell, S. P. 2001. A unified theory of biodiversity and biogeography. Princeton University Press.
Leibold, M. A., M. Holyoak, N. Mouquet, P. Amarasekare, J. M. Chase, M. F. Hoopes, R. D. Holt, J. B. Shurin, R. Law, D. Tilman, M. Loreau, and A. Gonzalez. 2004. The metacommunity concept: a framework for multi-scale community ecology. Ecology Letters 7:601–613.
Oksanen, J., F. G. Blanchet, R. Kindt, P. Legendre, P. R. Minchin, R. B. O’Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, and H. Wagner. 2016. vegan: Community Ecology Package. v2.3-3.
Sokol, E. R., B. L. Brown, C. C. Carey, B. M. Tornwall, C. M. Swan, and J. E. Barrett. 2015. Linking management to biodiversity in built ponds using metacommunity simulations. Ecological Modelling 296:36–45. (link)
Wickham, H., and W. Chang. 2016. devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.