This document records, in a reproducible manner, all of the steps taken when assessing the performance of the method presented in Sundqvist et al. 2015 (citation will be inserted when paper is accepted). The method is implemented in the function divMigrate in the R-package diveRsity1 and a shiny application called divMigrate-online is availible here. Results from these simulations are published in Sundqvist et al. 2015.
Instructions and files necessary to run the code below are available at github.
All analyses are carried out using R as a scripting environment. The software fastsimcoal22 is used to simulate the various scenarios addressed. In addition to fastsimcoal2, the R package diveRsity is used extensively for file conversions and simulation parameter file generation. As well as these processes, diveRsity is used to calculate directional relative migration matrices, used for assessing the power of the method. This is done using the function divMigrate.
In these simulations the power of the method presented in Sundqvist et al. 2015 it tested under various gene flow patterns and sample designs. This is done by simulating data sets under three distinct gene flow patterns (unidirectional, symmetric bidirectional and asymmetric bidirectional). Unidirectional gene flow is simulated in the direction from population two to population one, bidirectional symmetric gene flow is simulated in both directions at an equal rate (\(m/2\)) and asymmetric bidirectional gene flow is simulated in both direction but at unequal rate (\(m*1/4\) and \(m*3/4\)). The ability of the method to resolve each of these migration patterns was tested for three levels of gene flow, low (m = 0.00025), medium (m = 0.005) and high (m = 0.05). For each combination of gene flow pattern and gene flow rate, 1000 microsatellite data sets is simulated for increasing sample sizes and increasing number of loci. When evaluating increasing sample sizes the number of loci is fixed at 50 and when evaluating the number of loci the sample size is fixed to 50. Power is estimated for calculations of directional relative migration calculated from Gst and Jost´s D. For all simulations, the base population size is 1000 and locus mutation rate is 5×10−4. To ease the simulation and calculation process, a custom wrapper function that generates our parameter files for fastsimcoal2, converts .arp files to the genepop format and calculates the proportion of times migration in the direction pop1→pop2 is less than pop1←pop2 (i.e. power to resolve a given migration regime), will be used.
To run the simulations we first need to install packages, load files and define the divergence levels, sample sizes and number of loci we want to evaluate.
install.packages("diveRsity")
install.packages("ggplot2")
install.packages("reshape")
#load parMaker into R
source("parMaker.R")
#load simulation wrapper function
source("simFun.R")
#load function to create multiplots
source("multiplot.R")
#define divergences
fst <- c(0.5, 0.05, 0.005)
#define sample sizes
n <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# define number of loci
nloci <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# create a directory for all results
system("mkdir sim-results")
Then we can run the first simulation, unidirectional gene flow and increasing sample sizes. The simulation is executed and plots are generated by running the code below.
# create simulation names for each divergence level
low_nms <- paste("lowuni", n, sep = "")
med_nms <- paste("meduni", n, sep = "")
hi_nms <- paste("highuni", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(fst = 0.5, mig_mod = "uni"), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
MoreArgs = list(fst = 0.05, mig_mod = "uni"), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
MoreArgs = list(fst = 0.005, mig_mod = "uni"), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Combine data
unidat_ss <- rbind(lowdf, meddf, hidf)
# save results for later use
save(unidat_ss, file="sim-results/uni_sam_size.Rdata")
# manipulate data
unidat_ss$N <- as.factor(unidat_ss$N)
unidat_ss <- melt(unidat_ss)
colnames(unidat_ss) <- c("N", "migration", "stat", "power")
# making plots
d <- unidat_ss[unidat_ss$stat == "d",]
p1 <- ggplot(data = d, aes(x = N, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Sample size") + ylab("Power")
g <- unidat_ss[unidat_ss$stat == "g",]
p2 <- ggplot(data = g, aes(x = N, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Sample size") + ylab("Power")
multiplot(p1, p2, cols=2)
By changing all “uni” to “symbi” in the code above we can run a simulation with symmetric bidirectional migration instead.
# create simulation names for each divergence level
low_nms <- paste("lowsymbi", n, sep = "")
med_nms <- paste("medsymbi", n, sep = "")
hi_nms <- paste("highsymbi", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(fst = 0.5, mig_mod = "symbi"), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
MoreArgs = list(fst = 0.05, mig_mod = "symbi"), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
MoreArgs = list(fst = 0.005, mig_mod = "symbi"), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Combine data
symbidat_ss <- rbind(lowdf, meddf, hidf)
# save results for later use
save(symbidat_ss, file="sim-results/symbi_sam_size.Rdata")
# manipulate data
symbidat_ss$N <- as.factor(symbidat_ss$N)
symbidat_ss <- melt(symbidat_ss)
colnames(symbidat_ss) <- c("N", "migration", "stat", "power")
# making plots
d <- symbidat_ss[symbidat_ss$stat == "d",]
p1 <- ggplot(data = d, aes(x = N, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Symmetric bidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Sample size") + ylab("Power")
g <- symbidat_ss[symbidat_ss$stat == "g",]
p2 <- ggplot(data = g, aes(x = N, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Symmetric bidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Sample size") + ylab("Power")
multiplot(p1, p2, cols=2)
To run the asymmetric bidirectional migration we continue and change all “symbi” to “asymbi”.
# create simulation names for each divergence level
low_nms <- paste("lowasymbi", n, sep = "")
med_nms <- paste("medasymbi", n, sep = "")
hi_nms <- paste("highasymbi", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(fst = 0.5, mig_mod = "asymbi"), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
MoreArgs = list(fst = 0.05, mig_mod = "asymbi"), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
MoreArgs = list(fst = 0.005, mig_mod = "asymbi"),
SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Combine data
asymbidat_ss <- rbind(lowdf, meddf, hidf)
# save results for later use
save(asymbidat_ss, file="sim-results/asymbi_sam_size.Rdata")
# manipulate data
asymbidat_ss$N <- as.factor(asymbidat_ss$N)
asymbidat_ss <- melt(asymbidat_ss)
colnames(asymbidat_ss) <- c("N", "migration", "stat", "power")
# making plots
d <- asymbidat_ss[asymbidat_ss$stat == "d",]
p1 <- ggplot(data = d, aes(x = N, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Asymmetric bidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Sample size") + ylab("Power")
g <- asymbidat_ss[asymbidat_ss$stat == "g",]
p2 <- ggplot(data = g, aes(x = N, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Asymmetric bidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Sample size") + ylab("Power")
multiplot(p1, p2, cols=2)
Now we want to test the same three migration patterns again but while increasing the number of loci instead. To due this we have to make some small changes to the code above as for example changing sample size (ss) to number of loci (nl), fixating sample size to 50 and increasing the number of loci according to nloci.
# Create simulation names for each divergence level
low_nms <- paste("unilowloci", nloci, sep = "")
med_nms <- paste("unimedloci", nloci, sep = "")
hi_nms <- paste("unihighloci", nloci, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, mig_mod = "uni", samp_size = 50), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
MoreArgs = list(fst = 0.05, mig_mod = "uni", samp_size = 50), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
MoreArgs = list(fst = 0.005, mig_mod = "uni", samp_size = 50), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
unidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(unidat_nl, file="sim-results/uni_loci.Rdata")
# manipulate data
unidat_nl$nloci <- as.factor(unidat_nl$nloci)
unidat_nl <- melt(unidat_nl)
colnames(unidat_nl) <- c("nloci", "migration", "stat", "power")
# making plots
d <- unidat_nl[unidat_nl$stat == "d",]
p1 <- ggplot(data = d, aes(x = nloci, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Number of loci") + ylab("Power")
g <- unidat_nl[unidat_nl$stat == "g",]
p2 <- ggplot(data = g, aes(x = nloci, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Number of loci") + ylab("Power")
multiplot(p1, p2, cols=2)
To due the same procedure for symmetric bidirectional migration we again change all “uni” to “symbi” in the code above.
# Create simulation names for each divergence level
low_nms <- paste("symbilownloc", nloci, sep = "")
med_nms <- paste("symbimednloc", nloci, sep = "")
hi_nms <- paste("symbihighnloc", nloci, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, mig_mod = "symbi", samp_size = 50), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
MoreArgs = list(fst = 0.05, mig_mod = "symbi", samp_size = 50), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
MoreArgs = list(fst = 0.005, mig_mod = "symbi", samp_size = 50), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
symbidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(symbidat_nl, file="sim-results/symbi_loci.Rdata")
# manipulate data
symbidat_nl$nloci <- as.factor(symbidat_nl$nloci)
symbidat_nl <- melt(symbidat_nl)
colnames(symbidat_nl) <- c("nloci", "migration", "stat", "power")
# making plots
d <- symbidat_nl[symbidat_nl$stat == "d",]
p1 <- ggplot(data = d, aes(x = nloci, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Symmetric bidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Number of loci") + ylab("Power")
g <- symbidat_nl[symbidat_nl$stat == "g",]
p2 <- ggplot(data = g, aes(x = nloci, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Symmetric bidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Number of loci") + ylab("Power")
multiplot(p1, p2, cols=2)
To due the last simulation, asymmetric bidirectional migration and increasing number of loci we continue and change “symbi” to “asymbi”.
# Create simulation names for each divergence level
low_nms <- paste("asymbilownloc", nloci, sep = "")
med_nms <- paste("asymbimednloc", nloci, sep = "")
hi_nms <- paste("asymbihighnloc", nloci, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, mig_mod = "asymbi", samp_size = 50), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
MoreArgs = list(fst = 0.05, mig_mod = "asymbi", samp_size = 50), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
MoreArgs = list(fst = 0.005, mig_mod = "asymbi", samp_size = 50), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
asymbidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(asymbidat_nl, file="sim-results/asymbi_loci.Rdata")
# manipulate data
asymbidat_nl$nloci <- as.factor(asymbidat_nl$nloci)
asymbidat_nl <- melt(asymbidat_nl)
colnames(asymbidat_nl) <- c("nloci", "migration", "stat", "power")
# making plots
d <- asymbidat_nl[asymbidat_nl$stat == "d",]
p1 <- ggplot(data = d, aes(x = nloci, y = power))
p1 <- p1 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p1 <- p1 + scale_y_continuous(limits = c(0,1))
p1 <- p1 + ggtitle(expression("Asymmetric bidirectional migration "*"(D"["Jost"]*")"))
p1 <- p1 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p1 <- p1+xlab("Number of loci") + ylab("Power")
g <- asymbidat_nl[asymbidat_nl$stat == "g",]
p2 <- ggplot(data = g, aes(x = nloci, y = power))
p2 <- p2 + geom_line(aes(group = migration, color = migration, lty = migration), size = 2)
p2 <- p2 + scale_y_continuous(limits = c(0,1))
p2 <- p2 + ggtitle(expression("Asymmetric bidirectional migration "*"(G"["ST"]*")"))
p2 <- p2 + theme(axis.text = element_text(size = rel(1.3)), axis.title.x = element_text(size = rel(1.5)),axis.title.y = element_text(size = rel(1.5)), axis.text = element_text(colour = "black"),legend.title = element_text(colour="black", size=16), legend.text = element_text(colour="black", size = 16), panel.background = element_rect(colour = "black"))
p2 <- p2+xlab("Number of loci") + ylab("Power")
multiplot(p1, p2, cols=2)
Keenan, K., McGinnity, P., Cross, T. F., Crozier, W. W., Prodöhl, P. A. (2013), diveRsity: An R package for the estimation and exploration of population genetics parameters and their associated errors. Methods in Ecology and Evolution, 4: 782–788. doi: 10.1111/2041-210X.12067↩
Excoffier, Laurent, and Matthieu Foll. “Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios.” Bioinformatics 27.9 (2011): 1332-1334.↩