This document records, in a reproducible manner, all of the steps taken when assessing the performance of the method1 to detect directional migration, implemented in divMigrate2. Results from these analyses are published in Sundqvist et al. 2015 (insert citation when paper comes out). The \(Nm\) method from Alcala et al. (2014) is used in the code below, but was not published in the paper, since it was still under development.
Instructions and files necessary to run the code below are available at github.
All analyses are carried out using R as a scripting environment. The software fastsimcoal23 is used to simulate the various scenarios addressed. In addition to fastsimcoal2, the R package diveRsity is used extensively for file conversions and simulation parameter file generation. As well as these processes, diveRsity is used to calculate relative migration matrices, used for assessing the statistical power of the method. This is done using the function divMigrate, mentioned above.
A number of simulations will be carried out in order to test the behavior of the directional migration method under various conditions. Principally, we are interested in how the method behaves with various sample sizes and numbers of loci.
The ability of the directional migration method to discriminate between various gene flow regimes will be assessed using varying sample size and number of loci. In particular, the ability of the directional migration method to detect unidirectional, bidirectional and, no migration will be tested. These three regimes will also be assessed for three levels of migration: High, medium and low.
Gene flow patterns will be simplified to the pairwise scenario. In all cases where migration occurs in one direction but not the other, it will be from population 2 to population 1.
In all instances, we are interested in the power of the directional migration method to correctly resolve a given migration pattern. This is done by simulating multiple data under the same evolutionary condition and calculating the proportion of times we are able to resolve the original migration pattern. Initially, this method was carried out using 1000 simulation replicates, but to allow for the incorporation of more simulation parameters, the accuracy of using only 100 replicates was tested. It was found that 100 replicates provided that same power estimates to one decimal place, thus was sufficient for the purpose of the tests carried out.
parMaker is a custom function for generating .par input parameter files for fastsimcoal2. It’s default parameters are as follows:
parMaker(outfile = "test", npops = 2, popSizes = 1000,
sampSizes = 5, popGrowth = 0, numMigMat = 1,
migMat = matrix(c(0, 0.0005, 0, 0), nrow = 2,
ncol = 2, byrow = TRUE),
numHistEvents = 0, numIndependentChromo = c(1,0),
numLinkBlocks = 1, lociData = list(locType = "MICROSAT",
numLoc = 10,
recombRate = 0.0,
mu = 0.0005,
addPar = 0))
While there are a large number of arguments in this function, a smaller number are required for our simulations. These are explained in detail below.
outfile specifies the name of the output parameter file. It should be passed as a character string. This string will be assigned the .par suffix for use with fastsimcoal21.npops is an integer, specifying the number of populations to simulate.popSizes can be an integer or an integer vector (of length equal to npops). This argument specifies the sizes of base populations.sampSizes can be an integer or an integer vector (of length equal to npops). This argument specifies the sizes of samples taken from base populations at time 0.numMigMat an integer, specifying the number of migration matrices to use during simulations.migMat is a numeric matrix (if numMigMat = 1) or a list of numeric matrices (if numMigMat > 1).numHistEvents is an integer specifying the number of historical events that occur during simulations.histEvents can be a numeric vector of length = 7 (if numHistEvents = 1) or a list of numeric vectors (each of length = 7). The number of vectors in the list should be equal to numHistEvents. Each element of these vectors specify the following parameters:
lociData is a list (of length = 5) that specifies the type and properties of the marker to be simulated. The five elements of this argument are:
locType = indicate the type of marker (e.g. “MICROSAT”, “DNA”, “SNP” or “STANDARD”)numLoc = The number of markersrecombRate = the recombination rate of the markersmu = the mutation rate of the markersaddPar = an additional parameter (e.g. mutation rate per bp for DNA type marker).# load parMaker into R
source("parMaker.R")
# clear demo directory
system("rm -r demo")
parMaker(outfile = "demo", popSizes = 2000, sampSizes = 10,
numMigMat = 1, migMat = matrix(c(0.0, 0.005, 0.0, 0.0),
nrow = 2, ncol = 2,
byrow = TRUE),
numIndependentChromo = c(5, 0),
numHistEvents = 1, histEvents = c(100, 0, 1, 0, 2, 0, 0),
lociData = list(locType = "MICROSAT",
numLoc = 1,
recombRate = 0,
mu = 0.001,
addPar = 0))
# read the par file
par_demo <- readLines("demo.par")
cat(paste(par_demo, collapse = "\n"))
//Number of population samples (demes)
2 samples to simulate :
//Population effective sizes (number of genes)
2000
2000
//Samples sizes
10
10
//Growth rates : negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
1
//migration matrix
0.000 0.005
0.000 0.000
//historical event: time, source, sink, migrants, new size, new growth rate, migr. matrix
1 historical event
100 0 1 0 2 0 0
//Number of independent loci [chromosome]
5 0
//Per chromosome: Number of linkage blocks
1
//per Block: data type, num loci, rec. rate and mut rate + optional parameters
MICROSAT 1 0 0.001 0
We can also check if this file is readable by fastsimcoal2 as follows:
system("./fastsimcoal21 -i demo.par -n 1 -g")
arl_file <- readLines("demo/demo_1_1.arp")
cat(arl_file, sep = "\n")
#Arlequin input file written by the simulation program fastsimcoal.exe
[Profile]
Title="A series of simulated samples"
NbSamples=2
GenotypicData=1
GameticPhase=0
RecessiveData=0
DataType=MICROSAT
LocusSeparator=WHITESPACE
MissingData='?'
[Data]
[[Samples]]
#Number of independent chromosomes: 5
#Total number of polymorphic sites: 5
# 1 polymorphic positions on chromosome 1
#1
# 1 polymorphic positions on chromosome 2
#1
# 1 polymorphic positions on chromosome 3
#1
# 1 polymorphic positions on chromosome 4
#1
# 1 polymorphic positions on chromosome 5
#1
SampleName="Sample 1"
SampleSize=5
SampleData= {
1_1 1 500 497 495 501 501
498 498 495 497 502
1_2 1 501 499 498 499 501
500 504 498 499 502
1_3 1 500 507 496 501 500
500 498 496 500 503
1_4 1 502 506 497 502 502
497 498 497 501 502
1_5 1 500 499 497 502 502
500 506 499 501 502
}
SampleName="Sample 2"
SampleSize=5
SampleData= {
2_1 1 500 498 497 501 500
501 507 497 500 501
2_2 1 500 506 499 501 500
499 498 499 501 500
2_3 1 501 499 497 501 500
500 504 497 502 500
2_4 1 499 504 497 501 500
497 498 497 500 502
2_5 1 501 504 498 499 500
497 504 497 499 499
}
[[Structure]]
StructureName="Simulated data"
NbGroups=1
Group={
"Sample 1"
"Sample 2"
}
We can see that the parMaker function has generated a parameter file which fastsimcoal2 has used to generate a .arp genotype file. We can convert this file to a genepop for and analyse it using the divMigrate function in diveRsity as follows:
library(diveRsity)
arp2gen("demo/demo_1_1.arp")
# test that genepop file looks ok
demo_gen <- readLines("demo/demo_1_1.gen")
cat(demo_gen, sep = "\n")
demo/demo_1_1_gen_converted
locus1
locus2
locus3
locus4
locus5
POP
pop1 , 499500 498499 497488 500503 502502
pop1 , 500499 500499 498489 504503 500500
pop1 , 499501 498501 496497 500503 502502
pop1 , 499500 499499 491496 503501 500500
pop1 , 500502 498500 488488 503496 499503
POP
pop2 , 500500 500498 496495 504500 504500
pop2 , 503500 500498 496497 503499 500500
pop2 , 499498 501501 496497 500501 500503
pop2 , 499502 500500 488496 504499 499503
pop2 , 498500 501501 489496 499504 499500
# run `divMigrate`
demo_res <- divMigrate("demo/demo_1_1.gen")
qgraph::qgraph(demo_res[[1]])
We can see that the analysis was successful and our network plot demonstrates that the higher migration occurs from pop2 to pop1 (i.e. darker connection), as specified in our par file.
In this section we will simulate data under the various levels of population divergence for each of the three migration patterns mentioned above. Namely, the divergence between population pairs will be \(F_{ST} = 0.5, 0.05, 0.005\) corresponding to low, medium and high migration rates, respectively. Using the equation \(F_{ST} \approx \frac{1}{4Nm + 1}\), these divergences result in migration rates of \(\approx\) 0.00025, 0.005 and 0.05, respectively. The ability to resolve unidirectional, bidirectional and no migration regimes for variable sample sizes and number of loci will be assessed. Statistical power will be estimated for all three migration statistics provided in divMigrate (i.e. \(D_{Jost}\)4, \(G_{ST}\)5 and \(Nm\)6). For all simulations, the base population size will be 1000, locus mutation rate will be \(5 \times 10^{-4}\) (A mutation rate common in vertebrates7).
In these simulations we will test the statistical power of the method to detect the correct direction of migration at the three divergence levels mentioned above. To ease the simulation and calculation process, a custom wrapper function that generates our parameter files for fastsimcoal2, converts .arp files to the genepop format and calculates the proportion of times migration in the direction \(pop1 \rightarrow pop2\) is less than \(pop1 \leftarrow pop2\) (i.e. statistical power to resolve a given migration regime), will be used.
The code to run the simulations and to compile the results from divMigrate is below.
# load simulation wrapper function
source("simFun.R")
# define divergences
fst <- c(0.5, 0.05, 0.005)
# define sample sizes
n <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# create simulation names for each divergence level
low_nms <- paste("low", n, sep = "")
med_nms <- paste("med", n, sep = "")
hi_nms <- paste("high", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(fst = 0.5), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
MoreArgs = list(fst = 0.05), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
MoreArgs = list(fst = 0.005), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Comile the results
unidat_ss <- rbind(lowdf, meddf, hidf)
# create a directory for all results
system("mkdir sim-results")
# save results for later use
save(unidat_ss, file="sim-results/uni_sam_size.Rdata")
Setting up the plotting environment.
# load required packages
library(reshape) # reshape main data
library(ggplot2) # plotting results
library(wesanderson) # creating colour palettes
# manipulate data
unidat_ss$N <- as.factor(unidat_ss$N)
unidat_ss <- melt(unidat_ss)
colnames(unidat_ss) <- c("N", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Darjeeling")
d <- unidat_ss[unidat_ss$stat == "d",]
p <- ggplot(data = d, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p
Figure 1: Statistical power to resolve unidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.
g <- unidat_ss[unidat_ss$stat == "g",]
p <- ggplot(data = g, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p
Figure 2: Statistical power to resolve unidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.
nm <- unidat_ss[unidat_ss$stat == "n",]
p <- ggplot(data = nm, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p
Figure 3: Statistical power to resolve unidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.
# create simulation names for each divergence level
low_nms <- paste("bilow", n, sep = "")
med_nms <- paste("bimed", n, sep = "")
hi_nms <- paste("bihigh", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(fst = 0.5, mig_mod = "bi"),
SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
MoreArgs = list(fst = 0.05, mig_mod = "bi"),
SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
MoreArgs = list(fst = 0.005, mig_mod = "bi"),
SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Combine data
bidat_ss <- rbind(lowdf, meddf, hidf)
# save results for later use
save(bidat_ss, file="sim-results/bi_sam_size.Rdata")
Setting up the plotting environment.
# manipulate data
bidat_ss$N <- as.factor(bidat_ss$N)
bidat_ss <- melt(bidat_ss)
colnames(bidat_ss) <- c("N", "migration", "stat", "power")
d <- bidat_ss[bidat_ss$stat == "d",]
p <- ggplot(data = d, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p
Figure 4: Statistical power to resolve bidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.
g <- bidat_ss[bidat_ss$stat == "g",]
p <- ggplot(data = g, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p
Figure 5: Statistical power to resolve bidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.
nm <- bidat_ss[bidat_ss$stat == "n",]
p <- ggplot(data = nm, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p
Figure 6: Statistical power to resolve bidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.
# create simulation names
low_nms <- paste("nolow", n, sep = "")
# run analyses
# no migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
MoreArgs = list(mig_mod = "none"),
SIMPLIFY = FALSE)
nodat_ss <- data.frame(d = sapply(low, "[[", "d"),
g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), N = n,
mig = "low")
# save results for later use
save(nodat_ss, file="sim-results/no_sam_size.Rdata")
Setting up the plotting environment.
# manipulate data
nodat_ss$N <- as.factor(nodat_ss$N)
nodat_ss <- melt(nodat_ss)
colnames(nodat_ss) <- c("N", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Moonrise2")
p <- ggplot(data = nodat_ss, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = stat, color = stat,
lty = stat), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle("No migration")
p <- p + theme_bw()
p
Figure 7: Statistical power to resolve a no migration model using \(D_{Jost}\), \(G_{ST}\) and \(Nm_{Alcala}\) to calculate relative migration rates for variable sample sizes.
In these simulations we will test the statistical power of the directional migration method to detect the correct migration scenario at the three migration rates mentioned above, for variable numbers of microsatellite loci. For all simulations, base population size will be 1000, while the sample size will be fixed at 50. All other parameter will be the same as above. Statistical power is also estimated in the same way.
# Define locus numbers
nloci <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# create simulation names for each divergence level
low_nms <- paste("lownloc", nloci, sep = "")
med_nms <- paste("mednloc", nloci, sep = "")
hi_nms <- paste("highnloc", nloci, sep = "")
# run analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, samp_size = 50, nsim = 10),
SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
MoreArgs = list(fst = 0.05, samp_size = 50),
SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
MoreArgs = list(fst = 0.005, samp_size = 50),
SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
unidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(unidat_nl, file = "sim-results/uni_nloc.Rdata")
Setting up the plotting environment.
# manipulate data
unidat_nl$nloci <- as.factor(unidat_nl$nloci)
unidat_nl <- melt(unidat_nl)
colnames(unidat_nl) <- c("nloci", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Darjeeling")
d <- unidat_nl[unidat_nl$stat == "d",]
p <- ggplot(data = d, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p
Figure 8: Statistical power to resolve unidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.
g <- unidat_nl[unidat_nl$stat == "g",]
p <- ggplot(data = g, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p
Figure 9: Statistical power to resolve unidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.
nm <- unidat_nl[unidat_nl$stat == "n",]
p <- ggplot(data = nm, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p
Figure 10: Statistical power to resolve unidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.
# Create simulation names for each divergence level
low_nms <- paste("bilownloc", nloci, sep = "")
med_nms <- paste("bimednloc", nloci, sep = "")
hi_nms <- paste("bihighnloc", nloci, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, mig_mod = "bi",
samp_size = 50),
SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
MoreArgs = list(fst = 0.05, mig_mod = "bi",
samp_size = 50),
SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
MoreArgs = list(fst = 0.005, mig_mod = "bi",
samp_size = 50),
SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
bidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(bidat_nl, file="sim-results/bi_nloc.Rdata")
Setting up the plotting environment
# manipulate data
bidat_nl$nloci <- as.factor(bidat_nl$nloci)
bidat_nl <- melt(bidat_nl)
colnames(bidat_nl) <- c("nloci", "migration", "stat", "power")
d <- bidat_nl[bidat_nl$stat == "d",]
p <- ggplot(data = d, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p
Figure 11: Statistical power to resolve bidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.
g <- bidat_nl[bidat_nl$stat == "g",]
p <- ggplot(data = g, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p
Figure 12: Statistical power to resolve bidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.
nm <- bidat_nl[bidat_nl$stat == "n",]
p <- ggplot(data = nm, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration,
lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p
Figure 13: Statistical power to resolve bidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.
# create simulation names
low_nms <- paste("nolow", nloci, sep = "")
# run analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
MoreArgs = list(fst = 0.5, mig_mod = "none",
samp_size = 50),
SIMPLIFY = FALSE)
nodat_nl <- data.frame(d = sapply(low, "[[", "d"),
g = sapply(low, "[[", "g"),
n = sapply(low, "[[", "n"),
nloci = nloci, mig = "low")
save(nodat_nl, file="sim-results/no_nloc.Rdata")
# manipulate data
nodat_nl$nloci <- as.factor(nodat_nl$nloci)
nodat_nl <- melt(nodat_nl)
colnames(nodat_nl) <- c("nloci", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Moonrise2")
# plotting
p <- ggplot(data = nodat_nl, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = stat, color = stat,
lty = stat), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("No migration"))
p <- p + theme_bw()
p
Figure 14: Statistical power to resolve a no migration model using \(D_{Jost}\), \(G_{ST}\) and \(Nm_{Alcala}\) to calculate relative migration rates for variable numbers of loci.
Sample size effects
When testing power to detect unidirectional migration, the method was most successful under a moderate migration regime (i.e. migration = 0.005). Power appeared to plateau for the medium migration rate when sample size was \(\geq\) 30, while power failed to plateau for the low and high migration rates, even up to the maximum sample sizes tested. Power to detect unidirectional migration under low migration rates was consistently above 0.8 for sample sizes \(\geq\) 30. The \(G_{ST}\) and \(Nm_{Alcala}\) methods performed slightly better than the \(D_{Jost}\) approach when the migration rate was low. These results are in line with recent observations of the relative properties of \(D_{Jost}\) and \(G_{ST}\) under variable demographic scenarios 8. Table 1 contains the general guidelines for using \(G_{ST}\) and \(D_{Jost}\) as suggested in Alcala et al., (2014).
Table 1: Use of \(G_{ST}\) and \(D_{Jost}\) when parameter \(n\) (number of populations) and \(\theta\) (scaled mutation rate) are known. Modified version of Table 4 from Alcala et al., (2014).
| Mutation regime | Measure to use |
|---|---|
| \(n\theta < 1\) | \(D_{Jost}\) |
| \(\theta < 1 < n\theta\) | \(G_{ST}\) |
| \(1 < \theta < n\theta\) | \(D_{Jost}\) |
In all of the simulations, the expression \(\theta < 1 < n\theta\) was true, explaining the apparently better performance of \(G_{ST}\) over \(D_{Jost}\). Future work might include a more comprehensive exploration of the relative ability to detect directional migration using either \(D_{Jost}\) or \(G_{ST}\), under a wider range of demographic scenarios. Power to detect directional migration at high migration levels was very low for most sample sizes tested. This is likely due to the homogenizing effect of such a migration rate, thus the difference between populations were minimal. However, the upward trend in power for both increasing sample size and number of loci suggest that under the correct experimental design the method may be capable of resolving such patterns.
Locus number effects
In general, the patterns observed for increasing number of loci are similar to those for increasing sample sizes. Again power was highest for the moderate migration rate, while low migration had the next highest power at all numbers of loci. Again, the method failed to detect the correct direction of migration with sufficiently high levels of accuracy when migration was high. In this instance, power appeared to plateau around 1.0 for the low migration rate when the number of loci was \(geq\) 90, supporting the speculation above that more loci and larger sample sizes can greatly improve the performance of the method.
Simulations of populations under scenarios of bidirectional migration and no migration were carried out to test the rate of type I errors. This was calculated by estimating the number of simulations that resulted in a larger relative migration rate from \(pop 2 \rightarrow pop 1\), thus the expected power was 0.5. These analyses were interested in deviations from this expectation, since it indicates the risk of false positives when using the directional migration method.
Sample size effects
Generally, for all sample sizes tested, power clustered around 0.5. There were some erratic deviations form this pattern (e.g. the medium line in fig 5 where N = 40). These deviations may be due to using only 100 replicates when estimating power. Using a larger number of replicates may result is more accurate conformance to expectations.
The patterns observed using \(G_{ST}\) and \(Nm_{Alcala}\) appeared to be less erratic than those using \(D_{Jost}\), again in line with the observations of Alcala et al., (2014). Interestingly, the pattern observed for the high migration rate was less erratic than for low and medium migration rates.
Number of loci effects
The results were, again, very similar to those for sample size effects.
Sundqvist, L., Zackrisson, M., & Kleinhans, D., (2013), Directional genetic differentiation and asymmetric migration, Populations and Evolution subject @ arXiv:1304.0118 [q-bio.PE]↩
Keenan, K., McGinnity, P., Cross, T. F., Crozier, W. W., Prodöhl, P. A. (2013), diveRsity: An R package for the estimation and exploration of population genetics parameters and their associated errors. Methods in Ecology and Evolution, 4: 782–788. doi: 10.1111/2041-210X.12067↩
Excoffier, Laurent, and Matthieu Foll. “Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios.” Bioinformatics 27.9 (2011): 1332-1334.↩
Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026↩
Nei M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA., 70(12, Pt 1), 3321-3323↩
Alcala, Nicolas, Jérôme Goudet, and Séverine Vuilleumier. “On the transition of genetic differentiation from isolation to panmixia: What we can learn from GST and D.” Theoretical population biology 93 (2014): 75-84.↩
Bhargava, A., Fuentes, F.F., 2010. Mutational dynamics of microsatellites. Molecular Biotechnology 44, 250–266↩
Alcala, Nicolas, Jérôme Goudet, and Séverine Vuilleumier. “On the transition of genetic differentiation from isolation to panmixia: What we can learn from GST and D.” Theoretical population biology 93 (2014): 75-84.↩