Introduction

This document records, in a reproducible manner, all of the steps taken when assessing the performance of the method1 to detect directional migration, implemented in divMigrate2. Results from these analyses are published in Sundqvist et al. 2015 (insert citation when paper comes out). The \(Nm\) method from Alcala et al. (2014) is used in the code below, but was not published in the paper, since it was still under development.

Instructions and files necessary to run the code below are available at github.

Software used

All analyses are carried out using R as a scripting environment. The software fastsimcoal23 is used to simulate the various scenarios addressed. In addition to fastsimcoal2, the R package diveRsity is used extensively for file conversions and simulation parameter file generation. As well as these processes, diveRsity is used to calculate relative migration matrices, used for assessing the statistical power of the method. This is done using the function divMigrate, mentioned above.

Simulations

A number of simulations will be carried out in order to test the behavior of the directional migration method under various conditions. Principally, we are interested in how the method behaves with various sample sizes and numbers of loci.

Migration model

The ability of the directional migration method to discriminate between various gene flow regimes will be assessed using varying sample size and number of loci. In particular, the ability of the directional migration method to detect unidirectional, bidirectional and, no migration will be tested. These three regimes will also be assessed for three levels of migration: High, medium and low.

Gene flow patterns will be simplified to the pairwise scenario. In all cases where migration occurs in one direction but not the other, it will be from population 2 to population 1.

Statistical power

In all instances, we are interested in the power of the directional migration method to correctly resolve a given migration pattern. This is done by simulating multiple data under the same evolutionary condition and calculating the proportion of times we are able to resolve the original migration pattern. Initially, this method was carried out using 1000 simulation replicates, but to allow for the incorporation of more simulation parameters, the accuracy of using only 100 replicates was tested. It was found that 100 replicates provided that same power estimates to one decimal place, thus was sufficient for the purpose of the tests carried out.

Generating fastsimcoal2 par files

parMaker is a custom function for generating .par input parameter files for fastsimcoal2. It’s default parameters are as follows:

parMaker(outfile = "test", npops = 2, popSizes = 1000, 
         sampSizes = 5, popGrowth = 0, numMigMat = 1, 
         migMat = matrix(c(0, 0.0005, 0, 0), nrow = 2,
                         ncol = 2, byrow = TRUE),
         numHistEvents = 0, numIndependentChromo = c(1,0),
         numLinkBlocks = 1, lociData = list(locType = "MICROSAT",
                                            numLoc = 10,
                                            recombRate = 0.0,
                                            mu = 0.0005,
                                            addPar = 0))

While there are a large number of arguments in this function, a smaller number are required for our simulations. These are explained in detail below.

Arguments

  • outfile specifies the name of the output parameter file. It should be passed as a character string. This string will be assigned the .par suffix for use with fastsimcoal21.
  • npops is an integer, specifying the number of populations to simulate.
  • popSizes can be an integer or an integer vector (of length equal to npops). This argument specifies the sizes of base populations.
  • sampSizes can be an integer or an integer vector (of length equal to npops). This argument specifies the sizes of samples taken from base populations at time 0.
  • numMigMat an integer, specifying the number of migration matrices to use during simulations.
  • migMat is a numeric matrix (if numMigMat = 1) or a list of numeric matrices (if numMigMat > 1).
  • numHistEvents is an integer specifying the number of historical events that occur during simulations.
  • histEvents can be a numeric vector of length = 7 (if numHistEvents = 1) or a list of numeric vectors (each of length = 7). The number of vectors in the list should be equal to numHistEvents. Each element of these vectors specify the following parameters:
    • time = the number of generations before t0 that the historical event occurs at.
    • source = the source population
    • sink = the sink population
    • migrants = the proportion of migrants that move from the source population to the sink population.
    • new size = the new population size
    • growth rate = the new population growth rate
    • migr. matrix = the migration matrix to be used
  • lociData is a list (of length = 5) that specifies the type and properties of the marker to be simulated. The five elements of this argument are:
    • locType = indicate the type of marker (e.g. “MICROSAT”, “DNA”, “SNP” or “STANDARD”)
    • numLoc = The number of markers
    • recombRate = the recombination rate of the markers
    • mu = the mutation rate of the markers
    • addPar = an additional parameter (e.g. mutation rate per bp for DNA type marker).

Function demo

# load parMaker into R
source("parMaker.R")
# clear demo directory
system("rm -r demo")
parMaker(outfile = "demo", popSizes = 2000, sampSizes = 10, 
         numMigMat = 1, migMat = matrix(c(0.0, 0.005, 0.0, 0.0), 
                                        nrow = 2, ncol = 2, 
                                        byrow = TRUE),
         numIndependentChromo = c(5, 0),
         numHistEvents = 1, histEvents = c(100, 0, 1, 0, 2, 0, 0),
         lociData = list(locType = "MICROSAT",
                         numLoc = 1,
                         recombRate = 0,
                         mu = 0.001,
                         addPar = 0))
Inspect the output
# read the par file
par_demo <- readLines("demo.par")
cat(paste(par_demo, collapse = "\n"))
//Number of population samples (demes)
2 samples to simulate :
//Population effective sizes (number of genes)
2000
2000
//Samples sizes
10
10
//Growth rates  : negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
1
//migration matrix
0.000 0.005
0.000 0.000
//historical event: time, source, sink, migrants, new size, new growth rate, migr. matrix 
1 historical event
100 0 1 0 2 0 0
//Number of independent loci [chromosome] 
5 0
//Per chromosome: Number of linkage blocks
1
//per Block: data type, num loci, rec. rate and mut rate + optional parameters
MICROSAT 1 0 0.001 0

We can also check if this file is readable by fastsimcoal2 as follows:

system("./fastsimcoal21 -i demo.par -n 1 -g")
arl_file <- readLines("demo/demo_1_1.arp")
cat(arl_file, sep = "\n")
#Arlequin input file written by the simulation program fastsimcoal.exe

[Profile]
    Title="A series of simulated samples"
    NbSamples=2

    GenotypicData=1
    GameticPhase=0
    RecessiveData=0
    DataType=MICROSAT
    LocusSeparator=WHITESPACE
    MissingData='?'

[Data]
    [[Samples]]

#Number of independent chromosomes: 5
#Total number of polymorphic sites: 5
# 1 polymorphic positions on chromosome 1
#1
# 1 polymorphic positions on chromosome 2
#1
# 1 polymorphic positions on chromosome 3
#1
# 1 polymorphic positions on chromosome 4
#1
# 1 polymorphic positions on chromosome 5
#1

        SampleName="Sample 1"
        SampleSize=5
        SampleData= {
1_1 1    500  497  495  501  501 
         498  498  495  497  502 
1_2 1    501  499  498  499  501 
         500  504  498  499  502 
1_3 1    500  507  496  501  500 
         500  498  496  500  503 
1_4 1    502  506  497  502  502 
         497  498  497  501  502 
1_5 1    500  499  497  502  502 
         500  506  499  501  502 

}
        SampleName="Sample 2"
        SampleSize=5
        SampleData= {
2_1 1    500  498  497  501  500 
         501  507  497  500  501 
2_2 1    500  506  499  501  500 
         499  498  499  501  500 
2_3 1    501  499  497  501  500 
         500  504  497  502  500 
2_4 1    499  504  497  501  500 
         497  498  497  500  502 
2_5 1    501  504  498  499  500 
         497  504  497  499  499 

}

[[Structure]]

    StructureName="Simulated data"
    NbGroups=1
    Group={
       "Sample 1"
       "Sample 2"
    }

We can see that the parMaker function has generated a parameter file which fastsimcoal2 has used to generate a .arp genotype file. We can convert this file to a genepop for and analyse it using the divMigrate function in diveRsity as follows:

library(diveRsity)
arp2gen("demo/demo_1_1.arp")
# test that genepop file looks ok
demo_gen <- readLines("demo/demo_1_1.gen")
cat(demo_gen, sep = "\n")
demo/demo_1_1_gen_converted
locus1
locus2
locus3
locus4
locus5
POP
pop1 ,  499500  498499  497488  500503  502502
pop1 ,  500499  500499  498489  504503  500500
pop1 ,  499501  498501  496497  500503  502502
pop1 ,  499500  499499  491496  503501  500500
pop1 ,  500502  498500  488488  503496  499503
POP
pop2 ,  500500  500498  496495  504500  504500
pop2 ,  503500  500498  496497  503499  500500
pop2 ,  499498  501501  496497  500501  500503
pop2 ,  499502  500500  488496  504499  499503
pop2 ,  498500  501501  489496  499504  499500
# run `divMigrate`
demo_res <- divMigrate("demo/demo_1_1.gen")
qgraph::qgraph(demo_res[[1]])

plot of chunk unnamed-chunk-6

We can see that the analysis was successful and our network plot demonstrates that the higher migration occurs from pop2 to pop1 (i.e. darker connection), as specified in our par file.

Migration model detection

In this section we will simulate data under the various levels of population divergence for each of the three migration patterns mentioned above. Namely, the divergence between population pairs will be \(F_{ST} = 0.5, 0.05, 0.005\) corresponding to low, medium and high migration rates, respectively. Using the equation \(F_{ST} \approx \frac{1}{4Nm + 1}\), these divergences result in migration rates of \(\approx\) 0.00025, 0.005 and 0.05, respectively. The ability to resolve unidirectional, bidirectional and no migration regimes for variable sample sizes and number of loci will be assessed. Statistical power will be estimated for all three migration statistics provided in divMigrate (i.e. \(D_{Jost}\)4, \(G_{ST}\)5 and \(Nm\)6). For all simulations, the base population size will be 1000, locus mutation rate will be \(5 \times 10^{-4}\) (A mutation rate common in vertebrates7).

Sample size effects

Unidirectional migration model (Sample size)

In these simulations we will test the statistical power of the method to detect the correct direction of migration at the three divergence levels mentioned above. To ease the simulation and calculation process, a custom wrapper function that generates our parameter files for fastsimcoal2, converts .arp files to the genepop format and calculates the proportion of times migration in the direction \(pop1 \rightarrow pop2\) is less than \(pop1 \leftarrow pop2\) (i.e. statistical power to resolve a given migration regime), will be used.

Simuations and analysis

The code to run the simulations and to compile the results from divMigrate is below.

# load simulation wrapper function
source("simFun.R")
# define divergences
fst <- c(0.5, 0.05, 0.005)
# define sample sizes
n <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# create simulation names for each divergence level
low_nms <- paste("low", n, sep = "")
med_nms <- paste("med", n, sep = "")
hi_nms <- paste("high", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
              MoreArgs = list(fst = 0.5), SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
                    n = sapply(low, "[[", "n"), N = n, mig = "low")

# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
              MoreArgs = list(fst = 0.05), SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
                    n = sapply(med, "[[", "n"), N = n, mig = "med")

# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
              MoreArgs = list(fst = 0.005), SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
                    n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Comile the results
unidat_ss <- rbind(lowdf, meddf, hidf)
# create a directory for all results
system("mkdir sim-results")
# save results for later use
save(unidat_ss, file="sim-results/uni_sam_size.Rdata")

Result plots

Setting up the plotting environment.

# load required packages
library(reshape) # reshape main data
library(ggplot2) # plotting results
library(wesanderson) # creating colour palettes
# manipulate data
unidat_ss$N <- as.factor(unidat_ss$N)
unidat_ss <- melt(unidat_ss)
colnames(unidat_ss) <- c("N", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Darjeeling")

Plotting \(D_{Jost}\) results

d <- unidat_ss[unidat_ss$stat == "d",]
p <- ggplot(data = d, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p

plot of chunk plotsampsizeD

Figure 1: Statistical power to resolve unidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.

Plotting \(G_{ST}\) results

g <- unidat_ss[unidat_ss$stat == "g",]
p <- ggplot(data = g, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p

plot of chunk plotsampsizeG

Figure 2: Statistical power to resolve unidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.

Plotting \(Nm_{Alcala}\) results

nm <- unidat_ss[unidat_ss$stat == "n",]
p <- ggplot(data = nm, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p

plot of chunk plotsampsizeNm

Figure 3: Statistical power to resolve unidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various sample sizes.

Bidirectional migration model (Sample size)

Simuations and analysis

# create simulation names for each divergence level
low_nms <- paste("bilow", n, sep = "")
med_nms <- paste("bimed", n, sep = "")
hi_nms <- paste("bihigh", n, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
              MoreArgs = list(fst = 0.5, mig_mod = "bi"), 
              SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
                    n = sapply(low, "[[", "n"), N = n, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, samp_size = n,
              MoreArgs = list(fst = 0.05, mig_mod = "bi"), 
              SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
                    n = sapply(med, "[[", "n"), N = n, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, samp_size = n,
             MoreArgs = list(fst = 0.005, mig_mod = "bi"), 
             SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
                    n = sapply(hi, "[[", "n"), N = n, mig = "high")
# Combine data
bidat_ss <- rbind(lowdf, meddf, hidf)
# save results for later use
save(bidat_ss, file="sim-results/bi_sam_size.Rdata")

Result plots

Setting up the plotting environment.

# manipulate data
bidat_ss$N <- as.factor(bidat_ss$N)
bidat_ss <- melt(bidat_ss)
colnames(bidat_ss) <- c("N", "migration", "stat", "power")

Plotting \(D_{Jost}\) results

d <- bidat_ss[bidat_ss$stat == "d",]
p <- ggplot(data = d, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p

plot of chunk biplotsampsizeD

Figure 4: Statistical power to resolve bidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.

Plotting \(G_{ST}\) results

g <- bidat_ss[bidat_ss$stat == "g",]
p <- ggplot(data = g, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p

plot of chunk biplotsampsizeG

Figure 5: Statistical power to resolve bidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.

Plotting \(Nm_{Alcala}\) results

nm <- bidat_ss[bidat_ss$stat == "n",]
p <- ggplot(data = nm, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p

plot of chunk biplotsampsizeNm

Figure 6: Statistical power to resolve bidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable sample sizes.

No migration model (Sample size)

Simuations and analysis

# create simulation names
low_nms <- paste("nolow", n, sep = "")
# run analyses
# no migration
low <- mapply(simFun, outname = low_nms, samp_size = n,
              MoreArgs = list(mig_mod = "none"), 
              SIMPLIFY = FALSE)
nodat_ss <- data.frame(d = sapply(low, "[[", "d"), 
                       g = sapply(low, "[[", "g"),
                       n = sapply(low, "[[", "n"), N = n,
                       mig = "low")
# save results for later use
save(nodat_ss, file="sim-results/no_sam_size.Rdata")

Result plots

Setting up the plotting environment.

# manipulate data
nodat_ss$N <- as.factor(nodat_ss$N)
nodat_ss <- melt(nodat_ss)
colnames(nodat_ss) <- c("N", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Moonrise2")

No migration model detection

p <- ggplot(data = nodat_ss, aes(x = N, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = stat, color = stat, 
                       lty = stat), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle("No migration")
p <- p + theme_bw()
p

plot of chunk noplotsampsizeD

Figure 7: Statistical power to resolve a no migration model using \(D_{Jost}\), \(G_{ST}\) and \(Nm_{Alcala}\) to calculate relative migration rates for variable sample sizes.

Locus number effects (STR only)

In these simulations we will test the statistical power of the directional migration method to detect the correct migration scenario at the three migration rates mentioned above, for variable numbers of microsatellite loci. For all simulations, base population size will be 1000, while the sample size will be fixed at 50. All other parameter will be the same as above. Statistical power is also estimated in the same way.

Unidirectional migration model (Number of loci)

Simuations and analysis

# Define locus numbers
nloci <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# create simulation names for each divergence level
low_nms <- paste("lownloc", nloci, sep = "")
med_nms <- paste("mednloc", nloci, sep = "")
hi_nms <- paste("highnloc", nloci, sep = "")
# run analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
              MoreArgs = list(fst = 0.5, samp_size = 50, nsim = 10), 
              SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
                    n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
              MoreArgs = list(fst = 0.05, samp_size = 50), 
              SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
                    n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
              MoreArgs = list(fst = 0.005, samp_size = 50), 
             SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
                    n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
unidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(unidat_nl, file = "sim-results/uni_nloc.Rdata")

Result plots

Setting up the plotting environment.

# manipulate data
unidat_nl$nloci <- as.factor(unidat_nl$nloci)
unidat_nl <- melt(unidat_nl)
colnames(unidat_nl) <- c("nloci", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Darjeeling")

Plotting \(D_{Jost}\) results

d <- unidat_nl[unidat_nl$stat == "d",]
p <- ggplot(data = d, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p

plot of chunk plotnlocD

Figure 8: Statistical power to resolve unidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.

Plotting \(G_{ST}\) results

g <- unidat_nl[unidat_nl$stat == "g",]
p <- ggplot(data = g, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p

plot of chunk plotnlocG

Figure 9: Statistical power to resolve unidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.

Plotting \(Nm_{Alcala}\) results

nm <- unidat_nl[unidat_nl$stat == "n",]
p <- ggplot(data = nm, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Unidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p

plot of chunk plotnlocNm

Figure 10: Statistical power to resolve unidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for various numbers of loci.

Bidirectional migration model (Number of loci)

Simuations and analysis

# Create simulation names for each divergence level
low_nms <- paste("bilownloc", nloci, sep = "")
med_nms <- paste("bimednloc", nloci, sep = "")
hi_nms <- paste("bihighnloc", nloci, sep = "")
# Run the analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
              MoreArgs = list(fst = 0.5, mig_mod = "bi",
                              samp_size = 50),
              SIMPLIFY = FALSE)
lowdf <- data.frame(d = sapply(low, "[[", "d"), g = sapply(low, "[[", "g"),
                    n = sapply(low, "[[", "n"), nloci = nloci, mig = "low")
# Medium migration
med <- mapply(simFun, outname = med_nms, nloci = nloci,
              MoreArgs = list(fst = 0.05, mig_mod = "bi",
                              samp_size = 50), 
              SIMPLIFY = FALSE)
meddf <- data.frame(d = sapply(med, "[[", "d"), g = sapply(med, "[[", "g"),
                    n = sapply(med, "[[", "n"), nloci = nloci, mig = "med")
# High Migration
hi <- mapply(simFun, outname = hi_nms, nloci = nloci,
             MoreArgs = list(fst = 0.005, mig_mod = "bi",
                             samp_size = 50), 
             SIMPLIFY = FALSE)
hidf <- data.frame(d = sapply(hi, "[[", "d"), g = sapply(hi, "[[", "g"),
                    n = sapply(hi, "[[", "n"), nloci = nloci, mig = "high")
# Combine data
bidat_nl <- rbind(lowdf, meddf, hidf)
# save results for later use
save(bidat_nl, file="sim-results/bi_nloc.Rdata")

Result plots

Setting up the plotting environment

# manipulate data
bidat_nl$nloci <- as.factor(bidat_nl$nloci)
bidat_nl <- melt(bidat_nl)
colnames(bidat_nl) <- c("nloci", "migration", "stat", "power")

Plotting \(D_{Jost}\) results

d <- bidat_nl[bidat_nl$stat == "d",]
p <- ggplot(data = d, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(D"["Jost"]*")"))
p <- p + theme_bw()
p

plot of chunk biplotnlocD

Figure 11: Statistical power to resolve bidirectional migration using \(D_{Jost}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.

Plotting \(G_{ST}\) results

g <- bidat_nl[bidat_nl$stat == "g",]
p <- ggplot(data = g, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(G"["ST"]*")"))
p <- p + theme_bw()
p

plot of chunk biplotnlocG

Figure 12: Statistical power to resolve bidirectional migration using \(G_{ST}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.

Plotting \(Nm_{Alcala}\) results

nm <- bidat_nl[bidat_nl$stat == "n",]
p <- ggplot(data = nm, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = migration, color = migration, 
                       lty = migration), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("Bidirectional migration "*"(Nm)"))
p <- p + theme_bw()
p

plot of chunk biplotnlocNm

Figure 13: Statistical power to resolve bidirectional migration using \(Nm_{Alcala}\) to calculation relative migration rates. Power is estimated for high (0.05), medium (0.005) and low (0.00025) migration for variable numbers of loci.

No migration model (Number of loci)

Simuations and analysis

# create simulation names
low_nms <- paste("nolow", nloci, sep = "")
# run analyses
# low migration
low <- mapply(simFun, outname = low_nms, nloci = nloci,
              MoreArgs = list(fst = 0.5, mig_mod = "none",
                              samp_size = 50), 
              SIMPLIFY = FALSE)
nodat_nl <- data.frame(d = sapply(low, "[[", "d"), 
                       g = sapply(low, "[[", "g"),
                       n = sapply(low, "[[", "n"), 
                       nloci = nloci, mig = "low")
save(nodat_nl, file="sim-results/no_nloc.Rdata")

Result plots

# manipulate data
nodat_nl$nloci <- as.factor(nodat_nl$nloci)
nodat_nl <- melt(nodat_nl)
colnames(nodat_nl) <- c("nloci", "migration", "stat", "power")
# create palette
pal <- wes.palette(3, "Moonrise2")

No migration model detection

# plotting
p <- ggplot(data = nodat_nl, aes(x = nloci, y = power))
# p <- p + geom_point(size = 3, aes(pch = migration))
p <- p + geom_line(aes(group = stat, color = stat, 
                       lty = stat), size = 2)
p <- p + scale_color_manual(values = pal)
p <- p + scale_y_continuous(limits = c(0,1))
p <- p + ggtitle(expression("No migration"))
p <- p + theme_bw()
p

plot of chunk noplotnloc2

Figure 14: Statistical power to resolve a no migration model using \(D_{Jost}\), \(G_{ST}\) and \(Nm_{Alcala}\) to calculate relative migration rates for variable numbers of loci.

Discussion

Detecting unidirectional migration

Sample size effects

When testing power to detect unidirectional migration, the method was most successful under a moderate migration regime (i.e. migration = 0.005). Power appeared to plateau for the medium migration rate when sample size was \(\geq\) 30, while power failed to plateau for the low and high migration rates, even up to the maximum sample sizes tested. Power to detect unidirectional migration under low migration rates was consistently above 0.8 for sample sizes \(\geq\) 30. The \(G_{ST}\) and \(Nm_{Alcala}\) methods performed slightly better than the \(D_{Jost}\) approach when the migration rate was low. These results are in line with recent observations of the relative properties of \(D_{Jost}\) and \(G_{ST}\) under variable demographic scenarios 8. Table 1 contains the general guidelines for using \(G_{ST}\) and \(D_{Jost}\) as suggested in Alcala et al., (2014).

Table 1: Use of \(G_{ST}\) and \(D_{Jost}\) when parameter \(n\) (number of populations) and \(\theta\) (scaled mutation rate) are known. Modified version of Table 4 from Alcala et al., (2014).

Mutation regime Measure to use
\(n\theta < 1\) \(D_{Jost}\)
\(\theta < 1 < n\theta\) \(G_{ST}\)
\(1 < \theta < n\theta\) \(D_{Jost}\)

In all of the simulations, the expression \(\theta < 1 < n\theta\) was true, explaining the apparently better performance of \(G_{ST}\) over \(D_{Jost}\). Future work might include a more comprehensive exploration of the relative ability to detect directional migration using either \(D_{Jost}\) or \(G_{ST}\), under a wider range of demographic scenarios. Power to detect directional migration at high migration levels was very low for most sample sizes tested. This is likely due to the homogenizing effect of such a migration rate, thus the difference between populations were minimal. However, the upward trend in power for both increasing sample size and number of loci suggest that under the correct experimental design the method may be capable of resolving such patterns.

Locus number effects

In general, the patterns observed for increasing number of loci are similar to those for increasing sample sizes. Again power was highest for the moderate migration rate, while low migration had the next highest power at all numbers of loci. Again, the method failed to detect the correct direction of migration with sufficiently high levels of accuracy when migration was high. In this instance, power appeared to plateau around 1.0 for the low migration rate when the number of loci was \(geq\) 90, supporting the speculation above that more loci and larger sample sizes can greatly improve the performance of the method.

False detection rates

Simulations of populations under scenarios of bidirectional migration and no migration were carried out to test the rate of type I errors. This was calculated by estimating the number of simulations that resulted in a larger relative migration rate from \(pop 2 \rightarrow pop 1\), thus the expected power was 0.5. These analyses were interested in deviations from this expectation, since it indicates the risk of false positives when using the directional migration method.

Sample size effects

Generally, for all sample sizes tested, power clustered around 0.5. There were some erratic deviations form this pattern (e.g. the medium line in fig 5 where N = 40). These deviations may be due to using only 100 replicates when estimating power. Using a larger number of replicates may result is more accurate conformance to expectations.

The patterns observed using \(G_{ST}\) and \(Nm_{Alcala}\) appeared to be less erratic than those using \(D_{Jost}\), again in line with the observations of Alcala et al., (2014). Interestingly, the pattern observed for the high migration rate was less erratic than for low and medium migration rates.

Number of loci effects

The results were, again, very similar to those for sample size effects.


  1. Sundqvist, L., Zackrisson, M., & Kleinhans, D., (2013), Directional genetic differentiation and asymmetric migration, Populations and Evolution subject @ arXiv:1304.0118 [q-bio.PE]

  2. Keenan, K., McGinnity, P., Cross, T. F., Crozier, W. W., Prodöhl, P. A. (2013), diveRsity: An R package for the estimation and exploration of population genetics parameters and their associated errors. Methods in Ecology and Evolution, 4: 782–788. doi: 10.1111/2041-210X.12067

  3. Excoffier, Laurent, and Matthieu Foll. “Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios.” Bioinformatics 27.9 (2011): 1332-1334.

  4. Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026

  5. Nei M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA., 70(12, Pt 1), 3321-3323

  6. Alcala, Nicolas, Jérôme Goudet, and Séverine Vuilleumier. “On the transition of genetic differentiation from isolation to panmixia: What we can learn from GST and D.” Theoretical population biology 93 (2014): 75-84.

  7. Bhargava, A., Fuentes, F.F., 2010. Mutational dynamics of microsatellites. Molecular Biotechnology 44, 250–266

  8. Alcala, Nicolas, Jérôme Goudet, and Séverine Vuilleumier. “On the transition of genetic differentiation from isolation to panmixia: What we can learn from GST and D.” Theoretical population biology 93 (2014): 75-84.