Calculating basic population statistics in batch mode

Kevin Keenan 2014

This script will demonstrate the use of the divBasic function from the diveRsity package for the calculation of various basic population level statistics. The script demonstrates manipulation methods for storing outputs in convineient formats for further analyses/plotting.

Input data

In this example, we consider the process of analysing the output genepop files from the forward time simulation software, EASYPOP (v2.01). The simulations have produced 500 output genepop files (.gen), each one containing two population samples of 50 individuals each, scored at 10 microsatellite loci. The contents of one of these files is given below:

2   10  999 3
loc-1
loc-2
loc-3
loc-4
loc-5
loc-6
loc-7
loc-8
loc-9
loc-10
pop
   1 , 758451 241241 963533 903790 865543 255712 311122 334791 539219 213987 
   1 , 758984 500500 963533 903903 865431 712712 026266 545545 193193 731731 
   1 , 758451 291291 963389 790421 543543 255806 122266 219334 557557 213213 
   1 , 451451 241241 938901 421421 543431 712712 026311 219195 539615 987348 
   1 , 758758 500843 963963 903903 543543 255712 026311 334334 193539 605731 
   1 , 451758 843259 901963 903790 865543 806667 026122 334219 539615 708708 
   1 , 984758 291291 389963 903790 543465 712712 266266 195791 193539 213731 
   1 , 758758 843259 963963 421790 543543 806255 311266 791791 557615 731708 
   1 , 758758 291241 389963 790903 865865 255236 311266 219545 219193 731731 
   1 , 758451 843291 963963 421421 465465 806806 026122 334219 193557 708708 
   1 , 758758 291291 901533 903903 865431 667712 026266 195545 193193 987987 
   1 , 984451 241241 389389 790903 865543 712712 266266 195219 193219 708987 
   1 , 758451 241241 901963 903421 543543 712667 266026 334219 539539 731987 
   1 , 758451 241241 938389 903790 543431 667712 026026 545195 193615 987708 
   1 , 758758 500259 963901 790903 865543 236667 026266 219195 219557 731731 
   1 , 451758 259241 963533 903790 465865 806255 026311 218334 193539 708731 
   1 , 758758 500843 963533 903421 865543 667712 266266 195334 193557 731348 
   1 , 758451 291291 389963 790790 865865 806255 266026 545219 615615 708348 
   1 , 758451 843241 963533 421903 865543 255806 311266 195545 539539 708213 
   1 , 451987 843291 963389 421903 865543 712255 266426 334791 557219 605213 
   1 , 451758 241241 938963 903421 543543 667806 026122 219334 193557 731213 
   1 , 758987 241291 389533 421903 431543 255806 311122 334791 557219 731731 
   1 , 758758 843843 901963 903790 543543 667236 026311 195545 539539 348731 
   1 , 758758 241241 389389 903903 865865 667712 266266 545219 193539 987731 
   1 , 758758 843843 963389 421903 431431 712667 311026 334334 539219 605731 
   1 , 758451 291241 963389 903903 543465 806712 266122 195334 539557 213708 
   1 , 758451 843241 963901 421421 865431 806712 311026 195195 539557 708348 
   1 , 758984 291291 901533 790903 543431 255712 122266 334219 557539 213731 
   1 , 451758 241291 938901 421421 865543 667712 026426 791219 539539 987213 
   1 , 451758 241291 963533 421903 465865 667712 311266 195545 539539 605987 
   1 , 758758 843241 938901 903790 431865 255255 122311 219334 219539 731731 
   1 , 451758 291291 901963 421421 431865 806712 426122 791219 539193 987213 
   1 , 758758 291500 963389 790790 543543 806712 311311 334334 557539 082731 
   1 , 758451 843241 963901 421790 431543 712255 026122 334791 539539 708213 
   1 , 984451 241241 963533 790903 543543 712806 266266 195545 615219 708731 
   1 , 758758 241241 389901 421790 431465 712255 311311 791791 539539 708731 
   1 , 451987 241291 938389 421903 543865 667255 026122 219195 539219 987731 
   1 , 758451 241291 901389 903790 865543 255712 026122 334791 539557 213213 
   1 , 758758 241241 389963 421421 431543 255712 311266 195334 557539 731348 
   1 , 984984 241500 389963 903790 543865 712236 266266 545545 193557 987731 
   1 , 758758 500291 963963 903421 543865 806667 266311 195791 557539 082348 
   1 , 758451 843291 963389 421903 543543 255712 026266 791219 539557 708213 
   1 , 451758 843291 901963 903790 865543 806667 266266 334791 539615 731708 
   1 , 758451 241241 389963 790790 543865 806255 311266 334791 539557 708348 
   1 , 758451 843259 963963 903790 543543 712667 266122 334219 539615 708708 
   1 , 451758 291241 389389 903903 865543 255236 026311 219219 219539 731213 
   1 , 984451 241292 533389 903421 865543 667255 266266 219334 557557 987708 
   1 , 451451 291291 901389 903903 431543 806712 026122 195219 539219 987213 
   1 , 758758 843259 938901 421790 431543 255667 266266 219219 219557 731213 
   1 , 984451 241241 533901 421790 865431 667712 266026 545195 193557 348348 
pop
   2 , 422257 915915 521085 772231 486145 689470 435731 254425 284599 946460 
   2 , 422535 915915 390521 772772 145891 470470 435435 600600 819599 946460 
   2 , 422422 915915 390085 230231 303145 420420 731435 425600 284599 946946 
   2 , 812812 337915 521390 772230 145145 420420 731435 600254 599599 709460 
   2 , 422812 915915 390390 772230 486486 470470 435731 600600 599599 460460 
   2 , 812812 337915 389085 772230 486145 420420 435731 254600 599599 836709 
   2 , 422812 915245 390390 772772 145145 470470 435731 354600 819450 946709 
   2 , 812812 337915 390521 772772 145145 420689 731435 600254 284284 946460 
   2 , 257812 915915 651390 231772 145303 470421 731435 600600 284599 709709 
   2 , 422257 915915 521390 230772 303145 420420 435435 600810 599599 460709 
   2 , 812422 915915 390390 230772 303303 470420 435435 425254 599599 946709 
   2 , 812812 915915 390390 772772 303303 421420 731435 600600 450599 709460 
   2 , 812257 915915 390390 230231 486486 690420 435659 600425 284450 946460 
   2 , 812812 337337 389314 230231 486303 991420 435731 254600 599284 836460 
   2 , 422257 915824 390085 772231 145303 470470 435731 354600 819599 946709 
   2 , 422257 245915 390521 231772 145486 470470 731731 600600 450599 946946 
   2 , 812422 915337 390390 601231 145303 689689 731731 600600 599819 460946 
   2 , 422812 245337 651390 772230 145145 470420 435435 600600 284284 946460 
   2 , 257812 245915 390390 231231 145145 470470 731731 600354 450284 946460 
   2 , 422812 245824 390085 772772 891303 470470 435731 254600 819599 946709 
   2 , 812812 915915 314390 231601 303486 689420 436731 254600 284599 709460 
   2 , 257812 915337 390390 772772 486486 470470 435731 254254 599284 460654 
   2 , 812812 245824 390390 772772 303145 689689 435731 600600 599450 946460 
   2 , 257257 337915 390390 230230 486486 420689 435435 425600 599819 946946 
   2 , 812257 915336 390390 772231 145303 470470 435731 600425 599450 460460 
   2 , 812812 915915 390085 772230 891145 470420 435435 600600 599599 946460 
   2 , 257812 915915 085521 230772 303303 420420 435731 600354 599599 460460 
   2 , 812257 915915 390085 601772 303145 689689 435731 254600 599599 460460 
   2 , 257257 824915 390390 772772 486145 470420 435435 254254 599599 836709 
   2 , 812812 915337 390390 230772 303303 420420 731435 600254 599599 460460 
   2 , 422422 824915 521521 772772 303303 689420 435435 254254 284599 836709 
   2 , 812812 915915 085390 772230 303303 420420 435435 600254 599599 460460 
   2 , 812257 337915 390390 230772 486303 991420 435731 254254 599284 836654 
   2 , 812812 245337 651390 231231 303145 689420 436731 354600 284819 946946 
   2 , 812812 915915 390390 601772 891486 420420 731435 600600 819284 460460 
   2 , 257812 915915 390521 772772 486486 420470 435435 254600 819284 460460 
   2 , 422812 824337 085085 230231 145145 420420 731435 600600 599599 460946 
   2 , 812812 915337 521390 230772 303303 689470 731435 600253 819599 460460 
   2 , 535812 915245 390390 601772 486303 470470 731731 600600 599450 460709 
   2 , 257257 915915 390085 230231 486303 420470 435659 600425 819599 460460 
   2 , 812812 915915 521390 230772 303486 689420 731731 600254 599599 460946 
   2 , 812422 915915 390390 772772 145303 420470 659731 425254 599599 946836 
   2 , 257257 915915 390390 772772 303891 420470 731435 600600 819819 460460 
   2 , 257812 245824 390390 772230 145145 421420 731731 600600 284284 709709 
   2 , 812812 915915 390390 772230 303486 420470 435435 600600 284599 946709 
   2 , 812812 337824 521390 772772 145145 420420 435731 600600 599599 709709 
   2 , 812812 245915 085390 772231 303145 689420 435731 654354 599284 460460 
   2 , 812812 915337 085390 772772 145145 421420 068435 600254 450599 946946 
   2 , 812812 337915 521390 772601 145303 420420 731731 354254 599599 709460 
   2 , 812257 337915 390085 231231 145145 421470 731731 425600 599599 946946 

All output genepop files from our simulation are in a folder, “Sim-files”, under the current R working directory. This folder also contains multiple other files, which EASYPOP outputs, so we need to find a way to identify these files, to be passed to the divBasic function. This can be done as follows:

infiles <- list.files(path = "Sim-files/", pattern = "*.gen", full.names = TRUE)
infiles[1:5]
[1] "Sim-files/mySims-001.gen" "Sim-files/mySims-002.gen"
[3] "Sim-files/mySims-003.gen" "Sim-files/mySims-004.gen"
[5] "Sim-files/mySims-005.gen"

Now that we know the names of our input files (unordered), we can load diveRsity and calculate some statistics.

library(diveRsity)
# calculate stats for one file
test_stats <- divBasic(infile = infiles[1])
# explore output structure
names(test_stats)
[1] "locus_pop_size"     "Allele_number"      "proportion_Alleles"
[4] "Allelic_richness"   "Ho"                 "He"                
[7] "HWE"                "arCIs"              "mainTab"           

We can see that there are a number of outputs here. To ensure that we can manipulate these into the most efficient structure across all simulation replicates, we need to know what the structure of each single object is. An easy way to do this is to print the object to the console. For this demo, we will focus on overall Expected heterozygosity (\( H_{E} \)).

test_stats$He
           1    2
loc-1   0.57 0.58
loc-2   0.73 0.55
loc-3   0.74 0.55
loc-4   0.66 0.63
loc-5   0.66 0.68
loc-6   0.76 0.67
loc-7   0.74 0.55
loc-8   0.79 0.60
loc-9   0.75 0.61
loc-10  0.80 0.69
overall 0.72 0.61

We can see that the overall values are in the matrix row with the rowname, overall. Let's see how we can use this information to generate the \( H_{E} \) results for all input files:

system.time({
    allRes <- lapply(infiles, function(x) {
        return(divBasic(x))
    })
})
   user  system elapsed 
 860.29    0.25  863.45 

Now that we have all of our results from each simulation replicate, we can begin to extract parameters of interest:

he <- lapply(allRes, function(x) {
    return(x$He["overall", ])
})

Now we can extract the overall \( H_{E} \) for each population and plot the results as follows:

pop1He <- sapply(he, "[", 1)
pop2He <- sapply(he, "[", 2)
plot(pop1He, type = "n", ylim = c(min(c(pop1He, pop2He)), max(c(pop1He, pop2He))))
points(c(pop1He, pop2He), col = c("blue", "red"))

plot of chunk plts1

We could also explore the relationships between \( H_{E} \) and \( H_{O} \) across simulation replicates. We can extract \( H_{O} \) estimates in much the same way as \( H_{E} \):

ho <- lapply(allRes, function(x) {
    return(x$Ho["overall", ])
})
pop1Ho <- sapply(ho, "[", 1)
pop2Ho <- sapply(ho, "[", 2)
plot(pop1Ho, pop1He, main = "pop 1")
abline(lm(pop1He ~ pop1Ho), col = "red")

plot of chunk unnamed-chunk-4

cor.test(pop1He, pop1Ho)

    Pearson's product-moment correlation

data:  pop1He and pop1Ho
t = 50.95, df = 498, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9006 0.9291
sample estimates:
  cor 
0.916 
plot(pop2Ho, pop2He, main = "pop 2")
abline(lm(pop2He ~ pop2Ho), col = "red")

plot of chunk unnamed-chunk-4

cor.test(pop2He, pop2Ho)

    Pearson's product-moment correlation

data:  pop2He and pop2Ho
t = 53.12, df = 498, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9076 0.9341
sample estimates:
  cor 
0.922