diveRsity v1.8.0 update: fastDivPart exampleAs of 05/02/14, version 1.8.0 of the diveRsity is available for download from github as follows:
library(devtools)
install_github("diveRsity", "kkeenan02")
This version contains a couple of major changes, most notably, the deprecation of divPart, the original function used to calculate diversity and differentiation statistics. This function has been completely replaced by fastDivPart, which has much more compact and efficient code, as well as more sensible data output structures.
This web page will demonstrate some downstream uses of a newly returned data structure, pairwise locus parameter estimates, from fastDivPart. Specifically, we will see how locus pairwise differences can be visualised using ggplot2.
For this demonstration we will used Test_data, which is distributed with diveRsity. These data can be loaded into the R session as follows:
library(diveRsity)
data(Test_data)
We can do a couple of preliminary inspections of the data using readGenepop, before calculating differentiation statistics.
myData <- readGenepop(infile = Test_data)
# how many population samples do we have?
myData$npops
[1] 6
# how many loci have been scored?
myData$nloci
[1] 37
Let’s have a look at some plots of the allele frequency differences across samples.
# load ggplot2 for plotting
library(ggplot2)
library(reshape2)
# extract allele frequencies for locus 1
loc1 <- myData$allele_freq[[1]]
# add pop names to data
colnames(loc1) <- myData$pop_names
# organise the data for plotting with ggplot
loc1 <- melt(loc1)
loc1
Var1 Var2 value
1 258 pop1, 0.00000
2 264 pop1, 0.00000
3 280 pop1, 0.28261
4 365 pop1, 0.01087
5 369 pop1, 0.29348
6 397 pop1, 0.00000
7 539 pop1, 0.41304
8 258 pop2, 0.00000
9 264 pop2, 0.00000
10 280 pop2, 0.25000
11 365 pop2, 0.05000
12 369 pop2, 0.18750
13 397 pop2, 0.02500
14 539 pop2, 0.48750
15 258 pop3, 0.00000
16 264 pop3, 0.00000
17 280 pop3, 0.28049
18 365 pop3, 0.04878
19 369 pop3, 0.30488
20 397 pop3, 0.02439
21 539 pop3, 0.34146
22 258 pop4, 0.00000
23 264 pop4, 0.00000
24 280 pop4, 0.14423
25 365 pop4, 0.05769
26 369 pop4, 0.38462
27 397 pop4, 0.00000
28 539 pop4, 0.41346
29 258 pop5, 0.01220
30 264 pop5, 0.01220
31 280 pop5, 0.08537
32 365 pop5, 0.06098
33 369 pop5, 0.50000
34 397 pop5, 0.00000
35 539 pop5, 0.32927
36 258 pop6, 0.00000
37 264 pop6, 0.00000
38 280 pop6, 0.18293
39 365 pop6, 0.03659
40 369 pop6, 0.48780
41 397 pop6, 0.00000
42 539 pop6, 0.29268
# plot the frequencies as stacked bar charts
ggplot(loc1, aes(x = factor(Var2), y = value, fill = factor(Var1))) +
geom_bar(stat = "identity")
# repeat for locus 2
loc2 <- myData$allele_freq[[2]]
# add pop names to data
colnames(loc2) <- myData$pop_names
# organise the data for plotting with ggplot
loc2 <- melt(loc2)
loc2
Var1 Var2 value
1 162 pop1, 0.03191
2 164 pop1, 0.44681
3 167 pop1, 0.52128
4 162 pop2, 0.00000
5 164 pop2, 0.52381
6 167 pop2, 0.47619
7 162 pop3, 0.00000
8 164 pop3, 0.48780
9 167 pop3, 0.51220
10 162 pop4, 0.00000
11 164 pop4, 0.46078
12 167 pop4, 0.53922
13 162 pop5, 0.00000
14 164 pop5, 0.42683
15 167 pop5, 0.57317
16 162 pop6, 0.00000
17 164 pop6, 0.41463
18 167 pop6, 0.58537
# plot the frequencies as stacked bar charts
ggplot(loc2, aes(x = factor(Var2), y = value, fill = factor(Var1))) +
geom_bar(stat = "identity")
Now we can calculate our pairwise locus statistics:
system.time({
myStats <- fastDivPart(infile = Test_data, WC_Fst = TRUE, bs_pairwise = TRUE,
pairwise = TRUE, bootstraps = 1000, parallel = TRUE)
})
user system elapsed
9.780 0.962 578.406
Let's inspect the output of fastDivPart.
names(myStats)
[1] "standard" "estimate" "pairwise" "meanPairwise" "bs_pairwise" "bs_pairwise_loci"
Each element in myStats contains information about a range of statistics, often in different formats. For example myStats$estimate is a data.frame containing locus and global differentiation/diversity partitioning statistics, as well as some additional information.
myStats$estimate
Harmonic_N H_st_est D_st_est G_st_est G_hed_st_est D_Jost_est Fst_WC Fit_WC
Locus1 43.12 0.0482 0.0160 0.0234 0.0799 0.0578 0.0260 0.0506
Locus2 43.52 -0.0038 -0.0019 -0.0038 -0.0084 -0.0046 -0.0042 -0.0649
Locus3 43.64 0.3610 0.0566 0.0629 0.4688 0.4332 0.0753 0.0896
Locus4 43.45 0.0700 0.0225 0.0321 0.1134 0.0840 0.0361 0.0449
Locus5 42.77 0.1998 0.0177 0.0191 0.2542 0.2397 0.0224 0.0646
Locus6 43.45 0.1201 0.0228 0.0274 0.1675 0.1441 0.0303 0.2179
Locus7 43.45 0.0382 0.0142 0.0221 0.0670 0.0459 0.0261 0.0318
Locus8 43.26 0.0224 0.0168 0.0632 0.0884 0.0268 0.0696 0.2463
Locus9 43.07 0.2703 0.0153 0.0160 0.3352 0.3244 0.0191 0.0483
Locus10 43.25 0.3237 0.0439 0.0483 0.4181 0.3885 0.0570 0.0890
Locus11 41.95 0.0339 0.0075 0.0095 0.0499 0.0407 0.0104 0.0579
Locus12 41.52 0.3824 0.0330 0.0349 0.4777 0.4589 0.0410 0.0630
Locus13 41.10 0.3815 0.1037 0.1247 0.5254 0.4578 0.1481 0.1301
Locus14 41.05 0.1793 0.0231 0.0258 0.2355 0.2152 0.0298 0.0674
Locus15 41.84 0.0617 0.0279 0.0485 0.1189 0.0740 0.0543 0.0957
Locus16 41.51 0.4234 0.0754 0.0840 0.5494 0.5080 0.0990 0.0954
Locus17 41.84 0.0483 0.0176 0.0270 0.0833 0.0579 0.0315 0.0260
Locus18 41.66 0.3062 0.0211 0.0222 0.3815 0.3675 0.0250 0.0661
Locus19 41.77 0.3337 0.0285 0.0302 0.4186 0.4005 0.0349 0.0740
Locus20 41.28 0.2971 0.0265 0.0283 0.3747 0.3565 0.0332 0.0659
Locus21 41.95 0.0831 0.0256 0.0357 0.1319 0.0997 0.0400 0.1491
Locus22 42.43 0.1773 0.0220 0.0245 0.2320 0.2127 0.0279 0.0758
Locus23 41.70 0.2854 0.0771 0.0955 0.4054 0.3425 0.1130 0.1223
Locus24 42.13 0.1879 0.0440 0.0543 0.2675 0.2254 0.0637 0.1493
Locus25 42.86 0.1413 0.0480 0.0678 0.2259 0.1696 0.0817 0.0252
Locus26 42.25 0.0568 0.0264 0.0470 0.1120 0.0681 0.0542 0.0742
Locus27 41.36 0.1340 0.0238 0.0281 0.1844 0.1608 0.0309 0.2046
Locus28 42.30 0.0468 0.0245 0.0489 0.1023 0.0561 0.0550 0.1426
Locus29 42.69 0.2060 0.0232 0.0255 0.2664 0.2472 0.0286 0.0815
Locus30 40.27 0.1275 0.0282 0.0349 0.1826 0.1531 0.0368 0.3774
Locus31 42.00 0.3393 0.0303 0.0322 0.4263 0.4072 0.0371 0.0685
Locus32 41.22 0.6250 0.0235 0.0238 0.7560 0.7500 0.0271 0.0923
Locus33 41.48 0.3389 0.0243 0.0255 0.4218 0.4067 0.0293 0.0675
Locus34 40.48 0.2607 0.0676 0.0836 0.3703 0.3128 0.0950 0.2614
Locus35 42.69 0.0395 0.0182 0.0326 0.0785 0.0474 0.0368 0.0687
Locus36 42.35 0.1478 0.0305 0.0370 0.2079 0.1774 0.0435 0.0621
Locus37 42.12 0.0281 0.0170 0.0413 0.0736 0.0337 0.0511 0.0965
Global NA NA NA 0.0397 0.1806 0.1462 0.0461 0.0985
The newest output structure from fastDivPart is bs_pairwise_loci (or pw_locus if pairwise = TRUE and bs_pairwise = FALSE). This element of myStats is a list of either three lists (when WC_Fst = FALSE; default) or four lists (when WC_Fst = TRUE).
Each of these ¾ lists contains dataframes for each locus at each of the statistics, \( G_{ST} \) (Nei & Chesser, 1983), \( G'_{ST} \) (Hedrick, 2005), \( D_{Jost} \) (Jost, 2008), and/or \( \theta \) (Weir & Cockerham, 1984). Let's explore the structure of this element a bit more.
# what elements are in `myStats$bs_pairwise_loci`?
names(myStats$bs_pairwise_loci)
[1] "gstEst" "gstEstHed" "djostEst" "thetaWC"
We can see, as expected, there are four elements (since WC_Fst = TRUE). Each of these elements will have identical structures. So let's look closer at thetaWC.
# to save typing, we can assign thetaWC to a variable name
locFst <- myStats$bs_pairwise_loci$thetaWC
# names of each element in the list
names(locFst)
[1] "Locus1" "Locus2" "Locus3" "Locus4" "Locus5" "Locus6" "Locus7" "Locus8" "Locus9" "Locus10" "Locus11"
[12] "Locus12" "Locus13" "Locus14" "Locus15" "Locus16" "Locus17" "Locus18" "Locus19" "Locus20" "Locus21" "Locus22"
[23] "Locus23" "Locus24" "Locus25" "Locus26" "Locus27" "Locus28" "Locus29" "Locus30" "Locus31" "Locus32" "Locus33"
[34] "Locus34" "Locus35" "Locus36" "Locus37"
The names of each element is determined by the locus names in our input file, so if we know loci well, and want to inspect one in particular, we just need to know its name.
# look at locus 1
locFst$Locus1
actual lower upper
pop1, vs. pop2, 0.001996 -0.019811 0.04195
pop1, vs. pop3, -0.006733 -0.020736 0.02917
pop1, vs. pop4, 0.010357 -0.015804 0.05780
pop1, vs. pop5, 0.054282 0.002620 0.12810
pop1, vs. pop6, 0.032709 -0.014660 0.11357
pop2, vs. pop3, 0.011998 -0.017740 0.06429
pop2, vs. pop4, 0.026874 -0.012541 0.08832
pop2, vs. pop5, 0.089735 0.016991 0.17745
pop2, vs. pop6, 0.076395 0.002188 0.17600
pop3, vs. pop4, 0.010266 -0.017157 0.05835
pop3, vs. pop5, 0.043663 -0.002548 0.11648
pop3, vs. pop6, 0.019755 -0.021314 0.08440
pop4, vs. pop5, 0.008034 -0.018876 0.06095
pop4, vs. pop6, 0.008256 -0.021760 0.06780
pop5, vs. pop6, -0.003864 -0.022863 0.03001
We can see that locFst$Locus1 is a dataframe, with three columns (“actual”, “lower”, “upper”), and in this case, 15 rows (1 for each possible pairwise comparisons).
If we were interested in visually inspecting how this locus was behaving among pairwise comparisons, we might like to generate a plot of the actual \( F_{ST} \) values as well as their 95% CIs (bias corrected). We could do this as follows:
df <- locFst$Locus1
ggplot(df, aes(x = rownames(df), y = actual))+
geom_abline(aes(intercept = mean(df$actual, na.rm = TRUE), slope = 0),
colour = "red", lwd = 2) +
geom_point()+
theme(axis.text.x=element_text(angle=-90))+
geom_errorbar(aes(ymin=lower, ymax = upper), width = 0.3)
While the plot, in this instance, is quite easy to interpret, in many cases, where more than 10 population samples are screened, the plot will be quite crowded.
Hedrick, P. (2005) A standardized genetic differentiation measure. Evolution, 59, 1633–1638
Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026
Keenan, K., McGinnity, P., Cross, T. F., Crozier, W. W., Prodöhl, P. A. (2013), diveRsity: An R package for the estimation and exploration of population genetics parameters and their associated errors. Methods in Ecology and Evolution, 4: 782–788. doi: 10.1111/2041-210X.12067
Nei, M. and Chesser, R. K. (1983), Estimation of fixation indices and gene diversities. Annals of Human Genetics, 47: 253–259. doi: 10.1111/j.1469-1809.1983.tb00993.x
Weir, B. & Cockerham, C. (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370
For any further questions on the use of diveRsity, or to report a bug, please send an email to kkeenan
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 ggplot2_0.9.3.1 diveRsity_1.8.0 plotrix_3.5-2 knitr_1.5
loaded via a namespace (and not attached):
[1] bitops_1.0-6 caTools_1.16 cluster_1.14.4 colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
[7] ellipse_0.3-8 evaluate_0.5.1 formatR_0.10 Formula_1.1-1 grid_3.0.2 gtable_0.1.2
[13] Hmisc_3.13-0 httpuv_1.2.1 igraph_0.6.6 jpeg_0.1-6 labeling_0.2 lattice_0.20-24
[19] lavaan_0.5-15 MASS_7.3-29 mnormt_1.4-7 munsell_0.4.2 pbivnorm_0.5-1 plyr_1.8
[25] png_0.1-7 proto_0.3-10 psych_1.3.12 qgraph_1.2.3 quadprog_1.5-5 RColorBrewer_1.0-5
[31] RJSONIO_1.0-3 scales_0.2.3 sem_3.1-3 shiny_0.8.0 splines_3.0.2 stats4_3.0.2
[37] stringr_0.6.2 survival_2.37-4 tools_3.0.2 xtable_1.7-1