Using divBasic: Testing inbreeding differences among populations

Kevin Keenan 2014

Introduction

divBasic is a useful function in the diveRsity R package[1] . It is designed to calculate a range of basic population level statistics, which would normally have to be calculated with multiple programs, and returns the results in a single, easy to read spreadsheet or tab delimited table. This table requires only minor aesthetic changes for inclusion in publications.

As well as this comprehensive table output, many useful, 'raw' statistics are returned from the function to the R workspace. These results can be used for multiple downstream analyses and visualisations. This web doc will demonstrate the use of divBasic to test whether inbreeding coefficients within populations are significantly from one another.

Setup

First we must ensure that we have the latest version of diveRsity installed. This can be done as follows:

# download devtools to install diveRsity from github
install.packages("devtools")
# load devtools
library(devtools)
# install the latest version of diveRsity
install_github("diveRsity", "kkeenan02")

This demo will use the Test_data data set distributed with the diveRsity package. Users wishing to analyse their own data should ensure their input file is in genepop format. This file's name can then be passed to the divBasic function as a character string. See ?divBasic for more details.

We can load diveRsity and Test_data as follows:

# load diveRsity
library(diveRsity)
# load Test_data
data(Test_data)

Now that Test_data and diveRsity are loaded into our workspace and available to use, we can run divBasic. In this example we will set outfile to NULL to suppress writing files to disk. Instead results will only be returned to the R workspace.

#
system.time({
  basicRes <- divBasic(infile = "adjuv.txt", outfile = NULL, gp = 2, 
                       bootstraps = 1000)
})
   user  system elapsed 
 33.331   0.015  33.532 

Generate some basic discriptors of our data:

#
dataStats <- readGenepop(infile = "adjuv.txt", gp = 2)
popNames <- dataStats$pop_names

We can see that it takes diveRsity around 2min to complete 1000 bootstrap iterations for this data. Now that we have generated our results and stored them in the variable basicRes, we can explore its contents.

#
names(basicRes)
[1] "locus_pop_size"     "Allele_number"      "proportion_Alleles" "Allelic_richness"   "Ho"                
[6] "He"                 "HWE"                "fis"                "mainTab"           

We can see that basicRes contains a number of objects, but for now we are interested in the fis element, which contains a dataframe for each of our populations samples. Each one of these dataframes contains locus and overall \( F_{IS} \) values as well as \( 95\% \) confidence intervals (standard and bias corrected).

NOTE

divBasic calculates \( F_{IS} \) as:

\[ F_{IS} = \frac{(H_{e} - H_{o})}{H_{e}} \]

Where \( H_{e} \) is the expected heterozygosity under Hardy-Weinberg equilibrium, and \( H_{o} \) is the observed heterozygosity.

Below is an example of the layout of one of the fis tables returned by divBasic:


The rest of this demo will show how we can use the information provided in each of the population sample \( F_{IS} \) tables to test the significance of difference observed between pairs. Traditionally, p-value test of the null hypothesis of 'no difference' have been used for this purpose. However, diveRsity advocates the use of 95% CIs for such tasks.

The first, and most simple way that we can test whether \( F_{IS} \) values are significantly different between population pairs is to simply plot the 95% CI for each sample to see if any overlap. Samples with non overlapping confidence limits can be said to be significantly different. The direction of these differences can also be observed.

#
# First we want to eactract the relevant information from our results
ciTable <- lapply(basicRes$fis, function(x){
  return(c(x["overall", "fis"], x["overall", "BC_lower_CI"],
           x["overall", "BC_upper_CI"]))
})
# convert this into a dataframe
ciTable <- as.data.frame(do.call("rbind", ciTable))
dimnames(ciTable) <- list(paste("pop_", 1:nrow(ciTable), sep = ""),
                          c("Fis", "lower", "upper"))
# inspect the table
ciTable
           Fis   lower  upper
pop_1   0.0232 -0.0442 0.0949
pop_2  -0.0091 -0.0730 0.0592
pop_3   0.0117 -0.0365 0.0600
pop_4  -0.0088 -0.0599 0.0473
pop_5  -0.0348 -0.0904 0.0224
pop_6   0.0260 -0.0364 0.0873
pop_7   0.0090 -0.0454 0.0655
pop_8   0.0451 -0.0133 0.1045
pop_9   0.0956  0.0400 0.1491
pop_10  0.1226  0.0519 0.1882
# plot the results
# load ggplot2
library(ggplot2)
ggplot(ciTable, aes(x = popNames, y = Fis)) +
  geom_point() + 
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.3) +
  theme(axis.text.x=element_text(angle = -90))

plot of chunk plot

We can see that this method allows us to visually asses the relationships between any pair of populations. However, this plot may become difficult to interpret when the number of population samples is large. Instead, we may want to write some code to ask the question 'is pop X more inbred than pop y?'.

# generate all pairwise comparisons
pw <- combn(nrow(ciTable), 2)
# generate all pairs of confidance intervals
cis <- apply(pw, 2, function(x){
  lowerPW <- c(ciTable[x[1], "lower"], ciTable[x[2], "lower"])
  upperPW <- c(ciTable[x[1], "upper"], ciTable[x[2], "upper"])
  list(lowerPW, 
       upperPW)
})
# identify non-overlapping comparisons
# write a function to do this
overLap <- function(x){
  low <- x[[1]]
  high <- x[[2]]
  if(low[1] < low[2] && high[1] < low[2]){
    list(TRUE,
         "1 < 2")
  }else if(low[2] < low[1] && high[2] < low[1]){
    list(TRUE,
         "2 < 1")
  } else {
    FALSE
  }
}
# run the function on our population pairs
diffs <- sapply(cis, overLap)
# check if any pops are significantly different
sigDif <- which(sapply(diffs, "[[", 1))
sigDif
[1] 30 34 35

From these results we can see that pairwise comparisons 8 and 9 involve two population samples with non-overlapping CIs. Let' find out which samples these are.

pops <- lapply(sigDif, function(x){
  rownames(ciTable)[pw[,x]]
})
pops
[[1]]
[1] "pop_4"  "pop_10"

[[2]]
[1] "pop_5" "pop_9"

[[3]]
[1] "pop_5"  "pop_10"

We can see that pop_2 is significantly different from pop_5 and pop_6. We also included an output in our function to tell us in which direction the difference was in. Let's implement this information.

# isolate the direction information
dirDif <- lapply(diffs[sigDif], "[[", 2)
# parse the information
numDif <- lapply(dirDif, function(x){
  return(strsplit(x, split = "\\s+")[[1]][c(1,3)])
})
textOut <- lapply(1:length(numDif), function(i){
  x <- as.numeric(numDif[[i]])
  paste("The inbreeding coefficient estimated in ", 
        pops[[i]][x[2]],
        " is significantly greater that estimated in ", 
        pops[[i]][x[1]],
        sep = "")
})
textOut
[[1]]
[1] "The inbreeding coefficient estimated in pop_10 is significantly greater that estimated in pop_4"

[[2]]
[1] "The inbreeding coefficient estimated in pop_9 is significantly greater that estimated in pop_5"

[[3]]
[1] "The inbreeding coefficient estimated in pop_10 is significantly greater that estimated in pop_5"

References

Reproducibility

R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rCharts_0.4.2     ggplot2_0.9.3.1   diveRsity_1.8.2   plotrix_3.5-2     data.table_1.8.10 knitr_1.5        

loaded via a namespace (and not attached):
 [1] bitops_1.0-6       caTools_1.16       cluster_1.14.4     codetools_0.2-8    colorspace_1.2-4   dichromat_2.0-0   
 [7] digest_0.6.4       ellipse_0.3-8      evaluate_0.5.1     formatR_0.10       Formula_1.1-1      grid_3.0.2        
[13] gtable_0.1.2       Hmisc_3.13-0       httpuv_1.2.1       igraph_0.6.6       jpeg_0.1-6         labeling_0.2      
[19] lattice_0.20-24    lavaan_0.5-15      MASS_7.3-29        mnormt_1.4-7       munsell_0.4.2      pbivnorm_0.5-1    
[25] plyr_1.8           png_0.1-7          proto_0.3-10       psych_1.3.12       qgraph_1.2.3       quadprog_1.5-5    
[31] RColorBrewer_1.0-5 reshape2_1.2.2     RJSONIO_1.0-3      scales_0.2.3       sem_3.1-3          shiny_0.8.0       
[37] splines_3.0.2      stats4_3.0.2       stringr_0.6.2      survival_2.37-4    tools_3.0.2        whisker_0.3-2     
[43] xtable_1.7-1       yaml_2.1.10       

Additional information

For any questions about or suggested improvements to the material presented above, please feel free to contact me via email: kkeenan02 [AT] qub.ac.uk