This document demonstrates the use of the function snp2gp from the diveRsity package in R.
First, we need to ensure that the latest version of diveRsity is installed. It can be downloaded and installed from GitHub as follows:
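# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools", repos = "http://cran.rstudio.com", dependencies = TRUE)
}
library("devtools")
# Current devtools versions expect the repository as "username/repo"
install_github("kkeenan02/diveRsity")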
The snp2gp function takes either a data frame (preloaded into the R workspace) or a tab-delimited input file of the following general format:
SNP_ID pop001_1 pop001_2 pop001_3 pop001_4 pop001_5 pop002_1 pop002_2 pop002_3 pop002_4 pop002_5 pop003_1 pop003_2 pop003_3 pop003_4 pop003_5
SNP1 TC TC TC TC TC TT TC TC TC CC TT TT TC CC TC
SNP2 TC TC TC TC TC TC TC TC TC TT CC CC TC CC TC
SNP3 TA TA TA TA AA TA TA TT TA TA AA AA TA AA TA
SNP4 AG AG AG AG AG AG GG GG AG AG AA AA GG AA AG
SNP5 TC TC TC TC TC TC TT TT TC TC CC CC TT CC TC
SNP6 AG AG AG AG AG AG AA AA AG AG GG GG AA GG AG
SNP7 CC CC CC CC AC CC AC CC CC CC CC CC CC CC CC
SNP8 -- TC TC TC TC TC TC TC TC TC TT TT CC TT TC
SNP9 CC TC TC TC TC TC CC TC CC CC CC CC TC CC TC
SNP10 TC TC TC TC CC CC CC CC CC CC CC CC TC CC TC
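Before converting, it can be useful to confirm that a file in this layout parses as expected. The following is a minimal sketch using only base R. The temporary file and the three-SNP, four-individual subset are made up for illustration and are not part of diveRsity:

```r
# Build a small tab-delimited file in the layout shown above
# (a subset of the example data; the file itself is hypothetical).
lines <- c(
  paste(c("SNP_ID", "pop001_1", "pop001_2", "pop002_1", "pop002_2"), collapse = "\t"),
  paste(c("SNP1", "TC", "TC", "TT", "TC"), collapse = "\t"),
  paste(c("SNP7", "CC", "CC", "CC", "AC"), collapse = "\t"),
  paste(c("SNP8", "--", "TC", "TC", "TC"), collapse = "\t")
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Read everything as character so genotypes such as "--" are kept verbatim
snps <- read.table(tmp, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE, colClasses = "character")

# One row per SNP, one column per individual (plus the SNP_ID column)
dim(snps)

# Every genotype should be exactly two characters long
genotypes_ok <- all(nchar(as.matrix(snps[, -1])) == 2)
genotypes_ok
```

A data frame read in this way has the same shape as the file, so it is the kind of object that could be passed to snp2gp in place of a file name.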
An example SNP data set can be obtained by typing the following into the R console:
library(diveRsity)
data(SNPs)
To convert the above file, named “snp_file.txt”, we would execute the following command:
snp2gp(infile = "snp_file.txt", prefix_length = 6)
The argument prefix_length specifies the number of characters at the start of each individual's name that identify its population sample. As we can see from our file above, this number is 6 for our data (for example, "pop001").
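The idea behind prefix_length can be sketched in base R. This is only an illustration of how a fixed-length prefix groups individuals into populations; it is not taken from the snp2gp source, and the variable names are hypothetical:

```r
# Individual names as they appear in the header of the input file
ids <- c("pop001_1", "pop001_2", "pop002_1", "pop002_2", "pop003_1")

# Number of leading characters shared by all individuals of one population
prefix_length <- 6

# The first prefix_length characters act as the population label
pop_labels <- substr(ids, 1, prefix_length)

# Group individuals by population label
pops <- split(ids, pop_labels)
pops
```

If prefix_length were too short for these names (say 3), every individual would share the label "pop" and fall into a single group, so it is worth checking the labels before converting.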
The function will write a genepop file to the same directory as infile. This file will look like this:
sample1-converted
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, SNP7, SNP8, SNP9, SNP10
pop
pop001_1 , 0402 0402 0401 0103 0402 0103 0202 0000 0202 0402
pop001_2 , 0402 0402 0401 0103 0402 0103 0202 0402 0402 0402
pop001_3 , 0402 0402 0401 0103 0402 0103 0202 0402 0402 0402
pop001_4 , 0402 0402 0401 0103 0402 0103 0202 0402 0402 0402
pop001_5 , 0402 0402 0101 0103 0402 0103 0102 0402 0402 0202
pop
pop002_1 , 0404 0402 0401 0103 0402 0103 0202 0402 0402 0202
pop002_2 , 0402 0402 0401 0303 0404 0101 0102 0402 0202 0202
pop002_3 , 0402 0402 0404 0303 0404 0101 0202 0402 0402 0202
pop002_4 , 0402 0402 0401 0103 0402 0103 0202 0402 0202 0202
pop002_5 , 0202 0404 0401 0103 0402 0103 0202 0402 0202 0202
pop
pop003_1 , 0404 0202 0101 0101 0202 0303 0202 0404 0202 0202
pop003_2 , 0404 0202 0101 0101 0202 0303 0202 0404 0202 0202
pop003_3 , 0402 0402 0401 0303 0404 0101 0202 0202 0402 0402
pop003_4 , 0202 0202 0101 0101 0202 0303 0202 0404 0202 0202
pop003_5 , 0402 0402 0401 0103 0402 0103 0202 0402 0402 0402
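Comparing the input and output above suggests that each nucleotide is written as a two-digit code (A = 01, C = 02, G = 03, T = 04) and that a missing genotype ("--") becomes 0000. The sketch below reproduces that mapping in base R. The mapping is inferred from the example rather than from the package source, and encode_genotype is a hypothetical helper, not a diveRsity function:

```r
# Allele codes inferred from the example output (an assumption)
allele_codes <- c(A = "01", C = "02", G = "03", T = "04")

# Convert one two-letter genotype into a four-digit genepop genotype
encode_genotype <- function(gt) {
  alleles <- strsplit(gt, "")[[1]]
  # Anything other than two known nucleotides is treated as missing
  if (length(alleles) != 2 || !all(alleles %in% names(allele_codes))) {
    return("0000")
  }
  paste0(allele_codes[alleles], collapse = "")
}

encoded <- vapply(c("TC", "TA", "AG", "CC", "--"), encode_genotype,
                  character(1), USE.NAMES = FALSE)
encoded
# "0402" "0401" "0103" "0202" "0000"
```

These values match the first individual's genotypes at SNP1, SNP3, SNP4, SNP7 and SNP8 in the output above.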
The resulting genepop file can be used for downstream analysis in other functions of the diveRsity package, or in other software such as GENEPOP or SMOGD.