Revised Red Squirrel: Data prep, LD-based SNP pruning, PCA, IBD

We will reduce our 78k SNP dataset using LD-based SNP pruning. Then we will confirm the results of our sNMF analysis using a PCA,followed by two identity by descent relatedness analyses (MoM and MLE)

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(vcfR)
library(adegenet)
library(adegraphics)
library(pegas)
library(StAMPP)
library(lattice)
library(gplots)
library(ape)
library(ggmap) 
library(LEA)
#The new packages we'll be using for these analyses (mainly gdsfmt and SNPRelate) need to be installed from Bioconductor Lite
#source("https://bioconductor.org/biocLite.R", echo=FALSE, verbose = FALSE)
#biocLite()
#biocLite(c("GenomicFeatures", "AnnotationDbi"))
#biocLite("SNPRelate")
library("devtools")
library(gdsfmt)
library(SNPRelate)
library(calibrate)

Preparing data

gdsfmt package provides the genome data structure (GDS) file format for array-oriented bioinformatic data, which is a container for annotation and SNP genotypes.

#Clear cache if necessary using
#closeAllConnections()
#snpgdsClose(genofile) #in case you need to close or re-write a gds file
#closefn.gds(test1.gds)

#Load up our vcf file (SNP data for 24 red squirrels)
vcf.fn <- "/Users/tanyalama/vcftools/stringent_revised.vcf.recode.vcf"

#Reformat VCF --> SNP GDS (genomic data structure) Format
snpgdsVCF2GDS(vcf.fn, "test.gds", method="biallelic.only")

## VCF Format ==> SNP GDS Format
## Method: exacting biallelic SNPs
## Number of samples: 23
## Parsing "/Users/tanyalama/vcftools/stringent_revised.vcf.recode.vcf" ...
##  import 73055 variants.
## + genotype   { Bit2 23x73055, 410.2K } *
## Optimize the access efficiency ...
## Clean up the fragments of GDS file:
##     open the file 'test.gds' (872.9K)
##     # of fragments: 53
##     save to 'test.gds.tmp'
##     rename 'test.gds.tmp' (872.5K, reduced: 396B)
##     # of fragments: 20

#Summary
snpgdsSummary("test.gds")

## The file name: /Users/tanyalama/Box Sync/Squirrel/test.gds 
## The total number of samples: 23 
## The total number of SNPs: 73055 
## SNP genotypes are stored in SNP-major mode (Sample X SNP).

genofile <- snpgdsOpen("/Users/tanyalama/Box Sync/Squirrel/test.gds", readonly= FALSE)
#Add population information to your gds genofile 

add.gdsn(genofile, "pop_code", c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)) #all identified to 1 population for starters

#get sample id information
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

Data Analysis

gdsfmt and SNPRelate accelerate two key computations in GWAS (genome wide association studies): principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures.

LD-based SNP pruning

It is suggested to use a pruned set of SNPs which are in approximate linkage equilibrium with each other to avoid the strong influence of SNP clusters in principal component analysis and relatedness analysis.

set.seed(1) #should be 1000, reduced for rmarkdown
#Try different LD thresholds for sensitivity analysis. #Need to include autosome.only = FALSE clause to run.

snpset <- snpgdsLDpruning(genofile, autosome.only = FALSE,remove.monosnp = TRUE, ld.threshold=0.1, verbose = FALSE) #38,384/72,744 markers are selected in total with .1 ld.

#Create an object to hold our ~38k selected SNPs
snpset.id <- unlist(snpset)

Principal Component Analysis (PCA)

The functions in SNPRelate for PCA include calculating the genetic covariance matrix from genotypes, computing the correlation coefficients between sample loadings and genotypes for each SNP, calculating SNP eigenvectors (loadings), and estimating the sample loadings of a new dataset from specified SNP eigenvectors.

#Run PCA using the ldpruned set of 38k SNPs (snpset.id) instead of the full set. 
#get sample id information
genofile

## File: /Users/tanyalama/Box Sync/Squirrel/test.gds (872.5K)
## +    [  ] *
## |--+ sample.id   { Str8 23 ZIP_ra(53.4%), 101B }
## |--+ snp.id   { Int32 73055 ZIP_ra(34.6%), 98.7K }
## |--+ snp.rs.id   { Str8 73055 ZIP_ra(0.14%), 111B }
## |--+ snp.position   { Int32 73055 ZIP_ra(27.5%), 78.5K }
## |--+ snp.chromosome   { Str8 73055 ZIP_ra(5.30%), 64.2K }
## |--+ snp.allele   { Str8 73055 ZIP_ra(14.2%), 40.6K }
## |--+ genotype   { Bit2 23x73055, 410.2K } *
## |--+ snp.annot   [  ]
## |  |--+ qual   { Float32 73055 ZIP_ra(62.1%), 177.2K }
## |  \--+ filter   { Str8 73055 ZIP_ra(0.23%), 831B }
## \--+ pop_code   { Float64 24, 192B }

sample.id <- read.gdsn(index.gdsn(genofile, "sample.id")) #run

pca <- snpgdsPCA(genofile, sample.id = NULL, snp.id=snpset.id, autosome.only = FALSE, num.thread=2, verbose= TRUE)

## Principal Component Analysis (PCA) on genotypes:
## Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
## Working space: 23 samples, 38,794 SNPs
##     using 2 (CPU) cores
## PCA:    the sum of all selected genotypes (0,1,2) = 1524278
## CPU capabilities: Double-Precision SSE2
## Thu Jul 26 14:08:05 2018    (internal increment: 21192)
## 
[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
## Thu Jul 26 14:08:05 2018    Begin (eigenvalues and eigenvectors)
## Thu Jul 26 14:08:05 2018    Done.

#if autosome.only = TRUE, use autosomal (non-sex chromosome) SNPs only; if it is a numeric or character value, keep SNPs according to the specified chromosome

#tried with autosome.only = TRUE and error read "There is no SNP!"

#The following code shows how to calculate the percent of variation that is accounted for by the top principal components. It is clear to see the first two eigenvectors hold the largest percentage of variance among the population, although the total variance accounted for is still less the one-quarter of the total.

# variance proportion (%)
pc.percent <- pca$varprop*100
head(round(pc.percent, 2))

## [1] 5.84 5.62 5.42 5.29 5.24 5.08

#get population information
pop_code<- read.gdsn(index.gdsn(genofile, "pop_code"))
head(cbind(sample.id, pop_code))

##      sample.id pop_code
## [1,] "UMA2_1"  "1"     
## [2,] "UMA35_1" "1"     
## [3,] "RS32_1"  "1"     
## [4,] "UMA20_1" "1"     
## [5,] "UMA39_1" "1"     
## [6,] "UMA19_1" "1"

#make a data.frame
tab <- data.frame(sample.id = pca$sample.id,
    pop = factor(pop_code)[match(pca$sample.id,sample.id)],
    EV1 = pca$eigenvect[,1],    # the first eigenvector
    EV2 = pca$eigenvect[,2],    # the second eigenvector
    stringsAsFactors = FALSE)
tab #view the dataframe

##    sample.id pop          EV1          EV2
## 1     UMA2_1   1 -0.063676681  0.084596181
## 2    UMA35_1   1 -0.005127551  0.069118871
## 3     RS32_1   1 -0.030475294  0.037277902
## 4    UMA20_1   1 -0.058753040 -0.020379122
## 5    UMA39_1   1 -0.053058858  0.095025643
## 6    UMA19_1   1 -0.063469156  0.079980237
## 7    RS104_1   1 -0.164887290  0.127483231
## 8    UMA13_1   1 -0.043263337 -0.004482035
## 9     UMA5_1   1 -0.039427354  0.010210736
## 10    UMA3_1   1 -0.096313791 -0.675158601
## 11   UMA26_1   1 -0.040642768 -0.001002618
## 12    RS33_1   1 -0.067450454  0.103904687
## 13   UMA23_1   1 -0.083062793  0.026626533
## 14   UMA40_1   1  0.659397709 -0.005952129
## 15    UMA8_1   1 -0.047733814  0.106796905
## 16    UMA6_1   1 -0.005187611  0.057058837
## 17    UMA9_1   1  0.015748784  0.057635036
## 18   UMA38_1   1  0.660806103 -0.005901249
## 19   UMA17_1   1 -0.192091152  0.250090104
## 20   UMA41_1   1 -0.046733023  0.058452053
## 21   UMA37_1   1 -0.069636955  0.077839051
## 22   UMA21_1   1 -0.090893932 -0.623080340
## 23   UMA42_1   1 -0.074067744  0.093860089

plot(tab$EV2, tab$EV1, xlab="eigenvector 2", ylab="eigenvector 1") #three distinct groups

##Zoom In

Relatedness Analysis: Estimating IBD Using PLINK method of moments (MoM)

For relatedness analysis, identity-by-descent (IBD) estimation in SNPRelate can be done by either the method of moments (MoM) (Purcell et al., 2007) or maximum likelihood estimation (MLE) (Milligan, 2003; Choi et al., 2009). For both of these methods it is preffered to use a LD pruned SNP set.

sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

#Estimate IBD coefficients
ibd <- snpgdsIBDMoM(genofile, sample.id, snp.id=snpset.id, autosome.only = FALSE, remove.monosnp= TRUE, kinship = TRUE, maf=0.05, missing.rate=0.05, num.thread=2, verbose= TRUE)

## IBD analysis (PLINK method of moment) on genotypes:
## Excluding 23,659 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
## Working space: 23 samples, 15,135 SNPs
##     using 2 (CPU) cores
## PLINK IBD:    the sum of all selected genotypes (0,1,2) = 507583
## Thu Jul 26 14:08:06 2018    (internal increment: 65536)
## 
[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
## Thu Jul 26 14:08:06 2018    Done.

  #maf (minor allele frequency) threshold is the frequency   at which the second most common allele occurs in a given   population. 

#Make a dataframe
ibd.coeff <- snpgdsIBDSelection(ibd)
head(ibd.coeff)

##      ID1     ID2 k0 k1 kinship
## 1 UMA2_1 UMA35_1  1  0       0
## 2 UMA2_1  RS32_1  1  0       0
## 3 UMA2_1 UMA20_1  1  0       0
## 4 UMA2_1 UMA39_1  1  0       0
## 5 UMA2_1 UMA19_1  1  0       0
## 6 UMA2_1 RS104_1  1  0       0

ibd.coeff

##         ID1     ID2          k0         k1   kinship
## 1    UMA2_1 UMA35_1 1.000000000 0.00000000 0.0000000
## 2    UMA2_1  RS32_1 1.000000000 0.00000000 0.0000000
## 3    UMA2_1 UMA20_1 1.000000000 0.00000000 0.0000000
## 4    UMA2_1 UMA39_1 1.000000000 0.00000000 0.0000000
## 5    UMA2_1 UMA19_1 1.000000000 0.00000000 0.0000000
## 6    UMA2_1 RS104_1 1.000000000 0.00000000 0.0000000
## 7    UMA2_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 8    UMA2_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 9    UMA2_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 10   UMA2_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 11   UMA2_1  RS33_1 1.000000000 0.00000000 0.0000000
## 12   UMA2_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 13   UMA2_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 14   UMA2_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 15   UMA2_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 16   UMA2_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 17   UMA2_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 18   UMA2_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 19   UMA2_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 20   UMA2_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 21   UMA2_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 22   UMA2_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 23  UMA35_1  RS32_1 1.000000000 0.00000000 0.0000000
## 24  UMA35_1 UMA20_1 1.000000000 0.00000000 0.0000000
## 25  UMA35_1 UMA39_1 1.000000000 0.00000000 0.0000000
## 26  UMA35_1 UMA19_1 1.000000000 0.00000000 0.0000000
## 27  UMA35_1 RS104_1 1.000000000 0.00000000 0.0000000
## 28  UMA35_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 29  UMA35_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 30  UMA35_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 31  UMA35_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 32  UMA35_1  RS33_1 1.000000000 0.00000000 0.0000000
## 33  UMA35_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 34  UMA35_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 35  UMA35_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 36  UMA35_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 37  UMA35_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 38  UMA35_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 39  UMA35_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 40  UMA35_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 41  UMA35_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 42  UMA35_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 43  UMA35_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 44   RS32_1 UMA20_1 1.000000000 0.00000000 0.0000000
## 45   RS32_1 UMA39_1 1.000000000 0.00000000 0.0000000
## 46   RS32_1 UMA19_1 1.000000000 0.00000000 0.0000000
## 47   RS32_1 RS104_1 1.000000000 0.00000000 0.0000000
## 48   RS32_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 49   RS32_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 50   RS32_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 51   RS32_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 52   RS32_1  RS33_1 1.000000000 0.00000000 0.0000000
## 53   RS32_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 54   RS32_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 55   RS32_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 56   RS32_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 57   RS32_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 58   RS32_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 59   RS32_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 60   RS32_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 61   RS32_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 62   RS32_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 63   RS32_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 64  UMA20_1 UMA39_1 1.000000000 0.00000000 0.0000000
## 65  UMA20_1 UMA19_1 1.000000000 0.00000000 0.0000000
## 66  UMA20_1 RS104_1 1.000000000 0.00000000 0.0000000
## 67  UMA20_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 68  UMA20_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 69  UMA20_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 70  UMA20_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 71  UMA20_1  RS33_1 1.000000000 0.00000000 0.0000000
## 72  UMA20_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 73  UMA20_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 74  UMA20_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 75  UMA20_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 76  UMA20_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 77  UMA20_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 78  UMA20_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 79  UMA20_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 80  UMA20_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 81  UMA20_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 82  UMA20_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 83  UMA39_1 UMA19_1 1.000000000 0.00000000 0.0000000
## 84  UMA39_1 RS104_1 1.000000000 0.00000000 0.0000000
## 85  UMA39_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 86  UMA39_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 87  UMA39_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 88  UMA39_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 89  UMA39_1  RS33_1 1.000000000 0.00000000 0.0000000
## 90  UMA39_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 91  UMA39_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 92  UMA39_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 93  UMA39_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 94  UMA39_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 95  UMA39_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 96  UMA39_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 97  UMA39_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 98  UMA39_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 99  UMA39_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 100 UMA39_1 UMA42_1 0.316169023 0.45225758 0.2288511
## 101 UMA19_1 RS104_1 1.000000000 0.00000000 0.0000000
## 102 UMA19_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 103 UMA19_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 104 UMA19_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 105 UMA19_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 106 UMA19_1  RS33_1 1.000000000 0.00000000 0.0000000
## 107 UMA19_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 108 UMA19_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 109 UMA19_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 110 UMA19_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 111 UMA19_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 112 UMA19_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 113 UMA19_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 114 UMA19_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 115 UMA19_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 116 UMA19_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 117 UMA19_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 118 RS104_1 UMA13_1 1.000000000 0.00000000 0.0000000
## 119 RS104_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 120 RS104_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 121 RS104_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 122 RS104_1  RS33_1 1.000000000 0.00000000 0.0000000
## 123 RS104_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 124 RS104_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 125 RS104_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 126 RS104_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 127 RS104_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 128 RS104_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 129 RS104_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 130 RS104_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 131 RS104_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 132 RS104_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 133 RS104_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 134 UMA13_1  UMA5_1 1.000000000 0.00000000 0.0000000
## 135 UMA13_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 136 UMA13_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 137 UMA13_1  RS33_1 1.000000000 0.00000000 0.0000000
## 138 UMA13_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 139 UMA13_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 140 UMA13_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 141 UMA13_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 142 UMA13_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 143 UMA13_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 144 UMA13_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 145 UMA13_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 146 UMA13_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 147 UMA13_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 148 UMA13_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 149  UMA5_1  UMA3_1 1.000000000 0.00000000 0.0000000
## 150  UMA5_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 151  UMA5_1  RS33_1 1.000000000 0.00000000 0.0000000
## 152  UMA5_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 153  UMA5_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 154  UMA5_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 155  UMA5_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 156  UMA5_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 157  UMA5_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 158  UMA5_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 159  UMA5_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 160  UMA5_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 161  UMA5_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 162  UMA5_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 163  UMA3_1 UMA26_1 1.000000000 0.00000000 0.0000000
## 164  UMA3_1  RS33_1 1.000000000 0.00000000 0.0000000
## 165  UMA3_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 166  UMA3_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 167  UMA3_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 168  UMA3_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 169  UMA3_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 170  UMA3_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 171  UMA3_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 172  UMA3_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 173  UMA3_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 174  UMA3_1 UMA21_1 0.333725723 0.43209317 0.2251138
## 175  UMA3_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 176 UMA26_1  RS33_1 1.000000000 0.00000000 0.0000000
## 177 UMA26_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 178 UMA26_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 179 UMA26_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 180 UMA26_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 181 UMA26_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 182 UMA26_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 183 UMA26_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 184 UMA26_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 185 UMA26_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 186 UMA26_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 187 UMA26_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 188  RS33_1 UMA23_1 1.000000000 0.00000000 0.0000000
## 189  RS33_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 190  RS33_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 191  RS33_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 192  RS33_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 193  RS33_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 194  RS33_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 195  RS33_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 196  RS33_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 197  RS33_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 198  RS33_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 199 UMA23_1 UMA40_1 1.000000000 0.00000000 0.0000000
## 200 UMA23_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 201 UMA23_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 202 UMA23_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 203 UMA23_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 204 UMA23_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 205 UMA23_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 206 UMA23_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 207 UMA23_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 208 UMA23_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 209 UMA40_1  UMA8_1 1.000000000 0.00000000 0.0000000
## 210 UMA40_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 211 UMA40_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 212 UMA40_1 UMA38_1 0.001413618 0.01656152 0.4951528
## 213 UMA40_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 214 UMA40_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 215 UMA40_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 216 UMA40_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 217 UMA40_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 218  UMA8_1  UMA6_1 1.000000000 0.00000000 0.0000000
## 219  UMA8_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 220  UMA8_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 221  UMA8_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 222  UMA8_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 223  UMA8_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 224  UMA8_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 225  UMA8_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 226  UMA6_1  UMA9_1 1.000000000 0.00000000 0.0000000
## 227  UMA6_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 228  UMA6_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 229  UMA6_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 230  UMA6_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 231  UMA6_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 232  UMA6_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 233  UMA9_1 UMA38_1 1.000000000 0.00000000 0.0000000
## 234  UMA9_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 235  UMA9_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 236  UMA9_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 237  UMA9_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 238  UMA9_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 239 UMA38_1 UMA17_1 1.000000000 0.00000000 0.0000000
## 240 UMA38_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 241 UMA38_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 242 UMA38_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 243 UMA38_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 244 UMA17_1 UMA41_1 1.000000000 0.00000000 0.0000000
## 245 UMA17_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 246 UMA17_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 247 UMA17_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 248 UMA41_1 UMA37_1 1.000000000 0.00000000 0.0000000
## 249 UMA41_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 250 UMA41_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 251 UMA37_1 UMA21_1 1.000000000 0.00000000 0.0000000
## 252 UMA37_1 UMA42_1 1.000000000 0.00000000 0.0000000
## 253 UMA21_1 UMA42_1 1.000000000 0.00000000 0.0000000

Estimating IBD: Maximum Likelihood Estimation

Identical by descent (IBD) describes a matching segment of DNA shared by two or more individuals that has been inherited from a common ancestor without any intervening recombination. Alleles that are IBD have been interested by a recent common ancestor (and indicate recent gene flow).

# Estimate IBD coefficients
set.seed(1) #should be 1000
ibd <- snpgdsIBDMLE(genofile, sample.id = NULL, snp.id=snpset.id,autosome.only = FALSE, remove.monosnp= TRUE, maf=0.05, missing.rate=0.05, kinship = TRUE, num.thread=2, verbose = TRUE)

## Identity-By-Descent analysis (MLE) on genotypes:
## Excluding 23,659 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
## Working space: 23 samples, 15,135 SNPs
##     using 2 (CPU) cores
## MLE IBD:    the sum of all selected genotypes (0,1,2) = 507583
## MLE IBD: Thu Jul 26 14:08:06 2018    0%
## MLE IBD: Thu Jul 26 14:08:11 2018    100%

# Make a dataframe
ibd.coeff <- snpgdsIBDSelection(ibd)