Contents

A short report on a Discriminant Analysis of Principal Components (DAPC) and a neighbour-joining tree, both created in R, for the species Grosmannia clavigera (Gc).

Data Used

The sequence data came from Ion Torrent sequencing of 267 specimens of Gc. Filtering and VCF creation used VCFtools with the options:

After filtering, 17,367 of a possible 205,900 sites (about 8.4%) were kept. The filtered file is named Gc_HQ.recode.vcf and is available on request.

Code

Set up

library(ape)
library(adegenet)
## Loading required package: ade4
## 
##    /// adegenet 2.1.1 is loaded ////////////
## 
##    > overview: '?adegenet'
##    > tutorials/doc/questions: 'adegenetWeb()' 
##    > bug reports/feature requests: adegenetIssues()
Gc_genlight <- read.PLINK("/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.plink.ped.raw.raw", map.file = "/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.plink.map", parallel=FALSE)
## 
##  Reading PLINK raw format into a genlight object... 
## 
## 
##  Reading loci information... 
## 
##  Reading and converting genotypes... 
## .
##  Building final object... 
## 
## ...done.
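# Read the specimen names from the first column of the specimen list
Gc_Specimen_List <- read.table("/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.SpecimenList.txt")
Gc_names <- as.vector(Gc_Specimen_List$V1)
# Use the specimen names as both individual and population labels
indNames(Gc_genlight) <- Gc_names
popNames(Gc_genlight) <- Gc_names

# Convert the genlight object to a genind object for the distance calculation
temp <- as.data.frame(Gc_genlight)
Gc_genind <- df2genind(temp, ploidy = 2, sep = "\t")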
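
The renaming step above only works as intended if the specimen list has exactly one name per individual, in the same order as the PLINK file. A minimal base-R sketch of a check that could run before the names are assigned follows. `check_specimen_names` is a hypothetical helper (not part of adegenet), and the toy names are made up; in the report the count would presumably come from `nInd(Gc_genlight)`.

```r
# Hypothetical helper: validate a vector of specimen names before using it
# to rename individuals. Stops with an informative error on any problem.
check_specimen_names <- function(specimen_names, n_individuals) {
  specimen_names <- as.character(specimen_names)
  if (length(specimen_names) != n_individuals) {
    stop("expected ", n_individuals, " names but got ", length(specimen_names))
  }
  if (anyNA(specimen_names) || any(specimen_names == "")) {
    stop("specimen names must not be missing or empty")
  }
  if (anyDuplicated(specimen_names) > 0) {
    dup <- unique(specimen_names[duplicated(specimen_names)])
    stop("duplicated specimen names: ", paste(dup, collapse = ", "))
  }
  invisible(specimen_names)
}

# Toy usage with made-up names (not the real specimen list)
toy_names <- c("TRIA1", "TRIA2", "TRIA3")
check_specimen_names(toy_names, n_individuals = 3)
```

The check cannot confirm that the order of the names matches the order of the genotypes; that still has to be verified against the source files.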

Conducting DAPC

grpGc <- find.clusters(Gc_genlight, n.clust = 5, n.pca = 200)
dapcGc <- dapc(Gc_genlight, grpGc$grp, n.pca = 50, n.da = 2)

scatter.dapc(dapcGc)

myCol<-c("darkblue","purple","green","orange","red")
scatter(dapcGc, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4,
        cex=3,clab=0, leg=TRUE, txt.leg=paste("Cluster",1:5))
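
Because `find.clusters` relies on k-means with random starts, cluster labels (and occasionally memberships) can differ between runs unless a seed is set. The sketch below shows one way to fix the seed and then inspect cluster sizes and posterior membership. It runs on simulated 0/1/2 genotypes, not the Gc data; the matrix `X`, the group sizes and the allele frequencies are all assumptions made for illustration.

```r
library(adegenet)

set.seed(42)  # k-means inside find.clusters() uses random starts

# Simulated data (assumption): 40 individuals x 50 SNPs coded 0/1/2,
# drawn from two groups with very different allele frequencies
n_per_group <- 20
n_snps <- 50
sim_group <- function(p) {
  matrix(rbinom(n_per_group * n_snps, size = 2, prob = p), nrow = n_per_group)
}
X <- rbind(sim_group(0.1), sim_group(0.9))
rownames(X) <- paste0("ind", seq_len(nrow(X)))

# Same two steps as above, on the toy matrix
grp <- find.clusters(X, n.clust = 2, n.pca = 10)
fit <- dapc(X, grp$grp, n.pca = 5, n.da = 1)

table(grp$grp)                 # number of individuals per cluster
summary(fit)$assign.prop       # proportion reassigned to their own cluster
head(round(fit$posterior, 3))  # posterior membership probabilities
```

Individuals with low posterior probability for every cluster are natural candidates to compare against any samples that look like outliers in the scatter plot.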

Creating the Tree

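# Euclidean distances between individuals, from the allele table
D <- dist(tab(Gc_genind))

# Neighbour-joining tree
tre <- nj(D)

# Same tree in three layouts
plot(tre, type = "unrooted", edge.width = 2, cex = 0.5)

plot(tre, type = "radial", edge.width = 2, cex = 0.5)

plot(tre, type = "fan", edge.width = 2, cex = 0.5)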


Results

Magnifying the plots reveals a recurring set of samples that are outliers. These samples are:

  • TRIA 200
  • TRIA 286
  • TRIA 129
  • TRIA 809
  • TRIA 821
  • TRIA 799
  • TRIA 75
  • TRIA 177
  • TRIA 66
  • TRIA 7
  • TRIA 467
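
Rather than magnifying the plots, the listed samples could be coloured directly on the tree. The following is a minimal sketch on a simulated tree, not the Gc tree: the genotype matrix, the tip labels and the `outliers` vector are all made up, and in practice the labels would need to be written exactly as they appear in `tre$tip.label`.

```r
library(ape)

set.seed(1)
# Simulated genotype matrix (assumption): 12 individuals x 30 SNPs coded 0/1/2
X <- matrix(rbinom(12 * 30, size = 2, prob = 0.3), nrow = 12)
rownames(X) <- paste0("TRIA", 1:12)  # made-up labels, not the real specimens

tre <- nj(dist(X))

# Hypothetical outlier labels, chosen only for this illustration
outliers <- c("TRIA3", "TRIA7")
tip_cols <- ifelse(tre$tip.label %in% outliers, "red", "grey30")

# Outlier tips are drawn in red, all others in grey
plot(tre, type = "unrooted", edge.width = 2, cex = 0.5, tip.color = tip_cols)
```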

Location and collection data for these specimens are shown below.

Gc_data <- read.csv("/Users/Ryan/Desktop/TRIA-PCA/Gc Specimen Subset for Arnaud - Sheet1.csv")

Gc_data
##          Unique.Sample.ID TRIA.name Consensus.ID Latitude Longitude Elevation..m.
## 1  MO 24 04-02-03FC-02-03   TRIA200           Gc 54.26290 -116.6085         900.9
## 2  MO 24 04-02-09FC-02-09   TRIA286           Gc 54.26260 -116.6083         899.5
## 3  MO 24 05-01-02LS-01-02   TRIA129           Gc 54.81937 -116.6917         760.0
## 4           M033 10-02-01   TRIA809           Gc 45.27414 -115.9097        1872.1
## 5           M033 10-02-08   TRIA821           Gc 45.26091 -115.9023        1849.9
## 6           M033 10-02-03   TRIA799           Gc 45.27387 -115.9096        1868.7
## 7   M024-10-01-05Va-01-05    TRIA75           Gc 52.80584 -119.2664         813.0
## 8   M024-14-01-08Ca-01-08   TRIA177           Gc 50.43235 -122.2530        1128.2
## 9   M024-08-01-04TR-08-04    TRIA66           Gc 55.17911 -121.1122        1045.9
## 10  M024-07-03-04PA-03-04     TRIA7           Gc 56.62360 -119.8976         903.3
## 11          M033-01-01-06   TRIA467           Gc 43.95055 -103.6307        1684.9
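
Several of the coordinates in the table above are nearly identical, so it may help to group the outliers by collection site. The base-R sketch below transcribes the names and coordinates from the table and bins them into 0.1-degree cells; the cell size is an arbitrary assumption, and `outlier_sites`, `site_key` and `specimens_by_site` are names introduced here for illustration.

```r
# Names and coordinates transcribed from the table above
outlier_sites <- data.frame(
  TRIA.name = c("TRIA200", "TRIA286", "TRIA129", "TRIA809", "TRIA821", "TRIA799",
                "TRIA75", "TRIA177", "TRIA66", "TRIA7", "TRIA467"),
  Latitude  = c(54.26290, 54.26260, 54.81937, 45.27414, 45.26091, 45.27387,
                52.80584, 50.43235, 55.17911, 56.62360, 43.95055),
  Longitude = c(-116.6085, -116.6083, -116.6917, -115.9097, -115.9023, -115.9096,
                -119.2664, -122.2530, -121.1122, -119.8976, -103.6307)
)

# Assumption: samples falling in the same 0.1-degree cell count as one site
site_key <- paste(round(outlier_sites$Latitude, 1),
                  round(outlier_sites$Longitude, 1))
specimens_by_site <- split(outlier_sites$TRIA.name, site_key)

# Sites that contributed more than one outlier
specimens_by_site[lengths(specimens_by_site) > 1]
```

With this binning, TRIA200 and TRIA286 fall in one cell and TRIA809, TRIA821 and TRIA799 in another; the remaining six specimens each occupy their own cell.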