A short report on a Discriminant Analysis of Principal Components (DAPC) and a neighbour-joining tree, both created in R, for the species Grosmannia clavigera (Gc).
The sequence data came from Ion Torrent sequencing of 267 specimens of Gc. Filtering and VCF creation used VCFtools with the options:
After filtering, 17367 of a possible 205900 sites were kept (about 8.4%). The file is named Gc_HQ.recode.vcf and is available on request.
library(ape)
library(adegenet)
## Loading required package: ade4
##
## /// adegenet 2.1.1 is loaded ////////////
##
## > overview: '?adegenet'
## > tutorials/doc/questions: 'adegenetWeb()'
## > bug reports/feature requests: adegenetIssues()
Gc_genlight <- read.PLINK("/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.plink.ped.raw.raw", map.file = "/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.plink.map", parallel=FALSE)
##
## Reading PLINK raw format into a genlight object...
##
##
## Reading loci information...
##
## Reading and converting genotypes...
## .
## Building final object...
##
## ...done.
# Read the specimen names (one per row, first column)
Gc_Specimen_List <- read.table("/Users/Ryan/Desktop/TRIA-PCA/Gc_HQ.recode.SpecimenList.txt")
Gc_names <- as.vector(Gc_Specimen_List$V1)
# Label the individuals, and reuse the same names as population names
indNames(Gc_genlight) <- Gc_names
popNames(Gc_genlight) <- Gc_names
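# Convert the genlight object to a genind object (used for the distance matrix later)
temp <- as.data.frame(Gc_genlight)
Gc_genind <- df2genind(temp, ploidy=2, sep ="\t")
# Assign individuals to 5 clusters by k-means on 200 retained principal components
grpGc <- find.clusters(Gc_genlight, n.clust = 5, n.pca = 200)
# DAPC on the inferred clusters, keeping 50 principal components and 2 discriminant functions
dapcGc <- dapc(Gc_genlight, grpGc$grp, n.pca = 50, n.da = 2)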
scatter.dapc(dapcGc)
myCol<-c("darkblue","purple","green","orange","red")
scatter(dapcGc, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4,
        cex=3, clab=0, leg=TRUE, txt.leg=paste("Cluster",1:5))
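The find.clusters and dapc steps above follow a PCA, k-means, discriminant analysis sequence. The sketch below illustrates that sequence with base R and MASS on simulated genotypes. It is not adegenet's implementation: the toy data, the object names (geno, pcs, km, fit), and the choice of 10 retained axes and 2 clusters are all assumptions made for illustration.

```r
library(MASS)  # for lda(); ships with standard R installations

set.seed(1)
# Hypothetical toy data: 60 individuals, 200 biallelic SNPs coded 0/1/2,
# drawn from two groups with different allele frequencies.
n_per_group <- 30
n_snp <- 200
freq_a <- runif(n_snp, 0.1, 0.9)
freq_b <- runif(n_snp, 0.1, 0.9)
geno <- rbind(
  matrix(rbinom(n_per_group * n_snp, 2, rep(freq_a, each = n_per_group)), n_per_group),
  matrix(rbinom(n_per_group * n_snp, 2, rep(freq_b, each = n_per_group)), n_per_group)
)

# Step 1: PCA, keeping a reduced number of axes (the role played by n.pca).
pca <- prcomp(geno, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:10]

# Step 2: k-means on the retained axes (the role played by find.clusters
# when n.clust is fixed).
km <- kmeans(pcs, centers = 2, nstart = 20)

# Step 3: linear discriminant analysis of the retained axes given the
# clusters (the role played by dapc).
fit <- lda(pcs, grouping = factor(km$cluster))
post <- predict(fit, pcs)

# Cross-tabulate the k-means clusters against the discriminant reassignment.
table(cluster = km$cluster, assigned = post$class)
```

Retaining fewer axes in step 1 than individuals per group keeps the discriminant step from simply memorising the clusters, which is one reason to compare results across several values of n.pca.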
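# Euclidean distances between individuals, computed from the allele table
D <- dist(tab(Gc_genind))
# Neighbour-joining tree from the distance matrix
tre <- nj(D)
# Plot the same tree in three layouts
plot(tre, type = "unrooted", edge.width = 2, cex = 0.5)
plot(tre, type = "radial", edge.width = 2, cex = 0.5)
plot(tre, type = "fan", edge.width = 2, cex = 0.5)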
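The tree-building step can be tried on a small example before running it on the full data set. This is a minimal sketch on simulated genotypes: the matrix geno and the names D_toy and tre_toy are hypothetical and only stand in for the objects used above. It also checks for missing distances first, since nj() does not accept them.

```r
library(ape)

set.seed(42)
# Hypothetical toy genotypes: 8 individuals x 50 SNPs coded 0/1/2.
geno <- matrix(rbinom(8 * 50, 2, 0.3), nrow = 8,
               dimnames = list(paste0("ind", 1:8), NULL))

# Euclidean distances between individuals, as in the dist() call above.
D_toy <- dist(geno)

# nj() does not accept missing distances, so check before building the tree.
stopifnot(!anyNA(D_toy))
tre_toy <- nj(D_toy)

# A neighbour-joining tree is unrooted; tip labels come from the row names.
is.rooted(tre_toy)
tre_toy$tip.label
plot(tre_toy, type = "unrooted", cex = 0.8)
```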
Magnifying the plots reveals a recurring set of samples that are outliers. These samples are:
Location and collection data for these specimens are shown below.
Gc_data <- read.csv("/Users/Ryan/Desktop/TRIA-PCA/Gc Specimen Subset for Arnaud - Sheet1.csv")
Gc_data
##          Unique.Sample.ID TRIA.name Consensus.ID Latitude Longitude Elevation..m.
##  1 MO 24 04-02-03FC-02-03   TRIA200           Gc 54.26290 -116.6085         900.9
##  2 MO 24 04-02-09FC-02-09   TRIA286           Gc 54.26260 -116.6083         899.5
##  3 MO 24 05-01-02LS-01-02   TRIA129           Gc 54.81937 -116.6917         760.0
##  4          M033 10-02-01   TRIA809           Gc 45.27414 -115.9097        1872.1
##  5          M033 10-02-08   TRIA821           Gc 45.26091 -115.9023        1849.9
##  6          M033 10-02-03   TRIA799           Gc 45.27387 -115.9096        1868.7
##  7  M024-10-01-05Va-01-05    TRIA75           Gc 52.80584 -119.2664         813.0
##  8  M024-14-01-08Ca-01-08   TRIA177           Gc 50.43235 -122.2530        1128.2
##  9  M024-08-01-04TR-08-04    TRIA66           Gc 55.17911 -121.1122        1045.9
## 10  M024-07-03-04PA-03-04     TRIA7           Gc 56.62360 -119.8976         903.3
## 11          M033-01-01-06   TRIA467           Gc 43.95055 -103.6307        1684.9
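One way to see whether these specimens share collection sites is to compute pairwise geographic distances from the coordinates in the table. The sketch below uses base R only. The helper haversine_km, the spherical-Earth radius of 6371 km, and the 5 km grouping threshold are assumptions chosen for illustration, not part of the original analysis.

```r
# Coordinates copied from the table above (decimal degrees).
outliers <- data.frame(
  TRIA.name = c("TRIA200", "TRIA286", "TRIA129", "TRIA809", "TRIA821", "TRIA799",
                "TRIA75", "TRIA177", "TRIA66", "TRIA7", "TRIA467"),
  Latitude  = c(54.26290, 54.26260, 54.81937, 45.27414, 45.26091, 45.27387,
                52.80584, 50.43235, 55.17911, 56.62360, 43.95055),
  Longitude = c(-116.6085, -116.6083, -116.6917, -115.9097, -115.9023, -115.9096,
                -119.2664, -122.2530, -121.1122, -119.8976, -103.6307)
)

# Haversine great-circle distance in km (hypothetical helper; assumes a
# spherical Earth with radius 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Pairwise distance matrix between all listed specimens.
n <- nrow(outliers)
geo_km <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_km(outliers$Latitude[i], outliers$Longitude[i],
               outliers$Latitude[j], outliers$Longitude[j])
})
dimnames(geo_km) <- list(outliers$TRIA.name, outliers$TRIA.name)

# Group specimens collected within 5 km of each other
# (single-linkage clustering cut at an arbitrary 5 km threshold).
site_group <- cutree(hclust(as.dist(geo_km), method = "single"), h = 5)
split(outliers$TRIA.name, site_group)
```

Specimens that fall in the same group were collected at essentially the same site, which helps separate site-level effects from specimen-level ones when interpreting the outliers.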