So this is why we keep electronc notebooks, I went back to my old Geoduck stuff, grabbed the relatedness matrix stuff, and cut and paste.
setwd("~/Documents/C-virginica-BSSeq/MethylExtract/VCFs/")
vcf.filenames<- list.files(path = ".", pattern = ".vcf")
## Goes through and bgzips and tabixs each file, a requirement of vcf-merge
for(i in 1:length(vcf.filenames)) {
system(paste0("bgzip -c ", vcf.filenames[i], " > ", substr(vcf.filenames[i],1 , 22), ".gz"))
system(paste0("tabix ", substr(vcf.filenames[i],1 , 22), ".gz"))
}
list.files(path = ".", pattern = "*.*")
[1] "2112_lane1_ACAGTG_SNVs.gz" "2112_lane1_ACAGTG_SNVs.gz.tbi" "2112_lane1_ACAGTG_SNVs.vcf"
[4] "2112_lane1_ATCACG_SNVs.gz" "2112_lane1_ATCACG_SNVs.gz.tbi" "2112_lane1_ATCACG_SNVs.vcf"
[7] "2112_lane1_CAGATC_SNVs.gz" "2112_lane1_CAGATC_SNVs.gz.tbi" "2112_lane1_CAGATC_SNVs.vcf"
[10] "2112_lane1_GCCAAT_SNVs.gz" "2112_lane1_GCCAAT_SNVs.gz.tbi" "2112_lane1_GCCAAT_SNVs.vcf"
[13] "2112_lane1_TGACCA_SNVs.gz" "2112_lane1_TGACCA_SNVs.gz.tbi" "2112_lane1_TGACCA_SNVs.vcf"
[16] "2112_lane1_TTAGGC_SNVs.gz" "2112_lane1_TTAGGC_SNVs.gz.tbi" "2112_lane1_TTAGGC_SNVs.vcf"
## performes the actual merge on all avaialble bgzipped files. It would be wise to only have VCF files in this directory, so strange things don't happen.
setwd("~/Documents/C-virginica-BSSeq/MethylExtract/VCFs/")
system("vcf-merge *.gz > merged.vcf")
gzip: stdout: Broken pipe
Using column name 'NA0001' for 2112_lane1_ACAGTG_SNVs.gz:NA0001
gzip: stdout: Broken pipe
Using column name '2112_lane1_ATCACG_SNVs_NA0001' for 2112_lane1_ATCACG_SNVs.gz:NA0001
gzip: stdout: Broken pipe
Using column name '2112_lane1_CAGATC_SNVs_NA0001' for 2112_lane1_CAGATC_SNVs.gz:NA0001
gzip: stdout: Broken pipe
Using column name '2112_lane1_GCCAAT_SNVs_NA0001' for 2112_lane1_GCCAAT_SNVs.gz:NA0001
gzip: stdout: Broken pipe
Using column name '2112_lane1_TGACCA_SNVs_NA0001' for 2112_lane1_TGACCA_SNVs.gz:NA0001
gzip: stdout: Broken pipe
Using column name '2112_lane1_TTAGGC_SNVs_NA0001' for 2112_lane1_TTAGGC_SNVs.gz:NA0001
system("vcftools --relatedness --vcf merged.vcf --out relatedness")
VCFtools - UNKNOWN
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf merged.vcf
--out relatedness
--relatedness
After filtering, kept 6 out of 6 Individuals
Outputting Individual Relatedness
Relatedness: Only using fully diploid sites.
Relatedness: Only using biallelic sites.
After filtering, kept 307273 out of a possible 307273 Sites
Run Time = 2.00 seconds
system("head relatedness.relatedness")
INDV1 INDV2 RELATEDNESS_AJK
NA0001 NA0001 0.909934
NA0001 2112_lane1_ATCACG_SNVs_NA0001 -0.104487
NA0001 2112_lane1_CAGATC_SNVs_NA0001 -0.0215528
NA0001 2112_lane1_GCCAAT_SNVs_NA0001 -0.276806
NA0001 2112_lane1_TGACCA_SNVs_NA0001 -0.120841
NA0001 2112_lane1_TTAGGC_SNVs_NA0001 -0.103631
2112_lane1_ATCACG_SNVs_NA0001 2112_lane1_ATCACG_SNVs_NA0001 0.893281
2112_lane1_ATCACG_SNVs_NA0001 2112_lane1_CAGATC_SNVs_NA0001 -0.0845831
2112_lane1_ATCACG_SNVs_NA0001 2112_lane1_GCCAAT_SNVs_NA0001 -0.339836
That looks like my relatedness matrix. Now off to Excel to make the complete matrix for Macau.