References and resources

The sequencing data come from one individual (NA12892) of the 1000 Genomes Project.

http://www.internationalgenome.org/1000-genomes-browsers/

Machine

Google Cloud Platform (GCP), 4 vCPUs, 15 GB RAM

Google Cloud Platform (GCP), 8 vCPUs, 30 GB RAM

Google Cloud Platform (GCP), 24 vCPUs, 32 GB RAM

AWS: c5.18xlarge (72 vCPUs, 144 GB RAM)

AWS: m5.24xlarge (96 vCPUs, 384 GiB RAM)

Voltage-gated sodium channel subunit genes

There are 14 voltage-gated sodium channel subunit genes in humans.

Ten of them code for alpha subunits.

Four of them code for beta subunits.

Chromosomal co-ordinates of voltage-gated sodium channel subunits

scn1a/Nav1.1            Chromosome 2, NC_000002.12 (165989160..166149216, complement)

scn2a/Nav1.2            Chromosome 2, NC_000002.12 (165208056..165392310)

scn3a/Nav1.3            Chromosome 2, NC_000002.12 (165087520..165204295, complement)

scn4a/Nav1.4            Chromosome 17, NC_000017.11 (63938554..63972918, complement)

scn5a/Nav1.5            Chromosome 3, NC_000003.12 (38548061..38649673, complement)

scn7a/Nax               Chromosome 2, NC_000002.12 (166403573..166494264, complement)

scn8a/Nav1.6            Chromosome 12, NC_000012.12 (51589958..51812864)

scn9a/Nav1.7            Chromosome 2, NC_000002.12 (166195185..166375987, complement)

scn10a/Nav1.8           Chromosome 3, NC_000003.12 (38697110..38794010, complement)

scn11a/Nav1.9           Chromosome 3, NC_000003.12 (38845764..39051945, complement)


scn1b                   Chromosome 19, NC_000019.10 (35030688..35040449)

scn2b                   Chromosome 11, NC_000011.10 (118162804..118176622, complement)

scn3b                   Chromosome 11, NC_000011.10 (123629187..123654607, complement)

scn4b                   Chromosome 11, NC_000011.10 (118133377..118152915, complement)

Chromosome-wise voltage-gated sodium channel subunits

scn1a           Chromosome 2, NC_000002.12 (165989160..166149216, complement)
scn2a           Chromosome 2, NC_000002.12 (165208056..165392310)
scn3a           Chromosome 2, NC_000002.12 (165087520..165204295, complement)
scn7a           Chromosome 2, NC_000002.12 (166403573..166494264, complement)
scn9a           Chromosome 2, NC_000002.12 (166195185..166375987, complement)

scn5a           Chromosome 3, NC_000003.12 (38548061..38649673, complement)
scn10a          Chromosome 3, NC_000003.12 (38697110..38794010, complement)
scn11a          Chromosome 3, NC_000003.12 (38845764..39051945, complement)

scn2b           Chromosome 11, NC_000011.10 (118162804..118176622, complement)
scn3b           Chromosome 11, NC_000011.10 (123629187..123654607, complement)
scn4b           Chromosome 11, NC_000011.10 (118133377..118152915, complement)

scn8a           Chromosome 12, NC_000012.12 (51589958..51812864)

scn4a           Chromosome 17, NC_000017.11 (63938554..63972918, complement)

scn1b           Chromosome 19, NC_000019.10 (35030688..35040449)
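The NCBI-style entries above can be turned into the chrN:start-end form that GATK accepts for -L. The sketch below assumes the reference uses UCSC-style contig names (chr2, chr3, ...); the helper name to_interval is made up for illustration.

```shell
# Convert "Chromosome 2, NC_000002.12 (165989160..166149216, complement)"
# into "chr2:165989160-166149216". Strand information is dropped because
# an interval only needs contig, start and end.
to_interval() {
    printf '%s\n' "$1" |
        sed -E 's/^.*Chromosome ([0-9XY]+),[^(]*\(([0-9]+)\.\.([0-9]+).*$/chr\1:\2-\3/'
}

to_interval 'scn1a  Chromosome 2, NC_000002.12 (165989160..166149216, complement)'
to_interval 'scn8a  Chromosome 12, NC_000012.12 (51589958..51812864)'
```

The two calls print chr2:165989160-166149216 and chr12:51589958-51812864.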

Download analysis datasets (for individual NA12892) from the 1000 Genomes Project site

~/gatk-data-ref$ wget -bqc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR622/SRR622459/SRR622459_1.fastq.gz

~/gatk-data-ref$ wget -bqc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR622/SRR622459/SRR622459_2.fastq.gz

Directory content

~/gatk-data-ref$ ls -l

-rw-rw-r-- 1 ubuntu ubuntu 68029335546 Apr 30 15:50 SRR622459_1.fastq.gz

-rw-rw-r-- 1 ubuntu ubuntu 69219443789 Apr 30 15:54 SRR622459_2.fastq.gz

Check data content

~/gatk-data-ref$ zcat SRR622459_1.fastq.gz | head -n 2

@SRR622459.1 1/1

GTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGAGATCGGAAG

~/gatk-data-ref$ zcat SRR622459_2.fastq.gz | head -n 2

@SRR622459.1 1/2

CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACAGATCGGAAG
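Beyond looking at the first record, it can help to count the records in each file and compare the first read names of the two mates. A rough sketch follows; the function names and the inline demo reads are invented for illustration and are not part of SRR622459.

```shell
# Count FASTQ records on stdin (a record is 4 lines).
# printf "%d" avoids awk's default exponent notation for large counts.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }'
}

# Print the name of the first read on stdin, without the leading "@"
# and without the description field after the first space.
first_read_name() {
    head -n 1 | cut -d ' ' -f 1 | sed 's/^@//'
}

# Two made-up records, used only to exercise the helpers.
demo_fastq() {
    printf '@r.1 1/1\nACGT\n+\nIIII\n@r.2 2/1\nTTGA\n+\nIIII\n'
}

demo_fastq | count_reads
demo_fastq | first_read_name
```

On the real data this would be run as `zcat SRR622459_1.fastq.gz | count_reads` and likewise for the second file. Both files should give the same count and the same first read name.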

Download reference (fasta, fasta index, dictionary) and BWA index (.alt, .amb, .ann, .bwt, .pac and .sa) files

        ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta .

        ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dict .

        ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai .


 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt .

 
 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb .

 
 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann .

 
 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt .

 
 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac .

 
 ~/gatk-data-ref$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa .
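The nine downloads differ only in the file suffix, so they can be generated in a loop. This is a dry-run sketch that only prints the commands; the variable and function names are illustrative.

```shell
BUCKET_PREFIX='s3://broad-references/hg38/v0/Homo_sapiens_assembly38'

# Print one S3 URI per required file: FASTA, FASTA index, dictionary
# and the six BWA index files.
ref_uris() {
    for suffix in fasta fasta.fai dict \
                  fasta.64.alt fasta.64.amb fasta.64.ann \
                  fasta.64.bwt fasta.64.pac fasta.64.sa; do
        printf '%s.%s\n' "$BUCKET_PREFIX" "$suffix"
    done
}

# Dry run: show the copy commands instead of executing them.
ref_uris | while read -r uri; do
    echo "aws s3 cp $uri ."
done
```

Piping the printed lines to `sh` would perform the downloads.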

Additional directory contents

~/gatk-data-ref$ ls -l

-rw-rw-r-- 1 ubuntu ubuntu 581712 Jan 6 2016 Homo_sapiens_assembly38.dict

-rw-rw-r-- 1 ubuntu ubuntu 3249912778 Jan 5 2016 Homo_sapiens_assembly38.fasta

-rw-rw-r-- 1 ubuntu ubuntu 487553 Nov 6 23:47 Homo_sapiens_assembly38.fasta.64.alt

-rw-rw-r-- 1 ubuntu ubuntu 20199 Nov 6 23:47 Homo_sapiens_assembly38.fasta.64.amb

-rw-rw-r-- 1 ubuntu ubuntu 455474 Nov 6 23:47 Homo_sapiens_assembly38.fasta.64.ann

-rw-rw-r-- 1 ubuntu ubuntu 3217347004 Nov 6 23:47 Homo_sapiens_assembly38.fasta.64.bwt

-rw-rw-r-- 1 ubuntu ubuntu 804336731 Nov 6 23:48 Homo_sapiens_assembly38.fasta.64.pac

-rw-rw-r-- 1 ubuntu ubuntu 1608673512 Nov 6 23:48 Homo_sapiens_assembly38.fasta.64.sa

-rw-rw-r-- 1 ubuntu ubuntu 160928 Dec 1 2016 Homo_sapiens_assembly38.fasta.fai

Align the sequences to the reference genome sequence using BWA

Alignment with bwa mem produces SAM (Sequence Alignment/Map) output, which is piped to samtools view and converted to a BAM file.

~/gatk-data-ref$ bwa mem -M -t 96 -R '@RG\tID:SRR622459\tLB:Q\tPL:illumina\tPU:FCC1H7WACXX\tSM:NA12892' Homo_sapiens_assembly38.fasta SRR622459_1.fastq.gz SRR622459_2.fastq.gz | samtools view -Sbh -o SRR622459_1.bam

[main] Version: 0.7.17-r1188

[main] CMD: bwa mem -M -t 96 -R @RG\tID:SRR622459\tLB:Q\tPL:illumina\tPU:FCC1H7WACXX\tSM:NA12892 Homo_sapiens_assembly38.fasta SRR622459_1.fastq.gz SRR622459_2.fastq.gz

[main] Real time: 22680.497 sec; CPU: 1393622.442 sec
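The timing line is easier to read in hours, and the CPU-to-wall-clock ratio gives a rough idea of how many of the 96 threads were busy on average. A small sketch with an illustrative function name:

```shell
# Args: real (wall-clock) seconds, CPU seconds, as printed by bwa.
bwa_timing() {
    awk -v r="$1" -v c="$2" 'BEGIN {
        printf "wall-clock hours: %.2f\n", r / 3600
        printf "average busy threads: %.1f\n", c / r
    }'
}

# Values copied from the "[main] Real time" log line.
bwa_timing 22680.497 1393622.442
```

For this run the sketch reports about 6.30 hours and about 61.4 busy threads on average.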

Generate alignment statistics

~/gatk-data-ref$ sambamba flagstat -p SRR622459_1.bam

sambamba 0.6.9 by Artem Tarasov and Pjotr Prins (C) 2012-2019

2467570854 + 0 in total (QC-passed reads + QC-failed reads)

10585005 + 0 secondary

11607447 + 0 supplementary

0 + 0 duplicates

2364732708 + 0 mapped (95.83%:N/A)

2445378402 + 0 paired in sequencing

1222689201 + 0 read1

1222689201 + 0 read2

2254822518 + 0 properly paired (92.21%:N/A)

2313333564 + 0 with itself and mate mapped

29206692 + 0 singletons (1.19%:N/A)

27935066 + 0 with mate mapped to a different chr

16234878 + 0 with mate mapped to a different chr (mapQ>=5)
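The percentages in the report can be recomputed from the raw counts, which is handy when comparing runs in a script. A minimal sketch, assuming the flagstat text layout shown above; mapped_pct is a made-up helper name.

```shell
# Read flagstat text on stdin and print mapped reads as a percentage
# of all reads, with two decimals.
mapped_pct() {
    awk '
        / in total / { total  = $1 }
        / mapped \(/ { mapped = $1 }
        END { printf "%.2f\n", 100 * mapped / total }
    '
}

# The two relevant lines from the report above.
printf '%s\n' \
    '2467570854 + 0 in total (QC-passed reads + QC-failed reads)' \
    '2364732708 + 0 mapped (95.83%:N/A)' | mapped_pct
```

This prints 95.83, matching the figure in the report. On a real file the call would be `sambamba flagstat SRR622459_1.bam | mapped_pct`.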

Additional directory contents

~/gatk-data-ref$ ls -l

-rw-rw-r-- 1 ubuntu ubuntu 200710884147 May 2 11:47 SRR622459_1.bam

-rw-rw-r-- 1 ubuntu ubuntu 192007490886 May 2 13:42 SRR622459_1.sorted.bam

-rw-rw-r-- 1 ubuntu ubuntu 9775584 May 2 14:05 SRR622459_1.sorted.bam.bai

-rw-rw-r-- 1 ubuntu ubuntu 4257061759 May 2 15:43 SRR622459_1.sorted.chr20.bam

Sort and index the BAM file

~/gatk-data-ref$ samtools sort -@ 96 -m 3G SRR622459_1.bam -o SRR622459_1.sorted.bam

[bam_sort_core] merging from 192 files and 96 in-memory blocks...

~/gatk-data-ref$ samtools index -@ 96 SRR622459_1.sorted.bam
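In samtools sort, -m is a per-thread limit, so the sort buffers scale with the thread count. The sketch below checks the rough total against a RAM budget before launching; the function name is illustrative, and the 384 figure is the m5.24xlarge RAM from the machine list.

```shell
# Args: threads, per-thread memory (G), RAM budget (G).
# Succeeds when threads * per-thread memory fits within the budget.
sort_fits_in_ram() {
    threads=$1 per_thread_g=$2 ram_g=$3
    need=$(( threads * per_thread_g ))
    echo "approx. sort buffers: ${need}G of ${ram_g}G"
    [ "$need" -le "$ram_g" ]
}

# -@ 96 -m 3G on a machine with 384G of RAM
sort_fits_in_ram 96 3 384
```

With these settings the buffers come to roughly 288G. The same settings would not fit on the 144 GB c5.18xlarge.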

Additional directory contents

~/gatk-data-ref$ ls -l

-rw-rw-r-- 1 ubuntu ubuntu 192007490886 May 2 13:42 SRR622459_1.sorted.bam

-rw-rw-r-- 1 ubuntu ubuntu 9775584 May 2 14:05 SRR622459_1.sorted.bam.bai

Validate the BAM file (for GATK analysis)

~/gatk-data-ref$ gatk ValidateSamFile -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.bam -M SUMMARY -O summary-SRR622459_1.sorted

~/gatk-data-ref$ cat summary-SRR622459_1.sorted

## HISTOGRAM    java.lang.String

Error Type                          Count

ERROR:INVALID_TAG_NM            245,444
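In a scripted pipeline the summary file can be tested automatically so that later steps only run on a clean BAM. A minimal sketch, assuming error rows start with "ERROR:" as in the summary above; summary_is_clean is a made-up name.

```shell
# Succeeds when the ValidateSamFile SUMMARY text on stdin
# contains no row beginning with "ERROR:".
summary_is_clean() {
    ! grep -q '^ERROR:'
}

# Demo with the two kinds of summary seen in this walkthrough.
printf 'ERROR:INVALID_TAG_NM\t245,444\n' | summary_is_clean || echo "errors present"
printf 'No errors found\n' | summary_is_clean && echo "clean"
```

On the real file this would be `summary_is_clean < summary-SRR622459_1.sorted`.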

Fix the error reported above

~/gatk-data-ref$ gatk SetNmMdAndUqTags -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.bam -O SRR622459_1.sorted.NmMdTqTgs.bam

Using GATK jar /home/b0d2647/miniconda3/share/gatk4-4.1.2.0-0/gatk-package-4.1.2.0-local.jar

Validate the BAM file again and create an index

~/gatk-data-ref$ gatk ValidateSamFile -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.bam -M SUMMARY -O summary-SRR622459_1.sorted.NmMdTqTgs

[Sat May 04 19:07:58 UTC 2019] picard.sam.ValidateSamFile done. Elapsed time: 331.75 minutes.

…………..

Tool returned: 0

~/gatk-data-ref$ cat summary-SRR622459_1.sorted.NmMdTqTgs

No errors found

~/gatk-data-ref$ samtools index SRR622459_1.sorted.NmMdTqTgs.bam

Mark duplicates and index the BAM file created

~/gatk-data-ref$ nohup gatk MarkDuplicates -I SRR622459_1.sorted.NmMdTqTgs.bam -O SRR622459_1.sorted.NmMdTqTgs.mdup.bam -M SRR622459_1.sorted.NmMdTqTgs.dupMetrics.txt >& 59_1.fxNm.log &

~/gatk-data-ref$ tail 59_1.fxNm.log

……………..

[Sun May 05 05:03:29 UTC 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 680.30 minutes.

Runtime.totalMemory()=3481796608

Tool returned: 0

Using GATK jar /home/b0d2647/miniconda3/share/gatk4-4.1.2.0-0/gatk-package-4.1.2.0-local.jar

Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/b0d2647/miniconda3/share/gatk4-4.1.2.0-0/gatk-package-4.1.2.0-local.jar MarkDuplicates -I SRR622459_1.sorted.NmMdTqTgs.bam -O SRR622459_1.sorted.NmMdTqTgs.mdup.bam -M SRR622459_1.sorted.NmMdTqTgs.dupMetrics.txt

~/gatk-data-ref$ head SRR622459_1.sorted.NmMdTqTgs.dupMetrics.txt


LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION  ESTIMATED_LIBRARY_SIZE

Q       29206692        1156666782      22192452        102838146       14050105        58665347        0       0.056085        11013720153
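PERCENT_DUPLICATION is a fraction, not a percentage. It can be reproduced from the other columns, counting each pair as two reads. The sketch below uses an illustrative function name and takes its numbers from the metrics row above.

```shell
# Args: UNPAIRED_READS_EXAMINED, READ_PAIRS_EXAMINED,
#       UNPAIRED_READ_DUPLICATES, READ_PAIR_DUPLICATES.
# Prints (unpaired dups + 2 * pair dups) / (unpaired + 2 * pairs).
dup_fraction() {
    awk -v u="$1" -v p="$2" -v ud="$3" -v pd="$4" \
        'BEGIN { printf "%.6f\n", (ud + 2 * pd) / (u + 2 * p) }'
}

dup_fraction 29206692 1156666782 14050105 58665347
```

This prints 0.056085, so about 5.6% of the examined reads were marked as duplicates.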

~/gatk-data-ref$ samtools index SRR622459_1.sorted.NmMdTqTgs.mdup.bam

Additional directory contents

~/gatk-data-ref$ ls -l

-rw-rw-r-- 1 b0d2647 b0d2647 691071 May 5 05:03 59_1.fxNm.log

-rw-rw-r-- 1 b0d2647 b0d2647 260704633123 May 4 10:23 SRR622459_1.sorted.NmMdTqTgs.bam

-rw-rw-r-- 1 b0d2647 b0d2647 9786952 May 4 18:05 SRR622459_1.sorted.NmMdTqTgs.bam.bai

-rw-rw-r-- 1 b0d2647 b0d2647 5565 May 5 05:03 SRR622459_1.sorted.NmMdTqTgs.dupMetrics.txt

-rw-rw-r-- 1 b0d2647 b0d2647 263473284570 May 5 05:03 SRR622459_1.sorted.NmMdTqTgs.mdup.bam

-rw-rw-r-- 1 b0d2647 b0d2647 9817616 May 5 09:59 SRR622459_1.sorted.NmMdTqTgs.mdup.bam.bai

-rw-rw-r-- 1 b0d2647 b0d2647 16 May 4 19:07 summary-SRR622459_1.sorted.NmMdTqTgs

Base Recalibration

The duplicate-marked BAM file was not base-recalibrated with gatk BaseRecalibrator and gatk ApplyBQSR before haplotype calling (see below), because recalibration is time-consuming.

Run HaplotypeCaller on the duplicate-marked BAM file, one gene interval per run, using the bash script below to detect variants in the voltage-gated sodium channel subunit genes

#!/bin/bash

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr2:165989160-166149216 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf &> hcall.59_1.scn1a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr2:165208056-165392310 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf &> hcall.59_1.scn2a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr2:165087520-165204295 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf &> hcall.59_1.scn3a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr2:166403573-166494264 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf &> hcall.59_1.scn7a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr2:166195185-166375987 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf &> hcall.59_1.scn9a.log &



gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr3:38548061-38649673 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf &> hcall.59_1.scn5a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr3:38697110-38794010 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf &> hcall.59_1.scn10a.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr3:38845764-39051945 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf &> hcall.59_1.scn11a.log &



gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr11:118162804-118176622 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf &> hcall.59_1.scn2b.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr11:123629187-123654607 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf &> hcall.59_1.scn3b.log &

gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr11:118133377-118152915 -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf &> hcall.59_1.scn4b.log &



gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr12:51589958-51812864  -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf &> hcall.59_1.scn8a.log &


gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr17:63938554-63972918  -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf &> hcall.59_1.scn4a.log &


gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I SRR622459_1.sorted.NmMdTqTgs.mdup.bam -L chr19:35030688-35040449  -O SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf &> hcall.59_1.scn1b.log &
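Since the fourteen calls differ only in gene name and interval, the same script can be written as a loop over a small gene/interval table. The version below is a dry-run sketch that prints the commands rather than running them; hc_command and the variable names are illustrative, and only three of the fourteen genes are listed.

```shell
BAM=SRR622459_1.sorted.NmMdTqTgs.mdup.bam
REF=Homo_sapiens_assembly38.fasta

# Print the HaplotypeCaller command for one gene and interval.
# The output VCF is named after the BAM with the gene name inserted.
hc_command() {
    gene=$1 interval=$2
    printf 'gatk HaplotypeCaller -R %s -I %s -L %s -O %s\n' \
        "$REF" "$BAM" "$interval" "${BAM%.bam}.$gene.vcf"
}

# Gene/interval pairs; extend with the remaining genes as needed.
while read -r gene interval; do
    hc_command "$gene" "$interval"
done <<'EOF'
scn1a chr2:165989160-166149216
scn5a chr3:38548061-38649673
scn1b chr19:35030688-35040449
EOF
```

Printing first makes it easy to inspect the intervals and output names before committing machine time. Piping the output to `sh` runs the calls one after another instead of all in the background.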

Additional directory contents

~/gatk-data-ref$ ls -l


-rw-rw-r-- 1 b0d2647 b0d2647       204909 May  5 14:47 SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       123814 May  5 14:47 SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       242692 May  5 14:49 SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       123875 May  5 14:49 SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       259643 May  5 11:54 SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       197187 May  5 11:54 SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       177291 May  5 14:54 SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       114343 May  5 14:54 SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       236893 May  5 11:57 SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       155459 May  5 11:57 SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       177127 May  5 14:50 SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       115006 May  5 14:50 SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       178557 May  5 11:58 SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       115402 May  5 11:58 SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       189976 May  5 14:52 SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       115073 May  5 14:52 SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       182634 May  5 14:58 SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       114612 May  5 14:58 SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       183280 May  5 14:53 SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       114991 May  5 14:53 SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       212612 May  5 14:44 SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       123771 May  5 14:44 SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       198907 May  5 11:59 SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       134920 May  5 11:59 SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       226542 May  5 14:56 SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       120588 May  5 14:56 SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf.idx

-rw-rw-r-- 1 b0d2647 b0d2647       250724 May  5 12:00 SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf
-rw-rw-r-- 1 b0d2647 b0d2647       197300 May  5 12:00 SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf.idx

SCN1a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      424
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         351
SN      0       number of MNPs:                         0
SN      0       number of indels:                       73
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           6
SN      0       number of multiallelic SNP sites:       3
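Only a few of the SN rows are needed to compare genes, so they can be pulled out of the report with awk. A minimal sketch, assuming the tab-separated SN rows that bcftools stats writes; the function name is made up.

```shell
# Read bcftools stats text on stdin and print "SNPs<TAB>indels".
snp_indel_counts() {
    awk -F '\t' '
        $1 == "SN" && $3 == "number of SNPs:"   { snps   = $4 }
        $1 == "SN" && $3 == "number of indels:" { indels = $4 }
        END { printf "%s\t%s\n", snps, indels }
    '
}

# Demo with the two SCN1a rows from the report above.
printf 'SN\t0\tnumber of SNPs:\t351\nSN\t0\tnumber of indels:\t73\n' |
    snp_indel_counts
```

On a real file the call would be `bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn1a.vcf | snp_indel_counts`, which can be repeated in a loop over the per-gene VCF files.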

SCN2a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn2a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      276
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         228
SN      0       number of MNPs:                         0
SN      0       number of indels:                       48
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           2
SN      0       number of multiallelic SNP sites:       0

SCN3a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf | less

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn3a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      16
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         10
SN      0       number of MNPs:                         0
SN      0       number of indels:                       6
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           1
SN      0       number of multiallelic SNP sites:       0

SCN4a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn4a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      38
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         28
SN      0       number of MNPs:                         0
SN      0       number of indels:                       10
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           0
SN      0       number of multiallelic SNP sites:       0

SCN5a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn5a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      183
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         156
SN      0       number of MNPs:                         0
SN      0       number of indels:                       27
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           4
SN      0       number of multiallelic SNP sites:       0

SCN7a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn7a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      105
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         90
SN      0       number of MNPs:                         0
SN      0       number of indels:                       15
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           1
SN      0       number of multiallelic SNP sites:       0

SCN8a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn8a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      226
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         183
SN      0       number of MNPs:                         0
SN      0       number of indels:                       44
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           3
SN      0       number of multiallelic SNP sites:       0

SCN9a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn9a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      1
SN      0       number of records:                      376
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         306
SN      0       number of MNPs:                         0
SN      0       number of indels:                       72
SN      0       number of others:                       1
SN      0       number of multiallelic sites:           10
SN      0       number of multiallelic SNP sites:       0

SCN10a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn10a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      137
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 122
SN      0       number of MNPs: 0
SN      0       number of indels:       15
SN      0       number of others:       0
SN      0       number of multiallelic sites:   3
SN      0       number of multiallelic SNP sites:       1

SCN11a gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn11a.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      308
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 258
SN      0       number of MNPs: 0
SN      0       number of indels:       51
SN      0       number of others:       0
SN      0       number of multiallelic sites:   3
SN      0       number of multiallelic SNP sites:       1
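
Checking the counts against the note in the report header

The note in the report header explains why the per-type counts can add up to more than the number of records. For SCN11a above, SNPs (258) plus indels (51) give 309 against 308 records, which suggests that one row was counted under both types. The check can be sketched in shell as below. The function name type_overcount is hypothetical, and the numbers are copied from the SN lines above; no-ALT rows are 0 here, so they do not affect the result.

```shell
# type_overcount RECORDS SNPS MNPS INDELS OTHERS
# Prints how many more type counts there are than records.
# A positive result means some rows were counted under more than one type.
type_overcount() {
    echo $(( $2 + $3 + $4 + $5 - $1 ))
}

# SCN11a values taken from the SN lines above
type_overcount 308 258 0 51 0
```

For SCN11a this prints 1. The same call with the SCN10a values (137 122 0 15 0) prints 0, meaning each row there falls under a single type.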

SCN1b gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn1b.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      10
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 8
SN      0       number of MNPs: 0
SN      0       number of indels:       2
SN      0       number of others:       0
SN      0       number of multiallelic sites:   0
SN      0       number of multiallelic SNP sites:       0

SCN2b gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn2b.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      9
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 7
SN      0       number of MNPs: 0
SN      0       number of indels:       2
SN      0       number of others:       0
SN      0       number of multiallelic sites:   0
SN      0       number of multiallelic SNP sites:       0

SCN3b gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn3b.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      66
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 58
SN      0       number of MNPs: 0
SN      0       number of indels:       8
SN      0       number of others:       1
SN      0       number of multiallelic sites:   1
SN      0       number of multiallelic SNP sites:       0

SCN4b gene variants

~/gatk-data-ref$ bcftools stats SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID      0       SRR622459_1.sorted.NmMdTqTgs.mdup.scn4b.vcf
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN    [2]id   [3]key  [4]value
SN      0       number of samples:      1
SN      0       number of records:      37
SN      0       number of no-ALTs:      0
SN      0       number of SNPs: 32
SN      0       number of MNPs: 0
SN      0       number of indels:       5
SN      0       number of others:       0
SN      0       number of multiallelic sites:   0
SN      0       number of multiallelic SNP sites:       0
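
Collecting the summary numbers into one table

The per-gene reports above differ only in their SN lines, so it may be easier to compare genes in a single table. The following is a minimal sketch, not the method used to produce the output above. It assumes the SN lines are tab-separated, as bcftools stats writes them, and the function name sn_row is hypothetical.

```shell
# sn_row LABEL
# Reads a bcftools stats report on standard input and prints one
# tab-separated row: LABEL, records, SNPs, MNPs, indels, others.
sn_row() {
    awk -F'\t' -v label="$1" '
        $1 == "SN" {
            key = $3
            sub(/^number of /, "", key)   # "number of SNPs:" -> "SNPs:"
            sub(/:$/, "", key)            # "SNPs:" -> "SNPs"
            n[key] = $4
        }
        END {
            printf "%s\t%d\t%d\t%d\t%d\t%d\n", label,
                   n["records"], n["SNPs"], n["MNPs"], n["indels"], n["others"]
        }'
}

# Example (requires bcftools and the per-gene VCF files in the current directory):
# for gene in scn10a scn11a scn1b scn2b scn3b scn4b; do
#     bcftools stats "SRR622459_1.sorted.NmMdTqTgs.mdup.${gene}.vcf" | sn_row "$gene"
# done
```

Comment lines such as "# SN [2]id [3]key [4]value" are skipped because their first field is not exactly "SN". A key that is missing from a report is printed as 0.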
