1. Relevant notes

-The panel of normals not only represents common germline variant sites, it also captures commonly noisy sites in sequencing data, e.g. mapping artifacts or other somewhat random but systematic sequencing artifacts.

-When using a population germline resource, consider adjusting the --af-of-alleles-not-in-resource parameter from its default of 0.001.

-For example, the gnomAD resource af-only-gnomad_grch38.vcf.gz represents ~200k exomes and ~16k genomes, and the tutorial data is exome data, so we set --af-of-alleles-not-in-resource to 0.0000025, which corresponds to 1/(2 * number of exome samples) = 1/(2 * 200,000).

-The default of 0.001 is appropriate for human sample analyses without any population resource. It is based on the average rate of heterozygosity in humans.

-The population allele frequencies (POP_AF) and the af-of-alleles-not-in-resource value both factor into the probability calculation that a variant is somatic.

-The tutorial uses BAM files generated by aligning the FASTQ files to the hg38 reference genome with the bwa mem algorithm.

-The normal and tumor BAM files are used together to identify mutations that are present only in the tumor tissue.
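The 1/(2 * samples) arithmetic behind the adjusted --af-of-alleles-not-in-resource value can be checked with a one-line calculation. This is only a sketch: the function name af_not_in_resource is hypothetical and not part of GATK, and 200000 is the approximate exome count quoted in the notes above.

```shell
#!/bin/sh
# af_not_in_resource N: print 1/(2*N), i.e. the frequency of an allele seen
# once among N diploid samples (2*N chromosomes).
# Hypothetical helper for illustration; not a GATK command.
af_not_in_resource() {
    awk -v n="$1" 'BEGIN { printf "%.7f\n", 1 / (2 * n) }'
}

af_not_in_resource 200000   # ~200k exomes; prints 0.0000025
```

For a resource built from a different number of samples, pass that count instead.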

2. References and resources

Google Cloud Platform, 4 VCPUs, 15 or 16 GB RAM

https://software.broadinstitute.org/gatk/documentation/article?id=11136

GATK in a Docker container : https://software.broadinstitute.org/gatk/documentation/article?id=11090

3. Obtain the data (genome/reference and analysis datasets) and check contents

wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/tutorials/datasets/tutorial_11136.tar.gz

Extract the data into a directory named /home/b0d2647/gatk_data_ref
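One way to do the extraction, assuming the download is a gzip-compressed tarball (as the .tar.gz extension suggests) sitting in the current directory. The helper name extract_into is hypothetical.

```shell
#!/bin/sh
# extract_into ARCHIVE DEST: unpack a gzip-compressed tarball into DEST,
# creating DEST first if it does not exist yet.
# Hypothetical helper name; it only wraps mkdir and tar.
extract_into() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Usage with the paths from the text above:
# extract_into tutorial_11136.tar.gz /home/b0d2647/gatk_data_ref
```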

~/gatk_data_ref$ ls -l

drwxr-xr-x 4 b0d2647 b0d2647 4096 Apr 20 18:27 gatk_mutect2

(base) :~/gatk_data_ref/gatk_mutect2$ ls -l

-rw-r--r-- 1 b0d2647 b0d2647 607714 Jan 18 2018 4_NA19771.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 28618 Jan 18 2018 4_NA19771.vcf.gz.tbi

-rw-r--r-- 1 b0d2647 b0d2647 754824 Jan 18 2018 5_HG02759.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 25282 Jan 18 2018 5_HG02759.vcf.gz.tbi

-rw-r--r-- 1 b0d2647 b0d2647 3744912 Jan 18 2018 HG00190.bai

-rw-r--r-- 1 b0d2647 b0d2647 373599558 Jan 18 2018 HG00190.bam

-rw-r--r-- 1 b0d2647 b0d2647 757396 Jan 19 2018 chr17plus.interval_list

-rw-r--r-- 1 b0d2647 b0d2647 2126864 Jan 18 2018 normal.bai

-rw-r--r-- 1 b0d2647 b0d2647 437801420 Jan 18 2018 normal.bam

drwxrwxr-x 2 b0d2647 b0d2647 4096 Apr 20 18:47 output

drwxr-xr-x 2 b0d2647 b0d2647 4096 Apr 19 21:55 resources

-rw-r--r-- 1 b0d2647 b0d2647 255301 Jan 18 2018 somatic_m2.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 34573 Jan 18 2018 somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 b0d2647 b0d2647 2096768 Jan 18 2018 tumor.bai

-rw-r--r-- 1 b0d2647 b0d2647 567033677 Jan 18 2018 tumor.bam

-rw-r--r-- 1 b0d2647 b0d2647 15312 Jan 18 2018 tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 b0d2647 b0d2647 78 Jan 18 2018 tumor_calculatecontamination.table

~/gatk_data_ref/gatk_mutect2$ ls -l resources/

-rw-r--r-- 1 b0d2647 b0d2647 112767707 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 73800 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz.tbi

-rw-r--r-- 1 b0d2647 b0d2647 594252 Jan 18 2018 chr17_m2pon.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 42439 Jan 18 2018 chr17_m2pon.vcf.gz.tbi

-rw-r--r-- 1 b0d2647 b0d2647 100230 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz

-rw-r--r-- 1 b0d2647 b0d2647 11629 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz.tbi

~/gatk_data_ref/gatk_mutect2$ ls -l output/

-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt

-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt

-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt

-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt

-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz

-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary

-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi

-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz

-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats

-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai

-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam

-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz

-rw-r--r-- 1 root root 35 Apr 20 15:40 3_HG00190.vcf.gz.stats

-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi

-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz

-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi

-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table

-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table

-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz

-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv

-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi

Note: the output of the steps below should be similar to the contents of the output folder listed above.

4. Mount the data volume (i.e., gatk_data_ref) as ‘gatk_analysis’ in docker and check contents

:~$ sudo docker run -v /home/b0d2647/gatk_data_ref/:/gatk/gatk_analysis -it broadinstitute/gatk:4.1.1.0

Docker contents

(gatk) :/gatk# ls -l

-rw-r--r-- 1 root root 3635 Mar 28 23:43 GATKConfig.EXAMPLE.properties

-rw-r--r-- 1 root root 37870 Mar 28 23:43 README.md

-rwxr-xr-x 1 root root 19494 Mar 28 23:43 gatk

-rw-r--r-- 1 root root 836229 Mar 28 23:43 gatk-completion.sh

-rw-r--r-- 1 root root 282832421 Mar 28 23:43 gatk-package-4.1.1.0-local.jar

-rw-r--r-- 1 root root 137266125 Mar 28 23:43 gatk-package-4.1.1.0-spark.jar

lrwxrwxrwx 1 root root 36 Mar 28 23:45 gatk-spark.jar -> /gatk/gatk-package-4.1.1.0-spark.jar

lrwxrwxrwx 1 root root 36 Mar 28 23:46 gatk.jar -> /gatk/gatk-package-4.1.1.0-local.jar

-rw-r--r-- 1 root root 114783 Mar 28 23:43 gatkPythonPackageArchive.zip

drwxrwxr-x 3 1002 1003 4096 Apr 20 11:51 gatk_analysis

-rw-r--r-- 1 root root 964 Mar 28 23:43 gatkcondaenv.yml

drwxr-xr-x 2 root root 69632 Mar 28 23:43 gatkdoc

-rw-r--r-- 1 root root 53 Mar 28 23:49 gatkenv.rc

-rw-r--r-- 1 root root 1984 Mar 28 23:46 install_R_packages.R

-rw-r--r-- 1 root root 648 Mar 28 23:46 run_unit_tests.sh

drwxr-xr-x 5 root root 4096 Mar 28 23:44 scripts

(gatk) :/gatk/gatk_analysis/gatk_mutect2# ls -l

-rw-r--r-- 1 1002 1003 607714 Jan 18 2018 4_NA19771.vcf.gz

-rw-r--r-- 1 1002 1003 28618 Jan 18 2018 4_NA19771.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 754824 Jan 18 2018 5_HG02759.vcf.gz

-rw-r--r-- 1 1002 1003 25282 Jan 18 2018 5_HG02759.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 3744912 Jan 18 2018 HG00190.bai

-rw-r--r-- 1 1002 1003 373599558 Jan 18 2018 HG00190.bam

-rw-r--r-- 1 1002 1003 757396 Jan 19 2018 chr17plus.interval_list

-rw-r--r-- 1 1002 1003 2126864 Jan 18 2018 normal.bai

-rw-r--r-- 1 1002 1003 437801420 Jan 18 2018 normal.bam

drwxr-xr-x 2 1002 1003 4096 Apr 19 21:55 precomputed

drwxr-xr-x 2 1002 1003 4096 Apr 19 21:55 resources

-rw-r--r-- 1 1002 1003 255301 Jan 18 2018 somatic_m2.vcf.gz

-rw-r--r-- 1 1002 1003 34573 Jan 18 2018 somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 2096768 Jan 18 2018 tumor.bai

-rw-r--r-- 1 1002 1003 567033677 Jan 18 2018 tumor.bam

-rw-r--r-- 1 1002 1003 15312 Jan 18 2018 tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 1002 1003 78 Jan 18 2018 tumor_calculatecontamination.table

(gatk) :/gatk/gatk_analysis/gatk_mutect2# ls -l resources/

-rw-r--r-- 1 1002 1003 112767707 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz

-rw-r--r-- 1 1002 1003 73800 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 594252 Jan 18 2018 chr17_m2pon.vcf.gz

-rw-r--r-- 1 1002 1003 42439 Jan 18 2018 chr17_m2pon.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 100230 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz

-rw-r--r-- 1 1002 1003 11629 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz.tbi

(gatk) :/gatk/gatk_analysis/gatk_mutect2# ls -l precomputed/

total 4456

-rw-r--r-- 1 1002 1003 17688 Jan 18 2018 10_tumor_artifact.bait_bias_detail_metrics.txt

-rw-r--r-- 1 1002 1003 1881 Jan 18 2018 10_tumor_artifact.bait_bias_summary_metrics.txt

-rw-r--r-- 1 1002 1003 1260 Jan 18 2018 10_tumor_artifact.error_summary_metrics.txt

-rw-r--r-- 1 1002 1003 14214 Jan 18 2018 10_tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 1002 1003 1895 Jan 18 2018 10_tumor_artifact.pre_adapter_summary_metrics.txt

-rw-r--r-- 1 1002 1003 297267 Jan 18 2018 11_somatic_twicefiltered.vcf.gz

-rw-r--r-- 1 1002 1003 358 Jan 18 2018 11_somatic_twicefiltered.vcf.gz.summary

-rw-r--r-- 1 1002 1003 34588 Jan 18 2018 11_somatic_twicefiltered.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 39765 Jan 18 2018 1_somatic_m2.vcf.gz

-rw-r--r-- 1 1002 1003 1535 Jan 18 2018 1_somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 84304 Jan 18 2018 2_tumor_normal_m2.bai

-rw-r--r-- 1 1002 1003 2673966 Jan 18 2018 2_tumor_normal_m2.bam

-rw-r--r-- 1 1002 1003 855035 Jan 18 2018 3_HG00190.vcf.gz

-rw-r--r-- 1 1002 1003 35118 Jan 18 2018 3_HG00190.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 85870 Jan 18 2018 6_threesamplepon.vcf.gz

-rw-r--r-- 1 1002 1003 17558 Jan 18 2018 6_threesamplepon.vcf.gz.tbi

-rw-r--r-- 1 1002 1003 53913 Jan 18 2018 7_tumor_getpileupsummaries.table

-rw-r--r-- 1 1002 1003 79 Jan 18 2018 8_tumor_calculatecontamination.table

-rw-r--r-- 1 1002 1003 267160 Jan 18 2018 9_somatic_oncefiltered.vcf.gz

-rw-r--r-- 1 1002 1003 34630 Jan 18 2018 9_somatic_oncefiltered.vcf.gz.tbi

5. Create a sites-only panel of normals (PoN) with CreateSomaticPanelOfNormals

First, run gatk Mutect2 in tumor-only mode on each normal sample to call all detectable variants. This step needs an interval list:

(gatk) :/gatk/gatk_analysis/gatk_mutect2# cat chr17plus.interval_list | head -n 3

@HD VN:1.5 SO:unsorted

@SQ SN:chr1 LN:248956422 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa AS:GRCh38 M5:6aef897c3d6ff0c78aff06ac189178dd SP:Human

@SQ SN:chr2 LN:242193529 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa AS:GRCh38 M5:f98db672eb0993dcfdabafe2a882905c SP:Human
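The lines starting with @ are the SAM-style header of a Picard-style interval list; the intervals themselves follow as tab-separated contig, start, end, strand and name fields, with 1-based inclusive coordinates. A small sketch for summarizing such a file, assuming that layout (the function name interval_summary is hypothetical):

```shell
#!/bin/sh
# interval_summary FILE: count the intervals in a Picard-style interval list
# and sum their lengths. Header lines (starting with "@") are skipped.
# Coordinates are treated as 1-based and inclusive, so length = end - start + 1.
# Hypothetical helper; not part of GATK or Picard.
interval_summary() {
    awk -F'\t' '!/^@/ && NF >= 3 { n++; bp += $3 - $2 + 1 }
                END { printf "%d intervals, %d bp\n", n, bp }' "$1"
}

# Usage inside the container:
# interval_summary gatk_mutect2/chr17plus.interval_list
```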

(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk Mutect2 \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/HG00190.bam \
-tumor gatk_mutect2/HG00190.bam \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
--intervals gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/3_HG00190.vcf.gz

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

…………………………

15:29:32.805 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute

15:29:42.851 INFO ProgressMeter - chr11:949097 0.2 200 1194.5

15:29:52.853 INFO ProgressMeter - chr17:233216 0.3 1970 5895.8

……………………

15:39:44.569 INFO ProgressMeter - chr17:80482649 10.2 326990 32070.2

15:39:54.589 INFO ProgressMeter - chr17:81892061 10.4 333070 32140.1

15:40:04.607 INFO ProgressMeter - chr11_KI270927v1_alt:80173 10.5 339170 32209.8

15:40:13.546 INFO Mutect2 -

934638 read(s) filtered by: (((((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonChimericOriginalAlignmentReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND ReadLengthReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)

…………………….

                                  517298 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
                                  
                                      517298 read(s) filtered by: MappingQualityReadFilter 
                                      
                          3089 read(s) filtered by: NotSecondaryAlignmentReadFilter 
                          
                      414251 read(s) filtered by: NotDuplicateReadFilter 
                      

15:40:13.546 INFO ProgressMeter - HLA-A*24:03:01:2176 10.7 339828 31822.0

15:40:13.546 INFO ProgressMeter - Traversal complete. Processed 339828 total regions in 10.7 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz

-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi

Run gatk CreateSomaticPanelOfNormals on the Mutect2 output

(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CreateSomaticPanelOfNormals \
-V gatk_mutect2/output/3_HG00190.vcf.gz \
-O gatk_mutect2/output/6_HG0190_pon.vcf.gz

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

…………..

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: CreateSomaticPanelOfNormals is a BETA tool and is not yet ready for use in production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

………………….

17:01:10.342 INFO ProgressMeter - chr17:83057965 0.0 38386 2262436.1

17:01:10.342 INFO ProgressMeter - Traversal complete. Processed 38386 total variants in 0.0 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz

-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi

6. Call mutations in the tumor in somatic mode


(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk Mutect2 \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/tumor.bam \
-I gatk_mutect2/normal.bam \
-tumor HCC1143_tumor \
-normal HCC1143_normal \
-pon gatk_mutect2/resources/chr17_m2pon.vcf.gz \
--germline-resource gatk_mutect2/resources/chr17_af-only-gnomad_grch38.vcf.gz \
--af-of-alleles-not-in-resource 0.0000025 \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
-L gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/1_somatic_m2.vcf.gz \
-bamout gatk_mutect2/output/2_tumor_normal_m2.bam

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

14:09:39.146 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute

14:09:49.154 INFO ProgressMeter - chr11:1073081 0.2 580 3477.9

14:09:59.265 INFO ProgressMeter - chr17:1678328 0.3 6540 19504.9

………..

14:14:10.085 INFO ProgressMeter - chr17:80459262 4.5 274360 60757.6

14:14:20.085 INFO ProgressMeter - chr17:82493081 4.7 281520 60124.1

14:14:29.317 INFO Mutect2 -

2495430 read(s) filtered by: (((((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonChimericOriginalAlignmentReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND ReadLengthReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)

                                  1093427 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
                                      
                                      1093427 read(s) filtered by: MappingQualityReadFilter 
                                      
                      1402003 read(s) filtered by: NotDuplicateReadFilter 
                      

14:14:29.317 INFO ProgressMeter - HLA-A*24:03:01:2615 4.8 285005 58931.8

14:14:29.317 INFO ProgressMeter - Traversal complete. Processed 285005 total regions in 4.8 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

total 1792

-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz

-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats

-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai

-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam

(gatk) :/gatk/gatk_analysis# cat gatk_mutect2/output/1_somatic_m2.vcf.gz.stats

statistic       value

callable        4810864.0

(base) :~/gatk_data_ref/gatk_mutect2$ zcat output/1_somatic_m2.vcf.gz | awk '$5 ~ ","' | head -n 2

chr17   8513688 .       GTT     G,GT    .       .       DP=298;ECNT=1;MBQ=30,29,31;MFRL=160,158,151;MMQ=60,60,60;MPOS=21,24;NALOD=0.099,-7.345e+01;NLOD=9.03,-7.382e+01;PON;POPAF=2.49,0.365;RPA=13,11,12;RU=T;STR;TLOD=4.92,215.42    GT:AD:AF:DP:F1R2:F2R1:SB        0/0:42,2,41:0.025,0.496:85:24,0,18:13,2,22:11,31,13,30 0/1/2:11,8,101:0.056,0.859:120:5,3,36:4,4,54:6,5,26,83

chr17   17869308        .       GAAAA   G,GAAAAA        .       .       DP=60;ECNT=1;MBQ=32,20,32;MFRL=136,97,144;MMQ=60,60,60;MPOS=29,28;NALOD=0.610,-4.399e+00;NLOD=2.92,-3.482e+00;PON;POPAF=1.77,2.95;RPA=15,11,16;RU=A;STR;TLOD=3.06,14.28        GT:AD:AF:DP:F1R2:F2R1:SB        0/0:9,1,3:0.097,0.269:13:5,1,1:3,0,1:8,1,1,3   0/1/2:10,2,12:0.095,0.530:24:2,2,3:7,0,9:5,5,11,3
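The awk filter above keeps records whose ALT column (field 5) contains a comma, i.e. multiallelic sites. As a sketch of how the per-allele values line up, the hypothetical helper below pairs each ALT allele with its TLOD value. It assumes tab-separated VCF columns and one TLOD entry per ALT allele, as in the two records shown.

```shell
#!/bin/sh
# alt_tlod: read VCF records on stdin and print one line per ALT allele:
# "CHROM POS ALT TLOD". Header lines (starting with "#") are skipped.
# Hypothetical helper; assumes TLOD carries one value per ALT allele.
alt_tlod() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($5, alt, ",")          # ALT alleles
            tlod = ""
            m = split($8, info, ";")         # INFO key=value pairs
            for (i = 1; i <= m; i++)
                if (info[i] ~ /^TLOD=/) tlod = substr(info[i], 6)
            split(tlod, t, ",")              # one TLOD per ALT allele
            for (i = 1; i <= n; i++)
                print $1, $2, alt[i], t[i]
        }'
}

# Usage on the host:
# zcat output/1_somatic_m2.vcf.gz | awk '$5 ~ ","' | alt_tlod
```

For the first record above this would pair G with 4.92 and GT with 215.42.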

7. Report variant statistics using bcftools

(base) :~/gatk_data_ref/gatk_mutect2$ bcftools stats ~/gatk_data_ref/gatk_mutect2/output/1_somatic_m2.vcf.gz

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.


# The command line was: bcftools stats  /home/b0d2647/gatk_data_ref/gatk_mutect2/output/1_somatic_m2.vcf.gz
#
# SN    [2]id   [3]key                                  [4]value

SN      0       number of samples:                      2

SN      0       number of records:                      111

SN      0       number of no-ALTs:                      0

SN      0       number of SNPs:                         93

SN      0       number of MNPs:                         0

SN      0       number of indels:                       18

SN      0       number of others:                       0

SN      0       number of multiallelic sites:           7

SN      0       number of multiallelic SNP sites:       0
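In the summary above, SNPs plus indels add up to the record count (93 + 18 = 111). To pull individual values out of the report for this kind of check, a lookup by key is enough. The sketch below assumes the tab-separated SN lines that bcftools stats writes; the helper name sn_value is hypothetical.

```shell
#!/bin/sh
# sn_value KEY: read `bcftools stats` output on stdin and print the value of
# the summary-number (SN) line whose key equals KEY (given without the colon).
# Hypothetical helper; assumes tab-separated fields: SN, id, key, value.
sn_value() {
    awk -F'\t' -v key="$1" '$1 == "SN" && $3 == (key ":") { print $4 }'
}

# Usage on the host:
# bcftools stats output/1_somatic_m2.vcf.gz | sn_value 'number of SNPs'
```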

8. Run GetPileupSummaries on the tumor BAM to summarize read support for a set of known variant sites

(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk GetPileupSummaries \
-I gatk_mutect2/tumor.bam \
-V gatk_mutect2/resources/chr17_small_exac_common_3_grch38.vcf.gz \
-L gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/7_tumor_getpileupsummaries.table

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: GetPileupSummaries is a BETA tool and is not yet ready for use in production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

…………

17:24:14.981 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute

17:24:24.986 INFO ProgressMeter - chr17:7024112 0.2 1093000 6555377.8

………..

17:25:04.996 INFO ProgressMeter - chr17:80627464 0.8 9299000 11155453.4

17:25:07.921 INFO GetPileupSummaries - 1509862 read(s) filtered by: (((((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)

                              ..................
                              
                              569508 read(s) filtered by: MappingQualityNotZeroReadFilter 
                              
                      22590 read(s) filtered by: PrimaryLineReadFilter 
                      
                  902676 read(s) filtered by: NotDuplicateReadFilter 
                  
          1 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter 
          
      15087 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter 
      
      

17:25:07.922 INFO ProgressMeter - chr11_KI270927v1_alt:91433 0.9 9906217 11227083.4

17:25:07.922 INFO ProgressMeter - Traversal complete. Processed 9906217 total loci in 0.9 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table

9. Calculate Contamination

(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CalculateContamination \
-I gatk_mutect2/output/7_tumor_getpileupsummaries.table \
-O gatk_mutect2/output/8_tumor_calculatecontamination.table

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

…………….

17:35:01.223 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (20) to segment; using all data points to calculate kernel matrix.

…………..

Tool returned: SUCCESS

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table

-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz
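To read the estimate out of the contamination table without relying on column order, a column can be looked up by its header name. This sketch assumes the table is tab-separated with a single header row that includes a column named contamination; that layout is an assumption worth confirming against the file itself, and the helper name table_column is hypothetical.

```shell
#!/bin/sh
# table_column NAME: print the values of the column called NAME from a
# tab-separated table with one header row, read from stdin.
# Prints nothing if no column has that name.
# Hypothetical helper; the column name used below is an assumption.
table_column() {
    awk -F'\t' -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col { print $col }'
}

# Usage inside the container:
# table_column contamination < gatk_mutect2/output/8_tumor_calculatecontamination.table
```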

10. Filter Mutect Calls

(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk FilterMutectCalls \
-R Homo_sapiens_assembly38.fasta \
-V gatk_mutect2/output/1_somatic_m2.vcf.gz \
--contamination-table gatk_mutect2/output/8_tumor_calculatecontamination.table \
-O gatk_mutect2/output/9_somatic_oncefiltered.vcf.gz

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

17:53:14.449 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/gatk_analysis/gatk_mutect2/output/1_somatic_m2.vcf.gz

……..

17:53:15.087 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute

17:53:15.088 INFO FilterMutectCalls - Starting pass 0 through the variants

17:53:15.423 INFO FilterMutectCalls - Finished pass 0 through the variants

………………….

17:53:15.726 INFO FilterMutectCalls - No variants filtered by: AllowAllVariantsVariantFilter

17:53:15.726 INFO FilterMutectCalls - No reads filtered by: AllowAllReadsReadFilter

17:53:15.726 INFO ProgressMeter - unmapped 0.0 333 31267.6

17:53:15.726 INFO ProgressMeter - Traversal complete. Processed 333 total variants in 0.0 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz

-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv

-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi

11. Collect Sequencing Artifact Metrics


(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CollectSequencingArtifactMetrics \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/tumor.bam \
-O gatk_mutect2/output/10_tumor_artifact \
-FILE_EXTENSION ".txt"

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

INFO 2019-04-20 18:09:58 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:48s. Time for last 1,000,000: 44s. Last read position: chr17:9,242,127

………….

INFO 2019-04-20 18:10:45 SinglePassSamProgram Processed 6,000,000 records. Elapsed time: 00:01:35s. Time for last 1,000,000: 17s. Last read position: chr17_KI270908v1_alt:1,059,821

[Sat Apr 20 18:10:47 UTC 2019] picard.analysis.artifacts.CollectSequencingArtifactMetrics done. Elapsed time: 1.64 minutes.

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt

-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt

-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt

-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt

(gatk) :/gatk/gatk_analysis# cat gatk_mutect2/output/10_tumor_artifact.pre_adapter_summary_metrics.txt | head -n 17

SAMPLE_ALIAS    LIBRARY REF_BASE        ALT_BASE        TOTAL_QSCORE    WORST_CXT       WORST_CXT_QSCORE        WORST_PRE_CXT   WORST_PRE_CXT_QSCORE    WORST_POST_CXT  WORST_POST_CXT_QSCORE   ARTIFACT_NAME


HCC1143_tumor   Pond-147580     A       C       60      TAC     46      AAN     53      NAC     49      NA

HCC1143_tumor   Pond-147580     A       G       49      CAT     42      CAN     46      NAT     46      NA

HCC1143_tumor   Pond-147580     A       T       100     CAC     47      AAN     100     NAA     100     NA

HCC1143_tumor   Pond-147580     C       A       100     ACA     100     ACN     100     NCA     100     NA

HCC1143_tumor   Pond-147580     C       G       100     GCA     41      GCN     47      NCA     53      NA

HCC1143_tumor   Pond-147580     C       T       46      ACG     37      ACN     42      NCA     45      Deamination

HCC1143_tumor   Pond-147580     G       A       100     CGA     42      AGN     100     NGA     100     NA

HCC1143_tumor   Pond-147580     G       C       51      AGG     41      AGN     45      NGT     45      NA

HCC1143_tumor   Pond-147580     G       T       36      CGC     33      CGN     35      NGA     35      OxoG

HCC1143_tumor   Pond-147580     T       A       47      CTA     42      CTN     45      NTA     44      NA
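TOTAL_QSCORE is Phred-scaled, so lower values point to stronger evidence of an artifact; in the table above the lowest value is 36, for G>T (OxoG). A sketch for listing the substitutions below a chosen cutoff follows. The helper name and the default cutoff of 40 are assumptions made for illustration, not values recommended by GATK or Picard.

```shell
#!/bin/sh
# low_q_artifacts FILE [CUTOFF]: print "REF>ALT TOTAL_QSCORE" for each row of a
# pre_adapter_summary_metrics file whose TOTAL_QSCORE is below CUTOFF.
# Columns are located by name from the row that starts with SAMPLE_ALIAS, so
# comment and blank lines before or after the table are ignored.
# Hypothetical helper; the default cutoff of 40 is an arbitrary example.
low_q_artifacts() {
    awk -F'\t' -v min="${2:-40}" '
        $1 == "SAMPLE_ALIAS" {               # column header row
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        col["TOTAL_QSCORE"] && NF >= col["TOTAL_QSCORE"] {
            q = $(col["TOTAL_QSCORE"])
            if (q + 0 < min + 0)
                printf "%s>%s %s\n", $(col["REF_BASE"]), $(col["ALT_BASE"]), q
        }' "$1"
}

# Usage inside the container:
# low_q_artifacts gatk_mutect2/output/10_tumor_artifact.pre_adapter_summary_metrics.txt
```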

12. Filter By Orientation Bias


(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk FilterByOrientationBias \
-V gatk_mutect2/output/9_somatic_oncefiltered.vcf.gz \
-P gatk_mutect2/output/10_tumor_artifact.pre_adapter_detail_metrics.txt \
-O gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz

Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar

……..

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: FilterByOrientationBias is an EXPERIMENTAL tool and should not be used for production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

………………

18:47:58.174 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute

18:47:58.179 INFO ProgressMeter - unmapped 0.0 222 3330000.0

18:47:58.179 INFO ProgressMeter - Traversal complete. Processed 222 total records in 0.0 minutes.

18:47:58.186 INFO OrientationBiasFilterer - HCC1143_normal: Nothing to filter.

18:47:58.199 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (G>T) pre-preAdapterQ: 0 of 2

18:47:58.200 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (C>A) pre-preAdapterQ: -1 of 0

18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (total) pre-preAdapterQ: -1

18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (G>T) post-preAdapterQ: 0

18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (C>A) post-preAdapterQ: -1

18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Adding orientation bias filter results to genotypes…

18:47:58.202 INFO OrientationBiasFilterer - Passing: HCC1143_tumor G* T p=7.0391265039049244E-9 Fob=0.38461538461538464

18:47:58.203 INFO OrientationBiasFilterer - Passing: HCC1143_tumor G* T p=0.0 Fob=0.5555555555555556

18:47:58.203 INFO OrientationBiasFilterer - Updating genotypes and creating final list of variants…

18:47:58.204 INFO ProgressMeter - Starting traversal

18:47:58.204 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute

18:47:58.206 INFO ProgressMeter - unmapped 0.0 111 3330000.0

18:47:58.206 INFO ProgressMeter - Traversal complete. Processed 111 total records in 0.0 minutes.

18:47:58.206 INFO FilterByOrientationBias - Writing variants to VCF…

18:47:58.268 INFO FilterByOrientationBias - Writing a simple summary table…

Additional directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz

-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary

-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi

(gatk) :/gatk/gatk_analysis# zcat gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz | grep -v '#' | awk '$7=="PASS"' | wc -l

24
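The count above covers only records whose FILTER column (field 7) is PASS. To see how the remaining records are distributed across filters, the same column can be tallied. This is a sketch with a hypothetical helper name; it assumes a gzip-compatible compressed VCF with tab-separated columns.

```shell
#!/bin/sh
# filter_tally FILE: count the records of a compressed VCF per FILTER value
# (column 7) and print "COUNT FILTER" lines, most frequent first.
# Header lines (starting with "#") are skipped.
# Hypothetical helper; not part of GATK or bcftools.
filter_tally() {
    gzip -dc "$1" |
        awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print n[f], f }' |
        sort -k1,1nr -k2,2
}

# Usage inside the container:
# filter_tally gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz
```

Records that fail several filters carry them as one semicolon-separated FILTER value, so each combination gets its own row.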

13. Record variant statistics

(base) :~/gatk_data_ref$ bcftools stats gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz

# This file was produced by bcftools stats (1.9+htslib-1.9) and can be plotted using plot-vcfstats.

# SN    [2]id   [3]key                                  [4]value
SN      0       number of samples:                      2
SN      0       number of records:                      111
SN      0       number of no-ALTs:                      0
SN      0       number of SNPs:                         93
SN      0       number of MNPs:                         0
SN      0       number of indels:                       18
SN      0       number of others:                       0
SN      0       number of multiallelic sites:           7
SN      0       number of multiallelic SNP sites:       0

14. Output directory contents

(gatk) :/gatk/gatk_analysis# ls -l gatk_mutect2/output/

-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt

-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt

-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt

-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt

-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt

-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz

-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary

-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi

-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz

-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats

-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi

-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai

-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam

-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz

-rw-r--r-- 1 root root 35 Apr 20 15:40 3_HG00190.vcf.gz.stats

-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi

-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz

-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi

-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table

-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table

-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz

-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv

-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi

```

