1. Relevant notes
-The panel of normals (PoN) represents not only common germline variant sites but also commonly noisy sites in sequencing data, e.g. mapping artifacts or other somewhat random but systematic sequencing artifacts.
-When using a population germline resource, consider adjusting the --af-of-alleles-not-in-resource parameter from its default of 0.001.
-For example, the gnomAD resource af-only-gnomad_grch38.vcf.gz represents ~200k exomes and ~16k genomes. The tutorial data is exome data, so we set --af-of-alleles-not-in-resource to 0.0000025, which corresponds to 1/(2 * number of exome samples).
-The default of 0.001 is appropriate for human sample analyses without any population resource. It is based on the average rate of heterozygosity in humans.
-The population allele frequencies (POP_AF) and the af-of-alleles-not-in-resource value both factor into the probability calculation of a variant being somatic.
-The tutorial uses BAM files generated by aligning the FASTQ files to the hg38 reference genome with the bwa mem algorithm.
-The normal and tumor BAM files are used together to identify mutations that are present exclusively in the tumor tissue.
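The value 0.0000025 in the notes above follows from 1/(2 * N) with N of about 200,000 exome samples, i.e. one allele out of two per diploid sample. A minimal sketch of that arithmetic, assuming only a POSIX shell with awk; the function name af_not_in_resource is hypothetical and is not part of GATK:

```shell
# Sketch: derive a value for --af-of-alleles-not-in-resource as 1 / (2 * N),
# where N is the number of samples in the germline resource that match the
# assay type. af_not_in_resource is a hypothetical helper, not a GATK tool.
af_not_in_resource() {
    # $1 = number of samples (N)
    awk -v n="$1" 'BEGIN { printf "%.7f\n", 1 / (2 * n) }'
}

# ~200k exomes, the figure quoted in the notes for the gnomAD resource
af_not_in_resource 200000
```

With 200000 as input this prints 0.0000025, the value passed to Mutect2 later in these notes. For whole-genome data the genome sample count would be used instead.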
3. Obtain the data (genome/reference and analysis datasets) and check contents
Extract the data into a directory named /home/b0d2647/gatk_data_ref
~/gatk_data_ref$ ls -l
drwxr-xr-x 4 b0d2647 b0d2647 4096 Apr 20 18:27 gatk_mutect2
(base) b0d2647@vcpu4-16gb:~/gatk_data_ref/gatk_mutect2$ ls -l
-rw-r--r-- 1 b0d2647 b0d2647 607714 Jan 18 2018 4_NA19771.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 28618 Jan 18 2018 4_NA19771.vcf.gz.tbi
-rw-r--r-- 1 b0d2647 b0d2647 754824 Jan 18 2018 5_HG02759.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 25282 Jan 18 2018 5_HG02759.vcf.gz.tbi
-rw-r--r-- 1 b0d2647 b0d2647 3744912 Jan 18 2018 HG00190.bai
-rw-r--r-- 1 b0d2647 b0d2647 373599558 Jan 18 2018 HG00190.bam
-rw-r--r-- 1 b0d2647 b0d2647 757396 Jan 19 2018 chr17plus.interval_list
-rw-r--r-- 1 b0d2647 b0d2647 2126864 Jan 18 2018 normal.bai
-rw-r--r-- 1 b0d2647 b0d2647 437801420 Jan 18 2018 normal.bam
drwxrwxr-x 2 b0d2647 b0d2647 4096 Apr 20 18:47 output
drwxr-xr-x 2 b0d2647 b0d2647 4096 Apr 19 21:55 resources
-rw-r--r-- 1 b0d2647 b0d2647 255301 Jan 18 2018 somatic_m2.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 34573 Jan 18 2018 somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 b0d2647 b0d2647 2096768 Jan 18 2018 tumor.bai
-rw-r--r-- 1 b0d2647 b0d2647 567033677 Jan 18 2018 tumor.bam
-rw-r--r-- 1 b0d2647 b0d2647 15312 Jan 18 2018 tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 b0d2647 b0d2647 78 Jan 18 2018 tumor_calculatecontamination.table
~/gatk_data_ref/gatk_mutect2$ ls -l resources/
-rw-r--r-- 1 b0d2647 b0d2647 112767707 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 73800 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz.tbi
-rw-r--r-- 1 b0d2647 b0d2647 594252 Jan 18 2018 chr17_m2pon.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 42439 Jan 18 2018 chr17_m2pon.vcf.gz.tbi
-rw-r--r-- 1 b0d2647 b0d2647 100230 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz
-rw-r--r-- 1 b0d2647 b0d2647 11629 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz.tbi
~/gatk_data_ref/gatk_mutect2$ ls -l output/
-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt
-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt
-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt
-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt
-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz
-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary
-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi
-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz
-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats
-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai
-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam
-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz
-rw-r--r-- 1 root root 35 Apr 20 15:40 3_HG00190.vcf.gz.stats
-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi
-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz
-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi
-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table
-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table
-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz
-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv
-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi
** Our output should be similar to what is in the output folder.
4. Mount the data volume (i.e., gatk_data_ref) as ‘gatk_analysis’ in docker and check contents
b0d2647@vcpu4-16gb:~$ sudo docker run -v /home/b0d2647/gatk_data_ref/:/gatk/gatk_analysis -it broadinstitute/gatk:4.1.1.0
Docker contents
-rw-r--r-- 1 root root 3635 Mar 28 23:43 GATKConfig.EXAMPLE.properties
-rw-r--r-- 1 root root 37870 Mar 28 23:43 README.md
-rwxr-xr-x 1 root root 19494 Mar 28 23:43 gatk
-rw-r--r-- 1 root root 836229 Mar 28 23:43 gatk-completion.sh
-rw-r--r-- 1 root root 282832421 Mar 28 23:43 gatk-package-4.1.1.0-local.jar
-rw-r--r-- 1 root root 137266125 Mar 28 23:43 gatk-package-4.1.1.0-spark.jar
lrwxrwxrwx 1 root root 36 Mar 28 23:45 gatk-spark.jar -> /gatk/gatk-package-4.1.1.0-spark.jar
lrwxrwxrwx 1 root root 36 Mar 28 23:46 gatk.jar -> /gatk/gatk-package-4.1.1.0-local.jar
-rw-r--r-- 1 root root 114783 Mar 28 23:43 gatkPythonPackageArchive.zip
drwxrwxr-x 3 1002 1003 4096 Apr 20 11:51 gatk_analysis
-rw-r--r-- 1 root root 964 Mar 28 23:43 gatkcondaenv.yml
drwxr-xr-x 2 root root 69632 Mar 28 23:43 gatkdoc
-rw-r--r-- 1 root root 53 Mar 28 23:49 gatkenv.rc
-rw-r--r-- 1 root root 1984 Mar 28 23:46 install_R_packages.R
-rw-r--r-- 1 root root 648 Mar 28 23:46 run_unit_tests.sh
drwxr-xr-x 5 root root 4096 Mar 28 23:44 scripts
(gatk) root@4f080bb9249f:/gatk/gatk_analysis/gatk_mutect2# ls -l
-rw-r--r-- 1 1002 1003 607714 Jan 18 2018 4_NA19771.vcf.gz
-rw-r--r-- 1 1002 1003 28618 Jan 18 2018 4_NA19771.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 754824 Jan 18 2018 5_HG02759.vcf.gz
-rw-r--r-- 1 1002 1003 25282 Jan 18 2018 5_HG02759.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 3744912 Jan 18 2018 HG00190.bai
-rw-r--r-- 1 1002 1003 373599558 Jan 18 2018 HG00190.bam
-rw-r--r-- 1 1002 1003 757396 Jan 19 2018 chr17plus.interval_list
-rw-r--r-- 1 1002 1003 2126864 Jan 18 2018 normal.bai
-rw-r--r-- 1 1002 1003 437801420 Jan 18 2018 normal.bam
drwxr-xr-x 2 1002 1003 4096 Apr 19 21:55 precomputed
drwxr-xr-x 2 1002 1003 4096 Apr 19 21:55 resources
-rw-r--r-- 1 1002 1003 255301 Jan 18 2018 somatic_m2.vcf.gz
-rw-r--r-- 1 1002 1003 34573 Jan 18 2018 somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 2096768 Jan 18 2018 tumor.bai
-rw-r--r-- 1 1002 1003 567033677 Jan 18 2018 tumor.bam
-rw-r--r-- 1 1002 1003 15312 Jan 18 2018 tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 1002 1003 78 Jan 18 2018 tumor_calculatecontamination.table
(gatk) root@4f080bb9249f:/gatk/gatk_analysis/gatk_mutect2# ls -l resources/
-rw-r--r-- 1 1002 1003 112767707 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz
-rw-r--r-- 1 1002 1003 73800 Jan 18 2018 chr17_af-only-gnomad_grch38.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 594252 Jan 18 2018 chr17_m2pon.vcf.gz
-rw-r--r-- 1 1002 1003 42439 Jan 18 2018 chr17_m2pon.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 100230 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz
-rw-r--r-- 1 1002 1003 11629 Jan 18 2018 chr17_small_exac_common_3_grch38.vcf.gz.tbi
(gatk) root@4f080bb9249f:/gatk/gatk_analysis/gatk_mutect2# ls -l precomputed/
total 4456
-rw-r--r-- 1 1002 1003 17688 Jan 18 2018 10_tumor_artifact.bait_bias_detail_metrics.txt
-rw-r--r-- 1 1002 1003 1881 Jan 18 2018 10_tumor_artifact.bait_bias_summary_metrics.txt
-rw-r--r-- 1 1002 1003 1260 Jan 18 2018 10_tumor_artifact.error_summary_metrics.txt
-rw-r--r-- 1 1002 1003 14214 Jan 18 2018 10_tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 1002 1003 1895 Jan 18 2018 10_tumor_artifact.pre_adapter_summary_metrics.txt
-rw-r--r-- 1 1002 1003 297267 Jan 18 2018 11_somatic_twicefiltered.vcf.gz
-rw-r--r-- 1 1002 1003 358 Jan 18 2018 11_somatic_twicefiltered.vcf.gz.summary
-rw-r--r-- 1 1002 1003 34588 Jan 18 2018 11_somatic_twicefiltered.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 39765 Jan 18 2018 1_somatic_m2.vcf.gz
-rw-r--r-- 1 1002 1003 1535 Jan 18 2018 1_somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 84304 Jan 18 2018 2_tumor_normal_m2.bai
-rw-r--r-- 1 1002 1003 2673966 Jan 18 2018 2_tumor_normal_m2.bam
-rw-r--r-- 1 1002 1003 855035 Jan 18 2018 3_HG00190.vcf.gz
-rw-r--r-- 1 1002 1003 35118 Jan 18 2018 3_HG00190.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 85870 Jan 18 2018 6_threesamplepon.vcf.gz
-rw-r--r-- 1 1002 1003 17558 Jan 18 2018 6_threesamplepon.vcf.gz.tbi
-rw-r--r-- 1 1002 1003 53913 Jan 18 2018 7_tumor_getpileupsummaries.table
-rw-r--r-- 1 1002 1003 79 Jan 18 2018 8_tumor_calculatecontamination.table
-rw-r--r-- 1 1002 1003 267160 Jan 18 2018 9_somatic_oncefiltered.vcf.gz
-rw-r--r-- 1 1002 1003 34630 Jan 18 2018 9_somatic_oncefiltered.vcf.gz.tbi
5. Create a sites-only panel of normals (PoN) with CreateSomaticPanelOfNormals
First, run gatk Mutect2 in tumor-only mode on each normal sample to call all detectable variants. This step needs an interval list.
(gatk) root@b19fa3f61650:/gatk/gatk_analysis/gatk_mutect2# cat chr17plus.interval_list | head -n 3
@HD VN:1.5 SO:unsorted
@SQ SN:chr1 LN:248956422 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa AS:GRCh38 M5:6aef897c3d6ff0c78aff06ac189178dd SP:Human
@SQ SN:chr2 LN:242193529 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa AS:GRCh38 M5:f98db672eb0993dcfdabafe2a882905c SP:Human
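The header lines above come from the sequence dictionary, so on their own they do not show which contigs actually carry intervals. A small sketch for counting intervals per contig, assuming the usual Picard interval_list layout (header lines starting with @, followed by tab-separated contig, start, end, strand, and name); intervals_per_contig is a hypothetical helper name:

```shell
# Sketch: count intervals per contig in a Picard-style interval list read
# from stdin. intervals_per_contig is a hypothetical helper, not a GATK tool.
intervals_per_contig() {
    awk -F '\t' '
        /^@/ { next }          # skip @HD, @SQ and any other header lines
        NF >= 3 { n[$1]++ }    # one body line = one interval
        END { for (c in n) print c "\t" n[c] }
    ' | LC_ALL=C sort
}

# Possible usage inside the container (path taken from the listing above):
# intervals_per_contig < gatk_mutect2/chr17plus.interval_list
```

The output is one line per contig with its interval count, which can help explain why contigs other than chr17 show up in the ProgressMeter lines of the Mutect2 log.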
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk Mutect2 \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/HG00190.bam \
-tumor gatk_mutect2/HG00190.bam \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
--intervals gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/3_HG00190.vcf.gz
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
…………………………
15:29:32.805 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:29:42.851 INFO ProgressMeter - chr11:949097 0.2 200 1194.5
15:29:52.853 INFO ProgressMeter - chr17:233216 0.3 1970 5895.8
……………………
15:39:44.569 INFO ProgressMeter - chr17:80482649 10.2 326990 32070.2
15:39:54.589 INFO ProgressMeter - chr17:81892061 10.4 333070 32140.1
15:40:04.607 INFO ProgressMeter - chr11_KI270927v1_alt:80173 10.5 339170 32209.8
15:40:13.546 INFO Mutect2 -
934638 read(s) filtered by: (((((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter)AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonChimericOriginalAlignmentReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND ReadLengthReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
…………………….
517298 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
517298 read(s) filtered by: MappingQualityReadFilter
3089 read(s) filtered by: NotSecondaryAlignmentReadFilter
414251 read(s) filtered by: NotDuplicateReadFilter
15:40:13.546 INFO ProgressMeter - HLA-A*24:03:01:2176 10.7 339828 31822.0
15:40:13.546 INFO ProgressMeter - Traversal complete. Processed 339828 total regions in 10.7 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz
-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi
Next, run gatk CreateSomaticPanelOfNormals on the Mutect2 output:
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CreateSomaticPanelOfNormals \
-V gatk_mutect2/output/3_HG00190.vcf.gz \
-O gatk_mutect2/output/6_HG0190_pon.vcf.gz
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
…………..
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: CreateSomaticPanelOfNormals is a BETA tool and is not yet ready for use in production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
………………….
17:01:10.342 INFO ProgressMeter - chr17:83057965 0.0 38386 2262436.1
17:01:10.342 INFO ProgressMeter - Traversal complete. Processed 38386 total variants in 0.0 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz
-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi
6. Call mutations in the tumor in somatic (tumor with matched normal) mode
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk Mutect2 \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/tumor.bam \
-I gatk_mutect2/normal.bam \
-tumor HCC1143_tumor \
-normal HCC1143_normal \
-pon gatk_mutect2/resources/chr17_m2pon.vcf.gz \
--germline-resource gatk_mutect2/resources/chr17_af-only-gnomad_grch38.vcf.gz \
--af-of-alleles-not-in-resource 0.0000025 \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
-L gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/1_somatic_m2.vcf.gz \
-bamout gatk_mutect2/output/2_tumor_normal_m2.bam
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
14:09:39.146 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
14:09:49.154 INFO ProgressMeter - chr11:1073081 0.2 580 3477.9
14:09:59.265 INFO ProgressMeter - chr17:1678328 0.3 6540 19504.9
………..
14:14:10.085 INFO ProgressMeter - chr17:80459262 4.5 274360 60757.6
14:14:20.085 INFO ProgressMeter - chr17:82493081 4.7 281520 60124.1
14:14:29.317 INFO Mutect2 -
2495430 read(s) filtered by: (((((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonChimericOriginalAlignmentReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND ReadLengthReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
1093427 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
1093427 read(s) filtered by: MappingQualityReadFilter
1402003 read(s) filtered by: NotDuplicateReadFilter
14:14:29.317 INFO ProgressMeter - HLA-A*24:03:01:2615 4.8 285005 58931.8
14:14:29.317 INFO ProgressMeter - Traversal complete. Processed 285005 total regions in 4.8 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
total 1792
-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz
-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats
-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai
-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam
(gatk) root@7e308457ded6:/gatk/gatk_analysis# cat gatk_mutect2/output/1_somatic_m2.vcf.gz.stats
statistic value
callable 4810864.0
(base) b0d2647@vcpu4-16gb:~/gatk_data_ref/gatk_mutect2$ zcat output/1_somatic_m2.vcf.gz | awk '$5 ~ ","' | head -n 2
chr17 8513688 . GTT G,GT . . DP=298;ECNT=1;MBQ=30,29,31;MFRL=160,158,151;MMQ=60,60,60;MPOS=21,24;NALOD=0.099,-7.345e+01;NLOD=9.03,-7.382e+01;PON;POPAF=2.49,0.365;RPA=13,11,12;RU=T;STR;TLOD=4.92,215.42 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:42,2,41:0.025,0.496:85:24,0,18:13,2,22:11,31,13,30 0/1/2:11,8,101:0.056,0.859:120:5,3,36:4,4,54:6,5,26,83
chr17 17869308 . GAAAA G,GAAAAA . . DP=60;ECNT=1;MBQ=32,20,32;MFRL=136,97,144;MMQ=60,60,60;MPOS=29,28;NALOD=0.610,-4.399e+00;NLOD=2.92,-3.482e+00;PON;POPAF=1.77,2.95;RPA=15,11,16;RU=A;STR;TLOD=3.06,14.28 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:9,1,3:0.097,0.269:13:5,1,1:3,0,1:8,1,1,3 0/1/2:10,2,12:0.095,0.530:24:2,2,3:7,0,9:5,5,11,3
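In these multiallelic records the AF entry of each sample holds one value per ALT allele, in ALT order. A minimal sketch that pairs each ALT allele with its allele fraction for one sample column, assuming a tab-separated VCF on stdin; alt_afs is a hypothetical helper, and which column holds the tumor sample is an assumption to confirm against the #CHROM header line:

```shell
# Sketch: print each ALT allele of every VCF record together with its AF
# value in one sample column. alt_afs is a hypothetical helper name.
alt_afs() {
    # $1 = 1-based column of the sample of interest (10 = first sample)
    awk -F '\t' -v col="$1" '
        !/^#/ {
            n_alt = split($5, alt, ",")      # ALT alleles
            n_key = split($9, key, ":")      # FORMAT keys, e.g. GT:AD:AF:...
            split($col, val, ":")            # sample values, same order
            for (k = 1; k <= n_key; k++)
                if (key[k] == "AF") split(val[k], af, ",")
            for (a = 1; a <= n_alt; a++)
                print $1 ":" $2 "\t" alt[a] "\t" af[a]
        }
    '
}

# Possible usage (column 11 is assumed to be the tumor sample here):
# zcat output/1_somatic_m2.vcf.gz | awk '$5 ~ ","' | alt_afs 11
```

For the first record shown above, the column with genotype 0/1/2 would give G at 0.056 and GT at 0.859, while the column with genotype 0/0 looks like the matched normal.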
8. Run GetPileupSummaries on the tumor BAM to summarize read support for a set of known variant sites
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk GetPileupSummaries \
-I gatk_mutect2/tumor.bam \
-V gatk_mutect2/resources/chr17_small_exac_common_3_grch38.vcf.gz \
-L gatk_mutect2/chr17plus.interval_list \
-O gatk_mutect2/output/7_tumor_getpileupsummaries.table
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: GetPileupSummaries is a BETA tool and is not yet ready for use in production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
…………
17:24:14.981 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
17:24:24.986 INFO ProgressMeter - chr17:7024112 0.2 1093000 6555377.8
………..
17:25:04.996 INFO ProgressMeter - chr17:80627464 0.8 9299000 11155453.4
17:25:07.921 INFO GetPileupSummaries - 1509862 read(s) filtered by: (((((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
..................
569508 read(s) filtered by: MappingQualityNotZeroReadFilter
22590 read(s) filtered by: PrimaryLineReadFilter
902676 read(s) filtered by: NotDuplicateReadFilter
1 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
15087 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter
17:25:07.922 INFO ProgressMeter - chr11_KI270927v1_alt:91433 0.9 9906217 11227083.4
17:25:07.922 INFO ProgressMeter - Traversal complete. Processed 9906217 total loci in 0.9 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table
9. Calculate Contamination
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CalculateContamination \
-I gatk_mutect2/output/7_tumor_getpileupsummaries.table \
-O gatk_mutect2/output/8_tumor_calculatecontamination.table
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
…………….
17:35:01.223 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (20) to segment; using all data points to calculate kernel matrix.
…………..
Tool returned: SUCCESS
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table
-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz
10. Filter Mutect Calls
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk FilterMutectCalls \
-R Homo_sapiens_assembly38.fasta \
-V gatk_mutect2/output/1_somatic_m2.vcf.gz \
--contamination-table gatk_mutect2/output/8_tumor_calculatecontamination.table \
-O gatk_mutect2/output/9_somatic_oncefiltered.vcf.gz
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
17:53:14.449 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/gatk_analysis/gatk_mutect2/output/1_somatic_m2.vcf.gz
……..
17:53:15.087 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
17:53:15.088 INFO FilterMutectCalls - Starting pass 0 through the variants
17:53:15.423 INFO FilterMutectCalls - Finished pass 0 through the variants
………………….
17:53:15.726 INFO FilterMutectCalls - No variants filtered by: AllowAllVariantsVariantFilter
17:53:15.726 INFO FilterMutectCalls - No reads filtered by: AllowAllReadsReadFilter
17:53:15.726 INFO ProgressMeter - unmapped 0.0 333 31267.6
17:53:15.726 INFO ProgressMeter - Traversal complete. Processed 333 total variants in 0.0 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz
-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv
-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi
11. Collect Sequencing Artifact Metrics
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk CollectSequencingArtifactMetrics \
-R Homo_sapiens_assembly38.fasta \
-I gatk_mutect2/tumor.bam \
-O gatk_mutect2/output/10_tumor_artifact \
-FILE_EXTENSION ".txt"
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
INFO 2019-04-20 18:09:58 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:48s. Time for last 1,000,000: 44s. Last read position: chr17:9,242,127
………….
INFO 2019-04-20 18:10:45 SinglePassSamProgram Processed 6,000,000 records. Elapsed time: 00:01:35s. Time for last 1,000,000: 17s. Last read position: chr17_KI270908v1_alt:1,059,821
[Sat Apr 20 18:10:47 UTC 2019] picard.analysis.artifacts.CollectSequencingArtifactMetrics done. Elapsed time: 1.64 minutes.
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt
-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt
-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt
-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt
(gatk) root@7e308457ded6:/gatk/gatk_analysis# cat gatk_mutect2/output/10_tumor_artifact.pre_adapter_summary_metrics.txt | head -n 17
SAMPLE_ALIAS LIBRARY REF_BASE ALT_BASE TOTAL_QSCORE WORST_CXT WORST_CXT_QSCORE WORST_PRE_CXT WORST_PRE_CXT_QSCORE WORST_POST_CXT WORST_POST_CXT_QSCORE ARTIFACT_NAME
HCC1143_tumor Pond-147580 A C 60 TAC 46 AAN 53 NAC 49 NA
HCC1143_tumor Pond-147580 A G 49 CAT 42 CAN 46 NAT 46 NA
HCC1143_tumor Pond-147580 A T 100 CAC 47 AAN 100 NAA 100 NA
HCC1143_tumor Pond-147580 C A 100 ACA 100 ACN 100 NCA 100 NA
HCC1143_tumor Pond-147580 C G 100 GCA 41 GCN 47 NCA 53 NA
HCC1143_tumor Pond-147580 C T 46 ACG 37 ACN 42 NCA 45 Deamination
HCC1143_tumor Pond-147580 G A 100 CGA 42 AGN 100 NGA 100 NA
HCC1143_tumor Pond-147580 G C 51 AGG 41 AGN 45 NGT 45 NA
HCC1143_tumor Pond-147580 G T 36 CGC 33 CGN 35 NGA 35 OxoG
HCC1143_tumor Pond-147580 T A 47 CTA 42 CTN 45 NTA 44 NA
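Most rows in this table carry NA in the ARTIFACT_NAME column; the two named rows (Deamination for C>T, OxoG for G>T) are the ones worth a closer look before orientation bias filtering. A small sketch that pulls out the named rows, assuming whitespace- or tab-separated columns with the header shown above; named_artifacts is a hypothetical helper and not part of GATK or Picard:

```shell
# Sketch: list substitution classes that carry a named artifact in a
# pre_adapter_summary_metrics file read from stdin.
# named_artifacts is a hypothetical helper name.
named_artifacts() {
    awk '
        # Locate the columns by header name rather than by fixed position.
        $1 == "SAMPLE_ALIAS" {
            for (i = 1; i <= NF; i++) {
                if ($i == "REF_BASE") r = i
                if ($i == "ALT_BASE") a = i
                if ($i == "TOTAL_QSCORE") q = i
                if ($i == "ARTIFACT_NAME") n = i
            }
            next
        }
        # Data rows after the header whose artifact name is not NA.
        n && NF >= n && $n != "NA" { print $r ">" $a "\t" $q "\t" $n }
    '
}

# Possible usage:
# named_artifacts < gatk_mutect2/output/10_tumor_artifact.pre_adapter_summary_metrics.txt
```

Applied to the rows shown above, this would report C>T with TOTAL_QSCORE 46 (Deamination) and G>T with TOTAL_QSCORE 36 (OxoG).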
12. Filter By Orientation Bias
(gatk) root@7e308457ded6:/gatk/gatk_analysis# gatk FilterByOrientationBias \
-V gatk_mutect2/output/9_somatic_oncefiltered.vcf.gz \
-P gatk_mutect2/output/10_tumor_artifact.pre_adapter_detail_metrics.txt \
-O gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz
Using GATK jar /gatk/gatk-package-4.1.1.0-local.jar
……..
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Warning: FilterByOrientationBias is an EXPERIMENTAL tool and should not be used for production !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
………………
18:47:58.174 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
18:47:58.179 INFO ProgressMeter - unmapped 0.0 222 3330000.0
18:47:58.179 INFO ProgressMeter - Traversal complete. Processed 222 total records in 0.0 minutes.
18:47:58.186 INFO OrientationBiasFilterer - HCC1143_normal: Nothing to filter.
18:47:58.199 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (G>T) pre-preAdapterQ: 0 of 2
18:47:58.200 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (C>A) pre-preAdapterQ: -1 of 0
18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (total) pre-preAdapterQ: -1
18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (G>T) post-preAdapterQ: 0
18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Cutting (C>A) post-preAdapterQ: -1
18:47:58.201 INFO OrientationBiasFilterer - HCC1143_tumor: Adding orientation bias filter results to genotypes…
18:47:58.202 INFO OrientationBiasFilterer - Passing: HCC1143_tumor G* T p=7.0391265039049244E-9 Fob=0.38461538461538464
18:47:58.203 INFO OrientationBiasFilterer - Passing: HCC1143_tumor G* T p=0.0 Fob=0.5555555555555556
18:47:58.203 INFO OrientationBiasFilterer - Updating genotypes and creating final list of variants…
18:47:58.204 INFO ProgressMeter - Starting traversal
18:47:58.204 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
18:47:58.206 INFO ProgressMeter - unmapped 0.0 111 3330000.0
18:47:58.206 INFO ProgressMeter - Traversal complete. Processed 111 total records in 0.0 minutes.
18:47:58.206 INFO FilterByOrientationBias - Writing variants to VCF…
18:47:58.268 INFO FilterByOrientationBias - Writing a simple summary table…
Additional directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz
-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary
-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi
(gatk) root@7e308457ded6:/gatk/gatk_analysis# zcat gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz | grep -v '#' | awk '$7=="PASS"' | wc -l
24
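The count above covers only records whose FILTER column is PASS. To see how the remaining records were filtered, the FILTER column can be tallied as a whole. A minimal sketch, assuming a standard tab-separated VCF on stdin; tally_filters is a hypothetical helper name:

```shell
# Sketch: tally the FILTER column (field 7) of a VCF read from stdin,
# most frequent value first. tally_filters is a hypothetical helper name.
tally_filters() {
    awk -F '\t' '
        /^#/ { next }                  # skip meta and header lines only
        { n[$7]++ }                    # count each distinct FILTER value
        END { for (f in n) print n[f] "\t" f }
    ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Possible usage:
# zcat gatk_mutect2/output/11_somatic_twicefiltered.vcf.gz | tally_filters
```

Unlike grep -v '#', the awk pattern /^#/ skips only lines that begin with #, so a record that happens to contain a # somewhere in its fields would still be counted.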
13. Record variant statistics
14. Output directory contents
(gatk) root@7e308457ded6:/gatk/gatk_analysis# ls -l gatk_mutect2/output/
-rw-r--r-- 1 root root 17649 Apr 20 18:10 10_tumor_artifact.bait_bias_detail_metrics.txt
-rw-r--r-- 1 root root 1842 Apr 20 18:10 10_tumor_artifact.bait_bias_summary_metrics.txt
-rw-r--r-- 1 root root 1221 Apr 20 18:10 10_tumor_artifact.error_summary_metrics.txt
-rw-r--r-- 1 root root 14175 Apr 20 18:10 10_tumor_artifact.pre_adapter_detail_metrics.txt
-rw-r--r-- 1 root root 1856 Apr 20 18:10 10_tumor_artifact.pre_adapter_summary_metrics.txt
-rw-r--r-- 1 root root 47102 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz
-rw-r--r-- 1 root root 247 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.summary
-rw-r--r-- 1 root root 1293 Apr 20 18:47 11_somatic_twicefiltered.vcf.gz.tbi
-rw-r--r-- 1 root root 41959 Apr 20 14:14 1_somatic_m2.vcf.gz
-rw-r--r-- 1 root root 35 Apr 20 14:14 1_somatic_m2.vcf.gz.stats
-rw-r--r-- 1 root root 1286 Apr 20 14:14 1_somatic_m2.vcf.gz.tbi
-rw-r--r-- 1 root root 69584 Apr 20 14:14 2_tumor_normal_m2.bai
-rw-r--r-- 1 root root 1709147 Apr 20 14:14 2_tumor_normal_m2.bam
-rw-r--r-- 1 root root 1450207 Apr 20 15:40 3_HG00190.vcf.gz
-rw-r--r-- 1 root root 35 Apr 20 15:40 3_HG00190.vcf.gz.stats
-rw-r--r-- 1 root root 52808 Apr 20 15:40 3_HG00190.vcf.gz.tbi
-rw-r--r-- 1 root root 29292 Apr 20 17:01 6_HG0190_pon.vcf.gz
-rw-r--r-- 1 root root 77 Apr 20 17:01 6_HG0190_pon.vcf.gz.tbi
-rw-r--r-- 1 root root 46775 Apr 20 17:25 7_tumor_getpileupsummaries.table
-rw-r--r-- 1 root root 84 Apr 20 17:35 8_tumor_calculatecontamination.table
-rw-r--r-- 1 root root 45280 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz
-rw-r--r-- 1 root root 1791 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.filteringStats.tsv
-rw-r--r-- 1 root root 1299 Apr 20 17:53 9_somatic_oncefiltered.vcf.gz.tbi
```
