Mutect2 (GATK4) annotations

alexcharneyalexcharney Mount SinaiMember
edited February 15 in Ask the GATK team

Hi,

I've encountered some issues when including annotations running Mutect2 in GATK4 (v4.0.1.2). My use-case is an unsupported feature of GATK4 (calling somatic variants without a matched normal sample), though if I am not mistaken the errors are unrelated to my use-case so I am posting (apologies if I am mistaken). The files needed to reproduce my example are attached to this message, as are all of the files generated in the code below.

I am able to successfully run Mutect2 using the following command:

$ gatk --java-options '-Xmx6g' Mutect2 \
       --reference human_g1k_v37.fasta \
       --input subset.bam \
       --intervals interval.bed \
       --tumor-sample Pitt-DNA-PFC-BP-1482 \
       --panel-of-normals pon.vcf.gz \
       --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
       --output success.vcf.gz &> success.vcf.gz.log

This generates output for 212 variants:

$ zgrep -v ^"\#" success.vcf.gz | wc -l
212

I next aimed to include annotations in my output. Since somatic calls are prone to noise, I tried including all of the annotations available, as listed here. The resulting vcf had no variants. To identify which annotations were causing the issue, I ran the caller using one annotation at a time:

annotations="AS_BaseQualityRankSumTest AS_FisherStrand AS_InbreedingCoeff AS_MappingQualityRankSumTest AS_QualByDepth AS_RMSMappingQuality AS_ReadPosRankSumTest AS_StrandOddsRatio BaseQuality BaseQualityRankSumTest ChromosomeCounts ClippingRankSumTest Coverage DepthPerAlleleBySample DepthPerSampleHC ExcessHet FisherStrand FragmentLength GenotypeSummaries InbreedingCoeff LikelihoodRankSumTest MappingQuality MappingQualityRankSumTest MappingQualityZero OxoGReadCounts PossibleDeNovo QualByDepth RMSMappingQuality ReadPosRankSumTest ReadPosition ReferenceBases SampleList StrandArtifact StrandBiasBySample StrandOddsRatio TandemRepeat UniqueAltReadCount"

$ for i in ${annotations}
  do
    gatk --java-options '-Xmx6g' Mutect2 \
       --reference human_g1k_v37.fasta \
       --input subset.bam \
       --intervals interval.bed \
       --tumor-sample Pitt-DNA-PFC-BP-1482 \
       --panel-of-normals pon.vcf.gz \
       --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
       --annotation ${i} \
       --output  ${i}.vcf.gz &> ${i}.vcf.gz.log
    echo `zgrep -v ^"\#" ${i}.vcf.gz | wc -l` ${i} >> nvar.txt
  done

$ sort -nk1 nvar.txt
0 AS_BaseQualityRankSumTest
0 AS_FisherStrand
0 AS_MappingQualityRankSumTest
0 AS_RMSMappingQuality
0 AS_ReadPosRankSumTest
0 AS_StrandOddsRatio
22 BaseQualityRankSumTest
212 AS_InbreedingCoeff
212 AS_QualByDepth
212 BaseQuality
212 ChromosomeCounts
212 ClippingRankSumTest
212 Coverage
212 DepthPerAlleleBySample
212 DepthPerSampleHC
212 ExcessHet
212 FisherStrand
212 FragmentLength
212 GenotypeSummaries
212 InbreedingCoeff
212 LikelihoodRankSumTest
212 MappingQuality
212 MappingQualityRankSumTest
212 MappingQualityZero
212 OxoGReadCounts
212 PossibleDeNovo
212 QualByDepth
212 RMSMappingQuality
212 ReadPosRankSumTest
212 ReadPosition
212 ReferenceBases
212 SampleList
212 StrandArtifact
212 StrandBiasBySample
212 StrandOddsRatio
212 TandemRepeat
212 UniqueAltReadCount

Most annotations work fine, but 2 separate issues are noted. First, allele-specific annotations don't work. The error produced when these 6 annotations are used is the same - "Key [annotation] in VariantContext field INFO at 1:222906294 but this key isn't defined in the VCFHeader." Here is the full stacktrace for 1 of these:

$ cat AS_BaseQualityRankSumTest.vcf.gz.log

Using GATK jar /sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx6g -jar /sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar Mutect2 --reference /sc/orga/projects/ngs/resources/gatk/2.8/human_g1k_v37.fasta --input subset.bam --intervals interval.bed --tumor-sample Pitt-DNA-PFC-BP-1482 --panel-of-normals pon.vcf.gz --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter --annotation AS_BaseQualityRankSumTest --output AS_BaseQualityRankSumTest.vcf.gz
07:27:39.203 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
07:27:39.364 INFO  Mutect2 - ------------------------------------------------------------
07:27:39.364 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.0.1.2
07:27:39.364 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
07:27:39.365 INFO  Mutect2 - Executing as [email protected] on Linux v2.6.32-358.6.2.el6.x86_64 amd64
07:27:39.365 INFO  Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_66-b17
07:27:39.365 INFO  Mutect2 - Start Date/Time: February 15, 2018 7:27:39 AM EST
07:27:39.365 INFO  Mutect2 - ------------------------------------------------------------
07:27:39.365 INFO  Mutect2 - ------------------------------------------------------------
07:27:39.366 INFO  Mutect2 - HTSJDK Version: 2.14.1
07:27:39.366 INFO  Mutect2 - Picard Version: 2.17.2
07:27:39.366 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 1
07:27:39.366 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:27:39.367 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:27:39.367 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:27:39.367 INFO  Mutect2 - Deflater: IntelDeflater
07:27:39.367 INFO  Mutect2 - Inflater: IntelInflater
07:27:39.367 INFO  Mutect2 - GCS max retries/reopens: 20
07:27:39.367 INFO  Mutect2 - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
07:27:39.367 INFO  Mutect2 - Initializing engine
07:27:40.217 INFO  FeatureManager - Using codec VCFCodec to read file file:///sc/orga/projects/psychgen/collab/somatic/data/variants/snv/mutect/cmc/scripts/tmp/example/pon.vcf.gz
07:27:40.262 INFO  FeatureManager - Using codec BEDCodec to read file file:///sc/orga/projects/psychgen/collab/somatic/data/variants/snv/mutect/cmc/scripts/tmp/example/interval.bed
07:27:40.271 INFO  IntervalArgumentCollection - Processing 1238270 bp from intervals
07:27:40.284 INFO  Mutect2 - Done initializing engine
07:27:41.809 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_utils.so
07:27:41.811 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
07:27:41.833 WARN  NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/tmp/charna02/libgkl_pairhmm_omp6338781261960552366.so: /hpc/packages/minerva-common/gcc/4.8.2/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /tmp/charna02/libgkl_pairhmm_omp6338781261960552366.so))
07:27:41.833 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
07:27:41.833 INFO  NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
07:27:41.911 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
07:27:41.912 WARN  IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
07:27:41.912 INFO  PairHMM - Using the AVX-accelerated native PairHMM implementation
07:27:41.975 INFO  ProgressMeter - Starting traversal
07:27:41.976 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
07:27:43.811 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.00303848
07:27:43.811 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.172155776
07:27:43.812 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.17 sec
07:27:43.812 INFO  Mutect2 - Shutting down engine
[February 15, 2018 7:27:43 AM EST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=2403860480
java.lang.IllegalStateException: Key AS_RAW_BaseQRankSum found in VariantContext field INFO at 1:222906294 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
    at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:173)
    at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:112)
    at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:224)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:186)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:277)

Second, for the BaseQualityRankSumTest annotation there is an error that reads "Somehow the requested coordinate is not covered by the read." From other posts, I have gathered that in earlier GATK versions this error was related to multi-threading and could be resolved using the "-nct 1" argument. Is there an equivalent fix for GATK4? I naively tried setting the "--native-pair-hmm-threads" argument to 1 (since it has the word thread in it), but this didn't fix the issue. Here is the full stacktrace for this error:

$ cat BaseQualityRankSumTest.vcf.gz.log 

Using GATK jar /sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx6g -jar /sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar Mutect2 --reference /sc/orga/projects/ngs/resources/gatk/2.8/human_g1k_v37.fasta --input subset.bam --intervals interval.bed --tumor-sample Pitt-DNA-PFC-BP-1482 --panel-of-normals pon.vcf.gz --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter --annotation BaseQualityRankSumTest --output BaseQualityRankSumTest.vcf.gz
07:30:38.698 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
07:30:38.858 INFO  Mutect2 - ------------------------------------------------------------
07:30:38.859 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.0.1.2
07:30:38.859 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
07:30:38.859 INFO  Mutect2 - Executing as [email protected] on Linux v2.6.32-358.6.2.el6.x86_64 amd64
07:30:38.859 INFO  Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_66-b17
07:30:38.860 INFO  Mutect2 - Start Date/Time: February 15, 2018 7:30:38 AM EST
07:30:38.860 INFO  Mutect2 - ------------------------------------------------------------
07:30:38.860 INFO  Mutect2 - ------------------------------------------------------------
07:30:38.861 INFO  Mutect2 - HTSJDK Version: 2.14.1
07:30:38.861 INFO  Mutect2 - Picard Version: 2.17.2
07:30:38.861 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 1
07:30:38.861 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:30:38.861 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:30:38.861 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:30:38.861 INFO  Mutect2 - Deflater: IntelDeflater
07:30:38.861 INFO  Mutect2 - Inflater: IntelInflater
07:30:38.862 INFO  Mutect2 - GCS max retries/reopens: 20
07:30:38.862 INFO  Mutect2 - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
07:30:38.862 INFO  Mutect2 - Initializing engine
07:30:39.719 INFO  FeatureManager - Using codec VCFCodec to read file file:///sc/orga/projects/psychgen/collab/somatic/data/variants/snv/mutect/cmc/scripts/tmp/example/pon.vcf.gz
07:30:39.764 INFO  FeatureManager - Using codec BEDCodec to read file file:///sc/orga/projects/psychgen/collab/somatic/data/variants/snv/mutect/cmc/scripts/tmp/example/interval.bed
07:30:39.772 INFO  IntervalArgumentCollection - Processing 1238270 bp from intervals
07:30:39.786 INFO  Mutect2 - Done initializing engine
07:30:41.296 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_utils.so
07:30:41.300 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
07:30:41.321 WARN  NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/tmp/charna02/libgkl_pairhmm_omp348664434292156423.so: /hpc/packages/minerva-common/gcc/4.8.2/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /tmp/charna02/libgkl_pairhmm_omp348664434292156423.so))
07:30:41.322 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
07:30:41.322 INFO  NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/sc/orga/work/charna02/packages/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
07:30:41.398 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
07:30:41.399 WARN  IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
07:30:41.399 INFO  PairHMM - Using the AVX-accelerated native PairHMM implementation
07:30:41.473 INFO  ProgressMeter - Starting traversal
07:30:41.474 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
07:30:46.716 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.007221282000000001
07:30:46.716 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.33110204800000004
07:30:46.716 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.52 sec
07:30:46.716 INFO  Mutect2 - Shutting down engine
[February 15, 2018 7:30:46 AM EST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.13 minutes.
Runtime.totalMemory()=2349858816
org.broadinstitute.hellbender.exceptions.GATKException: Somehow the requested coordinate is not covered by the read. Alignment 222991956 | 25M
    at org.broadinstitute.hellbender.utils.read.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:797)
    at org.broadinstitute.hellbender.utils.read.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:674)
    at org.broadinstitute.hellbender.utils.read.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:632)
    at org.broadinstitute.hellbender.tools.walkers.annotator.BaseQualityRankSumTest.getReadBaseQuality(BaseQualityRankSumTest.java:42)
    at org.broadinstitute.hellbender.tools.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:37)
    at org.broadinstitute.hellbender.tools.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:94)
    at org.broadinstitute.hellbender.tools.walkers.annotator.RankSumTest.annotate(RankSumTest.java:54)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:305)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:161)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:240)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:186)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:277)

I appreciate any help, as well as the ongoing excellent work you all do.

Cheers,
Alex Charney

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @alexcharney
    Hi Alex,

    I think the major issue here is that annotations are not quite configured properly in HaplotypeCaller/Mutect2 yet. There are plans to have them all work properly, and you can keep track of the issues here and here. I suspect once those fixes/features are in, most of your issues will be solved.

    However, I am not sure why BaseQualityRankSumTest is throwing that error. Can you confirm your BAM file validates with ValidateSamFile?

    Thanks,
    Sheila

Sign In or Register to comment.