Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

M2 error with canine germline resource and variants_for_contamination files

kmegqkmegq BroadMember, Broadie

Dear GATK team,

I am running the Mutect2 pipeline on canine tumor samples in Terra, using WDL version 2.5 and GATK version 4.1.2.0. I was able to run the pipeline successfully without entering a germline resource file or VCF of common variants for contamination, however, when I did add these files in, I got the following error:

java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.lambda$getGermlineAltAlleleFrequencies$31(SomaticGenotypingEngine.java:350)
    at java.util.stream.ReferencePipeline$6$1.accept(ReferencePipeline.java:244)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
    at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
    at java.util.stream.DoublePipeline.toArray(DoublePipeline.java:506)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getGermlineAltAlleleFrequencies(SomaticGenotypingEngine.java:352)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getNegativeLogPopulationAFAnnotation(SomaticGenotypingEngine.java:335)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:141)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:250)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:324)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar Mutect2 -R gs://fc-0b0cb3ce-e2cb-4aef-a8b2-08e60d78e87c/Canis_lupus_familiaris_assembly3.fasta -I gs://fc-8268e82b-ed61-4e04-a8c9-a95a05c0952e/bda6f5ba-8928-45bf-a6b0-9fe67d8dd9a4/PreProcessingForVariantDiscovery_GATK4/cccdda67-56e1-4363-aa6c-46ce53ef8afd/call-GatherBamFiles/attempt-2/Abrams_cell.bam -tumor Abrams_1 --germline-resource gs://fc-0b0cb3ce-e2cb-4aef-a8b2-08e60d78e87c/canid_wgs_ref.1.0.no_samples.vcf.gz -pon gs://fc-afa03a31-404c-4a93-9f6a-31b673db5c69/b92f3c35-5813-455b-94dc-3de3b54f5f98/Mutect2_Panel/c9f21d8a-384e-4d17-a6f8-79a502698827/call-MergeVCFs/1-Mutect2_PON_2019-07-25T22-08-49.vcf -L gs://fc-afa03a31-404c-4a93-9f6a-31b673db5c69/f2138b33-3918-4f8a-9b87-1823a0084ac3/Mutect2/c4844164-ecad-4878-9e5d-cd134a7fb40d/call-SplitIntervals/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list -O output.vcf --f1r2-tar-gz f1r2.tar.gz --af-of-alleles-not-in-resource 0.0007 --downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6`

The germline resource is a VCF of approximately 80 million SNPs and indels (including multi allelic sites) called from a large number of canine WGS. It is formatted as a VCF with no sample information:

chr1    240     .       TG      T       464.40  PASS    AC=4;AF=0.011;AN=332;BaseQRankSum=0.674;ClippingRankSum=0;DP=14798;ExcessHet=0.0026;FS=5.63;InbreedingCoeff=-0.005;MLEAC=14;MLEAF=0.017;MQ=7.49;MQRankSum=-0.967;QD=22.11;ReadPosRankSum=0.967;SOR=3.18

The VCF for variants for contamination is a subset of this VCF, with only biallelic SNPs with AF between 0.01 and 0.2. Initially, it was formatted the same as the above file.

As part of debugging, I tried removing everything from the INFO field of the variants for contamination file, except allele frequency, and I tried using that simplified VCF both for the germline resource and the variants for contamination file. This seemed to fix the index out of bounds error, but the job then failed at the filtering step, with the following error:

java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:934)
    at org.broadinstitute.hellbender.utils.MathUtils.binomialProbability(MathUtils.java:927)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ContaminationFilter.calculateErrorProbability(ContaminationFilter.java:56)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2VariantFilter.errorProbability(Mutect2VariantFilter.java:15)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.lambda$new$1(ErrorProbabilities.java:19)
    at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.<init>(ErrorProbabilities.java:19)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.accumulateData(Mutect2FilteringEngine.java:141)
    at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.nthPassApply(FilterMutectCalls.java:146)
    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)
    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)
    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6500m -jar /root/gatk.jar FilterMutectCalls -V gs://fc-afa03a31-404c-4a93-9f6a-31b673db5c69/0bbb4e0e-7293-4ce5-b81f-d722fcec561a/Mutect2/223610c8-ec63-4439-b339-9503ceb80828/call-MergeVCFs/Abrams_cell-unfiltered.vcf -R gs://fc-0b0cb3ce-e2cb-4aef-a8b2-08e60d78e87c/Canis_lupus_familiaris_assembly3.fasta -O Abrams_cell-filtered.vcf --contamination-table /cromwell_root/fc-afa03a31-404c-4a93-9f6a-31b673db5c69/0bbb4e0e-7293-4ce5-b81f-d722fcec561a/Mutect2/223610c8-ec63-4439-b339-9503ceb80828/call-CalculateContamination/contamination.table --tumor-segmentation /cromwell_root/fc-afa03a31-404c-4a93-9f6a-31b673db5c69/0bbb4e0e-7293-4ce5-b81f-d722fcec561a/Mutect2/223610c8-ec63-4439-b339-9503ceb80828/call-CalculateContamination/segments.table --ob-priors /cromwell_root/fc-afa03a31-404c-4a93-9f6a-31b673db5c69/0bbb4e0e-7293-4ce5-b81f-d722fcec561a/Mutect2/223610c8-ec63-4439-b339-9503ceb80828/call-LearnReadOrientationModel/artifact-priors.tar.gz -stats /cromwell_root/fc-afa03a31-404c-4a93-9f6a-31b673db5c69/0bbb4e0e-7293-4ce5-b81f-d722fcec561a/Mutect2/223610c8-ec63-4439-b339-9503ceb80828/call-MergeStats/merged.stats --filtering-stats filtering.stats --min-median-read-position 10

Both of these tests were run on an interval that included a single chromosome (approximately 24Mb).

Thank you for your help!

Best,
Kate

Issue · Github
by SChaluvadi

Issue Number
6098
State
closed
Last Updated
Assignee
Array
Closed By
davidbenjamin

Best Answer

Answers

  • SChaluvadiSChaluvadi Member, Broadie, Moderator admin

    Here is the GitHub issue you can follow for progress updates: https://github.com/broadinstitute/gatk/issues/6098

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @kmegq Could you share your germline resource file? It doesn't have to be the whole thing. Any small chunk that reproduces the error will do.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @kmegq FilterMutectCalls in GATK 4.1.4 works with the simplified germline resource VCF. By the way, the reason this VCF fixes the out of bounds error is because it does not contain the site above.

    So, I think the only remaining issue in GATK 4.1.4 is the input VCF.

  • kmegqkmegq BroadMember, Broadie

    @davidben thank you for your reply, and for finding the problem with our VCF!

    Hmm. I am also confused as to why that last AF is missing. The VCF was produced using the following tools:

    1. Variant calling was performed using HaplotypeCaller followed by GenotypeGVCFs (GATK nightly build 2016-06-24)

    2. Samples with low coverage/call rate were filtered out using BCFtools.
      bcftools view -Ov -S retainSamples.txt VCF.vcf | bcftools reheader -s renamedSamples.txt - | bgzip -c

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @kmegq I wouldn't be shocked if the fault is with HaplotypeCaller. In the years since that build we have fixed several bugs related to multiallelic sites. If it's too much to re-run HaplotypeCaller with the latest GATK release you could probably find the offending sites one-by-one with a Python or R script. You would probably want to run VariantsToTable to output just the position, the alt alleles, and the AF, and then do something like make sure the alt alleles and AF have the same number of commas.

  • kmegqkmegq BroadMember, Broadie

    Thank you, @davidben!
    We used the vcffixup script from vcflib, which we were running anyway to recalculate the allele frequencies after subsetting the VCF, and it fixed the problem.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    I'm very glad to hear that!

Sign In or Register to comment.