HashMap iterator problem with GATK 3.6 on NA12878 validations

chapmanb · Boston, MA · Member ✭✭

Hi all;
I was running validations with the latest GATK 3.6-0 release and ran into an issue on NA12878 where a region around the centromere on X fails with a HashMap NoSuchElementException. I tried to isolate it into a test case, and here is a tarball with the smallest set of regions I could reproduce it on:

https://s3.amazonaws.com/chapmanb/testcases/gatk36_hashmap_report.tar.gz

This has the inputs and a small shell script to demonstrate.

It's a bit of a confusing one to me. If I try to reduce the test case further -- to only the region that appears to fail when DEBUG is turned on -- it works. The problem seems to depend on some prior state.

Here is the full traceback:

##### ERROR --
##### ERROR stack trace 
java.util.NoSuchElementException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1431)
        at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods(HaplotypeCallerGenotypingEngine.java:336)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:264)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Any ideas for working around or avoiding this are welcome. Please let me know if I can provide any other information. Thanks for all the great work on GATK,
Brad
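
For readers unfamiliar with this exception: the top two frames show HashMap$KeyIterator.next failing inside HashMap$HashIterator.nextNode, which is what a HashMap key iterator does when next() is called after the last key has been returned, for example on the key set of an empty map. The Java sketch below is a generic illustration of that failure mode and the usual hasNext() guard. It is not GATK's code: the class, method, and map names are hypothetical, and the idea that the GATK method was left with an empty allele set in this region is an assumption consistent with the trace, not something confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Generic illustration only; not GATK code. All names are hypothetical.
public class EmptyIteratorDemo {

    // Unguarded: assumes the map has at least one key. On an empty HashMap,
    // keySet().iterator().next() throws java.util.NoSuchElementException
    // via HashMap$KeyIterator.next, as at the top of the trace.
    public static String firstKeyUnguarded(Map<String, Double> alleleLikelihoods) {
        return alleleLikelihoods.keySet().iterator().next();
    }

    // Guarded: check hasNext() first and return null for an empty map,
    // leaving the caller to decide how to handle "no alleles left".
    public static String firstKeyGuarded(Map<String, Double> alleleLikelihoods) {
        Iterator<String> keys = alleleLikelihoods.keySet().iterator();
        return keys.hasNext() ? keys.next() : null;
    }

    // True if the unguarded lookup throws NoSuchElementException for this map.
    public static boolean unguardedThrows(Map<String, Double> alleleLikelihoods) {
        try {
            firstKeyUnguarded(alleleLikelihoods);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Double> empty = new HashMap<>();
        Map<String, Double> oneAllele = new HashMap<>();
        oneAllele.put("T", -1.5); // hypothetical allele -> log10 likelihood

        System.out.println(unguardedThrows(empty));     // true
        System.out.println(unguardedThrows(oneAllele)); // false
        System.out.println(firstKeyGuarded(empty));     // null
        System.out.println(firstKeyGuarded(oneAllele)); // T
    }
}
```

The guard only moves the question to the caller: code that hits an empty set still has to decide whether to skip the site or emit nothing, which is a design choice this sketch does not make.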

Tagged:

Issue · GitHub
by Sheila

Issue Number: 984
State: closed
Closed By: chandrans

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin
    Hi Brad,

    Based on the involvement of HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods, I would say this seems to be associated with some new code that was put in to handle messy regions. Sounds like we missed a corner case in testing.

    So thanks for the test data, we'll try to get this figured out.
  • Sheila · Broad Institute · Member, Broadie admin

    @chapmanb
    Hi Brad,

    Thanks for submitting this. I just put in a bug report. You can keep track of it here.

    -Sheila

  • chapmanb · Boston, MA · Member ✭✭

    Geraldine and Sheila -- thanks much for confirming and submitting to the bug tracker. Is it possible to follow bugs in some way so I can get updates on the status and provide more information if needed? It looks like the linked GitHub issue is private. Not sure if I missed a way to do this. Thanks again.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @chapmanb Currently we don't have any automated way for communicating bug status updates, unfortunately. We've been thinking about how to do it but the dev repo being private makes it awkward. This is a limitation that won't affect GATK4 (because the live dev repos are fully public) so I'm not sure we'll be able to justify putting in the effort to resolve it for 3.x. For now we'll do our best to communicate through the forum as the status evolves and whether we need anything more from you.

  • Sheila · Broad Institute · Member, Broadie admin

    @chapmanb
    Hi Brad,

    The fix has just been merged! You can download the nightly build tomorrow, and everything should work fine :smile:

    -Sheila

  • chapmanb · Boston, MA · Member ✭✭

    Sheila and Geraldine -- brilliant, thanks so much. I'll give it a try right away. Much appreciated.

  • chapmanb · Boston, MA · Member ✭✭

    Thanks again for the fix. The nightly as of June 21st works correctly for me on the failing dataset. For anyone interested, here are the validations of GATK 3.6 (well, the nightly) against the latest Genome in a Bottle truth sets for HaplotypeCaller and a GiaB somatic mixture for MuTect2:

    http://imgur.com/a/rm4ML

    Looking forward to a new point release with this fix I can point people at. Thanks again.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin
    Great, glad to hear it! We're waiting for some SRA related code to be ready to make a point release -- I expect it will happen sometime next month (July).
  • buddej · St. Louis · Member

    Can confirm build vnightly-2016-06-21-g6a3ff12 (and later, presumably) fixes an identical error message produced with GATK 3.6 running on real data from our lab.

    Only 1 of 424 samples triggered the error, so yes, a rare case.

  • Benedikt · Germany · Member

    I get the same error with 2/3 of my samples. I'm going to try the nightly build as well now and will report back.

  • Benedikt · Germany · Member

    The nightly build also fixed this issue for me! Thx.

  • nilshomer · Boston, MA · Member

    I am getting this error on 3.6-0-g89b7209 as well. Could you update the release?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @nilshomer We're looking at cutting a 3.7 release sometime early October. In the meantime you can use the nightly build if it's blocking your work.

  • nilshomer · Boston, MA · Member

    @Geraldine_VdAuwera how about a 3.6.1 release with this fix and #7795? Both seem like showstoppers that are difficult to overcome without ignoring that region.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @nilshomer Unfortunately it's not as simple as cherry-picking these two fixes into a point release, as they rely on multiple other changes that have been made since the last release. I'm afraid October is the earliest we can expect to get a full release out. Usually this wouldn't be so difficult, but as the majority of development effort is going into GATK4 now, and my team in particular is focusing on an upcoming workshop, the GATK3 crew is especially short-handed. If this is urgent, your best bet is to use the nightly build once the fix for the other issue is in. The nightlies pass all tests and are as stable as any release we would be able to make right now. I can give you a change list if that helps.

  • apfuentes · Member

    Hello GATK community,
    I got an error similar to chapmanb's. I have GATK v3.6-0-g89b7209 (downloaded at the end of September 2016), running on a computer with 64 GB RAM and 24 processors with -nct 24 -mbq 20 -minPruning 5. Stack trace below:

    ERROR --
    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1439)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods(HaplotypeCallerGenotypingEngine.java:336)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:264)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    How could I find the nightly-2016-06-21-g6a3ff12 to fix this? I checked the Nightly builds page and the oldest entry I could see dates 2016-09-30-g026f7e8.

    Thanks in advance for any help!

  • Sheila · Broad Institute · Member, Broadie admin

    @apfuentes
    Hi,

    You can use the latest nightly build, which will still have the fix in it. Is there a reason you need the build from 6-21?

    Thanks,
    Sheila

  • apfuentes · Member

    Hi Sheila,
    Thanks for your response! OK, that sounds good. If the latest build has the fix, that is all I need. I have an additional question and was wondering if you could please help me with it. I am new to GATK, so I apologize if it is too basic.

    I am running HaplotypeCaller on individual BAM files (one BAM per population sample) to obtain individual g.vcf files. My plan is then to genotype all the g.vcf files at once with GenotypeGVCFs and obtain one VCF file.

    I have been using these settings for each bam file:
    -T HaplotypeCaller -nct 24 -R /path/ref-genome/genome.fasta --emitRefConfidence GVCF -I /path/Pop1.sorted.MarkDup.RG.bam -o Pop1.g.vcf -mbq 20 -minPruning 5

    The HaplotypeCaller documentation lists -nct as its parallelism option, so I used it for my analysis. However, I just realized that -nct is not recommended for HaplotypeCaller, and that if I used it at all I should have set it to 4.

    I already finished running 7 files with the command above (a 1.5-week job), but I am worried the data is not reliable. Only one run crashed, for 1 particular file; the others seem to have finished successfully. Should I start over using Queue? How could I parallelize this job better? Maybe set -nct to 4 and -nt to 24 and use Queue to split each BAM file by contig (scaffold)? My computer has 24 cores and 64 GB of RAM, and I am using GATK v3.6.

    Thanks in advance for any help!!

  • Lavanya · Member

    @Sheila said:
    @chapmanb
    Hi Brad,

    The fix has just been merged! You can download the nightly build tomorrow, and everything should work fine :smile:

    -Sheila

    Hi Sheila,
    We are also facing the same issue as reported earlier.

    ERROR --
    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1439)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods(HaplotypeCallerGenotypingEngine.java:336)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:264)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    Is there a release expected in October, or should I use the latest nightly version (https://software.broadinstitute.org/gatk/download/nightly)?

    Thanks and Regards
    Lavanya

  • Sheila · Broad Institute · Member, Broadie admin

    @apfuentes
    Hi,

    If the runs with -nct/-nt finish with no errors, you are good to go. The major issues arise when the runs with -nct/-nt fail with an error message. However, it sounds like most of your runs finished with no error messages. If some runs do fail, you can either re-start them again or try running without parallelism.

    I should note that with multi-threading, your results will not be 100% replicable. This is because multi-threading can give slightly different results in "edge case" calls. Some users have noted slightly different annotation values or slightly different calls. Again, these only occur in "edge case" calls. For 100% replicable results, you will have to run without multi-threading.

    We are phasing out Queue and moving to WDL for speeding up jobs. You can read more about WDL here.

    -Sheila
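
A minimal sketch, assuming floating-point summation order as the mechanism: one generic reason multithreaded code can produce slightly different numbers from run to run is that floating-point addition is not associative, so combining the same partial results in a different order can change the last digits of a computed value. The thread does not say this is the specific cause in HaplotypeCaller; the Java example below (class and method names hypothetical) only demonstrates the effect itself.

```java
// Generic illustration only; not GATK code. Names are hypothetical.
public class SummationOrderDemo {

    // Sums values front to back: ((0 + v0) + v1) + v2 ...
    public static double sumForward(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Sums the same values back to front, standing in for a different
    // order in which threads might finish and merge partial results.
    public static double sumBackward(double[] values) {
        double total = 0.0;
        for (int i = values.length - 1; i >= 0; i--) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3};
        System.out.println(sumForward(values));  // 0.6000000000000001
        System.out.println(sumBackward(values)); // 0.6
    }
}
```

The difference is tiny, which fits the description above: it only matters for a value that happens to sit right at a decision boundary, i.e. an "edge case" call.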

  • apfuentes · Member
    Thanks a lot, Sheila, for taking the time to answer my question!

    I'm thinking about the best approach to speed up HaplotypeCaller runs that produce one GVCF per BAM file while avoiding problems with multi-threading. I'm running this analysis on a single desktop computer with 24 cores and 64 GB RAM.

    Is it a good strategy for this project to use WDL to split the HaplotypeCaller run for each BAM file by contig (chromosome) and use -nct 4? I assume 4 is OK based on the GATK documentation.

    Thanks for any advice.

    Best regards,
  • Sheila · Broad Institute · Member, Broadie admin

    @Lavanya
    Hi Lavanya,

    Have a look at Geraldine's response here. In the meantime, you can start using the nightly build.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @apfuentes
    Hi,

    Sure, you can do that. A lot of users run per-chromosome :smile:

    -Sheila
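
As a rough sketch of what running per chromosome can look like, the Java snippet below reads contig names from the first column of a FASTA index (.fai) and builds one GATK3 HaplotypeCaller command per contig, restricting each run with -L and writing a per-contig GVCF. The reference and BAM file names follow the command earlier in the thread; the class and method names and the example index contents are hypothetical. The commands are only printed, so they could be handed to a WDL scatter or a simple job queue. The per-contig GVCFs for each sample still have to be combined, or genotyped per contig, afterwards, which this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only builds command strings and runs nothing.
public class PerContigCommands {

    // Extracts contig names from the text of a FASTA index (.fai):
    // the first tab-separated column of each non-empty line.
    public static List<String> contigNames(String faiText) {
        List<String> names = new ArrayList<>();
        for (String line : faiText.split("\n")) {
            if (!line.isEmpty()) {
                names.add(line.split("\t")[0]);
            }
        }
        return names;
    }

    // Builds one GATK3 HaplotypeCaller GVCF command per contig,
    // restricting each run to that contig with -L.
    public static List<String> commands(String ref, String bam, String sample, List<String> contigs) {
        List<String> cmds = new ArrayList<>();
        for (String contig : contigs) {
            cmds.add(String.join(" ",
                    "java -jar GenomeAnalysisTK.jar",
                    "-T HaplotypeCaller",
                    "-R", ref,
                    "-I", bam,
                    "-L", contig,
                    "--emitRefConfidence GVCF",
                    "-o", sample + "." + contig + ".g.vcf"));
        }
        return cmds;
    }

    public static void main(String[] args) {
        // Hypothetical two-scaffold index; real .fai lines have five columns.
        String fai = "scaffold_1\t150000\t12\t60\t61\nscaffold_2\t90000\t152524\t60\t61\n";
        List<String> contigs = contigNames(fai);
        for (String cmd : commands("genome.fasta", "Pop1.sorted.MarkDup.RG.bam", "Pop1", contigs)) {
            System.out.println(cmd);
        }
    }
}
```

Splitting by contig rather than raising -nct keeps each job single-threaded, which sidesteps the multi-threading caveats described above at the cost of managing more jobs.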

  • ekofman · Member, Broadie

    I'm getting the same error with version 3.6-0-g89b7209 which is the GATK installed on the Broad servers (both cga02 and ccpm have this version of GATK). Are there newer versions loaded on the Broad servers by any chance?

    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1439)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods(HaplotypeCallerGenotypingEngine.java:336)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:264)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
  • Sheila · Broad Institute · Member, Broadie admin

    @ekofman
    Hi,

    This issue should be fixed in the last release of GATK3. There is the very latest version (or at least a version that has the fix) on the Broad server. Let me know if you need help finding it, and I can message you privately.

    -Sheila

  • ekofman · Member, Broadie
    edited April 2018

    @Sheila That'd be great -- I think I'm using the latest version (use GATK3 with dotkit), but maybe there's another hidden one somewhere...

    These are the commands I'm using in my UGER script:

    source /broad/software/scripts/useuse
    use Java-1.8
    use GATK3

    GenomeAnalysisTK -T HaplotypeCaller -R references/mm10.fa -I D4M3A_1_reordered.deduped.recalibrated.bam -o GATK_recalibrated_D4M3A_1.vcf

  • ekofman · Member, Broadie
    edited April 2018

    @Sheila Hi Sheila I still am not able to find the latest version with the fix -- if you could message me privately when you get a chance as you had mentioned that would be much appreciated! Thanks

  • Sheila · Broad Institute · Member, Broadie admin

    @ekofman
    Hi,

    I just sent you a private message which should hopefully resolve this :smiley:

    -Sheila

  • span · USA · Member

    @Sheila

    @Geraldine_VdAuwera

    I got the same error when running version 3.6-0-g89b7209. Where can I find the nightly update? Because I am in the middle of a study, I would like as few changes from this version of GATK as possible while still getting the bug fixed.

    Would you please help?

    Thanks.

    Sam

  • bhanuGandham · Cambridge, MA · Member, Administrator, Broadie, Moderator admin

    Hi @span

    We do not support GATK3 anymore. Please upgrade to the latest version, GATK v4.1.3.0. Here is the link to the nightly builds: https://software.broadinstitute.org/gatk/download/nightly

  • span · USA · Member

    Dear bhanuGandham,

    Would you please help me get access to the nightly build mentioned earlier in this post, vnightly-2016-06-21-g6a3ff12?

    It will be greatly helpful.

    Thank you very much.
    SPan

  • bhanuGandham · Cambridge, MA · Member, Administrator, Broadie, Moderator admin