GATK 3.0 HaplotypeCaller RUNTIME ERROR

55816815 (TN; Member)
edited March 2014 in Ask the GATK team

Hi,

I am running HaplotypeCaller on ~600 BAMs to produce gVCFs on a cluster (SGE qsub system). Each node has 32 cores and 256 GB RAM. I am running 8 tasks per node, so each task has 4 cores and 32 GB of memory.

It's been running for ~30 hours now (a single BAM needs ~5 hours). Of the ~500 finished tasks, 24 quit with the error shown below. I am not sure whether the error is caused by my command or by something else. Can someone give any suggestions?

The command:

java -Xmx32G GATK3.0 \
    -T HaplotypeCaller \
    -ERC gVCF -L EZ_Exome_v2.bed \
    -variant_index_type LINEAR \
    -variant_index_parameter 128000 \
    -R ucsc.hg19.fasta \
    -nct 4 \
    --dbsnp dbsnp_138.hg19.vcf \
    -I 1.recalibrated.bam \
    -o 1.recalibrated.vcf

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:413)
at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:412)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.addMiscellaneousAllele(GenotypingEngine.java:257)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:227)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:872)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.0-0-g6bad1c6):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

thanks a lot!

Answers

  • 55816815 (TN; Member)

    Hi Geraldine,
    Thanks for the prompt response!
    I will just rerun all the failed jobs.

    Shuoguo
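Rerunning only the failed jobs means finding them first. The following is a minimal sketch, assuming each task wrote its GATK output to a `.log` file in a single directory. The directory layout and the function name `list_failed_logs` are assumptions for illustration and do not come from the thread. It searches for the runtime-error banner that appears in the output above.

```shell
#!/bin/sh
# list_failed_logs DIR
# Print the *.log files in DIR that contain the GATK runtime-error banner.
# Assumption: one log file per task, all in the same directory.
list_failed_logs() {
    # grep -l prints only the names of matching files; "|| true" keeps the
    # function from returning an error when no log matches.
    grep -l "A GATK RUNTIME ERROR has occurred" "$1"/*.log 2>/dev/null || true
}

# Hypothetical usage: report which logs belong to tasks that need a rerun.
# for log in $(list_failed_logs logs); do
#     echo "needs rerun: $log"
# done
```

Matching on the banner rather than on the exception name also catches failures with a different stack trace.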

  • tberner (germany; Member)

    Hi you two,

    has anything been done about this so far? We ran into the same problem using version 3.1-1-g07a4bf8. It appears on different datasets and after different amounts of progress. Because we have large BAM files, running in single-threaded mode would take weeks.

    ERROR stack trace

    java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:413)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:412)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.addMiscellaneousAllele(GenotypingEngine.java:257)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:227)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @tberner Not yet I'm afraid -- we haven't been able to devote any resources to this. Our current recommendation is to just drop -nct from your command if you run into this issue, or better, use a pipeline manager like Queue that parallelizes over chunks of data and restarts failed jobs automatically. Since the errors typically don't reproduce on the same chunks of data, this eventually gets everything done, and is still faster than running everything single-threaded.
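For readers not using Queue, the "drop -nct and restart failed jobs" advice can be approximated with a small retry wrapper. This is only a sketch: the function name `retry` and the attempt limit are illustrative, and it does not reproduce Queue's scatter-gather over chunks of data. It relies on the observation above that the error typically does not recur on the same data.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Run COMMAND until it exits successfully, at most MAX_ATTEMPTS times.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        echo "retrying (attempt $attempt of $max_attempts): $*" >&2
    done
}

# Hypothetical usage: the original poster's command with -nct 4 removed.
# The jar path is an assumption; adjust it to your installation.
# retry 3 java -Xmx32G -jar GenomeAnalysisTK.jar \
#     -T HaplotypeCaller -ERC GVCF -L EZ_Exome_v2.bed \
#     -variant_index_type LINEAR -variant_index_parameter 128000 \
#     -R ucsc.hg19.fasta --dbsnp dbsnp_138.hg19.vcf \
#     -I 1.recalibrated.bam -o 1.recalibrated.vcf
```

A retry only helps with intermittent failures such as the one in this thread. A job that fails at the same position every time needs a different fix.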

  • MikeK (Hamilton NZ; Member)

    I've just found it too!

    INFO  03:48:33,595 HelpFormatter - ------------------------------------------------------------------------------------------ 
    INFO  03:48:33,597 HelpFormatter - The Genome Analysis Toolkit (GATK) v2014.2-3.1.7-10-g867c2fb, Compiled 2014/04/18 10:40:14 
    INFO  03:48:33,597 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  03:48:33,597 HelpFormatter - For support and documentation go to http://gatkdocs.appistry.com/ 
    INFO  03:48:33,600 HelpFormatter - Program Args: -R /data/seq/indexed-genomes/bos_taurus/umd31MT/umd31MT.fa -dcov 200 -T HaplotypeCaller -BQSR /data/seq/mikee0/checkpoint/23934418.bwamem.sorted.reheadered.bam.firstpass.table -pairHMM VECTOR_LOGLESS_CACHING -nct 12 -I /data/seq/mikee0/checkpoint/23934418.bwamem.sorted.reheadered.bam.realigned.bam --dbsnp dbSNP-138-UMD3.1-no-spaces.vcf.gz --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o 23934418.default-param.g.vcf 
    INFO  03:48:33,605 HelpFormatter - Executing as mikee0@galvatron on Linux 3.13.0-29-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18. 
    INFO  03:48:33,605 HelpFormatter - Date/Time: 2014/07/03 03:48:33 
    INFO  03:48:33,605 HelpFormatter - ------------------------------------------------------------------------------------------
    
    
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:413)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:412)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.addMiscellaneousAllele(GenotypingEngine.java:257)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:220)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2014.2-3.1.7-10-g867c2fb):
    

    A thread safety error is a bit of a worry!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Yes it is a concern, that is why we are favoring alternative ways to speed up HC without using multithreading.

  • gtiao (Cambridge, MA; Member)

    Just to chime in, I've run into this problem, too, on GATK 3.2-2 calling WGS samples with -nct 2. Some of my samples have run happily without any incident for nearly one day before failing.

    Geraldine, is there a current recommendation or best-practice for speeding up HC on large BAMs?

    Grace

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Grace,

    Right now the only recommendation we have is to use Queue to parallelize your HC jobs. And of course, make sure you're using the new workflow to run HC per sample in GVCF mode, followed by joint genotyping.

  • drmjc (Garvan Institute of Medical Research; Member)

    Hi Geraldine,
    We've seen this bug a few times now as well. We're already using Queue, so the jobs do eventually run to completion, so I guess that's a +1 vote for fixing this bug, please!

    I've just noticed that we're using the latest HaplotypeCaller via Queue and -nt 1 and -nct 4, for exome analysis. Would you say this is overkill?

    cheers,
    Mark

  • In reply to @drmjc's post above:

    To add to Mark's question, we're using scatterCount = 400 with -nt 1 and -nct 4.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    edited August 2014

    @drmjc @kevyin

    Not really overkill, but if you're experiencing issues with multithreading, I would recommend ditching -nct and just using Queue to parallelize.

  • mscjulia (United States; Member)

    Hello,

    I have noticed a similar error when I'm using version 3.6. The error message is:

    ERROR --
    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1431)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.reduceNumberOfAlternativeAllelesBasedOnLikelihoods(HaplotypeCallerGenotypingEngine.java:336)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:264)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    I'm using a single node (ppn=16,mem=64g) per sample. My command is:

    java -Xmx45g -jar /dir/GATK3.6/GenomeAnalysisTK.jar -T HaplotypeCaller -R /refDir/hg19.fasta -I Sample.sort.rmdup.Recal.bam --emitRefConfidence GVCF -o Sample.g.vcf
    

    The same command has worked for many other samples, but four samples have failed three times (the second time I removed -nt 4, and the third time I changed -Xmx60g to -Xmx45g). The last output line from one of the four samples, for each attempt, is as follows:

    First time:

    INFO  09:00:06,116 ProgressMeter -  chr10:42759459    1.680389714E9    23.0 h           49.0 s       54.9%    41.8 h      18.8 h
    

    Second time:

    INFO  13:09:20,480 ProgressMeter -  chr10:42752134    1.680389714E9    22.8 h           48.0 s       54.9%    41.5 h      18.7 h
    

    Third time:

    INFO  19:05:30,391 ProgressMeter -  chr10:42776974    1.680389714E9    22.9 h           49.0 s       54.9%    41.7 h      18.8 h 
    

    Any idea what could be causing this? Thanks a lot.
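All three attempts above stop at nearly the same place on chr10, so one way to reproduce the failure without a 23-hour wait is to restrict the run to a window around those positions with -L. The sketch below computes such a window. The function name `padded_interval` and the 50,000 bp padding are assumptions for illustration. The ProgressMeter line only shows the last reported position, so the site that actually fails may lie somewhat further along.

```shell
#!/bin/sh
# padded_interval CONTIG PAD POS [POS...]
# Print CONTIG:START-END covering every POS, widened by PAD bases on each side.
padded_interval() {
    contig="$1"
    pad="$2"
    shift 2
    min="$1"
    max="$1"
    for pos in "$@"; do
        if [ "$pos" -lt "$min" ]; then min="$pos"; fi
        if [ "$pos" -gt "$max" ]; then max="$pos"; fi
    done
    start=$((min - pad))
    # Genomic coordinates are 1-based, so clamp the start at 1.
    if [ "$start" -lt 1 ]; then start=1; fi
    echo "${contig}:${start}-$((max + pad))"
}

# Positions taken from the three ProgressMeter lines above.
region=$(padded_interval chr10 50000 42759459 42752134 42776974)
echo "$region"   # prints chr10:42702134-42826974

# Hypothetical rerun on that window only (output file name is an assumption):
# java -Xmx45g -jar /dir/GATK3.6/GenomeAnalysisTK.jar -T HaplotypeCaller \
#     -R /refDir/hg19.fasta -I Sample.sort.rmdup.Recal.bam \
#     --emitRefConfidence GVCF -L "$region" -o Sample.chr10_window.g.vcf
```

If the small run fails the same way, the window and the stack trace together make a compact bug report.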

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @mscjulia
    Hi,

    Can you try using the latest nightly build? This was a bug that should be fixed in the latest nightly.

    -Sheila

  • mscjulia (United States; Member)

    @Sheila Thanks so much. I tried the nightly build with the same command and the problem is gone.

  • Hi,

    I am getting the above error with GenotypeGVCFs. The error stack trace is below.

    ERROR stack trace

    2016/09/26 13:38:43: ERROR java.util.ConcurrentModificationException
    at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
    at java.util.LinkedList$ListItr.next(LinkedList.java:888)
    at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.coveredByDeletion(GenotypingEngine.java:411)
    at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateOutputAlleleSubset(GenotypingEngine.java:372)
    at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:236)
    at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
    at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
    at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:311)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:289)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:132)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-09-18-g04d1693):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    I am using The Genome Analysis Toolkit (GATK) vnightly-2016-09-18-g04d1693 build.

    Regards

    Gaurav

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Gaurav1983
    Hi Gaurav,

    Please tell us the exact command you ran and how you generated the GVCFs.

    Thanks,
    Sheila

  • Hi Sheila,

    The GVCFs were generated with HaplotypeCaller using the command below.

    java -Djava.io.tmpdir=$temp -Xmx64g -jar $gatk/GenomeAnalysisTK.jar -T HaplotypeCaller -R $REF -I $output.bam -nct $Ncpu --emitRefConfidence GVCF --dbsnp $DBSNP -o $output.g.vcf -variant_index_type LINEAR -variant_index_parameter 128000

    I am trying to run GenotypeGVCFs on 108 GVCF files:

    java -Xmx128g -jar GenomeAnalysisTK.jar -nt 25 -R GATK_Bundle_2.8_b37/human_g1k_v37_decoy.fasta -T GenotypeGVCFs -o 108.vcf -V 1.g.vcf -V 2.g.vcf

    Regards

    Gaurav

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Gaurav1983
    Hi Gaurav,

    Can you try running without -nt or -nct? Sometimes those cause odd issues.

    Thanks,
    Sheila
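To make the suggestion concrete, here is a small sketch that removes the multithreading options from a command before it is rerun. The helper name `strip_threading_args` is hypothetical. It assumes -nt and -nct are each followed by their value as a separate argument, as in the commands posted above.

```shell
#!/bin/sh
# strip_threading_args ARG [ARG...]
# Print the arguments one per line, dropping -nt/-nct and the value after each.
strip_threading_args() {
    skip_next=0
    for arg in "$@"; do
        if [ "$skip_next" -eq 1 ]; then
            # This argument is the thread count that followed -nt or -nct.
            skip_next=0
            continue
        fi
        case "$arg" in
            -nt|-nct) skip_next=1 ;;
            *) printf '%s\n' "$arg" ;;
        esac
    done
}

# Example with the GenotypeGVCFs command from the post above.
strip_threading_args java -Xmx128g -jar GenomeAnalysisTK.jar -nt 25 \
    -R GATK_Bundle_2.8_b37/human_g1k_v37_decoy.fasta -T GenotypeGVCFs \
    -o 108.vcf -V 1.g.vcf -V 2.g.vcf
```

The remaining arguments form the single-threaded command. If it completes, the failure is likely tied to multithreading rather than to the input GVCFs.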
