Attention: Want an end-to-end pipelining solution for GATK Best Practices?


Check out Terra here! For more details on whether this is the right fit for you checkout our blogs here.

GATK3.7 GenotypeGVCFs Runtime Error

matthewzatzmanmatthewzatzman SickKidsMember
edited December 2016 in Ask the GATK team

Hello.

I am attempting to perform joint genotyping on .g.vcfs from approximately ~9000 exomes (combined with CombineGVCF into ~200 sample chunks). My estimated runtimes when using GATK3.6 were on the order of several weeks, possibly due to the issues addressed in GATK3.7's max_genotype_count option. Seeing the potential runtime improvements in GATK3.7, I am testing it out, and it appears to be faster (estimated 4-5 days), but I have now encountered a runtime error after ~9h. Is this something to do with the threading options?

Any help would be greatly appreciated. (Running with 16 threads and 80GB allocated memory)

Program Args: -T GenotypeGVCFs -R /hpf/largeprojects/adam/local/genomes/homosapiens/grch38/reference/GRCh38.d1.vd1.fa --variant -o gatk3.7.test.GenotypeRaw.vcf.gz -nt 16

ERROR --
ERROR stack trace

java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
at java.util.LinkedList$ListItr.next(LinkedList.java:888)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.coveredByDeletion(GenotypingEngine.java:426)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateOutputAlleleSubset(GenotypingEngine.java:387)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:251)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:326)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:304)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:135)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @matthewzatzman, it does look like a problem with multithreading. Those tend to be difficult to reproduce and fix, unfortunately. We typically scatter over chromosome intervals rather than use multithreading. If you're able to switch to that instead of multithreading I would recommend it, it's more robust and if something errors partway through you only have to re-run a portion of the overall dataset.

Sign In or Register to comment.