HaplotypeCaller error: Graph must have ref source and sink vertices

I have been following the GATK Best Practices document 'Best Practices for Germline SNP & Indel Discovery in Whole Genome and Exome Sequence', though without the base recalibration step, as I lack a set of known variants for my non-model organism. I am currently using HaplotypeCaller (GATK version 3.6-0-g89b7209, Java version 1.8.0_11) to generate per-sample GVCFs for five individuals. Two runs completed without issue, two failed because the jobs exceeded the time I had allotted them (these are being rerun with more time), and one has now failed twice with a GATK runtime error.

Attempt 1:
INFO 22:34:41,156 ProgressMeter - flattened_line_542207:211 2.097649546E9 4.4 d 3.0 m 100.0% 4.4 d 2.5 m
INFO 22:35:41,157 ProgressMeter - flattened_line_637897:201 2.098110574E9 4.4 d 3.0 m 100.0% 4.4 d 69.0 s

ERROR --
ERROR stack trace

java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:576)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:211)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:127)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:169)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:1073)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:888)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Graph must have ref source and sink vertices
ERROR ------------------------------------------------------------------------------------------

job exit status: 1

Attempt 2:
INFO 22:25:06,803 ProgressMeter - flattened_line_521355:189 2.097562838E9 4.4 d 3.0 m 100.0% 4.4 d 2.8 m
INFO 22:26:06,805 ProgressMeter - flattened_line_618525:203 2.097996149E9 4.4 d 3.0 m 100.0% 4.4 d 89.0 s

ERROR --
ERROR stack trace

java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:576)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:211)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:127)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:169)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:1073)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:888)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Graph must have ref source and sink vertices
ERROR ------------------------------------------------------------------------------------------

job exit status: 1

Previous reports of this error appear to have failed at a different point in the program (I've noticed that they were at the 'generating report' stage). Any insight into what could cause this with one particular dataset would be appreciated; all datasets were sequenced at the same time on the same platform and have since been through the same pipeline.

Thank you.

Kim Warren.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    When this happened in the past, it was due to some failure of assembly that was not handled properly. This was fixed -- but it's possible that you have a corner case causing similar symptoms. We'll need you to narrow down the issue to the region in the data where the error is happening, in order to reproduce the error for debugging.
  • Fellwolf (London; Member)

    Thank you for your reply.

    I've used samtools view file.bam | grep to search for the regions mentioned just before the error (so, in the second example, 'flattened_line_618525'). I get no hits at all for them in the input BAM file, although I can find hits for earlier regions. It seems like this might be related to the error, but I am not sure why it is happening!
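
As an illustration of how the region "just before the error" can be pulled out of a log, here is a minimal Python sketch that extracts the last locus printed by the ProgressMeter. The function name and regular expression are hypothetical helpers, not part of GATK, and the pattern only assumes the line layout seen in the logs above. Note that the ProgressMeter appears to print about once a minute, so the last locus is presumably only the point the traversal had reached at the last update; the failing region may lie at or after it.

```python
import re

# Matches ProgressMeter lines shaped like the ones in the logs above, e.g.
# INFO 22:26:06,805 ProgressMeter - flattened_line_618525:203 2.097996149E9 ...
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_reported_locus(log_lines):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last


# Lines copied from "Attempt 2" above.
log = [
    "INFO 22:25:06,803 ProgressMeter - flattened_line_521355:189 2.097562838E9 4.4 d 3.0 m 100.0% 4.4 d 2.8 m",
    "INFO 22:26:06,805 ProgressMeter - flattened_line_618525:203 2.097996149E9 4.4 d 3.0 m 100.0% 4.4 d 89.0 s",
    "ERROR MESSAGE: Graph must have ref source and sink vertices",
]
print(last_reported_locus(log))
```

The contig name returned here is the one searched for with samtools view and grep in the comment above.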

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Fellwolf
    Hi Kim,

    Can you try running HaplotypeCaller on small regions surrounding the site where the tool seems to crash? That will help determine the specific interval that is causing the error.

    -Sheila
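
One way to act on this suggestion is to split the neighbourhood of the last reported locus into small windows and run HaplotypeCaller on each one separately. The sketch below only builds the interval strings in the contig:start-stop form accepted by the -L argument. The function name, flank size, window size, and the contig length in the example are all assumptions chosen for illustration; the real contig length would come from the reference index or the BAM header.

```python
def windows_around(contig, position, contig_length, flank=5000, window=1000):
    """Split [position - flank, position + flank] into interval strings.

    Coordinates are 1-based and inclusive, clamped to [1, contig_length].
    """
    start = max(1, position - flank)
    end = min(contig_length, position + flank)
    intervals = []
    while start <= end:
        stop = min(start + window - 1, end)
        intervals.append(f"{contig}:{start}-{stop}")
        start = stop + 1
    return intervals


# Hypothetical contig length of 2500 bp, used only to make the example concrete.
for interval in windows_around(
    "flattened_line_618525", 203, contig_length=2500, flank=1000, window=500
):
    print(interval)
```

Each printed interval can be given to a separate HaplotypeCaller run with -L; the window that reproduces the error is the one to report. If none does, the same approach can be repeated on the contigs that follow in the reference order.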
