Error: Graph must have ref source and sink vertices

Dear GATK team.
I experienced the runtime error shown below after HaplotypeCaller had run through about 30% of my data set. I couldn't find any information on what it means. Could you help me figure out what the problem is? Thanks very much!
Best, Nina

PS. This run also for some reason appeared to be taking about 4-5 times longer than a previous run I did on the same machine with an only slightly smaller dataset (smaller reference and fewer alignments, but same number of individuals). I wonder if that also has something to do with the same problem. The full trace of my run is attached.

Error message:

INFO 06:00:45,877 ProgressMeter - Top9_ind_Lib1_3_contig_38301:79 2.74e+06 19.0 h 6.9 h 30.1% 63.0 h 44.0 h
INFO 06:01:45,886 ProgressMeter - Top9_ind_Lib1_3_contig_38354:80 2.75e+06 19.0 h 6.9 h 30.2% 62.9 h 44.0 h
INFO 06:02:22,599 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:526)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:196)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:122)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:157)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:873)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:750)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:140)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:273)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.7-4-g6f46d11):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Graph must have ref source and sink vertices
ERROR ------------------------------------------------------------------------------------------

Answers

  • Just a little more information: I'm trying to call SNPs in RAD data, so my reference is 106K short (~80 bp) sequences that I have identified de novo from a subset of the reads that I have mapped back to the reference.

  • And a bit more information: I do not get any errors when I run the HaplotypeCaller from GATK v. 2.5 on the same data set, so it looks like the problem arises due to some change that has been made since this version.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @ntherkildsen,

    This may be happening because your data type is quite different from what the HC was designed to handle. We'll need a snippet of your reference and data to test locally, to figure out if we can make it work for you, or at least produce a more informative error. Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894
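
To decide which part of the data to put in such a snippet, the last position reported by the ProgressMeter before the crash is a reasonable starting point. The following is a minimal sketch, assuming log lines shaped like the ones quoted in the first post; the function name `last_progress_locus` is hypothetical. The meter only prints periodically, so the failing region may lie somewhat beyond the position it reports.

```shell
# Print the last genomic position reported by the GATK ProgressMeter.
# Reads a log from stdin. Assumes the locus is the fifth whitespace-separated
# field, as in the ProgressMeter lines quoted in this thread.
last_progress_locus() {
    awk '$3 == "ProgressMeter" && $4 == "-" && $5 ~ /:[0-9]+$/ { locus = $5 }
         END { if (locus != "") print locus }'
}

# Demo on the three log lines that precede the stack trace in the first post.
last_progress_locus <<'EOF'
INFO 06:00:45,877 ProgressMeter - Top9_ind_Lib1_3_contig_38301:79 2.74e+06 19.0 h 6.9 h 30.1% 63.0 h 44.0 h
INFO 06:01:45,886 ProgressMeter - Top9_ind_Lib1_3_contig_38354:80 2.75e+06 19.0 h 6.9 h 30.2% 62.9 h 44.0 h
INFO 06:02:22,599 GATKRunReport - Uploaded run statistics report to AWS S3
EOF
# prints Top9_ind_Lib1_3_contig_38354:80
```

On a real run the log would be passed in with `last_progress_locus < my_run.log`, where `my_run.log` is a placeholder file name.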

  • Thanks a lot Geraldine. I've uploaded the requested files in a folder called Therkildsen_graph_source_sink.zip.
    I noticed that all alignments to the first contig that appears to be causing the problem have complex and long CIGAR fields. I don't know if this may be the problem, but as I said, the files could be processed without errors with the HaplotypeCaller in v.2.5.
    Thanks a lot for looking into it! I really appreciate the help.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Thanks @ntherkildsen, we'll look into it asap.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Nina,

    Just a quick update to let you know I've reproduced your bug and confirmed the problem contig (Top9_ind_Lib1_3_contig_38381). I'm handing this off to the developer now to debug & fix. I'll let you know in this thread when we have a fix.

  • Thank you very much, I really appreciate the help!

  • sryan6 (University of Notre Dame; Member)

    Did this bug ever get fixed? I have the same problem when using RAD data and a de novo assembled reference of about ~131K contigs.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @sryan6,

    Yes, the bug was fixed. What version are you using?

  • sryan6 (University of Notre Dame; Member)

    I am using v2.8-1 and get the following error (very similar to the one posted above)

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalStateException: Graph must have ref source and sink vertices
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:513)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:196)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:113)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:166)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:896)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:799)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Graph must have ref source and sink vertices
    ERROR ------------------------------------------------------------------------------------------

  • sryan6 (University of Notre Dame; Member)

    I should mention that I get a .vcf file that looks good, but the index (.vcf.idx) file is empty.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @sryan6,

    Sorry for the late response. Can you please check if the error still occurs with version 3.0? If yes we will need a snippet to reproduce the error locally for debugging.

  • sryan6 (University of Notre Dame; Member)

    Hello Geraldine,

    Sorry for my very late response as well. Unfortunately I am still getting the same error. Is this part of the bug fix going into v3.2?

    Let me know if you would still like me to submit a snippet to reproduce the problem.

    Thanks,
    Sean

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    If you can just test one more thing -- please download the latest nightly build (see Download page) and check if the error still occurs with that. If yes we'll definitely need a snippet. Thanks!

  • D.Middendorp (The Netherlands; Member)

    I know this is an old thread but I was hoping someone is willing to help me as I am getting the same error with GATK v3.4-46-gbc02625.
    This is my error:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.IllegalStateException: Graph must have ref source and sink vertices
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:576)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:211)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:117)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:169)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:988)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:824)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:226)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Graph must have ref source and sink vertices
    ##### ERROR ------------------------------------------------------------------------------------------

    I can provide any data files if needed.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @D.Middendorp We'll try to help you; can you first do the same as I asked the user above, download the latest nightly build (see Download page) and check if the error still occurs with that? If it does we'll need you to narrow down the error to a specific region of your data.
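
One way to narrow the error down to a specific region is to bisect: rerun HaplotypeCaller on each half of a suspect interval with `-L`, keep the half that still fails, and repeat. The following is a minimal sketch of the interval-splitting step only. The function name `split_interval` and the interval `contig_1:1-1000` are hypothetical placeholders, not values from this thread.

```shell
# Split an interval of the form contig:start-end into two halves, one per line.
# An interval that cannot be split further (start >= end) is printed unchanged.
split_interval() {
    interval=$1
    contig=${interval%:*}      # everything before the last colon
    range=${interval##*:}      # start-end
    start=${range%-*}
    end=${range#*-}
    if [ "$start" -ge "$end" ]; then
        printf '%s\n' "$interval"
        return 0
    fi
    mid=$(( (start + end) / 2 ))
    printf '%s:%s-%s\n' "$contig" "$start" "$mid"
    printf '%s:%s-%s\n' "$contig" "$((mid + 1))" "$end"
}

split_interval contig_1:1-1000
# prints:
# contig_1:1-500
# contig_1:501-1000
```

Each printed interval can then be passed to a separate HaplotypeCaller run through `-L`. Because assembly works on active regions that may straddle the split point, it is probably worth rechecking the parent interval if neither half reproduces the error.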

  • sergey_ko13 (Member)

    I get this error when I follow https://gatkforums.broadinstitute.org/gatk/discussion/3891/calling-variants-in-rnaseq on human samples with GATK 4.1.2.0.

    Some samples pass HaplotypeCaller; others crash at different sites, but the site is specific to each sample.

    Is this still a common issue?
    Best,
    Sergey
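
When only some samples crash, it can help to list which per-sample logs contain the exception before comparing the failing samples. The following is a minimal sketch, assuming one log file per sample named `<sample>.log` in a single directory. That naming scheme and the function name `failed_samples` are assumptions for illustration only.

```shell
# Print the sample names whose HaplotypeCaller log contains the graph error.
# Argument: a directory holding one <sample>.log file per sample (assumed layout).
failed_samples() {
    log_dir=$1
    for log in "$log_dir"/*.log; do
        [ -f "$log" ] || continue   # skip the unexpanded glob when no logs exist
        if grep -q -F 'Graph must have ref source and sink vertices' "$log"; then
            basename "$log" .log
        fi
    done
}

# Example call (the directory name is a placeholder):
#   failed_samples logs
```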

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @sergey_ko13,

    Can you please post the exact GATK command you are using and the entire error log?

  • sergey_ko13 (Member)

    Dear @bhanuGandham,

    It took a while to re-run things, sorry for the delay.

    We have STAR alignments done with the settings from https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

    Further processing:

    samtools index ${out}${fq}.Aligned.sortedByCoord.out.bam

    java -Xmx20G -Djava.io.tmpdir=$(pwd)/tmp -jar ${PICARD} MarkDuplicates \
    INPUT=${out}${fq}.Aligned.sortedByCoord.out.bam \
    OUTPUT=${out}${fq}.aln.srt.md.bam \
    METRICS_FILE=${out}${fq}.md.txt \
    ASSUME_SORT_ORDER=coordinate \
    TMP_DIR=$(pwd)/tmp \
    CREATE_INDEX=true

    java -jar ${GATK4} SplitNCigarReads \
    -R ${REF} \
    --create-output-bam-index true \
    -I ${out}${fq}.aln.srt.md.bam \
    -O ${out}${fq}.aln.srt.md.splitted.bam

    java -jar ${GATK4} BaseRecalibrator \
    -I ${out}${fq}.aln.srt.md.splitted.bam \
    -R ${REF} \
    --known-sites ${GATK4BUNDLE}Homo_sapiens_assembly38.dbsnp138.vcf \
    --known-sites ${GATK4BUNDLE}Homo_sapiens_assembly38.known_indels.vcf.gz \
    --known-sites ${GATK4BUNDLE}Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
    -O ${out}${fq}.aln.srt.md.splitted.bqsr.table

    java -jar ${GATK4} ApplyBQSR \
    -R ${REF} \
    -I ${out}${fq}.aln.srt.md.splitted.bam \
    --bqsr-recal-file ${out}${fq}.aln.srt.md.splitted.bqsr.table \
    --create-output-bam-index true \
    -O ${out}${fq}.aln.srt.md.splitted.bqsr.bam

    java -jar ${GATK4} HaplotypeCaller \
    -D ${GATK4BUNDLE}Homo_sapiens_assembly38.dbsnp138.vcf \
    -R ${REF} \
    --native-pair-hmm-threads 8 \
    -I ${out}${fq}.aln.srt.md.splitted.bqsr.bam \
    --sample-name ${sm} \
    -O ${out}${sm}.split.g.vcf.gz \
    -L /big/sergey/GDC_pipeline/gencode.v22.exon.bed \
    --dont-use-soft-clipped-bases true \
    --assembly-region-padding 0 \
    -bamout ${out}${sm}_hc_out.bam \
    -ERC GVCF

    Error log:

    [August 13, 2019 6:27:59 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.72 minutes.
    Runtime.totalMemory()=4563927040
    java.lang.IllegalStateException: Graph must have ref source and sink vertices
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:474)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:452)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:437)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:348)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:142)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:288)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:542)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:240)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    I hope this helps.
    Sergey
