HaplotypeCaller Error: Mismatch between the reference haplotype and reference assembly graph path

crocea | Posts: 12 | Member

Hi, I have been running HaplotypeCaller on more than 700 monkey alignments and came across this error in some intervals:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.IllegalStateException: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.sanityCheckReferenceGraph(LocalAssemblyEngine.java:396)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.sanityCheckGraph(LocalAssemblyEngine.java:378)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:135)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:751)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:672)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:136)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:665)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:661)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:260)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:80)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:301)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-17-g2c8b717):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
##### ERROR ------------------------------------------------------------------------------------------

My command line looks like this (omitting the long list of BAM files):

java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --unsafe --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 --downsample_to_coverage 40 --downsampling_type BY_SAMPLE -I ...


Answers

  • Geraldine_VdAuwera | Posts: 9,347 | Administrator, Dev admin

    Hi there, can you try again with the very latest nightly build and let me know if the error still occurs? Also, I notice you are using the --unsafe flag; does the error also occur when you don't use it?

    Geraldine Van der Auwera, PhD

  • crocea | Posts: 12 | Member

    Tried both; same error:

    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-30-g0bec5c0)
    ...
    ##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = TACCTAGCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT haplotype = GCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT
    ##### ERROR ------------------------------------------------------------------------------------------
    

    The command line is (no --unsafe):

    java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 ...

    If you compare the reference haplotype and the reference assembly graph path closely, the difference lies in the first 6 bases of the assembly graph path (TACCTA): the reference haplotype does not have those bases. Everything else is the same.
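    One quick way to check this kind of observation is to test whether the reference haplotype is a suffix of the graph path and, if so, print the extra leading bases. The Python sketch below assumes the two sequences differ only by such a prefix; the function name extra_graph_prefix is hypothetical, and this is not how GATK's own sanity check is implemented. The inputs are short excerpts of the two error messages in this thread, each cut at the same point in the shared tail so the suffix relationship is preserved.

```python
from typing import Optional


def extra_graph_prefix(graph_path: str, ref_haplotype: str) -> Optional[str]:
    """Return the bases the graph path carries in front of the haplotype.

    Hypothetical helper. Assumes the only difference is extra leading bases
    on the graph path, i.e. the haplotype is a suffix of the graph path.
    Returns "" if the sequences are identical and None if the haplotype is
    not a suffix of the graph path at all.
    """
    if not graph_path.endswith(ref_haplotype):
        return None
    return graph_path[: len(graph_path) - len(ref_haplotype)]


# Excerpts (graph path, reference haplotype) from the two error messages,
# truncated at the same point in the shared tail.
examples = {
    "nightly-2013-05-17": ("GGAATAACTCCAGGCAACCAGTTCCAGCC", "CCAGGCAACCAGTTCCAGCC"),
    "nightly-2013-05-30": ("TACCTAGCTATCTGTCTTTGTATG", "GCTATCTGTCTTTGTATG"),
}

for build, (graph, haplotype) in examples.items():
    prefix = extra_graph_prefix(graph, haplotype)
    print(build, prefix, None if prefix is None else len(prefix))
```

    On these excerpts it reports 9 extra bases (GGAATAACT) for the first error and 6 (TACCTA) for the second. To check the full sequences, paste the complete graph and haplotype strings from the error message instead of the excerpts.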

    yu

  • Geraldine_VdAuwera | Posts: 9,347 | Administrator, Dev admin

    I see, thanks for trying. Did you also get the error with the public release (2.5-2)? Can you tell me why you were using the nightly build in the first place, since nightly builds are technically unsupported?

  • crocea | Posts: 12 | Member

    That was due to a bug in ReduceReads (@Carneiro fixed it in a nightly build). But OK, let me see whether 2.5-2 works.

  • Mark_DePristo | Posts: 153 | Administrator, Dev admin

    Just to let you know that I'm seeing this error on a data set I'm working on right now. I'm really having a hard time reproducing it on a small data set. Do you have a command line that will reproduce the issue quickly? Unfortunately it doesn't seem to have anything to do with the actual interval being assembled, but seems to be some kind of state problem in the GATK itself. Very very annoying.

    --
    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

  • crocea | Posts: 12 | Member

    Hey Mark, I'm in the process of extracting this particular 2 Mb interval from the >700 alignments, merging them, and rerunning to reproduce the traceback.

    It looks like another 40 hours are needed to get the full traceback. I just want to make sure: do you still need this package, or has the bug already been fixed?

  • Mark_DePristo | Posts: 153 | Administrator, Dev admin

    The latest GATK nightly build has a fix for this issue. Give it a try, and let us know whether it fixes the problem for you.

  • E.Science | London, UK | Posts: 10 | Member

    Hi,

    I'm using --num_threads with HaplotypeCaller to speed up the process, but it says:
    "Invalid command line: Argument nt has a bad value: The analysis HaplotypeCaller currently does not support parallel execution with nt. Please run your analysis without the nt option"

    I'm a bit confused, since the poster above used it as an option.

    Can someone please clear this up for me?

    Thank you very much

  • Sheila | Broad Institute | Posts: 2,678 | Member, Broadie, Moderator, Dev admin
  • Geraldine_VdAuwera | Posts: 9,347 | Administrator, Dev admin

    If I'm not mistaken, HC may have supported -nt in the past, but that option was removed in order to add other functionality.