HaplotypeCaller Error: Mismatch between the reference haplotype and reference assembly graph path

crocea Posts: 12 Member

Hi, I have been running HaplotypeCaller on >700 monkey alignments and came across this error in some intervals:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.IllegalStateException: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.sanityCheckReferenceGraph(LocalAssemblyEngine.java:396)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.sanityCheckGraph(LocalAssemblyEngine.java:378)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:135)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:751)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:672)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:136)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:665)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:661)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:260)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:80)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:301)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-17-g2c8b717):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
##### ERROR ------------------------------------------------------------------------------------------

My command line looks like this (omitting the long list of BAM files):

java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --unsafe --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 --downsample_to_coverage 40 --downsampling_type BY_SAMPLE -I ...


Answers

  • Geraldine_VdAuwera Posts: 6,822 Administrator, GATK Developer

    Hi there, can you try again with the very latest nightly build and let me know if the error still occurs? Also, I notice you are using the --unsafe flag; does the error also occur when you don't use it?

    Geraldine Van der Auwera, PhD

  • crocea Posts: 12 Member

    Tried both. Same error:

    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-30-g0bec5c0)
    ...
    ##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = TACCTAGCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT haplotype = GCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT
    ##### ERROR ------------------------------------------------------------------------------------------
    

    The command line is (no --unsafe):

    java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 ...

    If you compare the reference haplotype and the reference assembly graph path closely, the difference lies in the first few bases of the assembly graph path: the reference haplotype does not have those leading bases. Everything else is the same.

    yu
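The comparison described above can be checked mechanically rather than by eye. The following is a minimal sketch, not part of GATK: the function names `parse_error_message` and `extra_leading_bases` are hypothetical, and the regular expression assumes the message keeps the `graph = ... haplotype = ...` layout seen in the errors in this thread. It reports the bases that precede the haplotype in the graph path, or `None` when the haplotype is not a plain suffix of the graph path.

```python
import re


def parse_error_message(message):
    """Extract the graph and haplotype sequences from the error message.

    Assumes the layout '... graph = <bases> haplotype = <bases>' seen in
    this thread. Returns (graph, haplotype), or None if the layout differs.
    """
    match = re.search(r"graph = ([ACGTNacgtn]+)\s+haplotype = ([ACGTNacgtn]+)", message)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).upper()


def extra_leading_bases(graph_seq, haplotype_seq):
    """Return the bases in front of the haplotype within the graph path.

    Returns None when the haplotype is not a suffix of the graph path,
    i.e. when the two differ somewhere other than at the start.
    """
    if not graph_seq.endswith(haplotype_seq):
        return None
    return graph_seq[: len(graph_seq) - len(haplotype_seq)]


if __name__ == "__main__":
    # Toy message for illustration only; paste the real ERROR MESSAGE line instead.
    toy = ("Mismatch between the reference haplotype and the reference assembly "
           "graph path. for graph BaseGraph{kmerSize=10} "
           "graph = TTACGGCATGA haplotype = GGCATGA")
    graph, haplotype = parse_error_message(toy)
    prefix = extra_leading_bases(graph, haplotype)
    print(prefix, len(prefix))
```

Pasting the full `ERROR MESSAGE` line in place of the toy string gives the exact number of extra leading bases for each failing interval.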


  • Geraldine_VdAuwera Posts: 6,822 Administrator, GATK Developer

    I see, thanks for trying. Did you also get the error with the public release (2.5-2)? Can you tell me why you were using the nightly build in the first place? Technically, nightly builds are unsupported.

    Geraldine Van der Auwera, PhD

  • crocea Posts: 12 Member

    That was due to a bug in ReduceReads (@Carneiro fixed it in a nightly build). But OK, let me see if 2.5-2 works.


  • Mark_DePristo Posts: 153 Administrator, GATK Developer

    Just to let you know that I'm seeing this error on a data set I'm working on right now. I'm really having a hard time reproducing it on a small data set. Do you have a command line that will reproduce the issue quickly? Unfortunately it doesn't seem to have anything to do with the actual interval being assembled, but seems to be some kind of state problem in the GATK itself. Very very annoying.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • crocea Posts: 12 Member

    Hey Mark, I'm in the process of selecting this particular interval (2Mb) from >700 alignments, merging, and rerunning to reproduce the traceback.

    It looks like another 40 hours are needed to get the full traceback. I just want to make sure you still need this package, though. Or has the bug been fixed?
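The per-file subsetting step mentioned above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name `build_subset_commands`, the input paths, the output directory, and the region string are all hypothetical placeholders, and it assumes `samtools` is installed and every input BAM is indexed. It only builds and prints the `samtools view` command lines; it does not run them and does not cover the merging step.

```python
import shlex
from pathlib import Path


def build_subset_commands(bam_paths, region, out_dir):
    """Build one 'samtools view' command per BAM to extract a single region.

    bam_paths: input BAM files (assumed to be coordinate-sorted and indexed).
    region:    region string such as 'contig:start-end' (hypothetical here).
    out_dir:   directory for the per-file subset BAMs (hypothetical here).
    """
    commands = []
    for bam in bam_paths:
        out = str(Path(out_dir) / (Path(bam).stem + ".subset.bam"))
        # -b writes BAM output, -o names the output file.
        commands.append(["samtools", "view", "-b", "-o", out, bam, region])
    return commands


if __name__ == "__main__":
    # Placeholder inputs; substitute the real BAM list and interval.
    example_bams = ["bams/sample1.bam", "bams/sample2.bam"]
    for cmd in build_subset_commands(example_bams, "Scaffold84:1064463-1069462", "subset"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands first makes it easy to inspect them, or to hand them to a job scheduler, before committing to a long run.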


  • Mark_DePristo Posts: 153 Administrator, GATK Developer

    The latest GATK nightly build has a fix for this issue. Give it a try, and let us know if it fixes the problem for you.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
