
# HaplotypeCaller Error: Mismatch between the reference haplotype and reference assembly graph path

Member Posts: 12

Hi, I have been running HaplotypeCaller on >700 monkey alignments and came across this error in some intervals:

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.IllegalStateException: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
	at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:665)
	at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:661)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-17-g2c8b717):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR
##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = GGAATAACTCCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC haplotype = CCAGGCAACCAGTTCCAGCCGCCTCCTCCCTGTCTCCTTCAAGGTTCCCTTCCTCTACCTGCAATTTACAACCTCAGTGGTTCCCCAGGGCTCTGTCCTGCGCCCTCAGTGCTTCCCTTCTGCACGTTTTCCCAGGCAATCTCTTCCTGCCTCTGGGCACCAACTCCATCCGTATAGAGATAGTTCCCACAGGCACAGCCC
##### ERROR ------------------------------------------------------------------------------------------
```


My command line looks like this (omitting the long list of BAM files):

```
java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --unsafe --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 --downsample_to_coverage 40 --downsampling_type BY_SAMPLE -I ...
```


Hi there, can you try again with the very latest nightly build and let me know if the error still occurs? Also, I notice you are using the --unsafe flag; does the error also occur when you don't use it?

Geraldine Van der Auwera, PhD

• Member Posts: 12

Tried both; same error:

```
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-05-30-g0bec5c0)
...
##### ERROR MESSAGE: Mismatch between the reference haplotype and the reference assembly graph path. for graph BaseGraph{kmerSize=10} graph = TACCTAGCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT haplotype = GCTATCTGTCTTTGTATGTATCATCTAATCTTTTATTTATATTGCTTTTAGTAAATAAGAACCTCATTTTAAACACTGGAAAGTATTCTTAGCTCAGAACGTGCACACCAGACTGGAATTAGAAAGGCACAGAGATGTCATGCTTTCACCATGCTATATTTTTGGGAGTGAAGTAACCAAGAAATAGGAAGAGAGGGCCCT
##### ERROR ------------------------------------------------------------------------------------------
```


The command line is (no --unsafe):

```
java -Xms6000m -Xmx8000m -XX:PermSize=1500m -XX:MaxPermSize=2000m -jar gatk2Jar/GenomeAnalysisTK.jar --reference_sequence reference/3280_vervet_ref_6.0.3.fasta -T HaplotypeCaller --validation_strictness SILENT --read_filter BadCigar --num_threads 1 -L:bed folder/Scaffold84_line_1064463_1069462_bed.tsv --out NewCaller/Scaffold84_1064463_1069462.orig.vcf --heterozygosity 0.01 --minPruning 2 ...
```

If you compare the reference haplotype and the reference assembly graph path closely, the difference lies in the first 6 bases of the assembly graph path (TACCTA): the reference haplotype does not have those bases. Everything else is the same.
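A quick way to check this kind of comparison is to let a script locate the reference haplotype inside the graph path rather than eyeballing two long sequences. The following is a minimal Python sketch, not part of GATK: the helper names `parse_mismatch` and `describe_mismatch` are hypothetical, and the regular expression assumes the `graph = ... haplotype = ...` message format shown in the errors above, with any log line wrapping occurring inside a sequence (as in the first stack trace).

```python
import re
from typing import Tuple

# Hypothetical helper, not part of GATK. Assumes the message contains
# "graph = <bases> haplotype = <bases>", as in the errors posted above.
_MISMATCH = re.compile(r"graph = ([ACGTN]+)\s*haplotype = ([ACGTN]+)")


def parse_mismatch(message: str) -> Tuple[str, str]:
    """Return (graph_path, ref_haplotype) from a mismatch error message."""
    # Assumption: wrapped log lines were split inside a sequence without
    # inserting spaces, so removing line breaks restores each sequence.
    flat = re.sub(r"[\r\n]+", "", message)
    match = _MISMATCH.search(flat)
    if match is None:
        raise ValueError("no graph/haplotype sequences found in message")
    return match.group(1), match.group(2)


def describe_mismatch(graph_path: str, haplotype: str) -> str:
    """Describe where the reference haplotype sits inside the graph path."""
    offset = graph_path.find(haplotype)
    if offset == -1:
        return "haplotype does not occur in the graph path"
    leading = graph_path[:offset]  # bases only the graph path has, at the start
    trailing = graph_path[offset + len(haplotype):]  # same, at the end
    return (
        f"haplotype starts at offset {offset}; "
        f"extra leading bases: {leading or '-'}; "
        f"extra trailing bases: {trailing or '-'}"
    )


if __name__ == "__main__":
    # Truncated excerpt of the second error message in this thread.
    excerpt = ("for graph BaseGraph{kmerSize=10} graph = TACCTAGCTATCTG"
               " haplotype = GCTATCTG")
    graph, hap = parse_mismatch(excerpt)
    print(describe_mismatch(graph, hap))
    # prints: haplotype starts at offset 6; extra leading bases: TACCTA; extra trailing bases: -
```

Pasting the full `ERROR MESSAGE` line into `parse_mismatch` works the same way; `str.find` is enough here because the question is only whether the haplotype is a contiguous substring of the graph path.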

yu


I see, thanks for trying. Did you also get the error with the public release (2.5-2)? Can you tell me what your reason was for using the nightly build in the first place, since nightly builds are technically unsupported?

Geraldine Van der Auwera, PhD

• Member Posts: 12

That was due to a bug in ReduceReads (@Carneiro fixed it in a nightly build). But OK, let me see if 2.5-2 works.


Just to let you know that I'm seeing this error on a data set I'm working on right now. I'm really having a hard time reproducing it on a small data set. Do you have a command line that will reproduce the issue quickly? Unfortunately it doesn't seem to have anything to do with the actual interval being assembled, but seems to be some kind of state problem in the GATK itself. Very very annoying.

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

• Member Posts: 12

Hey Mark, I'm in the process of extracting this particular interval (2 Mb) from the >700 alignments, merging them, and rerunning to reproduce the traceback.

It looks like another 40 hours are needed to get the full traceback. I just want to make sure you still need this package, or has the bug been fixed?


The latest GATK nightly build has a fix for this issue. Give it a try, and let us know if it fixes the problem for you.

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

• London, UK · Member Posts: 10

Hi,

I'm using --num_threads with HaplotypeCaller to speed up the process, but it says:
"Invalid command line: Argument nt has a bad value: The analysis HaplotypeCaller currently does not support parallel execution with nt. Please run your analysis without the nt option"

I'm a bit confused, since the guy above had it as an option.

Can someone please clear this up for me?

Thank you very much!