The current GATK version is 3.2-2

# Error in Haplotype Caller

Posts: 1Member
edited August 2012

Hi,

I am trying to run the latest version (GenomeAnalysisTK-2.0-35-g2d70733) of the HaplotypeCaller on some .bam files that I had prepared according to the Best Practice v.3. Now GATK reports the following error:

##### ERROR ------------------------------------------------------------------------------------------

Now I am assuming my old bam files are not compatible with the new HaplotypeCaller. Is that correct?

Thank you for your help, K

• Posts: 122GATK Developer mod

Hi there,

Glad to hear you are trying out the HaplotypeCaller. I don't think it is actually a problem with your BAM file. We believe this issue is fixed in the latest internal development version of the tool. We plan to push this fix out with the release of version 2.1 of the GATK, which should be out in another week or two.

Thanks so much for your help,

Post edited by rpoplin on

• Posts: 9Member

I am getting a similar error from HaplotypeCaller and am looking forward to the patched release.

• Posts: 24Member
edited August 2012

I just downloaded version 2.1-0 and ran the HaplotypeCaller on data processed following the Best Practice recommendations v4, but I get the same error as khayer. However, I only just saw this post, and my BAM files were produced using version 2.0. Should I repeat my processing? Thanks. Eva.

##### ERROR ------------------------------------------------------------------------------------------
Post edited by evakoe on
• Posts: 24Member

Edit: I reprocessed my BAM file using GATK 2.1-0 for all steps and I still get the same error from the HaplotypeCaller. Eva

At this stage, why don't you use PrintReads to extract an interval of your BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server?

-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

• Posts: 122GATK Developer mod

Thanks! We've received the file and will take a look at it right away.

Thanks for your help in tracking this down,

• Posts: 2Member

Yes, I upgraded to 2.1 and got the same error:

##### ERROR ------------------------------------------------------------------------------------------
• Posts: 122GATK Developer mod

@evakoe said: I uploaded a file, it's called SRR287669_MD_IR_BQSR1.bam. As a reference I used human_g1k_v37.fasta from your bundle. I performed MarkDuplicates, Indel Realignment and BQSR with it only on chr 2 and 8 using the -L option. Thank you. Eva

Hi there,

That file doesn't seem to be aligned to human_g1k_v37.fasta. It looks like ucsc.hg19.fasta but the contigs are in the wrong order. Do you have the command line that you used to generate the error with this bam file? Also the commands for how this file was generated would be helpful too.

Thanks!

• Posts: 122GATK Developer mod

In the meantime, if anyone else can use PrintReads to extract an interval of their BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server, that would be very helpful.
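For anyone unsure what such an extraction looks like, here is a minimal sketch in the same GATK 2.x command-line style used elsewhere in this thread. The file names and the interval are placeholders (assumptions), not values taken from anyone's data, and the script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Placeholder inputs (assumptions): substitute your own reference, BAM and interval.
REF="human_g1k_v37.fasta"
IN_BAM="in.bam"
INTERVAL="2:1000000-1020000"   # hypothetical contig:start-stop around the failing region
OUT_BAM="snippet.bam"          # hypothetical output name

# Build the PrintReads command; -L restricts the output to the chosen interval.
CMD="java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R $REF -I $IN_BAM -L $INTERVAL -o $OUT_BAM"

# Dry run: print the command for review. To execute it, use: eval "$CMD"
echo "$CMD"
```

It is worth re-running HaplotypeCaller on the extracted file first, to confirm the small BAM still reproduces the error before uploading it.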

Thanks!

• Posts: 24Member

@rpoplin I'm sorry about that; I was quite sure it was human_g1k_v37, but I must have mixed it up with previous experiments. Anyway, I repeated my whole processing using GATK 2.1-0 (alignment with BWA to human_g1k_v37.fasta, Mark Duplicates, Indel Realignment, BQSR) and called the HaplotypeCaller again. The run has not finished yet, but the error has not occurred so far, whereas previously I got it right at the beginning. I performed the processing mentioned above basically following the recommendations. Here is my command line for the HaplotypeCaller:

java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I in.bam -o out.vcf -D dbSNP137.vcf -A DepthOfCoverage -A HaplotypeScore -A MappingQualityRankSumTest -A FisherStrand -A ReadPosRankSumTest -A QualByDepth -et NO_ET -K mykey -L 2

So maybe a solution is to rerun all analysis using 2.1-0 while paying attention to consistency in the reference files.
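One way to check that consistency before a long run is to compare the contig names and their order in the BAM header against the reference index. The sketch below is only an illustration: the helper names are made up for this example, and it assumes samtools is installed and that a `.fai` index exists next to the reference.

```shell
#!/bin/sh
# contigs_from_header FILE: list @SQ sequence names from a SAM header, in order.
contigs_from_header() {
  awk -F '\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1"
}

# contigs_from_fai FILE: list sequence names from a FASTA index (.fai), in order.
contigs_from_fai() {
  cut -f 1 "$1"
}

# same_contigs HEADER FAI: succeed only if names and order are identical.
same_contigs() {
  [ "$(contigs_from_header "$1")" = "$(contigs_from_fai "$2")" ]
}

# Typical use (file names are placeholders):
#   samtools view -H in.bam > header.sam
#   same_contigs header.sam human_g1k_v37.fasta.fai && echo "match" || echo "MISMATCH"
```

This only compares names and ordering; it would not catch two references that share contig names but differ in sequence.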

• Posts: 9Member

@evakoe said:

So maybe a solution is to rerun all analysis using 2.1-0 while paying attention to consistency in the reference files.

Has your run finished without errors? As you suggested, I tried processing my BAM with 2.1-0 and still had the same error.((

• Posts: 9Member

@rpoplin said: In the meantime if anyone else can use PrintReads to extract an interval of your BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server

Could you please suggest how to find an interval that reproduces the error? In the run log before the error, I can only see the last region of my reference processed by the walker. Thank you in advance for your help.

• Posts: 122GATK Developer mod
edited August 2012

There are two options that you could try. The simplest is to just guess an interval using the last region in your log file, like you mentioned: you can put a window of about 10000 bases on either side and that should do it. Or, if you add -debug to your HaplotypeCaller command line, you'll see very verbose debug statements about every region that is processed. This will tell you the exact interval that failed.
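The first option is just arithmetic on the region string. A small sketch of it, with a hypothetical helper name (`pad_interval`) and a region in the `contig:start-stop` form used in this thread:

```shell
#!/bin/sh
# pad_interval CONTIG:START-STOP [PAD]
# Prints the interval widened by PAD bases on each side (default 10000),
# clamping the start at position 1.
pad_interval() {
  region=$1
  pad=${2:-10000}
  contig=${region%%:*}     # text before the first colon
  range=${region#*:}       # text after the first colon
  start=${range%-*}
  stop=${range#*-}
  new_start=$((start - pad))
  if [ "$new_start" -lt 1 ]; then
    new_start=1
  fi
  echo "${contig}:${new_start}-$((stop + pad))"
}

# Example: widen a region copied from a run log.
pad_interval "Chr2:224105-224265"
```

The printed string can be passed to `-L` when extracting reads. The helper does not know the contig length, so the stop coordinate may need trimming by hand near the end of a contig.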

Thank you for your willingness to experiment a little bit here.

Post edited by rpoplin on
• Posts: 2Member

And the same error appears with the 2.1-2 release as well...

##### ERROR ------------------------------------------------------------------------------------------
• Posts: 122GATK Developer mod
edited August 2012

@ArtemPankin said: Thank you ever so much for your help.

Here is the log of the error with the -debug option. Did I understand correctly that the following region of Chr2 is the source of the error?

Assembling Chr2:224105-224265 with 254 reads: (with overlap region = Chr2:224040-224330)

Found 5 candidate haplotypes to evaluate every read against.

cACCACGgCCTAAAaGAAaaCCTAaCTGtCCATaTCcTCgAAAaGGTtGTcTCaGCtCTGaGAcACCcACCaGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG

'> Cigar = 291M

CACCACGGCCTAAAAGAAAACCTAACTGTCCATATCCTCGAAAAGGTTGTCTCAGCTCTGAGACACCCACCAGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG

Ah! I see the problem is that the upper and lower case bases in the reference and reads are treated as differences so it was trying to create a c -> C SNP. Thanks for your help.
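To see the effect on the data above, the first 20 bases of the two sequences in the debug log can be compared with and without case normalisation. This is only an illustration of the mismatch in plain shell, not a description of how GATK handles it internally; the variable and function names are made up for the example.

```shell
#!/bin/sh
# First 20 bases of the two sequences printed in the debug log.
seq_a="cACCACGgCCTAAAaGAAaa"
seq_b="CACCACGGCCTAAAAGAAAA"

# upper STRING: print STRING with all bases upper-cased.
upper() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# A case-sensitive comparison reports a difference...
if [ "$seq_a" = "$seq_b" ]; then
  echo "raw: identical"
else
  echo "raw: different"
fi

# ...while comparing after normalising case does not.
if [ "$(upper "$seq_a")" = "$(upper "$seq_b")" ]; then
  echo "normalised: identical"
else
  echo "normalised: different"
fi
```

Lower-case bases in a reference FASTA typically mark soft-masked regions, so a quick look for lower-case letters in your reference can indicate whether you are likely to be affected.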

Post edited by rpoplin on
• Posts: 24Member

@ArtemPankin Yes, my run finished without errors.

• Posts: 122GATK Developer mod

OK, this is hopefully fixed in version 2.1-3, which will show up on the website for download later today. Thank you for all the information that helped track this down.