Patch for BadCigar filtering on Novoalign reads containing zero-length CIGAR elements

chapmanb (Posts: 19, Member)

I'm running into a HaplotypeCaller issue with the latest release (2.5-2) using Novoalign input reads. Here's a small reproducible input file:

https://s3.amazonaws.com/chapmanb/gatk_hc_problem_cigar.bam

Running:

java -Xms750m -Xmx3g -jar GenomeAnalysisTK.jar -R GRCh37.fa -I
problem_cigar.bam -L 4:120371315-120371586 -T HaplotypeCaller -o out.vcf
--read_filter BadCigar -debug

Errors out with:

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (0) >
(-1) STOP -- this should never happen, please check read:
HWI-ST1124:106:C15APACXX:1:1107:15450:87092 2/2 58b aligned read. (CIGAR: 38H4D58M)

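Looking at the read, the CIGAR string appears to be tricking the BadCigar filter: it has a 0M element between an insertion and a deletion, so the two indels no longer look adjacent to the filter even though nothing of non-zero length separates them: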

38M4I0M4D58M

This patch fixes the BadCigar filter by only considering CIGAR elements with non-zero length:

https://gist.github.com/chapmanb/5568411
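
To illustrate the idea behind the patch, here is a minimal Python sketch of an adjacent-indel check that skips zero-length elements. It is not the GATK code or the patch itself; the function names (`parse_cigar`, `has_adjacent_indels`) and the `ignore_zero_length` flag are made up for illustration, and it only covers the adjacent insertion/deletion case, not every condition a CIGAR filter might test.

```python
import re

# Hypothetical helpers for illustration only; not the GATK implementation.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError("not a valid CIGAR string: %r" % cigar)
    return [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]


def has_adjacent_indels(cigar, ignore_zero_length=True):
    """Return True if an insertion/deletion directly follows another one.

    With ignore_zero_length=True, elements such as 0M are dropped first,
    so they cannot hide two indels that are effectively adjacent.
    """
    elements = parse_cigar(cigar)
    if ignore_zero_length:
        elements = [(n, op) for n, op in elements if n > 0]
    ops = [op for _, op in elements]
    # Compare each operator with its successor.
    return any(a in "ID" and b in "ID" for a, b in zip(ops, ops[1:]))
```

On the CIGAR above, `has_adjacent_indels("38M4I0M4D58M")` flags the read, while passing `ignore_zero_length=False` reproduces the behaviour described here, where the 0M element hides the adjacent insertion and deletion.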

With this applied, the read is properly filtered and HaplotypeCaller can continue without a problem. Hope this helps; please let me know if any other detail about the problem would be useful.

Brad Chapman, Bioinformatics Core at Harvard School of Public Health