Patch for BadCigar filtering on Novoalign reads containing zero length CIGAR elements

chapmanb (Posts: 19, Member)

I'm running into a HaplotypeCaller issue with the latest release (2.5-2) using Novoalign input reads. Here's a small reproducible input file:

https://s3.amazonaws.com/chapmanb/gatk_hc_problem_cigar.bam

Running:

java -Xms750m -Xmx3g -jar GenomeAnalysisTK.jar -R GRCh37.fa \
  -I problem_cigar.bam -L 4:120371315-120371586 -T HaplotypeCaller -o out.vcf \
  --read_filter BadCigar -debug

Errors out with:

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (0) >
(-1) STOP -- this should never happen, please check read:
HWI-ST1124:106:C15APACXX:1:1107:15450:87092 2/2 58b aligned read. (CIGAR: 38H4D58M)

Looking at the read, the CIGAR string appears to be tricking the BadCigar filter: it has a zero-length 0M element between an insertion and a deletion, so the two do not look adjacent to the filter even though nothing separates them:

38M4I0M4D58M

This patch fixes the BadCigar filter by only considering CIGAR elements with non-zero length:

https://gist.github.com/chapmanb/5568411
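
To illustrate the idea behind the patch, here is a minimal Python sketch. It is not the GATK code (the actual fix is the Java change in the gist above), and it assumes the relevant BadCigar check is "reject reads with an insertion directly next to a deletion". The function names and the skip_zero_length flag are made up for this example.

```python
import re

# One CIGAR element: a length followed by a SAM operator character.
_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
_WHOLE_CIGAR = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) tuples."""
    if not _WHOLE_CIGAR.fullmatch(cigar):
        raise ValueError("not a valid CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in _ELEMENT.findall(cigar)]


def has_adjacent_indels(cigar, skip_zero_length=True):
    """Return True if an insertion sits directly next to a deletion.

    With skip_zero_length=True (the behaviour the patch aims for),
    zero-length elements such as 0M are dropped before comparing
    neighbours, so they can no longer hide an I/D pair.
    """
    elements = parse_cigar(cigar)
    if skip_zero_length:
        elements = [(length, op) for length, op in elements if length > 0]
    ops = [op for _, op in elements]
    return any({a, b} == {"I", "D"} for a, b in zip(ops, ops[1:]))


# The problem read from this post:
print(has_adjacent_indels("38M4I0M4D58M", skip_zero_length=False))  # 0M hides the I/D pair
print(has_adjacent_indels("38M4I0M4D58M", skip_zero_length=True))   # I/D pair is detected
```

With zero-length elements ignored, 38M4I0M4D58M is treated like 38M4I4D58M, so the read is flagged and filtered before HaplotypeCaller sees it.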

With this applied, the read will be properly filtered and HaplotypeCaller can continue without a problem. Hope this helps, please let me know if any other detail about the problem would be helpful.

Brad Chapman, Bioinformatics Core at Harvard School of Public Health

Comments

  • Carneiro (Posts: 271, Administrator, GSA Member)

    Hi Chapman, thank you for identifying it and sending the patch. I will create a test internally and review your patch soon.

    Thanks.

  • chapmanb (Posts: 19, Member)

    Mauricio; Thanks much for looking at this. Is there any other information I can provide to help get this into either a 2.5.x release or 2.6? I'm doing comparison tests with 2.5 and would love to be able to share and reproduce without requiring others to grab my patched copy of 2.5. Thanks again.

    Brad Chapman, Bioinformatics Core at Harvard School of Public Health

  • Mark_DePristo (Posts: 153, Administrator, GSA Member)

    Sorry, my comments must have been lost somewhere on the forum. I've reviewed the patch and am happy with it, but I cannot actually apply it: patch -p1 fails with an error when I grab your diff. Can you issue a pull request, or send us a standard patch? We can apply it to 2.6 so the nightly will have it and 2.6 will go live with it.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • chapmanb (Posts: 19, Member)

    Mark; Thanks for looking at this and sorry about the patch issues. I'm not sure what happened: must be some strangeness with the whitespace changes. It's a simple diff but the shifting of the internal block after the if statement makes it seem more complicated. Here's a pull request on GitHub:

    https://github.com/broadgsa/gatk/pull/4

    Thanks again

    Brad Chapman, Bioinformatics Core at Harvard School of Public Health
