The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

Bug in HaplotypeCaller

evolvedmicrobeevolvedmicrobe MGHMember Posts: 20

Hi, I wanted to place what I believe is a bug report.

Bug: If the haplotype caller assembles reads into identical haplotypes, then depending on arbitrary factors, it will emit different variant events for this haplotype, meaning that gVCFs from samples with the same genotype will appear to have different genotypes when their VCFs are viewed.

Longer Explanation : Consider the alignment shown below or a reference genome (top) and a reconstructed haplotype aligned to it (aligned as in either the first or second case):


This alignment is tricky because the "GGGCC" can be positioned in one of two locations, as shown above, and both of these are equivalently optimal with an affine scoring matrix. Which alignment should be returned is thus questionable, but presumably the GATK should always give the same alignment.

However, the GATK does not, and because of this one sample will have a 2D5M15D mutation, while the other can have a 9D5M8D mutation listed, even if both have the exact same genotype. In particular, if the active region or length of the assembled regions change, which can easily happen, then even if every other base is matched, different variants will appear at this locus, based on how the SW alignment is done.

I have tried to track this bug down, and it seems due to how the SW aligner class selects between equally optimal alignments (I believe while constructing the backtrack matrix but have not verified).

Reproducible Example: The higher level problem is that VCFs emitted from identical samples but with slightly different reads/intervals show different variants at this site despite having the same underlying genotypes and assembled haplotypes. Tracking showed this was due to the length of assembled regions varying a bit. A simple example which simply shows the basic flaw in the relevant code is given below.

Does this seem like the type of thing that could be an easy fix? I can also put this code as a branch on gsa_unstable if it would make things easier. I am also assuming that the CombineGVCF step will not fix this error, but have not validated the behavior and for my purposes it would still be problematic.

Warm wishes,

//Two examples of an assmebled haplotype and reference genome,
//exactly the same save the window size
    //a simplified version of the getCigar routine in the haplotype caller
    final String SW_PAD = "NNNNNNNNNN";
    final String paddedRef = SW_PAD + new String(fullRef) + SW_PAD;
    final String paddedPath = SW_PAD + new String(fullHap) + SW_PAD;
    final SmithWaterman alignment = new SWPairwiseAlignment( paddedRef.getBytes(), paddedPath.getBytes(), CigarUtils.NEW_SW_PARAMETERS );
    final SmithWaterman alignment2 = new SWPairwiseAlignment( fullRef, fullHap, CigarUtils.NEW_SW_PARAMETERS );
    //These cigars are not the same, but should be
    Cigar rawPadded = alignment.getCigar();
    Cigar notPadded= alignment2.getCigar();

Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,669 admin

    Hi @evolvedmicrobe‌,

    That's interesting, thanks for bringing it up. I don't expect that running CombineGVCFs would fix this issue. Would you be able to share a snippet of data (bam file) that we could use to debug & test this issue on?

    Geraldine Van der Auwera, PhD

  • evolvedmicrobeevolvedmicrobe MGHMember Posts: 20

    Woops, hit enter too soon, anyway, yes would be more than happy to offer a BAM and an example shell script that can produce two distinct VCFs from the same BAM file. Just let me know where on the server it would be good to drop.

    I also think it might save some time to just see the bug directly though. I just added a branch to the git repository (gsaunstable/Git, it's the latest version of the code trunk, with one additional file put in that shows the alignment, and clearly demonstrates the problem with a simple main method that produces differing cigars for this indel in question. This should make the problem/example easy to see.


  • evolvedmicrobeevolvedmicrobe MGHMember Posts: 20

    @Geraldine_VdAuwera‌ An example BAM file and a python script that can produce the bug ( is available at: /humgen/gsa-scr1/pub/incoming/BugExample.tgz

  • evolvedmicrobeevolvedmicrobe MGHMember Posts: 20

    @Geraldine_VdAuwera‌ Woops, hit the tag early again, anyway, have been in touch with Valentin, and this should not be an issue going forward, thanks!

Sign In or Register to comment.