HaplotypeCaller makes SNPs look like indels

tathey (Toronto, Member)

I'm using the HaplotypeCaller to look at SNPs related to antimicrobial resistance and am getting a result that looks like this:

NC_011035.1 2049708 .   CCGGCG  C   ...
NC_011035.1 2049714 .   C   CAAGAA  ...

I believe this corresponds to an alignment that would look like:

Ref:    CCGGCGC
Sample: CCAAGAA

but instead of giving me 5 individual SNPs, GATK is calling the region as a 5 bp deletion at position 2049708 and a 5 bp insertion at position 2049714.
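
For anyone who wants to check the equivalence, here is a minimal Python sketch (not part of GATK) that applies the two records to the reference window and lists the per-base differences. The window CCGGCGC is read off the REF columns of the two records; the function names `apply_records` and `mismatches` are made up for illustration, and the sketch assumes both calls sit on the same haplotype.

```python
# Reference window starting at position 2049708, read off the REF columns
# of the two records (CCGGCG at 2049708, then C at 2049714).
WINDOW_START = 2049708
REF_WINDOW = "CCGGCGC"

# (POS, REF, ALT) for the two calls reported by HaplotypeCaller.
RECORDS = [
    (2049708, "CCGGCG", "C"),
    (2049714, "C", "CAAGAA"),
]


def apply_records(ref_window, window_start, records):
    """Return the sample haplotype after applying non-overlapping records."""
    pieces = []
    cursor = 0  # 0-based offset into ref_window
    for pos, ref, alt in sorted(records):
        offset = pos - window_start
        if offset < cursor:
            raise ValueError(f"record at {pos} overlaps the previous record")
        if ref_window[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele at {pos} does not match the window")
        pieces.append(ref_window[cursor:offset])  # untouched reference bases
        pieces.append(alt)
        cursor = offset + len(ref)
    pieces.append(ref_window[cursor:])
    return "".join(pieces)


def mismatches(ref_window, haplotype, window_start):
    """List (POS, REF, ALT) for each differing base of equal-length strings."""
    if len(ref_window) != len(haplotype):
        raise ValueError("sequences differ in length; not a pure substitution")
    return [
        (window_start + i, r, a)
        for i, (r, a) in enumerate(zip(ref_window, haplotype))
        if r != a
    ]


haplotype = apply_records(REF_WINDOW, WINDOW_START, RECORDS)
snps = mismatches(REF_WINDOW, haplotype, WINDOW_START)
print(haplotype)
for pos, ref, alt in snps:
    print(pos, ref, alt)
```

The deletion and the insertion are the same length, so the resulting haplotype is as long as the reference window and the difference can be listed base by base.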

Is there any way to change the parameters so that the appropriate call is made?

My current command is:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 12 -R NCC_011035.fasta -I ST547_dedup_reads_group.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o ST547_raw.vcf


  • Sheila (Broad Institute; Member, Broadie admin)


    That is just the way the tools output the variants. They are in comparison to the reference genome. There is no way to change the settings in HaplotypeCaller.


    P.S. You may be interested in FastaAlternateReferenceMaker which will output the sample variants in a FASTA file. That will allow you to visualize the variants without comparing them to the reference genome.

  • Sheila (Broad Institute; Member, Broadie admin)

    Hi again,

    It seems I may have jumped the gun when responding.

    You are correct that the representation you wrote is valid, as is the one from HaplotypeCaller. It is difficult to justify one over the other because both are valid. HaplotypeCaller chose that particular representation from the graph it created (also, five SNPs is not the most parsimonious representation). This is a limitation of the tool: you may not get the representation you want.

    The correct way to represent this is as a complex substitution, but HaplotypeCaller cannot do that. You can try using bcftools to normalize the variants if you are interested in a particular representation or have another VCF you would like to compare against.


  • tathey (Toronto, Member)

    Hi Sheila,

    Thanks for the response. Unfortunately, normalization does not change the variants because they are represented as two indels which do not merge. If they were represented as a block such as

    NC_011035.1 2049708 .   CCGGCGC  CCAAGAA   ...

    then normalization would do the trick.

    I agree that this is a more complex substitution than 5 SNPs, but I would argue that representing these substitutions as SNPs is much more accurate than representing them as a 5 bp insertion and a 5 bp deletion. I will have to look into other sources to solve this problem.

    Thank you again for your reply.

  • SheilaSheila Broad InstituteMember, Broadie admin


    Okay. Good luck. If you do find some other tools that solve your issue, please report back so other users can benefit too.
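
The block representation discussed in the thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what bcftools or GATK do internally: the helper names `merge_adjacent` and `trim` are hypothetical, the two records are assumed to be on the same haplotype, and the second record is assumed to start immediately after the REF allele of the first.

```python
def merge_adjacent(first, second):
    """Merge two (POS, REF, ALT) records into one block record.

    Only handles the case where `second` starts directly after the last
    REF base of `first`, so no intervening reference bases are needed.
    """
    pos1, ref1, alt1 = first
    pos2, ref2, alt2 = second
    if pos1 + len(ref1) != pos2:
        raise ValueError("records are not directly adjacent")
    return (pos1, ref1 + ref2, alt1 + alt2)


def trim(record):
    """Drop bases shared by REF and ALT, keeping at least one base in each."""
    pos, ref, alt = record
    # Shared trailing bases do not change POS.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Each shared leading base shifts POS one to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return (pos, ref, alt)


deletion = (2049708, "CCGGCG", "C")
insertion = (2049714, "C", "CAAGAA")

block = merge_adjacent(deletion, insertion)
trimmed = trim(block)
print(block)
print(trimmed)
```

The merged record is the CCGGCGC to CCAAGAA block mentioned above. After trimming the shared leading CC, REF and ALT are both five bases long, so a tool that splits multi-nucleotide substitutions could then emit them as individual SNPs.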

