Haplotype Caller Makes SNPs look like INDELS

I'm using the HaplotypeCaller to look at SNPs related to antimicrobial resistance and am getting a result that looks like this:

NC_011035.1 2049708 .   CCGGCG  C   ...
NC_011035.1 2049714 .   C   CAAGAA  ...

I believe this is an alignment that would look like:
CCGGCGC
CCAAGAA

Instead of reporting 5 individual SNPs, however, GATK calls the region as a 5 bp deletion at position 2049708 followed by a 5 bp insertion at position 2049714.
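
To check that the two records really do describe the alignment above, they can be applied to the reference window by hand. Here is a minimal Python sketch, assuming the 7 reference bases starting at 2049708 are CCGGCGC (taken from the alignment above, not read from the FASTA). The helper name `apply_records` is hypothetical and not part of GATK.

```python
def apply_records(ref_window, window_start, records):
    """Apply non-overlapping VCF-style (pos, ref, alt) records to a reference window.

    Positions are 1-based genomic coordinates, as in VCF.
    """
    out = []
    cursor = 0  # 0-based offset into ref_window of the next unconsumed base
    for pos, ref, alt in sorted(records):
        offset = pos - window_start
        if offset < cursor:
            raise ValueError(f"record at {pos} overlaps a previous record")
        if ref_window[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele does not match the window at {pos}")
        out.append(ref_window[cursor:offset])  # unchanged bases before the record
        out.append(alt)                        # ALT allele replaces the REF allele
        cursor = offset + len(ref)
    out.append(ref_window[cursor:])            # unchanged bases after the last record
    return "".join(out)


ref_window = "CCGGCGC"         # assumed reference bases 2049708-2049714
records = [
    (2049708, "CCGGCG", "C"),  # the 5 bp deletion reported by HaplotypeCaller
    (2049714, "C", "CAAGAA"),  # the 5 bp insertion reported by HaplotypeCaller
]

haplotype = apply_records(ref_window, 2049708, records)
mismatches = sum(r != h for r, h in zip(ref_window, haplotype))
print(haplotype, mismatches)
```

Under that assumption the reconstructed haplotype has the same length as the reference window and differs from it at 5 of the 7 positions, which is why the same sequence can be written either as two indels or as 5 SNPs.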

Is there any way to change the parameters so that the appropriate call is made?

My current command is:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 12 -R NCC_011035.fasta -I ST547_dedup_reads_group.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o ST547_raw.vcf

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @tathey
    Hi,

    That is simply how the tool outputs variants: each record is expressed relative to the reference genome. There is no HaplotypeCaller setting that changes this.

    -Sheila

    P.S. You may be interested in FastaAlternateReferenceMaker, which writes the sample's variants into a FASTA file. That will let you view the resulting sequence directly rather than as differences from the reference genome.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @tathey
    Hi again,

    It seems I may have jumped the gun when responding.

    You are correct that the representation you wrote is valid, as is the one from HaplotypeCaller. Because both are valid, it is difficult to justify one over the other. HaplotypeCaller chose that particular representation from the graph it created (note also that 5 SNPs is not the most parsimonious representation). This is a limitation of the tool: you may not get the representation you want.

    The correct way to represent this is as a complex substitution, but HaplotypeCaller cannot do that.
    You can try using bcftools to normalize the variants if you are interested in particular representations or have another VCF you would like to compare to.

    -Sheila

  • tathey (Toronto; Member)

    Hi Sheila,

    Thanks for the response. Unfortunately, normalization does not change the variants, because they are represented as two separate indels that do not merge. If they were represented as a single block such as

    NC_011035.1 2049708 .   CCGGCGC  CCAAGAA   ...
    

    then normalization would do the trick.
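
One possible workaround is to build that block yourself by merging the two abutting indel records, then splitting the block into per-base substitutions. The following is a minimal Python sketch, not a GATK or bcftools feature. It assumes both records sit on the same haplotype (for example, a haploid or homozygous call), which the VCF lines alone do not guarantee. `merge_adjacent` and `split_block` are hypothetical helper names.

```python
def merge_adjacent(rec_a, rec_b):
    """Merge two directly abutting (pos, ref, alt) records into one block record."""
    pos_a, ref_a, alt_a = rec_a
    pos_b, ref_b, alt_b = rec_b
    # The second record must start exactly where the first REF allele ends.
    if pos_a + len(ref_a) != pos_b:
        raise ValueError("records are not directly adjacent")
    return (pos_a, ref_a + ref_b, alt_a + alt_b)


def split_block(pos, ref, alt):
    """Split an equal-length block substitution into single-base substitutions."""
    if len(ref) != len(alt):
        raise ValueError("block is not length-preserving; cannot split into SNPs")
    return [
        (pos + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a  # skip positions where the haplotype matches the reference
    ]


deletion = (2049708, "CCGGCG", "C")
insertion = (2049714, "C", "CAAGAA")

block = merge_adjacent(deletion, insertion)
snps = split_block(*block)

print(block)
for snp in snps:
    print(snp)
```

For the two records in this thread, the merged block is the CCGGCGC to CCAAGAA record shown above, and splitting it gives 5 single-base substitutions at positions 2049710 through 2049714. Genotype and annotation fields are ignored here and would need to be carried over separately.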

    I agree that this is a more complex substitution than 5 SNPs, but I would argue that representing these substitutions as SNPs is much more accurate than representing them as a 5 bp deletion and a 5 bp insertion. I will have to look into other tools to solve this problem.

    Thank you again for your reply.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @tathey
    Hi,

    Okay. Good luck. If you do find some other tools that solve your issue, please report back so other users can benefit too.

    Thanks,
    Sheila
