
HaplotypeCaller calls variants in a deletion region

I'm having a confusing problem when using HaplotypeCaller.

Basically, I'm using HaplotypeCaller to call variants in more than 400 M. tuberculosis samples, sequenced on the HiSeq 2500 platform. I followed the workflow for calling variants on cohorts of samples as described here: https://gatkforums.broadinstitute.org/gatk/discussion/3893/calling-variants-on-cohorts-of-samples-using-the-haplotypecaller-in-gvcf-mode

I found a problem with some samples when checking the SNPs called by this procedure. For example, in Sample1 (shown in the figure), there seems to be a deletion at position 2866805. However, GATK 3.8 called a SNP at this position, as shown in the excerpt from the VCF file below:

NC_000962.3 2866805 . C G 8160 . AC=1;AF=1.00;AN=1;DP=182;FS=0.000;GQ_MEAN=8190.00;MLEAC=1;MLEAF=1.00;MQ=50.38;MQ0=0;NCC=0;QD=31.09;SOR=0.917 GT:AD:GQ:PL 1:0,176:99:8190,0
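For anyone who wants to read the record above field by field, here is a minimal Python sketch that splits it up. It is only an illustration and not part of GATK: `parse_record` is a name made up for this example, and it assumes the columns are separated by whitespace, as in the pasted excerpt (a real VCF is tab-delimited). The point it makes is that the call is a haploid genotype `1` with 0 reads supporting C and 176 supporting G.

```python
# The single-sample record quoted above, with columns separated by spaces
# (assumption: the tabs of the original VCF were lost when pasting).
record = (
    "NC_000962.3 2866805 . C G 8160 . "
    "AC=1;AF=1.00;AN=1;DP=182;FS=0.000;GQ_MEAN=8190.00;MLEAC=1;MLEAF=1.00;"
    "MQ=50.38;MQ0=0;NCC=0;QD=31.09;SOR=0.917 "
    "GT:AD:GQ:PL 1:0,176:99:8190,0"
)


def parse_record(line):
    """Split one single-sample VCF record into a small dictionary."""
    chrom, pos, _id, ref, alt, qual, _filter, info, fmt, sample = line.split()
    # INFO is a semicolon-separated list of KEY=VALUE pairs.
    info_map = dict(item.split("=", 1) for item in info.split(";"))
    # FORMAT names the colon-separated values in the sample column.
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    ref_depth, alt_depth = (int(x) for x in sample_map["AD"].split(","))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "genotype": sample_map["GT"],
        "ref_depth": ref_depth,
        "alt_depth": alt_depth,
        "total_depth": int(info_map["DP"]),
    }


parsed = parse_record(record)
print(parsed)
```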

In total, HaplotypeCaller called 11 SNPs in this deletion region.

So I'm confused: why did HaplotypeCaller call a SNP when the BAM file shows a deletion? I would really appreciate it if you could help me figure this out. Thank you in advance!

P.S. After finding this problem, we also tried UnifiedGenotyper on Sample1, and this time the variants in the deletion region were not called.



  • I have an update on this problem. I compared the BAM file and the bamout file at this position in this sample. This figure is from the BAM file:

    And this figure is from the bamout file:

    It seems that reassembly by HaplotypeCaller caused this problem, right?

  • Sheila, Broad Institute (Member, Broadie, Moderator)


    Are you working with amplicon data? It looks like the reads all start and stop at the same position. Can you post an IGV screenshot of the bamout file (like you did for the original BAM file)? I may need you to submit a bug report. Also, can you test this with the latest version of GATK4?


  • Hi Sheila,
    Thank you for your response!
    Yes, I'm working with amplicon data. But if you take a closer look at a wider range, the reads are not all exactly the same length. I have posted the bamout file along with the original BAM file, shown in IGV, here:

    Also, I have tried HaplotypeCaller with the latest version, but the result is the same.

    Thank you!


  • Sheila, Broad Institute (Member, Broadie, Moderator)

    Hi Yang,

    This is very interesting/odd. Can you submit a bug report so I can take a look locally? Instructions are here.


  • Hi Sheila,
    Thank you for replying. I have tried to upload the report file to the FTP server under the name "bugreport_carolynzy", but I'm not sure whether the upload succeeded. If it failed, please let me know and I will upload it again.


  • SkyWarrior, Turkey (Member)

    Can you check your soft-clipped bases within that region? Can you also try the HaplotypeCaller parameter that ignores soft-clipped bases?
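One rough way to check for soft-clipped bases is to look at the CIGAR strings of the reads overlapping the region (for example, column 6 of `samtools view` output). The Python sketch below counts leading and trailing soft clips for a single CIGAR string. The function name `soft_clipped_bases` and the example CIGARs are made up for illustration and are not taken from the data in this thread. As for the parameter itself, as far as I recall the option is `--dontUseSoftClippedBases` in GATK 3.x and `--dont-use-soft-clipped-bases` in GATK4, but please confirm against the tool documentation for your version.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so drop them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


# Hypothetical CIGAR strings, only to show the behaviour.
for cigar in ["100M", "5S95M", "60M40S", "3H10S80M7S"]:
    print(cigar, soft_clipped_bases(cigar))
```

If most reads spanning the region carry long soft clips on the side facing the deletion, that would be consistent with the reassembly using those clipped bases.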

  • shlee, Cambridge (Member, Broadie, Moderator)

    Hi @carolynzy,

    Can you please upload a tar.gz bundle by following the instructions that Sheila gave you? Your previous upload is a zero-byte file.

  • carolynzy (Member)

    Hi shlee,

    I have uploaded the file again under the name 'bugreport_carolynzy_2'.
    Sorry for the delay.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    Hi Yang,

    Thanks. I will have a look soon.

