The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# Bug in HaplotypeCaller: GT is called 0/1, but AD is 206,0

ExeterPosts: 4

HI, I'd like to report a weird result from HaplotypeCaller.

We have a patient sequenced by targeted sequencing on HNF1B. This patient has been confirmed to have a whole gene deletion of HNF1B so we used this patient as a positive control. We expected to see no heterozygous variants called in HNF1B.

However, HaplotypeCaller called two heterozygous variants: one deletion (it didn't pass the FS strand bias filter and the ReadPosRankSumTest filter) and one substitution (this one passed all the quality filters). Both these two variants were not called by UnifiedGenotyper (and the variants called by UnifiedGenotyper in HNF1B region were all homozygous as what we expected)

There are three things I want to highlight:
1, The deletion is only 10 bases upstream of the substitution, but the FS score is enormous for the deletion whereas very low for the substitution. If there's a strand bias, it must affect both variants if they are so close to each other.
2, The score of ReadPosRankSumTest of the deletion didn't pass the filter because it's always called near the end of the reads. The ReadPosRankSumTest score for the substitution is missing.
3, The genotype was called 0/1 for the substitution, but if we look at the AD, there are 206 reads supporting the ref allele and 0 read supporting the alt allele. Going by the AD, it's clearly a homozygous ref/ref genotype.

Then I looked into the bam files. It turns out the all the alternative alleles of the substitution were from the ends of bad reads, and there are not too many of them after all. So the reads in the bam file also support a homozygous ref/ref genotype.

Therefore I'm really confused why the substitution has 0/1 genotype called by the HaplotypeCaller and why it passed the filter.

Many Thanks

Betty

Tagged:

Hi Betty, can you tell me what version you're using?

Geraldine Van der Auwera, PhD

• ExeterPosts: 4

@Geraldine_VdAuwera said:
Hi Betty, can you tell me what version you're using?

Hi, I'm using GenomeAnalysisTK-3.1-1

Thanks

Thanks -- this looks like it might be a bug, I'll pass it on to the HC dev for evaluation. Would you be able to share a snippet of data so that we can debug this locally? We would of course treat it as confidential and delete it when we're done.

Geraldine Van der Auwera, PhD

• ExeterPosts: 4

@Geraldine_VdAuwera said:
Thanks -- this looks like it might be a bug, I'll pass it on to the HC dev for evaluation. Would you be able to share a snippet of data so that we can debug this locally? We would of course treat it as confidential and delete it when we're done.

Thanks!
I've extracted reads in the HNF1B region from the bam file and uploaded them on a google drive

HNF1B.bam is the bam file after the step of RealignerTargetCreator;
HNF1B.HC-3.1-1.bam is the bam file after HaplotypeCaller (with --bamOutput)

Thanks for sharing the data. Can I ask you to upload the files to our FTP server rather than provide links? This makes it much easier for us to process incoming user files. Sorry I forgot to mention that initially; here are the instructions: http://www.broadinstitute.org/gatk/guide/article?id=1894

Geraldine Van der Auwera, PhD

• ExeterPosts: 4

@Geraldine_VdAuwera said:
Thanks for sharing the data. Can I ask you to upload the files to our FTP server rather than provide links? This makes it much easier for us to process incoming user files. Sorry I forgot to mention that initially; here are the instructions: http://www.broadinstitute.org/gatk/guide/article?id=1894

Thanks

Thanks @wxie, I'll process your files tomorrow, and hopefully we can get this fixed promptly.

Geraldine Van der Auwera, PhD

• Posts: 11

I'm seeing the same issue, and although I'm not completely sure, it does look like this is happening when SoftClipped reads are nearby. @wxie, do you know if you have softclipped reads near or at the variant location? (I see lots of reads lining up in your screenshot, but can't tell if it's clipping in addition to depth.)

• Posts: 5

Hi,

do you have fiexd this issu ?

I have the same problem :

On IGV I have A=1,C=43,T=56 for this position.

For information, I used Haplotype GenomeAnalysisTK-2.8.1. I think reanalyse with 3.1.1 but if there are the same bug...

Thank you for your help !!

@jpofmars‌

Hi,

Please try running the latest nightly build on a small section to see if it works.

Thanks,
Sheila

• Cambridge, MAPosts: 76 ✭✭

@jpofmars, also, please post the raw HC output that shows the error (i.e. with a contradictory Likelihoods, AD or GT values) together with the command line used; I'm guessing that the extract that you posted is the output from a larger pipeline including CombineVariants (where does the info field "set" comes from otherwise?) thus the error could come from some other variant calling run so it is better if we factor that out by just looking to the culpable HC's output.

In any case, I find it quite surprising that you have such a low counts looking at the input bam with IGV but such a large number (and just ref) in the AD annotation. Is it possible that those values somehow come from another BAM through another combined VCF (thanks to CombineVariants)?

And as Sheila has said you need to use a more up-to-date version (being nightly build the best); 2.8 is kind of old as we have changed HC quite a bit since then.

Cheers, V.

• Posts: 5

I use only HaplotypeCaller, but i split raw vcf file to separate snp and indels for the filtering step.
The problem came well from CombineVariants when I combine snp filtered file with indel filtered file.
With the new version the problem is resolved.

Thank you again !

JP

• ItalyPosts: 32

Hi,

I don't know if you worked on this bug, but I'm using GATK v3.5 and I found the same problem with this variant:

chr13 32972391 . G T 696.27 . AC=1;AF=0.063;AN=16;DP=1707;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.063;MQ=60.00;QD=3.74;SOR=0.003 GT:AD:DP:GQ:PL 0/0:169,0:169:99:0,120,1800 0/0:236,0:236:99:0,120,1800 0/1:186,0:186:99:730,0,16717 0/0:258,0:258:99:0,120,1800 0/0:237,0:237:99:0,120,1800 0/0:160,0:160:99:0,120,1800 0/0:279,0:279:99:0,120,1800 0/0:182,0:182:99:0,120,1800

Looking to the next position on the vcf I found this variant site:

chr13 32972392 . A G 1763.78 . AC=5;AF=0.313;AN=16;BaseQRankSum=-5.319e+00;ClippingRankSum=0.753;DP=4585;ExcessHet=10.0405;FS=36.135;MLEAC=6;MLEAF=0.375;MQ=60.00;MQRankSum=0.00;QD=0.57;ReadPosRankSum=-8.870e+00;SOR=6.989 GT:AD:DP:GQ:PL 0/0:552,0:552:0:0,0,16712 0/0:730,0:730:0:0,0,21769 0/1:469,67:536:99:730,0,16717 0/1:600,65:665:99:350,0,21633

Obviously, bam file has not any ALT allele in this position (32972391) while the other one (32972392) is correct.

Matteo

@Matteodigg
Hi Matteo,

Can you please post the bam and bamout file at those positions for that sample?

Thanks,
Sheila

• ItalyPosts: 32
edited April 2016

@Sheila

Hi Sheila, here the bam file output:

imgur.com/OVX0us7

and this is the g.vcf bam output:

imgur.com/zxlxAce

Waiting for news.
Thanks,

Matteo

@Matteodigg
Hi,

What kind of data are you working with? It looks like all the reads start and stop at the same position in the original BAM file.

-Sheila

• ItalyPosts: 32

Hi Sheila,
I'm working on custom amplicon BRCA data.

#### Issue · Github May 2016 by Sheila

Issue Number
854
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
vdauwera

Agreed, this call looks unfounded. We have a few important bug fixes and behavior changes going into GenotypeGVCFs later this week, and we'll be releasing GATK 3.6 fairly soon. Once it's out I would advise testing the new version to see if the call is still made. If it is we'll ask you for a bug report; otherwise we'll consider it fixed by the new changes.

Geraldine Van der Auwera, PhD

• Posts: 72

Has this bug been fixed with version 3.6 of the GATK?