The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# Strange genotype calling with HaplotypeCaller

Washington DCPosts: 56

I cannot understand some of HC's calls. It is rare, but occasionally HC will call individuals as 0/1 when their ADs are 0,50+ (clearly 1/1). I tried looking at the output bams at those positions, but nothing there would suggest to my human eye that the individual would have a haplotype that included a ref at that position. This is definitely an issue of long range haplotype calling, because the genotypes are "correctly" called when I use a vcf as the input file instead of a bed file containing multiple variant positions in the same region.

Tagged:

• Washington DCPosts: 56

For example, this is my output vcf with a bed file from 14:96556546-96557146 as the intervals file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  IND1    IND2    IND3    IND4
14      96556727        .       CA      .       156.71  .       AN=4;DP=46;MQ=69.75     GT:AD:DP        0:11:11 0:19:19 0:10:10 0:6:6
14      96556742        .         AGAAAAGAAAAAAAAAAAGAGCAACTTACTGCTTTGTGAGGTTGTGAGTGGCCACCACTGAAGGTCTTTGAGAAGAGGCCAGACGCCGCTGTAGCCAGGCCTGTCTTAAGAGGACTTGTGCTTCCAGGGACCCAGGCAGGATGATGGCGCAGCTCTTCCTACTCCAAGCCAATGCTGTCCTTCCCCTTTCCCATG     .       188.38  .       AN=4;DP=796;MQ=69.94    GT:AD:DP        0:208:208       0:312:312       0:153:153       0:123:123
14      96556846        rs61985272      C       T       3492.94 .       AC=4;AF=0.500;AN=8;BaseQRankSum=-4.176;ClippingRankSum=0.449;DB;DP=228;FS=0.000;MLEAC=4;MLEAF=0.500;MQ=70.00;MQ0=0;MQRankSum=-0.277;QD=15.32;ReadPosRankSum=0.446       GT:AD:DP:GQ:PL  0/1:0,69:69:99:1379,0,201       0/1:0,72:72:99:1313,0,223       0/1:29,26:55:99:223,0,1039      0/1:0,32:32:99:610,0,108
14      96556889        .       C       .       256.67  .       AN=4;DP=248;MQ=70.00    GT:AD:DP        0:67:67 0:90:90 0:52:52 0:39:39
14      96556914        rs2093722       T       C       8449.12 .       AC=5;AF=0.625;AN=8;BaseQRankSum=1.388;ClippingRankSum=-0.657;DB;DP=366;FS=0.909;MLEAC=5;MLEAF=0.625;MQ=70.00;MQ0=0;MQRankSum=0.818;QD=23.09;ReadPosRankSum=1.533        GT:AD:DP:GQ:PL  0/1:0,96:96:60:2502,0,60        1/1:0,141:141:55:3758,55,0      0/1:26,46:72:99:820,0,869       0/1:0,54:54:13:1403,0,13
14      96557121        rs10148419      T       C       20218.35        .       AC=8;AF=1.00;AN=8;DB;DP=661;FS=0.000;MLEAC=8;MLEAF=1.00;MQ=70.00;MQ0=0;QD=30.59 GT:AD:DP:GQ:PL  1/1:0,174:174:99:5333,519,0     1/1:0,250:250:99:7665,745,0     1/1:0,156:156:99:4756,469,0     1/1:0,81:81:99:2490,242,0

• Washington DCPosts: 56

And this is my output vcf with a vcf containing the above chrom, pos, and ref columns as the intervals file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  IND1    IND2    IND3    IND4
14      96556727        rs55835301      CA      C       131.94  .       AC=2;AF=0.250;AN=8;BaseQRankSum=0.551;ClippingRankSum=-0.089;DB;DP=43;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=69.93;MQ0=0;MQRankSum=-0.693;QD=4.55;ReadPosRankSum=1.299 GT:AD:DP:GQ:PL  0/0:8,1:9:3:0,3,187     0/1:8,7:15:99:120,0,194 0/1:4,3:7:53:53,0,93    0/0:4,0:4:12:0,12,114
14      96556846        rs61985272      C       T       5763.33 .       AC=7;AF=0.875;AN=8;BaseQRankSum=-4.252;ClippingRankSum=0.078;DB;DP=228;FS=0.000;MLEAC=7;MLEAF=0.875;MQ=70.00;MQ0=0;MQRankSum=0.563;QD=25.28;ReadPosRankSum=0.464        GT:AD:DP:GQ:PL  1/1:0,69:69:99:2134,207,0       1/1:0,72:72:99:2104,216,0       0/1:29,26:55:99:606,0,721       1/1:0,32:32:96:951,96,0
14      96556889        .       C       .       148.67  .       AN=4;DP=248;MQ=70.00    GT:AD:DP        0:67:67 0:90:90 0:52:52 0:39:39
14      96556914        rs2093722       T       C       10061.33        .       AC=7;AF=0.875;AN=8;BaseQRankSum=1.568;ClippingRankSum=0.143;DB;DP=364;FS=0.909;MLEAC=7;MLEAF=0.875;MQ=70.00;MQ0=0;MQRankSum=0.682;QD=27.64;ReadPosRankSum=1.584 GT:AD:DP:GQ:PL  1/1:0,96:96:99:3015,287,0       1/1:0,141:141:99:4258,419,0     0/1:26,46:72:99:1156,0,588      1/1:0,54:54:99:1664,162,0
14      96557121        rs10148419      T       C       20158.35        .       AC=8;AF=1.00;AN=8;DB;DP=661;FS=0.000;MLEAC=8;MLEAF=1.00;MQ=70.00;MQ0=0;QD=30.50 GT:AD:DP:GQ:PL  1/1:0,174:174:99:5333,519,0     1/1:0,250:250:99:7605,744,0     1/1:0,156:156:99:4756,469,0     1/1:0,81:81:99:2490,242,0

• Washington DCPosts: 56

The important difference is the called genotypes at positions 96556846 and 96556914. The first file calls individuals as hets, the second file calls them as homoalts. I can upload the output bams or the input bam file snippets and ped file information if you want to look into this further.

Hi, I'm not quite sure I understand what the different inputs are that you are using. Could you also post snippets of your inputs in addition to the outputs?

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Washington DCPosts: 56

the input bed file:

14     96556546     96557146

14      96556727        .       CA      .       156.71  .       AN=4;DP=46;MQ=69.75
14      96556742        .       AGAAAAGAAAAAAAAAAAGAGCAACTTACTGCTTTGTGAGGTTGTGAGTGGCCACCACTGAAGGTCTTTGAGAAGAGGCCAGACGCCGCTGTAGCCAGGCCTGTCTTAAGAGGACTTGTGCTTCCAGGGACCCAGGCAGGATGATGGCGCAGCTCTTCCTACTCCAAGCCAATGCTGTCCTTCCCCTTTCCCATG     .       188.38  .       AN=4;DP=796;MQ=69.9
14      96556846        rs61985272      C       T       3492.94 .       AC=4;AF=0.500;AN=8;BaseQRankSum=-4.176;ClippingRankSum=0.449;DB;DP=228;FS=0.000;MLEAC=4;MLEAF=0.500;MQ=70.00;MQ0=0;MQRankSum=-0.277;QD=15.32;ReadPosRankSum=0.446
14      96556889        .       C       .       256.67  .       AN=4;DP=248;MQ=70.0
14      96556914        rs2093722       T       C       8449.12 .       AC=5;AF=0.625;AN=8;BaseQRankSum=1.388;ClippingRankSum=-0.657;DB;DP=366;FS=0.909;MLEAC=5;MLEAF=0.625;MQ=70.00;MQ0=0;MQRankSum=0.818;QD=23.09;ReadPosRankSum=1.533
14      96557121        rs10148419      T       C       20218.35        .       AC=8;AF=1.00;AN=8;DB;DP=661;FS=0.000;MLEAC=8;MLEAF=1.00;MQ=70.00;MQ0=0;QD=30.59

All the sites in the input vcf are the sites that were called by HC using the bed file as the intervals file.

Okay, very helpful. Now I see what is happening.
Which version of the GATK are you using? Could you please try re-running with the bed file input (the one that presumably leads to bad genotypes) with the latest version (2.7, released last night)?
Thanks

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Washington DCPosts: 56

I am currently using GATK v2.6-5-gba531bd, Compiled 2013/07/18 18:05:31. I probably won't be able to update until next week. I have some runs that I want to do identically, so I can't change GATK versions in between. But I will let you know once I do.

• Posts: 6

I am experiencing the same strange behaviour with HC, using bed input files, running either v2.6-4-g3e5ff60 or v2.7-1-g42d771f .

Hi @aciolfi,

Could you please upload test data files to our servers so that we can try to debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

Geraldine Van der Auwera, PhD

• Posts: 6

Sure, I have followed your instructions and I have uploaded it on your server. The name is GATK_HC_bug_report_27_08_2013.tgz. Thank you for your help, and let me know if you need more details.

Hi @aciolfi, thanks for the test files. Can you tell me specifically what position you're concerned about?

Geraldine Van der Auwera, PhD

• Posts: 6

Yes, for example I cannnot understand why the position chr19:33444672 is called 0/1: I don't understand which reads support this genotype call.

Great, thank you -- the files replicate the issue, so I'll pass this on to the devs to debug in detail. I'll get back to you when we have an idea of what's going on.

Geraldine Van der Auwera, PhD

Hi @aciolfi,
When you look at this site in IGV you should enable the viewing of soft-clipped bases. If you do so (a la the screenshot below) you'll see that you have what looks like a massive insertion just upstream of the position in question. So when you include the whole region the Haplotype Caller sees that there are 2 segregating haplotypes here and correctly calls your SNP as heterozygous; it does not output the insertion though because it's far too large to assemble (and it looks like your coverage drops off downstream of the event which disempowers assembly; is this from a capture experiment?). But when you ask the HC to look only at the SNP position and don't include the upstream region with the insertion, it doesn't see that other haplotype so is forced to call the genotype as homozygous.
Hope this helps!

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Posts: 6

Hi @ebanks, thank you very much for the support and the clarification! (And yes, data were from exome sequencing experiment).

• chinaPosts: 2

But my result ｃｏｌｕｍｎ　３　didn‘t showd rsnumber , what is the matter ?