# How does GATK determine GT when calling variants?

edited September 2013

Looking closely at the VCF file produced by HaplotypeCaller, we noticed that the PL values and the GT disagree for some variants. For example:

1   762589  rs71507461  G   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,33:33:99:1464,99,0    **0/1**:32,38:70:99:**0,9,2274**    **0/1**:78,42:120:99:**323,30,0**   **1/1**:11,5:96:99:**123,0,324**    1/1:2,84:86:99:3923,271,0   1/1:2,104:106:99:4945,348,0
1   762592  rs71507462  C   G   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,60:32:99:1464,99,0    0/1:586,0:67:99:1471,0,2274 0/1:77,42:119:99:1667,0,6152    1/1:1,95:96:99:4462,310,0   1/1:2,85:87:99:3923,271,0   1/1:2,101:103:99:4945,348,0
1   762601  rs71507463  T   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:1675,144:32:99:1464,99,0    0/1:30,38:68:99:1471,0,2274 0/1:79,42:121:99:1667,0,495 1/1:20,24:90:99:4462,0,476  1/1:2,83:85:99:3923,271,0   1/1:3,107:110:99:4945,348,0


What is the relationship between GT and PL? I initially thought PL determined GT. What other factors go into GATK deciding on a particular GT? For example, does GATK take into account the GT of nearby variants when determining the GT of an individual variant?
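
The consistency expected here can be sketched in Python. In the VCF format, PL holds normalized Phred-scaled genotype likelihoods, so the most likely genotype has PL 0, and for a diploid site with one ALT allele the values are ordered 0/0, 0/1, 1/1. The sketch below assumes biallelic diploid sites only; the function names are hypothetical and not part of GATK.

```python
def genotype_from_pl(pl_values):
    """Return the diploid, biallelic genotype with the lowest PL."""
    genotypes = ["0/0", "0/1", "1/1"]  # VCF ordering for a single ALT allele
    if len(pl_values) != len(genotypes):
        raise ValueError("expected three PL values for a biallelic diploid site")
    best = min(range(len(pl_values)), key=lambda i: pl_values[i])
    return genotypes[best]


def gt_matches_pl(format_keys, sample_field):
    """Check whether the GT in one sample column agrees with its PL values."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    pl_values = [int(x) for x in values["PL"].split(",")]
    called = values["GT"].replace("|", "/")
    # Normalise allele order so that 1/0 compares equal to 0/1
    called = "/".join(sorted(called.split("/")))
    return called == genotype_from_pl(pl_values)


fmt = "GT:AD:DP:GQ:PL"
for field in ["1/1:0,33:33:99:1464,99,0", "0/1:32,38:70:99:0,9,2274"]:
    print(field, gt_matches_pl(fmt, field))
```

Of the two sample columns taken from site 762589 above, the first passes this check and the second (one of the highlighted columns) does not.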

Rosalie

Hi Rosalie,

This does indeed look odd; we'll need to look into it more closely. Could you please upload some test files that replicate the issue so we can debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

Geraldine Van der Auwera, PhD

Hi Geraldine,

Thank you for your response. Upon looking into the files more closely, I realized the issue arises when we call the variants with HaplotypeCaller and then use CombineVariants after filtering. (We hard-filtered because of the small size of the project.) After running CombineVariants, the GT and PL values are no longer consistent.

Also, the issue first disappeared when I created snippet files. I added in a few more samples and the issue reappeared, however the PL values are different from those in our original problematic file (posted above).

The files are now uploaded in GTvsPL.tar.gz.

Thank you for looking into this! Please let me know if any other files or information would be helpful.

Rosalie

Thanks Rosalie, I'll have a look at these tomorrow (Tuesday) morning.

Geraldine Van der Auwera, PhD

Hi Rosalie,

I just realized that you uploaded a very large file archive (24G). Unfortunately, that's too large for us to use efficiently in debugging. Could you please narrow the files down to just a short interval containing one or several representative sites? Preferably the sites shown in your original post.

Geraldine Van der Auwera, PhD

Hi Geraldine,

Thank you for looking into this.

Unfortunately, the issue disappears when I make the snippets smaller! The error appears to depend on the amount of data processed.

The files I uploaded are snippets of our original files and still produce a GT/PL discrepancy when CombineVariants runs. (However, the numbers are different from my original post. Once again, the issue seems to depend on the amount of data processed.)

Perhaps you could take a look at two of the vcf files I uploaded? (The bam files are what makes the archive large.) hf.snps.cllsnippet.vcf (the snp input file to CombineVariants) appears normal, while hf.cllsnippet.vcf (the output of CombineVariants) has the issue.
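
One way to narrow down what CombineVariants changed is to diff the per-sample columns of the two files site by site. The following is a rough sketch in plain Python, not a GATK tool: the function names are hypothetical, and it assumes tab-delimited VCF text with at most one record per position and the same sample names in both files. It compares the raw sample strings, so a change in the FORMAT layout would also be reported.

```python
def load_sample_fields(vcf_lines):
    """Map (chrom, pos, sample_name) -> raw sample column from VCF text lines."""
    samples = []
    fields = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        for name, value in zip(samples, cols[9:]):
            fields[(cols[0], int(cols[1]), name)] = value
    return fields


def diff_sample_fields(before_lines, after_lines):
    """List (key, before, after) for sample columns that differ between files."""
    before = load_sample_fields(before_lines)
    after = load_sample_fields(after_lines)
    return [(key, before[key], after[key])
            for key in sorted(before)
            if key in after and before[key] != after[key]]
```

For the files mentioned above, this could be called as `diff_sample_fields(open("hf.snps.cllsnippet.vcf"), open("hf.cllsnippet.vcf"))`, which returns one tuple per sample column that changed.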

Rosalie

Hi Rosalie, I'm a little backed up on reports to process but I'll try to look at your files later today. I'm a little concerned that this might be a Heisenbug...

Geraldine Van der Auwera, PhD