How does GATK determine GT when calling variants?

rosalie Member
edited September 2013 in Ask the GATK team

Looking closely at the VCF file produced by HaplotypeCaller, we noticed that the PL values and the GT disagree for some variants.
For example:

1   762589  rs71507461  G   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,33:33:99:1464,99,0    **0/1**:32,38:70:99:**0,9,2274**    **0/1**:78,42:120:99:**323,30,0**   **1/1**:11,5:96:99:**123,0,324**    1/1:2,84:86:99:3923,271,0   1/1:2,104:106:99:4945,348,0
1   762592  rs71507462  C   G   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,60:32:99:1464,99,0    0/1:586,0:67:99:1471,0,2274 0/1:77,42:119:99:1667,0,6152    1/1:1,95:96:99:4462,310,0   1/1:2,85:87:99:3923,271,0   1/1:2,101:103:99:4945,348,0
1   762601  rs71507463  T   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:1675,144:32:99:1464,99,0    0/1:30,38:68:99:1471,0,2274 0/1:79,42:121:99:1667,0,495 1/1:20,24:90:99:4462,0,476  1/1:2,83:85:99:3923,271,0   1/1:3,107:110:99:4945,348,0

What is the relationship between GT and PL? I initially thought that PL determined GT. What other factors go into GATK deciding on a particular GT? For example, does GATK take into account the GT of nearby variants when determining the GT of an individual variant?

Thank you in advance for your help.

Rosalie
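
For readers who want to run the same check on their own calls, here is a minimal sketch. It assumes diploid genotypes and the VCF convention that PL values are normalized Phred-scaled likelihoods listed in the order 0/0, 0/1, 1/1, ..., so the most likely genotype is the one with the lowest PL (normally 0). The helper names `genotype_order` and `check_sample` are made up for this illustration and are not part of GATK; missing genotypes such as `./.` are not handled.

```python
def genotype_order(n_alleles):
    """Diploid genotypes in VCF PL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def check_sample(fmt, sample, n_alleles=2):
    """Return (called_gt, gt_implied_by_pl, consistent) for one sample column.

    fmt is the FORMAT string (e.g. "GT:AD:DP:GQ:PL") and sample is the matching
    per-sample string. n_alleles counts REF plus all ALT alleles at the site.
    """
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    # Treat phased (|) and unphased (/) calls alike and ignore allele order.
    called = tuple(sorted(int(a) for a in fields["GT"].replace("|", "/").split("/")))
    pls = [int(p) for p in fields["PL"].split(",")]
    # The genotype with the smallest PL is the most likely one; ties go to the first.
    implied = genotype_order(n_alleles)[pls.index(min(pls))]
    return called, implied, called == implied


if __name__ == "__main__":
    fmt = "GT:AD:DP:GQ:PL"
    # Two sample columns copied from the first record above.
    for sample in ("1/1:0,33:33:99:1464,99,0", "0/1:32,38:70:99:0,9,2274"):
        print(sample, check_sample(fmt, sample))
```

On the two sample columns taken from the first record, the first call is consistent (GT 1/1, lowest PL at 1/1), while the second is not (GT 0/1, but the lowest PL is at 0/0).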

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Rosalie,

    This does indeed look odd; we'll need to look into it more closely. Could you please upload some test files that reproduce the issue so we can debug it locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • Hi Geraldine,

    Thank you for your response. Upon looking into the files more closely, I realized that the issue arises when we call the variants with HaplotypeCaller and then run CombineVariants after filtering. (We hard-filtered because of the small size of the project.) After running CombineVariants, the GT and PL values are no longer consistent.

    Also, the issue first disappeared when I created snippet files. When I added in a few more samples, the issue reappeared; however, the PL values differ from those in our original problematic file (posted above).

    The files are now uploaded in GTvsPL.tar.gz.

    Thank you for looking into this! Please let me know if any other files or information would be helpful.

    Rosalie

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Thanks Rosalie, I'll have a look at these tomorrow (Tuesday) morning.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Rosalie,

    I just realized that you uploaded a very large file archive (24G). Unfortunately, that's too large for us to use efficiently in debugging. Could you please narrow the files down to just a short interval containing one or several representative sites, preferably the sites shown in your original posting?

  • Hi Geraldine,

    Thank you for looking into this.

    Unfortunately, the issue disappears when I make the snippets smaller! The error appears to depend on the amount of data processed.

    The files I uploaded are snippets of our original files and still produce a GT/PL discrepancy when CombineVariants runs. (However, the numbers are different from those in my original post. Once again, the issue seems to depend on the amount of data processed.)

    Perhaps you could take a look at two of the VCF files I uploaded? (The BAM files are what makes the archive large.) hf.snps.cllsnippet.vcf (the SNP input file to CombineVariants) appears normal, while hf.cllsnippet.vcf (the output of CombineVariants) has the issue.

    Please let me know if anything else would be helpful.

    Rosalie
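
A comparison like the one described above (a CombineVariants input against its output) can be done mechanically. The following is a rough sketch in plain Python, not a GATK tool: `load_calls` and `diff_calls` are hypothetical helper names, the parsing assumes uncompressed, tab-delimited VCF text with a `#CHROM` header line, and calls are matched on chromosome, position, REF, ALT and sample name, so a record whose ALT alleles were rewritten by the merge would simply not be compared.

```python
def load_calls(lines):
    """Map (chrom, pos, ref, alt, sample) -> {FORMAT key: value} for VCF text lines."""
    calls, samples = {}, []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        site = (cols[0], cols[1], cols[3], cols[4])
        keys = cols[8].split(":")
        for name, value in zip(samples, cols[9:]):
            calls[site + (name,)] = dict(zip(keys, value.split(":")))
    return calls


def diff_calls(before, after, keys=("GT", "PL")):
    """Yield (call_id, key, old, new) for each listed field that differs."""
    # Only calls present in both files are compared.
    for call_id in sorted(before.keys() & after.keys()):
        for key in keys:
            old, new = before[call_id].get(key), after[call_id].get(key)
            if old != new:
                yield call_id, key, old, new
```

To use it on the two files named above, open each one, pass the file objects to `load_calls`, and iterate over `diff_calls(before, after)`. Any GT or PL value that changed between the input and the combined output is reported with its site and sample name.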

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Rosalie, I'm a little backed up on reports to process but I'll try to look at your files later today. I'm a little concerned that this might be a Heisenbug...

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Rosalie,

    The latest release (2.8) includes a fix for the CombineVariants GT/PL issue, which should solve your problem.
