
How does GATK determine GT when calling variants?

rosalie Posts: 3 Member
edited September 2013 in Ask the team

Looking closely at the VCF file produced by HaplotypeCaller, we noticed that the PL values and GT disagree for some variants. For example:

1   762589  rs71507461  G   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,33:33:99:1464,99,0    **0/1**:32,38:70:99:**0,9,2274**    **0/1**:78,42:120:99:**323,30,0**   **1/1**:11,5:96:99:**123,0,324**    1/1:2,84:86:99:3923,271,0   1/1:2,104:106:99:4945,348,0
1   762592  rs71507462  C   G   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:0,60:32:99:1464,99,0    0/1:586,0:67:99:1471,0,2274 0/1:77,42:119:99:1667,0,6152    1/1:1,95:96:99:4462,310,0   1/1:2,85:87:99:3923,271,0   1/1:2,101:103:99:4945,348,0
1   762601  rs71507463  T   C   21714.69    PASS    [clipped]   GT:AD:DP:GQ:PL  1/1:1675,144:32:99:1464,99,0    0/1:30,38:68:99:1471,0,2274 0/1:79,42:121:99:1667,0,495 1/1:20,24:90:99:4462,0,476  1/1:2,83:85:99:3923,271,0   1/1:3,107:110:99:4945,348,0

What is the relationship between GT and PL? I initially thought PL determined GT. What other factors go into GATK deciding on a particular GT? For example, does GATK take into account the GT of nearby variants to determine the GT of an individual variant?
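To make the discrepancy concrete, here is a minimal Python sketch of the check I have in mind. It assumes the usual convention that PL values are normalized so the most likely genotype has PL 0, and that diploid genotypes j/k (with j <= k) are listed at index k*(k+1)/2 + j, as described in the VCF specification. The function names `pl_index` and `gt_matches_pl` are made up for illustration, and this is not how GATK itself assigns GT.

```python
def pl_index(a, b):
    """Index of the unphased diploid genotype a/b in the PL list (VCF ordering)."""
    j, k = sorted((a, b))
    return k * (k + 1) // 2 + j


def gt_matches_pl(format_keys, sample_field):
    """Return True if the called GT has the lowest PL, False if it does not,
    and None if the genotype is missing, not diploid, or has no PL."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    gt = values.get("GT", "./.")
    pl = values.get("PL", ".")
    if "." in gt or pl == ".":
        return None
    alleles = [int(x) for x in gt.replace("|", "/").split("/")]
    if len(alleles) != 2:
        return None
    pls = [int(x) for x in pl.split(",")]
    return pls[pl_index(*alleles)] == min(pls)


fmt = "GT:AD:DP:GQ:PL"
# First sample at 1:762589 looks consistent; the second one does not.
print(gt_matches_pl(fmt, "1/1:0,33:33:99:1464,99,0"))
print(gt_matches_pl(fmt, "0/1:32,38:70:99:0,9,2274"))
```

Under these assumptions, a sample such as `0/1:32,38:70:99:0,9,2274` is flagged because the PL of 0 sits in the 0/0 position while the call is 0/1.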

Thank you in advance for your help.

Rosalie


Answers

  • Geraldine_VdAuwera Posts: 5,213 Administrator, GSA Member

    Hi Rosalie,

    This does indeed look odd; we'll need to look into it more closely. Could you please upload some test files that replicate the issue so we can debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • rosalie Posts: 3 Member

    Hi Geraldine,

    Thank you for your response. Upon looking into the files more closely, I realized the issue arises when we call the variants with HaplotypeCaller and then use CombineVariants after filtering. (We hard-filtered because of the small size of the project.) After running CombineVariants, the GT and PL values are no longer consistent.

    Also, the issue first disappeared when I created snippet files. I added a few more samples and the issue reappeared; however, the PL values are different from those in our original problematic file (posted above).

    The files are now uploaded in GTvsPL.tar.gz.

    Thank you for looking into this! Please let me know if any other files or information would be helpful.

    Rosalie

  • Geraldine_VdAuwera Posts: 5,213 Administrator, GSA Member

    Thanks Rosalie, I'll have a look at these tomorrow (Tuesday) morning.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera Posts: 5,213 Administrator, GSA Member

    Hi Rosalie,

    I just realized that you uploaded a very large file archive (24 GB). Unfortunately that's too large for us to use efficiently in debugging. Could you please narrow the files down to just a short interval containing one or several representative sites, preferably the sites shown in your original posting?

    Geraldine Van der Auwera, PhD

  • rosalie Posts: 3 Member

    Hi Geraldine,

    Thank you for looking into this.

    Unfortunately, the issue disappears when I make the snippets smaller! The error appears to depend on the amount of data processed.

    The files I uploaded are snippets of our original files and still produce a GT/PL discrepancy when CombineVariants runs. (However, the numbers are different from my original post; once again, the issue seems to depend on the amount of data processed.)

    Perhaps you could take a look at two of the VCF files I uploaded? (The BAM files are what make the archive large.) hf.snps.cllsnippet.vcf (the SNP input file to CombineVariants) appears normal, while hf.cllsnippet.vcf (the output of CombineVariants) has the issue.

    Please let me know if anything else would be helpful.

    Rosalie

  • Geraldine_VdAuwera Posts: 5,213 Administrator, GSA Member

    Hi Rosalie, I'm a little backed up on reports to process but I'll try to look at your files later today. I'm a little concerned that this might be a Heisenbug...

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera Posts: 5,213 Administrator, GSA Member

    Hi Rosalie,

    The latest version release (2.8) includes a fix for the CombineVariants GT/PL issue which should solve your problem.

    Geraldine Van der Auwera, PhD
