Unified genotyper: GT does not match PL, causes problems with PhaseByTransmission
I'm analyzing seven trio exomes right now with the latest GATK (version 2.7-4-g6f46d11), and was surprised to find a large number of Mendelian violations reported by PhaseByTransmission, even after eliminating low/no coverage events. Tracking down the problem, it seems that CombineVariants occasionally propagates the PL field to the new VCF file incorrectly, sometimes in a way that causes GT not to correspond to the lowest PL.
Here's an example, showing just the GT, AD, and PL fields for a few positions in one trio. For each position, the first line contains the genotypes from the original VCF file, and the second shows the genotypes from the merged file.
#CHROM POS ID REF ALT 100403001-1 100403001-1A 100403001-1B
1 5933530 rs905469 A G 0/0:37,0:0,99,1192 0/0:35,0:0,90,1101 0/0:44,0:0,117,1412
1 5933530 rs905469 A G 0/0:37,0:189,15,1192 0/0:35,0:0,90,1101 0/0:44,0:0,117,1412
1 10412636 rs4846215 A T 0/0:119,0:0,358,4297 0/0:113,0:0,337,4060 0/0:102,0:0,304,3622
1 10412636 rs4846215 A T 0/0:119,0:110,9,0 0/0:113,0:0,337,4060 0/0:102,0:0,304,3622
1 11729035 rs79974326 G C 0/0:50,0:0,141,1709 0/0:53,0:0,150,1788 0/0:71,0:0,187,2246
1 11729035 rs79974326 G C 0/0:50,0:1930,0,3851 0/0:53,0:0,150,1788 0/0:71,0:0,187,2246
1 16735764 rs182873855 G A 0/0:54,0:0,138,1691 0/0:57,0:0,153,1841 0/0:47,0:0,120,1441
1 16735764 rs182873855 G A 0/0:54,0:174,0,1691 0/0:57,0:0,153,1841 0/0:47,0:0,120,1441
1 17316577 rs77880760 G T 0/0:42,0:0,123,1470 0/0:38,0:0,111,1317 0/0:53,0:0,153,1817
1 17316577 rs77880760 G T 0/0:42,0:233,17,1470 0/0:38,0:0,111,1317 0/0:225,25:0,153,181
1 28116000 rs2294229 A G 0/0:37,0:0,105,1291 0/0:37,0:0,111,1379 0/0:30,0:0,87,1066
1 28116000 rs2294229 A G 0/0:37,0:0,105,1291 0/0:37,0:0,111,1379 0/0:30,0:1844,159,0
1 31740706 rs3753373 A G 0/0:123,0:0,349,4173 0/0:110,0:0,319,3793 0/0:111,0:0,328,3885
1 31740706 rs3753373 A G 0/0:123,0:117,6,0 0/0:110,0:0,319,3793 0/0:111,0:0,328,3885
Most genotypes are propagated correctly, and in fact, which ones are propagated incorrectly changes from run to run.
In my case, I'm merging files from disjoint regions, so I can work around the problem, but it would be nice if this were fixed.
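In case it helps anyone check their own merged files, here is a minimal Python sketch for flagging samples where GT does not point at the lowest PL. It is not part of GATK: the function names are made up for illustration, and it assumes diploid calls, a sample field laid out as GT:AD:PL, and the trimmed five-column layout used in the excerpt above (a full VCF has more fixed columns and a FORMAT column, so `n_fixed` would need adjusting). The PL index uses the standard VCF genotype ordering, where genotype j/k with j <= k sits at position k*(k+1)/2 + j.

```python
def gt_to_pl_index(gt):
    """Position in PL of a diploid genotype such as '0/1' (standard VCF ordering)."""
    j, k = sorted(int(a) for a in gt.replace("|", "/").split("/"))
    return k * (k + 1) // 2 + j


def gt_matches_pl(sample_field):
    """True if the called genotype has the minimal PL.

    Assumes the field is laid out as GT:AD:PL, as in the excerpt above.
    """
    gt, _ad, pl = sample_field.split(":")
    if "." in gt or pl == ".":
        return True  # no call or no likelihoods, so nothing to check
    pls = [int(x) for x in pl.split(",")]
    return pls[gt_to_pl_index(gt)] == min(pls)


def mismatched_samples(line, n_fixed=5):
    """0-based indices of samples on a record line whose GT disagrees with PL.

    n_fixed is the number of leading non-sample columns (5 in the trimmed excerpt).
    """
    samples = line.split()[n_fixed:]
    return [i for i, s in enumerate(samples) if not gt_matches_pl(s)]


merged = ("1 5933530 rs905469 A G 0/0:37,0:189,15,1192 "
          "0/0:35,0:0,90,1101 0/0:44,0:0,117,1412")
print(mismatched_samples(merged))  # [0]
```

Running this over the merged records above flags the first sample at 5933530, where GT is 0/0 but the lowest PL is the heterozygous one.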