I have a sample for which HaplotypeCaller identifies the following variant:
chr12 133237753 . GAAA G,GA,GAA,TAAA,GAAAA,GAAAAA,<NON_REF> 66.73 . BaseQRankSum=-1.481;ClippingRankSum=0.000;DP=407;ExcessHet=3.0103;MLEAC=0,0,0,0,0,0,1;MLEAF=0.00,0.00,0.00,0.00,0.00,0.00,0.500;MQRankSum=-0.936;RAW_MQ=1473600.00;ReadPosRankSum=-0.376 GT:AD:DP:GQ:PL:SB 0/7:133,16,16,47,24,15,6,0:257:3:102,422,10788,292,8507,8145,3,4923,4768,4515,709,4921,4093,2553,6007,405,4861,4378,3143,3552,4849,564,6844,6021,4099,4376,5290,7241,0,2341,2211,1690,2151,2016,2301,1682:67,66,34,90
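To make the record above easier to read, here is a minimal sketch that maps each PL entry back to the genotype it describes. It assumes the standard VCF ordering of genotype likelihoods for a diploid sample, where genotype j/k (j <= k) sits at index k*(k+1)/2 + j. The helper name `genotype_for_pl_index` is my own and not part of GATK; the allele list and PL values are copied from the record.

```python
def genotype_for_pl_index(index):
    """Return the diploid genotype (j, k), j <= k, stored at a given PL index.

    Assumes the VCF ordering: index = k * (k + 1) // 2 + j.
    """
    k = 0
    while (k + 1) * (k + 2) // 2 <= index:
        k += 1
    j = index - k * (k + 1) // 2
    return j, k


# REF followed by the ALT alleles, as listed in the HaplotypeCaller record.
alleles = ["GAAA", "G", "GA", "GAA", "TAAA", "GAAAA", "GAAAAA", "<NON_REF>"]

# PL values copied from the sample column of the same record.
pl = [102, 422, 10788, 292, 8507, 8145, 3, 4923, 4768, 4515, 709, 4921,
      4093, 2553, 6007, 405, 4861, 4378, 3143, 3552, 4849, 564, 6844, 6021,
      4099, 4376, 5290, 7241, 0, 2341, 2211, 1690, 2151, 2016, 2301, 1682]

# Rank genotype indices from most likely (lowest PL) to least likely.
ranked = sorted(range(len(pl)), key=lambda i: pl[i])

for index in ranked[:2]:
    j, k = genotype_for_pl_index(index)
    print(f"PL={pl[index]:>3}  GT={j}/{k}  ({alleles[j]} / {alleles[k]})")
```

Under that assumption, the lowest PL (0) belongs to 0/7, i.e. GAAA with `<NON_REF>`, and the runner-up (PL=3) belongs to 0/3, i.e. GAAA with GAA. The gap of 3 between the two is consistent with the reported GQ of 3.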
After running GenotypeGVCFs using all our previously analyzed samples, and then using SelectVariants (with options excludeNonVariants and removeUnusedAlternates) to grab the relevant sample, I get the following:
chr12 133237753 . GA G 381589.02 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.165;ClippingRankSum=-4.100e-02;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-2.7975;MQ=60.16;MQRankSum=0.100;QD=1.67;ReadPosRankSum=0.265;SOR=0.630 GT:AD:DP:GQ:PL 0/1:133,47:257:99:99,0,4512
chr12 133237754 . A T 361388.42 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.733e+00;ClippingRankSum=-5.750e-01;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-0.7161;MQ=1.45;MQRankSum=0.00;QD=1.97;ReadPosRankSum=1.07;SOR=0.629 GT:AD:DP:GQ:PL 0/1:133,0:257:3:102,0,1682
chr12 133237755 . A T 351628.83 . AC=1;AF=0.500;AN=2;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-0.5329;MQ=0.62;QD=1.94;SOR=0.629 GT:AD:DP:GQ:PL 0/1:133,0:257:3:102,0,1682
We now suddenly see A->T variants at the two positions downstream of the original variant (neither was called by HaplotypeCaller), but in both cases the AD for the alternate allele is 0.
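For anyone who wants to look for the same pattern in their own output, here is a rough sketch of the check I am describing: it flags records where the genotype includes an ALT allele whose AD is 0. The function name `alt_depths_for_called_alleles` is hypothetical, and the FORMAT and sample strings are copied from the three GenotypeGVCFs records above. It assumes a single-sample record with both GT and AD present.

```python
def alt_depths_for_called_alleles(fmt, sample):
    """Return {allele_index: AD} for every ALT allele present in the GT."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    depths = [int(x) for x in fields["AD"].split(",")]
    called = {
        int(a)
        for a in fields["GT"].replace("|", "/").split("/")
        if a != "."
    }
    # Allele index 0 is REF, so only indices > 0 are ALT alleles.
    return {a: depths[a] for a in sorted(called) if a > 0}


fmt = "GT:AD:DP:GQ:PL"
records = [
    (133237753, "0/1:133,47:257:99:99,0,4512"),
    (133237754, "0/1:133,0:257:3:102,0,1682"),
    (133237755, "0/1:133,0:257:3:102,0,1682"),
]

# Positions where a called ALT allele has no supporting reads.
unsupported = [
    pos
    for pos, sample in records
    if 0 in alt_depths_for_called_alleles(fmt, sample).values()
]
print(unsupported)
```

With the three records above, only the deletion at 133237753 has reads supporting its ALT allele (AD 47); the two A->T calls are the ones flagged.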
Is this the intended behavior? Part of the GenotypeGVCFs documentation says "This tool performs the multi-sample joint aggregation step and merges the records together in a sophisticated manner". Maybe I just don't understand that level of sophistication :-)