I have a sample for which HaplotypeCaller identifies the following variant:
chr12 133237753 . GAAA G,GA,GAA,TAAA,GAAAA,GAAAAA,<NON_REF> 66.73 . BaseQRankSum=-1.481;ClippingRankSum=0.000;DP=407;ExcessHet=3.0103;MLEAC=0,0,0,0,0,0,1;MLEAF=0.00,0.00,0.00,0.00,0.00,0.00,0.500;MQRankSum=-0.936;RAW_MQ=1473600.00;ReadPosRankSum=-0.376 GT:AD:DP:GQ:PL:SB 0/7:133,16,16,47,24,15,6,0:257:3:102,422,10788,292,8507,8145,3,4923,4768,4515,709,4921,4093,2553,6007,405,4861,4378,3143,3552,4849,564,6844,6021,4099,4376,5290,7241,0,2341,2211,1690,2151,2016,2301,1682:67,66,34,90
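The record above can be easier to read with each allele paired to its AD value and the GT indices resolved. Below is a small Python sketch for that. It is not part of GATK, and the helper name `parse_sample` is hypothetical. It uses the REF, ALT, FORMAT and sample columns copied verbatim from the record:

```python
# REF, ALT, FORMAT and sample columns of the gVCF record, copied from the post.
REF = "GAAA"
ALT = "G,GA,GAA,TAAA,GAAAA,GAAAAA,<NON_REF>"
FORMAT = "GT:AD:DP:GQ:PL:SB"
SAMPLE = (
    "0/7:133,16,16,47,24,15,6,0:257:3:"
    "102,422,10788,292,8507,8145,3,4923,4768,4515,709,4921,4093,2553,6007,405,"
    "4861,4378,3143,3552,4849,564,6844,6021,4099,4376,5290,7241,0,2341,2211,"
    "1690,2151,2016,2301,1682:67,66,34,90"
)


def parse_sample(ref, alt, fmt, sample):
    """Return the allele list (REF first) and a FORMAT-key -> value dict."""
    alleles = [ref] + alt.split(",")
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    return alleles, fields


alleles, fields = parse_sample(REF, ALT, FORMAT, SAMPLE)
depths = [int(x) for x in fields["AD"].split(",")]
assert len(depths) == len(alleles)  # one AD value per allele, REF included
allele_depth = dict(zip(alleles, depths))
gt_alleles = [alleles[int(i)] for i in fields["GT"].split("/")]

for allele, depth in allele_depth.items():
    print(f"{allele:>10} {depth}")
print("GT", fields["GT"], "->", "/".join(gt_alleles))
```

For this record, the sketch pairs 47 reads with GAA (a one-base deletion), 24 with TAAA, and 0 with <NON_REF>. GT 0/7 resolves to GAAA/<NON_REF>.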
After running GenotypeGVCFs on this sample together with all of our previously analyzed samples, and then running SelectVariants (with the excludeNonVariants and removeUnusedAlternates options) to extract this sample, I get the following:
chr12 133237753 . GA G 381589.02 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.165;ClippingRankSum=-4.100e-02;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-2.7975;MQ=60.16;MQRankSum=0.100;QD=1.67;ReadPosRankSum=0.265;SOR=0.630 GT:AD:DP:GQ:PL 0/1:133,47:257:99:99,0,4512
chr12 133237754 . A T 361388.42 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.733e+00;ClippingRankSum=-5.750e-01;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-0.7161;MQ=1.45;MQRankSum=0.00;QD=1.97;ReadPosRankSum=1.07;SOR=0.629 GT:AD:DP:GQ:PL 0/1:133,0:257:3:102,0,1682
chr12 133237755 . A T 351628.83 . AC=1;AF=0.500;AN=2;DP=257;ExcessHet=2147483647.0000;FS=0.000;InbreedingCoeff=-0.5329;MQ=0.62;QD=1.94;SOR=0.629 GT:AD:DP:GQ:PL 0/1:133,0:257:3:102,0,1682
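The GA/G record at 133237753 is the same one-base deletion as GAAA/GAA in the gVCF, written more compactly. Below is a minimal sketch of the usual trailing-base trimming used in VCF normalization. It is an assumption, not checked against the GATK source, that this trimming happens once removeUnusedAlternates drops the other ALT alleles. `trim_common_suffix` is a hypothetical helper:

```python
def trim_common_suffix(alleles):
    """Drop trailing bases shared by every allele, keeping at least one base each.

    Sketch of standard VCF allele trimming; assumed (not confirmed) to match
    what SelectVariants does after unused ALT alleles are removed.
    """
    alleles = list(alleles)
    # Stop as soon as any allele is down to one base or the last bases differ.
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    return alleles


print(trim_common_suffix(["GAAA", "GAA"]))  # ['GA', 'G']
```

The output AD of 133,47 matches the REF and GAA counts from the gVCF, which fits this reading.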
We now suddenly see A->T variants at the two positions downstream of the original variant (neither was called by HaplotypeCaller), yet in both cases the AD for the alternate allele is 0.
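One way to trace where the 102,0,1682 PLs on the two A->T records could come from is to pull genotype likelihoods out of the original gVCF PL array. Under the VCF ordering, diploid genotype j/k (with j <= k) sits at index k(k+1)/2 + j. The helpers `pl_index` and `subset_pl` in the sketch below are hypothetical, and shifting the values so the minimum is 0 is an assumption about how the subset PLs are renormalized. The sketch shows two things:

- The 0/0, 0/<NON_REF> and <NON_REF>/<NON_REF> entries are exactly 102, 0 and 1682.
- The REF/GAA entries, once shifted, give the 99,0,4512 of the GA/G record.

```python
# PL array of the HaplotypeCaller gVCF record, copied from the post.
# Allele indices: 0=GAAA, 1=G, 2=GA, 3=GAA, 4=TAAA, 5=GAAAA, 6=GAAAAA, 7=<NON_REF>
PL = [int(x) for x in (
    "102,422,10788,292,8507,8145,3,4923,4768,4515,709,4921,4093,2553,6007,405,"
    "4861,4378,3143,3552,4849,564,6844,6021,4099,4376,5290,7241,0,2341,2211,"
    "1690,2151,2016,2301,1682"
).split(",")]


def pl_index(j, k):
    """Index of unordered diploid genotype j/k in a VCF PL array."""
    j, k = min(j, k), max(j, k)
    return k * (k + 1) // 2 + j


def subset_pl(pl, ref, alt):
    """Biallelic PLs (ref/ref, ref/alt, alt/alt) taken from a multiallelic
    PL array and shifted so the most likely genotype is 0."""
    picked = [pl[pl_index(ref, ref)], pl[pl_index(ref, alt)], pl[pl_index(alt, alt)]]
    best = min(picked)
    return [p - best for p in picked]


assert len(PL) == 8 * 9 // 2  # 36 genotypes for 8 alleles
print(subset_pl(PL, 0, 7))  # REF vs <NON_REF> -> [102, 0, 1682]
print(subset_pl(PL, 0, 3))  # REF vs GAA       -> [99, 0, 4512]
```

If this reading is right, the A->T records carry this sample's <NON_REF> likelihoods for a T allele that is not in its gVCF. That would also fit the AD of 0 and the GQ of 3, which equals the original gVCF GQ. This is inferred from the numbers alone.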
Is this the intended behavior? Part of the GenotypeGVCFs documentation says "This tool performs the multi-sample joint aggregation step and merges the records together in a sophisticated manner". Maybe I just don't understand that level of sophistication :-)