Some sites that are overlapped by a spanning deletion getting a REF call instead of * in 3.4-46
It appears that there are some cases in which an upstream site is heterozygous for a deletion, and a downstream site should be heterozygous for something like 1/2 (where 2 is the * allele), but GATK is still making erroneous 0/1 calls. For instance, in my dataset I have a deletion at position 773 (I'm leaving out some fields and GT info for additional samples for brevity).
scaffold_1 773 . CAAAACATAGTTTCCCATGTCCGCCATCCCATGTTCTGCATCCGTGCA C ... 0/1:102,70:172:99:0|1:773_CAAAACATAGTTTCCCATGTCCGCCATCCCATGTTCTGCATCCGTGCA_C:283
And just 2 bases downstream, there's a SNP for the same sample that has non-* calls on both strands, which seems impossible. Shouldn't there be a * allele here, e.g (ALT==G,* ; GT==1/2)? Here's the call:
scaffold_1 775 . A G ... 0/1:60,134:194:99:4507,0,1642
Could this be a bug, such that when there's a heterozygous ALT call on the strand without the deletion, the half of the GT field for deleted strand is still getting a reference (0) call when it should be getting a * ? At another site overlapping with the same deletion, at which the strand without the deletion for this sample is a REF call , I get something that's more like what I'd expect.
scaffold_1 805 . G T,* ... 0/2:235,0,33:268:99:0|1:773_CAAAACATAGTTTCCCATGTCCGCCATCCCATGTTCTGCATCCGTGCA_C:676,1384,11251,0,9868,9762
This was done with GenotypeGVCFs from GATK 3.4-46.