False-positive(?) variant calls of a long insertion with GATK 3.6
First of all, I'm aware that calling long indels is usually problematic. But since I just came across this issue and could not find any explanation for it, I wanted to ask whether any of you know anything about this observation:
chr3 195507858 . G T,GGTGGATACTGAGGAAGTGTCGGTGACAGGAAGAGGGGTGGCGTGACCGGTGGATGCCGAGGAAGCGTCGGTGACAGGAAGAGGGGTGGTGTCACCTGTGGATACTGAGGAAAAGCTGGTGACAGGAAGAGGGGTGGCGTGACCT 157204 VQSRTrancheINDEL99.00to99.90 AC=0,2;AF=0.237,0.053;AN=2;BaseQRankSum=8.08;ClippingRankSum=0;DP=44310;ExcessHet=34.8579;FS=2.569;InbreedingCoeff=-0.153;MLEAC=63,13;MLEAF=0.24,0.05;MQ=33.48;MQRankSum=-0.033;NEGATIVE_TRAIN_SITE;QD=19.7;ReadPosRankSum=-0.672;SOR=0.898;VQSLOD=-2.182;culprit=MQRankSum GT:AD:DP:GQ:PGT:PID:PL 2/2:60,0,0:60:99:.:.:7071,890,2640,496,194,0
As you can see, it's a homozygous genotype, 2/2. However, when looking at the AD field, I cannot find any evidence for the variant: all reads seem to carry the reference allele (AD=60,0,0). Based on this, it's completely unclear to me how the caller arrives at this genotype.
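To make the mismatch concrete, here is a minimal Python sketch that takes the FORMAT and sample columns of the record above, lists the called alleles that have zero AD support, and reports which genotype has the lowest PL. The function names are my own (hypothetical helpers, not part of GATK), and the sketch assumes a diploid, non-missing GT plus the standard VCF ordering of PL values (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...).

```python
from itertools import combinations_with_replacement


def parse_sample(format_str, sample_str):
    """Pair the FORMAT keys with the values of one sample column."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def unsupported_alleles(fields):
    """Return the called allele indices whose AD count is zero."""
    gt = [int(a) for a in fields["GT"].replace("|", "/").split("/")]
    ad = [int(x) for x in fields["AD"].split(",")]
    return sorted({a for a in gt if ad[a] == 0})


def pl_best_genotype(fields):
    """Return the diploid genotype (j, k) that has the smallest PL value."""
    pl = [int(x) for x in fields["PL"].split(",")]
    n_alleles = len(fields["AD"].split(","))
    # VCF ordering: genotype j/k (j <= k) sits at index k*(k+1)/2 + j.
    genotypes = sorted(
        combinations_with_replacement(range(n_alleles), 2),
        key=lambda g: g[1] * (g[1] + 1) // 2 + g[0],
    )
    return genotypes[pl.index(min(pl))]


# FORMAT and sample columns copied from the first record in this post.
fields = parse_sample(
    "GT:AD:DP:GQ:PGT:PID:PL",
    "2/2:60,0,0:60:99:.:.:7071,890,2640,496,194,0",
)
print(unsupported_alleles(fields))  # called alleles without any AD support
print(pl_best_genotype(fields))     # genotype with the lowest PL
```

For this record the called genotype matches the lowest PL, while AD shows no reads for allele 2, so the discrepancy is between PL and AD rather than between PL and GT.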
In another sample it looks like this:
chr3 195507858 . G GGTGGATACTGAGGAAGTGTCGGTGACAGGAAGAGGGGTGGCGTGACCGGTGGATGCCGAGGAAGCGTCGGTGACAGGAAGAGGGGTGGTGTCACCTGTGGATACTGAGGAAAAGCTGGTGACAGGAAGAGGGGTGGCGTGACCT 144784 VQSRTrancheINDEL99.00to99.90 AC=1;AF=0.042;AN=2;BaseQRankSum=3;ClippingRankSum=0;DP=44395;ExcessHet=39.0155;FS=2.58;InbreedingCoeff=-0.1729;MLEAC=10;MLEAF=0.038;MQ=33.27;MQRankSum=-0.746;NEGATIVE_TRAIN_SITE;QD=17.95;ReadPosRankSum=-0.672;SOR=0.897;VQSLOD=-2.312;culprit=MQRankSum GT:AD:DP:GQ:PGT:PID:PL 0/1:89,0:92:99:.:.:4731,0,2164
Same issue (AD=89,0), but now it's suddenly heterozygous...
And in this sample:
chr3 195507858 . G GGTGGATACTGAGGAAGTGTCGGTGACAGGAAGAGGGGTGGCGTGACCGGTGGATGCCGAGGAAGCGTCGGTGACAGGAAGAGGGGTGGTGTCACCTGTGGATACTGAGGAAAAGCTGGTGACAGGAAGAGGGGTGGCGTGACCT 156872 PASS AC=2;AF=0.061;AN=2;BaseQRankSum=0.967;ClippingRankSum=0;DP=42059;ExcessHet=33.8845;FS=3.403;InbreedingCoeff=-0.1488;MLEAC=14;MLEAF=0.053;MQ=34.6;MQRankSum=-1.012;QD=19.08;ReadPosRankSum=-0.692;SOR=0.827;VQSLOD=-0.3066;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 1/1:39,27:66:99:.:.:6872,496,0
Although there is (in my opinion) evidence for a 0/1 genotype (AD=39,27), it is called as 1/1.
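The reasoning behind my 0/1 expectation is simply the allele balance in AD. A small sketch of that arithmetic in Python (the helper name is hypothetical, and it assumes the first AD value is the reference count and the rest are alternate alleles):

```python
def alt_allele_fraction(ad_field):
    """Fraction of AD-counted reads that support any non-reference allele."""
    counts = [int(x) for x in ad_field.split(",")]
    total = sum(counts)
    # Avoid dividing by zero when no informative reads were counted.
    return sum(counts[1:]) / total if total else float("nan")


# AD values copied from the three sample columns in this post.
for label, ad in [("2/2", "60,0,0"), ("0/1", "89,0"), ("1/1", "39,27")]:
    print(label, f"{alt_allele_fraction(ad):.2f}")
```

For the third sample this gives 27/66, about 0.41, which is what I would expect from a heterozygous site rather than a homozygous-alternate one.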
It would be great if someone could share their opinion on this. Maybe I'm just not up to date, but I think these specific genotypes really make no sense.