Indel-Like Representation of SNP in VCF from HaplotypeCaller Causes Analysis Problems Downstream
Hi GATK team,
I've run into a strange case where a SNP called by HaplotypeCaller has been represented as if it were an indel:
6 16327909 . ATGCTGATGCTGC CTGCTGATGCTGC 1390.70 PASS AC=1;AC_Orig=2;AF=0.500;AF_Orig=0.040;AN=2;AN_Orig=50;BaseQRankSum=0.788;DP=10;FS=6.154;InbreedingCoeff=0.1807;MQ=59.86;MQ0=0;MQRankSum=0.406;QD=2.77;ReadPosRankSum=0.358;VQSLOD=2.78;culprit=FS GT:DP:GQ:PL 0/1:10:70:284,0,214
This single-sample record was extracted from a multi-sample VCF that lists several alternate alleles at the same position:
6 16327909 . ATGCTGATGCTGC ATGC,CTGCTGATGCTGC,A 1390.70 . AC=12,3,1;AF=0.024,6.048e-03,2.016e-03;AN=496;BaseQRankSum=0.788;DP=2791;FS=6.154;InbreedingCoeff=0.1807;MLEAC=13,3,1;MLEAF=0.026,6.048e-03,2.016e-03;MQ=59.86;MQ0=0;MQRankSum=0.406;QD=2.77;ReadPosRankSum=0.358 GT:AD:DP:GQ:PL
However, this representation of a SNP causes processing and analysis problems downstream once I split the multi-sample VCF into per-sample files. Is there a way to fix this so that variants are written in their most parsimonious (and ideally standard) form?
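For context, the padding comes from the multi-allelic record: all ALT alleles share one REF, and that REF has to span the longest deletion (ATGCTGATGCTGC to A). When a single ALT is pulled out, the shared trailing bases stay attached. One common normalization step is to trim the bases that REF and ALT have in common. The sketch below shows only that trimming. It assumes a plain tab-separated record and uses a hypothetical helper name, `trim_alleles`. It does not left-align indels, which requires the reference sequence, and it does not update INFO/FORMAT fields such as AC or PL. Tools such as `bcftools norm` and `vt normalize` are intended for full normalization, so check their documentation for the exact options.

```python
def trim_alleles(pos, ref, alt):
    """Hypothetical helper: drop bases shared by REF and ALT.

    The shared suffix is trimmed first and then the shared prefix, always
    keeping at least one base per allele because VCF alleles may not be empty.
    POS is advanced by one for each leading base that is removed.
    Note: this does NOT left-align indels (that needs the reference genome).
    """
    # Remove common trailing bases (e.g. the padding inherited from a long REF).
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Remove common leading bases, shifting the position to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Split the multi-allelic record from the post into one line per ALT allele.
# Assumption: only the first five VCF columns matter for this illustration.
record = "6\t16327909\t.\tATGCTGATGCTGC\tATGC,CTGCTGATGCTGC,A"
chrom, pos, _id, ref, alts = record.split("\t")[:5]
for alt in alts.split(","):
    print(chrom, *trim_alleles(int(pos), ref, alt))
# 6 16327909 ATGCTGATGC A
# 6 16327909 A C
# 6 16327909 ATGCTGATGCTGC A
```

With this trimming, the second ALT reduces to the expected SNP (`A` to `C` at 16327909). The deletions keep a single anchor base, as the VCF specification requires.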