CombineVariants incorrectly(?) complains about badly formed variant when merging multiallelic site
Hi GATK team,
I am attempting to combine a HaplotypeCaller-generated VCF with some indels called by pindel, using the following arguments (GATK v3.3-0-g37228af):
-R /data/shared/ref/b37/human_g1k_v37.fasta -T CombineVariants --variant:GATK var.HiSeqDecember.raw.vcf --variant:pindel pindel_combined.vcf -o var.HiSeqDecember.pindel.raw.vcf -genotypeMergeOptions PRIORITIZE -priority GATK,pindel
However I get the following error:
ERROR MESSAGE: Badly formed variant context at location 1:157718231; getEnd() was 157718235 but this VariantContext contains an END key with value 157718231
The variants in question are (from GATK):
1 157718231 . CAAAT C,CAAATAAAT 2533.56 PASS AC=3,13;AF=0.125,0.542;AN=24;BaseQRankSum=1.762;ClippingRankSum=-0.327;DP=126;FS=0.000;HOMLEN=39;HOMSEQ=AAATAAATAAATAAATAAATAAATAAATAAATAAATAAA;InbreedingCoeff=-0.1260;MLEAC=3,13;MLEAF=0.125,0.542;MQ=70.00;MQ0=0;MQRankSum=0.920;QD=22.22;ReadPosRankSum=-0.893;SOR=0.976;SVLEN=4;SVTYPE=INS;set=Intersection GT:DP:GQ 0/0:10:30 0/2:9:18 2/2:6:18 2/2:5:15 0/1:10:99 0/2:14:99 2/2:8:24 2/2:6:18 2/2:7:21 0/2:17:99 0/1:5:75 0/1:6:27
and (from pindel):
1 157718231 . C CAAAT . PASS AC=2;AF=0.143;AN=14;END=157718231;HOMLEN=39;HOMSEQ=AAATAAATAAATAAATAAATAAATAAATAAATAAATAAA;SVLEN=4;SVTYPE=INS;set=variant3-variant4-variant6-variant7-variant8-variant9-variant10 GT:AD ./. ./. 0/0:0,7 0/0:0,6 ./. 0/0:0,9 0/0:0,8 0/0:0,7 0/0:0,8 1/1:0,12 ./. ./.
It is worth noting that the pindel VCF here was itself merged from several pindel-generated VCFs using CombineVariants without any complaint from the GATK. It looks to me as though the END key is correct for the pindel variant (a simple insertion), but the GATK is confused by the mixed deletion/insertion variant that the HaplotypeCaller generated at the same position (without an END key).
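For what it's worth, both numbers in the error message can be reproduced if one assumes the end coordinate of a non-symbolic record is computed as POS + len(REF) - 1. I have not checked that this is what getEnd() actually does inside the GATK; the helper name `vcf_end` below is made up for illustration. A minimal Python sketch of the arithmetic:

```python
def vcf_end(pos: int, ref: str) -> int:
    """1-based inclusive end of the span covered by a REF allele starting at pos.

    Assumption: end = POS + len(REF) - 1 (not verified against the GATK source).
    """
    return pos + len(ref) - 1


# HaplotypeCaller record: REF=CAAAT at 1:157718231
print(vcf_end(157718231, "CAAAT"))  # 157718235, the getEnd() value in the error

# pindel record: REF=C at 1:157718231
print(vcf_end(157718231, "C"))      # 157718231, the END value in the pindel INFO field
```

Under that assumption, the merged record would take the longer REF allele (CAAAT) from the HaplotypeCaller call while still carrying END=157718231 from the pindel INFO field, which would explain the mismatch.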
If I strip all END tags from the pindel VCF and rerun the command, it completes successfully, so this is not a showstopper for me. I assume this is a bug(?), though, and if so, it would be great if there were a fix.
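In case it helps anyone hitting the same error, the END-stripping workaround can be sketched roughly as follows. This is only an illustration in Python, not necessarily how I did it: the function names are hypothetical, it assumes an uncompressed, tab-delimited VCF, and it only touches the INFO column of data lines (header lines, including the ##INFO definition for END, are left alone).

```python
def strip_end(line: str) -> str:
    """Return a VCF line with any END entry removed from its INFO column."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete data line; leave it untouched
    kept = [kv for kv in fields[7].split(";")
            if kv != "END" and not kv.startswith("END=")]
    fields[7] = ";".join(kept) if kept else "."  # "." marks an empty INFO field
    return "\t".join(fields) + newline


def strip_end_from_vcf(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping END from every data line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_end(line))
```

Calling `strip_end_from_vcf("pindel_combined.vcf", "pindel_combined.noEND.vcf")` (the output name is just an example) would give a file that can be passed to CombineVariants in place of the original. Note that dropping END is only safe for records with explicit alleles like the ones here; for symbolic alleles such as <DEL>, END carries information that cannot be recovered from REF.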