
CombineVariants: AD field not updated when merging variants with different REF and ALT alleles

anders_kvist Member
edited October 2012 in Ask the GATK team

When I run CombineVariants on two vcf files with variants at the same position but with different REF alleles and also different sets of ALT alleles, the AD fields for the genotypes are not updated to reflect the changes in the ALT field. The REF and ALT fields and the GT field for each genotype are all correctly updated. For example, combining the first two records below produces the third:

3   10128965    rs71052293  CTT CT,C,CTTT   19936.43    PASS    AC=1,1,1;AF=0.25,0.25,0.25;AN=4 GT:AD:DP:GQ:PL  0/2:115,0,33,12:230:6.96:980,1237,2795,0,946,1900,7,679,467,817 3/1:97,13,20,16:229:99:804,221,832,581,176,3047,521,0,1653,1595


3   10128965    rs71052293  CT  C,CTT,CTTT  14280.61    PASS    AC=1,1,1;AF=0.25,0.25,0.25;AN=4 GT:AD:DP:GQ:PL  2/1:110,20,33,18:237:1.90:850,289,1027,457,0,1487,147,877,2,1858    0/3:80,48,5,29:209:99:1835,875,977,2101,1119,3322,0,142,331,462


3   10128965    rs71052293  CTT CT,C,CTTT,CTTTT 19936.43    PASS    AC=2,1,2,1;AF=0.250,0.125,0.250,0.125;AN=8;set=Intersection GT:AD:DP:GQ 0/2:115,0,33,12:230:7   3/1:97,13,20,16:229:99  3/1:110,20,33,18:237:2  0/4:80,48,5,29:209:99

There are five alleles (one REF and four ALT) but only four AD values for each genotype.
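A quick way to spot this kind of mismatch is to compare the number of AD values in each genotype with the number of alleles in the record. The following is a minimal sketch, not part of GATK: the function name `ad_length_mismatches` is hypothetical, and it assumes a single VCF data line whose columns are separated by whitespace and whose FORMAT column contains AD.

```python
def ad_length_mismatches(vcf_line):
    """Return (sample_index, n_ad_values, n_alleles) for each genotype
    whose AD field does not have one value per allele (REF + ALTs)."""
    # Assumption: columns are whitespace-separated (tabs in a real VCF).
    fields = vcf_line.split()
    alts = fields[4].split(",")
    format_keys = fields[8].split(":")
    n_alleles = 1 + len(alts)
    ad_index = format_keys.index("AD")

    mismatches = []
    for sample_index, sample in enumerate(fields[9:]):
        values = sample.split(":")
        if len(values) <= ad_index:
            continue  # trailing fields were dropped for this sample
        n_ad = len(values[ad_index].split(","))
        if n_ad != n_alleles:
            mismatches.append((sample_index, n_ad, n_alleles))
    return mismatches


# The combined record from the post above.
combined = (
    "3 10128965 rs71052293 CTT CT,C,CTTT,CTTTT 19936.43 PASS "
    "AC=2,1,2,1;AF=0.250,0.125,0.250,0.125;AN=8;set=Intersection GT:AD:DP:GQ "
    "0/2:115,0,33,12:230:7 3/1:97,13,20,16:229:99 "
    "3/1:110,20,33,18:237:2 0/4:80,48,5,29:209:99"
)
result = ad_length_mismatches(combined)
print(result)
```

For the combined record, every one of the four genotypes is reported, each with four AD values against five alleles.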

My command line:

java -jar -Xmx4g GenomeAnalysisTK.jar -T CombineVariants -R human_g1k_v37.fasta -V test_input1.vcf -V test_input2.vcf -o test_combined.vcf

Is this a known limitation or a bug?


Best Answer


  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    Accepted Answer

    Hi Anders,

    Only the DP gets updated by CombineVariants; you'll need to rerun the VariantAnnotator to get the updated AD fields.

  • Thanks for your quick reply. I appreciate the hard work that goes into developing and supporting GATK and it is an excellent and invaluable set of tools. I completely understand that you cannot act on all comments and requests from users. Still, I would like to offer a couple of reflections on this issue. Maybe they are useful, should you decide to develop the functionality of CombineVariants further in the future:

    • After the merge with CombineVariants, the allelic depth values in the AD fields are no longer in the same order as the ref and alt alleles and it is impossible to know which allele each value refers to. The description for the AD field remains: "Allelic depths for the ref and alt alleles in the order listed", but is no longer true. Would it not be better to remove the AD values completely than to keep values that are in the wrong order and hence unusable?

    • Updating the AD field in CombineVariants would only require shuffling the allelic depth values and adding zeroes to match the updated REF and ALT fields. The added runtime would be negligible. Rerunning VariantAnnotator to add AD requires reading through the bam files of all samples (since the link between AD values and alleles is lost from the vcf), which for large vcfs with many samples can result in a very long runtime. For my small test case with four samples and a short target interval of ~4 Mb, the runtime was ~30 min.

  • Mark_DePristo Broad Institute Member admin

    As I understand it, the AD field should be filtered out when alleles are merged at a site. If it isn't, that's a bug in the code. Also note that it's not possible to compute AD when you've added new alleles, because the count for a new allele isn't necessarily zero. At best it should be ., but that's hard for us to do.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Thanks for your comments, Anders -- you're correct that this is a problem. We're putting this on our list to check & fix.

  • Of course you are right, computing AD for the added alleles in CombineVariants wouldn't work. It would require the original bams for the counts. As you say, filtering out AD or putting in a . is probably the best alternative. Thanks for pointing that out.
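The reordering discussed in this thread, with "." rather than zero for alleles a sample was never evaluated against, can be sketched as follows. This is an illustration only, not how CombineVariants works: the function name `remap_ad` is hypothetical, and it assumes the merged REF is the original REF extended by extra bases at the end, so every original allele is extended with the same suffix before lookup.

```python
def remap_ad(old_ref, old_alts, ad, new_ref, new_alts, missing="."):
    """Reorder AD values to follow the merged allele list.

    Alleles that were absent from the original record get `missing`,
    because their depth is unknown without the original bam files.
    """
    # Assumption: the merged REF only adds bases to the end of the old REF.
    if not new_ref.startswith(old_ref):
        raise ValueError("merged REF does not extend the original REF")
    suffix = new_ref[len(old_ref):]

    old_alleles = [old_ref] + list(old_alts)
    if len(ad) != len(old_alleles):
        raise ValueError("AD must have one value per original allele")

    # Express each original allele in the merged representation.
    depth_by_allele = {a + suffix: d for a, d in zip(old_alleles, ad)}
    return [depth_by_allele.get(a, missing) for a in [new_ref] + list(new_alts)]


merged_ref, merged_alts = "CTT", ["CT", "C", "CTTT", "CTTTT"]

# First sample of the second input record (REF CT, ALT C,CTT,CTTT).
second = remap_ad("CT", ["C", "CTT", "CTTT"], [110, 20, 33, 18],
                  merged_ref, merged_alts)
print(",".join(str(v) for v in second))  # 110,20,.,33,18

# First sample of the first input record (REF CTT, ALT CT,C,CTTT).
first = remap_ad("CTT", ["CT", "C", "CTTT"], [115, 0, 33, 12],
                 merged_ref, merged_alts)
print(",".join(str(v) for v in first))  # 115,0,33,12,.
```

The values line up with the merged genotypes in the example record: the 3/1 call for the first sample of the second input now has its depths of 33 and 20 at positions 3 and 1.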
