CombineVariants: AD field not updated when merging variants with different REF and ALT alleles

anders_kvist Member Posts: 3
edited October 2012 in Ask the GATK team

When I run CombineVariants on two VCF files with variants at the same position but with different REF alleles and different sets of ALT alleles, the AD fields for the genotypes are not updated to reflect the changes in the ALT field. The REF and ALT fields and the GT field for each genotype are all correctly updated. For example, combining

3   10128965    rs71052293  CTT CT,C,CTTT   19936.43    PASS    AC=1,1,1;AF=0.25,0.25,0.25;AN=4 GT:AD:DP:GQ:PL  0/2:115,0,33,12:230:6.96:980,1237,2795,0,946,1900,7,679,467,817 3/1:97,13,20,16:229:99:804,221,832,581,176,3047,521,0,1653,1595

and

3   10128965    rs71052293  CT  C,CTT,CTTT  14280.61    PASS    AC=1,1,1;AF=0.25,0.25,0.25;AN=4 GT:AD:DP:GQ:PL  2/1:110,20,33,18:237:1.90:850,289,1027,457,0,1487,147,877,2,1858    0/3:80,48,5,29:209:99:1835,875,977,2101,1119,3322,0,142,331,462

gives

3   10128965    rs71052293  CTT CT,C,CTTT,CTTTT 19936.43    PASS    AC=2,1,2,1;AF=0.250,0.125,0.250,0.125;AN=8;set=Intersection GT:AD:DP:GQ 0/2:115,0,33,12:230:7   3/1:97,13,20,16:229:99  3/1:110,20,33,18:237:2  0/4:80,48,5,29:209:99

There are five alleles (one REF and four ALT) but only four AD values for each genotype.
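The mismatch described above can be detected mechanically. The following is a minimal Python sketch, not part of GATK: the function name `ad_mismatches` is hypothetical, and it assumes a single VCF data line whose columns are separated by whitespace (as in the records pasted in this post) with a FORMAT column that may or may not contain AD.

```python
def ad_mismatches(vcf_line):
    """Return (sample_index, n_ad_values, n_alleles) for every sample whose
    AD entry does not have one value per allele (REF plus each ALT)."""
    # Assumption: columns are whitespace-separated and no field contains spaces.
    fields = vcf_line.split()
    alt, fmt = fields[4], fields[8]
    n_alleles = 1 + len(alt.split(","))
    keys = fmt.split(":")
    if "AD" not in keys:
        return []
    ad_pos = keys.index("AD")
    problems = []
    for i, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Skip samples where AD is absent or explicitly missing.
        if ad_pos >= len(values) or values[ad_pos] == ".":
            continue
        n_ad = len(values[ad_pos].split(","))
        if n_ad != n_alleles:
            problems.append((i, n_ad, n_alleles))
    return problems


merged = (
    "3 10128965 rs71052293 CTT CT,C,CTTT,CTTTT 19936.43 PASS "
    "AC=2,1,2,1;AN=8;set=Intersection GT:AD:DP:GQ "
    "0/2:115,0,33,12:230:7 3/1:97,13,20,16:229:99 "
    "3/1:110,20,33,18:237:2 0/4:80,48,5,29:209:99"
)
for sample_index, n_ad, n_alleles in ad_mismatches(merged):
    print(f"sample {sample_index}: {n_ad} AD values for {n_alleles} alleles")
```

Run on the merged record, this flags all four samples, since each carries four AD values against five alleles. It only checks the count, so it cannot tell whether values that happen to have the right length are also in the right order.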

My command line:

java -jar -Xmx4g GenomeAnalysisTK.jar -T CombineVariants -R human_g1k_v37.fasta -V test_input1.vcf -V test_input2.vcf -o test_combined.vcf

Is this a known limitation or a bug?


Best Answer


  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163 admin
    Accepted Answer

    Hi Anders,

    Only the DP gets updated by CombineVariants; you'll need to rerun the VariantAnnotator to get the updated AD fields.

    Geraldine Van der Auwera, PhD

  • anders_kvist Member Posts: 3

    Thanks for your quick reply. I appreciate the hard work that goes into developing and supporting GATK and it is an excellent and invaluable set of tools. I completely understand that you cannot act on all comments and requests from users. Still, I would like to offer a couple of reflections on this issue. Maybe they are useful, should you decide to develop the functionality of CombineVariants further in the future:

    • After the merge with CombineVariants, the allelic depth values in the AD fields are no longer in the same order as the ref and alt alleles and it is impossible to know which allele each value refers to. The description for the AD field remains: "Allelic depths for the ref and alt alleles in the order listed", but is no longer true. Would it not be better to remove the AD values completely than to keep values that are in the wrong order and hence unusable?

    • Updating the AD field in CombineVariants would only require shuffling the allelic depth values and adding zeroes to match the updated REF and ALT fields. The added runtime would be negligible. Rerunning VariantAnnotator to add AD requires reading through the BAM files of all samples (since the link between AD values and alleles is lost from the VCF), which for large VCFs with many samples can result in a very long runtime. For my small test case with four samples and a short target interval of ~4 Mb, the run time was ~30 min.
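The reshuffling idea in the second point can be sketched in a few lines of Python. This is an illustration only, not how CombineVariants is implemented: the names `remap_ad` and `format_ad` are hypothetical, and it assumes the merged REF is the original REF plus extra trailing bases, so each original allele is padded with the same suffix before lookup. Slots for alleles the input record never listed are left as `None` (written as `.`) rather than given a guessed count.

```python
def remap_ad(old_ref, old_alts, old_ad, new_ref, new_alts):
    """Reorder AD values from one input record to the merged allele order.

    Alleles absent from the input record get None, because their depth
    cannot be recovered from the VCF alone.
    """
    # Assumption: the merged REF extends the original REF on the right.
    if not new_ref.startswith(old_ref):
        raise ValueError("merged REF must start with the original REF")
    suffix = new_ref[len(old_ref):]
    depth_by_allele = {}
    for allele, depth in zip([old_ref] + list(old_alts), old_ad):
        # Pad each original allele so it is expressed against the merged REF.
        depth_by_allele[allele + suffix] = depth
    return [depth_by_allele.get(a) for a in [new_ref] + list(new_alts)]


def format_ad(ad):
    """Render remapped depths as a VCF AD string, using '.' for unknowns."""
    return ",".join("." if d is None else str(d) for d in ad)


merged_alts = ["CT", "C", "CTTT", "CTTTT"]
# First sample of the second input record (REF CT, ALT C,CTT,CTTT).
second = remap_ad("CT", ["C", "CTT", "CTTT"], [110, 20, 33, 18], "CTT", merged_alts)
print(format_ad(second))
# First sample of the first input record (REF CTT, ALT CT,C,CTTT).
first = remap_ad("CTT", ["CT", "C", "CTTT"], [115, 0, 33, 12], "CTT", merged_alts)
print(format_ad(first))
```

For the records in the original post this yields `110,20,.,33,18` and `115,0,33,12,.`, which line up with the merged ALT list `CT,C,CTTT,CTTTT`. Symbolic alleles and left-side padding are not handled in this sketch.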

  • Mark_DePristo Administrator, Dev Posts: 153 admin

    As I understand it, the AD field should be filtered out when alleles are merged at a site. If not, that's a bug in the code. Also note that it's not possible to compute AD when you've added new alleles, because the count shouldn't be zero. At best it should be "." (missing), but that's hard for us to do.

    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163 admin

    Thanks for your comments, Anders -- you're correct that this is a problem. We're putting this on our list to check & fix.

    Geraldine Van der Auwera, PhD

  • anders_kvist Member Posts: 3

    Of course you are right, computing AD for the added alleles in CombineVariants wouldn't work. It would require the original bams for the counts. As you say, filtering out AD or putting in a . is probably the best alternative. Thanks for pointing that out.
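Until the tool itself filters AD at merged sites, the stale values can be stripped from a combined record after the fact. The Python sketch below is a hedged illustration of that workaround, not GATK code: the function name `drop_format_key` is hypothetical, and it assumes a tab-separated VCF data line with at least one sample column.

```python
def drop_format_key(vcf_line, key="AD"):
    """Remove one FORMAT key and its per-sample values from a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Lines without FORMAT and sample columns are returned unchanged.
    if len(fields) < 10:
        return "\t".join(fields)
    keys = fields[8].split(":")
    if key not in keys:
        return "\t".join(fields)
    pos = keys.index(key)
    fields[8] = ":".join(k for k in keys if k != key)
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # Samples may omit trailing values, so guard the index.
        if pos < len(values):
            del values[pos]
        fields[i] = ":".join(values)
    return "\t".join(fields)


record = "\t".join([
    "3", "10128965", "rs71052293", "CTT", "CT,C,CTTT,CTTTT", "19936.43",
    "PASS", "AN=8;set=Intersection", "GT:AD:DP:GQ",
    "0/2:115,0,33,12:230:7", "0/4:80,48,5,29:209:99",
])
print(drop_format_key(record))
```

This only touches data lines. A complete cleanup would also drop the matching `##FORMAT=<ID=AD,...>` header line if no record keeps the field.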
