AD allele depth interpretation

Hello, I have a question about interpreting the AD field in a VCF generated by calling about 800 samples together.
The header defines it as:
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
and the forum further elaborates:
AD is the unfiltered allele depth, i.e. the number of reads that support each of the reported alleles. All reads at the position (including reads that did not pass the variant caller’s filters) are included in this number, except reads that were considered uninformative. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another.

However, most of my variants have a depth of 500-2000x, and the AD for a position may be ref AD 4 + alt AD 4. I'm not sure how these values fit the definition; surely they should total approximately the full depth? Viewing the position in the individual BAMs in IGV confirms that there are many more reads carrying the ref and alt alleles, so even if a lot of reads were being filtered out (which I doubt is the case), I would expect higher values than these. Perhaps I am misunderstanding the definition. If so, how would I go about getting the number of reads that show ref/alt at the position of interest from the VCF file?

I've tried this using both UnifiedGenotyper in GATK 3.8-1 and HaplotypeCaller in GATK 4.0.4.0.
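For reference, here is one way the per-sample AD values could be read out of a VCF record, as a minimal sketch. It assumes a standard tab-delimited VCF data line with a FORMAT column; the function name `allele_depths` and the example record are made up for illustration and are not taken from my data. It only reports what the caller wrote into AD and does not recount reads from the BAM.

```python
def allele_depths(vcf_line):
    """Return {sample_index: {allele: depth}} for one VCF data line.

    Hypothetical helper: pairs each AD value with its allele, using the
    order REF, ALT1, ALT2, ... implied by Number=R in the header.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALT alleles
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return {}
    ad_index = format_keys.index("AD")

    result = {}
    for i, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Trailing fields may be omitted, and AD itself may be missing (".")
        if ad_index >= len(values) or values[ad_index] == ".":
            continue
        depths = values[ad_index].split(",")
        result[i] = {
            allele: (int(d) if d != "." else None)
            for allele, d in zip(alleles, depths)
        }
    return result


# Invented two-sample record, for illustration only
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:4,4:8\t0/0:10,0:10"
print(allele_depths(line))
```

Samples are keyed by their position after the FORMAT column; mapping them to sample names would need the `#CHROM` header line as well.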

Answers

  • Amir_Ariff Member
    I seem to have formatted out the ##FORMAT line, which should read:
    ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed"