
Allele frequency and depth in the VCF produced by MuTect2

escaon (Limoges, France), Member

Hi all,

From my understanding of the VCF output, the AF[format] field (allele fraction of the event in the tumor) equals:
AD[format] / DP[format],
with AD being the depth of coverage of each allele per sample (the alt allele's AD is used when calculating AF),
and DP being the "filtered" depth of coverage for each sample (the tumor sample's DP is used when calculating AF).
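One way to test this hypothesis on real output is to pull AD, DP and AF out of a sample's FORMAT column and compare the reported AF against both AD[alt]/DP and AD[alt]/sum(AD). The sketch below assumes a biallelic site and a FORMAT column that contains AD, AF and DP; the record is invented only to exercise the parser, so its numbers say nothing about how MuTect2 actually computes AF.

```python
def af_ratios(vcf_line: str, sample_col: int = 9) -> dict:
    """Return the reported AF and two candidate ratios for one sample column.

    sample_col=9 is the first sample column of a VCF data line (0-based);
    which column holds the tumor depends on the sample order in the header.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")                      # FORMAT column, e.g. GT:AD:AF:DP
    sample = dict(zip(keys, fields[sample_col].split(":")))
    ad = [int(x) for x in sample["AD"].split(",")]   # [ref, alt] for a biallelic site
    dp = int(sample["DP"])
    return {
        "reported_af": float(sample["AF"]),
        "alt_ad_over_dp": ad[1] / dp,            # the AD[format] / DP[format] hypothesis
        "alt_ad_over_sum_ad": ad[1] / sum(ad),   # alt share of the reads counted in AD
    }

# HYPOTHETICAL record (tumor first, then normal), not real MuTect2 output.
line = "chr1\t12345\t.\tC\tT\t.\tPASS\tDP=110\tGT:AD:AF:DP\t0/1:40,10:0.200:55\t0/0:50,0:0.00:50"
print(af_ratios(line))
```

Running the same function over every PASS record of a real MuTect2 VCF would show which ratio, if either, consistently matches the reported AF.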

After some further reading, I think I have figured out the following:
AD[format] <=> all sample reads minus uninformative reads.
AD is computed by the GATK DepthPerAlleleBySample annotation.
DP[format] <=> all sample reads minus filtered reads (which are not the same thing as uninformative reads).
DP[info] <=> all site-level reads (tumor + normal samples), minus nothing.
DP is computed by the GATK Coverage annotation.
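To make these definitions concrete, here is a toy pileup counted the way the list above describes: AD skips uninformative reads but ignores filter status, while DP[format] skips filtered reads but ignores informativeness. The `passes_filters` and `informative` flags are hypothetical stand-ins for whatever the caller decides internally; this is a model of the poster's understanding, not GATK's implementation.

```python
from dataclasses import dataclass

@dataclass
class Read:
    allele: str           # allele the read supports at the site, e.g. "C" or "T"
    passes_filters: bool  # HYPOTHETICAL flag: survived the caller's read filters
    informative: bool     # HYPOTHETICAL flag: clearly supports one allele

def depths(reads: list[Read], ref: str, alt: str) -> tuple[list[int], int]:
    """Count AD and DP[format] under the definitions listed above."""
    # AD: every read except uninformative ones, split by allele (filter status ignored)
    ad = [sum(1 for r in reads if r.informative and r.allele == a) for a in (ref, alt)]
    # DP[format]: every read except filtered ones (informativeness ignored)
    dp = sum(1 for r in reads if r.passes_filters)
    return ad, dp

# HYPOTHETICAL pileup of seven reads at one site
reads = (
    [Read("C", True, True)] * 3   # pass filters, support ref
    + [Read("T", True, True)]     # passes filters, supports alt
    + [Read("T", False, True)]    # filtered out, but still informative for alt
    + [Read("C", False, True)]    # filtered out, informative for ref
    + [Read("C", True, False)]    # passes filters, but uninformative
)
ad, dp = depths(reads, "C", "T")
print(ad, dp, ad[1] / dp, ad[1] / sum(ad))
```

Under these definitions the filtered alt read is counted in AD[alt] but not in DP, so AD[alt]/DP and AD[alt]/sum(AD) diverge, which is exactly the kind of mismatch the question below is worried about.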

From the GATK documentation (http://gatkforums.broadinstitute.org/gatk/discussion/4721/using-depth-of-coverage-metrics-for-variant-evaluation), one can read the following:

The key difference is that the AD metric is based on unfiltered read counts while the sample-level DP is based on filtered read counts (see tool documentation for a list of read filters that are applied by default for each tool). As a result, they should be interpreted differently.

If AF is indeed AD[format]/DP[format], isn't it strange to compute AF by dividing an unfiltered read depth by a filtered read depth?

P.S.: I tried to "verify" the DP[info] depth (computed during the MuTect2 run) by running GATK DepthOfCoverage on the same input (non-marked_recalibrated tumor/normal BAMs). For a given position, DepthOfCoverage reports a higher depth (501 vs. 434). Is DP[info] really based on unfiltered reads? Or do GATK Coverage and GATK DepthOfCoverage differ slightly?

Answers

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie
    Coverage as annotated by a caller like HaplotypeCaller is subject to the filters applied by the caller, which may be different from those applied by the DepthOfCoverage tool. So it's possible to see some minor discrepancies.
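    A minimal sketch of that point: the same reads at one position give different depths depending on which read filter is applied. The MAPQ threshold and duplicate rule below are hypothetical stand-ins chosen for illustration; the actual default read filters of MuTect2, HaplotypeCaller and DepthOfCoverage are listed in each tool's documentation.

```python
from typing import Callable, NamedTuple

class Read(NamedTuple):
    mapq: int            # mapping quality
    is_duplicate: bool   # flagged as a PCR/optical duplicate

def depth(reads: list[Read], keep: Callable[[Read], bool]) -> int:
    """Count the reads at a site that survive a given read filter."""
    return sum(1 for r in reads if keep(r))

def caller_filter(r: Read) -> bool:
    # HYPOTHETICAL caller-style filter: drop low-MAPQ reads and duplicates
    return r.mapq >= 20 and not r.is_duplicate

def coverage_filter(r: Read) -> bool:
    # HYPOTHETICAL looser filter: keep every read
    return True

# HYPOTHETICAL pileup: 6 good reads, 2 low-MAPQ reads, 2 duplicates
reads = [Read(60, False)] * 6 + [Read(10, False)] * 2 + [Read(60, True)] * 2
print(depth(reads, caller_filter), depth(reads, coverage_filter))
```

    With these made-up filters the caller-style depth is lower than the coverage-tool depth, the same direction as the 434 vs. 501 difference reported above.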