Allele frequency and depth in the VCF produced by MuTect2
From my understanding of the VCF output, the AF[format] field (allele fraction of the event in the tumor) is equal to:
AD[format] / DP[format].
With AD being the depth of coverage of each allele per sample (we use the alt allele when calculating AF),
and DP being the "filtered" depth of coverage for each sample (we use the one computed from the tumor sample when calculating AF).
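To make the ratio I have in mind concrete, here is a minimal Python sketch. It only illustrates my assumption above (alt AD divided by sample DP) and is not a claim about how MuTect2 actually derives AF. The helper names and the sample column are hypothetical.

```python
def format_fields(format_col, sample_col):
    """Pair FORMAT keys with one sample's values, e.g. 'GT:AD:DP' and '0/1:90,10:100'."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def alt_fraction_ad_over_dp(format_col, sample_col, alt_index=1):
    """Return AD[alt] / DP for one sample, or None if DP is 0.

    AD is a comma-separated list (ref first, then alt alleles),
    so alt_index=1 selects the first alt allele.
    """
    fields = format_fields(format_col, sample_col)
    ad = [int(x) for x in fields["AD"].split(",")]
    dp = int(fields["DP"])
    return ad[alt_index] / dp if dp else None


# Hypothetical tumor sample column: 90 ref reads, 10 alt reads, DP of 100.
print(alt_fraction_ad_over_dp("GT:AD:DP", "0/1:90,10:100"))
```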
After some further reading, I think I have figured out that:
AD[format] <=> all sample-reads minus uninformative reads.
AD is computed with GATK DepthPerAlleleBySample.
DP[format] <=> all sample-reads minus filtered reads (which are not the same as uninformative reads).
DP[info] <=> all site-level reads (T+N samples), with nothing subtracted.
DP is computed with GATK Coverage.
In the GATK doc (http://gatkforums.broadinstitute.org/gatk/discussion/4721/using-depth-of-coverage-metrics-for-variant-evaluation), one can read the following:
The key difference is that the AD metric is based on unfiltered read counts while the sample-level DP is based on filtered read counts (see tool documentation for a list of read filters that are applied by default for each tool). As a result, they should be interpreted differently.
If AF is indeed AD[format]/DP[format], isn't it strange to compute AF by dividing an unfiltered-read depth by a filtered-read depth?
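To show why the choice of denominator matters to me, here is a small Python sketch comparing the alt fraction over DP with the alt fraction over the sum of the AD values. The counts are made up for illustration and the function name is hypothetical. Neither ratio is asserted to be what MuTect2 reports.

```python
def compare_denominators(ad_ref, ad_alt, dp):
    """Return the alt fraction under two candidate denominators.

    ad_over_dp:     alt AD divided by the sample-level DP
    ad_over_sum_ad: alt AD divided by the total of the AD counts
    """
    sum_ad = ad_ref + ad_alt
    return {
        "ad_over_dp": ad_alt / dp,
        "ad_over_sum_ad": ad_alt / sum_ad,
    }


# Made-up counts: AD = 80 ref + 20 alt, but DP = 125 because the two
# fields do not count the same set of reads.
result = compare_denominators(ad_ref=80, ad_alt=20, dp=125)
print(result["ad_over_dp"], result["ad_over_sum_ad"])
```

Whenever DP and the sum of AD differ, the two ratios differ too, which is why I would like to know which one the AF field corresponds to.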
PS: I tried to "verify" the DP[info] depth (computed inside the MuTect2 run) by running GATK DepthOfCoverage on the same input (non-marked_recalibrated T/N BAMs). For a given position, I find a higher depth with GATK DepthOfCoverage (501 vs 434). Is DP[info] really based on unfiltered reads? Or do GATK Coverage and GATK DepthOfCoverage have some minor differences?