Mutect2 Un-filtered variants

lkeeler California Member

Hi GATK Team,

I am working on converting our pipelines from a very old version of Mutect (SATK, the jar-based somatic analysis tool kit) to the current version of Mutect2 (GATK 4.0.11).

Previously Mutect2 produced a VCF with both passing and filtered variants. I found this post (https://gatkforums.broadinstitute.org/gatk/discussion/7154/mutect2-output-format) stating that passing variants now show "." in the FILTER column, but I am not seeing any filtered/rejected variants in the final output. Is there a way to have Mutect2 output a VCF with filtered variants as well as passing variants? Also, I do not see any FILTER lines in the VCF header, so I feel that I may be missing a Mutect2 option.

Below is my command (on separate lines for readability):
/apps/gatk4/gatk-4.0.11.0/gatk-4.0.11.0/gatk --java-options "-Xmx4g" Mutect2
--reference /apps/assay/referenceData/bwamem_reference/hg19.fa
--panel-of-normals /pathto/panel-normal-v3.0.0.vcf
--input /pathtobam/bqsr
--output sample_name.vcf
--create-output-variant-index true
--min_qscore 20
--tumor-sample sample_name
--annotation DepthPerAlleleBySample
--annotation Coverage
--annotation BaseQuality
--annotation ReadPosition
--max-mnp-distance 2
--intervals /hpc/dev/assay/encap/GPC-3.1.4/share/gpc/GPC-mutect-intervals-v3.0.0.intervals
--read-filter AllowAllReadsReadFilter

I have also seen some forum posts about the FilterMutectCalls tool, but I do not see any mention of some of the most common filter reasons that I saw in the stats file in the old version of Mutect, for example triallelic_site, fstar_tumor_lod, nearby_gap_events, seen_in_panel_of_normals etc. Is FilterMutectCalls only intended for further filtering, or is it responsible for all Mutect variant filtering?

Two other quick questions. Is the LOD discussed in the tool documentation the Allele Frequency? Is there a tool available in GATK4 to produce a coverage file for positions that reflect what number of reads Mutect used to make variant calls?

Thank you for your help!
Lauryn

Comments

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @lkeeler

    I am looking into this issue and will have an update for you by next week. Given the holiday week we are backed up on our end, but I will definitely get to this by next week.

    Regards
    Bhanu

  • Tiffany_at_Broad Cambridge, MA Member, Broadie, Moderator admin

    Hi Lauryn,

    Here are a few links that should help:
    1. See VIII to read more about FilterMutectCalls filters in the paper.
    2. Mutect2 Tutorial

    I believe you want to use two separate tools here: Mutect2 and FilterMutectCalls.
    For example, Mutect2 can produce a raw, unfiltered somatic callset restricted to the specified intervals list. I believe you will not see any '##FILTER' lines in the header after running it. Mutect2 skips sites that are likely variant in the matched control (germline) and sites in the PoN (likely artifactual), but it will include borderline variant sites. If you need to, you can include all sites with --genotype-germline-sites, an experimental feature, and --genotype-pon-sites.
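As a rough illustration of the two-tool workflow described here, the following sketch builds the two command lines, one for Mutect2 and one for FilterMutectCalls. It only prints the commands and does not run GATK. The file names (`hg19.fa`, `tumor.bam`, `unfiltered.vcf`, `filtered.vcf`) and the helper name `build_mutect2_commands` are hypothetical, and only a minimal set of arguments is shown; check the tool documentation for your GATK version before relying on it.

```python
import shlex


def build_mutect2_commands(reference, tumor_bam, tumor_sample,
                           unfiltered_vcf, filtered_vcf, gatk="gatk"):
    """Return argument lists for the calling step and the filtering step."""
    # Step 1: Mutect2 writes the raw, unfiltered callset.
    call = [gatk, "Mutect2",
            "--reference", reference,
            "--input", tumor_bam,
            "--tumor-sample", tumor_sample,
            "--output", unfiltered_vcf]
    # Step 2: FilterMutectCalls reads that callset and fills in the FILTER column.
    filt = [gatk, "FilterMutectCalls",
            "--variant", unfiltered_vcf,
            "--output", filtered_vcf]
    return call, filt


# Hypothetical inputs, for illustration only.
call_cmd, filter_cmd = build_mutect2_commands(
    "hg19.fa", "tumor.bam", "sample_name", "unfiltered.vcf", "filtered.vcf")

for cmd in (call_cmd, filter_cmd):
    # Print a shell-safe version; use subprocess.run(cmd, check=True) to execute.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The output of the first step is the input of the second, which is why the unfiltered VCF path appears in both commands.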

    FilterMutectCalls uses the annotations within the callset and, if provided, the contamination table in filtering. It produces a new VCF callset and index in which calls that are likely true positives are labeled PASS in the FILTER field and calls that are likely false positives are labeled with the reason(s) for filtering in the FILTER field. We can view the available filters by looking at the '##FILTER' lines in the VCF header.
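To check which filters a filtered VCF declares and how often each one was applied, something like the sketch below may help. It reads plain VCF text, collects the IDs from the '##FILTER' header lines, and tallies the FILTER column of each record. The sample records are made up, and the filter IDs in them are illustrative only; the actual names depend on the GATK version.

```python
from collections import Counter

HEADER_PREFIX = "##FILTER=<ID="


def summarize_filters(vcf_lines):
    """Return (filter IDs declared in the header, counts of FILTER values in records)."""
    declared = []
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith(HEADER_PREFIX):
            # Header line looks like: ##FILTER=<ID=name,Description="...">
            declared.append(line[len(HEADER_PREFIX):].split(",", 1)[0])
        elif line and not line.startswith("#"):
            # FILTER is the 7th tab-separated column; multiple filters are ';'-separated.
            for name in line.split("\t")[6].split(";"):
                counts[name] += 1
    return declared, counts


# Made-up example data, not real Mutect2 output.
SAMPLE_VCF = """\
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=t_lod,Description="Illustrative filter">
##FILTER=<ID=clustered_events,Description="Illustrative filter">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tT\t.\tPASS\t.
chr1\t200\t.\tG\tC\t.\tt_lod\t.
chr2\t300\t.\tC\tT\t.\tt_lod;clustered_events\t.
"""

declared, counts = summarize_filters(SAMPLE_VCF.splitlines())
print(declared)
print(dict(counts))
```

For a real file, the same function can be fed an open file handle, for example `summarize_filters(open("filtered.vcf"))`, where the file name is again a placeholder.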

    Is there a tool available in GATK4 to produce a coverage file for positions that reflect what number of reads Mutect used to make variant calls?

    Answer: No, but you can usually look at the VCF output from Mutect, in particular the AD (allele depth) field.
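A minimal sketch of pulling the AD values out of a single VCF record is shown below, assuming an uncompressed, tab-separated record with a FORMAT column and at least one sample column. The record is made up, the function name `allele_depths` is hypothetical, and missing values (".") are not handled.

```python
def allele_depths(vcf_record_line, sample_index=0):
    """Return the per-allele read depths (ref first, then alts) for one sample."""
    fields = vcf_record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")               # e.g. ["GT", "AD", "AF"]
    sample_values = fields[9 + sample_index].split(":")
    ad_value = sample_values[format_keys.index("AD")]
    return [int(depth) for depth in ad_value.split(",")]


# Made-up record: 45 reads support the reference allele, 5 support the alt allele.
record = "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:AD:AF\t0/1:45,5:0.1"
depths = allele_depths(record)
print(depths, sum(depths))
```

Summing the AD values gives an approximate per-site depth for that sample.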

    Is the LOD discussed in the tool documentation the Allele Frequency?

    Answer: The LODs are log-10 likelihood ratios, not allele frequencies; i.e. a normal LOD of 4 means the reads support a hom-ref hypothesis for the normal by a factor of 10^4.
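The arithmetic behind that statement can be sketched in a few lines. This only converts between a LOD and the likelihood ratio it represents; it is not how Mutect2 computes the LOD, and the function names are hypothetical.

```python
import math


def lod_to_likelihood_ratio(lod):
    """A LOD is a log-10 likelihood ratio, so the ratio itself is 10 ** lod."""
    return 10.0 ** lod


def likelihood_ratio_to_lod(ratio):
    """Inverse conversion: take log base 10 of the likelihood ratio."""
    return math.log10(ratio)


ratio = lod_to_likelihood_ratio(4)
print(ratio)  # a LOD of 4 corresponds to a factor of 10^4
```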

    Hope all this helps! Sorry for the delay.

  • davidben Boston Member, Broadie, Dev ✭✭

    Is there a tool available in GATK4 to produce a coverage file for positions that reflect what number of reads Mutect used to make variant calls?

    To complement what @Tiffany_at_Broad wrote, you can also have Mutect2 generate a bamout via the -bamout argument and view the resulting BAM in IGV. The depth you see is the number of reads that Mutect2 considered and, furthermore, the read alignments show what Mutect2 was "thinking" about the reads.

    Looking at the VCF AD field will suffice if you just want coverage, but the bamout can be really useful.
