GATK4 Mutect2 calls dramatically fewer variants than VarDict or other callers.

stella Korea Member
edited December 2018 in Ask the GATK team

Dear GATK team.

Hello. I'm a GATK4 user.

I compared the number of raw variants called by GATK4 Mutect2 with the number called by VarDict. The outputs of the two callers are very different.

I have two questions.

From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila's comment, I understood that gnomAD and the PON have no effect on the number of variants and that these two databases (gnomAD and PON) are only used for annotation. (Q1. Is that correct?)

Q2. If so, can I assume that the difference in the number of variants between Mutect2 and other callers depends only on the differences in the callers' calling methods?

Thanks.

-Stella

Answers

  • stella Korea Member
    edited December 2018

    Additionally, I found that the number of variants differs between Mutect2 runs with and without a germline resource.

    From this result, it looks like the germline resource can affect raw variant calling.

    Is that correct?

  • stella Korea Member

    One more thing:
    is there a strategy like the Mutect2 option "--genotype-pon-sites"
    that I can apply to the germline resource?

    Put differently, how can I get the additional variants that are excluded from the raw variants because of the germline resource?

  • davidben Boston Member, Broadie, Dev ✭✭✭

    There is a --genotype-germline-sites argument as well. Mutect2 will run a bit slower, but not by too much.

  • davidben Boston Member, Broadie, Dev ✭✭✭

    From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila's comment, I understood that gnomAD and the PON have no effect on the number of variants and that these two databases (gnomAD and PON) are only used for annotation. (Q1. Is that correct?)

    Just to elaborate a bit, this is almost true, except there are some optimizations to do early filtering (i.e. skip assembly, realignment, and somatic genotyping) on variants that, based on the germline resource and panel of normals, seem very unlikely to be true somatic variants.

    Also, while Mutect2 itself uses these resources mainly for annotation, the unfiltered VCF from Mutect2 is intended to be passed to FilterMutectCalls, which applies filters based on those annotations.

    Finally, I'm not surprised that VarDict and Mutect2 yield very different calls. We have done a lot of validation (including against VarDict on ~1000 TCGA exomes with matched WGS) and believe that Mutect2's calls are correct much more often. If you are interested in trying out other callers, I personally have a lot of respect for Strelka 2. I haven't tried out Lancet (from the New York Genome Center) yet because it's new and has a very heavy CPU cost, but I read the paper with great interest and found it very compelling.

  • davidben Boston Member, Broadie, Dev ✭✭✭

    FilterMutectCalls does not remove variants. It applies filters in the FILTER column in accordance with the VCF spec.
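
To illustrate the point about the FILTER column, here is a minimal Python sketch that tallies VCF records by FILTER value. The function name `count_filters`, the example records, and the filter names in them (`germline_risk`, `panel_of_normals`) are illustrative assumptions rather than output from a real run; the names written by your GATK version may differ.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Tally VCF records by FILTER value (column 7 of each record)."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A record that fails several filters lists them separated by ';'.
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts


# Hypothetical records, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tgermline_risk\t.",
    "chr2\t300\t.\tC\tT\t.\tgermline_risk;panel_of_normals\t.",
]

filter_counts = count_filters(EXAMPLE_VCF)
for name, n in sorted(filter_counts.items()):
    print(name, n)
```

All three example records are still present in the input; only their FILTER values differ. To keep just the passing calls, a downstream step would select the records whose FILTER is PASS.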

  • stella Korea Member

    Dear @davidben ,

    I'm sorry for the late reply.
    Thanks for your help.
    I'll try.

    Thanks.
    Oh.

  • stella Korea Member
    edited February 28

    Dear @davidben

    Following your comment, I added the "--genotype-germline-sites" option.

    1. When I added the "--genotype-germline-sites" option, I couldn't tell how to identify the variants that would have been filtered under the default settings.

    In the case of "--genotype-pon-sites", the variants emitted when adding the option were annotated as "IN_PON" in the VCF, but I don't know how they are marked in the case of the germline resource.

    **2. Is it right to add the "--genotype-germline-sites" option to emit variants that were filtered only by the germline resource? I worry that all germline variants predicted by Mutect2 will be returned.**

    I checked this on my targeted BAM. In this data, when I added the "--genotype-germline-sites" option, I rescued variants that had been filtered out only by the germline resource.

    Is that right?

    Thanks.

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @stella

    1. The germline resource acts differently from the panel of normals. Presence in the panel of normals causes a variant to be filtered, whereas it is not presence in the germline resource alone that matters. Rather, Mutect2 uses the population allele frequency (AF) info field from the resource to populate the POP_AF annotation, which is then used by Mutect2's probabilistic models for germline and contamination variants to decide whether to filter the call. Our most recent documentation for these models is here: https://github.com/broadinstitute/gatk/blob/5a74c30628cb87ff8db87f0db64e18b7bbdd767a/docs/mutect/mutect.pdf

    2. To save runtime, Mutect2 does not bother genotyping variants that are almost certainly non-somatic. That is, if evidence in the pileup of bases shows a lot of variant reads in the normal or if the population allele frequency from the germline resource is very large (in tumor-only mode), Mutect2 pre-filters the variant without bothering to do the expensive steps of local assembly and realignment. The argument --genotype-germline-sites overrides this, so that all evidence of variation triggers assembly, realignment, and somatic genotyping. That is, by default you don't see every rejected germline variant in the vcf, with --genotype-germline-sites you do. They still get the germline filter, of course, but you see them.
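
One way to see which sites the --genotype-germline-sites argument adds is to compare the records from a default run with those from a run that uses the argument. The Python sketch below does this on VCF text, keying each record on (CHROM, POS, REF, ALT). The function names, the example records, and the `germline_risk` filter name are assumptions made for illustration and do not come from a real Mutect2 run.

```python
def records_by_site(vcf_lines):
    """Map (CHROM, POS, REF, ALT) to the FILTER value of each VCF record."""
    records = {}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, _qual, filt = line.rstrip("\n").split("\t")[:7]
        records[(chrom, int(pos), ref, alt)] = filt
    return records


def extra_records(default_lines, genotyped_lines):
    """Return records present only in the run with the extra argument."""
    default = records_by_site(default_lines)
    genotyped = records_by_site(genotyped_lines)
    return {site: filt for site, filt in genotyped.items() if site not in default}


# Hypothetical records, for illustration only.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
DEFAULT_RUN = [HEADER, "chr1\t100\t.\tA\tT\t.\tPASS\t."]
GENOTYPED_RUN = [
    HEADER,
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tgermline_risk\t.",
]

added = extra_records(DEFAULT_RUN, GENOTYPED_RUN)
for site, filt in sorted(added.items()):
    print(site, filt)
```

If the extra sites behave as described above, they should carry a germline filter in the FILTER column after FilterMutectCalls rather than PASS.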
