
GATK4 Mutect2 calls dramatically fewer variants than VarDict and other callers.

stellastella KoreaMember
edited December 2018 in Ask the GATK team

Dear GATK team.

Hello. I'm a GATK4 user.

I compared the number of raw variants called by GATK4 Mutect2 with the number of raw variants called by VarDict, and found that the two call sets are very different.

I have two questions.

From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila 's comment, I understood that gnomAD and the PON do not affect the number of variants, and that these two resources (gnomAD and the PON) are only used for annotation. (Q1. Is that correct?)

Q2. If so, can I assume that the difference in the number of variants between Mutect2 and other callers depends only on the difference in the callers' calling methods?

Thanks.

-Stella


Answers

  • stellastella KoreaMember
    edited December 2018

    Additionally, I found that the number of variants differs between Mutect2 runs with and without a germline resource.

    From this result, it looks like the germline resource can affect raw variant calling.

    Is that correct?

  • stellastella KoreaMember

    One more thing:
    is there an option like Mutect2's "--genotype-pon-sites"
    that I can apply to the germline resource?

    In other words, how can I get the additional variants that are excluded from the raw calls because of the germline resource?

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    There is a --genotype-germline-sites argument as well. Mutect2 will run a bit slower, but not by too much.
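A minimal sketch of how the two flags discussed in this thread could be added to a Mutect2 invocation. The helper name `build_mutect2_command` and all file names are hypothetical; only the argument names come from the GATK4 Mutect2 tool. Depending on the GATK version, further arguments (for example the tumor sample name) may be required, so treat this as an illustration rather than a complete command.

```python
import shlex


def build_mutect2_command(reference, tumor_bam, output_vcf,
                          germline_resource=None, panel_of_normals=None,
                          genotype_germline_sites=False,
                          genotype_pon_sites=False):
    """Assemble a Mutect2 argument list (hypothetical helper, not part of GATK)."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-O", output_vcf]
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals:
        cmd += ["--panel-of-normals", panel_of_normals]
    # These two flags ask Mutect2 to genotype sites it would otherwise
    # skip early because they appear in the germline resource or the PON.
    if genotype_germline_sites:
        cmd.append("--genotype-germline-sites")
    if genotype_pon_sites:
        cmd.append("--genotype-pon-sites")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with real files before running GATK.
    command = build_mutect2_command(
        "reference.fasta", "tumor.bam", "unfiltered.vcf.gz",
        germline_resource="gnomad.vcf.gz",
        genotype_germline_sites=True,
    )
    print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting problems.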

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila 's comment, I understood that gnomAD and the PON do not affect the number of variants, and that these two resources (gnomAD and the PON) are only used for annotation. (Q1. Is that correct?)

    Just to elaborate a bit, this is almost true, except there are some optimizations to do early filtering (i.e. skip assembly, realignment, and somatic genotyping) on variants that, based on the germline resource and panel of normals, seem very unlikely to be true somatic variants.

    Also, while Mutect2 mainly uses these resources for annotation, the unfiltered VCF from Mutect2 is intended to be passed to FilterMutectCalls, which applies filters based on the annotations.

    Finally, I'm not surprised that VarDict and Mutect2 yield very different calls. We have done a lot of validation (including against VarDict on ~1000 TCGA exomes with matched WGS) and believe that Mutect2's calls are correct much more often. If you are interested in trying out other callers, I personally have a lot of respect for Strelka 2. I haven't tried out Lancet (from the New York Genome Center) yet because it's new and has a very heavy CPU cost, but I read the paper with great interest and found it very compelling.
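To see how two callers differ beyond raw counts, one option is to compare the call sets site by site. The sketch below is a simplified illustration with hypothetical helper names: it keys each record on chromosome, position, REF, and ALT, and assumes both VCFs are uncompressed text and already use the same variant representation. Callers can represent indels and complex variants differently, so normalizing both files first is usually advisable.

```python
def load_variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele gets its own key.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_callsets(keys_a, keys_b):
    """Count variants shared between two call sets and unique to each."""
    return {
        "shared": len(keys_a & keys_b),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
    }


if __name__ == "__main__":
    # Toy records for illustration only; not real caller output.
    caller_a = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tT\t.\t.\t.",
        "chr1\t200\t.\tG\tC\t.\t.\t.",
    ]
    caller_b = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tT\t.\t.\t.",
        "chr2\t300\t.\tC\tA,G\t.\t.\t.",
    ]
    print(compare_callsets(load_variant_keys(caller_a),
                           load_variant_keys(caller_b)))
```

For real files, pass an open file handle (for example `open("calls.vcf")`) instead of the toy lists.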

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    FilterMutectCalls does not remove variants. It applies filters in the FILTER column in accordance with the VCF spec.
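Because filtered records stay in the file, the effect of filtering can be inspected by tallying the FILTER column (the seventh tab-separated VCF field). The following is a minimal sketch with a hypothetical helper name, assuming an uncompressed VCF; the filter names in the toy data are placeholders, since the actual names depend on the GATK version.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count how often each FILTER value appears in VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        filter_field = line.rstrip("\n").split("\t")[6]
        # A record can carry several filters separated by semicolons.
        for name in filter_field.split(";"):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Toy records for illustration only; filter names are placeholders.
    records = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tT\t.\tPASS\t.",
        "chr1\t200\t.\tG\tC\t.\tgermline_risk\t.",
        "chr1\t300\t.\tC\tA\t.\tgermline_risk;panel_of_normals\t.",
    ]
    for name, count in tally_filters(records).items():
        print(name, count)
```

Records whose FILTER value is `PASS` are the ones that passed every filter; all other records remain in the file with the names of the filters they failed.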

  • stellastella KoreaMember

    Dear @davidben ,

    I'm sorry for the late reply.
    Thanks for your help.
    I'll try.

    Thanks.
    Oh.

  • stellastella KoreaMember
    edited February 2019

    Dear @davidben

    Following your comment, I added the "--genotype-germline-sites" option.

    1. When I added the "--genotype-germline-sites" option, I could not tell how to identify the variants that would have been filtered under the default settings.

    In the case of "--genotype-pon-sites", the variants emitted after adding the option were annotated as "IN_PON" in the VCF, but I don't know how to identify them in the case of the germline resource.

    **2. Is it right to add the "--genotype-germline-sites" option to emit the variants that were filtered only by the germline resource? I am worried that all germline variants predicted by Mutect2 will be returned.**

    I looked into this with my targeted BAM. With this data, when I added the "--genotype-germline-sites" option, I rescued variants that had been filtered out only by the germline resource.

    Is that right?

    Thanks.
