GATK4 Mutect2 calls dramatically fewer variants than VarDict and other callers.

stellastella KoreaMember
edited December 2018 in Ask the GATK team

Dear GATK team.

Hello. I'm a GATK4 user.

I compared the number of raw variants called by GATK4 Mutect2 with the number called by VarDict, and the two outputs are very different.

I have two questions.

From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila's comment, I understood that gnomAD and the PON do not affect the number of variants and that these two resources (gnomAD and the PON) are only used for annotation. (Q1. Is that correct?)

Q2. If so, can I conclude that the difference in the number of variants between Mutect2 and other callers depends only on differences in the callers' calling methods?

Thanks.

-Stella

Answers

  • stellastella KoreaMember
    edited December 2018

    Additionally, I found that Mutect2 calls a different number of variants with a germline resource than without one.

    From this result, it looks like the germline resource can affect raw variant calling.

    Is that correct?

  • stellastella KoreaMember

    One more thing:
    Mutect2 has the option "--genotype-pon-sites".
    Can I apply the same strategy to the germline resource?

    In other words, how can I recover the additional variants that are excluded from the raw calls because of the germline resource?

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    There is a --genotype-germline-sites argument as well. Mutect2 will run a bit slower, but not by too much.
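
A minimal sketch of how the flag mentioned above might be added to a Mutect2 run, written as a Python helper that only assembles the command line. The file names and the sample name are hypothetical placeholders, and whether `-tumor` is needed depends on the GATK version, so treat this as an illustration and not as a recommended pipeline.

```python
import shlex


def build_mutect2_command(reference, tumor_bam, germline_resource, output_vcf,
                          tumor_sample=None, genotype_germline_sites=True):
    """Assemble a GATK4 Mutect2 command line as a list of arguments."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if tumor_sample is not None:
        # GATK 4.0.x expects the tumor sample name; newer releases may not need it.
        cmd += ["-tumor", tumor_sample]
    cmd += ["--germline-resource", germline_resource]
    if genotype_germline_sites:
        # Ask Mutect2 to genotype sites found in the germline resource
        # instead of skipping them early.
        cmd += ["--genotype-germline-sites", "true"]
    cmd += ["-O", output_vcf]
    return cmd


# Hypothetical file and sample names, for illustration only.
command = build_mutect2_command(
    reference="ref.fasta",
    tumor_bam="tumor.bam",
    germline_resource="gnomad.vcf.gz",
    output_vcf="unfiltered.vcf.gz",
    tumor_sample="TUMOR",
)
print(" ".join(shlex.quote(arg) for arg in command))
```

Once real paths are substituted, the list can be passed to `subprocess.run(command, check=True)`.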

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    From this document ( https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila's comment, I understood that gnomAD and the PON do not affect the number of variants and that these two resources (gnomAD and the PON) are only used for annotation. (Q1. Is that correct?)

    Just to elaborate a bit, this is almost true, except there are some optimizations to do early filtering (i.e. skip assembly, realignment, and somatic genotyping) on variants that, based on the germline resource and panel of normals, seem very unlikely to be true somatic variants.

    Also, while Mutect2 mainly uses these resources for annotation, the unfiltered VCF from Mutect2 is intended to be passed to FilterMutectCalls, which applies filters based on the annotations.

    Finally, I'm not surprised that VarDict and Mutect2 yield very different calls. We have done a lot of validation (including against VarDict on ~1000 TCGA exomes with matched WGS) and believe that Mutect2's calls are correct much more often. If you are interested in trying out other callers, I personally have a lot of respect for Strelka 2. I haven't tried out Lancet (from the New York Genome Center) yet because it's new and has a very heavy CPU cost, but I read the paper with great interest and found it very compelling.
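
When comparing callers, it may be more informative to compare the filtered call sets site by site than to compare raw counts. The sketch below, under that assumption, keys each record on (CHROM, POS, REF, ALT) and reports the overlap between two call sets. The records and filter names are invented toy data, not real output from either caller.

```python
def variant_keys(vcf_lines, pass_only=False):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF record lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        if pass_only and filt != "PASS":
            continue  # keep only records that passed every filter
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


# Invented toy records (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
mutect2_lines = [
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tgermline\t.",
]
vardict_lines = [
    "chr1\t100\t.\tA\tT\t50\tPASS\t.",
    "chr1\t200\t.\tG\tC\t40\tPASS\t.",
    "chr2\t300\t.\tC\tCT\t30\tPASS\t.",
]

mutect2_calls = variant_keys(mutect2_lines, pass_only=True)
vardict_calls = variant_keys(vardict_lines, pass_only=True)

print("shared:", len(mutect2_calls & vardict_calls))
print("Mutect2 only:", len(mutect2_calls - vardict_calls))
print("VarDict only:", len(vardict_calls - mutect2_calls))
```

Exact key matching assumes both callers represent indels the same way; real call sets may need normalization first.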

  • manbamanba Member ✭✭

    @davidben said:

    Also, while Mutect2 mainly uses these resources for annotation, the unfiltered VCF from Mutect2 is intended to be passed to FilterMutectCalls, which applies filters based on the annotations.

    But after FilterMutectCalls, many sites with these annotations still exist. Is that normal? Why are some annotated sites kept and others dropped? Thanks a lot.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    FilterMutectCalls does not remove variants. It applies filters in the FILTER column in accordance with the VCF spec.
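
To make the point above concrete, here is a small sketch that tallies the FILTER column of a filtered VCF. In the VCF format, FILTER is the seventh tab-separated column and holds either `PASS` or a semicolon-separated list of failed filter names. The records and filter names below are invented for illustration.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Count how often each FILTER value appears across VCF record lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        filter_field = line.rstrip("\n").split("\t")[6]
        # A record that fails several filters lists them separated by ';'.
        for name in filter_field.split(";"):
            counts[name] += 1
    return counts


# Invented toy records (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
filtered_lines = [
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tgermline;panel_of_normals\t.",
    "chr2\t300\t.\tC\tCT\t.\tweak_evidence\t.",
]

filter_counts = count_filters(filtered_lines)
passing = [line for line in filtered_lines if line.split("\t")[6] == "PASS"]

print(dict(filter_counts))
print("records in file:", len(filtered_lines), "passing:", len(passing))
```

All three records remain in the file; only the one marked `PASS` would be kept by a downstream step that selects passing calls.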
