About the germline resource

stella (Korea, Member)

Dear GATK Team,

I've been experimenting with calling somatic variants in tumor-only targeted sequencing data using GATK4 Mutect2. (I'll call this output the raw VCF.)

Then I run FilterMutectCalls to add the filter results (e.g. PASS, germline risk ...). (I'll call this output the filtered VCF.)

I have some questions.

  1. I'm a bit confused about when the gnomAD file is used.

     Is the gnomAD database used when the raw Mutect2 VCF is made, or when FilterMutectCalls is run?

  2. I think the "germline risk" filter in the filtered VCF is applied only to variants that exist in the gnomAD database. Is that correct?
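One way to look into the second question on real output is to list which records carry the filter and what population-frequency annotation they have. Here is a minimal Python sketch that does this on plain-text VCF lines. It relies only on the VCF column layout (FILTER is column 7, INFO is column 8). The function names are hypothetical, and the filter name and INFO key you pass in (for example `germline_risk` and `POP_AF`) are assumptions that vary between GATK releases, so check the `##FILTER` and `##INFO` header lines of your own file first.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tally_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count how often each FILTER value (PASS or a filter name) occurs."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        # A record can fail several filters, separated by ';'.
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts


def records_with_filter(
    vcf_lines: Iterable[str], filter_name: str, info_key: str
) -> Iterator[Tuple[str, int, Optional[str]]]:
    """Yield (chrom, pos, INFO value) for records carrying filter_name.

    The INFO value is None when info_key is absent from the record.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        if filter_name not in fields[6].split(";"):
            continue
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        yield fields[0], int(fields[1]), info.get(info_key)
```

For an uncompressed file, something like `with open("filtered.vcf") as fh: print(tally_filters(fh))` would print the counts; for a `.vcf.gz` file, `gzip.open(path, "rt")` can be used instead of `open`.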

Thanks.

Answers

  • stella (Korea, Member)
    edited May 2018

    Hi Sheila. Thanks for your response!

    I read the tutorial and the post that you recommended.
    I have some questions.

    In https://gatkforums.broadinstitute.org/gatk/discussion/11136/how-to-call-somatic-mutations-using-gatk4-mutect2,

    " ~~Prefilter~~ variant sites in a panel of normals callset. Specify the panel of normals (PoN) VCF with -pon. Section 2 outlines how to create a PoN. The panel of normals not only represents common germline variant sites, it presents commonly noisy sites in sequencing data, e.g. mapping artifacts or other somewhat random but systematic artifacts of sequencing. By default, the tool does not reassemble nor emit variant sites that match identically to a PoN variant. To enable genotyping of PoN sites, use the --genotype-pon-sites option. If the match is not exact, e.g. there is an allele-mismatch, the tool reassembles the region, emits the calls and annotates matches in the INFO field with IN_PON. "

    I think the Mutect2 prefilter and filter steps correspond to the following process. Is that correct?
    I also understand that gnomAD is not used in the pre-filtering process and is only used in FilterMutectCalls.

    Can I think of it this way?

    Thanks Sheila.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited May 2018 Accepted Answer

    @stella
    Hi Stella,

    Yes, that diagram is pretty accurate.

    To clarify:

    The PoN file is used strictly to filter out sites that are common germline variants and/or artifacts. If 2 or more samples in the PoN have the variant site, that site is filtered out as "IN_PON". You can output PoN sites to the VCF using --genotype-pon-sites. Without that argument, the sites in the PoN will not show up in the unfiltered VCF.

    The germline resource is used to calculate the LOD scores for whether a variant is germline or somatic. FilterMutectCalls then uses those LOD scores.
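To make it concrete where each resource is supplied, here is a minimal Python sketch that only assembles the two command lines; it does not run GATK. The flags `--germline-resource`, `-pon` and `--genotype-pon-sites` are the ones discussed in this thread and in the linked tutorial. The helper names `mutect2_command` and `filter_command` and all file names are hypothetical. Depending on the GATK release, more arguments may be required (for example a tumor sample name for Mutect2 in 4.0.x), so treat this as an illustration rather than a complete invocation.

```python
from typing import List, Optional


def mutect2_command(
    reference: str,
    tumor_bam: str,
    raw_vcf: str,
    germline_resource: Optional[str] = None,
    panel_of_normals: Optional[str] = None,
    genotype_pon_sites: bool = False,
) -> List[str]:
    """Build (but do not run) a Mutect2 command line as a list of arguments."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-O", raw_vcf]
    if germline_resource:
        # Population allele frequencies (e.g. a gnomAD VCF) go to Mutect2.
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals:
        cmd += ["-pon", panel_of_normals]
    if genotype_pon_sites:
        # Ask Mutect2 to emit sites that match the PoN as well.
        cmd.append("--genotype-pon-sites")
    return cmd


def filter_command(raw_vcf: str, filtered_vcf: str) -> List[str]:
    """Build (but do not run) a FilterMutectCalls command line."""
    # This sketch passes no germline resource here: the filtering step
    # works from the annotations already written to the raw VCF.
    return ["gatk", "FilterMutectCalls", "-V", raw_vcf, "-O", filtered_vcf]


if __name__ == "__main__":
    call = mutect2_command(
        "ref.fasta",            # hypothetical file names
        "tumor.bam",
        "raw.vcf.gz",
        germline_resource="af-only-gnomad.vcf.gz",
        panel_of_normals="pon.vcf.gz",
    )
    print(" ".join(call))
    print(" ".join(filter_command("raw.vcf.gz", "filtered.vcf.gz")))
```

In this sketch the gnomAD file appears only in the Mutect2 command, and FilterMutectCalls takes the raw VCF as its input.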

    I hope that helps.

    -Sheila

  • stella (Korea, Member)

    Hi @Sheila
    Thank you for your reply.
    :) :)
