What variants does Mutect2 --germline-resource filter out?

Simoner92, University of Florence, Member
Hi everybody,

I am new to analyzing WES data and am trying the GATK4 workflow for detecting somatic variants.

I ran Mutect2 in tumor-only mode with these commands:

gatk Mutect2 -R reference.ucsc.hg19.fasta -L Target.bed -I input.bam --f1r2-tar-gz input.tar.gz -O output1.unfiltered.vcf --germline-resource af-only-gnomad.raw.sites.hg19.vcf [max-population-af default value = 0.01]

gatk Mutect2 -R reference.ucsc.hg19.fasta -L Target.bed -I input.bam --f1r2-tar-gz input.tar.gz -O output2.unfiltered.vcf [max-population-af default value = 0.01]

The first command produced a VCF with around 40,000 variants, whereas the second produced a VCF with around 100,000 variants. I expected --germline-resource to filter out germline variants based on their AF in the resource (af-only-gnomad.raw.sites.hg19.vcf).
However, I noticed that output1.unfiltered.vcf contains variants whose POPAF values (negative log10 population allele frequencies of alt alleles) correspond to an AF in the resource greater than 0.01.
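
As a way to check this observation, a POPAF value can be converted back to an allele frequency: because POPAF is the negative log10 of the population AF, AF = 10^(-POPAF), so a POPAF below 2 corresponds to an AF above 0.01. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical helpers, not part of GATK, and the 0.01 default simply mirrors the max-population-af value quoted in the commands above.

```python
from typing import List


def popaf_to_af(popaf: float) -> float:
    """Convert a POPAF value (-log10 of the population AF) back to an AF."""
    return 10.0 ** (-popaf)


def popaf_values(info: str) -> List[float]:
    """Extract the POPAF value(s) from a VCF INFO string.

    Multiallelic records carry one comma-separated value per alt allele.
    Returns an empty list when the INFO string has no POPAF entry.
    """
    for field in info.split(";"):
        if field.startswith("POPAF="):
            return [float(value) for value in field[len("POPAF="):].split(",")]
    return []


def exceeds_threshold(popaf: float, max_population_af: float = 0.01) -> bool:
    """True when the AF implied by POPAF is above the given threshold."""
    return popaf_to_af(popaf) > max_population_af
```

For example, a record with POPAF=1.0 implies a resource AF of 0.1, so `exceeds_threshold(1.0)` is true, while POPAF=3.0 implies an AF of 0.001 and is not flagged.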

Does anybody know the logic Mutect2 uses to filter out variants based on the germline resource? Is it possible that Mutect2 considers parameters other than the AF in the resource?

Thank you,
Simone

Answers

  • Simoner92, University of Florence, Member
    Hi Tiffany,

    Thank you for your reply. It helped me understand how Mutect2 calls variants.

    Thank you
  • Simoner92, University of Florence, Member
    Dear @Tiffany_at_Broad,

    After running Mutect2 with --germline-resource as described above, I ran these commands to filter the VCF:

    gatk LearnReadOrientationModel -I input1.tar.gz -O output1.read-orientation-model.tar.gz
    gatk GetPileupSummaries -I input.bam -V af-only-gnomad.raw.sites.hg19.vcf -L Target.bed -O output1.getpileupsummaries.table
    gatk CalculateContamination -I output1.getpileupsummaries.table --tumor-segmentation output1.segments.table -O output1.calculatecontamination.table
    gatk FilterMutectCalls -R reference.ucsc.hg19.fasta -V output1.unfiltered.vcf --tumor-segmentation output1.segments.table --contamination-table output1.calculatecontamination.table --ob-priors output1.read-orientation-model.tar.gz -O output1.filtered.vcf

    After these steps I obtained an output1.filtered.vcf with about 8000 variants tagged as germline and 2000 tagged as PASS. Since Mutect2 is meant to call only somatic variants, why did I get so many germline variants?

    Thank you
  • bhanuGandham, Cambridge MA, Member, Administrator, Broadie, Moderator

    @ElenaGrassi

    This is a good explanation. I couldn’t have explained it better myself. Thank you for contributing to the forum and building the GATK knowledge base.

  • SimoneFI, Florence, Member
    Thank you @ElenaGrassi for your exhaustive answer.
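
To reproduce the kind of tally mentioned in the thread (how many records in a filtered VCF such as output1.filtered.vcf are tagged germline versus PASS), the FILTER column can be counted directly. The following is a minimal Python sketch that assumes an uncompressed, tab-delimited VCF; `count_filters` is a hypothetical helper, not a GATK tool.

```python
from collections import Counter
from typing import Iterable


def count_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count how often each FILTER value appears in the records of a VCF.

    Header lines (starting with '#') and blank lines are skipped. A record
    that failed several filters lists them separated by ';', and each
    listed filter is counted once for that record.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        # FILTER is the seventh column (index 6) of a VCF record.
        for name in columns[6].split(";"):
            counts[name] += 1
    return counts
```

Passing an open file handle for the filtered VCF to `count_filters` returns a `Counter` keyed by filter name, so `counts["germline"]` and `counts["PASS"]` give the two numbers in question. Records failing more than one filter contribute to each of their filters, so the per-filter counts can sum to more than the number of records.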