Creating updated resources from gnomAD for somatic variant calling (Mutect2)

Hi,
I want to call somatic variants in tumor-only mode with Mutect2. For the GRCh38 reference, I found two files in the GATK Resource Bundle: af-only-gnomad.hg38.vcf.gz (to remove germline mutations) and small_exac_common_3.hg38.vcf.gz (for the GetPileupSummaries and CalculateContamination commands in the filtering step).

I'd like to re-create these files from the latest gnomAD release, using both gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz and gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.bgz, and following the command lines in your mutect_resources.wdl.

I have two questions:
1) The af-only-gnomad.hg38.vcf.gz and small_exac_common_3.hg38.vcf.gz files contain 227532 and 4749 variants, respectively, whose FILTER column is not PASS. Why are non-PASS variants included in these two files? This makes me think that I should not remove them.
2) The command gatk SelectVariants -R reference.fa af-only-gnomeAD.vcf --select-type-to-include SNP -restrict-alleles-to BIALLELIC --selectExpressions "AF > 0.05" -O biallelic-gnomeAD.vcf.gz --lenient produces a VCF file with AF values from 0.05 to 1. Why do the AF values range only from 0.051 to 0.499 in the small_exac_common_3.hg38.vcf.gz file?
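
For anyone who wants to reproduce the numbers in both questions, here is a minimal Python sketch of how the non-PASS count and the AF range can be computed. It is not part of GATK. The function names summarize_vcf and summarize_vcf_file are my own, and the sketch assumes tab-separated VCF records with an AF=... entry in the INFO column. Note that a FILTER of "." (missing) is counted as non-PASS here, which may or may not be what you want.

```python
import gzip


def summarize_vcf(lines):
    """Return (non_pass_count, min_af, max_af) for an iterable of VCF lines.

    Assumption: FILTER is column 7 and INFO is column 8, as in standard VCF.
    Any FILTER value other than the literal "PASS" (including ".") is counted
    as non-PASS. Multi-allelic AF lists (AF=0.1,0.2) are split on commas.
    """
    non_pass = 0
    af_min = None
    af_max = None
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != "PASS":
            non_pass += 1
        for entry in fields[7].split(";"):
            if not entry.startswith("AF="):
                continue
            for value in entry[3:].split(","):
                try:
                    af = float(value)
                except ValueError:
                    continue  # e.g. AF=. for a missing value
                af_min = af if af_min is None else min(af_min, af)
                af_max = af if af_max is None else max(af_max, af)
    return non_pass, af_min, af_max


def summarize_vcf_file(path):
    """Same summary for a gzip- or bgzip-compressed VCF on disk."""
    with gzip.open(path, "rt") as handle:
        return summarize_vcf(handle)
```

Calling summarize_vcf_file("small_exac_common_3.hg38.vcf.gz") should then return the three values for that file, which can be compared with the output of the SelectVariants command above.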

Thank you for your hard work!
