Mutect2 germline resource for Mouse Exome

MT_badr Member
Hi everyone,

I am analyzing mouse exome data with Mutect2 for somatic variant detection. I am a bit stuck with the germline resource, as I couldn't find one for mm10. I have used the Sanger-provided SNPs and indels as the known sites in BaseRecalibrator. Is it possible to use them for the germline AF resource too? And what would be the disadvantage of running Mutect2 without this resource at all?
Many thanks in advance.

Answers

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @MT_badr The critical thing is that the germline resource must contain an `AF` (population allele frequency) INFO field annotation. I suppose these are inbred mouse models for which population allele frequencies are essentially either 0 or 1. In that case you could use known SNPs and indels as a blacklist with something like `gatk SelectVariants -V filtered-mutect-calls.vcf --discordance sanger-snvs-and-indels.vcf -O germline-filtered-mutect-calls.vcf`.

    Running Mutect2 without any germline filtering would be a very bad idea in tumor-only mode but not so harmful if you have matched normals.
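
One quick way to see whether a candidate file carries the `AF` annotation at all is to scan it. The Python below is only a rough sketch under stated assumptions: the function name `summarize_af` and the toy VCF text are made up for illustration, the input is assumed to be an uncompressed, tab-delimited VCF, and the header check assumes `ID` comes first in the `##INFO` line.

```python
def summarize_af(vcf_lines):
    """Return (af_declared_in_header, n_records, n_records_with_af)."""
    declared = False
    n_records = 0
    n_with_af = 0
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##INFO=<ID=AF,"):
            # Header line that declares the AF INFO field.
            declared = True
        elif line and not line.startswith("#"):
            # Data record: INFO is the 8th tab-separated column.
            n_records += 1
            fields = line.split("\t")
            info = fields[7] if len(fields) > 7 else ""
            keys = {entry.split("=", 1)[0] for entry in info.split(";")}
            if "AF" in keys:
                n_with_af += 1
    return declared, n_records, n_with_af


# Hypothetical toy VCF, not taken from any real resource.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.5\n"
    "chr1\t200\t.\tC\tT\t.\tPASS\tDP=10\n"
)

declared, n_records, n_with_af = summarize_af(EXAMPLE_VCF.splitlines())
print(declared, n_records, n_with_af)
```

For a real file you would pass an open file handle instead of the toy string. A file where no record carries `AF` is probably a plain known-sites list rather than a population resource.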

  • MT_badr Member
    > @davidben said:
    > @MT_badr The critical thing is that the germline resource must contain an `AF` (population allele frequency) INFO field annotation. I suppose these are inbred mouse models for which population allele frequencies are essentially either 0 or 1. In that case you could use known SNPs and indels as a blacklist with something like `gatk SelectVariants -V filtered-mutect-calls.vcf --discordance sanger-snvs-and-indels.vcf -O germline-filtered-mutect-calls.vcf`.
    >
    > Running Mutect2 without any germline filtering would be a very bad idea in tumor-only mode but not so harmful if you have matched normals.

    Thanks a lot for your reply. Yes, they are inbred B6 mice. For some of them I have a matching normal sample, which I am planning to feed to Mutect2 as well (unfortunately not for all samples, but all mice come from the same mother and are housed together, so I would hope those without a matching normal won't carry many germline mutations).
    I have uploaded two screenshots: the germline resource for humans that GATK provides (with the allele frequency), and the only known SNP file I could find for mm10, covering different mouse strains. As an experiment I provided Mutect2 with this SNP file, and it actually gave out far fewer variants in comparison to running Mutect2 alone with the normal sample. But I am not sure on what basis it actually filtered. Is filtering with this SNP file correct when done this way? Or is there any other resource to get AF in B6 mice for mm10?
    There is also a --dbsnp argument in Mutect2. If I feed it the known SNPs for mm10, will that be equivalent to running your suggested SelectVariants step?
    Many thanks in advance.
  • davidben Boston Member, Broadie, Dev ✭✭✭

    > as an experiment i provided Mutect2 with this snp file

    What exactly do you mean by this -- what argument did you use?

    > it actually gave out far fewer variants in comparison to running Mutect2 alone with the normal sample

    If you hard filter every known variant, you will remove all variants in the matched normal (except for rare variants and de novo mutations), as well as variants that aren't in the matched normal. Unless the SNP file is missing a lot of rare variants or the sample has a lot of de novo mutations, filtering with the SNP file will therefore usually be more aggressive. This is especially so if the matched normal doesn't have high depth, in which case some germline variants in the normal won't be detected.

    > is there any other resource to get AF in B6 mice for mm10

    Unfortunately I don't think any of us on the GATK team have experience with mice.

    > there is also a --dbsnp argument in Mutect2

    That argument doesn't do anything. It was a relic of some contrived Java inheritance so that Mutect2 and HaplotypeCaller could share code, which we finally fixed in the 4.1.1 release. It has been removed along with a few other inactive arguments.
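
The point made earlier in this reply, that hard filtering every known variant is usually more aggressive than relying on a matched normal, can be illustrated with a small Python sketch. It is a deliberate simplification: Mutect2's handling of a matched normal is a statistical model, not a set subtraction, and the sites, the names `hard_filter`, `KNOWN_SITES`, `NORMAL_DETECTED` and `TUMOR_CALLS`, and the matching on (chrom, pos, ref, alt) are all assumptions made for illustration, not GATK's actual matching rules.

```python
def hard_filter(calls, blacklist):
    """Keep only the calls whose site is absent from the blacklist."""
    return {site for site in calls if site not in blacklist}


# Hypothetical toy data; each site is (chrom, pos, ref, alt).
KNOWN_SITES = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
}
# Germline variants actually detected in the matched normal.
NORMAL_DETECTED = {("chr1", 100, "A", "G")}
TUMOR_CALLS = {
    ("chr1", 100, "A", "G"),  # germline, seen in the normal
    ("chr1", 200, "C", "T"),  # germline, missed in the normal (low depth)
    ("chr2", 50, "G", "A"),   # at a known site but not in the normal
    ("chr3", 75, "T", "C"),   # novel, candidate somatic
}

after_known = hard_filter(TUMOR_CALLS, KNOWN_SITES)
after_normal = hard_filter(TUMOR_CALLS, NORMAL_DETECTED)
print(len(after_known), len(after_normal))
```

With this toy data the known-sites blacklist leaves one call while the normal-only comparison leaves three, because every variant detected in the normal is also a known site here. If the normal carried rare variants missing from the known-sites file, the two results would overlap differently.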

  • MT_badr Member
    @davidben I used the available mm10-db137 VCF file under the --germline-resource argument before I noticed it's not the right file to use. I am not sure what exactly it filtered, how to get the right AF file, or whether it's possible to extract this information from the SNP VCF file. I was thinking about feeding the SNP VCF file to Mutect2 under the --dbsnp argument, but since it doesn't do anything I have dropped that idea.
    One option I was considering is to create a panel of normals from different genotypes. I have two genotypes, both on the same background: one WT and one KO. Is it possible to build the panel of normals from the normal samples of both genotypes? I couldn't find any answers on this in the forum.

    Thanks again for the help.