single sample VCF - add flag to FILTER based on AF in genotype

I am trying to either hard-filter or flag variants where AF < 0.05. The VCF was generated using Mutect2 in GATK 4.0.4.0. It is a single-sample VCF and does not have 'AF' in the INFO field, although it does have AF in the FORMAT field. I tried VariantFiltration with --genotype-filter-expression "AF < 0.05" --genotype-filter-name "MAF<5%", but I have since learned that this does not add a flag to the FILTER field. Any help appreciated.


Answers

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭
    edited October 2018

    Hi @mbyvcm,

    INFO-level AF and FORMAT-level AF refer to different metrics. INFO-level AF refers to population allele frequencies while FORMAT-level AF refers to sample-level allele fraction.

    In the Mutect2 callset, if you are lacking AF in the INFO field, then this means you ran the analysis without a germline resource (--germline-resource). So you want to filter records involving somatic alleles with allele fraction <0.05. Is this correct?

    I think here you could consider co-opting the contamination filter to define the allele fraction you want to annotate/filter. What you can do is construct an artificial contamination.table, e.g.

    level   contamination   error
    whole_bam       0.05     0
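    As far as I know, GATK reads table files like this one as tab-separated text. It is therefore safest to write the file with explicit tabs rather than copying the space-aligned version shown here. A minimal sketch that reuses the values (a single whole_bam row, 0.05 contamination, 0 error) and the file name contamination.table from this example:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Write the artificial contamination table with real tab separators.
# Header and values mirror the example: one whole_bam row,
# 5% contamination, 0 error.
printf 'level\tcontamination\terror\nwhole_bam\t0.05\t0\n' > contamination.table
```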
    

    Then run FilterMutectCalls on your callset like so:

    gatk FilterMutectCalls \
    -V somatic_m2.vcf.gz \
    -O filtermutectcalls.vcf.gz \
    --contamination-table contamination.table
    

    This should give you a callset in which the 7th column, the FILTER column, carries the contamination annotation for the records it flags.

  • mbyvcm Member

    Thanks for the reply and sorry it has taken me so long to respond.

    > In the Mutect2 callset, if you are lacking AF in the INFO field, then this means you ran the analysis without a germline resource (--germline-resource).

    I have included a germline resource, and I have the 'POP_AF' field in the INFO field.

    > So you want to filter records involving somatic alleles with allele fraction <0.05. Is this correct?

    Yes. I want to remove any call in the output VCF with a sample allele fraction <5%. I am wondering if I am missing something here. Hard-filtering calls in a VCF at a predefined AF seems like it should have a simple solution (without having to co-opt the contamination filter). I understand that multiallelic calls will probably be an issue.

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭

    Hi @mbyvcm,

    I just want to point out that for Mutect2 callsets, sample-level AF isn't simply an allele fraction calculation in the germline sense. You can read more about Mutect2's AF calculation at the bottom of here and here. The latter discussion makes the point that the recommended approach to filtering is to use the tumor LOD (TLOD), which FilterMutectCalls enables.

    Assuming you are content using Mutect2's AF annotation for filtering, I think you should check out Hail or continue with your VariantFiltration approach (--genotype-filter-expression "AF < 0.05" --genotype-filter-name "MAF<5%"). For the latter, check out https://software.broadinstitute.org/gatk/documentation/article?id=12350. You can set your filtered genotypes to no-call, and I believe there is then a way to further filter the no-call records. I would suggest checking out JEXL, searching for "filter no call" (e.g. this hit seems promising), or simply using an awk expression that sets the FILTER column to MAF<5% wherever the sample column contains ./.
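    The two-step approach described above (set filtered genotypes to no-call, then flag those records with awk) could look roughly like this. It is a sketch under several assumptions. First, --set-filtered-genotype-to-no-call is the VariantFiltration option for turning filtered genotypes into no-calls; confirm with gatk VariantFiltration --help for your version. Second, the VCF has a single sample in column 10 with GT as the first FORMAT key. Third, the function names run_genotype_filter and flag_nocall_sites are hypothetical. The awk step only edits data lines, so check that the header declares a MAF<5% ##FILTER line before passing the output to strict tools. Multiallelic records, where FORMAT AF holds one value per ALT allele, may not evaluate the way you expect in the JEXL expression.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 1 (needs GATK on PATH; nothing runs until the function is called).
# Marks sample genotypes whose FORMAT AF < 0.05 with MAF<5% in FT and,
# via --set-filtered-genotype-to-no-call, rewrites their GT as a no-call.
run_genotype_filter() {
  local in_vcf=$1 out_vcf=$2
  gatk VariantFiltration \
    -V "$in_vcf" \
    -O "$out_vcf" \
    --genotype-filter-expression "AF < 0.05" \
    --genotype-filter-name "MAF<5%" \
    --set-filtered-genotype-to-no-call
}

# Step 2: copy a plain-text VCF from stdin to stdout, adding MAF<5% to the
# FILTER column (column 7) of every record whose sample GT is a no-call
# (".", "./." or ".|."). Assumes one sample (column 10) with GT listed first.
flag_nocall_sites() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                    # header lines pass through unchanged
       {
         split($10, sample, ":")               # sample[1] is the GT value
         if (sample[1] ~ /^\.([\/|]\.)*$/) {
           if ($7 == "PASS" || $7 == ".") {
             $7 = "MAF<5%"                     # replace an empty or PASS filter
           } else if ($7 !~ /(^|;)MAF<5%(;|$)/) {
             $7 = $7 ";MAF<5%"                 # append to existing filters
           }
         }
         print
       }'
}
```

    For example: run_genotype_filter somatic_m2.vcf.gz gt_filtered.vcf, then flag_nocall_sites < gt_filtered.vcf > maf_flagged.vcf. Writing the intermediate file as plain .vcf keeps the awk step simple. To hard-filter rather than flag, you could afterwards drop records whose FILTER column contains MAF<5%.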

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭

    P.S. @mbyvcm, just an afterthought: you may wish to check out GATK4 Funcotator, a functional annotator for downstream analyses. It just came out of beta.
