Filtering VCF help

satish86satish86 AshevilleMember

Hi,

I am trying to filter a VCF on two criteria: 1) coverage >3x, and 2) a minimum allele frequency of 12% among the variants that pass the >3x coverage filter.
For the coverage filter I used the expression --filterExpression "DP >= 3", but what would be a suitable expression for the second filter?

Any help will be great.
Thanks,
Satish

Answers

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
  • satish86satish86 AshevilleMember

    Hi Shlee,

    Thanks for your response. The problem with using AF as a filtering criterion is that in my VCF every AF is reported as either 1.000 or 0.500.

    eg:
    chr4 31486607 . G T 33.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87 GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0
    chr4 31493625 . A G 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0
    chr4 31494912 . C T 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0

    If I set up a filter of AF>0.12, won't it still pull all the variants without any filtering?

    From your links I was not able to figure out whether a JEXL expression can help me calculate accurate AF values that I can use to filter on AF>0.12. Can you help me understand this?

    Thanks again,
    Satish
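
For readers following along, here is a minimal Python sketch (outside GATK) of why the INFO AF in the records quoted above only takes values such as 0.5 or 1.0 for a single diploid sample. As far as I understand, the site-level AF tracks the called genotype (AC/AN), whereas the share of reads supporting the allele has to be computed from the per-sample AD field. The helper name parse_record is hypothetical, and the record is split on whitespace because that is how it appears in the post.

```python
# One of the records quoted above, whitespace-separated as in the post.
record = (
    "chr4 31486607 . G T 33.74 . "
    "AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87 "
    "GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0"
)


def parse_record(line):
    """Split a single-sample VCF data line into INFO and FORMAT dictionaries."""
    fields = line.split()
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return info, sample


info, sample = parse_record(record)

# Site-level value derived from the genotype call: allele count / allele number.
af_from_genotype = int(info["AC"]) / int(info["AN"])

# Read-level value: AD lists the reference depth first, then the alternate depth.
ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))
alt_read_fraction = alt_depth / (ref_depth + alt_depth)

print("AF from AC/AN:", af_from_genotype)
print("alt read fraction from AD:", alt_read_fraction)
```

For this homozygous record the two numbers happen to coincide, but they are different quantities: the first changes only when the genotype call changes, the second changes with every read.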

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    @satish86 -- Sounds like you want to filter variants in your single sample VCFs that are common and present in a population VCF at 12% or higher. Or is it that you'd like to filter on allele depth (AD) within your single sample VCF?

  • satish86satish86 AshevilleMember

    Hi Shlee,

    I am trying to perform variant filtering on a single-sample VCF file, based on the criterion that at least 12% of reads should support the allele at 3x coverage.

    My assumption was that the filter would be minimum allele frequency (AF) >0.12 and DP>=3. Now I am trying to work out how to get that minimum allele frequency as a fraction, i.e. the real allele fraction.

    Satish

  • SheilaSheila Broad InstituteMember, Broadie admin

    @satish86
    Hi Satish,

    Have a look at this thread. I am going to ask the team about adding a new annotation that calculates this for you, so you don't need to write out the long JEXL expression :smile:

    -Sheila

  • satish86satish86 AshevilleMember

    Thanks Sheila, that would be really helpful :smile: .

    Regards,
    Satish

  • satish86satish86 AshevilleMember

    Hi Sheila,

    Just wanted to check back with you on the new annotation. Do you have any updates on this?

    Thank you very much again for helping us with this.

    Satish

  • SheilaSheila Broad InstituteMember, Broadie admin

    @satish86
    Hi Satish,

    Ah, I completely forgot to ask about this at the last meeting! Sorry. I will ask the team next week and get back to you.

    -Sheila

  • satish86satish86 AshevilleMember

    Awesome, Thanks Sheila!

  • SheilaSheila Broad InstituteMember, Broadie admin

    @satish86
    Hi Satish,

    Sorry for getting your hopes up. I just talked to the developers, and they don't think the annotation is useful. In fact, they think it is wrong to be filtering on the allele frequency! Is there a reason you are filtering on allele frequency?

    -Sheila

  • satish86satish86 AshevilleMember

    Hi Sheila,

    All I am trying to require is that 12% of the reads support the allele at a coverage of 3x.

    I can easily filter the VCF file based on coverage with DP>3, but I was looking for a parameter like MAF (minimum allele frequency), that way I can set the filter of MAF>0.12 and DP>3.

    Is this possible?

    Regards,
    Satish

  • SheilaSheila Broad InstituteMember, Broadie admin

    @satish86
    Hi Satish,

    Yes, you can use DP > 3 and AF > 0.12 (using the formula in the thread above) with VariantFiltration.

    We don't recommend filtering on DP or AF because HaplotypeCaller takes into account many things before emitting a variant call. For example, a site may have a low depth but very high quality bases and mapping qualities. In the same way, the AF can be 0.1, but it means different things when the depth is 5, 100, or 500.

    Our basic hard filtering recommendations are here. Of course, it is up to you to decide what additional filters you wish to apply :smile:

    -Sheila
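
To make the point about depth concrete, here is a rough Python sketch, not a GATK feature, of the two thresholds discussed in this thread (depth > 3 and alternate read share > 0.12) and of how many reads a 12% cutoff corresponds to at the depths mentioned above. The function names are hypothetical, and depth is assumed to be the sum of the AD values, which can differ from the DP field.

```python
def passes_filters(ref_depth, alt_depth, min_depth=3, min_fraction=0.12):
    """True when depth exceeds min_depth and the alt read share exceeds min_fraction.

    Assumption: depth is taken as ref_depth + alt_depth (the sum of AD).
    """
    depth = ref_depth + alt_depth
    if depth <= min_depth:
        return False
    return alt_depth / depth > min_fraction


def min_alt_reads(depth, percent=12):
    """Smallest whole number of alt reads that is strictly more than percent% of depth."""
    return (percent * depth) // 100 + 1


# The same 12% cutoff means very different read counts at different depths.
for depth in (5, 100, 500):
    print(depth, "reads ->", min_alt_reads(depth), "alt reads needed")

# A record with AD=0,2 fails on depth alone; AD=3,274 passes both checks.
print(passes_filters(0, 2), passes_filters(3, 274))
```

At a depth of 5 a single read already clears 12%, while at a depth of 500 it takes 61 reads, which is one way to see why a fixed fraction is hard to interpret without the depth behind it.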

  • satish86satish86 AshevilleMember

    Hi Sheila,

    I have a question: how can I filter based on AD?

    Say AD is (0,274). Is 274 the read depth of my allele, and 0 the read depth of my ref allele?

    How can I set up a filter that gives all variants that have an allele depth >12?

    Thanks,
    Satish

  • SheilaSheila Broad InstituteMember, Broadie admin

    @satish86
    Hi Satish,

    Have a look at this thread which should answer your question :smile:

    -Sheila

  • satish86satish86 AshevilleMember

    Thanks for that, Sheila. I have one more question.

    From the above, if AD is (3,274):

    Allele read depth = 274, ref read depth = 3

    So the fraction of allele read depth = 274/(274+3) ≈ 0.99

    Is my assumption correct?

    Satish
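
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch of the calculation, assuming AD is ordered reference first and alternate second, as the VCF format defines it; alt_fraction is a hypothetical helper name.

```python
def alt_fraction(ref_depth, alt_depth):
    """Share of AD reads that support the alternate allele, or None if there are no reads."""
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total


fraction = alt_fraction(3, 274)  # AD = 3,274
print(round(fraction, 4))
```

So 274 of 277 reads support the alternate allele, which is roughly 0.989, or about 99% when rounded.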

  • samljsamlj Member
    edited June 19
    @satish86, @shlee I'm looking to mark heterozygous/homozygous alternate genotypes which don't have an alternate allele read depth of 20% of the total reads to no-call. Did you ever figure out a good way to do this? From what I can tell you have to have the specific sample ID to get the AD for either allele, so I would have to loop through all samples to effectively do this.
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @samlj

    "I would have to loop through all samples to effectively do this."

    We do not currently have a tool that does this specifically, so yes, you will need to write a custom script.
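
One possible shape for such a custom script is sketched below in plain Python. It is not a GATK tool and has not been validated on real call sets. It loops over every sample column, so no sample IDs need to be known in advance. Assumptions: the input is tab-separated VCF text, "total reads" means the sum of the AD values, every non-reference AD entry counts toward the alternate depth, and a genotype with zero AD reads is also set to no-call. The function name and the 20% default are illustrative.

```python
def mask_low_alt_fraction(vcf_lines, min_fraction=0.20):
    """Yield VCF lines with het / hom-alt genotypes set to no-call when the
    alternate share of AD is below min_fraction."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        keys = fields[8].split(":")
        if "GT" not in keys or "AD" not in keys:
            yield line  # nothing to evaluate without both GT and AD
            continue
        gt_i, ad_i = keys.index("GT"), keys.index("AD")
        for col in range(9, len(fields)):  # every sample column
            values = fields[col].split(":")
            if len(values) <= max(gt_i, ad_i):
                continue  # trailing fields were dropped for this sample
            gt = values[gt_i]
            alleles = gt.replace("|", "/").split("/")
            if not any(a not in ("0", ".") for a in alleles):
                continue  # hom-ref or already no-call
            try:
                depths = [int(d) for d in values[ad_i].split(",")]
            except ValueError:
                continue  # AD is missing (".") for this sample
            total = sum(depths)
            alt = total - depths[0]  # everything that is not the reference allele
            if total == 0 or alt / total < min_fraction:
                sep = "|" if "|" in gt else "/"
                values[gt_i] = sep.join("." for _ in alleles)
                fields[col] = ":".join(values)
        yield "\t".join(fields)


# Tiny made-up example with two samples, S1 and S2.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:AD\t0/1:18,2\t0/1:10,10",
]
masked = list(mask_low_alt_fraction(demo))
for out in masked:
    print(out)
```

In the made-up example, S1 has 2 of 20 reads on the alternate allele and becomes ./., while S2 at 10 of 20 is left as called. For real files, the same loop could read from an open file handle instead of the demo list.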

  • 29043594952904359495 Member

    @Sheila @bhanuGandham, as we know, the AF for a germline variant should be 0.5 or 1. Can you give some reasons why the AF might not be one of these two values? Thanks a lot.
