# Filtering VCF help

AshevilleMember

Hi,

I am trying to filter a VCF on two criteria: 1) coverage >3x, and 2) a minimum allele frequency of 12% among the variants that pass the coverage filter.
For the coverage filter I used the expression --filterExpression "DP >= 3". What would be a suitable expression for the second filter?

Any help will be great.
Thanks,
Satish


Hi @satish86. For filtering based on allele frequencies, you can use JEXL expressions. Here are two links to forum posts to help you get started:

• AshevilleMember

Hi Shlee,

Thanks for your response. The problem with using AF as a filtering criterion is that in my VCF all the AF values are either 1.000 or 0.500.

For example:
chr4 31486607 . G T 33.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87 GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0
chr4 31493625 . A G 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0
chr4 31494912 . C T 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0

If I set up a filter of AF > 0.12, won't it still pull in all the variants without filtering anything out?

From your links I was not able to figure out whether a JEXL expression can calculate the accurate AF values that I could then use to filter on AF > 0.12. Can you help me understand this?

Thanks again,
Satish

@satish86 -- Sounds like you want to filter variants in your single sample VCFs that are common and present in a population VCF at 12% or higher. Or is it that you'd like to filter on allele depth (AD) within your single sample VCF?

• AshevilleMember

Hi Shlee,

I am trying to perform variant filtering on a single-sample VCF file, with the criterion that at least 12% of reads should support the allele at 3x coverage.

My assumption was that the filter would be a minimum allele frequency (AF) > 0.12 and DP >= 3. Now I am trying to work out how to get that minimum allele frequency, or rather the real allele fraction.

Satish

@satish86
Hi Satish,

Have a look at this thread. I am going to ask the team about adding a new annotation that calculates this for you, so you don't need to write out the long JEXL expression.

-Sheila

• AshevilleMember

Thanks Sheila, that would be really helpful.

Regards,
Satish

• AshevilleMember

Hi Sheila,

I just want to check back with you on the new annotation. Do you have any updates on this?

Thank you very much again for helping us with this.

Satish

@satish86
Hi Satish,

-Sheila

• AshevilleMember

Awesome, Thanks Sheila!

@satish86
Hi Satish,

Sorry for getting your hopes up. I just talked to the developers, and they don't think the annotation is useful. In fact, they think it is wrong to be filtering on the allele frequency! Is there a reason you are filtering on allele frequency?

-Sheila

• AshevilleMember

Hi Sheila,

All I am trying to get is variants where at least 12% of the reads support the allele at a coverage of 3x.

I can easily filter the VCF file on coverage with DP>3, but I was looking for a parameter like MAF (minimum allele frequency), so that I can set the filter to MAF>0.12 and DP>3.

Is this possible?

Regards,
Satish

@satish86
Hi Satish,

Yes, you can use DP > 3 and AF > 0.12 (using the formula in the thread above) with VariantFiltration.

We don't recommend filtering on DP or AF because HaplotypeCaller takes into account many things before emitting a variant call. For example, a site may have a low depth but very high quality bases and mapping qualities. In the same way, the AF can be 0.1, but it means different things when the depth is 5, 100, or 500.

Our basic hard filtering recommendations are here. Of course, it is up to you to decide what additional filters you wish to apply.

-Sheila
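To make the "DP and allele fraction" idea above concrete, here is a minimal Python sketch that works directly on a single-sample VCF line. It is not the JEXL expression from the linked thread and it does not call GATK; the function names `alt_allele_fraction` and `passes_filter` are hypothetical, and the sketch assumes a tab-delimited, biallelic record whose FORMAT column contains both AD and DP. The thresholds are the ones discussed in this thread.

```python
def alt_allele_fraction(ad_field):
    """Return alt reads / (ref + alt reads) for a biallelic AD value such as '0,2'."""
    ref_depth, alt_depth = (int(x) for x in ad_field.split(",")[:2])
    total = ref_depth + alt_depth
    # Assumption: a site with no informative reads is treated as fraction 0.0.
    return alt_depth / total if total else 0.0


def passes_filter(record_line, min_depth=3, min_alt_fraction=0.12):
    """Check one single-sample VCF record against a depth and allele-fraction cutoff."""
    fields = record_line.rstrip("\n").split("\t")
    # Pair the FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    depth = int(sample["DP"])  # sample-level DP, not the INFO DP
    return depth >= min_depth and alt_allele_fraction(sample["AD"]) >= min_alt_fraction


# First example record from earlier in the thread, rebuilt with tab separators.
record = "\t".join([
    "chr4", "31486607", ".", "G", "T", "33.74", ".",
    "AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87",
    "GT:AD:DP:GQ:PL", "1/1:0,2:2:6:61,6,0",
])
print(passes_filter(record))  # False: every read supports the alt allele, but DP is 2
```

Note that the record fails on depth alone, which illustrates the point made above: the same allele fraction means different things at different depths.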

• AshevilleMember

Hi Sheila,

Say AD is (0,274): is 274 the read depth of my alt allele, and 0 the read depth of my ref allele?

How can I set up a filter that gives all variants that have an allele depth >12?

Thanks,
Satish

@satish86
Hi Satish,

-Sheila

• AshevilleMember

Thanks for that, Sheila. I had one more question.

So the % of allele RD = 274/(274+3) ≈ 0.99

Is my assumption correct?

Satish
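The arithmetic in the question above can be written out as follows. This takes the counts in the post at face value (274 reads supporting the alt allele, 3 supporting the ref); the symbols $AD_{ref}$ and $AD_{alt}$ are just labels for the two values of the AD field, not GATK annotation names.

```latex
$$
\text{alt fraction}
  = \frac{AD_{alt}}{AD_{ref} + AD_{alt}}
  = \frac{274}{274 + 3}
  = \frac{274}{277}
  \approx 0.989
$$
```

With the AD of (0,274) quoted earlier in the thread, the same formula gives 274/274 = 1.0.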

• Member
edited June 19
@satish86, @shlee I'm looking to set heterozygous/homozygous-alternate genotypes to no-call when the alternate allele read depth is below 20% of the total reads. Did you ever figure out a good way to do this? From what I can tell, you have to have the specific sample ID to get the AD for either allele, so I would have to loop through all samples to do this effectively.
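One way to sidestep the per-sample-ID problem is to do the looping outside GATK. The following is a rough Python sketch under stated assumptions, not a GATK feature: the function name `mask_low_alt_fraction` is hypothetical, the record is assumed to be tab-delimited with GT and AD in the FORMAT column, and masked genotypes are written as `./.` regardless of ploidy or phasing. Treating a sample with zero AD reads as a no-call is also a choice made here, not something stated in the thread.

```python
import re


def mask_low_alt_fraction(record_line, min_alt_fraction=0.20):
    """Set GT to './.' for every sample whose called alt allele(s) fall below the cutoff."""
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "GT" not in keys or "AD" not in keys:
        return "\t".join(fields)  # nothing to evaluate for this record
    gt_i, ad_i = keys.index("GT"), keys.index("AD")

    for col in range(9, len(fields)):  # one column per sample, no sample IDs needed
        values = fields[col].split(":")
        if len(values) <= ad_i:
            continue  # trailing FORMAT fields were dropped for this sample
        alleles = re.split(r"[/|]", values[gt_i])
        alt_indices = {int(a) for a in alleles if a.isdigit() and a != "0"}
        if not alt_indices:
            continue  # hom-ref or already a no-call
        try:
            depths = [int(d) for d in values[ad_i].split(",")]
        except ValueError:
            continue  # AD is missing ('.') for this sample
        total = sum(depths)
        too_low = total == 0 or any(
            i >= len(depths) or depths[i] / total < min_alt_fraction
            for i in alt_indices
        )
        if too_low:
            values[gt_i] = "./."
            fields[col] = ":".join(values)
    return "\t".join(fields)


# Two made-up samples: the first has 1 of 10 reads on the alt allele, the second 5 of 10.
example = "\t".join([
    "chr1", "100", ".", "A", "G", "50", ".", ".",
    "GT:AD:DP", "0/1:9,1:10", "0/1:5,5:10",
])
print(mask_low_alt_fraction(example))  # only the first sample's GT is masked
```

The other FORMAT values (AD, DP, and so on) are left untouched, so the evidence behind each masked genotype stays in the file.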