# Filtering VCF help

satish86

Hi ,

I am trying to filter a VCF on two criteria: 1) coverage > 3x, and 2) a minimum allele frequency of 12% among the variants that pass the coverage filter.

For the coverage filter I used the expression --filterExpression "DP >= 3". My question is: what would be a suitable expression for the second filter?

Any help will be great.

Thanks,

Satish

## Answers

Hi @satish86. For filtering based on allele frequencies, you can use JEXL expressions. Here are two links to forum posts to help you get started:

Hi Shlee,

Thanks for your response. The problem with using AF as a filtering criterion is that in my VCF every AF value is either 1.000 or 0.500.

For example:

chr4 31486607 . G T 33.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87 GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0

chr4 31493625 . A G 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0

chr4 31494912 . C T 54.74 . AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.37 GT:AD:DP:GQ:PL 1/1:0,2:2:6:82,6,0

If I set up a filter of AF > 0.12, it will still pass all the variants without filtering anything out, won't it?

From your links I could not work out whether a JEXL expression can calculate accurate AF values that I can then filter on with AF > 0.12. Can you help me understand this?

Thanks again,

Satish
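A minimal Python sketch of why the INFO AF looks this way. In the records quoted above, AF matches AC divided by AN, so it reflects the called genotype (one or two alternate alleles out of two) and not the proportion of reads. The helper name parse_info is hypothetical and written only for this illustration; it is not part of GATK.

```python
def parse_info(info_field):
    """Turn 'AC=2;AF=1.00;AN=2' into a dict; flag-style keys map to True."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


# First example record from the post above, split into columns.
record = ("chr4 31486607 . G T 33.74 . "
          "AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=16.87 "
          "GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0").split()

info = parse_info(record[7])
ac, an = int(info["AC"]), int(info["AN"])

# Allele count over allele number: 2/2 for a 1/1 call, 1/2 for a 0/1 call.
af_from_genotype = ac / an
print(af_from_genotype, float(info["AF"]))  # 1.0 1.0
```

With a single diploid sample, AC/AN can only be 0.5 or 1.0 for a called variant, which is why a threshold such as AF > 0.12 removes nothing. Read-level support is recorded separately in the per-sample AD field.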

@satish86 -- Sounds like you want to filter variants in your single sample VCFs that are common and present in a population VCF at 12% or higher. Or is it that you'd like to filter on allele depth (AD) within your single sample VCF?

Hi Shlee,

I am trying to filter variants in a single-sample VCF file with the criterion that at least 12% of reads should support the allele at 3x coverage.

My assumption was that the filter would be a minimum allele frequency of AF > 0.12 together with DP >= 3. How do I get that minimum allele frequency, i.e. the real allele fraction?

Satish

@satish86

Hi Satish,

Have a look at this thread. I am going to ask the team about adding a new annotation that calculates this for you, so you don't need to write out the long JEXL expression.

-Sheila

Thanks Sheila, that would be really helpful.

Regards,

Satish

Hi Sheila,

Just wanted to check back with you on the new annotation. Do you have any updates on this?

Thank you very much again for helping us with this.

Satish

@satish86

Hi Satish,

Ah, I completely forgot to ask about this at the last meeting! Sorry. I will ask the team next week and get back to you.

-Sheila

Awesome, Thanks Sheila!

@satish86

Hi Satish,

Sorry for getting your hopes up. I just talked to the developers, and they don't think the annotation is useful. In fact, they think it is wrong to be filtering on the allele frequency! Is there a reason you are filtering on allele frequency?

-Sheila

Hi Sheila,

All I am trying to require is that 12% of the reads support the allele at a coverage of 3x.

I can easily filter the VCF file on coverage with DP > 3, but I was looking for a parameter like MAF (minimum allele frequency), so that I can set the filter to MAF > 0.12 and DP > 3.

Is this possible?

Regards,

Satish

@satish86

Hi Satish,

Yes, you can use DP > 3 and AF > 0.12 (using the formula in the thread above) with VariantFiltration.

We don't recommend filtering on DP or AF because HaplotypeCaller takes into account many things before emitting a variant call. For example, a site may have a low depth but very high quality bases and mapping qualities. In the same way, the AF can be 0.1, but it means different things when the depth is 5, 100, or 500.

Our basic hard filtering recommendations are here. Of course, it is up to you to decide what additional filters you wish to apply.

-Sheila
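For readers who want to see the two thresholds from this exchange applied outside GATK, here is a rough Python sketch. It is not VariantFiltration and not the JEXL formula from the linked thread; the function name passes_read_support and its parameters are hypothetical. It assumes a single-sample, biallelic record and, as a further assumption, takes depth as ref + alt reads from AD instead of the DP field.

```python
def passes_read_support(vcf_line, min_depth=3, min_alt_fraction=0.12):
    """Return True if a single-sample, biallelic record meets both thresholds.

    Assumption: depth is ref + alt reads from AD, not the DP field.
    """
    fields = vcf_line.split()  # VCF columns are tab-separated; split() also accepts spaces
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ad = sample.get("AD", ".")
    if "." in ad:
        return False  # no usable allele depths for this sample
    ref_reads, alt_reads = (int(x) for x in ad.split(",")[:2])
    depth = ref_reads + alt_reads
    if depth == 0 or depth < min_depth:
        return False
    return alt_reads / depth >= min_alt_fraction


# First record is from the thread (INFO shortened); the second is made up.
low_depth = ("chr4 31486607 . G T 33.74 . AC=2;AF=1.00;AN=2;DP=2 "
             "GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0")
well_supported = ("chr1 1000 . A C 500.0 . AC=2;AF=1.00;AN=2;DP=277 "
                  "GT:AD:DP 1/1:3,274:277")

for line in (low_depth, well_supported):
    print(line.split()[1], passes_read_support(line))
```

On a real file you would skip header lines starting with "#" and decide whether the depth cut-off should be inclusive (DP >= 3, as here) or strict (DP > 3); both forms appear in the thread.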

Hi Sheila,

I have a question: how can I filter based on AD?

Say AD is (0,274): is 274 the read depth of my alternate allele, and 0 the read depth of my reference allele?

How can I set up a filter that keeps all variants with an allele depth > 12?

Thanks,

Satish

@satish86

Hi Satish,

Have a look at this thread, which should answer your question.

-Sheila

Thanks for that, Sheila. I have one more question.

So, from above, if AD is (3,274):

Allele read depth = 274, ref read depth = 3

so the fraction of allele reads = 274 / (274 + 3) ≈ 0.989

Is my assumption correct?

Satish

@satish86

Hi Satish,

Yes, that is correct.

-Sheila
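The arithmetic confirmed above can be written as a small Python helper. This is only an illustrative sketch: the name alt_read_fractions is hypothetical, and extending the same ratio to sites with more than one alternate allele (each ALT count over the sum of all AD values) is an assumption that the thread does not discuss.

```python
def alt_read_fractions(ad):
    """Given an AD string such as '3,274', return one read fraction per ALT allele."""
    counts = [int(x) for x in ad.split(",")]  # first value is the REF count
    total = sum(counts)
    if total == 0:
        return [0.0] * (len(counts) - 1)
    return [count / total for count in counts[1:]]


print([round(f, 3) for f in alt_read_fractions("3,274")])  # [0.989]
print(alt_read_fractions("10,5,5"))                        # [0.25, 0.25]
```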

Hi @samlj

We currently do not have a tool that does this specifically, so yes, you will need to write a custom script.

@Sheila @bhanuGandham, as we know, the AF in germline calls should be 0.5 or 1. Can you give some reasons why the AF might not be one of these two values? Thanks a lot.