Issues in Filtering by AF

Hi,

I have used the GATK pipeline and generated VCF files with HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.

I am working with a non-model organism and am currently filtering the variants.

This is my JEXL for VariantFiltration:

"QD < 2.0 || FS > 60.0 || MQ < 50.0 || MQRankSum < -2.0 || ReadPosRankSum < -2.0 || AF < 0.01"

This expression works as expected, except that GenotypeGVCFs has written a few very low AF values in scientific (exponent) notation in the final VCF.
For example: AF=1.094e-03, which is the result of 1/914 and needs to be filtered out because it is well below 0.01.

These AF values are not filtered out by my expression, and I am trying to find another way to remove them. Any suggestions?
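
As a quick sanity check outside GATK, the sketch below parses the AF field from an INFO string and compares it with the 0.01 threshold. It only shows that a value written as 1.094e-03 is an ordinary number below 0.01; it does not reproduce how VariantFiltration evaluates JEXL. The helper name `parse_af` and the example INFO string are assumptions made for illustration.

```python
def parse_af(info: str) -> list[float]:
    """Return the AF values from a VCF INFO string, one per ALT allele."""
    for field in info.split(";"):
        if field.startswith("AF="):
            # float() accepts both plain decimals and exponent notation.
            return [float(value) for value in field[3:].split(",")]
    return []


# Hypothetical INFO string built from the numbers quoted in the post.
info = "AC=1;AF=1.094e-03;AN=914"
afs = parse_af(info)

print(afs)
print(all(af < 0.01 for af in afs))  # is every AF below the threshold?
```

If such a record still appears unfiltered after VariantFiltration, it is worth checking that the file being inspected is the filtered output and not the input.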

Answers

  • bhanuGandham, Cambridge MA. Member, Administrator, Broadie, Moderator

    Hi @Oakman

    Which version of GATK are you using? Please also send us the exact command.

  • Oakman, Member
    Accepted Answer

    Hi @bhanuGandham

    Forgive me for the late reply; I took a couple of days off work because I had to move house.
    I will double-check this on Monday, but I believe it was my mistake and that the AF values in exponent notation are filtered out correctly, so I apologize.

    I was originally put onto this by my misinterpretation of the bcftools stats output. It gives a distribution of AF, and after filtering by AF I was surprised to still see values below 0.01. I then checked my VCF file manually, found some of the exponent-notation AF values there, and raised the flag with my post here. I now believe I checked the wrong file, the one from before filtering, but I will double-check on Monday.

    I now understand that the AF distribution reported by bcftools stats is only an estimate (not a precise calculation using AN and AC) and that its values are mid-point bins, so even if you filter at 0.01, the bcftools stats output can still show some AF in a bin below 0.01 (I hope this makes sense).

    Oakman

  • bhanuGandham, Cambridge MA. Member, Administrator, Broadie, Moderator

    Hi @Oakman

    Thank you for your update. This will be helpful to other users.
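
For readers who want to verify a filtered file without relying on binned summaries, here is a minimal sketch that recomputes the exact AF as AC/AN for each ALT allele and lists unfiltered records that fall below the threshold. It assumes an uncompressed, tab-separated VCF whose INFO column carries AC and AN. It also assumes that filtered records stay in the file with a filter name in the FILTER column, so only records marked PASS or "." are checked. The function name `low_af_passing`, the filter name `lowAF` and the example records are hypothetical.

```python
def low_af_passing(vcf_lines, threshold=0.01):
    """Yield (chrom, pos, af) for unfiltered records whose exact AF < threshold."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, flt, info = cols[0], cols[1], cols[6], cols[7]
        if flt not in ("PASS", "."):
            continue  # record already carries a filter name
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if "AC" not in fields or "AN" not in fields:
            continue
        an = int(fields["AN"])
        if an == 0:
            continue  # no called alleles, AF is undefined
        for ac in fields["AC"].split(","):
            af = int(ac) / an  # exact value, independent of how AF was printed
            if af < threshold:
                yield chrom, int(pos), af


# Hypothetical records: one passing low-AF site, one filtered, one common.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tAC=1;AF=1.094e-03;AN=914",
    "chr1\t200\t.\tC\tT\t50\tlowAF\tAC=1;AF=1.094e-03;AN=914",
    "chr1\t300\t.\tG\tA\t50\tPASS\tAC=100;AF=0.109;AN=914",
]
hits = list(low_af_passing(records))
print(hits)
```

An empty result on the real filtered file would suggest that the low-AF sites seen in a summary come from binning or from inspecting the unfiltered file.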