Issues in Filtering by AF

Hi,

I have used the GATK pipeline to generate VCF files with HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.

I am working with a non-model organism and I am currently filtering the variants.

This is my JEXL for VariantFiltration:

"QD < 2.0 || FS > 60.0 || MQ < 50.0 || MQRankSum < -2.0 || ReadPosRankSum < -2.0 || AF < 0.01"

Everything works with this expression, except that GenotypeGVCFs has written a few very low AF values in scientific (exponential) notation in the final VCF.
For example, AF=1.094e-03, which is 1/914, needs to be filtered out because it is well below 0.01.
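
As a quick check on the arithmetic, here is a minimal Python sketch. It is not part of the GATK pipeline and says nothing about how VariantFiltration parses the value; the variable names are illustrative. It only confirms that the scientific-notation string is an ordinary number below the 0.01 cutoff:

```python
# Illustrative check of the example value; AC and AN come from the 1/914 figure above.
ac = 1      # alternate allele count
an = 914    # total number of called alleles

af_from_counts = ac / an
af_from_vcf = float("1.094e-03")  # the AF string as written in the VCF

# True when the recomputed frequency agrees with the recorded (rounded) one
matches_recorded = abs(af_from_counts - af_from_vcf) < 1e-6
# True when the value falls below the 0.01 cutoff used in the filter expression
below_cutoff = af_from_vcf < 0.01
```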

These AF values are not filtered out by my expression, and I am looking for an alternative way to remove them. Any suggestions?

Answers

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator

    Hi @Oakman

    What version of GATK are you using? Please also send us the exact command you ran.

  • Oakman Member
    Accepted Answer

    Hi @bhanuGandham

    Sorry for the late reply; I took a couple of days off work to move house.
    I will double-check on Monday, but I believe this was my mistake and that the AF values in exponential notation are in fact filtered correctly. My apologies.

    I was led to this by misinterpreting the output of the bcftools stats command. It reports a distribution of AF, and after filtering by AF I was surprised to still see values below 0.01. I then checked my VCF file manually, found some of the exponential AF values there, and raised the issue in my post here. I now believe I checked the wrong file, the one from before filtering, but I will confirm this on Monday.
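
    A manual check like this can also be scripted. The following is only a rough Python sketch under a few assumptions: the VCF is uncompressed text, AF is stored in the INFO column, and VariantFiltration has flagged failing records in the FILTER column rather than removing them. The function name, the filter name "lowAF" and the example records are made up for illustration.

```python
def low_af_unflagged(vcf_lines, cutoff=0.01):
    """Return (CHROM, POS, lowest AF) for unflagged records with any AF below cutoff."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, filt, info = fields[0], fields[1], fields[6], fields[7]
        if filt not in ("PASS", "."):
            continue  # record already carries a filter flag
        for entry in info.split(";"):
            if entry.startswith("AF="):
                # Multi-allelic sites list one AF per ALT allele, comma-separated
                values = [float(v) for v in entry[3:].split(",") if v != "."]
                if any(v < cutoff for v in values):
                    hits.append((chrom, int(pos), min(values)))
    return hits


# Made-up records: one unflagged low-AF site, one flagged site, one common site
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tAC=1;AF=1.094e-03;AN=914",
    "chr1\t200\t.\tC\tT\t50\tlowAF\tAC=1;AF=1.094e-03;AN=914",
    "chr1\t300\t.\tG\tA\t50\tPASS\tAC=100;AF=0.109;AN=914",
]
hits = low_af_unflagged(example)
```

    On a real file, an open file handle can be passed in place of the example list. An empty result would suggest that no low-AF record escaped the filter.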

    I now understand that the AF distribution reported by bcftools stats is only an estimate (not a precise calculation from AN and AC) and is reported in mid-point bins. So even after filtering at 0.01, the bcftools stats output can still show some AF in a bin below 0.01 (I hope this makes sense).
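
    To illustrate the binning effect in general terms, here is a small Python sketch. The bin width and the mid-point labelling are assumptions chosen for the example and are not taken from the bcftools source, so the real bins may differ. It shows how values at or above a cutoff can still be reported under a bin label below that cutoff.

```python
import math


def midpoint_label(value, width):
    """Label a value by the mid-point of the fixed-width bin that contains it."""
    index = math.floor(value / width)
    return (index + 0.5) * width


cutoff = 0.01
bin_width = 0.015  # hypothetical bin width, chosen only for illustration

# AF values that would survive a filter removing everything below the cutoff
kept_afs = [0.010, 0.012, 0.014, 0.020]
labels = [midpoint_label(af, bin_width) for af in kept_afs]

# Values at or above the cutoff whose bin label nevertheless falls below it
misleading = [(af, label) for af, label in zip(kept_afs, labels) if label < cutoff]
```

    With these made-up numbers, the first three values all land in the bin labelled 0.0075, although none of them is below 0.01.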

    Oakman

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator

    Hi @Oakman

    Thank you for your update. This will be helpful to other users.
