All sites labeled as PASS after applying VariantFiltration to GenotypeGVCFs output

sp580 Germany Member

Hello!
I ran VariantFiltration on my joint-called SNP and indel VCF file (HaplotypeCaller -> CombineGVCFs -> GenotypeGVCFs), using the following command in GATK 4.0.6.0:

gatk VariantFiltration \
    -R path_to/genome.fa \
    -V path_to/joint_call_set.vcf \
    --genotype-filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" \
    --genotype-filter-name "my_snp_filter" \
    -O path_to/joint_call_set_HARD.vcf

According to the documentation on hard filtering (https://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set), each entry in the resulting file should be labeled "PASS" if it passed all of the filters, or "FILTER" if it failed any of them.

I checked how many entries remained after filtering and realized that all entries were kept:

## Lines in vcf body
grep -E -v "^#" joint_call_set_HARD.vcf | wc -l
20939832

## Lines flagged as PASS in column FILTER
grep -E -v "^#" joint_call_set_HARD.vcf | awk '{print $7}' | grep "PASS" | wc -l #the 7th column corresponds to "FILTER"
20939832

## Lines flagged as FILTER in column FILTER
grep -E -v "^#" joint_call_set_HARD.vcf | awk '{print $7}' | grep "FILTER" | wc -l #the 7th column corresponds to "FILTER"
0

Essentially, all entries passed the filters, which cannot be correct.

The file contains the genotype information for 60 samples. Could this have something to do with the issue (i.e. if one sample passes, the whole entry is labeled as PASS)?

This file is intended to be used as a training set for VariantRecalibrator.

I appreciate your feedback.

Cheers!

Best Answer

  • Geraldine_VdAuwera Cambridge, MA admin
    Accepted Answer

    Hi @sp580, your question made me realize that our documentation article did not explain clearly what happens when you run the filtering command. The doc incorrectly stated that filtered records would be tagged as FILTER in the filter field of the VCF, but in fact they are tagged with the name of the filter that you specify in your command. I corrected the document just now (but it may take a few hours for the live version to reflect the corrections).

    So in your case your grep command needs to search for my_snp_filter instead of FILTER. One way you could check for that in the future would be to flip your query around, e.g. search for records tagged with PASS instead. That would have shown you passing records. This is also something you can do when several filters have been applied and you don't want to have to list them all in your query, or when you don't know what they all are.
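
When the filter names are not known in advance, one way to inspect the FILTER column is to tally every distinct value it contains rather than grepping for a single tag. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF; the function names count_filter_values and count_non_passing are hypothetical helpers, not part of GATK.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the FILTER column (column 7) of a VCF.
# Assumes an uncompressed, tab-delimited VCF file.

# Tally each distinct FILTER value, most frequent first.
count_filter_values() {
    # Skip header lines, keep only the 7th tab-separated column, count each value.
    grep -v '^#' "$1" | cut -f 7 | sort | uniq -c | sort -rn
}

# Count records whose FILTER value is neither PASS nor "." (no filters applied).
count_non_passing() {
    grep -v '^#' "$1" | awk -F '\t' '$7 != "PASS" && $7 != "." { n++ } END { print n + 0 }'
}
```

For example, `count_filter_values path_to/joint_call_set_HARD.vcf` would list every tag present in the FILTER column together with its record count, so a custom name such as my_snp_filter shows up without having to be named in the query.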

Answers

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @sp580

    I am looking into this issue and will have an update for you by next week. Given the holiday week, we are backed up on our end, but I will definitely get to this by next week.

    Regards
    Bhanu
