Unexpected sites after filtering variants

sp580 (Germany, Member)

Hello,
I have filtered my call set with VariantFiltration using the following filter:

QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0

Then I extracted all sites (including filtered records) as a table with VariantsToTable.

Checking the range of each annotation among the records that passed the filter (labeled PASS in the FILTER field), I see unexpected behavior for MQ: some PASS records have MQ below 40.

For the rest of the annotations, the ranges appear as expected, as shown below:

QD:   2.000 (min),  46.37 (max)
FS:   0.000 (min), 60.00 (max)
MQ:   0.850 (min), 725.20 (max)
MQRankSum: -12.500 (min), 26.43 (max)
ReadPosRankSum:  -7.818 (min), 30.39 (max)

However, the number of unexpected records (PASS & MQ < 40.0) was only 95 (out of 13126298).

I am sure I am missing something in my interpretation, but the way I understand this, MQ should have a minimum greater than or equal to 40 for the records that passed.

Could you please help me to make sense of this?
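
The check described above can be reproduced with a short script. This is only a sketch: it assumes the VariantsToTable output is tab-delimited with a header row and uses NA for missing annotations, and the function names `unexpected_pass_rows` and `count_missing` are made up for illustration. The three sample rows are copied from the tables posted later in this thread.

```python
import csv
import io

ANNOTATIONS = ["QD", "FS", "MQ", "MQRankSum", "ReadPosRankSum"]


def unexpected_pass_rows(table_text, field="MQ", threshold=40.0):
    """Return PASS rows whose `field` is present and below `threshold`."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row for row in reader
        if row["FILTER"] == "PASS"
        and row[field] != "NA"
        and float(row[field]) < threshold
    ]


def count_missing(rows):
    """Count, per annotation, how many of the given rows have NA."""
    return {name: sum(row[name] == "NA" for row in rows) for name in ANNOTATIONS}


# Sample rows copied from the thread; a real run would read the table file instead.
header = ["CHROM", "POS", "FILTER"] + ANNOTATIONS
sample_rows = [
    ["1", "3001247", "filters_Harr2016", "4.41", "146.913", "48.01", "0.000", "-1.087"],
    ["1", "45062880", "PASS", "NA", "0.000", "9.00", "NA", "NA"],
    ["1", "65502466", "PASS", "NA", "3.565", "17.57", "0.408", "-1.926"],
]
table_text = "\n".join("\t".join(r) for r in [header] + sample_rows)

hits = unexpected_pass_rows(table_text)
for row in hits:
    print(row["CHROM"], row["POS"], row["MQ"])
print(count_missing(hits))
```

Counting the NA values among the unexpected records is worth doing early, because a pattern in the missing annotations can hint at why the filter did not fire.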

Best Answer

Answers

  • SkyWarrior (Turkey, Member ✭✭✭)

    No. The OR operator used in that filter expression returns true if any of the individual expressions returns true. That means you may have PASS variants with MQ lower than 40 if other criteria match the expression. If you use the AND operator instead, you will filter your variants less severely, because a record is filtered only when all the expressions are true at once.

  • sp580 (Germany, Member)

    I get that the OR operator should filter out variants if any of the individual expressions is TRUE.

    For example, for FS > 60:

    |CHROM |     POS|FILTER           |    QD|      FS|    MQ| MQRankSum| ReadPosRankSum|
    |1     | 3001247|filters_Harr2016 |  4.41| 146.913| 48.01|     0.000|         -1.087|
    |1     | 3001250|filters_Harr2016 |  4.56| 102.090| 53.57|     0.000|         -0.932|
    |1     | 3001251|filters_Harr2016 |  4.51|  90.464| 47.00|     0.442|         -0.932|
    |1     | 3006779|filters_Harr2016 |  5.06|  89.657| 60.06|     0.000|         -0.585|
    |1     | 3010834|filters_Harr2016 |  8.77|  89.289| 46.48|     0.000|         -0.123|
    |1     | 3010837|filters_Harr2016 | 12.83|  66.262| 68.10|     0.000|         -1.559|
    

    FS > 60 evaluates to TRUE and all records are filtered, even though the other expressions evaluate to FALSE... so far so good

    This is what happens with MQ < 40 (first 6 records out of 95 unexpected):

    |CHROM |       POS|FILTER | QD|    FS|    MQ| MQRankSum| ReadPosRankSum|
    |1     |  45062880|PASS   | NA| 0.000|  9.00|        NA|             NA|
    |1     |  65502466|PASS   | NA| 3.565| 17.57|     0.408|         -1.926|
    |1     | 112137267|PASS   | NA| 0.000| 13.09|        NA|             NA|
    |1     | 172938970|PASS   | NA| 0.000| 14.02|        NA|             NA|
    |10    |  93392660|PASS   | NA| 0.000| 15.37|        NA|             NA|
    |11    |  55903055|PASS   | NA| 0.000| 17.33|        NA|             NA|
    

    MQ < 40 evaluates to TRUE, but the records are still PASS. Strangely, most of these records have FS=0.000 and NA for QD, MQRankSum and ReadPosRankSum.

    I still do not know what to make of these 95 odd records: why do they PASS if MQ < 40 evaluates to TRUE?

  • sp580 (Germany, Member)

    Thanks @shlee

    I guess I will have to re-run the analysis :-( either by using parentheses around each expression or by using multiple filter flags.
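
Why one filter per annotation can behave differently from a single combined expression is easier to see with a toy model. The sketch below is not GATK code. It assumes, based only on the NA pattern in the 95 records above, that a comparison against a missing annotation makes the whole expression fail to evaluate, and that a failed evaluation leaves the record unfiltered. All names (`CLAUSES`, `combined_filter`, `separate_filters`) are hypothetical, and the two records are taken from the tables in this thread.

```python
# Toy model only: the handling of missing annotations is an assumption,
# not a description of how VariantFiltration is implemented.
CLAUSES = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


class MissingAnnotation(Exception):
    """Raised when a clause refers to an annotation the record lacks."""


def check(record, name, test):
    value = record.get(name)
    if value is None:
        raise MissingAnnotation(name)
    return test(value)


def combined_filter(record):
    """One expression joined with OR, evaluated left to right."""
    try:
        return any(check(record, name, test) for name, test in CLAUSES)
    except MissingAnnotation:
        return False  # assumed: failed evaluation means the record is not filtered


def separate_filters(record):
    """One filter per annotation; a missing value only skips its own filter."""
    failed = []
    for name, test in CLAUSES:
        try:
            if check(record, name, test):
                failed.append(name)
        except MissingAnnotation:
            pass
    return failed


# 1:45062880 (PASS in the thread) and 1:3001247 (filtered in the thread).
low_mq = {"QD": None, "FS": 0.000, "MQ": 9.00, "MQRankSum": None, "ReadPosRankSum": None}
high_fs = {"QD": 4.41, "FS": 146.913, "MQ": 48.01, "MQRankSum": 0.000, "ReadPosRankSum": -1.087}

print(combined_filter(low_mq), separate_filters(low_mq))
print(combined_filter(high_fs), separate_filters(high_fs))
```

Under these assumptions the low-MQ record escapes the combined expression because QD is missing, while the per-annotation version still flags it on MQ. Whether the real tool behaves this way should be confirmed against the VariantFiltration documentation for the version in use.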
