Unexpected sites after filtering variants

sp580 · Germany · Member

Hello,
I have filtered my call set with VariantFiltration using the following filter:

QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0

Then I extracted all sites (including filtered records) as a table with VariantsToTable.

Checking the range of each annotation among the records that passed the filter (labeled PASS in the FILTER field), I see unexpected behavior for MQ: some PASS records have MQ below 40.

For the rest of the annotations, the ranges appear as expected, as shown below:

QD:   2.000 (min),  46.37 (max)
FS:   0.000 (min), 60.00 (max)
MQ:   0.850 (min), 725.20 (max)
MQRankSum: -12.500 (min), 26.43 (max)
ReadPosRankSum:  -7.818 (min), 30.39 (max)

However, the number of unexpected records (PASS & MQ < 40.0) was only 95 (out of 13126298).

I am sure I am missing something in my interpretation, but the way I understand this, MQ should have a minimum greater than or equal to 40 for the records that passed.
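The check described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the VariantsToTable output is tab-delimited with `NA` for missing annotations, and the in-memory `TABLE` string and the helper names `to_float` and `unexpected_pass_records` are hypothetical stand-ins. The rows are copied from the tables posted further down in this thread.

```python
import csv
import io

# Toy stand-in for a VariantsToTable output file (assumed tab-delimited,
# with "NA" for missing annotations). Rows are copied from this thread.
TABLE = (
    "CHROM\tPOS\tFILTER\tQD\tFS\tMQ\tMQRankSum\tReadPosRankSum\n"
    "1\t3001247\tfilters_Harr2016\t4.41\t146.913\t48.01\t0.000\t-1.087\n"
    "1\t45062880\tPASS\tNA\t0.000\t9.00\tNA\tNA\n"
    "1\t65502466\tPASS\tNA\t3.565\t17.57\t0.408\t-1.926\n"
)


def to_float(text):
    """Convert a table cell to float, mapping 'NA' to None."""
    return None if text == "NA" else float(text)


def unexpected_pass_records(handle, threshold=40.0):
    """Return PASS rows whose MQ is present and below the threshold."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        mq = to_float(row["MQ"])
        if row["FILTER"] == "PASS" and mq is not None and mq < threshold:
            hits.append(row)
    return hits


hits = unexpected_pass_records(io.StringIO(TABLE))
for row in hits:
    # Print position and MQ of every PASS record with MQ < 40
    print(row["CHROM"], row["POS"], row["MQ"])
```

For a real file, the `io.StringIO(TABLE)` argument would be replaced by an open file handle.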

Could you please help me to make sense of this?

Answers

  • SkyWarrior · Turkey · Member ✭✭✭

    No. The OR operator used in that filter expression returns true if any of the individual expressions returns true. That means you may have PASS variants with MQ lower than 40 if the other criteria match the expression. If you use the AND operator instead, you will filter your variants less severely, because a variant is only filtered when all of the expressions are true at once.

  • sp580 · Germany · Member

    I get that the OR operator should filter out variants if any of the individual expressions is TRUE.

    For example, for FS > 60:

    |CHROM |     POS|FILTER           |    QD|      FS|    MQ| MQRankSum| ReadPosRankSum|
    |1     | 3001247|filters_Harr2016 |  4.41| 146.913| 48.01|     0.000|         -1.087|
    |1     | 3001250|filters_Harr2016 |  4.56| 102.090| 53.57|     0.000|         -0.932|
    |1     | 3001251|filters_Harr2016 |  4.51|  90.464| 47.00|     0.442|         -0.932|
    |1     | 3006779|filters_Harr2016 |  5.06|  89.657| 60.06|     0.000|         -0.585|
    |1     | 3010834|filters_Harr2016 |  8.77|  89.289| 46.48|     0.000|         -0.123|
    |1     | 3010837|filters_Harr2016 | 12.83|  66.262| 68.10|     0.000|         -1.559|
    

    FS > 60 evaluates to TRUE and all records are filtered, even though the other expressions evaluate to FALSE... so far so good
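As a minimal sketch of the OR logic described here, the filter can be modelled as a plain Python function using the thresholds from the original post. The function name `fails_filter` is hypothetical, and this models only the boolean logic when every annotation is present, not how VariantFiltration itself evaluates the expression.

```python
def fails_filter(qd, fs, mq, mq_rank_sum, read_pos_rank_sum):
    """True if any sub-expression is true, i.e. the record gets filtered."""
    return (
        qd < 2.0
        or fs > 60.0
        or mq < 40.0
        or mq_rank_sum < -12.5
        or read_pos_rank_sum < -8.0
    )


# (QD, FS, MQ, MQRankSum, ReadPosRankSum) for two rows of the table above.
records = [
    (4.41, 146.913, 48.01, 0.000, -1.087),
    (12.83, 66.262, 68.10, 0.000, -1.559),
]
results = [fails_filter(*r) for r in records]
print(results)  # [True, True]: FS > 60 alone is enough to filter both
```

Under this logic, a record with all annotations present and MQ below 40 would be filtered as well, so a PASS record with MQ < 40 is not explained by the OR operator on its own.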

    This is what happens with MQ < 40 (first 6 records out of 95 unexpected):

    |CHROM |       POS|FILTER | QD|    FS|    MQ| MQRankSum| ReadPosRankSum|
    |1     |  45062880|PASS   | NA| 0.000|  9.00|        NA|             NA|
    |1     |  65502466|PASS   | NA| 3.565| 17.57|     0.408|         -1.926|
    |1     | 112137267|PASS   | NA| 0.000| 13.09|        NA|             NA|
    |1     | 172938970|PASS   | NA| 0.000| 14.02|        NA|             NA|
    |10    |  93392660|PASS   | NA| 0.000| 15.37|        NA|             NA|
    |11    |  55903055|PASS   | NA| 0.000| 17.33|        NA|             NA|
    

    MQ < 40 evaluates to TRUE, but the records still PASS. Strangely, most of these records have FS=0.000 and NA for QD, MQRankSum and ReadPosRankSum.

    I still do not know what to make of these 95 odd records: why do they PASS if MQ < 40 evaluates to TRUE?

  • sp580 · Germany · Member

    Thanks @shlee

    I guess I will have to re-run the analysis :-( either by using parentheses around each expression or by using multiple filter flags.
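The difference between one combined expression and several separate filters can be illustrated with a toy model. It rests on an assumption suggested by the NA values in the table above: an expression that cannot be evaluated because an annotation is missing is treated as not matching, so the record passes. The names `THRESHOLDS`, `combined_filter` and `separate_filters` are hypothetical, and Python's `TypeError` on comparing `None` with a number merely stands in for a failed evaluation. This is a sketch of the idea, not GATK's implementation.

```python
# Thresholds from the filter expression in the original post.
THRESHOLDS = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def combined_filter(record):
    """One OR-ed expression; an evaluation error means 'no match' (assumed)."""
    try:
        return any(test(record[name]) for name, test in THRESHOLDS)
    except TypeError:  # None compared with a number: a missing annotation
        return False


def separate_filters(record):
    """One expression per annotation, each evaluated independently."""
    matched = []
    for name, test in THRESHOLDS:
        try:
            if test(record[name]):
                matched.append(name)
        except TypeError:
            continue  # this annotation is missing; the others still count
    return matched


# First unexpected record from the table above (NA modelled as None).
record = {"QD": None, "FS": 0.000, "MQ": 9.00,
          "MQRankSum": None, "ReadPosRankSum": None}

print(combined_filter(record))   # False: evaluation stops at the missing QD
print(separate_filters(record))  # ['MQ']: the MQ filter still fires
```

In this model the combined expression never reaches the MQ test, because the missing QD aborts the evaluation first, while separate filters let MQ < 40 flag the record on its own.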
