Filtering does not work due to missing MQ values

anapaulamachado, Lausanne, Member
edited July 2018 in Ask the GATK team

Hello,

I'm trying to filter my VCF on the MQ value. I've done this for other annotations using VariantFiltration with no problems, but MQ is just not cooperating.
At first I had it in an expression together with the other filters. When I checked the output, I realized the MQ filter was not working: there were lots of calls with values that should have been filtered.

Then I ran it on its own (--filterExpression "MQ<40.0"). It starts running fine, then crashes when it encounters a call with missing data for MQ. Error message:
Message: for input string "nan"

I've checked with grep and there are indeed some missing MQ values (why?), but it seems to me the tool should be able to ignore these. I can't find an option to tell it to ignore them. Is there a way to do it?
I also tried with SelectVariants and got the exact same error.

I'd appreciate any help you can give me here!
Thanks,
Ana

PS - I'm on GATK 3.7
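
Before choosing a workaround, it can help to know how many records are affected. The following is a minimal Python sketch, not part of GATK, that tallies data lines by whether their MQ annotation is a finite number, a non-finite or unparseable value such as "nan", or absent. It assumes a plain-text, tab-separated VCF with INFO in the eighth column; the function names `parse_info`, `mq_status`, and `count_mq` are hypothetical.

```python
import math
from collections import Counter


def parse_info(info):
    """Turn a VCF INFO column into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def mq_status(line):
    """Classify one VCF data line as 'finite', 'non-finite', or 'absent'."""
    cols = line.rstrip("\n").split("\t")
    mq = parse_info(cols[7]).get("MQ") if len(cols) > 7 else None
    if mq is None or mq is True or mq in ("", "."):
        return "absent"
    try:
        return "finite" if math.isfinite(float(mq)) else "non-finite"
    except ValueError:
        return "non-finite"  # not parseable as a number at all


def count_mq(lines):
    """Count data lines per MQ status, skipping header and blank lines."""
    return Counter(
        mq_status(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
```

With an uncompressed file, something like `count_mq(open("calls.vcf"))` would return the tally; `calls.vcf` is a placeholder name.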

Answers

  • anapaulamachado, Lausanne, Member
    Accepted Answer

    @Sheila
    Hi again,
    In this case, using GATK 3.8.1 worked!
    It applied the filter without any error and also removed the NA calls.

    Thanks a lot :)
    Ana

  • кфьутылн, Member
    edited November 2019

    Hi @Sheila ,

    Unfortunately, we seem to hit the same bug with 3.8-1-0-gf15c1c3ef as well; please see below.

    Would you have any suggestions? Many thanks,
    Vasily Ramensky.

    $ gatk3 -T VariantFiltration \
    -R /mnt/jupiter/storage/genomes/hg38_canon_bwa/hg38.fa \
    -o output_f.vcf \
    --variant $infile \
    --clusterSize 3 \
    --clusterWindowSize 10 \
    --filterExpression "MQ < 40.00" \
    --filterName "LowMQ"

    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "nan"
    ERROR ------------------------------------------------------------------------------------------
  • P.S. The "nan" problem is caused by HaplotypeCaller output cases like this:

    chr2 239899068 chr2:239899068_G/A G A 15.65 . AC=2;AF=1;AN=2;DP=0;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=nan;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,0:0:3:42,3,0

  • bhanuGandham, Cambridge MA, Member, Administrator, Broadie, Moderator
    edited November 2019

    Hi @кфьутылн

    We do not support GATK3 anymore. Please upgrade to the latest version of GATK4.
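
For anyone who cannot upgrade right away, one possible workaround is to set the affected records aside before filtering. The MQ=nan record quoted in the thread also has DP=0, so the value may simply be undefined because no reads support the site; that reading is an assumption, not something confirmed in the thread. The sketch below is plain Python, not a GATK feature. It assumes an uncompressed, tab-separated VCF, and `has_unusable_mq` and `split_records` are hypothetical names. Records with no MQ annotation at all are left in place.

```python
import math
import re

# Matches the MQ key only when it starts the INFO column or follows a ';',
# so other keys such as MQRankSum or RAW_MQ are not picked up.
MQ_PATTERN = re.compile(r"(?:^|;)MQ=([^;]*)")


def has_unusable_mq(line):
    """True if the record carries an MQ value that is not a finite number."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False
    match = MQ_PATTERN.search(cols[7])
    if match is None:
        return False  # no MQ annotation: leave the record alone
    try:
        return not math.isfinite(float(match.group(1)))
    except ValueError:
        return True  # present but not parseable as a number


def split_records(lines):
    """Return (kept, set_aside); header lines always go to kept."""
    kept, set_aside = [], []
    for line in lines:
        if not line.startswith("#") and has_unusable_mq(line):
            set_aside.append(line)
        else:
            kept.append(line)
    return kept, set_aside
```

Writing `kept` to a new file gives an input with no MQ=nan records, while `set_aside` preserves the removed records for inspection rather than discarding them silently.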
