Problemm with hard filtering

Hi All,
I am trying to use GATK for variant filtration (hard filtering) for non human species, using the below command after choosing SNPs only:

java -d64 -Xmx48g -jar /home/mbxao2/R-drive/tools/GATK/GenomeAnalysisTK.jar \
-R Ref.fa \
-T VariantFiltration \
-V input.vcf \
-o output.vcf \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "HARD_TO_VALIDATE" \
--filterExpression "DP < 5 " \
--filterName "LowCoverage" \
--filterExpression "QUAL < 30.0 " \
--filterName "VeryLowQual" \
--filterExpression "QUAL > 30.0 && QUAL < 50.0 " \
--filterName "LowQual" \
--filterExpression "QD < 2 " \
--filterName "LowQD" \
--filterExpression "MQRankSum < -12.5" \
--filterName "default_SNP_filter" \
--filterExpression "ReadPosRankSum < -8.0" \
--filterName "default_SNP_filter" \
--filterExpression "FS > 200 " \
--filterName "StrandBias"

but the below error appears for me:

INFO 15:24:26,590 GenomeAnalysisEngine - Preparing for traversal
INFO 15:24:26,611 GenomeAnalysisEngine - Done preparing for traversal
INFO 15:24:26,611 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 15:24:26,611 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 15:24:26,612 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

ERROR --
ERROR stack trace

java.lang.NumberFormatException: For input string: "4.21"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:906)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.lessThan(JexlArithmetic.java:774)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:967)
at org.apache.commons.jexl2.parser.ASTLTNode.jjtAccept(ASTLTNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:178)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:94)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:341)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:483)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.buildVCfilters(VariantFiltration.java:474)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:379)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:318)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:99)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "4.21"
ERROR ------------------------------------------------------------------------------------------

I couldn't find any information about the error, please is there anything to do for this issue.

Thanks in advance,

Ahmed

Answers

  • EADGEADG KielMember

    Hi Ahmed_Aljumaili,

    hm did you try the following ?
    => A different VCF ?
    => Grep for 4.21 in your VCF ?
    =>Reduce the number of Filters? Perhaps only running with --filterName "HARD_TO_VALIDATE" \
    --filterExpression "DP < 5 " \ ?
    => an other Version of gatk?

  • Hi EADG,

    Thanks for replying. I have reduced the number of filters as you advised but the analysis crashed and showed the below:

    WARN 20:00:34,828 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,828 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,829 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0
    WARN 20:00:34,831 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0

    For GATK version, I am using v 3.7.

    I have tried to grep the error ("4.21"), but couldn't find it inside the file.

    Thanks,

    Ahmed

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Ahmed_Aljumaili
    Hi Ahmed,

    If you search for the error in the forum, you will get some helpful threads.

    -Sheila

  • @Sheila
    Hi Sheila,

    I have searched before and tried again but I couldn't find anything useful in the forum.

    Thanks for your help,

    Ahmed

  • EADGEADG KielMember

    Hi @Ahmed_Aljumaili ,

    this error: WARN 20:00:34,830 Interpreter - ![0,3]: 'MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1);' undefined variable MQ0 means that you have no fields in your vcf which is named MQ0. Please check...maybe you got a field named like MQ0F for the fraction.

    Greetings EADG

  • @EADG
    Hi,

    I have noticed it is MQ only not MQ0, so I have changed it and used the two parameters that you suggested at the beginning ((HARD_TO_VALIDATE and DP)). the error in this new case is {ERROR MESSAGE: For input string: "44.88"}.

    Is there any new idea,

    kindest regards,

  • EADGEADG KielMember
    HI @Ahmed_Aljumaili just change the 4 to 4.0 otherwise He would expect an int instead of a Double.
  • @EADG
    Hi,

    It is now back to the original error ((ERROR MESSAGE: For input string: "4.21")).
    is there anyone using this version of GATK 3.7? because in the error message saying it could be a bug.
    I don't know what to do next.
    Thanks,

    Ahmed

  • EADGEADG KielMember

    Hi @Ahmed_Aljumaili,

    do you change the QD and the FS to double values too ?

    --filterExpression "QD < 2.0 " \ --filterExpression "FS > 200.0 " \
    Greetings EADG

  • Hi @EADG
    I used the double values too, it still gives the same error.

    best regards,

    Ahmed

  • EADGEADG KielMember

    Hi @Ahmed_Aljumaili,

    hm can you post the first lines of your vcf here ?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Ahmed_Aljumaili
    Hi Ahmed,

    Can you try running each filter separately to check which one(s) throw an error?

    -Sheila

  • Hi @EADG,

    please see below the first lines of my vcf file.

    Thanks,

    Ahmed

    fileformat=VCFv4.2

    ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">

    FILTER=<ID=LowQual,Description="Low quality">

    FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

    FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

    FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

    FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

    FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">

    FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">

    FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records w

    FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

    FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">

    FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">

    GATKCommandLine.GenotypeGVCFs=<ID=GenotypeGVCFs,Version=3.7-0-gcfedb67,Date="Wed May 31 14:31:55 BST 2017",Epoch=1496237515862,CommandLineOptions="analysis_type=Genotyp

    GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Mon May 29 20:07:18 BST 2017",Epoch=1496084838744,CommandLineOptions="analysis_type=Hap

    GATKCommandLine.SelectVariants=<ID=SelectVariants,Version=3.7-0-gcfedb67,Date="Fri Jun 02 19:35:02 BST 2017",Epoch=1496428502044,CommandLineOptions="analysis_type=Selec

    INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">

    INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

    INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">

    INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

    INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">

    INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">

    INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

    INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">

    INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">

    INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">

    INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">

    INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">

    INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-W

    INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele,

    INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele,

    INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">

    INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">

    INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">

  • Ahmed, I had a similar problem when using '--filterExpression VQSLOD <= 0.' It turns out GATK was expecting a double but received an int. It might be nice if GATK was smart enough to convert integers to floats internally to avoid this problem. In any case, I resolved the error by doing '--filterExpression VQSLOD <= 0.0'.

    Can you try changing all integers in your filter expressions to floats? Like so:

    --filterExpression "MQ0 >= 4.0 && ((MQ0 / (1.0 * DP)) > 0.1)" \
    --filterName "HARD_TO_VALIDATE" \
    --filterExpression "DP < 5.0 " \
    --filterName "LowCoverage" \
    --filterExpression "QUAL < 30.0 " \
    --filterName "VeryLowQual" \
    --filterExpression "QUAL > 30.0 && QUAL < 50.0 " \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "LowQD" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "default_SNP_filter" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "default_SNP_filter" \
    --filterExpression "FS > 200.0 " \

  • Dear
    @amr@broadinstitute.orge, thanks for helping me. I have tried what you mentioned but it didn't work.

    Dear @Sheila

    when I use each filter separately, 4 of them crashed and they could the source of the problem:
    1- -filterExpression "MQ0 >= 4.0 && ((MQ0 / (1.0 * DP)) > 0.1)" \
    --filterName "HARD_TO_VALIDATE" \

    2- --filterExpression "QD < 2.0" \
    --filterName "LowQD" \

    3- --filterExpression "MQRankSum < -12.5" \
    --filterName "default_SNP_filter" \

    4- --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "default_SNP_filter" \

    The rest of filters worked smoothly. Is there an explanation why these filters didn't work?

    Thanks for you All,

    Ahmed

  • Hi All,

    sorry if I am rushing to the answer!

    Is there any way to solve my problem.

    Thanks,

    Ahmed

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Ahmed_Aljumaili
    Hi Ahmed,

    Hmm. Those (with the exception of 1) look fine. The issue with 1 is that DP sometime is 0, and you cannot divide by 0. But, 2-4 look fine. Can you please try with the latest nightly build?

    Thanks,
    Sheila

  • @Sheila
    Hi Sheila,

    I have used the latest nightly build, it only works for no. 2 with warning message during the process.
    it didn't work for the others.

    Regards,

    Ahmed

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Ahmed_Aljumaili
    Hi Ahmed,

    Okay. Can you submit a bug report? Instructions are here.

    Thanks.
    Sheila

  • @Sheila

    Hi Sheila,

    I have been trying to log in ftp but I couldn't, the error message below
    ((550 /bundle/: No such file or directory))

    What I did is simply using the information from the page you told me about as a link:
    ftp://gsapubftp@ftp.broadinstitute.org/bundle/

    Thanks,

    Ahmed

  • @Sheila
    Hi,

    is there any other way to submit a bug since the first one didn't work?

    best wishes,

    Ahmed

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Ahmed_Aljumaili
    Hi Ahmed,

    Sorry you are having so much trouble! Can you try a few more times? Is your VCF small enough to attach here?

    Thanks,
    Sheila

Sign In or Register to comment.