The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block.
Powered by Vanilla. Made with Bootstrap.
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

"ArithmeticException" in VariantFiltration and "Invalid JEXL expression detected" in SelectVariants

zugzugzugzug Member Posts: 20
edited September 2012 in Ask the GATK team

Hi,

I have been trying get variants out of a VCF file where the Allele Frequency (AF) is greater than 4%. I have tried both VariantFiltration and SelectVariants but I get different errors with each. Here is my call for SelectVariants:

java -Xmx4g -jar ~/tools/bin/GenomeAnalysisTK.jar -R /home/genome/human_g1k_v37.truseq_mask.fasta -T SelectVariants -o S05-16209-1C_S4_L001_R1_001.30.10.sorted.3perc.vcf --variant S05-16209-1C_S4_L001_R1_001.30.10.sorted.vcf -select "AF > 0.04" -sn "S05-16209-1C_S4_L001_R1_001"

The error is:

MESSAGE: Invalid command line: Invalid JEXL expression detected for select-0 with message ![0,9]: 'AF > 0.04;' > error

For VariantFiltration the call is:

java -Xmx4g -jar ~/tools/bin/GenomeAnalysisTK.jar -R /home/genome/human_g1k_v37.truseq_mask.fasta -T VariantFiltration -o S05-16209-1C_S4_L001_R1_001.30.10.sorted.3perc.vcf --variant S05-16209-1C_S4_L001_R1_001.30.10.sorted.vcf --filterExpression 'AF > 0.040' --filterName "3perc"

The error is:

java.lang.ArithmeticException: Double coercion: java.util.ArrayList:([0.010, 0.010])
at org.apache.commons.jexl2.JexlArithmetic.toDouble(JexlArithmetic.java:1023)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:699)
at org.apache.commons.jexl2.JexlArithmetic.greaterThan(JexlArithmetic.java:790)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:796)
at org.apache.commons.jexl2.parser.ASTGTNode.jjtAccept(ASTGTNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at org.broadinstitute.sting.utils.variantcontext.JEXLMap.evaluateExpression(VariantJEXLContext.java:267)
at org.broadinstitute.sting.utils.variantcontext.JEXLMap.get(VariantJEXLContext.java:233)
at org.broadinstitute.sting.utils.variantcontext.JEXLMap.get(VariantJEXLContext.java:118)
at org.broadinstitute.sting.utils.variantcontext.VariantContextUtils.match(VariantContextUtils.java:293)
at org.broadinstitute.sting.gatk.walkers.filters.VariantFiltration.filter(VariantFiltration.java:331)
at org.broadinstitute.sting.gatk.walkers.filters.VariantFiltration.map(VariantFiltration.java:270)
at org.broadinstitute.sting.gatk.walkers.filters.VariantFiltration.map(VariantFiltration.java:80)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

For both I have tried variations of double quotes and different sigfigs. Also, it works when I select on parameters other than AF.

Am I missing something?

Post edited by Geraldine_VdAuwera on

Best Answer

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin
    Accepted Answer

    Your problem is that for multi-allelic variants the AF is not a double - it's a list of doubles. So the expression "AF > 0.04" is invalid in those cases.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,405 admin

    Have you checked that all your variants have the AF annotation? If it's missing in some of the variants in your VCF, the evaluation will fail -- and the error message tends to be unhelpful.

    Geraldine Van der Auwera, PhD

  • zugzugzugzug Member Posts: 20

    Thank you for the prompt response!

    I just performed the following command and all I got back was the header: grep -v "AF=" ../MiSeq_vs_Torrent-GATK/MiSeq/4plex/S05-16209-1C_S4_L001_R1_001.30.10.sorted.vcf | less

    Any other ideas?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,405 admin

    If I remember correctly you should be able to extract variants that have the AF annotation with this command:

    java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref.fasta --variant my.vcf -o withAF.vcf -select 'vc.hasAttribute("AF")'
    

    Then you can run your original command on the output subsetted VCF.

    I'll see if others on the team can chime in -- I don't use JEXL much myself so there may be a more efficient way to do this -- but this should at least give you a way forward for now.

    For more detail on using JEXL expressions, see http://gatkforums.broadinstitute.org/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk

    Geraldine Van der Auwera, PhD

  • zugzugzugzug Member Posts: 20

    I'll give that a shot in a bit, but there weren't any lines missing AF. I have also been through the JEXL page and didn't find anything that I did incorrectly. Though I certainly may have missed it... :)

    Thanks for your help!

  • zugzugzugzug Member Posts: 20

    Let clarify that the grep command I did before means that ALL variants had the AF annotation.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,405 admin

    Ah, thanks for the clarification, that's not what I had understood -- then I'm not sure what your problem might be, since your expression is pretty simple and seems correctly written.

    The error message suggests a type error, so you could try checking how the values are written in your VCF. If any of your AF fields are written as anything other than doubles (maybe a one is a 0 instead of 0.0, causing it to be read in as an integer), that would cause the evaluation against your query to fail. Maybe try to run on a few short intervals, and see if you can isolate some records on which your query works and some on which it doesn't, then compare what is different between them.

    Good luck!

    Geraldine Van der Auwera, PhD

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin
    Accepted Answer

    Your problem is that for multi-allelic variants the AF is not a double - it's a list of doubles. So the expression "AF > 0.04" is invalid in those cases.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • zugzugzugzug Member Posts: 20
    edited September 2012

    Yes, the list of AF's was, in fact, the problem. Two questions: (1) What do you suggest is the best approach for handling these situations; and (2) Do you all plan to handle this situation within the tools in the near future?

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin

    If you want to use the AF filter with JEXL expressions then you'll need to skip multi-allelic sites. Unfortunately, it isn't something we plan on addressing within the JEXL code anytime soon.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • zugzugzugzug Member Posts: 20

    Ok, by "skip" you mean the argument --restrictAlleles BIALLELIC in SelectVariants. Thanks again!

  • nroaknroak HoustonMember Posts: 30

    @zugzug said:
    Ok, by "skip" you mean the argument --restrictAlleles BIALLELIC in SelectVariants. Thanks again!

    This did the trick for me. Thank you @zugzug !

Sign In or Register to comment.