Using SelectVariants to output "PASS" records

WaltL Posts: 10 Member
edited January 2013 in Ask the GATK team

I have completed filtering my SNP data using VariantFiltration, and now I want to use SelectVariants to output all calls marked "PASS" in the FILTER field. I used the following script, but only the header information writes to the output file.

java -Xmx20g -jar GenomeAnalysisTK.jar -T SelectVariants -R HC.fa --variant HC.SNPs.filtered.vcf -select "FILTER == 'PASS'" -o HC.SNPs.passed.vcf

My input file contains many records that should evaluate as true. Any idea why this doesn't work?

Answers

  • ebanks Posts: 683 GATK Developer mod

    PASS means that the record is not filtered at all, so that expression won't work. You will need to use a more advanced JEXL expression (checking whether the VariantContext is filtered or not). See the docs on using JEXL expressions for more details.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • WaltL Posts: 10 Member

    I did look at this JEXL doc first: http://gatkforums.broadinstitute.org/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk, and that's why I used the above expression, e.g. "MY_STRING_KEY == 'foo'"

    So, are you saying that a string designating a filtered record should work in this context? Because it does not. For example, I have a filter field name called LowQual, and if I run the cmd using -select "Filter == 'LowQual'" it also returns only the header info.

    VariantContext is only mentioned under the "More Complex JEXL Magic" ("not for the faint of heart") section. Perhaps this could be updated somewhere to reflect that if one wants to select subsets of records based on their FILTER field entries, it cannot be done using the general expression given to select strings. I would think that filtering data followed by selecting only those that pass the filtering process would be a fairly common thing to want to do. Perhaps not...

    In any case, if anyone else wants to do this, here's an expression to select all of your PASS records:

    -select 'vc.isNotFiltered()'
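
One way to sanity-check the result, after substituting that expression for the -select argument in the original command, is to pull the PASS records straight out of the VCF text with awk and compare record counts. This is only a sketch: the helper name select_pass and the four-line demo VCF are made up for illustration, and it assumes a tab-separated VCF with FILTER in the 7th column. It matches the literal string PASS, so the counts may differ from the SelectVariants output if the file also contains records with "." in the FILTER column.

```shell
# Hypothetical helper: print the header lines plus every record whose
# FILTER column (7th tab-separated field) is exactly PASS.
select_pass() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Tiny made-up VCF for demonstration; point select_pass at your own
# file (e.g. HC.SNPs.filtered.vcf) instead.
demo=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$demo"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> "$demo"
printf 'chr1\t200\t.\tC\tT\t10\tLowQual\t.\n' >> "$demo"

# Prints the two header lines and the single PASS record.
select_pass "$demo"
```

Piping the output through grep -vc '^#' gives the number of selected records, which can be compared against the same count taken on HC.SNPs.passed.vcf.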

  • Geraldine_VdAuwera Posts: 6,412 Administrator, GATK Developer admin

    Filtering then selecting what passes can more easily be done by directly selecting on whatever parameter you're using to filter...

    But generally the problem is that JEXL is a slippery topic. There's a lot of "it depends what you're trying to do", and we have a lot of other, more straightforward docs that need to be updated/spruced up. It'll be a while before we get around to revamping the JEXL doc, sorry. So, posting your solution is definitely helpful and we thank you for doing so!

    Geraldine Van der Auwera, PhD
