Using SelectVariants to output "PASS" records

WaltL (Member)
edited January 2013 in Ask the GATK team

I have completed filtering my SNP data using VariantFiltration, and now I want to use SelectVariants to output all calls marked "PASS" in the FILTER field. I used the following script, but only the header information writes to the output file.

java -Xmx20g -jar GenomeAnalysisTK.jar -T SelectVariants -R HC.fa --variant HC.SNPs.filtered.vcf -select "FILTER == 'PASS'" -o HC.SNPs.passed.vcf

My input file contains many records that should evaluate as true. Any idea why this doesn't work?

Answers

  • ebanks, Broad Institute (Member, Broadie, Dev)

    PASS means that the record is not filtered at all, so that expression won't work. You will need to use a more advanced JEXL expression (checking whether the VariantContext is filtered or not). See the docs on using JEXL expressions for more details.

  • WaltL (Member)

    I did look at this JEXL doc first: http://gatkforums.broadinstitute.org/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk, and that's why I used the above expression, e.g. "MY_STRING_KEY == 'foo'"

    So, are you saying that a string designating a filtered record should work in this context? Because it does not. For example, I have a filter field name called LowQual, and if I run the cmd using -select "Filter == 'LowQual'" it also returns only the header info.

    VariantContext is only mentioned under the "More Complex JEXL Magic" ("not for the faint of heart") section. Perhaps this could be updated somewhere to reflect that if one wants to select subsets of records based on their FILTER field entries, it cannot be done using the general expression given to select strings. I would think that filtering data followed by selecting only those that pass the filtering process would be a fairly common thing to want to do. Perhaps not...

    In any case, if anyone else wants to do this, here's an expression to select all of your PASS records:

    -select 'vc.isNotFiltered()'

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Filtering then selecting what passes can more easily be done by directly selecting on whatever parameter you're using to filter...

    But generally the problem is that JEXL is a slippery topic. There's a lot of "it depends what you're trying to do", and we have a lot of other, more straightforward docs that need to be updated/spruced up. It'll be a while before we get around to revamping the JEXL doc, sorry. So, posting your solution is definitely helpful and we thank you for doing so!
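
For anyone landing on this thread later, here is a minimal sketch that pulls the pieces above together. The commented-out line is simply the command from the question with the 'vc.isNotFiltered()' expression swapped in; it is left as a comment so the block runs without GATK installed. The rest is a plain awk cross-check, not a GATK feature: it keeps header lines plus records whose FILTER column (column 7) is PASS or ".", which should correspond to what the expression selects, assuming never-filtered records count as not filtered. The demo.vcf file and its three records are made up purely for illustration.

```shell
# Command from the question with the working expression swapped in
# (commented out so this sketch runs without GATK installed):
# java -Xmx20g -jar GenomeAnalysisTK.jar -T SelectVariants -R HC.fa --variant HC.SNPs.filtered.vcf -select 'vc.isNotFiltered()' -o HC.SNPs.passed.vcf

# Hypothetical three-record VCF, for illustration only.
printf '##fileformat=VCFv4.1\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT\t12\tLowQual\t.\n' >> demo.vcf
printf 'chr1\t300\t.\tG\tA\t60\tPASS\t.\n' >> demo.vcf

# Keep header lines plus records whose FILTER column (7th) is PASS or ".".
awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "."' demo.vcf > demo.passed.vcf

# Count the non-header records that survived (2 for this demo file).
grep -vc '^#' demo.passed.vcf
```

Running the same count on the real SelectVariants output and on an awk-filtered copy of the input is a cheap way to confirm the JEXL expression selected what was intended.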
