
Using SelectVariants to output "PASS" records

WaltL (Member) Posts: 10
edited January 2013 in Ask the GATK team

I have finished filtering my SNP data with VariantFiltration, and now I want to use SelectVariants to output all calls marked "PASS" in the FILTER field. I used the following command, but only the header information is written to the output file.

java -Xmx20g -jar GenomeAnalysisTK.jar -T SelectVariants -R HC.fa --variant HC.SNPs.filtered.vcf -select "FILTER == 'PASS'" -o HC.SNPs.passed.vcf

My input file contains many records that should evaluate as true. Any idea why this doesn't work?


Answers

  • ebanks (Broad Institute; Member, Broadie, Dev) Posts: 692

    PASS means that the record is not filtered at all, so that expression won't work. You will need to use a more advanced JEXL expression (checking whether the VariantContext is filtered or not). See the docs of using JEXL expressions for more details.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • WaltL (Member) Posts: 10

    I did look at this JEXL doc first: http://gatkforums.broadinstitute.org/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk, and that's why I used the above expression, e.g. "MY_STRING_KEY == 'foo'"

    So, are you saying that a string designating a filtered record should work in this context? Because it does not. For example, I have a filter name called LowQual, and if I run the command using -select "Filter == 'LowQual'" it also returns only the header info.

    VariantContext is only mentioned under the "More Complex JEXL Magic" ("not for the faint of heart") section. Perhaps this could be updated somewhere to reflect that if one wants to select subsets of records based on their FILTER field entries, it cannot be done using the general expression given to select strings. I would think that filtering data followed by selecting only those that pass the filtering process would be a fairly common thing to want to do. Perhaps not...

    In any case, if anyone else wants to do this, here's an expression to select all of your PASS records:

    -select 'vc.isNotFiltered()'
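For anyone who wants the full invocation, here is a minimal sketch that puts the expression above into the command from the original question. The wrapper function `select_pass` is a hypothetical name added for illustration; the tool, flags, and memory setting are the ones shown in the question and have not been checked against other GATK versions.

```shell
# Sketch only: wraps the SelectVariants call from the question in a
# hypothetical helper function, select_pass.
# Usage: select_pass <reference.fa> <input.vcf> <output.vcf>
select_pass() {
  java -Xmx20g -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R "$1" \
    --variant "$2" \
    -select 'vc.isNotFiltered()' \
    -o "$3"
}

# Example call with the file names from the question (uncomment to run):
# select_pass HC.fa HC.SNPs.filtered.vcf HC.SNPs.passed.vcf
```

Single quotes around the expression keep the shell from interpreting the parentheses.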

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie) Posts: 11,732

    Filtering then selecting what passes can more easily be done by directly selecting on whatever parameter you're using to filter...

    But generally the problem is that JEXL is a slippery topic. There's a lot of "it depends what you're trying to do", and we have a lot of other, more straightforward docs that need to be updated/spruced up. It'll be a while before we get around to revamping the JEXL doc, sorry. So, posting your solution is definitely helpful and we thank you for doing so!
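The suggestion above to select directly on the filtering parameter might look roughly like the following sketch. The thread does not say which annotation or threshold was used for filtering, so the `QD >= 2.0` expression and the input file name `HC.SNPs.vcf` are assumptions for illustration only, and `select_by_annotation` is a hypothetical helper name.

```shell
# Sketch only: select_by_annotation is a hypothetical helper. The JEXL
# expression passed to it should mirror whatever criterion was used for
# filtering.
# Usage: select_by_annotation <reference.fa> <input.vcf> <output.vcf> <jexl>
select_by_annotation() {
  java -Xmx20g -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R "$1" \
    --variant "$2" \
    -select "$4" \
    -o "$3"
}

# Example with an assumed annotation and threshold (not from the thread):
# select_by_annotation HC.fa HC.SNPs.vcf HC.SNPs.selected.vcf 'QD >= 2.0'
```

This skips the separate filtering step entirely, which is convenient when only the passing records are needed, but the filter labels are not recorded in the output.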

    Geraldine Van der Auwera, PhD
