
SelectVariants and discordance

Greetings GATK team!

I hope I'm not making a duplicate question here, but I couldn't find anything regarding this in the forum.

Basically, I want to use SelectVariants to filter one call set against another, but less strictly than --discordance does (i.e. a 100% discordance rate between the two call sets). For example, I want to say: "filter call set A against variants that occur in >90% of call set B".

Is there a way to do this with JEXL expressions maybe?

Kind regards

Answers

  • nikmal (Member)
    edited May 2013

    @Geraldine_VdAuwera said:
    Hi there, sorry to get to your question so late.

    I want to say that this should be possible to do with JEXL but off the top of my head I can't think of a straightforward way to do this in a single step. Maybe instead, do a first round of selecting variants that occur above your desired threshold in call set B, then filter call set A using discordance vs. that subset.

    If you come up with a good way to do this (or maybe already have), please share your solution with the community, as I'm sure others will be interested. Eventually we'd like to put together a "cookbook" of good variant selection and filtering solutions.

    Hi Geraldine,

    No problem, in fact I already solved the problem the way you described. I wrote a simple Python script that counts how often each variant occurs and keeps it according to a specified cutoff. I then ran SelectVariants with --discordance against the script's output (i.e. the subset of variants that occur in max. X% of the population).

    I will make the script available as soon as I can, in case someone is interested.

    EDIT:

    Here is a link to the script, feel free to modify it as you wish: https://gist.github.com/keyoke1337/5676846
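
A minimal sketch of the first step of this two-step approach, for illustration only. It is not the linked script. It assumes call set B is an uncompressed multi-sample VCF with GT as the first FORMAT key, and that "occurrence" means the fraction of called samples carrying at least one non-reference allele. The function names and the 0.9 default cutoff are hypothetical.

```python
def carrier_fraction(sample_fields):
    """Fraction of called samples whose genotype has a non-reference allele.

    Assumption: GT is the first colon-separated key in each sample column.
    """
    called = 0
    carriers = 0
    for field in sample_fields:
        genotype = field.split(":", 1)[0]
        alleles = genotype.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            continue  # no-call: excluded from the denominator
        called += 1
        if any(allele not in ("0", ".") for allele in alleles):
            carriers += 1
    return carriers / called if called else 0.0


def select_frequent_sites(vcf_in, vcf_out, min_fraction=0.9):
    """Copy header lines and the records seen in more than min_fraction
    of called samples from vcf_in to vcf_out; return the number kept."""
    kept = 0
    for line in vcf_in:
        if line.startswith("#"):
            vcf_out.write(line)  # keep the header so the output is still a VCF
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # sites-only record: no genotypes to count
        if carrier_fraction(fields[9:]) > min_fraction:
            vcf_out.write(line)
            kept += 1
    return kept
```

To use it, open call set B for reading and an output file for writing, then call `select_frequent_sites(b_handle, out_handle, 0.9)`. The resulting subset can then serve as the comparison set for the second step, for example `java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V callset_A.vcf --discordance B_frequent.vcf -o A_filtered.vcf` (all file names here are placeholders).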

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Thanks for sharing your solution!
