SelectVariants and discordance

nikmal

Greetings GATK team!

I hope I'm not posting a duplicate question here, but I couldn't find anything about this in the forum.

Basically, I want to use SelectVariants to filter one call set against another, but without being as strict as --discordance (i.e. a 100% discordance rate between the two call sets). For example, I want to say: "filter call set A against variants that occur in >90% of call set B".

Is there a way to do this, maybe with JEXL expressions?

Kind regards

  nikmal
    edited May 2013

    @Geraldine_VdAuwera said:
    Hi there, sorry to get to your question so late.

    I want to say that this should be possible to do with JEXL but off the top of my head I can't think of a straightforward way to do this in a single step. Maybe instead, do a first round of selecting variants that occur above your desired threshold in call set B, then filter call set A using discordance vs. that subset.

    If you come up with a good way to do this (or have already done so), please share your solution with the community, as I'm sure others will be interested. Eventually we'd like to put together a "cookbook" of good variant selection and filtering solutions.

    Hi Geraldine,

    No problem. In fact, I already solved the problem the way you described. I wrote a simple Python script that counts the occurrences of each variant and keeps it according to a specified cutoff. After that, I used SelectVariants with --discordance vs. the output from the script (i.e. the subset of variants that occur in at most X% of the population).

    I will make the script available as soon as I can, in case someone is interested.
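The original script is not reproduced in this thread. As a rough sketch of the counting step described above, something like the following could work. It rests on several assumptions: call set B is a multi-sample VCF, "occurrence" means the fraction of all samples carrying at least one non-reference allele, and the function names `carrier_fraction` and `select_by_frequency` are purely illustrative.

```python
import re


def carrier_fraction(vcf_line):
    """Fraction of samples with at least one non-reference allele.

    Assumption: the denominator is every sample column, including
    samples with missing genotypes.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:  # sites-only record: no genotype columns
        return 0.0
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return 0.0
    gt_index = format_keys.index("GT")
    samples = fields[9:]
    carriers = 0
    for sample in samples:
        parts = sample.split(":")
        genotype = parts[gt_index] if gt_index < len(parts) else "."
        alleles = re.split(r"[/|]", genotype)  # unphased or phased
        if any(allele not in ("0", ".") for allele in alleles):
            carriers += 1
    return carriers / len(samples)


def select_by_frequency(vcf_lines, cutoff, keep_above=True):
    """Yield header lines plus the records that pass the cutoff.

    keep_above=True keeps records with fraction > cutoff;
    keep_above=False keeps records with fraction <= cutoff.
    """
    for line in vcf_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
            continue
        fraction = carrier_fraction(line)
        passes = fraction > cutoff if keep_above else fraction <= cutoff
        if passes:
            yield line
```

The selected lines could then be written to a new VCF and passed to SelectVariants via --discordance, as in the two-step approach described above. Whether to keep records above or below the cutoff depends on which variants should be removed from call set A.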


    Here is a link to the script, feel free to modify it as you wish:

  Geraldine_VdAuwera

    Thanks for sharing your solution!

