Filtering by sample specific DP thresholds in a multi-sample vcf

Hi
I need to filter (set to missing) genotypes that fail their own sample-specific DP threshold (taken from the genotype fields) in a multi-sample VCF.
Is there any way to do this? E.g. sample1 flagged if DP < 25, sample2 flagged if DP < 41.
The JEXL guide says it doesn't work on genotype fields, and VariantFiltration seems to allow only a single value to be supplied.
Thanks
Clare

Answers

  • EADG, Kiel, Member ✭✭✭

    Hi @prepagam,

    I would suggest that you split your multi-sample VCF into single-sample files with SelectVariants, then filter each one, and finally merge the results back together with CombineVariants.

    Greetings,
    EADG

  • prepagam, Member
    edited January 2018

    My hesitation with that approach is that I need both variant and non-variant sites (both must be filtered on DP), so the files are large. But I guess it may be the only way.

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    You can compose queries against specific samples' genotype fields, but it gets complicated quite quickly. It should be in the JEXL docs somewhere.

  • prepagam, Member

    Just wanted to follow up with my findings on this in case someone else looks at my post.
    You can't split the VCF by sample, filter, and then recombine if your VCF is an ALL_SITES VCF (at least with GATK 3.7), so that approach didn't work.
    I then tried JEXL. The issue is that I'm not trying to select or filter variants based on a threshold (e.g. only retain the variant if sample 1 has DP > 20 and sample 2 has DP > 36). Rather, I want the genotype to become ./. if it fails a sample-specific threshold. So for each sample, apply a different DP threshold, and if it fails, that individual sample's genotype becomes ./. As far as I can see, JEXL doesn't/can't do that.
    So I will write a Python script.
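
    For anyone landing here later, a minimal sketch of what such a script might look like, using only the standard library. This is not the script from this thread, just an illustration under stated assumptions: the function name `mask_low_dp` and the `thresholds` dictionary are made up for the example, the VCF is uncompressed and tab-delimited, genotypes are diploid (so the no-call is written as ./.), and a sample whose DP is missing is treated as failing its threshold. Sites without a DP key in FORMAT are passed through unchanged.

```python
def mask_low_dp(lines, thresholds):
    """Yield VCF lines, setting GT to ./. where a sample's DP is below its own threshold.

    `thresholds` maps sample name -> minimum DP (hypothetical structure for this sketch).
    Samples not listed in `thresholds` are left untouched.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            yield line
            continue
        if len(cols) < 10:
            yield line  # no genotype columns on this line
            continue
        keys = cols[8].split(":")
        if "GT" not in keys or "DP" not in keys:
            yield line  # nothing to test at this site (assumption: leave as is)
            continue
        gt_i, dp_i = keys.index("GT"), keys.index("DP")
        for offset, name in enumerate(samples):
            if name not in thresholds:
                continue
            fields = cols[9 + offset].split(":")
            # Trailing FORMAT fields may be dropped, so DP can be absent.
            dp = fields[dp_i] if dp_i < len(fields) else "."
            # Assumption: a missing DP counts as failing the threshold.
            if dp == "." or int(dp) < thresholds[name]:
                fields[gt_i] = "./."
                cols[9 + offset] = ":".join(fields)
        yield "\t".join(cols)


# Tiny in-memory example with one variant site and one non-variant site.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t1/1:40",
    "chr1\t101\t.\tC\t.\t.\t.\t.\tGT:DP\t0/0:10\t0/0:45",
]
for out in mask_low_dp(demo, {"sample1": 25, "sample2": 41}):
    print(out)
```

    Because it works line by line, it handles variant and non-variant sites the same way and never holds the whole file in memory, which matters for large all-sites VCFs. For a real file you would pass an open file handle instead of the `demo` list.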

  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    @prepagam
    Hi,

    Thanks for sharing. If you look at the bottom of this page, you will see a way to filter on FORMAT level annotations.

    -Sheila
