SelectVariants - Information from absent alleles

BurgundiaPR, Lyon (France), Member


I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles: keep the alleles in their original form, as they are called in the original VCF
--excludeNonVariants: drop the positions where P1 is 0/0
--removeUnusedAlternates: keep only the alleles that P1 actually carries

The last point is the problematic one. It works, but only partially. For example, let's say the original VCF has this variant, with two alternate alleles in my population:

chrZ 375987 . TA T,TAA

In the P1-only VCF, after extraction, I only have the following (let's say that P1 is 0/1):
chrZ 375987 . TA T
Which is correct.

However, even though only the correct allele is kept, the INFO fields still carry the information for all the original alleles.
Here is a short sample from the ANN records of the P1-only file:

I put in bold the information from an allele absent in P1. This is annoying because it disrupts the interpretation. If anybody has a suggestion, it would be very welcome!

Thanks in advance,



  • Sheila, Broad Institute, Member, Broadie admin

    Hi BPR,

    Can you confirm you are using the latest version of GATK? If so, I may need you to submit a bug report.


  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    The important thing to check is how this third-party annotation is defined in the VCF header. If it's defined in a way that makes it clear it is encoded per-allele, then we should be able to parse and subset appropriately. If not, then we may not be able to do anything about this.
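
One way to run the header check described above is a small script that reads the `Number` value declared for the annotation. This is only a sketch: the function names and the sample header lines are made up for illustration, and it assumes the header lists `ID` first and `Number` second, as the VCF specification lays them out. In the VCF specification, `Number=A` means one value per alternate allele and `Number=R` means one value per allele including the reference; anything else (for example `Number=.`) does not tell a tool that the values are per-allele.

```python
import re


def info_number(header_lines, info_id):
    """Return the Number value declared for an INFO field, or None if absent."""
    pattern = re.compile(r"^##INFO=<ID=" + re.escape(info_id) + r",Number=([^,>]+)")
    for line in header_lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def is_per_allele(number):
    """True when the header declares one value per ALT allele (A) or per allele (R)."""
    return number in ("A", "R")


# Illustrative header lines (assumed, not taken from the poster's file).
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">',
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations">',
]

for field in ("AC", "ANN"):
    number = info_number(header, field)
    print(field, number, "per-allele" if is_per_allele(number) else "not declared per-allele")
```

If the annotation turns out to be declared with `Number=.`, that would be consistent with the behaviour reported in this thread, since the header gives no hint that the values map to alleles.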

  • BurgundiaPR, Lyon (France), Member


    Actually, it is not the latest version but 3.4. I have just asked our admin to update the software. I will report the result with the new version and check the VCF header. In any case, I think I can manage to extract only the information I need; I just wondered whether there was a simple SelectVariants option I had overlooked.

    Thanks for your answers,
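
For readers who need the manual workaround mentioned above, here is a minimal post-processing sketch. It is not a GATK feature. It assumes the ANN field follows the common convention where entries are comma-separated, each entry is `|`-separated, and the first sub-field is the ALT allele written exactly as in the ALT column (which should hold when the alleles are preserved in their original form). The function names and the sample record, including its annotation values, are hypothetical.

```python
def filter_ann(info, alt_alleles):
    """Keep only the ANN entries whose first '|' sub-field is a retained ALT allele."""
    keep = set(alt_alleles)
    fields = []
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            kept = [e for e in entries if e.split("|", 1)[0] in keep]
            if not kept:
                continue  # no entry matches a retained allele: drop ANN entirely
            field = "ANN=" + ",".join(kept)
        fields.append(field)
    return ";".join(fields) if fields else "."


def filter_record(line):
    """Apply filter_ann to one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")  # column 5 is ALT
    cols[7] = filter_ann(cols[7], alts)  # column 8 is INFO
    return "\t".join(cols)


# Hypothetical single-sample record after SelectVariants: ALT is only T,
# but ANN still describes both T and TAA.
record = "\t".join([
    "chrZ", "375987", ".", "TA", "T", ".", ".",
    "ANN=T|frameshift_variant|HIGH|GENE1,TAA|frameshift_variant|HIGH|GENE1;DP=10",
])
print(filter_record(record))
```

Header lines (those starting with `#`) should be passed through unchanged; only data lines go through `filter_record`.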
