SelectVariants - Information from absent alleles

BurgundiaPR, Lyon (France), Member


I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles: keep the alleles in their original form, as they are called in the original VCF
--excludeNonVariants: I do not want 0/0 positions for patient P1
--removeUnusedAlternates: I want only the alleles that are present in P1

The last point is the problematic one. It does work, but only partially. For example, let's say I have this variant in the original VCF, with two alternate alleles in my population:

chrZ 375987 . TA T,TAA

In the P1-only VCF, after extraction, I only have (let's say that P1 is 0/1):
chrZ 375987 . TA T
Which is correct.

Nevertheless, even though only the correct allele is kept, all the information in the INFO fields is preserved (for all the alleles).
A small sample from the ANN records of the P1-only file:

I put in bold the information from an allele absent in P1. This is annoying because it disturbs the interpretation. If anybody has a suggestion, it would be very welcome!

Thanks in advance,



  • Sheila, Broad Institute, Member, Broadie ✭✭✭✭✭

    Hi BPR,

    Can you confirm you are using the latest version of GATK? If so, I may need you to submit a bug report.


  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    The important thing to check is how this third-party annotation is defined in the VCF header. If it's defined in a way that makes it clear it is encoded per-allele, then we should be able to parse and subset appropriately. If not, then we may not be able to do anything about this.
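
As a rough illustration of the header check described above, here is a minimal Python sketch that reads the `Number=` value declared for an INFO field. In the VCF specification, `Number=A` means one value per alternate allele and `Number=R` means one value per allele including the reference; a value of `.` gives a parser no way to know how the entries map to alleles. The function names `info_number` and `is_per_allele` and the sample header lines are made up for this example, and this is not a description of how GATK itself parses headers.

```python
import re

def info_number(header_lines, info_id):
    """Return the Number= value declared for an INFO field, or None if not found."""
    id_pattern = re.compile(r"^##INFO=<.*\bID=" + re.escape(info_id) + r"[,>]")
    number_pattern = re.compile(r"\bNumber=([^,>]+)")
    for line in header_lines:
        if id_pattern.search(line):
            match = number_pattern.search(line)
            return match.group(1) if match else None
    return None

def is_per_allele(number):
    """True if the declared Number ties values to alleles (A: per ALT, R: per allele incl. REF)."""
    return number in ("A", "R")

# Hypothetical header lines, for illustration only.
header = [
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations">',
]

for field in ("AF", "ANN"):
    number = info_number(header, field)
    print(field, number, is_per_allele(number))
```

If the annotation turns out to be declared with `Number=.`, that would be consistent with the behaviour reported in the original post, since nothing in the header says the entries are per-allele.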

  • BurgundiaPR, Lyon (France), Member


    Actually, it is not the latest version but 3.4. I just asked our admin to update the software. I will report the result with the new version and will check the VCF header. In any case, I think I will manage to extract only the information I need; I just wondered whether there was a simple option I had overlooked in SelectVariants.

    Thanks for your answers,
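
One way to "take only the needed information" after the fact is to post-filter the ANN field, keeping only the entries whose allele is still listed in ALT. The sketch below assumes the SnpEff-style ANN layout (comma-separated entries, each pipe-delimited with the allele as the first sub-field) and assumes the allele strings were left unchanged, as with --preserveAlleles. The function name `filter_ann` and the shortened ANN entries are invented for illustration; real entries carry many more sub-fields.

```python
def filter_ann(info, kept_alts):
    """Drop ANN entries whose allele (first pipe-delimited sub-field) is not in kept_alts."""
    fields = []
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            kept = [e for e in entries if e.split("|", 1)[0] in kept_alts]
            if kept:  # omit the ANN key entirely if nothing survives
                fields.append("ANN=" + ",".join(kept))
        else:
            fields.append(field)
    # An empty INFO column is written as "." in VCF.
    return ";".join(fields) or "."

# Made-up, heavily shortened INFO column for the chrZ example site.
info = "AC=1;ANN=T|frameshift_variant|HIGH|GENE1,TAA|frameshift_variant|HIGH|GENE1"
alts_in_p1 = {"T"}  # ALT column of the P1-only record

print(filter_ann(info, alts_in_p1))
```

This only touches ANN; any other INFO field that is per-allele in practice but not declared as such in the header would need the same kind of handling.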
