SelectVariants - Information from absent alleles

BurgundiaPR, Lyon (France), Member

Hi,

I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles: keep the alleles in their original form, as they are called in the original VCF
--excludeNonVariants: drop the positions where P1 is 0/0
--removeUnusedAlternates: keep only the alleles that are present in P1

The last point is the problematic one. It does work, but only partially. For example, let's say I have this variant in the original VCF, with two alternate alleles in my population:

chrZ 375987 . TA T,TAA

In the P1-only VCF, after extraction, I only have (let's say that P1 is 0/1):
chrZ 375987 . TA T
Which is correct.

Nevertheless, even though only the correct allele is kept, all the information in the INFO fields is preserved (for all the alleles).
Here is a small sample of the ANN records from the P1-only file:
ANN=T|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5446delT||||||,
TAA|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5445_62+5446insT

The second entry (TAA) comes from an allele that is absent in P1. This is annoying because it disturbs the interpretation. If anybody has a suggestion, it would be very welcome!

Thanks in advance,
BPR

Comments

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @BurgundiaPR
    Hi BPR,

    Can you confirm you are using the latest version of GATK? If so, I may need you to submit a bug report.

    Thanks,
    Sheila

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie, admin

    The important thing to check is how this third-party annotation is defined in the VCF header. If it's defined in a way that makes it clear it is encoded per-allele, then we should be able to parse and subset appropriately. If not, then we may not be able to do anything about this.
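
    To make that header check concrete, here is a minimal sketch in Python. It assumes the standard VCF convention that `Number=A` means one value per ALT allele and `Number=R` one value per allele including REF. The function names and the example header lines are hypothetical and are not taken from the poster's file.

```python
import re

def info_number(header_line):
    """Return the Number value of a ##INFO header line, or None if it is not one."""
    match = re.search(r"^##INFO=<ID=[^,]+,Number=([^,>]+)", header_line)
    return match.group(1) if match else None

def is_per_allele(header_line):
    """True if the header declares the field per-allele (Number=A or Number=R)."""
    # Anything else (a fixed count, G, or '.') gives a tool no reliable way
    # to know which value belongs to which allele.
    return info_number(header_line) in {"A", "R"}

# Hypothetical header lines, for illustration only.
print(is_per_allele('##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'))
print(is_per_allele('##INFO=<ID=ANN,Number=.,Type=String,Description="Annotations">'))
```

    If the annotation is declared with `Number=.`, the per-allele structure is only implied by its contents, which is the case where subsetting may not be possible.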

  • BurgundiaPR, Lyon (France), Member

    Hi,

    Actually, it is not the latest version but version 3.4. I have just asked our admin to update the software. I will report the result with the new version and will check the VCF header. In any case, I think I will manage to extract only the information I need; I just wondered whether there was a simple option I had overlooked in SelectVariants.

    Thanks for your answers,
    BPR
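
    For reference, one way to keep only the needed information after the fact could look like the sketch below. It is not a SelectVariants option. It is a hypothetical post-processing helper (`filter_ann` is an invented name) that assumes ANN entries are comma-separated, that the first `|`-separated field of each entry is the allele, and that the allele strings still match the ALT column (which is why `--preserveAlleles` matters here).

```python
def filter_ann(info, alt_alleles):
    """Keep only the ANN entries whose allele is one of alt_alleles.

    info        -- the INFO column of one VCF record (semicolon-separated)
    alt_alleles -- the ALT alleles remaining after subsetting, e.g. ["T"]
    """
    kept = set(alt_alleles)
    fields = []
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            # The first '|'-separated item of each entry is assumed to be the allele.
            entries = [e for e in entries if e.split("|", 1)[0] in kept]
            if not entries:
                continue  # no entry left: drop the ANN key entirely
            field = "ANN=" + ",".join(entries)
        fields.append(field)
    # An empty INFO column is written as "." in VCF.
    return ";".join(fields) or "."

# The record from the question, with P1 carrying only the T allele.
ann = ("ANN=T|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5446delT||||||,"
       "TAA|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5445_62+5446insT")
print(filter_ann(ann, ["T"]))
```

    Other INFO keys pass through unchanged, so the helper only touches the ANN annotation.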
