SelectVariants - Information from absent alleles

BurgundiaPR Lyon (France) Member

Hi,

I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles : I keep the original form of the alleles, as they are called in the original vcf
--excludeNonVariants : I do not want 0/0 positions for the patient P1
--removeUnusedAlternates : I want only the alleles which are specific to P1

The last point is the problematic one. It does partially work. For example, say the original VCF has this variant, with two alternate alleles in my population:

chrZ 375987 . TA T,TAA

In the P1-only VCF, after extraction, I have only (assuming P1 is 0/1):
chrZ 375987 . TA T
Which is correct.

However, even though only the correct allele is kept, the INFO fields still carry the information for all the alleles.
Here is a short sample of the ANN records from the P1-only file:
ANN=T|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5446delT||||||,
TAA|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5445_62+5446insT

The TAA record (in bold in the original post) is the information from an allele absent in P1. This is annoying because it gets in the way of interpretation. If anybody has a suggestion, it would be very welcome!

Thanks in advance,
BPR

Comments

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @BurgundiaPR
    Hi BPR,

    Can you confirm you are using the latest version of GATK? If so, I may need you to submit a bug report.

    Thanks,
    Sheila

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    The important thing to check is how this third-party annotation is defined in the VCF header. If it's defined in a way that makes it clear it is encoded per-allele, then we should be able to parse and subset appropriately. If not, then we may not be able to do anything about this.
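A minimal sketch of that header check, for readers who want to script it. It assumes the usual `##INFO=<ID=...,Number=...,` layout of VCF header lines and relies on the VCF convention that `Number=A` means one value per ALT allele and `Number=R` one value per allele including REF. The function names and the example header lines are illustrative and are not taken from the poster's file.

```python
import re

# Matches the start of a VCF INFO header line and captures ID and Number.
INFO_PATTERN = re.compile(r"^##INFO=<ID=([^,]+),Number=([^,]+),")


def info_number(header_lines, info_id):
    """Return the Number value declared for an INFO field, or None if absent."""
    for line in header_lines:
        match = INFO_PATTERN.match(line)
        if match and match.group(1) == info_id:
            return match.group(2)
    return None


def is_per_allele(number):
    """True if the declared Number marks the field as per-allele (A or R)."""
    return number in ("A", "R")


# Illustrative header lines (assumed for this example only).
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations">',
]

for field in ("AF", "ANN"):
    number = info_number(example_header, field)
    print(field, number, is_per_allele(number))
```

If the annotation turns out to be declared with `Number=.`, as in the illustrative ANN line here, the header gives a tool no way to tell which values belong to which allele.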

  • BurgundiaPRBurgundiaPR Lyon (France)Member

    Hi,

    Actually, it is not the latest version but 3.4. I have just asked our admin to update the software. I will report the result with the new version and check the VCF header. In any case, I think I will manage to extract only the information I need; I just wondered whether there is a simple option in SelectVariants that I overlooked.

    Thanks for your answers,
    BPR
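One way to take only the needed information after the fact is to post-process the INFO column. The sketch below is not a SelectVariants feature. It assumes that each ANN entry starts with the allele it describes, followed by `|`, as in the sample in the original post, and that entries are separated by commas with no commas inside an entry. The function name `filter_ann` and the `AC=1` key in the example are hypothetical.

```python
def filter_ann(info, alt_alleles):
    """Keep only the ANN entries whose leading allele is in alt_alleles.

    info        -- the INFO column of one VCF record (semicolon-separated)
    alt_alleles -- the ALT alleles retained for this record
    """
    kept_alts = set(alt_alleles)
    out = []
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            # The first pipe-delimited sub-field is assumed to be the allele.
            entries = [e for e in entries if e.split("|", 1)[0] in kept_alts]
            if entries:
                out.append("ANN=" + ",".join(entries))
            # If no entry matches, the ANN key is dropped entirely.
        else:
            out.append(field)
    return ";".join(out)


# Example built from the ANN sample in the post; AC=1 is illustrative.
example_info = (
    "AC=1;ANN="
    "T|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5446delT||||||,"
    "TAA|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5445_62+5446insT||||||"
)

filtered = filter_ann(example_info, ["T"])
print(filtered)
```

For a P1-only record whose ALT column is `T`, this keeps the `T|...` entry and drops the `TAA|...` one, leaving the other INFO keys untouched.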
