SelectVariants - Information from absent alleles

BurgundiaPR, Lyon (France), Member

Hi,

I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles: keep the alleles in their original form, as they are called in the original VCF
--excludeNonVariants: drop the positions where P1 is 0/0
--removeUnusedAlternates: keep only the alternate alleles that are actually present in P1

The last point is the problematic one. It does work, but only partially. For example, let's say the original VCF contains this variant, with two alternate alleles in my population:

chrZ 375987 . TA T,TAA

In the P1-only VCF, after extraction, I only have (let's say that P1 is 0/1):
chrZ 375987 . TA T
Which is correct.

However, even though only the correct allele is kept, all the information in the INFO fields is preserved, for every allele of the original site.
Here is a short excerpt from the ANN records of the P1-only file:
ANN=T|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5446delT||||||,
TAA|intron_variant|MODIFIER|GENE|GENE|transcript|NM_TR.1|Coding|1/2|c.62+5445_62+5446insT

The second entry (the TAA one) comes from an allele that is absent in P1. This is annoying because it confuses the interpretation. If anybody has a suggestion, it would be very welcome!

Thanks in advance,
BPR

Comments

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @BurgundiaPR
    Hi BPR,

    Can you confirm you are using the latest version of GATK? If so, I may need you to submit a bug report.

    Thanks,
    Sheila

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie

    The important thing to check is how this third-party annotation is defined in the VCF header. If it's defined in a way that makes it clear it is encoded per-allele, then we should be able to parse and subset appropriately. If not, then we may not be able to do anything about this.
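To make the header check above concrete, here is a minimal sketch in Python. It assumes the usual header layout `##INFO=<ID=...,Number=...,...>` with `ID` first and `Number` second; the function names `info_number` and `is_per_allele` are hypothetical helpers, not part of GATK. In the VCF specification, `Number=A` means one value per alternate allele and `Number=R` means one value per allele including the reference; any other value (such as `.`) gives a tool no reliable way to know which value belongs to which allele.

```python
import re


def info_number(header_lines, info_id):
    """Return the Number= value declared for an INFO field, or None if absent.

    Assumes the common layout '##INFO=<ID=<id>,Number=<n>,...>'.
    """
    pattern = re.compile(r"^##INFO=<ID=" + re.escape(info_id) + r",Number=([^,>]+)")
    for line in header_lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def is_per_allele(number):
    """True if the declared Number says the values are encoded per allele."""
    # "A": one value per ALT allele; "R": one value per allele, REF included.
    return number in ("A", "R")


if __name__ == "__main__":
    # Illustrative header lines, not taken from the poster's file.
    header = [
        '##fileformat=VCFv4.2',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
        '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations">',
    ]
    for field in ("AF", "ANN"):
        number = info_number(header, field)
        print(field, number, is_per_allele(number))
```

If the ANN line in your header turns out to be declared with `Number=.`, the annotation is not marked as per-allele, which would be consistent with it being carried over unchanged.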

  • BurgundiaPRBurgundiaPR Lyon (France)Member

    Hi,

    Actually, I am not using the latest version but 3.4. I have just asked our admin to update the software. I will report the result with the new version and will check the VCF header. In any case, I think I will manage to extract only the information I need; I just wondered whether there was a simple SelectVariants option I had overlooked.

    Thanks for your answers,
    BPR
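For anyone who needs the same clean-up as a post-processing step, here is a minimal sketch in Python of one way to "take only the needed information". It is not a SelectVariants feature. It assumes that ANN entries are comma-separated, that the first pipe-delimited sub-field of each entry is the allele, and that this allele string matches the ALT column exactly (which is why --preserveAlleles matters here). The function names `filter_ann` and `filter_record` are hypothetical.

```python
def filter_ann(info, alt_alleles):
    """Drop ANN entries whose allele (first '|' sub-field) is not in alt_alleles.

    Other INFO fields are passed through untouched. Returns '.' if nothing is left.
    """
    keep = set(alt_alleles)
    fields = []
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            kept = [e for e in entries if e.split("|", 1)[0] in keep]
            if kept:
                fields.append("ANN=" + ",".join(kept))
        else:
            fields.append(field)
    return ";".join(fields) or "."


def filter_record(line):
    """Apply filter_ann to one tab-separated VCF data line (not a header line)."""
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")          # column 5 is ALT
    cols[7] = filter_ann(cols[7], alts)  # column 8 is INFO
    return "\t".join(cols)


if __name__ == "__main__":
    # Shortened, illustrative record based on the example in the question.
    record = (
        "chrZ\t375987\t.\tTA\tT\t.\t.\t"
        "AC=1;ANN=T|intron_variant|MODIFIER,TAA|intron_variant|MODIFIER"
        "\tGT\t0/1"
    )
    print(filter_record(record))
```

Header lines (those starting with `#`) should be copied through unchanged. Annotators may write the allele sub-field in other forms for some variant types, so it is worth checking a few records by hand before trusting the output.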
