SelectVariants - Information from absent alleles
I have a problem using SelectVariants at multiallelic sites. For a given patient (let's call him P1), I want to keep only the positions that are variant in his genome. I use the following options:
--preserveAlleles: I keep the original form of the alleles, as they are called in the original VCF
--excludeNonVariants: I do not want 0/0 positions for patient P1
--removeUnusedAlternates: I want only the alleles that P1 actually carries
The last point is the problematic one. It does partially work. For example, let's say I have this variant in the original VCF, with two alternate alleles in my population:
chrZ 375987 . TA T,TAA
In the P1-only VCF, after extraction, I only have the following (let's say that P1 is 0/1):
chrZ 375987 . TA T
Which is correct.
However, even though only the correct allele is kept, the INFO fields still carry the information for all the original alleles, including the ones that were removed.
A small sample from the ANN records of the P1-only file:
I put in bold the information from an allele that is absent in P1. This is annoying because it interferes with the interpretation. If anybody has a suggestion, it would be very welcome!
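A possible post-processing workaround, as a minimal sketch only: filter the ANN entries of each record so that only those matching an allele still listed in the ALT column remain. This assumes the ANN field follows the SnpEff convention (comma-separated entries, with the allele as the first pipe-delimited subfield) and that the allele strings in ANN match the ALT column exactly, which is what --preserveAlleles is meant to guarantee here. The function name `filter_ann_to_alt` is hypothetical, and the annotation values in the example are invented placeholders, not real output.

```python
def filter_ann_to_alt(vcf_line: str) -> str:
    """Drop ANN entries whose allele is not listed in the record's ALT column.

    Assumption: ANN entries are comma-separated and each entry starts with
    the allele, followed by '|' (SnpEff-style). Other INFO keys are untouched.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt_alleles = set(fields[4].split(","))  # column 5 of a VCF record is ALT

    kept_items = []
    for item in fields[7].split(";"):  # column 8 of a VCF record is INFO
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            # Keep an entry only if its first subfield is a remaining ALT allele.
            kept = [e for e in entries if e.split("|", 1)[0] in alt_alleles]
            if kept:
                kept_items.append("ANN=" + ",".join(kept))
        else:
            kept_items.append(item)

    fields[7] = ";".join(kept_items) if kept_items else "."
    return "\t".join(fields)


# Invented example record: the ANN values are shortened placeholders.
example = (
    "chrZ\t375987\t.\tTA\tT\t.\t.\t"
    "AC=1;ANN=T|frameshift_variant|HIGH|GENE1,TAA|frameshift_variant|HIGH|GENE1"
)
print(filter_ann_to_alt(example))
```

This only addresses ANN. Other per-allele INFO fields would need their own handling, and header lines (starting with `#`) should be passed through unchanged.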
Thanks in advance,