Variant types confusion
Dear GATK team,
I'm a bit confused about the term MIXED (and maybe SYMBOLIC), because I believe it is used differently across software tools.
If I understand correctly from the FAQ "What types of variants can GATK tools handle?" we have:
MIXED (combination of SNPs and indels at a single position)
E.g. Reference = 'T', Sample = 'A,TCC'
Here, we say it's MIXED because it combines 2 variant types (SNP, INS) for this position; we are talking about two possible alleles.
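To make this reading concrete, here is a minimal sketch of how I understand the record-level classification. It is my own assumption, not GATK's actual code: the function names `allele_type` and `record_type` are hypothetical, and each ALT allele is classified purely by comparing its length with the reference allele.

```python
def allele_type(ref, alt):
    """Classify a single ALT allele against REF by length alone (assumed rule)."""
    if alt == ref:
        return "NO_VARIATION"
    if len(ref) == len(alt):
        # Same length: one base is a SNP, several bases an MNP.
        return "SNP" if len(ref) == 1 else "MNP"
    # Different lengths: treated here as an insertion or deletion.
    return "INDEL"


def record_type(ref, alts):
    """Combine the per-allele types of one record into a single record type."""
    types = {allele_type(ref, alt) for alt in alts if alt != ref}
    if not types:
        return "NO_VARIATION"
    if len(types) == 1:
        return next(iter(types))
    # More than one distinct allele type at the same position.
    return "MIXED"


print(record_type("T", ["A", "TCC"]))  # SNP + INDEL at one position
print(record_type("T", ["A", "G"]))    # two SNP alleles
```

Under these assumptions, the first record is reported as MIXED only because its two alleles have different types; neither allele is "mixed" on its own.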
SYMBOLIC (generally, a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know what exactly)
E.g. Reference = 'GC', Sample = 'TTA'
Is this example a correct instance of what SYMBOLIC stands for?
On the other hand, I've been using SnpSift (from the SnpEff package) to filter variants, but when I tried to select what I understood MIXED variants to be, I got a different result than with GATK. While checking its manual, I found what seems to be a different definition of MIXED:
- MIXED: Multiple-nucleotide and an InDel.
E.g. Reference = 'ATA', Sample = 'GTCAGT'
I believe SnpEff's definition of the MIXED variant type is equivalent to GATK's definition of SYMBOLIC. Am I right?
I've been told that there is a difference between a) a "MIXED variant" and b) a "MIXED variant call record", and that GATK uses MIXED in sense b) while SnpEff uses it in sense a).
Is there an official definition for these terms? Is either of these tools wrong?
Thank you very much for your help.