Variant types confusion
Dear GATK team,
I'm a bit confused about the term MIXED (and maybe SYMBOLIC), because I believe different tools use it differently.
If I understand correctly from the FAQ "What types of variants can GATK tools handle?" we have:
MIXED (combination of SNPs and indels at a single position)
E.g. Reference = 'T', Sample = 'A,TCC'
Here, we say it's MIXED because it combines 2 variant types (SNP, INS) for this position; we are talking about two possible alleles.
SYMBOLIC (generally, a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know what exactly)
E.g. Reference = 'GC', Sample = 'TTA'
Is this example correctly classified for what SYMBOLIC stands for?
On the other hand, I've been using SnpSift (from the SnpEff package) to filter variants, but when I tried to select what I understood MIXED variants to be, I got a different result than with GATK. Checking its manual, I found what seems to be a different definition of MIXED:
- MIXED: Multiple-nucleotide and an InDel.
E.g. Reference = 'ATA', Sample = 'GTCAGT'
I believe SnpEff's definition of the MIXED variant type is equivalent to GATK's definition of SYMBOLIC. Am I right?
I've been told that a) a "MIXED variant" and b) a "MIXED variant call record" are two different things: GATK uses MIXED in sense b), while SnpEff uses it in sense a).
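To make sense b) concrete, here is a minimal Python sketch of how I imagine a record-level classification could work. The function names `allele_type` and `record_type` are hypothetical and the rules are my own assumption (each alternate allele is typed by comparing its length to the reference, and the record is MIXED only when the alternate alleles disagree); this is not taken from the GATK or SnpEff source code.

```python
def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair (assumed rules, not GATK's actual code)."""
    # Assumption: symbolic alleles are the ones written in angle brackets
    # (e.g. <DEL>) or as breakends containing [ or ] in the ALT field.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SYMBOLIC"
    # Same length as the reference: a substitution of one or more bases.
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    # Any length difference is treated as an insertion or deletion.
    return "INDEL"


def record_type(ref: str, alts: list[str]) -> str:
    """Classify a whole record from the types of its alternate alleles."""
    types = {allele_type(ref, alt) for alt in alts}
    if not types:
        return "NO_VARIATION"
    # One shared type: the record takes it. Several types: MIXED.
    return types.pop() if len(types) == 1 else "MIXED"


print(record_type("T", ["A", "TCC"]))    # first example above
print(record_type("GC", ["TTA"]))        # my SYMBOLIC example
print(record_type("ATA", ["GTCAGT"]))    # SnpEff manual example
```

Under these assumed rules, only the first example is MIXED, and my 'GC' to 'TTA' example would come out as INDEL rather than SYMBOLIC, which is part of what I would like to have confirmed.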
Is there an official definition for these terms? Is either of these tools wrong?
Thank you very much for your help.