Annotating a VCF with dbSNP IDs: indel problem
I'm trying to replace my in-house variant identifiers with dbSNP IDs. I've done this before for a SNP-only dataset by running GATK VariantAnnotator and then using bash to rearrange the columns into a valid VCF file.
My problem is that dbSNP represents insertions with different reference and alternate alleles from the ones HaplotypeCaller generated.
For example, where my original VCF has REF T and ALT TATA, dbSNP has REF - (a dash) and ALT ATA. VariantAnnotator throws an error because the alleles don't match.
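One possible workaround, sketched here as an assumption rather than as a GATK feature, is to convert both allele representations to the same form before matching. The minimal Python sketch below strips the leading base(s) that REF and ALT share (the VCF padding base), so REF T / ALT TATA becomes REF - / ALT ATA, and then looks up an ID by (CHROM, POS, REF, ALT). The function names, the chr1 coordinate and the rs_example ID are hypothetical. The sketch also assumes the dbSNP records use the same POS as the padded VCF record, which you would need to check for your dbSNP file, and it does not handle multi-allelic ALT fields or left-alignment.

```python
from typing import Dict, Tuple

# Hypothetical lookup key: (CHROM, POS, REF, ALT) with dash-style alleles.
Key = Tuple[str, int, str, str]


def to_dash_alleles(ref: str, alt: str) -> Tuple[str, str, int]:
    """Strip the leading bases shared by REF and ALT (VCF padding).

    Returns the trimmed REF, the trimmed ALT (an empty allele becomes '-'),
    and the number of bases removed, so a caller can shift POS if the
    other file uses a different coordinate convention.
    """
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    return ref[shared:] or "-", alt[shared:] or "-", shared


def lookup_id(chrom: str, pos: int, ref: str, alt: str,
              dash_index: Dict[Key, str]) -> str:
    """Look up an ID for a padded VCF record in an index keyed on
    dash-style alleles; return '.' (the VCF missing value) if absent.

    Assumption: the index uses the same POS as the padded VCF record.
    """
    new_ref, new_alt, _ = to_dash_alleles(ref, alt)
    return dash_index.get((chrom, pos, new_ref, new_alt), ".")


if __name__ == "__main__":
    print(to_dash_alleles("T", "TATA"))  # ('-', 'ATA', 1)
    # Hypothetical index entry; 'rs_example' is a placeholder, not a real rsID.
    index = {("chr1", 100, "-", "ATA"): "rs_example"}
    print(lookup_id("chr1", 100, "T", "TATA", index))  # rs_example
```

The same function leaves SNPs unchanged (REF A / ALT G has no shared prefix), so one lookup path could cover SNPs, insertions and deletions alike.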
One solution is to omit all insertions, but that would throw away a lot of interesting biological data.
I know dbSNP isn't your responsibility, but I was wondering whether you or anyone else has a solution for this.