Annotating a VCF with dbSNP IDs: indel problem

Will_Gilks, University of Sussex, UK (Member)

Hi team,

I'm trying to replace my in-house identifiers with those from dbSNP. I've done this before with a SNP-only dataset by running GATK VariantAnnotator and then using bash to reorder the columns so that the result was a valid VCF file.

My problem is that the dbSNP reference and alternate alleles for insertions are different from those originally generated by HaplotypeCaller.

For example, where the original VCF has REF T and ALT TATA, dbSNP has REF - (a dash) and ALT ATA. VariantAnnotator throws an error because the alleles do not match.
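To see that these two records describe the same insertion, here is a minimal Python sketch (the function name `vcf_to_dash` is hypothetical, not part of GATK). It trims the leading bases that REF and ALT share, which is the padding base VCF puts in front of an indel, and writes an empty allele as a dash. The returned position is simply shifted past the trimmed bases; whether that matches the coordinate your dbSNP file uses is an assumption you would need to check.

```python
from typing import Tuple


def vcf_to_dash(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-trim the bases shared by REF and ALT (the VCF padding base)
    and represent an empty allele as '-'.

    Assumption: the dash-style position is the first base after the
    trimmed prefix; verify this against your own dbSNP file.
    """
    n = 0
    # Count how many leading bases the two alleles have in common.
    while n < len(ref) and n < len(alt) and ref[n] == alt[n]:
        n += 1
    trimmed_ref = ref[n:] or "-"
    trimmed_alt = alt[n:] or "-"
    return pos + n, trimmed_ref, trimmed_alt


# The insertion from the post: VCF REF T / ALT TATA becomes REF - / ALT ATA.
print(vcf_to_dash(100, "T", "TATA"))
```

SNPs pass through unchanged because their alleles share no leading base.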

One option is to omit all insertions, but that would throw away a lot of interesting biological data.

I know that dbSNP is not your responsibility, but I was wondering whether you or anyone else had a solution to this.

Sincerely,

William Gilks

Issue · GitHub #1703, by Sheila (state: closed; closed by vdauwera)

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Will, sorry for the delayed response! That's not a dbSNP format I've seen before. The dbSNP files I've worked with follow the same convention we do, which is defined in the VCF spec (indel alleles include the reference base immediately before the event). Is this a dbSNP file produced by someone else? In any case, we don't have any tools to deal with this; you would have to write a script to convert it to the proper VCF convention.
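    A conversion script along those lines could look like the following minimal Python sketch. The function name `dash_to_vcf` is hypothetical, and two assumptions are baked in: the reference genome is available as a dict of contig name to sequence string (in practice you would load it from the same FASTA used for calling), and the dash-style position is the 1-based coordinate of the first base after the padding base. Check both against your files before using it.

```python
from typing import Dict, Tuple


def dash_to_vcf(ref_genome: Dict[str, str], chrom: str, pos: int,
                ref: str, alt: str) -> Tuple[int, str, str]:
    """Convert a dash-style indel to VCF convention by prepending the
    reference base that precedes the event (the VCF padding base).

    Assumption: `pos` is 1-based and points at the first base after the
    padding base, so the padding base sits at pos - 1.
    """
    if ref != "-" and alt != "-":
        return pos, ref, alt  # SNP/MNP: no padding base needed

    if pos < 2:
        # The VCF spec uses the base *after* the event at the start of a
        # contig; that case is not handled in this sketch.
        raise ValueError("event at contig start is not handled")

    seq = ref_genome[chrom]
    if ref != "-" and seq[pos - 1:pos - 1 + len(ref)] != ref:
        # Sanity check: deleted bases must match the reference.
        raise ValueError(f"REF {ref} does not match reference at {chrom}:{pos}")

    anchor_pos = pos - 1
    anchor = seq[anchor_pos - 1]  # 1-based position -> 0-based string index
    new_ref = anchor + ("" if ref == "-" else ref)
    new_alt = anchor + ("" if alt == "-" else alt)
    return anchor_pos, new_ref, new_alt


# Toy contig: positions 1-7 are G G C T A A C.
genome = {"chr1": "GGCTAAC"}
print(dash_to_vcf(genome, "chr1", 4, "-", "ATA"))  # insertion
print(dash_to_vcf(genome, "chr1", 4, "TA", "-"))   # deletion
```

    If the base before the event is T, the insertion REF - / ALT ATA from the question comes out as REF T / ALT TATA, which is what HaplotypeCaller emits.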
