Alternate Alleles in VCF are more than 1 base

tc13 Cambridge, UK Member

Hi there,

I've removed INDELs from a multi-sample VCF produced by HaplotypeCaller, using SelectVariants. However, the ALT alleles of the remaining 'SNPs' are longer than a single-nucleotide substitution, e.g.:

TTTTTTGTTTTTTGTTTT,GTTTTTGTTTT,G
TTTTTTTA,*
TTTTTTTAG,*
TTTTTTTATTTTTCATTTA,*
TTTTTGTTTTTTTA,TC,*

Q1) What is the meaning of the * symbol?
Q2) Is it to be expected that these SNPs are more than a single nucleotide substitution?

Thanks,
Tom

Answers

  • Sheila Broad Institute Member, Broadie, Moderator

    @tc13
    Hi Tom,

    1) Have a look at this dictionary entry.
    2) Sure, there can be INDELs that are much larger than one base. HaplotypeCaller can detect INDELs up to a read length in size, but if you are interested in larger INDELs, you should use a structural variant caller.

    -Sheila

  • tc13 Cambridge, UK Member

    Hi Sheila,

    I originally ran only --selectTypeToExclude INDEL. Adding --selectTypeToExclude MIXED and --selectTypeToExclude SYMBOLIC as well has resulted in a VCF with only SNPs.

    Thanks,
    Tom

  • Sheila Broad Institute Member, Broadie, Moderator

    @tc13
    Hi Tom,

    I am glad that worked for you.
    Thanks for posting. This should help others in the future :smile:

    -Sheila
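
For anyone who wants to check a filtered VCF by hand, here is a minimal Python sketch that labels each allele in an ALT column and tests whether a record is a plain single-nucleotide substitution. It is only an illustration, not GATK's own classification logic: the function names and label strings are made up for this example, and the REF value in the usage lines is hypothetical because the original post shows only ALT values. For context, the VCF specification (4.2 and later) reserves the * allele for a site where the allele is missing because a deletion starting upstream overlaps it.

```python
# Rough, hand-rolled allele labels. These are NOT GATK's variant types;
# the names below are assumptions made for this illustration.
BASES = set("ACGTN")


def split_alt(alt_field):
    """Split a VCF ALT column value into its individual alleles."""
    return alt_field.split(",")


def describe_allele(allele):
    """Return a rough label for a single allele string."""
    if allele == "*":
        return "spanning_deletion"  # allele missing due to an overlapping upstream deletion
    if allele == ".":
        return "no_alt"             # no alternate allele recorded
    if allele.startswith("<") and allele.endswith(">"):
        return "symbolic"           # e.g. <DEL> or <NON_REF>
    if allele and set(allele.upper()) <= BASES:
        return "single_base" if len(allele) == 1 else "multi_base"
    return "other"


def is_plain_snp(ref, alt_field):
    """True only if REF is one base and every ALT allele is one base."""
    if describe_allele(ref) != "single_base":
        return False
    return all(describe_allele(a) == "single_base" for a in split_alt(alt_field))


if __name__ == "__main__":
    # ALT values copied from the question above.
    for alt in ["TTTTTTTA,*", "TTTTTGTTTTTTTA,TC,*"]:
        print(alt, [describe_allele(a) for a in split_alt(alt)])
    # "T" is a hypothetical REF, used only to show the call.
    print(is_plain_snp("T", "TTTTTTTA,*"))
    print(is_plain_snp("T", "G"))
```

Records whose ALT column mixes long alleles with * (like the ones in the question) fail the is_plain_snp check, which is consistent with them only disappearing once the MIXED and SYMBOLIC types were excluded in addition to INDEL.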
