Alternate Alleles in VCF are more than 1 base

tc13 (Cambridge, UK), Member

Hi there,

I've used SelectVariants to remove indels from a multi-sample VCF produced by HaplotypeCaller. However, the ALT alleles of the remaining 'SNPs' are longer than a single nucleotide, e.g.:

TTTTTTGTTTTTTGTTTT,GTTTTTGTTTT,G
TTTTTTTA,*
TTTTTTTAG,*
TTTTTTTATTTTTCATTTA,*
TTTTTGTTTTTTTA,TC,*

Q1) What is the meaning of the * symbol?
Q2) Is it to be expected that these SNPs are more than a single nucleotide substitution?

Thanks,
Tom

Answers

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @tc13
    Hi Tom,

    1) Have a look at this dictionary entry.
    2) Sure, there can be indels that are much larger than one base. HaplotypeCaller can detect indels up to about a read length in size, but if you are interested in larger indels, you should use a structural variant caller.

    -Sheila

  • tc13 (Cambridge, UK), Member

    Hi Sheila,

    I originally ran SelectVariants with only --selectTypeToExclude INDEL. Adding --selectTypeToExclude MIXED --selectTypeToExclude SYMBOLIC as well has resulted in a VCF with only SNPs.

    Thanks,
    Tom

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @tc13
    Hi Tom,

    I am glad that worked for you.
    Thanks for posting. This should help others in the future :smile:

    -Sheila
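
For anyone who wants to double-check a filtered VCF, here is a rough Python sketch of the kind of check discussed in this thread: it tests whether every ALT allele at a site is a plain single-base substitution. It is a simplified illustration, not GATK's own classification logic. The function names and the REF value "T" are made up for the example, since the question only quotes ALT fields. The `*` allele is treated as a spanning deletion, which is how the VCF 4.2 specification defines it (the allele is missing because of an upstream deletion).

```python
def classify_alt(ref: str, alt: str) -> str:
    """Roughly classify one ALT allele relative to REF (simplified, not GATK's code)."""
    if alt == "*":
        # VCF 4.2: allele is missing due to an upstream (spanning) deletion.
        return "SPANNING_DELETION"
    if alt.startswith("<") and alt.endswith(">"):
        # Symbolic alleles such as <DEL> or <NON_REF>.
        return "SYMBOLIC"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def is_snp_only(ref: str, alt_field: str) -> bool:
    """True only if every comma-separated ALT allele is a single-base substitution."""
    return all(classify_alt(ref, alt) == "SNP" for alt in alt_field.split(","))


# ALT strings in the style quoted in the question; REF "T" is a hypothetical placeholder.
print(is_snp_only("T", "TTTTTTTA,*"))  # False: an indel-like allele plus a spanning deletion
print(is_snp_only("T", "TC,*"))        # False
print(is_snp_only("T", "G"))           # True
```

A site whose ALT alleles fall into more than one of these categories is the kind of record that excluding only INDEL can leave behind, which is consistent with Tom needing to exclude MIXED and SYMBOLIC too.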
