
Alternate Alleles in VCF are more than 1 base

tc13 · Cambridge, UK · Member

Hi there,

I've removed INDELs from a multi-sample VCF produced by HaplotypeCaller using SelectVariants. However, the ALT alleles at the remaining 'SNP' sites are more than a single-nucleotide substitution, e.g.:

TTTTTTGTTTTTTGTTTT,GTTTTTGTTTT,G
TTTTTTTA,*
TTTTTTTAG,*
TTTTTTTATTTTTCATTTA,*
TTTTTGTTTTTTTA,TC,*

Q1) What is the meaning of the * symbol?
Q2) Is it to be expected that these SNPs are more than a single nucleotide substitution?

Thanks,
Tom
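A sketch of why an ALT allele can be longer than one base. VCF writes every allele at a site against one shared REF, so when one ALT is a deletion, REF spans the deleted bases and a single-base change at the same site is written with the same flanking bases. The hypothetical Python function below uses simplified logic (it is not what HaplotypeCaller or SelectVariants does internally) and trims the bases a REF/ALT pair share to recover the minimal form. It leaves `*` alone: since VCF 4.2, `*` is a reserved allele meaning the allele is missing because of an upstream deletion. The REF and POS values in the demo are made up, since only ALT alleles were posted.

```python
def minimal_representation(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper: the shared suffix is trimmed first, then the shared
    prefix, and POS is advanced for every leading base removed. Symbolic
    alleles and the '*' spanning-deletion allele have no comparable bases,
    so they are returned unchanged.
    """
    if alt == "*" or alt.startswith("<"):
        return pos, ref, alt
    # Drop shared trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting the position to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    # Hypothetical multiallelic site: REF spans a 3-base deletion, so the
    # A>G change at the last base is written with the same padding.
    pos, ref = 100, "TTTA"
    for alt in ["T", "TTTG", "*"]:
        print(alt, "->", minimal_representation(pos, ref, alt))
    # T -> (100, 'TTTA', 'T')      deletion, already minimal
    # TTTG -> (103, 'A', 'G')      really a single-base substitution
    # * -> (100, 'TTTA', '*')      spanning deletion, left as-is
```

Tools such as `bcftools norm` perform this kind of trimming (plus left-alignment) on whole files; the function above only illustrates the idea.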


Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @tc13
    Hi Tom,

    1) Have a look at this dictionary entry.
    2) Sure, there can be INDELs that are much larger than one base. HaplotypeCaller can detect INDELs up to roughly one read length; if you are interested in larger INDELs, you should use a structural variant caller.

    -Sheila

  • tc13 · Cambridge, UK · Member

    Hi Sheila,

    I originally ran only --selectTypeToExclude INDEL; also including --selectTypeToExclude MIXED --selectTypeToExclude SYMBOLIC produced a VCF with only SNPs.

    Thanks,
    Tom

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @tc13
    Hi Tom,

    I am glad that worked for you.
    Thanks for posting. This should help others in the future :)

    -Sheila

  • CNBers · Member

    @Sheila said:
    @tc13
    Hi again,

    Geraldine just let me know I misunderstood your question! I thought you were asking why INDELs are larger than one base. Sorry for the confusion.

    I suspect the SelectVariants tool is including the * allele as a "SNP" site. What was the exact command you ran?

    You can try using --selectTypeToExclude. I think if you add --selectTypeToExclude INDEL --selectTypeToExclude MIXED --selectTypeToExclude SYMBOLIC you will get only SNPs. Let us know if that is not the case.

    -Sheila

    Dear Sheila,

    I want to remove the * allele, so I used --selectTypeToExclude INDEL --selectTypeToExclude MIXED --selectTypeToExclude SYMBOLIC.

    But the * allele is still in the output file.

    What should I do? I am using GATK 3.8.

    Best

  • bhanuGandham · Member, Administrator, Broadie, Moderator, admin

    Hi CNBers,

    1) Filtering of the * allele was fixed in GATK version 4.0.9.0. Upgrading to that version should resolve this issue.
    2) If you want to drop sites where the * allele is the only ALT, run SelectVariants with --exclude-non-variants.

    Please refer to this documentation for more information: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.9.0/org_broadinstitute_hellbender_tools_walkers_variantutils_SelectVariants.php

    Please let me know if this helps.

    Regards
    Bhanu Gandham
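The two approaches discussed in this thread, excluding sites by variant type and dropping sites whose only ALT is `*`, can be sketched in plain Python on VCF text lines. This is a hypothetical, simplified illustration under stated assumptions, not how SelectVariants implements `--selectTypeToExclude` or `--exclude-non-variants`: the function names are made up, each ALT allele is typed only by comparing its length with REF (no trimming), and a site whose ALT alleles disagree is called MIXED. ALT is read from the fifth tab-separated column.

```python
from typing import Iterable, Iterator


def allele_type(ref: str, alt: str) -> str:
    """Type one ALT allele against REF (simplified rules, not GATK's code)."""
    if alt == "*":
        return "SPANNING_DELETION"  # allele missing because of an upstream deletion
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SYMBOLIC"  # e.g. <DEL>, <NON_REF>, or breakend notation
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def site_type(ref: str, alts: list[str]) -> str:
    """A site takes its ALT alleles' type if they all agree, otherwise MIXED."""
    types = {allele_type(ref, alt) for alt in alts}
    return types.pop() if len(types) == 1 else "MIXED"


def drop_star_only_sites(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping records with no ALT other than '*' or '.'.

    Header lines (starting with '#') pass through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        alts = line.rstrip("\n").split("\t")[4].split(",")
        if any(a not in ("*", ".") for a in alts):
            yield line


if __name__ == "__main__":
    # Hypothetical records: a deletion at 100 spans positions 101-103.
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tTTTA\tT\t50\tPASS\t.\n",
        "1\t102\t.\tT\tG,*\t50\tPASS\t.\n",
        "1\t103\t.\tA\t*\t50\tPASS\t.\n",
    ]
    for line in drop_star_only_sites(example):
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        print(fields[1], site_type(fields[3], fields[4].split(",")))
    # 100 INDEL
    # 102 MIXED   (position 103, whose only ALT is '*', was dropped)
```

In this sketch `*` counts as its own type, so a site such as `G,*` comes out MIXED. GATK's own handling of `*` has differed between versions (see the 4.0.9.0 note above), so check the SelectVariants documentation for the version you run rather than relying on this sketch.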
