
CombineVariants error unexpected bases in allele bases '*T'

horacio_gacevedo Little Rock, AR Member

Hi there,
I ran the following command
java -jar $gatk_jar -T CombineVariants -R ${hs37d5} \
--variant ${name1} --variant ${name2} \
-o combined.vcf \
-genotypeMergeOptions UNIQUIFY
I followed the best practices and added 30 BAM samples from the 1000g to my 4 samples. I also tried the b37 fasta, but the following error persists.
I've seen other threads with a similar error, but not for CombineVariants. Could it be a persistent bug?



ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Unexpected base in allele bases '*T'
at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:162)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:234)
at htsjdk.variant.variantcontext.Allele.extend(Allele.java:253)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.createAlleleMapping(GATKVariantContextUtils.java:1303)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.resolveIncompatibleAlleles(GATKVariantContextUtils.java:1268)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:1008)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:339)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Unexpected base in allele bases '*T'
ERROR ------------------------------------------------------------------------------------------
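One way to narrow this down is to list the records that could produce an allele string like '*T'. The following is only a sketch, assuming the offending records carry a literal `*` in the ALT column; the error message suggests this, but nothing in the thread confirms it. The function name `find_star_alleles` is hypothetical.

```shell
# Hypothetical helper (not part of GATK): print CHROM, POS, REF and ALT for
# every record whose ALT column contains a literal '*' character.
# Assumes an uncompressed, tab-delimited VCF passed as the first argument.
find_star_alleles() {
    awk -F'\t' '!/^#/ && index($5, "*") { print $1, $2, $4, $5 }' OFS='\t' "$1"
}
```

Running `find_star_alleles combined_input.vcf` on each input file would show whether any such records are present before merging.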


  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭
  • horacio_gacevedo Little Rock, AR Member


    I ran ValidateVariants, and a few records had particularly long substitutions (thank you for the tip). After using awk to restrict the VCF file to substitutions of less than 10 bp, I reran ValidateVariants and my files were fine. But the problem with CombineVariants still persists :(
    Any other suggestions?


  • horacio_gacevedo Little Rock, AR Member

    Hello again,

    I finally found it! The key was to eliminate awkward calls where either the REF or the ALT field is too long, like this one (first 5 fields only):


    I wrote this one-liner in case someone has the same problem (I arbitrarily picked 10 bp, but someone probably has a better-informed cutoff):

    awk -F'\t' '/^#/ || (length($4) <= 10 && length($5) <= 10)' foo.vcf > foo.clean.vcf
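A possible variant of the length filter above, offered as a sketch rather than a tested fix: the ALT column can hold several comma-separated alleles, so measuring the whole column also counts the commas and the other alleles. This version checks each ALT allele on its own and takes the cutoff as a parameter. The function name `filter_long_alleles` and the default of 10 are assumptions carried over from the one-liner, not a recommended threshold.

```shell
# Hypothetical helper: keep header lines plus records whose REF and every
# individual ALT allele are at most MAXLEN characters long (default 10).
# Usage: filter_long_alleles input.vcf [MAXLEN] > output.vcf
filter_long_alleles() {
    awk -F'\t' -v max="${2:-10}" '
        /^#/ { print; next }                 # pass header lines through
        {
            if (length($4) > max) next       # REF too long: drop the record
            n = split($5, alt, ",")          # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (length(alt[i]) > max) next
            print
        }' "$1"
}
```

For example, `filter_long_alleles foo.vcf 10 > foo.clean.vcf` would write the same kind of cleaned file as the one-liner, while judging multi-allelic records allele by allele.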

    Thank you Sheila ;) , you were right on the money


  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭

    Hi Horacio,

    I am happy to hear you fixed the issue! Thanks for reporting your solution.


  • nroak Houston Member ✭✭

    @horacio_gacevedo Thank you so much for this solution! I ran into similar issues and it did the trick!

  • Stephanie_M Broad Institute Member

    @Sheila: I am still having this issue (it is ERROR MESSAGE: Unexpected base in allele bases '*C'). I have tried ValidateVariants and @horacio_gacevedo's one-liner, but I am still getting this error message. Of note, it does not occur every time I combine variants.

    For example, I will combine vcf_one and vcf_two to get vcf_A, and I will not get any error message. I will also combine vcf_three and vcf_four to get vcf_B, and I will not get any error message. But when I combine vcf_A and vcf_B, I get this error message.

    Any thoughts?

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭


    Can you please post the exact command you are running and the version of GATK you are using? Please also post the full log output and error message. Did you produce the VCFs with the same version of GATK?

