
CombineVariants error: Unexpected base in allele bases '*T'

Hi there,
I ran the following command
java -jar $gatk_jar -T CombineVariants -R ${hs37d5} \
--variant ${name1} --variant ${name2} \
-o combined.vcf \
-genotypeMergeOptions UNIQUIFY
I followed the Best Practices and added 30 BAM samples from 1000 Genomes (1000g) to my 4 samples. I also tried the b37 FASTA, but the error below persists.
I've seen other threads with a similar error, but not for CombineVariants. Could it be a lingering bug?

Thanks

Horacio

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Unexpected base in allele bases '*T'
at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:162)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:234)
at htsjdk.variant.variantcontext.Allele.extend(Allele.java:253)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.createAlleleMapping(GATKVariantContextUtils.java:1303)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.resolveIncompatibleAlleles(GATKVariantContextUtils.java:1268)
at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:1008)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:339)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Unexpected base in allele bases '*T'
ERROR ------------------------------------------------------------------------------------------

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)
  • horacio_gacevedo (Little Rock, AR; Member)

    Hi,

    I ran ValidateVariants, and there were a few records with particularly long substitutions (thank you for the tip). However, after using awk to restrict the VCF file to substitutions of less than 10 bp, I reran ValidateVariants and my files were fine. But the problem with CombineVariants still persists :(
    Any other suggestions?
    Thanks

    H

  • horacio_gacevedo (Little Rock, AR; Member)

    Hello again,

    I finally found it! The key was to eliminate awkward calls where either the REF or the ALT field is too long, like this one (first 5 fields only):

    1 152080947 . ACGGAATTTTCTGTCACGCTCTTGGCGGCTCAGCTGCTGTTCCTCCCTCTCCTGGCGCAGCTGTTCCTCCTCGCGGAATTTTCTGTCACGGTCCTGACGCCGCTGTTGCCCGCGCTCCTGGCGGCGCAGCTGCTGTTCCTCCTGGAGGAATTTTCTCTGCCGTTGCTGGCGGTGCAGCTGCTGTTCCTCCTCGAGGAATTTTCTCTCTGGTTCCTGACTGCGCAGTTCCTGTTCG A

    I wrote this one-liner in case someone has the same problem (I arbitrarily picked 10 bp, but someone probably has a better-informed cutoff):

    awk -F'\t' '/^#/ || (length($4) <= 10 && length($5) <= 10)' foo.vcf > foo.clean.vcf

    Thank you Sheila ;) , you were right on the money

    Horacio
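
The length filter above can also be sketched as a small shell function, so that the cutoff becomes a parameter and each comma-separated ALT allele is checked on its own rather than measuring the whole ALT field. The function name `filter_long_alleles` and its arguments are hypothetical, and the default of 10 bp is the same arbitrary cutoff used in the post, not a recommended value.

```shell
# Hypothetical helper (not part of GATK): keep header lines and records whose
# REF and every ALT allele are at most max_len bases long.
# Usage: filter_long_alleles <vcf> [max_len]   (max_len defaults to 10)
filter_long_alleles() {
    vcf="$1"
    max_len="${2:-10}"
    awk -F'\t' -v max="$max_len" '
        /^#/ { print; next }                # pass header lines through unchanged
        {
            keep = (length($4) <= max)      # column 4 is REF
            n = split($5, alt, ",")         # column 5 is ALT, possibly multi-allelic
            for (i = 1; i <= n; i++)
                if (length(alt[i]) > max) keep = 0
            if (keep) print
        }
    ' "$vcf"
}
```

For example, `filter_long_alleles foo.vcf 10 > foo.clean.vcf` writes the filtered file. Records removed this way are absent from the merged output, so it is worth checking what is being dropped before relying on the result.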

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @horacio_gacevedo
    Hi Horacio,

    I am happy to hear you fixed the issue! Thanks for reporting your solution.

    -Sheila

  • nroak (Houston; Member)

    @horacio_gacevedo Thank you so much for this solution! I ran into similar issues and it did the trick!

  • Stephanie_M (Broad Institute; Member)

    @Sheila: I am still having this issue (the message is ERROR MESSAGE: Unexpected base in allele bases '*C'). I have tried ValidateVariants and @horacio_gacevedo's one-liner, but I am still getting this error message. Of note, it does not occur every time I combine variants.

    For example, I will combine vcf_one and vcf_two to get vcf_A, and I will not get any error message. I will also combine vcf_three and vcf_four to get vcf_B, and I will not get any error message. But when I combine vcf_A and vcf_B, I get this error message.

    Any thoughts?

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Stephanie_M
    Hi,

    Can you please post the exact command you are running and the version of GATK you are using? Please also post the full log output and error message. Did you produce the VCFs with the same version of GATK?

    Thanks,
    Sheila
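
One detail in both error messages may help narrow this down: the rejected allele strings ('*T', '*C') begin with `*`, and the stack trace fails inside Allele.extend. In the VCF 4.2 specification, `*` in the ALT column marks an allele that is missing because of an upstream deletion. As a diagnostic only, and assuming (this is not confirmed anywhere in the thread) that such records are involved, the sketch below lists the records in a VCF whose ALT column contains `*`. The function name `list_star_alleles` is hypothetical.

```shell
# Hypothetical diagnostic helper (not part of GATK): print CHROM, POS, REF and
# ALT, tab-separated, for every record whose ALT column contains the '*' allele.
# Usage: list_star_alleles <vcf>
list_star_alleles() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                       # skip header lines
        {
            n = split($5, alt, ",")         # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (alt[i] == "*") { print $1, $2, $4, $5; break }
        }
    ' "$1"
}
```

Running it on each input before merging, for instance on the intermediate files called vcf_A and vcf_B above, gives a list of positions to compare against the site where CombineVariants stops.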
