HaplotypeCaller error: java.lang.IllegalArgumentException: Unexpected base in allele bases

I recently ran the HaplotypeCaller on your NA12878 bam file from the 1000 Genomes technical site for each chromosome separately. For most of the chromosomes it worked properly, but I got the error below partway through chromosome 3. I've tried to attach the full log file, though I'm not sure if it worked. Any ideas? I ran the following options with GATK 2.3-4:
-T HaplotypeCaller -R /home/justin.zook/references/hs37d5.fa -I /scratch/justin.zook/NA12878/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam -L 3:1-198022430 --minPruning 3 -stand_call_conf 2 -stand_emit_conf 2 -dcov 100 -o /scratch/justin.zook/NA12878/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117_3.vcf

Thanks!
Justin Zook

INFO 05:52:45,120 ProgressMeter - 3:60833792 6.08e+07 33.4 h 32.9 m 30.7% 4.5 d 75.3 h
INFO 05:53:45,138 ProgressMeter - 3:60836458 6.08e+07 33.4 h 32.9 m 30.7% 4.5 d 75.3 h

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Unexpected base in allele bases 'TACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTA'
at org.broadinstitute.sting.utils.variantcontext.Allele.<init>(Allele.java:115)
at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:167)
at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:291)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:129)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:101)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:405)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:285)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:230)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:205)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:131)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:74)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-4-g57ea19f):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Unexpected base in allele bases 'TACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTA'
ERROR ------------------------------------------------------------------------------------------

Answers

  • jzook (Member)

    Ah, you're right. It's strange that the 1000 Genomes Phase2 reference genome has an M in chromosome 3...

    Thanks,
    Justin
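For readers who want to check whether their own reference contains such bases, here is a minimal Python sketch (not part of GATK) that scans a FASTA file and reports every base outside a chosen alphabet. The function name `find_nonstandard_bases` is hypothetical, and treating A, C, G, T and N as the "expected" set is an assumption; adjust it to whatever your tools accept.

```python
import io
from typing import Iterator, TextIO, Tuple

# Assumption: these are the bases downstream tools will accept.
STANDARD_BASES = frozenset("ACGTN")


def find_nonstandard_bases(
    fasta: TextIO, allowed: frozenset = STANDARD_BASES
) -> Iterator[Tuple[str, int, str]]:
    """Yield (contig, 1-based position, base) for each base not in `allowed`."""
    contig = ""
    pos = 0
    for line in fasta:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New contig: keep the first word of the header, restart counting.
            fields = line[1:].split()
            contig = fields[0] if fields else ""
            pos = 0
            continue
        for base in line.upper():
            pos += 1
            if base not in allowed:
                yield contig, pos, base


# Tiny in-memory demo; for a real file, pass open("reference.fasta") instead.
demo = io.StringIO(">3 demo\nACGT\nGCTGMCTT\n")
for contig, pos, base in find_nonstandard_bases(demo):
    print(f"{contig}\t{pos}\t{base}")  # prints: 3	9	M
```

M is the IUPAC ambiguity code for "A or C", so a hit like this points at an ambiguous position in the reference rather than a corrupted file.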

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi Justin,

    Just to confirm: do you get this error with the b37 reference and NA12878 bam files from our resource bundle? I'd like to reproduce this error locally so that I can test my fix for it.

  • jzook (Member)

    Hi Eric,

    I haven't had a chance to try it with the b37 reference yet, but I suspect it will work since the 1000 Genomes reference has an M in it. I'll try it in the next week or so.

  • I encounter the same error when running HaplotypeCaller on Picard-validated bam files using the b37 reference in the GATK resource bundle with GATK v2.3-4-g57ea19f. My command is:
    -R resources/BroadInstitute/bundle_1.5/b37/human_g1k_v37.fasta \
    -T HaplotypeCaller \
    -L 3:60800001-60830896 \
    -I sample1.recal.bam -I sample2.recal.bam [...] -I sample36.recal.bam \
    -dcov 1200 -o samples.3:60800001-60830896.raw.snps.indels.vcf

    There was no problem running UnifiedGenotyper on the same data. The stack trace is pasted below. I've also run it on a narrower interval with the debug option, and I'd be happy to paste that output as well if it helps.

    Any help would be greatly appreciated.

    Thanks,

    Paige

    INFO 21:39:36,182 ProgressMeter - 3:60833792 8.17e+05 71.5 m 87.5 m 16.7% 7.1 h 6.0 h
    INFO 21:40:06,183 ProgressMeter - 3:60833792 8.17e+05 72.0 m 88.1 m 16.7% 7.2 h 6.0 h
    INFO 21:40:36,184 ProgressMeter - 3:60850176 8.34e+05 72.5 m 87.0 m 17.0% 7.1 h 5.9 h
    INFO 21:40:59,241 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Unexpected base in allele bases 'AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC'
    at org.broadinstitute.sting.utils.variantcontext.Allele.<init>(Allele.java:115)
    at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:167)
    at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:291)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:129)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:101)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:405)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:285)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:230)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:205)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:131)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:74)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.3-4-g57ea19f):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Unexpected base in allele bases 'AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC'
    ERROR ------------------------------------------------------------------------------------------
  • Here is the debug output I mentioned:

    INFO 13:55:09,728 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 13:55:09,728 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining

    Assembling 3:60830441-60830625 with 100 reads: (with overlap region = 3:60830376-60830690)
    Found 2 candidate haplotypes to evaluate every read against.
    AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC

    Cigar = 315M

    AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGTGCTACATAGCTGCCTTATTATTCTTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC

    Cigar = 315M

    INFO 13:55:19,800 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Unexpected base in allele bases 'AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC'
    at org.broadinstitute.sting.utils.variantcontext.Allele.<init>(Allele.java:115)
    at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:167)
    at org.broadinstitute.sting.utils.variantcontext.Allele.create(Allele.java:291)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:129)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:101)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:405)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:285)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:230)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:205)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.endTraversal(TraverseActiveRegions.java:294)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:93)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.3-4-g57ea19f):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Unexpected base in allele bases 'AATCTTCCAAACTTACTTGAAATTTTTCTGAATCCCTTTCAAATCAGGACAAGAACTAGAAATGTCTATACAGGTTTAATATGAAGTAAAGAAAATGTTTTTCATTTTCTTGATTTATTTCTGAATTCAGCTTGCTCTTCATTAGCGCTACATAGCTGMCTTATTATTCGTGGTCCCCTATGACCCCCTGATCATTTTCCCTGAGGGTGCATATTTATTCACTAACTATGTTACAATCATGTGATCTGCTGGATTTTTTCTGATAGTCTACTCTAGATTTGTTCTAAATTAATAAATCCCATTATTTTTGGCTTC'
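The debug output above gives enough to pin the offending base to a genomic coordinate: the first haplotype has Cigar 315M and the overlap region 3:60830376-60830690 is also 315 bases long, so each offset in the string should map directly onto the reference. A minimal Python sketch of that arithmetic follows. The helper name `locate_unexpected_bases` is hypothetical, the allowed alphabet is an assumption, and the mapping only holds for an all-match Cigar with no indels.

```python
from typing import List, Tuple


def locate_unexpected_bases(
    haplotype: str, region_start: int, allowed: str = "ACGTN"
) -> List[Tuple[int, str]]:
    """Return (1-based genomic position, base) for bases outside `allowed`.

    Assumes the haplotype aligns to the region with no indels
    (e.g. Cigar = 315M), so offset i corresponds to region_start + i.
    """
    return [
        (region_start + offset, base)
        for offset, base in enumerate(haplotype.upper())
        if base not in allowed
    ]


# Toy example with a short made-up fragment and start coordinate.
# To use it on the real case, paste the full haplotype from the debug
# output and pass the start of the overlap region (60830376).
print(locate_unexpected_bases("GCTGMCTT", region_start=100))  # [(104, 'M')]
```

Comparing the reported coordinate against the reference FASTA at the same position is a quick way to confirm whether the base comes from the reference itself rather than from the reads.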
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Paige,

    Thanks for posting this. Could you please upload a bam snippet of the interval you narrowed the problem down to, so we can test our fix for this problem? See this post for detailed instructions: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • jzook (Member)

    To follow up on this, I did try running the same command with your b37 reference fasta, and I get the same error because it also has an "M" base at the same position. Have you found a fix for this yet? Thanks!

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi Justin,

    We haven't been able to reproduce this problem locally. Can you please give me an exact command line (preferably one that fails quickly and uses files from our resource bundle) so I can debug? Thanks.

  • jzook (Member)

    Hi Eric,

    Sure, here is the command line that should get the error quickly:
    java -jar -Xmx21g $GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R /scratch/justin.zook/references/human_g1k_v37.fasta -I /scratch/justin.zook/NA12878/HSWG/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam -L 3:60820000-60850000 --minPruning 3 -stand_call_conf 2 -stand_emit_conf 2 -dcov 100 -o /scratch/justin.zook/NA12878/HSWG/test3err.vcf

    The reference is from your resource bundle, and the bam file is from your group in the 1000 Genomes technical working directory. Let me know if you need anything else.

    Thanks,
    Justin

  • jzook (Member)

    Also, I've gotten the same error with every NA12878 bam file I've tried, so it probably doesn't matter which bam file you use.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Thanks, Justin. I've implemented a fix for this. Unfortunately, it's a bit complex, so we're going to continue testing it until the next release (2.4), which should be available at the end of the month.
