HaplotypeCaller java.lang.IllegalArgumentException: Unexpected base in allele bases

bwubb Member ✭✭
edited October 2018 in Ask the GATK team

Greetings,

I am receiving an error message very similar to https://github.com/broadinstitute/gatk/issues/4525 when attempting to run GATK 4.0.10.1 HaplotypeCaller in GGA (GENOTYPE_GIVEN_ALLELES) mode. My intent was simple: take the het alleles from one sample and genotype them in another.

gatk HaplotypeCaller -R ~/resources/Genomes/Human/GRCh37/human_g1k_v37.fasta -L data/work/TB5050/S0760415/gatk/haplotype_caller.het_sites.vcf.gz -I bam_input/final/TB5050-T1/GRCh37/TB5050-T1.ready.bam -O data/work/TB5050-T1/S0760415/gatk/germline_het_sites.vcf.gz --genotyping-mode GENOTYPE_GIVEN_ALLELES --alleles data/work/TB5050/S0760415/gatk/haplotype_caller.het_sites.vcf.gz

...

11:18:47.862 INFO  HaplotypeCaller - Shutting down engine
[October 25, 2018 11:18:47 AM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=2076049408
java.lang.IllegalArgumentException: Unexpected base in allele bases 'GGCAGGCGGAGGTTGCGGTGAGCCAGGATCGCGCCACTGCACTCCAGCCGGGGCAAAAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAGC*GGGGGCGGTTTCAGGGATAAAAGTGGGGAATCCTCGGAGCTTTTCCAGCCGGCCCTCCCGGTCGCCCTTTGCAGTGCTTGGCGCCCCTGTGCCGGCCTTC'
        at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:165)
        at org.broadinstitute.hellbender.utils.haplotype.Haplotype.<init>(Haplotype.java:40)
        at org.broadinstitute.hellbender.utils.haplotype.Haplotype.<init>(Haplotype.java:49)
        at org.broadinstitute.hellbender.utils.haplotype.Haplotype.insertAllele(Haplotype.java:209)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.composeGivenHaplotypes(ReadThreadingAssembler.java:180)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:116)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:259)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:538)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:240)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

I am not sure whether this is still a bug. It's quite possible my VCF is not suitable for GGA, although it validates just fine. I think it breaks at the first * seen in ALT. Is this an event that cannot be genotyped in GGA? I do not know what the command would be to remove those; I can't seem to get SelectVariants or vcftools to do it (time to awk, I guess).

The only thing left for me to try is upgrading to 4.0.11.0, which I only noticed this morning. Any comments or advice would be greatly appreciated. Thank you.

-bwubb
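The question above asks how to drop the `*` alleles when SelectVariants and vcftools don't cooperate. Below is a minimal Python sketch of that awk-style approach; it is not a GATK feature. It assumes an uncompressed VCF as input (for a `.vcf.gz`, open it with `gzip.open(path, "rt")`), removes every `*` from the ALT column, drops records left with no ALT allele, and writes sites-only records (first eight columns, INFO set to `.`), because sample genotypes and per-allele INFO values would no longer line up with the shortened ALT list. The function name `strip_star_alleles` is hypothetical.

```python
from typing import Iterable, Iterator

SPANNING_DELETION = "*"  # VCF symbol for an allele overlapped by an upstream deletion


def strip_star_alleles(lines: Iterable[str]) -> Iterator[str]:
    """Yield a sites-only VCF with every '*' removed from the ALT column.

    Records whose only ALT allele was '*' are dropped entirely.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            yield "\t".join(line.split("\t")[:8])  # drop FORMAT and sample columns
        elif line:
            fields = line.split("\t")
            alts = [a for a in fields[4].split(",") if a != SPANNING_DELETION]
            if not alts:
                continue  # nothing left to genotype at this site
            fields[4] = ",".join(alts)
            fields[7] = "."  # Number=A/R INFO values would no longer match ALT
            yield "\t".join(fields[:8])
```

To produce a file, iterate over the yielded lines and write each with a trailing newline. The output may still need to be compressed and indexed before GATK accepts it as `--alleles` and `-L` input.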

Answers

  • bwubb Member ✭✭

    @bwubb said:
    ...I think it breaks at the first * seen in ALT. Is this an event that cannot be genotyped in GGA? I do not know what the command would be to remove those; I can't seem to get SelectVariants or vcftools to do it (time to awk, I guess).
    ...

    Ugh, I posted the solution in my own question. Removing the * alleles got me further, but then I hit:

    java.lang.IllegalStateException: Allele in genotype CCA* not in the variant context [A*, *, C]
    

    That is preposterous: I don't see CCA* or anything like it anymore. All the multiallelic sites have been split and every * removed.

  • bwubb Member ✭✭

    Stepping further back, I believe I was able to produce an alleles file that works. It is still unclear what misstep I took in going from a jointly called multisample VCF to a single-sample VCF of het SNPs/indels. Perhaps I didn't use the -TYPE snps and indels flags when I thought I did? I'm going to mark this as answered; perhaps it will serve as a reference for others.

  • bwubb Member ✭✭
    Accepted Answer

    All instances of '*' should be excluded from your --alleles VCF file.
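The thread never pins down where the step from a jointly called multisample VCF to a single-sample het-sites file went wrong. As an illustration only, the Python sketch below shows one way to build such an alleles file directly: for one sample (the name passed in is an assumption), it keeps calls whose GT holds two different called alleles, emits one biallelic sites-only record per non-reference ALT allele that sample carries, and never emits `*`. It does not left-trim or otherwise normalize the split records, so REF may be longer than the minimal representation (for example REF `CCA` with ALT `CA`). Note also that in htsjdk error messages a trailing `*`, as in `CCA*` or `A*`, marks the reference allele rather than a spanning deletion. The function name `het_sites_for_sample` is hypothetical.

```python
import re
from typing import Iterable, Iterator

SPANNING_DELETION = "*"


def het_sites_for_sample(lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield sites-only, biallelic VCF lines for sites where `sample` is heterozygous."""
    sample_col = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # raises ValueError if the sample is absent
            yield "\t".join(fields[:8])  # sites-only header
            continue
        if not line or sample_col is None:
            continue
        fmt_keys = fields[8].split(":")
        values = fields[sample_col].split(":")
        if "GT" not in fmt_keys or fmt_keys.index("GT") >= len(values):
            continue
        gt = values[fmt_keys.index("GT")]
        # Split phased (|) or unphased (/) genotypes and ignore no-calls.
        called = [a for a in re.split(r"[/|]", gt) if a != "."]
        if len(called) < 2 or len(set(called)) < 2:
            continue  # hom-ref, hom-alt, haploid, or partially missing call
        alts = fields[4].split(",")
        for idx in sorted({int(a) for a in called if a != "0"}):
            allele = alts[idx - 1]
            if allele == SPANNING_DELETION:
                continue  # '*' must not reach the --alleles file
            record = fields[:8]
            record[4] = allele
            record[7] = "."  # per-allele INFO no longer matches after splitting
            yield "\t".join(record)
```

Emitting one record per ALT allele the sample actually carries, instead of copying the joint-called multiallelic record, keeps alleles seen only in other samples out of the alleles file.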