HaplotypeCaller: Alleles for a VariantContext must contain at least one reference allele

Kurt · Member ✭✭✭

Howdy. I'm trying the 7-12 nightly, which is meant to fix the HashMap iterator issue in http://gatkforums.broadinstitute.org/gatk/discussion/comment/30982#Comment_30982

However, when running HaplotypeCaller, a new issue cropped up that I haven't found on this forum yet (apologies if it has already been addressed).

Below is the stack trace:

ERROR --
ERROR stack trace

java.lang.IllegalArgumentException: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]
at htsjdk.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1509)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:392)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:488)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:306)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-07-12-gaa9ac69):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]
ERROR ------------------------------------------------------------------------------------------
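For readers unfamiliar with this exception: the allele list handed to a VariantContext has to include an allele flagged as the reference, and the error above reports a list in which none of the alleles carries that flag. The following is a minimal, hypothetical Java sketch of that invariant only; `SimpleAllele` and `requireReferenceAllele` are illustrative names, not htsjdk's actual classes or methods.

```java
import java.util.List;

/** Hypothetical sketch of the "at least one reference allele" check; not htsjdk's implementation. */
public class RefAlleleCheck {

    /** Illustrative allele type (assumption): the bases plus a flag marking the reference allele. */
    public static final class SimpleAllele {
        public final String bases;
        public final boolean isReference;

        public SimpleAllele(String bases, boolean isReference) {
            this.bases = bases;
            this.isReference = isReference;
        }

        @Override
        public String toString() {
            return bases;
        }
    }

    /** Throws IllegalArgumentException when no allele in the list is flagged as reference. */
    public static void requireReferenceAllele(List<SimpleAllele> alleles) {
        for (SimpleAllele allele : alleles) {
            if (allele.isReference) {
                return; // invariant satisfied: a reference allele is present
            }
        }
        // No reference allele found: report the offending list, as in the log above.
        throw new IllegalArgumentException(
                "Alleles for a VariantContext must contain at least one reference allele: " + alleles);
    }
}
```

In this sketch a list such as `[CT, CTT]` with no reference flag fails immediately, which mirrors the shape of the message in the log, where only alternate-looking haplotypes are listed.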

Below is the command line. The data was processed with this nightly following the GATK Best Practices, except that I also ran the local realignment step, which has been deprecated. The aligner was bwa mem -M (v0.7.8), run against the GRCh37 FASTA with decoy sequences from the GATK 2.8 resource bundle.

/isilon/sequencing/Kurt/Programs/Java/jdk1.8.0_73/bin/java -jar \
    /isilon/sequencing/CIDRSeqSuiteSoftware/gatk/GATK_3/GenomeAnalysisTK-nightly-2016-07-12-gaa9ac69/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /isilon/sequencing/GATK_resource_bundle/bwa_mem_0.7.5a_ref/human_g1k_v37_decoy.fasta \
    --input_file /isilon/sequencing/Seq_Proj/CGC_CONTROL_DATA_SET_3_6/BAM/NA12891_NA12892_90-10.bam \
    -L /isilon/sequencing/data/Work/BED/Production_BED_files/ALLBED_BED_File_Agilent_ClinicalExome_S06588914_ALLBed_merged_021015_noCHR.bed \
    --emitRefConfidence BP_RESOLUTION \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    --max_alternate_alleles 3 \
    --annotation AS_BaseQualityRankSumTest \
    --annotation AS_FisherStrand \
    --annotation AS_InbreedingCoeff \
    --annotation AS_MappingQualityRankSumTest \
    --annotation AS_RMSMappingQuality \
    --annotation AS_ReadPosRankSumTest \
    --annotation AS_StrandOddsRatio \
    --annotation FractionInformativeReads \
    --annotation StrandBiasBySample \
    --annotation StrandAlleleCountsBySample \
    --annotation GCContent \
    --annotation AlleleBalanceBySample \
    --annotation AlleleBalance \
    --annotation LikelihoodRankSumTest \
    -pairHMM VECTOR_LOGLESS_CACHING \
    -o /isilon/sequencing/Seq_Proj/CGC_CONTROL_DATA_SET_3_6/GVCF/NA12891_NA12892_90-10.g.vcf.gz

The input sample is a 90/10 mix of NA12891 and NA12892 (exome), but the error has also occurred for a "regular" sample. This was out of roughly 60 exomes. The BED file covers roughly 90 Mb.

This may be related: the log line directly preceding the error stack trace involved a symbolic allele for a deletion.

WARN 15:28:37,770 HaplotypeCallerGenotypingEngine - location 1:15714831: too many alternative alleles found (43) larger than the maximum requested with -maxAltAlleles (3), the following will be dropped: CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, C*, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCC... and 32 more.

Another sample showed the same pattern (below). It crashed at a different location, but again directly after this line, and I didn't see any other warnings involving a symbolic deletion allele for either sample earlier in the process logs.

WARN 20:42:19,954 HaplotypeCallerGenotypingEngine - location 16:11537347: too many alternative alleles found (43) larger than the maximum requested with -maxAltAlleles (3), the following will be dropped: A*, AAAAAGGGGGAGAGAGAG, AAAGGGGGAGAG, G, AAAAAAGAGGGAG, AAAGAGGGAG, AAAAAGAGGGAGAG, AAAAAAGGGGGAGAGAGAG, AAAAGGGAGAGAG, AAAAAGGGAGAGAGAGAG, AAGGGGGAGAG, AAAAAGGGAGAG, AAAAAAGGGAGAGAGAGAG, AAGGGAGAG, AAGAGGGAGAGAGAG, AAAGAGGGAGAGAGAG, AGAGGGAGAGAGAG, AAAAGAGGGAGAGAGAG, AAAAGAGGGAGAG, AAAAAGAGGGAGAGAGAG, AGGGGGAGAG, AAAAAGAGGGAG, AGGGAGAGAGAG, AAAAAAGAGGGAGAGAGAG, AAGAGGGAG, AAAAGGGAGAG, AAAGAGGGAGAG, AGGGGGAGAGAGAG, AAAAAAGGGGGAGAG, AAGGGGGAGAGAGAG, AGGGAGAG, AGGGAG, AAAAGAGGGAG, AAAGGGGGAGAGAGAG and 6 more.

The next entry was the same error as the sample above.
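Both warnings above come from the step that caps the number of alternate alleles at -maxAltAlleles before genotyping. As a purely illustrative sketch (the thread does not show GATK's actual prioritization code), the following Java keeps the reference allele unconditionally and retains only the highest-scoring alternates. `Candidate`, its `score` field, and `prune` are hypothetical names, and the score stands in for whatever ranking the real engine uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: cap alternate alleles while always keeping the reference allele. */
public class AltAllelePruning {

    /** Illustrative candidate allele with an assumed ranking score (higher = more support). */
    public static final class Candidate {
        public final String bases;
        public final boolean isReference;
        public final double score;

        public Candidate(String bases, boolean isReference, double score) {
            this.bases = bases;
            this.isReference = isReference;
            this.score = score;
        }
    }

    /**
     * Returns the reference allele(s) followed by at most maxAltAlleles alternates,
     * highest score first. The reference is never ranked, so it can never be dropped.
     */
    public static List<Candidate> prune(List<Candidate> candidates, int maxAltAlleles) {
        List<Candidate> kept = new ArrayList<>();
        List<Candidate> alts = new ArrayList<>();
        for (Candidate candidate : candidates) {
            if (candidate.isReference) {
                kept.add(candidate);
            } else {
                alts.add(candidate);
            }
        }
        if (kept.isEmpty()) {
            throw new IllegalArgumentException("no reference allele among candidates");
        }
        // Rank alternates by descending score and keep only the top maxAltAlleles.
        alts.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        int limit = Math.max(0, Math.min(maxAltAlleles, alts.size()));
        kept.addAll(alts.subList(0, limit));
        return kept;
    }
}
```

The design choice that matters in this sketch is keeping the reference outside the ranked list: if it were ranked alongside dozens of well-supported alternates, it could be pushed out, leaving an allele list with no reference, which is the condition the exception in this thread reports.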

Oddly enough, one process returned a non-zero exit status even though it appears to have completed successfully. It did display a message saying there were a large number of warnings (2360), and it redisplayed the first 10. I need to look at that one again to make sure the non-zero exit status didn't come from somewhere else in the script. Apologies in advance if I screwed something up; I should be sleeping, but wanted to get this out before I get swamped tomorrow.

Best Regards,

Kurt Hetrick

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Well this is new and interesting -- but I suspect I know where it comes from. We recently changed some of the logic that prioritizes alleles to avoid getting bogged down at this kind of messy site, and it looks like a bug crept in. Could you please submit a bug report with test files so we can weed this out?

  • Kurt · Member ✭✭✭

    @Geraldine_VdAuwera

    Bug report submitted. 12Julynightly.HC.Bug.KurtHetrick.tar.gz

    Let me know if you need anything else.

    Thanks!

    GitHub issue #1088 by Sheila · State: closed · Closed by: chandrans
  • Kurt · Member ✭✭✭

    Also, the run that returned a non-zero exit status despite seeming successful was not actually a good run: the non-zero exit status was caused by the same bug.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Great, thanks Kurt. We'll try to get this fixed asap.

  • Sheila · Broad Institute · Member, Broadie admin

    @Kurt
    Hi Kurt,

    I was able to reproduce this, and I'm about to submit a bug report.

    We will try to get this fixed asap. You can keep track of the issue here.

    -Sheila

    GitHub issue #1448 by Sheila · State: closed · Closed by: SHuang-Broad
  • Kurt · Member ✭✭✭
  • Sheila · Broad Institute · Member, Broadie admin

    @Kurt
    Hi Kurt,

    Can you confirm this data is not confidential, i.e. that we can use the snippet in tests that will be public in GATK4?

    Thanks,
    Sheila

  • Kurt · Member ✭✭✭

    @Sheila

    Hi Sheila,

    I just asked, and I don't foresee any problems with this data. I should be able to let you know in a little while.

    Kurt

  • Kurt · Member ✭✭✭

    @Sheila

    Hi Sheila,

    I've confirmed that the data is not confidential and you can use the snippet in tests that will be public in GATK4.

    Kurt

  • Sheila · Broad Institute · Member, Broadie admin

    @Kurt
    Hi Kurt,

    Wonderful news! Thank you for getting back so fast.

    -Sheila

  • csardas · Member

    Hi all, I'm using version nightly-2016-08-25-g15eb3ae for the same reason and got the same bug. What should I do about this? Should I use GATK4, or just wait for another nightly build?

  • Sheila · Broad Institute · Member, Broadie admin

    @csardas
    Hi.

    There will be a fix in the nightly builds soon. The developer is on it. You can keep track of the bug here.

    -Sheila

  • nilshomer · Boston, MA · Member

    @Sheila I am hitting the same bug and it is blocking some urgent work. Can you let us know when you think it will be fixed?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @nilshomer There's a PR under review, so it shouldn't be long now. We have our weekly bug fixing status meeting tomorrow and should be able to give you a better ETA following that.

  • nilshomer · Boston, MA · Member

    @Geraldine_VdAuwera It would be great if it could go out in the next few days. This is a critical bug, and the previous release (3.6.0) also has a critical bug: http://gatkforums.broadinstitute.org/gatk/discussion/7756/hashmap-iterator-problem-with-gatk-3-6-on-na12878-validations#latest. Unfortunately, I have hit both on the same dataset.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Will let you know when we have a better ETA. See the other thread for my response re: the possibility of a release.

  • Sheila · Broad Institute · Member, Broadie admin

    @nilshomer
    Hey Nils,

    The fix should be in by the end of the week, if not earlier :smiley:

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @nilshomer @csardas @Kurt
    Hi everyone,

    The fix is in the latest nightly build!! :smile:

    -Sheila

  • nilshomer · Boston, MA · Member
    edited September 2016
  • Kurt · Member ✭✭✭
  • mscjulia · United States · Member

    @Geraldine_VdAuwera said:
    Will let you know when we have a better ETA. See the other thread for my response re: the possibility of a release.

    Thanks for the information in this thread; I'm facing the same problem. I just noticed that this bug is marked "closed" in the issue tracker. Does that mean I can download the newest nightly build and use it? Thanks a lot.

  • Sheila · Broad Institute · Member, Broadie admin

    @mscjulia
    Hi,

    Indeed, you can try the latest nightly! :smiley:

    -Sheila

  • Kurt · Member ✭✭✭

    @Sheila

    It did not fix the problem. Judging by the stack trace, this should have failed on the snippet that I sent you. This is the Sept 13 nightly.

    `WARN 19:11:44,660 HaplotypeCallerGenotypingEngine - location 1:15714831: too many alternative alleles found (43) larger than the maximum requested with -maxAltAlleles (3), the following will be dropped: CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, C*, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCC... and 32 more.

    ERROR --
    ERROR stack trace

    java.lang.IllegalArgumentException: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]
    at htsjdk.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1509)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:392)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:488)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:306)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:962)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:250)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-09-13-gb43f5e1):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]

    ERROR ------------------------------------------------------------------------------------------`

    Also, unrelated to this particular thread (I guess it would go under user feedback): I noticed that you have gotten rid of --standard_min_confidence_threshold_for_emitting in GenotypeGVCFs. I do remember you saying that you were thinking about removing it because it confused some (maybe a lot of) people... I really liked the old way (aside from the fact that removing it from the engine crashed my pipeline).

  • Kurt · Member ✭✭✭

    @Sheila Ok, will do.

  • Kurt · Member ✭✭✭

    @Sheila

    It got past the snippet. I should know in the next few hours whether it gets through all of the other samples.

    Thanks!

  • Kurt · Member ✭✭✭

    @Sheila

    Got through all of the samples without any errors :smile:

    Thanks!

    Kurt

  • Sheila · Broad Institute · Member, Broadie admin

    @Kurt
    Hi Kurt,

    Thank you for confirming the fix! I will let the developers know :smile:

    -Sheila
