
Error in Haplotype Caller

khayer · Posts: 1 · Member
edited August 2012 in Ask the GATK team

Hi,

I am trying to run the latest version (GenomeAnalysisTK-2.0-35-g2d70733) of the HaplotypeCaller on some .bam files that I prepared according to Best Practices v3. GATK reports the following error:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: T
at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1328)
at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:304)
at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:518)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:604)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:198)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:414)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:104)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:246)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:202)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:177)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:134)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:27)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.0-35-g2d70733):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: T
ERROR ------------------------------------------------------------------------------------------

I assume my older BAM files are not compatible with the new HaplotypeCaller. Is that correct?

Thank you for your help,
K


Best Answer

  • rpoplin · Posts: 121 · mod
    edited August 2012

    Hi there,

    Glad to hear you are trying out the HaplotypeCaller. I don't think the problem is actually with your BAM. We believe this issue is fixed in the latest internal development version of the tool, and we plan to push the fix out with the release of GATK version 2.1, which should be in another week or two.

    Thanks so much for your help,


Answers

  • ArtemPankin · Posts: 8 · Member

    I am getting a similar error from the HaplotypeCaller and am looking forward to the patched release.

  • evakoe · Posts: 24 · Member
    edited August 2012

    I just downloaded version 2.1-0 and ran the HaplotypeCaller on data processed following the Best Practices v4 recommendations, but I get the same error as khayer. Having just seen this post, I should mention that I produced my BAM files using version 2.0. Should I repeat my processing? Thanks. Eva.

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-0-ge42e50d):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Duplicate allele added to VariantContext: G
    ERROR ------------------------------------------------------------------------------------------

  • evakoe · Posts: 24 · Member

    Edit: I reprocessed my BAM file using GATK 2.1-0 for all steps and I still get the same error from the HaplotypeCaller.
    Eva

  • Mark_DePristo · Posts: 153 · Administrator, GATK Developer

    At this stage, why don't you use PrintReads to extract an interval of your BAM file that reproduces the error, and upload it (and the reference, if this isn't human data) to our FTP server:

    http://gatkforums.broadinstitute.org/discussion/1215/how-can-i-access-the-gsa-public-ftp-server

    --
    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

  • rpoplin · Posts: 121 · GATK Developer · mod

    Thanks! We've received the file and will take a look at it right away.

    Thanks for your help in tracking this down,

  • starheight · Posts: 2 · Member

    Yes, I upgraded to 2.1 and got the same error:

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: G
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1289)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:298)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:620)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:206)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:416)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:201)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:176)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:133)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-1-g270cc30):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Duplicate allele added to VariantContext: G
    ERROR ------------------------------------------------------------------------------------------

  • rpoplin · Posts: 121 · GATK Developer · mod

    @evakoe said:
    I uploaded a file called SRR287669_MD_IR_BQSR1.bam. As the reference I used human_g1k_v37.fasta from your bundle. I ran MarkDuplicates, Indel Realignment and BQSR on chromosomes 2 and 8 only, using the -L argument. Thank you. Eva

    Hi there,

    That file doesn't seem to be aligned to human_g1k_v37.fasta. It looks like ucsc.hg19.fasta, but with the contigs in the wrong order. Do you have the command line you used to generate the error with this BAM file? The commands used to generate the file itself would be helpful too.

    Thanks!
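The mismatch described above (reads apparently aligned to a UCSC-style reference such as ucsc.hg19.fasta, with contigs in a different order from the b37 dictionary) can be spotted by comparing the two contig lists directly. Below is a minimal Java sketch, assuming you have already pulled the contig names out of the BAM header's @SQ lines and the reference's .dict file; `ContigOrderCheck` and `diagnose` are hypothetical names for illustration, not part of GATK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

final class ContigOrderCheck {

    // Removes a leading "chr" so UCSC-style "chr2" and b37-style "2" compare equal.
    // It does not map "chrM" to "MT", so those still count as different contigs.
    static String stripChr(String contig) {
        return contig.startsWith("chr") ? contig.substring(3) : contig;
    }

    // Compares BAM header contigs with reference contigs and lists any problems.
    // An empty result means the two lists are identical (same names, same order).
    static List<String> diagnose(List<String> bamContigs, List<String> refContigs) {
        List<String> issues = new ArrayList<>();
        if (bamContigs.equals(refContigs)) {
            return issues;
        }
        boolean bamUsesChr = bamContigs.stream().anyMatch(c -> c.startsWith("chr"));
        boolean refUsesChr = refContigs.stream().anyMatch(c -> c.startsWith("chr"));
        if (bamUsesChr != refUsesChr) {
            issues.add("naming: only one side uses the 'chr' prefix");
        }
        List<String> bamBare = bamContigs.stream()
                .map(ContigOrderCheck::stripChr).collect(Collectors.toList());
        List<String> refBare = refContigs.stream()
                .map(ContigOrderCheck::stripChr).collect(Collectors.toList());
        if (!new HashSet<>(bamBare).equals(new HashSet<>(refBare))) {
            issues.add("contents: the lists do not contain the same contigs");
        } else if (!bamBare.equals(refBare)) {
            issues.add("order: same contigs, different order");
        }
        if (issues.isEmpty()) {
            // Lists differ only because 'chr' prefixes are mixed within each side.
            issues.add("naming: 'chr' prefixes differ on some contigs");
        }
        return issues;
    }

    public static void main(String[] args) {
        // Toy lists, not real headers: UCSC-style names in a different order
        // compared with a b37-style reference dictionary.
        List<String> bam = List.of("chr1", "chr2", "chr10", "chr8");
        List<String> ref = List.of("1", "2", "8", "10");
        diagnose(bam, ref).forEach(System.out::println);
    }
}
```

The prefix check is deliberately crude: it does not recognise chrM and MT as the same contig, so on full hg19 versus b37 lists it reports a contents difference as well as the naming one.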

  • rpoplin · Posts: 121 · GATK Developer · mod

    In the meantime, if anyone else can use PrintReads to extract an interval of their BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server

    http://gatkforums.broadinstitute.org/discussion/1215/how-can-i-access-the-gsa-public-ftp-server

    that would be very helpful.

    Thanks!

  • evakoe · Posts: 24 · Member

    @rpoplin I'm sorry about that; I was quite sure it was human_g1k_v37, but I must have mixed it up with previous experiments. Anyway, I repeated my whole processing using GATK 2.1-0 (alignment with BWA to human_g1k_v37.fasta, MarkDuplicates, Indel Realignment, BQSR) and ran the HaplotypeCaller again. The run has not finished yet, but the error has not occurred, whereas previously I got it right at the beginning. I performed this processing basically following the recommendations. Here is my command line for the HaplotypeCaller:

    java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I in.bam -o out.vcf -D dbSNP137.vcf -A DepthOfCoverage -A HaplotypeScore -A MappingQualityRankSumTest -A FisherStrand -A ReadPosRankSumTest -A QualByDepth -et NO_ET -K mykey -L 2

    So maybe a solution is to rerun all analysis using 2.1-0 while paying attention to consistency in the reference files.

  • ArtemPankin · Posts: 8 · Member

    @evakoe said:

    So maybe a solution is to rerun all analysis using 2.1-0 while paying attention to consistency in the reference files.

    Has your run finished without errors? As you suggested, I reprocessed my BAM with 2.1-0 and still got the same error.

  • ArtemPankin · Posts: 8 · Member

    @rpoplin said:
    In the meantime, if anyone else can use PrintReads to extract an interval of their BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server

    Could you please suggest how to find an interval that reproduces the error? In the run log before the error, I can only see the last region of my reference processed by the walker. Thank you in advance for your help.

  • rpoplin · Posts: 121 · GATK Developer · mod
    edited August 2012

    There are two options you could try. The simplest is to guess an interval from the last region in your log file, as you mentioned: put a window of about 10000 bases on either side and that should do it. Alternatively, if you add -debug to your HaplotypeCaller command line, you'll see very verbose debug statements about every region that is processed, which will tell you the exact interval that failed.

    Thank you for your willingness to experiment a little bit here.
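Both options can be scripted. The Java sketch below assumes your -debug log contains lines of the form `Assembling <contig>:<start>-<end> with N reads` (the format quoted later in this thread; other GATK versions may word it differently). It takes the interval from the last such line before the failure and widens it by the suggested 10000 bases on each side. `FailingInterval`, `lastAssembledInterval`, and `pad` are hypothetical names, not GATK functions.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FailingInterval {

    // Matches debug lines such as "Assembling Chr2:224105-224265 with 254 reads".
    private static final Pattern ASSEMBLING =
            Pattern.compile("Assembling (\\S+):(\\d+)-(\\d+)");

    // Returns the interval from the last "Assembling ..." line, i.e. the region
    // being processed when the run stopped, or empty if there is no such line.
    static Optional<String> lastAssembledInterval(List<String> logLines) {
        String last = null;
        for (String line : logLines) {
            Matcher m = ASSEMBLING.matcher(line);
            if (m.find()) {
                last = m.group(1) + ":" + m.group(2) + "-" + m.group(3);
            }
        }
        return Optional.ofNullable(last);
    }

    // Widens "contig:start-end" by `window` bases on each side. The start is
    // clamped to 1; the end is not clamped because the contig length is unknown here.
    static String pad(String interval, long window) {
        int colon = interval.lastIndexOf(':');
        int dash = interval.lastIndexOf('-');
        String contig = interval.substring(0, colon);
        long start = Long.parseLong(interval.substring(colon + 1, dash));
        long end = Long.parseLong(interval.substring(dash + 1));
        return contig + ":" + Math.max(1L, start - window) + "-" + (end + window);
    }

    public static void main(String[] args) {
        // The debug line quoted later in this thread, followed by the failure.
        List<String> log = List.of(
                "Assembling Chr2:224105-224265 with 254 reads: (with overlap region = Chr2:224040-224330)",
                "ERROR stack trace");
        lastAssembledInterval(log)
                .map(interval -> pad(interval, 10_000))
                .ifPresent(System.out::println); // prints Chr2:214105-234265
    }
}
```

The printed interval can then be passed to PrintReads with -L to extract a small BAM for upload.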

  • starheight · Posts: 2 · Member

    And the same error appears with the 2.1-2 release as well...

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: G
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1289)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:298)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:620)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:206)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:416)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:201)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:176)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:133)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-2-g916702e):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Duplicate allele added to VariantContext: G
    ERROR ------------------------------------------------------------------------------------------

  • rpoplin · Posts: 121 · GATK Developer · mod
    edited August 2012

    @ArtemPankin said:
    Thank you ever so much for your help.

    Here is the log of the error with the -debug option. Did I understand correctly that the following region of Chr2 is the source of the error?




    Assembling Chr2:224105-224265 with 254 reads: (with overlap region = Chr2:224040-224330)

    Found 5 candidate haplotypes to evaluate every read against.
    cACCACGgCCTAAAaGAAaaCCTAaCTGtCCATaTCcTCgAAAaGGTtGTcTCaGCtCTGaGAcACCcACCaGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG

    > Cigar = 291M
    CACCACGGCCTAAAAGAAAACCTAACTGTCCATATCCTCGAAAAGGTTGTCTCAGCTCTGAGACACCCACCAGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG

    Ah! I see: the problem is that upper- and lower-case bases in the reference and reads are treated as differences, so it was trying to create a c -> C SNP. Thanks for your help.
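To illustrate the mechanism identified above, here is a small Java sketch (not GATK code) using the first ten bases of the two haplotypes in the -debug output above. It treats the mixed-case line as the reference, which is our reading rather than something the log states. An exact character comparison reports two spurious differences that vanish once both bases are upper-cased. The `makeAlleles` method is hypothetical: it assumes an allele layer that upper-cases bases and rejects duplicates, which is one plausible way a c -> C event could surface as "Duplicate allele added to VariantContext" naming an upper-case base.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SoftMaskDemo {

    // Positions where two equal-length sequences differ when characters are compared
    // exactly: a lower-case 'c' in one and an upper-case 'C' in the other count as a mismatch.
    static List<Integer> caseSensitiveDiffs(String ref, String hap) {
        return diffs(ref, hap, false);
    }

    // The same comparison after upper-casing both bases, so case is ignored.
    static List<Integer> caseInsensitiveDiffs(String ref, String hap) {
        return diffs(ref, hap, true);
    }

    private static List<Integer> diffs(String ref, String hap, boolean ignoreCase) {
        if (ref.length() != hap.length()) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < ref.length(); i++) {
            char r = ref.charAt(i);
            char h = hap.charAt(i);
            if (ignoreCase) {
                r = Character.toUpperCase(r);
                h = Character.toUpperCase(h);
            }
            if (r != h) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Hypothetical allele-list builder that upper-cases bases and rejects duplicates.
    // Fed a spurious "g -> G" event, it sees the same allele twice.
    static Set<String> makeAlleles(String refBase, String altBase) {
        Set<String> alleles = new LinkedHashSet<>();
        for (String base : List.of(refBase, altBase)) {
            String normalized = base.toUpperCase(Locale.ROOT);
            if (!alleles.add(normalized)) {
                throw new IllegalArgumentException("Duplicate allele: " + normalized);
            }
        }
        return alleles;
    }

    public static void main(String[] args) {
        String ref = "cACCACGgCC"; // first ten bases of the mixed-case haplotype above
        String hap = "CACCACGGCC"; // first ten bases of the 291M haplotype above
        System.out.println(caseSensitiveDiffs(ref, hap));   // [0, 7]
        System.out.println(caseInsensitiveDiffs(ref, hap)); // []
        try {
            makeAlleles("g", "G");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());             // Duplicate allele: G
        }
    }
}
```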

  • rpoplin · Posts: 121 · GATK Developer · mod

    OK, this is hopefully fixed in version 2.1-3, which will be available for download on the website later today.
    Thank you for all the information that helped track this down.
