Hi,
I am trying to run the latest version (GenomeAnalysisTK-2.0-35-g2d70733) of the HaplotypeCaller on some .bam files that I prepared according to Best Practices v3. GATK reports the following error:
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: T
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1328)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:304)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:518)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:604)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:198)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:414)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:104)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:246)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:202)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:177)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:134)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:27)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
Now I am assuming my old bam files are not compatible with the new HaplotypeCaller. Is that correct?
Thank you for your help, K
rpoplin
Posts: 92 mod
Hi there,
Glad to hear you are trying out the HaplotypeCaller. I don't think it is actually a problem with your bam. We believe this issue is fixed in the latest internal development version of the tool. We plan to push this fix out with the release of version 2.1 of the GATK which should be in another week or two.
Thanks so much for your help,
Answers
I am getting a similar error from HaplotypeCaller and am looking forward to the patched release.
I just downloaded version 2.1-0 and ran the HaplotypeCaller on data processed following the Best Practices recommendations v4, but I get the same error as khayer. However, I only just saw this post, and I produced my BAM files using version 2.0. Should I repeat my processing? Thanks. Eva.
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-0-ge42e50d):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: G
ERROR ------------------------------------------------------------------------------------------
Edit: I reprocessed my BAM file using GATK 2.1-0 for all steps and I still get the same error from the HaplotypeCaller. Eva
At this stage, why don't you use PrintReads to extract an interval of your BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server?
http://gatkforums.broadinstitute.org/discussion/1215/how-can-i-access-the-gsa-public-ftp-server
-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
I uploaded a file; it's called SRR287669_MD_IR_BQSR1.bam. As a reference I used human_g1k_v37.fasta from your bundle. I performed MarkDuplicates, Indel Realignment and BQSR on it, restricted to chr 2 and 8 using the -L option. Thank you. Eva
Thanks! We've received the file and will take a look at it right away.
Thanks for your help in tracking this down,
Yes, I upgraded to 2.1 and got the same error:
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: G
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1289)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:298)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:620)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:206)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:416)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:201)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:176)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:133)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-1-g270cc30):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: G
ERROR ------------------------------------------------------------------------------------------
Hi there,
That file doesn't seem to be aligned to human_g1k_v37.fasta. It looks like ucsc.hg19.fasta but the contigs are in the wrong order. Do you have the command line that you used to generate the error with this bam file? Also the commands for how this file was generated would be helpful too.
Thanks!
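For anyone who wants to check for this kind of reference mismatch themselves, here is a minimal sketch. It assumes you have the BAM header as text (for example from `samtools view -H`) and the reference's FASTA index (.fai), whose first tab-separated column is the contig name. The function names and the tiny example inputs are made up for illustration and are not part of GATK.

```python
def contigs_from_sam_header(header_text):
    """Return contig names in @SQ order from a SAM/BAM text header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_from_fai(fai_text):
    """Return contig names in file order from a FASTA index (.fai)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def compare_contigs(bam_contigs, ref_contigs):
    """Classify how the BAM's contig list relates to the reference's."""
    if bam_contigs == ref_contigs:
        return "match"
    if sorted(bam_contigs) == sorted(ref_contigs):
        return "same contigs, different order"
    return "different contigs"


# Illustrative two-contig inputs (not real files).
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr2\tLN:243199373\n@SQ\tSN:chr1\tLN:249250621\n"
hg19_style_fai = "chr1\t249250621\t6\t50\t51\nchr2\t243199373\t254235646\t50\t51\n"
b37_style_fai = "1\t249250621\t52\t60\t61\n2\t243199373\t253404903\t60\t61\n"

bam_contigs = contigs_from_sam_header(header)
print(compare_contigs(bam_contigs, contigs_from_fai(hg19_style_fai)))
print(compare_contigs(bam_contigs, contigs_from_fai(b37_style_fai)))
```

The first comparison reports the same contigs in a different order, which is the situation described above; the second reports different contigs because the names lack the `chr` prefix.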
In the meantime, if anyone else can use PrintReads to extract an interval of their BAM file that reproduces the error and upload it (and the reference, if this isn't human data) to our FTP server
http://gatkforums.broadinstitute.org/discussion/1215/how-can-i-access-the-gsa-public-ftp-server
that would be very helpful.
Thanks!
@rpoplin I'm sorry about that. I was quite sure it was human_g1k_v37, but I must have mixed it up with previous experiments. Anyway, I repeated my whole processing using GATK 2.1-0 (alignment with BWA to human_g1k_v37.fasta, Mark Duplicates, Indel Realignment, BQSR) and ran the HaplotypeCaller again. The run has not finished yet, but the error has not occurred so far, whereas previously it appeared right at the beginning. I performed the processing mentioned above basically following the recommendations. Here is my command line for the HaplotypeCaller:
java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I in.bam -o out.vcf -D dbSNP137.vcf -A DepthOfCoverage -A HaplotypeScore -A MappingQualityRankSumTest -A FisherStrand -A ReadPosRankSumTest -A QualByDepth -et NO_ET -K mykey -L 2
So maybe a solution is to rerun all analysis steps using 2.1-0 while paying attention to consistency in the reference files.
Has your run finished without errors? As you suggested, I tried processing my BAM with 2.1-0 and still got the same error. :(
Could you please suggest how to find an interval that reproduces the error? In the run log before the error, I can only see the last region of my reference processed by the walker. Thank you in advance for your help.
There are two options that you could try. The simplest is to just guess an interval using the last region in your log file, like you mentioned: you can put a window of about 10000 bases on either side and that should do it. Or, if you add -debug to your HaplotypeCaller command line, you'll see very verbose debug statements about every region that is processed. This will tell you the exact interval that failed.
Thank you for your willingness to experiment a little bit here.
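As a rough illustration of the first option, the window can be computed with a small helper like the one below. This is only a sketch: `pad_interval` is a hypothetical name, the example region is made up, and the right edge is not clamped to the contig length because that length is not known here.

```python
import re


def pad_interval(region, padding=10_000):
    """Widen a 'contig:start-end' region by `padding` bases on each side."""
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a contig:start-end region: {region!r}")
    contig = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # Coordinates are 1-based, so the left edge is clamped at 1.
    return f"{contig}:{max(1, start - padding)}-{end + padding}"


# Hypothetical "last region seen in the log".
print(pad_interval("2:1500000-1500200"))
```

The resulting string is in the same contig:start-end form that the -L option accepts, so it can be passed to a PrintReads run to extract the test interval.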
Thank you ever so much for your help.
Here is the log of the error with the -debug option. Did I understand correctly that the following region of Chr2 is the source of the error?
Assembling Chr2:224105-224265 with 254 reads: (with overlap region = Chr2:224040-224330)
Found 5 candidate haplotypes to evaluate every read against.
cACCACGgCCTAAAaGAAaaCCTAaCTGtCCATaTCcTCgAAAaGGTtGTcTCaGCtCTGaGAcACCcACCaGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
CACCACGGCCTAAAAGAAAACCTAACTGTCCATATCCTCGAAAAGGTTGTCTCAGCTCTGAGACACCCACCAGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
CACCACGGCCTAAAAGAAAACCTAACTGTCCATATCCTCGAAAAGGTTGTCTCAGCTCTGAGACACCCACCAGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTAAATAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
cACCACGgCCTAAAaGAAaaCCTAaCTGtCCATaTCcTCgAAAaGGTtGTcTCaGCtCTGaGAcACCcACCaGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTAAATAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
CACCACGGCCTAAAAGAAAACCTAACTGTCCATATCCTCGAAAAGGTTGTCTCAGCTCTGAGACACCCACCAGAGAAGTTCAAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
Chose haplotypes 2 and 1 with diploid likelihood = 0.0
Chose 2 alternate haplotypes to genotype in all samples.
=== Best Haplotypes ===
cACCACGgCCTAAAaGAAaaCCTAaCTGtCCATaTCcTCgAAAaGGTtGTcTCaGCtCTGaGAcACCcACCaGAGAAGTTCCAAAATCAAGTGTTAGCTTGAGCAATAGCAATTCACAAATGGAAAGCAATGGAACTCTTCAGGTCACCAGCACTCAGAAACTTCAAAGGAAGGAGTTGTCTGGAAACGGCAGTTGCTCAGAAGTTATTAATATCTTTAGAGAAGCACCATCTGCCTCATTTTCTTCCTCTAACAAGAGCTCTTCAAATCATGGTGTCTCTGGGGGAATTG
'> Cigar = 291M
'> Left and right breaks = (0 , 0)
'>> Events = {}
INFO 16:55:10,020 GATKRunReport - Uploaded run statistics report to AWS S3
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: C
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1289)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:298)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:620)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:206)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:416)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:201)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:176)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:133)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-0-ge42e50d):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: C
ERROR ------------------------------------------------------------------------------------------
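For long -debug logs, the last region announced before the crash can be pulled out automatically. The sketch below assumes the log lines look like the "Assembling ... with N reads" line in this post. `last_assembled_region` is a hypothetical helper, and the first line of the sample log is invented for illustration.

```python
import re

# Matches lines such as "Assembling Chr2:224105-224265 with 254 reads: ..."
ASSEMBLING = re.compile(r"Assembling (\S+):(\d+)-(\d+) with \d+ reads")


def last_assembled_region(log_text):
    """Return the last 'contig:start-end' region announced in the log, or None."""
    matches = ASSEMBLING.findall(log_text)
    if not matches:
        return None
    contig, start, end = matches[-1]
    return f"{contig}:{start}-{end}"


sample_log = """Assembling Chr2:223800-224010 with 198 reads: (with overlap region = Chr2:223735-224075)
Assembling Chr2:224105-224265 with 254 reads: (with overlap region = Chr2:224040-224330)
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: C
"""
print(last_assembled_region(sample_log))
```

The region it reports is only the most likely candidate for the failing interval; extracting it with some padding and rerunning is still the way to confirm.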
And the same error appears with the 2.1-2 release as well...
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: G
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.makeAlleles(VariantContext.java:1289)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.<init>(VariantContext.java:298)
    at org.broadinstitute.sting.utils.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.generateVCsFromAlignment(GenotypingEngine.java:620)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoodsAndCallIndependentEvents(GenotypingEngine.java:206)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:416)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.callWalkerMapOnActiveRegions(TraverseActiveRegions.java:201)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:176)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:133)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:28)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-2-g916702e):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: G
ERROR ------------------------------------------------------------------------------------------
Ah! I see. The problem is that upper- and lower-case bases in the reference and reads are treated as differences, so it was trying to create a c -> C SNP. Thanks for your help.
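To make the failure mode concrete, here is a toy model of it. It is not the GATK code: `build_alleles` is a hypothetical function, and the assumption that alleles end up stored in upper case is inferred only from the error message. It shows how a case-sensitive comparison can turn a soft-masked reference base into a spurious alternate allele, and how normalizing case before comparing avoids that.

```python
def build_alleles(ref_base, observed_bases, normalize_case):
    """Collect the reference allele plus distinct alternates; fail on duplicates."""
    if normalize_case:
        ref_base = ref_base.upper()
        observed_bases = [base.upper() for base in observed_bases]
    # An observed base becomes an alternate only if it differs from the reference.
    alts = []
    for base in observed_bases:
        if base != ref_base and base not in alts:
            alts.append(base)
    # Assumption: alleles are stored upper-case, so 'c' and 'C' collide here.
    stored = [allele.upper() for allele in [ref_base] + alts]
    for allele in stored:
        if stored.count(allele) > 1:
            raise ValueError(f"Duplicate allele added: {allele}")
    return stored


# Soft-masked (lower-case) reference base, upper-case read bases.
try:
    build_alleles("c", ["C", "C"], normalize_case=False)
except ValueError as err:
    print("case-sensitive:", err)

print("case-normalized:", build_alleles("c", ["C", "C"], normalize_case=True))
```

In the case-sensitive run, `c` and `C` look different, so `C` is added as an alternate and then collides with the reference allele. With case normalized first, there is no variant at the site at all.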
@ArtemPankin Yes, my run finished without errors.
Ok, this is hopefully fixed in version 2.1-3 which will show up on the website for download later today. Thank you for all the information that helped track this down.