RUNTIME ERROR (version 2.5-2-gf57256b): Duplicate allele added to VariantContext: T (UnifiedGenotyper)

I am following the SNPiR pipeline for variant calling on paired-end Illumina HiSeq RNA-seq data. After completing all the defined steps, I get the error report below. Please note that these BAMs were processed with GATK's IndelRealigner and base quality score recalibration tools.
Please suggest what I should do. (Also note: in some places the read depth is as high as 20k.)
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: T
at org.broadinstitute.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1335)
at org.broadinstitute.variant.variantcontext.VariantContext.<init>(VariantContext.java:312)
at org.broadinstitute.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:478)
at org.broadinstitute.sting.gatk.walkers.genotyper.ConsensusAlleleCounter.consensusCountsToAlleles(ConsensusAlleleCounter.java:279)
at org.broadinstitute.sting.gatk.walkers.genotyper.ConsensusAlleleCounter.computeConsensusAlleles(ConsensusAlleleCounter.java:103)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.computeConsensusAlleles(IndelGenotypeLikelihoodsCalculationModel.java:93)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getInitialAlleleList(IndelGenotypeLikelihoodsCalculationModel.java:245)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:114)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:320)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:221)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:353)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:268)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:256)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:145)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):
ERROR
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: T
ERROR ------------------------------------------------------------------------------------------
[NB: another sample fails with the same error, except that the duplicate allele is reported as A.]
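The message means the genotyper tried to build a variant record whose allele list contained the same allele twice (here T, and A for the other sample). The stack trace shows this happening in the indel consensus-allele step (ConsensusAlleleCounter), so one plausible reading is that two candidate indel alleles collapsed to the same sequence; that is an interpretation, not something confirmed in this thread. The Python sketch below only illustrates the uniqueness rule itself. make_alleles is a hypothetical name, not a GATK API, and it compares alleles by bases only (GATK's own allele equality may also consider reference status).

```python
from typing import List


def make_alleles(ref: str, alts: List[str]) -> List[str]:
    """Return [REF, ALT1, ...], raising if any allele appears twice.

    Hypothetical helper mirroring the rule named in the error message,
    not GATK's implementation. Alleles are compared by bases only here.
    """
    alleles: List[str] = []
    for allele in [ref, *alts]:
        if allele in alleles:
            raise ValueError(f"Duplicate allele added to VariantContext: {allele}")
        alleles.append(allele)
    return alleles


if __name__ == "__main__":
    print(make_alleles("TA", ["T"]))      # a deletion: REF TA, ALT T
    try:
        make_alleles("TA", ["T", "T"])    # the same ALT proposed twice
    except ValueError as err:
        print(err)
```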
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
OK, I see. Well, this is a little different from our Best Practices, but nothing stands out as potentially causing the error. Since you're using an older version that is no longer supported, you'll need to repeat the UnifiedGenotyper run with the latest version of GATK (2.8). If the error persists, try validating your input files with Picard's ValidateSamFile tool.
Answers
Can you please describe what this SNPIR pipeline entails and what command line produced the error?
After that, the rest of the pipeline looks like this (though this is from a different sample):
java -Xmx10g -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -I 19N_c_sorted_removedup.bam -R packages/hg19genome.fa -T RealignerTargetCreator -o 19N_c_sorted_removedup.bam_forIndelRealigner.intervals -known packages/1000G_indels_for_realignment.hg19.vcf -known packages/dbsnp_132.hg19.vcf
This last command produced the error.
SNPIR:
SNPiR: Reliable Identification of Genomic Variants Using RNA-seq Data
Getting Ready:
Step 1. Download the necessary files:
From the UCSC browser: the hg19 genome (make sure to get ALL the chromosomes),
the hg19 RepeatMasker annotation (in BED format),
and the RefSeq, UCSC, GENCODE and Ensembl gene annotation files
- use "all fields from selected table" output format
- concatenate all gene annotation files
(careful: the UCSC gene annotation file is missing the first field, so make sure to add it - awk '{OFS="\t";print "1",$0}')
- sort the concatenated gene annotation file by chromosome and transcript start (sort -k3,3 -k5,5n)
From rnaedit.com: a list of known RNA editing sites (in BED format).
Step 2. Download the required software: samtools, bedtools, command-line BLAT, Perl, BWA, GATK and Picard tools.
Step 3. Change the paths to your samtools, bedtools and BLAT executables in the file config.pm in the SNPiR package.
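The annotation preparation in Step 1 (prepending a placeholder first field to the UCSC file, concatenating, then sorting on chromosome and transcript start) can be sketched in Python as below. It assumes tab-separated rows in which, after the fix-up, field 3 is the chromosome and field 5 the transcript start, as the sort command implies; the function names are hypothetical. The text key compares like `LC_ALL=C sort`, and ties keep their input order rather than falling back to a whole-line comparison as GNU sort does without -s.

```python
from typing import Iterable, List, Tuple


def add_placeholder_field(lines: Iterable[str]) -> List[str]:
    r"""Prepend a "1" column, like awk '{OFS="\t";print "1",$0}'."""
    return ["1\t" + line for line in lines]


def sort_key(line: str) -> Tuple[str, int]:
    """Field 3 (chromosome) as text, then field 5 (transcript start) as a number."""
    fields = line.split("\t")
    return fields[2], int(fields[4])


def merge_annotations(*tables: Iterable[str]) -> List[str]:
    """Concatenate annotation tables and sort them like sort -k3,3 -k5,5n."""
    merged = [line for table in tables for line in table]
    return sorted(merged, key=sort_key)


if __name__ == "__main__":
    refseq = ["0\tNM_A\tchr2\t+\t500\t900", "0\tNM_B\tchr1\t-\t300\t800"]
    ucsc = add_placeholder_field(["uc001\tchr1\t+\t100\t400"])
    for row in merge_annotations(refseq, ucsc):
        print(row)
```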
Mapping reads:
Step 1. Concatenate splice junction sequence file with hg19 reference genome (junction files are provided at http://lilab.stanford.edu/SNPiR/junctions/)
Step 2. Create bwa index for this concatenated file
Step 3. Map reads with bwa as single end sequences to this reference file
Step 4. Convert the position of reads that map across splicing junctions onto the genome: java -Xmx2g convertCoordinates < in.sam > out.sam
Step 5. Use Picard MarkDuplicates to remove duplicate reads
Step 6. Filter out unmapped reads and reads with mapping quality < 20 using samtools
Step 7. Index this mapping file with samtools
Step 8. Perform indel realignment and base quality score recalibration using IndelRealigner, CountCovariates, and TableRecalibration in GATK (see GATK webpage for best practices)
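Step 4 (moving junction-mapped reads back onto the genome) is done by SNPiR's own convertCoordinates program, whose junction file layout is not described here. As a rough illustration of the arithmetic only, the Python sketch below assumes a hypothetical junction contig made of `flank` bases ending at the donor site followed by `flank` bases starting at the acceptor site, and an ungapped read (no indels or clipping) that fits inside that contig. A read that crosses the splice point comes back as a spliced alignment with an N operation in its CIGAR; junction_to_genome is a hypothetical name.

```python
from typing import Tuple


def junction_to_genome(read_pos: int, read_len: int, donor_end: int,
                       acceptor_start: int, flank: int) -> Tuple[int, str]:
    """Map an ungapped read on a junction contig back to the genome.

    Assumed (hypothetical) layout, 1-based inclusive coordinates:
      contig[1 .. flank]           = genome[donor_end - flank + 1 .. donor_end]
      contig[flank + 1 .. 2*flank] = genome[acceptor_start .. acceptor_start + flank - 1]
    Returns (genomic start position, CIGAR string).
    """
    read_end = read_pos + read_len - 1
    if read_end <= flank:                      # read lies entirely before the splice point
        return donor_end - flank + read_pos, f"{read_len}M"
    if read_pos > flank:                       # read lies entirely after the splice point
        return acceptor_start + (read_pos - flank - 1), f"{read_len}M"
    left = flank - read_pos + 1                # bases on the donor side
    right = read_len - left                    # bases on the acceptor side
    intron = acceptor_start - donor_end - 1    # genomic bases skipped by the splice
    return donor_end - flank + read_pos, f"{left}M{intron}N{right}M"


if __name__ == "__main__":
    # 10-bp flanks around an intron spanning genome positions 1001..2000
    print(junction_to_genome(6, 10, donor_end=1000, acceptor_start=2001, flank=10))
```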
Call/filter Variants (see EXAMPLES.sh for more information):
Step 1. Initial variant calling with GATK UnifiedGenotyper, using the options -stand_call_conf 0 -stand_emit_conf 0
Step 2. Convert VCF format to our custom variant format and filter variants with low quality: ConvertVCF.sh in.vcf out.txt MINQUAL
Step 3. Remove mismatches in first 6 bp of reads: perl filter_mismatch_first6bp.pl
Step 4. Use bedtools to remove sites in repetitive regions based on RepeatMasker annotation
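Step 2's quality cut-off can be illustrated with a small Python filter over VCF text. This is a hedged sketch: it keeps plain VCF rather than SNPiR's custom output format (produced by ConvertVCF.sh, which is not described here), it assumes a record passes when QUAL is at least MINQUAL, and it drops records whose QUAL is missing ('.'). filter_by_qual is a hypothetical name.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield header lines plus records whose QUAL (column 6) is >= min_qual.

    Records with a missing QUAL ('.') are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        qual = line.rstrip("\n").split("\t")[5]
        if qual != "." and float(qual) >= min_qual:
            yield line


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t35.2\t.\t.",
        "chr1\t200\t.\tC\tT\t12.0\t.\t.",
        "chr1\t300\t.\tG\tA\t.\t.\t.",
    ]
    for line in filter_by_qual(vcf, min_qual=20):
        print(line)
```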
Further filtering:
Step 1. Filter intronic candidates that are within 4 bp of splicing junctions: perl filter_intron_near_splicejuncts.pl
Step 2. Filter candidates in homopolymer runs: perl filter_homopolymer_nucleotides.pl
Step 3. Use BLAT to ensure unique mapping: perl BLAT_candidates.pl
Step 4. Use bedtools to separate out candidates that are known RNA editing sites
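The splice-junction filter in Further filtering Step 1 reduces to a nearest-neighbour lookup against sorted splice-site coordinates. The Python sketch below assumes single-base splice-site positions on the same chromosome as the candidates and reads "within 4 bp" as a distance of at most 4; both are assumptions, and the function names are hypothetical rather than taken from filter_intron_near_splicejuncts.pl.

```python
import bisect
from typing import List, Sequence


def near_splice_site(pos: int, splice_sites: Sequence[int], window: int = 4) -> bool:
    """True if pos lies within `window` bp of any splice-site coordinate.

    splice_sites must be sorted ascending and on the same chromosome as pos.
    """
    i = bisect.bisect_left(splice_sites, pos)
    # The closest site is either the first one >= pos (index i) or the last one < pos (i - 1).
    for j in (i - 1, i):
        if 0 <= j < len(splice_sites) and abs(splice_sites[j] - pos) <= window:
            return True
    return False


def drop_near_junction(candidates: List[int], splice_sites: Sequence[int],
                       window: int = 4) -> List[int]:
    """Keep only candidate positions that are not close to a splice site."""
    return [p for p in candidates if not near_splice_site(p, splice_sites, window)]


if __name__ == "__main__":
    sites = [1000, 2001]
    print(drop_near_junction([990, 996, 1004, 1005, 1997, 2100], sites))
```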
OK, thanks. I will try that and report back.