RUNTIME ERROR (version 2.5-2-gf57256b): Duplicate allele added to VariantContext: T (UnifiedGenotyper)

I am following the SNPiR pipeline for variant calling on paired-end Illumina HiSeq RNA-seq data. After running all the defined steps, I get the error report below. Please note that these BAMs were produced by the GATK IndelRealigner and BaseRecalibrator tools.
Please suggest what to do. (Also note: at some positions the read depth is as high as 20k.)

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: T
at org.broadinstitute.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1335)
at org.broadinstitute.variant.variantcontext.VariantContext.<init>(VariantContext.java:312)
at org.broadinstitute.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:478)
at org.broadinstitute.sting.gatk.walkers.genotyper.ConsensusAlleleCounter.consensusCountsToAlleles(ConsensusAlleleCounter.java:279)
at org.broadinstitute.sting.gatk.walkers.genotyper.ConsensusAlleleCounter.computeConsensusAlleles(ConsensusAlleleCounter.java:103)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.computeConsensusAlleles(IndelGenotypeLikelihoodsCalculationModel.java:93)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getInitialAlleleList(IndelGenotypeLikelihoodsCalculationModel.java:245)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:114)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:320)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:221)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:353)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:268)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:256)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:145)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):
ERROR
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Duplicate allele added to VariantContext: T
ERROR ------------------------------------------------------------------------------------------

[NB: another sample fails with the same error for allele A, i.e. "Duplicate allele added to VariantContext: A".]

Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Can you please describe what this SNPIR pipeline entails and what command line produced the error?

  • navonil13 Member
    edited January 2014

    The rest of the pipeline after that looks like this (though these commands are from a different sample):

        java -Xmx10g -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -I 19N_c_sorted_removedup.bam -R packages/hg19genome.fa -T RealignerTargetCreator -o 19N_c_sorted_removedup.bam_forIndelRealigner.intervals -known packages/1000G_indels_for_realignment.hg19.vcf -known packages/dbsnp_132.hg19.vcf

        java -Xmx4g -Djava.io.tmpdir=/dev/shm -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -I 19N_c_sorted_removedup.bam -R packages/hg19genome.fa -T IndelRealigner -targetIntervals 19N_c_sorted_removedup.bam_forIndelRealigner.intervals -o 19N_c_sorted_removedup.bam_realignedBam.bam -known packages/1000G_indels_for_realignment.hg19.vcf -known packages/dbsnp_132.hg19.vcf

        java -Xmx4g -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -R packages/hg19genome.fa -knownSites packages/dbsnp_132.hg19.vcf -I 19N_c_sorted_removedup.bam_realignedBam.bam -T BaseRecalibrator -o 19N_c_sorted_removedup.bam.pre_recal_data.csv

        java -Xmx4g -Djava.io.tmpdir=/dev/shm -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -R packages/hg19genome.fa -I 19N_c_sorted_removedup.bam_realignedBam.bam -T PrintReads -BQSR 19N_c_sorted_removedup.bam.pre_recal_data.csv -o 19N_c_sorted_removedup.bam_recal.bam

        ./packages/samtools-0.1.19/samtools index 19N_c_sorted_removedup.bam_recal.bam 19N_c_sorted_removedup.bam_recal.bam.bai

        java -jar packages/picard-tools-1.92/AddOrReplaceReadGroups.jar I=19N_c_sorted_removedup.bam_recal.bam O=19N_c_sorted_removedup.bam_recal_RG.bam SORT_ORDER=coordinate TMP_DIR=/dev/shm RGID=1 RGLB=1 RGPL=illumina RGSM=ICGC RGPU=NAVONIL CREATE_INDEX=True

        java -jar packages/GenomeAnalysisTK-2.5-2/GenomeAnalysisTK.jar -R /packages/hg19genome.fa -T UnifiedGenotyper -I 19N_c_sorted_removedup.bam_recal_RG.bam --dbsnp packages/dbsnp_135.hg19.vcf -o 19N_c_sorted_removedup.bam_WG_snp_indel.raw.vcf -stand_call_conf 0.0 -stand_emit_conf 0.0 -glm BOTH

    This last command produced the error.

  • navonil13 Member
    edited January 2014

    SNPIR:

    SNPiR: Reliable Identification of Genomic Variants Using RNA-seq Data

    Getting Ready:
    Step 1. Download the necessary files:
    From the UCSC browser: the hg19 genome (make sure to get ALL the chromosomes),
    the hg19 RepeatMasker annotation (in BED format),
    and the RefSeq, UCSC, GENCODE and Ensembl gene annotation files
    - use the "all fields from selected table" output format
    - concatenate all the gene annotation files
    (careful: the UCSC gene annotation file is missing the first field, so add it with awk '{OFS="\t";print "1",$0}')
    - sort the concatenated gene annotation file by chromosome and then transcript start (sort -k3,3 -k5,5n)
    From rnaedit.com: the list of known RNA editing sites (in BED format).
    Step 2. Download the required software: samtools, bedtools, command-line BLAT, Perl, BWA, GATK, Picard tools
    Step 3. Change the paths to your samtools, bedtools and BLAT executables in the file config.pm in the SNPiR package
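A minimal Python sketch of the gene-annotation preparation in Step 1, under stated assumptions: header lines have already been removed, all files are tab-separated, and the toy records are hypothetical. `prepend_field` mirrors the awk one-liner, and `sort_annotations` approximates `LC_ALL=C sort -k3,3 -k5,5n` (chromosome as text, then transcript start as a number). Neither function is part of SNPiR.

```python
# Sketch only: assumes tab-separated annotation lines with headers removed.


def prepend_field(lines, value="1"):
    r"""Add a leading tab-separated field, like awk '{OFS="\t";print "1",$0}'."""
    return [f"{value}\t{line}" for line in lines]


def sort_annotations(lines):
    """Order lines roughly like `LC_ALL=C sort -k3,3 -k5,5n`:
    column 3 (chromosome) as text, then column 5 (transcript start) as an integer."""
    def key(line):
        fields = line.split("\t")
        return (fields[2], int(fields[4]))
    return sorted(lines, key=key)


# Hypothetical toy records: RefSeq-style rows already carry a leading field,
# while the UCSC-style row is missing it.
refseq = [
    "585\tNM_A\tchr2\t+\t500\t900",
    "585\tNM_B\tchr1\t-\t300\t800",
]
ucsc = ["uc001\tchr1\t+\t100\t400"]

merged = sort_annotations(refseq + prepend_field(ucsc))
for line in merged:
    print(line)
```

After the extra field is added, every file has chromosome in column 3 and transcript start in column 5, which is why a single sort key works for the concatenated file.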

    Mapping reads:

    Step 1. Concatenate splice junction sequence file with hg19 reference genome (junction files are provided at http://lilab.stanford.edu/SNPiR/junctions/)

    Step 2. Create bwa index for this concatenated file

    Step 3. Map reads with bwa as single end sequences to this reference file

    Step 4. Convert the position of reads that map across splicing junctions onto the genome: java -Xmx2g convertCoordinates < in.sam > out.sam

    Step 5. Use Picard MarkDuplicates to remove duplicate reads

    Step 6. Filter out unmapped reads and reads with mapping quality < 20 using samtools

    Step 7. Index this mapping file with samtools

    Step 8. Perform indel realignment and base quality score recalibration using IndelRealigner, CountCovariates, and TableRecalibration in GATK (see GATK webpage for best practices)
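Step 6 is usually done directly with samtools, e.g. `samtools view -b -F 4 -q 20 in.bam > out.bam`, where `-F 4` excludes reads with the unmapped FLAG bit set and `-q 20` skips alignments with MAPQ below 20. The Python sketch below only illustrates that logic on SAM text lines; the function names and toy records are hypothetical.

```python
# Sketch of Step 6 on SAM text: drop unmapped reads (FLAG bit 0x4) and
# reads with MAPQ < 20, keeping header lines. Names are hypothetical.

UNMAPPED_FLAG = 0x4


def keep_alignment(sam_line, min_mapq=20):
    """Return True if a SAM line should be kept."""
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG
    mapq = int(fields[4])  # column 5: MAPQ
    return not (flag & UNMAPPED_FLAG) and mapq >= min_mapq


def filter_sam(lines, min_mapq=20):
    return [line for line in lines if keep_alignment(line, min_mapq)]


# Hypothetical records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = filter_sam(sam)
print([line.split("\t")[0] for line in kept])
```

Here r2 is dropped because it is unmapped and r3 because its MAPQ (10) is below 20.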

    Call/filter Variants (see EXAMPLES.sh for more information):

    Step 1. Initial variant calling with GATK UnifiedGenotyper, using the options -stand_call_conf 0 -stand_emit_conf 0

    Step 2. Convert the VCF file to our custom variant format and filter out low-quality variants: ConvertVCF.sh in.vcf out.txt MINQUAL

    Step 3. Remove mismatches in the first 6 bp of reads: perl filter_mismatch_first6bp.pl

    Step 4. Use bedtools to remove sites in repetitive regions based on the RepeatMasker annotation
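The quality part of Step 2 amounts to a threshold on the VCF QUAL column. ConvertVCF.sh's custom output format is not described here, so this sketch shows only the filter itself. Treating a missing QUAL (`.`) as a failure and using an inclusive `>= MINQUAL` comparison are assumptions, not documented SNPiR behaviour.

```python
# Sketch of a MINQUAL filter on VCF text. Assumptions: '.' (missing QUAL)
# fails, and the threshold is inclusive (QUAL >= min_qual).


def passes_min_qual(vcf_line, min_qual):
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # column 6: QUAL
    if qual == ".":
        return False
    return float(qual) >= min_qual


def filter_vcf(lines, min_qual):
    """Return data lines (not header lines) that pass the QUAL threshold."""
    return [
        line for line in lines
        if not line.startswith("#") and passes_min_qual(line, min_qual)
    ]


# Hypothetical toy VCF
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50.5\t.\tDP=10",
    "chr1\t200\t.\tC\tT\t12.0\t.\tDP=3",
    "chr1\t300\t.\tG\tA\t.\t.\tDP=1",
]
passed = filter_vcf(vcf, min_qual=20)
print([line.split("\t")[1] for line in passed])
```

Because Step 1 calls with confidence thresholds of 0, this later QUAL cut is where low-confidence calls are actually removed.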

    Further filtering:
    Step 1. Filter intronic candidates that are within 4 bp of splicing junctions: perl filter_intron_near_splicejuncts.pl

    Step 2. Filter candidates in homopolymer runs: perl filter_homopolymer_nucleotides.pl

    Step 3. Use BLAT to ensure unique mapping: perl BLAT_candidates.pl

    Step 4. Use bedtools to separate out candidates that are known RNA editing sites
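Two of the further-filtering checks can be sketched in Python as below. The 4 bp distance comes from the step description; reading "within 4 bp" as "in the first or last 4 intronic bases", the `(start, end)` intron input format, and the homopolymer threshold `min_run=5` are assumptions, so the actual behaviour of filter_intron_near_splicejuncts.pl and filter_homopolymer_nucleotides.pl may differ.

```python
# Sketch of two SNPiR-style filters. Input formats and min_run=5 are
# hypothetical choices for illustration, not taken from the SNPiR scripts.


def near_splice_junction(pos, introns, max_dist=4):
    """True if 1-based pos lies inside an intron and within its first or
    last max_dist bases. introns: iterable of 1-based inclusive (start, end)."""
    for start, end in introns:
        if start <= pos <= end and (pos - start < max_dist or end - pos < max_dist):
            return True
    return False


def homopolymer_run_length(ref, pos0):
    """Length of the run of identical bases in ref containing 0-based pos0."""
    base = ref[pos0].upper()
    left = pos0
    while left > 0 and ref[left - 1].upper() == base:
        left -= 1
    right = pos0
    while right + 1 < len(ref) and ref[right + 1].upper() == base:
        right += 1
    return right - left + 1


def in_homopolymer(ref, pos0, min_run=5):
    """Flag a site whose reference base sits in a run of at least min_run."""
    return homopolymer_run_length(ref, pos0) >= min_run


introns = [(1000, 2000)]
print(near_splice_junction(1003, introns))  # True: 3 bp past the intron start
print(near_splice_junction(1500, introns))  # False: deep inside the intron
ref = "ACGTTTTTGCA"
print(homopolymer_run_length(ref, 5), in_homopolymer(ref, 5))  # 5 True
```

Both checks target positions where RNA-seq alignments are error-prone (reads overhanging splice sites, and polymerase slippage in homopolymers), which is why candidates there are discarded rather than reported.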

  • navonil13 Member

    OK, thanks. I will try it and report back.
