Problems with BQSR after IndelRealignment

ocornejoocornejo Washington State UniversityMember

Dear GATK team,
I am running into some issues while doing BQSR after Indel Realignment.

I have run the following commands to perform indel realignment before running BQSR

java -Xmx6g -jar /srv/gs1/projects/bustamante/ocornejo_projects/progs/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R /srv/gs1/projects/bustamante/cacao_proj/work_ref/matina_v1.1.fasta -T RealignerTargetCreator -I AGU_3339_12.F.bam -o loc_realigned/AGU_3339_12.forIndelRealigner.intervals --fix_misencoded_quality_scores -fixMisencodedQuals

java -Xmx6g -jar /srv/gs1/projects/bustamante/ocornejo_projects/progs/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R /srv/gs1/projects/bustamante/cacao_proj/work_ref/matina_v1.1.fasta -I AGU_3339_12.F.bam -T IndelRealigner -targetIntervals loc_realigned/AGU_3339_12.forIndelRealigner.intervals -o loc_realigned/AGU_3339_12.realigned.bam --fix_misencoded_quality_scores -fixMisencodedQuals

Now, when I proceed to run the base recalibrator, with the following command:

java -Xmx6g -jar /srv/gs1/projects/bustamante/ocornejo_projects/progs/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R /srv/gs1/projects/bustamante/cacao_proj/work_ref/matina_v1.1.fasta -T BaseRecalibrator -I loc_realigned/AGU_3339_12.realigned.bam -o recalibration/AGU_3339_12.recal_data.grp --knownSites /srv/gs1/projects/bustamante/cacao_proj/VQSR_ref/6kSNPs.vcf --covariate ContextCovariate --covariate RepeatLengthCovariate --covariate CycleCovariate --fix_misencoded_quality_scores -fixMisencodedQuals --filter_mismatching_base_and_quals --validation_strictness SILENT

I run into the following ERROR:

INFO 17:26:39,679 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:26:39,681 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO 17:26:39,681 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 17:26:39,681 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 17:26:39,684 HelpFormatter - Program Args: -R /srv/gs1/projects/bustamante/cacao_proj/work_ref/matina_v1.1.fasta -T BaseRecalibrator -I loc_realigned/AGU_3339_12.realigned_2.bam -o recalibration/AGU_3339_12.recal_data.grp --knownSites /srv/gs1/projects/bustamante/cacao_proj/VQSR_ref/6kSNPs.vcf --covariate ContextCovariate --covariate RepeatLengthCovariate --covariate CycleCovariate --validation_strictness SILENT --filter_mismatching_base_and_quals
INFO 17:26:39,685 HelpFormatter - Date/Time: 2013/05/20 17:26:39
INFO 17:26:39,685 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:26:39,685 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:26:39,694 ArgumentTypeDescriptor - Dynamically determined type of /srv/gs1/projects/bustamante/cacao_proj/VQSR_ref/6kSNPs.vcf to be VCF
INFO 17:26:39,702 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:26:39,987 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 17:26:39,993 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 17:26:40,029 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO 17:26:40,040 RMDTrackBuilder - Loading Tribble index from disk for file /srv/gs1/projects/bustamante/cacao_proj/VQSR_ref/6kSNPs.vcf
INFO 17:26:40,150 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 17:26:40,150 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 17:26:40,205 BaseRecalibrator - The covariates being used here:
INFO 17:26:40,205 BaseRecalibrator - ReadGroupCovariate
INFO 17:26:40,205 BaseRecalibrator - QualityScoreCovariate
INFO 17:26:40,205 BaseRecalibrator - ContextCovariate
INFO 17:26:40,205 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 17:26:40,206 BaseRecalibrator - CycleCovariate
INFO 17:26:40,206 BaseRecalibrator - RepeatLengthCovariate
INFO 17:26:40,207 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 3]
INFO 17:26:40,207 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 17:26:40,207 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 3]
INFO 17:26:40,208 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1012, 3]
INFO 17:26:40,208 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1002, 3]
INFO 17:26:40,208 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 17:26:40,208 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 17:26:40,209 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 22, 3]
INFO 17:26:40,209 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 17:26:40,209 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 17:26:40,235 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 17:26:40,237 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 17:26:41,420 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -2
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:246)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: -2
ERROR ------------------------------------------------------------------------------------------

If you could point out what could be the root of the problem here. I did not use to run similar errors with the previous version of GATK.

thanks

Omar

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Omar,

    Sorry to get to your question so late.

    You should try to run your command again using the latest version of GATK (currently 2.5-2). If you still get an error let me know in this thread.

  • ocornejoocornejo Washington State UniversityMember

    Sorry Geraldine for the late response back.
    I ran into the same problem even on the night build downloaded May 21st. I saw that the issue had already been reported in the thread by spenugonda on May 14.

  • ocornejoocornejo Washington State UniversityMember

    I think I might know the root of the problem. How are you guys handling the Sanger / Illumina 1.9 (33) encoding of the qualities? I mean, in the RG information what platform do you specify? PL:Illumina or PL:sanger?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Quality encoding is evaluated independently of the platform tag, if that's what you're thinking of. It's based on empirical observation of the range of values found in the data.

  • ocornejoocornejo Washington State UniversityMember

    I see... My full pipeline involves: mapping(BWA) > makeBAMwithHeader(samtools) > CleanSam(Picard) > FixMateInformation(Picard) > SortSam(Picard) > MarkDuplicates(Picard) > AddorReplaceReadGroups(Picard) [I specify PL:illumina here] > index (samtools) > GATK (indel_realigner) > GATK (BQSR).
    I am getting the error en the BQSR.
    My command is (using the nightly version) is:

    java -Xmx6g -jar GenomeAnalysisTK-nightly-2013-06-11-g26c187e/GenomeAnalysisTK.jar -R cacao_proj/work_ref/matina_v1.1.fasta -T BaseRecalibrator -I AM_1_54.F.bam -o recalibration/AM_1_54.recal_data.grp --knownSites cacao_proj/VQSR_ref/6kSNPs.vcf --covariate ContextCovariate --covariate RepeatLengthCovariate --covariate CycleCovariate --fix_misencoded_quality_scores -fixMisencodedQuals --filter_mismatching_base_and_quals --validation_strictness SILENT

    and I obtain the following error:

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version nightly-2013-06-11-g26c187e):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
    ERROR ------------------------------------------------------------------------------------------

    I really don't understand then what could it be. Thank you Geraldine

  • ocornejoocornejo Washington State UniversityMember

    It seems that is the problem. After the original observation of an error, I perused in the forum and I found that including the flag could solve the issues I was having. It seems that the combination of running the bleeding edge version + flag was a mistake. It is running now (without the flag). I will let you know how it goes. Thank you very much for your help

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Glad to hear your problem is solved! FYI the latest versions are fully compatible with the qual fix flag. We've noticed that some people run with the flag as a preemptive measure, without having encountered the specific error that includes the instruction to use the flag as a fix. If you don't actually have a qual encoding problem in your data, the flag is not needed and will actually cause problems. We'll try to make that clearer in the docs.

Sign In or Register to comment.