GATK IndelRealigner error

OwenO (Oxford, UK; Member)

Hi,
I'm having trouble with the second part of the indel realignment protocol. I'm trying to run it for 100bp Illumina paired-end genome resequencing data. This is for a non-model plant so I don't have a set of known indels. RealignerTargetCreator seems to run okay for all my 12 datasets with the following command line:

java -Xmx20G -jar /usr/local/bin/JAVA_JARS/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T RealignerTargetCreator -I mapping.bam -R reference.fasta -o mapping.intervals

But the second part fails with the following command line:

java -Xmx50G -jar /usr/local/bin/JAVA_JARS/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T IndelRealigner -I mapping.bam -R reference.fasta -targetIntervals mapping.intervals -o mapping_realigned.bam

with the following error:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
at org.broadinstitute.gatk.tools.walkers.indels.ReadBin.getReference(ReadBin.java:108)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.clean(IndelRealigner.java:696)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.cleanAndCallMap(IndelRealigner.java:580)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:552)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:148)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

I have validated the bam files with picard tools ValidateSamFile and found no errors. I have created .dict and .fai files for the reference with the methods recommended in the GATK documentation and a .bai index for the bams. My processing protocol up to this step has been:

  1. Quality trimming with trimmomatic, discard broken pairs.
  2. Mapping with bwa mem using the following command line:
    bwa mem -M -t 25 reference.fasta reads_1P.fq.gz reads_2P.fq.gz | samtools view -Sb - | samtools sort - mapping.bam
  3. Merging bams for each individual with picard tools MergeSamFiles (there were two readsets with different insert sizes for each of the twelve individuals) with the following command line:
    java -Xmx40G -jar MergeSamFiles.jar INPUT=mapping1.bam INPUT=mapping2.bam OUTPUT=mapping_merged.bam SORT_ORDER=coordinate ASSUME_SORTED=true MERGE_SEQUENCE_DICTIONARIES=TRUE USE_THREADING=true
    (I also tried the next steps without merging and got the same error from IndelRealigner so I don't think this is causing a problem)
  4. Adding read groups with picard tools AddOrReplaceReadGroups with the following command line:
    java -Xmx40G -jar AddOrReplaceReadGroups.jar I=mapping_merged.bam O=mapping_merged_RG.bam RGSM=Ind1 RGPL=illumina RGPU=1 RGLB=1 MAX_RECORDS_IN_RAM=3000000 VALIDATION_STRINGENCY=SILENT
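One way to rule out a mismatch between the reference index files mentioned above is to compare the contig names and lengths in the .fai against the @SQ lines of the .dict. The following is only a minimal Python sketch: the function names and the file names in the usage line are illustrative assumptions, not part of GATK or Picard, and a clean result here does not rule out the error reported in this thread.

```python
def read_fai_lengths(fai_path):
    """Map contig name -> length from a samtools-style .fai (columns 1 and 2)."""
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def read_dict_lengths(dict_path):
    """Map contig name -> length from the @SQ lines (SN/LN tags) of a .dict file."""
    lengths = {}
    with open(dict_path) as handle:
        for line in handle:
            if not line.startswith("@SQ"):
                continue  # skip @HD and any other header lines
            tags = dict(
                field.split(":", 1)
                for field in line.rstrip("\n").split("\t")[1:]
                if ":" in field
            )
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare_reference_indexes(fai_path, dict_path):
    """Return (contig, fai_length, dict_length) for every disagreement.

    A length of None means the contig is missing from that file.
    Contig order is not compared in this sketch.
    """
    fai = read_fai_lengths(fai_path)
    sq = read_dict_lengths(dict_path)
    problems = []
    for name in sorted(set(fai) | set(sq)):
        if fai.get(name) != sq.get(name):
            problems.append((name, fai.get(name), sq.get(name)))
    return problems
```

For example, `compare_reference_indexes("reference.fasta.fai", "reference.dict")` returns an empty list when both files agree on every contig.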

I've tried remaking the indexes and even tried a few versions of java. I'm now out of ideas. Any help would be much appreciated.

All the best,
Owen


Issue · Github, by Sheila: issue number 1104, state closed, closed by nh13

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @OwenO
    Hi Owen,

    Wow. This is the third or fourth time a user has posted the same error message. Did you try deleting the reference index and re-generating it?

    -Sheila

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @OwenO
    Hi Owen,

    Hmm. I think this is actually a bug. I asked another user for a bug report. But, if you want to upload one as well, that will help too. http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    -Sheila

  • OwenO (Oxford, UK; Member)

    Hi Sheila,
    Thanks for the quick response. I just tried deleting and recreating the reference index and dictionary and got the same error. I'll submit a bug report.
    Owen

  • Double_O (Member)

    Hi Sheila,
    I've uploaded the bug report now. The file is called: bug_report_IndelRealigner_5875_OwenO.tar.gz
    Thanks,
    Owen

  • OwenO (Oxford, UK; Member)

    ps. Double_O is just my old account. Didn't realise it was logged in on this computer. Owen

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @OwenO
    Hi Owen,

    I just put in the bug report. Hopefully I can get this fixed asap, as a few others have complained about this as well.

    -Sheila

  • OwenO (Oxford, UK; Member)

    Great, thanks a lot. Owen

  • dcopetti (Arizona; Member)

    I get a similar error in the previous step. The command is
    java -Xmx2g -jar /opt/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T RealignerTargetCreator -R genome.fasta -I file_dedupRG.bam -o file_dedup.target.interval.list

    and the error trace:

    ERROR stack trace

    java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    It would be great to know how to go ahead.
    Thanks,
    Dario

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @dcopetti
    Hi Dario,

    Thanks for telling us. I will bring this up at the next meeting (today) and try to get this moved up to top priority.

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    This was a bug in the htsjdk library. It has been fixed (see here), but we're waiting on the next htsjdk release to incorporate the fix into GATK. Apologies for the inconvenience.

  • Raycui (Max-Planck Institute for the Biology of Ageing; Member)

    Hello, are there any updates on this this week? I'm seeing this in my dataset as well...

    ERROR stack trace

    java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.tools.walkers.indels.ReadBin.getReference(ReadBin.java:108)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.clean(IndelRealigner.java:696)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.cleanAndCallMap(IndelRealigner.java:580)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:552)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:148)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

  • OwenO (Oxford, UK; Member)

    Hi Geraldine,
    So I won't be able to use it until the next htsjdk release, right? Do you have any idea how long this is likely to take?
    Best wishes,
    Owen

    Issue · Github, by Sheila: issue number 112, state closed, closed by chandrans

  • Raycui (Max-Planck Institute for the Biology of Ageing; Member)
    edited August 2015

    Dear Geraldine,
    The only reason I want the latest version is to get the RGQ score in the GenotypeGVCFs output. Do you know of a previous version that emits the RGQ score and doesn't have this bug in IndelRealigner?
    On the other hand, if the bug fix is known, is it not possible to do a manual ad hoc fix of htsjdk?
    Best Regards,
    Ray

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Raycui
    Hi Ray,

    If 3.3 does not have RGQ, you can use 3.4. I think the bug was introduced in the very latest version only.

    -Sheila

  • Raycui (Max-Planck Institute for the Biology of Ageing; Member)
    edited August 2015

    @Sheila said:
    @Raycui
    Hi Ray,

    If 3.3 does not have RGQ, you can use 3.4. I think the bug was introduced in the very latest version only.

    -Sheila

    Hi Sheila,
    Thank you. Do you know where I can download a previous version? I can't seem to find a link.
    Best
    Ray

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    FYI we should have the bug fix in the nightly builds within a few days. I've been promised the htsjdk release is imminent.

  • pfreilly (New Jersey; Member)

    Just in case this is the same situation for others having the bug:
    This may be due to a lack of line wrapping in the reference FASTA. A quick way to check is to open your .fai file (for example with less) and look at the last two columns. If those numbers correspond to the length of the contig/scaffold/chromosome, your reference is not wrapped, which could cause the issue.
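The .fai check described above can be sketched in Python. This assumes the usual samtools faidx layout of five tab-separated columns (name, length, offset, bases per line, bytes per line). The function name and the `min_length` cutoff are illustrative assumptions: the cutoff only keeps short contigs, which legitimately fit on one line, from being reported.

```python
def find_unwrapped_contigs(fai_path, min_length=1000):
    """Return names of contigs whose whole sequence sits on a single line.

    Expected .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    A contig is reported when LINEBASES is at least LENGTH and LENGTH
    exceeds min_length (an arbitrary cutoff chosen for this sketch).
    """
    unwrapped = []
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            name = fields[0]
            length = int(fields[1])
            line_bases = int(fields[3])
            if line_bases >= length and length > min_length:
                unwrapped.append(name)
    return unwrapped
```

If the returned list is non-empty, the reference probably contains very long single-line sequences and may be worth wrapping before re-indexing.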

    Wrapping the reference FASTA (e.g. using fastx_toolkit's fasta_formatter -w 50), then indexing and creating a sequence dictionary for the wrapped reference appears to fix the problem altogether.
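If fastx_toolkit is not at hand, the wrapping step can be approximated with a few lines of Python. This is a rough sketch rather than a replacement for fasta_formatter: the function name is made up for illustration, the default width of 50 simply mirrors the command above, and each contig is held in memory while it is rewritten.

```python
def wrap_fasta(in_path, out_path, width=50):
    """Rewrite a FASTA file so that no sequence line exceeds `width` bases."""

    def flush(chunks, out):
        # Join the pieces of the current record and emit fixed-width lines.
        seq = "".join(chunks)
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")

    with open(in_path) as src, open(out_path, "w") as dst:
        chunks = []
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                flush(chunks, dst)      # finish the previous record, if any
                chunks = []
                dst.write(line + "\n")  # header lines are copied unchanged
            elif line:
                chunks.append(line)
        flush(chunks, dst)              # finish the last record
```

The wrapped file then needs a fresh .fai index and sequence dictionary (for example via samtools faidx and Picard CreateSequenceDictionary), since the old ones describe the unwrapped layout.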

    It looks like htsjdk release 1.138 has a fix for the buffer overflow condition that may be at the root of this bug, so it might be worth repackaging GATK 3.4-46 with htsjdk 1.138 and testing that out.

    Best regards,
    Patrick

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @pfreilly we're working on it -- the current GATK doesn't compile against the latest htsjdk, so we need to fix that first. It's almost done.

  • Raycui (Max-Planck Institute for the Biology of Ageing; Member)

    @Geraldine_VdAuwera said:
    @pfreilly we're working on it -- the current GATK doesn't compile against the latest htsjdk, so we need to fix that first. It's almost done.

    Dear Geraldine, are there any updates on this? Has the fix been integrated into the nightly builds?
    Thanks!

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @OwenO @dcopetti @Raycui @pfreilly
    Hi everyone!

    I am happy to say that this issue has been fixed in the latest nightly build :smile:
    You can download it here: https://www.broadinstitute.org/gatk/download/nightly

    -Sheila

  • OwenO (Oxford, UK; Member)

    Great, thanks. Seems like it's working fine now.
    -Owen

  • I'm having a similar bug with IndelRealigner. Trying the latest nightly build...
