NoSuchElementException in ReduceReads

thomas_w (Posts: 14, Member)
edited August 2012 in Ask the GATK team

Hi,

When I run ReduceReads, I get the following exception just when it is supposed to finish:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(Unknown Source)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.updateHeaderCounts(SlidingWindow.java:697)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.addRead(SlidingWindow.java:128)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.addAlignment(SingleSampleCompressor.java:73)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:70)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:347)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:86)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:107)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:52)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:71)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.0-21-ga40b695):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

I run it with the standard arguments:
java -jar GenomeAnalysisTK.jar \
--logging_level ERROR \
-R hg19.fa \
-T ReduceReads \
-I in.bam \
-o reduced.in.bam

Any suggestions?

Thanks,
Thomas

Answers

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    Seems like a bug to me. Would you be able to send me a version of your BAM file restricted to the interval where it happens, so I can reproduce it and try to fix it?

  • thomas_w (Posts: 14, Member)
    edited August 2012

    Well, I think the exception happens after the traversal engine has finished, because this is what happens before the exception (when I set the logging level to INFO instead of ERROR):
    ...

    INFO 20:51:02,048 TraversalEngine - chrX:153296649 9.78e+07 7.4 h 4.6 m 97.9% 7.6 h 9.5 m
    INFO 20:51:32,598 TraversalEngine - chrX:153698873 9.79e+07 7.4 h 4.6 m 97.9% 7.6 h 9.5 m
    INFO 20:52:02,645 TraversalEngine - chrY:59358258 9.81e+07 7.4 h 4.6 m 99.9% 7.5 h 32.1 s
    INFO 20:52:16,410 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.LinkedList$ListItr.next(Unknown Source)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.updateHeaderCounts(SlidingWindow.java:697)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.addRead(SlidingWindow.java:128)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.addAlignment(SingleSampleCompressor.java:73)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:70)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:344)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:84)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:52)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:71)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 1.6-596-g3b9929c):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    I also ran ReduceReads on a file that only contains reads from chrY and it seems to work fine.

    EDIT: I also ran it on a file that contains only unmapped reads, because they are actually the last reads in my original file, but ReduceReads ran fine on that too. (Of course the resulting file was empty.)

    Best Regards,

    Thomas

  • Geraldine_VdAuwera (Posts: 5,840, Administrator, GATK Developer)

    Something's odd here -- your second stack trace has

    ERROR A GATK RUNTIME ERROR has occurred (version 1.6-596-g3b9929c):

    showing version 1.6. I don't know if this is related to your problem with RR, but to be safe you should re-download the latest GATK version. If you are compiling your own, do ant clean first.

    Geraldine Van der Auwera, PhD

  • thomas_w (Posts: 14, Member)

    Ah, sorry, that's because I ran it with the INFO loglevel only on this version. But I also ran it with the newest version (see my original post) and got the same error.

  • Geraldine_VdAuwera (Posts: 5,840, Administrator, GATK Developer)

    Ah, that makes more sense. Please make sure you specify that kind of thing, if you're showing results obtained with a different version. In any case you should always try to work with the latest version -- we have a very fast-paced development process and have bug fixes pushed in all the time. Especially with v2.0 being a brand new release and ReduceReads being a new tool, things are moving very fast -- and as a result we only provide support for the latest versions.

    Geraldine Van der Auwera, PhD

  • thomas_w (Posts: 14, Member)

    Of course, I always try to do that. Actually I downloaded version 1.6 last week (from the new GATK 2 website, which was a little confusing) and tried to run some commands. When I tested ReduceReads some days later and encountered the error (see second post), I of course downloaded the newest version but still got the same error (original post).

  • Geraldine_VdAuwera (Posts: 5,840, Administrator, GATK Developer)

    Last week (aka Big Release Week) was confusing for all of us :)

    Geraldine Van der Auwera, PhD

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    Now I am confused. How come you're running ReduceReads on version 1.6? It wasn't available in version 1.6.

  • thomas_w (Posts: 14, Member)

    I have no idea! It was the first version I was able to download last week from the new GATK 2 website. I was also a little confused by the number, but I thought it might be because it's a beta :-)
    Anyway, I have the same problem with version 2.0-21. I just noticed that a newer version (2.0-32) is already available. Do you think it might solve this problem, i.e. are there any changes to ReduceReads in this version? I'm asking because it takes about 8 h to run it on the file where I encountered the problem, and if nothing has changed, I guess a rerun won't fix it.

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    No, there are no differences to RR in 2.0-32. Can you create a file with the region that causes the error (using PrintReads) and send it to me to debug? I have never seen this error here, and we've run it in a pipeline with tens of thousands of samples.

  • thomas_w (Posts: 14, Member)

    Of course, but how exactly can I do that? Can you give me a sample command?

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    The GATK prints the region of the genome it is processing as the run progresses. Take the last position that was printed before the error happened, and re-run with -L set to the window from that position to that position + 1,000,000 to see if the error occurs. If it does, create a BAM file using PrintReads and the same -L. This way you will be able to reproduce the error within seconds and I can actually analyze it.
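
The procedure described above could be scripted roughly as follows. This is only a sketch: the helper names `last_logged_position` and `window_from_position`, the log file name, and the BAM file names are placeholders rather than anything provided by GATK, and it assumes progress lines shaped like the `TraversalEngine` lines quoted in this thread. The commands are printed rather than run, so they can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print the last position reported by TraversalEngine
# in a saved GATK log (assumes lines like
# "INFO 09:27:35,583 TraversalEngine - chrX:154159949 9.80e+07 ...").
last_logged_position() {
    awk '$3 == "TraversalEngine" && $5 ~ /^[^:]+:[0-9]+$/ { pos = $5 }
         END { if (pos != "") print pos }' "$1"
}

# Hypothetical helper: turn contig:start into contig:start-(start+1000000).
# The end coordinate is not clamped to the contig length.
window_from_position() {
    contig="${1%%:*}"   # text before the colon
    start="${1##*:}"    # text after the colon
    printf '%s:%s-%s\n' "$contig" "$start" "$((start + 1000000))"
}

# Example position; with a saved log (gatk.log is a placeholder name) use:
#   interval="$(window_from_position "$(last_logged_position gatk.log)")"
interval="$(window_from_position chrX:154159949)"

# Step 1: check whether the error reproduces on the window alone.
echo "java -jar GenomeAnalysisTK.jar -T ReduceReads -R hg19.fa -I in.bam -o test.bam -L $interval"
# Step 2: if it does, extract just those reads into a small BAM to share.
echo "java -jar GenomeAnalysisTK.jar -T PrintReads -R hg19.fa -I in.bam -o debug.bam -L $interval"
```

If the window runs cleanly, the failing reads are probably somewhere else, for example on a contig that comes later in the reference.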

  • thomas_w (Posts: 14, Member)

    Ok, I reran it on the whole file and again got the same error:


    INFO 09:26:05,084 TraversalEngine - chrX:150869164 9.76e+07 5.6 h 3.4 m 97.8% 5.7 h 7.5 m
    INFO 09:26:35,226 TraversalEngine - chrX:153047556 9.77e+07 5.6 h 3.4 m 97.9% 5.7 h 7.2 m
    INFO 09:27:05,372 TraversalEngine - chrX:153594705 9.79e+07 5.6 h 3.4 m 97.9% 5.7 h 7.2 m
    INFO 09:27:35,583 TraversalEngine - chrX:154159949 9.80e+07 5.6 h 3.4 m 97.9% 5.7 h 7.1 m
    INFO 09:28:02,305 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.util.NoSuchElementException
    at java.util.LinkedList$ListItr.next(Unknown Source)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.updateHeaderCounts(SlidingWindow.java:697)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.addRead(SlidingWindow.java:128)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.addAlignment(SingleSampleCompressor.java:73)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:70)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:347)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:86)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:52)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:71)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.0-21-ga40b695):
    ##### ERROR
    ##### ERROR Please visit the wiki to see if this is a known problem
    ##### ERROR If not, please post the error, with stack trace, to the GATK forum
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------

    So I tried to run it on the last couple of reads:


    java -jar GenomeAnalysisTK.jar -T ReduceReads -I recal.real.markdup.merged.bam -o test.bam -R hg19.fa -L chrX:154159949-155159949

    And it finished fine:


    INFO 09:31:59,422 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:31:59,424 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-21-ga40b695, Compiled 2012/07/26 06:21:05
    INFO 09:31:59,424 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 09:31:59,424 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 09:31:59,425 HelpFormatter - Program Args: -T ReduceReads -I recal.real.markdup.merged.bam -o reduced.unaligned.real.markdup.merged.bam -R /data/mirror/goldenpath/hg19/chromosome/hg19.fa -L chrX:154159949-155159949
    INFO 09:31:59,425 HelpFormatter - Date/Time: 2012/08/03 09:31:59
    INFO 09:31:59,425 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:31:59,425 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:31:59,463 GenomeAnalysisEngine - Strictness is SILENT
    INFO 09:31:59,516 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 09:31:59,533 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO 09:32:00,350 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 09:32:00,350 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining
    INFO 09:32:07,149 TraversalEngine - Total runtime 6.82 secs, 0.11 min, 0.00 hours
    INFO 09:32:07,149 TraversalEngine - 4504 reads were filtered out during traversal out of 46158 total (9.76%)
    INFO 09:32:07,149 TraversalEngine - -> 4504 reads (9.76% of total) failing DuplicateReadFilter
    INFO 09:32:08,374 GATKRunReport - Uploaded run statistics report to AWS S3

    It also works fine on a larger region of chrX.

  • ebanks (Posts: 678, GATK Developer, moderator)

    I bet the issue is on chrY or one of the random contigs at the end. It's ugly, but you should probably run with -XL chr1 -XL chr2 ... -XL chrX (and no -L) to generate the error quickly.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
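
Typing out the -XL list suggested above is tedious, so it could be generated with a small loop. This is a sketch under assumptions: `build_exclusions` is a made-up helper name, the contig names follow the UCSC style (chr1 to chr22, chrX) used by the hg19.fa in this thread, and the file names are taken from the original command. chrY, chrM and the remaining contigs are left in on purpose. The command is printed, not run.

```shell
#!/bin/sh
# Build "-XL chr1 -XL chr2 ... -XL chr22 -XL chrX".
# Assumes UCSC-style contig names; adjust for references named 1, 2, ..., X.
build_exclusions() {
    args=""
    i=1
    while [ "$i" -le 22 ]; do
        args="$args -XL chr$i"
        i=$((i + 1))
    done
    args="$args -XL chrX"
    printf '%s\n' "${args# }"   # drop the leading space
}

# Print the full command (no -L), so that only the excluded contigs are skipped.
echo "java -jar GenomeAnalysisTK.jar -T ReduceReads -R hg19.fa -I in.bam -o reduced.in.bam $(build_exclusions)"
```

Swapping `-T ReduceReads` for `-T PrintReads` with the same exclusions is one possible way to produce a small BAM of the leftover contigs for debugging.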

  • thomas_w (Posts: 14, Member)
    edited August 2012

    I tried chrY and it worked fine (I didn't use -XL chrY but created a new bam file with samtools) and I also tried to run it on the unmapped reads and it also worked fine. Ok, so I'll run it for all the chromosomes and see what happens.

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    X is only 155,270,560 bases long. So this must have happened in the "next" chromosome according to your reference file. Probably Y as Eric suggests here.

    I'd go ahead and run it with -L chrY; the error should happen right at the beginning. Take a look at your BAM file in IGV to see if there is something funky going on in the region.

    You can then create a debug bam for me with PrintReads using -L chrY. This should be very small. I will also need your reference (unless you're using a standard UCSC reference, just let me know).

    Thanks for helping us track down this bug.

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    In that case, I'd suggest running with -L chrX -L chrY. It could be something in the transition from X to Y, and if you run them separately there is no transition.
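
A command along the lines suggested above might look like this. It is a sketch only: `xy_transition_cmd` is a made-up helper, the file names are placeholders, and whether the run crosses the chrX/chrY boundary the same way as the full run may depend on the contig order in the reference. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print a ReduceReads command restricted to chrX and chrY.
# Arguments: reference FASTA, input BAM, output BAM (all placeholder names).
xy_transition_cmd() {
    printf 'java -jar GenomeAnalysisTK.jar -T ReduceReads -R %s -I %s -o %s -L chrX -L chrY\n' "$1" "$2" "$3"
}

xy_transition_cmd hg19.fa in.bam xy.reduced.bam
```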

  • thomas_w (Posts: 14, Member)
    edited August 2012

    Thanks for your suggestions, guys. I'll try both of them (i.e. running chrX & chrY and running all chromosomes except chr1...chrX) and keep you updated!

  • ebanks (Posts: 678, GATK Developer, moderator)

    It's likely in neither X nor Y (nor 1, 2, 3 ...). There are other contigs in the reference that aren't standard chromosomes. I really think you should just use the -XL syntax that I suggested above.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • thomas_w (Posts: 14, Member)

    You were right, Eric. I was able to reproduce the error by skipping all "normal" chromosomes. I produced a debug.bam file containing everything else. Where can I upload it? (It is 93 MB.) Do you also need our reference genome? It's standard UCSC hg19, but the chromosomes are in a different order than in the reference file from the GATK resource bundle.

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    Please upload both in a zip file to our FTP.

    Instructions on how to upload are here:
    http://gatk.vanillaforums.com/discussion/1215/how-can-i-access-the-gsa-public-ftp-server/p1

    Please let me know when the files are up and I'll look into it ASAP. Thank you!

  • thomas_w (Posts: 14, Member)

    The upload has finished now. The file is called new.thomas_w_reducereads_exception.tar. Please ignore thomas_w_reducereads_exception.tar; the transmission failed at one point and I wasn't able to overwrite it.

  • JLand52 (Posts: 1, Member)

    I am having a similar error. It looks like the error occurs on GL000226.1. Please let me know if you want me to upload a bam file.

    ERROR stack trace

    java.util.NoSuchElementException
    at java.util.LinkedList$ListItr.next(LinkedList.java:715)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.updateHeaderCounts(SlidingWindow.java:697)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.addRead(SlidingWindow.java:128)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.addAlignment(SingleSampleCompressor.java:73)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:70)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:347)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:86)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:52)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:71)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

  • Carneiro (Posts: 274, Administrator, GATK Developer)

    I'm working on this bug right now, let me see if I can fix it with thomas' bam file first.

  • thomas_w (Posts: 14, Member)
  • lorania (Posts: 9, Member)

    Hi Mauricio,

    Have you fixed this bug already? I'm running GATK v2.1-8-g5efb575 and ReduceReads produces a similar error:
    ERROR MESSAGE: Trying to clip before the start or after the end of a read.
    I'm working with mouse genome mm10 and it stops at the second chromosome.
    I would be very grateful for your comment.