
Munich · Posts: 17 · Member
edited August 2012

Hi,

When I run ReduceReads, I get the following exception just as it is about to finish:

##### ERROR stack trace

java.util.NoSuchElementException

##### ERROR ------------------------------------------------------------------------------------------

I also ran ReduceReads on a file that only contains reads from chrY and it seems to work fine.

EDIT: I also ran it on a file that contains only the unmapped reads, because they are actually the last reads in my original file, but ReduceReads also ran fine there. (Of course, the resulting file was empty.)

Best Regards,

Thomas


Something's odd here -- your second stack trace has

ERROR A GATK RUNTIME ERROR has occurred (version 1.6-596-g3b9929c):


showing version 1.6. I don't know if this is related to your problem with RR, but to be safe you should re-download the latest GATK version. If you are compiling your own, do ant clean first.

Geraldine Van der Auwera, PhD

• Munich · Posts: 17 · Member

Ah, sorry, that's because I ran it with the INFO loglevel only on this version. But I also ran it with the newest version (see my original post) and got the same error.

Ah, that makes more sense. Please make sure you specify that kind of thing, if you're showing results obtained with a different version. In any case you should always try to work with the latest version -- we have a very fast-paced development process and have bug fixes pushed in all the time. Especially with v2.0 being a brand new release and ReduceReads being a new tool, things are moving very fast -- and as a result we only provide support for the latest versions.

Geraldine Van der Auwera, PhD

• Munich · Posts: 17 · Member

Of course, I always try to do that. Actually I downloaded version 1.6 last week (from the new GATK 2 website, which was a little confusing) and tried to run some commands. When I tested ReduceReads some days later and encountered the error (see second post), I of course downloaded the newest version but still got the same error (original post).

Last week (aka Big Release Week) was confusing for all of us

Geraldine Van der Auwera, PhD

Now I am confused, how come you're running ReduceReads on version 1.6? It wasn't available on version 1.6.

• Munich · Posts: 17 · Member

I have no idea! It was the first version I was able to download last week from the new GATK 2 website. I also was a little confused about the number, but I thought it might be because it's a beta :-)
Anyway, I have the same problem with version 2.0-21. I just noticed that there is already a newer version available (2.0-32). Do you think it might solve this problem, i.e. are there any changes to ReduceReads at all in this version? I'm asking because it always takes about 8 h to run on the file where I encountered the problem, and if there are no changes, I guess there won't be a solution.

No, there are no changes to ReduceReads (RR) in 2.0-32. Can you create a file containing the region that causes the error (using PrintReads) and send it to me to debug? I have never seen this error here, and we've run it in a pipeline with tens of thousands of samples.

• Munich · Posts: 17 · Member

Of course, but how exactly can I do that? Can you give me a sample command?

The GATK periodically prints the genomic position it is currently processing. Take the last position that was printed before the error happened and re-run with -L contig:position-(position+1,000,000) to see whether the error occurs. If it does, create a BAM file of that region using PrintReads with the same -L argument. This way you will be able to reproduce the error within seconds, and I can actually analyze it.
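The procedure above can be sketched as a few bash helpers, assuming awk and the GATK 2.x TraversalEngine progress-line format that Thomas posts further down (e.g. "INFO  09:27:35,583 TraversalEngine -  chrX:154159949 ..."). The function names are hypothetical, and repro_commands only prints the ReduceReads and PrintReads commands rather than running them; GenomeAnalysisTK.jar, test.bam and debug.bam are placeholder file names.

```shell
#!/usr/bin/env bash
# Sketch: turn the last TraversalEngine progress line of a GATK 2.x log into a
# 1 Mb -L interval, then print (not run) the re-test and PrintReads commands.

last_locus() {
  # Reads a GATK log on stdin; prints the last "contig:position" token found
  # on a TraversalEngine progress line (e.g. chrX:154159949).
  awk '/TraversalEngine/ {
         for (i = 1; i <= NF; i++) if ($i ~ /^[^:]+:[0-9]+$/) loc = $i
       }
       END { if (loc != "") print loc }'
}

window_interval() {
  # $1 = "contig:start", $2 = window size in bp (default 1,000,000).
  local contig=${1%:*} start=${1##*:} size=${2:-1000000}
  echo "${contig}:${start}-$((start + size))"
}

repro_commands() {
  # $1 = interval, $2 = input BAM, $3 = reference FASTA (placeholder names).
  # Echoes the two commands instead of executing them.
  echo "java -jar GenomeAnalysisTK.jar -T ReduceReads -R $3 -I $2 -L $1 -o test.bam"
  echo "java -jar GenomeAnalysisTK.jar -T PrintReads -R $3 -I $2 -L $1 -o debug.bam"
}
```

For instance, window_interval "$(last_locus < run.log)" on the log Thomas posts below would give chrX:154159949-155159949, the same interval he ends up testing (run.log being a hypothetical saved copy of that log).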

• Munich · Posts: 17 · Member

Ok, I reran it on the whole file and again got the same error:

INFO  09:26:05,084 TraversalEngine -  chrX:150869164        9.76e+07    5.6 h        3.4 m     97.8%         5.7 h     7.5 m
INFO  09:26:35,226 TraversalEngine -  chrX:153047556        9.77e+07    5.6 h        3.4 m     97.9%         5.7 h     7.2 m
INFO  09:27:05,372 TraversalEngine -  chrX:153594705        9.79e+07    5.6 h        3.4 m     97.9%         5.7 h     7.2 m
INFO  09:27:35,583 TraversalEngine -  chrX:154159949        9.80e+07    5.6 h        3.4 m     97.9%         5.7 h     7.1 m
INFO  09:28:02,305 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(Unknown Source)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.updateHeaderCounts(SlidingWindow.java:697)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.addRead(SlidingWindow.java:128)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.addAlignment(SingleSampleCompressor.java:73)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:70)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:347)
at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:86)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:107)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:52)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:71)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.0-21-ga40b695):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

So I tried to run it on the last couple of reads:

java -jar GenomeAnalysisTK.jar -T ReduceReads -I recal.real.markdup.merged.bam -o test.bam -R hg19.fa -L chrX:154159949-155159949

And it finished fine:

INFO  09:31:59,422 HelpFormatter - ---------------------------------------------------------------------------------
INFO  09:31:59,424 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-21-ga40b695, Compiled 2012/07/26 06:21:05
INFO  09:31:59,424 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  09:31:59,424 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  09:31:59,425 HelpFormatter - Program Args: -T ReduceReads -I recal.real.markdup.merged.bam -o reduced.unaligned.real.markdup.merged.bam -R /data/mirror/goldenpath/hg19/chromosome/hg19.fa -L chrX:154159949-155159949
INFO  09:31:59,425 HelpFormatter - Date/Time: 2012/08/03 09:31:59
INFO  09:31:59,425 HelpFormatter - ---------------------------------------------------------------------------------
INFO  09:31:59,425 HelpFormatter - ---------------------------------------------------------------------------------
INFO  09:31:59,463 GenomeAnalysisEngine - Strictness is SILENT
INFO  09:31:59,516 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  09:31:59,533 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO  09:32:00,350 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO  09:32:00,350 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO  09:32:07,149 TraversalEngine - Total runtime 6.82 secs, 0.11 min, 0.00 hours
INFO  09:32:07,149 TraversalEngine - 4504 reads were filtered out during traversal out of 46158 total (9.76%)
INFO  09:32:07,149 TraversalEngine -   -> 4504 reads (9.76% of total) failing DuplicateReadFilter
INFO  09:32:08,374 GATKRunReport - Uploaded run statistics report to AWS S3

It also works fine on a larger region of chrX.

• Broad Institute · Posts: 698 · Member, Administrator, Broadie, Moderator, Dev admin

I bet the issue is on chrY or one of the random contigs at the end. It's ugly, but you should probably run with -XL chr1 -XL chr2 ... -XL chrX (and no -L) to generate the error quickly.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Munich · Posts: 17 · Member
edited August 2012

I tried chrY and it worked fine (I didn't use -XL; instead I created a new BAM file containing only chrY with samtools), and I also ran it on the unmapped reads, which also worked fine. OK, so I'll run it for all the chromosomes and see what happens.

• Posts: 274 · Administrator, Dev admin

X is only 155,270,560 bases long, so this must have happened in the "next" chromosome according to your reference file, probably Y as Eric suggests here. I'd go ahead and run it with -L chrY; it should happen right at the beginning. Take a look at your BAM file in IGV and see if there is something funky going on in that region. You can then create a debug BAM for me with PrintReads using -L chrY. This should be very small. I will also need your reference (unless you're using the standard UCSC reference; if so, just let me know). Thanks for helping us track down this bug.
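The reasoning above (chrX ends at 155,270,560, so the failure must come from whichever contig the reference lists next) can be checked against the reference's FASTA index instead of guessed. A minimal sketch, assuming a samtools-style .fai file (as written by samtools faidx) read on stdin; the function names and the hg19.fa.fai file name are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: look up contigs in a FASTA index (.fai). Each .fai line is
# name<TAB>length<TAB>offset<TAB>linebases<TAB>linewidth, in reference order.

contig_length() {
  # $1 = contig name; prints its length.
  awk -F '\t' -v target="$1" '$1 == target { print $2; exit }'
}

contig_after() {
  # $1 = contig name; prints the next contig in reference order and its length.
  # Prints nothing if the contig is the last one (or is absent).
  awk -F '\t' -v OFS='\t' -v target="$1" '
    found { print $1, $2; exit }
    $1 == target { found = 1 }
  '
}
```

Running contig_after chrX < hg19.fa.fai shows which contig the traversal reaches after chrX. Since Thomas's hg19 orders its chromosomes differently from the GATK resource bundle reference, the answer depends on the specific .fai, not on a generic hg19 layout.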
• Posts: 274 · Administrator, Dev admin

In that case, I'd suggest running with -L chrX -L chrY. It could be something in the transition from X to Y, and if you run them separately there is no transition.

• Munich · Posts: 17 · Member
edited August 2012

Thanks for your suggestions, guys. I'll try both of them (i.e. running chrX and chrY together, and running all chromosomes except chr1...chrX) and keep you updated!

• Broad Institute · Posts: 698 · Member, Administrator, Broadie, Moderator, Dev admin

It's likely in neither X nor Y (nor 1, 2, 3 ...). There are other contigs in the reference that aren't standard chromosomes. I really think you should just use the -XL syntax that I suggested above.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Munich · Posts: 17 · Member

You were right, Eric. I was able to reproduce the error by skipping all "normal" chromosomes. I produced a debug.bam file including all the rest. Where can I upload it? (It is 93 MB.) Do you also need our reference genome? It's standard UCSC hg19, but the chromosomes are in a different order than in the reference file from the GATK resource bundle.

• Posts: 274 · Administrator, Dev admin

Please upload both in a zip file to our FTP. Instructions on how to upload are here: http://gatk.vanillaforums.com/discussion/1215/how-can-i-access-the-gsa-public-ftp-server/p1

Please let me know when the files are up; I'll look into it ASAP! Thank you!

• Munich · Posts: 17 · Member

The upload has finished now. The file is called new.thomas_w_reducereads_exception.tar. Please ignore thomas_w_reducereads_exception.tar; the transmission failed at one point and I wasn't able to overwrite it.

• Posts: 1 · Member

I am having a similar error. It looks like the error occurs on GL000226.1. Please let me know if you want me to upload a BAM file.

##### ERROR stack trace
java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(LinkedList.java:715)
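Eric's -XL suggestion does not have to be typed out by hand. A minimal bash sketch, assuming UCSC-style contig names (chr1 to chr22, chrX, as in hg19); both function names are hypothetical. The second function lists the contigs in a .fai that are neither numbered chromosomes nor X, Y or M, i.e. the kind of non-standard contigs Eric is pointing at.

```shell
#!/usr/bin/env bash
# Sketch assuming UCSC-style contig names (chr1..chr22, chrX, chrY, chrM).

standard_xl_args() {
  # Prints "-XL chr1 -XL chr2 ... -XL chr22 -XL chrX" on one line.
  local args="" c
  for c in $(seq 1 22) X; do
    args="$args -XL chr$c"
  done
  echo "${args# }"   # strip the leading space
}

nonstandard_contigs() {
  # Reads a .fai on stdin; prints contigs other than chr1-22, X, Y and M,
  # e.g. the *_random and chrUn_* contigs of UCSC hg19.
  awk -F '\t' '$1 !~ /^chr([0-9]+|X|Y|M)$/ { print $1 }'
}
```

The exclusion list is meant to be word-split into separate arguments, so it is left unquoted, e.g. java -jar GenomeAnalysisTK.jar -T ReduceReads -R hg19.fa -I in.bam -o out.bam $(standard_xl_args), with placeholder file names. For a b37-style reference (contigs named 1, 2, ..., X and GL000226.1), the chr prefix and the regex would need adjusting.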

I'm working on this bug right now, let me see if I can fix it with thomas' bam file first.

Fixed!

Your BAM had reads that were soft-clipped past the boundaries of the chromosome. ReduceReads was not ready for that! Now it is. I've pushed the fix and it will go live in the next few hours!
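To check whether a BAM contains reads like the ones described here, one option is to look for mapped reads whose soft-clipped bases would start before position 1 or end past the contig length. This is a diagnostic sketch under stated assumptions, not the fix that went into ReduceReads: it expects SAM text with @SQ header lines on stdin (for example from samtools view -h), counts only S operations as clipped bases, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (not GATK's actual fix): flag mapped reads whose soft clips
# extend before position 1 or past the end of their contig.
# Input: SAM text with @SQ header lines on stdin.
# Only S (soft clip) operations are counted; H (hard clip) bases are not in the read.

clipped_past_contig() {
  awk -F '\t' -v OFS='\t' '
    /^@SQ/ {                      # remember contig lengths from the header
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4) + 0
      }
      contig_len[name] = len
      next
    }
    /^@/ { next }                 # other header lines
    int($2 / 4) % 2 == 1 || $6 == "*" { next }   # unmapped (flag 0x4) or no CIGAR
    {
      cigar = $6; lead = 0; trail = 0; reflen = 0; seen_base_op = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        cigar = substr(cigar, RLENGTH + 1)
        if (op == "S") { if (seen_base_op) trail = n; else lead = n }
        if (op ~ /[MDN=X]/) reflen += n   # operations that consume the reference
        if (op != "H") seen_base_op = 1
      }
      ustart = $4 - lead                  # unclipped start
      uend = $4 + reflen - 1 + trail      # unclipped end
      if (ustart < 1 || (($3 in contig_len) && uend > contig_len[$3]))
        print $1, $3, ustart, uend
    }
  '
}
```

Piping samtools view -h debug.bam into clipped_past_contig would print one tab-separated line per offending read: read name, contig, unclipped start and unclipped end. A negative or zero start, or an end beyond the contig length, is the situation described above.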

• Munich · Posts: 17 · Member

Great, I'll give it a try tomorrow!

• Posts: 9 · Member

Hi Mauricio,

Have you fixed this bug already? I'm running GATK v2.1-8-g5efb575 and ReduceReads produces a similar error:
ERROR MESSAGE: Trying to clip before the start or after the end of a read.
I'm working with mouse genome mm10 and it stops at the second chromosome.
I would be very grateful for your comment.