We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
ReduceReads v2.7-4-g6f46d11: Read error; BinaryCodec in readmode

Hi guys,
I've seen this error has been reported other times, for different reasons.
The thing is that, the bam file I'm using to reduce the reads has been processed through GATK pipeline without problems, realignment and recalibration included.
Therefore, I assumed the bam file generated after BQSR would be GATK-compliant.
I was running with Queue, so I just run here the exact command passed to the job in an interactive mode, to see what happens.
Here below is the full command and error message (apologies for lengthy output), where there's no stack trace after the error.
[[email protected] reduced]$ 'java' '-Xmx12288m' '-Djava.io.tmpdir=/scratch/scratch/fles/project_analysis/reduced/tmp' '-cp' '/home/fles/applications/Queue-2.7-4-g6f46d11/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'ReduceReads' '-I' '/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam' '-R' '/home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta' '-o' '/scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam' INFO 09:27:21,728 HelpFormatter - -------------------------------------------------------------------------------- INFO 09:27:21,730 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:29:52 INFO 09:27:21,731 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 09:27:21,731 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 09:27:21,735 HelpFormatter - Program Args: -T ReduceReads -I /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam -R /home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta -o /scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam INFO 09:27:21,735 HelpFormatter - Date/Time: 2013/11/08 09:27:21 INFO 09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- INFO 09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- INFO 09:27:34,156 GenomeAnalysisEngine - Strictness is SILENT INFO 09:27:34,491 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 40 INFO 09:27:34,503 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 09:27:34,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12 INFO 09:27:35,039 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files INFO 09:27:35,045 GenomeAnalysisEngine - Done preparing for traversal INFO 09:27:35,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 09:27:35,046 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining INFO 09:27:35,080 ReadShardBalancer$1 - Loading BAM index data INFO 09:27:35,081 ReadShardBalancer$1 - Done loading BAM index data INFO 09:28:05,059 ProgressMeter - 1:18958138 1.00e+06 30.0 s 30.0 s 0.6% 81.8 m 81.3 m INFO 09:28:35,069 ProgressMeter - 1:46733396 2.30e+06 60.0 s 26.0 s 1.5% 66.4 m 65.4 m INFO 09:29:05,079 ProgressMeter - 1:92187730 3.50e+06 90.0 s 25.0 s 3.0% 50.5 m 49.0 m INFO 09:29:35,088 ProgressMeter - 1:145281942 4.90e+06 120.0 s 24.0 s 4.7% 42.7 m 40.7 m INFO 09:30:05,098 ProgressMeter - 1:152323864 6.40e+06 2.5 m 23.0 s 4.9% 50.9 m 48.4 m INFO 09:30:35,893 ProgressMeter - 1:181206886 7.70e+06 3.0 m 23.0 s 5.8% 51.4 m 48.4 m INFO 09:31:05,902 ProgressMeter - 1:217604563 8.90e+06 3.5 m 23.0 s 7.0% 49.9 m 46.4 m INFO 09:31:35,913 ProgressMeter - 2:14782401 1.02e+07 4.0 m 23.0 s 8.5% 47.0 m 43.0 m INFO 09:32:05,922 ProgressMeter - 2:62429207 1.15e+07 4.5 m 23.0 s 10.0% 44.8 m 40.3 m INFO 09:32:35,931 ProgressMeter - 2:97877374 1.28e+07 5.0 m 23.0 s 11.2% 44.7 m 39.7 m INFO 09:33:06,218 ProgressMeter - 2:135574018 1.42e+07 5.5 m 23.0 s 12.4% 44.5 m 38.9 m INFO 09:33:36,227 ProgressMeter - 2:179431307 1.56e+07 6.0 m 23.0 s 13.8% 43.5 m 37.5 m INFO 09:34:06,237 ProgressMeter - 2:216279690 1.69e+07 6.5 m 23.0 s 15.0% 43.4 m 36.9 m INFO 09:34:36,248 ProgressMeter - 3:14974731 1.81e+07 7.0 m 23.0 s 16.4% 42.9 m 35.9 m INFO 09:35:07,073 ProgressMeter - 3:52443620 1.94e+07 7.5 m 23.0 s 17.6% 42.9 m 35.4 m INFO 09:35:37,084 ProgressMeter - 3:111366536 2.05e+07 8.0 m 23.0 s 19.5% 41.3 m 33.2 m INFO 09:36:07,094 ProgressMeter - 3:155571144 2.18e+07 8.5 m 23.0 s 20.9% 40.8 m 32.3 m INFO 09:36:37,103 ProgressMeter - 4:3495327 2.31e+07 9.0 m 23.0 s 22.4% 40.4 m 31.3 m INFO 09:37:07,114 ProgressMeter - 4:48178306 2.43e+07 9.5 m 23.0 s 23.8% 40.0 m 30.5 m INFO 09:37:37,270 ProgressMeter - 4:106747046 2.56e+07 10.0 m 23.0 s 25.7% 39.0 m 29.0 m INFO 09:38:07,483 ProgressMeter - 4:181303657 2.69e+07 10.5 m 23.0 s 28.1% 37.5 m 26.9 m INFO 09:38:37,493 ProgressMeter - 5:41149454 2.81e+07 11.0 m 23.0 s 29.7% 37.1 m 26.1 m INFO 09:38:51,094 GATKRunReport - Uploaded run statistics report to AWS S3 ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11): ##### ERROR ##### ERROR This means that one or more arguments or inputs in your command are incorrect. ##### ERROR The error message below tells you what is the problem. ##### ERROR ##### ERROR If the problem is an invalid argument, please check the online documentation guide ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool. ##### ERROR ##### ERROR Visit our website and forum for extensive documentation and answers to ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk ##### ERROR ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself. ##### ERROR ##### ERROR MESSAGE: SAM/BAM file /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam is malformed: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam ##### ERROR ------------------------------------------------------------------------------------------
Following your usual advice, I validated the bam file produced by BQSR with Picard and I get the exact same error, but no specific error indication
[[email protected] reduced]$ java -jar ~/applications/picard-tools-1.102/ValidateSamFile.jar \ > INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam \ > IGNORE_WARNINGS=TRUE [Fri Nov 08 09:59:42 GMT 2013] net.sf.picard.sam.ValidateSamFile INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam IGNORE_WARNINGS=true MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false [Fri Nov 08 09:59:42 GMT 2013] Executing as [email protected] on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591) INFO 2013-11-08 10:01:01 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:18s. Time for last 10,000,000: 78s. Last read position: 1:204,966,172 INFO 2013-11-08 10:02:19 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:36s. Time for last 10,000,000: 78s. Last read position: 2:232,121,396 INFO 2013-11-08 10:03:36 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:54s. Time for last 10,000,000: 77s. Last read position: 4:123,140,629 [Fri Nov 08 10:04:00 GMT 2013] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 4.30 minutes. Runtime.totalMemory()=300941312 To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp Exception in thread "main" net.sf.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:397) at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371) at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357) at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200) at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558) at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532) at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522) at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481) at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:687) at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:665) at net.sf.picard.sam.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:241) at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:177) at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:104) at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:164) at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177) at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:100) Caused by: java.io.IOException: Unexpected compressed block length: 1 at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:358) at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113) at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238) at java.io.DataInputStream.read(DataInputStream.java:149) at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:395)
any suggestions on what I might do wrong?
Best Answer
-
Geraldine_VdAuwera Cambridge, MA admin
Hi @flescai,
I think one of your files was corrupted or truncated; either during a transfer operation or a run terminated by accident. Some tools will notice the issue and some won't, due to how they access data, which is probably why the errors were inconsistent. Re-running the whole pipeline is the right thing to do in such cases.
Answers
Hi Francesco,
Sometimes one walker trips on a bam file issue that others didn't see because they accessed different data or processed it differently. So it's not entirely unusual for a bam to only cause issues several steps into the pipeline. Have you tried running your original bam through Picard validation?
Hi @Geraldine_VdAuwera,
I've run it now, following your suggestion. there's a couple of errors but picard doesn't crash as with the recalibrated one.
Not sure if this could point us somewhere, what do you think?
thank you very much!
Hmm, it does sound like GATK introduced the error somewhere. Is the bam you tested just now the pre-recalibration one, or the original before any GATK processing? And can you post all command lines that were performed on the bams?
Actually, before you do anything else, try rebuilding the index file for the problem bam. Sometimes it's just the index that got corrupted somehow.
ok, just a second and I'll do it now :-)
the one I tested right now is the original one, before any GATK processing.
tried to rebuild the index of the GATK recalibrated bam file.
both with Picard and with Samtools.
Picard gives the very same error also when building the index
Interestingly (if we can say so) also Samtools gives a slightly different error message
and the two indexes have indeed a different size
I tried to run Picard Validation with both indexes, and they both give the same error as before.
I am also running again the entire pipeline on these two files, but I don't have other ideas..
thanks
if we want to make things a bit more complicated, I've re-run the entire pipeline on those two samples, and the new recalibrated bam files generated at the end were this time ok.
ReduceRead didn't give any other error, and produced a validated reduced bam file.
I would be really curious to know where such a variability might come from.
but thanks for the suggestions.
Hi @flescai,
I think one of your files was corrupted or truncated; either during a transfer operation or a run terminated by accident. Some tools will notice the issue and some won't, due to how they access data, which is probably why the errors were inconsistent. Re-running the whole pipeline is the right thing to do in such cases.