ReduceReads v2.7-4-g6f46d11: Read error; BinaryCodec in readmode

flescaiflescai Posts: 53Member ✭✭

Hi guys, I've seen this error has been reported other times, for different reasons. The thing is that, the bam file I'm using to reduce the reads has been processed through GATK pipeline without problems, realignment and recalibration included. Therefore, I assumed the bam file generated after BQSR would be GATK-compliant. I was running with Queue, so I just run here the exact command passed to the job in an interactive mode, to see what happens.

Here below is the full command and error message (apologies for lengthy output), where there's no stack trace after the error.

        [fles@login07 reduced]$ 'java'  '-Xmx12288m'  '-Djava.io.tmpdir=/scratch/scratch/fles/project_analysis/reduced/tmp'  '-cp' '/home/fles/applications/Queue-2.7-4-g6f46d11/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'ReduceReads'  '-I' '/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam'  '-R' '/home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta'  '-o' '/scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam'
        INFO  09:27:21,728 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:21,730 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:29:52 
        INFO  09:27:21,731 HelpFormatter - Copyright (c) 2010 The Broad Institute 
        INFO  09:27:21,731 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
        INFO  09:27:21,735 HelpFormatter - Program Args: -T ReduceReads -I /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam -R /home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta -o /scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam 
        INFO  09:27:21,735 HelpFormatter - Date/Time: 2013/11/08 09:27:21 
        INFO  09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:34,156 GenomeAnalysisEngine - Strictness is SILENT 
        INFO  09:27:34,491 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 40 
        INFO  09:27:34,503 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
        INFO  09:27:34,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12 
        INFO  09:27:35,039 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
        INFO  09:27:35,045 GenomeAnalysisEngine - Done preparing for traversal 
        INFO  09:27:35,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
        INFO  09:27:35,046 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining 
        INFO  09:27:35,080 ReadShardBalancer$1 - Loading BAM index data 
        INFO  09:27:35,081 ReadShardBalancer$1 - Done loading BAM index data 
        INFO  09:28:05,059 ProgressMeter -      1:18958138        1.00e+06   30.0 s       30.0 s      0.6%        81.8 m    81.3 m 
        INFO  09:28:35,069 ProgressMeter -      1:46733396        2.30e+06   60.0 s       26.0 s      1.5%        66.4 m    65.4 m 
        INFO  09:29:05,079 ProgressMeter -      1:92187730        3.50e+06   90.0 s       25.0 s      3.0%        50.5 m    49.0 m 
        INFO  09:29:35,088 ProgressMeter -     1:145281942        4.90e+06  120.0 s       24.0 s      4.7%        42.7 m    40.7 m 
        INFO  09:30:05,098 ProgressMeter -     1:152323864        6.40e+06    2.5 m       23.0 s      4.9%        50.9 m    48.4 m 
        INFO  09:30:35,893 ProgressMeter -     1:181206886        7.70e+06    3.0 m       23.0 s      5.8%        51.4 m    48.4 m 
        INFO  09:31:05,902 ProgressMeter -     1:217604563        8.90e+06    3.5 m       23.0 s      7.0%        49.9 m    46.4 m 
        INFO  09:31:35,913 ProgressMeter -      2:14782401        1.02e+07    4.0 m       23.0 s      8.5%        47.0 m    43.0 m 
        INFO  09:32:05,922 ProgressMeter -      2:62429207        1.15e+07    4.5 m       23.0 s     10.0%        44.8 m    40.3 m 
        INFO  09:32:35,931 ProgressMeter -      2:97877374        1.28e+07    5.0 m       23.0 s     11.2%        44.7 m    39.7 m 
        INFO  09:33:06,218 ProgressMeter -     2:135574018        1.42e+07    5.5 m       23.0 s     12.4%        44.5 m    38.9 m 
        INFO  09:33:36,227 ProgressMeter -     2:179431307        1.56e+07    6.0 m       23.0 s     13.8%        43.5 m    37.5 m 
        INFO  09:34:06,237 ProgressMeter -     2:216279690        1.69e+07    6.5 m       23.0 s     15.0%        43.4 m    36.9 m 
        INFO  09:34:36,248 ProgressMeter -      3:14974731        1.81e+07    7.0 m       23.0 s     16.4%        42.9 m    35.9 m 
        INFO  09:35:07,073 ProgressMeter -      3:52443620        1.94e+07    7.5 m       23.0 s     17.6%        42.9 m    35.4 m 
        INFO  09:35:37,084 ProgressMeter -     3:111366536        2.05e+07    8.0 m       23.0 s     19.5%        41.3 m    33.2 m 
        INFO  09:36:07,094 ProgressMeter -     3:155571144        2.18e+07    8.5 m       23.0 s     20.9%        40.8 m    32.3 m 
        INFO  09:36:37,103 ProgressMeter -       4:3495327        2.31e+07    9.0 m       23.0 s     22.4%        40.4 m    31.3 m 
        INFO  09:37:07,114 ProgressMeter -      4:48178306        2.43e+07    9.5 m       23.0 s     23.8%        40.0 m    30.5 m 
        INFO  09:37:37,270 ProgressMeter -     4:106747046        2.56e+07   10.0 m       23.0 s     25.7%        39.0 m    29.0 m 
        INFO  09:38:07,483 ProgressMeter -     4:181303657        2.69e+07   10.5 m       23.0 s     28.1%        37.5 m    26.9 m 
        INFO  09:38:37,493 ProgressMeter -      5:41149454        2.81e+07   11.0 m       23.0 s     29.7%        37.1 m    26.1 m 
        INFO  09:38:51,094 GATKRunReport - Uploaded run statistics report to AWS S3 
        ##### ERROR ------------------------------------------------------------------------------------------
        ##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11): 
        ##### ERROR
        ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
        ##### ERROR The error message below tells you what is the problem.
        ##### ERROR
        ##### ERROR If the problem is an invalid argument, please check the online documentation guide
        ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
        ##### ERROR
        ##### ERROR Visit our website and forum for extensive documentation and answers to 
        ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
        ##### ERROR
        ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
        ##### ERROR
        ##### ERROR MESSAGE: SAM/BAM file /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam is malformed: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
        ##### ERROR ------------------------------------------------------------------------------------------

Following your usual advice, I validated the bam file produced by BQSR with Picard and I get the exact same error, but no specific error indication

        [fles@login07 reduced]$ java -jar ~/applications/picard-tools-1.102/ValidateSamFile.jar \
        > INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam \
        > IGNORE_WARNINGS=TRUE
        [Fri Nov 08 09:59:42 GMT 2013] net.sf.picard.sam.ValidateSamFile INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam IGNORE_WARNINGS=true    MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Nov 08 09:59:42 GMT 2013] Executing as fles@login07 on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591)
        INFO    2013-11-08 10:01:01 SamFileValidator    Validated Read    10,000,000 records.  Elapsed time: 00:01:18s.  Time for last 10,000,000:   78s.  Last read position: 1:204,966,172
        INFO    2013-11-08 10:02:19 SamFileValidator    Validated Read    20,000,000 records.  Elapsed time: 00:02:36s.  Time for last 10,000,000:   78s.  Last read position: 2:232,121,396
        INFO    2013-11-08 10:03:36 SamFileValidator    Validated Read    30,000,000 records.  Elapsed time: 00:03:54s.  Time for last 10,000,000:   77s.  Last read position: 4:123,140,629
        [Fri Nov 08 10:04:00 GMT 2013] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 4.30 minutes.
        Runtime.totalMemory()=300941312
        To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
        Exception in thread "main" net.sf.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:397)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
            at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:687)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:665)
            at net.sf.picard.sam.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:241)
            at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:177)
            at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:104)
            at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:164)
            at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
            at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:100)
        Caused by: java.io.IOException: Unexpected compressed block length: 1
            at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:358)
            at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113)
            at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:395)

any suggestions on what I might do wrong?

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,804Administrator, GATK Developer admin

    Hi Francesco,

    Sometimes one walker trips on a bam file issue that others didn't see because they accessed different data or processed it differently. So it's not entirely unusual for a bam to only cause issues several steps into the pipeline. Have you tried running your original bam through Picard validation?

    Geraldine Van der Auwera, PhD

  • flescaiflescai Posts: 53Member ✭✭

    Hi @Geraldine_VdAuwera, I've run it now, following your suggestion. there's a couple of errors but picard doesn't crash as with the recalibrated one.

    Not sure if this could point us somewhere, what do you think?

        [Fri Nov 08 16:41:45 GMT 2013] net.sf.picard.sam.ValidateSamFile INPUT=1115m_U1_PJ5208467.bam IGNORE_WARNINGS=true    MODE=VERBOSE MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Nov 08 16:41:45 GMT 2013] Executing as fles@login07 on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591)
        INFO    2013-11-08 16:42:47     SamFileValidator        Validated Read    10,000,000 records.  Elapsed time: 00:01:01s.  Time for last 10,000,000:   61s.  Last read position: 1:204,966,172
        INFO    2013-11-08 16:43:48     SamFileValidator        Validated Read    20,000,000 records.  Elapsed time: 00:02:02s.  Time for last 10,000,000:   60s.  Last read position: 2:232,121,396
        INFO    2013-11-08 16:44:49     SamFileValidator        Validated Read    30,000,000 records.  Elapsed time: 00:03:03s.  Time for last 10,000,000:   60s.  Last read position: 4:123,140,629
        INFO    2013-11-08 16:45:50     SamFileValidator        Validated Read    40,000,000 records.  Elapsed time: 00:04:04s.  Time for last 10,000,000:   60s.  Last read position: 6:122,768,099
        INFO    2013-11-08 16:46:51     SamFileValidator        Validated Read    50,000,000 records.  Elapsed time: 00:05:05s.  Time for last 10,000,000:   61s.  Last read position: 8:121,383,406
        INFO    2013-11-08 16:47:52     SamFileValidator        Validated Read    60,000,000 records.  Elapsed time: 00:06:06s.  Time for last 10,000,000:   61s.  Last read position: 11:7,063,975
        INFO    2013-11-08 16:48:53     SamFileValidator        Validated Read    70,000,000 records.  Elapsed time: 00:07:07s.  Time for last 10,000,000:   60s.  Last read position: 13:25,268,537
        INFO    2013-11-08 16:49:54     SamFileValidator        Validated Read    80,000,000 records.  Elapsed time: 00:08:08s.  Time for last 10,000,000:   61s.  Last read position: 16:18,569,005
        INFO    2013-11-08 16:50:55     SamFileValidator        Validated Read    90,000,000 records.  Elapsed time: 00:09:09s.  Time for last 10,000,000:   60s.  Last read position: 19:1,032,360
        INFO    2013-11-08 16:51:56     SamFileValidator        Validated Read   100,000,000 records.  Elapsed time: 00:10:10s.  Time for last 10,000,000:   60s.  Last read position: X:31,462,729
        ERROR: Record 104240990, Read name HS22_07385:1:2101:16694:85750#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241007, Read name HS20_07145:1:2103:20142:196788#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241008, Read name HS22_07385:1:2108:11838:37540#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241011, Read name HS20_07145:1:1308:16122:193357#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241014, Read name HS20_07145:1:2107:12506:36827#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241016, Read name HS22_07385:1:2107:15518:109117#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241020, Read name HS20_07145:1:1203:1767:34485#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104241112, Read name HS20_07145:1:1103:3324:136673#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104266630, Read name HS22_07385:1:2208:15104:132685#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104668425, Read name HS22_07385:1:1301:5701:51053#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104668712, Read name HS22_07385:1:1204:16977:43814#1, MAPQ should be 0 for unmapped read.
        ERROR: Record 104673866, Read name HS20_07145:1:2206:18198:37658#1, MAPQ should be 0 for unmapped read.
        [Fri Nov 08 16:52:30 GMT 2013] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 10.75 minutes.
        Runtime.totalMemory()=325582848
        To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
    

    thank you very much!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,804Administrator, GATK Developer admin

    Hmm, it does sound like GATK introduced the error somewhere. Is the bam you tested just now the pre-recalibration one, or the original before any GATK processing? And can you post all command lines that were performed on the bams?

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,804Administrator, GATK Developer admin

    Actually, before you do anything else, try rebuilding the index file for the problem bam. Sometimes it's just the index that got corrupted somehow.

    Geraldine Van der Auwera, PhD

  • flescaiflescai Posts: 53Member ✭✭

    ok, just a second and I'll do it now :-) the one I tested right now is the original one, before any GATK processing.

  • flescaiflescai Posts: 53Member ✭✭

    tried to rebuild the index of the GATK recalibrated bam file. both with Picard and with Samtools.

    Picard gives the very same error also when building the index

        [Fri Nov 08 17:12:27 GMT 2013] net.sf.picard.sam.BuildBamIndex INPUT=PJTrios.UK10K_PJ5208467.clean.dedup.recal.bam VALIDATION_STRINGENCY=SILENT    VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Nov 08 17:12:27 GMT 2013] Executing as fles@login09 on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591)
        INFO    2013-11-08 17:12:31 BuildBamIndex   1000000 reads processed ...
        ----[skipped lines here]----
        INFO    2013-11-08 17:13:54 BuildBamIndex   28000000 reads processed ...
        INFO    2013-11-08 17:13:57 BuildBamIndex   29000000 reads processed ...
        INFO    2013-11-08 17:14:00 BuildBamIndex   30000000 reads processed ...
        INFO    2013-11-08 17:14:03 BuildBamIndex   31000000 reads processed ...
        INFO    2013-11-08 17:14:07 BuildBamIndex   32000000 reads processed ...
        [Fri Nov 08 17:14:10 GMT 2013] net.sf.picard.sam.BuildBamIndex done. Elapsed time: 1.71 minutes.
        Runtime.totalMemory()=300941312
        To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
        Exception in thread "main" net.sf.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /scratch/scratch/fles/PJ_analysis/recalibrated/PJTrios.UK10K_PJ5208467.clean.dedup.recal.bam
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:397)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
            at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:687)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:665)
            at net.sf.picard.sam.BuildBamIndex.createIndex(BuildBamIndex.java:141)
            at net.sf.picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:120)
            at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
            at net.sf.picard.sam.BuildBamIndex.main(BuildBamIndex.java:62)
        Caused by: java.io.IOException: Unexpected compressed block length: 1
            at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:358)
            at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113)
            at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:395)
            ... 13 more
    

    Interestingly (if we can say so) also Samtools gives a slightly different error message

    [fles@login09 recalibrated]$ samtools index PJTrios.UK10K_PJ5208467.clean.dedup.recal.bam
    [bam_index_core] truncated file? Continue anyway. (-4)
    

    and the two indexes have indeed a different size

    -rw-r--r-- 1 fles users     1989272 Nov  8 17:17 PJTrios.UK10K_PJ5208467.clean.dedup.recal.bam.bai
    -rw-r--r-- 1 fles users     1844048 Nov  8 17:14 PJTrios.UK10K_PJ5208467.clean.dedup.recal.bai
    

    I tried to run Picard Validation with both indexes, and they both give the same error as before. I am also running again the entire pipeline on these two files, but I don't have other ideas..

    thanks

  • flescaiflescai Posts: 53Member ✭✭

    if we want to make things a bit more complicated, I've re-run the entire pipeline on those two samples, and the new recalibrated bam files generated at the end were this time ok. ReduceRead didn't give any other error, and produced a validated reduced bam file.

    I would be really curious to know where such a variability might come from. but thanks for the suggestions.

Sign In or Register to comment.