
Question concerning BaseRecalibrator

To whom it may concern,
After running the realignment and creating the corresponding xxx_realigned.bam, I tried to recalibrate base qualities with the BaseRecalibrator walker. The walker stopped with this error: ERROR MESSAGE: SAM/BAM file xxx_realigned.bam is malformed: Read error; BinaryCodec in readmode; file: /....
As I can't figure out the problem, I hope to get some help.
Best,
Arne

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Arne,

    You need to validate your input BAM file using Picard ValidateSamFile. That should tell you what's wrong with it.

  • Hi Geraldine,
    I did so, and it gave no hint of any problem.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    I see. Can you please post your full command line, as well as full console output including the stack trace and error message you get from running BaseRecalibrator?

  • ... creating a log-file ... this will take some minutes ...

  • ... so, here is the log-file ... is it helpful to you?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, I'm not seeing the error message. Can you just copy-paste it from the console output? I need the stack trace to determine where/why the recalibrator crapped out.

  • ... just about ...

  • ... here is the trace ... I included some comments in brackets ... does this help? (The input commands are at the beginning; the error appears at the end of the file.)

  • golharam (Member)
    edited September 2013

    I am getting this same error when running UnifiedGenotyper. All my BAM files were created with the same pipeline, so not sure what's going on here. Here is the full output:

    $JAVA -Xmx8G -jar $GATK_DIR/GenomeAnalysisTK.jar \
        -T UnifiedGenotyper \
        -glm BOTH \
        -nt ${THREADS} \
        -R $REFERENCE \
        -I ${SAMPLE_NAME}.dedup.realigned.fixed.recal.bam \
        --dbsnp $DBSNP \
        -o ${SAMPLE_NAME}.ug.vcf \
        -stand_call_conf 30.0 \
        -stand_emit_conf 10.0 \
        -rf BadCigar
    
    INFO  15:54:07,606 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  15:54:07,618 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-9-g532efad, Compiled 2013/03/19 07:35:36 
    INFO  15:54:07,618 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  15:54:07,618 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  15:54:07,624 HelpFormatter - Program Args: -T UnifiedGenotyper -glm BOTH -nt 4 -R /share/ngs/genomes/b37/human_g1k_v37.fasta -I 1-00230-01.dedup.realigned.fixed.recal.bam --dbsnp /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf -o 1-00230-01.ug.vcf -stand_call_conf 30.0 -stand_emit_conf 10.0 -rf BadCigar 
    INFO  15:54:07,624 HelpFormatter - Date/Time: 2013/09/17 15:54:07 
    INFO  15:54:07,624 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  15:54:07,624 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  15:54:07,695 ArgumentTypeDescriptor - Dynamically determined type of /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf to be VCF 
    INFO  15:54:07,924 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  15:54:08,583 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
    INFO  15:54:08,592 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  15:54:08,634 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
    INFO  15:54:08,668 RMDTrackBuilder - Loading Tribble index from disk for file /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf 
    INFO  15:54:08,926 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 1 CPU thread(s) for each of 4 data thread(s), of 4 processors available on this machine 
    INFO  15:54:09,141 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files 
    INFO  15:54:10,075 GenomeAnalysisEngine - Done creating shard strategy 
    INFO  15:54:10,075 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  15:54:10,076 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
    INFO  15:54:10,419 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  15:54:10,430 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
    INFO  15:54:10,431 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  15:54:10,436 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
    INFO  15:54:10,436 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  15:54:10,442 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
    INFO  15:54:10,466 RMDTrackBuilder - Loading Tribble index from disk for file /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf 
    INFO  15:54:10,673 RMDTrackBuilder - Loading Tribble index from disk for file /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf 
    INFO  15:54:11,373 RMDTrackBuilder - Loading Tribble index from disk for file /share/ngs/apps/GenomeAnalysisTK-2.4-9-g532efad/bundle/dbsnp_137.hg19.vcf 
    INFO  15:54:40,080 ProgressMeter -   chr1:10754905        1.07e+07   30.0 s        2.0 s      0.3%         2.4 h     2.4 h 
    INFO  15:55:10,081 ProgressMeter -   chr1:23958109        2.39e+07   60.0 s        2.0 s      0.8%         2.2 h     2.1 h 
    INFO  15:55:40,082 ProgressMeter -   chr1:38823797        3.87e+07   90.0 s        2.0 s      1.3%       119.8 m   118.3 m 
    INFO  15:56:10,084 ProgressMeter -   chr1:54301177        5.42e+07  120.0 s        2.0 s      1.8%       114.2 m   112.2 m 
    INFO  15:56:40,085 ProgressMeter -   chr1:75138425        7.51e+07    2.5 m        1.0 s      2.4%       103.2 m   100.7 m 
    INFO  15:57:10,086 ProgressMeter -   chr1:95307529        9.52e+07    3.0 m        1.0 s      3.1%        97.6 m    94.6 m 
    INFO  15:57:40,088 ProgressMeter -  chr1:114540545        1.14e+08    3.5 m        1.0 s      3.7%        94.8 m    91.3 m 
    INFO  15:58:10,089 ProgressMeter -  chr1:149620389        1.50e+08    4.0 m        1.0 s      4.8%        82.9 m    78.9 m 
    INFO  15:58:40,090 ProgressMeter -  chr1:160155901        1.60e+08    4.5 m        1.0 s      5.2%        87.2 m    82.7 m 
    INFO  15:59:10,092 ProgressMeter -  chr1:178082097        1.78e+08    5.0 m        1.0 s      5.7%        87.1 m    82.1 m 
    INFO  15:59:40,093 ProgressMeter -  chr1:198744021        1.99e+08    5.5 m        1.0 s      6.4%        85.8 m    80.3 m 
    INFO  16:00:10,094 ProgressMeter -  chr1:215433217        2.15e+08    6.0 m        1.0 s      6.9%        86.4 m    80.4 m 
    INFO  16:00:40,102 ProgressMeter -  chr1:234602997        2.35e+08    6.5 m        1.0 s      7.6%        85.9 m    79.4 m 
    INFO  16:01:10,103 ProgressMeter -    chr2:4058949        2.53e+08    7.0 m        1.0 s      8.2%        85.7 m    78.7 m 
    INFO  16:01:40,104 ProgressMeter -   chr2:26203217        2.75e+08    7.5 m        1.0 s      8.9%        84.5 m    77.0 m 
    INFO  16:02:10,105 ProgressMeter -   chr2:44057877        2.93e+08    8.0 m        1.0 s      9.5%        84.6 m    76.6 m 
    INFO  16:02:40,107 ProgressMeter -   chr2:65589553        3.15e+08    8.5 m        1.0 s     10.2%        83.7 m    75.2 m 
    INFO  16:03:10,108 ProgressMeter -   chr2:86052869        3.35e+08    9.0 m        1.0 s     10.8%        83.3 m    74.3 m 
    INFO  16:03:40,109 ProgressMeter -  chr2:105480293        3.55e+08    9.5 m        1.0 s     11.4%        83.1 m    73.6 m 
    INFO  16:04:10,216 ProgressMeter -  chr2:125555693        3.75e+08   10.0 m        1.0 s     12.1%        82.8 m    72.8 m 
    INFO  16:04:40,217 ProgressMeter -  chr2:145951073        3.95e+08   10.5 m        1.0 s     12.7%        82.4 m    71.9 m 
    INFO  16:05:10,218 ProgressMeter -  chr2:166789521        4.16e+08   11.0 m        1.0 s     13.4%        82.0 m    71.0 m 
    INFO  16:05:40,219 ProgressMeter -  chr2:183936585        4.33e+08   11.5 m        1.0 s     14.0%        82.3 m    70.8 m 
    INFO  16:06:10,221 ProgressMeter -  chr2:204543657        4.54e+08   12.0 m        1.0 s     14.6%        82.0 m    70.0 m 
    INFO  16:06:40,222 ProgressMeter -  chr2:223494545        4.73e+08   12.5 m        1.0 s     15.2%        82.0 m    69.5 m 
    INFO  16:07:10,223 ProgressMeter -  chr2:241733937        4.91e+08   13.0 m        1.0 s     15.8%        82.1 m    69.1 m 
    INFO  16:07:40,224 ProgressMeter -   chr3:16838769        5.09e+08   13.5 m        1.0 s     16.4%        82.2 m    68.7 m 
    INFO  16:08:10,225 ProgressMeter -   chr3:38935985        5.31e+08   14.0 m        1.0 s     17.1%        81.7 m    67.7 m 
    INFO  16:08:40,227 ProgressMeter -   chr3:52584357        5.45e+08   14.5 m        1.0 s     17.6%        82.5 m    68.0 m 
    INFO  16:09:10,228 ProgressMeter -   chr3:73023889        5.65e+08   15.0 m        1.0 s     18.2%        82.3 m    67.3 m 
    INFO  16:09:40,229 ProgressMeter -  chr3:100188077        5.93e+08   15.5 m        1.0 s     19.1%        81.1 m    65.6 m 
    INFO  16:10:10,230 ProgressMeter -  chr3:121229017        6.14e+08   16.0 m        1.0 s     19.8%        80.9 m    64.9 m 
    INFO  16:10:40,231 ProgressMeter -  chr3:137822309        6.30e+08   16.5 m        1.0 s     20.3%        81.2 m    64.7 m 
    INFO  16:11:10,232 ProgressMeter -  chr3:158600721        6.51e+08   17.0 m        1.0 s     21.0%        81.0 m    64.0 m 
    INFO  16:11:40,234 ProgressMeter -  chr3:180705737        6.73e+08   17.5 m        1.0 s     21.7%        80.6 m    63.1 m 
    INFO  16:12:10,235 ProgressMeter -     chr4:239177        6.91e+08   18.0 m        1.0 s     22.3%        80.8 m    62.8 m 
    INFO  16:12:40,238 ProgressMeter -   chr4:18454285        7.09e+08   18.5 m        1.0 s     22.9%        80.9 m    62.4 m 
    INFO  16:13:10,239 ProgressMeter -   chr4:42254337        7.33e+08   19.0 m        1.0 s     23.6%        80.4 m    61.4 m 
    INFO  16:13:40,240 ProgressMeter -   chr4:66387885        7.57e+08   19.5 m        1.0 s     24.4%        79.9 m    60.4 m 
    INFO  16:14:01,857 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 2.4-9-g532efad): 
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM file 1-00230-01.dedup.realigned.fixed.recal.bam is malformed: Read error; BinaryCodec in readmode; file: /mnt/ec2-user/1-00230-01.dedup.realigned.fixed.recal.bam
    ##### ERROR ------------------------------------------------------------------------------------------
    
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    This might have been a system glitch; does it occur reproducibly?

  • ralonso (Member)

    Hello,

    I keep running into the same problem. Did any of you solve this issue? In my case I am using a Lustre file system, which may be relevant.
    Regards

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @ralonso, are you experiencing this issue with the latest version of GATK?

  • ralonso (Member)

    Hello, it was UnifiedGenotyper, version 2.8-1-g932cd3a. Is it solved in newer versions?
    Regards

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    It's possible that it might have been fixed in the most recent version, but no guarantees.

  • ralonso (Member)
    edited May 2014

    Hello, I ran it again with the new GATK, but I still get the same problem. These are the last lines of my output:

    INFO 17:29:14,298 ProgressMeter - scaffold_3:23986977 8.92e+07 3.0 h 119.0 s 29.6% 10.0 h 7.0 h
    INFO 17:29:44,301 ProgressMeter - scaffold_3:24413761 8.96e+07 3.0 h 119.0 s 29.8% 10.0 h 7.0 h
    INFO 17:30:14,302 ProgressMeter - scaffold_3:24826161 9.01e+07 3.0 h 119.0 s 29.9% 10.0 h 7.0 h
    INFO 17:30:44,304 ProgressMeter - scaffold_3:25019069 9.03e+07 3.0 h 119.0 s 30.0% 10.0 h 7.0 h
    INFO 17:31:14,310 ProgressMeter - scaffold_3:25249745 9.05e+07 3.0 h 119.0 s 30.0% 10.0 h 7.0 h
    INFO 17:31:44,314 ProgressMeter - scaffold_3:25461137 9.07e+07 3.0 h 119.0 s 30.1% 10.0 h 7.0 h
    INFO 17:32:05,360 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 17:32:14,342 ProgressMeter - scaffold_3:25610993 9.08e+07 3.0 h 119.0 s 30.2% 10.0 h 7.0 h
    INFO 17:32:15,692 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: SAM/BAM file /fsclinic/projects/naranjoma/pipeline/results_total/ivia_274/map/ivia_274_sorted_mapped_q10_singlehit_markdup_realigned_bfilt.bam is malformed: Read error; BinaryCodec in readmode; file: /fsclinic/projects/naranjoma/pipeline/results_total/ivia_274/map/ivia_274_sorted_mapped_q10_singlehit_markdup_realigned_bfilt.bam
    ERROR ------------------------------------------------------------------------------------------

    Any ideas, please?
    Regards

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @ralonso

    Hi,

    This error is a file system issue that we cannot help you with. Please check with your IT support. Good luck!

    -Sheila

  • ralonso (Member)

    But it only happens with GATK, not with other software such as Picard (Java) or bwa, even when they take longer to run.
    Roberto

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Roberto, some of GATK's operations (which are more complex than those of other programs) simply do not play well with distributed filesystems. Unfortunately that is not something we can resolve at this time. It's beyond the scope of support we can provide. Perhaps your IT department can help you.
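
The replies above narrow the "Read error; BinaryCodec in readmode" failure down to either a damaged BAM or a filesystem read problem. A minimal sketch of two checks that may help tell those apart follows. It assumes a POSIX shell with tail, od, tr, cp and cmp available. The function names, and the idea of staging a copy in another directory (for example on local disk), are illustrative assumptions and do not come from GATK, Picard, or the posters. The first function looks for the 28-byte empty BGZF block that the SAM/BAM specification places at the end of a complete BAM file; the second copies the file and confirms the copy is byte-identical.

```shell
#!/bin/sh
# Hypothetical helper functions for illustration only; not part of GATK or Picard.

# The 28-byte empty BGZF block that terminates a complete BAM file,
# written as hex in 4-byte groups so it is easy to check by eye.
BGZF_EOF="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Succeeds (exit 0) if the last 28 bytes of the file are the BGZF EOF block.
# A missing marker suggests the file was truncated while being written or copied.
bam_has_eof_marker() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Copies a file into dest_dir, verifies the copy byte for byte,
# and prints the path of the copy on success.
stage_and_verify() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src")"
    cp "$src" "$dest" || return 1
    cmp -s "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}

# Example use (paths are placeholders):
#   bam_has_eof_marker sample.bam || echo "sample.bam looks truncated"
#   local_bam=$(stage_and_verify sample.bam /tmp) && echo "verified copy: $local_bam"
```

If the EOF marker is present and a verified copy on a different filesystem runs cleanly while the original location keeps failing, that would be consistent with the filesystem explanation given above, though it does not prove it. These checks do not replace Picard ValidateSamFile, which inspects the records themselves.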
