BAM file is malformed when using ReduceReads

rb2905 (New York) Member

I ran base recalibration and generated a BAM file with PrintReads without any reported error; the .bai index was generated automatically.
However, when I try to reduce the BAM file with ReduceReads, I get the error below:


/bam_base_recalib/dedup_51769_R1_R2_RG_realigned_recal_reads.bam is malformed: Premature EOF; BinaryCodec in readmode; file: /bam_base_recalib/dedup_51769_R1_R2_RG_realigned_recal_reads.bam

Best Answer


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    The Premature EOF part of the message suggests that the file is incomplete; maybe the previous run terminated without completing? Or you copied the file over from somewhere but the transfer was interrupted? The only thing you can do really is repeat the previous step.
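
One way to test for this kind of truncation before rerunning anything is to look for the end-of-file marker that the SAM/BAM specification defines: a fixed 28-byte empty BGZF block at the very end of the file. The following is a minimal sketch in Python, not part of GATK; the function name `has_bgzf_eof` is made up for illustration. A missing marker is a strong hint that the file was cut short, while its presence does not prove that the rest of the file is intact.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification
# defines as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200"
    "1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration; it only inspects the last
    28 bytes and does not validate the rest of the file.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to 28 bytes before the end and compare the tail.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("/bam_base_recalib/dedup_51769_R1_R2_RG_realigned_recal_reads.bam")` on the file from the error message would be expected to return `False` if the PrintReads run that wrote it never finished.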

  • rb2905 (New York) Member

    Hi Geraldine,

    I think the file from PrintReads was incomplete. I tried to generate the BAM file again and looked at the log file:

    INFO 14:06:22,706 ProgressMeter - 16:21968913 8.04e+07 2.2 h 100.0 s 78.4% 2.9 h 37.0 m
    INFO 14:06:52,709 ProgressMeter - 16:29056306 8.07e+07 2.2 h 100.0 s 78.6% 2.9 h 36.7 m
    INFO 14:07:22,711 ProgressMeter - 16:32971049 8.10e+07 2.3 h 100.0 s 78.8% 2.9 h 36.5 m
    INFO 14:07:52,713 ProgressMeter - 16:55539864 8.13e+07 2.3 h 100.0 s 79.5% 2.8 h 35.1 m

    The bam file stops at 79.5%.

    Below is my command for PrintReads:

    /jre1.7.0_40/bin/java -Xmx2G -jar /GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -T PrintReads -R /dbdata/human_g1k_v37.fasta -I /bam_data/dedup_51769_R1_R2_RG.realigned.bam -BQSR /bam_base_recalib/recal_data.table -

    I don't know where the error is: a BAM file of around 11 GB is generated, but the run does not complete.
    Do I have to add -Djava.io.tmpdir='pwd'/tmp before -jar in the command?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    It might have been a one-off system glitch, but it is a good idea to check the capacity of your temp directory before starting the run again.
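
The capacity check suggested above can be done with a few lines of Python before launching the job. This is only a sketch: the helper names `free_gib` and `has_room` are hypothetical, and how much space is "enough" is an assumption you have to make yourself (the thread does not say how much temporary space PrintReads needs).

```python
import shutil


def free_gib(directory):
    """Return the free space, in GiB, on the filesystem holding `directory`."""
    return shutil.disk_usage(directory).free / 2**30


def has_room(directory, required_gib):
    """Return True if `directory` has at least `required_gib` GiB free.

    `required_gib` is a threshold chosen by the caller; it is not a
    value documented by GATK.
    """
    return free_gib(directory) >= required_gib
```

For example, `has_room("tmp", 15)` would check the directory passed to `-Djava.io.tmpdir` against an arbitrary 15 GiB threshold; the number is purely illustrative.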

  • rb2905 (New York) Member
    edited November 2013

    I have run it several times, but PrintReads always stops at around 81%:

    INFO 17:44:41,628 ProgressMeter - 17:7416802 8.33e+07 2.3 h 97.0 s 80.8% 2.8 h 32.0 m
    INFO 17:45:11,631 ProgressMeter - 17:12225027 8.36e+07 2.3 h 97.0 s 81.0% 2.8 h 31.8 m
    INFO 17:45:41,633 ProgressMeter - 17:18528651 8.40e+07 2.3 h 97.0 s 81.2% 2.8 h 31.5 m

    Is this because I did not provide multithreading options to GATK, e.g. the -nt and -nct options?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    No, multithreading status shouldn't affect this. Are you running this on a server? You could check if your jobs are getting terminated due to limitations on the time/space requirements of your user account.

  • rb2905 (New York) Member

    Yes, I am running on a server. I checked with the system admin as well, and my jobs are not getting terminated due to time/space requirements.

    I ran the job again and it got terminated at 61% this time.

  • rb2905 (New York) Member

    I was able to generate the BAM file with PrintReads by giving it more memory, but I am still not able to reduce the BAM size using ReduceReads.
    Below is my command:

    /shares/jre1.7.0_40/bin/java -Djava.io.tmpdir=tmp -Xmx2G -jar /shares/GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -R /shares/dbdata/human_g1k_v37.fasta -T ReduceReads -I /shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam -o /shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.reduced.bam

    I gave the job 40 GB of memory to run this; below is the end of the log file generated:

    INFO  15:02:20,604 ProgressMeter -     19:19087134        8.14e+07    2.2 h       99.0 s     86.4%         2.6 h    21.3 m 
    INFO  15:02:51,168 ProgressMeter -     19:36964303        8.20e+07    2.3 h       98.0 s     86.9%         2.6 h    20.3 m 
    INFO  15:03:24,380 ProgressMeter -     19:44079459        8.26e+07    2.3 h       98.0 s     87.2%         2.6 h    20.0 m 
    INFO  15:03:56,583 ProgressMeter -     19:51266652        8.32e+07    2.3 h       98.0 s     87.4%         2.6 h    19.6 m 

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    @rb2905, this looks like it's an issue with your platform. Maybe you need to give more memory; maybe it's something else. But it's not something we can help you with; I recommend you work with your sysadmin to figure it out. Good luck!

  • rb2905 (New York) Member

    Thanks Geraldine!
    I was able to generate the reduced BAM file this time. The command I wrote above works well, so it must have been a server issue.

  • mweinstein (UCLA) Member

    One other mistake that can sometimes produce errors similar to this one: double-check your command line to make sure that your input and output files aren't the same. If they are, you overwrite your BAM as you read it and will get a premature EOF error (you will also likely notice this one quickly, as it corrupts your original BAM).
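
A guard against the same-input-and-output mistake can be sketched in Python and run on the paths before they are handed to the `-I` and `-o` arguments. The helper name `same_file` is hypothetical and this is not a GATK feature; it simply compares what the two paths resolve to on disk.

```python
import os


def same_file(input_path, output_path):
    """Return True if both paths refer to the same file on disk.

    Hypothetical pre-flight check: when both paths exist, compare the
    underlying files (this also catches hard links and symlinks);
    otherwise fall back to comparing the resolved absolute paths,
    since the output file usually does not exist yet.
    """
    if os.path.exists(input_path) and os.path.exists(output_path):
        return os.path.samefile(input_path, output_path)
    return os.path.realpath(input_path) == os.path.realpath(output_path)
```

If `same_file(input_bam, output_bam)` returns `True`, the safest course is to abort before the tool is launched, since the original BAM would otherwise be overwritten while it is being read.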
