BAM file is malformed when using ReduceReads

rb2905 | New York | Posts: 13 | Member

I ran BaseRecalibrator and then used PrintReads to generate a BAM file, apparently successfully, along with its .bai index, which was generated automatically. However, when I try to reduce the BAM file with ReduceReads, I get the error below:

ERROR MESSAGE: SAM/BAM file /bam_base_recalib/dedup_51769_R1_R2_RG_realigned_recal_reads.bam is malformed: Premature EOF; BinaryCodec in readmode; file: /bam_base_recalib/dedup_51769_R1_R2_RG_realigned_recal_reads.bam


  • Geraldine_VdAuwera | Posts: 6,073 | Administrator, GATK Developer

    The "Premature EOF" part of the message suggests that the file is incomplete. Maybe the previous run terminated without completing, or you copied the file over from somewhere and the transfer was interrupted? Really, the only thing you can do is repeat the previous step.

    Geraldine Van der Auwera, PhD
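The reply above reads "Premature EOF" as a sign that the BAM file is truncated. One way to test that without rerunning anything relies on the SAM/BAM specification: a BGZF-compressed BAM file is expected to end with a fixed 28-byte empty block that serves as an end-of-file marker, and a file cut off mid-write will normally lack it. The Python sketch below checks only for that marker; the function names are illustrative, not part of GATK or any BAM library, and a file that passes can still be damaged elsewhere.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification places at the
# end of BGZF-compressed files (including BAM) as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def ends_with_bgzf_eof(data: bytes) -> bool:
    """Return True if the given bytes end with the BGZF EOF marker block."""
    return data.endswith(BGZF_EOF)


def bam_has_eof_marker(path: str) -> bool:
    """Return True if the file at `path` ends with the BGZF EOF marker.

    A missing marker strongly suggests truncation (e.g. an interrupted run or
    copy). A present marker does NOT prove the rest of the file is intact.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return ends_with_bgzf_eof(fh.read())


if __name__ == "__main__":
    import sys

    for bam in sys.argv[1:]:
        ok = bam_has_eof_marker(bam)
        print(f"{bam}: {'EOF marker present' if ok else 'EOF marker MISSING (likely truncated)'}")
```

Saved under a hypothetical name such as check_bam_eof.py, it can be run as `python check_bam_eof.py file1.bam file2.bam`. If samtools is installed, `samtools quickcheck` performs a comparable test.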

  • rb2905 | New York | Posts: 13 | Member

    Hi Geraldine,

    I think the file from PrintReads was incomplete: I generated the BAM file again and looked at its log, which ends like this:

    INFO 14:06:22,706 ProgressMeter - 16:21968913 8.04e+07 2.2 h 100.0 s 78.4% 2.9 h 37.0 m
    INFO 14:06:52,709 ProgressMeter - 16:29056306 8.07e+07 2.2 h 100.0 s 78.6% 2.9 h 36.7 m
    INFO 14:07:22,711 ProgressMeter - 16:32971049 8.10e+07 2.3 h 100.0 s 78.8% 2.9 h 36.5 m
    INFO 14:07:52,713 ProgressMeter - 16:55539864 8.13e+07 2.3 h 100.0 s 79.5% 2.8 h 35.1 m

    Generating the BAM file stops at 79.5%.

    Below is my PrintReads command:

    /jre1.7.0_40/bin/java -Xmx2G -jar /GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -T PrintReads -R /dbdata/human_g1k_v37.fasta -I /bam_data/dedup_51769_R1_R2_RG.realigned.bam -BQSR /bam_base_recalib/recal_data.table -o /bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam

    I don't know where the error is: a BAM file of around 11 GB is generated, but it never completes. Do I have to add -Djava.io.tmpdir=`pwd`/tmp before -jar in the command?
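The post above reads the stopping point off the ProgressMeter lines by eye. When a job is repeated several times, it can help to extract the last reported position and percentage from each log and compare them: a run that always dies at the same locus may point toward the data, while a stopping point that wanders may point toward the environment. The sketch below assumes the line layout shown in the excerpts in this thread (contig:position first, then the completion percentage as the first value ending in %); it is a hypothetical helper, not an official GATK log parser.

```python
import re
from typing import Optional, Tuple

# Assumed layout, based on the log excerpts in this thread, e.g.
# "INFO 14:07:52,713 ProgressMeter - 16:55539864 8.13e+07 2.3 h 100.0 s 79.5% 2.8 h 35.1 m"
PROGRESS_RE = re.compile(
    r"ProgressMeter\s+-\s+(?P<contig>[^\s:]+):(?P<pos>\d+)\s+.*?(?P<pct>\d+(?:\.\d+)?)%"
)


def last_progress(log_text: str) -> Optional[Tuple[str, int, float]]:
    """Return (contig, position, percent complete) from the last ProgressMeter entry."""
    matches = list(PROGRESS_RE.finditer(log_text))
    if not matches:
        return None  # no progress lines found (e.g. the run died during startup)
    m = matches[-1]
    return m.group("contig"), int(m.group("pos")), float(m.group("pct"))


if __name__ == "__main__":
    import sys

    # Print where each log (one path per argument) last reported progress.
    for log_path in sys.argv[1:]:
        with open(log_path) as fh:
            print(log_path, last_progress(fh.read()))
```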

  • Geraldine_VdAuwera | Posts: 6,073 | Administrator, GATK Developer

    It might have been a one-off system glitch, but it is a good idea to check the capacity of your temp directory before starting the run again.

    Geraldine Van der Auwera, PhD
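The advice above is to check the temp directory's capacity. Which directory that is depends on the JVM: unless -Djava.io.tmpdir is given (as in the later ReduceReads command in this thread), Java falls back to a platform default, typically /tmp on Linux. The Python sketch below reports free space on the filesystem holding a given directory using the standard library's shutil.disk_usage; the function names are hypothetical, and the required size you compare against is your own estimate, since the thread does not say how much temp space these tools need.

```python
import shutil
import tempfile
from typing import Optional

GIB = 1024 ** 3


def free_bytes(path: Optional[str] = None) -> int:
    """Free space, in bytes, on the filesystem holding `path` (default: the system temp dir)."""
    return shutil.disk_usage(path or tempfile.gettempdir()).free


def has_room_for(required_bytes: int, path: Optional[str] = None) -> bool:
    """True if at least `required_bytes` are free on the filesystem holding `path`."""
    return free_bytes(path) >= required_bytes


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_tmp.py /path/to/tmp 11   (directory, required GiB)
    target = sys.argv[1] if len(sys.argv) > 1 else None
    need_gib = float(sys.argv[2]) if len(sys.argv) > 2 else 0.0
    print(f"free: {free_bytes(target) / GIB:.1f} GiB")
    print("enough space" if has_room_for(int(need_gib * GIB), target) else "NOT enough space")
```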

  • rb2905 | New York | Posts: 13 | Member
    edited November 2013

    I have run it several times, but PrintReads always stops at around 81%:

    INFO 17:44:41,628 ProgressMeter - 17:7416802 8.33e+07 2.3 h 97.0 s 80.8% 2.8 h 32.0 m
    INFO 17:45:11,631 ProgressMeter - 17:12225027 8.36e+07 2.3 h 97.0 s 81.0% 2.8 h 31.8 m
    INFO 17:45:41,633 ProgressMeter - 17:18528651 8.40e+07 2.3 h 97.0 s 81.2% 2.8 h 31.5 m

    Is this because of the multithreading options (e.g. -nt and -nct) that I did not provide to GATK?

  • Geraldine_VdAuwera | Posts: 6,073 | Administrator, GATK Developer

    No, multithreading settings shouldn't affect this. Are you running this on a server? If so, check whether your jobs are being terminated because of time or space limits on your user account.

    Geraldine Van der Auwera, PhD
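To follow up on the suggestion above about account limits, the per-process resource limits of the environment a job runs in can be inspected directly; the shell equivalent is `ulimit -a`. The Python sketch below (Unix only, standard-library resource module) reports the soft and hard limits for CPU time, maximum file size and address space. Limits are inherited by child processes, so the output is only meaningful when run inside the same job or shell that launches Java, and limits enforced by a batch scheduler, such as wall-clock time, will not show up here. Which limits matter on your server is an assumption to verify with your sysadmin.

```python
import resource

# A few per-process limits that could stop a long job or cap the size of the
# file it writes. Which of them apply on a given server is an assumption.
LIMITS = {
    "CPU time (s)": resource.RLIMIT_CPU,
    "max file size (bytes)": resource.RLIMIT_FSIZE,
    "address space (bytes)": resource.RLIMIT_AS,
}


def describe(value: int) -> str:
    """Render a limit value, spelling out RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Map each limit name to a (soft, hard) pair of readable strings."""
    return {
        name: tuple(describe(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:24} soft={soft:>20} hard={hard}")
```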

  • rb2905 | New York | Posts: 13 | Member

    Yes, I am running on a server. I also checked with the system admin, and my jobs are not being terminated due to time/space limits.

    I ran the job again and it got terminated at 61% this time.

  • rb2905 | New York | Posts: 13 | Member

    By giving PrintReads more memory, I was able to generate the BAM file, but I am still unable to reduce it with ReduceReads. Below is my command:

    /shares/jre1.7.0_40/bin/java -Djava.io.tmpdir=tmp -Xmx2G -jar /shares/GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -R /shares/dbdata/human_g1k_v37.fasta -T ReduceReads -I /shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam -o /shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.reduced.bam

    I allocated 40 GB of memory for this job; below is the end of the generated log file:

    INFO  15:02:20,604 ProgressMeter -     19:19087134        8.14e+07    2.2 h       99.0 s     86.4%         2.6 h    21.3 m 
    INFO  15:02:51,168 ProgressMeter -     19:36964303        8.20e+07    2.3 h       98.0 s     86.9%         2.6 h    20.3 m 
    INFO  15:03:24,380 ProgressMeter -     19:44079459        8.26e+07    2.3 h       98.0 s     87.2%         2.6 h    20.0 m 
    INFO  15:03:56,583 ProgressMeter -     19:51266652        8.32e+07    2.3 h       98.0 s     87.4%         2.6 h    19.6 m 
    
  • Geraldine_VdAuwera | Posts: 6,073 | Administrator, GATK Developer

    @rb2905, this looks like it's an issue with your platform. Maybe you need to give more memory; maybe it's something else. But it's not something we can help you with; I recommend you work with your sysadmin to figure it out. Good luck!

    Geraldine Van der Auwera, PhD
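Both commands posted in this thread pass -Xmx2G, and that JVM flag, not the memory granted to the job, caps the Java heap: allocating 40 GB to the job does not by itself let Java use more than 2 GB of heap. Whether heap size caused the failures here is not established in the thread. The Python sketch below builds the ReduceReads command from the post with the heap size as an explicit parameter so it can be matched to the job's allocation; reduce_reads_cmd is a hypothetical helper, the paths are copied from the post, and the 32 GB heap in the example is an illustrative choice.

```python
from typing import List

# Paths copied from the ReduceReads command in this thread; adjust to your system.
JAVA = "/shares/jre1.7.0_40/bin/java"
GATK_JAR = "/shares/GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar"


def reduce_reads_cmd(ref: str, in_bam: str, out_bam: str,
                     heap_gb: int, tmp_dir: str = "tmp") -> List[str]:
    """Build the ReduceReads command line with an explicit JVM heap size.

    -Xmx caps the Java heap regardless of the job's memory allocation, so heap_gb
    should stay somewhat below that allocation (the JVM also uses memory outside
    the heap; how much headroom is needed is an assumption).
    """
    return [
        JAVA,
        f"-Djava.io.tmpdir={tmp_dir}",
        f"-Xmx{heap_gb}G",
        "-jar", GATK_JAR,
        "-R", ref,
        "-T", "ReduceReads",
        "-I", in_bam,
        "-o", out_bam,
    ]


if __name__ == "__main__":
    import shlex

    # Hypothetical example: a 32 GB heap inside a job granted 40 GB.
    cmd = reduce_reads_cmd(
        "/shares/dbdata/human_g1k_v37.fasta",
        "/shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam",
        "/shares/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.reduced.bam",
        heap_gb=32,
    )
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch it from Python instead: subprocess.run(cmd, check=True)
```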

  • rb2905 | New York | Posts: 13 | Member

    Thanks, Geraldine! I was able to generate the reduced BAM file this time. The command I wrote above works well; it must have been a server issue.
