BAM file is malformed when using ReduceReads

I ran base recalibration (BaseRecalibrator) and generated a BAM file with PrintReads, apparently correctly, along with a .bai index that was generated automatically. However, when I try to reduce the BAM file with ReduceReads, I get the error below:

ERROR MESSAGE: SAM/BAM file

The "Premature EOF" part of the message suggests that the file is incomplete. Maybe the previous run terminated before finishing, or you copied the file over from somewhere and the transfer was interrupted? Really, the only thing you can do is repeat the previous step.

Geraldine Van der Auwera, PhD
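You can check for a truncated BAM before rerunning anything. The SAM/BAM format specification says that a complete BAM file ends with a fixed 28-byte empty BGZF block, the EOF marker, so a file that was cut off mid-write usually lacks it. The minimal sketch below compares only the last 28 bytes against that marker. The helper name `has_bgzf_eof` is hypothetical. This is a quick heuristic, not a full integrity check, and a file that passes it can still be damaged elsewhere.

```python
import os
import sys

# The 28-byte empty BGZF block that the SAM/BAM specification
# places at the end of every complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff "  # gzip header with the FEXTRA flag set
    "06 00 42 43 02 00 1b 00 "        # extra field: 'BC' subfield, BSIZE = 27
    "03 00 "                          # empty deflate payload
    "00 00 00 00 00 00 00 00"         # CRC32 = 0, ISIZE = 0
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Seek from the end, so even a multi-GB BAM is checked instantly.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


if __name__ == "__main__":
    for bam in sys.argv[1:]:
        if has_bgzf_eof(bam):
            print(f"{bam}: EOF marker present")
        else:
            print(f"{bam}: EOF marker MISSING (file is likely truncated)")
```

Run it as `python check_eof.py file.bam`, with the script name being your choice. A "MISSING" result agrees with the "Premature EOF" diagnosis.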

Hi Geraldine,

I think the file from PrintReads was incomplete. I generated the BAM file again and checked its log:

INFO 14:06:22,706 ProgressMeter - 16:21968913 8.04e+07 2.2 h 100.0 s 78.4% 2.9 h 37.0 m
INFO 14:06:52,709 ProgressMeter - 16:29056306 8.07e+07 2.2 h 100.0 s 78.6% 2.9 h 36.7 m
INFO 14:07:22,711 ProgressMeter - 16:32971049 8.10e+07 2.3 h 100.0 s 78.8% 2.9 h 36.5 m
INFO 14:07:52,713 ProgressMeter - 16:55539864 8.13e+07 2.3 h 100.0 s 79.5% 2.8 h 35.1 m

The run stops at 79.5%, so the BAM file is never completed.

Below is my PrintReads command:

/jre1.7.0_40/bin/java -Xmx2G -jar /GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -T PrintReads -R /dbdata/human_g1k_v37.fasta -I /bam_data/dedup_51769_R1_R2_RG.realigned.bam -BQSR /bam_base_recalib/recal_data.table -o /bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam

I don't know where the error is: a BAM file of about 11 GB is generated, but it is never completed. Do I have to add -Djava.io.tmpdir=`pwd`/tmp before -jar in the command?
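On the `-Djava.io.tmpdir` question: JVM options such as `-Xmx` and `-Djava.io.tmpdir` only take effect when they appear before `-jar`. Anything after the jar path is passed to GATK as a program argument instead. The sketch below assembles the PrintReads command from the post with both JVM options in the right place, and uses `-o` for the output file. The function name `build_printreads_cmd` is hypothetical, as are the heap size and temp path in the usage lines.

```python
import shlex


def build_printreads_cmd(java, gatk_jar, reference, in_bam, recal_table,
                         out_bam, heap, tmp_dir=None):
    """Return an argv list for GATK PrintReads with a BQSR table applied.

    JVM options must precede -jar; everything after the jar path is
    passed to GATK itself, not to the JVM.
    """
    cmd = [java, f"-Xmx{heap}"]
    if tmp_dir is not None:
        cmd.append(f"-Djava.io.tmpdir={tmp_dir}")
    cmd += [
        "-jar", gatk_jar,
        "-T", "PrintReads",
        "-R", reference,
        "-I", in_bam,
        "-BQSR", recal_table,
        "-o", out_bam,
    ]
    return cmd


if __name__ == "__main__":
    # Paths are taken from the post; heap size and temp dir are hypothetical.
    cmd = build_printreads_cmd(
        java="/jre1.7.0_40/bin/java",
        gatk_jar="/GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar",
        reference="/dbdata/human_g1k_v37.fasta",
        in_bam="/bam_data/dedup_51769_R1_R2_RG.realigned.bam",
        recal_table="/bam_base_recalib/recal_data.table",
        out_bam="/bam_base_recalib/dedup_51769_R1_R2_RG_realign_recal_reads.bam",
        heap="8g",
        tmp_dir="/scratch/tmp",
    )
    # Print a shell-safe command line that can be pasted into a job script.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the argument list in one place makes it hard to put a JVM option after `-jar` by accident. It also lets you change the heap size between attempts without editing a long command line.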

It might have been a one-off system glitch, but it is a good idea to check the capacity of your temp directory before starting the run again.

Geraldine Van der Auwera, PhD
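Checking the free space in the temp directory is easy to script. The minimal sketch below assumes a Linux server where the JVM's temp directory is /tmp unless `-Djava.io.tmpdir` points elsewhere. The helper names `free_gb` and `has_room` are hypothetical. The output directory deserves the same check, because the partial BAM in this thread had already reached about 11 GB.

```python
import shutil
import sys


def free_gb(path):
    """Free space, in gigabytes (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # Defaults: /tmp (assumed JVM temp dir) and the current directory (output).
    for path in sys.argv[1:] or ["/tmp", "."]:
        print(f"{path}: {free_gb(path):.1f} GB free")
```

Pass the temp and output directories as arguments, for example the path you give to `-Djava.io.tmpdir`.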


I have run it several times, but PrintReads always stops at around 81%:

INFO 17:44:41,628 ProgressMeter - 17:7416802 8.33e+07 2.3 h 97.0 s 80.8% 2.8 h 32.0 m
INFO 17:45:11,631 ProgressMeter - 17:12225027 8.36e+07 2.3 h 97.0 s 81.0% 2.8 h 31.8 m
INFO 17:45:41,633 ProgressMeter - 17:18528651 8.40e+07 2.3 h 97.0 s 81.2% 2.8 h 31.5 m

Is this because of multithreading options that I should have given GATK but did not, e.g. -nt or -nct?

Post edited by rb2905 in November 2013.

No, multithreading settings shouldn't affect this. Are you running this on a server? If so, check whether your jobs are being terminated because of time or space limits on your user account.

Geraldine Van der Auwera, PhD
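To tell a job killed from outside apart from one that failed on its own, look at how the process ended. Below is a minimal sketch for a POSIX system; the wrapper name `run_and_report` is hypothetical. Python's `subprocess` reports a child terminated by a signal as a negative return code. That usually points to something external, such as a scheduler or the kernel's out-of-memory killer. A JVM that fails by itself, for example with an OutOfMemoryError, normally exits with a positive status and usually leaves an error in the log.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` (an argv list) and return (return_code, description)."""
    rc = subprocess.run(cmd).returncode
    if rc == 0:
        return rc, "exited normally (status 0)"
    if rc < 0:
        # POSIX: a negative return code means the child was killed by signal -rc.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return rc, f"killed by {name}; suspect an external limit if the log shows no error"
    return rc, f"exited with status {rc}; check the end of the log for an error"


if __name__ == "__main__":
    if len(sys.argv) > 1:
        code, message = run_and_report(sys.argv[1:])
        print(message, file=sys.stderr)
        sys.exit(0 if code == 0 else 1)
```

Prefix the original java command with this wrapper, e.g. `python run_and_report.py /jre1.7.0_40/bin/java -Xmx2G -jar ...`. The report then shows whether the run was killed or exited with an error.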

Yes, I am running on a server. I checked with the system admin as well, and my jobs are not being terminated because of time or space limits.

I ran the job again, and this time it terminated at 61%.

By giving PrintReads more memory, I was able to generate the BAM file, but I am still not able to reduce its size with ReduceReads. Below is my command:

I gave this run 40 GB of memory. Below is the end of the log file it generated:

INFO  15:02:20,604 ProgressMeter -     19:19087134        8.14e+07    2.2 h       99.0 s     86.4%         2.6 h    21.3 m
INFO  15:02:51,168 ProgressMeter -     19:36964303        8.20e+07    2.3 h       98.0 s     86.9%         2.6 h    20.3 m
INFO  15:03:24,380 ProgressMeter -     19:44079459        8.26e+07    2.3 h       98.0 s     87.2%         2.6 h    20.0 m
INFO  15:03:56,583 ProgressMeter -     19:51266652        8.32e+07    2.3 h       98.0 s     87.4%         2.6 h    19.6 m
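Every run in this thread is diagnosed by where its ProgressMeter output stops. To compare several attempts, you can pull the last reported locus and percentage out of each log automatically. The minimal sketch below assumes only the line layout visible above: `ProgressMeter -`, then the locus, then other fields that include a percentage. The regex and the helper name `last_progress` are hypothetical and not part of GATK.

```python
import re
import sys

# "ProgressMeter - <locus> ... <number>%": capture the locus and the first
# percentage that follows it within the same entry.
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+)\s+.*?(\d+(?:\.\d+)?)%")


def last_progress(log_text):
    """Return (locus, percent) from the last ProgressMeter entry, or None."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        return None
    locus, percent = matches[-1]
    return locus, float(percent)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, last_progress(fh.read()))
```

Because the pattern uses `findall` rather than line splitting, it also handles logs whose entries have been flattened onto a single line.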