

# ReduceReads terminates before finishing and produces 0-byte output

Denver (Posts: 68, Member)
edited October 2013

I'm using GATK 2.74 with Java 1.7 on my Mac server. I'm running many BAM files, produced upstream by PrintReads, through ReduceReads, using a for-loop in a shell script.
Some of these BAM files are not compressed by ReduceReads: the corresponding output BAM files are 0 bytes.

The command line I use is the following:

```
java -Xmx10g -Djava.awt.headless=true -jar $CLASSPATH/GenomeAnalysisTK.jar \
    -T ReduceReads \
    -R ./GATK_ref/hg19.fasta \
    -S LENIENT \
    -log ../GATK/BQSR/log/file1.compress.log \
    -I ../GATK/BQSR/file1.recal.bam \
    -o ../GATK/BQSR/file1.compressed.bam
```
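
A minimal bash sketch of the per-file loop described above, for anyone trying to reproduce this. It assumes all inputs sit in one directory and are named `*.recal.bam`, writes `*.compressed.bam` and `*.compress.log` next to each input (the original keeps logs in a separate `log/` folder), passes `-T ReduceReads` to select the walker, and leaves out `-S LENIENT`. The idea is to check both the exit status of `java` and that the output is non-empty (`[ -s ]`), so a 0-byte result gets reported instead of silently skipped. `run_reduce_reads` and `reduce_all` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; REF and GATK_JAR mirror the paths in the command above.
REF=./GATK_ref/hg19.fasta
GATK_JAR="${CLASSPATH:-.}/GenomeAnalysisTK.jar"

# Run ReduceReads on one BAM; the function's status is java's exit status.
run_reduce_reads() {
  local in_bam=$1 out_bam=$2 log_file=$3
  java -Xmx10g -Djava.awt.headless=true -jar "$GATK_JAR" \
    -T ReduceReads \
    -R "$REF" \
    -log "$log_file" \
    -I "$in_bam" \
    -o "$out_bam"
}

# Process every *.recal.bam in a directory, continuing past failures.
# Returns non-zero if any file failed or produced a missing/0-byte output.
reduce_all() {
  local dir=$1 failed=0 in_bam out_bam prefix
  for in_bam in "$dir"/*.recal.bam; do
    [ -e "$in_bam" ] || continue   # glob matched nothing
    prefix=${in_bam%.recal.bam}
    out_bam=$prefix.compressed.bam
    if ! run_reduce_reads "$in_bam" "$out_bam" "$prefix.compress.log"; then
      echo "FAILED (non-zero exit): $in_bam" >&2
      failed=$((failed + 1))
    elif [ ! -s "$out_bam" ]; then
      echo "FAILED (missing or empty output): $in_bam" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed file(s) failed" >&2
  [ "$failed" -eq 0 ]
}

# Usage: reduce_all ../GATK/BQSR
```

Continuing past failures (rather than stopping at the first one) keeps the batch running while still leaving a list of which inputs need attention.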

Below is the tail of the log for one of the failed BAM files. I don't see any error in it. Did the ReduceReads walker just terminate early?

```
INFO  10:28:13,654 ProgressMeter -   chr4:48710213        2.47e+07   19.2 m       46.0 s     23.6%        81.4 m    62.2 m
INFO  10:28:43,657 ProgressMeter -   chr4:83322785        2.54e+07   19.7 m       46.0 s     24.7%        79.8 m    60.1 m
INFO  10:29:13,659 ProgressMeter -  chr4:112743605        2.60e+07   20.2 m       46.0 s     25.6%        78.8 m    58.6 m
INFO  10:29:43,662 ProgressMeter -  chr4:151294343        2.67e+07   20.7 m       46.0 s     26.8%        77.1 m    56.4 m
INFO  10:30:13,664 ProgressMeter -  chr4:184999795        2.72e+07   21.2 m       46.0 s     27.9%        75.9 m    54.7 m
INFO  10:30:43,667 ProgressMeter -   chr5:14341390        2.79e+07   21.7 m       46.0 s     28.6%        75.9 m    54.2 m
INFO  10:31:13,796 ProgressMeter -   chr5:37088471        2.85e+07   22.2 m       46.0 s     29.3%        75.7 m    53.6 m
INFO  10:31:43,798 ProgressMeter -   chr5:67907077        2.92e+07   22.7 m       46.0 s     30.3%        74.9 m    52.3 m
INFO  10:32:13,914 ProgressMeter -   chr5:90107126        2.97e+07   23.2 m       46.0 s     31.0%        74.8 m    51.7 m
INFO  10:32:44,969 ProgressMeter -  chr5:127450350        3.03e+07   23.7 m       46.0 s     32.2%        73.7 m    50.0 m
```
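
To see exactly where each failed run stopped, the last progress line can be pulled out of every log. This sketch relies only on the line layout visible in the excerpt above (locus in the 5th whitespace-separated field, percent complete in the 11th); `last_progress` is a hypothetical helper. Comparing the results across failed runs may hint at whether the stop is tied to a particular region or happens at arbitrary points.

```shell
# Print "<locus> <percent complete>" from the last ProgressMeter progress line
# of a GATK log. Lines whose 5th field is not a chrN:pos-style locus are ignored.
last_progress() {
  awk '$3 == "ProgressMeter" && $5 ~ /:[0-9]+$/ { last = $5 " " $11 }
       END { if (last != "") print last }' "$1"
}

# Usage (log paths as in the command above):
# for log in ../GATK/BQSR/log/*.compress.log; do
#   printf '%s\t%s\n' "$log" "$(last_progress "$log")"
# done
```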

Why are you using -S LENIENT? Do you expect there might be something wrong with the files?

Geraldine Van der Auwera, PhD

Denver (Posts: 68, Member)
edited October 2013

@Geraldine: I tried this a couple of times. At first I didn't include "-S LENIENT" in my command lines at all. But ReduceReads kept failing on some BAM files, so I thought I should make GATK more lenient; that's why I added "-S LENIENT". It still doesn't make ReduceReads work properly.

I didn't expect any of my BAM files to be malformed, since they were generated by PrintReads (I assumed PrintReads produces correct BAM files).


OK, I see. I asked because -S LENIENT is typically used to allow the use of files that are not in accordance with the SAM spec. If you had been using that from the beginning of the data processing workflow then you might have some BAM files that are not strictly correct despite having been produced by GATK. This is not something that should be used lightly.

Did you try re-running RR on one of the samples that failed, just by itself?

Geraldine Van der Auwera, PhD

Denver (Posts: 68, Member)

@Geraldine: yes, I tried ReduceReads on just one of the failed BAM files, but it still doesn't work: the output is again a 0-byte file.

Before running GATK, I used Picard to prepare the BAM files, so I can't imagine anything being wrong with them. Should I use Picard to validate my BAM files?

Sure, try running ValidateSamFile to make sure nothing is wrong with those files.

Can you tell if the GATK runs complete or if they seem to terminate early? Do you see a run summary in the console output?

Geraldine Van der Auwera, PhD

Denver (Posts: 68, Member)

I didn't check the run summary on screen; I assumed the log file would capture everything, including the run summary. I'll try ValidateSamFile. Thanks.

Denver (Posts: 68, Member)

@Geraldine: ValidateSamFile shows no difference between the BAM files that failed with ReduceReads and those that ReduceReads compressed successfully. For both, ValidateSamFile.jar reports the same kind of errors:

```
ERROR: Record 8263, Read name HISEQ:95:C2L76ACXX:8:1211:6435:61139, NM tag (nucleotide differences) in file [5] does not match reality [7]
ERROR: Record 8264, Read name HISEQ:95:C2L76ACXX:8:1308:5123:27358, NM tag (nucleotide differences) in file [3] does not match reality [4]
ERROR: Record 471367, Read name HISEQ:95:C2L76ACXX:8:1309:19067:70114, NM tag (nucleotide differences) in file [7] does not match reality [8]
ERROR: Record 471381, Read name HISEQ:95:C2L76ACXX:8:1209:5397:97348, NM tag (nucleotide differences) in file [6] does not match reality [7]
ERROR: Record 471383, Read name HISEQ:95:C2L76ACXX:8:1215:7458:78294, NM tag (nucleotide differences) in file [6] does not match reality [7]
ERROR: Record 471444, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [1] does not match reality [20]
ERROR: Record 471445, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [0] does not match reality [19]
ERROR: Record 471446, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
ERROR: Record 471447, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
ERROR: Record 471448, Read name HISEQ:95:C2L76ACXX:8:2111:14220:4814, NM tag (nucleotide differences) in file [0] does not match reality [19]
```
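
Because ValidateSamFile's verbose output lists every offending record, a quick tally per file makes the comparison between working and failing BAMs easier. A minimal sketch, assuming each report was saved to a text file; `summarize_validation` and the example file names are hypothetical, and the helper only distinguishes NM-tag mismatches from all other ERROR lines. Picard's own `MODE=SUMMARY` option gives a per-error-type count directly and may be simpler.

```shell
# Count ERROR lines in a saved ValidateSamFile report, split into
# NM-tag mismatches and everything else.
summarize_validation() {
  awk '/^ERROR:/ { if (/NM tag/) nm++; else other++ }
       END { printf "NM-tag errors: %d, other errors: %d\n", nm, other }' "$1"
}

# Usage: compare a BAM that ReduceReads handled with one that failed, e.g.
# summarize_validation ok_sample.validate.txt
# summarize_validation failed_sample.validate.txt
```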

I'm trying the 2013-10-15 nightly build now to see if the problem persists.

If you re-run the job, consider adding -l DEBUG to your command line. It will give you additional information that may be useful for debugging.
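
In this thread the decisive error only showed up in the on-screen run summary, not in the -log file, so it can help to save each run's full console output (stdout and stderr) alongside its exit status. A minimal bash sketch; `run_logged` and the `.console.txt` file name are hypothetical.

```shell
# Run a command, saving stdout and stderr to a file, then print the last
# lines of that file (where GATK's run summary or error message ends up).
# Returns the command's own exit status.
run_logged() {
  local out=$1 status
  shift
  "$@" > "$out" 2>&1
  status=$?
  tail -n 20 "$out"
  return "$status"
}

# Usage (paths as in the original command; -l DEBUG raises the log level):
# run_logged ../GATK/BQSR/log/file1.console.txt \
#   java -Xmx10g -Djava.awt.headless=true -jar "$CLASSPATH/GenomeAnalysisTK.jar" \
#     -T ReduceReads -l DEBUG -R ./GATK_ref/hg19.fasta \
#     -I ../GATK/BQSR/file1.recal.bam -o ../GATK/BQSR/file1.compressed.bam \
#   || echo "ReduceReads exited with an error; see the console file"
```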

Geraldine Van der Auwera, PhD

Denver (Posts: 68, Member)

Thanks, Geraldine. I think the disk holding the default Java temp folder (java.io.tmpdir) is full: that is what the GATK run summary on screen shows (disk space full). I'll try a different temp folder for GATK.
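
A minimal sketch of that fix: point the JVM's temporary directory at a disk with room to spare through the standard `java.io.tmpdir` system property, and check free space before each run. `require_free_gb`, the example path and the 50 GB threshold are assumptions; size the threshold to your own BAMs.

```shell
# Succeed only if the filesystem holding directory $1 has at least $2 GB free.
# Uses POSIX `df -Pk` (1024-byte blocks; column 4 is "Available").
require_free_gb() {
  local dir=$1 need_kb=$(( $2 * 1024 * 1024 )) avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1   # df failed, e.g. directory missing
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "Only ${avail_kb} KB free in $dir (need ${need_kb} KB)" >&2
    return 1
  fi
}

# Usage (hypothetical temp path on a disk with plenty of space):
# TMP_DIR=/scratch/$USER/gatk_tmp
# mkdir -p "$TMP_DIR"
# require_free_gb "$TMP_DIR" 50 &&
#   java -Xmx10g -Djava.io.tmpdir="$TMP_DIR" -Djava.awt.headless=true \
#     -jar "$CLASSPATH/GenomeAnalysisTK.jar" -T ReduceReads \
#     -R ./GATK_ref/hg19.fasta \
#     -I ../GATK/BQSR/file1.recal.bam -o ../GATK/BQSR/file1.compressed.bam
```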