ReduceReads terminates before finishing and produces 0-size output

rcholic (Denver, Member)
edited October 2013 in Ask the GATK team

I'm using GATK 2.74 on my server with Java 1.7 on Mac. I'm running many .bam files, which were produced by the upstream PrintReads step, through ReduceReads using a for-loop in a shell script.
For some of these BAM files, ReduceReads does not produce a compressed file and leaves a 0B .bam output file instead.

The command line I use is the following:

java -Xmx10g -Djava.awt.headless=true -jar $CLASSPATH/GenomeAnalysisTK.jar \
    -T ReduceReads \
    -R ./GATK_ref/hg19.fasta \
    -S LENIENT \
    -log ../GATK/BQSR/log/file1.compress.log \
    -I ../GATK/BQSR/file1.recal.bam \
    -o ../GATK/BQSR/file1.compressed.bam

Here is the tail of the log for one of the failed .bam files. I don't see any error in it. Maybe the ReduceReads walker just terminated early?

INFO 10:28:13,654 ProgressMeter - chr4:48710213 2.47e+07 19.2 m 46.0 s 23.6% 81.4 m 62.2 m
INFO 10:28:43,657 ProgressMeter - chr4:83322785 2.54e+07 19.7 m 46.0 s 24.7% 79.8 m 60.1 m
INFO 10:29:13,659 ProgressMeter - chr4:112743605 2.60e+07 20.2 m 46.0 s 25.6% 78.8 m 58.6 m
INFO 10:29:43,662 ProgressMeter - chr4:151294343 2.67e+07 20.7 m 46.0 s 26.8% 77.1 m 56.4 m
INFO 10:30:13,664 ProgressMeter - chr4:184999795 2.72e+07 21.2 m 46.0 s 27.9% 75.9 m 54.7 m
INFO 10:30:43,667 ProgressMeter - chr5:14341390 2.79e+07 21.7 m 46.0 s 28.6% 75.9 m 54.2 m
INFO 10:31:13,796 ProgressMeter - chr5:37088471 2.85e+07 22.2 m 46.0 s 29.3% 75.7 m 53.6 m
INFO 10:31:43,798 ProgressMeter - chr5:67907077 2.92e+07 22.7 m 46.0 s 30.3% 74.9 m 52.3 m
INFO 10:32:13,914 ProgressMeter - chr5:90107126 2.97e+07 23.2 m 46.0 s 31.0% 74.8 m 51.7 m
INFO 10:32:44,969 ProgressMeter - chr5:127450350 3.03e+07 23.7 m 46.0 s 32.2% 73.7 m 50.0 m
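
For a batch run like the one described above, a silent early exit is easier to spot if the loop keeps the console output of each job and checks the result. The following is only a minimal sketch, not the poster's actual script: the function names `output_ok` and `reduce_one`, the `GATK_JAR` variable and the `.console.log` file names are hypothetical, and it assumes the JVM exits with a non-zero status when GATK fails.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file exists and is larger than 0 bytes.
output_ok() {
    [ -s "$1" ]
}

# Hypothetical wrapper: run ReduceReads on one BAM, save everything printed to
# the console (stdout and stderr), and report a failure or an empty output.
# GATK_JAR is an assumed variable holding the path to GenomeAnalysisTK.jar.
reduce_one() {
    in_bam=$1
    out_bam=$2
    console_log=$3
    java -Xmx10g -Djava.awt.headless=true -jar "$GATK_JAR" \
        -T ReduceReads \
        -R ./GATK_ref/hg19.fasta \
        -I "$in_bam" \
        -o "$out_bam" > "$console_log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $in_bam, see $console_log" >&2
        return 1
    fi
    if ! output_ok "$out_bam"; then
        echo "EMPTY OUTPUT: $out_bam, see $console_log" >&2
        return 1
    fi
    return 0
}

# Example loop (commented out so that sourcing this file runs nothing):
# for bam in ../GATK/BQSR/*.recal.bam; do
#     base=${bam%.recal.bam}
#     reduce_one "$bam" "$base.compressed.bam" "$base.console.log"
# done
```

Saving the console output separately matters here because a run summary or error printed to the screen is not necessarily repeated in the file given to `-log`.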

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Why are you using -S LENIENT? Do you expect there might be something wrong with the files?

  • rcholic (Denver, Member)
    edited October 2013

    @Geraldine: I tried this a couple of times. At first, I didn't include "-S LENIENT" in my command lines at all. But ReduceReads kept failing on the BAM files, so I thought I should make GATK more lenient; that's why I added "-S LENIENT". It still does not make ReduceReads work properly.

    I did not expect any of my BAM files to be wrong, as they were generated by PrintReads (my assumption is that PrintReads produces correct BAM files).
    Thanks for your reply.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, I see. I asked because -S LENIENT is typically used to allow the use of files that are not in accordance with the SAM spec. If you had been using that from the beginning of the data processing workflow then you might have some BAM files that are not strictly correct despite having been produced by GATK. This is not something that should be used lightly.

    Did you try re-running RR on one of the samples that failed, just by itself?

  • rcholic (Denver, Member)

    @Geraldine: Yes, I tried ReduceReads on just one of the failed BAM files, but it still does not work properly and gives a 0B output file.

    Before running GATK, I used Picard to prepare the BAM files, so I can't imagine there could be something wrong with them. Should I use Picard to validate my BAM files?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Sure, try running ValidateSamFile to make sure nothing is wrong with those files.

    Can you tell if the GATK runs complete or if they seem to terminate early? Do you see a run summary in the console output?

  • rcholic (Denver, Member)

    I did not look at the GATK run summary on my screen; I thought the log would capture everything in the run summary. I'll try ValidateSamFile. Thanks.

  • rcholic (Denver, Member)

    @Geraldine: ValidateSamFile did not report any difference between the BAM files that failed with RR and the BAM files that ReduceReads compressed successfully. In both cases ValidateSamFile.jar reports the following:

    ERROR: Record 8263, Read name HISEQ:95:C2L76ACXX:8:1211:6435:61139, NM tag (nucleotide differences) in file [5] does not match reality [7]
    ERROR: Record 8264, Read name HISEQ:95:C2L76ACXX:8:1308:5123:27358, NM tag (nucleotide differences) in file [3] does not match reality [4]
    ERROR: Record 471367, Read name HISEQ:95:C2L76ACXX:8:1309:19067:70114, NM tag (nucleotide differences) in file [7] does not match reality [8]
    ERROR: Record 471381, Read name HISEQ:95:C2L76ACXX:8:1209:5397:97348, NM tag (nucleotide differences) in file [6] does not match reality [7]
    ERROR: Record 471383, Read name HISEQ:95:C2L76ACXX:8:1215:7458:78294, NM tag (nucleotide differences) in file [6] does not match reality [7]
    ERROR: Record 471444, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [1] does not match reality [20]
    ERROR: Record 471445, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471446, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471447, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471448, Read name HISEQ:95:C2L76ACXX:8:2111:14220:4814, NM tag (nucleotide differences) in file [0] does not match reality [19]

    I'm trying the 2013-10-15 nightly build now to see if the problem persists.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    If you re-run the job, consider adding -l DEBUG to your command line. It will give you additional information that may be useful for debugging.

  • rcholic (Denver, Member)

    Thanks Geraldine. I think the disk holding the default java.io.tmpdir folder is full; that is what I see in the GATK run summary (disk space full). I will try a different tmp folder for GATK. :)

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah, there you go, that makes sense. Good luck!
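
The cause identified at the end of the thread was a full disk under the JVM's temporary folder. A minimal sketch of how one might check free space and point the JVM at a different folder is shown below. It relies on the standard `java.io.tmpdir` system property; the helper names `free_kb` and `has_space`, the `GATK_JAR` variable, the `/path/to/big_tmp` placeholder and the 10 GB threshold are all assumptions for illustration, not values from the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the free space (in KB) on the filesystem that
# holds the given directory. Uses the POSIX output format of df, where the
# fourth column of the second line is the available space. Filesystem names
# containing spaces would break this simple parsing.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeeds only if the directory has at least min_kb free.
has_space() {
    dir=$1
    min_kb=$2
    avail=$(free_kb "$dir")
    [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}

# Example use (commented out; TMP_DIR and GATK_JAR are placeholders):
# TMP_DIR=/path/to/big_tmp
# has_space "$TMP_DIR" 10485760 || { echo "less than 10 GB free in $TMP_DIR" >&2; exit 1; }
# java -Xmx10g -Djava.io.tmpdir="$TMP_DIR" -Djava.awt.headless=true -jar "$GATK_JAR" \
#     -T ReduceReads \
#     -R ./GATK_ref/hg19.fasta \
#     -I ../GATK/BQSR/file1.recal.bam \
#     -o ../GATK/BQSR/file1.compressed.bam
```

Checking the space before each job in a loop gives an explicit message when the temporary folder fills up, instead of a 0B output file discovered afterwards.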