
ReduceReads terminates before finishing and produces 0-size output

rcholic, Denver (Member)
edited October 2013 in Ask the GATK team

I'm using GATK 2.74 on my server with Java 1.7 on Mac. I'm running many .bam files that were produced by the upstream PrintReads step through ReduceReads, using a for-loop in a shell script.
Some of these .bam files are not compressed by ReduceReads; for those, the output .bam file is 0 bytes.

The command line I use is the following:

java -Xmx10g -Djava.awt.headless=true -jar $CLASSPATH/GenomeAnalysisTK.jar \
    -T ReduceReads \
    -R ./GATK_ref/hg19.fasta \
    -S LENIENT \
    -log ../GATK/BQSR/log/file1.compress.log \
    -I ../GATK/BQSR/file1.recal.bam \
    -o ../GATK/BQSR/file1.compressed.bam

Below is the tail of the log for one of the failed .bam files. I don't see any error in it. Maybe the ReduceReads walker just terminated early?

INFO 10:28:13,654 ProgressMeter - chr4:48710213 2.47e+07 19.2 m 46.0 s 23.6% 81.4 m 62.2 m
INFO 10:28:43,657 ProgressMeter - chr4:83322785 2.54e+07 19.7 m 46.0 s 24.7% 79.8 m 60.1 m
INFO 10:29:13,659 ProgressMeter - chr4:112743605 2.60e+07 20.2 m 46.0 s 25.6% 78.8 m 58.6 m
INFO 10:29:43,662 ProgressMeter - chr4:151294343 2.67e+07 20.7 m 46.0 s 26.8% 77.1 m 56.4 m
INFO 10:30:13,664 ProgressMeter - chr4:184999795 2.72e+07 21.2 m 46.0 s 27.9% 75.9 m 54.7 m
INFO 10:30:43,667 ProgressMeter - chr5:14341390 2.79e+07 21.7 m 46.0 s 28.6% 75.9 m 54.2 m
INFO 10:31:13,796 ProgressMeter - chr5:37088471 2.85e+07 22.2 m 46.0 s 29.3% 75.7 m 53.6 m
INFO 10:31:43,798 ProgressMeter - chr5:67907077 2.92e+07 22.7 m 46.0 s 30.3% 74.9 m 52.3 m
INFO 10:32:13,914 ProgressMeter - chr5:90107126 2.97e+07 23.2 m 46.0 s 31.0% 74.8 m 51.7 m
INFO 10:32:44,969 ProgressMeter - chr5:127450350 3.03e+07 23.7 m 46.0 s 32.2% 73.7 m 50.0 m
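
The log tail above stops at 32.2% completed, which already suggests a run that was cut short rather than one that finished. A minimal sketch for pulling the last reported percentage out of a log, assuming the ProgressMeter lines look like the ones quoted above (the function name `last_progress_percent` is made up for illustration):

```shell
#!/bin/sh
# Print the completion percentage from the last ProgressMeter line of a log.
# Assumes each progress line carries exactly one token ending in "%".
last_progress_percent() {
    grep 'ProgressMeter' "$1" | tail -n 1 | grep -o '[0-9][0-9.]*%'
}

# Hypothetical usage, with the log path from the command in the question:
#   last_progress_percent ../GATK/BQSR/log/file1.compress.log
```

A value well below 100% on the last line is a hint to look at the console output for the reason the run stopped.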


Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Why are you using -S LENIENT? Do you expect there might be something wrong with the files?

  • rcholic, Denver (Member)
    edited October 2013

    @Geraldine: I tried this a couple of times. At first, I didn't include "-S LENIENT" in my command lines at all. But ReduceReads kept failing on the bam files, so I thought I should make GATK more lenient; that's why I added "-S LENIENT". It still does not make ReduceReads work properly.

    I did not expect any of my bam files to be wrong, as they were generated by PrintReads (my assumption is that PrintReads produces correct bam files).
    Thanks for your reply.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    OK, I see. I asked because -S LENIENT is typically used to allow the use of files that are not in accordance with the SAM spec. If you had been using that from the beginning of the data processing workflow then you might have some BAM files that are not strictly correct despite having been produced by GATK. This is not something that should be used lightly.

    Did you try re-running RR on one of the samples that failed, just by itself?

  • rcholic, Denver (Member)

    @Geraldine: yes, I tried ReduceReads on just one of the failed bam files, but it still does not work properly and gives a 0B output file.

    Before running GATK, I used Picard to prepare the BAM files, so I can't imagine there is something wrong with them. Should I try Picard to validate my bam files?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Sure, try running ValidateSamFile to make sure nothing is wrong with those files.

    Can you tell if the GATK runs complete or if they seem to terminate early? Do you see a run summary in the console output?
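
One way to answer the "complete or terminate early" question for a whole batch is to check each run's exit status and output size right after it finishes. The wrapper below is only a sketch: `run_and_check` is a hypothetical helper, not part of GATK, and it assumes a failed run either exits non-zero or leaves a missing or empty output file.

```shell
#!/bin/sh
# Run a command, then report failure if it exited non-zero or if the
# expected output file is missing or empty.
# $1 = expected output file; remaining arguments = the command to run.
run_and_check() {
    out=$1
    shift
    if ! "$@"; then
        echo "FAILED (non-zero exit status): $out" >&2
        return 1
    fi
    if [ ! -s "$out" ]; then
        echo "FAILED (missing or empty output): $out" >&2
        return 2
    fi
    return 0
}

# Hypothetical usage inside the batch loop (failed.txt is a made-up name):
#   for bam in ../GATK/BQSR/*.recal.bam; do
#       out="${bam%.recal.bam}.compressed.bam"
#       run_and_check "$out" java -Xmx10g -jar "$CLASSPATH/GenomeAnalysisTK.jar" \
#           -T ReduceReads -R ./GATK_ref/hg19.fasta -I "$bam" -o "$out" \
#           || echo "$bam" >> failed.txt
#   done
```

Keeping the console output (stderr) of each run next to the log file also preserves any run summary or error message that the `-log` file alone might not show.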

  • rcholic, Denver (Member)

    I did not look at the GATK run summary on my screen; I thought the log would capture everything in the run summary. I'll try ValidateSamFile. Thanks.

  • rcholic, Denver (Member)

    @Geraldine: ValidateSamFile did not report any difference between the BAM files that failed with ReduceReads and those that were compressed successfully. ValidateSamFile.jar reports the same kind of errors for both:

    ERROR: Record 8263, Read name HISEQ:95:C2L76ACXX:8:1211:6435:61139, NM tag (nucleotide differences) in file [5] does not match reality [7]
    ERROR: Record 8264, Read name HISEQ:95:C2L76ACXX:8:1308:5123:27358, NM tag (nucleotide differences) in file [3] does not match reality [4]
    ERROR: Record 471367, Read name HISEQ:95:C2L76ACXX:8:1309:19067:70114, NM tag (nucleotide differences) in file [7] does not match reality [8]
    ERROR: Record 471381, Read name HISEQ:95:C2L76ACXX:8:1209:5397:97348, NM tag (nucleotide differences) in file [6] does not match reality [7]
    ERROR: Record 471383, Read name HISEQ:95:C2L76ACXX:8:1215:7458:78294, NM tag (nucleotide differences) in file [6] does not match reality [7]
    ERROR: Record 471444, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [1] does not match reality [20]
    ERROR: Record 471445, Read name HISEQ:95:C2L76ACXX:8:2201:12624:20327, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471446, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471447, Read name HISEQ:95:C2L76ACXX:8:2213:4317:66187, NM tag (nucleotide differences) in file [0] does not match reality [19]
    ERROR: Record 471448, Read name HISEQ:95:C2L76ACXX:8:2111:14220:4814, NM tag (nucleotide differences) in file [0] does not match reality [19]
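
Since every error quoted above is an NM tag mismatch, it can help to filter those out and see whether a failing file reports anything else. A rough sketch, assuming the validation output was saved to a text file with one message per line (`other_errors` and the file name in the usage comment are made up for illustration):

```shell
#!/bin/sh
# Print ERROR lines from a saved ValidateSamFile report that do NOT
# mention the NM tag. "|| true" keeps the exit status at 0 when no such
# lines exist, so the helper is safe under "set -e".
other_errors() {
    grep 'ERROR' "$1" | grep -v 'NM tag' || true
}

# Hypothetical usage:
#   other_errors file1.validate.txt
```

Empty output for both working and failing files would support the observation in this reply that validation does not distinguish them.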

    I'm trying the 2013-10-15 nightly build now to see if the problem persists.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    If you re-run the job, consider adding -l DEBUG to your command line. It will give you additional information that may be useful for debugging.

  • rcholic, Denver (Member)

    Thanks, Geraldine. I think the disk holding the default java.io.tmpdir folder is full; that is what I see in the GATK run summary (disk space full). I will try a different tmp folder for GATK. :)
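
For anyone hitting the same symptom, here is a small sketch for checking free space in a candidate temp directory before pointing the JVM at it. `free_kb` and `has_free_kb` are hypothetical helper names, the threshold is arbitrary, and the directory in the usage comment is a placeholder; `java.io.tmpdir` is the standard JVM system property for the temp folder.

```shell
#!/bin/sh
# Print the available space, in kilobytes, on the filesystem holding $1.
# "df -P" forces the portable one-line-per-filesystem output format.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding $1 has at least $2 kilobytes free.
has_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Hypothetical usage (directory and 50 GB threshold are placeholders):
#   tmpdir=/path/to/roomy/tmp
#   has_free_kb "$tmpdir" 52428800 || { echo "not enough space in $tmpdir" >&2; exit 1; }
#   java -Xmx10g -Djava.io.tmpdir="$tmpdir" -jar "$CLASSPATH/GenomeAnalysisTK.jar" -T ReduceReads ...
```

How much temp space ReduceReads actually needs is not stated in this thread, so the threshold would have to be chosen by trial.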

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Ah, there you go, that makes sense. Good luck!
