ReduceReads error: Write error; BinaryCodec in writemode; streamed file (filename not available)

PeteHaitch Member
edited August 2012 in Ask the GATK team

The ReduceReads walker is giving me the error "Write error; BinaryCodec in writemode; streamed file (filename not available)" on 7 BAMs that were processed following the Best Practice Variant Detection v4 guide, using BWA as the aligner. I am using GATK v2.0-39-gd091f72 and calling ReduceReads with the following command (gatk is an alias on my system):

gatk -R ${REF} -T ReduceReads -I ${SAMPLE_ID}_final.bam -o ${SAMPLE_ID}_final.reduced.bam &> ${SAMPLE_ID}_ReduceReads.out

The walker crashes at some point during the run, and there is no obvious pattern as to where (e.g. some samples crash during chr2, others during chr6 or chr10). I have run each sample's BAM through Picard's ValidateSamFile utility, and only 1 of the 7 BAMs passes without error. The three types of errors I am seeing from ValidateSamFile are:

  • MAPQ should be 0 for unmapped read
  • Mate unmapped flag does not match read unmapped flag of mate
  • Mate negative strand flag does not match read negative strand flag of mate
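For readers unfamiliar with these messages: all three are consistency checks on the FLAG and MAPQ fields defined in the SAM specification, using the bits 0x4 (read unmapped), 0x8 (mate unmapped), 0x10 (read on reverse strand) and 0x20 (mate on reverse strand). The Python below is a minimal sketch of the same three checks on plain-text SAM records. It is not Picard's implementation: it skips secondary and supplementary records, assumes each paired read name has exactly two primary records, and the function name check_pairs is hypothetical.

```python
from collections import defaultdict

# FLAG bits from the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def check_pairs(sam_lines):
    """Return (read_name, message) tuples for the three error types above.

    Simplified sketch: header lines and secondary/supplementary records are
    skipped, and pairs without exactly two primary records are not checked.
    """
    errors = []
    by_name = defaultdict(list)
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        name, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & UNMAPPED and mapq != 0:
            errors.append((name, "MAPQ should be 0 for unmapped read"))
        if flag & PAIRED:
            by_name[name].append(flag)
    for name, flags in by_name.items():
        if len(flags) != 2:
            continue  # mate missing or ambiguous; not checked in this sketch
        # Compare each record's mate bits against the other record's own bits
        for read, mate in ((flags[0], flags[1]), (flags[1], flags[0])):
            if bool(read & MATE_UNMAPPED) != bool(mate & UNMAPPED):
                errors.append((name, "Mate unmapped flag does not match "
                                     "read unmapped flag of mate"))
            if bool(read & MATE_REVERSE) != bool(mate & REVERSE):
                errors.append((name, "Mate negative strand flag does not match "
                                     "read negative strand flag of mate"))
    return errors


if __name__ == "__main__":
    records = [
        "@HD\tVN:1.4",
        # r1: first mate is unmapped (flag 69) but has MAPQ 37, and the
        # second mate (flag 129) does not set the mate-unmapped bit 0x8
        "r1\t69\tchr1\t100\t37\t*\t=\t100\t0\tACGT\tIIII",
        "r1\t129\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",
        # r2: a normal, consistent proper pair (flags 99 and 147)
        "r2\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII",
        "r2\t147\tchr1\t300\t60\t4M\t=\t200\t-104\tACGT\tIIII",
    ]
    # Reports two problems for r1 and none for r2
    for name, message in check_pairs(records):
        print(name, message)
```

The mate checks are symmetric: each record's 0x8 and 0x20 bits describe its mate, so they must agree with the mate's own 0x4 and 0x10 bits.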

I believe these errors are caused by BWA not complying with the SAM spec, although they could perhaps be due to GATK's IndelRealigner. Are these departures from the SAM spec causing ReduceReads to crash, or is it likely some other problem? How does your team deal with BWA's departures from the SAM spec, given that BWA is the recommended aligner for use with GATK?

Thanks

Answers

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭

    Hi Pete,

    Without the full stack trace it's impossible to see what's happening. Can you please post it?

  • PeteHaitch Member

    Pardon my ignorance, but is the stack trace the log output? I've attached the log file from one of the BAMs. Please let me know if this isn't what you need.

  • Mark_DePristo Broad Institute Member admin

    Does the input BAM pass ValidateSamFile from Picard? It's a strange error that suggests your input BAM is malformed.

  • PeteHaitch Member

    No, the BAM does not pass ValidateSamFile. See the attached log file for the errors it raised. The BAM was generated using BWA and then processed following the GATK Variant Detection Best Practices v4. I extracted the reads that failed ValidateSamFile and tried running ReduceReads on just those "bad reads", and this worked. I uploaded the reduced BAM to Dropbox in case it helps diagnose the problem: https://www.dropbox.com/s/w3ldnkajqdbriuc/292_duds.bam (it's tiny and only contains 4 reads).

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭
    Accepted Answer

    Hi Pete,

    Unfortunately, we can't support the use of BAM files that aren't valid. Things should work just fine once you have a BAM that passes validation.

  • PeteHaitch Member

    Okay. Two follow-up questions:

    • How do you at the Broad deal with BWA not complying with the SAM spec (e.g. "MAPQ should be 0 for unmapped read")?
    • Why does ReduceReads seem to work on a BAM containing only those 4 reads that violate the SAM spec?

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭
    Accepted Answer

    For both questions: we suffer through it. The GATK can actually process through such errors. If those are the only bad ones then you are okay.

    I have a guess as to what's going on (and this should hopefully be handled better in release 2.1, as far as error messages go). Perhaps you are running out of space in your /tmp directory, since ReduceReads writes there until it's ready to write the final file. Try setting your tmp directory to somewhere with more free disk space by adding -Djava.io.tmpdir=X to your command line, replacing X with that directory.

  • Mark_DePristo Broad Institute Member admin

    Unfortunately, we don't have the resources to support running GATK tools on off-spec BAM files. You'll have to fix your BAM file first; if the error recurs after that, we'd be happy to look at it.

  • PeteHaitch Member

    Just to confirm, it was a problem caused by lack of space in my tmp directory. Re-running the command with a different tmp directory worked.
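As a minimal sketch of the fix used in this thread, the Python below checks the free space in a candidate temporary directory and then builds a java command line with -Djava.io.tmpdir set. Several details are assumptions, not taken from the thread: that the gatk alias expands to something like "java -jar GenomeAnalysisTK.jar", the jar path itself, the 50 GB threshold, and the function names, all of which are hypothetical. One point does matter in practice: -Djava.io.tmpdir is a JVM system property, so it has to appear before -jar. Placed after the jar name, it would be handed to GATK as a program argument instead.

```python
import shutil
import subprocess


def free_gb(path):
    """Free disk space at path, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def reduce_reads_cmd(ref, in_bam, out_bam, tmp_dir,
                     jar="GenomeAnalysisTK.jar"):  # hypothetical jar path
    """Build the ReduceReads command from the thread, with the JVM's
    temporary directory redirected to tmp_dir. The -D option must come
    before -jar so that java, not GATK, receives it."""
    return [
        "java", f"-Djava.io.tmpdir={tmp_dir}", "-jar", jar,
        "-R", ref, "-T", "ReduceReads",
        "-I", in_bam, "-o", out_bam,
    ]


def run_reduce_reads(ref, in_bam, out_bam, tmp_dir, min_free_gb=50.0):
    """Refuse to start if tmp_dir looks too small, then run ReduceReads.

    min_free_gb is an arbitrary illustrative threshold, not a GATK figure;
    the space actually needed depends on the size of the input BAM.
    """
    available = free_gb(tmp_dir)
    if available < min_free_gb:
        raise RuntimeError(f"only {available:.1f} GB free in {tmp_dir}")
    subprocess.run(reduce_reads_cmd(ref, in_bam, out_bam, tmp_dir), check=True)
```

From the shell, running df -h on the candidate directory before a long job gives the same free-space information.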
