Regarding piping - Picard and BWA (Align and MergeBamAlignment step)

Hyunmin, Seoul, Korea, Member
edited February 2018 in Ask the GATK team

I made 3 BAM files with the command below.

Picard version: 2.17.8
BWA version: 0.7.17-r1188

compression_level=2
java_opt="-Xmx32G"
bwa_version="0.7.17-r1188"
bwa_commandline="mem -K 100000000 -p -v 3 -t 64 -Y ${ref_fasta}"

java ${java_opt} -jar ${PICARD_JAR} SamToFastq \
I=${INPUT_BAM} \
INTERLEAVE=true NON_PF=true \
FASTQ=/dev/stdout \
TMP_DIR=${TMP_DIR} | \
${BWA} ${bwa_commandline} /dev/stdin - 2> >(tee ${OUTPUT_BAM}.stderr.log >&2) | \
java -Dsamjdk.compression_level=${compression_level} -Xms12G -jar ${PICARD_JAR} \
MergeBamAlignment \
    VALIDATION_STRINGENCY=SILENT \
    EXPECTED_ORIENTATIONS=FR \
    ATTRIBUTES_TO_RETAIN=X0 \
    ATTRIBUTES_TO_REMOVE=NM \
    ATTRIBUTES_TO_REMOVE=MD \
    ALIGNED_BAM=/dev/stdin \
    UNMAPPED_BAM=${INPUT_BAM} \
    OUTPUT=${OUTPUT_BAM} \
    REFERENCE_SEQUENCE=${ref_fasta} \
    PAIRED_RUN=true \
    SORT_ORDER="unsorted" \
    IS_BISULFITE_SEQUENCE=false \
    ALIGNED_READS_ONLY=false \
    CLIP_ADAPTERS=false \
    MAX_RECORDS_IN_RAM=2000000 \
    ADD_MATE_CIGAR=true \
    MAX_INSERTIONS_OR_DELETIONS=-1 \
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
    PROGRAM_RECORD_ID="bwamem" \
    PROGRAM_GROUP_VERSION="${bwa_version}" \
    PROGRAM_GROUP_COMMAND_LINE="${bwa_commandline}" \
    PROGRAM_GROUP_NAME="bwamem" \
    UNMAPPED_READ_STRATEGY=COPY_TO_TAG \
    ALIGNER_PROPER_PAIR_FLAGS=true \
    UNMAP_CONTAMINANT_READS=true \
    ADD_PG_TAG_TO_READS=false
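
Because the three tools above run as a single pipe, bash by default reports only the exit status of the last stage, so a failure in an earlier stage can go unnoticed. The sketch below shows one way to record the status of every stage with bash's `PIPESTATUS` array. The three functions are harmless stand-ins invented for illustration, not the real SamToFastq, bwa and MergeBamAlignment calls, and the second one fails on purpose.

```bash
#!/usr/bin/env bash
# Report the exit status of each stage in a three-stage pipe.
# The stages are placeholders (assumptions), not the real tools.

stage_one()   { printf 'read1\nread2\n'; }   # stand-in for SamToFastq
stage_two()   { cat; return 3; }             # stand-in for bwa mem, fails on purpose
stage_three() { cat > /dev/null; }           # stand-in for MergeBamAlignment

stage_one | stage_two | stage_three
# Copy PIPESTATUS immediately; the next command overwrites it.
statuses=("${PIPESTATUS[@]}")

names=(stage_one stage_two stage_three)
failed=""
for i in "${!statuses[@]}"; do
    if [ "${statuses[$i]}" -ne 0 ]; then
        failed="${names[$i]}"
        echo "${names[$i]} exited with status ${statuses[$i]}" >&2
    fi
done
```

An alternative is to put `set -o pipefail` at the top of the script, which makes the whole pipe return a non-zero status if any stage fails, at the cost of not telling you which one.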

Then I tried to run the MarkDuplicates step, but it failed with the following error:

Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file: /BiO/Project/brandon-genome-analysis/analysis/B001.fastqtosam.unmerged.bam
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
at java.io.DataInputStream.read(DataInputStream.java:149)
at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:418)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:209)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at htsjdk.samtools.util.PeekableIterator.advance(PeekableIterator.java:71)
at htsjdk.samtools.util.PeekableIterator.next(PeekableIterator.java:57)
at htsjdk.samtools.MergingSamRecordIterator.next(MergingSamRecordIterator.java:130)
at htsjdk.samtools.MergingSamRecordIterator.next(MergingSamRecordIterator.java:38)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:495)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

All three BAM files were truncated.

$ samtools view -c /BiO/Project/brandon-genome-analysis/analysis/B001.fastqtosam.unmerged.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 107 of 180 bytes
[main_samview] truncated file.
$ samtools view -c /BiO/Project/brandon-genome-analysis/analysis/B002.fastqtosam.unmerged.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 1 of 180 bytes
[main_samview] truncated file.
$ samtools view -c /BiO/Project/brandon-genome-analysis/analysis/B003.fastqtosam.unmerged.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 10 of 39 bytes
[main_samview] truncated file.
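
The "EOF marker is absent" warning refers to the 28-byte empty BGZF block that a completely written BAM file ends with. As a rough sketch of what that check amounts to, the function below compares the last 28 bytes of a file against that marker. The function name `has_bgzf_eof` is made up for this example, and passing the check only shows that the file was closed properly, not that its contents are otherwise valid.

```bash
#!/usr/bin/env bash
# Hex form of the 28-byte BGZF end-of-file block defined in the SAM/BAM specification.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Return 0 if the file ends with the BGZF EOF block, non-zero otherwise.
# (has_bgzf_eof is a hypothetical helper name, not part of any tool.)
has_bgzf_eof() {
    local tail_hex
    tail_hex=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Check every file given on the command line; does nothing without arguments.
for bam in "$@"; do
    if has_bgzf_eof "$bam"; then
        echo "OK         $bam"
    else
        echo "TRUNCATED  $bam"
    fi
done
```

If samtools is available, `samtools quickcheck` performs a similar end-of-file test and is probably the more convenient choice for checking many files before starting a long step.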

Answers

  • Sheila, Broad Institute, Member, Broadie admin

    @Hyunmin
    Hi,

    Can you try validating your input BAM files at each step with ValidateSamFile? I am wondering which step the error occurs at.

    -Sheila

  • Hyunmin, Seoul, Korea, Member
    edited February 2018

    $ java -jar /BiO/Install/picard-2.17.8/picard.jar ValidateSamFile I=B001.fastqtosam.unmerged.bam MODE=SUMMARY

  • Sheila, Broad Institute, Member, Broadie admin

    @Hyunmin
    Hi,

    Okay, so that is the file before MarkDuplicates. What about the original BAM file that you started with and the output of bwa?

    I am wondering if this is an issue with MergeBamAlignment, bwa or your original BAM file.

    Thanks,
    Sheila

  • FPBarthel, Houston, Member ✭✭

    I am also having this error with some WGS data from TCGA. Currently doing more testing to provide a more accurate description of the error, but this is taking a while because the files are very large.

    Note that in my case the error does not appear in the MarkDuplicates step but already while running SamToFastq | bwa | MergeBamAlignment. Interestingly, it only seems to happen in one of several read groups that were returned by RevertSam and further processed via MarkIlluminaAdapters. Still, the error is shared across several samples and is not unique to one sample. I am running more tests to determine whether a) the downloaded data is corrupt or b) prior pre-processing steps led to an invalid file.

    The error seems to specifically target the uBAM file input to MergeBamAlignment via --UNMAPPED_BAM. The error message given is:

    [{DATE_TIME}] picard.sam.MergeBamAlignment done. Elapsed time: 142.73 minutes.
    Runtime.totalMemory()=4079484928
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.FileTruncatedException: Premature end of file: /path/to/{SAMPLE_ID}.{READGROUP_ID}.revertsam.markadapters.bam
    
  • FPBarthel, Houston, Member ✭✭

    I believe I have found the reason for this error. Several MarkIlluminaAdapters processes were killed prematurely, and because their partial output files were present, the workflow did not signal any errors. I have resolved the issue by checking for the metrics file, which is only generated at the end of the step.
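
A check along those lines could look roughly like the sketch below: a step counts as finished only if both its BAM and its metrics file exist and are non-empty. The function name `step_finished` and the file names in the commented usage are placeholders chosen for illustration, not taken from the actual workflow.

```bash
#!/usr/bin/env bash
# Treat a step as finished only if both the BAM and the metrics file
# exist and are non-empty. (step_finished is a hypothetical helper.)
step_finished() {
    local bam="$1" metrics="$2"
    [ -s "$bam" ] && [ -s "$metrics" ]
}

# Hypothetical usage with placeholder file names:
# if ! step_finished sample.markadapters.bam sample.markadapters.metrics.txt; then
#     echo "MarkIlluminaAdapters did not finish; rerun it" >&2
#     exit 1
# fi
```

This relies on the metrics file being written last, as described above. A killed process that leaves a partial BAM behind then fails the check, because the metrics file is missing.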
