Gathered bam could not be indexed

yaohu (beijing), Member

Hi,
I ran IndelRealigner on the same BAM 32 times, each time with a -L option specifying one interval, then gathered the output BAM files with Picard GatherBamFiles (I found that GATK's BAM gather function is a wrapper around Picard GatherBamFiles). This gave me one large BAM file, but when I tried to index it I got this error:

[E::bgzf_read] bgzf_read_block error -1 after 147 of 254 bytes
samtools index: "gathered.bam" is corrupted or unsorted

I don't know why this happens. Could anyone help me? Thanks a lot.
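A bgzf_read_block error like the one above can mean that the file, or one of the pieces it was built from, is truncated. A minimal sketch of one check, assuming only standard POSIX tools: a complete BAM ends with the 28-byte empty BGZF block defined in the SAM/BAM specification, so the last 28 bytes of each file can be compared against it. The function name `has_bgzf_eof` is made up for this example; `samtools quickcheck` does a similar test if samtools is available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hex form of the 28-byte BGZF end-of-file marker from the SAM/BAM specification.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# has_bgzf_eof FILE (hypothetical helper name)
# Returns 0 if FILE ends with the BGZF EOF marker, 1 otherwise.
has_bgzf_eof() {
  local file="$1"
  local tail_hex
  # Dump the last 28 bytes as hex and strip od's spacing and line breaks.
  tail_hex="$(tail -c 28 "$file" | od -An -v -tx1 | tr -d ' \n')"
  [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Report on every file given on the command line.
for bam in "$@"; do
  if has_bgzf_eof "$bam"; then
    echo "ok        $bam"
  else
    echo "truncated $bam"
  fi
done
```

This only detects a missing end marker. It does not prove that the middle of the file is intact or that the records are coordinate-sorted, both of which `samtools index` also requires.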

Answers

  • yaohu (beijing), Member

    @Sheila said:
    @yaohu
    Hi,

    Can you please post the exact commands you ran? There is an option --nWayOut that allows you to merge all the BAM files into one in the output of IndelRealigner. You can try using that instead of GatherBamFiles.

    -Sheila

    @Sheila
    Hi Sheila,

    Thanks for the quick reply. The problem I reported is solved by using the latest Picard, but there is another issue. To save time, I ran the following command 32 times in parallel, each with a small interval; together the intervals cover the entire reference.

    $JAVA -d64 -jar $GATK \
    -T IndelRealigner \
    -R $ref \
    -I $input \
    -L $interval \
    $known_string \
    -targetIntervals $target_interval \
    -o $output

    After the runs finished I gathered the BAMs, hoping the result would be identical to the original output BAM from a single run with no interval specified. But when I compared them using bam diff, there were big differences. Is this normal? How can I distribute the computation without changing the result?
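The scatter-and-gather workflow described above can be sketched as a dry run that only prints the commands. All file names (`ref.fasta`, `sample.bam`, `sample.realign.intervals`, `realigned.<interval>.bam`), the jar names, the three-interval list, and the function names `scatter_cmd` and `gather_cmd` are placeholders chosen for this example, not taken from the post. One thing it makes explicit is that Picard GatherBamFiles concatenates its inputs in the order given, so the pieces have to be listed in the same order as the reference sequences.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder paths; substitute the real ones.
ref="ref.fasta"
input="sample.bam"
target_interval="sample.realign.intervals"

# Print (do not run) the IndelRealigner command for one interval.
scatter_cmd() {
  local interval="$1"
  echo "java -jar GenomeAnalysisTK.jar -T IndelRealigner -R $ref -I $input -L $interval -targetIntervals $target_interval -o realigned.$interval.bam"
}

# Print the GatherBamFiles command. Intervals must be passed in the
# same order as the reference sequence dictionary.
gather_cmd() {
  local inputs=""
  local interval
  for interval in "$@"; do
    inputs+=" I=realigned.$interval.bam"
  done
  echo "java -jar picard.jar GatherBamFiles$inputs O=gathered.bam"
}

# Example with three made-up intervals, listed in reference order.
for interval in chr1 chr2 chr3; do
  scatter_cmd "$interval"
done
gather_cmd chr1 chr2 chr3
```

This shows only how the commands are assembled; it does not by itself explain the differences reported by bam diff.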
