Gathered bam could not be indexed

yaohu (beijing), Member

Hi,
I ran IndelRealigner on the same BAM 32 times, each time with a -L option specifying a different interval, and then gathered the output BAM files using Picard GatherBamFiles (I found that the BAM gather function is a wrapper around Picard GatherBamFiles). I now have one large BAM file, but when I tried to index it, I got this error:

[E::bgzf_read] bgzf_read_block error -1 after 147 of 254 bytes
samtools index: "gathered.bam" is corrupted or unsorted

I don't know why this happens. Could anyone help me? Thanks a lot.
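One way to narrow down an error like this is to check every shard before gathering and to pass the shards to GatherBamFiles in the same order as their intervals. The sketch below only prints the commands (a dry run) instead of running them. The shard names, the `picard.jar` path, and the `print_gather_plan` helper are placeholders made up for illustration; `samtools quickcheck` and the `I=`/`O=` arguments of GatherBamFiles are the only real tool interfaces it relies on.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the checks and the gather command for a set of shards.
# Assumption: PICARD_JAR and all file names are placeholders, not real paths.

PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Usage: print_gather_plan OUTPUT.bam SHARD1.bam SHARD2.bam ...
# Shards must be listed in the order their intervals appear in the reference.
print_gather_plan() {
    local out="$1"; shift
    local inputs=""
    local bam
    for bam in "$@"; do
        # quickcheck flags truncated files and files missing the EOF block
        echo "samtools quickcheck -v $bam"
        inputs="$inputs I=$bam"
    done
    # One gather command, inputs in the order given on the command line
    echo "java -jar $PICARD_JAR GatherBamFiles$inputs O=$out"
    echo "samtools index $out"
}

print_gather_plan gathered.bam shard_01.bam shard_02.bam shard_03.bam
```

If quickcheck reports a problem for one shard, that shard's job is the one to rerun before gathering again.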

Answers

  • yaohu (beijing), Member

    @Sheila said:
    @yaohu
    Hi,

    Can you please post the exact commands you ran? There is an option --nWayOut that allows you to merge all the BAM files into one in the output of IndelRealigner. You can try using that instead of GatherBamFiles.

    -Sheila

    @Sheila
    Hi Sheila,

    Thanks for the quick reply. The problem I reported was solved by using the latest Picard. But there is another issue. To save time, I ran the following command 32 times in parallel, each with a small interval; together the intervals cover the entire reference.

    $JAVA -d64 -jar $GATK \
    -T IndelRealigner \
    -R $ref \
    -I $input \
    -L $interval \
    $known_string \
    -targetIntervals $target_interval \
    -o $output

    After the runs finished I gathered the BAMs, hoping the result would be identical to the original output BAM from a single run with no interval specified. But when I compared them using bam diff, there were big differences. Is this normal? How can I distribute the computation without changing the result?
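A possible approach, sketched below as a dry run, is to scatter by whole contig instead of by small intervals, and to add one extra job for unmapped reads. This rests on two assumptions that should be checked against the GATK 3 documentation: that a read-writing tool run with -L emits every read overlapping the interval (so a read straddling two adjacent intervals can land in both shards), and that unmapped reads are only written when `-L unmapped` is requested. The helper names and the `realigned.<contig>.bam` output names are hypothetical, and this is not guaranteed to reproduce the single-run BAM exactly.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one IndelRealigner command per contig, plus one for
# unmapped reads. Nothing is executed; $JAVA, $GATK, etc. are left unexpanded
# on purpose so the printed lines match the command used in this thread.

# Print the command for a single -L value (hypothetical output naming).
emit_cmd() {
    printf '%s -L %s %s -o realigned.%s.bam\n' \
        '$JAVA -d64 -jar $GATK -T IndelRealigner -R $ref -I $input' \
        "$1" \
        '$known_string -targetIntervals $target_interval' \
        "$1"
}

# Usage: print_scatter_plan REFERENCE.fai
# The first tab-separated column of a samtools faidx index is the contig name.
print_scatter_plan() {
    local fai="$1"
    local contig
    while IFS=$'\t' read -r contig _ || [ -n "$contig" ]; do
        [ -n "$contig" ] || continue
        emit_cmd "$contig"
    done < "$fai"
    # Assumption: "-L unmapped" is needed to carry unmapped reads through.
    emit_cmd unmapped
}

# Demo with a tiny made-up index file
demo_fai="$(mktemp)"
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1100\t60\t61\n' > "$demo_fai"
print_scatter_plan "$demo_fai"
rm -f "$demo_fai"
```

The per-contig outputs would then be gathered in the contig order of the reference, with the unmapped shard last.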