Help with BaitDepthWalker analysis


I am trying to get coverage data using the BaitDepthWalker tool:

java -jar recapseg- -T BaitDepthWalker --input_file normal_aln_IndelRealigned_ReorderedSam.bam --intervals zebrafish_zv9_core.baits_merged.bed --out normal_aln_IndelRealigned_Covdata --bed zebrafish_zv9_core.baits_merged.bed --reference_sequence danRer7.fa

The error message I get is:

ERROR MESSAGE: Input files zebrafish_zv9_core.baits_merged.bed and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.

I ran Picard's ReorderSam on the input BAM, but I still get the same error.

Could you please help?



  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Hi there,

    I'm not yet familiar with how Recapseg works, but based on the error you're getting this is a pretty general problem, which is addressed in the GATK documentation about input files. Basically you need to make sure that the intervals in your bed file are in the same order as the contigs in your reference file.
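As a rough illustration of the reordering described above, here is a minimal Python sketch that sorts BED records to follow the contig order of the reference. It assumes a FASTA index (`.fai`, as written by `samtools faidx`) is available for the reference and that the BED file is tab-separated. The function names `read_contig_order` and `sort_bed_lines` are made up for this example and are not part of GATK or Recapseg.

```python
def read_contig_order(fai_lines):
    """Map each contig name to its rank in the reference.

    Assumes .fai-style lines, where the first tab-separated
    column is the contig name.
    """
    order = {}
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name = line.split("\t")[0]
        # Keep the rank of the first occurrence of each name.
        order.setdefault(name, len(order))
    return order


def sort_bed_lines(bed_lines, contig_order):
    """Return BED lines sorted by reference contig order, then start, then end."""
    header, records = [], []
    for line in bed_lines:
        stripped = line.rstrip("\n")
        if not stripped:
            continue
        # Keep comment, track, and browser lines at the top, unsorted.
        if stripped.startswith(("#", "track ", "browser ")):
            header.append(stripped)
            continue
        fields = stripped.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in contig_order:
            raise ValueError(f"contig {chrom!r} is not in the reference index")
        records.append((contig_order[chrom], start, end, stripped))
    records.sort(key=lambda rec: rec[:3])
    return header + [rec[3] for rec in records]
```

To use it, read the lines of `danRer7.fa.fai` and of the BED file, pass them through the two functions, and write the result to a new BED file. A contig that appears in the BED file but not in the reference raises an error rather than being silently dropped.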

  • Hello Geraldine,

    Thank you for the reply.

    Is there a tool I could use to format the bed file based on reference contigs?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    We have an old script to do that; we rarely use it ourselves so I can't guarantee it will work out of the box, but you can give it a try:

  • Thanks for sharing the script. I will give it a try.

  • Hello Geraldine,

    I am trying to run the script on the bed file, but I get an error:

    Dictionary file is probably corrupt: multiple instances of contig tgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg at ./ line 51, line 4608.

    I should also mention that the reference genome is Zebrafish. Is there a way to ignore duplicated sequences in the reference?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    This error suggests that your file is formatted in a way that does not conform to the script's expectations. It's trying to read contig names but finding sequence data. Can you maybe post a couple of complete lines so we can see how the data is formatted?
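One way to check this kind of mismatch is to inspect the file being passed as the dictionary. The sketch below assumes the script expects a Picard-style sequence dictionary (`.dict`), which is a SAM header with one `@SQ` line per contig carrying an `SN:` name tag. That expectation is an assumption, since the script itself is not shown here, and the function name `inspect_sequence_dictionary` is hypothetical.

```python
from collections import Counter


def inspect_sequence_dictionary(lines):
    """Collect contig names from @SQ lines and flag suspicious content.

    Returns (names, duplicates, unexpected), where `unexpected` holds the
    1-based numbers of non-empty lines that are not SAM header lines,
    such as raw sequence data from a FASTA file.
    """
    names, unexpected = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if not line.startswith("@"):
            unexpected.append(number)
            continue
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue  # skip @HD and other header record types
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    counts = Counter(names)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return names, duplicates, unexpected
```

If `unexpected` comes back non-empty, the file probably contains sequence lines, which would be consistent with a FASTA file having been supplied where a dictionary was expected.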

  • So, as a hack, I used one of my Perl scripts, which merges two files based on a column number, to get the reordered BED file. When I then run the BaitDepthWalker job:

    java -jar recapseg- -T BaitDepthWalker --input_file normal_aln_IndelRealigned.bam --intervals zebrafish_zv9_core.baits_merged_ordered.bed --out normal_aln_IndelRealigned_Covdata --bed zebrafish_zv9_core.baits_merged_ordered.bed --reference_sequence danRer7.fa

    INFO 12:25:24,568 ReadShardBalancer$1 - Done loading BAM index data
    INFO 12:25:24,607 BaitDepthWalker - Number in baits = 196042, number outside of baits = 0
    INFO 12:25:24,612 ProgressMeter - done 1.96e+05 51.0 s 4.4 m 99.9% 51.0 s 0.0 s
    INFO 12:25:24,613 ProgressMeter - Total runtime 51.72 secs, 0.86 min, 0.01 hours
    INFO 12:25:24,901 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 196042 total reads (0.00%)
    INFO 12:25:24,901 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 12:25:26,645 GATKRunReport - Uploaded run statistics report to AWS S3

    Output file:
    more normal_aln_IndelRealigned_Covdata

    Is this what the expected output should look like?

    Could you please let me know if I missed something?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    @sgujja I'm not familiar enough with recapseg to answer your question. See the recapseg forum for contact information of someone who can answer this.

  • Sure. Thanks for all the help.
