Fixing the errors reported by ValidateSamFile

Bogdan · Palo Alto, CA · Member ✭✭

Dear all, and Geraldine,

I am using ValidateSamFile to check some SAM files, and I am getting the following errors:

    HISTOGRAM    java.lang.String

    Error Type                    Count
    ERROR:INVALID_INDEXING_BIN    2
    ERROR:MATE_NOT_FOUND          499720

Could you please advise whether there is a way to fix these errors in the BAM file? Many thanks!

Comments

  • Bogdan · Palo Alto, CA · Member ✭✭

    If I may add another error to fix: "ERROR: Record 4558866, Read name C38P6ACXX_0:4:2116:2543121:0, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned"
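The "bin" in this message is the BAI binning value stored in every BAM record, derived from the alignment's start and the number of reference bases it covers. A minimal sketch of that computation, transcribed into Python from the `reg2bin()` routine in the SAM/BAM format specification, may help show what ValidateSamFile is comparing. The helpers `ref_span` and `bin_is_consistent` are hypothetical names, and treating a zero-length span as one base is an assumption based on how the spec handles unmapped reads; this is not the Picard/htsjdk code itself.

```python
import re


def reg2bin(beg: int, end: int) -> int:
    """BAI bin for the 0-based, half-open interval [beg, end).

    Transcribed from reg2bin() in the SAM/BAM specification.
    """
    end -= 1  # work with the last covered base
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kb bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mb bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mb bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mb bins
    return 0                                        # whole-reference bin


def ref_span(cigar: str) -> int:
    """Reference bases covered by a CIGAR (M, D, N, = and X consume the reference)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")


def bin_is_consistent(stored_bin: int, pos: int, cigar: str) -> bool:
    """Hypothetical re-check of a record's bin field.

    pos is the 1-based SAM POS; a zero-length span is treated as one base
    (an assumption based on how the spec handles unmapped reads).
    """
    beg = pos - 1
    end = beg + max(ref_span(cigar), 1)
    return stored_bin == reg2bin(beg, end)
```

For example, a 76-base read at POS 100 with CIGAR `76M` falls in bin 4681, and a record with POS 0 and CIGAR `*` gets bin 4680, the value used for unplaced unmapped reads. A mismatch means the stored bin no longer agrees with the record's POS and CIGAR.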

    I should mention that I used bamtools to split a BAM file per chromosome. If this topic has been addressed before, please let me know. Many thanks!

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Did the original file (before processing with bamtools) pass validation?

  • Bogdan · Palo Alto, CA · Member ✭✭

    Thanks, Geraldine, for the kind help ;) Wow, you are staying up late. You are right, I should run the validation before using bamtools, and I will let you know. In any case, if there is a quick way to fix these files, please let me know. Thanks again, and good night ;)

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hah, I often catch up on forum questions while watching tv in the evening. There's a lot in common between GATK errors and Game of Thrones :)

    MATE_NOT_FOUND errors are commonly encountered when you take a subset of a BAM. You can either remove the reads that lost their mates because they sit on the edges of the subset (their mates fall outside it, e.g. on another chromosome), or Picard FixMateInformation might fix that, but I'm not sure. Generally speaking, FixMateInformation is good for dealing with mate-related errors.
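The first option above (removing reads whose mate is no longer in the subset) can be sketched in Python as follows. This is an illustration on a hypothetical simplified record type, `(QNAME, FLAG, RNAME)`, not a real tool and not what Picard FixMateInformation does; a working script would read records from the BAM with a library such as pysam or htsjdk. Reads are kept if they are unpaired (FLAG bit 0x1 unset) or if both primary alignments of the pair are present; secondary (0x100) and supplementary (0x800) alignments are not counted as mates.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical simplified record: (QNAME, FLAG, RNAME).
Record = Tuple[str, int, str]

PAIRED = 0x1
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment


def drop_orphaned_mates(records: Iterable[Record]) -> List[Record]:
    """Keep unpaired reads and paired reads whose mate is also in the subset."""
    records = list(records)
    # Count primary alignments per read name; a complete pair has two.
    primary_per_name = Counter(
        qname for qname, flag, _ in records
        if flag & PAIRED and not flag & NOT_PRIMARY
    )
    return [
        rec for rec in records
        if not rec[1] & PAIRED or primary_per_name[rec[0]] >= 2
    ]
```

Because the pair count is taken over the whole input, a real BAM would need either two passes (count names, then filter) or a name-sorted file; the in-memory list here is only for illustration.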

    I'm not familiar with INVALID_INDEXING_BIN errors, and that's the one that worries me more. It would be good to know at what point it arose.

  • Bogdan · Palo Alto, CA · Member ✭✭

    Thanks, Geraldine, and happy evening watching! I would add that the GATK forum feels like a documentary, especially late at night: very informative! Thanks for keeping such good track of all the questions, and for the very prompt answers; we are all very happy ;)

    Regarding validation: yes, I ran ValidateSamFile on the original BAM file produced by BWA MEM, and it reported "No errors found". So I think the errors appear after splitting the BAM file per chromosome with bamtools.

    On a side note, could you please suggest a way to split the BAM file per chromosome without having to sort and index it first?

    I was using "samtools view FILE.bam | grep -w chr22", then adding the header back and converting the result into BAM, which is complicated. Is there a simpler way?
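The filtering step of that pipeline can be made a little more precise. A minimal Python sketch, assuming plain SAM text as input (for example from `samtools view -h FILE.bam`, where `-h` keeps the header): it keeps the header lines and only the alignment lines whose third column (RNAME) equals the requested contig. By contrast, `grep -w chr22` also matches lines where `chr22` appears only in RNEXT or in an auxiliary tag such as SA, i.e. reads placed on other chromosomes. The function name `sam_lines_for_contig` is hypothetical. Reads whose mates map to another chromosome are still left without their mate, which produces the same MATE_NOT_FOUND errors discussed in this thread.

```python
from typing import Iterable, Iterator


def sam_lines_for_contig(lines: Iterable[str], contig: str) -> Iterator[str]:
    """Yield SAM header lines plus alignment lines whose RNAME equals `contig`."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep @HD/@SQ/@RG/@PG header lines unchanged
            continue
        # Split only as far as needed: QNAME, FLAG, RNAME, rest of the line.
        fields = line.split("\t", 3)
        if len(fields) > 2 and fields[2] == contig:
            yield line
```

The filtered lines could then be converted back to BAM with `samtools view -b` reading from stdin (`-`); older samtools releases also need `-S` to accept SAM input. The whole file is streamed from start to end, so this trades the index requirement for reading every record.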

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Glad to hear the forum is useful; as a straight-up documentary it would be pretty chaotic but there's definitely a lot of information flowing through here :)

    Yep, sounds like bamtools' fault... Best to contact the authors and let them know about this. In the meantime you can use e.g. PrintReads (or there might be an equivalent Picard tool) to do what you want, but these tools do require sorting and indexing. I'm not aware of any reliable way to bypass that requirement.
