About validation of BAM files split per chromosome

Bogdan

Dear Geraldine et al., happy new week! I am writing to ask for some help:

Again, I have split a BAM file per chromosome using NGSUtils (http://ngsutils.org/). However, when I try to validate the resulting per-chromosome BAM files, ValidateSamFile gives the messages below. Could you please advise whether there is a way to fix these files (would the FixMateInformation tool do this)? Thanks!

The errors were:

  Geraldine_VdAuwera

    Hi Bogdan, if those errors were produced by splitting the BAMs by chromosome, the most likely explanation is that they come from so-called chimeric read pairs, where the two mates map to different chromosomes. Splitting puts each mate in a different file, so neither file contains the complete pair. If so, there is nothing that FixMateInformation can do. Depending on what data you're working with and what you're looking for, you may not care and be willing to throw those reads out, or you may actually want to preserve them (e.g. if you're looking for structural variants, where chimeric read pairs are a sign of chromosomal rearrangement).
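
    To illustrate the point above, here is a minimal Python sketch that counts reads whose mate is on a different chromosome. It assumes the alignments are available as SAM text lines (for example from `samtools view -h`), and it relies only on the standard SAM columns FLAG, RNAME and RNEXT. The function names and the example records are hypothetical, made up for illustration.

```python
def mate_on_other_chromosome(sam_line):
    """Return True if this SAM record's mate maps to a different reference."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])                 # column 2: bitwise FLAG
    rname, rnext = fields[2], fields[6]   # columns 3 and 7: RNAME, RNEXT
    paired = bool(flag & 0x1)             # read is part of a pair
    mate_unmapped = bool(flag & 0x8)      # mate has no alignment
    if not paired or mate_unmapped:
        return False
    # "=" means the mate is on the same reference; "*" means unavailable.
    if rname == "*" or rnext in ("=", "*"):
        return False
    return rnext != rname


def count_chimeric(sam_lines):
    """Return (total_records, records_with_mate_on_other_chromosome)."""
    total = chimeric = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        total += 1
        if mate_on_other_chromosome(line):
            chimeric += 1
    return total, chimeric


# Made-up records: a same-chromosome pair member, a read whose mate
# is on chr5, and an unpaired read.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t65\tchr1\t400\t60\t50M\tchr5\t900\t0\t*\t*",
    "r3\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",
]
print(count_chimeric(example))
```

    In a per-chromosome file, the reads flagged by this check are the ones whose mates ended up in another file, which is presumably what the validator is reporting.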

  Bogdan

    Dear Geraldine, thank you! Yes, I could potentially "mask" the reads that are part of chimeric read pairs. We may need those reads later when running BreakDancer.

    Could I please ask whether there is a way to "mask" the chimeric reads so that ValidateSamFile reports the files as OK for downstream analyses?

    I have another question, but I will post it in a new thread. Many thanks again for the helpful suggestions!

  Sheila

    Hi Bogdan,

    I suspect you are asking about masking the error-causing reads so GATK will not crash. You don't have to worry about the MATE_NOT_FOUND errors from ValidateSamFile, as GATK will not crash on those.
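
    If the MATE_NOT_FOUND messages clutter the validation report, one option is to tell ValidateSamFile to skip that error type. The Python sketch below only builds the command line. It assumes Picard's legacy `KEY=VALUE` argument style (newer releases may expect a different syntax), and the jar location and BAM file name are placeholders, not paths from this thread.

```python
import shlex


def validate_command(bam_path, picard_jar="picard.jar",
                     ignore=("MATE_NOT_FOUND",)):
    """Build a Picard ValidateSamFile command that ignores given error types."""
    cmd = [
        "java", "-jar", picard_jar,   # placeholder path to the Picard jar
        "ValidateSamFile",
        f"I={bam_path}",              # input BAM to validate
        "MODE=SUMMARY",               # print a summary table of error counts
    ]
    # One IGNORE argument per validation error type to skip.
    cmd += [f"IGNORE={error_type}" for error_type in ignore]
    return cmd


cmd = validate_command("sample.chr1.bam")  # hypothetical file name
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

    This does not change the BAM file; it only changes what the validator reports, so the chimeric reads remain available for later structural-variant analysis.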


  Bogdan

    Great, thanks Sheila for the reassurance ;)

