about validation of SAM files per chromosome

Bogdan (Palo Alto, CA; Member)

Dear Geraldine et al., happy new week! I am writing to ask for some help.

I have again split a BAM file per chromosome using NGSUtils (http://ngsutils.org/). However, when I try to validate the resulting per-chromosome BAM files, ValidateSamFile reports the errors below. Could you please advise whether there is a way to fix these files (would the FixMateInformation tool do this)? Thanks!

The errors were:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MATE_NOT_FOUND 572778

HISTOGRAM java.lang.String

Error Type Count
ERROR:MATE_NOT_FOUND 442434

Comments

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Bogdan, if those errors were produced by splitting the BAMs by chromosome, the most likely explanation is that those are so-called chimeric read pairs, where the two mates map to different chromosomes. Once the file is split, each mate ends up in a different per-chromosome BAM, so the validator cannot find it. If so, there is nothing that FixMateInformation can do. Depending on what data you're working with and what you're looking for, you may not care and be willing to throw them out, or you may actually want to preserve them (e.g. if you're looking for structural variants, where chimeric read pairs are a sign of chromosomal rearrangement).

  • Bogdan (Palo Alto, CA; Member)

    Dear Geraldine, thank you! Yes, I could potentially "mask" the reads that are part of chimeric read pairs. We may need those reads later when running BreakDancer.

    Could I please ask whether there is any way to "mask" the chimeric reads so that ValidateSamFile reports the files as OK for downstream analyses?

    I have another question, but will post it in a new thread. Many thanks again for the helpful suggestions!

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Bogdan
    Hi Bogdan,

    I suspect you are asking about masking the error-causing reads so GATK will not crash. You don't have to worry about the MATE_NOT_FOUND errors from ValidateSamFile, as GATK will not crash on those.

    -Sheila

  • Bogdan (Palo Alto, CA; Member)

    Great, thanks Sheila for the reassurance ;)
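
For anyone who wants to check how many reads in a per-chromosome file fall into the chimeric-pair case described in the thread, here is a rough Python sketch. It is not part of GATK, Picard, or NGSUtils. It only reads plain SAM text (for example the output of `samtools view -h`) and uses the standard SAM columns: FLAG (column 2), RNAME (column 3), and RNEXT (column 7). The function names and the example records are made up for illustration.

```python
from collections import Counter

# Standard SAM FLAG bits.
PAIRED = 0x1         # read is part of a pair
MATE_UNMAPPED = 0x8  # the mate is unmapped


def mate_on_other_reference(sam_line):
    """Return True if this paired read's mate maps to a different reference.

    Hypothetical helper: takes one tab-separated SAM alignment line.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname = fields[2]
    rnext = fields[6]
    if not flag & PAIRED or flag & MATE_UNMAPPED:
        return False
    # "=" means the mate is on the same reference as the read;
    # "*" means the mate's reference is not available.
    return rnext not in ("=", "*") and rnext != rname


def count_cross_reference_mates(sam_lines):
    """Count, per reference name, reads whose mate is on another reference."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        if mate_on_other_reference(line):
            counts[line.split("\t")[2]] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",    # mate on same chromosome
    "r2\t65\tchr1\t200\t60\t50M\tchr5\t900\t0\t*\t*",   # mate on chr5
    "r3\t73\tchr1\t400\t60\t50M\t=\t400\t0\t*\t*",      # mate unmapped
]
print(count_cross_reference_mates(example))
```

If the count for a split file is close to its MATE_NOT_FOUND count, that supports the chimeric-pair explanation. Whether to drop or keep those reads depends on the downstream analysis, as discussed above.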
