Errors in GATK generated Bam file

Hi everyone,

I followed the instructions at https://www.broadinstitute.org/gatk/guide/article?id=3891 to analyze RNA-seq data. However, when I validated the final BAM file with the ValidateSamFile tool, I got the following errors:

HISTOGRAM java.lang.String

Error Type                            Count
ERROR:INVALID_CIGAR                   43527
ERROR:MATES_ARE_SAME_END              4416796
ERROR:MATE_NOT_FOUND                  4522387
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND   3675811
ERROR:MISMATCH_MATE_ALIGNMENT_START   7743225
ERROR:MISMATCH_MATE_CIGAR_STRING      7664
WARNING:MISSING_TAG_NM                75931282

It looks like the Split'N'Trim step is the cause of these errors, since I validated the BAM files both before and after that step.

Can anyone tell me why I am getting these errors and how to fix them please?

I appreciate your help,
Regards
Hak
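
For anyone who wants to compare reports like the one above before and after a pipeline step, here is a minimal sketch that tallies the summary lines by severity. It assumes the summary is available as plain text with one `LEVEL:TYPE count` entry per line, as pasted in the post; the function name `parse_validation_summary` and the `MATE_RELATED` grouping are illustrative choices, not part of Picard or GATK.

```python
def parse_validation_summary(text):
    """Collect 'LEVEL:TYPE count' lines into {level: {type: count}}.

    Header lines such as 'Error Type Count' are skipped because they do not
    match the two-field 'LEVEL:TYPE count' shape.
    """
    counts = {"ERROR": {}, "WARNING": {}}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or ":" not in parts[0]:
            continue
        level, kind = parts[0].split(":", 1)
        if level in counts and parts[1].isdigit():
            counts[level][kind] = int(parts[1])
    return counts


# Summary lines copied from the post above.
SUMMARY = """\
HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_CIGAR 43527
ERROR:MATES_ARE_SAME_END 4416796
ERROR:MATE_NOT_FOUND 4522387
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 3675811
ERROR:MISMATCH_MATE_ALIGNMENT_START 7743225
ERROR:MISMATCH_MATE_CIGAR_STRING 7664
WARNING:MISSING_TAG_NM 75931282
"""

# Assumption: these are the error types that concern mate-pair bookkeeping.
MATE_RELATED = {
    "MATES_ARE_SAME_END",
    "MATE_NOT_FOUND",
    "MISMATCH_FLAG_MATE_NEG_STRAND",
    "MISMATCH_MATE_ALIGNMENT_START",
    "MISMATCH_MATE_CIGAR_STRING",
}

counts = parse_validation_summary(SUMMARY)
mate_total = sum(n for kind, n in counts["ERROR"].items() if kind in MATE_RELATED)

if __name__ == "__main__":
    for level, kinds in counts.items():
        for kind, n in sorted(kinds.items()):
            print(f"{level}\t{kind}\t{n}")
    print(f"mate-related errors: {mate_total}")
```

Running the same function on the report from the BAM file before the step and diffing the two dictionaries would show which error types the step introduces.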

Issue · GitHub (by Sheila): Issue Number 171 · State: closed · Closed By: vdauwera

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @sekopico
    Hi Hak,

    I am waiting for a response from the tool developer. When he gets back to me, I will let you know.

    -Sheila

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    We think these are an expected side-effect of the splitting, but we haven't yet been able to get confirmation from the developer because he is currently unavailable. Unfortunately we will not have an answer until next week at the earliest. Apologies for the inconvenience.

  • sekopico, Doha (Member)

    Thank you Sheila and Geraldine for your responses.

    I am very curious to know how to resolve these errors. For the time being, though, should I skip this step in my pipeline, or skip the validation step?

    Thank you
    Hak

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    What's happening here is that the tool splits reads that contain the N CIGAR operator into separate reads, so the original mate-pair information is no longer valid. I'm a little surprised that we didn't add any logic to handle this more gracefully so that there wouldn't be any validation errors, but I believe this is not a problem for downstream processing. Just be sure that you're using the latest version, because there were a few bugs in the early versions of this tool.
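
To make the splitting described above concrete, here is a minimal sketch of cutting one alignment at its N (skipped region) CIGAR operators. It is a simplified illustration and not the GATK implementation: the function `split_on_n` is hypothetical, it handles only the alignment start and CIGAR string, and it ignores clipping of overhangs, flags, and tags that the real tool may adjust.

```python
import re

# One CIGAR element: a length followed by a SAM operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that advance the position on the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")


def split_on_n(start, cigar):
    """Split an alignment at N operators.

    Returns a list of (segment_start, segment_cigar) tuples, one per stretch
    of the read between skipped regions. Positions use the same coordinate
    convention as the `start` argument.
    """
    segments = []
    pos = start        # reference position of the next operator
    seg_start = start  # reference start of the segment being built
    seg_ops = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current segment and jump over the skipped region.
            if seg_ops:
                segments.append((seg_start, "".join(seg_ops)))
            pos += length
            seg_start = pos
            seg_ops = []
        else:
            seg_ops.append(f"{length}{op}")
            if op in REF_CONSUMING:
                pos += length
    if seg_ops:
        segments.append((seg_start, "".join(seg_ops)))
    return segments


if __name__ == "__main__":
    # A read starting at 100 that spans a 1000-base skipped region.
    print(split_on_n(100, "50M1000N26M"))
    # prints [(100, '50M'), (1150, '26M')]
```

In this example the second segment starts at 1150 with CIGAR `26M`, while the mate record was written when the read was a single alignment starting at 100 with CIGAR `50M1000N26M`. If the mate's fields are not updated, a validator comparing the two records would plausibly report mismatches in mate alignment start and mate CIGAR, which is consistent with the error types listed in the question.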
