Error "Mate unmapped flag should not be set for unpaired reads" after RevertSam

Hi, I am following Tutorial#6484 (reverting a BAM file into a uBAM), which runs successfully. However, when I tried to apply MarkIlluminaAdapters, I got the error message "Mate unmapped flag should not be set for unpaired reads". I ran ValidateSamFile on the uBAM file and it reported the error "INVALID_FLAG_MATE_UNMAPPED". I checked the read in the original file, where it has a flag of 0, indicating an unpaired read mapped to the forward strand. It seems that RevertSam mis-processes single-ended reads. Do I have to remove all reads with a flag of 0 from my original BAM file?

Thank you!
Kaixiong

GitHub issue (filed by Sheila): Issue Number 1010 · State: closed · Closed By: sooheelee

Answers

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭
    edited June 2016

    Hi @yekaixiong,

    I want to make sure I understand what is going on. You're saying the 0 flag read ends up with a 0x8 flag bit via RevertSam. Is this correct? Can you also clarify if your data file contains a mix of PE and SE reads or just SE reads?
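
The condition behind the error can be checked directly on a FLAG value. The following is a minimal sketch, assuming only the standard SAM flag bits (0x1 = read paired, 0x8 = mate unmapped); the function name `check_mate_flag` is hypothetical and this is not how Picard implements its validation.

```shell
# Report whether a SAM FLAG sets "mate unmapped" (0x8) without "paired" (0x1).
# check_mate_flag is a hypothetical helper, not part of samtools or Picard.
check_mate_flag() {
    flag=$1
    paired=$(( flag & 1 ))          # 0x1: read is paired
    mate_unmapped=$(( flag & 8 ))   # 0x8: mate is unmapped
    if [ "$mate_unmapped" -ne 0 ] && [ "$paired" -eq 0 ]; then
        echo "invalid"
    else
        echo "ok"
    fi
}
```

A flag of 0 (unpaired, forward strand) passes this check, while a flag of 8 on the same read would be the combination the error message describes.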

  • yekaixiong · Member

    My original BAM file was downloaded from the 1000 Genomes Project. I ran RevertSam directly on the BAM file, and it completed without any error message. But in the next step, when I ran MarkIlluminaAdapters, I got the error message "Mate unmapped flag should not be set for unpaired reads". Since the error message also contains the read's name, I checked the flag of this read in the original BAM file. It has a flag of 0, suggesting that it is an SE read. I was not able to check the flag in the RevertSam output because trying to view that BAM file also reported an error. Based on the flags in the original BAM file, yes, there is a mix of PE and SE reads.

    In my second try, I first removed the SE reads from the original BAM file with the following command:

    samtools view -f 0x1 -b ..................

    The resulting new BAM file no longer has the same error and it runs through the pipeline. Do you think the previous error was caused by the mix of SE and PE reads?

    Thanks for your help.
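
Before filtering with `-f 0x1`, it can help to see how many reads of each type a file holds. Here is a rough sketch that counts paired versus unpaired records from SAM text on stdin; the function name `count_pairing` and the file name in the usage comment are placeholders, and the pairing test simply checks the lowest bit of the FLAG column.

```shell
# Count paired (FLAG bit 0x1 set) versus unpaired records in SAM text read from stdin.
# count_pairing is a hypothetical helper; the FLAG is column 2 of each SAM record.
count_pairing() {
    awk -F'\t' '
        /^@/ { next }   # skip header lines if present
        { if (int($2) % 2 == 1) paired++; else unpaired++ }
        END { printf "paired=%d unpaired=%d\n", paired, unpaired }
    '
}

# Hypothetical usage (in.bam is a placeholder):
#   samtools view in.bam | count_pairing
```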

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭
    edited June 2016

    @yekaixiong Yes. The wording of the error, "Mate unmapped flag should not be set for unpaired reads", indicates that the mix of SE and PE reads causes anomalous flagging during the RevertSam step. Specifically, the mix of read types had RevertSam flagging SE reads with the mate unmapped flag (0x8). MarkIlluminaAdapters caught this oddity in flagging.

    That being said, a BAM with mixed read types should have a separate read group for each data type. You can double-check this in the file header by grepping for the @RG lines (samtools view -H file.bam | grep '@RG'). If this is the case, then you should set RevertSam's OUTPUT_BY_READGROUP option to true. With this option, you'll have to supply an OUTPUT_MAP file instead of the OUTPUT file name. This way you should be able to view your BAMs without error (assuming your viewing error is related), and you can run the two data types (SE and PE) separately through MarkIlluminaAdapters. Note that MarkIlluminaAdapters's default MIN_MATCH_BASES differs for SE versus PE (12 versus 6).

    Happy to help. I hope this clarifies things.
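
The OUTPUT_MAP step above can be sketched as follows. This assumes the map is a tab-separated file with the columns READ_GROUP_ID and OUTPUT, and that every read group ID is safe to use as a file name. The function name `make_output_map`, the output directory, and the file names in the usage comment are all placeholders.

```shell
# Build a RevertSam OUTPUT_MAP (tab-separated: READ_GROUP_ID, OUTPUT) from SAM header text on stdin.
# $1 is a hypothetical output directory; one uBAM path is emitted per @RG ID.
make_output_map() {
    awk -F'\t' -v dir="$1" '
        BEGIN { OFS = "\t"; print "READ_GROUP_ID", "OUTPUT" }
        /^@RG/ {
            # find the ID: field of this read group line and strip the "ID:" prefix
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ID:/) print substr($i, 4), dir "/" substr($i, 4) ".bam"
        }
    '
}

# Hypothetical usage (file and directory names are placeholders):
#   samtools view -H file.bam | make_output_map reverted > output_map.tsv
#   java -jar picard.jar RevertSam I=file.bam OUTPUT_BY_READGROUP=true OUTPUT_MAP=output_map.tsv
```

Each resulting per-read-group uBAM can then go through MarkIlluminaAdapters on its own, so SE and PE data never share a file.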

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    This issue has been fixed by a code change in the Picard repository, documented in a pull request and in the related GitHub issue. The effect of this code change is that when RevertSam encounters mate-missing reads in a PE data file, it now removes all mate information and thereby effectively turns mate-missing records into SE reads.
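
At the level of the FLAG field, "removing all mate information" amounts to clearing the pairing-related bits. The sketch below only illustrates that idea and is not Picard's actual code. It assumes the standard SAM bits 0x1, 0x2, 0x8, 0x20, 0x40 and 0x80 are the mate-related ones, and the function name `strip_mate_bits` is hypothetical.

```shell
# Clear the pairing-related SAM FLAG bits so the record reads as single-end.
# Mask 235 = 0x1 (paired) + 0x2 (proper pair) + 0x8 (mate unmapped)
#          + 0x20 (mate reverse strand) + 0x40 (first in pair) + 0x80 (second in pair).
# strip_mate_bits is a hypothetical helper for illustration only.
strip_mate_bits() {
    echo $(( $1 & ~235 ))
}
```

For example, a flag of 73 (paired, mate unmapped, first in pair) becomes 0, and a flag of 89 (the same plus reverse strand) becomes 16.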
