Foolproof way to get reads which will pass Picard, without losing too many

Ok, after some time away, we are getting back to writing a remapping tool that can handle lots of different cases. The protocol suggested by the Broad is good, but fails in cases when the bam had previous issues (weird read groups, weird read names, etc.), so we are doing it from scratch.

My issue now is that we are getting the following error A LOT:

Exception in thread "main" picard.PicardException: Adjacent reads expected to be mate-pairs have different names:

This occasionally happens even after running FixMateInformation (or the opposite error, two reads with the same name but not marked as a mate-pair)

I'm at the point where after doing all the cleaning steps (CleanSam, FixMateInformation...) I just want to throw out any read that is going to throw an exception for a downstream tool (in Picard and the GATK), but of course, I don't want to throw out anything else. Is

samtools view -f 2 -b mybam.bam > mybam_that_will_not_fail.bam going to work? Is it too restrictive? Is there another approach you would suggest to get my bams in shape?

Best Answer


  • dannykwellsdannykwells San FranciscoMember
    edited September 2017

    To exapnd on this, for example, after running the above commands I get the error:

    Exception in thread "main" picard.PicardException: Two reads with same name but not correctly marked as 1st/2nd of pair: H79B6ADXX140122:1:1101:12275:28092

    When I do the following:
    samtools view SC6470_N.bam | grep H79B6ADXX140122:1:1101:12275:28092

    I get

    H79B6ADXX140122:1:1101:12275:28092  371 chr6    104130282   60  39H37M  chr19   11170633    0   GCCTTCCCCCACCCCCCTACACACACACACTCCTACC   CDDDDDDDB70DJIJJJJJIHHJHFFDFFFFFDB?CB   NM:i:0  MD:Z:37 AS:i:37 XS:i:24 RG:Z:SC6470_N.H79B6.1   SA:Z:chr19,11170542,+,30S46M,60,1;
    H79B6ADXX140122:1:1101:12275:28092  99  chr19   11170542    60  30S46M  =   11170633    167 GGTAGGAGTGTGTGTGTGTAGGGGGGTGGGGGAAGGCTCCGAATCCGAATGTGAGTCCCGGGGGGGTTCAGGATGC    BC?[email protected]DDDDDCDDD    NM:i:1  MD:Z:43C2   AS:i:43 XS:i:0  RG:Z:SC6470_N.H79B6.1   SA:Z:chr6,104130282,-,39S37M,60,0;
    H79B6ADXX140122:1:1101:12275:28092  147 chr19   11170633    60  76M =   11170542    -167    CAGCCCTCCCGGCTCCCAGACGCCCCTTGCTGTGGGGGTGCTGCATTCCCAGAGCTCAAGGCTGTCTTTCCCTCCC    [email protected][email protected]@    NM:i:0  MD:Z:76 AS:i:76 XS:i:0  RG:Z:SC6470_N.H79B6.1

    Now, the sam flag 371 indicates
    read paired
    read mapped in proper pair
    read reverse strand
    mate reverse strand
    first in pair
    not primary alignment
    From this, it is totally obvious that the first two reads are a pair. I have run "FixMateInformation" on this sample, but it is not fixing this issue. Is it clear to anyone over there what I can do to rescue this sample?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator


    What aligner did you use? Which version of Picard are you using? Are you saying after FixMateInformation, you are still getting the error messages?

    Also, I think Soo Hee has been helping you here. Perhaps she can still help in that thread? She is on vacation, but will be back next week.


  • dannykwellsdannykwells San FranciscoMember

    Hi @Sheila

    To be honest, what I'm really looking for is a way to tell picard to skip over errors in single reads and output them to, say, stderr or somewhere else. I deal with a lot of poorly processed bams and we see a lot of weird stuff. Rather than try to account for it, I'd be fine just having an output and moving on. Right now, runs are erroring out over a single read (out of 100,000,000). We work at scale (1000's of bams) and I would prefer to not have to pick individual reads out of bams, rather, just put them out and move on.

    Is there a way to set picard up to do that?

    Thank you!

Sign In or Register to comment.