Foolproof way to get reads which will pass Picard, without losing too many
OK, after some time away, we are getting back to writing a remapping tool that can handle lots of different cases. The protocol suggested by the Broad is good, but it fails when the BAM has pre-existing issues (weird read groups, weird read names, etc.), so we are doing it from scratch.
My issue now is that we are getting the following error A LOT:
Exception in thread "main" picard.PicardException: Adjacent reads expected to be mate-pairs have different names:
This occasionally happens even after running FixMateInformation. We also see the opposite error: two reads with the same name that are not marked as a mate pair.
I'm at the point where, after doing all the cleaning steps (CleanSam, FixMateInformation...), I just want to throw out any read that is going to throw an exception in a downstream tool (in Picard and the GATK), but of course I don't want to throw out anything else. Is the following going to work?
samtools view -f 2 -b mybam.bam > mybam_that_will_not_fail.bam
Is it too restrictive? Is there another approach you would suggest to get my BAMs in shape?
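To make the question concrete, here is a rough sketch of the kind of filter I have in mind, written against plain SAM text (for example the output of samtools view -h). The flag bits come from the SAM specification. The function names (passes_f2, filter_complete_pairs) and the keep/drop rule are my own assumptions, not anything Picard or the GATK define, and I have not confirmed that passing this filter is enough to satisfy every downstream check. The first helper only mimics what -f 2 tests: the proper-pair bit on each record, which by itself does not tell you whether the mate record is actually in the file.

```python
from collections import defaultdict

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def passes_f2(flag):
    """Mimic the test behind `samtools view -f 2`: proper-pair bit must be set."""
    return (flag & PROPER_PAIR) == PROPER_PAIR


def complete_pair_names(sam_lines):
    """Names that have exactly one primary first-in-pair record and
    exactly one primary second-in-pair record (hypothetical rule)."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue  # single-end reads are handled by the caller
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary records define the pair
        if flag & FIRST_IN_PAIR:
            counts[name][0] += 1
        if flag & SECOND_IN_PAIR:
            counts[name][1] += 1
    return {name for name, c in counts.items() if c == [1, 1]}


def filter_complete_pairs(sam_lines):
    """Keep headers, unpaired reads, and paired reads whose mate is present.

    Holds all lines in memory, so this is only a sketch for small inputs.
    """
    lines = list(sam_lines)
    keep = complete_pair_names(lines)
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED or name in keep:
            kept.append(line)
    return kept
```

With something like this, a read flagged 99 (paired, proper pair, first in pair) whose mate is missing would still pass the -f 2 test but would be dropped by filter_complete_pairs, while an unpaired read (flag 0) would be dropped by -f 2 but kept here. That difference is basically what I am asking about.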