MappingQualityZeroFilter filters a large proportion of reads

I browsed through the forum and found that some other users have had the same problem.
This happens with the BaseRecalibrator walker. My commands run fine, both for generating the .grp file and for PrintReads writing the new BAM files. However, the standard error output shows that a large proportion of reads fail the MappingQualityZeroFilter:

INFO 05:37:33,854 ProgressMeter - Total runtime 51462.91 secs, 857.72 min, 14.30 hours
INFO 05:37:33,855 MicroScheduler - 263828269 reads were filtered out during the traversal out of approximately 660528125 total reads (39.94%)
INFO 05:37:33,855 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 05:37:33,857 MicroScheduler - -> 262748250 reads (39.78% of total) failing MappingQualityZeroFilter
INFO 05:37:33,857 MicroScheduler - -> 1080019 reads (0.16% of total) failing NotPrimaryAlignmentFilter
INFO 05:37:33,857 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 05:37:35,741 GATKRunReport - Uploaded run statistics report to AWS S3
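A quick way to double-check a breakdown like this is to parse the MicroScheduler lines and recompute each filter's share of the total. This is a minimal sketch that assumes the log format shown above; summarize_filters and SAMPLE_LOG are hypothetical names, not part of GATK.

```python
import re

# Per-filter lines, e.g.
# "MicroScheduler - -> 262748250 reads (39.78% of total) failing MappingQualityZeroFilter"
FILTER_LINE = re.compile(r"-> (\d+) reads \([\d.]+% of total\) failing (\w+)")
# Summary line carrying the approximate total number of reads traversed.
TOTAL_LINE = re.compile(r"reads were filtered out .* approximately (\d+) total reads")

# Excerpt of the log above, used as example input.
SAMPLE_LOG = """\
INFO 05:37:33,855 MicroScheduler - 263828269 reads were filtered out during the traversal out of approximately 660528125 total reads (39.94%)
INFO 05:37:33,855 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 05:37:33,857 MicroScheduler - -> 262748250 reads (39.78% of total) failing MappingQualityZeroFilter
INFO 05:37:33,857 MicroScheduler - -> 1080019 reads (0.16% of total) failing NotPrimaryAlignmentFilter
"""


def summarize_filters(log_text):
    """Return (total_reads, {filter_name: reads_filtered}) from GATK stderr text."""
    total = None
    counts = {}
    for line in log_text.splitlines():
        m = TOTAL_LINE.search(line)
        if m:
            total = int(m.group(1))
            continue
        m = FILTER_LINE.search(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return total, counts


if __name__ == "__main__":
    total, counts = summarize_filters(SAMPLE_LOG)
    # Largest contributors first, as a share of all reads traversed.
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {n} reads ({100 * n / total:.2f}%)")
```

In this log the two non-zero filters (262748250 + 1080019) add up exactly to the 263828269 filtered reads in the summary line, so MappingQualityZeroFilter accounts for essentially all of the filtering.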

This makes me hesitant to move on.

My pipeline strictly follows the GATK Best Practices, using BWA mem for alignment, and samtools shows that over 90% of the reads mapped to the reference genome. I understand this may be beyond the scope of support, but I really don't know how to deal with this problem. How should I tackle it? Can I move on anyway? If not, at which step of the pipeline should I address it?

I would like to hear your suggestions.

Answers

  • jrandall Member

    fern,

    You didn't give very much detail about your data and the command lines you have used, but if the input to BaseRecalibrator is coming directly from bwa mem, then it sounds like bwa mem has given ~40% of the reads a mapping quality of 0. Mapping quality 0 often means that the reads map equally well to multiple places in the reference.
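    One way to confirm this is to count MAPQ-0 records directly in the bwa mem output. This is a minimal sketch, assuming SAM text such as the output of samtools view (aln.bam and mapq0.py are placeholder names); mapq_zero_fraction is a hypothetical helper. It only counts primary, mapped records (FLAG bits 0x4, 0x100 and 0x800 unset), so its denominator will not match GATK's "total reads" exactly.

```python
import sys

# SAM FLAG bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapq_zero_fraction(sam_lines):
    """Return (primary_mapped, primary_mapped_with_mapq0) for SAM text lines."""
    mapped = 0
    mapq0 = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        mapped += 1
        if mapq == 0:
            mapq0 += 1
    return mapped, mapq0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: samtools view aln.bam > aln.sam
    #                     python mapq0.py aln.sam
    with open(sys.argv[1]) as fh:
        mapped, mapq0 = mapq_zero_fraction(fh)
    if mapped:
        print(f"{mapq0}/{mapped} primary mapped reads have MAPQ 0 "
              f"({100 * mapq0 / mapped:.2f}%)")
```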

    By any chance, are you working with human data and mapping to GRCh38 (including the ALT contigs)? The ALT contigs contain sequence very similar to the standard reference, so if you naively map to them without being ALT-aware, you are likely to get a large proportion of reads with mapping quality zero.
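    To see whether ALT contigs are in play, you could list the @SQ lines in the BAM header (for example from samtools view -H) whose names look like ALT contigs. This sketch assumes UCSC-style GRCh38 naming, where ALT contig names end in "_alt"; other distributions of the assembly may name them differently. alt_contigs is a hypothetical helper.

```python
import sys


def alt_contigs(header_lines, suffix="_alt"):
    """Return @SQ sequence names (SN: tag) that end with `suffix`.

    Assumes UCSC-style GRCh38 naming, where ALT contigs end in "_alt";
    other distributions of the assembly may name them differently.
    """
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:") and field[3:].endswith(suffix):
                names.append(field[3:])
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: samtools view -H aln.bam > header.sam
    #                     python alt_contigs.py header.sam
    with open(sys.argv[1]) as fh:
        alts = alt_contigs(fh)
    print(f"{len(alts)} @SQ entries look like ALT contigs")
```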

    bwa mem supports ALT-aware mapping, but you need to call it properly so that it knows which contigs are the ALT contigs (including a post-processing step with bwa-postalt.js). You can refer to the bwa README for more information: https://github.com/lh3/bwa/blob/master/README-alt.md#sequence-alignment
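    As I understand the bwa README, ALT-aware mode depends on an <idxbase>.alt file sitting next to the BWA index files. The sketch below only checks that those files exist for a given index prefix; it does not replace the bwa-postalt.js step. check_bwa_alt_setup is a hypothetical helper, and the listed suffixes assume the files written by bwa index in bwa 0.7.x.

```python
import sys
from pathlib import Path

# Files written by `bwa index` for a given index prefix (bwa 0.7.x).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def check_bwa_alt_setup(prefix):
    """Return (missing_index_suffixes, has_alt_file) for a BWA index prefix."""
    missing = [s for s in BWA_INDEX_SUFFIXES if not Path(prefix + s).exists()]
    has_alt = Path(prefix + ".alt").exists()
    return missing, has_alt


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_alt.py /path/to/ref.fa
    missing, has_alt = check_bwa_alt_setup(sys.argv[1])
    if missing:
        print("missing BWA index files:", ", ".join(missing))
    if not has_alt:
        print("no .alt file next to the index; bwa mem is unlikely to be ALT-aware")
```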

    Cheers,

    Josh.
