MappingQualityZeroFilter filters a large proportion of reads
I browsed through the forum and found that some other users have run into the same problem.
This concerns the BaseRecalibrator walker. My command works fine for generating the -grp file, and PrintReads works fine for writing the new BAM files. However, in the standard error output, I found that a large proportion of reads fail the MappingQualityZeroFilter:
INFO 05:37:33,854 ProgressMeter - Total runtime 51462.91 secs, 857.72 min, 14.30 hours
INFO 05:37:33,855 MicroScheduler - 263828269 reads were filtered out during the traversal out of approximately 660528125 total reads (39.94%)
INFO 05:37:33,855 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 05:37:33,856 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 05:37:33,857 MicroScheduler - -> 262748250 reads (39.78% of total) failing MappingQualityZeroFilter
INFO 05:37:33,857 MicroScheduler - -> 1080019 reads (0.16% of total) failing NotPrimaryAlignmentFilter
INFO 05:37:33,857 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 05:37:35,741 GATKRunReport - Uploaded run statistics report to AWS S3
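To double-check the arithmetic in this summary, here is a minimal Python sketch that parses the MicroScheduler lines and recomputes the per-filter percentages. The function name `summarize_filters` and the two regular expressions are my own, written only against the log format shown above; they are not part of GATK.

```python
import re

# Patterns written against the log lines pasted above (an assumption about
# the format, not an official GATK log specification).
TOTAL_LINE = re.compile(
    r"(\d+) reads were filtered out .* approximately (\d+) total reads"
)
FILTER_LINE = re.compile(r"->\s+(\d+) reads \(([\d.]+)% of total\) failing (\w+)")


def summarize_filters(log_text):
    """Return (filtered, total, {filter_name: count}) parsed from the log text."""
    filtered = total = None
    per_filter = {}
    for line in log_text.splitlines():
        m = TOTAL_LINE.search(line)
        if m:
            filtered, total = int(m.group(1)), int(m.group(2))
            continue
        m = FILTER_LINE.search(line)
        if m:
            per_filter[m.group(3)] = int(m.group(1))
    return filtered, total, per_filter


# The relevant lines copied from the log in this post.
log = """\
INFO 05:37:33,855 MicroScheduler - 263828269 reads were filtered out during the traversal out of approximately 660528125 total reads (39.94%)
INFO 05:37:33,855 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 05:37:33,857 MicroScheduler - -> 262748250 reads (39.78% of total) failing MappingQualityZeroFilter
INFO 05:37:33,857 MicroScheduler - -> 1080019 reads (0.16% of total) failing NotPrimaryAlignmentFilter
INFO 05:37:33,857 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

filtered, total, per_filter = summarize_filters(log)

# Print only the filters that actually removed reads, largest first.
for name, count in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    if count:
        print(f"{name}: {count} reads ({100 * count / total:.2f}% of total)")
```

The two non-zero filters add up to the overall filtered count, so almost all of the filtered reads come from MappingQualityZeroFilter.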
This makes me hesitate to move on.
My pipeline strictly follows the GATK Best Practices, using BWA-MEM for alignment, and samtools showed that over 90% of reads were mapped to the reference genome. I understand that this may be beyond the scope of support, but I really do not know how to deal with this problem. How should I tackle it? Can I move on with these results? If not, where should I start troubleshooting?
I would like to hear your suggestions.
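A read counts as mapped whenever its unmapped flag (0x4) is not set, regardless of its mapping quality, so a high mapped percentage and a large MAPQ 0 fraction can coexist. One way to cross-check the two numbers might be to tally MAPQ 0 reads directly from SAM records. Below is a minimal sketch, assuming tab-separated SAM text as input; the function name `mapq_summary` and the toy records are made up for illustration.

```python
from collections import Counter


def mapq_summary(sam_lines):
    """Count unmapped reads, mapped reads, and mapped reads with MAPQ 0."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        if flag & 0x4:  # 0x4 = segment unmapped
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
            if mapq == 0:
                counts["mapped_mapq0"] += 1
    return counts


# Toy records for illustration only (not taken from my data).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

counts = mapq_summary(example)
print(dict(counts))
```

On real data, the same function could be fed the text output of `samtools view` line by line (for example via `sys.stdin`) instead of the toy list.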