
QC the data before UnifiedGenotyper

mahyarhey, Boston, Member
edited January 2014 in Ask the GATK team

I ran UnifiedGenotyper on 84 samples of RNA-seq data (BAM files) and got the following information in the log file. As can be seen, about 27% of my reads were filtered out, for several different reasons. How can I decrease the percentage of reads that are filtered out? In other words, how can I improve the QC of my data before running UnifiedGenotyper? Thanks!

INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%) 
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
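To see which filters matter most, the MicroScheduler summary above can be tallied with a short script. This is a minimal sketch, assuming the log lines follow exactly the format shown in this post; the regular expressions and the `summarize` helper are hypothetical, written for this example, and are not a GATK utility. In this log the three non-zero filter counts add up exactly to the 970952111 filtered reads, which suggests each filtered read is counted under only one filter, so the share of the filtered total is a meaningful way to rank them.

```python
import re

# MicroScheduler summary copied from the log in this post (assumed format).
LOG = """\
INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%)
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

# Hypothetical patterns for the two kinds of summary line.
TOTAL_RE = re.compile(
    r"(\d+) reads were filtered out .*? out of approximately (\d+) total reads"
)
FILTER_RE = re.compile(r"->\s+(\d+) reads \(([\d.]+)% of total\) failing (\w+)")


def summarize(log_text):
    """Return (filtered, total, {filter_name: count}) parsed from the log text."""
    filtered = total = None
    per_filter = {}
    for line in log_text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            filtered, total = int(m.group(1)), int(m.group(2))
            continue
        m = FILTER_RE.search(line)
        if m:
            per_filter[m.group(3)] = int(m.group(1))
    return filtered, total, per_filter


if __name__ == "__main__":
    filtered, total, per_filter = summarize(LOG)
    print(f"filtered {filtered:,} of {total:,} reads ({100 * filtered / total:.2f}%)")
    # Rank filters by how many reads they removed; skip filters that removed none.
    for name, count in sorted(per_filter.items(), key=lambda kv: -kv[1]):
        if count == 0:
            continue
        print(
            f"{name:<28}{count:>14,}"
            f"{100 * count / total:>8.2f}% of total"
            f"{100 * count / filtered:>8.1f}% of filtered"
        )
```

Run against this log, the sketch ranks MalformedReadFilter first, followed by NotPrimaryAlignmentFilter and BadMateFilter, so MalformedReadFilter alone accounts for more than half of the filtered reads.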
Post edited by Geraldine_VdAuwera on
