QC the data before UnifiedGenotyper

mahyarhey, Boston, Member
edited January 2014 in Ask the GATK team

I ran UnifiedGenotyper on 84 samples of RNA-seq data (BAM files) and got the following information in the log file. As can be seen, about 27% of my reads were filtered out, for several different reasons. How can I decrease the percentage of reads that are filtered? In other words, how can I improve the QC of my data before running UnifiedGenotyper? Thanks!

INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%) 
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
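
To see which filters account for the 27%, the summary above can be tabulated with a short script. This is only a minimal sketch: the regular expressions assume the exact line layout shown in this log, and the names `summarize`, `TOTAL_RE` and `FILTER_RE` are made up for illustration and are not part of GATK.

```python
import re

# Filter summary copied from the log above (timestamps kept as-is).
LOG = """\
INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%)
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

# Assumed patterns, based only on the lines shown in this post.
TOTAL_RE = re.compile(r"(\d+) reads were filtered out .* approximately (\d+) total reads")
FILTER_RE = re.compile(r"->\s+(\d+) reads \([\d.]+% of total\) failing (\w+)")


def summarize(log_text):
    """Return (total_reads, filtered_reads, {filter_name: read_count})."""
    total = filtered = None
    per_filter = {}
    for line in log_text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            filtered, total = int(m.group(1)), int(m.group(2))
            continue
        m = FILTER_RE.search(line)
        if m:
            per_filter[m.group(2)] = int(m.group(1))
    return total, filtered, per_filter


total, filtered, per_filter = summarize(LOG)

# Print the filters that removed at least one read, largest first.
for name, count in sorted(per_filter.items(), key=lambda kv: kv[1], reverse=True):
    if count:
        print(f"{name:<28} {count:>12d} {100 * count / total:6.2f}%")
```

On this log only three filters remove any reads, with MalformedReadFilter the largest, followed by NotPrimaryAlignmentFilter and BadMateFilter, and their counts add up to the reported total of filtered reads.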
Post edited by Geraldine_VdAuwera on
