
QC the data before UnifiedGenotyper

mahyarhey, Boston, Member
edited January 2014 in Ask the GATK team

I ran UnifiedGenotyper on 84 samples using RNA-seq data (BAM files) and got the following information in the log file. As can be seen, about 27% of my reads were filtered out, for several different reasons. How can I decrease the percentage of reads that are filtered? In other words, how can I improve the QC of my data before running UnifiedGenotyper? Thanks!

INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%) 
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter 
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
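
To see which filters account for the 27%, the summary above can be tallied with a short script. This is only a minimal sketch in Python, not part of GATK: the `summarize` function and its regular expressions are hypothetical helpers written against the log format shown in this post, and they may need adjusting for other GATK versions.

```python
import re

# Log summary copied from the post above.
LOG = """\
INFO  14:15:23,762 MicroScheduler - 970952111 reads were filtered out during the traversal out of approximately 3587224222 total reads (27.07%)
INFO  14:15:23,762 MicroScheduler -   -> 147143075 reads (4.10% of total) failing BadMateFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  14:15:23,762 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:15:23,763 MicroScheduler -   -> 566795160 reads (15.80% of total) failing MalformedReadFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  14:15:23,763 MicroScheduler -   -> 257013876 reads (7.16% of total) failing NotPrimaryAlignmentFilter
INFO  14:15:23,763 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

# Hypothetical patterns matching the two kinds of summary line shown above.
TOTAL_RE = re.compile(r"(\d+) reads were filtered out .* approximately (\d+) total reads")
FILTER_RE = re.compile(r"->\s+(\d+) reads \(.*?\) failing (\w+)")


def summarize(log_text):
    """Return (filtered, total, {filter_name: count}) parsed from the log text."""
    filtered = total = None
    per_filter = {}
    for line in log_text.splitlines():
        match = TOTAL_RE.search(line)
        if match:
            filtered, total = int(match.group(1)), int(match.group(2))
            continue
        match = FILTER_RE.search(line)
        if match:
            per_filter[match.group(2)] = int(match.group(1))
    return filtered, total, per_filter


filtered, total, per_filter = summarize(LOG)

# List only the filters that actually removed reads, largest first.
for name, count in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    if count:
        print(f"{name:28s} {count:>12d} {100 * count / total:6.2f}%")
```

Run against this log, the per-filter counts add up exactly to the overall filtered total, and only three filters remove any reads: MalformedReadFilter, NotPrimaryAlignmentFilter and BadMateFilter.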
Post edited by Geraldine_VdAuwera on
