MuTect2 discards lots of reads?

MuTect2 discards 74.26% of the reads (log below): 39% are duplicate reads and 34% are not-primary alignments. The STAR alignment statistics, also below, show a higher mapping rate. Why the difference? My command is below. I don't have a matched normal for the tumor, so I compare the tumor to a PON:

Command

 s76mutect2.vcf.gz: 
        java -jar ${GATK}/GenomeAnalysisTK.jar -T MuTect2 \
        -I:tumor s76.bam \
        --dbsnp ${DBSNP} \
        --cosmic ${COSMIC} \
        --normal_panel pon_siteonly.vcf.gz \
        --output_mode EMIT_VARIANTS_ONLY \
        -o s76mutect2.vcf.gz \
        -R ${hg38}.fasta

MuTect2

INFO  15:51:22,661 MicroScheduler - 95548100 reads were filtered out during the traversal out of approximately 128672300 total reads (74.26%) 
INFO  15:51:22,663 MicroScheduler -   -> 5587 reads (0.00% of total) failing BadCigarFilter 
INFO  15:51:22,664 MicroScheduler -   -> 51034583 reads (39.66% of total) failing DuplicateReadFilter 
INFO  15:51:22,679 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  15:51:22,680 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO  15:51:22,682 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  15:51:22,684 MicroScheduler -   -> 44507930 reads (34.59% of total) failing NotPrimaryAlignmentFilter 
INFO  15:51:22,685 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
------------------------------------------------------------------------------------------
Done
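
As a cross-check of the summary above, here is a minimal Python sketch that re-derives the percentages from the counts in the log. The log excerpt is copied from the output above with timestamps dropped; the function name `parse_filter_summary` and the regular expressions are illustrative assumptions, not part of GATK.

```python
import re

# Filter summary lines copied from the MuTect2 log above (timestamps dropped,
# zero-count filters omitted).
LOG = """\
MicroScheduler - 95548100 reads were filtered out during the traversal out of approximately 128672300 total reads (74.26%)
MicroScheduler -   -> 5587 reads (0.00% of total) failing BadCigarFilter
MicroScheduler -   -> 51034583 reads (39.66% of total) failing DuplicateReadFilter
MicroScheduler -   -> 44507930 reads (34.59% of total) failing NotPrimaryAlignmentFilter
"""


def parse_filter_summary(log_text):
    """Return (filtered, total, {filter_name: count}) from a GATK3-style summary."""
    head = re.search(
        r"(\d+) reads were filtered out .*?(\d+) total reads", log_text, re.S
    )
    filtered, total = int(head.group(1)), int(head.group(2))
    per_filter = {
        name: int(count)
        for count, name in re.findall(
            r"-> (\d+) reads \([\d.]+% of total\) failing (\w+)", log_text
        )
    }
    return filtered, total, per_filter


filtered, total, per_filter = parse_filter_summary(LOG)
print(f"filtered: {100 * filtered / total:.2f}% of {total} records")
for name, count in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {100 * count / total:.2f}%")

# Each record is attributed to exactly one filter in this log, so the
# per-filter counts add up to the overall number of filtered records.
print("per-filter counts sum to total filtered:",
      sum(per_filter.values()) == filtered)
```

The per-filter counts add up exactly to the 95548100 filtered records, so in this run the duplicate and not-primary filters together account for essentially all of the discarded reads.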

STAR

                      Number of input reads |       33174546
                  Average input read length |       270
                                UNIQUE READS:
               Uniquely mapped reads number |       25779436
                    Uniquely mapped reads % |       77.71%
                      Average mapped length |       272.83
                   Number of splices: Total |       11186136
        Number of splices: Annotated (sjdb) |       0
                   Number of splices: GT/AG |       11038925
                   Number of splices: GC/AG |       67204
                   Number of splices: AT/AC |       5902
           Number of splices: Non-canonical |       74105
                  Mismatch rate per base, % |       0.47%
                     Deletion rate per base |       0.02%
                    Deletion average length |       1.34
                    Insertion rate per base |       0.01%
                   Insertion average length |       1.60
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       6200684
         % of reads mapped to multiple loci |       18.69%
    Number of reads mapped to too many loci |       277368
         % of reads mapped to too many loci |       0.84%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |       0.00%
             % of reads unmapped: too short |       2.62%
                 % of reads unmapped: other |       0.15%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%
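
The two reports do not use the same denominator, which a quick calculation makes visible. The sketch below only compares numbers copied from the STAR summary and the MuTect2 log; the variable names are illustrative. One possible reading, not confirmed anywhere in this thread, is that the MuTect2 total counts alignment records in the BAM (so secondary alignments of multi-mapped reads are counted separately) rather than input reads.

```python
# Numbers copied from the two reports in this post.
star_input_reads = 33_174_546      # STAR: "Number of input reads"
gatk_total_records = 128_672_300   # MuTect2 log: "total reads"
gatk_filtered = 95_548_100         # MuTect2 log: "reads were filtered out"

# How many records MuTect2 traversed per read that STAR was given.
records_per_input_read = gatk_total_records / star_input_reads

# Records that survive all read filters and reach the caller.
retained = gatk_total_records - gatk_filtered
retained_pct = 100 * retained / gatk_total_records

print(f"records per STAR input read: {records_per_input_read:.2f}")
print(f"records retained by MuTect2: {retained} ({retained_pct:.2f}%)")
```

Because the MuTect2 total is several times the STAR input count, the 74.26% filtered and the 77.71% uniquely mapped figures are not directly comparable.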

Thanks

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Sherwit
    Hi,

    Which version of MuTect2 are you running? Did you pre-process your data according to the Best Practices? That is a lot of duplicate reads; you may want to ask your sequencing provider why that is. You are not working with amplicon data, are you? You can read more about what the NotPrimaryAlignmentFilter does in the documentation for that filter. The STAR output does show that roughly 20% of the reads did not map uniquely (18.69% mapped to multiple loci), and perhaps GATK tools are a bit more stringent.

    -Sheila
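
To make the filter names in the log concrete: the read filters in question key off bits in the SAM FLAG field. The bit values below come from the SAM specification. The mapping of each bit to a GATK3 filter name follows the filter names themselves, and the order of the checks is an illustrative choice, not GATK's actual evaluation order. The function `first_failing_filter` is hypothetical.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100   # "not primary alignment"
QC_FAIL = 0x200     # fails platform/vendor quality checks
DUPLICATE = 0x400   # PCR or optical duplicate


def first_failing_filter(flag):
    """Name the first read filter a FLAG value would trip, or None.

    The check order here is an assumption made for illustration only.
    """
    checks = [
        (DUPLICATE, "DuplicateReadFilter"),
        (QC_FAIL, "FailsVendorQualityCheckFilter"),
        (SECONDARY, "NotPrimaryAlignmentFilter"),
        (UNMAPPED, "UnmappedReadFilter"),
    ]
    for bit, name in checks:
        if flag & bit:
            return name
    return None


# 99   = paired, proper pair, mate reverse, first in pair (passes)
# 355  = 99 + 0x100 (secondary alignment)
# 1123 = 99 + 0x400 (marked duplicate)
# 4    = unmapped
for flag in (99, 355, 1123, 4):
    print(flag, first_failing_filter(flag))
```

With default settings STAR reports one alignment of a multi-mapped read as primary and flags the others as secondary, which would be consistent with a large NotPrimaryAlignmentFilter count. If samtools is available, `samtools view -c -f 256 s76.bam` counts the secondary records in the BAM for comparison with the log.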

  • Sherwit (Member)
    edited February 28

    I am using GATK v3.7-0.
    And yes, I pre-processed the data according to the Best Practices. Thanks

