Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our forum members to be more involved: jump in and help out your fellow researchers with their questions. The GATK forum is a community forum, and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
UnmappedReadFilter

I would like to know how UnmappedReadFilter identifies unmapped reads in BAM files.
One more thing: the read count in the log is much higher than the number of reads I have in the original SAM file. So how does Mutect2 count the number of reads?
Mutect2 log:
MicroScheduler - 123005 reads were filtered out during the traversal out of approximately 767207040 total reads
but the original sam file has 30,777,253 reads
Any idea?
Thank you
Answers
Unmapped reads are tagged with a flag by the aligner. We just read the flag. See https://software.broadinstitute.org/gatk/blog?id=7019 for an explanation of SAM flags.
How are you counting the number of reads in your file?
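As a rough illustration of the flag check described above: in the SAM specification, bit 0x4 of the FLAG field marks a read as unmapped. The sketch below only shows that bit test; it is not GATK's actual filter code, and the example FLAG values are made up for illustration.

```python
# Bit 0x4 of the SAM FLAG field means "segment unmapped" (SAM specification).
UNMAPPED = 0x4


def is_unmapped(flag: int) -> bool:
    """Return True if the SAM FLAG value has the unmapped bit set."""
    return bool(flag & UNMAPPED)


# Illustrative FLAG values, not taken from the poster's data.
for flag in (0, 4, 77, 99):
    print(flag, is_unmapped(flag))
```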
Thanks for the reply. I think there is something strange in how the reads are being counted.
I counted the reads using wc -l file.sam and samtools flagstat, and I got the same number from both.
====
samtools flagstat file.bam
30777253+ 0 in total (QC-passed reads + QC-failed reads)
....
....
=====
Under no circumstances should the Mutect2 count be higher than the wc -l count. It can be lower because of read filtering, etc., but not higher.
Do you know why I got this large number of reads?
Thank you!!
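One caveat on the counting method above: wc -l counts every line of a SAM file, including any header lines that start with "@", so it equals the number of alignment records only when the file has no header. A minimal sketch of counting records while skipping header lines follows; the function name and the tiny in-memory SAM text are hypothetical, made up for illustration.

```python
import io


def count_sam_records(handle) -> int:
    """Count alignment lines in SAM text, skipping '@' header lines and blanks."""
    return sum(1 for line in handle if line.strip() and not line.startswith("@"))


# Tiny made-up SAM text: two header lines followed by two alignment records.
sam_text = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n"
)

# Counts the two alignment records; the text has four lines in total.
print(count_sam_records(io.StringIO(sam_text)))
```

For a real file, the same function could be called on an open text handle, e.g. `with open("file.sam") as fh: count_sam_records(fh)`.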
That is indeed very odd. What happens if you just run on a small region, does it give you correct counts, or are they also inflated?
@AsJ
Hi,
Can you please try using CountReads with the following arguments?
-drf MalformedReadFilter -drf BadCigarFilter
I wonder if there is some discrepancy between Samtools and GATK.
Thanks,
Sheila
Hi,
Thank you for the answer.
I have tried running CountReads and I got the same number of reads as with samtools or wc -l file.sam.
This means that the log message from Mutect2 was not correct.
------
java -jar GenomeAnalysisTK.jar -R reference.fa -T CountReads -I sortedfile.bam
Output:
INFO ...... CountReads - CountReads counted 30777253 reads in the traversal
@AsJ
Hi,
Can you confirm this happens with the latest version/nightly build? I will check with the team what they think.
-Sheila
Hi @AsJ, our developer pointed out that the program is probably counting the combined read counts from both the tumor and the normal bam. Can you confirm you are running this with both tumor and normal bams?
Hi VdAuwera,
Yes, I am running this with both tumor and normal bams.
Ok, then that makes sense. We'll see if we can make that clearer in the output.
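To check this kind of explanation yourself, one option is to pull the two numbers out of the MicroScheduler log line and compare the approximate total with the sum of the CountReads totals for each input BAM. The sketch below assumes the log wording quoted earlier in this thread; the helper names are hypothetical, and only the one per-file count reported in the thread is used because the count for the other BAM was not posted.

```python
import re

# Log line as quoted in this thread.
LOG_LINE = (
    "MicroScheduler - 123005 reads were filtered out during the traversal "
    "out of approximately 767207040 total reads"
)

# Pattern assumes the exact wording above; other versions may word it differently.
PATTERN = re.compile(
    r"(\d+) reads were filtered out during the traversal "
    r"out of approximately (\d+) total reads"
)


def parse_microscheduler(line):
    """Return (filtered, approximate_total) as ints, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_inputs(total, per_bam_counts):
    """True if the logged total is larger than the summed per-BAM counts."""
    return total > sum(per_bam_counts)


filtered, total = parse_microscheduler(LOG_LINE)
# Only one per-file count (30777253) was reported in the thread.
print(filtered, total, exceeds_inputs(total, [30777253]))
```

With counts for both the tumor and the normal BAM in the list, the comparison shows how far the logged approximate total is from the combined input size.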