Why does -dcov with PrintReads not filter out any reads from my amplicon data?
This question seems to have been asked before (http://gatkforums.broadinstitute.org/discussion/3361/dcov-on-a-bam-file-to-generate-bam-file-output), but after reading to the end of the thread I did not see an answer to the final question: if you have amplicon data in which many reads all start at the same position, why does the -dcov setting not filter down these reads?
I ran PrintReads as follows:
java -Xmx20g \
    -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R GRCh37.fa \
    -I examplesort.bam \
    -o exampledownsample.bam \
    -dcov 1
and the output given is:
INFO 16:57:05,223 ProgressMeter - Total runtime 262.65 secs, 4.38 min, 0.07 hours
INFO 16:57:05,228 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 5965722 total reads (0.00%)
INFO 16:57:05,229 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 16:57:05,229 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 16:57:06,215 GATKRunReport - Uploaded run statistics report to AWS S3
When I look at the reads that I have in certain highly covered regions I can see
Is there a reason why these reads are not being filtered down? Perhaps I am misunderstanding how the -dcov option works.
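To put numbers on the "many reads start at the same position" observation, here is a minimal sketch that counts reads per start position in the input and output files. It assumes each BAM has first been converted to SAM text (for example with `samtools view`), and the function name `count_reads_per_start` is my own, not part of GATK or samtools. It uses the SAM POS column, which is the leftmost aligned base, so for reverse-strand reads this is not the 5' end of the read.

```python
from collections import Counter


def count_reads_per_start(sam_lines):
    """Count mapped reads per (reference name, leftmost position).

    `sam_lines` is any iterable of SAM-format text lines, e.g. an open
    file produced by `samtools view`. Hypothetical helper for illustration.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            # Skip SAM header lines.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            # Not a complete alignment record; ignore it.
            continue
        rname = fields[2]      # column 3: reference sequence name
        pos = int(fields[3])   # column 4: 1-based leftmost position
        if rname == "*" or pos == 0:
            # Unmapped read: no start position to count.
            continue
        counts[(rname, pos)] += 1
    return counts


def top_positions(counts, n=10):
    """Return the n most heavily stacked start positions with their read counts."""
    return counts.most_common(n)
```

Running this on SAM text from both examplesort.bam and exampledownsample.bam and comparing `top_positions` for each would show whether the largest stacks of reads were reduced at all by -dcov 1.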