Why does -dcov with PrintReads not filter out any reads from my amplicon data?
This question seems to have been asked before (http://gatkforums.broadinstitute.org/discussion/3361/dcov-on-a-bam-file-to-generate-bam-file-output), but after reading to the end of the thread I did not see an answer to the final question: if you have amplicon data in which many reads all start at the same position, why does the -dcov setting not downsample these reads?
I ran PrintReads as follows:
java -Xmx20g \
    -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R GRCh37.fa \
    -I examplesort.bam \
    -o exampledownsample.bam \
    -dcov 1
and the output given is:
INFO 16:57:05,223 ProgressMeter - Total runtime 262.65 secs, 4.38 min, 0.07 hours
INFO 16:57:05,228 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 5965722 total reads (0.00%)
INFO 16:57:05,229 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 16:57:05,229 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 16:57:06,215 GATKRunReport - Uploaded run statistics report to AWS S3
When I look at the reads that I have in certain highly covered regions I can see
Is there a reason why these reads are not being downsampled? Perhaps I am misunderstanding how the -dcov option works.
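To put numbers on the "many reads starting at the same position" observation, a small script could tally reads per alignment start in both the input and the output BAM. The following is only a rough sketch: it assumes the BAM has been converted to SAM text (for example with samtools view), and the function name count_reads_per_start is made up for illustration. It uses the SAM POS field, which is the leftmost mapped base, so reverse-strand reads are grouped by their leftmost coordinate rather than by their 5' end.

```python
from collections import Counter


def count_reads_per_start(sam_lines):
    """Tally mapped reads by (reference name, 1-based leftmost position).

    `sam_lines` is any iterable of SAM-formatted text lines; header lines
    (starting with '@') and unmapped reads (FLAG bit 0x4) are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped, so POS is not meaningful
        rname, pos = fields[2], int(fields[3])
        counts[(rname, pos)] += 1
    return counts
```

Running this over the SAM text of examplesort.bam and of exampledownsample.bam and comparing the largest counts (for instance with counts.most_common(10)) would show whether any start position lost reads between input and output.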