
MarkDuplicates very slow

I am using the Featured Workspace for PreProcessingForVariantDiscovery_GATK4 with my own data to produce a clean BAM for GATK4 variant discovery analyses following the Best Practices. PreProcessingForVariantDiscovery_GATK4.MarkDuplicates has been running for 13 hours. Is this normal? My unmapped BAM is 7.97 GB. I am using 8 CPUs, since I was not allowed more than that (quota exceeded error). Please help.


  • KateN, Cambridge, MA (Member, Broadie, Moderator, admin)

    The preprocessing part of the workflow is the longest part of our entire pipeline. Did your workflow finish after 13 hours, or is it still running? Looking at the table in the documentation for that method, you do fall within the range given:

    Sample Name          Sample Size   Time       Cost ($)
    NA12878_24RG_small   3.11 GB       4:18:00    0.77
    NA12878              64.89 GB      47:16:00   7.23
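
    As a rough sanity check on those two reference rows, here is a minimal Python sketch that linearly interpolates a runtime for a given input size. It rests on assumptions that are not in the documentation: that runtime scales roughly linearly with input size, that the 7.97 GB unmapped BAM is comparable to the "Sample Size" column, and that the listed times cover the whole workflow on the default configuration. The function names are made up for illustration.

```python
def to_hours(hms: str) -> float:
    """Convert an 'H:MM:SS' string from the table into fractional hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


def estimate_hours(size_gb: float,
                   small=(3.11, "4:18:00"),
                   large=(64.89, "47:16:00")) -> float:
    """Linearly interpolate runtime between the two reference samples.

    Assumption: runtime grows linearly with input size. Real runs also
    depend on CPU count, disk type, and preemption, so treat the result
    as a ballpark figure only.
    """
    x0, t0 = small[0], to_hours(small[1])
    x1, t1 = large[0], to_hours(large[1])
    slope = (t1 - t0) / (x1 - x0)  # hours per GB between the two rows
    return t0 + slope * (size_gb - x0)


# Ballpark estimate for the 7.97 GB input mentioned in the question.
print(f"{estimate_hours(7.97):.1f} h")
```

    Under those assumptions the estimate comes out at roughly 7.7 hours, so a 13 hour run is inside the range in the table but longer than a straight line between the two rows would suggest.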

    If your submission still hasn't returned, or if it failed with the quota exceeded error, it may need more CPUs. To increase your quota and run on more CPUs (which should make it run faster), you can follow the instructions here.
