
Mark Duplicates extremely slow in particular settings

YannLG Member
Hi,

I have about 200 subjects that I would like to run through the gatk4-data-processing/processing-for-variant-discovery-gatk4.wdl workflow. However, when it reaches the MarkDuplicates step the process becomes extremely slow.
I have tried a few configurations that give very different speeds, but I haven't managed to get the fast behavior on our server or cluster when running the full workflow with cromwell-33.1.jar.
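
For reference, the way I launch the workflow looks roughly like this (a sketch; the config file name backend.conf and the inputs JSON name are placeholders, not my exact file names):

java -Dconfig.file=backend.conf -jar cromwell-33.1.jar run \
    processing-for-variant-discovery-gatk4.wdl \
    --inputs processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json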

I describe the two situations below.
The first case corresponds to a run with Cromwell on our server (the same behavior happens on our cluster). In this case, I pulled the Docker containers used in the workflow and ran them with Singularity.
$ more stderr
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/mnt/scratch/gatk-workflows/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/05069045-b8de-4515-b163-7d0c467b854f/call-MarkDuplicates/tmp.19df4d74
01:51:46.162 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Jun 15 01:51:46 UTC 2019] MarkDuplicates --INPUT /mnt/scratch/gatk-workflows/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/05069045-b8de-4515-b163-7d0c467b854f/call-MarkDuplicates/inputs/254142457/SUBJECT1.illumina.aligned.unsorted.bam --OUTPUT SUBJECT1.hg38.aligned.unsorted.duplicates_marked.bam --METRICS_FILE SUBJECT1.hg38.duplicate_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sat Jun 15 01:51:48 UTC 2019] Executing as [email protected] on Linux 4.4.0-138-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.0.0
INFO 2019-06-15 01:51:48 MarkDuplicates Start of doWork freeMemory: 4119251304; totalMemory: 4146593792; maxMemory: 28631367680
INFO 2019-06-15 01:51:48 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-06-15 01:51:48 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
INFO 2019-06-15 02:14:20 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:22:27s. Time for last 1,000,000: 1,347s. Last read position: chr14:34,704,469
INFO 2019-06-15 02:14:20 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 02:34:59 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:43:06s. Time for last 1,000,000: 1,238s. Last read position: chrUn_KI270438v1:110,511
INFO 2019-06-15 02:34:59 MarkDuplicates Tracking 1 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 02:56:04 MarkDuplicates Read 3,000,000 records. Elapsed time: 01:04:12s. Time for last 1,000,000: 1,265s. Last read position: chr2:32,916,407
INFO 2019-06-15 02:56:04 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 03:16:53 MarkDuplicates Read 4,000,000 records. Elapsed time: 01:25:01s. Time for last 1,000,000: 1,249s. Last read position: chr5:130,256,340
INFO 2019-06-15 03:16:53 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 03:38:14 MarkDuplicates Read 5,000,000 records. Elapsed time: 01:46:22s. Time for last 1,000,000: 1,280s. Last read position: chr2:174,042,204
INFO 2019-06-15 03:38:14 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 03:58:53 MarkDuplicates Read 6,000,000 records. Elapsed time: 02:07:01s. Time for last 1,000,000: 1,238s. Last read position: chr5:124,291,560
INFO 2019-06-15 03:58:53 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 04:19:04 MarkDuplicates Read 7,000,000 records. Elapsed time: 02:27:12s. Time for last 1,000,000: 1,211s. Last read position: chr11:75,728,386
INFO 2019-06-15 04:19:04 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 04:39:23 MarkDuplicates Read 8,000,000 records. Elapsed time: 02:47:31s. Time for last 1,000,000: 1,218s. Last read position: chr1:68,198,219
INFO 2019-06-15 04:39:23 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 05:00:14 MarkDuplicates Read 9,000,000 records. Elapsed time: 03:08:22s. Time for last 1,000,000: 1,251s. Last read position: chrUn_KN707896v1_decoy:2,747
INFO 2019-06-15 05:00:14 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 05:21:50 MarkDuplicates Read 10,000,000 records. Elapsed time: 03:29:57s. Time for last 1,000,000: 1,295s. Last read position: chr17:58,294,312
INFO 2019-06-15 05:21:50 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.


As you can see, in this case it takes about 20 minutes per million records; with more than 300 million records per file, that works out to over 6,000 minutes, i.e. more than 4 days per subject for this step alone. The time per million can drop to 13-15 minutes depending on server load and when increasing --MAX_RECORDS_IN_RAM.

The second case, still on our server, corresponds to running the same command line by hand inside the Singularity container (the same container used by the workflow for this step).
[email protected]:/mnt/scratch/TEST$ java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms4000m -jar /gatk/gatk-package-4.1.0.0-local.jar MarkDuplicates --INPUT /mnt/scratch/gatk-workflows/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/928f9992-198c-4730-bb0f-67f12700b727/call-MarkDuplicates/inputs/703083320/SUBJECT1.illumina.aligned.unsorted.bam --OUTPUT SUBJECT1.hg38.aligned.unsorted.duplicates_marked.bam --METRICS_FILE SUBJECT1.hg38.duplicate_metrics --VALIDATION_STRINGENCY SILENT --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --ASSUME_SORT_ORDER queryname --CREATE_MD5_FILE true --MAX_RECORDS_IN_RAM 500000
18:00:15.562 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Jun 15 18:00:15 UTC 2019] MarkDuplicates --INPUT /mnt/scratch/gatk-workflows/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/928f9992-198c-4730-bb0f-67f12700b727/call-MarkDuplicates/inputs/703083320/SUBJECT1.illumina.aligned.unsorted.bam --OUTPUT SUBJECT1.hg38.aligned.unsorted.duplicates_marked.bam --METRICS_FILE SUBJECT1.hg38.duplicate_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --VALIDATION_STRINGENCY SILENT --MAX_RECORDS_IN_RAM 500000 --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sat Jun 15 18:00:17 UTC 2019] Executing as [email protected] on Linux 4.4.0-138-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.0.0
INFO 2019-06-15 18:00:17 MarkDuplicates Start of doWork freeMemory: 4868924072; totalMemory: 4914151424; maxMemory: 28631367680
INFO 2019-06-15 18:00:17 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-06-15 18:00:17 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
INFO 2019-06-15 18:00:31 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 12s. Last read position: chr14:34,704,469
INFO 2019-06-15 18:00:31 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 18:00:42 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:24s. Time for last 1,000,000: 11s. Last read position: chrUn_KI270438v1:110,511
INFO 2019-06-15 18:00:42 MarkDuplicates Tracking 1 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 18:00:53 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 11s. Last read position: chr2:32,916,407
INFO 2019-06-15 18:00:53 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 18:01:05 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:47s. Time for last 1,000,000: 11s. Last read position: chr5:130,256,340
INFO 2019-06-15 18:01:05 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 18:01:14 MarkDuplicates Read 5,000,000 records. Elapsed time: 00:00:56s. Time for last 1,000,000: 9s. Last read position: chr2:174,042,204
INFO 2019-06-15 18:01:14 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 18:01:28 MarkDuplicates Read 6,000,000 records. Elapsed time: 00:01:10s. Time for last 1,000,000: 13s. Last read position: chr5:124,291,560
INFO 2019-06-15 18:01:28 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 18:01:36 MarkDuplicates Read 7,000,000 records. Elapsed time: 00:01:18s. Time for last 1,000,000: 8s. Last read position: chr11:75,728,386
INFO 2019-06-15 18:01:36 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2019-06-15 18:01:52 MarkDuplicates Read 8,000,000 records. Elapsed time: 00:01:34s. Time for last 1,000,000: 15s. Last read position: chr1:68,198,219
INFO 2019-06-15 18:01:52 MarkDuplicates Tracking 1 as yet unmatched pairs. 1 records in RAM.
INFO 2019-06-15 18:02:02 MarkDuplicates Read 9,000,000 records. Elapsed time: 00:01:44s. Time for last 1,000,000: 9s. Last read position: chrUn_KN707896v1_decoy:2,747

This time the process takes between 1 and 2 hours, which is practicable for the ~200 subjects I have.
However, the slowdown is probably not due to Cromwell itself, because I get the same fast time per million reads on my own computer, where I can run the workflow directly as it is on GitHub using Docker (I have sudo rights there).

I suspect there is something more complicated going on here. It may be related to the issue in this other thread, gatk/discussion/23441/markduplicatespark-is-slower-than-normal-markduplicates, but I can't figure out how to fix it.

Thank you for your help,
Yann

Answers

  • YannLG Member
    Actually, I noticed that this issue is probably exactly the same as
    wdl/discussion/23788/cromwell-slow-when-running-markduplicates
    which went unanswered.
    However, I can add that running with Slurm (our cluster) or without it (our server) makes no difference.
  • YannLG Member
    Using Cromwell with Singularity on my own desktop, the process runs at normal speed (i.e. 4 to 6 s per million reads).

    So I will probably contact our IT department to see whether they can find what is causing this slowdown.
  • YannLG Member
    edited June 17
    I found the issue; it was actually in plain sight in my first post. It is not related to the use of Singularity but to the temporary directory set by Cromwell, which lives on an NFS file system that apparently saturates (quoted at 400 MB/s in both directions), as opposed to the local HDD (or SSD) used on my own computer.

    When running the step directly within Singularity, the tmp dir is /tmp, as opposed to the one set by Cromwell (see the "Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=" line in my first post above).
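
    As a quick sanity check (my own sketch, not part of the original logs), comparing sequential write throughput between the Cromwell execution directory on NFS and the local /tmp can help confirm whether the shared file system is the bottleneck; under load from several concurrent jobs it may sit far below its quoted 400 MB/s:

    # write ~2 GB to the NFS-backed location Cromwell uses for tmp files
    dd if=/dev/zero of=/mnt/scratch/gatk-workflows/cromwell/ddtest.bin bs=1M count=2048 conv=fdatasync
    # same test on the local /tmp used when running inside Singularity by hand
    dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=2048 conv=fdatasync
    rm -f /mnt/scratch/gatk-workflows/cromwell/ddtest.bin /tmp/ddtest.bin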

    To solve this, in the Cromwell configuration file passed via -Dconfig.file,
    I edited the custom backend configuration section so that temporary files are written under /tmp:
    # To create and return a non-random temporary directory, use something like:
    temporary-directory = "$(mkdir -p /tmp/MY_DIR && echo /tmp/MY_DIR)"
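
    For context, here is a minimal sketch of where that setting lives in the backend block of the Cromwell HOCON configuration (the backend name LocalSingularity and the singularity submit line below are illustrative assumptions based on the Cromwell docs, not my exact config):

    backend {
      default = "LocalSingularity"
      providers {
        LocalSingularity {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            run-in-background = true
            # write per-job temporary files to local disk instead of NFS
            temporary-directory = "$(mkdir -p /tmp/MY_DIR && echo /tmp/MY_DIR)"
            submit-docker = """
              singularity exec --containall --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}
            """
          }
        }
      }
    }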

    Note that I, and probably most users, cannot run the whole workflow for multiple subjects directly in /tmp due to lack of space, but the temporary files created by this specific workflow do not seem to take a lot of space.


    Another important point for GATK/Cromwell: could you please fix the link to the example configuration files found at
    cromwell.readthedocs.io/en/stable/Configuring/#configuration-examples
    The link currently points to a page that does not exist:
    www.github.com/broadinstitute/cromwell/tree/develop/cromwell.example.backends
  • YannLG Member
    Hi again,

    Does anyone know how to set temporary-directory per step in the Cromwell config file?

    Because, as I would have expected from previous experience with Picard, when I run multiple subjects on our server the /tmp dir gets filled up by one of the steps (in this case SortAndFixTags, via the SortSam GATK command).

    The way I am currently working around this is by editing the preprocessing WDL file and overriding the tmpdir option set by Cromwell when executing this specific GATK command:

    ${gatk_path} --java-options "-Djava.io.tmpdir=/mnt/scratch/tmp" \
    --java-options "-Dsamjdk.compression_level=${compression_level} ${java_opt_sort}" \
    SortSam \
    --INPUT ${input_bam} \
    --OUTPUT /dev/stdout \
    --SORT_ORDER "coordinate" \
    --CREATE_INDEX false \
    --CREATE_MD5_FILE false \
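
    One caveat with passing --java-options twice as above (my own note, not verified against the gatk wrapper source): depending on how repeated options are handled, the second value may replace the first. An alternative sketch, not taken from the published WDL, is to fold everything into a single --java-options string or to use the tool-level --TMP_DIR argument that GATK4's Picard-based tools accept; /mnt/scratch/tmp is the same assumed scratch path as above, and the trailing pipe into the tag-fixing command is omitted here:

    ${gatk_path} --java-options "-Dsamjdk.compression_level=${compression_level} ${java_opt_sort}" \
      SortSam \
      --INPUT ${input_bam} \
      --OUTPUT /dev/stdout \
      --SORT_ORDER "coordinate" \
      --TMP_DIR /mnt/scratch/tmp \
      --CREATE_INDEX false \
      --CREATE_MD5_FILE false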


    Best,
    Yann

    INFO 2019-06-19 13:40:24 SortSam Read 480,000,000 records. Elapsed time: 01:13:49s. Time for last 10,000,000: 90s. Last read position: chr21:25,893,423
    [Wed Jun 19 13:41:53 UTC 2019] picard.sam.SortSam done. Elapsed time: 75.40 minutes.
    Runtime.totalMemory()=12893290496
    To get help, see broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.util.RuntimeIOException: Problem writing temporary file file:///tmp/sortingcollection.1730967012672215706.tmp. Try setting TMP_DIR to a file system with lots of space.
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:261)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:183)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:187)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:36)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:16)
    at htsjdk.samtools.util.AbstractAsyncWriter$WriterRunnable.run(AbstractAsyncWriter.java:123)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:212)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:168)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:40)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:255)
    ... 6 more
    Suppressed: java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:360)
    at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:374)
    at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:334)
    at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:397)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:260)
    ... 6 more
    Caused by: java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:360)
    at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:374)
    at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:130)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    ... 10 more
  • bhanuGandham (Cambridge, MA) Member, Administrator, Broadie, Moderator admin

    Hi @YannLG

    This is a FireCloud question. I will move this to the FireCloud forum and someone there will be able to help you out.

  • Tiffany_at_Broad (Cambridge, MA) Member, Administrator, Broadie, Moderator admin

    Hi @YannLG - Sorry for the delay. I am trying to find someone who knows a bit more about this scenario than I. I will report back soon!

  • aednichols Member, Broadie

    This user is utilizing Cromwell with Singularity and NFS, which is a different use case than FireCloud.

    Support for this configuration is community-based.

  • YannLG Member
    My understanding is that the main problem (i.e. the slowdown) comes from the NFS usage (and is not related to Singularity in the end).