MarkDuplicatesSpark does not find a file even though it is there

Hi GATK Team.

I got the following error when using the MarkDuplicatesSpark tool:

A USER ERROR has occurred: Couldn't write file /media/Berechnungen/190218_NB501654_0110_AHT7VFBGX9/0046-19_Exom (Intelligenzminderung)/0046-19.dedup.bam because writing failed with exception File file:/media/Berechnungen/190218_NB501654_0110_AHT7VFBGX9/0046-19_Exom%20(Intelligenzminderung)/ does not exist

I looked at the specified path, and the header file is in the right directory.
Is it possible that MDSpark converts the space in the file path to %20 and never converts it back to a space, so it cannot find the header file?

Thanks in advance
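The hypothesis above can be illustrated with a small Python sketch of the general mechanism. This is not GATK's code (GATK is written in Java and its internals are not shown in this thread); it only shows how a local path turned into a `file:` URI gets its space percent-encoded as `%20`, and how code that later uses the URI's path without decoding it ends up looking for a directory whose name literally contains `%20`.

```python
from urllib.parse import quote, unquote, urlparse

# The directory from the error message, containing a space
local_dir = "/media/Berechnungen/190218_NB501654_0110_AHT7VFBGX9/0046-19_Exom (Intelligenzminderung)"

# Encode the path as a file: URI. The space becomes %20; "/" and the
# parentheses are kept literal, matching the URI shown in the error message.
uri = "file://" + quote(local_dir, safe="/()")

# Faulty round trip: using the URI's path component as-is keeps "%20",
# so it names a directory that does not exist on disk.
naive_path = urlparse(uri).path

# Correct round trip: percent-decoding restores the original space.
decoded_path = unquote(naive_path)

print(uri)
print(naive_path)
print(decoded_path)
```

The second printed path has the same form as the one in the "does not exist" message, while the third matches the real directory.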




  • bhanuGandham (Cambridge, MA) · Member, Administrator, Broadie, Moderator
    edited February 24

    Hi @StefanDiederichMainz

    1) Validate your BAM file with:
    2) For more info:
    3) If the above suggestions don't work, please post the exact command you are using, and I will have the dev team look into this for you.

  • bhanuGandham (Cambridge, MA) · Member, Administrator, Broadie, Moderator

    Hi @StefanDiederichMainz

    This looks like it could be caused by the space in the directory name "0046-19_Exom (Intelligenzminderung)". Try renaming the directory and running the tool again.
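A minimal Python sketch of the suggested rename, assuming (as in this thread) that only the last path component contains spaces. The helper name `rename_without_spaces` and the underscore replacement are hypothetical choices for illustration, and any pipeline steps that still reference the old directory name would need to be updated separately.

```python
import tempfile
from pathlib import Path


def rename_without_spaces(directory: str, replacement: str = "_") -> Path:
    """Hypothetical helper: rename `directory` so its last component has no spaces.

    Only the final path component is changed; spaces in parent directories
    are left alone. Returns the new path.
    """
    src = Path(directory)
    dst = src.with_name(src.name.replace(" ", replacement))
    if dst == src:
        return src  # no spaces in the name, nothing to rename
    if dst.exists():
        # Refuse to clobber an existing directory (os.rename may silently
        # replace an empty target directory on POSIX systems).
        raise FileExistsError(dst)
    src.rename(dst)
    return dst


# Usage demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "0046-19_Exom (Intelligenzminderung)"
    original.mkdir()
    renamed = rename_without_spaces(str(original))
    renamed_is_dir = renamed.is_dir()
    original_gone = not original.exists()
    print(renamed.name)
```

Replacing spaces with underscores keeps the rest of the name readable, so the new directory is still easy to match to the sample.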

  • Hi @bhanuGandham,
    I validated my SAM file and there was no error or warning:

    INFO 2019-02-26 08:13:52 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:16s. Time for last 10,000,000: 76s. Last read position: chr2:179,395,936
    INFO 2019-02-26 08:15:03 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:27s. Time for last 10,000,000: 71s. Last read position: chr17:26,951,085
    INFO 2019-02-26 08:16:12 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:37s. Time for last 10,000,000: 69s. Last read position: chr2:241,463,437
    INFO 2019-02-26 08:17:14 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:04:39s. Time for last 10,000,000: 62s. Last read position: chr6:150,719,216
    INFO 2019-02-26 08:18:18 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:05:43s. Time for last 10,000,000: 63s. Last read position: chr1:113,642,897
    INFO 2019-02-26 08:19:23 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:06:48s. Time for last 10,000,000: 64s. Last read position: chr15:44,002,729
    INFO 2019-02-26 08:20:29 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:07:54s. Time for last 10,000,000: 65s. Last read position: chr14:103,447,139
    INFO 2019-02-26 08:21:43 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:09:07s. Time for last 10,000,000: 73s. Last read position: chr7:98,601,934
    INFO 2019-02-26 08:23:03 SamFileValidator Validated Read 90,000,000 records. Elapsed time: 00:10:27s. Time for last 10,000,000: 79s. Last read position: chr13:32,841,278
    INFO 2019-02-26 08:24:16 SamFileValidator Validated Read 100,000,000 records. Elapsed time: 00:11:41s. Time for last 10,000,000: 73s. Last read position: chr15:42,041,753
    No errors found
    [Tue Feb 26 08:24:54 CET 2019] picard.sam.ValidateSamFile done. Elapsed time: 12.32 minutes.
    Runtime.totalMemory()=1945632768
    Tool returned: 0

    However, I think it must be due to the space in the path. Our pipeline creates this path automatically, and a lot of other things depend on that path name, so changing the whole pipeline would be difficult for me. Is there no other way to make MDSpark handle spaces in a file path? The non-Spark version of MarkDuplicates can handle file paths with spaces.

    Thanks in advance

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @StefanDiederichMainz

    Currently you cannot use spaces in file paths with MDSpark. I have created an issue ticket for the dev team, and you can follow its progress here:
