MarkDuplicatesSpark does not find file even though it is there

Hi GATK Team.

I got the following error when using the MarkDuplicatesSpark tool:

A USER ERROR has occurred: Couldn't write file /media/Berechnungen/190218_NB501654_0110_AHT7VFBGX9/0046-19_Exom (Intelligenzminderung)/0046-19.dedup.bam because writing failed with exception File file:/media/Berechnungen/190218_NB501654_0110_AHT7VFBGX9/0046-19_Exom%20(Intelligenzminderung)/0046-19.dedup.bam.parts/header does not exist

I looked at the specified path, and the header file is in the right directory.
Is it possible that MarkDuplicatesSpark replaces a space in the file path with %20 and does not convert it back at the end, so it cannot find the header file?
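The hypothesis above can be illustrated with a small round trip. This is a minimal sketch only, assuming that somewhere in the Spark/Hadoop write step the local path is turned into a `file:` URI; it does not reproduce GATK's actual code. Python's `urllib.parse.quote` also encodes parentheses by default, so `safe="/()"` is passed here to mimic the Java-style URI in the error message. The helper names `to_file_uri` and `uri_to_local_path` are hypothetical.

```python
import os
import tempfile
from urllib.parse import quote, unquote


def to_file_uri(local_path: str) -> str:
    # Percent-encode the path as a file: URI would: the space becomes %20.
    # safe="/()" keeps slashes and parentheses literal, like the error message.
    return "file:" + quote(local_path, safe="/()")


def uri_to_local_path(uri: str, decode: bool) -> str:
    # Strip the scheme; only turn %20 back into a space if decode is True.
    raw = uri[len("file:"):]
    return unquote(raw) if decode else raw


# Hypothetical layout modelled on the path in the error message.
base = tempfile.mkdtemp()
parts_dir = os.path.join(base, "0046-19_Exom (Intelligenzminderung)",
                         "0046-19.dedup.bam.parts")
os.makedirs(parts_dir)
header = os.path.join(parts_dir, "header")
open(header, "w").close()

uri = to_file_uri(header)
print(uri)  # ends with 0046-19_Exom%20(Intelligenzminderung)/0046-19.dedup.bam.parts/header

# Using the still-encoded string as a local path misses the real file...
print(os.path.exists(uri_to_local_path(uri, decode=False)))  # False
# ...while decoding %20 back to a space finds it.
print(os.path.exists(uri_to_local_path(uri, decode=True)))   # True
```

If some step checks for `header` using the encoded form, it reports "does not exist" even though the file is on disk, which matches the symptom described.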

Thanks in advance
Stefan

Tagged: Issue · GitHub (by bhanuGandham) · Issue Number: 5729 · State: open

Answers

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator, admin
    edited February 24

    Hi @StefanDiederichMainz

    1) Validate your BAM file with ValidateSamFile: http://broadinstitute.github.io/picard/command-line-overview.html#ValidateSamFile
    2) For more information, see: https://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile
    3) If the above suggestions don't work, please post the exact command you are using, and I will have the dev team look into this for you.

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator, admin

    Hi @StefanDiederichMainz

    It looks like this could be due to the space in the file path: "0046-19_Exom (Intelligenzminderung)". Try renaming the directory and running the tool again.

  • Hi @bhanuGandham,
    I validated my SAM file, and there were no errors or warnings:

    INFO 2019-02-26 08:13:52 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:16s. Time for last 10,000,000: 76s. Last read position: chr2:179,395,936
    INFO 2019-02-26 08:15:03 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:27s. Time for last 10,000,000: 71s. Last read position: chr17:26,951,085
    INFO 2019-02-26 08:16:12 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:37s. Time for last 10,000,000: 69s. Last read position: chr2:241,463,437
    INFO 2019-02-26 08:17:14 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:04:39s. Time for last 10,000,000: 62s. Last read position: chr6:150,719,216
    INFO 2019-02-26 08:18:18 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:05:43s. Time for last 10,000,000: 63s. Last read position: chr1:113,642,897
    INFO 2019-02-26 08:19:23 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:06:48s. Time for last 10,000,000: 64s. Last read position: chr15:44,002,729
    INFO 2019-02-26 08:20:29 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:07:54s. Time for last 10,000,000: 65s. Last read position: chr14:103,447,139
    INFO 2019-02-26 08:21:43 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:09:07s. Time for last 10,000,000: 73s. Last read position: chr7:98,601,934
    INFO 2019-02-26 08:23:03 SamFileValidator Validated Read 90,000,000 records. Elapsed time: 00:10:27s. Time for last 10,000,000: 79s. Last read position: chr13:32,841,278
    INFO 2019-02-26 08:24:16 SamFileValidator Validated Read 100,000,000 records. Elapsed time: 00:11:41s. Time for last 10,000,000: 73s. Last read position: chr15:42,041,753
    No errors found
    [Tue Feb 26 08:24:54 CET 2019] picard.sam.ValidateSamFile done. Elapsed time: 12.32 minutes.
    Runtime.totalMemory()=1945632768
    Tool returned: 0

    I also think it must be due to the space in the path. Our pipeline creates this path automatically, and a lot of other things depend on that path name, so it would be difficult for me to change the whole pipeline. Is there no other way to make MarkDuplicatesSpark handle spaces in a file path? The non-Spark version of MarkDuplicates can handle file paths with spaces...

    Thanks in advance
    Stefan

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator, admin

    Hi @StefanDiederichMainz

    Currently you cannot use spaces in file paths with MarkDuplicatesSpark. I have created an issue ticket for the dev team, and you can follow its progress here: https://github.com/broadinstitute/gatk/issues/5729
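Until the issue is resolved, one possible stopgap for a pipeline that cannot rename its directories is to hand MarkDuplicatesSpark a space-free alias of the directory, such as a symbolic link. This is an untested assumption, not a workaround confirmed by the GATK team. It assumes a POSIX file system that allows symlinks, and it assumes Spark only ever sees the alias path, not the resolved one. The helpers `space_free_alias` and `markduplicatesspark_cmd` and the input name `<sample>.bam` are hypothetical. The sketch only builds the `gatk MarkDuplicatesSpark -I ... -O ...` command; any other options your pipeline passes would still need to be added.

```python
import os
import re
import tempfile
from typing import List


def space_free_alias(directory: str, alias_root: str) -> str:
    """Hypothetical helper: symlink `directory` under a name without spaces."""
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", os.path.basename(directory))
    alias = os.path.join(alias_root, safe_name)
    # Reuse a link left over from an earlier run instead of failing.
    if not os.path.islink(alias):
        os.symlink(directory, alias)
    return alias


def markduplicatesspark_cmd(sample_dir: str, sample: str, alias_root: str) -> List[str]:
    alias = space_free_alias(sample_dir, alias_root)
    # Input and output both go through the alias, so no argument contains a space.
    return [
        "gatk", "MarkDuplicatesSpark",
        "-I", os.path.join(alias, sample + ".bam"),         # hypothetical input name
        "-O", os.path.join(alias, sample + ".dedup.bam"),
    ]


# Example with a throwaway directory standing in for the pipeline's real one.
root = tempfile.mkdtemp()
sample_dir = os.path.join(root, "0046-19_Exom (Intelligenzminderung)")
os.makedirs(sample_dir)
cmd = markduplicatesspark_cmd(sample_dir, "0046-19", alias_root=root)
print(cmd)
# subprocess.run(cmd, check=True) would then launch GATK.
```

Because the link points at the original directory, the `.dedup.bam` output still ends up where the rest of the pipeline expects it.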
