Samtools 'non-existent file' error stops the gatk4-germline-snps-indels/joint-discovery-gatk4 workflow

Hello,
I am trying to run a version of the joint-discovery-gatk4-local workflow, slightly adjusted to run with a SLURM backend (I am using GATK 4.0.12.0; the JSON and WDL files are both based on the 'local' version from github.com/gatk-workflows/gatk4-germline-snps-indels). When I run it with enough samples to trigger the scatter-gather of the metrics, the workflow stops at the "GatherMetrics" step. I get this error message:
htsjdk.samtools.SAMException: Cannot read non-existent file: file:///test_joint-call/cromwell-executions/JointGenotyping/0c5fec3d-ae6a-4740-b991-3c5832c36315/call-GatherMetrics/inputs/-343490749/test3000.0.variant_calling_detail_metrics.variant_calling_detail_metrics

This file (with the doubled suffix) indeed does not exist, but the file test3000.0.variant_calling_detail_metrics does exist in that location. Moreover, in the command line shown in the logs, the filename is correct and points to an existing, readable file:

```
Using GATK jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/gatk/4.0.12.0/gatk-package-4.0.12.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx2g -Xms2g -jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/gatk/4.0.12.0/gatk-package-4.0.12.0-local.jar AccumulateVariantCallingMetrics --INPUT /test_joint-call/cromwell-executions/JointGenotyping/0c5fec3d-ae6a-4740-b991-3c5832c36315/call-GatherMetrics/inputs/-343490749/test3000.0.variant_calling_detail_metrics --INPUT [... follows a long list of input shards ...] --OUTPUT test3000
```

Have you seen this problem before, and do you know how to solve it? These files are generated and named automatically, so it would be strange if there were really a problem reading one of them.

Many thanks,

Frederic

Answers

  • FredericF Member
    edited March 22

    So, the doubled extension was a good indicator of the problem: joint-discovery-gatk4-local.wdl passes the full names of the metrics shards to AccumulateVariantCallingMetrics, including the extension, whereas the tool expects the paths without the extension. (From the description of the input parameter: "Paths (except for the file extensions) of Variant Calling Metrics files to read and merge.")
    In the non-local script (joint-discovery-gatk4.wdl), the extension is indeed removed with sed when the --INPUT arguments are built.
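The prefix convention can be illustrated with a small Python sketch. It assumes, based on the parameter description and the doubled path in the error message, that the tool simply appends .variant_calling_detail_metrics to each --INPUT value; the helper names are hypothetical, and str.removesuffix requires Python 3.9 or later.

```python
# Hypothetical model of how AccumulateVariantCallingMetrics resolves each --INPUT
# value (assumed behavior): it treats the value as a path prefix and appends
# the detail-metrics extension itself.
DETAIL_EXT = ".variant_calling_detail_metrics"


def resolved_detail_path(input_arg: str) -> str:
    """Return the file the tool would try to open for one --INPUT value."""
    return input_arg + DETAIL_EXT


def strip_detail_ext(path: str) -> str:
    """Turn a shard filename into the extension-less prefix the tool expects."""
    return path.removesuffix(DETAIL_EXT)


shard = "inputs/-343490749/test3000.0.variant_calling_detail_metrics"

# Passing the full filename reproduces the doubled suffix from the error message.
print(resolved_detail_path(shard))
# Passing the stripped prefix points back at the file that actually exists.
print(resolved_detail_path(strip_detail_ext(shard)) == shard)  # True
```

The first print shows the same ".variant_calling_detail_metrics.variant_calling_detail_metrics" ending as the missing file in the error.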

    For anyone who faces this problem, here is what I did:
    I struggled a little to find a way to modify the input without Bash and WDL variable substitution clashing, but I followed a suggestion from https://gatkforums.broadinstitute.org/wdl/discussion/10933/accessing-bash-internal-variables and declared a String dollar = "$" in the GatherMetrics task.
    With that declaration in place, this command fixed the problem:

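        # Join all detail-metrics shard paths into one string separated by " --INPUT "
        modified_input="${sep=' --INPUT ' input_details_fofn}"
        # The dollar variable holds a literal dollar sign, so this renders as a Bash
        # pattern substitution that deletes every ".variant_calling_detail_metrics"
        modified_input=${dollar}{modified_input//.variant_calling_detail_metrics/}
    
        # Left unquoted on purpose: word splitting turns the string back into separate arguments
        ${gatk_path} --java-options "-Xmx2g -Xms2g" \
        AccumulateVariantCallingMetrics \
        --INPUT ${dollar}{modified_input} \
        --OUTPUT ${output_prefix}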
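If the command is built outside WDL (for example in a wrapper script), the same fix can be sketched in Python as an argument list. The gatk path, shard names and function name below are hypothetical. Unlike the Bash pattern substitution, which removes the extension anywhere in the joined string, removesuffix strips it only from the end of each path.

```python
import shlex

DETAIL_EXT = ".variant_calling_detail_metrics"


def accumulate_metrics_cmd(gatk_path, detail_shards, output_prefix):
    """Build a hypothetical AccumulateVariantCallingMetrics command as an argv list.

    Each shard path is passed as an extension-less prefix, mirroring the Bash
    fix; only a trailing extension is removed (Python 3.9+ for removesuffix).
    """
    cmd = [gatk_path, "--java-options", "-Xmx2g -Xms2g",
           "AccumulateVariantCallingMetrics"]
    for shard in detail_shards:
        cmd += ["--INPUT", shard.removesuffix(DETAIL_EXT)]
    cmd += ["--OUTPUT", output_prefix]
    return cmd


shards = [
    "inputs/1/test3000.0.variant_calling_detail_metrics",
    "inputs/2/test3000.1.variant_calling_detail_metrics",
]
# shlex.join quotes arguments so the line can be pasted into a shell
print(shlex.join(accumulate_metrics_cmd("gatk", shards, "test3000")))
```

This prints: gatk --java-options '-Xmx2g -Xms2g' AccumulateVariantCallingMetrics --INPUT inputs/1/test3000.0 --INPUT inputs/2/test3000.1 --OUTPUT test3000. Keeping the arguments as a list also avoids relying on word splitting, as the unquoted Bash variable does.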
    
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator

    Hi @FredericF

    This is a question the FireCloud team will be able to help you with, so I am moving it to the FireCloud forum.
