SamToFastqAndBwaMem error when running processing-for-variant-discovery-gatk4.wdl locally

AC321 Member
edited October 2018 in Ask the Cromwell + WDL Team

I am trying to run processing-for-variant-discovery-gatk4.wdl on my MacBook Pro. Instead of using the input files hosted on Google Cloud, I have downloaded the relevant files. I have also pared down the list of unmapped BAM files.

I am encountering the following error:

 [2018-10-10 15:41:36,28] [error] WorkflowManagerActor Workflow a3413d6f-687f-4adf-8490-2db96dd33fb8 failed (during ExecutingWorkflowState): Job PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stderr.
 Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/tmp.aa8f0e6d
19:41:20.966 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/gitc/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Oct 10 19:41:21 UTC 2018] SamToFastq INPUT=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2097736019/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam FASTQ=/dev/stdout INTERLEAVE=true INCLUDE_NON_PF_READS=true    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Oct 10 19:41:21 UTC 2018] Executing as [email protected] on Linux 4.9.93-linuxkit-aufs amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.16.0-SNAPSHOT
[Wed Oct 10 19:41:30 UTC 2018] picard.sam.SamToFastq done. Elapsed time: 0.16 minutes.
Runtime.totalMemory()=3014656000
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Error in writing fastq file /dev/stdout
    at htsjdk.samtools.fastq.BasicFastqWriter.write(BasicFastqWriter.java:66)
    at picard.sam.SamToFastq.writeRecord(SamToFastq.java:356)
    at picard.sam.SamToFastq.doWork(SamToFastq.java:206)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/script: line 33:    14 Exit 1                  java -Dsamjdk.compression_level=5 -Xms3000m -jar /usr/gitc/picard.jar SamToFastq INPUT=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2097736019/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true
        15 Killed                  | /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta /dev/stdin - 2> >(tee HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bwa.stderr.log >&2)
        16 Done                    | samtools view -1 - > HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bam

I haven't been able to resolve this. Any ideas?

Note: I have seen this related post by @EADG. Perhaps I am missing something, but the workflow I am using doesn't appear to use the sub() function.

Here is my inputs.json file:

{
  "##_COMMENT1": "SAMPLE NAME AND UNMAPPED BAMS",
  "PreProcessingForVariantDiscovery_GATK4.sample_name": "NA12878",
  "PreProcessingForVariantDiscovery_GATK4.ref_name": "hg38",
  "PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams_list": "/Users/username/dev/genomics/cromwell/inputs/source/NA12878_24RG_small.txt",
  "PreProcessingForVariantDiscovery_GATK4.unmapped_bam_suffix": ".bam",

  "##_COMMENT2": "REFERENCE FILES", 
  "PreProcessingForVariantDiscovery_GATK4.ref_dict": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.dict",
  "PreProcessingForVariantDiscovery_GATK4.ref_fasta": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta",
  "PreProcessingForVariantDiscovery_GATK4.ref_fasta_index": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.fai",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.alt",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.sa",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.amb",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.bwt",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.ann",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.fasta.64.pac",

  "##_COMMENT3": "KNOWN SITES RESOURCES", 
  "PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.dbsnp138.vcf",
  "PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf_index": "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
  "PreProcessingForVariantDiscovery_GATK4.known_indels_sites_VCFs": [
    "/Users/username/dev/genomics/cromwell/inputs/reference/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.known_indels.vcf.gz"
  ],
  "PreProcessingForVariantDiscovery_GATK4.known_indels_sites_indices": [
    "/Users/username/dev/genomics/cromwell/inputs/reference/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
    "/Users/username/dev/genomics/cromwell/inputs/reference/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
  ],

  "##_COMMENT4": "MISC PARAMETERS", 
  "PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
  "PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "4",

  "##_COMMENT5": "DOCKERS", 
  "PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
  "PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.4.0",
  "PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",

  "##_COMMENT6": "PATHS",   
  "PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
  "PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",

  "##_COMMENT7": "JAVA OPTIONS", 
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
  "PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.java_opt": "-Xms3000m",
  "PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.java_opt": "-Xms4000m",
  "PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_sort": "-Xms4000m",
  "PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_fix": "-Xms500m",
  "PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.java_opt": "-Xms4000m",
  "PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.java_opt": "-Xms3000m",
  "PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.java_opt": "-Xms3000m",
  "PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.java_opt": "-Xms2000m",

  "##_COMMENT8": "MEMORY ALLOCATION", 
  "PreProcessingForVariantDiscovery_GATK4.GetBwaVersion.mem_size": "1 GB",
  "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.mem_size": "14 GB",
  "PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.mem_size": "3500 MB",
  "PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.mem_size": "7 GB",
  "PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.mem_size": "5000 MB",
  "PreProcessingForVariantDiscovery_GATK4.CreateSequenceGroupingTSV.mem_size": "2 GB",
  "PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.mem_size": "6 GB",
  "PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.mem_size": "3500 MB",
  "PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.mem_size": "3500 MB",
  "PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.mem_size": "3 GB",

  "##_COMMENT9": "DISK SIZE ALLOCATION",
  "PreProcessingForVariantDiscovery_GATK4.agg_small_disk": 200,
  "PreProcessingForVariantDiscovery_GATK4.agg_medium_disk": 300,
  "PreProcessingForVariantDiscovery_GATK4.agg_large_disk": 400,
  "PreProcessingForVariantDiscovery_GATK4.flowcell_small_disk": 100,
  "PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

  "##_COMMENT10": "PREEMPTIBLES", 
  "PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
  "PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
}

And here is my unmapped bams list (just one file):

/Users/username/dev/genomics/cromwell/inputs/source/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam

I have made no edits to the WDL file.

Thanks in advance for any insight!

Answers

  • bshifaw Member, Broadie, Moderator admin

    Hi @AC321,

    You mentioned this is being run on your laptop. How is the disk space? Perhaps it is unable to write to stdout because there isn't enough space to write to.

    What is in the following files:
    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stderr
    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stdout
    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stdlog

    and any other log files within that directory? You can just upload them to the forum thread.
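
For anyone gathering the same information, here is a minimal sketch that prints each of the small bookkeeping files in a Cromwell execution directory and says explicitly when one is empty or missing. The function name `dump_execution_logs` is made up for illustration, and the file list is simply the set of names mentioned in this thread, not an exhaustive list of what Cromwell writes.

```shell
#!/bin/bash
# Print the small log/bookkeeping files from one execution directory.
# The body runs in a subshell ( ... ) so its variables stay local.
dump_execution_logs() (
  dir="$1"
  for name in stderr stdout stdlog rc docker_cid stdout.background stderr.background; do
    path="$dir/$name"
    if [ ! -e "$path" ]; then
      echo "== $name: does not exist"
    elif [ ! -s "$path" ]; then
      echo "== $name: empty"
    else
      echo "== $name:"
      cat "$path"
    fi
  done
)

# Usage (placeholder path, substitute your own execution directory):
# dump_execution_logs /path/to/cromwell-executions/WORKFLOW/ID/call-SamToFastqAndBwaMem/shard-0/execution
```

The output can be pasted straight into a reply, which avoids having to describe each file by hand.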

  • AC321 Member

    Hi @bshifaw ,

    Thanks for the prompt response. I have 307 GB available currently.

    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stderr:

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/tmp.aa8f0e6d
    19:41:20.966 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/gitc/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Oct 10 19:41:21 UTC 2018] SamToFastq INPUT=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2097736019/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam FASTQ=/dev/stdout INTERLEAVE=true INCLUDE_NON_PF_READS=true    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Wed Oct 10 19:41:21 UTC 2018] Executing as [email protected] on Linux 4.9.93-linuxkit-aufs amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.16.0-SNAPSHOT
    [Wed Oct 10 19:41:30 UTC 2018] picard.sam.SamToFastq done. Elapsed time: 0.16 minutes.
    Runtime.totalMemory()=3014656000
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.SAMException: Error in writing fastq file /dev/stdout
            at htsjdk.samtools.fastq.BasicFastqWriter.write(BasicFastqWriter.java:66)
            at picard.sam.SamToFastq.writeRecord(SamToFastq.java:356)
            at picard.sam.SamToFastq.doWork(SamToFastq.java:206)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
            at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
            at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/script: line 33:    14 Exit 1                  java -Dsamjdk.compression_level=5 -Xms3000m -jar /usr/gitc/picard.jar SamToFastq INPUT=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2097736019/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true
            15 Killed                  | /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta /dev/stdin - 2> >(tee HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bwa.stderr.log >&2)
            16 Done                    | samtools view -1 - > HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bam
    

    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stdout is empty.

    /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stdlog does not exist.

    Other log files in that directory:

    docker_cid:

    5b5eabd4b2860a1c6ff3d9bdc7f968b0a08babdb023549254daccefc3685553d
    

    rc:

    1
    

    stdout.background

    35008
    5b5eabd4b2860a1c6ff3d9bdc7f968b0a08babdb023549254daccefc3685553d
    

    stderr.background and HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bwa.stderr.log are also present, and both are empty.

    The directory also contains scripts:

    script:

    #!/bin/bash
    
    cd /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution
    tmpDir=$(mkdir -p "/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/tmp.aa8f0e6d" && echo "/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/tmp.aa8f0e6d")
    chmod 777 "$tmpDir"
    export _JAVA_OPTIONS=-Djava.io.tmpdir="$tmpDir"
    export TMPDIR="$tmpDir"
    export HOME="$HOME"
    (
    cd /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution
    
    )
    (
    cd /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution
    
    
      set -o pipefail
      set -e
    
      # set the bash variable needed for the command-line
      bash_ref_fasta=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2061131745/Homo_sapiens_assembly38.fasta
    
    java -Dsamjdk.compression_level=5 -Xms3000m -jar /usr/gitc/picard.jar \
        SamToFastq \
            INPUT=/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/inputs/2097736019/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam \
            FASTQ=/dev/stdout \
            INTERLEAVE=true \
            NON_PF=true \
      | \
    /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta /dev/stdin -  2> >(tee HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bwa.stderr.log >&2) \
      | \
    samtools view -1 - > HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bam
    )  > '/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stdout' 2> '/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/stderr'
    echo $? > /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/rc.tmp
    (
    # add a .file in every empty directory to facilitate directory delocalization on the cloud
    cd /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution
    find . -type d -empty -print0 | xargs -0 -I % touch %/.file
    )
    (
    cd /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution
    sync
    
    
    )
    mv /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/rc.tmp /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/rc
    

    script.background:

    #!/bin/bash
    '/bin/bash' '/Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/script.submit' < /dev/null || { rc=$?; if [ "$rc" -gt "128" ]; then echo $rc; else echo -1; fi } > /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/rc &
    echo $!
    

    script.submit:

    #!/bin/bash
    # make sure there is no preexisting Docker CID file
    rm -f /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/docker_cid
    # run as in the original configuration without --rm flag (will remove later)
    docker run \
      --cidfile /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/docker_cid \
      -i \
       \
      --entrypoint /bin/bash \
      -v /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0:/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0 \
      broadinstitute/genomes-in-the-cloud@sha256:4fca8ca945c17fd86e31eeef1c02983e091d4f2cb437199e74b164d177d5b2d1 /cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/script
    
    # get the return code (working even if the container was detached)
    rc=$(docker wait `cat /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/docker_cid`)
    
    # remove the container after waiting
    docker rm `cat /Users/acrane10/dev/genomics/cromwell/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/a3413d6f-687f-4adf-8490-2db96dd33fb8/call-SamToFastqAndBwaMem/shard-0/execution/docker_cid`
    
    # return exit code
    exit $rc
    

    Finally, the directory also contains a copy of HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.unmerged.bam.

    Let me know if you need any more info. Thanks!

  • bshifaw Member, Broadie, Moderator admin

    What was the command used to run the workflow?

  • I am stuck at exactly the same step. I even tried setting FASTQ=/proc/self/fd/1 in alignment.fc.wdl, but I still get the same error: "Error in writing fastq file /dev/stdout" or "Error in writing fastq file /proc/self/fd/1".
  • bshifaw Member, Broadie, Moderator admin

    Hi @jaideepjoshi,

    Review this post and let us know if it helps.
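
One reading of the original stderr, offered as an assumption rather than a confirmed diagnosis: the job-status lines list bwa mem as "Killed" and SamToFastq as "Exit 1", and a writer whose downstream reader has gone away cannot keep writing to the pipe. That would be consistent with "Error in writing fastq file /dev/stdout" being a symptom rather than the root cause. The sketch below reproduces only the general mechanism with standard tools; `yes` stands in for SamToFastq and `head` for bwa mem, and nothing here runs Picard or bwa.

```shell
#!/bin/bash
# Show that a producer fails once the process reading its output has exited.
# `yes` plays the writer (SamToFastq), `head` plays the reader (bwa mem).

status_file=$(mktemp)

# The producer runs inside a group so its own exit status can be recorded.
# The reader stops after one line, closing the pipe while `yes` is still writing.
{ yes "read"; echo "$?" > "$status_file"; } 2>/dev/null | head -n 1 > /dev/null

producer_status=$(cat "$status_file")
rm -f "$status_file"

# Non-zero: usually 128 + SIGPIPE, or a plain write error if SIGPIPE is ignored.
echo "producer exit status: $producer_status"
```

With `set -o pipefail` and `set -e`, as in the generated script, a non-zero status from any stage fails the whole task. If this reading applies, the thing to investigate is why bwa mem was killed (for example, the memory available to the container) rather than the write to /dev/stdout itself.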

  • Hi @bshifaw,
    Thanks for your willingness to help. I am looking at the post, but I do not see a "Killed" message in my stderr. I will keep checking whether I have a resource problem. I am currently running on a 7-machine HTCondor cluster; each machine has 120 GB of memory and 12 cores.

    I have tried multiple versions of Cromwell and hit the same issue with each. I am currently using cromwell-31.jar.

    Here is my stderr output

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/germline_single_sample_workflow/2433aa70-8f4f-435d-baf3-b7cef12be3f4/call-SamToFastqAndBwaMemAndMba/shard-13/tmp.464e1091
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/germline_single_sample_workflow/2433aa70-8f4f-435d-baf3-b7cef12be3f4/call-SamToFastqAndBwaMemAndMba/shard-13/tmp.464e1091
    Exception in thread "main" java.lang.IllegalArgumentException: Supplied String contains illegal character '
    '.
    at htsjdk.samtools.util.StringUtil.assertCharactersNotInString(StringUtil.java:191)
    at htsjdk.samtools.metrics.StringHeader.setValue(StringHeader.java:51)
    at htsjdk.samtools.metrics.StringHeader.<init>(StringHeader.java:43)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    07:48:44.316 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/gitc/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Fri Jan 25 07:48:44 UTC 2019] SamToFastq INPUT=/cromwell-executions/germline_single_sample_workflow/2433aa70-8f4f-435d-baf3-b7cef12be3f4/call-SamToFastqAndBwaMemAndMba/shard-13/inputs/centos-nfs/datasets/NA12878-small/NA12878-small-input-data/HK35M.8.NA12878.interval.filtered.query.sorted.unmapped.bam FASTQ=/dev/stdout INTERLEAVE=true INCLUDE_NON_PF_READS=true OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Fri Jan 25 07:48:44 UTC 2019] Executing as [email protected] on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.16.0-SNAPSHOT
    [M::bwa_idx_load_from_disk] read 3171 ALT contigs
    [W::main_mem] when '-p' is in use, the second query file is ignored.
    [Fri Jan 25 07:48:50 UTC 2019] picard.sam.SamToFastq done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=5024776192
    To get help, see web link
    Exception in thread "main" htsjdk.samtools.SAMException: Error in writing fastq file /dev/stdout
    at htsjdk.samtools.fastq.BasicFastqWriter.write(BasicFastqWriter.java:66)
    at picard.sam.SamToFastq.writeRecord(SamToFastq.java:356)
    at picard.sam.SamToFastq.doWork(SamToFastq.java:206)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)


    Here is my (slightly modified) alignment.fc.wdl:

    workflow Alignment{}

    # Get version of BWA
    task GetBwaVersion {
      command {
        # not setting set -o pipefail here because /bwa has a rc=1 and we dont want to allow rc=1 to succeed because
        # the sed may also fail with that error and that is something we actually want to fail on.
        /usr/gitc/bwa 2>&1 | \
        grep -e '^Version' | \
        sed 's/Version: //'
      }
      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.3-1513176735"
        memory: "1 GB"
      }
      output {
        String version = read_string(stdout())
      }
    }

    # Read unmapped BAM, convert on-the-fly to FASTQ and stream to BWA MEM for alignment, then stream to MergeBamAlignment
    task SamToFastqAndBwaMemAndMba {
      File input_bam
      String bwa_commandline
      String bwa_version
      String output_bam_basename
      File ref_fasta
      File ref_fasta_index
      File ref_dict

      # This is the .alt file from bwa-kit
      # listing the reference contigs that are "alternative".
      File ref_alt

      File ref_amb
      File ref_ann
      File ref_bwt
      File ref_pac
      File ref_sa
      Int compression_level
      Int preemptible_tries

      Float unmapped_bam_size = size(input_bam, "GB")
      Float ref_size = size(ref_fasta, "GB") + size(ref_fasta_index, "GB") + size(ref_dict, "GB")
      Float bwa_ref_size = ref_size + size(ref_alt, "GB") + size(ref_amb, "GB") + size(ref_ann, "GB") + size(ref_bwt, "GB") + size(ref_pac, "GB") + size(ref_sa, "GB")
      # Sometimes the output is larger than the input, or a task can spill to disk.
      # In these cases we need to account for the input (1) and the output (1.5) or the input(1), the output(1), and spillage (.5).
      Float disk_multiplier = 2.5
      Int disk_size = ceil(unmapped_bam_size + bwa_ref_size + (disk_multiplier * unmapped_bam_size) + 20)

      command <<<
        set -o pipefail
        set -e

        # set the bash variable needed for the command-line
        bash_ref_fasta=${ref_fasta}
        # if ref_alt has data in it,
        if [ -s ${ref_alt} ]; then
          java -Xms5000m -jar /usr/gitc/picard.jar \
            SamToFastq \
            INPUT=${input_bam} \
            FASTQ=/dev/stdout \
            INTERLEAVE=true \
            NON_PF=true | \
          /usr/gitc/${bwa_commandline} /dev/stdin - 2> >(tee ${output_bam_basename}.bwa.stderr.log >&2) | \
          java -Dsamjdk.compression_level=${compression_level} -Xms3000m -jar /usr/gitc/picard.jar \
            MergeBamAlignment \
            VALIDATION_STRINGENCY=SILENT \
            EXPECTED_ORIENTATIONS=FR \
            ATTRIBUTES_TO_RETAIN=X0 \
            ATTRIBUTES_TO_REMOVE=NM \
            ATTRIBUTES_TO_REMOVE=MD \
            ALIGNED_BAM=/dev/stdin \
            UNMAPPED_BAM=${input_bam} \
            OUTPUT=${output_bam_basename}.bam \
            REFERENCE_SEQUENCE=${ref_fasta} \
            PAIRED_RUN=true \
            SORT_ORDER="unsorted" \
            IS_BISULFITE_SEQUENCE=false \
            ALIGNED_READS_ONLY=false \
            CLIP_ADAPTERS=false \
            MAX_RECORDS_IN_RAM=2000000 \
            ADD_MATE_CIGAR=true \
            MAX_INSERTIONS_OR_DELETIONS=-1 \
            PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
            PROGRAM_RECORD_ID="bwamem" \
            PROGRAM_GROUP_VERSION="${bwa_version}" \
            PROGRAM_GROUP_COMMAND_LINE="${bwa_commandline}" \
            PROGRAM_GROUP_NAME="bwamem" \
            UNMAPPED_READ_STRATEGY=COPY_TO_TAG \
            ALIGNER_PROPER_PAIR_FLAGS=true \
            UNMAP_CONTAMINANT_READS=true \
            ADD_PG_TAG_TO_READS=false

          grep -m1 "read .* ALT contigs" ${output_bam_basename}.bwa.stderr.log | \
          grep -v "read 0 ALT contigs"

        # else ref_alt is empty or could not be found
        else
          exit 1;
        fi
      >>>
      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.3-1513176735"
        preemptible: preemptible_tries
        memory: "16 GB"
        cpu: "4"
        disks: "local-disk " + disk_size + " HDD"
      }
      output {
        File output_bam = "${output_bam_basename}.bam"
        File bwa_stderr_log = "${output_bam_basename}.bwa.stderr.log"
      }
    }

    task SamSplitter {
      File input_bam
      Int n_reads
      Int preemptible_tries
      Int compression_level

      Float unmapped_bam_size = size(input_bam, "GB")
      # Since the output bams are less compressed than the input bam we need a disk multiplier that's larger than 2.
      Float disk_multiplier = 2.5
      Int disk_size = ceil(disk_multiplier * unmapped_bam_size + 20)

      command {
        set -e
        mkdir output_dir

        total_reads=$(samtools view -c ${input_bam})

        java -Dsamjdk.compression_level=${compression_level} -Xms3000m -jar /usr/gitc/picard.jar SplitSamByNumberOfReads \
          INPUT=${input_bam} \
          OUTPUT=output_dir \
          SPLIT_TO_N_READS=${n_reads} \
          TOTAL_READS_IN_INPUT=$total_reads
      }
      output {
        Array[File] split_bams = glob("output_dir/*.bam")
      }
      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.3-1513176735"
        preemptible: preemptible_tries
        memory: "3.75 GB"
        disks: "local-disk " + disk_size + " HDD"
      }
    }
  • bshifaw Member, Broadie, Moderator admin

    Your stderr log file reports an illegal character, which may be a whitespace character. Check your input variables for stray whitespace, and check that none of the changes made to the workflow introduced any. The relevant line is:
    Exception in thread "main" java.lang.IllegalArgumentException: Supplied String contains illegal character ' '.

    It may help to check the script file in the working directory (/cromwell-executions/germline_single_sample_workflow/2433aa70-8f4f-435d-baf3-b7cef12be3f4/call-SamToFastqAndBwaMemAndMba/shard-13/) to identify the whitespace.

  • jaideepjoshi Member
    edited January 25
    Thanks @bshifaw. I see that too, but I have no clue where it is coming from. I would expect it to be in alignment.fc.wdl, but I don't see it there. Any ideas on where or how I could troubleshoot this?
    Thanks so much again.
  • bshifaw Member, Broadie, Moderator admin

    Yes, it looks like it's associated with the SamToFastqAndBwaMemAndMba task in alignment.fc.wdl. Since the error indicates that Java isn't satisfied with the command being used, you should look at the exact command being executed by checking the script file in the working directory (the same place you found the stderr log file).
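
One rough way to scan the generated script for a value that has picked up a line break is to flag every line containing an odd number of double quotes, since a quoted argument that spans two lines leaves an unmatched quote on each. This is only a heuristic sketch: the function name `find_unbalanced_quotes` is invented here, and lines that legitimately contain an odd number of quotes would be flagged too.

```shell
#!/bin/bash
# Flag lines with an odd number of double quotes.
# Splitting on '"' gives (number of quotes + 1) fields, so an even,
# non-zero field count means the line has an odd number of quotes.
find_unbalanced_quotes() {
  awk -F'"' 'NF > 0 && NF % 2 == 0 { print NR ": " $0 }' "$1"
}

# Usage (placeholder path to the script file in the call's execution directory):
# find_unbalanced_quotes /path/to/execution/script
```

Any line it reports is worth reading next to its neighbours in the script, because the opening and closing halves of a broken argument normally show up as a pair.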

  • jaideepjoshi Member
    edited January 26
    @bshifaw, thanks for the clue. I found the issue. For some reason GetBwaVersion gets executed twice and produces a stdout file with two entries. Both entries then end up in ${bwa_version}. This shows up in the SamToFastqAndBwaMemAndMba command like this:
    PROGRAM_GROUP_VERSION="0.7.15-r1140
    0.7.15-r1140" \
    If I hardcode this version in the MergeBamAlignment section I get past the error. and then fails AFTER successfully passing SamToFastqtoBwaMem
    I dont know why GetBwaVersion executes twice. I am trying various combinations of cromwell and gatk. I am running cromwell-31.jar and gatk 4.0.2.1.

    Script:
    java -Xms5000m -jar /usr/gitc/picard.jar \
    SamToFastq \
    INPUT=/cromwell-executions/germline_single_sample_workflow/0505b4a0-4ad4-4ddd-ba38-b37459be0ec6/call-SamToFastqAndBwaMemAndMba/shard-1/inputs/centos-nfs/datasets/NA12878-small/NA12878-small-input-data/HJYFJ.5.NA12878.downsampled.query.sorted.unmapped.bam \
    FASTQ=jj.fastq \
    INTERLEAVE=true \
    NON_PF=true | \
    /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta /dev/stdin - 2> >(tee HJYFJ.5.NA12878.downsampled.query.sorted..aligned.unsorted.bwa.stderr.log >&2) | \
    java -Dsamjdk.compression_level=2 -Xms3000m -jar /usr/gitc/picard.jar \
    MergeBamAlignment \
    VALIDATION_STRINGENCY=SILENT \
    EXPECTED_ORIENTATIONS=FR \
    ATTRIBUTES_TO_RETAIN=X0 \
    ATTRIBUTES_TO_REMOVE=NM \
    ATTRIBUTES_TO_REMOVE=MD \
    ALIGNED_BAM=/dev/stdin \
    UNMAPPED_BAM=/cromwell-executions/germline_single_sample_workflow/0505b4a0-4ad4-4ddd-ba38-b37459be0ec6/call-SamToFastqAndBwaMemAndMba/shard-1/inputs/centos-nfs/datasets/NA12878-small/NA12878-small-input-data/HJYFJ.5.NA12878.downsampled.query.sorted.unmapped.bam \
    OUTPUT=HJYFJ.5.NA12878.downsampled.query.sorted..aligned.unsorted.bam \
    REFERENCE_SEQUENCE=/cromwell-executions/germline_single_sample_workflow/0505b4a0-4ad4-4ddd-ba38-b37459be0ec6/call-SamToFastqAndBwaMemAndMba/shard-1/inputs/centos-nfs/datasets/NA12878-small/NA12878-small-reference-data/Homo_sapiens_assembly38.fasta \
    PAIRED_RUN=true \
    SORT_ORDER="unsorted" \
    IS_BISULFITE_SEQUENCE=false \
    ALIGNED_READS_ONLY=false \
    CLIP_ADAPTERS=false \
    MAX_RECORDS_IN_RAM=2000000 \
    ADD_MATE_CIGAR=true \
    MAX_INSERTIONS_OR_DELETIONS=-1 \
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
    PROGRAM_RECORD_ID="bwamem" \
    PROGRAM_GROUP_VERSION="0.7.15-r1140
    0.7.15-r1140" \
    PROGRAM_GROUP_COMMAND_LINE="bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta" \
    PROGRAM_GROUP_NAME="bwamem" \
    UNMAPPED_READ_STRATEGY=COPY_TO_TAG \
    ALIGNER_PROPER_PAIR_FLAGS=true \
    UNMAP_CONTAMINANT_READS=true \
    ADD_PG_TAG_TO_READS=false


    MergeBamAlignment \
    VALIDATION_STRINGENCY=SILENT \
    EXPECTED_ORIENTATIONS=FR \
    ATTRIBUTES_TO_RETAIN=X0 \
    ATTRIBUTES_TO_REMOVE=NM \
    ATTRIBUTES_TO_REMOVE=MD \
    ALIGNED_BAM=/dev/stdin \
    UNMAPPED_BAM=${input_bam} \
    OUTPUT=${output_bam_basename}.bam \
    REFERENCE_SEQUENCE=${ref_fasta} \
    PAIRED_RUN=true \
    SORT_ORDER="unsorted" \
    IS_BISULFITE_SEQUENCE=false \
    ALIGNED_READS_ONLY=false \
    CLIP_ADAPTERS=false \
    MAX_RECORDS_IN_RAM=2000000 \
    ADD_MATE_CIGAR=true \
    MAX_INSERTIONS_OR_DELETIONS=-1 \
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
    PROGRAM_RECORD_ID="bwamem" \
    PROGRAM_GROUP_VERSION="0.7.15-r1140" \
    PROGRAM_GROUP_COMMAND_LINE="${bwa_commandline}" \
    PROGRAM_GROUP_NAME="bwamem" \
    UNMAPPED_READ_STRATEGY=COPY_TO_TAG \
    ALIGNER_PROPER_PAIR_FLAGS=true \
    UNMAP_CONTAMINANT_READS=true \
    ADD_PG_TAG_TO_READS=false


    The error:
    230ec26-f780-41b9-9658-3c6ede4ee422/f230ec26-f780-41b9-9658-3c6ede4ee422-EngineJobExecutionActor-germline_single_sample_workflow.SamToFastqAndBwaMemAndMba:2:1/f230ec26-f780-41b9-9658-3c6ede4ee422-BackendJobExecutionActor-germline_single_sample_workflow.SamToFastqAndBwaMemAndMba:2:1/DispatchedConfigAsyncJobExecutionActor] DispatchedConfigAsyncJobExecutionActor [UUID(f230ec26)germline_single_sample_workflow.SamToFastqAndBwaMemAndMba:2:1]: Status change from WaitingForReturnCodeFile to Done
    [ERROR] [01/25/2019 15:49:32.581] [cromwell-system-akka.dispatchers.engine-dispatcher-51] [akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor] WorkflowManagerActor Workflow f230ec26-f780-41b9-9658-3c6ede4ee422 failed (during ExecutingWorkflowState): Failed to evaluate job outputs:
    Bad output 'interval_count': For input string: "50
    50"
    cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
    Bad output 'interval_count': For input string: "50
    50"
    at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:688)
    at scala.util.Success.$anonfun$map$1(Try.scala:251)
    at scala.util.Success.map(Try.scala:209)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
  • There must be some setting in the json/cromwell/condor that is making the tasks run twice. Any pointers are greatly appreciated.
  • bshifaw Member, Broadie, Moderator admin

    @jaideepjoshi
    What command are you using to execute the workflow? Please also upload the JSON file being used.

  • @bshifaw: Here is my issue. Beginning with cromwell-31.jar + HTCondor, the commands appear to be executed "twice".

    I use the following HelloWorld.wdl file:
    task hello {
      String name
      command {
        echo 'Hello ${name}!'
      }
      output {
        File response = stdout()
      }
      runtime {
        docker: "ubuntu"
        memory: "1GB"
        disk: "2MB"
      }
    }
    workflow helloWorld {
      call hello
    }

    I use the following HelloWorld.json:
    {
    "helloWorld.hello.name": "World"
    }

    The following is in my cromwell-conf file:
    backend {
      default = "HtCondor"
      providers {
        HtCondor {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            runtime-attributes = """
              Int cpu = 1
              Float memory_mb = 512.0
              Float disk_kb = 10000000.0
              String? nativeSpecs
              String? docker
            """

            submit = """
              chmod 755 ${script}
              cat > ${cwd}/execution/submitFile <<EOF
              Iwd=${cwd}/execution
              requirements=${nativeSpecs}
              leave_in_queue=true
              request_memory=${memory_mb}
              request_disk=${disk_kb}
              error=${err}
              output=${out}
              log_xml=true
              request_cpus=${cpu}
              executable=${script}
              log=${cwd}/execution/execution.log
              queue
              EOF
              condor_submit ${cwd}/execution/submitFile
            """

            submit-docker = """
              chmod 755 ${script}
              cat > ${cwd}/execution/dockerScript <<EOF
              #!/bin/bash
              docker run --rm -i -v ${cwd}:${docker_cwd} ${docker} /bin/bash ${script}
              EOF
              chmod 755 ${cwd}/execution/dockerScript
              cat > ${cwd}/execution/submitFile <<EOF
              Iwd=${cwd}/execution
              requirements=${nativeSpecs}
              leave_in_queue=true
              request_memory=${memory_mb}
              request_disk=${disk_kb}
              error=${cwd}/execution/stderr
              output=${cwd}/execution/stdout
              log_xml=true
              request_cpus=${cpu}
              executable=${cwd}/execution/dockerScript
              log=${cwd}/execution/execution.log
              queue
              EOF
              condor_submit ${cwd}/execution/submitFile
            """

            kill = "condor_rm ${job_id}"
            check-alive = "condor_q ${job_id}"
            job-id-regex = "(?sm).*cluster (\\d+)..*"
          }
        }
      }
    }

    If I run Cromwell as follows:
    java -jar -Dconfig.file=reference.conf cromwell-30.2.jar run HelloWorld.wdl --inputs HelloWorld.json

    The command completes with rc=0 and stdout CORRECTLY reporting
    cat stdout
    Hello World!

    If I run any later version of Cromwell, with everything else UNCHANGED:
    java -jar -Dconfig.file=reference.conf cromwell-31.jar run HelloWorld.wdl --inputs HelloWorld.json

    The command completes with rc=0 and stdout INCORRECTLY reporting the following:
    cat stdout
    Hello World!
    Hello World!

    This happens with cromwell-31.jar through cromwell-36.jar.

    If I run the following command WITHOUT HTCondor:
    java -jar cromwell-31.jar run HelloWorld.wdl --inputs HelloWorld.json
    The command completes with rc=0 and stdout CORRECTLY reporting the following:
    cat stdout
    Hello World!

    Can you please suggest a fix? We are stuck.

  • aednichols Member, Broadie

    I'm wondering if this may just be a problem with standard out being piped around and duplicated somehow.

    If you run a task with a side effect (like create a new file with the current Unix time as its name), do you see the same thing?
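A side-effect check along those lines could look like the sketch below. The helper names `record_run` and `count_runs` are made up for illustration; in a real test the body of `record_run` would be the WDL task's command, and the marker directory would need to be somewhere the job can write to (for example under the call's execution directory). `mktemp` is used so that two runs within the same second still produce two distinct files.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: each call to record_run leaves one uniquely named
# marker file whose name includes the current Unix time.
record_run() {
  mktemp "$1/run.$(date +%s).XXXXXX" > /dev/null
}

# Count the marker files left behind in a directory.
count_runs() {
  find "$1" -type f -name 'run.*' | wc -l | tr -d ' '
}

markers=$(mktemp -d)
record_run "$markers"     # stands in for one execution of the task command
count_runs "$markers"
```

If the command really runs twice, two marker files appear after a single workflow run; if only one appears while stdout still shows two lines, the duplication is happening in how stdout is captured rather than in the execution itself.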

  • jaideepjoshi Member
    edited February 4

    @aednichols, thanks for jumping in. It appears to be a Cromwell + HTCondor issue. However, I am not astute enough to figure out what is happening. Here is further evidence:

    Here is the script file that is created when I run cromwell-30.2.jar with the htcondor stanza in the conf file:

    #!/bin/bash

    tmpDir=$(
    set -e
    cd /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution
    tmpDir="$(mktemp -d "$PWD"/tmp.XXXXXX)"
    echo "$tmpDir"
    )
    chmod 777 $tmpDir
    export _JAVA_OPTIONS=-Djava.io.tmpdir=$tmpDir
    export TMPDIR=$tmpDir
    (
    cd /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution

    )
    (
    cd /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution
    echo 'Hello World!'
    )
    echo $? > /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution/rc.tmp
    (
    cd /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution

    )
    (
    cd /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution
    sync
    )
    mv /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution/rc.tmp /cromwell-executions/helloWorld/0a9b2fb8-6ffb-403d-b21b-d927e94b51d2/call-hello/execution/rc

    Here is the script file that is created when I run cromwell-31.jar (up to cromwell-36.jar) with the htcondor stanza in the conf file:

    #!/bin/bash

    tmpDir=$(
    set -e
    cd /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution
    tmpDir="$(mkdir -p "/cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/tmp.7bfa6e47" && echo "/cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/tmp.7bfa6e47")"
    echo "$tmpDir"
    )
    chmod 777 "$tmpDir"
    export _JAVA_OPTIONS=-Djava.io.tmpdir="$tmpDir"
    export TMPDIR="$tmpDir"
    export HOME="$HOME$HOME$HOME$HOME$HOME"
    (
    cd /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution

    )
    (
    cd /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution

    echo 'Hello World!'
    ) > >(tee '/cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution/stdout') 2> >(tee '/cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution/stderr' >&2)
    echo $? > /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution/rc.tmp
    (
    cd /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution
    sync

    )
    mv /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution/rc.tmp /cromwell-executions/helloWorld/52ed8599-6f10-4ef1-b9bb-2b1029f31ae3/call-hello/execution/rc

  • aednichols Member, Broadie

    Hmm, those scripts are a bit hard to read but I definitely see only one "Hello World!" in each.

  • jaideepjoshi Member
    edited February 5

    Yet the first script, when created and executed by cromwell-30.2.jar + condor, produces a stdout with a single Hello World!, while the second script, when created and run by any later jar + condor, produces two lines of Hello World! in stdout. The net result is that this pipeline fails when call-GetBwaVersion produces two lines in stdout.
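One possible explanation, not confirmed anywhere in this thread, follows from comparing the two scripts with the conf file: the newer script copies the command's stdout into execution/stdout with tee and also passes it through, while the submit-docker stanza tells HTCondor to capture the job's stdout with output=${cwd}/execution/stdout, which is the same path. The sketch below only illustrates that mechanism with made-up file names and a made-up function `task_with_tee`; it uses a plain pipe instead of the process substitution in the real script so that the demo finishes deterministically.

```bash
#!/usr/bin/env bash
# Emulates a task script that copies its stdout into the file "$1"
# and also passes the same output through to its own stdout.
task_with_tee() {
  echo 'Hello World!' | tee "$1"
}

work=$(mktemp -d)
# Scheduler-style capture: the job's stdout is redirected to a second file.
task_with_tee "$work/script_copy" > "$work/scheduler_copy"

# One command execution, yet both files received the line. If both writers
# were pointed at the same path, that path could end up holding it twice.
cat "$work/script_copy" "$work/scheduler_copy" | wc -l
```

If this is what is happening, pointing output= and error= in the submit file at names other than execution/stdout and execution/stderr might avoid the duplicate lines; that change is an untested assumption here.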

  • @aednichols @bshifaw congrats on the new release of 4.1. I wish I could use it with my HTCondor setup, but I am limited by the fact that I cannot go past cromwell-30.2.jar because of this issue. Any help is greatly appreciated.
