A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed

Hi Team,

We are running Cromwell on AWS, executing five-dollar-genome-analysis-pipeline-master with the GATK toolset. After resolving a couple of issues related to missing files, NIO, etc., we are now facing another issue related to file indexing during BaseRecalibrator processing.

Could you please help us identify the missing index files, or point us to the script responsible for indexing the BAM files?

Here is the error stack trace for your reference.

"A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.

00:46:33
Please index all input files:

00:46:33
samtools index /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/29c2f87f-54e6-47d3-aa46-b062cee5df57/call-to_bam_workflow/ToBam.to_bam_workflow/3e2283ad-6b4d-4e5d-af68-438f9af13843/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam"

Do let us know if any other information is required. Thanks in advance!

Answers

  • ssb_cromwell Member
    Just an update: the error is asking us to provide an index file, referring to "samtools index /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/29c2f87f-54e6-47d3-aa46-b062cee5df57/call-to_bam_workflow/ToBam.to_bam_workflow/3e2283ad-6b4d-4e5d-af68-438f9af13843/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam"


    But its index file is already available with extension .bai on AWS S3.

    Does this mean it is not able to locate it over S3?
  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator admin

    Hi @ssb_cromwell

    I will move this over to the firecloud forum and someone from the cromwell/firecloud team will help you out with it.

  • ssb_cromwell Member
    Thanks @bhanuGandham,

    Hi Firecloud Team,

    Another observation: I can see the .bam is successfully copied from S3 to the local /cromwell_root, but no copy command is executed for the corresponding .bai. Is this normal behaviour, or a bug that skips copying the .bai from S3 to the local disk even though the file is very much available there?
    I suspect the "samtools index" step needs both files, but only one was copied over from S3. Please suggest how to push this file from S3 to /cromwell_root on the local disk.
  • AdelaideR Member admin

    @ssb_cromwell

    A way to check which files are in your Cromwell folder and findable by the workflow is to install the Google Cloud SDK tools on your machine.

    Once installed, you can see what files are in the bucket by opening a terminal and using a command such as:

    gsutil ls gs://fc-bb08d0xx-92xx-4278-axxx-2xxxxxxxxxxx/some_directory/

    Of course, you would replace that string with your own FireCloud bucket.

    If the files have not copied over, it is possible to do so; there are some easy-to-follow instructions here.
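
    Since in this setup the data lives on AWS S3 rather than a Google bucket, the equivalent check could be done with the AWS CLI instead (assuming it is installed and has read access to the bucket); the path below is only an illustration built from the bucket name in your logs, with a placeholder workflow id:

    # List the SortSampleBam outputs to confirm both the .bam and the .bai exist
    aws s3 ls s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/<workflow-id>/call-to_bam_workflow/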

  • ssb_cromwell Member
    Hi @AdelaideR,

    Thanks for your response!

    I'm not sure if I was able to make my point clear enough. Let me try to explain a bit further.

    We are using AWS S3 to host all reference files for the workflows, which run on a Cromwell server created on an AWS EC2 instance. We are very much able to run workflows and have completed a couple of them from "five-dollar-genome-analysis-pipeline". However, while performing the bam_processing workflow, the Cromwell server internally fetches the required files from their S3 locations and places them on the Cromwell server's mount location on the EC2 instance (my observation). It downloads all the files, but no download is logged for the .bai corresponding to the .bam, and it then throws the above-mentioned error for that .bam file.

    Is it normal that the .bai file is not copied to the EC2 Cromwell's local-disk mount? If so, why does it throw an error for the missing index file?
    OR
    Is there some issue in the code that skips copying the .bai from the S3 location?

    Please note we don't have any issues viewing the files at the S3 location; that can be done through the AWS console UI.

    Regards
    Satwinder Singh
  • AdelaideR Member admin

    @ssb_cromwell Can you please link me to the WDL that is in your pipeline? Or paste it in here? There are a few versions available, and I want to make sure we are working from the same one.

  • AdelaideR Member admin

    One potential cause could be that it is looking for BAM files with the suffix "unmapped.bam"; this can be changed in the configuration, before launching the method, to match your existing suffix "aligned.duplicate_marked.sorted.bam".

    Or are your inputs "unmapped.bam"?

  • bshifaw Member, Broadie, Moderator admin

    Hi @ssb_cromwell ,

    Would it be possible to share your workspace with [email protected] so that our team can investigate the issue? Also, please provide the name of the workspace as well as the failed submission ID.

  • AdelaideR Member admin

    @ssb_cromwell Did you figure it out? Please post an update if so, or share the workspace so we can take a look.

  • Hi @bshifaw, we are implementing Cromwell on AWS using an EC2 instance, so I'm not sure what exactly you mean by workspace. I can provide you with logs, though.

    @AdelaideR, there is no breakthrough yet and we are still looking for clues. Attached are bam_processing.wdl and fc_germline_single_sample_workflow.wdl for your reference. Let me know if anything else is required.

    Thanks for all your support!
  • I'm not able to post or attach .wdl files here.
  • Using GATK jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc_log.log -Xms4000m -jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar BaseRecalibrator -R /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.fasta -I /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/fdfa31eb-48b5-4524-b165-6280f04dbee3/call-to_bam_workflow/ToBam.to_bam_workflow/31cb9a5b-e292-42a3-9ef3-121309876c85/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam --useOriginalQualities -O NA12878.recal_data.csv -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.dbsnp138.vcf -knownSites /cromwell_root/cromwelleast/references/broad-references/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.known_indels.vcf.gz -L chr11:1+
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/fdfa31eb-48b5-4524-b165-6280f04dbee3/call-to_bam_workflow/ToBam.to_bam_workflow/31cb9a5b-e292-42a3-9ef3-121309876c85/call-BaseRecalibrator/shard-10/tmp.ac8b7b93
    [January 8, 2019 9:44:02 PM UTC] BaseRecalibrator --useOriginalQualities true --knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.dbsnp138.vcf --knownSites /cromwell_root/cromwelleast/references/broad-references/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.known_indels.vcf.gz --output NA12878.recal_data.csv --intervals chr11:1+ --input /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/fdfa31eb-48b5-4524-b165-6280f04dbee3/call-to_bam_workflow/ToBam.to_bam_workflow/31cb9a5b-e292-42a3-9ef3-121309876c85/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam --reference /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.fasta --mismatches_context_size 2 --indels_context_size 3 --maximum_cycle_value 500 --mismatches_default_quality -1 --insertions_default_quality 45 --deletions_default_quality 45 --low_quality_tail 2 --quantizing_levels 16 --bqsrBAQGapOpenPenalty 40.0 --preserve_qscores_less_than 6 --enableBAQ false --computeIndelBQSRTables false --defaultBaseQualities -1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
    [January 8, 2019 9:44:02 PM UTC] Executing as [email protected] on Linux 4.14.88-72.73.amzn1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Version: 4.beta.5
    [January 8, 2019 9:44:06 PM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=4019716096
    ***********************************************************************
    A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.
    Please index all input files:
    samtools index /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/fdfa31eb-48b5-4524-b165-6280f04dbee3/call-to_bam_workflow/ToBam.to_bam_workflow/31cb9a5b-e292-42a3-9ef3-121309876c85/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam
    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--javaOptions '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
  • bshifaw Member, Broadie, Moderator admin
    edited January 10

    Ahh, I was a bit confused about how the workflow was being run. Never mind about the workspace.
    After looking over the BaseRecalibrator task in the five-dollar pipeline, it looks like the .bai isn't required there: Cromwell only localizes files that are specified in the task block, and the .bai isn't specified in this task while the .bam is (input_bam). Thus Cromwell isn't supposed to copy over the .bai file.
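
    As a minimal illustration of that rule (a hypothetical task, not one from the pipeline): only inputs declared as File are copied under /cromwell_root before the command runs.

    task ShowLocalization {
      File bam            # declared as File, so Cromwell copies the .bam locally
      # File bam_index    # not declared, so the matching .bai stays in the bucket
      command {
        ls -l ${bam}      # only the BAM will be present next to this local path
      }
      output {
        File listing = stdout()
      }
    }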

  • bshifaw Member, Broadie, Moderator admin
    edited January 10

    @ssb_cromwell
    Would you mind testing the workflow using the us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135 docker image? The pipeline was tested with this docker image, and any other image may cause problems.

  • Thanks for the response @bshifaw. I understand why it is not localizing the .bai file, but the error confused us: it is asking for the index file for the .bam, which is obviously the .bai file.

    Sure, we can try to use the mentioned docker image and re-run the workflow. I'll keep you posted!
  • Hi @bshifaw, we re-ran the workflow after changing the docker image as suggested, but now it is throwing the error below.

    [2019-01-11 01:18:46,13] [info] AwsBatchAsyncBackendJobExecutionActor [70990161to_bam_workflow.BaseRecalibrator:12:1]: Status change from Running to Failed
    [2019-01-11 01:19:29,51] [info] AwsBatchAsyncBackendJobExecutionActor [70990161to_bam_workflow.BaseRecalibrator:4:1]: Status change from Running to Failed
    [2019-01-11 01:20:12,29] [info] AwsBatchAsyncBackendJobExecutionActor [70990161to_bam_workflow.BaseRecalibrator:0:1]: Status change from Running to Failed
    [2019-01-11 01:20:35,89] [info] AwsBatchAsyncBackendJobExecutionActor [70990161to_bam_workflow.BaseRecalibrator:5:1]: Status change from Running to Failed
    [2019-01-11 01:20:36,94] [error] WorkflowManagerActor Workflow 667332c0-398b-4666-92f0-0ff5b6785cf8 failed (during ExecutingWorkflowState): Job to_bam_workflow.BaseRecalibrator:13:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-13/BaseRecalibrator-13-stderr.log.
    Could not retrieve content: /tmp/temp-s3-5715547661287512420cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-13_BaseRecalibrator-13-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:7:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 87588FDFC01B7795)
    Job to_bam_workflow.BaseRecalibrator:5:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-5/BaseRecalibrator-5-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: CA0C74CAF6488F7D)
    Job to_bam_workflow.BaseRecalibrator:14:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-14/BaseRecalibrator-14-stderr.log.
    Could not retrieve content: /tmp/temp-s3-1973107301335651593cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-14_BaseRecalibrator-14-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:8:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-8/BaseRecalibrator-8-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 098DF4DF94F45C63)
    Job to_bam_workflow.BaseRecalibrator:3:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-3/BaseRecalibrator-3-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: A2F3B54093E6DA75)
    Job to_bam_workflow.BaseRecalibrator:0:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-0/BaseRecalibrator-0-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 2D99286F1E3DDCB5)
    Job to_bam_workflow.BaseRecalibrator:15:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-15/BaseRecalibrator-15-stderr.log.
    Could not retrieve content: /tmp/temp-s3-4070492110274687565cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-15_BaseRecalibrator-15-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:6:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-6/BaseRecalibrator-6-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 15A5A5462EB165BD)
    Job to_bam_workflow.BaseRecalibrator:11:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-11/BaseRecalibrator-11-stderr.log.
    Could not retrieve content: /tmp/temp-s3-3190030129917803129cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-11_BaseRecalibrator-11-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:1:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-1/BaseRecalibrator-1-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: F584B9C52AB8349C)
    Job to_bam_workflow.BaseRecalibrator:10:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-10/BaseRecalibrator-10-stderr.log.
    Could not retrieve content: /tmp/temp-s3-3550670777646749355cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-10_BaseRecalibrator-10-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:12:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-12/BaseRecalibrator-12-stderr.log.
    Could not retrieve content: /tmp/temp-s3-8212448124103011357cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-12_BaseRecalibrator-12-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:2:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-2/BaseRecalibrator-2-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 29B7DFA494238269)
    Job to_bam_workflow.BaseRecalibrator:16:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-16/BaseRecalibrator-16-stderr.log.
    Could not retrieve content: /tmp/temp-s3-2601903498696103991cromwell-execution_germline_single_sample_workflow_667332c0-398b-4666-92f0-0ff5b6785cf8_call-to_bam_workflow_ToBam.to_bam_workflow_70990161-892f-4d43-96a1-4ad447f29d97_call-BaseRecalibrator_shard-16_BaseRecalibrator-16-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:4:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-4/BaseRecalibrator-4-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: E0FE9D6ABE6DEA22)
    Job to_bam_workflow.BaseRecalibrator:9:1 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-9/BaseRecalibrator-9-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 3B8F7305CD457ABA)
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    Caused by: java.io.IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:146)
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:145)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at scala.util.Failure.recoverWith(Try.scala:232)
    at cromwell.engine.io.nio.NioFlow.withReader(NioFlow.scala:145)
    at cromwell.engine.io.nio.NioFlow.limitFileContent(NioFlow.scala:154)
    at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:98)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:85)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:336)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:357)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:303)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:350)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at better.files.File.newInputStream(File.scala:337)
    at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
    at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
    at cromwell.filesystems.s3.S3Path.newInputStream(S3PathBuilder.scala:156)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream(EvenBetterPathMethods.scala:94)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream$(EvenBetterPathMethods.scala:91)
    at cromwell.filesystems.s3.S3Path.mediaInputStream(S3PathBuilder.scala:156)
    at cromwell.engine.io.nio.NioFlow.$anonfun$withReader$1(NioFlow.scala:145)
    at cromwell.util.TryWithResource$.$anonfun$tryWithResource$1(TryWithResource.scala:14)
    at scala.util.Try$.apply(Try.scala:209)
    at cromwell.util.TryWithResource$.tryWithResource(TryWithResource.scala:10)
    ... 14 more

    [2019-01-11 01:20:36,95] [info] WorkflowManagerActor WorkflowActor-667332c0-398b-4666-92f0-0ff5b6785cf8 is in a terminal state: WorkflowFailedState
  • bshifaw Member, Broadie, Moderator admin

    What's written in s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-4/BaseRecalibrator-4-stderr.log?

  • Hi @bshifaw ,

    This is what we can see in "BaseRecalibrator-4-stderr.log":

    Using GATK jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc_log.log -Xms4000m -jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar BaseRecalibrator -R /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.fasta -I s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam --useOriginalQualities -O NA12878.recal_data.csv -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.dbsnp138.vcf -knownSites /cromwell_root/cromwelleast/references/broad-references/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.known_indels.vcf.gz -L chr5:1+
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-BaseRecalibrator/shard-4/tmp.5b9fc467
    [January 11, 2019 1:14:26 AM UTC] BaseRecalibrator --useOriginalQualities true --knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.dbsnp138.vcf --knownSites /cromwell_root/cromwelleast/references/broad-references/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.known_indels.vcf.gz --output NA12878.recal_data.csv --intervals chr5:1+ --input s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/667332c0-398b-4666-92f0-0ff5b6785cf8/call-to_bam_workflow/ToBam.to_bam_workflow/70990161-892f-4d43-96a1-4ad447f29d97/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam --reference /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.fasta --mismatches_context_size 2 --indels_context_size 3 --maximum_cycle_value 500 --mismatches_default_quality -1 --insertions_default_quality 45 --deletions_default_quality 45 --low_quality_tail 2 --quantizing_levels 16 --bqsrBAQGapOpenPenalty 40.0 --preserve_qscores_less_than 6 --enableBAQ false --computeIndelBQSRTables false --defaultBaseQualities -1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
    [January 11, 2019 1:14:26 AM UTC] Executing as [email protected] on Linux 4.14.88-72.73.amzn1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Version: 4.beta.5
    [January 11, 2019 1:14:27 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=4019716096
    java.nio.file.ProviderNotFoundException: Provider "s3" not found
    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
    at org.broadinstitute.hellbender.utils.io.IOUtils.getPath(IOUtils.java:535)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.RequiredReadInputArgumentCollection.getReadPaths(RequiredReadInputArgumentCollection.java:34)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeReads(GATKTool.java:318)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:556)
    at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:55)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:117)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
    at org.broadinstitute.hellbender.Main.main(Main.java:233)
  • bshifaw Member, Broadie, Moderator admin
    edited January 11

    According to the stderr, BaseRecalibrator was unable to run because the input BAM path it was given starts with s3://. This is due to the input_bam variable in the BaseRecalibrator task being declared as a String to enable NIO on Google Cloud.

    You mentioned earlier that you resolved some NIO-related errors in the workflow; this one might be fixed the same way.
    I'd suggest converting the declaration of the input_bam variable from String to File (you might want to do the same for the other NIO File variables).
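
    Roughly, the change in the task block is just this (a sketch of the relevant declaration only):

    # Before (NIO-style: the path is passed as a plain string, so the tool sees an s3:// URL it cannot open):
    # String input_bam
    # After (Cromwell localizes the BAM onto the worker's disk before the command runs):
    File input_bam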

  • Hi @bshifaw ,

    I have changed input_bam from String to File and switched to the docker image you suggested in the bam_processing.wdl file. Please see below for reference:

    task BaseRecalibrator {
    File input_bam
    String recalibration_report_filename
    Array[String] sequence_group_interval
    File dbSNP_vcf
    File dbSNP_vcf_index
    Array[File] known_indels_sites_VCFs
    Array[File] known_indels_sites_indices
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    # Int bqsr_scatter
    Int preemptible_tries

    Images used in various tasks defined in bam_processing.wdl:

    task CheckContamination:
    docker: "us.gcr.io/broad-gotc-prod/verify-bam-id:c8a66425c312e5f8be46ab0c41f8d7a1942b6e16-1500298351"

    tasks SortSam, MarkDuplicates, BaseRecalibrator, ApplyBQSR, GatherBQSRReports, GatherSortedBamFiles, GatherUnsortedBamFiles:
    docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"

    task SortSamSpark:
    docker: "us.gcr.io/broad-gatk/gatk:4.0.12.0"

    -----------------------------

    But somehow it failed again with the reason below:

    ***********************************************************************

    A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.
    Please index all input files:

    samtools index /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam


    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--javaOptions '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
  • Also, I have checked all the shards and they all throw the "index file" issue, but as you can see in the logs, shard-17 throws a strange "file does not exist" error and fails immediately after the job is submitted, while the other shards take some time before failing with the index file issue.

    Here is the error stack trace:
    ----------------------------------------------------------------------------------------------------------------------------------------
    [2019-01-13 08:51:00,58] [info] AwsBatchAsyncBackendJobExecutionActor [04a8d232to_bam_workflow.BaseRecalibrator:5:1]: Status change from Running to Failed
    [2019-01-13 08:51:10,29] [info] AwsBatchAsyncBackendJobExecutionActor [04a8d232to_bam_workflow.BaseRecalibrator:14:1]: Status change from Running to Failed
    [2019-01-13 08:51:11,71] [error] WorkflowManagerActor Workflow ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5 failed (during ExecutingWorkflowState): Job to_bam_workflow.BaseRecalibrator:5:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-5/BaseRecalibrator-5-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: B0FDA4141F7AF69A)
    Job to_bam_workflow.BaseRecalibrator:13:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-13/BaseRecalibrator-13-stderr.log.
    Could not retrieve content: /tmp/temp-s3-1170483626647447615cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-13_BaseRecalibrator-13-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:11:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-11/BaseRecalibrator-11-stderr.log.
    Could not retrieve content: /tmp/temp-s3-6400511325173520066cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-11_BaseRecalibrator-11-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:1:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-1/BaseRecalibrator-1-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: A4B6BD8E93343028)
    Job to_bam_workflow.BaseRecalibrator:9:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-9/BaseRecalibrator-9-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 1DE27FB007DD7E3D)
    Job to_bam_workflow.BaseRecalibrator:14:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-14/BaseRecalibrator-14-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: B09F8E98761F92B4)
    Job to_bam_workflow.BaseRecalibrator:10:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-10/BaseRecalibrator-10-stderr.log.
    Could not retrieve content: /tmp/temp-s3-1412183292946681528cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-10_BaseRecalibrator-10-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:16:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-16/BaseRecalibrator-16-stderr.log.
    Could not retrieve content: /tmp/temp-s3-8630014379951536999cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-16_BaseRecalibrator-16-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:7:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 2E13B179E1D2CFB4)
    Job to_bam_workflow.BaseRecalibrator:2:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-2/BaseRecalibrator-2-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 9EAC0F0F9EE00B81)
    Job to_bam_workflow.BaseRecalibrator:3:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-3/BaseRecalibrator-3-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: B1631BF51758460C)
    Job to_bam_workflow.BaseRecalibrator:6:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-6/BaseRecalibrator-6-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: BE2DA796FF2FD87D)
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    Caused by: java.io.IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:146)
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:145)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at scala.util.Failure.recoverWith(Try.scala:232)
    at cromwell.engine.io.nio.NioFlow.withReader(NioFlow.scala:145)
    at cromwell.engine.io.nio.NioFlow.limitFileContent(NioFlow.scala:154)
    at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:98)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:85)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:336)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:357)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:303)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:350)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at better.files.File.newInputStream(File.scala:337)
    at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
    at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
    at cromwell.filesystems.s3.S3Path.newInputStream(S3PathBuilder.scala:156)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream(EvenBetterPathMethods.scala:94)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream$(EvenBetterPathMethods.scala:91)
    at cromwell.filesystems.s3.S3Path.mediaInputStream(S3PathBuilder.scala:156)
    at cromwell.engine.io.nio.NioFlow.$anonfun$withReader$1(NioFlow.scala:145)
    at cromwell.util.TryWithResource$.$anonfun$tryWithResource$1(TryWithResource.scala:14)
    at scala.util.Try$.apply(Try.scala:209)
    at cromwell.util.TryWithResource$.tryWithResource(TryWithResource.scala:10)
    ... 14 more

    Job to_bam_workflow.BaseRecalibrator:8:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-8/BaseRecalibrator-8-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 7085AA9873A3186D)
    Job to_bam_workflow.BaseRecalibrator:0:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-0/BaseRecalibrator-0-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 6E8F1E59720C597A)
    Job to_bam_workflow.BaseRecalibrator:4:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-4/BaseRecalibrator-4-stderr.log.
    Could not retrieve content: Access Denied (Service: S3Client; Status Code: 403; Request ID: 933B5E8C14EE4287)
    Job to_bam_workflow.BaseRecalibrator:15:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-15/BaseRecalibrator-15-stderr.log.
    Could not retrieve content: /tmp/temp-s3-2270783161526827952cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-15_BaseRecalibrator-15-stderr.log: File name too long
    Job to_bam_workflow.BaseRecalibrator:12:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-BaseRecalibrator/shard-12/BaseRecalibrator-12-stderr.log.
    Could not retrieve content: /tmp/temp-s3-8266904772302321145cromwell-execution_germline_single_sample_workflow_ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5_call-to_bam_workflow_ToBam.to_bam_workflow_04a8d232-827c-412f-839d-52ca992ae6df_call-BaseRecalibrator_shard-12_BaseRecalibrator-12-stderr.log: File name too long
    [2019-01-13 08:51:11,71] [info] WorkflowManagerActor WorkflowActor-ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5 is in a terminal state: WorkflowFailedState
  • bshifaw Member, Broadie, Moderator admin
    edited January 14

    That's unfortunate; let's try solving the error directly instead. It looks like the tool is asking us to provide an index file for the input BAM.

    A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.
    

    Option 1: As the tool suggests, you can index the BAM file using samtools. The gitc docker image in use has samtools installed, so you should be able to add the samtools indexing command right before the BaseRecalibrator command.
    Something like

    samtools index ${input_bam}
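
    For context, in the BaseRecalibrator task's command block (draft-2 ${} syntax as in the rest of your WDL; the javaOptions are trimmed here for brevity) that would sit roughly like this:

    command {
      # build the .bai next to the localized BAM so interval traversal can find it
      samtools index ${input_bam}

      /usr/gitc/gatk4/gatk-launch --javaOptions "-Xms4000m" \
        BaseRecalibrator \
        -R ${ref_fasta} \
        -I ${input_bam} \
        --useOriginalQualities \
        -O ${recalibration_report_filename} \
        -knownSites ${dbSNP_vcf} \
        -knownSites ${sep=" -knownSites " known_indels_sites_VCFs} \
        -L ${sep=" -L " sequence_group_interval}
    }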
    

    Option 2: The task before BaseRecalibrator is SortSam, which produces an output index for the BAM it generates. You can add that index as an input in both the BaseRecalibrator call and task blocks, below the input BAM variable.


    It's odd that the tool would throw this error. If you have the time, please try running broad-prod-wgs-germline-snps-indels/PairedEndSingleSampleWf.wdl on AWS. It's similar to the five-dollar pipeline but with fewer optimizations such as NIO, and all calls and tasks are in one WDL, which makes debugging easier.

  • ssb_cromwell Member
    edited January 14

    @bshifaw, somehow I'm getting a "page doesn't exist" error whenever I try to open the links mentioned in Option 1 and Option 2. Would you mind providing snippets?

    Also, I just checked our bam_processing.wdl. We are using the samtools index command in SortSamSpark; ideally that should produce the .bai file?

    This task is executed well before BaseRecalibrator. Please find the .wdl for it below:

    # Sort BAM file by coordinate order -- using Spark!
    task SortSamSpark {
      File input_bam
      String output_bam_basename
      Int preemptible_tries
      Int compression_level
    
      # SortSam spills to disk a lot more because we now only store 300000 records in RAM (it's faster for our data), so it needs
      # more disk space. It also spills to disk in an uncompressed format, so we need to account for that with a larger multiplier.
      Float sort_sam_disk_multiplier = 3.25
      Int disk_size = ceil(sort_sam_disk_multiplier * size(input_bam, "GB")) + 20
    
      command {
        set -e
        export GATK_LOCAL_JAR=/root/gatk.jar
    
        gatk --java-options "-Dsamjdk.compression_level=${compression_level} -Xms100g -Xmx100g" \
          SortSamSpark \
          -I ${input_bam} \
          -O ${output_bam_basename}.bam \
          -- --conf spark.local.dir=. --spark-master 'local[16]' --conf 'spark.kryo.referenceTracking=false'
    
          samtools index ${output_bam_basename}.bam ${output_bam_basename}.bai 
      }
      runtime {
        docker: "us.gcr.io/broad-gatk/gatk:4.0.12.0"
    #    disks: "local-disk " + disk_size + " HDD"  replacing with local-disk since cromwell can't yet deal with auto-scaling EC2 EBS instances yet
        disks: "local-disk"
        bootDiskSizeGb: "15"
        cpu: "16"
        memory: "102 GB"
        preemptible: preemptible_tries
      }
      output {
        File output_bam = "${output_bam_basename}.bam"
        File output_bam_index = "${output_bam_basename}.bai"
      }
    }
    
  • bshifaw Member, Broadie, Moderator admin
    edited January 14

    I corrected the hyperlinks in my previous post; you should be able to use them now.

    Both SortSam and SortSamSpark produce the .bai, so this output would be used as an input to the BaseRecalibrator call and task:

    output {
    File output_bam = "${output_bam_basename}.bam"
    File output_bam_index = "${output_bam_basename}.bai"
    }
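
    Roughly, the wiring would look like this (a sketch only: input_bam_index is a name introduced here for illustration, and the call alias should match the one in your germline workflow WDL, which appears to be SortSampleBam based on your log paths):

    # In the task block: declare the index so Cromwell localizes it next to the BAM
    task BaseRecalibrator {
      File input_bam
      File input_bam_index   # the .bai produced by SortSam / SortSamSpark
      # ... the other existing inputs stay as they are
    }

    # In the workflow block: pass the sorted BAM's index through in the call
    call BaseRecalibrator {
      input:
        input_bam = SortSampleBam.output_bam,
        input_bam_index = SortSampleBam.output_bam_index
    }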

  • ssb_cromwell Member
    edited January 14

    @bshifaw, sure, I'll use your edited links to make the changes.

    But on a quick note, it's not only SortSamSpark that generates the .bai file; the same is also generated by the SortSam task. I have seen the SortSam output files (.bam, .bai, .md5) generated in the S3 bucket as well.

    I have copied the tasks for both SortSam and BaseRecalibrator below. But I don't see any reference to the .bai in the BaseRecalibrator command; is it being referenced internally, or are we missing something?
    ......................................................................................................

    # Sort BAM file by coordinate order
    task SortSam {
      File input_bam
      String output_bam_basename
      Int preemptible_tries
      Int compression_level
    
      # SortSam spills to disk a lot more because we only store 300000 records in RAM now (it's faster for our data), so it needs
      # more disk space. Also, it spills to disk in an uncompressed format, so we need to account for that with a larger multiplier
      Float sort_sam_disk_multiplier = 3.25
      Int disk_size = ceil(sort_sam_disk_multiplier * size(input_bam, "GB")) + 20
    
      command {
        java -Dsamjdk.compression_level=${compression_level} -Xms4000m -jar /usr/gitc/picard.jar \
          SortSam \
          INPUT=${input_bam} \
          OUTPUT=${output_bam_basename}.bam \
          SORT_ORDER="coordinate" \
          CREATE_INDEX=true \
          CREATE_MD5_FILE=true \
          MAX_RECORDS_IN_RAM=300000
    
      }
      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"
    #    disks: "local-disk " + disk_size + " HDD"  replacing with local-disk since cromwell can't yet deal with auto-scaling EC2 EBS instances yet
        disks: "local-disk"
        cpu: "1"
        memory: "5000 MB"
        preemptible: preemptible_tries
      }
      output {
        File output_bam = "${output_bam_basename}.bam"
        File output_bam_index = "${output_bam_basename}.bai"
        File output_bam_md5 = "${output_bam_basename}.bam.md5"
      }
    }
    

    .............................................................................................................................

    # Generate Base Quality Score Recalibration (BQSR) model
    task BaseRecalibrator {
      File input_bam
      String recalibration_report_filename
      Array[String] sequence_group_interval
      File dbSNP_vcf
      File dbSNP_vcf_index
      Array[File] known_indels_sites_VCFs
      Array[File] known_indels_sites_indices
      File ref_dict
      File ref_fasta
      File ref_fasta_index
    #  Int bqsr_scatter
      Int preemptible_tries
    
      Float ref_size = size(ref_fasta, "GB") + size(ref_fasta_index, "GB") + size(ref_dict, "GB")
      Float dbsnp_size = size(dbSNP_vcf, "GB")
      Int disk_size = ceil((size(input_bam, "GB") ) + ref_size + dbsnp_size) + 20
    
      command {
        /usr/gitc/gatk4/gatk-launch --javaOptions "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal \
          -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
          -Xloggc:gc_log.log -Xms4000m" \
          BaseRecalibrator \
          -R ${ref_fasta} \
          -I ${input_bam} \
          --useOriginalQualities \
          -O ${recalibration_report_filename} \
          -knownSites ${dbSNP_vcf} \
          -knownSites ${sep=" -knownSites " known_indels_sites_VCFs} \
          -L ${sep=" -L " sequence_group_interval}
      }
      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"
        preemptible: preemptible_tries
        memory: "6 GB"
    #    disks: "local-disk " + disk_size + " HDD"  replacing with local-disk since cromwell can't yet deal with auto-scaling EC2 EBS instances yet
        disks: "local-disk"
      }
      output {
        File recalibration_report = "${recalibration_report_filename}"
      }
    }
    
  • bshifaw Member, Broadie, Moderator admin
    edited January 14

    I double-checked, and your pipeline is using SortSam, not SortSamSpark. This is evident from the error message in the BaseRecalibrator stderr log file: the input bam originates from the call-SortSampleBam folder.


    A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.
    Please index all input files:

    samtools index /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/ad6057ec-0ea2-4b83-ba5c-cbdd7e1e5ca5/call-to_bam_workflow/ToBam.to_bam_workflow/04a8d232-827c-412f-839d-52ca992ae6df/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam


    Correct, the bai is not being referenced, and it's puzzling that BaseRecalibrator is throwing this error even though it has run fine before without the index. Unfortunately, it is giving the error, so you will need to try one of the options from earlier. If that doesn't help get past the error, you may need to switch to broad-prod-wgs-germline-snps-indels/PairedEndSingleSampleWf.wdl.

  • Hi @bshifaw,

    I used the second option and linked the index file to BaseRecalibrator as below:

    call Processing.BaseRecalibrator as BaseRecalibrator {
      input:
        input_bam = SortSampleBam.output_bam,
        input_bam_index = SortSampleBam.output_bam_index,
        ## Above line added to resolve the missing index file issue
        recalibration_report_filename = base_file_name + ".recal_data.csv",

    task BaseRecalibrator {
      File input_bam
      File input_bam_index
      ## Above line added to resolve the index issue
    ....................
    Based on this change, all BaseRecalibrator jobs executed successfully except one, which is still failing the way it did before. Please find the logs below for your reference:

    [2019-01-14 10:41:53,00] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:13:1]: Status change from Initializing to Running

    [2019-01-14 10:44:47,79] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:2:1]: Status change from Running to Succeeded
    [2019-01-14 10:45:07,43] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:3:1]: Status change from Running to Succeeded
    [2019-01-14 10:46:02,90] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.CheckContamination:NA:1]: Status change from Running to Succeeded

    [2019-01-14 10:57:53,48] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:16:1]: Status change from Running to Succeeded
    [2019-01-14 10:58:07,44] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:10:1]: Status change from Running to Succeeded
    [2019-01-14 10:58:44,38] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:9:1]: Status change from Running to Succeeded
    [2019-01-14 10:58:52,82] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:5:1]: Status change from Running to Succeeded
    [2019-01-14 10:58:53,61] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:7:1]: Status change from Running to Succeeded
    [2019-01-14 10:58:56,80] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:1:1]: Status change from Running to Succeeded
    [2019-01-14 10:59:10,90] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:14:1]: Status change from Running to Succeeded
    [2019-01-14 10:59:40,39] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:6:1]: Status change from Running to Succeeded
    [2019-01-14 10:59:41,29] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:12:1]: Status change from Running to Succeeded
    [2019-01-14 10:59:42,08] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:4:1]: Status change from Running to Succeeded
    [2019-01-14 11:00:51,81] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:0:1]: Status change from Running to Succeeded

    [2019-01-14 11:02:52,34] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:8:1]: Status change from Running to Succeeded
    [2019-01-14 11:03:14,11] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:13:1]: Status change from Running to Succeeded
    [2019-01-14 11:03:21,48] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:15:1]: Status change from Running to Succeeded
    [2019-01-14 11:04:36,50] [info] AwsBatchAsyncBackendJobExecutionActor [046c18feto_bam_workflow.BaseRecalibrator:11:1]: Status change from Running to Succeeded
    [2019-01-14 11:04:37,67] [error] WorkflowManagerActor Workflow a2b78338-945d-4959-adb7-441347d11dae failed (during ExecutingWorkflowState): cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/a2b78338-945d-4959-adb7-441347d11dae/call-to_bam_workflow/ToBam.to_bam_workflow/046c18fe-982c-4252-be80-64f22f078f8f/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/a2b78338-945d-4959-adb7-441347d11dae/call-to_bam_workflow/ToBam.to_bam_workflow/046c18fe-982c-4252-be80-64f22f078f8f/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    Caused by: java.io.IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/a2b78338-945d-4959-adb7-441347d11dae/call-to_bam_workflow/ToBam.to_bam_workflow/046c18fe-982c-4252-be80-64f22f078f8f/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/a2b78338-945d-4959-adb7-441347d11dae/call-to_bam_workflow/ToBam.to_bam_workflow/046c18fe-982c-4252-be80-64f22f078f8f/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:146)
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:145)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at scala.util.Failure.recoverWith(Try.scala:232)
    at cromwell.engine.io.nio.NioFlow.withReader(NioFlow.scala:145)
    at cromwell.engine.io.nio.NioFlow.limitFileContent(NioFlow.scala:154)
    at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:98)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:85)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:336)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:357)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:303)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/a2b78338-945d-4959-adb7-441347d11dae/call-to_bam_workflow/ToBam.to_bam_workflow/046c18fe-982c-4252-be80-64f22f078f8f/call-BaseRecalibrator/shard-17/BaseRecalibrator-17-rc.txt
    at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:350)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at better.files.File.newInputStream(File.scala:337)
    at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
    at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
    ....................

  • Based on my further investigation, it seems this is not a failure like the others: this job failed immediately whenever it was triggered.

    Also, I don't see any shard-17 folder created on S3, as was created for the other 16 shards.

    The message below is from the AWS Batch job when it exited the container. Is this related to system capacity, given that it was not able to issue more jobs after issuing 16?

    Status: FAILED
    Status reason: Container Overrides length must be at most 8192
    Container message:

    I really appreciate your help in moving us forward! :smiley:

  • mcovarr Cambridge, MA Member, Broadie, Dev ✭✭

    It appears that something about shard 17 caused the underlying ECS container override to exceed 8192 bytes. The job would not have run, so the "rc file is missing" errors are expected. I've passed this info along to our AWS friends; hopefully they can give us some insight into where to look next.

  • mcovarr Cambridge, MA Member, Broadie, Dev ✭✭
    edited January 14

    Could you possibly go into the Batch UI and see the command that was run? We suspect that the long paths for these nested workflows might have overflowed the container override limit, but we'd like to be sure. While the paths are long, the previous shards did succeed.

  • Hi @mcovarr, we can easily see there is something odd when comparing the command executed in the other shards with the command executed by this problematic shard.

    Please find the job definitions from the AWS Batch UI below:

    Normal Successful Shard:----->
    {
    "jobDefinitionName": "ToBam_to_bam_workflow-ToBam_Processing_BaseRecalibrator",
    "jobDefinitionArn": "job-definition/ToBam_to_bam_workflow-ToBam_Processing_BaseRecalibrator:740",
    "revision": 740,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "containerProperties": {
    "image": "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135",
    "vcpus": 1,
    "memory": 6000,
    "command": [
    "gzipdata",
    "/bin/bash",
    "-c",
    "H4sIAAAAAAAAAOVWW2/bNhR+1684U91cUFAXW76lc4a0SZoMTZ0lRpGHAQJNUjJnSRRI2om77b/vyHIT1zXSBOgKDAMMUfp4+J0rz/GLn/yxLPwxNRPHYRx8plV+K7Is1kpZx+blsdSDxl4+5VIDKcH9UuL+S1Dz8EHEnWAzK1Xhp0LnmSxEbGSRZrjQvMTlVulpkqlbv81Yhyc8IhENA3xEIRn3wy7pNsN21GpFLGlGPqPIaVU8pvnDyZF6Q3NvE+3zTp83WyEJaLtPok7UIr2k2Se81aMipN0mbfVrvjfUiCuBr3KsqVXaNxOqOYl8dNrj3SCIuoFwYWcHBJuo/53j+w6b5IpDt9sFt1EXguuIu1JpC/GvRx+P4uHl6Hz44XpAjv+gc+pJ5aEY1sngK/nRxeXx+dXX+Nnw4gTRanGdvS0F6Ow7amY/+4aif9YMf/sIe42GC0LrbdsIV9tOPk1kgtlrrNG4+Ll2zHWspiXs6vxxsV04uTkfOVYI2N0ohs2QkogYyysTM5Xuws+bxDvPYKk8uWdZtwcOd5rItDVsjj8z2k+lZX5K7TRaPklGZwWbACFVvoZlVacGXHJzc/Du7Ujm4r3MpR20A6ihM0HLUy1WcFjDry61LOxpRlNzKguawe8OrG3URNcWq918AR9TuxUWlsrMrFjQz5QdpCzGl8pphHITBUGQu0uJzQjVx67gsbupRSK0KJgw/lgryskacKZyhTezlKIwMTVG5ONs0ep5CR6kNfn5o+T/qYt/jVfuemkN8vgfjsJmr9vzMJhpIbjHZ2UmGWYpzqmeImBQHBdUVkeCzIwYaplWWf9thsesFKvEDeEzm66yE3NqqcfMvN6dFuq2uJYWpb93ovjYFGWIL3OWfDdlFzLLUEvB4xBL712cqozHWBAFxz7pyYKLzHiTtNbqpZ/+VQ+X1PFK6Urf0tH3wCa6fRC+cvYBDjc7TPNws8ktp1jjFxT9dtvRzLN3tmrn2F5eAOUcKHiJzATIAsRc6AWIvLQLwHYvGB5agFWQUCaxLLCG1nA0XFXkn2h1PQB/diKAZWrGt3SuBD0FD4hdlEgCpNZCyqpbBPAX3FGdYgcJqov5EnXOsJ+99Jem4azY1gvNomBONUjy+bM8f7KwU4fWvTi/OCEfhTbo5gGEXuC8VYUVhSUjdOYA8llmZUm19WlmhS4wHnPxGsZqVhXWYuDydqeVsE6/S5PmuBXyJGxT3mt3mj28xywSruMQ8i2hDaVW3Fm/zKgs7jeOpSmVkXZpJrWW4qBH/DVUQSxoLgZu7ZjruA42hKcHoo7DD7JxNWCfZefDUP7htlZj/Lm2rkb/k20lpPpnJS009p6Tt31sJwD/AFLBcosLDAAA"
    ],
    "volumes": [
    {
    "host": {
    "sourcePath": "/cromwell_root/ToBam.to_bam_workflow/ToBam.Processing.BaseRecalibrator/9d69d231-0a59-4643-8f29-d38ae1a72a39/Some(4)/1"
    },
    "name": "local-disk"
    }
    ],
    "environment": [
    {
    "name": "AWS_CROMWELL_LOCAL_DISK",
    "value": "/cromwell_root"
    },
    {
    "name": "AWS_CROMWELL_CALL_ROOT",
    "value": "s3://germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/call-BaseRecalibrator/shard-4"
    },
    {
    "name": "AWS_CROMWELL_OUTPUTS",
    "value": "NA12878.recal_data.csv,s3://germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/call-BaseRecalibrator/shard-4/NA12878.recal_data.csv,NA12878.recal_data.csv,local-disk /cromwell_root"
    },
    {
    "name": "AWS_CROMWELL_INPUTS_GZ",
    "value": "H4sIAAAAAAAAAOVXwY7TMBD9FT6gtuMkbZPlxCIBF1aIIq7WJHaCVceObJcWvp5pl7K71S6o1Dn1YscT6828sWeeHJ1oYBBb59edcVt6C0F9Vi0Y3XiIztO1dVsrtJXKBBF0VGG/0K0KhM9CccNY692wVcYoCJF51SmvLP5mjXcgySPDBzc4EWDUygYBIaihMT+K6okL+r3taP+TxkbPpkM2DhkSqcP61R8nwjsXX8d/5UM2q7tPArFIloi+bIIdOX4gaALOT+AuIKrtuIn7Pc8SPS6I2ql2E7WzrFd+MNoqvCW2NzjBMOJ09MDmbbuQnSxJCTzDoeSkqfmSLHM+L4uibLu8ZBiAISexsS/uFgZ6aq3lopZ5wUkG85qUi7IgVZfXRBYVKA7LHIr6Hm/lfFwdokEcdveG59Wyoki1t0pSuRmNbiEqMYBfoyHgdpzQ2ewqSV9waZ7pFl/fvgtn18pHbRAArBQ8y7L3ondGihDRAF7S3wX9rb+/5VjVZ9TN2dAXpAPdig4DgkNKdslaxgETxxQ98gHrQqIoCjFdU0SwFM1wD5PiANMeXapjS1yqR2GfulrPVPf/gk8h8Ymr9rEyUy13icX+AJlC8P9C+yoU8PRuXgnpCWR/whfCNK+Dl5PwC1pyqSEpDQAA"
    },
    {
    "name": "AWS_CROMWELL_STDERR_FILE",
    "value": "/cromwell_root/BaseRecalibrator-4-stderr.log"
    },
    {
    "name": "AWS_CROMWELL_STDOUT_FILE",
    "value": "/cromwell_root/BaseRecalibrator-4-stdout.log"
    },
    {
    "name": "AWS_CROMWELL_PATH",
    "value": "ToBam.to_bam_workflow/ToBam.Processing.BaseRecalibrator/9d69d231-0a59-4643-8f29-d38ae1a72a39/Some(4)/1"
    },
    {
    "name": "AWS_CROMWELL_RC_FILE",
    "value": "/cromwell_root/BaseRecalibrator-4-rc.txt"
    },
    {
    "name": "AWS_CROMWELL_WORKFLOW_ROOT",
    "value": "s3://germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/"
    }
    ],
    "mountPoints": [
    {
    "containerPath": "/cromwell_root",
    "sourceVolume": "local-disk"
    }
    ],
    "ulimits": []
    }
    }

    Problematic Shard Job Definition:-------->

    {
    "jobDefinitionName": "ToBam_to_bam_workflow-ToBam_Processing_BaseRecalibrator",
    "jobDefinitionArn": "job-definition/ToBam_to_bam_workflow-ToBam_Processing_BaseRecalibrator:756",
    "revision": 756,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "containerProperties": {
    "image": "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135",
    "vcpus": 1,
    "memory": 6000,
    "command": [
    "gzipdata",
    "/bin/bash",
    "-c",
    "H4sIAAAAAAAAAOXda49j15Xe8ff1KZgejW15UF3c97000QRqy7bk0cW6xPCLAA12Fbu7oro0qti6TJLvnsNikdx11t7rHwGTAYIMGpa19uLhOQ8PyfVje7D/4T+dvbq8OXu1un97cnJ+sTg7v7u9/ml9dfXy7vZ2c7K5fvfp5d3HH/zu+oeLy7vF6bvFs6cdh39br+6P/3K6/nl9/n5zeXtz9mZ9d311ebN+eX958+Zq+sfq+t30j59u7354fXX701k6P88Xry/iaVy55fQf0Z2+EldOi3cphhDPX/t4dr6ajrm5fflqdX185Pe3L1bXz+dVuchy4YM7Xa6SnMYcw2l97eX0ItTV2q2KXwXZHe/F6n797Xr6r5ev7lab27uz+7eru4tTV86mq34eX61X4byGZ4vf/GaxPn97+//flX94cv72+vZiUUpZPPtgdys8O1n//O72brN4+ZdP/vbJy6//+v3nX3/13cenn/731Y+r55e3z6e26U75WPV//+VfP/38W13/7Osv/zhVt/94dvK7zi148uHJ7fvN/uKm1v+xO8L/OpvKzz/44NlifXfXW57K2+WT6x9eX76eXr4PmsM8m/61edizk83d6t3it3fXdttvF3/8++ffn2zW68VvZ3fDPNMpzdP7zcX2HK9u3/x28Z/nR/7NrznM9loOh2nPaPEvv/HTobrBnZy9v787e3O5OT97s9r8EB/+8/Rq9f7m/O3i9HT7in39bnur3i+enf797x/9+Q/fX16vv7i8vtx8nJaLXemz9erdn+7Wj2W3K//TX+8ubzZ/ulq9uf/T5c3qavHfThbNwu5A322mG/7+SfnT1aZbXm9Wl1f3j0eZrvPN+Udvzl9O/2V70VPp+j4ul8vrZw8d84h2D/t2Yb0979av13frm/P1/dmru9vVxWlT+Oz2+nZ6c767XN/cv1zd36+vX139Eurz19MDV7uDf24e/P+p9/5305vuu4ezmY5z9tUnztdSn09hvrlZXzy/eP/u6vJ8epVeXq/ufpgK91P79I/pyXZJnL6/X399d/lm+6p/83562OZy/fjCfb3YH+1u++q8vFhtVs/P73/crf5wc/vTzXeXm6n73/uFunh1f/POTf/lx/PX/25P9uXl1dX0LDcXL9106/355Zvbq4uX0w1xczF9VD6/vLlYX90/f/tm96zP3/zb/9UrfDj0y8cnfXy+hwv9YnH+9q6+/NfPfVlWt/zRvVxdbT5y/zRfkdGKHz+mDFfycCXNV+Tln78INaSgzmC/EtUZHFbccMXrlcfrCfMVt9w/SJ3ctPT4qDh+lLrY46P0Ad3jUlAneFxS13tcUhd8WPIqv2np8QzVCzUt/eUzl8QFffKHpd6jHp/LWNIH9NNp1FJq0WkclnqPerxk/Vx+f136kg+P6j/X9lFp+aMfLekX5bAUOo96fC59bxwepV+vw6P0fXhY0mmExyX9Xjgs6TfQ8VH6uQ5L+gwPB6zjR+mg4n5Jp3FY0i/lYUm/XoclnUZ6XOq8XoclfV37paivK+1fL33yh0fpeA+PSureOJyGfr3yfkk/12FJX/JhSQd1WNKvct6fYe+Aj0v6RSmPSzn8GNTS7rmyTv6wpJPfL3Wfa/dp03mVD0vGGUaV/HZp+j72qXaXHh6lMzycob43Dkv6BjgcsHfyj2noG2D/XZj1DVD3B9SXvF8qvQM+LunkDwfU13VY0td1OKB+lQ8nr2+2/Tds57kOSzr5/Td26R3wcUm/vw5LOvnDko73cBq9Rz0u6eS3B5QoRd+ifj8DlDI/Q7+fAXS8hyX9eh2X1Kvs99/mRd3Yfv/lqy/5uKTiPR5QOgfcLVX1XXlccuPTULeN33+16RnguKRum+OSzvCwpF+Uw5J6Kf3+e7mqb/Pjkp8v/f3xeFW9JocV/cbb309Vv4UOS+O3UB3fu7X3FnpcGt/WtXdbPy6N365VfzTsl0R/DB2WdBr7FX1j7Ff0nbtfUSnthxBRSexHBlFB7L+PReVwWFEx5P03ibrT9x/uMlKTLFUGh5XhdC/LoQlkOZwWZTmcWWQ5HJ1kOfwOlGXv+3a3pAF5fFTv6/HxC3z4pSrL3vfS43MNv3zE9d5vjx/S+pN4CupFziHrM/z7/njDd+J08uOl4TtRNICPS8N3orjhO1G0zw9LGujHpeHnkvjh55JoIYf9igoj7ldUFod3j/o83b+KHSrub/cOMA8vsP76278T/Di/jo73Sx1uH5bG+XX8fljqTzLbMJbqq3YfrUbkIUAtz0MW48ONXw89Mu8Pp2V8OJx6qQ6H689mD4dTn5uHw6mX93C4PD6cSvwRPqUztB8e1Avv8STUo/744m/Hf/mvNy//9avp4PnhQ/RifX77S2+xWIvVWhRj8eFzdrjorEVvLQZrMVqLyVq0EnJWQs5KyFkJeSshbyXkrYS8lZC3EvJWQt5KyFsJeSshbyUUrISClVCwEgpWQsFKKFgJBSuhYCUUrISClVC0EopWQtFKKFoJRSuhaCUUrYSilVC0EopWQslKKFkJJSuhZCWUrISSlVCyEkpWQslKKFkJZSuhbCWUrYSylVC2EspWQtlKKFsJZSuhbCVUrISKlVCxEipWQsVKqFgJFSuhYiVUrISKlVC1EqpWQtVKqFoJVSuhaiVUrYSqlVC1EqpWQmIlJFZCYiUkVkJiJSRWQmIlJFZCYiUkRkJlaSRUlkZCZWkkVJZGQmVpJFSWRkLFmqmLNVMXa6Yu1kxdrJm6WDN1sWbqYs3UxZqpizVTF2umLtZMXayZulgzdbFm6mLN1MWaqYs1Uxdrpi7WTF2smbpYM3WxZupizdTFmqmLNVMXa6Yu1kxdrJm6WDN1sWbqYs3UxZqpizVTF2umLtZMXayZulgzdbFm6mLN1MWaqYs1Uxdrpi7WTF2smbpYM3WxZupizdTFmqmLNVMXa6Yu1kxdrJm6WDN1sWbqYs3UxZqpizVTF2umLtZMXayZulgzdbFm6mLN1MWaqYs1Uxdrpi7WTF2smbpYM3WxZupizdTFmqmLNVMXa6Yu1kxdrJm6WDN1sWbqYs3UxZqpizVTF2umLtZMXayZulgzdbFm6mLN1MWaqYs1Uxdrpi7WTF2smbpYM3W1ZupqzdTVmqmrNVNXa6au1kxdrZm6WjN1tWbqas3U1ZqpqzVTV2umrtZMXa2ZulozdbVm6mrN1NWaqas1U1drpq7WTF2tmbpaM3W1ZupqzdTVmqmrNVNXa6au1kxdrZm6WjN1tWbqas3U1ZqpqzVTV2umrtZMXa2ZulozdbVm6mrN1NWaqas1U1drpq7WTF2tmbpaM3W1ZupqzdTVmqmrNVNXa6au1kxdrZm6WjN1tWbqas3U1ZqpqzVTV2umrtZMXa2ZulozdbVm6mrN1NWaqas1U1drpq7WTF2tmbpaM3W1ZupqzdTVmqmrNVNXa6au1kxdrZm6WjN1tWbqas3U1ZqpqzVTV2umrtZMXa2ZulozdbVm6mrN1NWaqas1U1drpq7WTF2tmbpaM3W1ZupqzdTVmqmrNVOLNVOLNVOLNVOLNV
OLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVOLNVPLYKb+y/d/+my5/f/ZXw7mzaYBj9DPuWnoZ9009PNuGvqZNw393JuGfvZNQz//Y8NgJm0aKMnBbNo0UJKDGbVpoCQHs2rTQEkOZtamgZIczK5NAyU5mGGbBkpyMMs2DZTkYKZtGijJwWzbNFCSgxm3aaAkB7Nu00BJDmbepoGSHMy+TQMlOZiBmwZKcjALNw2U5GAmbhooycFs3DRQkoMZuWmgJAezctNASQ5m5qaBkhzMzk0DJTmYoZsGSnIwSzcNlORgpm4aKMnBbN00UJKDGbtpoCQHs3bTQEkOZu6mgZIczN5NAyU5mMGbBkpyMIs3DZTkYCZvGijJwWzeNFCSgxm9aaAkB7N600BJDmb2poGSHMzuTQMlOZjhmwZKcjDLNw2U5GCmbxooSZztB7+ZNw2U5OC386aBkhz8ht40UJKD39IPDW7we3rTAEk6Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j4zgyjiPjODKOI+M4Mo4j43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+N4Mo4n43gyjifjeDKOJ+MEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjBDJOIOMEMk4g4wQyTiDjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k40QyTiTjRDJOJONEMk4k4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuMkMk4i4yQyTiLjJDJOIuNkMk4m42QyTibjZDJOJuOM9nVuGiDJ0f7OTQMlScYZ7fXcNFCSZJzRvs9NAyVJxhntAd00UJJknNF+0E0DJUnGGe0N3TRQkmSc0T7RTQMlScYZ7Rl9bCDjjPaObhooSTLOaB/ppoGSJOOM9pRuGihJMs5of+mmgZIk44z2mm4aKEkyzmjf6aaBkiTjjPagPjaQcUZ7UTcNlCQZZ7QvddNASZJxRntUNw2UJBlntF9100BJknFGe1c3DZQkGWe0j3XTQEmScUZ7Wh8byDijva2bBkqSjDPa57ppoCTJOKM9r5sGSpKMM9r/ummgJMk4o72wmwZKkowz2he7aaAkyTijPbKPDWSc0V7ZTQMlScYZ7ZvdNFCSZJzRHtpNAyVJxhntp900QJKjfbWbBkhytL920wBJjvbZbhogydF+200DJUnGGe293TRQkmSc0T7cTQMlScYZ7cndNFCSZJzR/txNAyVJxhnt1d00UJJknNG+3U0DJUnGGe3hfWwg44z28m4aKEkyzmhf76aBkiTjjPb4bhooSTLOaL/vpoGSJOOM9v5uGihJMs5oH/CmgZIk44z2BD82kHFGe4M3DZQkGWe0T3jTQEmScUZ7hjcNlCQZZ7R/eNNASZJxRnuJNw2UJBlntK9400BJknFGe4wfG8g4o73GmwZKkowz2ne8aaAkyTijPcibBkqSjDPaj7xpoCTJOKO9yZsGSpKMM9qnvGmgJMk4oz3Ljw1knNHe5U0DJUnGGe1j3jRQkmSc0Z7mTQMlScYZ7W/eNECSo33OmwZIcrTfedMASY72PW8aIMnR/udNAyVJxhnthd40UJJknNG+6E0DJUnGGe2R3jRQkmSc0X7pTQMlScYZ7Z3eNFCSZJzRPupNAyVJxhntqX5sIOOM9lZvGihJMs5on/WmgZIk44z2XG8aKEkyzmj/9aaBkiTjjPZibxooSTLOaF/2poGSJOOM9mg/NpBxRnu1Nw2UJBlntG9700BJknFGe7g3DZQkGWe0n3vTQEmScUZ7uzcNlCQZZ7TPe9NASZJxRnu+HxvIOKO935sGSpKMM9oHvmmgJMk4oz3hmwZKkowz2h++aaAkyTijv
eKbBkqSjDPaN75poCTJOKM95I8NZJzRXvJNAyVJxhntK980UJJknNEe800DJUnGGe033zRAkqN955sGSHK0/3zTAEmO9qFvGiDJ0X70TQMlScYZ7U3fNFCSZJzRPvVNAyVJxhntWd80UJJknNH+9U0DJUnGGe1l3zRQkmSc0b72TQMlScYZ7XF/bCDjjPa6bxooSTLOaN/7poGSJOMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjYJxtDzXYSU5/7CSnP3aS0x87yemPneT0x05y+mMnOf2xk5z+UJJgnKmBkgTjTA2UJBhnaqAkwThTAyUJxpkaKEkwztRASYJxpgZKEowzNVCSYJypgZIE40wNlCQYZ2qgJME4UwMlCcaZGihJMM7UQEmCcaYGShKMMzVQkmCcqYGSBONMDZQkGGdqoCTBOFMDJQnGmRooSTDO1EBJgnGmBkoSjDM1UJJgnKmBkgTjTA2UJBhnaqAkwThTAyUJxpkaKEkwztRASYJxpgZKEowzNVCSYJypgZIE40wNlCQYZ2qgJME4UwMlCcaZGihJMM7UQEmCcaYGShKMMzVQkmCcqYGSBONMDZQkGGdqoCTBOFMDJQnGmRogSUfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxHxnFkHEfGcWQcR8ZxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WQcT8bxZBxPxvFkHE/G8WScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEDGCWScQMYJZJxAxglknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxIxolknEjGiWScSMaJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScRMZJZJxExklknETGSWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknEzGyWScTMbJZJxMxslknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxCxilknELGKWScQsYpZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWScSsapZJxKxqlknErGqWQcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlH
yDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUfIOELGETKOkHGEjCNkHCHjCBlHyDhCxhEyjpBxhIwjZBwh4wgZR8g4QsYRMo6QcYSMI2QcIeMIGUeUcT774pPTT36/dB8d/vTr/qvOQqhfqKpXlaAqUR9M5hXnVJOLqpJVk18+rfjBlR3q/ovBQhjUo6r73tHDoKgPm3qduVcsnaJTF+yTavJJVCmrC/E56VLWJX2sokphfsdMpaKOFYPqikmdRKzzir7CFFSwSR88lXklLzuZ6hyyOoWiDlXVcaqKRWZHDoPbM4zeeGFwe4bebRj02yh85HUpzN5HrndOu6Krqqied1tM84qb3ahTyaumtPxmXsrqYVnU2ZaoKurYsxfMh841bosyr4T69Pl8/OjwadCvzz5TjguhVw/fdKqztLZF9YmyLaqPhG2xqoqoS3Dqxt8Wne5Lfl7y6ty8S/NSVgevs7vMp16IuV9UZzaV0rySZicm/XfXsT57d20Xuq/soe7n9fj0My0sOw/fFnuHPda9qsdO8+zjJLjeYx+KPqhinFfc7Gt4Ks0vpnvSD19OTyq9d1IInRt2W3z6Lgyx99jY+VAJ6hsxquPn3s2z/QZ7+qTTh3n3vjjWfa+uTulY1/296I71UX/o1XWzvvC6fdPronPqiDoLN/t8m0pePWeZHVvmz1bUHVsGn5PH+tNnqcv+y3Kst/0vfr/97Ju9Jrti6hXzvOKSKoXQeWTUlTivPPngmSpVXcau+CTph5ILXpeiKol6oFdP+WQCeqg8Gcte/N7pN+quqFJ8LIZeUR+zzitPPpkfKn5W0W/7XVGfyfY77unFu9S7T9q6Hz0gqLp+wqTeX7ti7BVLp+jmgST1Zbsr9q7A9Z7b5V6x9DN4rHtVr51mP7uNtiX19tkWexcaeuFF9cRJxZHzvPJkOHuo1Nlr1f3Mbute1XVFHdPN763pjZXnFYlPm7ZhqLvB7z54fKfoqipmVVEBT0UXVMlHVUnzyuwjJQzeL8d6r9/ronq5Q++9si2qW2hbVPffVNy952dFH1XfvNK79x+LT+MIvY/iUHsXU5/+wjBVuvNjW38y6LcLoVfvFV3uFL1+Pp2oPPxI4OdF/VGzLYZep7raoCv1myelqOfbQ9Grosp4Wwyqoqe2tq4Oq69wW5x9BU+lqCqld0KzL8nYe8WjHv+moh5udkWvKk8/AGJ3MGrr/rvBQujUXekUvS6qd+pj0atiVBWZV5y6M6J+DSYYhq9UKc8rUVfU86U8O5DG4644O/fuvZU7HxRx/5Eybz7UZzn1PliiHs53xagqs1tCekeT+Ud56r39Uu+0U++cU+823hbVl89UnH2Rp+4vqW191B969VlzbzJN25+vnlZ6Y+NUnH3Bpt698VgMqqgDmW4MP6/MvnenSpw9ae9O2xaDqswO1fsu2xbVjZTU4LD9PVP1+KdvntS7UadicE/fUWnwbXesPw0l984794bPXOaVoq6t9M6y6or+uf0Pj3/v0C26XvFJQPtiWKpiUJWsKnVeefKt+lDx6hz0c8VZxX/U+YuFtu7ndaeOML92/+AI9cAs80pNqpJVpTytaEUeikbdD+pB1UPvIPrjoK3rg8dBMapiUpU8r7jlUpV6Z+mXX6mSE1XK6oFx2Tla7F1YVCc3P1rsvbfbuh/Uw6AeB/XUqWd9cB1UVBHPf7Z+KDlfVUlfrCtlXipLVZk9rMuktu5VvaqKdB4usxBzb/xr635QV8fxuhLnlZhnIXYHnbbue/VecX4bPxR96RRjUsVuBGUQQRlEcKjHQb37vN1i7hSzPmzsnbP+YCmdA7qo8nJVVVSPVweaf3+U2V+37kpVnXyYf7JvS/rooi4wiLo8fS1Jf7uUpz/17Cqz+0N/8R+KQRW7N033Z/VdXX/O1N5LuP2lXD3YL1VFPczrR0VVURcc5m/JqgaA+vQnj4fK7HPU9b5D3f5/ejC7xGNd99d5xQVVkXnFq+PIrEf/nLsrhnnl6d9I70qzJj2n74ppUPTz4vyS0tNfXv7w+MvprCL5m6clPeTvivrkcuc+c4OPXjf46D3Wg6rPKrMf1z795hP3+84xD3X9RpotGY8K46XYW9L342yp+1x6uJstdR+VBg8pneKTkflQ1A/3vWOOLir0TkwPsLv6/kKsJX202Ct2p5fZUvdR+tXcv6/6h0rjQ2li75c6qeq30qffvNin3a+roF64wf+EZrbkx0uhtzR6IlzqPpH+y7TDkrpdX4xfyhfjl/LF/qUcPqr3RnthvAr6M+1QV3fyY302+n767YvDR1C3Pj/+t+NX89vxq/mwpEWxq3c/b2dL+mhV/Sy6q8vTAeCh6PQvcrv64Hnd4EmdU98Xu3rn7fBYf/J1tSt2boB9vZf29tt4fgc+1lP3ZPp35tMlfWndv36dLcXeUvec+/f50yV9Dsf7+eTDxeJfFs8+uH2/kYssFz64Zwu/razv7g6Vk/X529vFB/9laj07v7u9/ml9dfXy7vZ2c/Zidb/+dn2+urp8dbfa3N6dunJ6d/588/Pm+eb63cnvTv5hsbq4WKwWz19fXq0XlzeL9Y/ru18W6+t3m18WF5d36/PpUb8sNreL16vzy6vLzWqzbuoX66vb7dH/bbW5vL1ZTH82b9eL86vb9xcn5xezszl5fXlzsXi+ON388m46yOJ09yyn7+4ubzbLxf9c/Ly6e3O/OF0uTj9f/OP0nO/P3y7+8ezh1E4+nE5WH/H+l5vzk5Np8frHX3fp/+fdJ7t0n335+Zd/PP3b+u5+utCPFu758uQPtzeb9c3m9Pvpcj5aXL+/2ly+W91tzlZXm/XdzZTIj+t/Xry6fX9zsbr75eNny5VfvvLrXFax1pWTC3l18erVhY/rVw+bRD87OTk9pabZk27WP2/O3l2tLm8OC59e3r+7vb/cPJzmarNZnb+9nur/vNjGeLO6Xn/8bHdhz06enZyvNr8iiV0Q/0Eneb+5mO76X3mijw+6un3zH36y0xvy15/s9kG/5mRPT6c3+8+Xm8UHv/tVL92HJ4vp//43xYUBWJt3AQA="
    ],
    "volumes": [
    {
    "host": {
    "sourcePath": "/cromwell_root/ToBam.to_bam_workflow/ToBam.Processing.BaseRecalibrator/9d69d231-0a59-4643-8f29-d38ae1a72a39/Some(17)/1"
    },
    "name": "local-disk"
    }
    ],
    "environment": [
    {
    "name": "AWS_CROMWELL_LOCAL_DISK",
    "value": "/cromwell_root"
    },
    {
    "name": "AWS_CROMWELL_CALL_ROOT",
    "value": "s3://germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/call-BaseRecalibrator/shard-17"
    },
    {
    "name": "AWS_CROMWELL_OUTPUTS",
    "value": "NA12878.recal_data.csv,s3:///germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/call-BaseRecalibrator/shard-17/NA12878.recal_data.csv,NA12878.recal_data.csv,local-disk /cromwell_root"
    },
    {
    "name": "AWS_CROMWELL_INPUTS_GZ",
    "value": "H4sIAAAAAAAAAOVXwY7TMBD9FT6gtuMkbZPlxCIBF1aIIq7WJHaCVceObJcWvp5pl7K71S6o1Dn1YscT6828sWeeHJ1oYBBb59edcVt6C0F9Vi0Y3XiIztO1dVsrtJXKBBF0VGG/0K0KhM9CccNY692wVcYoCJF51SmvLP5mjXcgySPDBzc4EWDUygYBIaihMT+K6okL+r3taP+TxkbPpkM2DhkSqcP61R8nwjsXX8d/5UM2q7tPArFIloi+bIIdOX4gaALOT+AuIKrtuIn7Pc8SPS6I2ql2E7WzrFd+MNoqvCW2NzjBMOJ09MDmbbuQnSxJCTzDoeSkqfmSLHM+L4uibLu8ZBiAISexsS/uFgZ6aq3lopZ5wUkG85qUi7IgVZfXRBYVKA7LHIr6Hm/lfFwdokEcdveG59Wyoki1t0pSuRmNbiEqMYBfoyHgdpzQ2ewqSV9waZ7pFl/fvgtn18pHbRAArBQ8y7L3ondGihDRAF7S3wX9rb+/5VjVZ9TN2dAXpAPdig4DgkNKdslaxgETxxQ98gHrQqIoCjFdU0SwFM1wD5PiANMeXapjS1yqR2GfulrPVPf/gk8h8Ymr9rEyUy13icX+AJlC8P9C+yoU8PRuXgnpCWR/whfCNK+Dl5PwC1pyqSEpDQAA"
    },
    {
    "name": "AWS_CROMWELL_STDERR_FILE",
    "value": "/cromwell_root/BaseRecalibrator-17-stderr.log"
    },
    {
    "name": "AWS_CROMWELL_STDOUT_FILE",
    "value": "/cromwell_root/BaseRecalibrator-17-stdout.log"
    },
    {
    "name": "AWS_CROMWELL_PATH",
    "value": "ToBam.to_bam_workflow/ToBam.Processing.BaseRecalibrator/9d69d231-0a59-4643-8f29-d38ae1a72a39/Some(17)/1"
    },
    {
    "name": "AWS_CROMWELL_RC_FILE",
    "value": "/cromwell_root/BaseRecalibrator-17-rc.txt"
    },
    {
    "name": "AWS_CROMWELL_WORKFLOW_ROOT",
    "value": "s3://germline_single_sample_workflow/5cc6dfd4-4a10-4a41-b917-72154334cf24/call-to_bam_workflow/ToBam.to_bam_workflow/9d69d231-0a59-4643-8f29-d38ae1a72a39/"
    }
    ],
    "mountPoints": [
    {
    "containerPath": "/cromwell_root",
    "sourceVolume": "local-disk"
    }
    ],
    "ulimits": []
    }
    }

  • Hi @mcovarr, @bshifaw, would you mind guiding us on the above issue, please?

  • mcovarr Cambridge, MA Member, Broadie, Dev ✭✭

    Thanks @ssb_cromwell, I've passed this on to Amazon to determine the next steps.

  • mcovarr Cambridge, MA Member, Broadie, Dev ✭✭

    Those long blocks of text in the command array are base64-encoded gzipped data. I extracted your two sample commands and created the gists below. I'm not familiar with these GATK tools, so I don't know whether this is how they are normally used, but the command line from failed.txt has more than 3000 -L arguments. The Amazon ECS system used by the AWS backend has a hard limit of 8192 bytes for container overrides, and even with compression this command line is just too long. I'm not sure whether this was the intended command line and, if so, whether there's a more terse way of expressing it that could be used instead. A small sketch for decoding these payloads yourself follows the gist links.

    Successful command

    Failed command
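
    For reference, here is a minimal Python sketch (not something Cromwell provides) for decoding one of these payloads and checking its size against the 8192-byte override limit. The payload string below is a truncated placeholder; replace it with the full base64 string copied from the job definition's command array:

    import base64
    import gzip

    # Replace this placeholder with the full base64 string that follows "-c" in the
    # job definition's "command" array.
    payload = "H4sIAAAAAAAAAO..."

    raw = base64.b64decode(payload)            # undo the base64 encoding
    script = gzip.decompress(raw).decode()     # undo the gzip compression

    print(script)                              # the command Cromwell actually generated
    # The encoded payload travels inside the ECS container override, so its length is
    # roughly what counts toward the 8192-byte limit.
    print("encoded bytes:", len(payload), "decoded bytes:", len(script))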

  • Thanks for the response, @mcovarr!

    Yes, the failed command contains a huge block of text compared to the other jobs, and all of it is -L arguments.
    I'm not sure why it's only happening with this shard; the other shards were also created by the same BaseRecalibrator task and are well within the limit, but something went wrong with this job only.
    Somehow I don't see any control over how these shards are decided and created during workflow runtime.

    Is there any other way to investigate this further? We are kind of stuck here.

  • bshifaw Member, Broadie, Moderator admin
    edited January 17

    The command being generated is normal and has been shown to work locally and on Google Cloud. As mcovarr mentioned, the failure might be due to the AWS backend not accepting long command lines.
    The long command line is caused by the subgroup variable generated by CreateSequenceGroupingTSV. Editing the Python script in this task so that the output sequence subgroups stay within the AWS byte limit may help solve the problem.

    If you just want to check whether the pipeline works on AWS, you could edit the task to accept an edited ref_dict. The task uses the ref_dict to create the subgroups, so providing a ref_dict with some of the lines removed should help reduce the subgroup size; much of the length comes from the chrUn contigs. A rough sketch of such a filter is shown below. **Again, this is just to check whether the exact same pipeline would work on AWS; this is not an appropriate way to run the pipeline.
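
    A minimal sketch of such a filter, for testing only; the filenames are illustrative and should be adjusted to your own reference bundle:

    # TESTING ONLY: drop the chrUn @SQ records from a Picard sequence dictionary so
    # that CreateSequenceGroupingTSV emits shorter interval groups.
    input_dict = "Homo_sapiens_assembly38.dict"
    output_dict = "Homo_sapiens_assembly38.no_chrUn.dict"

    with open(input_dict) as src, open(output_dict, "w") as dst:
        for line in src:
            # Keep the header and every @SQ record except the chrUn contigs.
            if line.startswith("@SQ") and "SN:chrUn" in line:
                continue
            dst.write(line)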

  • @bshifaw, in the meantime we are checking with AWS on the limit issue.

    Is it good practice to remove contents from the command, or to restrict it just for the sake of execution? It should not have an adverse impact on the result set due to the removal of chrUn, should it? Please advise!

  • Ruchi Member, Broadie, Dev admin

    @ssb_cromwell

    Instead of removing the contents of the command, it should be possible to cover the same genomic regions without declaring them all on the command line. Instead of listing out every genomic interval separated by a -L argument, one can also pass in a file, where every line in that file is a genomic interval. This shortens the command line without reducing any regions.

    For example, something like this below:

    /gatk/gatk --java-options "-Xms4000m" \
      BaseRecalibrator \
      -R ref_fasta \
      -I /cromwell_root/example.hg38.aligned.duplicate_marked.sorted.bam \
      --use-original-qualities \
      --known-sites /cromwell_root/bundle/hg38/Homo_sapiens_assembly38.dbsnp138.vcf \
      --known-sites /cromwell_root/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
      --known-sites /cromwell_root/bundle/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz \
      -L chr8_KI270810v1_alt:1+ -L chr8_KI270819v1_alt:1+ -L chr8_KI270820v1_alt:1+ -L chr8_KI270817v1_alt:1+ -L chr8_KI270816v1_alt:1+ -L chr8_KI270815v1_alt:1+ -L chr9_GL383539v1_alt:1+ -L chr9_GL383540v1_alt:1+ -L chr9_GL383541v1_alt:1+ -L chr9_GL383542v1_alt:1+ -L chr9_KI270823v1_alt:1+ -L chr10_GL383545v1_alt:1+ -L chr10_KI270824v1_alt:1+ -L...
    

    ...can be replaced with:

    /gatk/gatk --java-options "-Xms4000m" \
      BaseRecalibrator \
      -R ref_fasta \
      -I /cromwell_root/example.hg38.aligned.duplicate_marked.sorted.bam \
      --use-original-qualities \
      --known-sites /cromwell_root/bundle/hg38/Homo_sapiens_assembly38.dbsnp138.vcf \
      --known-sites /cromwell_root/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
      --known-sites /cromwell_root/bundle/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz \
      -L /cromwell_root/genomics_intervals.list
    

    ...where genomics_intervals.list looks like:

    chr8_KI270810v1_alt:1+ 
    chr8_KI270819v1_alt:1+
    chr8_KI270820v1_alt:1+
    chr8_KI270817v1_alt:1+ 
    ...
    

    If you're using the version of the WDL in the gatk-workflows repo on GitHub, then you'll have to replace this line:
    https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline/blob/master/tasks_pipelines/bam_processing.wdl#L168

    With -L ${write_lines(sequence_group_interval)}

    Thanks

  • @Ruchi,

    Do you mean I just need to make the "-L ${write_lines(sequence_group_interval)}" modification to BaseRecalibrator?
    Or do I need this one as well: "-L /cromwell_root/genomics_intervals.list"?

  • Hi @Ruchi,

    We edited bam_processing.wdl for the BaseRecalibrator task as suggested above
    and replaced the -L line with:
    -L ${write_lines(sequence_group_interval)}

    But somehow it started throwing the error below for the shards when triggering BaseRecalibrator; can you please advise? Let me know if any other information is required.

    ........................................................................
    [2019-01-18 03:39:30,82] [info] 08445840-c69b-4bf6-84b0-9a1f0566050d-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [08445840]: Starting to_bam_workflow.CheckContamination, to_bam_workflow.BaseRecalibrator (18 shards)
    [2019-01-18 03:39:31,20] [warn] AwsBatchAsyncBackendJobExecutionActor [08445840to_bam_workflow.BaseRecalibrator:5:1]: Unrecognized runtime attribute keys: preemptible
    [2019-01-18 03:39:31,29] [error] Failed command instantiation
    java.lang.Exception: Failed command instantiation
    at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:565)
    at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:500)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.instantiatedCommand$lzycompute(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.instantiatedCommand(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents(StandardAsyncExecutionActor.scala:313)
    at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents$(StandardAsyncExecutionActor.scala:312)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.commandScriptContents(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.batchJob$lzycompute(AwsBatchAsyncBackendJobExecutionActor.scala:132)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.batchJob(AwsBatchAsyncBackendJobExecutionActor.scala:131)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.executeAsync(AwsBatchAsyncBackendJobExecutionActor.scala:342)
    at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:943)
    at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:935)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.executeOrRecover(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
    at cromwell.core.retry.Retry$.withRetry(Retry.scala:38)
    at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
    at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
    at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.aroundReceive(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
    at akka.actor.ActorCell.invoke(ActorCell.scala:557)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: common.exception.AggregatedMessageException: Error(s):
    Could not evaluate expression: write_lines(sequence_group_interval): Access Denied (Service: S3Client; Status Code: 403; Request ID: 39541BB2581F1DF7)
    at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:68)
    at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:64)
    at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:563)
    ... 31 common frames omitted
    [2019-01-18 03:39:33,20] [warn] AwsBatchAsyncBackendJobExecutionActor [08445840to_bam_workflow.BaseRecalibrator:4:1]: Unrecognized runtime attribute keys: preemptible
    [2019-01-18 03:39:33,25] [error] Failed command instantiation
    java.lang.Exception: Failed command instantiation
    at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:565)
    at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:500)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.instantiatedCommand$lzycompute(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.impl.aws.AwsBatchAsyncBackendJobExecutionActor.instantiatedCommand(AwsBatchAsyncBackendJobExecutionActor.scala:74)
    at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents(StandardAsyncExecutionActor.scala:313)
    at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents$(StandardAsyncExecutionActor.scala:312)

  • Ruchi Member, Broadie, Dev admin

    Hey @ssb_cromwell ,

    Any chance you can post the exec.sh for any one of the failed shards? Thanks!

  • Hi @Ruchi,

    Are you referring to the BaseRecalibrator command executed for a shard? Earlier it used to print the full command being submitted as part of the BaseRecalibrator task, but after our changes I don't see any command printed in the logs; it directly started throwing the above-mentioned error.

    Earlier executions used to look like the one below:
    [2019-01-15 20:02:39,99] [info] AwsBatchAsyncBackendJobExecutionActor [e2605985to_bam_workflow.BaseRecalibrator:5:1]: /usr/gitc/gatk4/gatk-launch --javaOptions "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal \
    -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
    -Xloggc:gc_log.log -Xms4000m" \
    BaseRecalibrator \
    -R /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.fasta \
    -I /cromwell_root/cromwelleast/cromwell-execution/germline_single_sample_workflow/dcad01b4-5317-4c03-8885-436352bd6459/call-to_bam_workflow/ToBam.to_bam_workflow/e2605985-0767-45bb-9698-82279a8e167d/call-SortSampleBam/NA12878.aligned.duplicate_marked.sorted.bam \
    --useOriginalQualities \
    -O NA12878.recal_data.csv \
    -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.dbsnp138.vcf \
    -knownSites /cromwell_root/cromwelleast/references/broad-references/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -knownSites /cromwell_root/cromwelleast/references/broad-references/Homo_sapiens_assembly38.known_indels.vcf.gz \
    -L chr6:1+
    [2019-01-15 20:02:39,99] [info] Submitting job to AWS Batch
    [2019-01-15 20:02:39,99] [info] dockerImage: us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135

  • Hi @Ruchi, could you please update us on this?

  • bshifaw Member, Broadie, Moderator admin

    Hey @ssb_cromwell ,
    ${write_lines(Array)} isn't currently functional on AWS. I spoke with the dev team, and it turns out there is a ticket to increase the character limit for commands on AWS, which would fix the underlying problem. The plan is to have this fix released in about two weeks. The other option would be to edit the Python code to limit the length of the intervals produced by the CreateSequenceGroupingTSV task.

  • bshifaw Member, Broadie, Moderator admin

    One of the devs suggests editing the CreateSequenceGroupingTSV task like so:

    # Generate sets of intervals for scatter-gathering over chromosomes
    task CreateSequenceGroupingTSV {
      File ref_dict
      Int preemptible_tries
    
      # Use python to create the Sequencing Groupings used for BQSR and PrintReads Scatter.
      # It outputs to stdout where it is parsed into a wdl Array[Array[String]]
      # e.g. [["1"], ["2"], ["3", "4"], ["5"], ["6", "7", "8"]]
      command <<<
        python <<CODE
        with open("${ref_dict}", "r") as ref_dict_file:
            sequence_tuple_list = []
            longest_sequence = 0
            for line in ref_dict_file:
                if line.startswith("@SQ"):
                    line_split = line.split("\t")
                    # (Sequence_Name, Sequence_Length)
                    sequence_tuple_list.append((line_split[1].split("SN:")[1], int(line_split[2].split("LN:")[1])))
            longest_sequence = sorted(sequence_tuple_list, key=lambda x: x[1], reverse=True)[0][1]
        # We are adding this to the intervals because hg38 has contigs named with embedded colons and a bug in GATK strips off
        # the last element after a :, so we add this as a sacrificial element.
        hg38_protection_tag = ":1+"
        # initialize the tsv string with the first sequence
        tsv_string = sequence_tuple_list[0][0] + hg38_protection_tag
        temp_size = sequence_tuple_list[0][1]
        current_line_len = 0
        for sequence_tuple in sequence_tuple_list[1:]:
            if temp_size + sequence_tuple[1] <= longest_sequence and current_line_len < 5000:
                temp_size += sequence_tuple[1]
                tsv_string += "\t" + sequence_tuple[0] + hg38_protection_tag
                current_line_len += len(sequence_tuple[0] + hg38_protection_tag)
            else:
                tsv_string += "\n" + sequence_tuple[0] + hg38_protection_tag
                current_line_len = len(sequence_tuple[0] + hg38_protection_tag)
                temp_size = sequence_tuple[1]
        # add the unmapped sequences as a separate line to ensure that they are recalibrated as well
        with open("sequence_grouping.txt","w") as tsv_file:
          tsv_file.write(tsv_string)
          tsv_file.close()
    
        tsv_string += '\n' + "unmapped"
    
        with open("sequence_grouping_with_unmapped.txt","w") as tsv_file_with_unmapped:
          tsv_file_with_unmapped.write(tsv_string)
          tsv_file_with_unmapped.close()
        CODE
      >>>
      runtime {
        preemptible: preemptible_tries
        docker: "python:2.7"
        memory: "2 GB"
      }
      output {
        Array[Array[String]] sequence_grouping = read_tsv("sequence_grouping.txt")
        Array[Array[String]] sequence_grouping_with_unmapped = read_tsv("sequence_grouping_with_unmapped.txt")
      }
    }
    

    They mentioned you can change the 5000 to a larger number to get fewer extra shards. A rough helper for previewing the effect of different caps is sketched below.
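
    As a convenience, here is a small standalone sketch (count_groups is a hypothetical helper, not part of the pipeline, and the ref_dict filename is illustrative) that mirrors the grouping logic above, so you can preview how many interval groups, and therefore BaseRecalibrator shards, a given character cap would produce:

    def count_groups(ref_dict_path, char_cap):
        # Parse the @SQ records into (name, length) tuples, as the task does.
        contigs = []
        with open(ref_dict_path) as ref_dict_file:
            for line in ref_dict_file:
                if line.startswith("@SQ"):
                    fields = line.split("\t")
                    contigs.append((fields[1].split("SN:")[1], int(fields[2].split("LN:")[1])))
        longest = sorted(contigs, key=lambda x: x[1], reverse=True)[0][1]
        hg38_protection_tag = ":1+"
        groups = 1                      # the first contig opens the first group
        temp_size = contigs[0][1]
        current_line_len = 0
        for name, length in contigs[1:]:
            if temp_size + length <= longest and current_line_len < char_cap:
                temp_size += length
                current_line_len += len(name + hg38_protection_tag)
            else:
                groups += 1             # start a new group, i.e. a new shard
                temp_size = length
                current_line_len = len(name + hg38_protection_tag)
        return groups

    if __name__ == "__main__":
        for cap in (5000, 6000, 8000):
            print(cap, count_groups("Homo_sapiens_assembly38.dict", cap))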

  • Thank you so much for the response, @bshifaw.

    Based on your suggestion, we used the updated code above but changed the length cap to 6000 instead of 5000.

    This time it created around 32 shards and did not throw the limit issue.

    But to our surprise, 10 shards failed after executing for some time, while the remaining 22 shards were successful.
    When I checked the failure reason for these 10 shard jobs, it was given as "Host EC2 (instance i-0EEE65160c66e) terminated." But later on, the overall workflow failed with the reason below, which implies these shards were never able to copy/create their files:

    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-rc.txt
    Caused by: java.io.IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-rc.txt
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:146)
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:145)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at scala.util.Failure.recoverWith(Try.scala:232)
    at cromwell.engine.io.nio.NioFlow.withReader(NioFlow.scala:145)
    at cromwell.engine.io.nio.NioFlow.limitFileContent(NioFlow.scala:154)
    at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:98)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:85)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:336)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:357)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:303)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/BaseRecalibrator-7-rc.txt
    at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:350)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at better.files.File.newInputStream(File.scala:337)
    at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
    at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
    at cromwell.filesystems.s3.S3Path.newInputStream(S3PathBuilder.scala:156)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream(EvenBetterPathMethods.scala:94)
    at cromwell.core.path.EvenBetterPathMethods.mediaInputStream$(EvenBetterPathMethods.scala:91)
    at cromwell.filesystems.s3.S3Path.mediaInputStream(S3PathBuilder.scala:156)
    at cromwell.engine.io.nio.NioFlow.$anonfun$withReader$1(NioFlow.scala:145)
    at cromwell.util.TryWithResource$.$anonfun$tryWithResource$1(TryWithResource.scala:14)
    at scala.util.Try$.apply(Try.scala:209)
    at cromwell.util.TryWithResource$.tryWithResource(TryWithResource.scala:10)
    ... 14 more

    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-21/BaseRecalibrator-21-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-21/BaseRecalibrator-21-rc.txt
    Caused by: java.io.IOException: Could not read from s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-21/BaseRecalibrator-21-rc.txt: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-21/BaseRecalibrator-21-rc.txt
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:146)
    at cromwell.engine.io.nio.NioFlow$$anonfun$withReader$2.applyOrElse(NioFlow.scala:145)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at scala.util.Failure.recoverWith(Try.scala:232)

  • bshifaw Member, Broadie, Moderator admin

    Were any log files (stderr.log) written in the shard folders that failed? Example: s3://s3.amazonaws.com/cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/
    Also, were there any error messages in the CreateSequenceGroupingTSV log files?
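
    If it helps, one quick way to check is to list the shard directory straight from S3. This is just a sketch, assuming the AWS CLI on your Cromwell host is configured with read access to the cromwelleast bucket (the path below is the shard-7 prefix from your error message):

        # List everything Cromwell wrote under the failed shard, including any stderr/stdout logs
        aws s3 ls --recursive s3://cromwelleast/cromwell-execution/germline_single_sample_workflow/f3607d43-46fa-401a-8f7b-6c944564257f/call-to_bam_workflow/ToBam.to_bam_workflow/ef764ead-efc7-4f62-bdaf-bab83da63dde/call-BaseRecalibrator/shard-7/

    The same command against the CreateSequenceGroupingTSV call directory would show whether its log files were ever written (and whether they really are 0 B).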

  • Hi @bshifaw,

    No, no folders were created for the failed shards, so we can't get those files.

    Also, I checked the CreateSequenceGroupingTSV logs; they are 0 B files, so there are no entries to see.

    Please suggest!

  • mcovarr Cambridge, MA Member, Broadie, Dev ✭✭

    @ssb_cromwell are there possibly any earlier log messages for these failed shards? The error message here is for a failure to read the rc file, which makes me wonder whether the job was ever started at all.

  • @bshifaw, I agree with you. The above exception is not our concern; it is a generic exception that is thrown whenever a folder was not created or a file does not exist at the given location. We are well aware of this.

    Our problem is that this time it didn't hit the limit issue and created around 32 shards. 22 of them completed successfully, and the other 10 shards failed a few minutes after starting, with the reason "Host EC2 (instance i-0EEE65160c66e) terminated."

    This is where I'm left wondering what the reason could be. Is it the large number of shards, did the number of jobs exceed some limit on how many shards are possible, or is it something else?

  • Hi @bshifaw, I understand there is very little information available for you to go on, but is there anything we can try here? We are feeling a bit stuck at this stage!

  • bshifaw Member, Broadie, Moderator admin

    Excuse the late response, it's been a bit busy.
    I'm not sure about the message above; it seems AWS is terminating the task instances, but I'm not certain why from that message alone. Normally I would suggest using the log files for clues, but it looks like the log folders weren't created in your case. I don't have experience with AWS, but on Google Cloud, if a task instance hits a limit of some kind (memory, disk, time, etc.) it gets terminated automatically. Maybe that is what happened here; check what instance i-0EEE65160c66e is used for and why it was terminated. You also still have the option of waiting for the Cromwell character-limit fix, which would address the underlying problem.
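
    For the "why was it terminated" part, one thing that sometimes helps (a rough sketch, assuming the AWS CLI has ec2:DescribeInstances permission and the instance record hasn't aged out of the API yet) is to ask EC2 directly for the state-transition reason of that instance:

        # Show AWS's recorded reason for the instance stopping/terminating
        # (e.g. Spot interruption, instance-initiated shutdown, user request)
        aws ec2 describe-instances \
            --instance-ids i-0EEE65160c66e \
            --query 'Reservations[].Instances[].[InstanceId,StateReason.Message,StateTransitionReason]'

    If this comes back empty, the instance has already been purged from the API and CloudTrail would be the next place to look.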

  • Thanks @bshifaw, let me see if I can work out anything from digging further into the instance termination.
    Earlier you mentioned the char-limit fix would take about two weeks. Has anything moved forward on that side? Would you mind giving me a heads-up once it is released?

  • bshifaw Member, Broadie, Moderator admin

    Hi @ssb_cromwell ,
    The dev team has merged the change that should fix the char limit, though the release may not happen in the coming week. You can get the latest release of Cromwell here once it arrives.
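
    As a sketch of what picking up the new version looks like once it is published (the URL follows the project's usual GitHub release layout, and the config file name is just a placeholder for your AWS backend config):

        # Download the release jar (substitute the actual released version number)
        wget https://github.com/broadinstitute/cromwell/releases/download/37/cromwell-37.jar

        # Restart the server against the same backend configuration
        java -Dconfig.file=aws.conf -jar cromwell-37.jar server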

  • Thanks @bshifaw, I'll keep an eye on release.

  • @bshifaw, the Cromwell team has released the new version 37.

    While running it, jobs started failing with an S3 Access Denied error, but when I reverted to version 36 there was no S3 access issue while creating jobs.

    Do we need to make any IAM changes to run this new version? The exception is below for your reference.

    [2019-02-08 19:15:35,41] [error] AwsBatchAsyncBackendJobExecutionActor [25b4c8b2germline_single_sample_workflow.ScatterIntervalList:NA:1]: Error attempting to Execute
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: E872C7B5C2A10C5D)
    Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: E872C7B5C2A10C5D)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleErrorResponse(HandleResponseStage.java:115)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleResponse(HandleResponseStage.java:73)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:58)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:41)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)

  • @SaloniShah, any idea about the above exception? It appeared with the new release, Cromwell-37.jar; when I run version 36 it does not throw this error. Please suggest!

  • SaloniShah Member

    Hi @ssb_cromwell, we updated the S3 libraries (for security reasons) and had to adjust some method calls in 37. Would it be possible to share your WDL? Also, can you please make sure that the AWS account being used to read from the S3 bucket has access to the data in that bucket? We would like to rule out that something about the data changed and caused permission issues.
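
    A minimal way to double-check that from the machine running Cromwell (assuming the AWS CLI there resolves to the same credentials/instance profile that Cromwell uses) would be something like:

        # Which identity do these credentials actually resolve to?
        aws sts get-caller-identity

        # Can that identity still read the bucket Cromwell is working against?
        aws s3 ls s3://cromwelleast/cromwell-execution/ | head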

  • Hi @SaloniShah, below is the execution log from when a job is submitted to AWS Batch.

    The first one is from Cromwell 36, which submitted successfully without any issues. The second is from Cromwell 37, which failed with the S3 Access Denied exception. Please let us know what changed between these two submissions that could have triggered this error.

    PS: we are using the same CloudFormation stack (and therefore the same permissions) for both of these executions.


    [2019-02-14 20:14:05,63] [info] WorkflowExecutionActor-4d75b8d5-9fce-4564-a8f4-3ab964b559fe [4d75b8d5]: Starting germline_single_sample_workflow.ScatterIntervalList
    [2019-02-14 20:14:06,72] [info] aaa360a6-abd6-4c95-956b-313be6af99a5-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [aaa360a6]: Starting to_bam_workflow.GetBwaVersion
    [2019-02-14 20:14:08,04] [info] AwsBatchAsyncBackendJobExecutionActor [4d75b8d5germline_single_sample_workflow.ScatterIntervalList:NA:1]: set -e
    mkdir out
    java -Xms1g -jar /usr/gitc/picard.jar \
    IntervalListTools \
    SCATTER_COUNT=50 \
    SUBDIVISION_MODE=BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW \
    UNIQUE=true \
    SORT=true \
    BREAK_BANDS_AT_MULTIPLES_OF=1000000 \
    INPUT=/cromwell_root/cromwelleast/references/broad-references/wgs_calling_regions.hg38.interval_list \
    OUTPUT=out

    python3 <<CODE
    import glob, os
    # Works around a JES limitation where multiples files with the same name overwrite each other when globbed
    intervals = sorted(glob.glob("out/*/*.interval_list"))
    for i, interval in enumerate(intervals):
      (directory, filename) = os.path.split(interval)
      newName = os.path.join(directory, str(i + 1) + filename)
      os.rename(interval, newName)
    print(len(intervals))
    CODE
    [2019-02-14 20:14:08,08] [info] aaa360a6-abd6-4c95-956b-313be6af99a5-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [aaa360a6]: Starting to_bam_workflow.CreateSequenceGroupingTSV
    [2019-02-14 20:14:08,33] [info] Submitting job to AWS Batch
    [2019-02-14 20:14:08,33] [info] dockerImage: us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135
    [2019-02-14 20:14:08,33] [info] jobQueueArn: arn:aws:batch:us-east-1:128842882846:job-queue/GenomicsDefaultQueue-8b70e89c3cff083
    [2019-02-14 20:14:08,34] [info] taskId: germline_single_sample_workflow.ScatterIntervalList-None-1
    [2019-02-14 20:14:08,34] [info] hostpath root: germline_single_sample_workflow/Utils.ScatterIntervalList/4d75b8d5-9fce-4564-a8f4-3ab964b559fe/None/1


    Here is failure one:---->

    [2019-02-14 20:16:01,01] [info] WorkflowExecutionActor-9b5ba1a7-ecf1-452c-9ef7-5159ddace206 [9b5ba1a7]: Starting germline_single_sample_workflow.ScatterIntervalList
    [2019-02-14 20:16:01,35] [info] Assigned new job execution tokens to the following groups: 9b5ba1a7: 1
    [2019-02-14 20:16:01,90] [info] AwsBatchAsyncBackendJobExecutionActor [9b5ba1a7germline_single_sample_workflow.ScatterIntervalList:NA:1]: set -e
    mkdir out
    java -Xms1g -jar /usr/gitc/picard.jar \
    IntervalListTools \
    SCATTER_COUNT=50 \
    SUBDIVISION_MODE=BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW \
    UNIQUE=true \
    SORT=true \
    BREAK_BANDS_AT_MULTIPLES_OF=1000000 \
    INPUT=/cromwell_root/cromwelleast/references/broad-references/wgs_calling_regions.hg38.interval_list \
    OUTPUT=out

    python3 <<CODE
    import glob, os
    # Works around a JES limitation where multiples files with the same name overwrite each other when globbed
    intervals = sorted(glob.glob("out/*/*.interval_list"))
    for i, interval in enumerate(intervals):
      (directory, filename) = os.path.split(interval)
      newName = os.path.join(directory, str(i + 1) + filename)
      os.rename(interval, newName)
    print(len(intervals))
    CODE
    [2019-02-14 20:16:02,10] [info] 160976ba-8a49-421f-be7a-11089fc4ce4e-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [160976ba]: Starting to_bam_workflow.GetBwaVersion
    [2019-02-14 20:16:02,23] [error] AwsBatchAsyncBackendJobExecutionActor [9b5ba1a7germline_single_sample_workflow.ScatterIntervalList:NA:1]: Error attempting to Execute
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 140EF72C1256A139)
    Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 140EF72C1256A139)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleErrorResponse(HandleResponseStage.java:115)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleResponse(HandleResponseStage.java:73)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:58)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:41)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:63)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:115)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:88)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:44)

  • @SaloniShah, did you get a chance to look into the above two executions? One is from version 36, which submitted the job successfully, and the other is from version 37, which threw the S3 issue; both ran under the same CloudFormation stack.

  • SChaluvadi Member, Broadie, Moderator admin

    @ssb_cromwell I let her know that you had posted your execution outputs and she is looking into it and will get back to you when she has an update! Thanks for your patience.

  • Thanks @SChaluvadi, and sorry for pushing on this. We are in the middle of some critical implementation work and are stuck because of this issue, so we need your urgent help here!

  • Ruchi Member, Broadie, Dev admin

    @ssb_cromwell

    We've been completing some time-sensitive work, apologies for the delay!

    We run tests every night on the AWS backend, and I've noticed we don't hit this permissions issue there. It seems possible that some of the AWS libraries we use were updated, which is why the permissions issue appears in Cromwell 37 and not 36. Is it possible for you to give yourself (or whatever user account you use to run Cromwell) Administrator access to get past this bump? It's a temporary solution, but it would get you unblocked quickly.
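
    If it's useful, here is a rough sketch of that temporary change, assuming Cromwell runs under an EC2 instance profile role (the role name below is a placeholder, not something taken from your CloudFormation stack):

        # Temporarily attach the AWS-managed AdministratorAccess policy to Cromwell's role
        aws iam attach-role-policy \
            --role-name CromwellServerRole \
            --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

        # ...and detach it again once the underlying problem is identified
        aws iam detach-role-policy \
            --role-name CromwellServerRole \
            --policy-arn arn:aws:iam::aws:policy/AdministratorAccess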

    Thanks!

  • Thanks for your response, @Ruchi!

    BTW, sometimes the error below can also be thrown if a required file is not present at the destination. I have checked, and this input file is definitely available. Are you aware of version 37 looking for some other internal file? Please suggest!

    INPUT=/cromwell_root/cromwelleast/references/broad-references/wgs_calling_regions.hg38.interval_list \
    Error attempting to Execute
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 140EF72C1256A139)
    Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 140EF72C1256A139)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleErrorResponse(HandleResponseStage.java:115)

  • Hi @Ruchi, we tried with full permissions but it still throws the same issue. We are really blocked by this; I'm not sure how the new version suddenly changed the way execution works. Please help us!

  • Ruchi Member, Broadie, Dev admin

    Hey @wleepang, any ideas on what else to try? I assumed admin permissions would have allowed for anything, but maybe the permissions are a red herring?

  • Thanks @Ruchi, I just tried running with the new temporary version 36.1, but it seems the AWS Batch char-limit fix is not included in that version. It does not throw the above S3 error, though.

    Hi @wleepang, we need your urgent help here; let me know if you need any other information.

  • wleepang Member
    @ssb_cromwell - have you looked at the job proxy logs in CloudWatch? There could be more information there about why you can't access your S3 bucket from the job. Find the failed job's log and append "-proxy" to the URL. The idea is to isolate whether something is missing on the job instance itself versus something in the underlying library.
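
    A rough sketch of pulling those logs from the command line, assuming the jobs write to the default /aws/batch/job log group (the stream name is whatever shows up for your failed job):

        # Find the most recent Batch job log streams
        aws logs describe-log-streams \
            --log-group-name /aws/batch/job \
            --order-by LastEventTime --descending --max-items 20

        # Fetch the corresponding "-proxy" stream for the failed job
        aws logs get-log-events \
            --log-group-name /aws/batch/job \
            --log-stream-name <job-stream-name>-proxy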
  • ssb_cromwell Member
    edited March 4

    @wleepang -

    I'm not sure, but there are no traces in CloudWatch; it fails as soon as it tries to submit a Batch job. I can only get the error trace below.

    As shown in the logs above, there is no such error when running version 36, but the error occurs with version 37. I'm trying to use version 37 because it allows large text commands for AWS Batch, which fail with version 36.

    Do let me know if any other information is required, but we need your urgent attention on this.

        [2019-03-04 18:24:04,58] [info] WorkflowExecutionActor-4188892d-f797- [4188892d]: Starting germline_single_sample_workflow.ScatterIntervalList
        [2019-03-04 18:24:04,92] [info] Assigned new job execution tokens to the following groups: 4188892d: 1
        [2019-03-04 18:24:05,65] [info] f-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [f7b18482]: Starting to_bam_workflow.GetBwaVersion
        [2019-03-04 18:24:06,52] [info] AwsBatchAsyncBackendJobExecutionActor [4188892dgermline_single_sample_workflow.ScatterIntervalList:NA:1]: set -e
        mkdir out
        java -Xms1g -jar /usr/gitc/picard.jar \
          IntervalListTools \
          SCATTER_COUNT=50 \
          SUBDIVISION_MODE=BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW \
          UNIQUE=true \
          SORT=true \
          BREAK_BANDS_AT_MULTIPLES_OF=1000000 \
          INPUT=/cromwell_root/cromwelleast/references/broad-references/wgs_calling_regions.hg38.interval_list \
          OUTPUT=out
    
        python3 <<CODE
        import glob, os
        # Works around a JES limitation where multiples files with the same name overwrite each other when globbed
        intervals = sorted(glob.glob("out/*/*.interval_list"))
        for i, interval in enumerate(intervals):
          (directory, filename) = os.path.split(interval)
          newName = os.path.join(directory, str(i + 1) + filename)
          os.rename(interval, newName)
        print(len(intervals))
        CODE
    [2019-03-04 18:24:06,82] [error] AwsBatchAsyncBackendJobExecutionActor [4188892dgermline_single_sample_workflow.ScatterIntervalList:NA:1]: Error attempting to Execute
    cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 1E48E85DD18E0C32)
    Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 1E48E85DD18E0C32)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleErrorResponse(HandleResponseStage.java:115)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleResponse(HandleResponseStage.java:73)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:58)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:41)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:63)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:115)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:88)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:44)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:51)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:33)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:79)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:240)
        at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:96)
        at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:120)
        at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:73)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:44)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
        at software.amazon.awssdk.services.s3.DefaultS3Client.listBuckets(DefaultS3Client.java:2029)
        at software.amazon.awssdk.services.s3.S3Client.listBuckets(S3Client.java:3295)
        at org.lerch.s3fs.S3FileStore.getBucket(S3FileStore.java:93)
        at org.lerch.s3fs.S3FileStore.getBucket(S3FileStore.java:89)
        at org.lerch.s3fs.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:381)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at better.files.File.createDirectories(File.scala:182)
        at cromwell.core.path.BetterFileMethods.createDirectories(BetterFileMethods.scala:99)
        at cromwell.core.path.BetterFileMethods.createDirectories$(BetterFileMethods.scala:98)
        at cromwell.filesystems.s3.S3Path.createDirectories(S3PathBuilder.scala:158)
        at cromwell.engine.io.nio.NioFlow.createDirectories(NioFlow.scala:139)
        at cromwell.engine.io.nio.NioFlow.$anonfun$write$1(NioFlow.scala:88)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
        at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:351)
        at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:372)
        at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:312)
        at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    [2019-03-04 18:24:06,91] [info] Assigned new job execution tokens to the following groups: 4188892d: 1
    
  • ssb_cromwell Member

    Hi @wleepang, @Ruchi, any updates on this please? We are not able to move forward!

  • ssb_cromwell Member

    Hi @wleepang, @Ruchi, @bshifaw, I tried running a workflow with the latest release, version 38, but it still throws the same AWS error. It has been a long time now and we are not able to resolve this issue.

    Can you please help us on this?

        [2019-03-13 19:30:58,32] [error] AwsBatchAsyncBackendJobExecutionActor [a1eac186to_bam_workflow.GetBwaVersion:NA:1]: Error attempting to Execute
        cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 086745DCCB1C8EE5)
        Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 086745DCCB1C8EE5)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleErrorResponse(HandleResponseStage.java:115)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleResponse(HandleResponseStage.java:73)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:58)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:41)
            at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:63)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:36)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:115)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:88)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:44)
            at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
            at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:51)
            at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:33)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:79)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
            at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
            at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:205)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
            at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
            at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:240)
            at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:96)
            at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:120)
            at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:73)
            at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:44)
            at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
            at software.amazon.awssdk.services.s3.DefaultS3Client.listBuckets(DefaultS3Client.java:2029)
            at software.amazon.awssdk.services.s3.S3Client.listBuckets(S3Client.java:3295)
            at org.lerch.s3fs.S3FileStore.getBucket(S3FileStore.java:93)
            at org.lerch.s3fs.S3FileStore.getBucket(S3FileStore.java:89)
            at org.lerch.s3fs.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:381)
            at java.nio.file.Files.createDirectory(Files.java:674)
            at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
            at java.nio.file.Files.createDirectories(Files.java:727)
            at better.files.File.createDirectories(File.scala:182)
            at cromwell.core.path.BetterFileMethods.createDirectories(BetterFileMethods.scala:99)
            at cromwell.core.path.BetterFileMethods.createDirectories$(BetterFileMethods.scala:98)
            at cromwell.filesystems.s3.S3Path.createDirectories(S3PathBuilder.scala:158)
            at cromwell.engine.io.nio.NioFlow.createDirectories(NioFlow.scala:139)
            at cromwell.engine.io.nio.NioFlow.$anonfun$write$1(NioFlow.scala:88)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
            at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
            at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:351)
            at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:372)
            at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:312)
            at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
            at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
        [2019-03-13 19:31:00,23] [info] Assigned new job execution tokens to the following groups: a93339ff: 1
    
  • wleepang Member
    @ssb_cromwell - apologies for the delay.
    Can you try configuring your local AWS CLI for your AWS account and running Cromwell from your local machine? It seems that the latest versions (>36.1) of Cromwell are not picking up the instance profile credentials correctly.
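
    For what it's worth, a minimal sketch of that test (the profile name, config file, and input file names are placeholders; the idea is just to make Cromwell pick up explicit credentials instead of the instance profile, assuming the SDK honours AWS_PROFILE from the environment):

        # Configure explicit credentials locally and sanity-check bucket access
        aws configure --profile cromwell-test
        aws s3 ls s3://cromwelleast/ --profile cromwell-test

        # Run Cromwell locally with those credentials and the same AWS backend config
        export AWS_PROFILE=cromwell-test
        java -Dconfig.file=aws.conf -jar cromwell-38.jar run germline_single_sample_workflow.wdl --inputs inputs.json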