java.lang.IllegalArgumentException: Could not build the path

Hi,

I am new to FireCloud and trying to run my first workflow.

Specifically, I am trying to launch the haplotypecaller-gvcf-gatk4 workflow using a BAM file (and index) I have already generated.

I imported the haplotypecaller-gvcf-gatk4 method/configuration to my workspace. I uploaded the relevant files including my BAM and index to the provided google cloud bucket and imported metadata. I have set the relevant workspace attributes. I am using the Root Entity Type participant to launch the workflow.

However, when I run the workflow I get this from the generated log file:
ERROR - PipelinesApiAsyncBackendJobExecutionActor [UUID(9f6d0589)HaplotypeCallerGvcf_GATK4.HaplotypeCaller:1:1]: Error attempting to Execute
cromwell.core.path.PathParsingException: java.lang.IllegalArgumentException: Could not build the path " workspace.ref_fasta_index". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: Google Cloud Storage. Failures: Google Cloud Storage: Path " workspace.ref_fasta_index" does not have a gcs scheme (IllegalArgumentException) Please refer to the documentation for more information on how to configure filesystems: cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems

I am not sure why I am getting this error message because ref_fasta_index workspace attribute is to a file on the generated google cloud storage bucket and seems to have loaded fine as shown by the attached image. I have also set four other workspace attributes in the exact same way and am not getting any errors for those.

Do you know what's causing this issue and how it can be resolved?

Many thanks in advance.

Answers

  • SChaluvadi Member, Broadie, Moderator admin

    @palmerp I am going to investigate a bit further and get back to you with an update!

  • SChaluvadi Member, Broadie, Moderator admin

    @palmerp Would you be able to share your workspace with [email protected] so we can take a closer look? I'm not sure whether the issue causing this error is on the page you took a screenshot of; it's hard to tell without doing a bit of poking around!

  • palmerp Member
    @SChaluvadi, thanks for your reply.

    I have now shared the workspace
  • palmerp Member
    I think the workspace is fine, but it looks like the problem might be with the configuration. There's a space before workspace.ref_fasta_index (the error shows " workspace.ref_fasta_index") that is probably causing the issue.
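
A leading space like this is easy to miss in the log. As a minimal sketch (not part of FireCloud or Cromwell), the Python below pulls the quoted path out of the error message posted above and reports what looks wrong with it. The helper names `offending_paths` and `describe` are made up for this illustration, and the only assumption is that a usable path on this backend starts with `gs://`, as the error text suggests.

```python
import re

# Matches the quoted path in the Cromwell message quoted in this thread.
PATH_RE = re.compile(r'Could not build the path "([^"]*)"')


def offending_paths(log_text):
    """Return every path the log says could not be built."""
    return PATH_RE.findall(log_text)


def describe(path):
    """List the reasons a reported path looks suspicious."""
    problems = []
    if path != path.strip():
        problems.append("leading or trailing whitespace")
    if not path.strip().startswith("gs://"):
        problems.append("no gs:// scheme")
    return problems


log = ('cromwell.core.path.PathParsingException: '
       'java.lang.IllegalArgumentException: Could not build the path '
       '" workspace.ref_fasta_index". It may refer to a filesystem not '
       'supported by this instance of Cromwell.')

for path in offending_paths(log):
    # repr() makes the leading space visible.
    print(repr(path), describe(path))
```

Printing with `repr()` is the useful part here: it shows the space inside the quotes, which the plain log line hides.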
  • SChaluvadi Member, Broadie, Moderator admin

    @palmerp I had exactly the same thought, and it was what I was going to check once you had shared the workspace! Since you caught it, were you able to make the workflow run successfully?

  • palmerp Member
    edited January 2019
    Unfortunately I have not yet been able to get it to run successfully.

    The problem with the workspace.ref_fasta_index was also that it should not have been wrapped in quotes in the method configuration.
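
The two mistakes found so far (a stray space and a quoted reference) can be checked for before launching. The sketch below is a hypothetical pre-flight check written for this thread, not a FireCloud feature. It assumes the usual convention that an unquoted `workspace.` or `this.` expression is a reference and a quoted value is taken as a literal string.

```python
def check_attribute(expr):
    """Return likely problems with a method configuration attribute value."""
    problems = []
    stripped = expr.strip()
    if stripped != expr:
        problems.append("surrounding whitespace")
    # A value wrapped in matching quotes is treated as a literal string.
    quoted = (len(stripped) >= 2
              and stripped[0] == stripped[-1]
              and stripped[0] in "\"'")
    if quoted:
        inner = stripped[1:-1]
        if inner.strip().startswith(("workspace.", "this.")):
            problems.append("reference wrapped in quotes, so it is read as a literal string")
        if inner != inner.strip():
            problems.append("whitespace inside the quotes")
    return problems


for value in ['" workspace.ref_fasta_index"', "workspace.ref_fasta_index"]:
    print(repr(value), check_attribute(value))
```

An empty list means neither of the two problems from this thread was detected; it does not prove the attribute itself exists in the workspace.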

    When I run the workflow now, it runs the following for each of the intervals specified in the input workspace.scattered_calling_intervals_list, e.g.:
    /gatk/gatk --java-options -Xms8000m \
    HaplotypeCaller \
    -R /cromwell_root/fc-e262a100-a688-4388-9085-c167f8e5d4a2/genome.fa \
    -I /cromwell_root/fc-e262a100-a688-4388-9085-c167f8e5d4a2/aln.bam \
    -O aln.g.vcf.gz \
    -L chr19:1-59128983 \
    -ip 100 \
    -contamination 0 \
    --max-alternate-alleles 3 \
    -ERC GVCF

    However, I get this error: java.lang.IllegalArgumentException: Could not build the path "chr19:1-59128983".

    It seems like the value for the intervals is being interpreted as a file instead of a string, which causes the error.

    I also tried wrapping the region values in my intervals file in quotes, but they were still interpreted as files. This makes me think that the issue may be with how the Cromwell workflow specifies the value for the intervals. I am not familiar with Cromwell, but I had a look at the pipeline. Do you know if the workflow should be setting "String interval_list" instead of "File interval_list" in the task definitions for HaplotypeCaller?

    I guess a kind of hack to work around this is to have one main intervals_list file that lists the paths of other intervals_list files on the bucket, with each of those files containing one region. It's far from ideal though, especially with many regions, and I'm not sure if it will work.
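
If the task input stays typed as File, the workaround described above could be prepared with a short script instead of by hand. This is only a sketch under stated assumptions: the bucket prefix `gs://example-bucket/intervals` is a placeholder, the function and file names are invented for the example, and the files are written locally, so they would still need to be uploaded (for instance with `gsutil cp`) before the workflow could read them.

```python
import tempfile
from pathlib import Path


def safe_name(interval):
    """Turn 'chr19:1-59128983' into a filename-friendly 'chr19_1_59128983'."""
    return interval.replace(":", "_").replace("-", "_")


def write_interval_files(intervals, out_dir, bucket_prefix):
    """Write one file per interval plus a master list of their bucket paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    gcs_paths = []
    for interval in intervals:
        name = safe_name(interval) + ".intervals"
        # Each per-interval file holds exactly one region.
        (out_dir / name).write_text(interval + "\n")
        gcs_paths.append(f"{bucket_prefix.rstrip('/')}/{name}")
    # The master list holds bucket paths, not the regions themselves.
    master = out_dir / "scattered_calling_intervals.txt"
    master.write_text("\n".join(gcs_paths) + "\n")
    return master, gcs_paths


with tempfile.TemporaryDirectory() as tmp:
    master, paths = write_interval_files(
        ["chr19:1-59128983", "chr20:1-63025520"],
        tmp,
        "gs://example-bucket/intervals",  # hypothetical bucket location
    )
    print(master.read_text())
```

With this layout every line of the master file is a `gs://` path, so each scattered value can be resolved as a file.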

    Also, the output for all the different HaplotypeCaller commands is aln.g.vcf.gz. Will Cromwell prevent these files from overwriting one another? Are they written to different directories?

    Thanks in advance.

    I am working on a different workspace now but shared it again with [email protected]
  • SChaluvadi Member, Broadie, Moderator admin

    @palmerp Thank you for sharing the workspace! I will have a look and get back to you.

  • SChaluvadi Member, Broadie, Moderator admin

    @palmerp It looks like the method referenced in your workspace has been redacted (Snapshot ID: 1 (redacted)). If you scroll down to the Connections section you will see that nothing in the Type column is filled in, whereas in other snapshots the Type is filled in. Can you try to repeat your analysis with a different snapshot of the method, perhaps snapshot 13 as in the workspaces listed in Featured Workspaces? I think this might solve the issue, but if not I will loop in other members of the team to help troubleshoot.

  • palmerp Member
    I tried again on the same workspace, this time using the workflow with snapshot ID 13.

    However, I got the same error:
    ERROR - PipelinesApiAsyncBackendJobExecutionActor [UUID(f35287e6)HaplotypeCallerGvcf_GATK4.HaplotypeCaller:2:1]: Error attempting to Execute
    cromwell.core.path.PathParsingException: java.lang.IllegalArgumentException: Could not build the path "chr21:1-48129895". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: Google Cloud Storage.

    I also tried wrapping the intervals both with and without double quotes. This is the same error as I described above, although it looks like the Cromwell workflow is different.

    Do you know why the value for interval is being interpreted as a file and not a string and how to prevent this error?

    Thanks again
  • SChaluvadi Member, Broadie, Moderator admin

    Hi @palmerp, sorry for the delay! If you look in the Method Configurations tab and find the row for HaplotypeCallerGvcf_GATK4 scattered_calling_intervals_list, you will see that its Type is File. The Attribute column sets this File to workspace.scattered_calling_intervals_list. The workspace prefix tells you that the file is defined as a workspace-wide attribute. Therefore, if you switch to the Summary tab and scroll to the bottom, you'll see Workspace Attributes. There you will see scattered_calling_intervals_list set to intervals.interval_list, where the link points to the intervals list file in the Google bucket. This is why the error says that it cannot interpret the string: it is looking for a file.

    In summary, to change the intervals, you can put the intervals that you want to use (in your case chr21:1-48129895) in a file, upload it to the workspace Google bucket, and update the path to the file in the Attribute column under your Method Configuration.

    Please let me know if this helps or if you would like more clarification!

  • palmerp Member
    edited January 2019
    If I have understood correctly, that is what I have already done. If so, sorry, I should have made that clearer.

    Using the haplotypecaller-gvcf-gatk4 workflow with snapshot ID 13:
    1) Under method configurations the scattered_calling_intervals_list variable is set to the file workspace.scattered_calling_intervals_list
    2) I have uploaded a file to the provided google cloud bucket, containing the intervals to be analysed (with one region per row, e.g. chr19:1-59128983 on the first row and chr20:1-63025520 on the second etc.)
    3) Then on the same workspace under workspace attributes on the summary tab, I have set scattered_calling_intervals_list to the file on the provided google cloud bucket containing the intervals

    However, when I try and launch the analysis I get the java.lang.IllegalArgumentException: Could not build the path "chr21:1-48129895"

    Having looked at the WDL workflow that is being executed, I can see there's a scatter which is performed over the file containing the intervals. I think this is so that HaplotypeCaller is run once for each of the intervals specified in the file. I don't have any experience in WDL but wondered if each of the individual intervals, once scattered from the file, should be set as a string and not a file in the WDL workflow, as I mentioned on the 8th Jan.
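
The suspicion above can be illustrated outside WDL with a rough Python model. The `as_file` helper is a hypothetical stand-in for a File coercion and is not Cromwell's actual code. It only assumes what the error message states, namely that Google Cloud Storage is the sole supported filesystem, so a value without a `gs://` scheme cannot be turned into a path.

```python
def as_file(value):
    """Very rough stand-in for treating a scattered value as a File."""
    if not value.startswith("gs://"):
        raise ValueError(f'Could not build the path "{value}"')
    return value


def as_string(value):
    """Treating the value as a String needs no path resolution at all."""
    return value


def scatter(lines, coerce):
    """Apply the coercion to each line, one shard per line."""
    results = []
    for line in lines:
        try:
            results.append(("ok", coerce(line)))
        except ValueError as err:
            results.append(("error", str(err)))
    return results


intervals = ["chr19:1-59128983", "chr20:1-63025520"]

print(scatter(intervals, as_file))    # every shard fails to resolve a path
print(scatter(intervals, as_string))  # every shard keeps the region text
```

In this model the same intervals file fails under the File reading and passes under the String reading, which matches the behaviour described in this post.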

    I think this may be the case because from the log file that was produced I could see it had produced the following command:
    /gatk/gatk --java-options -Xms8000m \
    HaplotypeCaller \
    -R /cromwell_root/fc-e262a100-a688-4388-9085-c167f8e5d4a2/genome.fa \
    -I /cromwell_root/fc-e262a100-a688-4388-9085-c167f8e5d4a2/aln.bam \
    -O aln.g.vcf.gz \
    -L chr19:1-59128983 \
    -ip 100 \
    -contamination 0 \
    --max-alternate-alleles 3 \
    -ERC GVCF

    However, rather than running this (which should hopefully work), it was searching for a file named chr19:1-59128983 rather than interpreting this as a string.

    Hope that helps clarify things, say if not.

    EDIT: I modified the WDL workflow on line no. 157 to be "String interval_list" and not "File interval_list". The workflow completed successfully so I think you may need to modify the script

    Thanks
  • palmerp Member
    edited January 2019
    @SChaluvadi Thanks for your quick reply and help. Yes, that all makes sense.

    Personally, I think my method of setting it as a string is cleaner, as then you only need one intervals file, but perhaps I'm missing something.